[Bug target/107757] New: PPCLE: Inefficient vector constant creation

2022-11-18 Thread jens.seifert at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107757

Bug ID: 107757
   Summary: PPCLE: Inefficient vector constant creation
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jens.seifert at de dot ibm.com
  Target Milestone: ---

Because vslw, vsld, vsrd, ... use only the shift count modulo the element bit
width, shifting an all-ones (0xFF..FF) vector by itself can be used to create
vector constants for:
vec_splats(-0.0) or vec_splats(1ULL << 63) and scalar -0.0
vec_splats(-0.0f) or vec_splats(1U << 31)
vec_splats((short)0x8000)
with only two 2-cycle vector instructions.
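
The modulo behaviour this relies on can be checked with a scalar model of a single 64-bit lane. This is only a sketch: `lane_vsrd`, `lane_vsld` and `bits_of` are hypothetical helper names, not intrinsics, and the last helper assumes IEEE 754 binary64 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of one 64-bit lane of vsrd/vsld: only the low 6 bits of the
// shift operand are used, i.e. the count is taken modulo 64.
constexpr std::uint64_t lane_vsrd(std::uint64_t value, std::uint64_t count) {
    return value >> (count & 63);
}

constexpr std::uint64_t lane_vsld(std::uint64_t value, std::uint64_t count) {
    return value << (count & 63);
}

// All-ones lane, as produced by "vspltisw 2,-1".
constexpr std::uint64_t all_ones = ~std::uint64_t{0};

// Shifting all-ones by itself shifts by 63, leaving only one bit set.
static_assert(lane_vsrd(all_ones, all_ones) == 1, "lowest bit only");
static_assert(lane_vsld(all_ones, all_ones) == (std::uint64_t{1} << 63),
              "sign bit only");

// Bit pattern of a double (assumes IEEE 754 binary64).
inline std::uint64_t bits_of(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

Under these assumptions, `bits_of(-0.0)` equals `lane_vsld(all_ones, all_ones)`, which is why the left-shift variant yields the -0.0 splat.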

Sample:

vector long long lsb64()
{
   return vec_splats(1LL);
}

creates:

lsb64():
.LCF5:
addi 2,2,.TOC.-.LCF5@l
addis 9,2,.LC12@toc@ha
addi 9,9,.LC12@toc@l
lvx 2,0,9
blr
.long 0
.byte 0,9,0,0,0,0,0,0

while:

vector long long lsb64_opt()
{
   vector long long a = vec_splats(~0LL);
   __asm__("vsrd %0,%1,%2":"=v"(a):"v"(a),"v"(a));
   return a;
}

creates:
lsb64_opt():
vspltisw 2,-1
vsrd 2,2,2
blr
.long 0
.byte 0,9,0,0,0,0,0,0

Re: nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel' (was: [MentorEmbedded/nvptx-tools] Match standard 'ld' "search" behavior (PR #38))

2022-11-18 Thread Tom de Vries via Gcc-patches

On 11/19/22 00:25, Thomas Schwinge wrote:

Hi!

Re
:

On 2022-11-18T11:05:23-0800, I wrote:

Actually, in GCC/nvptx target testing, this #38's commit 
886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820 "ld: Don't search for input files in 
'-L'directories" is generally causing linking to fail with:

```
error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1
```

I'm investigating.


OK to push the attached
GCC "nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'" to all
active GCC branches?  (... instead of having to restore this "blunder"
(do "search for input files in '-L'directories") in nvptx-tools...)



Hi,

yes, LGTM.

Thanks,
- Tom



Regards
  Thomas




[PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-18 Thread Hongyu Wang via Gcc-patches
Hi,

Following the discussion in PR107602, -munroll-only-small-loops
does not turn -funroll-loops on or off, and the current check in
pass_rtl_unroll_loops::gate would prevent -funroll-loops from taking
effect. Revert the change to targetm.loop_unroll_adjust and apply
the backend option change to strictly follow the rule that
-funroll-loops takes full control of loop unrolling, while
-munroll-only-small-loops only changes its behavior to unroll small
loops.
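
The intended interaction between the options can be modelled roughly as below. This is a simplified sketch, not the GCC code: `Flag`, `UnrollOptions` and the helper functions are hypothetical, and the -funroll-all-loops and -fcunroll-grow-size handling from the patch is left out.

```cpp
#include <cassert>

// Hypothetical stand-in for one option: its value plus whether the user
// set it explicitly on the command line.
struct Flag {
    bool value;
    bool explicitly_set;
};

struct UnrollOptions {
    Flag unroll_loops;             // -funroll-loops
    Flag unroll_only_small_loops;  // -munroll-only-small-loops
    Flag web;                      // -fweb
    Flag rename_registers;         // -frename-registers
};

// Defaults described for -O2 (speed): unrolling on but limited to small
// loops, with -fweb and -frename-registers kept off.
inline UnrollOptions defaults_at_O2() {
    return {{true, false}, {true, false}, {false, false}, {false, false}};
}

// Record an explicit command-line setting.
inline void apply_user_flag(Flag &flag, bool value) {
    flag = {value, true};
}

// An explicit -f[no-]unroll-loops takes full control: the small-loop
// restriction is dropped and -fweb/-frename-registers follow it, unless
// those were themselves set explicitly.
inline void override_after_change(UnrollOptions &o) {
    if (!o.unroll_loops.explicitly_set)
        return;
    if (!o.unroll_only_small_loops.explicitly_set)
        o.unroll_only_small_loops.value = false;
    if (!o.web.explicitly_set)
        o.web.value = o.unroll_loops.value;
    if (!o.rename_registers.explicitly_set)
        o.rename_registers.value = o.unroll_loops.value;
}
```

With plain -O2 the model keeps small-loop unrolling only. Adding an explicit -funroll-loops switches to full unrolling and turns -fweb and -frename-registers back on.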

Bootstrapped and regtested on x86-64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

PR target/107602
* common/config/i386/i386-common.cc (ix86_optimization_table):
Enable loop unroll O2, disable -fweb and -frename-registers
by default.
* config/i386/i386-options.cc
(ix86_override_options_after_change):
Disable small loop unroll when funroll-loops enabled, reset
cunroll_grow_size when it is not explicitly enabled.
(ix86_option_override_internal): Call
ix86_override_options_after_change instead of calling
ix86_recompute_optlev_based_flags and ix86_default_align
separately.
* config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
factor if -munroll-only-small-loops enabled.
* loop-init.cc (pass_rtl_unroll_loops::gate): Do not enable
loop unrolling for -O2-speed.
(pass_rtl_unroll_loops::execute): Remove
targetm.loop_unroll_adjust check.

gcc/testsuite/ChangeLog:

PR target/107602
* gcc.target/i386/pr86270.c: Add -fno-unroll-loops.
* gcc.target/i386/pr93002.c: Likewise.
---
 gcc/common/config/i386/i386-common.cc   |  8 ++
 gcc/config/i386/i386-options.cc | 34 ++---
 gcc/config/i386/i386.cc | 18 -
 gcc/loop-init.cc| 11 +++-
 gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
 6 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 6ce2a588adc..660a977b68b 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1808,7 +1808,15 @@ static const struct default_options ix86_option_optimization_table[] =
 /* The STC algorithm produces the smallest code at -Os, for x86.  */
 { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
   REORDER_BLOCKS_ALGORITHM_STC },
+
+/* Turn on -funroll-loops with -munroll-only-small-loops to enable small
+   loop unrolling at -O2.  */
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+/* Turns off -frename-registers and -fweb which are enabled by
+   funroll-loops.  */
+{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
+{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
 /* Turn off -fschedule-insns by default.  It tends to make the
problem with not enough registers even worse.  */
 { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e5c77f3a84d..bc1d36e36a8 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1838,8 +1838,37 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
 void
 ix86_override_options_after_change (void)
 {
+  /* Default align_* from the processor table.  */
   ix86_default_align (&global_options);
+
   ix86_recompute_optlev_based_flags (&global_options, &global_options_set);
+
+  /* Disable unrolling small loops when there's explicit
+     -f{,no}unroll-loop.  */
+  if ((OPTION_SET_P (flag_unroll_loops))
+     || (OPTION_SET_P (flag_unroll_all_loops)
+         && flag_unroll_all_loops))
+    {
+      if (!OPTION_SET_P (ix86_unroll_only_small_loops))
+        ix86_unroll_only_small_loops = 0;
+      /* Re-enable -frename-registers and -fweb if funroll-loops
+         enabled.  */
+      if (!OPTION_SET_P (flag_web))
+        flag_web = flag_unroll_loops;
+      if (!OPTION_SET_P (flag_rename_registers))
+        flag_rename_registers = flag_unroll_loops;
+      /* -fcunroll-grow-size default follows -f[no]-unroll-loops.  */
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+        flag_cunroll_grow_size = flag_unroll_loops
+                                 || flag_peel_loops
+                                 || optimize >= 3;
+    }
+  else
+    {
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+        flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
+    }
+
 }
 
 /* Clear stack slot assignments remembered from previous functions.
@@ -2351,7 +2380,7 @@ ix86_option_override_internal (bool main_args_p,
 
   set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
 
-  ix86_recompute_optlev_based_flags (opts, opts_set);
+  ix86_override_options_after_change ();
 
   ix86_tune_cost = processor_cost_table[ix86_tune];
   

[Bug c/107756] New: Change in sizeof(enum) with -std=gnu11 breaks Linux kernel code compilation (PR c/36113 change regression)

2022-11-18 Thread macro at orcam dot me.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107756

Bug ID: 107756
   Summary: Change in sizeof(enum) with -std=gnu11 breaks Linux
kernel code compilation (PR c/36113 change regression)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: macro at orcam dot me.uk
CC: jsm28 at gcc dot gnu.org
  Target Milestone: ---
Target: mips-linux-gnu

Created attachment 53929
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53929&action=edit
Preprocessor output from libahci.c; use lzip to decompress

The fix for PR c/36113 has added support for C2x enums wider than int,
but it seems to affect code built for earlier standards, such as with
`-std=gnu11' and libahci.c code from Linux kernel, which causes an
assertion failure with the `mips-linux-gnu' (o32) target as from commit
3b3083a598ca ("c: C2x enums wider than int [PR36113]"):

In file included from <command-line>:
drivers/ata/libahci.c: In function 'ahci_led_store':
././include/linux/compiler_types.h:357:45: error: call to
'__compiletime_assert_309' declared with attribute error: BUILD_BUG_ON failed:
sizeof(_s) > sizeof(long)
  357 | _compiletime_assert(condition, msg, __compiletime_assert_,
__COUNTER__)
  | ^
././include/linux/compiler_types.h:338:25: note: in definition of macro
'__compiletime_assert'
  338 | prefix ## suffix();
\
  | ^~
././include/linux/compiler_types.h:357:9: note: in expansion of macro
'_compiletime_assert'
  357 | _compiletime_assert(condition, msg, __compiletime_assert_,
__COUNTER__)
  | ^~~
./include/linux/build_bug.h:39:37: note: in expansion of macro
'compiletime_assert'
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
  | ^~
./include/linux/build_bug.h:50:9: note: in expansion of macro
'BUILD_BUG_ON_MSG'
   50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  | ^~~~
./include/linux/nospec.h:58:9: note: in expansion of macro 'BUILD_BUG_ON'
   58 | BUILD_BUG_ON(sizeof(_s) > sizeof(long));   
\
  | ^~~~
drivers/ata/libahci.c:1198:23: note: in expansion of macro 'array_index_nospec'
 1198 | pmp = array_index_nospec(pmp, EM_MAX_SLOTS);
  |   ^~

with the compiler invocation as follows:

mips-linux-gnu-gcc -Wp,-MMD,drivers/ata/.libahci.o.d  -nostdinc
-I./arch/mips/include -I./arch/mips/include/generated  -I./include
-I./arch/mips/include/uapi -I./arch/mips/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include
./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include
./include/linux/compiler_types.h -D__KERNEL__
-DVMLINUX_LOAD_ADDRESS=0x8010 -DLINKER_LOAD_ADDRESS=0x8010
-DDATAOFFSET=0 -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes
-Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE
-Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type
-Wno-format-security -std=gnu11 -mno-check-zero-division -mabi=32 -G 0
-mno-abicalls -fno-pic -pipe -msoft-float -DGAS_HAS_SET_HARDFLOAT
-Wa,-msoft-float -ffreestanding -EB -fno-stack-check
-Wa,-mno-fix-loongson3-llsc -march=mips32r2 -Wa,--trap
-DTOOLCHAIN_SUPPORTS_VIRT -DTOOLCHAIN_SUPPORTS_XPA -DTOOLCHAIN_SUPPORTS_CRC
-DTOOLCHAIN_SUPPORTS_DSP -DTOOLCHAIN_SUPPORTS_GINV
-I./arch/mips/include/asm/mach-malta -I./arch/mips/include/asm/mach-generic
-fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow
-Wno-address-of-packed-member -O2 -fno-allow-store-data-races
-Wframe-larger-than=2048 -fno-stack-protector -Wno-main
-Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-dangling-pointer
-fomit-frame-pointer -fno-stack-clash-protection -Wdeclaration-after-statement
-Wvla -Wno-pointer-sign -Wcast-function-type -Wno-stringop-truncation
-Wno-stringop-overflow -Wno-restrict -Wno-maybe-uninitialized
-Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -fno-strict-overflow
-fno-stack-check -fconserve-stack -Werror=date-time
-Werror=incompatible-pointer-types -Werror=designated-init
-Wno-packed-not-aligned -DKBUILD_MODFILE='"drivers/ata/libahci"'
-DKBUILD_BASENAME='"libahci"' -DKBUILD_MODNAME='"libahci"'
-D__KBUILD_MODNAME=kmod_libahci -c -o drivers/ata/libahci.o
drivers/ata/libahci.c

This code builds just fine with earlier GCC checkouts.

I have attached preprocessor output from libahci.c.

Re: [PATCH] c++: cache the normal form of a concept-id

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 16:43, Patrick Palka wrote:

We already cache the overall normal form of a declaration's constraints
under the assumption that it can't change over the translation unit.
But if we have two constrained declarations such as

   template<class T> void f() requires expensive<T> && A<T>;
   template<class T> void g() requires expensive<T> && B<T>;

then despite this high-level caching we'd still redundantly have to
expand the concept-id expensive<T> twice, once during normalization of
f's constraints and again during normalization of g's.  Ideally, we'd
reuse the previously computed normal form of expensive<T> the second
time around.

To that end this patch introduces an intermediate layer of caching
during constraint normalization -- caching of the normal form of a
concept-id -- that sits between our high-level caching of the overall
normal form of a declaration's constraints and our low-level caching of
each individual atomic constraint.
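
The intermediate cache described above amounts to memoizing on the pair (concept, arguments). Below is a toy sketch of that idea only; it is not the GCC data structure. Strings stand in for trees, and `ConceptId`, `concept_id` and `Normalizer` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins: a concept-id is a concept name plus its argument list,
// and its "normal form" is just a string.
using ConceptId = std::pair<std::string, std::vector<std::string>>;

inline ConceptId concept_id(std::string name, std::vector<std::string> args) {
    return {std::move(name), std::move(args)};
}

struct Normalizer {
    std::map<ConceptId, std::string> cache;  // memo table keyed by concept-id
    int expansions = 0;                      // counts the expensive expansions

    const std::string &normalize(const ConceptId &id) {
        auto it = cache.find(id);
        if (it != cache.end())
            return it->second;  // reuse the previously computed normal form
        ++expansions;           // stands in for expanding the concept definition
        std::string norm = id.first + "<";
        for (std::size_t i = 0; i < id.second.size(); ++i) {
            if (i != 0)
                norm += ",";
            norm += id.second[i];
        }
        norm += ">";
        return cache.emplace(id, std::move(norm)).first->second;
    }
};
```

Normalizing `expensive<T>` once for `f` and once for `g` then costs a single expansion, while `A<T>` and `B<T>` are each expanded once.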

It turns out this caching generalizes some ad-hoc caching of the normal
form of a concept definition (which is equivalent to the normal form of
the concept-id C<gtargs> where gtargs are C's generic arguments) so
this patch unifies the caching accordingly.

This change improves compile time/memory usage for e.g. the libstdc++
test std/ranges/adaptors/join.cc by 10%/5%.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


Hmm, if we cache at this level, do we still also need to cache the full 
normal form of the decl's constraints?


Exploring that doesn't seem like stage 3 material, though.  The patch is OK.


gcc/cp/ChangeLog:

* constraint.cc (struct norm_entry): Define.
(struct norm_hasher): Define.
(norm_cache): Define.
(normalize_concept_check): Add function comment.  Cache the
result of concept-id normalization.  Canonicalize generic
arguments as NULL_TREE.  Don't coerce arguments unless
substitution occurred.
(normalize_concept_definition): Simplify.  Use norm_cache
instead of ad-hoc caching.
---
  gcc/cp/constraint.cc | 94 ++--
  1 file changed, 82 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a113d3e269e..c9740b1ec78 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -698,6 +698,40 @@ normalize_logical_operation (tree t, tree args, tree_code c, norm_info info)
return build2 (c, ci, t0, t1);
  }
  
+/* Data types and hash functions for caching the normal form of a concept-id.
+   This essentially memoizes calls to normalize_concept_check.  */
+
+struct GTY((for_user)) norm_entry
+{
+  /* The CONCEPT_DECL of the concept-id.  */
+  tree tmpl;
+  /* The arguments of the concept-id.  */
+  tree args;
+  /* The normal form of the concept-id.  */
+  tree norm;
+};
+
+struct norm_hasher : ggc_ptr_hash<norm_entry>
+{
+  static hashval_t hash (norm_entry *t)
+  {
+    hashval_t hash = iterative_hash_template_arg (t->tmpl, 0);
+    hash = iterative_hash_template_arg (t->args, hash);
+    return hash;
+  }
+
+  static bool equal (norm_entry *t1, norm_entry *t2)
+  {
+    return t1->tmpl == t2->tmpl
+      && template_args_equal (t1->args, t2->args);
+  }
+};
+
+static GTY((deletable)) hash_table<norm_hasher> *norm_cache;
+
+/* Normalize the concept check CHECK where ARGS are the
+   arguments to be substituted into CHECK's arguments.  */
+
  static tree
  normalize_concept_check (tree check, tree args, norm_info info)
  {
@@ -720,24 +754,53 @@ normalize_concept_check (tree check, tree args, norm_info info)
  targs = tsubst_template_args (targs, args, info.complain, info.in_decl);
if (targs == error_mark_node)
  return error_mark_node;
+  if (template_args_equal (targs, generic_targs_for (tmpl)))
+/* Canonicalize generic arguments as NULL_TREE, as an optimization.  */
+targs = NULL_TREE;
  
/* Build the substitution for the concept definition.  */

tree parms = TREE_VALUE (DECL_TEMPLATE_PARMS (tmpl));
-  /* Turn on template processing; coercing non-type template arguments
- will automatically assume they're non-dependent.  */
++processing_template_decl;
-  tree subst = coerce_template_parms (parms, targs, tmpl, tf_none);
+  if (targs && args)
+/* If substitution occurred, coerce the resulting arguments.  */
+targs = coerce_template_parms (parms, targs, tmpl, tf_none);
--processing_template_decl;
-  if (subst == error_mark_node)
+  if (targs == error_mark_node)
  return error_mark_node;
  
+  if (!norm_cache)
+    norm_cache = hash_table<norm_hasher>::create_ggc (31);
+  norm_entry entry = {tmpl, targs, NULL_TREE};
+  norm_entry **slot = nullptr;
+  hashval_t hash = 0;
+  if (!info.generate_diagnostics ())
+    {
+      /* If we're not diagnosing, cache the normal form of the
+         substituted concept-id.  */
+      hash = norm_hasher::hash (&entry);
+      slot = norm_cache->find_slot_with_hash (&entry, hash, INSERT);
+      if (*slot)
+        return (*slot)->norm;
+    }
+
/* The concept may 

Re: [PATCH] constexprify some tree variables

2022-11-18 Thread Andrew Pinski via Gcc-patches
On Fri, Nov 18, 2022 at 12:06 PM Jeff Law via Gcc-patches
 wrote:
>
>
> On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > Since we use C++11 by default now, we can
> > use constexpr for some const decls in tree-core.h.
> >
> > This patch does that and it allows for better optimizations
> > of GCC code with checking enabled and without LTO.
> >
> > For example, compiling generic-match.cc is sped up due
> > to the smaller number of basic blocks and less debugging info
> > produced. I did not check the speed of compiling the same source
> > but rather the speed of compiling the old vs new sources here
> > (but with the same compiler base).
> >
> > The small slowdown in the parsing of the arrays in each TU
> > is mitigated by the reduction in how much code/debugging info
> > is produced in the end.
> >
> > Note I looked at generic-match.cc since it is one of the
> > compiling sources which causes parallel building to stall and
> > I wanted to speed it up.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > Or should this wait until GCC 13 branches off?
> >
> > gcc/ChangeLog:
> >
> >   PR middle-end/14840
> >   * tree-core.h (tree_code_type): Constexprify
> >   by including all-tree.def.
> >   (tree_code_length): Likewise.
> >   * tree.cc (tree_code_type): Remove.
> >   (tree_code_length): Remove.
>
> I would have preferred this a week ago :-)   And if it was just
> const-ifying, I'd ACK it without hesitation.

Yes I know which is why I am ok with waiting for GCC 14 really. I
decided to try to clear out some of the old bug reports assigned to
myself and this one was one of the oldest and also one of the easiest
to do.

>
> Can you share any of the build-time speedups you're seeing, even if
> they're not perfect.  It'd help to get a sense of the potential gain
> here and whether or not there's enough gain to gate it into gcc-13 or
> have it wait for gcc-14.
>
>
> And if we can improve the compile-time of the files generated by
> match.pd, that's a win.  It's definitely a serialization point -- it
> becomes *painfully* obvious when doing a bootstrap using qemu, when that
> file takes 1-2hrs after everything else has finished.

I recorded some of the timings in the bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14840#c14

In summary, using the same compiler as a base, compiling
generic-match.cc is now ~7% faster.
I have not looked into why, but I assume it is due to less
debug info and fewer basic blocks.
Without checking enabled (or rather, with release checking) on the
sources, I assume the speedup is not going to be seen, since most of
the constant reads are in the checking part of the code.
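
For readers unfamiliar with the technique, here is a minimal sketch of building a constexpr table from a .def-style list. It is a toy: `TOY_CODES`, `toy_code` and `toy_code_length` are made up for illustration and are much simpler than the real tree-core.h/all-tree.def setup.

```cpp
#include <cassert>

// Hypothetical miniature of a .def file: one DEFCODE entry per code,
// giving the enumerator name and its operand count.
#define TOY_CODES(DEFCODE) \
    DEFCODE(TOY_CONST, 0)  \
    DEFCODE(TOY_NEGATE, 1) \
    DEFCODE(TOY_PLUS, 2)

// First expansion: the enumeration of codes.
enum toy_code {
#define DEFCODE(sym, len) sym,
    TOY_CODES(DEFCODE)
#undef DEFCODE
    TOY_NUM_CODES
};

// Second expansion: a constexpr table defined where every user can see its
// contents, rather than an extern const array defined in a single .cc file.
constexpr unsigned char toy_code_length[] = {
#define DEFCODE(sym, len) len,
    TOY_CODES(DEFCODE)
#undef DEFCODE
};

// Because the contents are visible, lookups with constant indices can be
// folded at compile time.
static_assert(toy_code_length[TOY_PLUS] == 2, "folded at compile time");
static_assert(sizeof(toy_code_length) == TOY_NUM_CODES, "one entry per code");
```

The trade-off is the one discussed in this thread: each translation unit has to parse the table, but checks that read it with a known code can be simplified away.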

Thanks,
Andrew Pinski


>
>
> Jeff


Re: [PATCH RFA] libstdc++: add experimental Contracts support

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 13:17, Jonathan Wakely wrote:

On 03/11/22 15:57 -0400, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.  OK for trunk?

-- >8 --

This patch adds the library support for the experimental C++ Contracts
implementation.  This now consists only of a default definition of the
violation handler, which users can override through defining their own
version.  To avoid ABI stability problems with libstdc++.so this is added to
a separate -lstdc++exp static library, which the driver knows to add when it
sees -fcontracts.

libstdc++-v3/ChangeLog:

* acinclude.m4 (glibcxx_SUBDIRS): Add src/experimental.
* include/Makefile.am (experimental_headers): Add contract.
* include/Makefile.in: Regenerate.
* src/Makefile.am (SUBDIRS): Add experimental.
* src/Makefile.in: Regenerate.
* configure: Regenerate.
* src/experimental/contract.cc: New file.
* src/experimental/Makefile.am: New file.
* src/experimental/Makefile.in: New file.
* include/experimental/contract: New file.
---
libstdc++-v3/src/experimental/contract.cc  |  41 ++
libstdc++-v3/acinclude.m4  |   2 +-
libstdc++-v3/include/Makefile.am   |   1 +
libstdc++-v3/include/Makefile.in   |   1 +
libstdc++-v3/src/Makefile.am   |   3 +-
libstdc++-v3/src/Makefile.in   |   6 +-
libstdc++-v3/src/experimental/Makefile.am  |  96 +++
libstdc++-v3/src/experimental/Makefile.in  | 796 +
libstdc++-v3/include/experimental/contract |  84 +++
9 files changed, 1026 insertions(+), 4 deletions(-)
create mode 100644 libstdc++-v3/src/experimental/contract.cc
create mode 100644 libstdc++-v3/src/experimental/Makefile.am
create mode 100644 libstdc++-v3/src/experimental/Makefile.in
create mode 100644 libstdc++-v3/include/experimental/contract


base-commit: a4cd2389276a30c39034a83d640ce68fa407bac1
prerequisite-patch-id: 329bc16a88dc9a3b13cd3fcecb3678826cc592dc

diff --git a/libstdc++-v3/src/experimental/contract.cc b/libstdc++-v3/src/experimental/contract.cc
new file mode 100644
index 000..b9b72cd7df0
--- /dev/null
+++ b/libstdc++-v3/src/experimental/contract.cc
@@ -0,0 +1,41 @@
+// -*- C++ -*- std::experimental::contract_violation and friends
+// Copyright (C) 1994-2022 Free Software Foundation, Inc.


Copy from an old file? I don't think this uses anything
existing, should be just 2022.


+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#include <experimental/contract>
+#include <iostream>
+
+__attribute__ ((weak)) void
+handle_contract_violation (const std::experimental::contract_violation &violation)
+{
+  std::cerr << "default std::handle_contract_violation called: " << std::endl


No need for flushing with endl here, just \n please.


+    << " " << violation.file_name()
+    << " " << violation.line_number()
+    << " " << violation.function_name()
+    << " " << violation.comment()
+    << " " << violation.assertion_level()
+    << " " << violation.assertion_role()
+    << " " << (int)violation.continuation_mode()
+    << std::endl;


And this will flush too, which typically isn't needed for stderr
because it's unbuffered. But somebody could have fiddled with cerr, so
doing this final flush seems OK.


+}
+
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 6f672924a73..baf01913a90 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
  # Keep these sync'd with the list in Makefile.am.  The first provides an
  # expandable list at autoconf time; the second provides an expandable list
  # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/filesystem src/libbacktrace doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/filesystem src/libbacktrace src/experimental doc po testsuite python])

  SUBDIRS='glibcxx_SUBDIRS'

  # These need to be absolute paths, yet at the same time need to
diff --git a/libstdc++-v3/include/Makefile.am 

Re: [PATCH v3] c++: P2448 - Relaxing some constexpr restrictions [PR106649]

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/16/22 15:27, Jason Merrill wrote:

On 11/16/22 11:06, Marek Polacek wrote:

On Wed, Nov 16, 2022 at 08:41:53AM -0500, Jason Merrill wrote:

On 11/15/22 19:30, Marek Polacek wrote:
@@ -996,19 +1040,26 @@ register_constexpr_fundef (const constexpr_fundef &value)

 **slot = value;
   }
-/* FUN is a non-constexpr function called in a context that requires a
-   constant expression.  If it comes from a constexpr template, explain why
-   the instantiation isn't constexpr.  */
+/* FUN is a non-constexpr (or, with -Wno-invalid-constexpr, a constexpr
+   function called in a context that requires a constant expression).
+   If it comes from a constexpr template, explain why the instantiation
+   isn't constexpr.  */


The "if it comes from a constexpr template" wording has needed an update for
a while now.


Probably ever since r178519.  I've added "Otherwise, explain why the function
cannot be used in a constexpr context."  Is that acceptable?

--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/constexpr-nonlit15.C
@@ -0,0 +1,43 @@
+// PR c++/106649
+// P2448 - Relaxing some constexpr restrictions
+// { dg-do compile { target c++23 } }
+// { dg-options "-Winvalid-constexpr" }
+// A copy/move assignment operator for a class X that is defaulted and
+// not defined as deleted is implicitly defined when it is odr-used,
+// when it is needed for constant evaluation, or when it is explicitly
+// defaulted after its first declaration.
+// The implicitly-defined copy/move assignment operator is constexpr.
+
+struct S {
+  constexpr S() {}
+  S& operator=(const S&) = default; // #1
+  S& operator=(S&&) = default; // #2
+};
+
+struct U {
+  constexpr U& operator=(const U&) = default;
+  constexpr U& operator=(U&&) = default;
+};
+
+/* FIXME: If we only declare #1 and #2, and default them here:
+
+   S& S::operator=(const S&) = default;
+   S& S::operator=(S&&) = default;
+
+then they aren't constexpr.  This sounds like a bug:
+.  */


As I commented on the PR, I don't think this is actually a bug, so let's
omit this FIXME.


I'm glad I didn't really attempt to "fix" it (the inform message is flawed
and should be improved).  Thanks for taking a look.

Here's a version with the two comments updated.

Ok?


OK.


Since this patch I'm seeing these failures:

FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++23 -fimplicit-constexpr  at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp23/constexpr-nonlit10.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 14)
FAIL: g++.dg/cpp23/constexpr-nonlit10.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 20)
FAIL: g++.dg/cpp23/constexpr-nonlit11.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 28)
FAIL: g++.dg/cpp23/constexpr-nonlit11.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 31)
FAIL: g++.dg/cpp2a/spaceship-eq3.C  -std=c++23 -fimplicit-constexpr (test for excess errors)


Jason



Re: [PATCH v2] c++: Reject UDLs in certain contexts [PR105300]

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 18:52, Marek Polacek wrote:

On Thu, Nov 17, 2022 at 07:06:34PM -0500, Jason Merrill wrote:

On 11/16/22 20:12, Marek Polacek wrote:

On Wed, Nov 16, 2022 at 08:22:39AM -0500, Jason Merrill wrote:

On 11/15/22 19:35, Marek Polacek wrote:

On Tue, Nov 15, 2022 at 06:58:39PM -0500, Jason Merrill wrote:

On 11/12/22 06:53, Marek Polacek wrote:

In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.
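
As a small illustration of the distinction (a sketch, with `_len` as a made-up suffix): a user-defined string literal is fine as an ordinary expression, while constructs such as the static_assert message take a plain string-literal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical suffix for illustration: yields the literal's length.
constexpr std::size_t operator""_len(const char *, std::size_t n) {
    return n;
}

// OK: a primary-expression may be a user-defined-string-literal.
constexpr std::size_t three = "abc"_len;

// The message here is a plain, unsuffixed string-literal.
static_assert(three == 3, "plain string-literal message");

// By contrast, these use a UDL where a plain string-literal is required and
// are expected to be rejected (kept as comments so the example compiles):
//   static_assert(true, "message"_len);
//   extern "C"_len void f();
```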

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


Since the grammar has

user-defined-string-literal :
string-literal ud-suffix

maybe we want to move the UDL handling out to a cp_parser_udl_string_literal
that calls cp_parser_string_literal?


Umm, maybe, but the UDL handling code seems to be too entrenched in
cp_parser_string_literal and I don't think it's going to be easy to extract
it :/.


Fair enough; maybe a wrapper, then?


As in, have a cp_parser_udl_string_literal wrapper that calls
cp_parser_string_literal with udl_ok=true, rename cp_parser_string_literal,
introduce a new cp_parser_string_literal wrapper that passes udl_ok=false?


That's what I was thinking.  And the new cp_parser_string_literal could also
omit the lookup_udlit parm.


One problem with cp_parser_udl_string_literal is that it's too similar to
cp_parser_userdef_string_literal, which would be confusing, I think.


True, probably better to use that name instead, and rename the current one
to something like finish_userdef_string_literal


Sounds good, here's the patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.

I've introduced two new wrappers; the existing cp_parser_string_literal
was renamed to cp_parser_string_literal_common and should not be called
directly.  finish_userdef_string_literal is renamed from
cp_parser_userdef_string_literal.
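
The wrapper arrangement can be pictured with the toy model below. It is only a sketch of the shape of the design: `Token` and the three `parse_*` functions are hypothetical and share nothing with the real parser except the idea of one common worker behind two entry points.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy token: the literal's text plus an optional user-defined suffix
// (an empty suffix means a plain string-literal).
struct Token {
    std::string text;
    std::string ud_suffix;
};

// Shared worker, not meant to be called directly: does the parsing and
// rejects a suffixed literal unless the caller allows UDLs.
inline std::string parse_string_literal_common(const Token &tok, bool udl_ok) {
    if (!tok.ud_suffix.empty() && !udl_ok)
        throw std::invalid_argument(
            "string literal with user-defined suffix is invalid in this context");
    return tok.text;
}

// Entry point for contexts where the grammar requires a plain string-literal.
inline std::string parse_string_literal(const Token &tok) {
    return parse_string_literal_common(tok, /*udl_ok=*/false);
}

// Entry point for contexts that also accept a user-defined-string-literal.
inline std::string parse_userdef_string_literal(const Token &tok) {
    return parse_string_literal_common(tok, /*udl_ok=*/true);
}
```

Callers then pick the entry point that matches the grammar at their call site, so the "UDLs allowed" decision is made once per context instead of being passed around as a flag.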

PR c++/105300

gcc/c-family/ChangeLog:

* c-pragma.cc (handle_pragma_message): Warn for CPP_STRING_USERDEF.

gcc/cp/ChangeLog:

* parser.cc: Remove unnecessary forward declarations.
(cp_parser_string_literal): New wrapper.
(cp_parser_string_literal_common): Renamed from
cp_parser_string_literal.  Add a bool parameter.  Give an error when
UDLs are not permitted.
(cp_parser_userdef_string_literal): New wrapper.
(finish_userdef_string_literal): Renamed from
cp_parser_userdef_string_literal.
(cp_parser_primary_expression): Call cp_parser_userdef_string_literal
instead of cp_parser_string_literal.
(cp_parser_linkage_specification): Move a variable declaration closer
to its first use.
(cp_parser_static_assert): Likewise.
(cp_parser_operator): Call cp_parser_userdef_string_literal instead of
cp_parser_string_literal.
(cp_parser_asm_definition): Move a variable declaration closer to its
first use.
(cp_parser_asm_specification_opt): Move variable declarations closer to
their first use.
(cp_parser_asm_operand_list): Likewise.
(cp_parser_asm_clobber_list): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-error1.C: New test.
---
  gcc/c-family/c-pragma.cc  |   3 +
  gcc/cp/parser.cc  | 131 ++
  gcc/testsuite/g++.dg/cpp0x/udlit-error1.C |  21 
  3 files changed, 111 insertions(+), 44 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-error1.C

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 142a46441ac..49f405b605b 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1390,6 +1390,9 @@ handle_pragma_message (cpp_reader *)
  }
else if (token == CPP_STRING)
  message = x;
+  else if (token == CPP_STRING_USERDEF)
+GCC_BAD ("string literal with user-defined suffix is invalid in this "
+"context");
else
  GCC_BAD ("expected a string after %<#pragma message%>");
  
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc

index c5929a6cc5f..e3bd94ffe11 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2223,16 +2223,8 @@ pop_unparsed_function_queues (cp_parser *parser)
  
  /* Lexical conventions [gram.lex]  */
  
-static cp_expr cp_parser_identifier

-  (cp_parser *);
-static cp_expr cp_parser_string_literal
-  (cp_parser *, bool, bool, bool);
-static cp_expr cp_parser_userdef_char_literal
-  (cp_parser *);
-static tree cp_parser_userdef_string_literal
+static tree finish_userdef_string_literal
(tree);
-static cp_expr cp_parser_userdef_numeric_literal
-  

[Bug c++/107755] ICE: in fold_convert_loc, at fold-const.c:2435, with -Wlogical-op, implicit user-defined conversion operator, template function, logical operator, and conditional operator

2022-11-18 Thread pokechu022+gccbugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107755

Pokechu22  changed:

   What|Removed |Added

  Known to fail||10.3.0, 12.2.0, 4.8.1,
   ||9.4.0
  Known to work||4.7.4

--- Comment #1 from Pokechu22  ---
Here are a few cases that would be useful to include in a test:

```
return (!false && (false ? a : b));
return (!true && (false ? a : b));
return (!false || (false ? a : b));
return (!true || (false ? a : b));
return (x && (y ? a : b));
return (x || (y ? a : b));
```

As an aside, `(false | (false ? a : b))` (where a is bool and b is Foo) used to
cause an ICE (confirmed in 4.7.4, 4.8.1, and 5.5). This was fixed in or before
6.1. I can't find a bug report corresponding to that issue either, but it
probably should be verified that that doesn't regress either.
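
As a rough sketch of how those cases might be wrapped into a test file: the code below is a hypothetical reconstruction from the description in the report, not the attached test.cpp. The names `Foo`, `Bar`, and `Baz`, and the exact shape of the conversion operator, are assumptions.

```cpp
// Hypothetical reduced test; the report says the ICE needs -Wlogical-op.
struct Foo {
  // Implicit user-defined conversion operator.
  operator bool() const { return true; }
};

// Constant-operand form, as in the first four cases listed above.
template <typename T>
bool Bar() {
  bool a = false;
  Foo b;
  return (!false && (false ? a : b));
}

// Variable-operand form, as in the last two cases listed above.
template <typename T>
bool Baz(bool x, bool y) {
  bool a = false;
  Foo b;
  return (x || (y ? a : b));
}
```

Without -Wlogical-op (or on an unaffected compiler) this should simply compile; in `Bar<int>()` the conditional selects `b`, which converts to `true`.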

[Bug c++/107755] New: ICE: in fold_convert_loc, at fold-const.c:2435, with -Wlogical-op, implicit user-defined conversion operator, template function, logical operator, and conditional operator

2022-11-18 Thread pokechu022+gccbugzilla at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107755

Bug ID: 107755
   Summary: ICE: in fold_convert_loc, at fold-const.c:2435, with
-Wlogical-op, implicit user-defined conversion
operator, template function, logical operator, and
conditional operator
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pokechu022+gccbugzilla at gmail dot com
  Target Milestone: ---

Created attachment 53928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53928&action=edit
failing code using (!false && (false ? a : b))

A combination of -Wlogical-op, a template function, an implicit user-defined
conversion operator, a logical AND or OR operator, and a ternary conditional
operator results in an ICE.

```
$ gcc-10 -Wlogical-op -save-temps test.cpp
test.cpp: In function ‘bool Bar()’:
test.cpp:12:35: internal compiler error: in fold_convert_loc, at
fold-const.c:2435
   12 |   return (!false && (false ? a : b));
  |   ^
0x7f08a2bf4082 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
$ gcc-10 -v
Using built-in specs.
COLLECT_GCC=gcc-10
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
10.3.0-1ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-10
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-10-S4I5Pr/gcc-10-10.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-S4I5Pr/gcc-10-10.3.0/debian/tmp-gcn/usr,hsa
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.04)
```

I tested at https://godbolt.org/z/574Wq1Ws5 and found that this issue was not
present in 4.7.4 and is present in 4.8.1, 12.2, and trunk. Here's the output on
trunk from compiler explorer, for convenience:

```
Using built-in specs.
COLLECT_GCC=/opt/compiler-explorer/gcc-snapshot/bin/g++
Target: x86_64-linux-gnu
Configured with: ../gcc-trunk-20221118/configure
--prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu
--enable-languages=c,c++,fortran,ada,objc,obj-c++,d --enable-ld=yes
--enable-gold=yes --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-linker-build-id --enable-lto --enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build-gcc-7b3b2f50953c5143d4b14b59d322d8a793f411dd-binutils-2.38
--enable-libstdcxx-backtrace=yes
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20221117 (experimental)
(Compiler-Explorer-Build-gcc-7b3b2f50953c5143d4b14b59d322d8a793f411dd-binutils-2.38)
 
COLLECT_GCC_OPTIONS='-fdiagnostics-color=always' '-g' '-o' '/app/output.s'
'-masm=intel' '-S' '-Wlogical-op' '-v' '-shared-libgcc' '-mtune=generic'
'-march=x86-64' '-dumpdir' '/app/'

/opt/compiler-explorer/gcc-trunk-20221118/bin/../libexec/gcc/x86_64-linux-gnu/13.0.0/cc1plus
-quiet -v -imultiarch x86_64-linux-gnu -iprefix
/opt/compiler-explorer/gcc-trunk-20221118/bin/../lib/gcc/x86_64-linux-gnu/13.0.0/
-D_GNU_SOURCE  -quiet -dumpdir /app/ -dumpbase output.cpp -dumpbase-ext
.cpp -masm=intel -mtune=generic -march=x86-64 -g -Wlogical-op -version
-fdiagnostics-color=always -o /app/output.s
GNU C++17
(Compiler-Explorer-Build-gcc-7b3b2f50953c5143d4b14b59d322d8a793f411dd-binutils-2.38)
version 13.0.0 20221117 (experimental) (x86_64-linux-gnu)
compiled by GNU C version 9.4.0, GMP version 

[Bug analyzer/107582] - -Wanalyzer-use-of-uninitialized-value false positive with while loop in pthread_cleanup_push

2022-11-18 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107582

--- Comment #9 from David Malcolm  ---
s/earlier/earliest/

[Bug analyzer/107582] - -Wanalyzer-use-of-uninitialized-value false positive with while loop in pthread_cleanup_push

2022-11-18 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107582

--- Comment #8 from David Malcolm  ---
(In reply to David Malcolm from comment #7)
> I hope to backport this to GCC 12; keeping this open to track that.

I believe the buggy implementation of dynamic_call_info_t::update_model was
introduced in r12-3002-gaef703cf982072, so GCC 12 is probably the earlier
branch to backport to.

[Bug analyzer/107582] - -Wanalyzer-use-of-uninitialized-value false positive with while loop in pthread_cleanup_push

2022-11-18 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107582

--- Comment #7 from David Malcolm  ---
Fixed on trunk for GCC 13 by the above commit.

I hope to backport this to GCC 12; keeping this open to track that.

[committed] analyzer: fix feasibility false +ve on jumps through function ptrs [PR107582]

2022-11-18 Thread David Malcolm via Gcc-patches
PR analyzer/107582 reports a false +ve from
-Wanalyzer-use-of-uninitialized-value where
the analyzer's feasibility checker erroneously decides
that point (B) in the code below is reachable, with "x" being
uninitialized there:

pthread_cleanup_push(func, NULL);

while (ret != ETIMEDOUT)
ret = rand() % 1000;

/* (A): after the while loop  */

if (ret != ETIMEDOUT)
  x = 

pthread_cleanup_pop(1);

if (ret == ETIMEDOUT)
  return 0;

/* (B): after not bailing out  */

due to these contradictory conditions somehow both holding:
  * (ret == ETIMEDOUT), at (A) (skipping the initialization of x), and
  * (ret != ETIMEDOUT), at (B)

The root cause is that after the while loop, state merger puts ret in
the exploded graph in an UNKNOWN state, and saves the diagnostic at (B).

Later, as we explore the feasibility of reaching the enode for (B),
dynamic_call_info_t::update_model is called to push/pop the
frames for handling the call to "func" in pthread_cleanup_pop.
The "ret" at these nodes in the feasible_graph has a conjured_svalue for
"ret", and a constraint on it being either == *or* != ETIMEDOUT.

However dynamic_call_info_t::update_model blithely clobbers the
model with a copy from the exploded_graph, in which "ret" is UNKNOWN.

This patch fixes dynamic_call_info_t::update_model so that it
simulates pushing/popping a frame on the model we're working with,
preserving knowledge of the constraint on "ret", and enabling the
analyzer to "know" that the bail-out must happen.

Doing so fixes the false positive.
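
The difference between the two update strategies can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToyModel`, `update_by_clobbering`, `push_frame`, and `pop_frame` are invented for illustration and are not the analyzer's region_model API.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a region model: a call stack plus known constraints.
struct ToyModel {
  std::vector<std::string> frames;
  std::map<std::string, std::string> constraints;  // e.g. "ret" -> "== ETIMEDOUT"
};

// Old strategy: overwrite the model being checked with the merged state
// stored in the exploded graph, where "ret" is UNKNOWN (no constraint).
void update_by_clobbering(ToyModel &model, const ToyModel &merged_state) {
  model = merged_state;
}

// New strategy: only adjust the call stack of the model we are working
// with, so that existing constraints survive the call and the return.
void push_frame(ToyModel &model, const std::string &callee) {
  model.frames.push_back(callee);
}

void pop_frame(ToyModel &model) {
  model.frames.pop_back();
}
```

In this toy, clobbering drops the constraint on "ret", whereas a push followed by a pop leaves it intact, which is the property the feasibility checker needs.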

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4158-ga7aef0a5a2b7e2.

gcc/analyzer/ChangeLog:
PR analyzer/107582
* engine.cc (dynamic_call_info_t::update_model): Update the model
by pushing or popping a frame, rather than by clobbering it with the
model from the exploded_node's state.

gcc/testsuite/ChangeLog:
PR analyzer/107582
* gcc.dg/analyzer/feasibility-4.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-1.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-2.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc| 14 --
 gcc/testsuite/gcc.dg/analyzer/feasibility-4.c | 42 ++
 .../gcc.dg/analyzer/feasibility-pr107582-1.c  | 43 +++
 .../gcc.dg/analyzer/feasibility-pr107582-2.c  | 34 +++
 4 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/feasibility-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/feasibility-pr107582-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/feasibility-pr107582-2.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index b52753da793..db1881cd140 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -2024,16 +2024,22 @@ exploded_node::dump_succs_and_preds (FILE *outf) const
 /* Implementation of custom_edge_info::update_model vfunc
for dynamic_call_info_t.
 
-   Update state for the dynamically discorverd calls */
+   Update state for a dynamically discovered call (or return), by pushing
+   or popping the a frame for the appropriate function.  */
 
 bool
 dynamic_call_info_t::update_model (region_model *model,
   const exploded_edge *eedge,
-  region_model_context *) const
+  region_model_context *ctxt) const
 {
   gcc_assert (eedge);
-  const program_state &dest_state = eedge->m_dest->get_state ();
-  *model = *dest_state.m_region_model;
+  if (m_is_returning_call)
+model->update_for_return_gcall (m_dynamic_call, ctxt);
+  else
+{
+  function *callee = eedge->m_dest->get_function ();
+  model->update_for_gcall (m_dynamic_call, ctxt, callee);
+}
   return true;
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/feasibility-4.c b/gcc/testsuite/gcc.dg/analyzer/feasibility-4.c
new file mode 100644
index 000..1a1128089fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/feasibility-4.c
@@ -0,0 +1,42 @@
+#include "analyzer-decls.h"
+
+extern int rand (void);
+
+void test_1 (void)
+{
+  int   ret = 0;
+  while (ret != 42)
+ret = rand() % 1000;
+
+  if (ret != 42)
+__analyzer_dump_path (); /* { dg-bogus "path" } */
+}
+
+static void empty_local_fn (void) {}
+extern void external_fn (void);
+
+void test_2 (void)
+{
+  void (*callback) () = empty_local_fn;
+  int   ret = 0;
+  while (ret != 42)
+ret = rand() % 1000;
+
+  (*callback) ();
+
+  if (ret != 42)
+__analyzer_dump_path (); /* { dg-bogus "path" } */
+}
+
+void test_3 (void)
+{
+  void (*callback) () = external_fn;
+  int   ret = 0;
+  while (ret != 42)
+ret = rand() % 1000;
+
+  (*callback) ();
+
+  if (ret != 42)
+__analyzer_dump_path (); /* { dg-bogus "path" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/feasibility-pr107582-1.c 

[Bug analyzer/107582] - -Wanalyzer-use-of-uninitialized-value false positive with while loop in pthread_cleanup_push

2022-11-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107582

--- Comment #6 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:a7aef0a5a2b7e20048275a29bd80674c1a061a24

commit r13-4158-ga7aef0a5a2b7e20048275a29bd80674c1a061a24
Author: David Malcolm 
Date:   Fri Nov 18 19:38:25 2022 -0500

analyzer: fix feasibility false +ve on jumps through function ptrs
[PR107582]

PR analyzer/107582 reports a false +ve from
-Wanalyzer-use-of-uninitialized-value where
the analyzer's feasibility checker erroneously decides
that point (B) in the code below is reachable, with "x" being
uninitialized there:

pthread_cleanup_push(func, NULL);

while (ret != ETIMEDOUT)
ret = rand() % 1000;

/* (A): after the while loop  */

if (ret != ETIMEDOUT)
  x = 

pthread_cleanup_pop(1);

if (ret == ETIMEDOUT)
  return 0;

/* (B): after not bailing out  */

due to these contradictory conditions somehow both holding:
  * (ret == ETIMEDOUT), at (A) (skipping the initialization of x), and
  * (ret != ETIMEDOUT), at (B)

The root cause is that after the while loop, state merger puts ret in
the exploded graph in an UNKNOWN state, and saves the diagnostic at (B).

Later, as we explore the feasibility of reaching the enode for (B),
dynamic_call_info_t::update_model is called to push/pop the
frames for handling the call to "func" in pthread_cleanup_pop.
The "ret" at these nodes in the feasible_graph has a conjured_svalue for
"ret", and a constraint on it being either == *or* != ETIMEDOUT.

However dynamic_call_info_t::update_model blithely clobbers the
model with a copy from the exploded_graph, in which "ret" is UNKNOWN.

This patch fixes dynamic_call_info_t::update_model so that it
simulates pushing/popping a frame on the model we're working with,
preserving knowledge of the constraint on "ret", and enabling the
analyzer to "know" that the bail-out must happen.

Doing so fixes the false positive.

gcc/analyzer/ChangeLog:
PR analyzer/107582
* engine.cc (dynamic_call_info_t::update_model): Update the model
by pushing or popping a frame, rather than by clobbering it with the
model from the exploded_node's state.

gcc/testsuite/ChangeLog:
PR analyzer/107582
* gcc.dg/analyzer/feasibility-4.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-1.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-2.c: New test.

Signed-off-by: David Malcolm 

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #9 from Steve Kargl  ---
On Fri, Nov 18, 2022 at 11:24:29PM +, sgk at troutmask dot
apl.washington.edu wrote:
> 
> Does anyone know what is meant by "Fortran rules"?  F66 does not
> have any particular algorithm specified.  I'll look at F77 shortly.
> 

Well, I hunted down the origins of -fcx-fortran-rules.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

So, it appears to be an optimization, where Smith's algorithm
will fail for extreme values of the real and imaginary parts
of the complex number.

[Bug ipa/96503] attribute alloc_size effect lost after inlining

2022-11-18 Thread pageexec at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503

PaX Team  changed:

   What|Removed |Added

 CC||pageexec at gmail dot com

--- Comment #2 from Siddhesh Poyarekar  ---
(In reply to Kees Cook from comment #1)
> Created attachment 53643 [details]
> PoC showing unexpected __bdos results across inlines
> 
> Fixing this is needed for the Linux kernel to do much useful with
> alloc_size. Most of the allocators are inline wrappers, for example.

For cases where the size doesn't really change across the inlines, it ought to
be sufficient to annotate the non-inlined implementation function, e.g. in case
of kvmalloc, annotate kvmalloc_node as __alloc_size(1).
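
A minimal sketch of that arrangement, assuming GCC or Clang attribute syntax; `impl_alloc` and `wrapper_alloc` are hypothetical names standing in for something like kvmalloc_node and its inline wrapper:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical out-of-line implementation; this is the function that
// carries the alloc_size annotation.
__attribute__((noinline, alloc_size(1)))
void *impl_alloc(std::size_t sz) {
  return std::malloc(sz);
}

// Inline wrapper that passes the size through unchanged.
static inline void *wrapper_alloc(std::size_t sz) {
  return impl_alloc(sz);
}

// Ask the compiler what it knows about the size of the returned object.
std::size_t visible_size_of_32() {
  void *p = wrapper_alloc(32);
  std::size_t s = __builtin_object_size(p, 0);
  std::free(p);
  return s;
}
```

What `visible_size_of_32` returns depends on the compiler and optimization level: when the size information is available it is 32, and otherwise `__builtin_object_size` falls back to `(size_t) -1`.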

For other cases it may be less trivial, e.g.:

/* Some padding the wrapper adds to the actual allocation.  */
size_t metadata_size;

extern void *real_alloc (size_t) __attribute__ ((alloc_size (1)));

__attribute__ ((alloc_size (1))) void *alloc_wrapper (size_t sz)
{
  return real_alloc (sz + metadata_size);
}

here the compiler will end up seeing the padded size, which may not be correct.

To fix this we'll have to store the alloc_size info somewhere (ptr_info seems
to be aliasing-specific, so maybe a new member to tree_ssa_name) during
inlining and then teach the tree-object-size pass to access it.

[PATCH v2] c++: Reject UDLs in certain contexts [PR105300]

2022-11-18 Thread Marek Polacek via Gcc-patches
On Thu, Nov 17, 2022 at 07:06:34PM -0500, Jason Merrill wrote:
> On 11/16/22 20:12, Marek Polacek wrote:
> > On Wed, Nov 16, 2022 at 08:22:39AM -0500, Jason Merrill wrote:
> > > On 11/15/22 19:35, Marek Polacek wrote:
> > > > On Tue, Nov 15, 2022 at 06:58:39PM -0500, Jason Merrill wrote:
> > > > > On 11/12/22 06:53, Marek Polacek wrote:
> > > > > > In this PR, we are crashing because we've encountered a UDL where a
> > > > > > > string-literal is expected.  This patch makes the parser reject string
> > > > > > and character UDLs in all places where the grammar requires a
> > > > > > string-literal and not a user-defined-string-literal.
> > > > > > 
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > 
> > > > > Since the grammar has
> > > > > 
> > > > > user-defined-string-literal :
> > > > >   string-literal ud-suffix
> > > > > 
> > > > > > maybe we want to move the UDL handling out to a cp_parser_udl_string_literal
> > > > > that calls cp_parser_string_literal?
> > > > 
> > > > Umm, maybe, but the UDL handling code seems to be too entrenched in
> > > > > cp_parser_string_literal and I don't think it's going to be easy to extract
> > > > it :/.
> > > 
> > > Fair enough; maybe a wrapper, then?
> > 
> > As in, have a cp_parser_udl_string_literal wrapper that calls
> > cp_parser_string_literal with udl_ok=true, rename cp_parser_string_literal,
> > introduce a new cp_parser_string_literal wrapper that passes udl_ok=false?
> 
> That's what I was thinking.  And the new cp_parser_string_literal could also
> omit the lookup_udlit parm.
> 
> > One problem with cp_parser_udl_string_literal is that it's too similar to
> > cp_parser_userdef_string_literal, which would be confusing, I think.
> 
> True, probably better to use that name instead, and rename the current one
> to something like finish_userdef_string_literal

Sounds good, here's the patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.

I've introduced two new wrappers; the existing cp_parser_string_literal
was renamed to cp_parser_string_literal_common and should not be called
directly.  finish_userdef_string_literal is renamed from
cp_parser_userdef_string_literal.
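
The wrapper arrangement can be sketched with a toy, outside of GCC. Everything here is an assumption made for illustration: `ToyToken` and the `toy_*` functions are invented, and the real functions take a cp_parser and return trees.

```cpp
#include <string>

// Toy stand-in for a lexer token; GCC's real token type is far richer.
struct ToyToken {
  std::string text;
  bool has_ud_suffix;  // true for something like "abc"_s
};

// Shared worker, playing the role of cp_parser_string_literal_common.
// Returns false (where the real parser would give an error) when a UDL
// appears but the caller did not allow one.
bool toy_string_literal_common(const ToyToken &tok, bool udl_ok,
                               std::string *out) {
  if (tok.has_ud_suffix && !udl_ok)
    return false;
  *out = tok.text;
  return true;
}

// Wrapper for contexts whose grammar requires a plain string-literal.
bool toy_string_literal(const ToyToken &tok, std::string *out) {
  return toy_string_literal_common(tok, /*udl_ok=*/false, out);
}

// Wrapper for contexts where a user-defined-string-literal is acceptable.
bool toy_userdef_string_literal(const ToyToken &tok, std::string *out) {
  return toy_string_literal_common(tok, /*udl_ok=*/true, out);
}
```

The point of the split is that each call site picks the wrapper matching its grammar production, so no caller has to remember to pass the right flag.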

PR c++/105300

gcc/c-family/ChangeLog:

* c-pragma.cc (handle_pragma_message): Warn for CPP_STRING_USERDEF.

gcc/cp/ChangeLog:

* parser.cc: Remove unnecessary forward declarations.
(cp_parser_string_literal): New wrapper.
(cp_parser_string_literal_common): Renamed from
cp_parser_string_literal.  Add a bool parameter.  Give an error when
UDLs are not permitted.
(cp_parser_userdef_string_literal): New wrapper.
(finish_userdef_string_literal): Renamed from
cp_parser_userdef_string_literal.
(cp_parser_primary_expression): Call cp_parser_userdef_string_literal
instead of cp_parser_string_literal.
(cp_parser_linkage_specification): Move a variable declaration closer
to its first use.
(cp_parser_static_assert): Likewise.
(cp_parser_operator): Call cp_parser_userdef_string_literal instead of
cp_parser_string_literal.
(cp_parser_asm_definition): Move a variable declaration closer to its
first use.
(cp_parser_asm_specification_opt): Move variable declarations closer to
their first use.
(cp_parser_asm_operand_list): Likewise.
(cp_parser_asm_clobber_list): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-error1.C: New test.
---
 gcc/c-family/c-pragma.cc  |   3 +
 gcc/cp/parser.cc  | 131 ++
 gcc/testsuite/g++.dg/cpp0x/udlit-error1.C |  21 
 3 files changed, 111 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-error1.C

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 142a46441ac..49f405b605b 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1390,6 +1390,9 @@ handle_pragma_message (cpp_reader *)
 }
   else if (token == CPP_STRING)
 message = x;
+  else if (token == CPP_STRING_USERDEF)
+GCC_BAD ("string literal with user-defined suffix is invalid in this "
+"context");
   else
 GCC_BAD ("expected a string after %<#pragma message%>");
 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index c5929a6cc5f..e3bd94ffe11 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2223,16 +2223,8 @@ pop_unparsed_function_queues (cp_parser *parser)
 
 /* Lexical conventions [gram.lex]  */
 
-static cp_expr cp_parser_identifier
-  (cp_parser *);
-static cp_expr cp_parser_string_literal
-  (cp_parser *, bool, bool, 

Re: [PATCH] c++: remove coerce_innermost_template_parms

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 16:59, Patrick Palka wrote:

IIUC the only practical difference between coerce_innermost_template_parms
and the main function coerce_template_parms is that the former takes
a multi-level template parameter list and returns a template argument
vector of the same depth, whereas the latter takes a single-level
template parameter vector and returns a single-level template argument
vector.

This patch gets rid of the wrapper function and just overloads the
behavior of the main function according to whether 'parms' is a
multi-level template parameter list or a single-level template parameter
vector.  It turns out we can assume parms and args have the same depth
in the multi-level case, which simplifies the overloading logic.

Besides the (subjective) simplification benefit, another benefit of this
unification is that it avoids a redundant copy of a multi-level 'args'.
Now, we can return new_args directly from c_t_p.  (And because of this,
we need to turn new_inner_args into a reference so that updating it also
updates new_args.)
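
The remark about the reference can be shown with a toy that uses standard containers instead of GCC trees. This is a sketch only: `ToyLevel` and `toy_coerce` are invented names, and `args` is assumed to be non-empty.

```cpp
#include <cstddef>
#include <vector>

using ToyLevel = std::vector<int>;

// Build the full multi-level result once and fill in its innermost level
// through a reference, so that no second copy of the outer levels is needed.
std::vector<ToyLevel> toy_coerce(const std::vector<ToyLevel> &args,
                                 std::size_t nparms) {
  // Copy the outer levels and install a fresh innermost level.
  std::vector<ToyLevel> new_args(args);
  new_args.back() = ToyLevel(nparms);

  // A reference, not a copy: writes below land in new_args itself.
  ToyLevel &new_inner_args = new_args.back();
  for (std::size_t i = 0; i < nparms; ++i)
    new_inner_args[i] = static_cast<int>(i) + 100;

  return new_args;
}
```

Had `new_inner_args` been a plain `ToyLevel` copy, the loop would have filled a detached vector and the returned `new_args` would still hold the empty placeholder level.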

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.  But this doesn't really seem like stage 3 material; let's hold off 
on further cleanups like this until next stage 1.



gcc/cp/ChangeLog:

* pt.cc (coerce_template_parms): Salvage part of the function
comment from c_innermost_t_p.  Handle parms being a full
template parameter list.
(coerce_innermost_template_parms): Remove.
(lookup_template_class): Use c_t_p instead of c_innermost_t_p.
(finish_template_variable): Likewise.
(tsubst_decl): Likewise.
(instantiate_alias_template): Likewise.
---
  gcc/cp/pt.cc | 92 +++-
  1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0310e38c9b9..2666e455edf 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -148,8 +148,6 @@ static void add_pending_template (tree);
  static tree reopen_tinst_level (struct tinst_level *);
  static tree tsubst_initializer_list (tree, tree);
  static tree get_partial_spec_bindings (tree, tree, tree);
-static tree coerce_innermost_template_parms (tree, tree, tree, tsubst_flags_t,
-bool = true);
  static void tsubst_enum   (tree, tree, tree);
  static bool check_instantiated_args (tree, tree, tsubst_flags_t);
  static int check_non_deducible_conversion (tree, tree, unification_kind_t, int,
@@ -8827,6 +8825,14 @@ pack_expansion_args_count (tree args)
 arguments.  If any error occurs, return error_mark_node. Error and
 warning messages are issued under control of COMPLAIN.
  
+   If PARMS represents all template parameters levels, this function
+   returns a vector of vectors representing all the resulting argument
+   levels.  Note that in this case, only the innermost arguments are
+   coerced because the outermost ones are supposed to have been coerced
+   already.  Otherwise, if PARMS represents only (the innermost) vector
+   of parameters, this function returns a vector containing just the
+   innermost resulting arguments.
+
 If REQUIRE_ALL_ARGS is false, argument deduction will be performed
 for arguments not specified in ARGS.  If REQUIRE_ALL_ARGS is true,
 arguments not specified in ARGS must have default arguments which
@@ -8842,8 +8848,6 @@ coerce_template_parms (tree parms,
int nparms, nargs, parm_idx, arg_idx, lost = 0;
tree orig_inner_args;
tree inner_args;
-  tree new_args;
-  tree new_inner_args;
  
    /* When used as a boolean value, indicates whether this is a
       variadic template parameter list. Since it's an int, we can also
@@ -8864,6 +8868,17 @@ coerce_template_parms (tree parms,
if (args == error_mark_node)
  return error_mark_node;
  
+  bool return_full_args = false;
+  if (TREE_CODE (parms) == TREE_LIST)
+{
+  if (TMPL_PARMS_DEPTH (parms) > 1)
+   {
+ gcc_assert (TMPL_PARMS_DEPTH (parms) == TMPL_ARGS_DEPTH (args));
+ return_full_args = true;
+   }
+  parms = INNERMOST_TEMPLATE_PARMS (parms);
+}
+
nparms = TREE_VEC_LENGTH (parms);
  
/* Determine if there are any parameter packs or default arguments.  */

@@ -8961,8 +8976,8 @@ coerce_template_parms (tree parms,
   template-id may be nested within a "sizeof".  */
cp_evaluated ev;
  
-  new_inner_args = make_tree_vec (nparms);
-  new_args = add_outermost_template_args (args, new_inner_args);
+  tree new_args = add_outermost_template_args (args, make_tree_vec (nparms));
+  tree& new_inner_args = TMPL_ARGS_LEVEL (new_args, TMPL_ARGS_DEPTH (new_args));
int pack_adjust = 0;
for (parm_idx = 0, arg_idx = 0; parm_idx < nparms; parm_idx++, arg_idx++)
  {
@@ -9164,59 +9179,7 @@ coerce_template_parms (tree parms,
  SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 TREE_VEC_LENGTH (new_inner_args));
  

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread weslley.pereira at ucdenver dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #8 from Weslley da Silva Pereira  ---
Created attachment 53927
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53927&action=edit
Test case with many examples of complex division

Test code used in LAPACK 3.11.0.
Code extracted from
https://github.com/Reference-LAPACK/lapack/blob/master/INSTALL/test_zcomplexdiv.f

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread weslley.pereira at ucdenver dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #7 from Weslley da Silva Pereira  ---
(In reply to anlauf from comment #3)
> I guess the reporter assumes that gcc uses a clever algorithm like Smith's
> to handle such extreme cases of complex division.  Not sure if that one is
> available by some compilation flag, and I think it would impact performance.
> 
> In any case, if the reporter wants to get robust results and in a portable
> way, I would advise him to change/fix his algorithm accordingly.  It appears
> that a few other compilers behave here like gfortran.

Thanks for the suggestion of changing the algorithm that needs such a division.
For what ranges of the numerator and denominator can one rely on the
intrinsic complex division? Maybe this is the right question to ask. Then we can
build algorithms that meet those requirements. LAPACK has cladiv and zladiv,
which are routines for complex division that avoid unnecessary underflow and
overflow. They are used in many parts of the code. This implementation is
surely less efficient than the intrinsic complex division, but we rely on it
because of its robustness.

More data for the discussion:
1. On Ubuntu 18.04.5 LTS, using GNU Fortran 7.5.0, I tested the optimization
flag `-O` but still reproduced the wrong result for complex divisions with huge
numbers. See
https://github.com/Reference-LAPACK/lapack/issues/575#issuecomment-910616816
that used the code from
https://github.com/Reference-LAPACK/lapack/blob/master/INSTALL/test_zcomplexdiv.f.
This is the test currently in LAPACK 3.11.0.
2. I have just reproduced what was reported in
https://github.com/Reference-LAPACK/lapack/issues/575#issuecomment-910616816 in
my Ubuntu 20.04.5 LTS, using GNU Fortran 9.4.0.
3. I noticed that the optimization flag is unable to target divisions like
`x/x` depending on where they are inside a program.
4. My Ubuntu 20.04.5 LTS with compiler ifort 2021.7.1 computes the complex
division `x/x` accurately even for the case of huge numbers. Scenarios tested:
   - I tested the program in
https://github.com/Reference-LAPACK/lapack/blob/master/INSTALL/test_zcomplexdiv.f
and the one in https://godbolt.org/z/b3WKWodvn.
   - I tested ifort with flags -fp-model precise and -fp-model fast. The latter
enables more aggressive optimizations on floating-point data.
   - I tested compilation with optimization flags -O0, -O, -O1, -O2, -O3. 

Here is the implementation of the complex division in LAPACK if it somehow
helps the discussion:
https://netlib.org/lapack/explore-html/d8/d9b/group__double_o_t_h_e_rauxiliary_gad1c0279ec29e8ac222f1e319f4144fcb.html

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #6 from Steve Kargl  ---
On Fri, Nov 18, 2022 at 11:24:29PM +, sgk at troutmask dot
apl.washington.edu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753
> 
> --- Comment #5 from Steve Kargl  ---
> On Fri, Nov 18, 2022 at 10:05:21PM +, kargl at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753
> > 
> > --- Comment #4 from kargl at gcc dot gnu.org ---
> > (In reply to anlauf from comment #3)
> > > I guess the reporter assumes that gcc uses a clever algorithm like Smith's
> > > to handle such extreme cases of complex division.  Not sure if that one is
> > > > available by some compilation flag, and I think it would impact performance.
> > > > 
> > > > In any case, if the reporter wants to get robust results and in a portable
> > > > way, I would advise him to change/fix his algorithm accordingly.  It appears
> > > that a few other compilers behave here like gfortran.
> > 
> > It's likely coming from the middle-end where gcc.info has
> > the option
> > 
> > '-fcx-fortran-rules'
> >  Complex multiplication and division follow Fortran rules.  Range
> >  reduction is done as part of complex division, but there is no
> >  checking whether the result of a complex multiplication or division
> >  is 'NaN + I*NaN', with an attempt to rescue the situation in that
> >  case.
> 
> Does anyone know what is meant by "Fortran rules"?  F66 does not
> have any particular algorithm specified.  I'll look at F77 shortly.
> 

I add the subroutine

   subroutine ohno
  complex(dp), parameter :: a = cmplx(huge(1.d0),huge(1.d0),dp)
  complex(dp), parameter :: b = a / a
  write(*,*) a
  write(*,*) b
   end subroutine ohno 


% gfortran -o z a.f90 && ./z
   (1.79769313486231571E+308,1.79769313486231571E+308)
  (NaN,0.)
   (1.79769313486231571E+308,1.79769313486231571E+308)
   (1.,0.)

The last two lines are from ohno.

nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel' (was: [MentorEmbedded/nvptx-tools] Match standard 'ld' "search" behavior (PR #38))

2022-11-18 Thread Thomas Schwinge
Hi!

Re
:

On 2022-11-18T11:05:23-0800, I wrote:
> Actually, in GCC/nvptx target testing, this #38's commit 
> 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820 "ld: Don't search for input files in 
> '-L'directories" is generally causing linking to fail with:
>
> ```
> error opening crt0.o
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
> ```
>
> I'm investigating.

OK to push the attached
GCC "nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'" to all
active GCC branches?  (... instead of having to restore this "blunder"
(do "search for input files in '-L'directories") in nvptx-tools...)


Regards
 Thomas


From 85ddd99017968e8aa45342645be9642e63bcc5bb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 18 Nov 2022 23:57:52 +0100
Subject: [PATCH] nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'

A recent nvptx-tools change: commit 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820
"ld: Don't search for input files in '-L'directories" (of

"Match standard 'ld' "search" behavior") in GCC/nvptx target testing
generally causes linking to fail with:

error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1

Indeed per GCC '-v' output, there is an undecorated 'crt0.o' on the linker
('collect2') command line:

 [...]/build-gcc/./gcc/collect2 -o [...] crt0.o [...]

This is due to:

gcc/config/nvptx/nvptx.h:#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"

..., and the fix, as used by numerous other GCC targets, is to instead use
'crt0.o%s'; for '%s' means, per 'gcc/gcc.cc', "The Specs Language":

 %s current argument is the name of a library or startup file of some sort.
Search for that file in a standard list of directories
and substitute the full name found.

With that, we get the expected path to 'crt0.o'.

	gcc/
	* config/nvptx/nvptx.h (STARTFILE_SPEC): Fix 'crt0.o' for
	'-mmainkernel'.
---
 gcc/config/nvptx/nvptx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 0afc83b10a3..dc676dcb5fc 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,7 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
+#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
-- 
2.25.1
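
The effect of '%s' described in the commit message can be sketched as a directory search. This is only an illustration in Python, not the code in 'gcc/gcc.cc': the function name 'resolve_startfile', the directory list, and the fallback of passing the bare name through when nothing is found are all assumptions made for the example.

```python
import os
import tempfile


def resolve_startfile(name, search_dirs):
    """Return the first existing <dir>/<name>, else the bare name."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # Assumed fallback: hand the undecorated name to the linker unchanged.
    return name


with tempfile.TemporaryDirectory() as libdir:
    # Create an empty stand-in for the startup file.
    open(os.path.join(libdir, "crt0.o"), "w").close()
    # Found: the full path is substituted.
    print(resolve_startfile("crt0.o", ["/nonexistent-dir", libdir]))
    # Not found: the bare name survives, which is what a linker that no
    # longer searches '-L' directories cannot open.
    print(resolve_startfile("crt0.o", ["/nonexistent-dir"]))
```

With a plain 'crt0.o' in the spec, the linker only ever sees the second form, which is why the nvptx-tools change exposed the problem.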



[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #5 from Steve Kargl  ---
On Fri, Nov 18, 2022 at 10:05:21PM +, kargl at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753
> 
> --- Comment #4 from kargl at gcc dot gnu.org ---
> (In reply to anlauf from comment #3)
> > I guess the reporter assumes that gcc uses a clever algorithm like Smith's
> > to handle such extreme cases of complex division.  Not sure if that one is
> > available by some compilation flag, and I think it would impact performance.
> > 
> > In any case, if the reporter wants to get robust results and in a portable
> > way, I would advise him to change/fix his algorithm accordingly.  It appears
> > that a few other compilers behave here like gfortran.
> 
> It's likely coming from the middle-end where gcc.info has
> the option
> 
> '-fcx-fortran-rules'
>  Complex multiplication and division follow Fortran rules.  Range
>  reduction is done as part of complex division, but there is no
>  checking whether the result of a complex multiplication or division
>  is 'NaN + I*NaN', with an attempt to rescue the situation in that
>  case.

Does anyone know what is meant by "Fortran rules"?  F66 does not
have any particular algorithm specified.  I'll look at F77 shortly.

Tracking down what -fcx-fortran-rules does, one finds that
flag_complex_method is eventually set to 1.  The lowering of
complex division occurs in gcc/tree-complex.cc (expand_complex_division).
If I use this patch

% git diff gcc/tree-complex.cc | cat
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index ea9df6114a1..8051b7a3843 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -1501,6 +1501,7 @@ expand_complex_division (gimple_stmt_iterator *gsi, tree type,
  break;

case 2:
+   case 1:
  if (SCALAR_FLOAT_TYPE_P (inner_type))
{
  expand_complex_libcall (gsi, type, ar, ai, br, bi, code, true);
@@ -1508,7 +1509,6 @@ expand_complex_division (gimple_stmt_iterator *gsi, tree type,
}
  /* FALLTHRU */

-   case 1:
  /* wide ranges of inputs must work for complex divide.  */
  expand_complex_div_wide (gsi, inner_type, ar, ai, br, bi, code);
  break;

to force gfortran through the C language code path, I get

void doit (complex(kind=8) & restrict z)
{
  complex(kind=8) _1;
  complex(kind=8) _2;
  complex(kind=8) _3;
  real(kind=8) _7;
  real(kind=8) _8;
  real(kind=8) _9;
  real(kind=8) _10;
  real(kind=8) _11;
  real(kind=8) _12;

   :
  _7 = REALPART_EXPR <*z_5(D)>;
  _8 = IMAGPART_EXPR <*z_5(D)>;
  _1 = COMPLEX_EXPR <_7, _8>;
  _9 = REALPART_EXPR <*z_5(D)>;
  _10 = IMAGPART_EXPR <*z_5(D)>;
  _2 = COMPLEX_EXPR <_9, _10>;
  _3 = __divdc3 (_7, _8, _9, _10);
  _11 = REALPART_EXPR <_3>;
  _12 = IMAGPART_EXPR <_3>;
  REALPART_EXPR <*z_5(D)> = _11;
  IMAGPART_EXPR <*z_5(D)> = _12;
  return;

}

with the result

%  gfcx -o z -fdump-tree-all a.f90 && ./z
   (1.79769313486231571E+308,1.79769313486231571E+308)
   (1.,0.)

So, is -fcx-fortran-rules a relic of g77 past?

[Bug debug/99090] gsplit-dwarf broken on riscv64-linux

2022-11-18 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
                 CC|                            |law at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #9 from Jeffrey A. Law  ---
It was fixed on the trunk, in time for gcc-12.  I can't see that we're likely
to backport to gcc-11 or earlier.  So closing as fixed.

[Bug rtl-optimization/100647] ICE during sms pass

2022-11-18 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100647

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
                 CC|                            |law at gcc dot gnu.org

[Bug rtl-optimization/103296] Select satisfied register for deleting noop move instruction.

2022-11-18 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103296

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Jeffrey A. Law  ---
Should have been fixed by JoJo's patch on the trunk last year.

gcc-11-20221118 is now available

2022-11-18 Thread GCC Administrator via Gcc
Snapshot gcc-11-20221118 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20221118/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision 81f30630e8375eb91b0e6e4aa705c7f6419edda6

You'll find:

 gcc-11-20221118.tar.xz   Complete GCC

  SHA256=10336f3bf2fc4a117583628ece36a7e210458ae1948621977870e27cd58a4774
  SHA1=e5d3ab1b49d2d453415fccc76f08d00926b2b98b

Diffs from 11-2022 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-18 Thread Jeff Law



On 11/18/22 14:26, Philipp Tomsich wrote:

On Fri, 18 Nov 2022 at 22:13, Jeff Law  wrote:


On 11/9/22 16:07, Philipp Tomsich wrote:

Handling the register-const_int addition has very quickly escalated to
creating a full sign-extended 32bit constant and performing a
register-register addition for RISC-V in GCC so far, resulting in
sequences like (for the case of "a + 2048"):
   li      a5,4096
   addi    a5,a5,-2048
   add     a0,a0,a5

By adding an expansion for add3, we can emit optimised RTL that
matches the capabilities of RISC-V better by adding support for the
following, previously unoptimised cases:
- addi + addi
   addi    a0,a0,2047
   addi    a0,a0,1
- li + sh[123]add (if Zba is enabled)
   li  a5,960
   sh3add  a0,a5,a0

With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
the expander will otherwise wrap the resulting set-expression in an
insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.

This closes the gap to LLVM, which has already been emitting these
optimised sequences.

Note that this benefits perlbench (in SPEC CPU 2017), which needs
to add the constant 3840.
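
The two decompositions named above can be sketched in Python to make the arithmetic concrete. This is a rough model, not the expander itself: the helper names are invented for the example, trying the largest shift first is an arbitrary choice, and the real expander checks for a plain simm12 operand before it reaches either case.

```python
IMM_BITS = 12
SIMM12_MIN = -(1 << (IMM_BITS - 1))     # -2048
SIMM12_MAX = (1 << (IMM_BITS - 1)) - 1  # 2047


def fits_simm12(value):
    return SIMM12_MIN <= value <= SIMM12_MAX


def split_two_simm12(value):
    """Split value into (saturated, rest) for an addi + addi pair.

    Returns None when two 12-bit signed immediates cannot cover value.
    """
    saturated = SIMM12_MAX if value >= 0 else SIMM12_MIN
    rest = value - saturated
    if not fits_simm12(rest):
        return None
    return saturated, rest


def split_shifted_simm12(value):
    """Return (base, shift) with base << shift == value for li + shNadd."""
    for shift in (3, 2, 1):  # hypothetical preference for the largest shift
        base, remainder = divmod(value, 1 << shift)
        if remainder == 0 and fits_simm12(base):
            return base, shift
    return None


print(split_two_simm12(2048))      # (2047, 1)
print(split_two_simm12(2445))      # (2047, 398)
print(split_two_simm12(4095))      # None
print(split_shifted_simm12(7680))  # (960, 3)
```

The first two results match the addi pairs quoted in this thread, and 7680 corresponds to the li 960 + sh3add pair.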

gcc/ChangeLog:

   * config/riscv/bitmanip.md (*shNadd): Rename.
   (riscv_shNadd): Expose as gen_riscv_shNadd{di/si}.
   * config/riscv/predicates.md (const_arith_shifted123_operand):
   New predicate (for constants that are a simm12, shifted by
   1, 2 or 3).
   (const_arith_2simm12_operand): New predicate (that can be
   expressed by adding 2 simm12 together).
   (addi_operand): New predicate (an immediate operand suitable
   for the new add3 expansion).
   * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
   Don't use gen_add3_insn, where a RTX instead of an INSN is
   required (otherwise this will break as soon as we have a
   define_expand for add3).
   (riscv_adjust_libcall_cfi_epilogue): Same.
   * config/riscv/riscv.md (addsi3): Rename.
   (riscv_addsi3): New name for addsi3.
   (adddi3): Rename.
   (riscv_adddi3): New name for adddi3.
   (add3): New expander that handles the basic and fancy
   (such as li+sh[123]add, addi+addi, ...) cases for adding
   register-register and register-const_int.

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/addi.c: New test.
   * gcc.target/riscv/zba-shNadd-06.c: New test.

Signed-off-by: Philipp Tomsich 
---

   gcc/config/riscv/bitmanip.md  |  2 +-
   gcc/config/riscv/predicates.md| 28 +
   gcc/config/riscv/riscv.cc | 10 ++--
   gcc/config/riscv/riscv.md | 58 ++-
   gcc/testsuite/gcc.target/riscv/addi.c | 39 +
   .../gcc.target/riscv/zba-shNadd-06.c  | 11 
   6 files changed, 141 insertions(+), 7 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
   create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c



diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 171a0cdced6..289ff7470c6 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -464,6 +464,60 @@
 [(set_attr "type" "arith")
  (set_attr "mode" "DI")])

+(define_expand "add<mode>3"
+  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
+ (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
+   (match_operand:GPR 2 "addi_operand"  " r,I")))]
+  ""
+{
+  if (arith_operand (operands[2], <MODE>mode))
+emit_insn (gen_riscv_add<mode>3 (operands[0], operands[1], operands[2]));
+  else if (const_arith_2simm12_operand (operands[2], <MODE>mode))
+{
+  /* Split into two immediates that add up to the desired value:
+   * e.g., break up "a + 2445" into:
+   *   addi   a0,a0,2047
+   *   addi   a0,a0,398
+   */

Nit.  GNU comment style please.



+
+  HOST_WIDE_INT val = INTVAL (operands[2]);
+  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
+
+  if (val >= 0)
+  saturated = ~saturated;
+
+  val -= saturated;
+
+  rtx tmp = gen_reg_rtx (<MODE>mode);

Can't add3 be generated by LRA?  If so, don't you have to guard
against going into this path as we shouldn't be creating new pseudos at
that point (I know LRA can create some internally, but I don't think it
handles new ones showing up due to target expanders).


Similarly for the shifted_123 case immediately following.


If we do indeed have an issue here, I'm not sure how best to resolve.
If the output operand does not overlap with the inputs, then we're
golden and can just re-use it to form the constant.  If not,  then it's
a bit tougher.  I'm not keen to add a test of no_new_pseudos to the
operand predicate, but I don't see a better option yet.

 From a cursory glance, LRA does not try to go through gen_add3_insn,
but rather forms PLUS rtx.  This 

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #4 from kargl at gcc dot gnu.org ---
(In reply to anlauf from comment #3)
> I guess the reporter assumes that gcc uses a clever algorithm like Smith's
> to handle such extreme cases of complex division.  Not sure if that one is
> available by some compilation flag, and I think it would impact performance.
> 
> In any case, if the reporter wants to get robust results and in a portable
> way, I would advise him to change/fix his algorithm accordingly.  It appears
> that a few other compilers behave here like gfortran.

It's likely coming from the middle-end where gcc.info has
the option

'-fcx-fortran-rules'
 Complex multiplication and division follow Fortran rules.  Range
 reduction is done as part of complex division, but there is no
 checking whether the result of a complex multiplication or division
 is 'NaN + I*NaN', with an attempt to rescue the situation in that
 case.

Consider

program zdiv
   implicit none
   integer, parameter :: dp = kind(1.d0)
   real(dp) r
   complex(dp) :: y

!   r = huge(1.d0) / 2  ! yields (1,0)
!   r = nearest(huge(1.d0), -1.d0)  ! yields (1,0)
   r = huge(1.d0)   ! yields (NaN,0)
   y = cmplx(r, r, dp)
   write(*,*) y
   call doit(y)
   write(*,*) y
   contains
  subroutine doit(z)
 complex(dp) z
 z = z / z
  end subroutine doit
end program zdiv

If you compile this with -fdump-tree-all, then one gets
(I've added annotation with <--- marker)


% more z-a.f90.241t.cplxlower0

__attribute__((fn spec (". w ")))
void doit (complex(kind=8) & restrict z)
{
  real(kind=8) D.4265;
  real(kind=8) D.4264;
  complex(kind=8) _1;
  complex(kind=8) _2;
  complex(kind=8) _3;
  real(kind=8) _7;
  real(kind=8) _8;
  real(kind=8) _9;
  real(kind=8) _10;
  real(kind=8) _11;
  real(kind=8) _12;
  logical(kind=1) _13;
  real(kind=8) _14;
  real(kind=8) _15;
  real(kind=8) _16;
  real(kind=8) _17;
  real(kind=8) _18;
  real(kind=8) _19;
  real(kind=8) _20;
  real(kind=8) _21;
  real(kind=8) _22;
  real(kind=8) _23;
  real(kind=8) _24;
  real(kind=8) _25;
  real(kind=8) _26;
  real(kind=8) _27;
  real(kind=8) _28;
  real(kind=8) _29;
  real(kind=8) _30;
  real(kind=8) _31;
  real(kind=8) _32;
  real(kind=8) _33;
  real(kind=8) _34;
  real(kind=8) _35;
  real(kind=8) _36;
  real(kind=8) _37;
  real(kind=8) _38;
  real(kind=8) _39;

   :
  _7 = REALPART_EXPR <*z_5(D)>;
  _8 = IMAGPART_EXPR <*z_5(D)>;
  _1 = COMPLEX_EXPR <_7, _8>;
  _9 = REALPART_EXPR <*z_5(D)>;
  _10 = IMAGPART_EXPR <*z_5(D)>;
  _2 = COMPLEX_EXPR <_9, _10>;
  _11 = ABS_EXPR <_9>;
  _12 = ABS_EXPR <_10>;
  _13 = _11 < _12;
  if (_13 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   :
  _14 = _9 / _10;  <--- Should be one
  _15 = _9 * _14;  <--- huge(1.d0)
  _16 = _15 + _10; <--- huge(1.d0) + huge(1.d0) = inf
  _17 = _7 * _14;  <--- huge(1.d0)
  _18 = _17 + _8;  <--- huge(1.d0) + huge(1.d0) = inf
  _19 = _8 * _14;  <--- huge(1.d0)
  _20 = _19 - _7;  <--- huge(1.d0) - huge(1.d0) = 0
  _21 = _18 / _16; <--- inf / inf = NaN
  _22 = _20 / _16; <--- 0 / inf = 0
  _36 = _21;
  _37 = _22;
  goto ; [100.00%]

   :
  _23 = _10 / _9;
  _24 = _10 * _23;
  _25 = _24 + _9;
  _26 = _8 * _23;
  _27 = _26 + _7;
  _28 = _7 * _23;
  _29 = _8 - _28;
  _30 = _27 / _25;
  _31 = _29 / _25;
  _38 = _30;
  _39 = _31;

   :
  # _34 = PHI <_36(4), _38(5)>
  # _35 = PHI <_37(4), _39(5)>
  _3 = COMPLEX_EXPR <_34, _35>;
  _32 = _34;
  _33 = _35;
  REALPART_EXPR <*z_5(D)> = _32;
  IMAGPART_EXPR <*z_5(D)> = _33;
  return;

}
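
The arithmetic annotated in the dump can be replayed with IEEE doubles outside the compiler. The following Python sketch mirrors the two branches of the lowered code (a Smith-style division with range reduction); the function name 'smith_divide' is made up for the example, and it is a model of the dump, not of GCC's source.

```python
import math
import sys


def smith_divide(ar, ai, br, bi):
    """Divide (ar + i*ai) by (br + i*bi) the way the lowered code does."""
    if abs(br) < abs(bi):
        ratio = br / bi
        denom = br * ratio + bi
        real = (ar * ratio + ai) / denom
        imag = (ai * ratio - ar) / denom
    else:
        ratio = bi / br
        denom = bi * ratio + br
        real = (ai * ratio + ar) / denom
        imag = (ai - ar * ratio) / denom
    return real, imag


huge = sys.float_info.max  # counterpart of huge(1.d0)

# denom and the real numerator both overflow to inf, so inf / inf = NaN.
real, imag = smith_divide(huge, huge, huge, huge)
print(real, imag)  # nan 0.0

# With r = huge / 2 the sums stay finite and the quotient is exact.
half = huge / 2
print(smith_divide(half, half, half, half))  # (1.0, 0.0)
```

Both branches give the same result for this input because the real and imaginary parts of the divisor are equal, so it does not matter which side of the comparison is taken.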

[committed] analyzer: move more impl_* to known_function

2022-11-18 Thread David Malcolm via Gcc-patches
Fix a missing check that the argument to __analyzer_dump_capacity must
be a pointer type (which would otherwise lead to an ICE).

Do so by using the known_function_manager rather than by doing lots of
string matching.  Do the same for many other functions.

Doing so moves the type-checking closer to the logic that makes use
of it, by putting them in the same class, rather than splitting them
up between two source files (and sometimes three, e.g. for "pipe").
I hope this reduces the number of missing checks.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4157-g1c4a7881c49279.
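
The idea of keeping the type check next to the behaviour can be shown with a small registry. This Python sketch is only loosely modelled on the description above: the class and method names are illustrative stand-ins, and 'Pointer' is a hypothetical placeholder for a pointer-typed argument.

```python
from dataclasses import dataclass


@dataclass
class Pointer:
    capacity: int  # bytes behind the pointer (illustrative only)


class KnownFunction:
    """Each known function carries its own argument check and handler."""

    def matches_call_types(self, args):
        raise NotImplementedError

    def impl_call_pre(self, args):
        raise NotImplementedError


class KfDumpCapacity(KnownFunction):
    def matches_call_types(self, args):
        # Exactly one argument, and it must be a pointer.
        return len(args) == 1 and isinstance(args[0], Pointer)

    def impl_call_pre(self, args):
        return f"capacity: {args[0].capacity} bytes"


class KnownFunctionManager:
    def __init__(self):
        self._by_name = {}

    def add(self, name, known_function):
        self._by_name[name] = known_function

    def handle_call(self, name, args):
        kf = self._by_name.get(name)
        # The handler is never reached with arguments of the wrong type.
        if kf is None or not kf.matches_call_types(args):
            return None
        return kf.impl_call_pre(args)


kfm = KnownFunctionManager()
kfm.add("__analyzer_dump_capacity", KfDumpCapacity())
print(kfm.handle_call("__analyzer_dump_capacity", [Pointer(16)]))  # capacity: 16 bytes
print(kfm.handle_call("__analyzer_dump_capacity", [42]))           # None
```

A call with a non-pointer argument is declined by the check instead of reaching the handler, which is the kind of missing check that caused the ICE.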

gcc/analyzer/ChangeLog:
* analyzer.cc (is_pipe_call_p): Delete.
* analyzer.h (is_pipe_call_p): Delete.
* region-model-impl-calls.cc (call_details::get_location): New.
(class kf_analyzer_break): New, adapted from
region_model::on_stmt_pre.
(region_model::impl_call_analyzer_describe): Convert to...
(class kf_analyzer_describe): ...this.
(region_model::impl_call_analyzer_dump_capacity): Convert to...
(class kf_analyzer_dump_capacity): ...this.
(region_model::impl_call_analyzer_dump_escaped): Convert to...
(class kf_analyzer_dump_escaped): ...this.
(class kf_analyzer_dump_exploded_nodes): New.
(region_model::impl_call_analyzer_dump_named_constant): Convert
to...
(class kf_analyzer_dump_named_constant): ...this.
(class dump_path_diagnostic): Move here from region-model.cc.
(class kf_analyzer_dump_path): New, adapted from
region_model::on_stmt_pre.
(class kf_analyzer_dump_region_model): Likewise.
(region_model::impl_call_analyzer_eval): Convert to...
(class kf_analyzer_eval): ...this.
(region_model::impl_call_analyzer_get_unknown_ptr): Convert to...
(class kf_analyzer_get_unknown_ptr): ...this.
(class known_function_accept): Rename to...
(class kf_accept): ...this.
(class known_function_bind): Rename to...
(class kf_bind): ...this.
(class known_function_connect): Rename to...
(class kf_connect): ...this.
(region_model::impl_call_errno_location): Convert to...
(class kf_errno_location): ...this.
(class known_function_listen): Rename to...
(class kf_listen): ...this.
(region_model::impl_call_pipe): Convert to...
(class kf_pipe): ...this.
(region_model::impl_call_putenv): Convert to...
(class kf_putenv): ...this.
(region_model::impl_call_operator_new): Convert to...
(class kf_operator_new): ...this.
(region_model::impl_call_operator_delete): Convert to...
(class kf_operator_delete): ...this.
(class known_function_socket): Rename to...
(class kf_socket): ...this.
(register_known_functions): Rename param to KFM.  Break out
existing known functions into a "POSIX" section, and add "pipe",
"pipe2", and "putenv".  Add debugging functions
"__analyzer_break", "__analyzer_describe",
"__analyzer_dump_capacity", "__analyzer_dump_escaped",
"__analyzer_dump_exploded_nodes",
"__analyzer_dump_named_constant", "__analyzer_dump_path",
"__analyzer_dump_region_model", "__analyzer_eval",
"__analyzer_get_unknown_ptr".  Add C++ support functions
"operator new", "operator new []", "operator delete", and
"operator delete []".
* region-model.cc (class dump_path_diagnostic): Move to
region-model-impl-calls.cc.
(region_model::on_stmt_pre): Eliminate special-casing of
"__analyzer_describe", "__analyzer_dump_capacity",
"__analyzer_dump_escaped", "__analyzer_dump_named_constant",
"__analyzer_dump_path", "__analyzer_dump_region_model",
"__analyzer_eval", "__analyzer_break",
"__analyzer_dump_exploded_nodes", "__analyzer_get_unknown_ptr",
"__errno_location", "pipe", "pipe2", "putenv", "operator new",
"operator new []", "operator delete", "operator delete []"
"pipe" and "pipe2", handling them instead via the known_functions
mechanism.
* region-model.h (call_details::get_location): New decl.
(region_model::impl_call_analyzer_describe): Delete decl.
(region_model::impl_call_analyzer_dump_capacity): Delete decl.
(region_model::impl_call_analyzer_dump_escaped): Delete decl.
(region_model::impl_call_analyzer_dump_named_constant): Delete decl.
(region_model::impl_call_analyzer_eval): Delete decl.
(region_model::impl_call_analyzer_get_unknown_ptr): Delete decl.
(region_model::impl_call_errno_location): Delete decl.
(region_model::impl_call_pipe): Delete decl.
(region_model::impl_call_putenv): Delete decl.
(region_model::impl_call_operator_new): Delete decl.
(region_model::impl_call_operator_delete): Delete decl.
* sm-fd.cc: 

[PATCH] c++: remove coerce_innermost_template_parms

2022-11-18 Thread Patrick Palka via Gcc-patches
IIUC the only practical difference between coerce_innermost_template_parms
and the main function coerce_template_parms is that the former takes
a multi-level template parameter list and returns a template argument
vector of the same depth, whereas the latter takes a single-level
template parameter vector and returns a single-level template argument
vector.

This patch gets rid of the wrapper function and just overloads the
behavior of the main function according to whether 'parms' is a
multi-level template parameter list or a single-level template argument
vector.  It turns out we can assume parms and args have the same depth
in the multi-level case, which simplifies the overloading logic.

Besides the (subjective) simplification benefit, another benefit of this
unification is that it avoids a redundant copy of a multi-level 'args'.
Now, we can return new_args directly from c_t_p.  (And because of this,
we need to turn new_inner_args into a reference so that updating it also
updates new_args.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?
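
As a rough analogy for the unified behaviour, the sketch below overloads one Python function on the depth of 'parms'. It is not a model of pt.cc: template parameters are stood in for by conversion callables, argument levels by nested lists, and only the innermost level is converted.

```python
def coerce_template_parms(parms, args):
    """Coerce the innermost args; return a result as deep as parms."""
    multi_level = len(parms) > 0 and isinstance(parms[0], list)
    if multi_level:
        # In the multi-level case parms and args are assumed equally deep.
        assert len(parms) == len(args), "depth mismatch"
        inner_parms, inner_args = parms[-1], args[-1]
    else:
        inner_parms, inner_args = parms, args

    assert len(inner_parms) == len(inner_args)
    new_inner = [convert(arg) for convert, arg in zip(inner_parms, inner_args)]

    if multi_level:
        # Outer levels are reused as-is; only the innermost is replaced.
        return args[:-1] + [new_inner]
    return new_inner


# Single-level parameters give a single-level result.
print(coerce_template_parms([int, float], ["1", "2"]))
# [1, 2.0]

# Multi-level parameters give a result of the same depth.
print(coerce_template_parms([[str], [int, float]], [["T"], ["1", "2"]]))
# [['T'], [1, 2.0]]
```

The caller no longer needs a separate wrapper to get the full-depth result back.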

gcc/cp/ChangeLog:

* pt.cc (coerce_template_parms): Salvage part of the function
comment from c_innermost_t_p.  Handle parms being a full
template parameter list.
(coerce_innermost_template_parms): Remove.
(lookup_template_class): Use c_t_p instead of c_innermost_t_p.
(finish_template_variable): Likewise.
(tsubst_decl): Likewise.
(instantiate_alias_template): Likewise.
---
 gcc/cp/pt.cc | 92 +++-
 1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0310e38c9b9..2666e455edf 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -148,8 +148,6 @@ static void add_pending_template (tree);
 static tree reopen_tinst_level (struct tinst_level *);
 static tree tsubst_initializer_list (tree, tree);
 static tree get_partial_spec_bindings (tree, tree, tree);
-static tree coerce_innermost_template_parms (tree, tree, tree, tsubst_flags_t,
-bool = true);
 static void tsubst_enum(tree, tree, tree);
 static bool check_instantiated_args (tree, tree, tsubst_flags_t);
 static int check_non_deducible_conversion (tree, tree, unification_kind_t, int,
@@ -8827,6 +8825,14 @@ pack_expansion_args_count (tree args)
arguments.  If any error occurs, return error_mark_node. Error and
warning messages are issued under control of COMPLAIN.
 
+   If PARMS represents all template parameters levels, this function
+   returns a vector of vectors representing all the resulting argument
+   levels.  Note that in this case, only the innermost arguments are
+   coerced because the outermost ones are supposed to have been coerced
+   already.  Otherwise, if PARMS represents only (the innermost) vector
+   of parameters, this function returns a vector containing just the
+   innermost resulting arguments.
+
If REQUIRE_ALL_ARGS is false, argument deduction will be performed
for arguments not specified in ARGS.  If REQUIRE_ALL_ARGS is true,
arguments not specified in ARGS must have default arguments which
@@ -8842,8 +8848,6 @@ coerce_template_parms (tree parms,
   int nparms, nargs, parm_idx, arg_idx, lost = 0;
   tree orig_inner_args;
   tree inner_args;
-  tree new_args;
-  tree new_inner_args;
 
   /* When used as a boolean value, indicates whether this is a
  variadic template parameter list. Since it's an int, we can also
@@ -8864,6 +8868,17 @@ coerce_template_parms (tree parms,
   if (args == error_mark_node)
 return error_mark_node;
 
+  bool return_full_args = false;
+  if (TREE_CODE (parms) == TREE_LIST)
+{
+  if (TMPL_PARMS_DEPTH (parms) > 1)
+   {
+ gcc_assert (TMPL_PARMS_DEPTH (parms) == TMPL_ARGS_DEPTH (args));
+ return_full_args = true;
+   }
+  parms = INNERMOST_TEMPLATE_PARMS (parms);
+}
+
   nparms = TREE_VEC_LENGTH (parms);
 
   /* Determine if there are any parameter packs or default arguments.  */
@@ -8961,8 +8976,8 @@ coerce_template_parms (tree parms,
  template-id may be nested within a "sizeof".  */
   cp_evaluated ev;
 
-  new_inner_args = make_tree_vec (nparms);
-  new_args = add_outermost_template_args (args, new_inner_args);
+  tree new_args = add_outermost_template_args (args, make_tree_vec (nparms));
+  tree& new_inner_args = TMPL_ARGS_LEVEL (new_args, TMPL_ARGS_DEPTH (new_args));
   int pack_adjust = 0;
   for (parm_idx = 0, arg_idx = 0; parm_idx < nparms; parm_idx++, arg_idx++)
 {
@@ -9164,59 +9179,7 @@ coerce_template_parms (tree parms,
 SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 TREE_VEC_LENGTH (new_inner_args));
 
-  return new_inner_args;
-}
-
-/* Like coerce_template_parms.  If PARMS represents all template
-   parameters levels, this function returns a vector of vectors
-   representing all the resulting argument 

[Bug tree-optimization/25290] PHI-OPT could be rewritten so that is uses match

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25290

--- Comment #26 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #25)

> The minmax is recorded as PR 101024.  There is some more improvements to
> gimple_simplify_phiopt needed for early_p as the way min/max patterns are
> generated in match.pd (extra casts).

The early_p part was fixed in r12-2185-g5f2d3ff4e5e2ec .

[PATCH] c++: cache the normal form of a concept-id

2022-11-18 Thread Patrick Palka via Gcc-patches
We already cache the overall normal form of a declaration's constraints
under the assumption that it can't change over the translation unit.
But if we have two constrained declarations such as

  template<class T> void f() requires expensive<T> && A<T>;
  template<class T> void g() requires expensive<T> && B<T>;

then despite this high-level caching we'd still redundantly have to
expand the concept-id expensive<T> twice, once during normalization of
f's constraints and again during normalization of g's.  Ideally, we'd
reuse the previously computed normal form of expensive<T> the second
time around.

To that end this patch introduces an intermediate layer of caching
during constraint normalization -- caching of the normal form of a
concept-id -- that sits between our high-level caching of the overall
normal form of a declaration's constraints and our low-level caching of
each individual atomic constraint.

It turns out this caching generalizes some ad-hoc caching of the normal
form of a concept definition (which is equivalent to the normal form of
the concept-id C<gtargs>, where gtargs are C's generic arguments), so
this patch unifies the caching accordingly.

This change improves compile time/memory usage for e.g. the libstdc++
test std/ranges/adaptors/join.cc by 10%/5%.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?
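
The intermediate cache layer can be pictured as memoization keyed on the concept and its arguments. The Python sketch below is an illustration only: the string "normal forms", the 'expansions' log, and the function names other than normalize_concept_check are assumptions for the example, not constraint.cc.

```python
norm_cache = {}
expansions = []  # records every real expansion, for illustration


def expand_concept(tmpl, targs):
    """Stand-in for the expensive expansion of a concept-id."""
    expansions.append((tmpl, targs))
    return f"normal-form({tmpl}<{', '.join(targs)}>)"


def normalize_concept_check(tmpl, targs, diagnosing=False):
    if diagnosing:
        # When generating diagnostics, bypass the cache entirely.
        return expand_concept(tmpl, targs)
    key = (tmpl, targs)
    if key not in norm_cache:
        norm_cache[key] = expand_concept(tmpl, targs)
    return norm_cache[key]


def normalize_declaration(concept_ids):
    return [normalize_concept_check(tmpl, targs) for tmpl, targs in concept_ids]


f_norm = normalize_declaration([("expensive", ("T",)), ("A", ("T",))])
g_norm = normalize_declaration([("expensive", ("T",)), ("B", ("T",))])

# 'expensive' is expanded once and reused for g.
print(len(expansions))  # 3
```

Without the cache the same two declarations would trigger four expansions instead of three.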

gcc/cp/ChangeLog:

* constraint.cc (struct norm_entry): Define.
(struct norm_hasher): Define.
(norm_cache): Define.
(normalize_concept_check): Add function comment.  Cache the
result of concept-id normalization.  Canonicalize generic
arguments as NULL_TREE.  Don't coerce arguments unless
substitution occurred.
(normalize_concept_definition): Simplify.  Use norm_cache
instead of ad-hoc caching.
---
 gcc/cp/constraint.cc | 94 ++--
 1 file changed, 82 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a113d3e269e..c9740b1ec78 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -698,6 +698,40 @@ normalize_logical_operation (tree t, tree args, tree_code c, norm_info info)
   return build2 (c, ci, t0, t1);
 }
 
+/* Data types and hash functions for caching the normal form of a concept-id.
+   This essentially memoizes calls to normalize_concept_check.  */
+
+struct GTY((for_user)) norm_entry
+{
+  /* The CONCEPT_DECL of the concept-id.  */
+  tree tmpl;
+  /* The arguments of the concept-id.  */
+  tree args;
+  /* The normal form of the concept-id.  */
+  tree norm;
+};
+
+struct norm_hasher : ggc_ptr_hash<norm_entry>
+{
+  static hashval_t hash (norm_entry *t)
+  {
+hashval_t hash = iterative_hash_template_arg (t->tmpl, 0);
+hash = iterative_hash_template_arg (t->args, hash);
+return hash;
+  }
+
+  static bool equal (norm_entry *t1, norm_entry *t2)
+  {
+return t1->tmpl == t2->tmpl
+  && template_args_equal (t1->args, t2->args);
+  }
+};
+
+static GTY((deletable)) hash_table<norm_hasher> *norm_cache;
+
+/* Normalize the concept check CHECK where ARGS are the
+   arguments to be substituted into CHECK's arguments.  */
+
 static tree
 normalize_concept_check (tree check, tree args, norm_info info)
 {
@@ -720,24 +754,53 @@ normalize_concept_check (tree check, tree args, norm_info info)
 targs = tsubst_template_args (targs, args, info.complain, info.in_decl);
   if (targs == error_mark_node)
 return error_mark_node;
+  if (template_args_equal (targs, generic_targs_for (tmpl)))
+/* Canonicalize generic arguments as NULL_TREE, as an optimization.  */
+targs = NULL_TREE;
 
   /* Build the substitution for the concept definition.  */
   tree parms = TREE_VALUE (DECL_TEMPLATE_PARMS (tmpl));
-  /* Turn on template processing; coercing non-type template arguments
- will automatically assume they're non-dependent.  */
   ++processing_template_decl;
-  tree subst = coerce_template_parms (parms, targs, tmpl, tf_none);
+  if (targs && args)
+/* If substitution occurred, coerce the resulting arguments.  */
+targs = coerce_template_parms (parms, targs, tmpl, tf_none);
   --processing_template_decl;
-  if (subst == error_mark_node)
+  if (targs == error_mark_node)
 return error_mark_node;
 
+  if (!norm_cache)
+norm_cache = hash_table<norm_hasher>::create_ggc (31);
+  norm_entry entry = {tmpl, targs, NULL_TREE};
+  norm_entry **slot = nullptr;
+  hashval_t hash = 0;
+  if (!info.generate_diagnostics ())
+{
+  /* If we're not diagnosing, cache the normal form of the
+substituted concept-id.  */
+  hash = norm_hasher::hash (&entry);
+  slot = norm_cache->find_slot_with_hash (&entry, hash, INSERT);
+  if (*slot)
+   return (*slot)->norm;
+}
+
   /* The concept may have been ill-formed.  */
   tree def = get_concept_definition (DECL_TEMPLATE_RESULT (tmpl));
   if (def == error_mark_node)
 return error_mark_node;
 
   info.update_context (check, args);
-  return normalize_expression (def, subst, info);
+  tree norm = 

[Bug target/107692] [13 regression] r13-3950-g071e428c24ee8c breaks many test cases

2022-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107692

--- Comment #10 from Segher Boessenkool  ---
(In reply to Hongyu Wang from comment #9)
> The difference is, -mno-unroll-only-small-loops -O2 would cause
> rtl-loop-unroll taking effect,

No.  -m{no-,}unroll-only-small-loops does not enable or disable loop unrolling
at all.  The only thing it does is modify which loops are candidates to be
unrolled.

> I think the intention of -munroll-only-small-loops is to just adjust
> rtl-loop-unrolling and do not touch middle-end unroll/cunroll.

It modifies the behaviour of -funroll-loops.  It doesn't do anything else.
Anything that wants to see if unrolling is active can just look if
flag_unroll_loops is set.  The sane and simple thing.

> But I think
> your point is also reasonable. Maybe we can split the flag_unroll_loops to
> tree and rtl separately?

Users do not care if something is done on Gimple or on RTL.  The command line
flags are for users.  They work fine as-is.

> Anyway I will propose a patch and re-discuss with maintainers later. Thanks!

Please fix this regression asap.  It is a P1, and we are in stage 3 already.

[Bug tree-optimization/107754] Confusing -Warray-bounds warning with strcpy with a null pointer and non-zero offset for struct array

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107754

--- Comment #2 from Andrew Pinski  ---
Note that in the original "reduced" testcase, we had a conditional null pointer,
and optimizations at -O2 exposed the null pointer.

[Bug tree-optimization/107754] Confusing -Warray-bounds warning with strcpy with a null pointer

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107754

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-11-18
     Ever confirmed|0                           |1
            Summary|Confusing -Warray-bounds    |Confusing -Warray-bounds
                   |warning with strcpy         |warning with strcpy with a
                   |                            |null pointer
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed. Note the warning message is correct; it is just confusing and does
not mention a null pointer here.

Reduced further:
struct Foo
{
   unsigned int a;
   char bar[1024];
};
void setFoo(const char * value)
{
   struct Foo * ptr = 0;
   __builtin_strcpy(ptr->bar, value);
}


---- CUT ----
Since the offset for the character array is non-zero, we see a non-zero
constant and (based on other settings) assume it is the null pointer page and
we get a size of 0 (which is ok) but don't mention a null pointer.

There might be other dups of this but I am not going to search for it right
now.
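
To make the offset explanation concrete, here is a minimal sketch. The helper name bar_address_for_null_base is made up for illustration, and the exact value assumes a 4-byte unsigned int. With ptr == 0, the expression ptr->bar is just the constant offsetof(struct Foo, bar), which is why a non-zero constant address shows up instead of a literal null pointer.

```c
#include <assert.h>
#include <stddef.h>

struct Foo
{
   unsigned int a;
   char bar[1024];
};

/* Hypothetical helper: the byte offset that `ptr->bar` adds to `ptr`.
   With `ptr == 0` this offset is the whole address handed to strcpy.  */
static size_t bar_address_for_null_base(void)
{
   size_t off = offsetof(struct Foo, bar);
   /* `a` precedes `bar`, so the offset cannot be zero.  */
   assert(off != 0);
   return off;
}
```

Commenting out the member a makes the offset zero, which matches the "no problem if commented out" note in the original report.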

Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-18 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 22:13, Jeff Law  wrote:
>
>
> On 11/9/22 16:07, Philipp Tomsich wrote:
> > Handling the register-const_int addition has very quickly escalated to
> > creating a full sign-extended 32bit constant and performing a
> > register-register for RISC-V in GCC so far, resulting in sequences like
> > (for the case of "a + 2048"):
> >   li      a5,4096
> >   addi    a5,a5,-2048
> >   add     a0,a0,a5
> >
> > By adding an expansion for add3, we can emit optimised RTL that
> > matches the capabilities of RISC-V better by adding support for the
> > following, previously unoptimised cases:
> >- addi + addi
> >   addi    a0,a0,2047
> >   addi    a0,a0,1
> >- li + sh[123]add (if Zba is enabled)
> >   li  a5,960
> >   sh3add  a0,a5,a0
> >
> > With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
> > and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
> > the expander will otherwise wrap the resulting set-expression in an
> > insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.
> >
> > This closes the gap to LLVM, which has already been emitting these
> > optimised sequences.
> >
> > Note that this benefits perlbench (in SPEC CPU 2017), which needs
> > to add the constant 3840.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*shNadd): Rename.
> >   (riscv_shNadd): Expose as gen_riscv_shNadd{di/si}.
> >   * config/riscv/predicates.md (const_arith_shifted123_operand):
> >   New predicate (for constants that are a simm12, shifted by
> >   1, 2 or 3).
> >   (const_arith_2simm12_operand): New predicate (that can be
> >   expressed by adding 2 simm12 together).
> >   (addi_operand): New predicate (an immediate operand suitable
> >   for the new add3 expansion).
> >   * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
> >   Don't use gen_add3_insn, where a RTX instead of an INSN is
> >   required (otherwise this will break as soon as we have a
> >   define_expand for add3).
> >   (riscv_adjust_libcall_cfi_epilogue): Same.
> >   * config/riscv/riscv.md (addsi3): Rename.
> >   (riscv_addsi3): New name for addsi3.
> >   (adddi3): Rename.
> >   (riscv_adddi3): New name for adddi3.
> >   (add3): New expander that handles the basic and fancy
> >   (such as li+sh[123]add, addi+addi, ...) cases for adding
> >   register-register and register-const_int.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/addi.c: New test.
> >   * gcc.target/riscv/zba-shNadd-06.c: New test.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >   gcc/config/riscv/bitmanip.md  |  2 +-
> >   gcc/config/riscv/predicates.md| 28 +
> >   gcc/config/riscv/riscv.cc | 10 ++--
> >   gcc/config/riscv/riscv.md | 58 ++-
> >   gcc/testsuite/gcc.target/riscv/addi.c | 39 +
> >   .../gcc.target/riscv/zba-shNadd-06.c  | 11 
> >   6 files changed, 141 insertions(+), 7 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c
> >
> >
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 171a0cdced6..289ff7470c6 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -464,6 +464,60 @@
> > [(set_attr "type" "arith")
> >  (set_attr "mode" "DI")])
> >
> > +(define_expand "add<mode>3"
> > +  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
> > + (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
> > +   (match_operand:GPR 2 "addi_operand"  " r,I")))]
> > +  ""
> > +{
> > +  if (arith_operand (operands[2], <MODE>mode))
> > +emit_insn (gen_riscv_add<mode>3 (operands[0], operands[1],
> > operands[2]));
> > +  else if (const_arith_2simm12_operand (operands[2], <MODE>mode))
> > +{
> > +  /* Split into two immediates that add up to the desired value:
> > +   * e.g., break up "a + 2445" into:
> > +   *   addi    a0,a0,2047
> > +   *   addi    a0,a0,398
> > +   */
>
> Nit.  GNU comment style please.
>
>
> > +
> > +  HOST_WIDE_INT val = INTVAL (operands[2]);
> > +  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
> > +
> > +  if (val >= 0)
> > +  saturated = ~saturated;
> > +
> > +  val -= saturated;
> > +
> > +  rtx tmp = gen_reg_rtx (<MODE>mode);
>
> Can't add3 be generated by LRA?  If so, don't you have to guard
> against going into this path as we shouldn't be creating new pseudos at
> that point (I know LRA can create some internally, but I don't think it
> handles new ones showing up due to target expanders).
>
>
> Similarly for the shifted_123 case immediately following.
>
>
> If we do indeed have an issue here, I'm not sure how best to resolve.
> If the output 

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

--- Comment #3 from anlauf at gcc dot gnu.org ---
I guess the reporter assumes that gcc uses a clever algorithm like Smith's
to handle such extreme cases of complex division.  Not sure if that one is
available by some compilation flag, and I think it would impact performance.

In any case, if the reporter wants to get robust results and in a portable
way, I would advise him to change/fix his algorithm accordingly.  It appears
that a few other compilers behave here like gfortran.
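
For reference, a minimal C sketch of Smith's method next to the textbook formula. The struct and function names are made up for illustration; this is not what gfortran or libgcc actually implements. Note that plain Smith still overflows for the very largest inputs (e.g. x = DBL_MAX, where c + d*r becomes infinite), so the sketch is best tried with x = 2^1022.

```c
#include <assert.h>

struct cplx { double re, im; };

static double absd (double v) { return v < 0 ? -v : v; }

/* Textbook formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2).
   The products overflow to infinity for large operands.  */
static struct cplx div_naive (struct cplx x, struct cplx y)
{
  double den = y.re * y.re + y.im * y.im;
  struct cplx q = { (x.re * y.re + x.im * y.im) / den,
                    (x.im * y.re - x.re * y.im) / den };
  return q;
}

/* Smith's method: scale by the ratio of the smaller to the larger
   component of the divisor, so that no squares are formed.  */
static struct cplx div_smith (struct cplx x, struct cplx y)
{
  struct cplx q;
  assert (y.re != 0.0 || y.im != 0.0);
  if (absd (y.re) >= absd (y.im))
    {
      double r = y.im / y.re;
      double den = y.re + y.im * r;
      q.re = (x.re + x.im * r) / den;
      q.im = (x.im - x.re * r) / den;
    }
  else
    {
      double r = y.re / y.im;
      double den = y.re * r + y.im;
      q.re = (x.re * r + x.im) / den;
      q.im = (x.im * r - x.re) / den;
    }
  return q;
}
```

With x = 0x1p1022, div_smith gives 1 for (x+x*I)/(x+x*I) and I for (x+x*I)/(x-x*I), while div_naive forms inf/inf and returns NaN.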

[Bug c/107754] New: Confusing -Warray-bounds warning with strcpy

2022-11-18 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107754

Bug ID: 107754
   Summary: Confusing -Warray-bounds warning with strcpy
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nightstrike at gmail dot com
  Target Milestone: ---

Metabug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=warray-bounds

// Compile with GCC 12.2.0 with -O2:
//warning: ‘strcpy’ offset 0 is out of the bounds [0, 0] [-Warray-bounds]
// no warning without -O2

struct Inst;
struct Class { int offset; };

static struct Class * classFoo; // No problem without static here

struct Foo
{
   unsigned int a;   // no problem if commented out
   char bar[1024];
};

void setFoo(struct Inst * this, const char * value)
{
   struct Foo * ptr = (struct Foo *)(this ? (((char *)this) + classFoo->offset)
: 0);
   __builtin_strcpy(ptr->bar, value);
}


$ gcc-12 -c -O2 -Warray-bounds a.c -o /dev/null
a.c: In function 'setFoo':
a.c:19:4: warning: '__builtin_strcpy' offset 0 is out of the bounds [0, 0]
[-Warray-bounds]
   19 |__builtin_strcpy(ptr->bar, value);
  |^

Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/9/22 16:07, Philipp Tomsich wrote:

Handling the register-const_int addition has very quickly escalated to
creating a full sign-extended 32bit constant and performing a
register-register for RISC-V in GCC so far, resulting in sequences like
(for the case of "a + 2048"):
li      a5,4096
addi    a5,a5,-2048
add     a0,a0,a5

By adding an expansion for add3, we can emit optimised RTL that
matches the capabilities of RISC-V better by adding support for the
following, previously unoptimised cases:
   - addi + addi
addi    a0,a0,2047
addi    a0,a0,1
   - li + sh[123]add (if Zba is enabled)
li  a5,960
sh3add  a0,a5,a0
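
The arithmetic behind the two cases above can be sketched in plain C. This is only an illustration under stated assumptions: the helper names (is_simm12, second_addi_imm, shnadd_shift) are hypothetical and are not the predicates or expander code from the patch, and trying the largest shift first is an arbitrary choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of a signed 12-bit immediate (simm12), as used by addi.  */
#define SIMM12_MIN (-2048)
#define SIMM12_MAX 2047

static int
is_simm12 (int64_t v)
{
  return v >= SIMM12_MIN && v <= SIMM12_MAX;
}

/* addi + addi: the first addi adds the saturated immediate (2047 or
   -2048); return the immediate left over for the second addi.  */
static int64_t
second_addi_imm (int64_t val)
{
  int64_t saturated = val >= 0 ? SIMM12_MAX : SIMM12_MIN;
  int64_t rest = val - saturated;
  /* Only constants in [-4096, 4094] can be split this way.  */
  assert (is_simm12 (rest));
  return rest;
}

/* li + shNadd: return N in {1,2,3} such that val == imm << N with imm
   a simm12, or 0 if there is no such N.  */
static int
shnadd_shift (int64_t val)
{
  for (int n = 3; n >= 1; n--)
    {
      int64_t scale = (int64_t) 1 << n;
      if (val % scale == 0 && is_simm12 (val / scale))
        return n;
    }
  return 0;
}
```

For 2048 this yields 2047 + 1, and for 7680 it yields 960 << 3, which corresponds to the li a5,960 / sh3add a0,a5,a0 pair shown above.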

With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
the expander will otherwise wrap the resulting set-expression in an
insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.

This closes the gap to LLVM, which has already been emitting these
optimised sequences.

Note that this benefits perlbench (in SPEC CPU 2017), which needs
to add the constant 3840.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*shNadd): Rename.
(riscv_shNadd): Expose as gen_riscv_shNadd{di/si}.
* config/riscv/predicates.md (const_arith_shifted123_operand):
New predicate (for constants that are a simm12, shifted by
1, 2 or 3).
(const_arith_2simm12_operand): New predicate (that can be
expressed by adding 2 simm12 together).
(addi_operand): New predicate (an immediate operand suitable
for the new add3 expansion).
* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
Don't use gen_add3_insn, where a RTX instead of an INSN is
required (otherwise this will break as soon as we have a
define_expand for add3).
(riscv_adjust_libcall_cfi_epilogue): Same.
* config/riscv/riscv.md (addsi3): Rename.
(riscv_addsi3): New name for addsi3.
(adddi3): Rename.
(riscv_adddi3): New name for adddi3.
(add3): New expander that handles the basic and fancy
(such as li+sh[123]add, addi+addi, ...) cases for adding
register-register and register-const_int.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/addi.c: New test.
* gcc.target/riscv/zba-shNadd-06.c: New test.

Signed-off-by: Philipp Tomsich 
---

  gcc/config/riscv/bitmanip.md  |  2 +-
  gcc/config/riscv/predicates.md| 28 +
  gcc/config/riscv/riscv.cc | 10 ++--
  gcc/config/riscv/riscv.md | 58 ++-
  gcc/testsuite/gcc.target/riscv/addi.c | 39 +
  .../gcc.target/riscv/zba-shNadd-06.c  | 11 
  6 files changed, 141 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c



diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 171a0cdced6..289ff7470c6 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -464,6 +464,60 @@
[(set_attr "type" "arith")
 (set_attr "mode" "DI")])
  
+(define_expand "add<mode>3"

+  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
+   (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
+ (match_operand:GPR 2 "addi_operand"  " r,I")))]
+  ""
+{
+  if (arith_operand (operands[2], <MODE>mode))
+emit_insn (gen_riscv_add<mode>3 (operands[0], operands[1], operands[2]));
+  else if (const_arith_2simm12_operand (operands[2], <MODE>mode))
+{
+  /* Split into two immediates that add up to the desired value:
+   * e.g., break up "a + 2445" into:
+   *   addi    a0,a0,2047
+   *   addi    a0,a0,398
+   */


Nit.  GNU comment style please.



+
+  HOST_WIDE_INT val = INTVAL (operands[2]);
+  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
+
+  if (val >= 0)
+saturated = ~saturated;
+
+  val -= saturated;
+
+  rtx tmp = gen_reg_rtx (<MODE>mode);


Can't add3 be generated by LRA?  If so, don't you have to guard 
against going into this path as we shouldn't be creating new pseudos at 
that point (I know LRA can create some internally, but I don't think it 
handles new ones showing up due to target expanders).



Similarly for the shifted_123 case immediately following.


If we do indeed have an issue here, I'm not sure how best to resolve.  
If the output operand does not overlap with the inputs, then we're 
golden and can just re-use it to form the constant.  If not,  then it's 
a bit tougher.  I'm not keen to add a test of no_new_pseudos to the 
operand predicate, but I don't see a better option yet.



jeff




[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org
             Status|NEW                         |WAITING

--- Comment #2 from kargl at gcc dot gnu.org ---
Please either include the program in your original email or attach it to the PR.

Re: [PATCH v2] RISC-V: No extensions for SImode min/max against safe constant

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Fri, 18 Nov 2022 at 21:11, Jeff Law  wrote:

>
> On 11/8/22 17:06, Philipp Tomsich wrote:
> > Optimize the common case of a SImode min/max against a constant
> > that is safe both for sign- and zero-extension.
> > E.g., consider the case
> >int f(unsigned int* a)
> >{
> >  const int C = 1000;
> >  return *a * 3 > C ? C : *a * 3;
> >}
> > where the constant C will yield the same result in DImode whether
> > sign- or zero-extended.
> >
> > This should eventually go away once the lowering to RTL smartens up
> > and considers the precision/signedness and the value-ranges of the
> > operands to MIN_EXPR and MAX_EXPR.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*minmax): Additional pattern for
> >min/max against constants that are extension-invariant.
> >   * config/riscv/iterators.md (minmax_optab): Add an iterator
> > that has only min and max rtl.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zbb-min-max-02.c: New test.
>
> Ok
>
> jeff
>
>
>


[Bug analyzer/107582] - -Wanalyzer-use-of-uninitialized-value false positive with while loop in pthread_cleanup_push

2022-11-18 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107582

--- Comment #5 from David Malcolm  ---
It's a bug in feasibility-checking when jumping through a function pointer:
dynamic_call_info_t::update_model blindly copies over the state from the
exploded_node's state, overwriting the precise knowledge of "ret" (which was
known to be == 110) with the UNKNOWN svalue.

Re: [PATCH v2] libcpp: Avoid remapping filenames within directives

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/2/22 04:47, Richard Purdie via Gcc-patches wrote:

Code such as:

  #include __FILE__

can interact poorly with the *-prefix-map options when cross compiling. In
general you want to remap filenames for use in the target context, but the
local paths should be used to find include files at compile time. Ignoring
filename remapping for directives avoids such failures.

Fix this to improve such usage and then document this against file-prefix-map
(referenced by the other *-prefix-map options) to make the behaviour clear
and defined.

libcpp/ChangeLog:

 * macro.cc (_cpp_builtin_macro_text): Don't remap filenames within 
directives

gcc/ChangeLog:

 * doc/invoke.texi: Document prefix-maps don't affect directives


Thanks.  Installed.  Sorry about the wait.

jeff




Re: [PATCH v2] RISC-V: No extensions for SImode min/max against safe constant

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 17:06, Philipp Tomsich wrote:

Optimize the common case of a SImode min/max against a constant
that is safe both for sign- and zero-extension.
E.g., consider the case
   int f(unsigned int* a)
   {
 const int C = 1000;
 return *a * 3 > C ? C : *a * 3;
   }
where the constant C will yield the same result in DImode whether
sign- or zero-extended.
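
What "safe both for sign- and zero-extension" means can be checked with a small C sketch. The function name is made up for illustration and is not the predicate used in the patch. A 32-bit constant is extension-invariant exactly when its sign bit is clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if the 32-bit constant C has the same
   64-bit value whether it is sign-extended or zero-extended.  */
static int
is_extension_invariant (int32_t c)
{
  int64_t sext = (int64_t) c;             /* sign-extension */
  int64_t zext = (int64_t) (uint32_t) c;  /* zero-extension */
  /* The two agree exactly when the sign bit of C is clear.  */
  assert ((sext == zext) == (c >= 0));
  return sext == zext;
}
```

The constant 1000 from the example passes this check, while a constant such as -1 does not.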

This should eventually go away once the lowering to RTL smartens up
and considers the precision/signedness and the value-ranges of the
operands to MIN_EXPR and MAX_EXPR.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*minmax): Additional pattern for
   min/max against constants that are extension-invariant.
* config/riscv/iterators.md (minmax_optab): Add an iterator
  that has only min and max rtl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: New test.


Ok

jeff




[Bug libstdc++/101228] tbb/task.h is Deprecated in newer TBB.

2022-11-18 Thread kerukuro at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101228

kerukuro  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kerukuro at gmail dot com

--- Comment #13 from kerukuro  ---
Yes, this issue is not fixed.

[Bug analyzer/107582] - -Wanalyzer-use-of-uninitialized-value false positive with while loop in pthread_cleanup_push

2022-11-18 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107582

David Malcolm  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-18

--- Comment #4 from David Malcolm  ---
Thanks for filing this bug.  Am debugging it now...

Re: [PATCH] RISC-V: Fix RVV testcases.

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/5/22 18:13, Kito Cheng via Gcc-patches wrote:

An alternative fix for those testcases has been posted:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605126.html


Did this ever get addressed, in either form?


jeff




Re: [PATCH] constexprify some tree variables

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/18/22 11:05, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Since we use C++11 by default now, we can
use constexpr for some const decls in tree-core.h.

This patch does that and it allows for better optimizations
of GCC code with checking enabled and without LTO.

For example, compiling generic-match.cc is sped up due
to the smaller number of basic blocks and less debugging info
produced. I did not check the speed of compiling the same source
but rather the speed of compiling the old vs new sources here
(but with the same compiler base).

The small slowdown in the parsing of the arrays in each TU
is mitigated by a speedup from how much less code/debugging info
is produced in the end.

Note I looked at generic-match.cc since it is one of the
source files whose compilation causes parallel building to stall and
I wanted to speed it up.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Or should this wait until GCC 13 branches off?

gcc/ChangeLog:

PR middle-end/14840
* tree-core.h (tree_code_type): Constexprify
by including all-tree.def.
(tree_code_length): Likewise.
* tree.cc (tree_code_type): Remove.
(tree_code_length): Remove.


I would have preferred this a week ago :-)   And if it was just 
const-ifying, I'd ACK it without hesitation.


Can you share any of the build-time speedups you're seeing, even if
they're not perfect?  It'd help to get a sense of the potential gain
here and whether or not there's enough gain to gate it into gcc-13 or
have it wait for gcc-14.



And if we can improve the compile-time of the files generated by 
match.pd, that's a win.  It's definitely a serialization point -- it 
becomes *painfully* obvious when doing a bootstrap using qemu, when that 
file takes 1-2hrs after everything else has finished.



Jeff


[Bug tree-optimization/107751] [11/12/13 regression] False positive -Wmaybe-uninitialized at -O0

2022-11-18 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107751

Marek Polacek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |mpolacek at gcc dot gnu.org

--- Comment #3 from Marek Polacek  ---
Started with r11-959-gb825a22890740f:

commit b825a22890740f341eae566af27e18e528cd29a7
Author: Martin Sebor 
Date:   Thu Jun 4 16:06:10 2020 -0600

Implement a solution for PR middle-end/10138 and PR middle-end/95136.

PR middle-end/10138 - warn for uninitialized arrays passed as const
arguments
PR middle-end/95136 - missing -Wuninitialized on an array access with a
variable offset

Re: [PATCH v2 0/2] Use Zbs with xori/ori/andi and polarity-reversed twobit-tests

2022-11-18 Thread Philipp Tomsich
(Both) applied to master. Thanks!
--Philipp.

On Fri, 18 Nov 2022 at 20:13, Jeff Law  wrote:

>
> On 11/18/22 04:09, Philipp Tomsich wrote:
> > We had a few patches on the list that shared predicates (for extending
> > the reach of xori and ori -- and for the branches on two
> > polarity-reversed bits) and thus depended on each other.
> >
> > These all had approval with requested changes, so these are now
> > collected together for v2.
> >
> > Note that this adds the (a & ~C) case, so please take a look at that
> > part and OK the updated series.
> >
> >
> >
> > Changes in v2:
> > - Collects already approved changes for v2 for (a | C) and (a ^ C).
> > - Pulls in the (already) approved branch on polarity-reversed bits
> >for v2, as it shares predicates with the other changes.
> > - Newly adds support for the (a & ~C) case.
> >
> > Philipp Tomsich (2):
> >RISC-V: Use bseti/bclri/binvi to extend reach of ori/andi/xori
> >RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs
> >
> >   gcc/config/riscv/bitmanip.md  | 79 +++
> >   gcc/config/riscv/iterators.md |  8 ++
> >   gcc/config/riscv/predicates.md| 33 
> >   gcc/config/riscv/riscv.h  |  8 ++
> >   .../riscv/{zbs-bclri.c => zbs-bclri-01.c} |  0
> >   gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c | 27 +++
> >   gcc/testsuite/gcc.target/riscv/zbs-binvi.c| 22 ++
> >   gcc/testsuite/gcc.target/riscv/zbs-bseti.c| 27 +++
> >   .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> >   9 files changed, 224 insertions(+)
> >   rename gcc/testsuite/gcc.target/riscv/{zbs-bclri.c => zbs-bclri-01.c}
> (100%)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-binvi.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
>
> 1/2 and 2/2 are both OK.
>
> jeff
>
>


[Bug c++/107751] [11/12/13 regression] False positive -Wmaybe-uninitialized at -O0

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107751

--- Comment #2 from Andrew Pinski  ---
Reduced testcase (removes the templates, also now able to compile as C):
typedef const int T1;
typedef const int T2;
void std_equal(T1* a1, T1* a2, T2* b1);
void f() {
int a[3] = {1, 2, 3};
T1* x= a;
T2* y= a;
std_equal(x, x+3, y);
}

[Bug c++/107751] [11/12/13 regression] False positive -Wmaybe-uninitialized at -O0

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107751

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.4
   Last reconfirmed|                            |2022-11-18
            Summary|[11/12 regression] False    |[11/12/13 regression] False
                   |positive                    |positive
                   |-Wmaybe-uninitialized at    |-Wmaybe-uninitialized at
                   |-O0                         |-O0
     Ever confirmed|0                           |1
      Known to fail|                            |11.1.0, 12.2.0, 13.0
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |10.4.0
           Keywords|                            |needs-bisection

--- Comment #1 from Andrew Pinski  ---
Confirmed.

Re: [PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-18 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 20:52, Jeff Law  wrote:

> Something to consider.  We're gaining a lot of
>
> (subreg:SI (reg:DI) 0) kinds of operands.
>
>
> Would it make sense to make an operand predicate that accepted
>
> (reg:SI) or (subreg:SI (reg:DI) 0)?
>
>
> It will reduce my complaints about subregs :-)  But the real reason I'm
> suggesting we consider adding such a predicate is, AFAICT, it gives
> combine a chance to eliminate the subreg.  I haven't actually tested
> this, but it seems like it might be worth a quick experiment independent
> of these patches (and probably targeted towards gcc-14 rather than gcc-13).
>

I like the idea. Definitely something to consider. We'll give this a try.
--Philipp.


Re: [PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks.
--Philipp.


On Fri, 18 Nov 2022 at 20:52, Jeff Law  wrote:

>
> On 11/8/22 12:57, Philipp Tomsich wrote:
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md: Handle corner-cases for combine
> >   when chaining slli(.uw)? + addw
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zba-shNadd-04.c: New test.
>
> OK.
>
> Something to consider.  We're gaining a lot of
>
> (subreg:SI (reg:DI) 0) kinds of operands.
>
>
> Would it make sense to make an operand predicate that accepted
>
> (reg:SI) or (subreg:SI (reg:DI) 0)?
>
>
> It will reduce my complaints about subregs :-)  But the real reason I'm
> suggesting we consider adding such a predicate is, AFAICT, it gives
> combine a chance to eliminate the subreg.  I haven't actually tested
> this, but it seems like it might be worth a quick experiment independent
> of these patches (and probably targeted towards gcc-14 rather than gcc-13).
>
>
>
> jeff
>
>


Re: [PATCH] RISC-V: split to allow formation of sh[123]add before divw

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Fri, 18 Nov 2022 at 20:37, Jeff Law  wrote:

>
> On 11/8/22 12:56, Philipp Tomsich wrote:
> > When using strength-reduction, we will reduce a multiplication to a
> > sequence of shifts and adds.  If this is performed with 32-bit types
> > and followed by a division, the lack of w-form sh[123]add will make
> > combination impossible and lead to a slli + addw being generated.
> >
> > Split the sequence with the knowledge that a w-form div will perform
> > implicit sign-extensions.
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/bitmanip.md: Add a define_split to optimize
> >slliw + addiw + divw into sh[123]add + divw.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/riscv/zba-shNadd-05.c: New test.
>
> OK.  I won't complain about the subregs on this one :-)
>
>
> jeff
>
>
>


Re: [PATCH] RISC-V: Optimize branches testing a bit-range or a shifted immediate

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Fri, 18 Nov 2022 at 20:30, Jeff Law  wrote:

>
> On 11/8/22 13:46, Philipp Tomsich wrote:
> > gcc/ChangeLog:
> >
> >   * config/riscv/predicates.md (shifted_const_arith_operand):
> >   (uimm_extra_bit_operand):
> >   * config/riscv/riscv.md
> (*branch_shiftedarith_equals_zero):
> >   (*branch_shiftedmask_equals_zero):
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/branch-1.c: New test.
>
> Nice...It seems so obvious, but I'm not offhand aware of other ports
> doing this, though many could likely benefit.
>
> OK
>
>
> jeff
>
>
>


[Bug sanitizer/107752] Lack of column information in AddressSanitizer reports

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107752

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-18
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Andrew Pinski  ---
(In reply to Li Shaohua from comment #3)
> (In reply to Andrew Pinski from comment #1)
> > Do you mean the column information rather than offset?
> 
> Yes, I meant the column information.
> 
> I don’t know the implementation details of ASAN. But as UBsan can include
> the column information, I presume it’s also doable in ASAN?

UBSAN column information is passed directly from the compiler to the library
while ASAN (inside GCC) uses libbacktrace to find the full backtrace.

Clang/LLVM does not use libbacktrace to do the backtrace; they have their own
library to do it, and that library provides the column information, which is
why it is there for them.

I looked into libbacktrace somewhat to see what needs to be done but it seems
to be a lot (though I could be wrong).

Re: [PATCH] RISC-V: allow bseti on SImode without sign-extension

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Fri, 18 Nov 2022 at 20:26, Jeff Law  wrote:

>
> On 11/8/22 13:03, Philipp Tomsich wrote:
> > As long as the SImode operand is not a partial subreg, we can use a
> > bseti without postprocessing to or in a bit, as the middle end is
> > smart enough to stay away from the signbit.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*bsetidisi): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zbs-bexti-02.c: New test.
>
> OK, with my usual grumble about SUBREGs.
>
> jeff
>
>
>


Re: [PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 12:57, Philipp Tomsich wrote:

gcc/ChangeLog:

* config/riscv/bitmanip.md: Handle corner-cases for combine
when chaining slli(.uw)? + addw

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-04.c: New test.


OK.

Something to consider.  We're gaining a lot of

(subreg:SI (reg:DI) 0) kinds of operands.


Would it make sense to make an operand predicate that accepted

(reg:SI) or (subreg:SI (reg:DI) 0)?


It will reduce my complaints about subregs :-)  But the real reason I'm
suggesting we consider adding such a predicate is, AFAICT, it gives
combine a chance to eliminate the subreg.  I haven't actually tested 
this, but it seems like it might be worth a quick experiment independent 
of these patches (and probably targeted towards gcc-14 rather than gcc-13).




jeff



[Bug sanitizer/107752] Lack of column information in AddressSanitizer reports

2022-11-18 Thread shaohua.li at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107752

--- Comment #3 from Li Shaohua  ---
(In reply to Andrew Pinski from comment #1)
> Do you mean the column information rather than offset?

Yes, I meant the column information.

I don’t know the implementation details of ASAN. But as UBsan can include the
column information, I presume it’s also doable in ASAN?

[Bug fortran/107753] gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-11-18
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from anlauf at gcc dot gnu.org ---
Confirmed if compiled without optimization (-O0,-Og) on x86_64-pc-linux-gnu.

If I compile with -O, I get:

   (1.79769313486231571E+308,1.79769313486231571E+308)
   (8.98846567431157954E+307,8.98846567431157954E+307)
   (4.49423283715578977E+307,4.49423283715578977E+307)
   (1.,0.)
   (1.,0.)
   (1.,0.)

Re: [PATCH v2] genmultilib: Add sanity check

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/3/22 03:52, Christophe Lyon via Gcc-patches wrote:

When a list of dirnames is provided to genmultilib, its length is
expected to match the number of options.  If this is not the case, the
build fails later for reasons not obviously related to this mistake.
This patch adds a sanity check to help diagnose such cases.

Tested by adding an option to t-aarch64 and no corresponding dirname,
with both bash and dash.

v2: do not use arrays (bash feature).

OK for trunk?

gcc/ChangeLog:

* genmultilib: Add sanity check.


OK.  It should be interesting to see if it trips.


jeff




Re: [PATCH] RISC-V: split to allow formation of sh[123]add before divw

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 12:56, Philipp Tomsich wrote:

When using strength-reduction, we will reduce a multiplication to a
sequence of shifts and adds.  If this is performed with 32-bit types
and followed by a division, the lack of w-form sh[123]add will make
combination impossible and lead to a slli + addw being generated.

Split the sequence with the knowledge that a w-form div will perform
implicit sign-extensions.
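
For illustration only, a minimal sketch of the kind of source this description has in mind. The function name `scaled_div` and the constant 4 are assumptions made for this example; this is not the content of zba-shNadd-05.c, and whether a particular compiler emits sh2add followed by divw for it depends on the version and flags (something like -march=rv64gc_zba -O2).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: a 32-bit multiply by a small power of two plus an
// add, followed by a 32-bit division.  Strength reduction turns `a * 4 + b`
// into a shift-and-add; on RV64 with Zba that is the shape sh2add computes
// (rs2 + (rs1 << 2)), and the 32-bit division maps to divw.
std::int32_t scaled_div (std::int32_t a, std::int32_t b, std::int32_t c)
{
  assert (c != 0);  // division by zero is undefined in C++
  return (a * 4 + b) / c;
}
```

The point of the split is that the result of the shift-and-add feeds only a w-form division, so the add itself does not need a separate w-form instruction.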

gcc/ChangeLog:

 * config/riscv/bitmanip.md: Add a define_split to optimize
   slliw + addiw + divw into sh[123]add + divw.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zba-shNadd-05.c: New test.


OK.  I won't complain about the subregs on this one :-)


jeff




[Bug target/107692] [13 regression] r13-3950-g071e428c24ee8c breaks many test cases

2022-11-18 Thread wwwhhhyyy333 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107692

--- Comment #9 from Hongyu Wang  ---
(In reply to Segher Boessenkool from comment #8)
> (In reply to Jiu Fu Guo from comment #5)
> > > -munroll-only-small-loops does not turn on or off -funroll-loops, and it
> > > should not, so that it does what it says, if nothing else.
> > 
> > Yes, and -funroll-loops would win over -munroll-only-small-loops
> 
> -funroll-loops is the only thing that enables loop unrolling.
> -munroll-only-small-loops, like the name says, says to only unroll small
> loops,
> and no others.  It is not something at the same level as -funroll-loops, that
> would be insanity: other code likes to see if the user requested loops to be
> unrolled as well!

I can understand the logic, my initial patch
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604345.html is something
similar to rs6000 and x86 only.
The difference is that -mno-unroll-only-small-loops -O2 would cause
rtl-loop-unroll to take effect, and cunroll would also run if we followed the
rs6000 change. We do not really want either, so the patch becomes ugly, as
said :(
I think the intention of -munroll-only-small-loops is to adjust only
rtl-loop-unrolling and not touch middle-end unroll/cunroll. But I think your
point is also reasonable. Maybe we can split flag_unroll_loops into separate
tree and rtl flags?
Anyway I will propose a patch and re-discuss with maintainers later. Thanks!

[Bug fortran/107753] New: gfortran returns NaN in complex divisions (x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)

2022-11-18 Thread weslley.pereira at ucdenver dot edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107753

Bug ID: 107753
   Summary: gfortran returns NaN in complex divisions
(x+x*I)/(x+x*I) and (x+x*I)/(x-x*I)
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: weslley.pereira at ucdenver dot edu
  Target Milestone: ---

If `x=huge(0.0d0)` or `x=2.0d0**(dble(maxexponent(0.0d0))-1)`, GNU Fortran
12.2.0 returns NaN for the complex divisions `(x+x*I)/(x+x*I)` and
`(x+x*I)/(x-x*I)`. We verified this after running compiler tests for the new
LAPACK 3.11.0 release. All other divisions with `x=2**m`, for
`MINEXPONENT-1 <= m < MAXEXPONENT`, return the expected results:
`(x+x*I)/(x+x*I)=1` and `(x+x*I)/(x-x*I)=I`.

Related links:
How to reproduce this issue: https://godbolt.org/z/b3WKWodvn
Open issue in LAPACK: https://github.com/Reference-LAPACK/lapack/issues/757
Tests added to LAPACK: https://github.com/Reference-LAPACK/lapack/pull/623
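
To show why operands of this size are hard, here is a small C++ sketch (not Fortran, and not gfortran's actual division routine). `divide_naive` is the textbook formula, whose denominator c*c + d*d overflows for x = huge. `divide_scaled` is one possible remedy, pre-scaling by max(|c|, |d|); both function names are made up for this example, and the scaled variant makes no attempt to handle zero, infinite or NaN divisors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <limits>

// Textbook formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
// For |c|, |d| near the largest finite double, c*c + d*d overflows to +inf.
std::complex<double> divide_naive (double a, double b, double c, double d)
{
  const double denom = c * c + d * d;
  return {(a * c + b * d) / denom, (b * c - a * d) / denom};
}

// Sketch of a scaled variant (an assumption, not gfortran's algorithm):
// divide all four components by s = max(|c|, |d|) first, so that the
// denominator lies in [1, 2] and cannot overflow.
std::complex<double> divide_scaled (double a, double b, double c, double d)
{
  const double s = std::max (std::fabs (c), std::fabs (d));
  assert (s > 0.0 && std::isfinite (s));  // zero/inf/NaN divisors not handled
  a /= s; b /= s; c /= s; d /= s;
  const double denom = c * c + d * d;
  return {(a * c + b * d) / denom, (b * c - a * d) / denom};
}
```

With x set to the largest finite double, the naive version produces inf/inf for the real part, while the scaled version reduces both cases from the report to divisions of small exact values.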

Re: [PATCH] RISC-V: Optimize branches testing a bit-range or a shifted immediate

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 13:46, Philipp Tomsich wrote:

gcc/ChangeLog:

* config/riscv/predicates.md (shifted_const_arith_operand):
(uimm_extra_bit_operand):
* config/riscv/riscv.md (*branch_shiftedarith_equals_zero):
(*branch_shiftedmask_equals_zero):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/branch-1.c: New test.


Nice... It seems so obvious, but I'm not offhand aware of other ports
doing this, though many could likely benefit.


OK


jeff




Re: [PATCH] RISC-V: allow bseti on SImode without sign-extension

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 13:03, Philipp Tomsich wrote:

As long as the SImode operand is not a partial subreg, we can use a
bseti without postprocessing to OR in a bit, as the middle end is
smart enough to stay away from the sign bit.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*bsetidisi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti-02.c: New test.


OK, with my usual grumble about SUBREGs.

jeff




Re: [PATCH v2 0/2] Use Zbs with xori/ori/andi and polarity-reversed twobit-tests

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/18/22 04:09, Philipp Tomsich wrote:

We had a few patches on the list that shared predicates (for extending
the reach of xori and ori -- and for the branches on two
polarity-reversed bits) and thus depended on each other.

These all had approval with requested changes, so these are now
collected together for v2.

Note that this adds the (a & ~C) case, so please take a look at that
part and OK the updated series.



Changes in v2:
- Collects already approved changes for v2 for (a | C) and (a ^ C).
- Pulls in the (already) approved branch on polarity-reversed bits
   for v2, as it shares predicates with the other changes.
- Newly adds support for the (a & ~C) case.
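
As a rough illustration of the three cases listed above, the following sketch uses single-bit constants that do not fit the 12-bit signed immediate of ori/xori/andi. The function names and the choice of bits 20 and 11 are assumptions for this example; they are not taken from the testcases in the series, and the instructions named in the comments are candidates, not guaranteed output.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants (assumptions): single bits outside the
// [-2048, 2047] range of a 12-bit signed immediate.
constexpr std::uint64_t kBit20 = std::uint64_t{1} << 20;
constexpr std::uint64_t kBit11 = std::uint64_t{1} << 11;

// (a | C): candidate for bseti.
std::uint64_t set_bit (std::uint64_t a) { return a | kBit20; }

// (a ^ C): candidate for binvi.
std::uint64_t flip_bit (std::uint64_t a) { return a ^ kBit20; }

// (a & ~C): candidate for bclri.
std::uint64_t clear_bit (std::uint64_t a) { return a & ~kBit20; }

// "(a & twobits) == singlebit": true when bit 20 is set and bit 11 is
// clear, i.e. a test on two bits of opposite polarity.
bool bit20_set_bit11_clear (std::uint64_t a)
{
  return (a & (kBit20 | kBit11)) == kBit20;
}
```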

Philipp Tomsich (2):
   RISC-V: Use bseti/bclri/binvi to extend reach of ori/andi/xori
   RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

  gcc/config/riscv/bitmanip.md  | 79 +++
  gcc/config/riscv/iterators.md |  8 ++
  gcc/config/riscv/predicates.md| 33 
  gcc/config/riscv/riscv.h  |  8 ++
  .../riscv/{zbs-bclri.c => zbs-bclri-01.c} |  0
  gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c | 27 +++
  gcc/testsuite/gcc.target/riscv/zbs-binvi.c| 22 ++
  gcc/testsuite/gcc.target/riscv/zbs-bseti.c| 27 +++
  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
  9 files changed, 224 insertions(+)
  rename gcc/testsuite/gcc.target/riscv/{zbs-bclri.c => zbs-bclri-01.c} (100%)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-binvi.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c


1/2 and 2/2 are both OK.

jeff



[PATCH] gomp: Various fixes for SVE types [PR101018]

2022-11-18 Thread Richard Sandiford via Gcc-patches
[I posted this late in stage 4 as an RFC, but it wasn't suitable for
GCC 12 at that point.  I kind-of dropped the ball after that, sorry.]

Various parts of the omp code checked whether the size of a decl
was an INTEGER_CST in order to determine whether the decl was
variable-sized or not.  If it was variable-sized, it was expected
to have a DECL_VALUE_EXPR replacement, as for VLAs.

This patch uses poly_int_tree_p instead, so that variable-length
SVE vectors are treated like constant-length vectors.  This means
that some structures become poly_int-sized, with some fields at
poly_int offsets, but we already have code to handle that.
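
As a toy model of what a poly_int-sized quantity looks like, consider a value c0 + c1*x with a runtime indeterminate x >= 0. The sketch below uses made-up names (`poly2`, `and_is_constant`); it is neither GCC's poly-int.h nor the can_and_p added by this patch. It only illustrates one situation in which masking such a value yields a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a two-coefficient polynomial value c0 + c1 * x, x >= 0.
struct poly2 {
  std::uint64_t c0;
  std::uint64_t c1;
  std::uint64_t eval (std::uint64_t x) const { return c0 + c1 * x; }
};

// If c1 is a multiple of a power of two that is greater than `mask`, then
// c1 * x never touches the bits selected by `mask`, so
// (c0 + c1 * x) & mask == c0 & mask for every x.  On success the constant
// is stored in *result.  This check is conservative: it may return false
// for some masks that would in fact give a constant.
bool and_is_constant (const poly2 &p, std::uint64_t mask,
                      std::uint64_t *result)
{
  // Lowest set bit of c1 (0 if c1 == 0, in which case p is a constant).
  const std::uint64_t low = p.c1 & (~p.c1 + 1);
  if (p.c1 != 0 && low <= mask)
    return false;
  *result = p.c0 & mask;
  return true;
}
```

For a size of 16 + 16x, masking with 15 always gives 0, whereas masking with 16 depends on x and is rejected.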

An alternative would have been to handle the data via indirection
instead.  However, that's likely to be more complicated, and it
would contradict is_variable_sized, which already uses a check
for TREE_CONSTANT rather than INTEGER_CST.

gimple_add_tmp_var should probably not add a safelen of 1
for SVE vectors, but that's really a separate thing and might
be hard to test.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR middle-end/101018
* poly-int.h (can_and_p): New function.
* fold-const.cc (poly_int_binop): Use it to optimize BIT_AND_EXPRs
involving POLY_INT_CSTs.
* expr.cc (get_inner_reference): Fold poly_uint64 size_trees
into the constant bitsize.
* gimplify.cc (gimplify_bind_expr): Use poly_int_tree_p instead
of INTEGER_CST when checking for constant-sized omp data.
(omp_add_variable): Likewise.
(omp_notice_variable): Likewise.
(gimplify_adjust_omp_clauses_1): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_omp_target): Likewise.

gcc/testsuite/
PR middle-end/101018
* gcc.target/aarch64/sve/acle/general/pr101018-1.c: New test.
* gcc.target/aarch64/sve/acle/general/pr101018-2.c: Likewise.
---
 gcc/expr.cc   |  4 +--
 gcc/fold-const.cc |  7 +
 gcc/gimplify.cc   | 23 
 gcc/omp-low.cc| 10 +++
 gcc/poly-int.h| 19 +
 .../aarch64/sve/acle/general/pr101018-1.c | 27 +++
 .../aarch64/sve/acle/general/pr101018-2.c | 23 
 7 files changed, 94 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-2.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..a304c583d16 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7941,10 +7941,10 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
 
   if (size_tree != 0)
 {
-  if (! tree_fits_uhwi_p (size_tree))
+  if (! tree_fits_poly_uint64_p (size_tree))
mode = BLKmode, *pbitsize = -1;
   else
-   *pbitsize = tree_to_uhwi (size_tree);
+   *pbitsize = tree_to_poly_uint64 (size_tree);
 }
 
   *preversep = reverse_storage_order_for_component_p (exp);
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index b89cac91cae..000600017e2 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1183,6 +1183,13 @@ poly_int_binop (poly_wide_int , enum tree_code code,
return false;
   break;
 
+case BIT_AND_EXPR:
+  if (TREE_CODE (arg2) != INTEGER_CST
+ || !can_and_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
+))
+   return false;
+  break;
+
 case BIT_IOR_EXPR:
   if (TREE_CODE (arg2) != INTEGER_CST
  || !can_ior_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f06ce3cc77a..096738c8ed4 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7352,7 +7352,7 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree 
decl, unsigned int flags)
   /* When adding a variable-sized variable, we have to handle all sorts
  of additional bits of data: the pointer replacement variable, and
  the parameters of the type.  */
-  if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+  if (DECL_SIZE (decl) && !poly_int_tree_p (DECL_SIZE (decl)))
 {
   /* Add the pointer replacement variable as PRIVATE if the variable
 replacement is private, else FIRSTPRIVATE since we'll need the
@@ -8002,7 +8002,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
decl, bool in_code)
   && (flags & (GOVD_SEEN | GOVD_LOCAL)) == GOVD_SEEN
   && DECL_SIZE (decl))
 {
-  if (TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+  tree size;
+  if (!poly_int_tree_p (DECL_SIZE (decl)))
{
  splay_tree_node n2;
  tree t = DECL_VALUE_EXPR (decl);
@@ -8013,16 +8014,14 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
decl, bool in_code)
  n2->value |= GOVD_SEEN;
}

[Bug fortran/107680] ICE in arith_power, at fortran/arith.cc:989 and :1006

2022-11-18 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107680

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #7 from anlauf at gcc dot gnu.org ---
Fixed on mainline.

The open issue of the fate(?) of the typespec is tracked in pr107721.

Thanks for the report!

Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 18 Nov 2022 11:06:29 -0500
Jason Merrill  wrote:

> Ah, so the problem is deferred parsing of methods, rather than 
> templates.  Building the DECL_RESULT sooner does seem like the right 
> approach to handling that, whether that's in grokfndecl or grokmethod.

> >> I'd like to get the template case right while we're looking at it.  I
> >> guess I can add that myself if you're done trying.

Please do, I'd be glad if you could take care of these locations.
It irks me that they are wrong, if only for the sake of QOI :)

> >>> Is the hunk for normal functions OK for trunk?  
> >>
> >> You also need a testcase for the desired behavior, with e.g.
> >> { dg-error "23:" }  
> > 
> > I'd have to think about how to test that with trunk, yes.
> > There are no existing warnings that want to point to the return type,
> > are there?  
> 
> Good point.  Do any of your later patches add such a warning?

I didn't mean to have that -Wtype-demotion applied in its current
form, or at all, so no. I was curious if anybody liked the idea of
pointing out such code though. I've had no feedback but everybody is or
was busy with end of stage3 and real work, so that's expected. The only
real purpose I had for it was to find places in the Fortran FE that
could use narrower types, bools for the most part.
IMHO it would be a nice thing to have, but then, embedded software
usually is cautious to use sensible types in the first place and the
rest doesn't really care anyway, supposedly.

Maybe it would have made more sense to just do an IPA pass that does the
demotion silently where it's feasible.

As to the test, I don't think these locations in the C++ FE are changed
all that often, so chances are rather low that they would be broken
once in.
So, short of trying to use the result decl locus for any existing
-Wreturn-type, -Waggregate-return, -Wno-return-local-addr,
-Wsuggest-attribute=[pure|const|noreturn|format|malloc] or another
existing warning that would be concerned, we could, as said, have a
plugin with fix-it hints and ideally -fdiagnostics-generate-patch to
test these bits. Patch generation has the advantage that it will ICE
more often than not if asked to generate patches for locations that
have a negative relative start (think: memcpy(...,..., -7)), which you
can get easily if the locations are off IMHO.

> > Maybe a g++.dg/plugin/result_decl_plugin.c then.


[Bug fortran/107576] [10/11/12/13 Regression] ICE in gfc_conv_procedure_call, at fortran/trans-expr.cc:6193

2022-11-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107576

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:820c25c83561085f54268bd536f9d216d03c3e18

commit r13-4147-g820c25c83561085f54268bd536f9d216d03c3e18
Author: Harald Anlauf 
Date:   Thu Nov 17 21:36:49 2022 +0100

Fortran: reject NULL actual argument without explicit interface [PR107576]

gcc/fortran/ChangeLog:

PR fortran/107576
* interface.cc (gfc_procedure_use): Reject NULL as actual argument
when there is no explicit procedure interface.

gcc/testsuite/ChangeLog:

PR fortran/107576
* gfortran.dg/null_actual_3.f90: New test.

Re: [PATCH RFA] libstdc++: add experimental Contracts support

2022-11-18 Thread Jonathan Wakely via Gcc-patches

On 03/11/22 15:57 -0400, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.  OK for trunk?

-- >8 --

This patch adds the library support for the experimental C++ Contracts
implementation.  This now consists only of a default definition of the
violation handler, which users can override through defining their own
version.  To avoid ABI stability problems with libstdc++.so this is added to
a separate -lstdc++exp static library, which the driver knows to add when it
sees -fcontracts.
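
The override mechanism described here can be sketched with a weak definition. Everything named below (`violation_info`, `format_violation`, `default_violation_handler`) is a hypothetical stand-in chosen for this example; the real handler takes a std::experimental::contract_violation and needs -fcontracts, which this sketch avoids so that it builds on its own.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the violation object.
struct violation_info {
  std::string file_name;
  unsigned line_number;
  std::string comment;
};

// Formatting kept separate from the handler so it can be exercised on any
// stream.  Fields are separated by spaces and end with '\n' (no flush).
void format_violation (std::ostream &os, const violation_info &v)
{
  os << "contract violation: " << v.file_name << ' ' << v.line_number
     << ' ' << v.comment << '\n';
}

// Weak default: a program that provides its own non-weak definition of
// default_violation_handler in another translation unit replaces this one
// at link time (GCC/Clang attribute; ELF-style linking assumed).
__attribute__ ((weak)) void
default_violation_handler (const violation_info &v)
{
  format_violation (std::cerr, v);
}
```

Putting the default in a static library fits this scheme, since a definition supplied by the program is seen by the linker before the archive member is considered.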

libstdc++-v3/ChangeLog:

* acinclude.m4 (glibcxx_SUBDIRS): Add src/experimental.
* include/Makefile.am (experimental_headers): Add contract.
* include/Makefile.in: Regenerate.
* src/Makefile.am (SUBDIRS): Add experimental.
* src/Makefile.in: Regenerate.
* configure: Regenerate.
* src/experimental/contract.cc: New file.
* src/experimental/Makefile.am: New file.
* src/experimental/Makefile.in: New file.
* include/experimental/contract: New file.
---
libstdc++-v3/src/experimental/contract.cc  |  41 ++
libstdc++-v3/acinclude.m4  |   2 +-
libstdc++-v3/include/Makefile.am   |   1 +
libstdc++-v3/include/Makefile.in   |   1 +
libstdc++-v3/src/Makefile.am   |   3 +-
libstdc++-v3/src/Makefile.in   |   6 +-
libstdc++-v3/src/experimental/Makefile.am  |  96 +++
libstdc++-v3/src/experimental/Makefile.in  | 796 +
libstdc++-v3/include/experimental/contract |  84 +++
9 files changed, 1026 insertions(+), 4 deletions(-)
create mode 100644 libstdc++-v3/src/experimental/contract.cc
create mode 100644 libstdc++-v3/src/experimental/Makefile.am
create mode 100644 libstdc++-v3/src/experimental/Makefile.in
create mode 100644 libstdc++-v3/include/experimental/contract


base-commit: a4cd2389276a30c39034a83d640ce68fa407bac1
prerequisite-patch-id: 329bc16a88dc9a3b13cd3fcecb3678826cc592dc

diff --git a/libstdc++-v3/src/experimental/contract.cc 
b/libstdc++-v3/src/experimental/contract.cc
new file mode 100644
index 000..b9b72cd7df0
--- /dev/null
+++ b/libstdc++-v3/src/experimental/contract.cc
@@ -0,0 +1,41 @@
+// -*- C++ -*- std::experimental::contract_violation and friends
+// Copyright (C) 1994-2022 Free Software Foundation, Inc.


Copy from an old file? I don't think this uses anything
existing, should be just 2022.


+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+#include 
+#include 
+
+__attribute__ ((weak)) void
+handle_contract_violation (const std::experimental::contract_violation 
)
+{
+  std::cerr << "default std::handle_contract_violation called: " << std::endl


No need for flushing with endl here, just \n please.


+<< " " << violation.file_name()
+<< " " << violation.line_number()
+<< " " << violation.function_name()
+<< " " << violation.comment()
+<< " " << violation.assertion_level()
+<< " " << violation.assertion_role()
+<< " " << (int)violation.continuation_mode()
+<< std::endl;


And this will flush too, which typically isn't needed for stderr
because it's unbuffered. But somebody could have fiddled with cerr, so
doing this final flush seems OK.


+}
+
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 6f672924a73..baf01913a90 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
  # Keep these sync'd with the list in Makefile.am.  The first provides an
  # expandable list at autoconf time; the second provides an expandable list
  # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/filesystem src/libbacktrace doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/filesystem src/libbacktrace src/experimental doc po 
testsuite python])
  SUBDIRS='glibcxx_SUBDIRS'

  # These need to be absolute paths, yet at the same time need to
diff --git a/libstdc++-v3/include/Makefile.am 

[PATCH] constexprify some tree variables

2022-11-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Since we use C++11 by default now, we can
use constexpr for some const decls in tree-core.h.

This patch does that and it allows for better optimizations
of GCC code with checking enabled and without LTO.
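
A miniature of the pattern, for readers who have not seen it: the same list of entries is expanded several times with different macro definitions, and the resulting tables are constexpr so that lookups with a constant index fold at compile time. All names here (`TOY_CODES`, `toy_code_type`, and so on) are invented for this sketch; the real entries come from all-tree.def via #include and are not reproduced.

```cpp
#include <cassert>

// Hypothetical miniature of the DEFTREECODE pattern.
enum code_class { cls_exceptional, cls_constant, cls_binary };

#define TOY_CODES(DEF) \
  DEF (ERROR_CODE, cls_exceptional, 0) \
  DEF (INT_CONST, cls_constant, 0) \
  DEF (PLUS_CODE, cls_binary, 2)

#define AS_ENUM(SYM, CLASS, LENGTH) SYM,
enum toy_code { TOY_CODES (AS_ENUM) NUM_TOY_CODES };
#undef AS_ENUM

// Defined in the header as constexpr: every translation unit sees the
// values instead of an opaque extern array.
#define AS_CLASS(SYM, CLASS, LENGTH) CLASS,
constexpr code_class toy_code_type[] = { TOY_CODES (AS_CLASS) };
#undef AS_CLASS

#define AS_LENGTH(SYM, CLASS, LENGTH) LENGTH,
constexpr unsigned char toy_code_length[] = { TOY_CODES (AS_LENGTH) };
#undef AS_LENGTH

static_assert (toy_code_type[PLUS_CODE] == cls_binary, "folds at compile time");
static_assert (toy_code_length[PLUS_CODE] == 2, "folds at compile time");
static_assert (sizeof toy_code_type / sizeof toy_code_type[0] == NUM_TOY_CODES,
               "one entry per code");
```

With an extern declaration the static_asserts above would not compile, which is a compact way to see what the compiler can now know about each entry.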

For example, compiling generic-match.cc is sped up due
to the smaller number of basic blocks and less debugging info
produced. I did not check the speed of compiling the same source
but rather the speed of compiling the old vs new sources here
(but with the same compiler base).

The small slowdown in parsing the arrays in each TU
is mitigated by the reduction in how much code/debugging info
is produced in the end.

Note I looked at generic-match.cc since it is one of the
sources whose compilation causes parallel builds to stall and
I wanted to speed it up.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Or should this wait until GCC 13 branches off?

gcc/ChangeLog:

PR middle-end/14840
* tree-core.h (tree_code_type): Constexprify
by including all-tree.def.
(tree_code_length): Likewise.
* tree.cc (tree_code_type): Remove.
(tree_code_length): Remove.
---
 gcc/tree-core.h | 21 +++--
 gcc/tree.cc | 24 
 2 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af75522504f..e146b133dbd 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -2284,15 +2284,32 @@ struct floatn_type_info {
 /* Matrix describing the structures contained in a given tree code.  */
 extern bool tree_contains_struct[MAX_TREE_CODES][64];
 
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
+#define END_OF_BASE_TREE_CODES tcc_exceptional,
+
+
 /* Class of tree given its code.  */
-extern const enum tree_code_class tree_code_type[];
+constexpr enum tree_code_class tree_code_type[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
 
 /* Each tree code class has an associated string representation.
These must correspond to the tree_code_class entries.  */
 extern const char *const tree_code_class_strings[];
 
 /* Number of argument-words in each kind of tree-node.  */
-extern const unsigned char tree_code_length[];
+
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
+#define END_OF_BASE_TREE_CODES 0,
+constexpr unsigned char tree_code_length[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
 
 /* Vector of all alias pairs for global symbols.  */
 extern GTY(()) vec *alias_pairs;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 574bd2e65d9..254b2373dcf 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -74,31 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "ubsan.h"
 
-/* Tree code classes.  */
 
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
-#define END_OF_BASE_TREE_CODES tcc_exceptional,
-
-const enum tree_code_class tree_code_type[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
-
-/* Table indexed by tree code giving number of expression
-   operands beyond the fixed part of the node structure.
-   Not used for types or decls.  */
-
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
-#define END_OF_BASE_TREE_CODES 0,
-
-const unsigned char tree_code_length[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
 
 /* Names of tree components.
Used for printing out the tree and error messages.  */
-- 
2.17.1



Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Andrew Stubbs

On 18/11/2022 17:41, Tobias Burnus wrote:
Attached is the updated/rediffed version, which now uses the builtin
instead of the 'asm("s8")'.


The code in principle works; that is: If no private stack variables are 
copied, it works.


Or in other words: reverse-offload target regions that don't use 
firstprivate or mapping work, the rest would crash. That's avoided by 
not accepting reverse offload inside GOMP_OFFLOAD_get_num_devices for now.


To get it working, the manual stack allocation patch + the trivial 
update to that get_num_devices func is needed, but no change to the 
attached patch.


In order to reduce local patches, I would love to have it on mainline – 
otherwise, I have at least the current version in gcc-patches@.


OK with me.

Andrew


Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Andrew Stubbs

On 18/11/2022 17:20, Tobias Burnus wrote:

This patch adds two builtins (getting end-of-stack pointer and
a Boolean answer whether it was the first call to the builtin on this thread).

The idea is to replace some hard-coded values in newlib, permitting to move
later to a manually allocated stack on the compiler side without the need to
modify newlib again. The GCC patch matches what newlib did in reent; I could
imagine that we change this later on.

Lightly tested (especially by visual inspection).
Currently doing a final regtest, OK when it passes?

Any comments on this patch - or the attached newlib patch?*

Tobias

(*) I also included a patch to newlib to see where we are heading
+ to actually use them for regtesting ...


This looks wrong:


+   /* stackbase = (stack_segment_decr & 0x)
+   + stack_wave_offset);
+  seg_size = dispatch_ptr->private_segment_size;
+  stacklimit = stackbase + seg_size*64;
+  with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+  cf. struct hsa_kernel_dispatch_packet_s in the HSA doc.  */
+   rtx ptr;
+   if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+   && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+ {
+   rtx size_rtx = gen_rtx_REG (DImode,
+   
cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+   size_rtx = gen_rtx_MEM (DImode,
+   gen_rtx_PLUS (DImode, size_rtx,
+ GEN_INT (6*16 + 3*32)));
+   size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+


seg_size is calculated from the private_segment_size loaded from the 
dispatch_ptr, not calculated from the dispatch_ptr itself.


Andrew


Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Tobias Burnus

Attached is the updated/rediffed version, which now uses the builtin
instead of the 'asm("s8")'.

The code in principle works; that is: If no private stack variables are
copied, it works.

Or in other words: reverse-offload target regions that don't use
firstprivate or mapping work, the rest would crash. That's avoided by
not accepting reverse offload inside GOMP_OFFLOAD_get_num_devices for now.

To get it working, the manual stack allocation patch + the trivial
update to that get_num_devices func is needed, but no change to the
attached patch.

In order to reduce local patches, I would love to have it on mainline –
otherwise, I have at least the current version in gcc-patches@.

Tobias

PS: Previous patch email quoted below. Note: there were two follow-up
emails, one by Andrew and one by me; cf. your own mail archive (of this
thread) or
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603383.html + the
next two messages by thread.

On 12.10.22 16:29, Tobias Burnus wrote:

On 29.09.22 18:24, Andrew Stubbs wrote:

On 27/09/2022 14:16, Tobias Burnus wrote:

Andrew did suggest a while back to piggyback on the console_output
handling, avoiding another atomic access. If this is still wanted, I would
like to have some guidance regarding how to actually implement it.

[...]
The point is that you can use the "msg" and "text" fields for
whatever data you want, as long as you invent a new value for "type".
[]
You can make "case 4" do whatever you want. There are enough bytes
for 4 pointers, and you could use multiple packets (although it's not
safe to assume they're contiguous or already arrived; maybe "case 4"
for part 1, "case 5" for part 2). It's possible to change this
structure, of course, but the target implementation is in newlib so
versioning becomes a problem.


I think, also looking at the Newlib write.c implementation, that
the data is contiguous: there is an atomic add, where instead of
passing '1' for a single slot, I could also add '2' for two slots.

Attached is one variant – for the decl of the GOMP_OFFLOAD_target_rev,
it needs the generic parts of the sister nvptx patch.*

2*128 bytes were not enough, I need 3*128 bytes. (Or rather 5*64 +
32.) As target_ext is blocking, I decided to use a stack local
variable for the remaining arguments and pass it along. Alternatively,
I could also use 2 slots - and process them together. This would avoid
one device->host memory copy but would make console_output less clear.

OK for mainline?

Tobias

* https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html

PS: Currently, device stack variables are private and cannot be
accessed from the host; this will change in a separate patch. It not
only affects the "rest" part as used in this patch but also the actual
arrays behind addr, kinds, and sizes. And quite likely a lot of the
map/firstprivate variables passed to addr.

As num_devices() will return 0 or -1, this is for now a non-issue.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp/gcn: Prepare for reverse-offload callback handling

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h: New file; contains
	struct output, declared previously in plugin-gcn.c.
	* config/gcn/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare as extern var.
	(GOMP_target_ext): Handle reverse offload.
	* plugin/plugin-gcn.c: Include libgomp-gcn.h.
	(struct kernargs): Replace struct def by the one
	from libgomp-gcn.h for output_data.
	(process_reverse_offload): New.
	(console_output): Call it.

 libgomp/config/gcn/libgomp-gcn.h | 61 
 libgomp/config/gcn/target.c  | 44 -
 libgomp/plugin/plugin-gcn.c  | 34 --
 3 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
new file mode 100644
index 000..91560be787f
--- /dev/null
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -0,0 +1,61 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Tobias Burnus .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, 

[Bug testsuite/107689] [13 regression] r13-3979-g9d29dd2fcf2922 causes failures in diagnostic-format-json-2.c and others

2022-11-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107689

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Jakub Jelinek  ---
r13-4057-gfe26b040ce8e74700d22f9abf4306e4a93e2b99e
did that.

[Bug sanitizer/107752] Lack of column information in AddressSanitizer reports

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107752

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||diagnostic

[Bug sanitizer/107752] Lack of column information in AddressSanitizer reports

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107752

Andrew Pinski  changed:

   What|Removed |Added

Summary|Lack of offset information  |Lack of column information
   |in AddressSanitizer reports |in AddressSanitizer reports

--- Comment #2 from Andrew Pinski  ---
libbacktrace does not pass the column information:
static int SymbolizeCodePCInfoCallback(void *vdata, uintptr_t addr,
   const char *filename, int lineno,
   const char *function) {

[Bug sanitizer/107752] Lack of offset information in AddressSanitizer reports

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107752

--- Comment #1 from Andrew Pinski  ---
Do you mean the column information rather than offset?

[Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Tobias Burnus

This patch adds two builtins: one returns the end-of-stack pointer, the other
a Boolean telling whether this is the first call to the builtin on this thread.

The idea is to replace some hard-coded values in newlib, so that the compiler
side can later move to a manually allocated stack without the need to
modify newlib again. The GCC patch matches what newlib did in reent; I could
imagine that we change this later on.
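
As a rough illustration of the stack-limit arithmetic described in the code
comment of the GCC patch (stacklimit = stackbase + seg_size*64, with seg_size
read from the dispatch packet), here is a minimal host-side sketch in C. The
struct dispatch_packet_prefix and both helper functions are hypothetical
names; the struct only mirrors what is assumed to be the leading fields of
hsa_kernel_dispatch_packet_s, and the stack base is taken as a plain
parameter rather than derived from the segment registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the assumed leading fields of the HSA kernel
   dispatch packet: six 16-bit fields, three 32-bit fields, then the
   private segment size.  */
struct dispatch_packet_prefix
{
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x;
  uint16_t workgroup_size_y;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x;
  uint32_t grid_size_y;
  uint32_t grid_size_z;
  uint32_t private_segment_size;
};

/* Byte offset of private_segment_size within the packet.  */
static size_t
private_segment_size_offset (void)
{
  return offsetof (struct dispatch_packet_prefix, private_segment_size);
}

/* stacklimit = stackbase + seg_size*64 (illustrative helper only).  */
static uint64_t
stack_limit (uint64_t stack_base, const struct dispatch_packet_prefix *pkt)
{
  return stack_base + (uint64_t) pkt->private_segment_size * 64;
}
```

Under this assumed layout the offset is a byte count, that is
6*sizeof(int16_t) + 3*sizeof(int32_t) = 24.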

Lightly tested (especially by visual inspection).
Currently doing a final regtest, OK when it passes?

Any comments on this patch, or on the attached newlib patch?*

Tobias

(*) I also included a patch to newlib to show where we are heading
and to actually use the builtins for regtesting ...
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce its dependency on
compiler-internal implementation choices of GCC in newlib's getreent.c.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P,
GET_STACK_LIMIT): Add new builtins.
	* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
	* config/gcn/gcn.md (prologue_use): Add "register_operand" as
	arg to match_operand.
	(prologue_use_di): New; DI insn_and_split variant of the former.

Co-Authored-By: Andrew Stubbs 

 gcc/config/gcn/gcn-builtins.def |  4 +++
 gcc/config/gcn/gcn.cc   | 70 -
 gcc/config/gcn/gcn.md   | 15 -
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index eeeaebf9013..f1cf30bbc94 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -160,8 +160,12 @@ DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 
 /* Kernel inputs.  */
 
+DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
+	 _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
 	 gcn_expand_builtin_1)
+DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
+	 _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
 
 #undef _A1
 #undef _A2
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b3814c2e7c6..051eadee783 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4493,6 +4493,44 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
   emit_insn (gen_gcn_wavefront_barrier ());
   return target;
 
+case GCN_BUILTIN_GET_STACK_LIMIT:
+  {
+	/* stackbase = (stack_segment_decr & 0x)
+			+ stack_wave_offset);
+	   seg_size = dispatch_ptr->private_segment_size;
+	   stacklimit = stackbase + seg_size*64;
+	   with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+	   cf. struct hsa_kernel_dispatch_packet_s in the HSA doc.  */
+	rtx ptr;
+	if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+	&& cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx size_rtx = gen_rtx_REG (DImode,
+	cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+	size_rtx = gen_rtx_MEM (DImode,
+gen_rtx_PLUS (DImode, size_rtx,
+		  GEN_INT (6*16 + 3*32)));
+	size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+
+	ptr = gen_rtx_REG (DImode,
+		cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	ptr = gen_rtx_AND (DImode, ptr, GEN_INT (0x));
+	ptr = gen_rtx_PLUS (DImode, ptr, size_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG] >= 0)
+	  {
+		rtx off;
+		off = gen_rtx_REG (SImode,
+		  cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]);
+		ptr = gen_rtx_PLUS (DImode, ptr, off);
+	  }
+	  }
+	else
+	  {
+	ptr = gen_reg_rtx (DImode);
+	emit_move_insn (ptr, const0_rtx);
+	  }
+	return ptr;
+  }
 case GCN_BUILTIN_KERNARG_PTR:
   {
 	rtx ptr;
@@ -4506,7 +4544,37 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	  }
 	return ptr;
   }
-
+case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
+  {
+	/* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
+	   whether it was the first call.  */
+	rtx result = gen_reg_rtx (BImode);
+	emit_move_insn (result, const0_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx not_first = gen_label_rtx ();
+	rtx reg = gen_rtx_REG (DImode,
+			cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	rtx cmp = force_reg (DImode,
+ gen_rtx_AND (DImode, reg,
+	  GEN_INT (0xL)));
+	emit_insn (gen_cstoresi4 (result, gen_rtx_EQ (BImode, cmp,
+			  GEN_INT(12345L << 48)),
+  cmp, GEN_INT(12345L << 48)));
+	

[Bug c/106765] [12/13 Regression] ICE (invalid code) in tree check: expected class 'type', have 'exceptional' (error_mark) in create_tmp_from_val, at gimplify.cc since r12-7222-g3f10e0d50b5e3b3f

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106765

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|12.3|13.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Andrew Pinski  ---
Fixed for GCC 13. Since this is an ICE after error, no reason to backport it.

[Bug middle-end/107307] [12/13 Regression] ICE tree check: expected class 'type', have 'exceptional' (error_mark) in canonicalize_component_ref, at gimplify.cc:2923 since r12-3278-g823685221de986af

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107307

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|12.3|13.0

--- Comment #5 from Andrew Pinski  ---
Fixed for GCC 13. Since this is an ICE after error, no reason to backport it.

[Bug c/106764] [12/13 Regression] ICE on invalid code in tree check: expected function_type or method_type, have error_mark in gimplify_call_expr, at gimplify.cc

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106764

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|12.3|13.0

--- Comment #6 from Andrew Pinski  ---
Fixed for GCC 13. Since this is an ICE after error, no reason to backport it.

[Bug c/107705] [12/13 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in ix86_function_type_abi, at config/i386/i386.cc:1529

2022-11-18 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107705

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|12.3|13.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Andrew Pinski  ---
Fixed for GCC 13.

[Bug c/106764] [12/13 Regression] ICE on invalid code in tree check: expected function_type or method_type, have error_mark in gimplify_call_expr, at gimplify.cc

2022-11-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106764

--- Comment #5 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:bd0c9d9e706adaeea0d96152daade0a6819a8715

commit r13-4143-gbd0c9d9e706adaeea0d96152daade0a6819a8715
Author: Andrew Pinski 
Date:   Thu Nov 17 22:08:07 2022 +

Fix PRs 106764, 106765, and 107307, all ICE after invalid re-declaration

The problem here is that the gimplifier returns GS_ERROR, but
in some cases we don't check that soon enough and try
to do other work which could crash.
So the fix in these two cases is to return GS_ERROR
early if the gimplify_* functions had returned GS_ERROR.
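
The early-return pattern described above can be sketched in plain C. This is
only a minimal illustration under stated assumptions, not the actual
gimplify.cc change: the enum lower_status and the functions lower_operand and
lower_call are hypothetical stand-ins for the GS_* codes and the gimplify_*
functions.

```c
#include <assert.h>

/* Hypothetical status codes, loosely modelled on the GS_* values.  */
enum lower_status { LOWER_ERROR = -1, LOWER_OK = 0 };

/* Stand-in for a gimplify_* helper: fails on an invalid operand.  */
static enum lower_status
lower_operand (int operand_is_valid)
{
  return operand_is_valid ? LOWER_OK : LOWER_ERROR;
}

/* Stand-in for the caller: *followup_ran counts the later work that
   must not run on an operand that failed to lower.  */
static enum lower_status
lower_call (int operand_is_valid, int *followup_ran)
{
  enum lower_status ret = lower_operand (operand_is_valid);
  if (ret == LOWER_ERROR)
    return LOWER_ERROR;   /* Bail out before touching the operand again.  */

  ++*followup_ran;        /* Work that assumes a well-formed operand.  */
  return LOWER_OK;
}
```

Checking the status immediately after the helper returns keeps the follow-up
work from ever seeing an operand that is already known to be erroneous.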

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gimplify.cc (gimplify_compound_lval): Return GS_ERROR
if gimplify_expr had return GS_ERROR.
(gimplify_call_expr): Likewise.

gcc/testsuite/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gcc.dg/redecl-19.c: New test.
* gcc.dg/redecl-20.c: New test.
* gcc.dg/redecl-21.c: New test.

[Bug middle-end/107307] [12/13 Regression] ICE tree check: expected class 'type', have 'exceptional' (error_mark) in canonicalize_component_ref, at gimplify.cc:2923 since r12-3278-g823685221de986af

2022-11-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107307

--- Comment #4 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:bd0c9d9e706adaeea0d96152daade0a6819a8715

commit r13-4143-gbd0c9d9e706adaeea0d96152daade0a6819a8715
Author: Andrew Pinski 
Date:   Thu Nov 17 22:08:07 2022 +

Fix PRs 106764, 106765, and 107307, all ICE after invalid re-declaration

The problem here is that the gimplifier returns GS_ERROR, but
in some cases we don't check that soon enough and try
to do other work which could crash.
So the fix in these two cases is to return GS_ERROR
early if the gimplify_* functions had returned GS_ERROR.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gimplify.cc (gimplify_compound_lval): Return GS_ERROR
if gimplify_expr had return GS_ERROR.
(gimplify_call_expr): Likewise.

gcc/testsuite/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gcc.dg/redecl-19.c: New test.
* gcc.dg/redecl-20.c: New test.
* gcc.dg/redecl-21.c: New test.

[Bug c/107705] [12/13 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in ix86_function_type_abi, at config/i386/i386.cc:1529

2022-11-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107705

--- Comment #3 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:ceba66ee230bb96b0889fc8ec7333c7ffae96d6e

commit r13-4144-gceba66ee230bb96b0889fc8ec7333c7ffae96d6e
Author: Andrew Pinski 
Date:   Thu Nov 17 22:03:08 2022 +

Fix PR middle-end/107705: ICE after redeclaration error

The problem here is that after we create a call expression
in the C front-end, we replace the decl type with
an error mark node. We then end up calling
aggregate_value_p on the call expression
whose decl has the error mark as its type,
and we ICE.

The fix is to check the function type inside
aggregate_value_p after we process the call
expression to get it.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/ChangeLog:

PR middle-end/107705
* function.cc (aggregate_value_p): Return 0 if
the function type was an error operand.

gcc/testsuite/ChangeLog:

* gcc.dg/redecl-22.c: New test.

[Bug c/106765] [12/13 Regression] ICE (invalid code) in tree check: expected class 'type', have 'exceptional' (error_mark) in create_tmp_from_val, at gimplify.cc since r12-7222-g3f10e0d50b5e3b3f

2022-11-18 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106765

--- Comment #2 from CVS Commits  ---
The trunk branch has been updated by Andrew Pinski :

https://gcc.gnu.org/g:bd0c9d9e706adaeea0d96152daade0a6819a8715

commit r13-4143-gbd0c9d9e706adaeea0d96152daade0a6819a8715
Author: Andrew Pinski 
Date:   Thu Nov 17 22:08:07 2022 +

Fix PRs 106764, 106765, and 107307, all ICE after invalid re-declaration

The problem here is that the gimplifier returns GS_ERROR, but
in some cases we don't check that soon enough and try
to do other work which could crash.
So the fix in these two cases is to return GS_ERROR
early if the gimplify_* functions had returned GS_ERROR.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

gcc/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gimplify.cc (gimplify_compound_lval): Return GS_ERROR
if gimplify_expr had return GS_ERROR.
(gimplify_call_expr): Likewise.

gcc/testsuite/ChangeLog:

PR c/106764
PR c/106765
PR c/107307
* gcc.dg/redecl-19.c: New test.
* gcc.dg/redecl-20.c: New test.
* gcc.dg/redecl-21.c: New test.

Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-11-18 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 22:59:08 PST (-0800), Kito Cheng wrote:

Wait, what's Xgnuzihintpausestate???


I just made it up, it's defined right next to the name like those 
profile extensions are.  I figured that's the most RISC-V way to define 
something like this, but we could just drop it and run with the 
definition -- IIRC we just stuck a comment in for Linux and QEMU, I 
doubt anyone is actually going to implement the "doesn't touch PC" 
version of pause.



On Fri, Nov 18, 2022 at 12:30 PM Palmer Dabbelt  wrote:


gcc/ChangeLog:

* doc/extend.texi (__builtin_riscv_pause): Imply
Xgnuzihintpausestate.
---
 gcc/doc/extend.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1dd39e64b8..26f14e61bc8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21103,7 +21103,9 @@ Returns the value that is currently set in the @samp{tp} register.
 @end deftypefn

 @deftypefn {Built-in Function}  void __builtin_riscv_pause (void)
-Generates the @code{pause} (hint) machine instruction.
+Generates the @code{pause} (hint) machine instruction.  This implies the
+Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
+change architectural state.
 @end deftypefn

 @node RX Built-in Functions
--
2.38.1



RE: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
> overloading [PR107515]
> 
> From: Stam Markianos-Wright 
> 
> This patch adds explicit references to other float types
> to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
> 
> gcc/ChangeLog:
> PR 107515
> * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but 
for now ok.
The ChangeLog should say "PR target/107515" for the git hook to recognize it 
IIRC.
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index fd1876b57a0..f6b42dc3fab 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35582,6 +35582,9 @@ enum {
>   short: __ARM_mve_type_int_n, \
>   int: __ARM_mve_type_int_n, \
>   long: __ARM_mve_type_int_n, \
> + _Float16: __ARM_mve_type_fp_n, \
> + __fp16: __ARM_mve_type_fp_n, \
> + float: __ARM_mve_type_fp_n, \
>   double: __ARM_mve_type_fp_n, \
>   long long: __ARM_mve_type_int_n, \
>   unsigned char: __ARM_mve_type_int_n, \
> --
> 2.25.1



Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 11:34, Jakub Jelinek wrote:

On Fri, Nov 18, 2022 at 11:24:45AM -0500, Jason Merrill wrote:

Right, that's the C++17 implicit constexpr for lambdas, finish_function:

/* Lambda closure members are implicitly constexpr if possible.  */
if (cxx_dialect >= cxx17
&& LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl)))
  DECL_DECLARED_CONSTEXPR_P (fndecl)
= ((processing_template_decl
|| is_valid_constexpr_fn (fndecl, /*complain*/false))
   && potential_constant_expression (DECL_SAVED_TREE (fndecl)));


Yeah, I guess potential_constant_expression needs to be stricter in a
lambda. Or perhaps any function that isn't already
DECL_DECLARED_CONSTEXPR_P?


potential_constant_expression can't be relied on to catch
everything; even for a simple if statement whose condition is not yet
known to be 0 or non-0, it only requires that at least
one of the substatements is a potential constant expression, etc.
Similarly for switch statements etc.
If there is a way to distinguish between functions with user
specified constexpr/consteval and those with DECL_DECLARED_CONSTEXPR_P set
through the above if condition, then sure, cp_finish_decl ->
check_static_in_constexpr could perhaps be silent about the latter, but then
we want to diagnose it during constexpr evaluation at least.  But in that
case I don't see how to make it a pedwarn rather than
"this is a constant expression" vs. "this is not a constant expression,
if !ctx->quiet emit an error".  Because something needs
to be returned: either it is a constant expression or it is not.


True.  Let's go with your option 2, then, thanks.

Jason



RE: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n
> intrinsic
> 
> From: Stam Markianos-Wright 
> 
> It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
> resolution would fall back to the `__ARM_undef` failure state.
> 
> This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79`
> and
> `6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
> identified, because the tests were not checking for this kind of failure.
> 
> The above commits changed the definitions of the intrinsics from using
> `[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
> allowed `int` to be supported in user code through the overloaded
> `#defines`, but seems to have broken the `[u]int[8/16/32]_t` types.
> 
> The solution implemented by this patch is to explicitly use a new
> _Generic mapping from all the `[u]int[8/16/32]_t` types for int. With this
> change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
> user code and are handled by the overloading mechanism correctly.
> 
> gcc/ChangeLog:
> 
> * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
> (__arm_vaddq_m_n_s32): Likewise.
> (__arm_vaddq_m_n_s16): Likewise.
> (__arm_vaddq_m_n_u8): Likewise.
> (__arm_vaddq_m_n_u32): Likewise.
> (__arm_vaddq_m_n_u16): Likewise.
> (__arm_vaddq_m): Fix Overloading.
> (__ARM_mve_coerce3): New.

Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit 
message?
Thanks,
Kyrill
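
The _Generic fallback described in the quoted commit message can be
reproduced with a small standalone C11 sketch. It rests on the fact that the
controlling expression of _Generic is not integer-promoted, so an int8_t
argument does not match an `int` association. All names below (the TYPEID_*
constants, both macros and the three wrapper functions) are hypothetical and
are not taken from arm_mve.h; TYPEID_UNDEF merely plays the role of the
`__ARM_undef` failure state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type ids; TYPEID_UNDEF plays the role of the failure state.  */
enum { TYPEID_UNDEF = 0, TYPEID_INT = 1, TYPEID_INT8 = 2, TYPEID_UINT8 = 3 };

/* Only `int` is listed, so narrower scalar types hit the default.  */
#define TYPEID_INT_ONLY(x) \
  _Generic ((x), int: TYPEID_INT, default: TYPEID_UNDEF)

/* Listing the fixed-width types explicitly lets both kinds resolve.  */
#define TYPEID_EXPLICIT(x) \
  _Generic ((x), int8_t: TYPEID_INT8, uint8_t: TYPEID_UINT8, \
            int: TYPEID_INT, default: TYPEID_UNDEF)

/* Thin wrappers so that the argument has a known declared type.  */
static int typeid_int_only_for_int8 (int8_t v) { return TYPEID_INT_ONLY (v); }
static int typeid_explicit_for_int8 (int8_t v) { return TYPEID_EXPLICIT (v); }
static int typeid_explicit_for_int (int v) { return TYPEID_EXPLICIT (v); }
```

With only `int` listed, an int8_t argument selects the default branch; with
the fixed-width types listed as well, both int8_t and int arguments select a
real association.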

> ---
>  gcc/config/arm/arm_mve.h | 78 
>  1 file changed, 40 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 684f997520f..951dc25374b 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pr
> 
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t
> __b, mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t
> __b, mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
>  }
> @@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pred16
> 
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline 

RE: [PATCH 10/35] arm: improve tests for vabavq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 10/35] arm: improve tests for vabavq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vabavq_p_s16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_s32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_s8.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_u16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_u32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_u8.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_s16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_s32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_s8.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_u16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_u32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_u8.c:

Missing ChangeLog text?
Ok with ChangeLog fixed.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vabavq_p_s16.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_s32.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_s8.c  | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_u16.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_u32.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_u8.c  | 40 ++-
>  .../arm/mve/intrinsics/vabavq_s16.c   | 28 -
>  .../arm/mve/intrinsics/vabavq_s32.c   | 28 -
>  .../gcc.target/arm/mve/intrinsics/vabavq_s8.c | 28 -
>  .../arm/mve/intrinsics/vabavq_u16.c   | 28 -
>  .../arm/mve/intrinsics/vabavq_u32.c   | 28 -
>  .../gcc.target/arm/mve/intrinsics/vabavq_u8.c | 28 -
>  12 files changed, 384 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> index 78ac801fa3c..843d022c418 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s16  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
>  {
>return vabavq_p_s16 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s16  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
>  {
>return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s16  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
> +uint32_t
> +foo2 (int16x8_t b, int16x8_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> index af4e30b6127..6ed9b9ac1c4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s32  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>return vabavq_p_s32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s32  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>return vabavq_p (a, b, c, p);
>  }
> 
> -/* { 

RE: [PATCH 12/35] arm: improve tests and fix vabsq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 12/35] arm: improve tests and fix vabsq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vabsq_f): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vabsq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  2 +-
>  .../gcc.target/arm/mve/intrinsics/vabsq_f16.c | 22 +++-
>  .../gcc.target/arm/mve/intrinsics/vabsq_f32.c | 22 +++-
>  .../arm/mve/intrinsics/vabsq_m_f16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_f32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_s16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_s32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_s8.c   | 25 ---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s16.c | 20 ---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s32.c | 20 ---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s8.c  | 16 ++--
>  .../arm/mve/intrinsics/vabsq_x_f16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_f32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_s16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_s32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_s8.c   | 25 ---
>  16 files changed, 309 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 3330a220aea..bc4e2f2ac21 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -279,7 +279,7 @@ (define_insn "mve_vabsq_f"
>   (abs:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")))
>]
>"TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vabs.f%#  %q0, %q1"
> +  "vabs.f%#\t%q0, %q1"
>[(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> index 08e141baedc..f29ada8c058 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> @@ -1,13 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vabs.f16q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo (float16x8_t a)
>  {
>return vabsq_f16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.f16"  }  } */
> +
> +/*
> +**foo1:
> +**   ...
> +**   vabs.f16q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo1 (float16x8_t a)
> +{
> +  return vabsq (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> index 3614a44fbdc..cc24744fb26 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> @@ -1,13 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vabs.f32q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  float32x4_t
>  foo (float32x4_t a)
>  {
>return vabsq_f32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.f32"  }  } */
> +
> +/*
> +**foo1:
> +**   ...
> +**   vabs.f32q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo1 (float32x4_t a)
> +{
> +  return vabsq 

RE: [PATCH 11/35] arm: improve tests for vabdq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 11/35] arm: improve tests for vabdq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vabdq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vabdq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../gcc.target/arm/mve/intrinsics/vabdq_f16.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_f32.c | 16 ++--
>  .../arm/mve/intrinsics/vabdq_m_f16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_f32.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_s32.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_s8.c   | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_u16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_u32.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_u8.c   | 26 ---
>  .../gcc.target/arm/mve/intrinsics/vabdq_s16.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_s32.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_s8.c  | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u16.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u32.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u8.c  | 16 ++--
>  .../arm/mve/intrinsics/vabdq_x_f16.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_f32.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_x_s32.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_s8.c   | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_u16.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_u32.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_u8.c   | 25 +++---
>  24 files changed, 464 insertions(+), 73 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> index b55e826e4b6..f379b25c49e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vabd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vabdq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vabd.f16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> index f1a95b14e03..3ba808e0b4d 100644
> --- 
