Re: nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel' (was: [MentorEmbedded/nvptx-tools] Match standard 'ld' "search" behavior (PR #38))

2022-11-18 Thread Tom de Vries via Gcc-patches

On 11/19/22 00:25, Thomas Schwinge wrote:

Hi!

Re:

On 2022-11-18T11:05:23-0800, I wrote:

Actually, in GCC/nvptx target testing, this #38's commit 
886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820 "ld: Don't search for input files in 
'-L'directories" is generally causing linking to fail with:

```
error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1
```

I'm investigating.


OK to push the attached
GCC "nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'" to all
active GCC branches?  (... instead of having to restore this "blunder"
(do "search for input files in '-L'directories") in nvptx-tools...)



Hi,

yes, LGTM.

Thanks,
- Tom



Grüße
  Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] i386: Only enable small loop unrolling in backend [PR 107602]

2022-11-18 Thread Hongyu Wang via Gcc-patches
Hi,

Following the discussion in PR107602, -munroll-only-small-loops does not
turn -funroll-loops on or off, and the current check in
pass_rtl_unroll_loops::gate would prevent -funroll-loops from taking
effect.  Revert the change to targetm.loop_unroll_adjust and apply the
backend option change so as to strictly follow the rule that
-funroll-loops takes full control of loop unrolling, while
-munroll-only-small-loops only changes its behavior to unroll small
loops.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

PR target/107602
* common/config/i386/i386-common.cc (ix86_option_optimization_table):
Enable loop unrolling at -O2, disable -fweb and -frename-registers
by default.
* config/i386/i386-options.cc
(ix86_override_options_after_change):
Disable small loop unrolling when -funroll-loops is explicitly
set, reset cunroll_grow_size when it is not explicitly enabled.
(ix86_option_override_internal): Call
ix86_override_options_after_change instead of calling
ix86_recompute_optlev_based_flags and ix86_default_align
separately.
* config/i386/i386.cc (ix86_loop_unroll_adjust): Adjust unroll
factor if -munroll-only-small-loops enabled.
* loop-init.cc (pass_rtl_unroll_loops::gate): Do not enable
loop unrolling for -O2-speed.
(pass_rtl_unroll_loops::execute): Remove
targetm.loop_unroll_adjust check.

gcc/testsuite/ChangeLog:

PR target/107602
* gcc.target/i386/pr86270.c: Add -fno-unroll-loops.
* gcc.target/i386/pr93002.c: Likewise.
---
 gcc/common/config/i386/i386-common.cc   |  8 ++
 gcc/config/i386/i386-options.cc         | 34 ++---
 gcc/config/i386/i386.cc                 | 18 -
 gcc/loop-init.cc                        | 11 +++-
 gcc/testsuite/gcc.target/i386/pr86270.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr93002.c |  2 +-
 6 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 6ce2a588adc..660a977b68b 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1808,7 +1808,15 @@ static const struct default_options ix86_option_optimization_table[] =
 /* The STC algorithm produces the smallest code at -Os, for x86.  */
 { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
   REORDER_BLOCKS_ALGORITHM_STC },
+
+/* Turn on -funroll-loops with -munroll-only-small-loops to enable small
+   loop unrolling at -O2.  */
+{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
 { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+/* Turn off -frename-registers and -fweb, which are enabled by
+   -funroll-loops.  */
+{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
+{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
 /* Turn off -fschedule-insns by default.  It tends to make the
problem with not enough registers even worse.  */
 { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index e5c77f3a84d..bc1d36e36a8 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1838,8 +1838,37 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts,
 void
 ix86_override_options_after_change (void)
 {
+  /* Default align_* from the processor table.  */
   ix86_default_align (_options);
+
   ix86_recompute_optlev_based_flags (_options, _options_set);
+
+  /* Disable unrolling small loops when there's explicit
+     -f{,no}unroll-loop.  */
+  if ((OPTION_SET_P (flag_unroll_loops))
+      || (OPTION_SET_P (flag_unroll_all_loops)
+          && flag_unroll_all_loops))
+    {
+      if (!OPTION_SET_P (ix86_unroll_only_small_loops))
+        ix86_unroll_only_small_loops = 0;
+      /* Re-enable -frename-registers and -fweb if funroll-loops
+         enabled.  */
+      if (!OPTION_SET_P (flag_web))
+        flag_web = flag_unroll_loops;
+      if (!OPTION_SET_P (flag_rename_registers))
+        flag_rename_registers = flag_unroll_loops;
+      /* -fcunroll-grow-size default follows -f[no]-unroll-loops.  */
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+        flag_cunroll_grow_size = flag_unroll_loops
+                                 || flag_peel_loops
+                                 || optimize >= 3;
+    }
+  else
+    {
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+        flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
+    }
+
 }
 
 /* Clear stack slot assignments remembered from previous functions.
@@ -2351,7 +2380,7 @@ ix86_option_override_internal (bool main_args_p,
 
   set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
 
-  ix86_recompute_optlev_based_flags (opts, opts_set);
+  ix86_override_options_after_change ();
 
   ix86_tune_cost = processor_cost_table[ix86_tune];
   

Re: [PATCH] c++: cache the normal form of a concept-id

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 16:43, Patrick Palka wrote:

We already cache the overall normal form of a declaration's constraints
under the assumption that it can't change over the translation unit.
But if we have two constrained declarations such as

   template void f() requires expensive && A;
   template void g() requires expensive && B;

then despite this high-level caching we'd still redundantly have to
expand the concept-id expensive twice, once during normalization of
f's constraints and again during normalization of g's.  Ideally, we'd
reuse the previously computed normal form of expensive the second
time around.

To that end this patch introduces an intermediate layer of caching
during constraint normalization -- caching of the normal form of a
concept-id -- that sits between our high-level caching of the overall
normal form of a declaration's constraints and our low-level caching of
each individual atomic constraint.

It turns out this caching generalizes some ad-hoc caching of the normal
form of a concept definition (which is equivalent to the normal form of
the concept-id C where gtargs are C's generic arguments) so
this patch unifies the caching accordingly.

This change improves compile time/memory usage for e.g. the libstdc++
test std/ranges/adaptors/join.cc by 10%/5%.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


Hmm, if we cache at this level, do we still also need to cache the full 
normal form of the decl's constraints?


Exploring that doesn't seem like stage 3 material, though.  The patch is OK.


gcc/cp/ChangeLog:

* constraint.cc (struct norm_entry): Define.
(struct norm_hasher): Define.
(norm_cache): Define.
(normalize_concept_check): Add function comment.  Cache the
result of concept-id normalization.  Canonicalize generic
arguments as NULL_TREE.  Don't coerce arguments unless
substitution occurred.
(normalize_concept_definition): Simplify.  Use norm_cache
instead of ad-hoc caching.
---
  gcc/cp/constraint.cc | 94 ++--
  1 file changed, 82 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a113d3e269e..c9740b1ec78 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -698,6 +698,40 @@ normalize_logical_operation (tree t, tree args, tree_code c, norm_info info)
return build2 (c, ci, t0, t1);
  }
  
+/* Data types and hash functions for caching the normal form of a concept-id.
+   This essentially memoizes calls to normalize_concept_check.  */
+
+struct GTY((for_user)) norm_entry
+{
+  /* The CONCEPT_DECL of the concept-id.  */
+  tree tmpl;
+  /* The arguments of the concept-id.  */
+  tree args;
+  /* The normal form of the concept-id.  */
+  tree norm;
+};
+
+struct norm_hasher : ggc_ptr_hash
+{
+  static hashval_t hash (norm_entry *t)
+  {
+    hashval_t hash = iterative_hash_template_arg (t->tmpl, 0);
+    hash = iterative_hash_template_arg (t->args, hash);
+    return hash;
+  }
+
+  static bool equal (norm_entry *t1, norm_entry *t2)
+  {
+    return t1->tmpl == t2->tmpl
+           && template_args_equal (t1->args, t2->args);
+  }
+};
+
+static GTY((deletable)) hash_table *norm_cache;
+
+/* Normalize the concept check CHECK where ARGS are the
+   arguments to be substituted into CHECK's arguments.  */
+
  static tree
  normalize_concept_check (tree check, tree args, norm_info info)
  {
@@ -720,24 +754,53 @@ normalize_concept_check (tree check, tree args, norm_info info)
   targs = tsubst_template_args (targs, args, info.complain, info.in_decl);
   if (targs == error_mark_node)
     return error_mark_node;
+  if (template_args_equal (targs, generic_targs_for (tmpl)))
+    /* Canonicalize generic arguments as NULL_TREE, as an optimization.  */
+    targs = NULL_TREE;
 
   /* Build the substitution for the concept definition.  */
   tree parms = TREE_VALUE (DECL_TEMPLATE_PARMS (tmpl));
-  /* Turn on template processing; coercing non-type template arguments
-     will automatically assume they're non-dependent.  */
   ++processing_template_decl;
-  tree subst = coerce_template_parms (parms, targs, tmpl, tf_none);
+  if (targs && args)
+    /* If substitution occurred, coerce the resulting arguments.  */
+    targs = coerce_template_parms (parms, targs, tmpl, tf_none);
   --processing_template_decl;
-  if (subst == error_mark_node)
+  if (targs == error_mark_node)
     return error_mark_node;
 
+  if (!norm_cache)
+    norm_cache = hash_table::create_ggc (31);
+  norm_entry entry = {tmpl, targs, NULL_TREE};
+  norm_entry **slot = nullptr;
+  hashval_t hash = 0;
+  if (!info.generate_diagnostics ())
+    {
+      /* If we're not diagnosing, cache the normal form of the
+         substituted concept-id.  */
+      hash = norm_hasher::hash ();
+      slot = norm_cache->find_slot_with_hash (, hash, INSERT);
+      if (*slot)
+        return (*slot)->norm;
+    }
+
/* The concept may 

Re: [PATCH] constexprify some tree variables

2022-11-18 Thread Andrew Pinski via Gcc-patches
On Fri, Nov 18, 2022 at 12:06 PM Jeff Law via Gcc-patches
 wrote:
>
>
> On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > Since we use C++11 by default now, we can
> > use constexpr for some const decls in tree-core.h.
> >
> > This patch does that and it allows for better optimizations
> > of GCC code with checking enabled and without LTO.
> >
> > For example, compiling generic-match.cc is sped up due to
> > fewer basic blocks and less debugging info being
> > produced. I did not check the speed of compiling the same source
> > but rather the speed of compiling the old vs new sources here
> > (but with the same compiler base).
> >
> > The small slowdown in parsing the arrays in each TU
> > is mitigated by a speedup in how much code/debugging info
> > is produced in the end.
> >
> > Note I looked at generic-match.cc since it is one of the
> > source files whose compilation causes parallel builds to stall,
> > and I wanted to speed it up.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > Or should this wait until GCC 13 branches off?
> >
> > gcc/ChangeLog:
> >
> >   PR middle-end/14840
> >   * tree-core.h (tree_code_type): Constexprify
> >   by including all-tree.def.
> >   (tree_code_length): Likewise.
> >   * tree.cc (tree_code_type): Remove.
> >   (tree_code_length): Remove.
>
> I would have preferred this a week ago :-)   And if it was just
> const-ifying, I'd ACK it without hesitation.

Yes, I know, which is why I am OK with waiting for GCC 14, really. I
decided to try to clear out some of the old bug reports assigned to
myself, and this one was one of the oldest and also one of the easiest
to do.

>
> Can you share any of the build-time speedups you're seeing, even if
> they're not perfect.  It'd help to get a sense of the potential gain
> here and whether or not there's enough gain to gate it into gcc-13 or
> have it wait for gcc-14.
>
>
> And if we can improve the compile-time of the files generated by
> match.pd, that's a win.  It's definitely a serialization point -- it
> becomes *painfully* obvious when doing a bootstrap using qemu, when that
> file takes 1-2hrs after everything else has finished.

I recorded some of the timings in the bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14840#c14

Summary is using the same compiler as a base, compiling
generic-match.cc is now ~7% faster.
I have not looked into why, but I assume it is due to less
debug info and fewer basic blocks.
Without checking enabled (or rather, with release checking) on the
sources, I assume the speedup is not going to be seen, since most of
the constant reads are in the checking part of the code.

Thanks,
Andrew Pinski


>
>
> Jeff


Re: [PATCH RFA] libstdc++: add experimental Contracts support

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 13:17, Jonathan Wakely wrote:

On 03/11/22 15:57 -0400, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.  OK for trunk?

-- >8 --

This patch adds the library support for the experimental C++ Contracts
implementation.  This now consists only of a default definition of the
violation handler, which users can override through defining their own
version.  To avoid ABI stability problems with libstdc++.so this is added to
a separate -lstdc++exp static library, which the driver knows to add when it
sees -fcontracts.

libstdc++-v3/ChangeLog:

* acinclude.m4 (glibcxx_SUBDIRS): Add src/experimental.
* include/Makefile.am (experimental_headers): Add contract.
* include/Makefile.in: Regenerate.
* src/Makefile.am (SUBDIRS): Add experimental.
* src/Makefile.in: Regenerate.
* configure: Regenerate.
* src/experimental/contract.cc: New file.
* src/experimental/Makefile.am: New file.
* src/experimental/Makefile.in: New file.
* include/experimental/contract: New file.
---
libstdc++-v3/src/experimental/contract.cc  |  41 ++
libstdc++-v3/acinclude.m4  |   2 +-
libstdc++-v3/include/Makefile.am   |   1 +
libstdc++-v3/include/Makefile.in   |   1 +
libstdc++-v3/src/Makefile.am   |   3 +-
libstdc++-v3/src/Makefile.in   |   6 +-
libstdc++-v3/src/experimental/Makefile.am  |  96 +++
libstdc++-v3/src/experimental/Makefile.in  | 796 +
libstdc++-v3/include/experimental/contract |  84 +++
9 files changed, 1026 insertions(+), 4 deletions(-)
create mode 100644 libstdc++-v3/src/experimental/contract.cc
create mode 100644 libstdc++-v3/src/experimental/Makefile.am
create mode 100644 libstdc++-v3/src/experimental/Makefile.in
create mode 100644 libstdc++-v3/include/experimental/contract


base-commit: a4cd2389276a30c39034a83d640ce68fa407bac1
prerequisite-patch-id: 329bc16a88dc9a3b13cd3fcecb3678826cc592dc

diff --git a/libstdc++-v3/src/experimental/contract.cc b/libstdc++-v3/src/experimental/contract.cc
new file mode 100644
index 000..b9b72cd7df0
--- /dev/null
+++ b/libstdc++-v3/src/experimental/contract.cc
@@ -0,0 +1,41 @@
+// -*- C++ -*- std::experimental::contract_violation and friends
+// Copyright (C) 1994-2022 Free Software Foundation, Inc.


Copied from an old file? I don't think this uses anything
existing, so it should be just 2022.


+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+#include 
+#include 
+
+__attribute__ ((weak)) void
handle_contract_violation (const std::experimental::contract_violation )
+{
+  std::cerr << "default std::handle_contract_violation called: " << std::endl


No need for flushing with endl here, just \n please.


+    << " " << violation.file_name()
+    << " " << violation.line_number()
+    << " " << violation.function_name()
+    << " " << violation.comment()
+    << " " << violation.assertion_level()
+    << " " << violation.assertion_role()
+    << " " << (int)violation.continuation_mode()
+    << std::endl;


And this will flush too, which typically isn't needed for stderr
because it's unbuffered. But somebody could have fiddled with cerr, so
doing this final flush seems OK.


+}
+
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 6f672924a73..baf01913a90 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
  # Keep these sync'd with the list in Makefile.am.  The first provides an
  # expandable list at autoconf time; the second provides an expandable list
  # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/filesystem src/libbacktrace doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 src/c++17 src/c++20 src/filesystem src/libbacktrace src/experimental doc po testsuite python])

  SUBDIRS='glibcxx_SUBDIRS'

  # These need to be absolute paths, yet at the same time need to
diff --git a/libstdc++-v3/include/Makefile.am 

Re: [PATCH v3] c++: P2448 - Relaxing some constexpr restrictions [PR106649]

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/16/22 15:27, Jason Merrill wrote:

On 11/16/22 11:06, Marek Polacek wrote:

On Wed, Nov 16, 2022 at 08:41:53AM -0500, Jason Merrill wrote:

On 11/15/22 19:30, Marek Polacek wrote:
@@ -996,19 +1040,26 @@ register_constexpr_fundef (const constexpr_fundef )

 **slot = value;
   }
-/* FUN is a non-constexpr function called in a context that requires a
-   constant expression.  If it comes from a constexpr template, explain why
-   the instantiation isn't constexpr.  */
+/* FUN is a non-constexpr (or, with -Wno-invalid-constexpr, a constexpr
+   function called in a context that requires a constant expression).
+   If it comes from a constexpr template, explain why the instantiation
+   isn't constexpr.  */


The "if it comes from a constexpr template" wording has needed an update for
a while now.


Probably ever since r178519.  I've added "Otherwise, explain why the function
cannot be used in a constexpr context."  Is that acceptable?

--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/constexpr-nonlit15.C
@@ -0,0 +1,43 @@
+// PR c++/106649
+// P2448 - Relaxing some constexpr restrictions
+// { dg-do compile { target c++23 } }
+// { dg-options "-Winvalid-constexpr" }
+// A copy/move assignment operator for a class X that is defaulted and
+// not defined as deleted is implicitly defined when it is odr-used,
+// when it is needed for constant evaluation, or when it is explicitly
+// defaulted after its first declaration.
+// The implicitly-defined copy/move assignment operator is constexpr.
+
+struct S {
+  constexpr S() {}
+  S& operator=(const S&) = default; // #1
+  S& operator=(S&&) = default; // #2
+};
+
+struct U {
+  constexpr U& operator=(const U&) = default;
+  constexpr U& operator=(U&&) = default;
+};
+
+/* FIXME: If we only declare #1 and #2, and default them here:
+
+   S& S::operator=(const S&) = default;
+   S& S::operator=(S&&) = default;
+
+then they aren't constexpr.  This sounds like a bug:
+.  */


As I commented on the PR, I don't think this is actually a bug, so let's
omit this FIXME.


I'm glad I didn't really attempt to "fix" it (the inform message is flawed
and should be improved).  Thanks for taking a look.

Here's a version with the two comments updated.

Ok?


OK.


Since this patch I'm seeing these failures:

FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++23 -fimplicit-constexpr  at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp23/constexpr-nonlit10.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 14)
FAIL: g++.dg/cpp23/constexpr-nonlit10.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 20)
FAIL: g++.dg/cpp23/constexpr-nonlit11.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 28)
FAIL: g++.dg/cpp23/constexpr-nonlit11.C  -std=gnu++23 -fimplicit-constexpr  (test for warnings, line 31)
FAIL: g++.dg/cpp2a/spaceship-eq3.C  -std=c++23 -fimplicit-constexpr (test for excess errors)


Jason



Re: [PATCH v2] c++: Reject UDLs in certain contexts [PR105300]

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 18:52, Marek Polacek wrote:

On Thu, Nov 17, 2022 at 07:06:34PM -0500, Jason Merrill wrote:

On 11/16/22 20:12, Marek Polacek wrote:

On Wed, Nov 16, 2022 at 08:22:39AM -0500, Jason Merrill wrote:

On 11/15/22 19:35, Marek Polacek wrote:

On Tue, Nov 15, 2022 at 06:58:39PM -0500, Jason Merrill wrote:

On 11/12/22 06:53, Marek Polacek wrote:

In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


Since the grammar has

user-defined-string-literal :
string-literal ud-suffix

maybe we want to move the UDL handling out to a cp_parser_udl_string_literal
that calls cp_parser_string_literal?


Umm, maybe, but the UDL handling code seems to be too entrenched in
cp_parser_string_literal and I don't think it's going to be easy to extract
it :/.


Fair enough; maybe a wrapper, then?


As in, have a cp_parser_udl_string_literal wrapper that calls
cp_parser_string_literal with udl_ok=true, rename cp_parser_string_literal,
introduce a new cp_parser_string_literal wrapper that passes udl_ok=false?


That's what I was thinking.  And the new cp_parser_string_literal could also
omit the lookup_udlit parm.


One problem with cp_parser_udl_string_literal is that it's too similar to
cp_parser_userdef_string_literal, which would be confusing, I think.


True, probably better to use that name instead, and rename the current one
to something like finish_userdef_string_literal


Sounds good, here's the patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.

I've introduced two new wrappers; the existing cp_parser_string_literal
was renamed to cp_parser_string_literal_common and should not be called
directly.  finish_userdef_string_literal is renamed from
cp_parser_userdef_string_literal.

PR c++/105300

gcc/c-family/ChangeLog:

* c-pragma.cc (handle_pragma_message): Warn for CPP_STRING_USERDEF.

gcc/cp/ChangeLog:

* parser.cc: Remove unnecessary forward declarations.
(cp_parser_string_literal): New wrapper.
(cp_parser_string_literal_common): Renamed from
cp_parser_string_literal.  Add a bool parameter.  Give an error when
UDLs are not permitted.
(cp_parser_userdef_string_literal): New wrapper.
(finish_userdef_string_literal): Renamed from
cp_parser_userdef_string_literal.
(cp_parser_primary_expression): Call cp_parser_userdef_string_literal
instead of cp_parser_string_literal.
(cp_parser_linkage_specification): Move a variable declaration closer
to its first use.
(cp_parser_static_assert): Likewise.
(cp_parser_operator): Call cp_parser_userdef_string_literal instead of
cp_parser_string_literal.
(cp_parser_asm_definition): Move a variable declaration closer to its
first use.
(cp_parser_asm_specification_opt): Move variable declarations closer to
their first use.
(cp_parser_asm_operand_list): Likewise.
(cp_parser_asm_clobber_list): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-error1.C: New test.
---
  gcc/c-family/c-pragma.cc  |   3 +
  gcc/cp/parser.cc  | 131 ++
  gcc/testsuite/g++.dg/cpp0x/udlit-error1.C |  21 
  3 files changed, 111 insertions(+), 44 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-error1.C

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 142a46441ac..49f405b605b 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1390,6 +1390,9 @@ handle_pragma_message (cpp_reader *)
     }
   else if (token == CPP_STRING)
     message = x;
+  else if (token == CPP_STRING_USERDEF)
+    GCC_BAD ("string literal with user-defined suffix is invalid in this "
+             "context");
   else
     GCC_BAD ("expected a string after %<#pragma message%>");
  
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index c5929a6cc5f..e3bd94ffe11 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2223,16 +2223,8 @@ pop_unparsed_function_queues (cp_parser *parser)
  
  /* Lexical conventions [gram.lex]  */
  
-static cp_expr cp_parser_identifier
-  (cp_parser *);
-static cp_expr cp_parser_string_literal
-  (cp_parser *, bool, bool, bool);
-static cp_expr cp_parser_userdef_char_literal
-  (cp_parser *);
-static tree cp_parser_userdef_string_literal
+static tree finish_userdef_string_literal
(tree);
-static cp_expr cp_parser_userdef_numeric_literal
-  

[committed] analyzer: fix feasibility false +ve on jumps through function ptrs [PR107582]

2022-11-18 Thread David Malcolm via Gcc-patches
PR analyzer/107582 reports a false +ve from
-Wanalyzer-use-of-uninitialized-value where
the analyzer's feasibility checker erroneously decides
that point (B) in the code below is reachable, with "x" being
uninitialized there:

  pthread_cleanup_push(func, NULL);

  while (ret != ETIMEDOUT)
    ret = rand() % 1000;

  /* (A): after the while loop  */

  if (ret != ETIMEDOUT)
    x = 

  pthread_cleanup_pop(1);

  if (ret == ETIMEDOUT)
    return 0;

  /* (B): after not bailing out  */

due to these contradictory conditions somehow both holding:
  * (ret == ETIMEDOUT), at (A) (skipping the initialization of x), and
  * (ret != ETIMEDOUT), at (B)

The root cause is that after the while loop, state merger puts ret in
the exploded graph in an UNKNOWN state, and saves the diagnostic at (B).

Later, as we explore the feasibility of reaching the enode for (B),
dynamic_call_info_t::update_model is called to push/pop the
frames for handling the call to "func" in pthread_cleanup_pop.
The "ret" at these nodes in the feasible_graph has a conjured_svalue for
"ret", and a constraint on it being either == *or* != ETIMEDOUT.

However dynamic_call_info_t::update_model blithely clobbers the
model with a copy from the exploded_graph, in which "ret" is UNKNOWN.

This patch fixes dynamic_call_info_t::update_model so that it
simulates pushing/popping a frame on the model we're working with,
preserving knowledge of the constraint on "ret", and enabling the
analyzer to "know" that the bail-out must happen.

Doing so fixes the false positive.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4158-ga7aef0a5a2b7e2.

gcc/analyzer/ChangeLog:
PR analyzer/107582
* engine.cc (dynamic_call_info_t::update_model): Update the model
by pushing or popping a frame, rather than by clobbering it with the
model from the exploded_node's state.

gcc/testsuite/ChangeLog:
PR analyzer/107582
* gcc.dg/analyzer/feasibility-4.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-1.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-2.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc                        | 14 --
 gcc/testsuite/gcc.dg/analyzer/feasibility-4.c | 42 ++
 .../gcc.dg/analyzer/feasibility-pr107582-1.c  | 43 +++
 .../gcc.dg/analyzer/feasibility-pr107582-2.c  | 34 +++
 4 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/feasibility-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/feasibility-pr107582-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/feasibility-pr107582-2.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index b52753da793..db1881cd140 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -2024,16 +2024,22 @@ exploded_node::dump_succs_and_preds (FILE *outf) const
 /* Implementation of custom_edge_info::update_model vfunc
for dynamic_call_info_t.
 
-   Update state for the dynamically discorverd calls */
+   Update state for a dynamically discovered call (or return), by pushing
+   or popping a frame for the appropriate function.  */
 
 bool
 dynamic_call_info_t::update_model (region_model *model,
   const exploded_edge *eedge,
-  region_model_context *) const
+  region_model_context *ctxt) const
 {
   gcc_assert (eedge);
-  const program_state _state = eedge->m_dest->get_state ();
-  *model = *dest_state.m_region_model;
+  if (m_is_returning_call)
+    model->update_for_return_gcall (m_dynamic_call, ctxt);
+  else
+    {
+      function *callee = eedge->m_dest->get_function ();
+      model->update_for_gcall (m_dynamic_call, ctxt, callee);
+    }
   return true;
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/feasibility-4.c b/gcc/testsuite/gcc.dg/analyzer/feasibility-4.c
new file mode 100644
index 000..1a1128089fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/feasibility-4.c
@@ -0,0 +1,42 @@
+#include "analyzer-decls.h"
+
+extern int rand (void);
+
+void test_1 (void)
+{
+  int   ret = 0;
+  while (ret != 42)
+    ret = rand() % 1000;
+
+  if (ret != 42)
+    __analyzer_dump_path (); /* { dg-bogus "path" } */
+}
+
+static void empty_local_fn (void) {}
+extern void external_fn (void);
+
+void test_2 (void)
+{
+  void (*callback) () = empty_local_fn;
+  int   ret = 0;
+  while (ret != 42)
+    ret = rand() % 1000;
+
+  (*callback) ();
+
+  if (ret != 42)
+    __analyzer_dump_path (); /* { dg-bogus "path" } */
+}
+
+void test_3 (void)
+{
+  void (*callback) () = external_fn;
+  int   ret = 0;
+  while (ret != 42)
+    ret = rand() % 1000;
+
+  (*callback) ();
+
+  if (ret != 42)
+    __analyzer_dump_path (); /* { dg-bogus "path" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/feasibility-pr107582-1.c 

[PATCH v2] c++: Reject UDLs in certain contexts [PR105300]

2022-11-18 Thread Marek Polacek via Gcc-patches
On Thu, Nov 17, 2022 at 07:06:34PM -0500, Jason Merrill wrote:
> On 11/16/22 20:12, Marek Polacek wrote:
> > On Wed, Nov 16, 2022 at 08:22:39AM -0500, Jason Merrill wrote:
> > > On 11/15/22 19:35, Marek Polacek wrote:
> > > > On Tue, Nov 15, 2022 at 06:58:39PM -0500, Jason Merrill wrote:
> > > > > On 11/12/22 06:53, Marek Polacek wrote:
> > > > > > In this PR, we are crashing because we've encountered a UDL where a
> > > > > > string-literal is expected.  This patch makes the parser reject string
> > > > > > and character UDLs in all places where the grammar requires a
> > > > > > string-literal and not a user-defined-string-literal.
> > > > > > 
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > 
> > > > > Since the grammar has
> > > > > 
> > > > > user-defined-string-literal :
> > > > >   string-literal ud-suffix
> > > > > 
> > > > > maybe we want to move the UDL handling out to a cp_parser_udl_string_literal
> > > > > that calls cp_parser_string_literal?
> > > > 
> > > > Umm, maybe, but the UDL handling code seems to be too entrenched in
> > > > cp_parser_string_literal and I don't think it's going to be easy to extract
> > > > it :/.
> > > 
> > > Fair enough; maybe a wrapper, then?
> > 
> > As in, have a cp_parser_udl_string_literal wrapper that calls
> > cp_parser_string_literal with udl_ok=true, rename cp_parser_string_literal,
> > introduce a new cp_parser_string_literal wrapper that passes udl_ok=false?
> 
> That's what I was thinking.  And the new cp_parser_string_literal could also
> omit the lookup_udlit parm.
> 
> > One problem with cp_parser_udl_string_literal is that it's too similar to
> > cp_parser_userdef_string_literal, which would be confusing, I think.
> 
> True, probably better to use that name instead, and rename the current one
> to something like finish_userdef_string_literal

Sounds good, here's the patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In this PR, we are crashing because we've encountered a UDL where a
string-literal is expected.  This patch makes the parser reject string
and character UDLs in all places where the grammar requires a
string-literal and not a user-defined-string-literal.

I've introduced two new wrappers; the existing cp_parser_string_literal
was renamed to cp_parser_string_literal_common and should not be called
directly.  finish_userdef_string_literal is renamed from
cp_parser_userdef_string_literal.
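The wrapper layout described above can be modelled in a few lines.  This is
a minimal sketch under assumptions: StringToken and the parse_* functions are
hypothetical stand-ins for cp_parser_string_literal_common and its two
wrappers, not GCC's parser code (which works on cp_parser tokens, returns
cp_expr and issues a diagnostic rather than returning an empty result).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical token: a string literal plus an optional ud-suffix
// (an empty suffix means a plain string-literal).
struct StringToken
{
  std::string literal;
  std::string ud_suffix;
};

// Shared worker: UDL_OK says whether the grammar permits a
// user-defined-string-literal at this point.
static std::optional<std::string>
parse_string_literal_common (const StringToken &tok, bool udl_ok)
{
  if (!tok.ud_suffix.empty () && !udl_ok)
    // The real parser gives an error here; the sketch just fails.
    return std::nullopt;
  return tok.literal + tok.ud_suffix;
}

// For contexts whose grammar requires a plain string-literal
// (linkage specifications, static_assert, asm, ...).
static std::optional<std::string>
parse_string_literal (const StringToken &tok)
{
  return parse_string_literal_common (tok, /*udl_ok=*/false);
}

// For contexts that accept a user-defined-string-literal.
static std::optional<std::string>
parse_userdef_string_literal (const StringToken &tok)
{
  return parse_string_literal_common (tok, /*udl_ok=*/true);
}
```

Keeping the bool on a single common worker means every call site states its
intent through the wrapper it picks, and a new call site cannot silently
accept a ud-suffix by forgetting an argument.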

PR c++/105300

gcc/c-family/ChangeLog:

* c-pragma.cc (handle_pragma_message): Warn for CPP_STRING_USERDEF.

gcc/cp/ChangeLog:

* parser.cc: Remove unnecessary forward declarations.
(cp_parser_string_literal): New wrapper.
(cp_parser_string_literal_common): Renamed from
cp_parser_string_literal.  Add a bool parameter.  Give an error when
UDLs are not permitted.
(cp_parser_userdef_string_literal): New wrapper.
(finish_userdef_string_literal): Renamed from
cp_parser_userdef_string_literal.
(cp_parser_primary_expression): Call cp_parser_userdef_string_literal
instead of cp_parser_string_literal.
(cp_parser_linkage_specification): Move a variable declaration closer
to its first use.
(cp_parser_static_assert): Likewise.
(cp_parser_operator): Call cp_parser_userdef_string_literal instead of
cp_parser_string_literal.
(cp_parser_asm_definition): Move a variable declaration closer to its
first use.
(cp_parser_asm_specification_opt): Move variable declarations closer to
their first use.
(cp_parser_asm_operand_list): Likewise.
(cp_parser_asm_clobber_list): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-error1.C: New test.
---
 gcc/c-family/c-pragma.cc  |   3 +
 gcc/cp/parser.cc  | 131 ++
 gcc/testsuite/g++.dg/cpp0x/udlit-error1.C |  21 
 3 files changed, 111 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-error1.C

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 142a46441ac..49f405b605b 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1390,6 +1390,9 @@ handle_pragma_message (cpp_reader *)
 }
   else if (token == CPP_STRING)
 message = x;
+  else if (token == CPP_STRING_USERDEF)
+    GCC_BAD ("string literal with user-defined suffix is invalid in this "
+             "context");
   else
 GCC_BAD ("expected a string after %<#pragma message%>");
 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index c5929a6cc5f..e3bd94ffe11 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2223,16 +2223,8 @@ pop_unparsed_function_queues (cp_parser *parser)
 
 /* Lexical conventions [gram.lex]  */
 
-static cp_expr cp_parser_identifier
-  (cp_parser *);
-static cp_expr cp_parser_string_literal
-  (cp_parser *, bool, bool, 

Re: [PATCH] c++: remove coerce_innermost_template_parms

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 16:59, Patrick Palka wrote:

IIUC the only practical difference between coerce_innermost_template_parms
and the main function coerce_template_parms is that the former takes
a multi-level template parameter list and returns a template argument
vector of the same depth, whereas the latter takes a single-level
template parameter vector and returns a single-level template argument
vector.

This patch gets rid of the wrapper function and just overloads the
behavior of the main function according to whether 'parms' is a
multi-level template parameter list or a single-level template parameter
vector.  It turns out we can assume parms and args have the same depth
in the multi-level case, which simplifies the overloading logic.

Besides the (subjective) simplification benefit, another benefit of this
unification is that it avoids a redundant copy of a multi-level 'args'.
Now, we can return new_args directly from c_t_p.  (And because of this,
we need to turn new_inner_args into a reference so that updating it also
updates new_args.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.  But this doesn't really seem like stage 3 material; let's hold off 
on further cleanups like this until next stage 1.



gcc/cp/ChangeLog:

* pt.cc (coerce_template_parms): Salvage part of the function
comment from c_innermost_t_p.  Handle parms being a full
template parameter list.
(coerce_innermost_template_parms): Remove.
(lookup_template_class): Use c_t_p instead of c_innermost_t_p.
(finish_template_variable): Likewise.
(tsubst_decl): Likewise.
(instantiate_alias_template): Likewise.
---
  gcc/cp/pt.cc | 92 +++-
  1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0310e38c9b9..2666e455edf 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -148,8 +148,6 @@ static void add_pending_template (tree);
  static tree reopen_tinst_level (struct tinst_level *);
  static tree tsubst_initializer_list (tree, tree);
  static tree get_partial_spec_bindings (tree, tree, tree);
-static tree coerce_innermost_template_parms (tree, tree, tree, tsubst_flags_t,
-bool = true);
  static void tsubst_enum   (tree, tree, tree);
  static bool check_instantiated_args (tree, tree, tsubst_flags_t);
  static int check_non_deducible_conversion (tree, tree, unification_kind_t, int,
@@ -8827,6 +8825,14 @@ pack_expansion_args_count (tree args)
 arguments.  If any error occurs, return error_mark_node. Error and
 warning messages are issued under control of COMPLAIN.
  
+   If PARMS represents all template parameter levels, this function
+   returns a vector of vectors representing all the resulting argument
+   levels.  Note that in this case, only the innermost arguments are
+   coerced because the outermost ones are supposed to have been coerced
+   already.  Otherwise, if PARMS represents only (the innermost) vector
+   of parameters, this function returns a vector containing just the
+   innermost resulting arguments.
+
 If REQUIRE_ALL_ARGS is false, argument deduction will be performed
 for arguments not specified in ARGS.  If REQUIRE_ALL_ARGS is true,
 arguments not specified in ARGS must have default arguments which
@@ -8842,8 +8848,6 @@ coerce_template_parms (tree parms,
int nparms, nargs, parm_idx, arg_idx, lost = 0;
tree orig_inner_args;
tree inner_args;
-  tree new_args;
-  tree new_inner_args;
  
   /* When used as a boolean value, indicates whether this is a
   variadic template parameter list. Since it's an int, we can also
@@ -8864,6 +8868,17 @@ coerce_template_parms (tree parms,
if (args == error_mark_node)
  return error_mark_node;
  
+  bool return_full_args = false;
+  if (TREE_CODE (parms) == TREE_LIST)
+    {
+      if (TMPL_PARMS_DEPTH (parms) > 1)
+        {
+          gcc_assert (TMPL_PARMS_DEPTH (parms) == TMPL_ARGS_DEPTH (args));
+          return_full_args = true;
+        }
+      parms = INNERMOST_TEMPLATE_PARMS (parms);
+    }
+
nparms = TREE_VEC_LENGTH (parms);
  
/* Determine if there are any parameter packs or default arguments.  */
@@ -8961,8 +8976,8 @@ coerce_template_parms (tree parms,
   template-id may be nested within a "sizeof".  */
cp_evaluated ev;
  
-  new_inner_args = make_tree_vec (nparms);
-  new_args = add_outermost_template_args (args, new_inner_args);
+  tree new_args = add_outermost_template_args (args, make_tree_vec (nparms));
+  tree& new_inner_args = TMPL_ARGS_LEVEL (new_args, TMPL_ARGS_DEPTH (new_args));
int pack_adjust = 0;
for (parm_idx = 0, arg_idx = 0; parm_idx < nparms; parm_idx++, arg_idx++)
  {
@@ -9164,59 +9179,7 @@ coerce_template_parms (tree parms,
  SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 TREE_VEC_LENGTH (new_inner_args));
  

nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel' (was: [MentorEmbedded/nvptx-tools] Match standard 'ld' "search" behavior (PR #38))

2022-11-18 Thread Thomas Schwinge
Hi!

Re
:

On 2022-11-18T11:05:23-0800, I wrote:
> Actually, in GCC/nvptx target testing, this #38's commit 
> 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820 "ld: Don't search for input files in 
> '-L'directories" is generally causing linking to fail with:
>
> ```
> error opening crt0.o
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
> ```
>
> I'm investigating.

OK to push the attached
GCC "nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'" to all
active GCC branches?  (... instead of having to restore this "blunder"
(do "search for input files in '-L'directories") in nvptx-tools...)


Grüße
 Thomas


From 85ddd99017968e8aa45342645be9642e63bcc5bb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 18 Nov 2022 23:57:52 +0100
Subject: [PATCH] nvptx: In 'STARTFILE_SPEC', fix 'crt0.o' for '-mmainkernel'

A recent nvptx-tools change: commit 886a95faf66bf66a82fc0fe7d2a9fd9e9fec2820
"ld: Don't search for input files in '-L'directories" (of

"Match standard 'ld' "search" behavior") in GCC/nvptx target testing
generally causes linking to fail with:

error opening crt0.o
collect2: error: ld returned 1 exit status
compiler exited with status 1

Indeed per GCC '-v' output, there is an undecorated 'crt0.o' on the linker
('collect2') command line:

 [...]/build-gcc/./gcc/collect2 -o [...] crt0.o [...]

This is due to:

gcc/config/nvptx/nvptx.h:#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"

..., and the fix, as used by numerous other GCC targets, is to use
'crt0.o%s' instead, where, per 'gcc/gcc.cc', "The Specs Language", '%s' means:

 %s current argument is the name of a library or startup file of some sort.
Search for that file in a standard list of directories
and substitute the full name found.

With that, we get the expected path to 'crt0.o'.
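To make the effect of '%s' concrete, here is a rough model of the lookup: try
each directory in order and substitute the first full path that exists.  This
is a minimal sketch; find_startfile and the directory list are hypothetical,
and GCC's own startfile prefix search in 'gcc/gcc.cc' handles more cases
(multilib subdirectories, sysroots, and so on).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>   // only used by the usage example below
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical model of the '%s' substitution: look NAME up in DIRS, in
// order, and return the full path of the first regular file found.  If
// nothing matches, this sketch returns NAME unchanged, which is the
// undecorated 'crt0.o' that the linker then could not open.
std::string
find_startfile (const std::string &name, const std::vector<fs::path> &dirs)
{
  for (const fs::path &dir : dirs)
    {
      fs::path candidate = dir / name;
      std::error_code ec;
      if (fs::is_regular_file (candidate, ec))
        return candidate.string ();
    }
  return name;
}
```

For example, with a directory containing 'crt0.o', find_startfile ("crt0.o",
{dir}) yields the full path, which is what the linker needs once it no longer
searches '-L' directories for plain input files.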

	gcc/
	* config/nvptx/nvptx.h (STARTFILE_SPEC): Fix 'crt0.o' for
	'-mmainkernel'.
---
 gcc/config/nvptx/nvptx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 0afc83b10a3..dc676dcb5fc 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -35,7 +35,7 @@
'../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
 #define ASM_SPEC "%{v}"
 
-#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
+#define STARTFILE_SPEC "%{mmainkernel:crt0.o%s}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
 
-- 
2.25.1



Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-18 Thread Jeff Law



On 11/18/22 14:26, Philipp Tomsich wrote:

On Fri, 18 Nov 2022 at 22:13, Jeff Law  wrote:


On 11/9/22 16:07, Philipp Tomsich wrote:

Handling of register-const_int addition has so far quickly escalated to
creating a full sign-extended 32-bit constant and performing a
register-register add for RISC-V in GCC, resulting in sequences like
(for the case of "a + 2048"):
   li      a5,4096
   addi    a5,a5,-2048
   add     a0,a0,a5

By adding an expansion for add3, we can emit optimised RTL that
matches the capabilities of RISC-V better by adding support for the
following, previously unoptimised cases:
- addi + addi
   addi    a0,a0,2047
   addi    a0,a0,1
- li + sh[123]add (if Zba is enabled)
   li      a5,960
   sh3add  a0,a5,a0

With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
the expander will otherwise wrap the resulting set-expression in an
insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.

This closes the gap to LLVM, which has already been emitting these
optimised sequences.

Note that this benefits perlbench (in SPEC CPU 2017), which needs
to add the constant 3840.
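The arithmetic behind the two new sequences can be checked with a small
model.  This is a minimal sketch under assumptions: the helper names are
hypothetical, split_2simm12 mirrors the expander's "saturate to 2047 or -2048,
then subtract" step shown further down in the patch, and split_shifted123
tries the largest shift first, which may not match the order the real
predicates or expander use.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

constexpr int IMM_BITS = 12;
constexpr std::int64_t SIMM12_MIN = -(std::int64_t{1} << (IMM_BITS - 1));   // -2048
constexpr std::int64_t SIMM12_MAX = (std::int64_t{1} << (IMM_BITS - 1)) - 1; //  2047

constexpr bool
is_simm12 (std::int64_t v)
{
  return v >= SIMM12_MIN && v <= SIMM12_MAX;
}

// addi + addi: saturate the first immediate towards the sign of VAL, then
// check that the remainder also fits.  Only useful when VAL itself is not
// already a simm12 (a single addi covers that case).
std::optional<std::pair<std::int64_t, std::int64_t>>
split_2simm12 (std::int64_t val)
{
  std::int64_t first = val >= 0 ? SIMM12_MAX : SIMM12_MIN;
  std::int64_t second = val - first;
  if (!is_simm12 (second))
    return std::nullopt;
  return std::make_pair (first, second);
}

// li + sh[123]add: VAL == imm << n for some n in {1, 2, 3} and a simm12 imm.
// Returns {imm, n}.
std::optional<std::pair<std::int64_t, int>>
split_shifted123 (std::int64_t val)
{
  for (int n = 3; n >= 1; --n)
    {
      std::int64_t scale = std::int64_t{1} << n;
      if (val % scale == 0 && is_simm12 (val / scale))
        return std::make_pair (val / scale, n);
    }
  return std::nullopt;
}
```

For instance, split_2simm12 (2048) gives {2047, 1}, the addi/addi pair above,
and split_shifted123 (7680) gives {960, 3}, i.e. "li a5,960; sh3add".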

gcc/ChangeLog:

   * config/riscv/bitmanip.md (*shNadd): Rename.
   (riscv_shNadd): Expose as gen_riscv_shNadd{di/si}.
   * config/riscv/predicates.md (const_arith_shifted123_operand):
   New predicate (for constants that are a simm12, shifted by
   1, 2 or 3).
   (const_arith_2simm12_operand): New predicate (that can be
   expressed by adding 2 simm12 together).
   (addi_operand): New predicate (an immediate operand suitable
   for the new add3 expansion).
   * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
   Don't use gen_add3_insn, where a RTX instead of an INSN is
   required (otherwise this will break as soon as we have a
   define_expand for add3).
   (riscv_adjust_libcall_cfi_epilogue): Same.
   * config/riscv/riscv.md (addsi3): Rename.
   (riscv_addsi3): New name for addsi3.
   (adddi3): Rename.
   (riscv_adddi3): New name for adddi3.
   (add3): New expander that handles the basic and fancy
   (such as li+sh[123]add, addi+addi, ...) cases for adding
   register-register and register-const_int.

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/addi.c: New test.
   * gcc.target/riscv/zba-shNadd-06.c: New test.

Signed-off-by: Philipp Tomsich 
---

   gcc/config/riscv/bitmanip.md  |  2 +-
   gcc/config/riscv/predicates.md| 28 +
   gcc/config/riscv/riscv.cc | 10 ++--
   gcc/config/riscv/riscv.md | 58 ++-
   gcc/testsuite/gcc.target/riscv/addi.c | 39 +
   .../gcc.target/riscv/zba-shNadd-06.c  | 11 
   6 files changed, 141 insertions(+), 7 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
   create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c



diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 171a0cdced6..289ff7470c6 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -464,6 +464,60 @@
 [(set_attr "type" "arith")
  (set_attr "mode" "DI")])

+(define_expand "add3"
+  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
+ (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
+   (match_operand:GPR 2 "addi_operand"  " r,I")))]
+  ""
+{
+  if (arith_operand (operands[2], mode))
+    emit_insn (gen_riscv_add3 (operands[0], operands[1], operands[2]));
+  else if (const_arith_2simm12_operand (operands[2], mode))
+{
+  /* Split into two immediates that add up to the desired value:
+   * e.g., break up "a + 2445" into:
+   *   addi   a0,a0,2047
+   *   addi   a0,a0,398
+   */

Nit.  GNU comment style please.



+
+  HOST_WIDE_INT val = INTVAL (operands[2]);
+  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
+
+  if (val >= 0)
+    saturated = ~saturated;
+
+  val -= saturated;
+
+  rtx tmp = gen_reg_rtx (mode);

Can't add3 be generated by LRA?  If so, don't you have to guard
against going into this path as we shouldn't be creating new pseudos at
that point (I know LRA can create some internally, but I don't think it
handles new ones showing up due to target expanders).


Similarly for the shifted_123 case immediately following.


If we do indeed have an issue here, I'm not sure how best to resolve.
If the output operand does not overlap with the inputs, then we're
golden and can just re-use it to form the constant.  If not,  then it's
a bit tougher.  I'm not keen to add a test of no_new_pseudos to the
operand predicate, but I don't see a better option yet.

From a cursory glance, LRA does not try to go through gen_add3_insn,
but rather forms PLUS rtx.  This 

[committed] analyzer: move more impl_* to known_function

2022-11-18 Thread David Malcolm via Gcc-patches
Fix a missing check that the argument to __analyzer_dump_capacity must
be a pointer type (which would otherwise lead to an ICE).

Do so by using the known_function_manager rather than by doing lots of
string matching.  Do the same for many other functions.

Doing so moves the type-checking closer to the logic that makes use
of it, by putting them in the same class, rather than splitting them
up between two source files (and sometimes three, e.g. for "pipe").
I hope this reduces the number of missing checks.
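The design described here, where each known function bundles its own
argument type check with its behaviour and is looked up by name, can be
sketched as a toy model.  This is an illustration under assumptions:
call_info, known_function_base, kf_dump_capacity_sketch and
known_function_registry are hypothetical stand-ins, not the analyzer's actual
known_function, call_details or known_function_manager API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class arg_kind { integer, pointer };

// Hypothetical summary of a call site.
struct call_info
{
  std::string callee;
  std::vector<arg_kind> args;
};

// Each handler owns both the type check and the behaviour it guards.
class known_function_base
{
public:
  virtual ~known_function_base () = default;
  virtual bool matches_call_types_p (const call_info &cd) const = 0;
  virtual std::string impl_call (const call_info &cd) const = 0;
};

// Toy counterpart of __analyzer_dump_capacity: exactly one pointer argument.
class kf_dump_capacity_sketch : public known_function_base
{
public:
  bool matches_call_types_p (const call_info &cd) const override
  {
    return cd.args.size () == 1 && cd.args[0] == arg_kind::pointer;
  }
  std::string impl_call (const call_info &) const override
  {
    return "capacity of pointed-to region";
  }
};

class known_function_registry
{
public:
  void add (const std::string &name, std::unique_ptr<known_function_base> kf)
  {
    m_map[name] = std::move (kf);
  }

  // Returns an empty string when the callee is unknown or the argument
  // types do not match, so a mistyped call never reaches impl_call.
  std::string dispatch (const call_info &cd) const
  {
    auto it = m_map.find (cd.callee);
    if (it == m_map.end () || !it->second->matches_call_types_p (cd))
      return "";
    return it->second->impl_call (cd);
  }

private:
  std::map<std::string, std::unique_ptr<known_function_base>> m_map;
};
```

Because the check and the implementation live in the same class, adding a new
handler without its type check is harder to do by accident than when the two
are split across files.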

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-4157-g1c4a7881c49279.

gcc/analyzer/ChangeLog:
* analyzer.cc (is_pipe_call_p): Delete.
* analyzer.h (is_pipe_call_p): Delete.
* region-model-impl-calls.cc (call_details::get_location): New.
(class kf_analyzer_break): New, adapted from
region_model::on_stmt_pre.
(region_model::impl_call_analyzer_describe): Convert to...
(class kf_analyzer_describe): ...this.
(region_model::impl_call_analyzer_dump_capacity): Convert to...
(class kf_analyzer_dump_capacity): ...this.
(region_model::impl_call_analyzer_dump_escaped): Convert to...
(class kf_analyzer_dump_escaped): ...this.
(class kf_analyzer_dump_exploded_nodes): New.
(region_model::impl_call_analyzer_dump_named_constant): Convert
to...
(class kf_analyzer_dump_named_constant): ...this.
(class dump_path_diagnostic): Move here from region-model.cc.
(class kf_analyzer_dump_path): New, adapted from
region_model::on_stmt_pre.
(class kf_analyzer_dump_region_model): Likewise.
(region_model::impl_call_analyzer_eval): Convert to...
(class kf_analyzer_eval): ...this.
(region_model::impl_call_analyzer_get_unknown_ptr): Convert to...
(class kf_analyzer_get_unknown_ptr): ...this.
(class known_function_accept): Rename to...
(class kf_accept): ...this.
(class known_function_bind): Rename to...
(class kf_bind): ...this.
(class known_function_connect): Rename to...
(class kf_connect): ...this.
(region_model::impl_call_errno_location): Convert to...
(class kf_errno_location): ...this.
(class known_function_listen): Rename to...
(class kf_listen): ...this.
(region_model::impl_call_pipe): Convert to...
(class kf_pipe): ...this.
(region_model::impl_call_putenv): Convert to...
(class kf_putenv): ...this.
(region_model::impl_call_operator_new): Convert to...
(class kf_operator_new): ...this.
(region_model::impl_call_operator_delete): Convert to...
(class kf_operator_delete): ...this.
(class known_function_socket): Rename to...
(class kf_socket): ...this.
(register_known_functions): Rename param to KFM.  Break out
existing known functions into a "POSIX" section, and add "pipe",
"pipe2", and "putenv".  Add debugging functions
"__analyzer_break", "__analyzer_describe",
"__analyzer_dump_capacity", "__analyzer_dump_escaped",
"__analyzer_dump_exploded_nodes",
"__analyzer_dump_named_constant", "__analyzer_dump_path",
"__analyzer_dump_region_model", "__analyzer_eval",
"__analyzer_get_unknown_ptr".  Add C++ support functions
"operator new", "operator new []", "operator delete", and
"operator delete []".
* region-model.cc (class dump_path_diagnostic): Move to
region-model-impl-calls.cc.
(region_model::on_stmt_pre): Eliminate special-casing of
"__analyzer_describe", "__analyzer_dump_capacity",
"__analyzer_dump_escaped", "__analyzer_dump_named_constant",
"__analyzer_dump_path", "__analyzer_dump_region_model",
"__analyzer_eval", "__analyzer_break",
"__analyzer_dump_exploded_nodes", "__analyzer_get_unknown_ptr",
"__errno_location", "pipe", "pipe2", "putenv", "operator new",
"operator new []", "operator delete", and "operator delete []",
handling them instead via the known_functions mechanism.
* region-model.h (call_details::get_location): New decl.
(region_model::impl_call_analyzer_describe): Delete decl.
(region_model::impl_call_analyzer_dump_capacity): Delete decl.
(region_model::impl_call_analyzer_dump_escaped): Delete decl.
(region_model::impl_call_analyzer_dump_named_constant): Delete decl.
(region_model::impl_call_analyzer_eval): Delete decl.
(region_model::impl_call_analyzer_get_unknown_ptr): Delete decl.
(region_model::impl_call_errno_location): Delete decl.
(region_model::impl_call_pipe): Delete decl.
(region_model::impl_call_putenv): Delete decl.
(region_model::impl_call_operator_new): Delete decl.
(region_model::impl_call_operator_delete): Delete decl.
* sm-fd.cc: 

[PATCH] c++: remove coerce_innermost_template_parms

2022-11-18 Thread Patrick Palka via Gcc-patches
IIUC the only practical difference between coerce_innermost_template_parms
and the main function coerce_template_parms is that the former takes
a multi-level template parameter list and returns a template argument
vector of the same depth, whereas the latter takes a single-level
template parameter vector and returns a single-level template argument
vector.

This patch gets rid of the wrapper function and just overloads the
behavior of the main function according to whether 'parms' is a
multi-level template parameter list or a single-level template parameter
vector.  It turns out we can assume parms and args have the same depth
in the multi-level case, which simplifies the overloading logic.

Besides the (subjective) simplification benefit, another benefit of this
unification is that it avoids a redundant copy of a multi-level 'args'.
Now, we can return new_args directly from c_t_p.  (And because of this,
we need to turn new_inner_args into a reference so that updating it also
updates new_args.)
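The shape of the unified function can be modelled with plain vectors: a
multi-level argument list is a vector of levels, only the innermost level is
coerced, and a reference into the copied result lets the innermost level be
filled in place.  This is a minimal sketch under assumptions: level_t, args_t,
coerce_level and coerce_template_parms_sketch are hypothetical, and
"coercion" is reduced to filling missing trailing arguments from defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using level_t = std::vector<int>;
using args_t = std::vector<level_t>;   // outermost level first

// Hypothetical per-level coercion: PARM_DEFAULTS gives one default per
// parameter; explicit ARGS override them from the left.
level_t
coerce_level (const level_t &parm_defaults, const level_t &args)
{
  level_t out = parm_defaults;
  for (std::size_t i = 0; i < args.size () && i < out.size (); ++i)
    out[i] = args[i];
  return out;
}

// Mirrors the patch's dispatch: for multi-level PARMS the depths must agree
// (cf. the gcc_assert), only the innermost level is coerced, and the full
// vector is returned; otherwise only the innermost level is returned.
args_t
coerce_template_parms_sketch (const args_t &parms, const args_t &args)
{
  assert (!parms.empty () && !args.empty ());
  bool return_full_args = parms.size () > 1;
  if (return_full_args)
    assert (parms.size () == args.size ());

  // Copy the outer levels once and append a fresh innermost level, in the
  // spirit of add_outermost_template_args (args, make_tree_vec (nparms)).
  args_t new_args (args.begin (), args.end () - 1);
  new_args.emplace_back ();
  // Reference to the innermost level: writing through it updates NEW_ARGS.
  level_t &new_inner_args = new_args.back ();
  new_inner_args = coerce_level (parms.back (), args.back ());

  if (return_full_args)
    return new_args;
  return args_t{new_inner_args};
}
```

Returning new_args directly in the multi-level case is what avoids the extra
copy mentioned above; the reference is needed because the innermost level is
written after it has already been placed inside new_args.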

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (coerce_template_parms): Salvage part of the function
comment from c_innermost_t_p.  Handle parms being a full
template parameter list.
(coerce_innermost_template_parms): Remove.
(lookup_template_class): Use c_t_p instead of c_innermost_t_p.
(finish_template_variable): Likewise.
(tsubst_decl): Likewise.
(instantiate_alias_template): Likewise.
---
 gcc/cp/pt.cc | 92 +++-
 1 file changed, 27 insertions(+), 65 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 0310e38c9b9..2666e455edf 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -148,8 +148,6 @@ static void add_pending_template (tree);
 static tree reopen_tinst_level (struct tinst_level *);
 static tree tsubst_initializer_list (tree, tree);
 static tree get_partial_spec_bindings (tree, tree, tree);
-static tree coerce_innermost_template_parms (tree, tree, tree, tsubst_flags_t,
-bool = true);
 static void tsubst_enum(tree, tree, tree);
 static bool check_instantiated_args (tree, tree, tsubst_flags_t);
 static int check_non_deducible_conversion (tree, tree, unification_kind_t, int,
@@ -8827,6 +8825,14 @@ pack_expansion_args_count (tree args)
arguments.  If any error occurs, return error_mark_node. Error and
warning messages are issued under control of COMPLAIN.
 
+   If PARMS represents all template parameter levels, this function
+   returns a vector of vectors representing all the resulting argument
+   levels.  Note that in this case, only the innermost arguments are
+   coerced because the outermost ones are supposed to have been coerced
+   already.  Otherwise, if PARMS represents only (the innermost) vector
+   of parameters, this function returns a vector containing just the
+   innermost resulting arguments.
+
If REQUIRE_ALL_ARGS is false, argument deduction will be performed
for arguments not specified in ARGS.  If REQUIRE_ALL_ARGS is true,
arguments not specified in ARGS must have default arguments which
@@ -8842,8 +8848,6 @@ coerce_template_parms (tree parms,
   int nparms, nargs, parm_idx, arg_idx, lost = 0;
   tree orig_inner_args;
   tree inner_args;
-  tree new_args;
-  tree new_inner_args;
 
   /* When used as a boolean value, indicates whether this is a
  variadic template parameter list. Since it's an int, we can also
@@ -8864,6 +8868,17 @@ coerce_template_parms (tree parms,
   if (args == error_mark_node)
 return error_mark_node;
 
+  bool return_full_args = false;
+  if (TREE_CODE (parms) == TREE_LIST)
+    {
+      if (TMPL_PARMS_DEPTH (parms) > 1)
+        {
+          gcc_assert (TMPL_PARMS_DEPTH (parms) == TMPL_ARGS_DEPTH (args));
+          return_full_args = true;
+        }
+      parms = INNERMOST_TEMPLATE_PARMS (parms);
+    }
+
   nparms = TREE_VEC_LENGTH (parms);
 
   /* Determine if there are any parameter packs or default arguments.  */
@@ -8961,8 +8976,8 @@ coerce_template_parms (tree parms,
  template-id may be nested within a "sizeof".  */
   cp_evaluated ev;
 
-  new_inner_args = make_tree_vec (nparms);
-  new_args = add_outermost_template_args (args, new_inner_args);
+  tree new_args = add_outermost_template_args (args, make_tree_vec (nparms));
+  tree& new_inner_args = TMPL_ARGS_LEVEL (new_args, TMPL_ARGS_DEPTH (new_args));
   int pack_adjust = 0;
   for (parm_idx = 0, arg_idx = 0; parm_idx < nparms; parm_idx++, arg_idx++)
 {
@@ -9164,59 +9179,7 @@ coerce_template_parms (tree parms,
 SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 TREE_VEC_LENGTH (new_inner_args));
 
-  return new_inner_args;
-}
-
-/* Like coerce_template_parms.  If PARMS represents all template
-   parameters levels, this function returns a vector of vectors
-   representing all the resulting argument 

[PATCH] c++: cache the normal form of a concept-id

2022-11-18 Thread Patrick Palka via Gcc-patches
We already cache the overall normal form of a declaration's constraints
under the assumption that it can't change over the translation unit.
But if we have two constrained declarations such as

  template void f() requires expensive && A;
  template void g() requires expensive && B;

then despite this high-level caching we'd still redundantly have to
expand the concept-id expensive twice, once during normalization of
f's constraints and again during normalization of g's.  Ideally, we'd
reuse the previously computed normal form of expensive the second
time around.

To that end this patch introduces an intermediate layer of caching
during constraint normalization -- caching of the normal form of a
concept-id -- that sits between our high-level caching of the overall
normal form of a declaration's constraints and our low-level caching of
each individual atomic constraint.
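The intermediate cache can be illustrated with a toy memo table keyed on the
(concept, arguments) pair and bypassed when diagnostics are requested, as in
the patch.  This is a minimal sketch under assumptions: the string-based
concept_id, the normalizer class and its expansion counter are hypothetical;
GCC hashes trees with iterative_hash_template_arg and compares them with
template_args_equal.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical key: a concept name plus its (already substituted) arguments.
struct concept_id
{
  std::string tmpl;
  std::vector<std::string> args;
  bool operator== (const concept_id &o) const
  {
    return tmpl == o.tmpl && args == o.args;
  }
};

struct concept_id_hash
{
  std::size_t operator() (const concept_id &c) const
  {
    // Combine the hash of the template with each argument, in order.
    std::size_t h = std::hash<std::string> {} (c.tmpl);
    for (const std::string &a : c.args)
      h = h * 31 + std::hash<std::string> {} (a);
    return h;
  }
};

class normalizer
{
public:
  int expansions = 0;   // how many times the expensive path ran

  std::string normalize (const concept_id &id, bool generate_diagnostics)
  {
    if (generate_diagnostics)
      // Like the patch, do not consult or fill the cache when diagnosing.
      return expand (id);
    auto it = m_cache.find (id);
    if (it != m_cache.end ())
      return it->second;
    std::string norm = expand (id);
    m_cache.emplace (id, norm);
    return norm;
  }

private:
  // Stand-in for expanding and normalizing the concept definition.
  std::string expand (const concept_id &id)
  {
    ++expansions;
    std::string s = id.tmpl + "(";
    for (std::size_t i = 0; i < id.args.size (); ++i)
      s += (i ? "," : "") + id.args[i];
    return s + ")";
  }

  std::unordered_map<concept_id, std::string, concept_id_hash> m_cache;
};
```

Normalizing the same concept-id for two different declarations then expands
it only once; the second lookup is a hash-table hit.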

It turns out this caching generalizes some ad-hoc caching of the normal
form of a concept definition (which is equivalent to the normal form of
the concept-id C where gtargs are C's generic arguments) so
this patch unifies the caching accordingly.

This change improves compile time/memory usage for e.g. the libstdc++
test std/ranges/adaptors/join.cc by 10%/5%.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* constraint.cc (struct norm_entry): Define.
(struct norm_hasher): Define.
(norm_cache): Define.
(normalize_concept_check): Add function comment.  Cache the
result of concept-id normalization.  Canonicalize generic
arguments as NULL_TREE.  Don't coerce arguments unless
substitution occurred.
(normalize_concept_definition): Simplify.  Use norm_cache
instead of ad-hoc caching.
---
 gcc/cp/constraint.cc | 94 ++--
 1 file changed, 82 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a113d3e269e..c9740b1ec78 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -698,6 +698,40 @@ normalize_logical_operation (tree t, tree args, tree_code c, norm_info info)
   return build2 (c, ci, t0, t1);
 }
 
+/* Data types and hash functions for caching the normal form of a concept-id.
+   This essentially memoizes calls to normalize_concept_check.  */
+
+struct GTY((for_user)) norm_entry
+{
+  /* The CONCEPT_DECL of the concept-id.  */
+  tree tmpl;
+  /* The arguments of the concept-id.  */
+  tree args;
+  /* The normal form of the concept-id.  */
+  tree norm;
+};
+
+struct norm_hasher : ggc_ptr_hash
+{
+  static hashval_t hash (norm_entry *t)
+  {
+    hashval_t hash = iterative_hash_template_arg (t->tmpl, 0);
+    hash = iterative_hash_template_arg (t->args, hash);
+    return hash;
+  }
+
+  static bool equal (norm_entry *t1, norm_entry *t2)
+  {
+    return t1->tmpl == t2->tmpl
+           && template_args_equal (t1->args, t2->args);
+  }
+};
+
+static GTY((deletable)) hash_table *norm_cache;
+
+/* Normalize the concept check CHECK where ARGS are the
+   arguments to be substituted into CHECK's arguments.  */
+
 static tree
 normalize_concept_check (tree check, tree args, norm_info info)
 {
@@ -720,24 +754,53 @@ normalize_concept_check (tree check, tree args, norm_info info)
 targs = tsubst_template_args (targs, args, info.complain, info.in_decl);
   if (targs == error_mark_node)
 return error_mark_node;
+  if (template_args_equal (targs, generic_targs_for (tmpl)))
+    /* Canonicalize generic arguments as NULL_TREE, as an optimization.  */
+    targs = NULL_TREE;
 
   /* Build the substitution for the concept definition.  */
   tree parms = TREE_VALUE (DECL_TEMPLATE_PARMS (tmpl));
-  /* Turn on template processing; coercing non-type template arguments
- will automatically assume they're non-dependent.  */
   ++processing_template_decl;
-  tree subst = coerce_template_parms (parms, targs, tmpl, tf_none);
+  if (targs && args)
+    /* If substitution occurred, coerce the resulting arguments.  */
+    targs = coerce_template_parms (parms, targs, tmpl, tf_none);
   --processing_template_decl;
-  if (subst == error_mark_node)
+  if (targs == error_mark_node)
 return error_mark_node;
 
+  if (!norm_cache)
+norm_cache = hash_table::create_ggc (31);
+  norm_entry entry = {tmpl, targs, NULL_TREE};
+  norm_entry **slot = nullptr;
+  hashval_t hash = 0;
+  if (!info.generate_diagnostics ())
+{
+  /* If we're not diagnosing, cache the normal form of the
+substituted concept-id.  */
+  hash = norm_hasher::hash ();
+  slot = norm_cache->find_slot_with_hash (, hash, INSERT);
+  if (*slot)
+   return (*slot)->norm;
+}
+
   /* The concept may have been ill-formed.  */
   tree def = get_concept_definition (DECL_TEMPLATE_RESULT (tmpl));
   if (def == error_mark_node)
 return error_mark_node;
 
   info.update_context (check, args);
-  return normalize_expression (def, subst, info);
+  tree norm = 

Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-18 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 22:13, Jeff Law  wrote:
>
>
> On 11/9/22 16:07, Philipp Tomsich wrote:
> > Handling the register-const_int addition has very quickly escalated to
> > creating a full sign-extended 32bit constant and performing a
> > register-register for RISC-V in GCC so far, resulting in sequences like
> > (for the case of "a + 2048"):
> >   li      a5,4096
> >   addi    a5,a5,-2048
> >   add     a0,a0,a5
> >
> > By adding an expansion for add3, we can emit optimised RTL that
> > matches the capabilities of RISC-V better by adding support for the
> > following, previously unoptimised cases:
> >- addi + addi
> >   addi    a0,a0,2047
> >   addi    a0,a0,1
> >- li + sh[123]add (if Zba is enabled)
> >   li      a5,960
> >   sh3add  a0,a5,a0
> >
> > With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
> > and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
> > the expander will otherwise wrap the resulting set-expression in an
> > insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.
> >
> > This closes the gap to LLVM, which has already been emitting these
> > optimised sequences.
> >
> > Note that this benefits perlbench (in SPEC CPU 2017), which needs
> > to add the constant 3840.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*shNadd): Rename.
> >   (riscv_shNadd): Expose as gen_riscv_shNadd{di/si}.
> >   * config/riscv/predicates.md (const_arith_shifted123_operand):
> >   New predicate (for constants that are a simm12, shifted by
> >   1, 2 or 3).
> >   (const_arith_2simm12_operand): New predicate (that can be
> >   expressed by adding 2 simm12 together).
> >   (addi_operand): New predicate (an immediate operand suitable
> >   for the new add3 expansion).
> >   * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
> >   Don't use gen_add3_insn, where a RTX instead of an INSN is
> >   required (otherwise this will break as soon as we have a
> >   define_expand for add3).
> >   (riscv_adjust_libcall_cfi_epilogue): Same.
> >   * config/riscv/riscv.md (addsi3): Rename.
> >   (riscv_addsi3): New name for addsi3.
> >   (adddi3): Rename.
> >   (riscv_adddi3): New name for adddi3.
> >   (add3): New expander that handles the basic and fancy
> >   (such as li+sh[123]add, addi+addi, ...) cases for adding
> >   register-register and register-const_int.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/addi.c: New test.
> >   * gcc.target/riscv/zba-shNadd-06.c: New test.
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >   gcc/config/riscv/bitmanip.md  |  2 +-
> >   gcc/config/riscv/predicates.md| 28 +
> >   gcc/config/riscv/riscv.cc | 10 ++--
> >   gcc/config/riscv/riscv.md | 58 ++-
> >   gcc/testsuite/gcc.target/riscv/addi.c | 39 +
> >   .../gcc.target/riscv/zba-shNadd-06.c  | 11 
> >   6 files changed, 141 insertions(+), 7 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c
> >
> >
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 171a0cdced6..289ff7470c6 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -464,6 +464,60 @@
> > [(set_attr "type" "arith")
> >  (set_attr "mode" "DI")])
> >
> > +(define_expand "add3"
> > +  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
> > + (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
> > +   (match_operand:GPR 2 "addi_operand"  " r,I")))]
> > +  ""
> > +{
> > +  if (arith_operand (operands[2], mode))
> > +emit_insn (gen_riscv_add3 (operands[0], operands[1], operands[2]));
> > +  else if (const_arith_2simm12_operand (operands[2], mode))
> > +{
> > +  /* Split into two immediates that add up to the desired value:
> > +   * e.g., break up "a + 2445" into:
> > +   * addi   a0,a0,2047
> > +   * addi   a0,a0,398
> > +   */
>
> Nit.  GNU comment style please.
>
>
> > +
> > +  HOST_WIDE_INT val = INTVAL (operands[2]);
> > +  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
> > +
> > +  if (val >= 0)
> > +  saturated = ~saturated;
> > +
> > +  val -= saturated;
> > +
> > +  rtx tmp = gen_reg_rtx (mode);
>
> Can't add3 be generated by LRA?  If so, don't you have to guard
> against going into this path as we shouldn't be creating new pseudos at
> that point (I know LRA can create some internally, but I don't think it
> handles new ones showing up due to target expanders).
>
>
> Similarly for the shifted_123 case immediately following.
>
>
> If we do indeed have an issue here, I'm not sure how best to resolve.
> If the output 

Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/9/22 16:07, Philipp Tomsich wrote:

Handling the register-const_int addition has very quickly escalated to
creating a full sign-extended 32bit constant and performing a
register-register for RISC-V in GCC so far, resulting in sequences like
(for the case of "a + 2048"):
li  a5,4096
addi a5,a5,-2048
add a0,a0,a5

By adding an expansion for add3, we can emit optimised RTL that
matches the capabilities of RISC-V better by adding support for the
following, previously unoptimised cases:
   - addi + addi
addi a0,a0,2047
addi a0,a0,1
   - li + sh[123]add (if Zba is enabled)
li  a5,960
sh3add  a0,a5,a0
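The two new cases can be illustrated with a small C model of how a constant might be classified. The helper names (`is_simm12`, `split_2simm12`, `split_shifted123`, `sh_n_add`) are hypothetical and are not the predicates added to predicates.md. `split_2simm12` mirrors the saturating split used in this patch's expander, while the largest-shift-first search order in `split_shifted123` is an assumption; values that already fit a single `addi` are assumed to be handled before either helper is consulted.

```c
#include <stdbool.h>
#include <stdint.h>

#define IMM_BITS 12 /* width of a RISC-V I-type immediate */

/* True if V fits a signed 12-bit immediate, i.e. -2048 <= V <= 2047.  */
static bool
is_simm12 (int64_t v)
{
  int64_t lim = INT64_C (1) << (IMM_BITS - 1);
  return v >= -lim && v < lim;
}

/* addi + addi: saturate the first immediate at 2047 (or at -2048 for a
   negative VAL) and accept VAL only if the remainder is a simm12 too.  */
static bool
split_2simm12 (int64_t val, int64_t *first, int64_t *second)
{
  int64_t saturated = -(INT64_C (1) << (IMM_BITS - 1)); /* -2048 */
  if (val >= 0)
    saturated = ~saturated;                              /* 2047 */
  int64_t rest = val - saturated;
  if (!is_simm12 (rest))
    return false;
  *first = saturated;
  *second = rest;
  return true;
}

/* li + shNadd: find N in {3, 2, 1} (largest first) such that
   VAL == BASE << N with BASE a simm12, so "li" is one instruction.
   Returns N, or 0 if no such split exists.  */
static int
split_shifted123 (int64_t val, int64_t *base)
{
  for (int n = 3; n >= 1; n--)
    {
      int64_t scale = INT64_C (1) << n;
      if (val % scale == 0 && is_simm12 (val / scale))
        {
          *base = val / scale;
          return n;
        }
    }
  return 0;
}

/* Zba semantics: shNadd rd, rs1, rs2 computes rs2 + (rs1 << N).  */
static int64_t
sh_n_add (int n, int64_t rs1, int64_t rs2)
{
  return (int64_t) ((uint64_t) rs2 + ((uint64_t) rs1 << n));
}
```

For example, 2048 splits into 2047 + 1 (the addi pair listed above), and 7680 yields shift 3 with base 960, i.e. the `li a5,960` / `sh3add` pair.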

With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
the expander will otherwise wrap the resulting set-expression in an
insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.

This closes the gap to LLVM, which has already been emitting these
optimised sequences.

Note that this benefits perlbench (in SPEC CPU 2017), which needs
to add the constant 3840.

gcc/ChangeLog:

* config/riscv/bitmanip.md (*shNadd): Rename.
(riscv_shNadd): Expose as gen_riscv_shNadd{di/si}.
* config/riscv/predicates.md (const_arith_shifted123_operand):
New predicate (for constants that are a simm12, shifted by
1, 2 or 3).
(const_arith_2simm12_operand): New predicate (that can be
expressed by adding 2 simm12 together).
(addi_operand): New predicate (an immediate operand suitable
for the new add3 expansion).
* config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
Don't use gen_add3_insn, where a RTX instead of an INSN is
required (otherwise this will break as soon as we have a
define_expand for add3).
(riscv_adjust_libcall_cfi_epilogue): Same.
* config/riscv/riscv.md (addsi3): Rename.
(riscv_addsi3): New name for addsi3.
(adddi3): Rename.
(riscv_adddi3): New name for adddi3.
(add3): New expander that handles the basic and fancy
(such as li+sh[123]add, addi+addi, ...) cases for adding
register-register and register-const_int.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/addi.c: New test.
* gcc.target/riscv/zba-shNadd-06.c: New test.

Signed-off-by: Philipp Tomsich 
---

  gcc/config/riscv/bitmanip.md  |  2 +-
  gcc/config/riscv/predicates.md| 28 +
  gcc/config/riscv/riscv.cc | 10 ++--
  gcc/config/riscv/riscv.md | 58 ++-
  gcc/testsuite/gcc.target/riscv/addi.c | 39 +
  .../gcc.target/riscv/zba-shNadd-06.c  | 11 
  6 files changed, 141 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c



diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 171a0cdced6..289ff7470c6 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -464,6 +464,60 @@
[(set_attr "type" "arith")
 (set_attr "mode" "DI")])
  
+(define_expand "add3"

+  [(set (match_operand:GPR   0 "register_operand"  "=r,r")
+   (plus:GPR (match_operand:GPR 1 "register_operand"  " r,r")
+ (match_operand:GPR 2 "addi_operand"  " r,I")))]
+  ""
+{
+  if (arith_operand (operands[2], mode))
+emit_insn (gen_riscv_add3 (operands[0], operands[1], operands[2]));
+  else if (const_arith_2simm12_operand (operands[2], mode))
+{
+  /* Split into two immediates that add up to the desired value:
+   * e.g., break up "a + 2445" into:
+   * addi  a0,a0,2047
+   * addi  a0,a0,398
+   */


Nit.  GNU comment style please.



+
+  HOST_WIDE_INT val = INTVAL (operands[2]);
+  HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
+
+  if (val >= 0)
+saturated = ~saturated;
+
+  val -= saturated;
+
+  rtx tmp = gen_reg_rtx (mode);


Can't add3 be generated by LRA?  If so, don't you have to guard 
against going into this path as we shouldn't be creating new pseudos at 
that point (I know LRA can create some internally, but I don't think it 
handles new ones showing up due to target expanders).



Similarly for the shifted_123 case immediately following.


If we do indeed have an issue here, I'm not sure how best to resolve.  
If the output operand does not overlap with the inputs, then we're 
golden and can just re-use it to form the constant.  If not,  then it's 
a bit tougher.  I'm not keen to add a test of no_new_pseudos to the 
operand predicate, but I don't see a better option yet.



jeff




Re: [PATCH v2] RISC-V: No extensions for SImode min/max against safe constant

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Fri, 18 Nov 2022 at 21:11, Jeff Law  wrote:

>
> On 11/8/22 17:06, Philipp Tomsich wrote:
> > Optimize the common case of a SImode min/max against a constant
> > that is safe both for sign- and zero-extension.
> > E.g., consider the case
> >int f(unsigned int* a)
> >{
> >  const int C = 1000;
> >  return *a * 3 > C ? C : *a * 3;
> >}
> > where the constant C will yield the same result in DImode whether
> > sign- or zero-extended.
> >
> > This should eventually go away once the lowering to RTL smartens up
> > and considers the precision/signedness and the value-ranges of the
> > operands to MIN_EXPR and MAX_EXPR.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*minmax): Additional pattern for
> >min/max against constants that are extension-invariant.
> >   * config/riscv/iterators.md (minmax_optab): Add an iterator
> > that has only min and max rtl.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zbb-min-max-02.c: New test.
>
> Ok
>
> jeff
>
>
>


Re: [PATCH v2] libcpp: Avoid remapping filenames within directives

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/2/22 04:47, Richard Purdie via Gcc-patches wrote:

Code such as:

  #include __FILE__

can interact poorly with the *-prefix-map options when cross compiling. In
general, you want to remap filenames for use in the target context, but the
local paths should be used to find include files at compile time. Ignoring
filename remapping within directives avoids such failures.

Fix this to improve such usage and then document this against file-prefix-map
(referenced by the other *-prefix-map options) to make the behaviour clear
and defined.
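The intended behaviour can be modelled with a short C sketch. `remap_file_macro` and its `in_directive` flag are hypothetical and do not reflect libcpp's internals; the sketch only shows that a `__FILE__` expansion is remapped in ordinary code but left as the local path when it is expanded inside a directive.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a single -ffile-prefix-map=OLD=NEW entry applied
   to the expansion of __FILE__.  Outside directives the path is rewritten
   for the target; inside a directive such as "#include __FILE__" the
   local path is kept so the file can still be found at compile time.  */
static void
remap_file_macro (const char *path, const char *old_prefix,
                  const char *new_prefix, bool in_directive,
                  char *out, size_t out_size)
{
  size_t len = strlen (old_prefix);
  if (!in_directive && strncmp (path, old_prefix, len) == 0)
    snprintf (out, out_size, "%s%s", new_prefix, path + len);
  else
    snprintf (out, out_size, "%s", path);
}
```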

libcpp/ChangeLog:

 * macro.cc (_cpp_builtin_macro_text): Don't remap filenames within directives.

gcc/ChangeLog:

 * doc/invoke.texi: Document prefix-maps don't affect directives


Thanks.  Installed.  Sorry about the wait.

jeff




Re: [PATCH v2] RISC-V: No extensions for SImode min/max against safe constant

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 17:06, Philipp Tomsich wrote:

Optimize the common case of a SImode min/max against a constant
that is safe both for sign- and zero-extension.
E.g., consider the case
   int f(unsigned int* a)
   {
 const int C = 1000;
 return *a * 3 > C ? C : *a * 3;
   }
where the constant C will yield the same result in DImode whether
sign- or zero-extended.

This should eventually go away once the lowering to RTL smartens up
and considers the precision/signedness and the value-ranges of the
operands to MIN_EXPR and MAX_EXPR.
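What makes a constant "safe" here can be checked directly. The C sketch below is only an illustration (the function names are our own, not the predicate used by the new `*minmax` pattern); it shows that a 32-bit constant such as the `C = 1000` above has the same DImode value under sign- and zero-extension exactly when its bit 31 is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign- and zero-extend a 32-bit value to 64 bits (the conversion to
   int32_t relies on GCC's modulo semantics for out-of-range values).  */
static int64_t sext32 (uint32_t x) { return (int64_t) (int32_t) x; }
static int64_t zext32 (uint32_t x) { return (int64_t) x; }

/* A constant is extension-invariant when both extensions give the same
   64-bit value; that holds exactly when bit 31 is clear, i.e. for
   0 <= C <= 0x7fffffff.  */
static bool
extension_invariant_p (uint32_t c)
{
  return sext32 (c) == zext32 (c);
}
```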

gcc/ChangeLog:

* config/riscv/bitmanip.md (*minmax): Additional pattern for
   min/max against constants that are extension-invariant.
* config/riscv/iterators.md (minmax_optab): Add an iterator
  that has only min and max rtl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-min-max-02.c: New test.


Ok

jeff




Re: [PATCH] RISC-V: Fix RVV testcases.

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/5/22 18:13, Kito Cheng via Gcc-patches wrote:

An alternative fix for those testcases has been posted:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605126.html


Did this ever get addressed, in either form?


jeff




Re: [PATCH] constexprify some tree variables

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/18/22 11:05, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Since we use C++11 by default now, we can
use constexpr for some const decls in tree-core.h.

This patch does that and it allows for better optimizations
of GCC code with checking enabled and without LTO.

For example, compiling generic-match.cc is sped up due to
fewer basic blocks and less debugging info being produced.
I did not compare compile times for the same source; rather, I
compared compiling the old vs. new sources (with the same
compiler base).

The small slowdown in parsing the arrays in each TU
is mitigated by a speedup from producing less code/debugging
info in the end.

Note I looked at generic-match.cc since it is one of the
compiling sources which causes parallel building to stall and
I wanted to speed it up.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Or should this wait until GCC 13 branches off?

gcc/ChangeLog:

PR middle-end/14840
* tree-core.h (tree_code_type): Constexprify
by including all-tree.def.
(tree_code_length): Likewise.
* tree.cc (tree_code_type): Remove.
(tree_code_length): Remove.


I would have preferred this a week ago :-)   And if it was just 
const-ifying, I'd ACK it without hesitation.


Can you share any of the build-time speedups you're seeing, even if 
they're not perfect?  It'd help to get a sense of the potential gain 
here and whether or not there's enough gain to gate it into gcc-13 or 
have it wait for gcc-14.



And if we can improve the compile-time of the files generated by 
match.pd, that's a win.  It's definitely a serialization point -- it 
becomes *painfully* obvious when doing a bootstrap using qemu, when that 
file takes 1-2hrs after everything else has finished.



Jeff


Re: [PATCH v2 0/2] Use Zbs with xori/ori/andi and polarity-reversed twobit-tests

2022-11-18 Thread Philipp Tomsich
(Both) applied to master. Thanks!
--Philipp.

On Fri, 18 Nov 2022 at 20:13, Jeff Law  wrote:

>
> On 11/18/22 04:09, Philipp Tomsich wrote:
> > We had a few patches on the list that shared predicates (for extending
> > the reach of xori and ori -- and for the branches on two
> > polarity-reversed bits) and thus depended on each other.
> >
> > These all had approval with requested changes, so these are now
> > collected together for v2.
> >
> > Note that this adds the (a & ~C) case, so please take a look at that
> > part and OK the updated series.
> >
> >
> >
> > Changes in v2:
> > - Collects already approved changes for v2 for (a | C) and (a ^ C).
> > - Pulls in the (already) approved branch on polarity-reversed bits
> >for v2, as it shares predicates with the other changes.
> > - Newly adds support for the (a & ~C) case.
> >
> > Philipp Tomsich (2):
> >RISC-V: Use bseti/bclri/binvi to extend reach of ori/andi/xori
> >RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs
> >
> >   gcc/config/riscv/bitmanip.md  | 79 +++
> >   gcc/config/riscv/iterators.md |  8 ++
> >   gcc/config/riscv/predicates.md| 33 
> >   gcc/config/riscv/riscv.h  |  8 ++
> >   .../riscv/{zbs-bclri.c => zbs-bclri-01.c} |  0
> >   gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c | 27 +++
> >   gcc/testsuite/gcc.target/riscv/zbs-binvi.c| 22 ++
> >   gcc/testsuite/gcc.target/riscv/zbs-bseti.c| 27 +++
> >   .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> >   9 files changed, 224 insertions(+)
> >   rename gcc/testsuite/gcc.target/riscv/{zbs-bclri.c => zbs-bclri-01.c} (100%)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-binvi.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
>
> 1/2 and 2/2 are both OK.
>
> jeff
>
>


Re: [PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-18 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 20:52, Jeff Law  wrote:

> Something to consider.  We're gaining a lot of
>
> (subreg:SI (reg:DI) 0) kinds of operands.
>
>
> Would it make sense to make an operand predicate that accepted
>
> (reg:SI) or (subreg:SI (reg:DI) 0)?
>
>
> It will reduce my complaints about subregs :-)  But the real reason I'm
> suggesting we consider adding such a predicate is, AFAICT, it gives
> combine a chance to eliminate the subreg.  I haven't actually tested
> this, but it seems like it might be worth a quick experiment independent
> of these patches (and probably targeted towards gcc-14 rather than gcc-13).
>

I like the idea. Definitively something to consider. We'll give this a try.
--Philipp.


Re: [PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks.
--Philipp.


On Fri, 18 Nov 2022 at 20:52, Jeff Law  wrote:

>
> On 11/8/22 12:57, Philipp Tomsich wrote:
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md: Handle corner-cases for combine
> >   when chaining slli(.uw)? + addw
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zba-shNadd-04.c: New test.
>
> OK.
>
> Something to consider.  We're gaining a lot of
>
> (subreg:SI (reg:DI) 0) kinds of operands.
>
>
> Would it make sense to make an operand predicate that accepted
>
> (reg:SI) or (subreg:SI (reg:DI) 0)?
>
>
> It will reduce my complaints about subregs :-)  But the real reason I'm
> suggesting we consider adding such a predicate is, AFAICT, it gives
> combine a chance to eliminate the subreg.  I haven't actually tested
> this, but it seems like it might be worth a quick experiment independent
> of these patches (and probably targeted towards gcc-14 rather than gcc-13).
>
>
>
> jeff
>
>


Re: [PATCH] RISC-V: split to allow formation of sh[123]add before divw

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Fri, 18 Nov 2022 at 20:37, Jeff Law  wrote:

>
> On 11/8/22 12:56, Philipp Tomsich wrote:
> > When using strength-reduction, we will reduce a multiplication to a
> > sequence of shifts and adds.  If this is performed with 32-bit types
> > and followed by a division, the lack of w-form sh[123]add will make
> > combination impossible and lead to a slli + addw being generated.
> >
> > Split the sequence with the knowledge that a w-form div will perform
> > implicit sign-extensions.
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/bitmanip.md: Add a define_split to optimize
> >slliw + addiw + divw into sh[123]add + divw.
> >
> > gcc/testsuite/ChangeLog:
> >
> >  * gcc.target/riscv/zba-shNadd-05.c: New test.
>
> OK.  I won't complain about the subregs on this one :-)
>
>
> jeff
>
>
>


Re: [PATCH] RISC-V: Optimize branches testing a bit-range or a shifted immediate

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Fri, 18 Nov 2022 at 20:30, Jeff Law  wrote:

>
> On 11/8/22 13:46, Philipp Tomsich wrote:
> > gcc/ChangeLog:
> >
> >   * config/riscv/predicates.md (shifted_const_arith_operand):
> >   (uimm_extra_bit_operand):
> >   * config/riscv/riscv.md (*branch_shiftedarith_equals_zero):
> >   (*branch_shiftedmask_equals_zero):
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/branch-1.c: New test.
>
> Nice...It seems so obvious, but I'm not offhand aware of other ports
> doing this, though many could likely benefit.
>
> OK
>
>
> jeff
>
>
>


Re: [PATCH] RISC-V: allow bseti on SImode without sign-extension

2022-11-18 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Fri, 18 Nov 2022 at 20:26, Jeff Law  wrote:

>
> On 11/8/22 13:03, Philipp Tomsich wrote:
> > As long as the SImode operand is not a partial subreg, we can use a
> > bseti without postprocessing to OR in a bit, as the middle end is
> > smart enough to stay away from the signbit.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*bsetidisi): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zbs-bexti-02.c: New test.
>
> OK, with my usual grumble about SUBREGs.
>
> jeff
>
>
>


Re: [PATCH] RISC-V: Optimize slli(.uw)? + addw + zext.w into sh[123]add + zext.w

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 12:57, Philipp Tomsich wrote:

gcc/ChangeLog:

* config/riscv/bitmanip.md: Handle corner-cases for combine
when chaining slli(.uw)? + addw

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zba-shNadd-04.c: New test.


OK.

Something to consider.  We're gaining a lot of

(subreg:SI (reg:DI) 0) kinds of operands.


Would it make sense to make an operand predicate that accepted

(reg:SI) or (subreg:SI (reg:DI) 0)?


It will reduce my complaints about subregs :-)  But the real reason I'm 
suggesting we consider adding such a predicate is, AFAICT, it gives 
combine a chance to eliminate the subreg.  I haven't actually tested 
this, but it seems like it might be worth a quick experiment independent 
of these patches (and probably targeted towards gcc-14 rather than gcc-13).




jeff



Re: [PATCH v2] genmultilib: Add sanity check

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/3/22 03:52, Christophe Lyon via Gcc-patches wrote:

When a list of dirnames is provided to genmultilib, its length is
expected to match the number of options.  If this is not the case, the
build fails later for reasons not obviously related to this mistake.
This patch adds a sanity check to help diagnose such cases.

Tested by adding an option to t-aarch64 and no corresponding dirname,
with both bash and dash.

v2: do not use arrays (bash feature).

OK for trunk?

gcc/ChangeLog:

* genmultilib: Add sanity check.


OK.  It should be interesting to see if it trips.


jeff




Re: [PATCH] RISC-V: split to allow formation of sh[123]add before divw

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 12:56, Philipp Tomsich wrote:

When using strength-reduction, we will reduce a multiplication to a
sequence of shifts and adds.  If this is performed with 32-bit types
and followed by a division, the lack of w-form sh[123]add will make
combination impossible and lead to a slli + addw being generated.

Split the sequence with the knowledge that a w-form div will perform
implicit sign-extensions.
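A C model of the relevant RV64 semantics shows why the intermediate `addw` is not needed before a w-form division. The function names are our own; `sh_n_add` follows the Zba definition and `divw` reads only the low 32 bits of its operands (the model skips divw's defined results for division by zero and overflow). Multiplying by 5 strength-reduces to `sh2add x, x`, and because `divw` ignores the upper half, the 64-bit `sh2add` result can feed it directly.

```c
#include <stdint.h>

/* Zba: shNadd rd, rs1, rs2 computes rs2 + (rs1 << N) on the full
   64-bit registers; there is no w-form, so the upper bits are kept.  */
static uint64_t
sh_n_add (int n, uint64_t rs1, uint64_t rs2)
{
  return rs2 + (rs1 << n);
}

/* RV64 divw: divide the low 32 bits of each operand as signed values and
   sign-extend the 32-bit quotient.  The upper 32 bits of the inputs are
   ignored.  This model does not handle division by zero or
   INT32_MIN / -1.  */
static int64_t
divw (uint64_t rs1, uint64_t rs2)
{
  int32_t a = (int32_t) (uint32_t) rs1;
  int32_t b = (int32_t) (uint32_t) rs2;
  return (int64_t) (a / b);
}
```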

gcc/ChangeLog:

 * config/riscv/bitmanip.md: Add a define_split to optimize
   slliw + addiw + divw into sh[123]add + divw.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zba-shNadd-05.c: New test.


OK.  I won't complain about the subregs on this one :-)


jeff




Re: [PATCH] RISC-V: Optimize branches testing a bit-range or a shifted immediate

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 13:46, Philipp Tomsich wrote:

gcc/ChangeLog:

* config/riscv/predicates.md (shifted_const_arith_operand):
(uimm_extra_bit_operand):
* config/riscv/riscv.md (*branch_shiftedarith_equals_zero):
(*branch_shiftedmask_equals_zero):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/branch-1.c: New test.
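The ChangeLog only names the new predicates and patterns, so the following is an assumption about the underlying idea rather than a description of them: a test such as `(a & 0xff00) == 0` cannot use `andi` directly because 0xff00 is not a simm12, but the same condition holds after shifting the value right, where the remaining mask does fit. A C sketch of that equivalence (function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition as written: the mask (e.g. 0xff00) may be too wide for an
   andi immediate.  */
static bool
masked_is_zero (uint64_t a, uint64_t mask, unsigned shift)
{
  return (a & (mask << shift)) == 0;
}

/* Equivalent form: shift the value down first so that only the small
   mask (e.g. 0xff) has to be used as an immediate.  */
static bool
shifted_masked_is_zero (uint64_t a, uint64_t mask, unsigned shift)
{
  return ((a >> shift) & mask) == 0;
}
```

On RISC-V the second form could presumably be emitted as `srli` + `andi` + a branch on zero.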


Nice...    It seems so obvious, but I'm not offhand aware of other ports 
doing this, though many could likely benefit.


OK


jeff




Re: [PATCH] RISC-V: allow bseti on SImode without sign-extension

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/8/22 13:03, Philipp Tomsich wrote:

As long as the SImode operand is not a partial subreg, we can use a
bseti without postprocessing to OR in a bit, as the middle end is
smart enough to stay away from the signbit.
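Why no post-processing is needed can be checked with a small C model (the names are our own). On RV64 an SImode value is held sign-extended from bit 31; setting any bit below 31 leaves bits 31..63 untouched, so the result is still correctly sign-extended. Only bit 31 would break the invariant, which is the sign bit the description says the middle end avoids.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of X (GCC's modulo conversion to int32_t).  */
static uint64_t
sext32 (uint64_t x)
{
  return (uint64_t) (int64_t) (int32_t) (uint32_t) x;
}

/* RV64 bseti rd, rs1, k: set bit K of the full 64-bit register.  */
static uint64_t
bseti (uint64_t rs1, unsigned k)
{
  return rs1 | (UINT64_C (1) << k);
}
```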

gcc/ChangeLog:

* config/riscv/bitmanip.md (*bsetidisi): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bexti-02.c: New test.


OK, with my usual grumble about SUBREGs.

jeff




Re: [PATCH v2 0/2] Use Zbs with xori/ori/andi and polarity-reversed twobit-tests

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/18/22 04:09, Philipp Tomsich wrote:

We had a few patches on the list that shared predicates (for extending
the reach of xori and ori -- and for the branches on two
polarity-reversed bits) and thus depended on each other.

These all had approval with requested changes, so these are now
collected together for v2.

Note that this adds the (a & ~C) case, so please take a look at that
part and OK the updated series.



Changes in v2:
- Collects already approved changes for v2 for (a | C) and (a ^ C).
- Pulls in the (already) approved branch on polarity-reversed bits
   for v2, as it shares predicates with the other changes.
- Newly adds support for the (a & ~C) case.

Philipp Tomsich (2):
   RISC-V: Use bseti/bclri/binvi to extend reach of ori/andi/xori
   RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

  gcc/config/riscv/bitmanip.md  | 79 +++
  gcc/config/riscv/iterators.md |  8 ++
  gcc/config/riscv/predicates.md| 33 
  gcc/config/riscv/riscv.h  |  8 ++
  .../riscv/{zbs-bclri.c => zbs-bclri-01.c} |  0
  gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c | 27 +++
  gcc/testsuite/gcc.target/riscv/zbs-binvi.c| 22 ++
  gcc/testsuite/gcc.target/riscv/zbs-bseti.c| 27 +++
  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
  9 files changed, 224 insertions(+)
  rename gcc/testsuite/gcc.target/riscv/{zbs-bclri.c => zbs-bclri-01.c} (100%)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-binvi.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
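The identity behind the second patch can be sketched in C. This only shows why a "polarity-reversed" two-bit test reduces to two single-bit tests (bits that Zbs instructions such as `bexti` can extract); it is not the instruction sequence the patch emits, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* "(a & twobits) == singlebit" with twobits = (1 << set_bit) | (1 << clear_bit)
   and singlebit = 1 << set_bit.  Requires set_bit != clear_bit.  */
static bool
twobit_test (uint64_t a, unsigned set_bit, unsigned clear_bit)
{
  uint64_t twobits = (UINT64_C (1) << set_bit) | (UINT64_C (1) << clear_bit);
  return (a & twobits) == (UINT64_C (1) << set_bit);
}

/* The same condition as two single-bit tests of opposite polarity.  */
static bool
twobit_test_split (uint64_t a, unsigned set_bit, unsigned clear_bit)
{
  return ((a >> set_bit) & 1) == 1 && ((a >> clear_bit) & 1) == 0;
}
```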


1/2 and 2/2 are both OK.

jeff



[PATCH] gomp: Various fixes for SVE types [PR101018]

2022-11-18 Thread Richard Sandiford via Gcc-patches
[I posted this late in stage 4 as an RFC, but it wasn't suitable for
GCC 12 at that point.  I kind-of dropped the ball after that, sorry.]

Various parts of the omp code checked whether the size of a decl
was an INTEGER_CST in order to determine whether the decl was
variable-sized or not.  If it was variable-sized, it was expected
to have a DECL_VALUE_EXPR replacement, as for VLAs.

This patch uses poly_int_tree_p instead, so that variable-length
SVE vectors are treated like constant-length vectors.  This means
that some structures become poly_int-sized, with some fields at
poly_int offsets, but we already have code to handle that.
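To make "poly_int-sized" concrete, here is a hedged C model. The `struct poly` type and both functions are hypothetical simplifications of GCC's poly_int types; the `poly_and_const` rule is only an example of when a BIT_AND_EXPR on a POLY_INT_CST could fold to a constant, not the actual `can_and_p` implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical degree-1 poly_int: value = c0 + c1 * x, where x is a
   runtime indeterminate (for SVE, derived from the vector length).  */
struct poly
{
  int64_t c0, c1;
};

/* An INTEGER_CST-style check only accepts c1 == 0; a poly_int_tree_p-style
   check also accepts sizes with a runtime term.  */
static bool
poly_is_constant (struct poly p)
{
  return p.c1 == 0;
}

/* Fold P & MASK to a constant when possible: if c1 is a multiple of 2^k,
   then c1 * x never touches the low k bits, so a MASK below 2^k sees only
   c0.  */
static bool
poly_and_const (struct poly p, uint64_t mask, int64_t *result)
{
  uint64_t c1 = (uint64_t) p.c1;
  unsigned k = 0;
  if (c1 == 0)
    k = 64;
  else
    while (((c1 >> k) & 1) == 0)
      k++;
  if (k < 64 && (mask >> k) != 0)
    return false; /* MASK reaches bits that depend on x */
  *result = (int64_t) ((uint64_t) p.c0 & mask);
  return true;
}
```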

An alternative would have been to handle the data via indirection
instead.  However, that's likely to be more complicated, and it
would contradict is_variable_sized, which already uses a check
for TREE_CONSTANT rather than INTEGER_CST.

gimple_add_tmp_var should probably not add a safelen of 1
for SVE vectors, but that's really a separate thing and might
be hard to test.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR middle-end/101018
* poly-int.h (can_and_p): New function.
* fold-const.cc (poly_int_binop): Use it to optimize BIT_AND_EXPRs
involving POLY_INT_CSTs.
* expr.cc (get_inner_reference): Fold poly_uint64 size_trees
into the constant bitsize.
* gimplify.cc (gimplify_bind_expr): Use poly_int_tree_p instead
of INTEGER_CST when checking for constant-sized omp data.
(omp_add_variable): Likewise.
(omp_notice_variable): Likewise.
(gimplify_adjust_omp_clauses_1): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_omp_target): Likewise.

gcc/testsuite/
PR middle-end/101018
* gcc.target/aarch64/sve/acle/pr101018-1.c: New test.
* gcc.target/aarch64/sve/acle/pr101018-2.c: Likewise.
---
 gcc/expr.cc   |  4 +--
 gcc/fold-const.cc |  7 +
 gcc/gimplify.cc   | 23 
 gcc/omp-low.cc| 10 +++
 gcc/poly-int.h| 19 +
 .../aarch64/sve/acle/general/pr101018-1.c | 27 +++
 .../aarch64/sve/acle/general/pr101018-2.c | 23 
 7 files changed, 94 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr101018-2.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..a304c583d16 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7941,10 +7941,10 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
 
   if (size_tree != 0)
 {
-  if (! tree_fits_uhwi_p (size_tree))
+  if (! tree_fits_poly_uint64_p (size_tree))
mode = BLKmode, *pbitsize = -1;
   else
-   *pbitsize = tree_to_uhwi (size_tree);
+   *pbitsize = tree_to_poly_uint64 (size_tree);
 }
 
   *preversep = reverse_storage_order_for_component_p (exp);
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index b89cac91cae..000600017e2 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -1183,6 +1183,13 @@ poly_int_binop (poly_wide_int , enum tree_code code,
return false;
   break;
 
+case BIT_AND_EXPR:
+  if (TREE_CODE (arg2) != INTEGER_CST
+ || !can_and_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
+))
+   return false;
+  break;
+
 case BIT_IOR_EXPR:
   if (TREE_CODE (arg2) != INTEGER_CST
  || !can_ior_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index f06ce3cc77a..096738c8ed4 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7352,7 +7352,7 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   /* When adding a variable-sized variable, we have to handle all sorts
  of additional bits of data: the pointer replacement variable, and
  the parameters of the type.  */
-  if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+  if (DECL_SIZE (decl) && !poly_int_tree_p (DECL_SIZE (decl)))
 {
   /* Add the pointer replacement variable as PRIVATE if the variable
 replacement is private, else FIRSTPRIVATE since we'll need the
@@ -8002,7 +8002,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
   && (flags & (GOVD_SEEN | GOVD_LOCAL)) == GOVD_SEEN
   && DECL_SIZE (decl))
 {
-  if (TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+  tree size;
+  if (!poly_int_tree_p (DECL_SIZE (decl)))
{
  splay_tree_node n2;
  tree t = DECL_VALUE_EXPR (decl);
@@ -8013,16 +8014,14 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
  n2->value |= GOVD_SEEN;
}

Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 18 Nov 2022 11:06:29 -0500
Jason Merrill  wrote:

> Ah, so the problem is deferred parsing of methods, rather than 
> templates.  Building the DECL_RESULT sooner does seem like the right 
> approach to handling that, whether that's in grokfndecl or grokmethod.

> >> I'd like to get the template case right while we're looking at it.  I
> >> guess I can add that myself if you're done trying.

Please do, I'd be glad if you could take care of these locations.
It irks me that they are wrong, even if only for the sake of QOI :)

> >>> Is the hunk for normal functions OK for trunk?  
> >>
> >> You also need a testcase for the desired behavior, with e.g.
> >> { dg-error "23:" }  
> > 
> > I'd have to think about how to test that with trunk, yes.
> > There are no existing warnings that want to point to the return type,
> > are there?  
> 
> Good point.  Do any of your later patches add such a warning?

I didn't mean to have that -Wtype-demotion applied in its current
form, or at all, so no. I was curious if anybody liked the idea of
pointing out such code though. I've had no feedback but everybody is or
was busy with end of stage3 and real work, so that's expected. The only
real purpose i had for it was to find places in the Fortran FE that
could use narrower types, bools for the most part.
IMHO it would be a nice thing to have, but then, embedded software
usually is cautious to use sensible types in the first place and the
rest doesn't really care anyway, supposedly.

Maybe it would have made more sense to just do an IPA pass that does the
demotion silently where it's feasible.

As to the test, I don't think these locations in the c++ FE are changed
all that often, so chances are rather low that they would be broken
once in.
So, short of trying to use the result decl locus for any existing
-Wreturn-type, -Waggregate-return, -Wno-return-local-addr,
-Wsuggest-attribute=[pure|const|noreturn|format|malloc] or another
existing warning that would be concerned, we could, as said, have a
plugin with fix-it hints and ideally -fdiagnostics-generate-patch to
test these bits. Patch generation has the advantage that it will ICE
more often than not if asked to generate patches for locations that
have a negative relative start (think: memcpy(...,..., -7)), which you
can get easily if the locations are off IMHO.

> > Maybe a g++.dg/plugin/result_decl_plugin.c then.


Re: [PATCH RFA] libstdc++: add experimental Contracts support

2022-11-18 Thread Jonathan Wakely via Gcc-patches

On 03/11/22 15:57 -0400, Jason Merrill wrote:

Tested x86_64-pc-linux-gnu.  OK for trunk?

-- >8 --

This patch adds the library support for the experimental C++ Contracts
implementation.  This now consists only of a default definition of the
violation handler, which users can override through defining their own
version.  To avoid ABI stability problems with libstdc++.so this is added to
a separate -lstdc++exp static library, which the driver knows to add when it
sees -fcontracts.

libstdc++-v3/ChangeLog:

* acinclude.m4 (glibcxx_SUBDIRS): Add src/experimental.
* include/Makefile.am (experimental_headers): Add contract.
* include/Makefile.in: Regenerate.
* src/Makefile.am (SUBDIRS): Add experimental.
* src/Makefile.in: Regenerate.
* configure: Regenerate.
* src/experimental/contract.cc: New file.
* src/experimental/Makefile.am: New file.
* src/experimental/Makefile.in: New file.
* include/experimental/contract: New file.
---
libstdc++-v3/src/experimental/contract.cc  |  41 ++
libstdc++-v3/acinclude.m4  |   2 +-
libstdc++-v3/include/Makefile.am   |   1 +
libstdc++-v3/include/Makefile.in   |   1 +
libstdc++-v3/src/Makefile.am   |   3 +-
libstdc++-v3/src/Makefile.in   |   6 +-
libstdc++-v3/src/experimental/Makefile.am  |  96 +++
libstdc++-v3/src/experimental/Makefile.in  | 796 +
libstdc++-v3/include/experimental/contract |  84 +++
9 files changed, 1026 insertions(+), 4 deletions(-)
create mode 100644 libstdc++-v3/src/experimental/contract.cc
create mode 100644 libstdc++-v3/src/experimental/Makefile.am
create mode 100644 libstdc++-v3/src/experimental/Makefile.in
create mode 100644 libstdc++-v3/include/experimental/contract


base-commit: a4cd2389276a30c39034a83d640ce68fa407bac1
prerequisite-patch-id: 329bc16a88dc9a3b13cd3fcecb3678826cc592dc

diff --git a/libstdc++-v3/src/experimental/contract.cc 
b/libstdc++-v3/src/experimental/contract.cc
new file mode 100644
index 000..b9b72cd7df0
--- /dev/null
+++ b/libstdc++-v3/src/experimental/contract.cc
@@ -0,0 +1,41 @@
+// -*- C++ -*- std::experimental::contract_violation and friends
+// Copyright (C) 1994-2022 Free Software Foundation, Inc.


Copied from an old file? I don't think this uses any existing
code, so it should be just 2022.


+//
+// This file is part of GCC.
+//
+// GCC is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// GCC is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+#include 
+#include 
+
+__attribute__ ((weak)) void
+handle_contract_violation (const std::experimental::contract_violation 
)
+{
+  std::cerr << "default std::handle_contract_violation called: " << std::endl


No need for flushing with endl here, just \n please.


+<< " " << violation.file_name()
+<< " " << violation.line_number()
+<< " " << violation.function_name()
+<< " " << violation.comment()
+<< " " << violation.assertion_level()
+<< " " << violation.assertion_role()
+<< " " << (int)violation.continuation_mode()
+<< std::endl;


And this will flush too, which typically isn't needed for stderr
because it's unbuffered. But somebody could have fiddled with cerr, so
doing this final flush seems OK.


+}
+
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 6f672924a73..baf01913a90 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -49,7 +49,7 @@ AC_DEFUN([GLIBCXX_CONFIGURE], [
  # Keep these sync'd with the list in Makefile.am.  The first provides an
  # expandable list at autoconf time; the second provides an expandable list
  # (i.e., shell variable) at configure time.
-  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/filesystem src/libbacktrace doc po testsuite python])
+  m4_define([glibcxx_SUBDIRS],[include libsupc++ src src/c++98 src/c++11 
src/c++17 src/c++20 src/filesystem src/libbacktrace src/experimental doc po 
testsuite python])
  SUBDIRS='glibcxx_SUBDIRS'

  # These need to be absolute paths, yet at the same time need to
diff --git a/libstdc++-v3/include/Makefile.am 

[PATCH] constexprify some tree variables

2022-11-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Since we use C++11 by default now, we can
use constexpr for some const decls in tree-core.h.

This patch does that and it allows for better optimizations
of GCC code with checking enabled and without LTO.

For example, compiling generic-match.cc is sped up because fewer
basic blocks and less debugging info are produced. I did not compare
compile times for the same source, but rather compile times of the
old vs. new sources (built with the same compiler base).

The small slowdown in parsing the arrays in each TU
is mitigated by a reduction in how much code/debugging info
is produced in the end.

Note: I looked at generic-match.cc since it is one of the
sources whose compilation causes parallel builds to stall, and
I wanted to speed it up.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Or should this wait until GCC 13 branches off?

gcc/ChangeLog:

PR middle-end/14840
* tree-core.h (tree_code_type): Constexprify
by including all-tree.def.
(tree_code_length): Likewise.
* tree.cc (tree_code_type): Remove.
(tree_code_length): Remove.
---
 gcc/tree-core.h | 21 +++--
 gcc/tree.cc | 24 
 2 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af75522504f..e146b133dbd 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -2284,15 +2284,32 @@ struct floatn_type_info {
 /* Matrix describing the structures contained in a given tree code.  */
 extern bool tree_contains_struct[MAX_TREE_CODES][64];
 
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
+#define END_OF_BASE_TREE_CODES tcc_exceptional,
+
+
 /* Class of tree given its code.  */
-extern const enum tree_code_class tree_code_type[];
+constexpr enum tree_code_class tree_code_type[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
 
 /* Each tree code class has an associated string representation.
These must correspond to the tree_code_class entries.  */
 extern const char *const tree_code_class_strings[];
 
 /* Number of argument-words in each kind of tree-node.  */
-extern const unsigned char tree_code_length[];
+
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
+#define END_OF_BASE_TREE_CODES 0,
+constexpr unsigned char tree_code_length[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
 
 /* Vector of all alias pairs for global symbols.  */
 extern GTY(()) vec *alias_pairs;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 574bd2e65d9..254b2373dcf 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -74,31 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "ubsan.h"
 
-/* Tree code classes.  */
 
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
-#define END_OF_BASE_TREE_CODES tcc_exceptional,
-
-const enum tree_code_class tree_code_type[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
-
-/* Table indexed by tree code giving number of expression
-   operands beyond the fixed part of the node structure.
-   Not used for types or decls.  */
-
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
-#define END_OF_BASE_TREE_CODES 0,
-
-const unsigned char tree_code_length[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
 
 /* Names of tree components.
Used for printing out the tree and error messages.  */
-- 
2.17.1
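
The patch relies on the X-macro idiom: all-tree.def is a list of DEFTREECODE (SYM, NAME, TYPE, LENGTH) entries, and each includer defines DEFTREECODE to extract one column. Below is a self-contained C sketch of the same idiom, using a hypothetical inline list (`SHAPE_CODES`) instead of a .def file. Defining the tables in the header (here as `static const`, the closest C analogue of the patch's `constexpr`) lets every translation unit see the values, so lookups with constant indices can be folded at compile time.

```c
#include <assert.h>

/* Hypothetical code list in the style of all-tree.def:
   DEFCODE (symbol, name, class, length).  */
#define SHAPE_CODES                                 \
  DEFCODE (SQUARE, "square", CLASS_POLYGON, 4)      \
  DEFCODE (TRIANGLE, "triangle", CLASS_POLYGON, 3)  \
  DEFCODE (CIRCLE, "circle", CLASS_CURVE, 0)

enum shape_class { CLASS_POLYGON, CLASS_CURVE };

/* Expansion 1: the enumeration of codes.  */
#define DEFCODE(SYM, NAME, CLASS, LENGTH) SYM,
enum shape_code { SHAPE_CODES MAX_SHAPE_CODES };
#undef DEFCODE

/* Expansion 2: class of each code (cf. tree_code_type).  */
#define DEFCODE(SYM, NAME, CLASS, LENGTH) CLASS,
static const enum shape_class shape_code_class[] = { SHAPE_CODES };
#undef DEFCODE

/* Expansion 3: operand count of each code (cf. tree_code_length).  */
#define DEFCODE(SYM, NAME, CLASS, LENGTH) LENGTH,
static const unsigned char shape_code_length[] = { SHAPE_CODES };
#undef DEFCODE

/* All expansions come from one list, so the tables stay in sync.  */
static_assert (sizeof shape_code_length == MAX_SHAPE_CODES,
               "length table out of sync");
static_assert (sizeof shape_code_class / sizeof shape_code_class[0]
               == MAX_SHAPE_CODES, "class table out of sync");
```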



Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Andrew Stubbs

On 18/11/2022 17:41, Tobias Burnus wrote:
Attached is the updated/rediffed version, which now uses the builtin 
instead of the 'asm("s8").


The code in principle works; that is: If no private stack variables are 
copied, it works.


Or in other words: reverse-offload target regions that don't use 
firstprivate or mapping work, the rest would crash. That's avoided by 
not accepting reverse offload inside GOMP_OFFLOAD_get_num_devices for now.


To get it working, the manual stack allocation patch + the trivial 
update to that get_num_devices func is needed, but no change to the 
attached patch.


In order to reduce local patches, I would love to have it on mainline – 
otherwise, I have at least the current version in gcc-patches@.


OK with me.

Andrew


Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Andrew Stubbs

On 18/11/2022 17:20, Tobias Burnus wrote:

This patch adds two builtins (getting the end-of-stack pointer and
a Boolean answer whether it was the first call to the builtin on this
thread).


The idea is to replace some hard-coded values in newlib, making it possible
to move later to a manually allocated stack on the compiler side without the
need to modify newlib again. The GCC patch matches what newlib did in reent;
I could imagine that we change this later on.

Lightly tested (especially by visual inspection).
Currently doing a final regtest, OK when it passes?

Any comments on this patch, or on the attached newlib patch?*

Tobias

(*) I also included a patch to newlib to see where we are heading
+ to actually use them for regtesting ...


This looks wrong:


+   /* stackbase = (stack_segment_decr & 0x)
+   + stack_wave_offset);
+  seg_size = dispatch_ptr->private_segment_size;
+  stacklimit = stackbase + seg_size*64;
+  with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+  cf. struct hsa_kernel_dispatch_packet_s in the HSA doc.  */
+   rtx ptr;
+   if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+   && cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+ {
+   rtx size_rtx = gen_rtx_REG (DImode,
+   
cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+   size_rtx = gen_rtx_MEM (DImode,
+   gen_rtx_PLUS (DImode, size_rtx,
+ GEN_INT (6*16 + 3*32)));
+   size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+


seg_size is calculated from the private_segment_size loaded from the 
dispatch_ptr, not calculated from the dispatch_ptr itself.
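
To make the distinction concrete, here is a C sketch that mirrors the leading fields of the HSA kernel dispatch packet (struct hsa_kernel_dispatch_packet_s; trailing fields omitted). The field order is taken from the HSA runtime header; the helper `stack_limit` and its `stackbase` parameter (assumed to be already masked and offset as in the patch comment) are hypothetical. The offset 6*sizeof(int16_t) + 3*sizeof(int32_t) locates the field; the size itself must then be loaded from it and scaled by 64, as in the patch comment. For comparison, the quoted RTL offset 6*16 + 3*32 evaluates to 192, which appears to count the same offset in bits rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the HSA kernel dispatch packet.  */
struct dispatch_packet_prefix
{
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x, grid_size_y, grid_size_z;
  uint32_t private_segment_size;  /* Per-work-item private bytes.  */
  uint32_t group_segment_size;
};

/* 6 * sizeof (uint16_t) + 3 * sizeof (uint32_t) == 24 bytes.  */
static_assert (offsetof (struct dispatch_packet_prefix, private_segment_size)
               == 6 * sizeof (uint16_t) + 3 * sizeof (uint32_t),
               "unexpected packet layout");

/* seg_size is the value loaded from the packet, not an address
   derived from the dispatch pointer.  */
static uint64_t
stack_limit (uint64_t stackbase, const struct dispatch_packet_prefix *pkt)
{
  uint64_t seg_size = pkt->private_segment_size;
  return stackbase + seg_size * 64;
}
```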


Andrew


Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Tobias Burnus

Attached is the updated/rediffed version, which now uses the builtin
instead of the 'asm("s8").

The code in principle works; that is: If no private stack variables are
copied, it works.

Or in other words: reverse-offload target regions that don't use
firstprivate or mapping work, the rest would crash. That's avoided by
not accepting reverse offload inside GOMP_OFFLOAD_get_num_devices for now.

To get it working, the manual stack allocation patch + the trivial
update to that get_num_devices func is needed, but no change to the
attached patch.

In order to reduce local patches, I would love to have it on mainline –
otherwise, I have at least the current version in gcc-patches@.

Tobias

PS: Previous patch email quoted below. Note: there were two follow-up
emails, one by Andrew and one by me; cf. your own mail archive (of this
thread) or
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603383.html and the
next two messages in the thread.

On 12.10.22 16:29, Tobias Burnus wrote:

On 29.09.22 18:24, Andrew Stubbs wrote:

On 27/09/2022 14:16, Tobias Burnus wrote:

Andrew did suggest a while back to piggyback on the console_output
handling, avoiding another atomic access. If this is still wanted, I'd
like some guidance on how to actually implement it.

[...]
The point is that you can use the "msg" and "text" fields for
whatever data you want, as long as you invent a new value for "type".
[]
You can make "case 4" do whatever you want. There are enough bytes
for 4 pointers, and you could use multiple packets (although it's not
safe to assume they're contiguous or already arrived; maybe "case 4"
for part 1, "case 5" for part 2). It's possible to change this
structure, of course, but the target implementation is in newlib so
versioning becomes a problem.


I think (also looking at the Newlib write.c implementation) that
the data is contiguous: there is an atomic add, where instead of
passing '1' for a single slot, I could also add '2' for two slots.
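
The contiguity argument can be sketched in C11: a single atomic fetch-and-add of N hands the caller N consecutive slot indices, and no concurrent writer can be given an index inside that range. The struct and function names below are hypothetical, loosely modelled on the console_output buffer. Physical contiguity still assumes the reserved range does not wrap around the end of a ring buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical output buffer header; only the slot counter matters.  */
struct output_ring
{
  _Atomic uint32_t next_output;  /* Next unreserved slot index.  */
};

/* Reserve COUNT consecutive slots with one atomic add and return the
   index of the first one.  */
static uint32_t
reserve_slots (struct output_ring *ring, uint32_t count)
{
  assert (count > 0);
  return atomic_fetch_add (&ring->next_output, count);
}
```

A two-part message would call `reserve_slots (ring, 2)` once and fill slots `first` and `first + 1`, rather than reserving them with two separate adds that other writers could interleave with.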

Attached is one variant – for the decl of the GOMP_OFFLOAD_target_rev,
it needs the generic parts of the sister nvptx patch.*

2*128 bytes were not enough; I need 3*128 bytes (or rather 5*64 +
32). As target_ext is blocking, I decided to use a stack local
variable for the remaining arguments and pass it along. Alternatively,
I could use 2 slots and process them together. This would avoid
one device->host memory copy but would make console_output less clear.

OK for mainline?

Tobias

* https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603354.html

PS: Currently, device stack variables are private and cannot be
accessed from the host; this will change in a separate patch. It not
only affects the "rest" part as used in this patch but also the actual
arrays behind addr, kinds, and sizes. And quite likely a lot of the
map/firstprivate variables passed to addr.

As num_devices() will return 0 or -1, this is for now a non-issue.

libgomp/gcn: Prepare for reverse-offload callback handling

libgomp/ChangeLog:

	* config/gcn/libgomp-gcn.h: New file; contains
	struct output, declared previously in plugin-gcn.c.
	* config/gcn/target.c: Include it.
	(GOMP_ADDITIONAL_ICVS): Declare as extern var.
	(GOMP_target_ext): Handle reverse offload.
	* plugin/plugin-gcn.c: Include libgomp-gcn.h.
	(struct kernargs): Replace struct def by the one
	from libgomp-gcn.h for output_data.
	(process_reverse_offload): New.
	(console_output): Call it.

 libgomp/config/gcn/libgomp-gcn.h | 61 
 libgomp/config/gcn/target.c  | 44 -
 libgomp/plugin/plugin-gcn.c  | 34 --
 3 files changed, 117 insertions(+), 22 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
new file mode 100644
index 000..91560be787f
--- /dev/null
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -0,0 +1,61 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Tobias Burnus .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, 

[Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Tobias Burnus

This patch adds two builtins (getting end-of-stack pointer and
a Boolean answer whether it was the first call to the builtin on this thread).

The idea is to replace some hard-coded values in newlib, making it possible
to move later to a manually allocated stack on the compiler side without the
need to modify newlib again. The GCC patch matches what newlib did in reent;
I could imagine that we change this later on.

Lightly tested (especially by visual inspection).
Currently doing a final regtest, OK when it passes?

Any comments on this patch, or on the attached newlib patch?*

Tobias

(*) I also included a patch to newlib to see where we are heading
+ to actually use them for regtesting ...
gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

The new builtins have been added for newlib to reduce dependency on
compiler-internal implementation choices of GCC in newlib's getreent.c.

gcc/ChangeLog:

	* config/gcn/gcn-builtins.def (FIRST_CALL_THIS_THREAD_P,
	GET_STACK_LIMIT): Add new builtins.
	* config/gcn/gcn.cc (gcn_expand_builtin_1): Expand them.
	* config/gcn/gcn.md (prologue_use): Add "register_operand" as
	arg to match_operand.
	(prologue_use_di): New; DI insn_and_split variant of the former.

Co-Authored-By: Andrew Stubbs 

 gcc/config/gcn/gcn-builtins.def |  4 +++
 gcc/config/gcn/gcn.cc   | 70 -
 gcc/config/gcn/gcn.md   | 15 -
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index eeeaebf9013..f1cf30bbc94 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -160,8 +160,12 @@ DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
 
 /* Kernel inputs.  */
 
+DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
+	 _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
 	 gcn_expand_builtin_1)
+DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
+	 _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
 
 #undef _A1
 #undef _A2
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b3814c2e7c6..051eadee783 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4493,6 +4493,44 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
   emit_insn (gen_gcn_wavefront_barrier ());
   return target;
 
+case GCN_BUILTIN_GET_STACK_LIMIT:
+  {
+	/* stackbase = (stack_segment_decr & 0x)
+			+ stack_wave_offset);
+	   seg_size = dispatch_ptr->private_segment_size;
+	   stacklimit = stackbase + seg_size*64;
+	   with segsize = dispatch_ptr + 6*sizeof(int16_t) + 3*sizeof(int32_t);
+	   cf. struct hsa_kernel_dispatch_packet_s in the HSA doc.  */
+	rtx ptr;
+	if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0
+	&& cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx size_rtx = gen_rtx_REG (DImode,
+	cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+	size_rtx = gen_rtx_MEM (DImode,
+gen_rtx_PLUS (DImode, size_rtx,
+		  GEN_INT (6*16 + 3*32)));
+	size_rtx = gen_rtx_MULT (DImode, size_rtx, GEN_INT (64));
+
+	ptr = gen_rtx_REG (DImode,
+		cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	ptr = gen_rtx_AND (DImode, ptr, GEN_INT (0x));
+	ptr = gen_rtx_PLUS (DImode, ptr, size_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG] >= 0)
+	  {
+		rtx off;
+		off = gen_rtx_REG (SImode,
+		  cfun->machine->args.reg[PRIVATE_SEGMENT_WAVE_OFFSET_ARG]);
+		ptr = gen_rtx_PLUS (DImode, ptr, off);
+	  }
+	  }
+	else
+	  {
+	ptr = gen_reg_rtx (DImode);
+	emit_move_insn (ptr, const0_rtx);
+	  }
+	return ptr;
+  }
 case GCN_BUILTIN_KERNARG_PTR:
   {
 	rtx ptr;
@@ -4506,7 +4544,37 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	  }
 	return ptr;
   }
-
+case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
+  {
+	/* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
+	   whether it was the first call.  */
+	rtx result = gen_reg_rtx (BImode);
+	emit_move_insn (result, const0_rtx);
+	if (cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG] >= 0)
+	  {
+	rtx not_first = gen_label_rtx ();
+	rtx reg = gen_rtx_REG (DImode,
+			cfun->machine->args.reg[PRIVATE_SEGMENT_BUFFER_ARG]);
+	rtx cmp = force_reg (DImode,
+ gen_rtx_AND (DImode, reg,
+	  GEN_INT (0xL)));
+	emit_insn (gen_cstoresi4 (result, gen_rtx_EQ (BImode, cmp,
+			  GEN_INT(12345L << 48)),
+  cmp, GEN_INT(12345L << 48)));
+	
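
The "first call" trick above can be sketched in plain C: a 16-bit marker (12345, as in the patch) is stashed in the upper 16 bits of a 64-bit value whose payload occupies only the low 48 bits. The helper and macro names are hypothetical, and the sketch assumes, as the patch comment states, that those upper bits are otherwise unused (zero) on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MARKER_SHIFT 48
#define MARKER_MASK  (UINT64_C (0xffff) << MARKER_SHIFT)
#define MARKER       (UINT64_C (12345) << MARKER_SHIFT)

static_assert (12345 <= 0xffff, "marker must fit in 16 bits");

/* Return true on the first call for *REG and false afterwards.
   The low 48 bits (the real payload) are never modified.  */
static bool
first_call_p (uint64_t *reg)
{
  if ((*reg & MARKER_MASK) == MARKER)
    return false;
  *reg = (*reg & ~MARKER_MASK) | MARKER;
  return true;
}
```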

Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-11-18 Thread Palmer Dabbelt

On Thu, 17 Nov 2022 22:59:08 PST (-0800), Kito Cheng wrote:

Wait, what's Xgnuzihintpausestate???


I just made it up, it's defined right next to the name like those 
profile extensions are.  I figured that's the most RISC-V way to define 
something like this, but we could just drop it and run with the 
definition -- IIRC we just stuck a comment in for Linux and QEMU, I 
doubt anyone is actually going to implement the "doesn't touch PC" 
version of pause.



On Fri, Nov 18, 2022 at 12:30 PM Palmer Dabbelt  wrote:


gcc/ChangeLog:

* doc/extend.texi (__builtin_riscv_pause): Imply
Xgnuzihintpausestate.
---
 gcc/doc/extend.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1dd39e64b8..26f14e61bc8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21103,7 +21103,9 @@ Returns the value that is currently set in the 
@samp{tp} register.
 @end deftypefn

 @deftypefn {Built-in Function}  void __builtin_riscv_pause (void)
-Generates the @code{pause} (hint) machine instruction.
+Generates the @code{pause} (hint) machine instruction.  This implies the
+Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
+change architectural state.
 @end deftypefn

 @node RX Built-in Functions
--
2.38.1



RE: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
> overloading [PR107515]
> 
> From: Stam Markianos-Wright 
> 
> This patch adds explicit references to other float types
> to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515
> 
> gcc/ChangeLog:
> PR 107515
> * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but 
for now ok.
The ChangeLog should say "PR target/107515" for the git hook to recognize it 
IIRC.
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index fd1876b57a0..f6b42dc3fab 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -35582,6 +35582,9 @@ enum {
>   short: __ARM_mve_type_int_n, \
>   int: __ARM_mve_type_int_n, \
>   long: __ARM_mve_type_int_n, \
> + _Float16: __ARM_mve_type_fp_n, \
> + __fp16: __ARM_mve_type_fp_n, \
> + float: __ARM_mve_type_fp_n, \
>   double: __ARM_mve_type_fp_n, \
>   long long: __ARM_mve_type_int_n, \
>   unsigned char: __ARM_mve_type_int_n, \
> --
> 2.25.1
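
Why the extra associations matter can be shown with a small C11 `_Generic` sketch (macro and enum names are hypothetical, loosely mirroring __ARM_mve_typeid). The controlling expression is not subject to default argument promotions, so a `float` argument stays `float` and only matches an explicit `float` association; otherwise it falls through to `default`, the analogue of `__ARM_undef`. `_Float16` and `__fp16` are left out so the sketch compiles on any target.

```c
#include <assert.h>

enum scalar_class { TYPE_INT_N, TYPE_FP_N, TYPE_UNDEF };

/* With an explicit float association.  */
#define CLASSIFY_SCALAR(x) _Generic ((x),  \
    int: TYPE_INT_N,                        \
    long: TYPE_INT_N,                       \
    float: TYPE_FP_N,                       \
    double: TYPE_FP_N,                      \
    default: TYPE_UNDEF)

/* Without it: a float argument is not converted to double here.  */
#define CLASSIFY_SCALAR_NO_FLOAT(x) _Generic ((x),  \
    int: TYPE_INT_N,                                 \
    long: TYPE_INT_N,                                \
    double: TYPE_FP_N,                               \
    default: TYPE_UNDEF)

static_assert (CLASSIFY_SCALAR (1.0f) == TYPE_FP_N, "float handled");
static_assert (CLASSIFY_SCALAR_NO_FLOAT (1.0f) == TYPE_UNDEF,
               "float falls through to default");
```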



Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 11:34, Jakub Jelinek wrote:

On Fri, Nov 18, 2022 at 11:24:45AM -0500, Jason Merrill wrote:

Right, that's the C++17 implicit constexpr for lambdas, finish_function:

/* Lambda closure members are implicitly constexpr if possible.  */
if (cxx_dialect >= cxx17
&& LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl)))
  DECL_DECLARED_CONSTEXPR_P (fndecl)
= ((processing_template_decl
|| is_valid_constexpr_fn (fndecl, /*complain*/false))
   && potential_constant_expression (DECL_SAVED_TREE (fndecl)));


Yeah, I guess potential_constant_expression needs to be stricter in a
lambda. Or perhaps any function that isn't already
DECL_DECLARED_CONSTEXPR_P?


potential_constant_expression can't be relied on to catch everything:
even a simple if statement whose condition isn't yet known to be 0 or
non-0 only requires that at least one of the substatements be
potentially constant, etc.
The same goes for switch statements etc.
If there is a way to distinguish between functions with user-specified
constexpr/consteval and those with DECL_DECLARED_CONSTEXPR_P set
through the above if condition, then sure, cp_finish_decl ->
check_static_in_constexpr could perhaps be silent about those, but then
we want to diagnose it at least during constexpr evaluation.  But in that
case, making it a pedwarn rather than "this is a constant expression"
vs. "this is not a constant expression, if !ctx->quiet emit an error"
is something I don't see how to handle, because something needs
to be returned: either it is a constant expression or it is not.


True.  Let's go with your option 2, then, thanks.

Jason



RE: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n
> intrinsic
> 
> From: Stam Markianos-Wright 
> 
> It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
> resolution would fall back to the `__ARM_undef` failure state.
> 
> This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79`
> and
> `6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
> identified, because the tests were not checking for this kind of failure.
> 
> The above commits changed the definitions of the intrinsics from using
> `[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
> allowed `int` to be supported in user code through the overloaded
> `#defines`, but seems to have broken the `[u]int[8/16/32]_t` types
> 
> The solution implemented by this patch is to explicitly use a new
> _Generic mapping from all the `[u]int[8/16/32]_t` types for int. With this
> change, both `int` and `[u]int[8/16/32]_t` parameters are supported from
> user code and are handled by the overloading mechanism correctly.
> 
> gcc/ChangeLog:
> 
> * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
> (__arm_vaddq_m_n_s32): Likewise.
> (__arm_vaddq_m_n_s16): Likewise.
> (__arm_vaddq_m_n_u8): Likewise.
> (__arm_vaddq_m_n_u32): Likewise.
> (__arm_vaddq_m_n_u16): Likewise.
> (__arm_vaddq_m): Fix Overloading.
> (__ARM_mve_coerce3): New.

Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit 
message?
Thanks,
Kyrill

> ---
>  gcc/config/arm/arm_mve.h | 78 
>  1 file changed, 40 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index 684f997520f..951dc25374b 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pr
> 
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline int16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b,
> mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t
> __b, mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline uint16x8_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t
> __b, mve_pred16_t __p)
>  {
>return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
>  }
> @@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive,
> uint16x8_t __a, uint16x8_t __b, mve_pred16
> 
>  __extension__ extern __inline int8x16_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b,
> mve_pred16_t __p)
> +__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b,
> mve_pred16_t __p)
>  {
>   return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p);
>  }
> 
>  __extension__ extern __inline 
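
A related C11 `_Generic` subtlety helps explain the breakage: `int8_t`, `int16_t` and `int32_t` are typedefs, so a generic selection sees their underlying types (typically `signed char`, `short` and `int` or `long`). An `int8_t` argument therefore never matches an `int` association, and an `int` argument matches `int32_t` only where that typedef happens to be `int` (on some bare-metal toolchains it is `long`). The names below are hypothetical and only illustrate the dispatch.

```c
#include <assert.h>
#include <stdint.h>

enum scalar_kind { KIND_S8, KIND_S16, KIND_S32, KIND_UNDEF };

/* Dispatch on the fixed-width typedefs; _Generic resolves each one to
   its underlying type, which differ from one another.  */
#define SCALAR_KIND(x) _Generic ((x),  \
    int8_t: KIND_S8,                    \
    int16_t: KIND_S16,                  \
    int32_t: KIND_S32,                  \
    default: KIND_UNDEF)

static_assert (SCALAR_KIND ((int8_t) 0) == KIND_S8,
               "int8_t selects its own association, not int");
```

Mapping every `[u]int[8/16/32]_t` type explicitly, as the patch does, avoids depending on which of these typedefs coincides with `int`.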

RE: [PATCH 10/35] arm: improve tests for vabavq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 10/35] arm: improve tests for vabavq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vabavq_p_s16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_s32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_s8.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_u16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_u32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_p_u8.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_s16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_s32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_s8.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_u16.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_u32.c:
>   * gcc.target/arm/mve/intrinsics/vabavq_u8.c:

Missing ChangeLog text?
Ok with ChangeLog fixed.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vabavq_p_s16.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_s32.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_s8.c  | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_u16.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_u32.c | 40 ++-
>  .../arm/mve/intrinsics/vabavq_p_u8.c  | 40 ++-
>  .../arm/mve/intrinsics/vabavq_s16.c   | 28 -
>  .../arm/mve/intrinsics/vabavq_s32.c   | 28 -
>  .../gcc.target/arm/mve/intrinsics/vabavq_s8.c | 28 -
>  .../arm/mve/intrinsics/vabavq_u16.c   | 28 -
>  .../arm/mve/intrinsics/vabavq_u32.c   | 28 -
>  .../gcc.target/arm/mve/intrinsics/vabavq_u8.c | 28 -
>  12 files changed, 384 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> index 78ac801fa3c..843d022c418 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s16.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s16  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
>  {
>    return vabavq_p_s16 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s16  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s16  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
> +uint32_t
> +foo2 (int16x8_t b, int16x8_t c, mve_pred16_t p)
> +{
> +  return vabavq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> index af4e30b6127..6ed9b9ac1c4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabavq_p_s32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s32  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>    return vabavq_p_s32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vabavt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vabavt.s32  (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint32_t
>  foo1 (uint32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>    return vabavq_p (a, b, c, p);
>  }
> 
> -/* { 

RE: [PATCH 12/35] arm: improve tests and fix vabsq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 12/35] arm: improve tests and fix vabsq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vabsq_f): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vabsq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vabsq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabsq_x_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  2 +-
>  .../gcc.target/arm/mve/intrinsics/vabsq_f16.c | 22 +++-
>  .../gcc.target/arm/mve/intrinsics/vabsq_f32.c | 22 +++-
>  .../arm/mve/intrinsics/vabsq_m_f16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_f32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_s16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_s32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_m_s8.c   | 25 ---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s16.c | 20 ---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s32.c | 20 ---
>  .../gcc.target/arm/mve/intrinsics/vabsq_s8.c  | 16 ++--
>  .../arm/mve/intrinsics/vabsq_x_f16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_f32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_s16.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_s32.c  | 25 ---
>  .../arm/mve/intrinsics/vabsq_x_s8.c   | 25 ---
>  16 files changed, 309 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 3330a220aea..bc4e2f2ac21 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -279,7 +279,7 @@ (define_insn "mve_vabsq_f"
>   (abs:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vabs.f%#  %q0, %q1"
> +  "vabs.f%#\t%q0, %q1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> index 08e141baedc..f29ada8c058 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f16.c
> @@ -1,13 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vabs.f16  q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo (float16x8_t a)
>  {
>    return vabsq_f16 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.f16"  }  } */
> +
> +/*
> +**foo1:
> +**   ...
> +**   vabs.f16  q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo1 (float16x8_t a)
> +{
> +  return vabsq (a);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> index 3614a44fbdc..cc24744fb26 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabsq_f32.c
> @@ -1,13 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vabs.f32  q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
>  float32x4_t
>  foo (float32x4_t a)
>  {
>    return vabsq_f32 (a);
>  }
> 
> -/* { dg-final { scan-assembler "vabs.f32"  }  } */
> +
> +/*
> +**foo1:
> +**   ...
> +**   vabs.f32  q[0-9]+, q[0-9]+(?: @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo1 (float32x4_t a)
> +{
> +  return vabsq 

RE: [PATCH 11/35] arm: improve tests for vabdq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 11/35] arm: improve tests for vabdq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vabdq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vabdq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vabdq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../gcc.target/arm/mve/intrinsics/vabdq_f16.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_f32.c | 16 ++--
>  .../arm/mve/intrinsics/vabdq_m_f16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_f32.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_s32.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_s8.c   | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_u16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_u32.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_m_u8.c   | 26 ---
>  .../gcc.target/arm/mve/intrinsics/vabdq_s16.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_s32.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_s8.c  | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u16.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u32.c | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vabdq_u8.c  | 16 ++--
>  .../arm/mve/intrinsics/vabdq_x_f16.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_f32.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vabdq_x_s32.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_s8.c   | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_u16.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_u32.c  | 25 +++---
>  .../arm/mve/intrinsics/vabdq_x_u8.c   | 25 +++---
>  24 files changed, 464 insertions(+), 73 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> index b55e826e4b6..f379b25c49e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f16.c
> @@ -1,21 +1,33 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vabd.f16  q[0-9]+, q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo (float16x8_t a, float16x8_t b)
>  {
>    return vabdq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vabd.f16  q[0-9]+, q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo1 (float16x8_t a, float16x8_t b)
>  {
>    return vabdq (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vabd.f16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vabdq_f32.c
> index f1a95b14e03..3ba808e0b4d 100644
> --- 

Re: [PATCH] Allow prologues and epilogues to be inserted later

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/18/22 08:18, Richard Sandiford wrote:


Yeah, good point.  How does the version below look?  Tested as before.

I guess it's a philosophical question what distinguishes "late compilation"
from everything else, but I think it makes sense for it to mean "no code
motion" (among other things).  And it's useful if targets have a well-defined
point at which they can insert their own passes while guaranteeing
that:

- the CFG still exists and hasn't lost information
- no code motion occurs later
- alignments aren't nailed down yet
- variable tracking occurs later (and so will account for whatever the
   target does in its pass)


Seems like a reasonable set of properties.  Do we want to document this 
somewhere so that it gets captured?  That can be independent of this
particular patch.





I don't think it's controversial to say that delay-branch reorg should
happen as part of normal scheduling, with the later passes coping with
the SEQUENCEs generated from it, but there's no realistic chance of
that happening.  So unfortunately it's always likely to be a special
case...


I've been wanting the guts of dbr moved into sched2 for a long time.  
I've speculated that we could use the dependence analysis from sched2 to 
provide the candidates for delay slot filling and that doing so would 
probably pick up the vast majority of opportunities, but without the 
ad-hoc dependency bits in reorg.c. But yea, realistically nobody's going 
to invest the time to revamp reorg.





Bernd did some nice work on avoiding dbr for bfin (IIRC), but without
the handling of SEQUENCEs in rtl passes, even that version had to
happen during md_reorg.


Never really looked at it.





Thanks,
Richard


gcc/
* target.def (use_late_prologue_epilogue): New hook.
* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
* doc/tm.texi: Regenerate.
* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
(pass_data_late_thread_prologue_and_epilogue): New pass variable.
(pass_late_thread_prologue_and_epilogue): New pass class.
(make_pass_late_thread_prologue_and_epilogue): New function.


OK

jeff




RE: [PATCH 09/35] arm: improve tests for vmax*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 09/35] arm: improve tests for vmax*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmaxaq_m_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vmaxaq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxaq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxaq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxaq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxaq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmaq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmaq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmaq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxq_x_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmaxaq_m_s16.c | 25 +--
>  .../arm/mve/intrinsics/vmaxaq_m_s32.c | 25 +--
>  .../arm/mve/intrinsics/vmaxaq_m_s8.c  | 25 +--
>  .../arm/mve/intrinsics/vmaxaq_s16.c   | 16 +++-
>  .../arm/mve/intrinsics/vmaxaq_s32.c   | 16 +++-
>  .../gcc.target/arm/mve/intrinsics/vmaxaq_s8.c | 16 +++-
>  .../arm/mve/intrinsics/vmaxavq_p_s16.c| 41 ---
>  .../arm/mve/intrinsics/vmaxavq_p_s32.c| 41 ---
>  .../arm/mve/intrinsics/vmaxavq_p_s8.c | 41 ---
>  .../arm/mve/intrinsics/vmaxavq_s16.c  | 29 ++---
>  .../arm/mve/intrinsics/vmaxavq_s32.c  | 29 ++---
>  

RE: [PATCH 08/35] arm: improve tests for vmin*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 08/35] arm: improve tests for vmin*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vminaq_m_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vminaq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminaq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminaq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminaq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminaq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmaq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmaq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmaq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmaq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminq_x_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vminaq_m_s16.c | 25 +--
>  .../arm/mve/intrinsics/vminaq_m_s32.c | 25 +--
>  .../arm/mve/intrinsics/vminaq_m_s8.c  | 25 +--
>  .../arm/mve/intrinsics/vminaq_s16.c   | 16 +++-
>  .../arm/mve/intrinsics/vminaq_s32.c   | 16 +++-
>  .../gcc.target/arm/mve/intrinsics/vminaq_s8.c | 16 +++-
>  .../arm/mve/intrinsics/vminavq_p_s16.c| 41 ---
>  .../arm/mve/intrinsics/vminavq_p_s32.c| 41 ---
>  .../arm/mve/intrinsics/vminavq_p_s8.c | 41 ---
>  .../arm/mve/intrinsics/vminavq_s16.c  | 29 ++---
>  .../arm/mve/intrinsics/vminavq_s32.c  | 29 ++---
>  

RE: [PATCH 06/35] arm: improve tests and fix vdupq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 06/35] arm: improve tests and fix vdupq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vdupq_n_f)
>   (mve_vdupq_n_, mve_vdupq_m_n_)
>   (mve_vdupq_m_n_f): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdupq_x_n_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  8 ++--
>  .../arm/mve/intrinsics/vdupq_m_n_f16.c| 41 +--
>  .../arm/mve/intrinsics/vdupq_m_n_f32.c| 41 +--
>  .../arm/mve/intrinsics/vdupq_m_n_s16.c| 25 +--
>  .../arm/mve/intrinsics/vdupq_m_n_s32.c| 25 +--
>  .../arm/mve/intrinsics/vdupq_m_n_s8.c | 25 +--
>  .../arm/mve/intrinsics/vdupq_m_n_u16.c| 41 +--
>  .../arm/mve/intrinsics/vdupq_m_n_u32.c| 41 +--
>  .../arm/mve/intrinsics/vdupq_m_n_u8.c | 41 +--
>  .../arm/mve/intrinsics/vdupq_n_f16.c  | 21 +-
>  .../arm/mve/intrinsics/vdupq_n_f32.c  | 21 +-
>  .../arm/mve/intrinsics/vdupq_n_s16.c  | 13 --
>  .../arm/mve/intrinsics/vdupq_n_s32.c  | 13 --
>  .../arm/mve/intrinsics/vdupq_n_s8.c   |  9 +++-
>  .../arm/mve/intrinsics/vdupq_n_u16.c  | 23 ++-
>  .../arm/mve/intrinsics/vdupq_n_u32.c  | 23 ++-
>  .../arm/mve/intrinsics/vdupq_n_u8.c   | 23 ++-
>  .../arm/mve/intrinsics/vdupq_x_n_f16.c| 30 +-
>  .../arm/mve/intrinsics/vdupq_x_n_f32.c| 30 +-
>  .../arm/mve/intrinsics/vdupq_x_n_s16.c| 14 ++-
>  .../arm/mve/intrinsics/vdupq_x_n_s32.c| 14 ++-
>  .../arm/mve/intrinsics/vdupq_x_n_s8.c | 14 ++-
>  .../arm/mve/intrinsics/vdupq_x_n_u16.c| 30 +-
>  .../arm/mve/intrinsics/vdupq_x_n_u32.c| 30 +-
>  .../arm/mve/intrinsics/vdupq_x_n_u8.c | 30 +-
>  25 files changed, 567 insertions(+), 59 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 58ffe03c499..6d5270281ec 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -266,7 +266,7 @@ (define_insn "mve_vdupq_n_f"
>          VDUPQ_N_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vdup.%#   %q0, %1"
> +  "vdup.%#\t%q0, %1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -435,7 +435,7 @@ (define_insn "mve_vdupq_n_"
>          VDUPQ_N))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vdup.%#   %q0, %1"
> +  "vdup.%#\t%q0, %1"
>    [(set_attr "type" "mve_move")
>  ])
> 
> @@ -3046,7 +3046,7 @@ (define_insn "mve_vdupq_m_n_"
>          VDUPQ_M_N))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;vdupt.%# %q0, %2"
> +  "vpst\;vdupt.%#\t%q0, %2"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> @@ -3991,7 +3991,7 @@ (define_insn "mve_vdupq_m_n_f"
>          VDUPQ_M_N_F))
>    ]
>    "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
> -  "vpst\;vdupt.%# %q0, %2"
> +  "vpst\;vdupt.%#\t%q0, %2"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdupq_m_n_f16.c
> index 

RE: [PATCH 04/35] arm: improve tests and fix vdwdupq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 04/35] arm: improve tests and fix vdwdupq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vdwdupq_m_wb_u_insn): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  2 +-
>  .../arm/mve/intrinsics/vdwdupq_m_n_u16.c  | 44 ++--
>  .../arm/mve/intrinsics/vdwdupq_m_n_u32.c  | 46 ++---
>  .../arm/mve/intrinsics/vdwdupq_m_n_u8.c   | 46 ++---
>  .../arm/mve/intrinsics/vdwdupq_m_wb_u16.c | 50 ---
>  .../arm/mve/intrinsics/vdwdupq_m_wb_u32.c | 48 +++---
>  .../arm/mve/intrinsics/vdwdupq_m_wb_u8.c  | 50 ---
>  .../arm/mve/intrinsics/vdwdupq_n_u16.c| 32 ++--
>  .../arm/mve/intrinsics/vdwdupq_n_u32.c| 32 ++--
>  .../arm/mve/intrinsics/vdwdupq_n_u8.c | 32 ++--
>  .../arm/mve/intrinsics/vdwdupq_wb_u16.c   | 32 ++--
>  .../arm/mve/intrinsics/vdwdupq_wb_u32.c   | 32 ++--
>  .../arm/mve/intrinsics/vdwdupq_wb_u8.c| 32 ++--
>  .../arm/mve/intrinsics/vdwdupq_x_n_u16.c  | 42 ++--
>  .../arm/mve/intrinsics/vdwdupq_x_n_u32.c  | 46 ++---
>  .../arm/mve/intrinsics/vdwdupq_x_n_u8.c   | 46 ++---
>  .../arm/mve/intrinsics/vdwdupq_x_wb_u16.c | 50 ---
>  .../arm/mve/intrinsics/vdwdupq_x_wb_u32.c | 46 ++---
>  .../arm/mve/intrinsics/vdwdupq_x_wb_u8.c  | 50 ---
>  19 files changed, 655 insertions(+), 103 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 1215f845388..58ffe03c499 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9195,7 +9195,7 @@ (define_insn "mve_vdwdupq_m_wb_u_insn"
>          VDWDUPQ_M))
>    ]
>    "TARGET_HAVE_MVE"
> -  "vpst\;\tvdwdupt.u%#\t%q2, %3, %R4, %5"
> +  "vpst\;vdwdupt.u%#\t%q2, %3, %R4, %5"
>    [(set_attr "type" "mve_move")
>     (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
> index 5303fd7d361..8f53f5ef0cb 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vdwdupt.u16 q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return vdwdupq_m (inactive, a, b, 1, p);
> +  return vdwdupq_m_n_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vdwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr  p0, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vdwdupt.u16 q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?: @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
>    return vdwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { 

RE: [PATCH 05/35] arm: improve vidupq* tests

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 05/35] arm: improve vidupq* tests
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vidupq_m_n_u16.c   | 46 +---
>  .../arm/mve/intrinsics/vidupq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vidupq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vidupq_m_wb_u16.c  | 46 +---
>  .../arm/mve/intrinsics/vidupq_m_wb_u32.c  | 42 +--
>  .../arm/mve/intrinsics/vidupq_m_wb_u8.c   | 42 +--
>  .../arm/mve/intrinsics/vidupq_n_u16.c | 32 ++--
>  .../arm/mve/intrinsics/vidupq_n_u32.c | 28 +-
>  .../arm/mve/intrinsics/vidupq_n_u8.c  | 28 +-
>  .../arm/mve/intrinsics/vidupq_wb_u16.c| 32 ++--
>  .../arm/mve/intrinsics/vidupq_wb_u32.c| 28 +-
>  .../arm/mve/intrinsics/vidupq_wb_u8.c | 28 +-
>  .../arm/mve/intrinsics/vidupq_x_n_u16.c   | 46 +---
>  .../arm/mve/intrinsics/vidupq_x_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vidupq_x_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vidupq_x_wb_u16.c  | 52 +++
>  .../arm/mve/intrinsics/vidupq_x_wb_u32.c  | 52 +++
>  .../arm/mve/intrinsics/vidupq_x_wb_u8.c   | 52 +++
>  18 files changed, 634 insertions(+), 88 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
> index 822d41197e6..b4ee7af36e3 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vidupt.u16  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vidupq_m_n_u16 (inactive, a, 4, p);
> +  return vidupq_m_n_u16 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vidupt.u16  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
> -  return vidupq_m (inactive, a, 4, p);
> +  return vidupq_m (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vidupt.u16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vidupt.u16  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
> +**   ...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return vidupq_m (inactive, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vidupq_m_n_u32.c
> index c01826e15dc..b13a7a80dcb 100644
> --- 
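To make concrete what the vidupq*/vddupq* tests exercise, here is a scalar reference model of the unsigned 16-bit variants. It is a sketch under stated assumptions, not how arm_mve.h implements the intrinsics: lane i holds the start offset plus (VIDUP) or minus (VDDUP) i*imm, truncated to 16 bits; imm is 1, 2, 4 or 8; the offset advances by eight steps and that final value is what the _wb forms write back; and in the _m forms each predicate bit governs one byte of the vector. The type u16x8_model and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for uint16x8_t: eight 16-bit lanes, lane 0 first.  */
typedef struct
{
  uint16_t lane[8];
} u16x8_model;

/* Unpredicated VIDUP.U16 / VDDUP.U16: lane i receives the running offset
   truncated to 16 bits, and the offset moves by IMM after each lane.
   Returns the final offset, i.e. the value the _wb variants write back.  */
uint32_t
model_dup_u16 (u16x8_model *out, uint32_t start, uint32_t imm, bool decrement)
{
  assert (imm == 1 || imm == 2 || imm == 4 || imm == 8);
  uint32_t offset = start;
  for (int i = 0; i < 8; i++)
    {
      out->lane[i] = (uint16_t) offset;
      offset = decrement ? offset - imm : offset + imm;
    }
  return offset;
}

/* _m variant: predicate bit k governs byte k of the vector, so bits 2*i
   and 2*i+1 cover the low and high bytes of lane i.  Inactive bytes keep
   the value from INACTIVE.  The offset is written back regardless of the
   predicate (an assumption of this sketch).  */
uint32_t
model_dup_m_u16 (u16x8_model *out, const u16x8_model *inactive,
                 uint32_t start, uint32_t imm, bool decrement, uint16_t p)
{
  u16x8_model fresh;
  uint32_t next = model_dup_u16 (&fresh, start, imm, decrement);
  for (int i = 0; i < 8; i++)
    {
      uint16_t mask = 0;
      if (p & (1u << (2 * i)))
        mask |= 0x00ff;
      if (p & (1u << (2 * i + 1)))
        mask |= 0xff00;
      out->lane[i] = (uint16_t) ((fresh.lane[i] & mask)
                                 | (inactive->lane[i] & (uint16_t) ~mask));
    }
  return next;
}
```

For example, a start of 10 with imm 2 fills the lanes with 10, 12, ..., 24 and writes back 26; with a predicate of 0x0003 only lane 0 takes the new value.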

RE: [PATCH 03/35] arm: improve tests and fix vddupq*

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 03/35] arm: improve tests and fix vddupq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vddupq_u_insn): Fix 'vddup.u'
>   spacing.
>   (mve_vddupq_m_wb_u_insn): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vddupq_m_n_u32.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_m_n_u8.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_n_u16.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_x_n_u32.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_x_n_u8.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c : Likewise.
>   * gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c : Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  4 +-
>  .../arm/mve/intrinsics/vddupq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vddupq_m_n_u32.c   | 46 +---
>  .../arm/mve/intrinsics/vddupq_m_n_u8.c| 46 +---
>  .../arm/mve/intrinsics/vddupq_m_wb_u16.c  | 42 +--
>  .../arm/mve/intrinsics/vddupq_m_wb_u32.c  | 46 +---
>  .../arm/mve/intrinsics/vddupq_m_wb_u8.c   | 46 +---
>  .../arm/mve/intrinsics/vddupq_n_u16.c | 32 ++--
>  .../arm/mve/intrinsics/vddupq_n_u32.c | 28 +-
>  .../arm/mve/intrinsics/vddupq_n_u8.c  | 28 +-
>  .../arm/mve/intrinsics/vddupq_wb_u16.c| 32 ++--
>  .../arm/mve/intrinsics/vddupq_wb_u32.c| 28 +-
>  .../arm/mve/intrinsics/vddupq_wb_u8.c | 28 +-
>  .../arm/mve/intrinsics/vddupq_x_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vddupq_x_n_u32.c   | 46 +---
>  .../arm/mve/intrinsics/vddupq_x_n_u8.c| 46 +---
>  .../arm/mve/intrinsics/vddupq_x_wb_u16.c  | 52 +++
>  .../arm/mve/intrinsics/vddupq_x_wb_u32.c  | 52 +++
>  .../arm/mve/intrinsics/vddupq_x_wb_u8.c   | 52 +++
>  19 files changed, 642 insertions(+), 96 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 62186f124da..1215f845388 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -9043,7 +9043,7 @@ (define_insn "mve_vddupq_u_insn"
> (minus:SI (match_dup 2)
>(match_operand:SI 4 "immediate_operand" "i")))]
>   "TARGET_HAVE_MVE"
> - "vddup.u%#  %q0, %1, %3")
> + "vddup.u%#\t%q0, %1, %3")
> 
>  ;;
>  ;; [vddupq_m_n_u])
> @@ -9079,7 +9079,7 @@ (define_insn "mve_vddupq_m_wb_u_insn"
> (minus:SI (match_dup 3)
>(match_operand:SI 6 "immediate_operand" "i")))]
>   "TARGET_HAVE_MVE"
> - "vpst\;\tvddupt.u%#\t%q0, %2, %4"
> + "vpst\;vddupt.u%#\t%q0, %2, %4"
>   [(set_attr "length""8")])
> 
>  ;;
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> index 7332711f6a7..7c8b0152763 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vddupt.u16  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, mve_pred16_t p)
>  {
>    return vddupq_m_n_u16 (inactive, a, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vddupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vddupt.u16  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  

Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 18, 2022 at 11:24:45AM -0500, Jason Merrill wrote:
> > Right, that's the C++17 implicit constexpr for lambdas, finish_function:
> > 
> >/* Lambda closure members are implicitly constexpr if possible.  */
> >if (cxx_dialect >= cxx17
> >&& LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl)))
> >  DECL_DECLARED_CONSTEXPR_P (fndecl)
> >= ((processing_template_decl
> >|| is_valid_constexpr_fn (fndecl, /*complain*/false))
> >   && potential_constant_expression (DECL_SAVED_TREE (fndecl)));
> 
> Yeah, I guess potential_constant_expression needs to be stricter in a
> lambda. Or perhaps any function that isn't already
> DECL_DECLARED_CONSTEXPR_P?

potential_constant_expression can't be relied on to catch everything:
even a simple if statement whose condition is not yet known to be 0 or
non-0 only results in a requirement that at least one of the
substatements be potentially constant, etc.
The same goes for switch statements etc.
If there is a way to distinguish between functions with user-specified
constexpr/consteval and those whose DECL_DECLARED_CONSTEXPR_P was set
through the above if condition, sure, cp_finish_decl ->
check_static_in_constexpr could perhaps be silent about those, but then
we want to diagnose it at least during constexpr evaluation.  But in that
case I don't see how to make it a pedwarn, because the evaluation only
distinguishes "this is a constant expression" from "this is not a
constant expression; if !ctx->quiet, emit an error".  Something needs
to be returned: either it is a constant expression or it is not.

Jakub



RE: [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 02/35] arm: fix 'vmsr' spacing and register capitalization
> 
> gcc/ChangeLog:
> 
>   * config/arm/vfp.md (*thumb2_movhi_vfp, *thumb2_movhi_fp16): Fix
>   'vmsr' spacing and reg capitalization.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:
>   Update test.
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c:
>   Likewise.
>   * gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c:
>   Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/vfp.md | 8 
>  .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c  | 2 +-
>  .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c  | 2 +-
>  .../arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c  | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index d0f423cc3c5..932e4b7447e 100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -105,9 +105,9 @@ (define_insn "*thumb2_movhi_vfp"
>  case 8:
>return "vmov%?.f32\t%0, %1\t%@ int";
>  case 9:
> -  return "vmsr%?\t P0, %1\t@ movhi";
> +  return "vmsr%?\tp0, %1\t@ movhi";
>  case 10:
> -  return "vmrs%?\t %0, P0\t@ movhi";
> +  return "vmrs%?\t%0, p0\t@ movhi";
>  default:
>gcc_unreachable ();
>  }
> @@ -209,9 +209,9 @@ (define_insn "*thumb2_movhi_fp16"
>  case 8:
>return "vmov%?.f32\t%0, %1\t%@ int";
>  case 9:
> -  return "vmsr%?\t P0, %1\t%@ movhi";
> +  return "vmsr%?\tp0, %1\t%@ movhi";
>  case 10:
> -  return "vmrs%?\t%0, P0\t%@ movhi";
> +  return "vmrs%?\t%0, p0\t%@ movhi";
>  default:
>gcc_unreachable ();
>  }
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> index f3219e2e825..1e57ca40739 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
> @@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>  }
> 
>  /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
> +/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
>  /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> index 4d093d243fe..f8d77fdfd5b 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
> @@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>  }
> 
>  /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
> +/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
>  /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> index e796522a49c..8a0e109c70c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
> @@ -11,7 +11,7 @@ foo (uint32x4_t * addr, mve_pred16_t p)
>  }
> 
>  /* { dg-final { scan-assembler "vldrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> -/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
> +/* { dg-final { scan-assembler "vmsr\tp0, r\[0-9\]+.*" } } */
>  /* { dg-final { scan-assembler "vpst" } } */
>  /* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
>  /* { dg-final { scan-assembler "vstrw.32\tq\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
> --
> 2.25.1



Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-18 Thread Kees Cook via Gcc-patches
On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
> Hi, Richard,
> 
> Honestly, it’s very hard for me to decide what’s the best way to handle the
> interaction between -fstrict-flex-arrays=M and -Warray-bounds=N.
> 
> Ideally, -fstrict-flex-arrays=M should completely control the behavior of
> -Warray-bounds. If possible, I prefer this solution.
> 
> However, -Warray-bounds is included in -Wall and has been used extensively
> for a long time. It’s not safe to change its default behavior.

I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
it is in -Wall is _good_ for this reason. :) No one is going to add
-fstrict-flex-arrays (at any level) without understanding what it does
and wanting those effects on -Warray-bounds.

-- 
Kees Cook


Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 10:03, Marek Polacek wrote:

On Fri, Nov 18, 2022 at 08:48:32AM +0100, Jakub Jelinek wrote:

On Thu, Nov 17, 2022 at 07:15:05PM -0500, Marek Polacek wrote:

--- gcc/cp/decl.cc.jj   2022-11-16 14:44:43.692339668 +0100
+++ gcc/cp/decl.cc  2022-11-17 20:53:44.102011594 +0100
@@ -5600,6 +5600,57 @@ groktypename (cp_decl_specifier_seq *typ
return type;
  }
  
+/* For C++17 and older diagnose static or thread_local decls in constexpr
+   or consteval functions.  For C++20 similarly, except if they are


In C++17 we don't support consteval, so I guess drop the "or consteval"?


I just forgot to update the function comment.

Anyway, I think:


BTW, I notice that the patch breaks
g++.dg/cpp1y/lambda-generic-func1.C
g++.dg/cpp1z/constexpr-lambda16.C
Maybe they just need dg- tweaks.


this is actually a real bug and I'm not sure how to resolve that.

We have there:

int main()
{
   [](auto i) { if (i) { int j; static int k; return i + j; } return i; }(0);
}

and for C++17/20 I presume something (haven't figured out yet what) marks


Right, that's the C++17 implicit constexpr for lambdas, finish_function:

   /* Lambda closure members are implicitly constexpr if possible.  */
   if (cxx_dialect >= cxx17
   && LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl)))
 DECL_DECLARED_CONSTEXPR_P (fndecl)
   = ((processing_template_decl
   || is_valid_constexpr_fn (fndecl, /*complain*/false))
  && potential_constant_expression (DECL_SAVED_TREE (fndecl)));


Yeah, I guess potential_constant_expression needs to be stricter in a 
lambda. Or perhaps any function that isn't already 
DECL_DECLARED_CONSTEXPR_P?



the lambda operator() when still a template as constexpr and then
cp_finish_decl -> diagnose_static_in_constexpr pedwarns on it.
For the above perhaps we could figure out there is a static int k; in the
operator() and don't turn it into constexpr, but what if there is
something that would e.g. satisfy decl_maybe_constant_var_p but not
decl_constant_var_p when actually instantiated?


I'd expect the above change to potential_c_e to handle that case.

Jason



Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-18 Thread Jason Merrill via Gcc-patches

On 11/18/22 05:49, Bernhard Reutner-Fischer wrote:

On Thu, 17 Nov 2022 18:52:36 -0500
Jason Merrill  wrote:


On 11/17/22 14:02, Bernhard Reutner-Fischer wrote:

On Thu, 17 Nov 2022 09:53:32 -0500
Jason Merrill  wrote:



Instead, you want to copy the location for instantiations, i.e. check
DECL_TEMPLATE_INSTANTIATION instead of !DECL_USE_TEMPLATE.


No, that makes no difference.


Hmm, when I stop there when processing the instantiation the template's
DECL_RESULT has the right location information, e.g. for

template  int f() { return 42; }

int main()
{
f();
}

#1  0x00f950e8 in instantiate_body (pattern=, args=, d=, nested_p=false) at /home/jason/gt/gcc/cp/pt.cc:26470
#0  start_preparsed_function (decl1=,
attrs=, flags=1) at /home/jason/gt/gcc/cp/decl.cc:17252
(gdb) p expand_location (input_location)
$13 = {file = 0x4962370 "wa.C", line = 1, column = 24, data = 0x0, sysp
= false}
(gdb) p expand_location (DECL_SOURCE_LOCATION (DECL_RESULT
(DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1)
$14 = {file = 0x4962370 "wa.C", line = 1, column = 20, data = 0x0, sysp
= false}


Yes, that works. Sorry if I was not clear: what does not work is the case
from the cover letter of this series, the mini_vector testcase reduced from
libstdc++-v3/include/ext/bitmap_allocator.h. Would "class template member
function return type location" describe it?

AFAIR the problem was that these member functions get their result
decl late. By the time they get it, there is no
declspecs->locations[ds_type_spec] around anywhere to attach to the
resdecl. While the result decl is clear, there is no obvious place
to store the ds_type_spec (somewhere in the template, as you told me).

Back then I tried moving the resdecl building from
start_preparsed_function to grokfndecl, but that did not work out easily
IIRC, and I ultimately gave up on moving stuff around rather blindly.
I also tried to find a spot in grokmethod where I could store the
ds_type_spec locus, but I think the problem was the same: I had
just the type, where I cannot store a locus, and did not find a place
where I could smuggle the locus along.


Ah, so the problem is deferred parsing of methods, rather than 
templates.  Building the DECL_RESULT sooner does seem like the right 
approach to handling that, whether that's in grokfndecl or grokmethod.



So, to make that clear. Your template function (?) works:

$ XXX=1 ./xg++ -B. -S -o /dev/null ../tmp4/return-narrow-2j.cc
../tmp4/return-narrow-2j.cc: In function ‘int f()’:
../tmp4/return-narrow-2j.cc:1:20: warning: result decl locus sample
 1 | template  int f() { return 42; }
   |^~~
   |the return type
../tmp4/return-narrow-2j.cc: In function ‘int main()’:
../tmp4/return-narrow-2j.cc:3:1: warning: result decl locus sample
 3 | int main()
   | ^~~
   | the return type
../tmp4/return-narrow-2j.cc: In instantiation of ‘int f() [with T = int]’:
../tmp4/return-narrow-2j.cc:5:10:   required from here
../tmp4/return-narrow-2j.cc:1:20: warning: result decl locus sample
 1 | template  int f() { return 42; }
   |^~~
   |the return type


The class member fn not so much (IMHO, see attached):

$ XXX=1 ./xg++ -B. -S -o /dev/null ../tmp4/return-narrow-2.cc
../tmp4/return-narrow-2.cc: In member function ‘const long unsigned int __mini_vector< 
 >::_M_space_left()’:
../tmp4/return-narrow-2.cc:9:3: warning: result decl locus sample
 9 |   { return _M_finish != 0; }
   |   ^
   |   the return type
../tmp4/return-narrow-2.cc: In instantiation of ‘const long unsigned int __mini_vector< 
 >::_M_space_left() [with  = 
std::pair]’:
../tmp4/return-narrow-2.cc:11:17:   required from here
../tmp4/return-narrow-2.cc:9:3: warning: result decl locus sample
 9 |   { return _M_finish != 0; }
   |   ^
   |   the return type
../tmp4/return-narrow-2.cc: In instantiation of ‘const long unsigned int __mini_vector< 
 >::_M_space_left() [with  = int]’:
../tmp4/return-narrow-2.cc:12:17:   required from here
../tmp4/return-narrow-2.cc:9:3: warning: result decl locus sample
 9 |   { return _M_finish != 0; }
   |   ^
   |   the return type





But really I'm not interested in the template case; I only mentioned
them because they don't work, and in case somebody wanted to have correct
locations.
I remember just frustration when I looked at those a year ago.


I'd like to get the template case right while we're looking at it.  I
guess I can add that myself if you're done trying.


Is the hunk for normal functions OK for trunk?


You also need a testcase for the desired behavior, with e.g.
{ dg-error "23:" }


I'd have to think about how to test that with trunk, yes.
There are no existing warnings that want to point to the return type,
are there?


Good point.  Do any of your later patches add such a warning?


Maybe a g++.dg/plugin/result_decl_plugin.c then.

set plugin_test_list [list
hmz. That 

Re: [PATCH] optimize the testcase for architectures that use ".srodata"

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/17/22 22:49, Yixuan Chen wrote:

Subject: [PATCH] optimize the testcase for architectures that use ".srodata"
From: Yixuan Chen
Date: 11/17/22, 22:49
To: gcc-patches@gcc.gnu.org
CC: kito.ch...@gmail.com, and...@sifive.com, oriachi...@gmail.com, jia...@iscas.ac.cn, Yixuan Chen



2022-11-18  Yixuan Chen

 * gcc.dg/pr25521.c: optimize the testcast for architectures that use 
".srodata"


I fixed up the ChangeLog a bit and committed this for you. Thanks.

jeff



Re: [PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare

2022-11-18 Thread Segher Boessenkool
Hi!

On Thu, Nov 17, 2022 at 03:52:26PM +0800, Kewen.Lin wrote:
> on 2022/11/17 02:58, Segher Boessenkool wrote:
> > On Wed, Nov 16, 2022 at 02:51:04PM +0800, Kewen.Lin wrote:
> >>/* In vector.md, we support all kinds of vector float point
> >>   comparison operators in a comparison rtl pattern, we can
> >>   just emit the comparison rtx insn directly here.  Besides,
> >>   we should have a centralized place to handle the possibility
> >> - of raising invalid exception.  */
> >> -  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
> >> + of raising invalid exception.  Also emit directly for vector
> >> + integer comparison operators EQ/GT/GTU.  */
> >> +  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
> >> +  || rcode == EQ
> >> +  || rcode == GT
> >> +  || rcode == GTU)
> > 
> > The comment still says it handles FP only.  That would be best to keep
> > imo: add a separate block of code to handle the integer stuff you want
> > to add.  You get the same or better generated code, the compiler is
> > smart enough.  Code is for the user to read, and C is not a portable
> > assembler language.
> 
> OK, I'll make two blocks for FP and integer respectively.  I struggled
> a bit updating this hunk with comments for the integer comparison
> case; someone could argue that both can share the same handling
> if the condition is updated.

The golden rule of programming: if something is hard to explain, you
probably overcomplicated it.  Sometimes more code is much easier to
read, too.
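For reference, the integer part of the change rests on EQ, GT and GTU being emitted directly. The sketch below (hypothetical names throughout; it is not the rs6000 code) shows one common way every other signed or unsigned integer comparison can be derived from those three by swapping the operands and/or inverting the resulting mask. That is exact for integers, unlike floating point, where NaNs make !(a > b) differ from a <= b, which is one reason to keep the FP and integer handling in separate blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison codes, named after the RTL codes in the patch.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_GT, CMP_GE, CMP_LT, CMP_LE,
  CMP_GTU, CMP_GEU, CMP_LTU, CMP_LEU
};

/* Compare with BASE (always EQ, GT or GTU) after optionally swapping the
   operands, then optionally invert the resulting lane mask.  */
struct cmp_plan
{
  enum cmp_code base;
  bool swap;
  bool invert;
};

struct cmp_plan
plan_int_vector_compare (enum cmp_code code)
{
  switch (code)
    {
    case CMP_EQ:  return (struct cmp_plan) { CMP_EQ, false, false };
    case CMP_NE:  return (struct cmp_plan) { CMP_EQ, false, true };   /* !(a == b) */
    case CMP_GT:  return (struct cmp_plan) { CMP_GT, false, false };
    case CMP_LT:  return (struct cmp_plan) { CMP_GT, true, false };   /* b > a */
    case CMP_GE:  return (struct cmp_plan) { CMP_GT, true, true };    /* !(b > a) */
    case CMP_LE:  return (struct cmp_plan) { CMP_GT, false, true };   /* !(a > b) */
    case CMP_GTU: return (struct cmp_plan) { CMP_GTU, false, false };
    case CMP_LTU: return (struct cmp_plan) { CMP_GTU, true, false };
    case CMP_GEU: return (struct cmp_plan) { CMP_GTU, true, true };
    case CMP_LEU: return (struct cmp_plan) { CMP_GTU, false, true };
    }
  assert (!"unknown comparison code");
  return (struct cmp_plan) { CMP_EQ, false, false };
}

/* Evaluate a plan on a single 32-bit lane, so it can be checked against
   the direct comparison.  */
bool
apply_plan (struct cmp_plan p, int32_t a, int32_t b)
{
  if (p.swap)
    {
      int32_t t = a;
      a = b;
      b = t;
    }
  bool r;
  switch (p.base)
    {
    case CMP_EQ:  r = a == b; break;
    case CMP_GT:  r = a > b; break;
    case CMP_GTU: r = (uint32_t) a > (uint32_t) b; break;
    default:      assert (!"base must be EQ, GT or GTU"); r = false; break;
    }
  return p.invert ? !r : r;
}
```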

> > This whole series needs to be factored better, it does way too many
> > things, and only marginally related things, at every step.  Or I don't
> > see it anyway :-)
> 
> OK, I was thinking patch 1/2 unifies the current vector float
> comparison handling and patch 2/2 refines the remaining handling
> for vector integer comparison.  I'm happy to factor it better; any
> suggestions on concrete code are highly appreciated.  :)

Often it helps to start with a patch (or patches) that only restructures
existing code, without changing what it does, so that the patch that
does change anything material is smaller and easier to read and review.
The (perhaps big) patch that doesn't change anything is easy to review
as well then, simply because it *says* it does not change anything, and
reviewing it boils down to verifying that is true.


Segher


Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-18 Thread Qing Zhao via Gcc-patches
Hi, Richard,

Honestly, it’s very hard for me to decide what’s the best way to handle the
interaction between -fstrict-flex-arrays=M and -Warray-bounds=N.

Ideally, -fstrict-flex-arrays=M should completely control the behavior of
-Warray-bounds. If possible, I prefer this solution.

However, -Warray-bounds is included in -Wall and has been used extensively
for a long time. It’s not safe to change its default behavior.

So, I guess the bottom line for this work is:

Keep the default behavior of -Warray-bounds.

Is this a correct understanding?


> On Nov 18, 2022, at 8:14 AM, Richard Biener  wrote:
> 
> On Tue, 8 Nov 2022, Qing Zhao wrote:
> 
>> '-Wstrict-flex-arrays'
>> Warn about improper uses of flexible array members according to
>> the LEVEL of the 'strict_flex_array (LEVEL)' attribute attached to
>> the trailing array field of a structure if it's available,
>> otherwise according to the LEVEL of the option
>> '-fstrict-flex-arrays=LEVEL'.
>> 
>> This option is effective only when LEVEL is bigger than 0.
>> Otherwise, it will be ignored with a warning.
>> 
>> When LEVEL=1, warnings will be issued for a reference to a trailing
>> array with 2 or more elements in a structure if the trailing array
>> is referenced as a flexible array member.
>> 
>> When LEVEL=2, in addition to LEVEL=1, additional warnings will be
>> issued for a reference to a trailing one-element array of a structure
>> if the array is referenced as a flexible array member.
>> 
>> When LEVEL=3, in addition to LEVEL=2, additional warnings will be
>> issued for a reference to a trailing zero-length array of a structure
>> if the array is referenced as a flexible array member.
>> 
>> At the same time, keep -Warray-bounds=[1|2] warnings unchanged from
>> -fstrict-flex-arrays.
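Read literally, the level rules in the proposed documentation reduce to a small decision table. The function below is a hypothetical sketch of that table only (strict_flex_array_warns and the FAM_UNSIZED encoding are made up for illustration; this is not GCC's implementation, and it ignores the attribute-versus-option lookup): given a trailing array's declared bound and the level, it reports whether referencing the array as a flexible array member would be diagnosed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of a trailing array's declared bound: FAM_UNSIZED
   stands for a C99 flexible array member "T a[];"; any value >= 0 is the
   declared element count ("T a[0]", "T a[1]", "T a[4]", ...).  */
#define FAM_UNSIZED (-1L)

/* True if, at -Wstrict-flex-arrays LEVEL, using a trailing array with
   DECLARED_LEN elements as a flexible array member would be diagnosed,
   following the documented level descriptions.  */
bool
strict_flex_array_warns (long declared_len, int level)
{
  assert (level >= 0 && level <= 3);
  assert (declared_len >= FAM_UNSIZED);
  if (level == 0)                   /* Level 0: the option is ignored.  */
    return false;
  if (declared_len == FAM_UNSIZED)  /* "T a[];" is always a real FAM.  */
    return false;
  if (declared_len >= 2)            /* Diagnosed from level 1 up.  */
    return true;
  if (declared_len == 1)            /* One-element arrays: level 2 up.  */
    return level >= 2;
  return level >= 3;                /* Zero-length arrays: level 3 only.  */
}
```

So, for instance, a trailing `int a[1]` used as a flexible array member is accepted at level 1 but diagnosed at levels 2 and 3.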
> 
> Looking at this, is this a way to avoid interpreting -Warray-bounds=N
> together with -fstrict-flex-arrays=M?  Won't this be just confusing to
> users?  Especially since -Wall includes -Warray-bounds and thus we'll
> diagnose
> 
> +  if (opts->x_warn_array_bounds)
> +if (opts->x_flag_strict_flex_arrays)
> +  {
> +   warning_at (UNKNOWN_LOCATION, 0,
> +   "%<-Warray-bounds%> is not impacted by "
> +   "%<-fstrict-flex-arrays%>");
> +  }
> 
> and do that even when -Wstrict-flex-arrays is given?

The basic idea here is: -Warray-bounds=N will NOT be controlled by
-fstrict-flex-arrays=M at all, and the new -Wstrict-flex-arrays will be used
to report warnings for the different levels of “M”.

> 
> Would it be better/possible to add a note: to existing -Warray-bounds
> diagnostics on how the behavior is altered by -fstrict-flex-arrays?

If -Warray-bounds did not have the additional level “N” argument, it would be
reasonable and natural for it to be controlled by the level of
-fstrict-flex-arrays.
> 
> I guess this will inevitably re-iterate the -fstrict-flex-arrays=N
> vs. -Warray-bounds=M discussion ...

Yes, that’s the most confusing and challenging part of this work. I have
thought about it a lot but still cannot find the best way to handle it.
> 
> I think it would be better to stick with -Warray-bounds and flex
> its =2 mode to work according to -fstrict-flex-arrays=N instead of
> "out of bounds accesses to trailing struct members of one-element array 
> types" (thus, not add [1] but instead the cases that are not flex 
> arrays according to -fstrict-flex-arrays).

From my understanding, you suggested the following:

1. Keep the default behavior of -Warray-bounds,
i.e. when -Warray-bounds=1, its behavior will not be impacted by
-fstrict-flex-arrays=M.

2. When -Warray-bounds=2, its behavior will be controlled by
-fstrict-flex-arrays=M.

Is the above understanding correct?

If yes, then the major question is:

When -Warray-bounds=2 and -fstrict-flex-arrays=0, 1 or 2, i.e. when the level
of -fstrict-flex-arrays is lower than or equal to 2, [0] and [1] will be
treated as flexible array members by -fstrict-flex-arrays, which conflicts
with how -Warray-bounds=2 treats flexible array members. In that situation,
which one has higher priority?


I have another idea now:

1. Keep the default behavior of -Warray-bounds=1.
2. Change the behavior of -Warray-bounds=2 from:

Current:
-Warray-bounds=2
This warning level also warns about out of bounds accesses to trailing struct 
members of one-element array types (see Zero Length) and about the intermediate 
results of pointer arithmetic that may yield out of bounds values. This warning 
level may give a larger number of false positives and is deactivated by default.

New:
-Warray-bounds=2
This warning level also warns about the intermediate results of pointer 
arithmetic that may yield out of bounds values. This warning level may give a 
larger number of false positives and is deactivated by default.

i.e., remove the control over flexible array members from the levels of
-Warray-bounds.

3. Use 

Re: [PATCH] Allow prologues and epilogues to be inserted later

2022-11-18 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 11/11/22 09:21, Richard Sandiford via Gcc-patches wrote:
>> Arm's SME adds a new processor mode called streaming mode.
>> This mode enables some new (matrix-oriented) instructions and
>> disables several existing groups of instructions, such as most
>> Advanced SIMD vector instructions and a much smaller set of SVE
>> instructions.  It can also change the current vector length.
>>
>> There are instructions to switch in and out of streaming mode.
>> However, their effect on the ISA and vector length can't be represented
>> directly in RTL, so they need to be emitted late in the pass pipeline,
>> close to md_reorg.
>>
>> It's sometimes the responsibility of the prologue and epilogue to
>> switch modes, which means we need to emit the prologue and epilogue
>> sequences late as well.  (This loses shrink-wrapping and scheduling
>> opportunities, but that's a price worth paying.)
>>
>> This patch therefore adds a target hook for forcing prologue
>> and epilogue insertion to happen later in the pipeline.
>>
>> Tested on aarch64-linux-gnu (including with a follow-on patch)
>> and x86_64-linux-gnu.  OK to install?
>> Richard
>>
>>
>> gcc/
>>  * target.def (use_late_prologue_epilogue): New hook.
>>  * doc/gccint/target-macros/miscellaneous-parameters.rst: Add
>>  TARGET_USE_LATE_PROLOGUE_EPILOGUE.
>>  * doc/gccint/target-macros/tm.rst.in: Regenerate.
>>  * passes.def (pass_late_thread_prologue_and_epilogue): New pass.
>>  * tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
>>  * function.cc (pass_thread_prologue_and_epilogue::gate): New function.
>>  (pass_data_late_thread_prologue_and_epilogue): New pass variable.
>>  (pass_late_thread_prologue_and_epilogue): New pass class.
>>  (make_pass_late_thread_prologue_and_epilogue): New function.
>
> I'm not sure how we'll enforce the no target independent code motion 
> limitation that this seems to need and the exception made for reorg is 
> hackish in that it appears we just rely on the fact that reorg isn't run 
> for the one target where this matters.  That does make me wonder if we 
> should future proof this ever so slightly -- is there a reasonably easy 
> way to fail if a target were to define delay slots and the need for late 
> prologue/epilogue?  If so, that seems advisable.
>
>
> No objection to the meat of the patch, just wondering a bit about the 
> additional sanity checking we can do...

Yeah, good point.  How does the version below look?  Tested as before.

I guess it's a philosophical question what distinguishes "late compilation"
from everything else, but I think it makes sense for it to mean "no code
motion" (among other things).  And it's useful if targets have a well-
defined point at which they can insert their own passes while guaranteeing
that:

- the CFG still exists and hasn't lost information
- no code motion occurs later
- alignments aren't nailed down yet
- variable tracking occurs later (and so will account for whatever the
  target does in its pass)

One of the SME patches uses it for that purpose, independently of this patch,
and also needs there to be no code motion.

I don't think it's controversial to say that delay-branch reorg should
happen as part of normal scheduling, with the later passes coping with
the SEQUENCEs generated from it, but there's no realistic chance of
that happening.  So unfortunately it's always likely to be a special
case...

Bernd did some nice work on avoiding dbr for bfin (IIRC), but without
the handling of SEQUENCEs in rtl passes, even that version had to
happen during md_reorg.
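To make the gating concrete, here is a standalone sketch of how an early and a late prologue/epilogue pass can be made mutually exclusive through a single target hook. It also includes the kind of sanity check Jeff asked about: rejecting a target that has delay slots and also asks for late prologues. The struct, field and function names are hypothetical stand-ins loosely modelled on the ChangeLog entries, not the code in function.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of a target vector.  */
struct target_model
{
  bool (*use_late_prologue_epilogue) (void);
  bool has_delay_slots;  /* Would run delay-branch reorg late.  */
};

/* Gate of the existing (early) prologue/epilogue pass.  */
static bool
early_pro_epi_gate (const struct target_model *t)
{
  return !t->use_late_prologue_epilogue ();
}

/* Gate of the new late pass: exactly one of the two gates is true.  */
static bool
late_pro_epi_gate (const struct target_model *t)
{
  return t->use_late_prologue_epilogue ();
}

/* Sanity check in the spirit of the review: late prologues rely on no
   code motion happening afterwards, which dbr scheduling would break.  */
static bool
target_config_ok (const struct target_model *t)
{
  return !(t->has_delay_slots && t->use_late_prologue_epilogue ());
}

static bool hook_true (void) { return true; }
static bool hook_false (void) { return false; }
```

In a real port, the check would more naturally be an assertion made once during option or target initialization, not something evaluated per function.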

Thanks,
Richard


gcc/
* target.def (use_late_prologue_epilogue): New hook.
* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
* doc/tm.texi: Regenerate.
* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
(pass_data_late_thread_prologue_and_epilogue): New pass variable.
(pass_late_thread_prologue_and_epilogue): New pass class.
(make_pass_late_thread_prologue_and_epilogue): New function.
---
 gcc/doc/tm.texi| 19 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/function.cc| 49 ++
 gcc/passes.def |  3 +++
 gcc/target.def | 21 
 gcc/tree-pass.h|  2 ++
 6 files changed, 96 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index af77d16030c..6624768d68c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11667,6 +11667,25 @@ of the if-block in the @code{struct ce_if_block} structure that is pointed
 to by @var{ce_info}.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_USE_LATE_PROLOGUE_EPILOGUE ()
+Return true if the current function's prologue and epilogue should
+be emitted late in the pass 

Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 18, 2022 at 10:03:18AM -0500, Marek Polacek wrote:
> > the lambda operator() when still a template as constexpr and then
> > cp_finish_decl -> diagnose_static_in_constexpr pedwarns on it.
> > For the above perhaps we could figure out there is a static int k; in the
> > operator() and don't turn it into constexpr, but what if there is
> > something that would e.g. satisfy decl_maybe_constant_var_p but not
> > decl_constant_var_p when actually instantiated?
> > Without my patch, the diagnostics is in start_decl which isn't called again
> > during instantiation, so I presume we mark it as constexpr and then we'd
> > diagnose it during constant evaluation.
> 
> Um, can we give up on trying to handle C++17/C++20 then?

That was why I've posted the other two variant patches (with the 3rd one
being a strict C++23 only change).  Even if it is just a temporary state,
make C++23 work first and then iterate on whether C++17/20 can be made to
work with the pedwarns right.

Jakub



Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare

2022-11-18 Thread Segher Boessenkool
Hi!

On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote:
> on 2022/11/17 02:44, Segher Boessenkool wrote:
> > On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
> >>* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
> >>float only comparison operators.
> > 
> > Why?  Is that correct?  Your mail says nothing about this :-(
> > 
> > Is there any testcase that covers this, and that shows things still
> > generate the same code?
> > 
> 
> Sorry for the unclear description, I thought mistakenly that it's
> probably straightforward.
> 
> With the change in this patch, all 14 vector float comparison operators
> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
> would be handled early in rs6000_emit_vector_compare.
> 
> For unordered/ordered/ltgt/uneq, the new way is exactly the same
> as what we do in rs6000_emit_vector_compare_inner, it means there is
> no chance to get into rs6000_emit_vector_compare_inner with any of them.

Ah!  In that case, please add an assert there.  It helps catch problems,
but much more importantly even, it helps the reader understand what is
going on :-)

> For eq/ge/gt, it's the same too, but they are shared with vector integer
> comparison, I just left them alone here.  Just noticed we can remove ge
> safely too as it's guarded with !MODE_VECTOR_INT.

ge is nasty for float: it means something different with and without
-ffast-math (with fast-math, ge means "not lt" and le means "not gt"; both can
be done with a simple single condition, no cror needed).  Compare to ne,
which is the same with and without -ffast-math; that is because it has a
"not" in its definition!

> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
> with reverse_condition_maybe_unordered and invert the result, it's the
> same as what we have in vector.md.
> 
> ; unge(a,b) = ~lt(a,b)
> ; unle(a,b) = ~gt(a,b)
> ; ne(a,b)   = ~eq(a,b)
> ; ungt(a,b) = ~le(a,b)
> ; unlt(a,b) = ~ge(a,b)

But for these last two do we generate identical code still?  Since
forever we have only used cror here (with CCEQ), not crnor etc. (and will
CCEQ still do the correct thing always then?)
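The quoted identities can be checked in plain C99. Its <math.h> comparison macros (isless, isgreater, isunordered, ...) never raise on NaNs, and "un*" means "unordered or *". This is a sketch of the IEEE semantics only, not of the rs6000 expansion or its CR-bit sequences. The helper names mirror the RTL comparison codes and are illustrative.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Ordered predicates: false whenever either operand is a NaN.  */
static bool lt (double a, double b) { return isless (a, b); }
static bool le (double a, double b) { return islessequal (a, b); }
static bool gt (double a, double b) { return isgreater (a, b); }
static bool ge (double a, double b) { return isgreaterequal (a, b); }
static bool eq (double a, double b) { return a == b; }

/* Unordered-or predicates: true whenever either operand is a NaN.  */
static bool uno (double a, double b) { return isunordered (a, b); }
static bool unge (double a, double b) { return uno (a, b) || ge (a, b); }
static bool unle (double a, double b) { return uno (a, b) || le (a, b); }
static bool ungt (double a, double b) { return uno (a, b) || gt (a, b); }
static bool unlt (double a, double b) { return uno (a, b) || lt (a, b); }
static bool ne (double a, double b) { return !eq (a, b); }

/* unge = ~lt, unle = ~gt, ne = ~eq, ungt = ~le, unlt = ~ge.  */
static bool
inverted_identities_hold (double a, double b)
{
  return unge (a, b) == !lt (a, b)
         && unle (a, b) == !gt (a, b)
         && ne (a, b) == !eq (a, b)
         && ungt (a, b) == !le (a, b)
         && unlt (a, b) == !ge (a, b);
}
```

Whether the generated code is identical is a separate question, which only the assembly can answer. The sketch only confirms that the inversions are semantically exact, including for NaNs.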

> Then eq/ge/gt on the right side would match the cases that were mentioned
> above.  So we just need to focus on lt and le then.
> 
> For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
> it's the same as what we have in vector.md:
> 
> ; lt(a,b)   = gt(b,a)
> 
> , and further matches the case mentioned above.
> 
> As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
> and further handle lt recursively, that is:
>le = lt(a,b) || eq(a,b)
>   = gt(b,a) || eq(a,b)
> 
> actually this is worse than what vector.md supports:
> 
> ; le(a,b)   = ge(b,a)
> 
> In short, the function rs6000_emit_vector_compare_inner is only called
> twice in rs6000_emit_vector_compare, so there is no chance to enter
> rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
> any more, I think it's safe to make the change in function
> rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
> vector float comparison can improve slightly for UNGT and LE handlings.
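
The operand-swap identities used above can be checked the same way: lt(a,b) = gt(b,a), le(a,b) = ge(b,a), and the longer lt-or-eq split of le. All forms agree, including on NaNs, so the UNGT/LE improvement is only about instruction count. This is again a sketch of the semantics with an illustrative helper name, not of the expander.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

static bool
swap_identities_hold (double a, double b)
{
  bool lt_ab = isless (a, b);
  bool le_ab = islessequal (a, b);
  bool eq_ab = (a == b);  /* false if either operand is a NaN */

  return lt_ab == isgreater (b, a)                   /* lt(a,b) = gt(b,a) */
         && le_ab == isgreaterequal (b, a)           /* le(a,b) = ge(b,a) */
         && le_ab == (isgreater (b, a) || eq_ab);    /* gt(b,a) || eq(a,b) */
}
```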

Thanks for the explanation!

Can you do this in multiple steps, which will make it much easier to
review, and to spot the problem if some unexpected problem shows up?

> I constructed a test case, compiled with option -O2 -ftree-vectorize
> -fno-vect-cost-model on ppc64le, which goes into this function
> rs6000_emit_vector_compare with all 14 vector float comparison codes,
> the assembly of most functions doesn't change after this patch,
> excepting for test_UNGT_{float,double} and test_LE_{float,double}.

So this is a separate change, a separate patch, and the other patches will
show no changes in generated code at all.

> Maybe it's good to add one test case with function 
> test_{UNGT,LE}_{float,double}
> and scan not xvcmp{gt,eq}[sd]p.

In the patch that changes code gen for those, sure :-)


Segher


Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-18 Thread Richard Biener via Gcc-patches
On Wed, 2 Nov 2022, Tamar Christina wrote:

> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab.

I'm looking at this now, first some high-level questions.

Why do we need a new cbranch optab?  It seems implementing
a vector comparison and mask test against zero suffices?

You have some elaborate explanation on how peeling works but I
somewhat miss the high-level idea how to vectorize the early
exit.  I've applied the patches and from looking at how
vect-early-break_1.c gets transformed on aarch64 it seems you
vectorize

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;
 }

as

 for (int i = 0; i < N;)
 {
   if (any (vect_a[i] > x))
     break;
   vect_b[i] = x + i;
   vect_a[i] = x;
   i += VF;
 }
 for (; i < N; i++)
 { 
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;
 }

As you outline below this requires that the side-effects done as part
of  and  before exiting can be moved after the
exit, basically you need to be able to compute whether any scalar
iteration covered by a vector iteration will exit the loop early.
Code generation wise you'd simply "ignore" code generating early exits
at the place they appear in the scalar code and instead emit them
vectorized in the loop header.
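A scalar C emulation of that scheme may help. Each "vector" iteration first evaluates the exit condition for all VF lanes, and performs the side effects of the whole chunk only if no lane exits. Otherwise control falls through to the scalar loop, which redoes the remaining iterations, including the exiting one. VF, N and the function names are illustrative assumptions. The up-front check is only valid because the stores do not feed the condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N 16
#define VF 4  /* assumed vectorization factor */

/* Reference scalar loop from the example; returns the exit index.  */
static int
scalar_loop (int *vect_a, int *vect_b, int x)
{
  int i;
  for (i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
      vect_a[i] = x;
    }
  return i;
}

/* Emulated vector loop: test all VF lanes first, then do the chunk's
   side effects; on an early exit (or a partial tail) fall back to the
   scalar loop starting at the first lane of the current chunk.  */
static int
vectorized_loop (int *vect_a, int *vect_b, int x)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      bool any = false;                 /* any (vect_a[i:i+VF] > x) */
      for (int l = 0; l < VF; l++)
        any |= vect_a[i + l] > x;
      if (any)
        break;
      for (int l = 0; l < VF; l++)
        {
          vect_b[i + l] = x + i + l;
          vect_a[i + l] = x;
        }
    }
  for (; i < N; i++)                    /* scalar epilogue */
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
      vect_a[i] = x;
    }
  return i;
}
```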

> Concretely the kind of loops supported are of the forms:
> 
>  for (int i = 0; i < N; i++)
>  {
>
>if ()
>  ;
>
>  }
> 
> where  can be:
>  - break
>  - return
> 
> Any number of statements can be used before the  occurs.
> 
> Since this is an initial version for GCC 13 it has the following limitations:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.

Why?

> - any stores in  should not be to the same objects as in
>   .  Loads are fine as long as they don't have the possibility to
>   alias.

I think that's a fundamental limitation - you have to be able to compute 
the early exit condition at the beginning of the vectorized loop.  For
a single alternate exit it might be possible to apply loop rotation to
move things but that can introduce "bad" cross-iteration dependences(?)

> - No support for prologue peeling.  Since we only support fixed buffers this
>   wouldn't be an issue as we assume the arrays are correctly aligned.

Huh, I don't understand how prologue or epilogue peeling is an issue?  Is
that just because you didn't handle the early exit triggering?

> - Fully masked loops or unmasked loops are supported, but not partially masked
>   loops.
> - Only one additional exit is supported at this time.  The majority of the code
>   will handle n exits, but not all, so at this time this restriction is needed.
> - The early exit must be before the natural loop exit/latch.  The vectorizer is
>   designed in a way to propagate phi-nodes downwards.  As such supporting this
>   inverted control flow is hard.

How do you identify the "natural" exit?  It's the one 
number_of_iterations_exit works on?  Your normal_exit picks the
first from the loops recorded exit list but I don't think that list
is ordered in any particular way.

"normal_exit" would rather be single_countable_exit () or so?  A loop
already has a list of control_ivs (not sure if we ever have more than
one), I wonder if that can be annotated with the corresponding exit
edge?

I think that vect_analyze_loop_form should record the counting IV
exit edge and that recorded edge should be passed to utilities
like slpeel_can_duplicate_loop_p rather than re-querying 'normal_exit',
for example if we'd have

for (;; ++i, ++j)
  {
    if (i < n)
      break;
    a[i] = 0;
    if (j < m)
      break;
  }

which counting IV we choose as "normal" should be up to the vectorizer,
not up to the loop infrastructure.

The patch should likely be split, doing single_exit () replacements
with, say, LOOP_VINFO_IV_EXIT (..) first.


> - No support for epilogue vectorization.  The only epilogue supported is the
>   scalar final one.
>
> With the help of IPA this still gets hit quite often.  During bootstrap it
> hit rather frequently as well.
> 
> This implementation does not support completely handling the early break inside
> that we have to exit in the current iteration then we branch to scalar code to
> actually do the final VF iterations which handles all the code in .
> 
> niters analysis and the majority of the vectorizer with hardcoded single_exit
> have been updated with the use of a new function normal_exit which returns the
> loop's natural exit.
> 
> for niters the natural exit is still what determines the overall iterations as
> that is the O(iters) for the loop.
> 
> For the scalar loop we know that whatever exit you take you have to perform at

Re: [PATCH] c++, v4: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Marek Polacek via Gcc-patches
On Fri, Nov 18, 2022 at 08:48:32AM +0100, Jakub Jelinek wrote:
> On Thu, Nov 17, 2022 at 07:15:05PM -0500, Marek Polacek wrote:
> > > --- gcc/cp/decl.cc.jj 2022-11-16 14:44:43.692339668 +0100
> > > +++ gcc/cp/decl.cc 2022-11-17 20:53:44.102011594 +0100
> > > @@ -5600,6 +5600,57 @@ groktypename (cp_decl_specifier_seq *typ
> > >return type;
> > >  }
> > >  
> > > +/* For C++17 and older diagnose static or thread_local decls in constexpr
> > > +   or consteval functions.  For C++20 similarly, except if they are
> > 
> > In C++17 we don't support consteval so I guess drop the "or consteval "?
> 
> I just forgot to update the function comment.
> 
> Anyway, I think:
> 
> > BTW, I notice that the patch breaks
> > g++.dg/cpp1y/lambda-generic-func1.C
> > g++.dg/cpp1z/constexpr-lambda16.C
> > Maybe they just need dg- tweaks.
> 
> this is actually a real bug and I'm not sure how to resolve that.
> 
> We have there:
> 
> int main()
> {
>   [](auto i) { if (i) { int j; static int k; return i + j; } return i; }(0);
> }
> 
> and for C++17/20 I presume something (haven't figured out yet what) marks

Right, that's the C++17 implicit constexpr for lambdas, finish_function:

  /* Lambda closure members are implicitly constexpr if possible.  */
  if (cxx_dialect >= cxx17
      && LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl)))
    DECL_DECLARED_CONSTEXPR_P (fndecl)
      = ((processing_template_decl
          || is_valid_constexpr_fn (fndecl, /*complain*/false))
         && potential_constant_expression (DECL_SAVED_TREE (fndecl)));

> the lambda operator() when still a template as constexpr and then
> cp_finish_decl -> diagnose_static_in_constexpr pedwarns on it.
> For the above perhaps we could figure out there is a static int k; in the
> operator() and don't turn it into constexpr, but what if there is
> something that would e.g. satisfy decl_maybe_constant_var_p but not
> decl_constant_var_p when actually instantiated?
> Without my patch, the diagnostics is in start_decl which isn't called again
> during instantiation, so I presume we mark it as constexpr and then we'd
> diagnose it during constant evaluation.

Um, can we give up on trying to handle C++17/C++20 then?

Marek



RE: [PATCH][committed] aarch64: Fix up LDAPR codegen

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Kyrylo
> Tkachov via Gcc-patches
> Sent: Friday, November 18, 2022 9:06 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][committed] aarch64: Fix up LDAPR codegen
> 
> Hi all,
> 
> Upon some further inspection I realised I had misunderstood some
> intricacies of the extending loads of the RCPC feature.
> This patch fixes up the recent GCC support accordingly. In particular:
> * The sign-extending forms are a form of LDAPURS* and are actually part of
> FEAT_RCPC2 that is enabled with Armv8.4-a rather than the base Armv8.3-a
> FEAT_RCPC. The patch introduces a TARGET_RCPC2 macro and gates this
> combine pattern accordingly.
> * The assembly output for the zero-extending LDAPR instruction should
> always use %w formatting for its destination register.
> 

... And another follow-up, once I realised that the sign-extending load, of course,
needs to have strictly an X-reg as a destination for DImode extensions and a W-reg
for SImode ones. The zext pattern change was correct.

Bootstrapped and tested on aarch64-none-linux.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/atomics.md (*aarch64_atomic_load_rcpc_sext):
Use   for destination format.
* config/aarch64/iterators.md (w_sz): Delete.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ldapr-sext.c: Adjust expected output.
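
The register-width rules from both fix-ups can be summarised in a small sketch that picks the mnemonic and destination register for an RCpc extending load. Zero-extension always writes a W register (LDAPRB/LDAPRH). Sign-extension uses the FEAT_RCPC2 LDAPURS* forms, with an X register for DImode results and a W register for SImode results. The function and its interface are hypothetical; they restate the rules in these messages and are not GCC's output templates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the load of a MEM_BYTES-wide (1 or 2)
   value extended to a DEST_BITS-wide (32 or 64) result in register 0,
   base register x1.  Writes into BUF and returns it.  */
static const char *
rcpc_extending_load (int mem_bytes, int dest_bits, bool is_signed,
                     char *buf, size_t len)
{
  char size = mem_bytes == 1 ? 'b' : 'h';
  if (is_signed)
    /* LDAPURS{B,H}: destination width follows the result mode.  */
    snprintf (buf, len, "ldapurs%c\t%c0, [x1]", size,
              dest_bits == 64 ? 'x' : 'w');
  else
    /* Writing a W register zeroes the upper 32 bits, so the W form
       serves both SImode and DImode zero-extensions.  */
    snprintf (buf, len, "ldapr%c\tw0, [x1]", size);
  return buf;
}
```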

> The testcase is split into zero-extending and sign-extending parts since they
> require different architecture pragmas.
> It's also straightforward to add the rest of the FEAT_RCPC2 codegen (with
> immediate offset addressing modes) but that can be done as a separate
> patch.
> Apologies for not catching this sooner, but it hasn't been in trunk long, so no
> harm done.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Pushing to trunk.
> Thanks,
> Kyrill
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.h (TARGET_RCPC2): Define.
>   * config/aarch64/atomics.md
> (*aarch64_atomic_load_rcpc_zext):
>   Adjust output template.
>   (*aarch64_atomic_load_rcpc_sext): Guard on
> TARGET_RCPC2.
>   Adjust output template.
>   * config/aarch64/iterators.md (w_sz): New mode attr.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/ldapr-ext.c: Rename to...
>   * gcc.target/aarch64/ldapr-zext.c: ... This.  Fix expected assembly.
>   * gcc.target/aarch64/ldapr-sext.c: New test.


ldapur-w.patch


Re: [PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-18 Thread Segher Boessenkool
[ Please cc: me and Ke Wen on rs6000 patches ]

On Thu, Nov 17, 2022 at 07:54:29AM +0800, Hongyu Wang wrote:
> r13-3950-g071e428c24ee8c enables O2 small loop unrolling, but it breaks
> -fno-unroll-loops for rs6000 with loop_unroll_adjust hook. Adjust the
> option handling and target hook accordingly.

NAK.

This is wrong.  -munroll-only-small-loops does not enable loop
unrolling; doing that with a machine flag is completely unmaintainable,
also for people using different targets.

Something in your patch was wrong, please fix that (or revert the
patch).  You should not have to touch config/rs6000/ at all.


Segher


Re: [PATCH 4/7] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-18 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 15:34, Jeff Law  wrote:
>
>
> On 11/17/22 16:56, Palmer Dabbelt wrote:
> > On Thu, 17 Nov 2022 15:41:26 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >>
> >> On 11/12/22 14:29, Philipp Tomsich wrote:
> >>> Users might use explicit arithmetic operations to create a mask and
> >>> then and it, in a sequence like
> >>>  cond = (bits >> SHIFT) & 1;
> >>>  mask = ~(cond - 1);
> >>>  val &= mask;
> >>> which will present as a single-bit sign-extract.
> >>>
> >>> Depending on what combination of XVentanaCondOps and Zbs are
> >>> available, this will map to the following sequences:
> >>>   - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
> >>>   - andi + vt.maskc, if only XVentanaCondOps is available and the
> >>>  sign-extract is operating on bits 10:0 (bit
> >>> 11 can't be reached, as the immediate is
> >>> sign-extended)
> >>>   - slli + srli + and, otherwise.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> * config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
> >>>   of a single-bit followed by AND for XVentanaCondOps.
> >>>
> >>> Signed-off-by: Philipp Tomsich 
> >>> ---
> >>>
> >>>   gcc/config/riscv/xventanacondops.md | 46 +
> >>>   1 file changed, 46 insertions(+)
> >>>
> >>> diff --git a/gcc/config/riscv/xventanacondops.md b/gcc/config/riscv/xventanacondops.md
> >>> index 7930ef1d837..3e9d5833a4b 100644
> >>> --- a/gcc/config/riscv/xventanacondops.md
> >>> +++ b/gcc/config/riscv/xventanacondops.md
> >>> @@ -73,3 +73,49 @@
> >>> "TARGET_XVENTANACONDOPS"
> >>> [(set (match_dup 5) (match_dup 1))
> >>>  (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
> >>> +
> >>> +;; Users might use explicit arithmetic operations to create a mask and
> >>> +;; then and it, in a sequence like
> >>
> >> Nit.  Seems like a word is missing.  "make and then and it"??
> >>
> >>
> >> Do we really care about TARGET_XVENTANACONDOPS && ! TARGET_ZBS?
> >
> > I guess that's really more of a question for the Ventana folks, but
> > assuming all the Ventana widgets have Zbs then it seems reasonable to
> > just couple them -- there's already enough options in RISC-V land to
> > test everything, might as well make sure what slips through the cracks
> > isn't being built.
>
> I'm pretty confident Ventana won't be making a part without Zbs which is
> why I raised the issue
>
>
> I also understand Philipp's position that one could explicitly turn on
> ventanacondops and zbs off and that there's a notable possibility that
> this ultimately turns into ZICondOps independent of Ventana.
>
>
> So I guess we keep it...  But it also feels like a ticking time bomb WRT
> the ability to mix and match things the way we currently allow.  I
> suspect if we were to look at the full test matrix and deeply test that
> full matrix that we'd find a number of problems where two options
> interact badly.

I have been worrying about the exponential growth of the test matrix
for 2 years now and still haven't come up with a good solution. It is
clear that this is a challenge for the entire RISC-V ecosystem and
that it needs to be addressed across vendors and across the entire
membership: unfortunately, that doesn't make for an easier path to a
solution.

And just as an aside: pure extensions are still less worrisome than
subtractive changes (think Zfinx and Zdinx), or the fact that we have
different options for the memory model (RVWMO vs. Ztso), or variations
in regard to what facilities are available for atomics...


Philipp.


Re: [PATCH 4/7] RISC-V: Recognize sign-extract + and cases for XVentanaCondOps

2022-11-18 Thread Jeff Law via Gcc-patches



On 11/17/22 16:56, Palmer Dabbelt wrote:

On Thu, 17 Nov 2022 15:41:26 PST (-0800), gcc-patches@gcc.gnu.org wrote:


On 11/12/22 14:29, Philipp Tomsich wrote:

Users might use explicit arithmetic operations to create a mask and
then and it, in a sequence like
 cond = (bits >> SHIFT) & 1;
 mask = ~(cond - 1);
 val &= mask;
which will present as a single-bit sign-extract.

Depending on what combination of XVentanaCondOps and Zbs are
available, this will map to the following sequences:
  - bexti + vt.maskc, if both Zbs and XVentanaCondOps are present
  - andi + vt.maskc, if only XVentanaCondOps is available and the
 sign-extract is operating on bits 10:0 (bit
    11 can't be reached, as the immediate is
    sign-extended)
  - slli + srli + and, otherwise.
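
The equivalence the pattern relies on can be checked in C. ~(cond - 1) equals -cond, which is all-ones when the selected bit is set and zero otherwise, so the whole sequence conditionally zeroes val. That is what bexti followed by vt.maskc computes; the sketch assumes vt.maskc yields rs1 when rs2 is non-zero and 0 otherwise. The andi alternative also needs 1 << SHIFT to fit a 12-bit sign-extended immediate, which is why bit 11 is out of reach. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The user-level idiom from the patch description.  */
static uint64_t
mask_idiom (uint64_t val, uint64_t bits, unsigned shift)
{
  uint64_t cond = (bits >> shift) & 1;
  uint64_t mask = ~(cond - 1);   /* all-ones if the bit is set, else 0 */
  return val & mask;
}

/* Modelled bexti + vt.maskc: extract the bit, keep VAL only if set.  */
static uint64_t
bexti_maskc (uint64_t val, uint64_t bits, unsigned shift)
{
  uint64_t bit = (bits >> shift) & 1;   /* bexti */
  return bit != 0 ? val : 0;            /* vt.maskc */
}

/* andi takes a 12-bit sign-extended immediate (-2048..2047), so the
   positive single-bit mask 1 << SHIFT only fits for SHIFT <= 10.  */
static bool
andi_can_test_bit (unsigned shift)
{
  if (shift > 62)
    return false;
  int64_t imm = INT64_C (1) << shift;
  return imm <= 2047;
}
```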

gcc/ChangeLog:

* config/riscv/xventanacondops.md: Recognize SIGN_EXTRACT
  of a single-bit followed by AND for XVentanaCondOps.

Signed-off-by: Philipp Tomsich 
---

  gcc/config/riscv/xventanacondops.md | 46 +
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/riscv/xventanacondops.md b/gcc/config/riscv/xventanacondops.md
index 7930ef1d837..3e9d5833a4b 100644
--- a/gcc/config/riscv/xventanacondops.md
+++ b/gcc/config/riscv/xventanacondops.md
@@ -73,3 +73,49 @@
    "TARGET_XVENTANACONDOPS"
    [(set (match_dup 5) (match_dup 1))
 (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 5) (const_int 0)))
+
+;; Users might use explicit arithmetic operations to create a mask and
+;; then and it, in a sequence like


Nit.  Seems like a word is missing.  "make and then and it"??


Do we really care about TARGET_XVENTANACONDOPS && ! TARGET_ZBS?


I guess that's really more of a question for the Ventana folks, but 
assuming all the Ventana widgets have Zbs then it seems reasonable to 
just couple them -- there's already enough options in RISC-V land to 
test everything, might as well make sure what slips through the cracks 
isn't being built.


I'm pretty confident Ventana won't be making a part without Zbs which is 
why I raised the issue



I also understand Philipp's position that one could explicitly turn on 
ventanacondops and zbs off and that there's a notable possibility that 
this ultimately turns into ZICondOps independent of Ventana.



So I guess we keep it...  But it also feels like a ticking time bomb WRT 
the ability to mix and match things the way we currently allow.  I 
suspect if we were to look at the full test matrix and deeply test that 
full matrix that we'd find a number of problems where two options 
interact badly.


Jeff


Re: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

2022-11-18 Thread Andre Vieira (lists) via Gcc-patches
Sorry for the late reply on this. I was wondering though why the check 
made sense. The way I see it, SI -> SI mode is either wrong or useless. 
So why not:
if it is wrong, error (gcc_assert?) so we know it was generated wrongly 
somehow and fix it;
if it is useless, still use this pattern as we avoid an extra 
instruction (doing useless work).


Unless you expect the backend to be 'probing' for this and the way we
tell it not to is to not implement any pattern that allows for this? But
somehow that doesn't feel like the right approach...


On 17/11/2022 11:30, Kyrylo Tkachov wrote:



-Original Message-
From: Richard Sandiford 
Sent: Tuesday, November 15, 2022 6:05 PM
To: Andre Simoes Dias Vieira 
Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
Richard Earnshaw 
Subject: Re: [PATCH 2/2] aarch64: Add support for widening LDAPR
instructions

"Andre Vieira (lists)"  writes:

Updated version of the patch to account for the testsuite changes in the
first patch.

On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This patch adds support for the widening LDAPR instructions.

Bootstrapped and regression tested on aarch64-none-linux-gnu.

OK for trunk?

2022-11-09  Andre Vieira  
 Kyrylo Tkachov  

gcc/ChangeLog:

 * config/aarch64/atomics.md
(*aarch64_atomic_load_rcpc_zext): New pattern.
 (*aarch64_atomic_load_rcpc_sext): Likewise.

gcc/testsuite/ChangeLog:

 * gcc.target/aarch64/ldapr-ext.c: New test.

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index dc5f52ee8a4b349c0d8466a16196f83604893cbb..9670bef7d8cb2b32c5146536d806a7e8bdffb2e3 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -704,6 +704,28 @@
}
  )

+(define_insn "*aarch64_atomic_load_rcpc_zext"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(zero_extend:GPI
+  (unspec_volatile:ALLX
+[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
+ (match_operand:SI 2 "const_int_operand")] ;; model
+   UNSPECV_LDAP)))]
+  "TARGET_RCPC"
+  "ldapr\t%0, %1"

It would be good to add:

> 

to the condition, so that we don't provide bogus SI->SI and DI->DI
extensions.  (They shouldn't be generated, but it's better not to provide
them anyway.)
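
The suggested extra condition amounts to "the destination mode must be strictly wider than the memory mode". A tiny sketch shows which combinations it removes (SI->SI, DI->DI and the nonsensical DI->SI). It enumerates QI/HI/SI/DI memory modes against SI/DI destinations. The mode table and function names are illustrative only and are not GCC's iterator definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Byte sizes of the integer modes involved (illustrative).  */
enum mode { QImode = 1, HImode = 2, SImode = 4, DImode = 8 };

/* An extension pattern only makes sense when it actually widens.  */
static bool
widening_extension_p (enum mode to, enum mode from)
{
  return (int) to > (int) from;
}

/* Count the (from, to) pairs the condition keeps.  */
static int
count_valid_extensions (void)
{
  static const enum mode from_modes[] = { QImode, HImode, SImode, DImode };
  static const enum mode to_modes[] = { SImode, DImode };
  int n = 0;
  for (unsigned i = 0; i < sizeof from_modes / sizeof from_modes[0]; i++)
    for (unsigned j = 0; j < sizeof to_modes / sizeof to_modes[0]; j++)
      n += widening_extension_p (to_modes[j], from_modes[i]);
  return n;
}
```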


I agree. I'm pushing the attached patch to trunk.

gcc/ChangeLog:

 * config/aarch64/atomics.md 
(*aarch64_atomic_load_rcpc_zext):
 Add mode size check to condition.
 (*aarch64_atomic_load_rcpc_sext): Likewise.


Thanks,
Richard


+)
+
+(define_insn "*aarch64_atomic_load_rcpc_sext"
+  [(set (match_operand:GPI  0 "register_operand" "=r")
+(sign_extend:GPI
+  (unspec_volatile:ALLX
+[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
+ (match_operand:SI 2 "const_int_operand")] ;; model
+   UNSPECV_LDAP)))]
+  "TARGET_RCPC"
+  "ldaprs\t%0, %1"
+)
+
  (define_insn "atomic_store"
[(set (match_operand:ALLI 0 "aarch64_rcpc_memory_operand" "=Q,Ust")
  (unspec_volatile:ALLI
diff --git a/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
new file mode 100644
index ..aed27e06235b1d266decf11745dacf94cc59e76d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+#include 
+
+#pragma GCC target "+rcpc"
+
+atomic_ullong u64;
+atomic_llong s64;
+atomic_uint u32;
+atomic_int s32;
+atomic_ushort u16;
+atomic_short s16;
+atomic_uchar u8;
+atomic_schar s8;
+
+#define TEST(name, ldsize, rettype)\
+rettype\
+test_##name (void) \
+{  \
+  return atomic_load_explicit (, memory_order_acquire); \
+}
+
+/*
+**test_u8_u64:
+**...
+** ldaprb  x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u8_u64, u8, unsigned long long)
+
+/*
+**test_s8_s64:
+**...
+** ldaprsb x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s8_s64, s8, long long)
+
+/*
+**test_u16_u64:
+**...
+** ldaprh  x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u16_u64, u16, unsigned long long)
+
+/*
+**test_s16_s64:
+**...
+** ldaprsh x0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s16_s64, s16, long long)
+
+/*
+**test_u8_u32:
+**...
+** ldaprb  w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u8_u32, u8, unsigned)
+
+/*
+**test_s8_s32:
+**...
+** ldaprsb w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s8_s32, s8, int)
+
+/*
+**test_u16_u32:
+**...
+** ldaprh  w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(u16_u32, u16, unsigned)
+
+/*
+**test_s16_s32:
+**...
+** ldaprsh w0, \[x[0-9]+\]
+** ret
+*/
+
+TEST(s16_s32, s16, int)


Re: [PATCH] Add a new target hook: TARGET_START_CALL_ARGS

2022-11-18 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 5:29 PM Richard Sandiford via Gcc-patches
 wrote:
>
> We have the following two hooks into the call expansion code:
>
> - TARGET_CALL_ARGS is called for each argument before arguments
>   are moved into hard registers.
>
> - TARGET_END_CALL_ARGS is called after the end of the call
>   sequence (specifically, after any return value has been
>   moved to a pseudo).
>
> This patch adds a TARGET_START_CALL_ARGS hook that is called before
> the TARGET_CALL_ARGS sequence.  This means that TARGET_START_CALL_ARGS
> and TARGET_END_CALL_ARGS bracket the region in which argument pseudos
> might be live.  They also bracket a region in which the only call
> emitted by target-independent code is the call to the target function
> itself.  (For example, TARGET_START_CALL_ARGS happens after any use of
> memcpy to copy arguments, and TARGET_END_CALL_ARGS happens before any
> use of memcpy to copy the result.)
>
> Also, the patch adds the cumulative argument structure as an argument
> to the hooks, so that the target can use it to record and retrieve
> information about the call as a whole.
>
> The TARGET_CALL_ARGS docs said:
>
>    While generating RTL for a function call, this target hook is invoked once
>    for each argument passed to the function, either a register returned by
>    ``TARGET_FUNCTION_ARG`` or a memory location.  It is called just
>    before the point where argument registers are stored.
>
> The last bit was true for normal calls, but for libcalls the hook was
> invoked earlier, before stack arguments have been copied.  I don't think
> this caused a practical difference for nvptx (the only port to use the
> hooks) since I wouldn't expect any libcalls to take stack parameters.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested by
> building cc1 for nvptx-none.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> * doc/gccint/target-macros/implementing-the-varargs-macros.rst:
> Add TARGET_START_CALL_ARGS.
> * doc/gccint/target-macros/tm.rst.in: Regenerate.
> * target.def (start_call_args): New hook.
> (call_args, end_call_args): Add a parameter for the cumulative
> argument information.
> * hooks.h (hook_void_rtx_tree): Delete.
> * hooks.cc (hook_void_rtx_tree): Likewise.
> * targhooks.h (hook_void_CUMULATIVE_ARGS): Declare.
> (hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
> * targhooks.cc (hook_void_CUMULATIVE_ARGS): New function.
> (hook_void_CUMULATIVE_ARGS_rtx_tree): Likewise.
> * calls.cc (expand_call): Call start_call_args before computing
> and storing stack parameters.  Pass the cumulative argument
> information to call_args and end_call_args.
> (emit_library_call_value_1): Likewise.
> * config/nvptx/nvptx.cc (nvptx_call_args): Add a cumulative
> argument parameter.
> (nvptx_end_call_args): Likewise.
> ---
>  gcc/calls.cc  | 61 ++-
>  gcc/config/nvptx/nvptx.cc |  4 +-
>  .../implementing-the-varargs-macros.rst   |  5 ++
>  gcc/doc/gccint/target-macros/tm.rst.in| 53 +---
>  gcc/hooks.cc  |  5 --
>  gcc/hooks.h   |  1 -
>  gcc/target.def| 56 +
>  gcc/targhooks.cc  | 10 +++
>  gcc/targhooks.h   |  5 +-
>  9 files changed, 140 insertions(+), 60 deletions(-)
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 51b664f1b4d..d3287bcc277 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -3542,15 +3542,26 @@ expand_call (tree exp, rtx target, int ignore)
> sibcall_failure = 1;
> }
>
> +  /* Set up the next argument register.  For sibling calls on machines
> +with register windows this should be the incoming register.  */
> +  if (pass == 0)
> +   next_arg_reg = targetm.calls.function_incoming_arg
> + (args_so_far, function_arg_info::end_marker ());
> +  else
> +   next_arg_reg = targetm.calls.function_arg
> + (args_so_far, function_arg_info::end_marker ());
> +
> +  targetm.calls.start_call_args (args_so_far);
> +
>bool any_regs = false;
>for (i = 0; i < num_actuals; i++)
> if (args[i].reg != NULL_RTX)
>   {
> any_regs = true;
> -   targetm.calls.call_args (args[i].reg, funtype);
> +   targetm.calls.call_args (args_so_far, args[i].reg, funtype);
>   }
>if (!any_regs)
> -   targetm.calls.call_args (pc_rtx, funtype);
> +   targetm.calls.call_args (args_so_far, pc_rtx, funtype);
>
>/* Figure out the register where the value, if any, will come back.  */
>valreg = 0;
> @@ -3613,15 +3624,6 @@ expand_call (tree exp, rtx target, int ignore)
>  later safely search backwards 
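The hook bracketing described in this patch can be modelled in a few lines. This is a hypothetical, self-contained sketch and not GCC's API. `struct mock_cum_args` stands in for CUMULATIVE_ARGS, and the `mock_*` functions stand in for the three target hooks. A negative register number marks a stack argument, and a separate marker plays the role of `pc_rtx`. The sketch mirrors the order visible in the `expand_call` hunk: start once, then call_args for each register argument (or once with the marker if there are none), then the call, then end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for CUMULATIVE_ARGS: with the new parameter,
   the hooks can keep per-call state in it.  */
struct mock_cum_args
{
  char log[128];      /* Order in which the hooks ran.  */
  int reg_args_seen;  /* Register arguments reported to call_args.  */
};

#define MOCK_STACK_ARG (-1) /* Argument passed in memory.  */
#define MOCK_PC_MARKER (-2) /* Plays the role of pc_rtx.  */

/* Append "EVENT;" to the log, checking that nothing was truncated.  */
static void
log_event (struct mock_cum_args *cum, const char *event)
{
  size_t used = strlen (cum->log);
  int n = snprintf (cum->log + used, sizeof cum->log - used, "%s;", event);
  assert (n >= 0 && (size_t) n < sizeof cum->log - used);
}

static void
mock_start_call_args (struct mock_cum_args *cum)
{
  cum->reg_args_seen = 0;
  log_event (cum, "start");
}

static void
mock_call_args (struct mock_cum_args *cum, int reg)
{
  char buf[24];
  if (reg == MOCK_PC_MARKER)
    snprintf (buf, sizeof buf, "args(pc)");
  else
    {
      cum->reg_args_seen++;
      snprintf (buf, sizeof buf, "args(r%d)", reg);
    }
  log_event (cum, buf);
}

static void
mock_end_call_args (struct mock_cum_args *cum)
{
  log_event (cum, "end");
}

/* Same shape as the loop in expand_call: call_args only for register
   arguments, or once with the marker when there are none.  */
void
expand_mock_call (struct mock_cum_args *cum, const int *arg_regs,
                  int num_actuals)
{
  bool any_regs = false;

  cum->log[0] = '\0';
  mock_start_call_args (cum);
  for (int i = 0; i < num_actuals; i++)
    if (arg_regs[i] != MOCK_STACK_ARG)
      {
        any_regs = true;
        mock_call_args (cum, arg_regs[i]);
      }
  if (!any_regs)
    mock_call_args (cum, MOCK_PC_MARKER);
  log_event (cum, "call");
  mock_end_call_args (cum);
}
```

In the real sequence, end_call_args runs only after any return value has been moved to a pseudo. The sketch records ordering only.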

Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-18 Thread Richard Biener via Gcc-patches
On Tue, 8 Nov 2022, Qing Zhao wrote:

> '-Wstrict-flex-arrays'
>  Warn about improper uses of flexible array members according to
>  the LEVEL of the 'strict_flex_array (LEVEL)' attribute attached to
>  the trailing array field of a structure if it's available,
>  otherwise according to the LEVEL of the option
>  '-fstrict-flex-arrays=LEVEL'.
> 
>  This option is effective only when LEVEL is bigger than 0.
>  Otherwise, it will be ignored with a warning.
> 
>  When LEVEL=1, warnings will be issued for a reference to a trailing
>  array with 2 or more elements in a structure if the
>  trailing array is referenced as a flexible array member.
> 
>  When LEVEL=2, in addition to LEVEL=1, warnings will also be
>  issued for a trailing one-element array reference of a structure if
>  the array is referenced as a flexible array member.
> 
>  When LEVEL=3, in addition to LEVEL=2, warnings will also be
>  issued for a trailing zero-length array reference of a structure if
>  the array is referenced as a flexible array member.
> 
> At the same time, -Warray-bounds=[1|2] warnings are kept unaffected by
>  -fstrict-flex-arrays.
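Read literally, the three levels form a threshold on the declared length of the trailing array. The predicate below is a hypothetical model of the documented rule, not the GCC implementation. It assumes the array is already being referenced as a flexible array member, and it treats a true C99 flexible array member (`[]`) as never diagnosed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of an unsized C99 flexible array member "[]".  */
#define FAM_UNSIZED (-1)

/* Would -Wstrict-flex-arrays warn for a FAM-style access to a trailing
   array declared with TRAILING_LEN elements, at strictness LEVEL?
     struct s { int n; int a[4]; };  warned from LEVEL=1
     struct s { int n; int a[1]; };  warned from LEVEL=2
     struct s { int n; int a[0]; };  warned at LEVEL=3 (GNU zero-length)
     struct s { int n; int a[];  };  never warned  */
bool
strict_flex_warns (int level, int trailing_len)
{
  assert (level >= 0 && level <= 3);
  if (level == 0 || trailing_len == FAM_UNSIZED)
    return false; /* Option ignored, or a genuine flexible array member.  */
  if (trailing_len >= 2)
    return true;
  if (trailing_len == 1)
    return level >= 2;
  return level >= 3; /* trailing_len == 0.  */
}
```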

Looking at this, is this a way to avoid interpreting -Warray-bounds=N
together with -fstrict-flex-arrays=M?  Won't this be just confusing to
users?  Especially since -Wall includes -Warray-bounds and thus we'll
diagnose

+  if (opts->x_warn_array_bounds)
+    if (opts->x_flag_strict_flex_arrays)
+      {
+        warning_at (UNKNOWN_LOCATION, 0,
+                    "%<-Warray-bounds%> is not impacted by "
+                    "%<-fstrict-flex-arrays%>");
+      }

and do that even when -Wstrict-flex-arrays is given?

Would it be better/possible to add a note: to existing -Warray-bounds
diagnostics on how the behavior is altered by -fstrict-flex-arrays?

I guess this will inevitably re-iterate the -fstrict-flex-arrays=N
vs. -Warray-bounds=M discussion ...

I think it would be better to stick with -Warray-bounds and flex
its =2 mode to work according to -fstrict-flex-arrays=N instead of
"out of bounds accesses to trailing struct members of one-element array 
types" (thus, not add [1] but instead the cases that are not flex 
arrays according to -fstrict-flex-arrays).

Richard.

> gcc/ChangeLog:
> 
>   * attribs.cc (strict_flex_array_level_of): New function.
>   * attribs.h (strict_flex_array_level_of): Prototype for new function.
>   * doc/invoke.texi: Document -Wstrict-flex-arrays option. Update
>   -fstrict-flex-arrays[=n] options.
>   * gimple-array-bounds.cc (array_bounds_checker::check_array_ref):
>   Issue warnings for -Wstrict-flex-arrays.
>   (get_up_bounds_for_array_ref): New function.
>   (check_out_of_bounds_and_warn): New function.
>   * opts.cc (finish_options): Issue warnings for unsupported combinations
>   of -Warray-bounds and -fstrict-flex-arrays, and of -Wstrict-flex-arrays
>   and -fstrict-flex-arrays.
>   * tree-vrp.cc (execute_vrp): Enable the pass when
>   warn_strict_flex_array is true.
>   (execute_ranger_vrp): Likewise.
>   * tree.cc (array_ref_flexible_size_p): Add one new argument.
>   (component_ref_sam_type): New function.
>   (component_ref_size): Add one new argument.
>   * tree.h (array_ref_flexible_size_p): Update prototype.
>   (enum struct special_array_member): Add two new enum values.
>   (component_ref_sam_type): New prototype.
>   (component_ref_size): Update prototype.
> 
> gcc/c-family/ChangeLog:
> 
>   * c.opt (Wstrict-flex-arrays): New option.
> 
> gcc/c/ChangeLog:
> 
>   * c-decl.cc (is_flexible_array_member_p): Call new function
>   strict_flex_array_level_of.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Wstrict-flex-arrays.c: New test.
>   * c-c++-common/Wstrict-flex-arrays_2.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-2.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-3.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-4.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-5.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-6.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-7.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-8.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-9.c: New test.
>   * gcc.dg/Wstrict-flex-arrays.c: New test.
> ---
>  gcc/attribs.cc|  30 ++
>  gcc/attribs.h |   2 +
>  gcc/c-family/c.opt|   5 +
>  gcc/c/c-decl.cc   |  22 +-
>  gcc/doc/invoke.texi   |  33 ++-
>  gcc/gimple-array-bounds.cc| 264 +-
>  gcc/opts.cc   |  15 +
>  .../c-c++-common/Wstrict-flex-arrays.c|   9 +
>  .../c-c++-common/Wstrict-flex-arrays_2.c  |   9 +
>  gcc/testsuite/gcc.dg/Wstrict-flex-arrays-2.c  |  46 +++
>  

Re: [PATCH Rust front-end v3 38/46] gccrs: Add HIR to GCC GENERIC lowering entry point

2022-11-18 Thread Richard Biener via Gcc-patches
On Tue, Nov 15, 2022 at 2:46 PM Arthur Cohen  wrote:
>
>
>
> On 11/9/22 14:53, Richard Biener wrote:
> > On Wed, Oct 26, 2022 at 10:37 AM  wrote:
> >>
> >> From: Philip Herron 
> >>
> >> This patch contains the entry point and utilities used for the lowering
> >> of HIR nodes to `tree`s. It also contains a constant evaluator, ported
> >> over from the C++ frontend.
> >>
> >> Co-authored-by: David Faust 
> >> Co-authored-by: Faisal Abbas <90.abbasfai...@gmail.com>
> >> ---
> >>   gcc/rust/backend/rust-compile-context.cc | 146 
> >>   gcc/rust/backend/rust-compile-context.h  | 343 ++
> >>   gcc/rust/backend/rust-compile.cc | 414 +
> >>   gcc/rust/backend/rust-compile.h  |  47 +++
> >>   gcc/rust/backend/rust-constexpr.cc   | 441 +++
> >>   gcc/rust/backend/rust-constexpr.h|  31 ++
> >>   6 files changed, 1422 insertions(+)
> >>   create mode 100644 gcc/rust/backend/rust-compile-context.cc
> >>   create mode 100644 gcc/rust/backend/rust-compile-context.h
> >>   create mode 100644 gcc/rust/backend/rust-compile.cc
> >>   create mode 100644 gcc/rust/backend/rust-compile.h
> >>   create mode 100644 gcc/rust/backend/rust-constexpr.cc
> >>   create mode 100644 gcc/rust/backend/rust-constexpr.h
> >>
> >> diff --git a/gcc/rust/backend/rust-compile-context.cc 
> >> b/gcc/rust/backend/rust-compile-context.cc
> >> new file mode 100644
> >> index 000..cb2addf6c21
> >> --- /dev/null
> >> +++ b/gcc/rust/backend/rust-compile-context.cc
> >> @@ -0,0 +1,146 @@
> >> +// Copyright (C) 2020-2022 Free Software Foundation, Inc.
> >> +
> >> +// This file is part of GCC.
> >> +
> >> +// GCC is free software; you can redistribute it and/or modify it under
> >> +// the terms of the GNU General Public License as published by the Free
> >> +// Software Foundation; either version 3, or (at your option) any later
> >> +// version.
> >> +
> >> +// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> >> +// WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >> +// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> >> +// for more details.
> >> +
> >> +// You should have received a copy of the GNU General Public License
> >> +// along with GCC; see the file COPYING3.  If not see
> >> +// <http://www.gnu.org/licenses/>.
> >> +
> >> +#include "rust-compile-context.h"
> >> +#include "rust-compile-type.h"
> >> +
> >> +namespace Rust {
> >> +namespace Compile {
> >> +
> >> +Context::Context (::Backend *backend)
> >> +  : backend (backend), resolver (Resolver::Resolver::get ()),
> >> +tyctx (Resolver::TypeCheckContext::get ()),
> >> +mappings (Analysis::Mappings::get ()), mangler (Mangler ())
> >> +{
> >> +  setup_builtins ();
> >> +}
> >> +
> >> +void
> >> +Context::setup_builtins ()
> >> +{
> >> +  auto builtins = resolver->get_builtin_types ();
> >> +  for (auto it = builtins.begin (); it != builtins.end (); it++)
> >> +    {
> >> +      HirId ref;
> >> +      bool ok = tyctx->lookup_type_by_node_id ((*it)->get_node_id (), &ref);
> >> +      rust_assert (ok);
> >> +
> >> +      TyTy::BaseType *lookup;
> >> +      ok = tyctx->lookup_type (ref, &lookup);
> >> +      rust_assert (ok);
> >> +
> >> +      TyTyResolveCompile::compile (this, lookup);
> >> +    }
> >> +}
> >> +
> >> +hashval_t
> >> +Context::type_hasher (tree type)
> >> +{
> >> +  inchash::hash hstate;
> >> +
> >> +  hstate.add_int (TREE_CODE (type));
> >> +
> >> +  if (TYPE_NAME (type))
> >> +    {
> >> +      hashval_t record_name_hash
> >> +        = IDENTIFIER_HASH_VALUE (DECL_NAME (TYPE_NAME (type)));
> >> +      hstate.add_object (record_name_hash);
> >> +    }
> >
> > The following does look a bit like type_hash_canon_hash.  I'll probably see
> > what we use tree type hashing for, just wondering here.
> >
> >> +  for (tree t = TYPE_ATTRIBUTES (type); t; t = TREE_CHAIN (t))
> >> +    /* Just the identifier is adequate to distinguish.  */
> >> +    hstate.add_object (IDENTIFIER_HASH_VALUE (TREE_PURPOSE (t)));
> >> +
> >> +  switch (TREE_CODE (type))
> >> +    {
> >> +    case METHOD_TYPE:
> >> +      hstate.add_object (TYPE_HASH (TYPE_METHOD_BASETYPE (type)));
> >> +      /* FALLTHROUGH. */
> >> +    case FUNCTION_TYPE:
> >> +      for (tree t = TYPE_ARG_TYPES (type); t; t = TREE_CHAIN (t))
> >> +        if (TREE_VALUE (t) != error_mark_node)
> >> +          hstate.add_object (TYPE_HASH (TREE_VALUE (t)));
> >> +      break;
> >> +
> >> +    case OFFSET_TYPE:
> >> +      hstate.add_object (TYPE_HASH (TYPE_OFFSET_BASETYPE (type)));
> >> +      break;
> >> +
> >> +    case ARRAY_TYPE: {
> >
> > GCC coding conventions would say the { goes on the next line and is indented.
> > The Rust FE might intentionally diverge from that standard; if so, a
> > pointer in some README in rust/ would be helpful.
>
> This is not our intention. We would like to stick to the GCC coding
> convention, and use a `.clang-format` file to do so 

Re: [PATCH v2] match.pd: rewrite select to branchless expression

2022-11-18 Thread Richard Biener via Gcc-patches
On Fri, Nov 11, 2022 at 3:28 AM Michael Collison  wrote:
>
> This patch transforms ((x & 0x1) == 0) ? y : z <op> y into
> (-(typeof(y))(x & 0x1) & z) <op> y, where op is a '^' or a '|'. It also
> transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
> 0x1)) & z ) op y.
>
> Matching these patterns allows GCC to generate branchless code for one of
> the functions in coremark.
>
> Bootstrapped and tested on x86 and RISC-V. Okay?

OK.

Thanks,
Richard.

> Michael.
>
> 2022-11-10  Michael Collison  
>
>  * match.pd ((x & 0x1) == 0) ? y : z <op> y
>  -> (-(typeof(y))(x & 0x1) & z) <op> y.
>
> 2022-11-10  Michael Collison 
>
>  * gcc.dg/tree-ssa/branchless-cond.c: New test.
>
> ---
>
> Changes in v2:
>
> - Rewrite comment to use C syntax
>
> - Guard against 1-bit types
>
> - Simplify pattern by using zero_one_valued_p
>
>   gcc/match.pd  | 24 +
>   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
>   2 files changed, 50 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 194ba8f5188..258531e9046 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
> (max @2 @1))
>
> +/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
> +(for op (bit_xor bit_ior)
> + (simplify
> +  (cond (eq zero_one_valued_p@0
> +            integer_zerop)
> +        @1
> +        (op:c @2 @1))
> +  (if (INTEGRAL_TYPE_P (type)
> +       && TYPE_PRECISION (type) > 1
> +       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> +   (op (bit_and (negate (convert:type @0)) @2) @1))))
> +
> +/* ((x & 0x1) != 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
> +(for op (bit_xor bit_ior)
> + (simplify
> +  (cond (ne zero_one_valued_p@0
> +            integer_zerop)
> +        (op:c @2 @1)
> +        @1)
> +  (if (INTEGRAL_TYPE_P (type)
> +       && TYPE_PRECISION (type) > 1
> +       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
> +   (op (bit_and (negate (convert:type @0)) @2) @1))))
> +
>   /* Simplifications of shift and rotates.  */
>
>   (for rotate (lrotate rrotate)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> new file mode 100644
> index 000..68087ae6568
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int f1(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) == 0) ? y : z ^ y;
> +}
> +
> +int f2(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) != 0) ? z ^ y : y;
> +}
> +
> +int f3(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) == 0) ? y : z | y;
> +}
> +
> +int f4(unsigned int x, unsigned int y, unsigned int z)
> +{
> +  return ((x & 1) != 0) ? z | y : y;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
> --
> 2.34.1
>
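The rewrite works because `x & 1` is 0 or 1. Its negation in an unsigned type is therefore either 0 or all ones, which acts as a mask on `z`. When the bit is clear, the expression reduces to `0 <op> y`, which equals `y`. When the bit is set, it reduces to `z <op> y`. The following small standalone check confirms the identity for `^` and `|`; the function names are illustrative only.

```c
#include <assert.h>

/* Branching forms, with the same conditions as f1 and f3 in the test.  */
unsigned
select_xor (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) == 0) ? y : z ^ y;
}

unsigned
select_ior (unsigned x, unsigned y, unsigned z)
{
  return ((x & 1) == 0) ? y : z | y;
}

/* Branchless forms: -(x & 1) is 0 or all ones, masking z in or out.  */
unsigned
branchless_xor (unsigned x, unsigned y, unsigned z)
{
  unsigned bit = x & 1;
  return (-bit & z) ^ y;
}

unsigned
branchless_ior (unsigned x, unsigned y, unsigned z)
{
  unsigned bit = x & 1;
  return (-bit & z) | y;
}

/* Both rewrites must agree with the branching forms for any input.  */
void
check_equivalence (unsigned x, unsigned y, unsigned z)
{
  assert (select_xor (x, y, z) == branchless_xor (x, y, z));
  assert (select_ior (x, y, z) == branchless_ior (x, y, z));
}
```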


Re: [PATCH] 15/19 modula2 front end: cc1gm2 additional non modula2 source files

2022-11-18 Thread Richard Biener via Gcc-patches
On Mon, Oct 10, 2022 at 5:44 PM Gaius Mulley via Gcc-patches
 wrote:
>
>
>
> This patch set contains the .h, .cc and .flex files found in
> gcc/m2.  The files are tightly coupled with the gimple interface
> (see 04-gimple-interface) and built using the rules found in
> (01-03-make).
>
>
> --8<--8<--8<--8<--8<--8<
> diff -ruw /dev/null gcc-git-devel-modula2/gcc/m2/gm2-lang.cc
> --- /dev/null   2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/gcc/m2/gm2-lang.cc   2022-10-07 20:21:18.650096940 +0100
> @@ -0,0 +1,938 @@
> +/* gm2-lang.cc language-dependent hooks for GNU Modula-2.
> +
> +Copyright (C) 2002-2022 Free Software Foundation, Inc.
> +Contributed by Gaius Mulley .
> +
> +This file is part of GNU Modula-2.
> +
> +GNU Modula-2 is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GNU Modula-2 is distributed in the hope that it will be useful, but
> +WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GNU Modula-2; see the file COPYING.  If not, write to the
> +Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
> +02110-1301, USA.  */
> +
> +#include "gm2-gcc/gcc-consolidation.h"
> +
> +#include "langhooks-def.h" /* FIXME: for lhd_set_decl_assembler_name.  */
> +#include "tree-pass.h" /* FIXME: only for PROP_gimple_any.  */
> +#include "toplev.h"
> +#include "debug.h"
> +
> +#include "opts.h"
> +
> +#define GM2_LANG_C
> +#include "gm2-lang.h"
> +#include "m2block.h"
> +#include "dynamicstrings.h"
> +#include "m2options.h"
> +#include "m2convert.h"
> +#include "m2linemap.h"
> +#include "init.h"
> +#include "m2-tree.h"
> +#include "convert.h"
> +#include "rtegraph.h"
> +
> +static void write_globals (void);
> +
> +static int insideCppArgs = FALSE;
> +
> +#define EXPR_STMT_EXPR(NODE) TREE_OPERAND (EXPR_STMT_CHECK (NODE), 0)

This seems to be in m2-tree.h already.

> +/* start of new stuff.  */
> +
> +/* Language-dependent contents of a type.  */
> +
> +struct GTY (()) lang_type
> +{
> +  char dummy;
> +};
> +
> +/* Language-dependent contents of a decl.  */
> +
> +struct GTY (()) lang_decl
> +{
> +  char dummy;
> +};
> +
> +/* Language-dependent contents of an identifier.  This must include a
> +   tree_identifier.  */
> +
> +struct GTY (()) lang_identifier
> +{
> +  struct tree_identifier common;
> +};
> +
> +/* The resulting tree type.  */
> +
> +union GTY ((desc ("TREE_CODE (&%h.generic) == IDENTIFIER_NODE"),
> +chain_next ("CODE_CONTAINS_STRUCT (TREE_CODE (&%h.generic), "
> +"TS_COMMON) ? ((union lang_tree_node *) TREE_CHAIN "
> +"(&%h.generic)) : NULL"))) lang_tree_node
> +{
> +  union tree_node GTY ((tag ("0"),
> +desc ("tree_node_structure (&%h)"))) generic;
> +  struct lang_identifier GTY ((tag ("1"))) identifier;
> +};
> +
> +/* We don't use language_function.  */

well ...

> +struct GTY (()) language_function
> +{
> +
> +  /* While we are parsing the function, this contains information about
> +  the statement-tree that we are building.  */
> +  /* struct stmt_tree_s stmt_tree;  */
> +  tree stmt_tree;

... but this?

> +};
> +
> +/* end of new stuff.  */
> +
> +/* Language hooks.  */
> +
> +bool
> +gm2_langhook_init (void)
> +{
> +  build_common_tree_nodes (false);
> +
> +  /* I don't know why this has to be done explicitly.  */
> +  void_list_node = build_tree_list (NULL_TREE, void_type_node);

it's now done in build_common_tree_nodes

> +  build_common_builtin_nodes ();
> +
> +  /* The default precision for floating point numbers.  This is used
> + for floating point constants with abstract type.  This may eventually
> + be controllable by a command line option.  */
> +  mpfr_set_default_prec (256);
> +
> +  /* GNU Modula-2 uses exceptions.  */
> +  using_eh_for_cleanups ();
> +
> +  return true;
> +}
> +
> +/* The option mask.  */
> +
> +static unsigned int
> +gm2_langhook_option_lang_mask (void)
> +{
> +  return CL_ModulaX2;
> +}
> +
> +/* Initialize the options structure.  */
> +
> +static void
> +gm2_langhook_init_options_struct (struct gcc_options *opts)
> +{
> +  /* Default to avoiding range issues for complex multiply and divide.  */
> +  opts->x_flag_complex_method = 2;
> +
> +  /* The builtin math functions should not set errno.  */
> +  opts->x_flag_errno_math = 0;
> +  opts->frontend_set_flag_errno_math = true;
> +
> +  /* Exceptions are used to handle recovering from panics.  */
> +  opts->x_flag_exceptions = 1;
> +  opts->x_flag_non_call_exceptions = 1;

whohoo - really non-call-exceptions?

> +
> +  init_FrontEndInit ();
> +}

Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-18 Thread David Edelsohn via Gcc-patches
On Fri, Nov 18, 2022 at 7:20 AM Segher Boessenkool <
seg...@kernel.crashing.org> wrote:

> On Fri, Nov 18, 2022 at 02:35:30PM +0800, HAO CHEN GUI wrote:
> > On 2022/11/17 21:24, David Edelsohn wrote:
> > > Why are you using zero_constant predicate instead of matching
> > > (const_int 0) for operand 2?
> > The "const_int 0" is an operand rather than a predicate. We need a
> > predicate here.
>
> Said differently, it is passed as an operand to this named pattern or
> optab, so you need a match_operand here.
>

Earlier versions of patterns for other targets used (const_int 0), but they
seem to have changed that, so match_operand is needed.

Thanks, David


>
> > > Why does this need the new all_branch_comparison_operator?  Can the
> > > ifcvt optimization correctly elide the 2 insn sequence?
> > Because rs6000 defines "*cbranch_2insn" insn, such insns are generated
> > after expand.
> >
> > (jump_insn 50 47 51 11 (set (pc)
> > (if_then_else (ge (reg:CCFP 156)
> > (const_int 0 [0]))
> > (label_ref 53)
> > (pc))) "/home/guihaoc/gcc/gcc-mainline-base/gmp/mpz/cmpabs_d.c":80:7 884 {*cbranch_2insn}
> >  (expr_list:REG_DEAD (reg:CCFP 156)
> > (int_list:REG_BR_PROB 633507684 (nil)))
> >  -> 53)
>
> But notice the cost of *cbranch_2insn -- ifcvt should never generate
> cbranchcc4 with such composite conditions!
>
> > In prepare_cmp_insn, the comparison is verified by insn_operand_matches.
> > If extra_insn_branch_comparison_operator is not included in the
> > "cbranchcc4" predicate, it hits an ICE here.
> >
> >   if (GET_MODE_CLASS (mode) == MODE_CC)
> > {
> >   enum insn_code icode = optab_handler (cbranch_optab, CCmode);
> >   test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> >   gcc_assert (icode != CODE_FOR_nothing
> >   && insn_operand_matches (icode, 0, test));
> >   *ptest = test;
> >   return;
> > }
> >
> > The real conditional move is generated by emit_conditional_move_1.
> > Commonly "*cbranch_2insn" can't be optimized out and it returns NULL_RTX.
> >
> >   if (COMPARISON_P (comparison))
> > {
> >   saved_pending_stack_adjust save;
> >   save_pending_stack_adjust (&save);
> >   last = get_last_insn ();
> >   do_pending_stack_adjust ();
> >   machine_mode cmpmode = comp.mode;
> >   prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> > GET_CODE (comparison), NULL_RTX, unsignedp,
> > OPTAB_WIDEN, &comparison, &cmpmode);
> >   if (comparison)
> > {
> >rtx res = emit_conditional_move_1 (target, comparison,
> >   op2, op3, mode);
> >if (res != NULL_RTX)
> >  return res;
> > }
> >   delete_insns_since (last);
> >   restore_pending_stack_adjust (&save);
> >
> > I think that extra_insn_branch_comparison_operator should be included in
> > "cbranchcc4" predicates as such insns exist. And leave it to
> > emit_conditional_move which decides whether it can be optimized or not.
>
> I don't think we should pretend we have any conditional jumps the
> machine does not actually have, in cbranchcc4.  When would this ever be
> useful?  cror;beq can be quite expensive, compared to the code it would
> replace anyway.
>
> If something generates those here (which then ICEs later), that is
> wrong, fix *that*?  Is it ifcvt doing it?
>
>
> Segher
>


Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-18 Thread Segher Boessenkool
On Fri, Nov 18, 2022 at 02:35:30PM +0800, HAO CHEN GUI wrote:
> On 2022/11/17 21:24, David Edelsohn wrote:
> > Why are you using zero_constant predicate instead of matching (const_int 0) 
> > for operand 2?
> The "const_int 0" is an operand rather than a predicate. We need a predicate
> here.

Said differently, it is passed as an operand to this named pattern or
optab, so you need a match_operand here.

> > Why does this need the new all_branch_comparison_operator?  Can the ifcvt 
> > optimization correctly elide the 2 insn sequence?
> Because rs6000 defines "*cbranch_2insn" insn, such insns are generated after 
> expand.
> 
> (jump_insn 50 47 51 11 (set (pc)
> (if_then_else (ge (reg:CCFP 156)
> (const_int 0 [0]))
> (label_ref 53)
> (pc))) "/home/guihaoc/gcc/gcc-mainline-base/gmp/mpz/cmpabs_d.c":80:7 884 {*cbranch_2insn}
>  (expr_list:REG_DEAD (reg:CCFP 156)
> (int_list:REG_BR_PROB 633507684 (nil)))
>  -> 53)

But notice the cost of *cbranch_2insn -- ifcvt should never generate
cbranchcc4 with such composite conditions!

> In prepare_cmp_insn, the comparison is verified by insn_operand_matches.  If
> extra_insn_branch_comparison_operator is not included in the "cbranchcc4"
> predicate, it hits an ICE here.
> 
>   if (GET_MODE_CLASS (mode) == MODE_CC)
> {
>   enum insn_code icode = optab_handler (cbranch_optab, CCmode);
>   test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
>   gcc_assert (icode != CODE_FOR_nothing
>   && insn_operand_matches (icode, 0, test));
>   *ptest = test;
>   return;
> }
> 
> The real conditional move is generated by emit_conditional_move_1. Commonly
> "*cbranch_2insn" can't be optimized out and it returns NULL_RTX.
> 
>   if (COMPARISON_P (comparison))
> {
>   saved_pending_stack_adjust save;
>   save_pending_stack_adjust (&save);
>   last = get_last_insn ();
>   do_pending_stack_adjust ();
>   machine_mode cmpmode = comp.mode;
>   prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> GET_CODE (comparison), NULL_RTX, unsignedp,
> OPTAB_WIDEN, &comparison, &cmpmode);
>   if (comparison)
> {
>rtx res = emit_conditional_move_1 (target, comparison,
>   op2, op3, mode);
>if (res != NULL_RTX)
>  return res;
> }
>   delete_insns_since (last);
>   restore_pending_stack_adjust (&save);
> 
> I think that extra_insn_branch_comparison_operator should be included in
> "cbranchcc4" predicates as such insns exist. And leave it to
> emit_conditional_move which decides whether it can be optimized or not.

I don't think we should pretend we have any conditional jumps the
machine does not actually have, in cbranchcc4.  When would this ever be
useful?  cror;beq can be quite expensive, compared to the code it would
replace anyway.

If something generates those here (which then ICEs later), that is
wrong, fix *that*?  Is it ifcvt doing it?


Segher


Re: [PATCH] [range-ops] Implement sqrt.

2022-11-18 Thread Richard Biener via Gcc-patches



> On 18.11.2022 at 11:44, Jakub Jelinek wrote:
> 
> On Fri, Nov 18, 2022 at 11:37:42AM +0100, Aldy Hernandez wrote:
>>> Practically strictly
>>> preserving IEEE exceptions is only important for a very small audience, and
>>> for that even INEXACT will matter (but we still have -ftrapping-math
>>> by default).
>>> For that audience likely all constant / range propagation is futile and 
>>> thus the
>>> easiest thing might be to simply cut that off completely?
>>> 
>>> I'd say what ranger does is reasonable with -ftrapping-math given the 
>>> current
>>> practice of handling this option.  There's no point in trying to preserve 
>>> the
>>> (by accident) "better" handling without ranger.  Instead as Joseph says 
>>> somebody
>>> would need to sit down, split -ftrapping-math, adjust the default and
>>> thoroughly
>>> document things (also with -fnon-call-exceptions which magically makes
>>> IEEE flag raising operations possibly throw exceptions).  As there are
>>> currently
>>> no code motion barriers for FP code with respect to exception flag 
>>> inspection
>>> any dead code we preserve is likely going to be unhelpful.
>>> 
>>> So for now simply amend the documentation as to what -ftrapping-math
>>> currently means with respect to range/constant propagation?
>> 
>> So something like "Even in the presence of -ftrapping-math, VRP may fold
>> operations that may cause exceptions.  For example, an addition that is
>> guaranteed to produce a NAN may be replaced with a NAN, thus eliding the
>> addition.  This may cause any exception that may have been generated by the
>> addition to not appear in the final program."
>> 
>> ??
> 
> If we just adjust user expectations for -ftrapping-math, shouldn't we
> introduce another option that will make sure we never optimize away floating
> point operations which can trap (and probably just disable frange for that
> mode)?

I think it’s just like -frounding-math and Fenv access - the intent is there 
but the implementation is known to be buggy (and disabling optimizations doesn’t
fully fix it).

Richard 

>Jakub
> 


Re: [PATCH] [range-ops] Implement sqrt.

2022-11-18 Thread Aldy Hernandez via Gcc-patches
I wonder if instead of disabling ranger altogether, we could disable code
changes (constant propagation, jump threading and simplify_using_ranges)?
Or does that sound like too much hassle?

It seems that some passes (instruction selection?) could benefit from
global ranges being available even if no propagation was done.

Just a thought. I don't have strong opinions here.

Aldy

On Fri, Nov 18, 2022, 12:20 Aldy Hernandez  wrote:

>
>
> On 11/18/22 11:44, Jakub Jelinek wrote:
> > On Fri, Nov 18, 2022 at 11:37:42AM +0100, Aldy Hernandez wrote:
> >>> Practically strictly
> >>> preserving IEEE exceptions is only important for a very small audience, and
> >>> for that even INEXACT will matter (but we still have -ftrapping-math
> >>> by default).
> >>> For that audience likely all constant / range propagation is futile and thus the
> >>> easiest thing might be to simply cut that off completely?
> >>>
> >>> I'd say what ranger does is reasonable with -ftrapping-math given the current
> >>> practice of handling this option.  There's no point in trying to preserve the
> >>> (by accident) "better" handling without ranger.  Instead as Joseph says somebody
> >>> would need to sit down, split -ftrapping-math, adjust the default and thoroughly
> >>> document things (also with -fnon-call-exceptions which magically makes
> >>> IEEE flag raising operations possibly throw exceptions).  As there are currently
> >>> no code motion barriers for FP code with respect to exception flag inspection
> >>> any dead code we preserve is likely going to be unhelpful.
> >>>
> >>> So for now simply amend the documentation as to what -ftrapping-math
> >>> currently means with respect to range/constant propagation?
> >>
> >> So something like "Even in the presence of -ftrapping-math, VRP may fold
> >> operations that may cause exceptions.  For example, an addition that is
> >> guaranteed to produce a NAN may be replaced with a NAN, thus eliding the
> >> addition.  This may cause any exception that may have been generated by the
> >> addition to not appear in the final program."
> >>
> >> ??
> >
> > If we just adjust user expectations for -ftrapping-math, shouldn't we
> > introduce another option that will make sure we never optimize away floating
> > point operations which can trap (and probably just disable frange for that
> > mode)?
>
> That seems like a big hammer, but sure.  We could change
> frange::supports_p() to return false for flag_severely_limiting_option :).
>
> Aldy
>


Re: [PATCH] Fortran: reject NULL actual argument without explicit interface [PR107576]

2022-11-18 Thread Mikael Morin

Le 17/11/2022 à 21:48, Harald Anlauf via Fortran a écrit :

Dear all,

one cannot pass a NULL actual argument to a procedure without an
explicit interface.  This is detected and reported by NAG and Intel.
(Cray accepts this silently, and some other brands ICE.)

The testcase by Gerhard even tricked gfortran into inconsistent
behavior which could lead to an ICE with -fallow-argument-mismatch,
or silently accepting invalid code.

The solution is to reject such code, see attached patch.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK.


As this is marked as a regression which started at v7,
OK for backports to open branches?


OK.


Re: [PATCH] [range-ops] Implement sqrt.

2022-11-18 Thread Aldy Hernandez via Gcc-patches




On 11/18/22 11:44, Jakub Jelinek wrote:

On Fri, Nov 18, 2022 at 11:37:42AM +0100, Aldy Hernandez wrote:

Practically strictly
preserving IEEE exceptions is only important for a very small audience, and
for that even INEXACT will matter (but we still have -ftrapping-math
by default).
For that audience likely all constant / range propagation is futile and thus the
easiest thing might be to simply cut that off completely?

I'd say what ranger does is reasonable with -ftrapping-math given the current
practice of handling this option.  There's no point in trying to preserve the
(by accident) "better" handling without ranger.  Instead as Joseph says somebody
would need to sit down, split -ftrapping-math, adjust the default and thoroughly
document things (also with -fnon-call-exceptions which magically makes
IEEE flag raising operations possibly throw exceptions).  As there's currently
no code motion barriers for FP code with respect to exception flag inspection
any dead code we preserve is likely going to be unhelpful.

So for now simply amend the documentation as to what -ftrapping-math
currently means with respect to range/constant propagation?


So something like "Even in the presence of -ftrapping-math, VRP may fold
operations that may cause exceptions.  For example, an addition that is
guaranteed to produce a NAN may be replaced with a NAN, thus eliding the
addition.  This may cause any exception that may have been generated by the
addition to not appear in the final program."

??


If we just adjust user expectations for -ftrapping-math, shouldn't we
introduce another option that will make sure we never optimize away floating
point operations which can trap (and probably just disable frange for that
mode)?


That seems like a big hammer, but sure.  We could change 
frange::supports_p() to return false for flag_severely_limiting_option :).


Aldy



[PATCH v2 2/2] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-18 Thread Philipp Tomsich
Use Zbs when generating a sequence for
   "if ((a & twobits) == singlebit) ..."
that can be expressed as
   bexti + bexti + andn.
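The identity behind this sequence can be checked in plain C. The sketch below is illustrative only: the helper names are hypothetical, shifts and masks stand in for `bexti` and `andn`, 64-bit values (rv64) are assumed, and the GCC builtin `__builtin_ctzll` plays the role of `ctz_hwi` in the splitter. It assumes `twobits` has exactly two bits set and `singlebit` is one of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference semantics of the condition matched by the pattern.  */
static bool twobits_ref (uint64_t a, uint64_t twobits, uint64_t singlebit)
{
  return (a & twobits) == singlebit;
}

/* Hypothetical model of the split: extract the bit that must be set and
   the bit that must be clear, then and-not them.  Both extracted values
   are 0 or 1, so the andn result is 0 or 1 as well.  */
static bool twobits_zbs (uint64_t a, uint64_t twobits, uint64_t singlebit)
{
  int setbit = __builtin_ctzll (singlebit);
  int clearbit = __builtin_ctzll (twobits & ~singlebit);
  uint64_t t5 = (a >> setbit) & 1;    /* bexti */
  uint64_t t6 = (a >> clearbit) & 1;  /* bexti */
  t6 = t5 & ~t6;                      /* andn */
  return t6 != 0;
}
```

With `twobits = (1 << 33) | (1 << 4)` and `singlebit = 1 << 33`, both functions agree for every value of the two tested bits, and bits outside the mask are ignored.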

gcc/ChangeLog:

* config/riscv/bitmanip.md
(*branch_mask_twobits_equals_singlebit):
Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C
has one of these two bits set.
* config/riscv/predicates.md (const_twobits_not_arith_operand):
New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-if_then_else-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- Convert the FAIL into a gcc_assert.
- Merge the !SMALL_OPERAND check into a new predicate.
- Some of the predicates moved into the other patch of the series due to
  the order the reviews were processed.

 gcc/config/riscv/bitmanip.md  | 42 +++
 gcc/config/riscv/predicates.md|  5 +++
 .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
 3 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d7c64270c00..be53aecbb13 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -517,3 +517,45 @@ (define_insn_and_split "*andi_extrabit"
operands[3] = GEN_INT (bits | topbit);
operands[4] = GEN_INT (~topbit);
 })
+
+;; IF_THEN_ELSE: test for 2 bits of opposite polarity
+(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
+  [(set (pc)
+   (if_then_else
+ (match_operator 1 "equality_operator"
+   [(and:X (match_operand:X 2 "register_operand" "r")
+   (match_operand:X 3 "const_twobits_not_arith_operand" "i"))
+(match_operand:X 4 "single_bit_mask_operand" "i")])
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:X 5 "="))
+   (clobber (match_scratch:X 6 "="))]
+  "TARGET_ZBS && TARGET_ZBB"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5) (zero_extract:X (match_dup 2)
+ (const_int 1)
+ (match_dup 8)))
+   (set (match_dup 6) (zero_extract:X (match_dup 2)
+ (const_int 1)
+ (match_dup 9)))
+   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
+   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
+  (label_ref (match_dup 0))
+  (pc)))]
+{
+   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
+   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
+
+   /* We should never see an unsatisfiable condition.  */
+   gcc_assert (twobits_mask & singlebit_mask);
+
+   int setbit = ctz_hwi (singlebit_mask);
+   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
+
+   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
+mode, operands[6], GEN_INT(0));
+
+   operands[8] = GEN_INT (setbit);
+   operands[9] = GEN_INT (clearbit);
+})
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 3300c0e36eb..9e2f7c9b6b3 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -296,6 +296,11 @@ (define_predicate "const_twobits_operand"
   (and (match_code "const_int")
(match_test "popcount_hwi (UINTVAL (op)) == 2")))
 
+(define_predicate "const_twobits_not_arith_operand"
+  (and (match_code "const_int")
+   (and (not (match_operand 0 "arith_operand"))
+   (match_operand 0 "const_twobits_operand"))))
+
 ;; A CONST_INT operand that fits into the unsigned half of a
 ;; signed-immediate after the top bit has been cleared
 (define_predicate "uimm_extra_bit_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
new file mode 100644
index 000..d249a841ff9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+
+void g();
+
+void f1 (long a)
+{
+  if ((a & ((1ul << 33) | (1 << 4))) == (1ul << 33))
+    g();
+}
+
+void f2 (long a)
+{
+  if ((a & 0x12) == 0x10)
+    g();
+}
+
+/* { dg-final { scan-assembler-times "bexti\t" 2 } } */
+/* { dg-final { scan-assembler-times "andn\t" 1 } } */
-- 
2.34.1
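The two test functions differ in whether the mask can be used directly as an immediate. A minimal C model of the `const_twobits_not_arith_operand` check, assuming `arith_operand` accepts a CONST_INT only when it fits the signed 12-bit I-type immediate range (-2048..2047); the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 12-bit immediate range of RISC-V I-type instructions.  */
static bool small_operand (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Hypothetical model of the predicate: exactly two bits set, and the
   constant is too large for an andi immediate.  */
static bool twobits_not_arith (int64_t v)
{
  return __builtin_popcountll ((uint64_t) v) == 2 && !small_operand (v);
}
```

Under this model the mask in `f1` qualifies, while `0x12` in `f2` fits `andi` and is rejected, which matches the test's expectation of two `bexti` and one `andn` in total.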



[PATCH v2 1/2] RISC-V: Use bseti/bclri/binvi to extend reach of ori/andi/xori

2022-11-18 Thread Philipp Tomsich
Sequences of the form "a | C" and "a ^ C" with C being the positive
half of a signed immediate's range with one extra bit set in addition
are mapped to ori/xori and one bseti/binvi to avoid using a temporary
(and a multi-insn sequence to load C into that temporary).
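A C sketch of the constant split, modelled on the splitter body in the patch (`topbit = 1 << floor_log2 (bits)`, `operands[3] = bits & ~topbit`). The struct and function names are hypothetical, and C must be non-zero because `__builtin_clzll (0)` is undefined:

```c
#include <stdint.h>

/* C is divided into its highest set bit (for bseti/binvi) and the
   remaining bits (for ori/xori, or a second bseti/binvi when only one
   bit remains).  */
struct extrabit_split
{
  uint64_t low;
  uint64_t topbit;
};

static struct extrabit_split split_extrabit (uint64_t c)
{
  struct extrabit_split s;
  /* Highest set bit, i.e. 1 << floor_log2 (c); requires c != 0.  */
  s.topbit = UINT64_C (1) << (63 - __builtin_clzll (c));
  s.low = c & ~s.topbit;
  return s;
}

/* (a | C) and (a ^ C) as two dependent operations, like the two insns
   the splitter emits.  */
static uint64_t or_split (uint64_t a, uint64_t c)
{
  struct extrabit_split s = split_extrabit (c);
  return (a | s.low) | s.topbit;
}

static uint64_t xor_split (uint64_t a, uint64_t c)
{
  struct extrabit_split s = split_extrabit (c);
  return (a ^ s.low) ^ s.topbit;
}
```

For `C = (1 << 40) | 0x7ff`, the low part `0x7ff` is the largest positive 12-bit immediate, so the pair becomes ori/xori plus one bseti/binvi instead of materializing C in a temporary.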

Something similar holds for "a & ~C" being representable as either
bclri + bclri or bclri + andi.
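The AND case works in reverse: the highest bit that is *clear* in the mask is peeled off. A C sketch mirroring the `*andi_extrabit` splitter body (`topbit = 1 << floor_log2 (~bits)`, `operands[3] = bits | topbit`, `operands[4] = ~topbit`); the helper names are hypothetical, and `~bits` must be non-zero:

```c
#include <stdint.h>

/* BITS is the AND mask (~C).  FIRST keeps every bit of BITS but forces
   the peeled bit on (andi, or a bclri); SECOND clears only that bit
   (bclri).  */
static void split_andi_extrabit (uint64_t bits, uint64_t *first,
                                 uint64_t *second)
{
  uint64_t topbit = UINT64_C (1) << (63 - __builtin_clzll (~bits));
  *first = bits | topbit;   /* operands[3] */
  *second = ~topbit;        /* operands[4] */
}

static uint64_t and_split (uint64_t a, uint64_t bits)
{
  uint64_t first, second;
  split_andi_extrabit (bits, &first, &second);
  return (a & first) & second;
}
```

For the mask `~((1 << 40) | 5)`, the first step is `andi` with `~5` (i.e. -6, a valid signed immediate) and the second is `bclri` of bit 40.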

gcc/ChangeLog:

* config/riscv/bitmanip.md (*i_extrabit):
New pattern for binvi+binvi/xori and bseti+bseti/ori.
(*andi_extrabit): New pattern for bclri+bclri/andi.
* config/riscv/iterators.md (any_or): Match xor and ior.
* config/riscv/predicates.md (const_twobits_operand):
New predicate.
(uimm_extra_bit_operand): New predicate.
(uimm_extra_bit_or_twobits): New predicate.
(not_uimm_extra_bit_operand): New predicate.
(not_uimm_extra_bit_or_nottwobits): New predicate.
* config/riscv/riscv.h (UIMM_EXTRA_BIT_OPERAND):
Helper for the uimm_extra_bit_operand and
not_uimm_extra_bit_operand predicates.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bclri-02.c: New test.
* gcc.target/riscv/zbs-binvi.c: New test.
* gcc.target/riscv/zbs-bseti.c: New test.

Signed-off-by: Philipp Tomsich 
---
- This no longer depends on "RISC-V: Optimize branches testing a
  bit-range or a shifted immediate".  The other series now needs to be
  adjusted before merging.

Changes in v2:
- Collects already approved changes for v2 for (a | C) and (a ^ C).
- Pulls in the (already) approved branch on polarity-reversed bits
  for v2, as it shares predicates with the other changes.
- Newly adds support for the (a & ~C) case.
- Use an iterator for the ori/xori case and share one pattern.
- Adds the andi (a & ~C) case, expanding to bclri/andi.
- Cleans up the predicates (incl. removing the non-intuitive inclusion
  of two-bits-set under the uimm_extra_bits)

 gcc/config/riscv/bitmanip.md  | 37 +++
 gcc/config/riscv/iterators.md |  8 
 gcc/config/riscv/predicates.md| 28 ++
 gcc/config/riscv/riscv.h  |  8 
 .../riscv/{zbs-bclri.c => zbs-bclri-01.c} |  0
 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c | 27 ++
 gcc/testsuite/gcc.target/riscv/zbs-binvi.c| 22 +++
 gcc/testsuite/gcc.target/riscv/zbs-bseti.c| 27 ++
 8 files changed, 157 insertions(+)
 rename gcc/testsuite/gcc.target/riscv/{zbs-bclri.c => zbs-bclri-01.c} (100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-binvi.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 2175c626ee5..d7c64270c00 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -480,3 +480,40 @@ (define_split
   "TARGET_ZBS"
   [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 2)))
(set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
+
+;; Catch those cases where we can use a bseti/binvi + ori/xori or
+;; bseti/binvi + bseti/binvi instead of a lui + addi + or/xor sequence.
+(define_insn_and_split "*i_extrabit"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (any_or:X (match_operand:X 1 "register_operand" "r")
+ (match_operand:X 2 "uimm_extra_bit_or_twobits" "i")))]
+  "TARGET_ZBS"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (:X (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (:X (match_dup 0) (match_dup 4)))]
+{
+   unsigned HOST_WIDE_INT bits = UINTVAL (operands[2]);
+   unsigned HOST_WIDE_INT topbit = HOST_WIDE_INT_1U << floor_log2 (bits);
+
+   operands[3] = GEN_INT (bits & ~topbit);
+   operands[4] = GEN_INT (topbit);
+})
+
+;; Likewise, use bclri + andi or bclri + bclri for the and case.
+(define_insn_and_split "*andi_extrabit"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X (match_operand:X 1 "register_operand" "r")
+  (match_operand:X 2 "not_uimm_extra_bit_or_nottwobits" "i")))]
+  "TARGET_ZBS"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (and:X (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (and:X (match_dup 0) (match_dup 4)))]
+{
+   unsigned HOST_WIDE_INT bits = UINTVAL (operands[2]);
+   unsigned HOST_WIDE_INT topbit = HOST_WIDE_INT_1U << floor_log2 (~bits);
+
+   operands[3] = GEN_INT (bits | topbit);
+   operands[4] = GEN_INT (~topbit);
+})
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 50380ecfac9..ab1f4ee8d34 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -136,6 +136,10 @@ (define_code_iterator any_shift [ashift ashiftrt lshiftrt])
 ;; from the same template.
 (define_code_iterator any_bitwise [and ior xor])
 
+;; This code iterator allows ior and xor instructions to be generated

[PATCH v2 0/2] Use Zbs with xori/ori/andi and polarity-reversed twobit-tests

2022-11-18 Thread Philipp Tomsich


We had a few patches on the list that shared predicates (for extending
the reach of xori and ori -- and for the branches on two
polarity-reversed bits) and thus depended on each other.

These all had approval with requested changes, so these are now
collected together for v2.

Note that this adds the (a & ~C) case, so please take a look at that
part and OK the updated series.



Changes in v2:
- Collects already approved changes for v2 for (a | C) and (a ^ C).
- Pulls in the (already) approved branch on polarity-reversed bits
  for v2, as it shares predicates with the other changes.
- Newly adds support for the (a & ~C) case.

Philipp Tomsich (2):
  RISC-V: Use bseti/bclri/binvi to extend reach of ori/andi/xori
  RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

 gcc/config/riscv/bitmanip.md  | 79 +++
 gcc/config/riscv/iterators.md |  8 ++
 gcc/config/riscv/predicates.md| 33 
 gcc/config/riscv/riscv.h  |  8 ++
 .../riscv/{zbs-bclri.c => zbs-bclri-01.c} |  0
 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c | 27 +++
 gcc/testsuite/gcc.target/riscv/zbs-binvi.c| 22 ++
 gcc/testsuite/gcc.target/riscv/zbs-bseti.c| 27 +++
 .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
 9 files changed, 224 insertions(+)
 rename gcc/testsuite/gcc.target/riscv/{zbs-bclri.c => zbs-bclri-01.c} (100%)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bclri-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-binvi.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bseti.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c

-- 
2.34.1



Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Thu, 17 Nov 2022 18:52:36 -0500
Jason Merrill  wrote:

> On 11/17/22 14:02, Bernhard Reutner-Fischer wrote:
> > On Thu, 17 Nov 2022 09:53:32 -0500
> > Jason Merrill  wrote:

> >> Instead, you want to copy the location for instantiations, i.e. check
> >> DECL_TEMPLATE_INSTANTIATION instead of !DECL_USE_TEMPLATE.  
> > 
> > No, that makes no difference.  
> 
> Hmm, when I stop there when processing the instantiation the template's 
> DECL_RESULT has the right location information, e.g. for
> 
> template  int f() { return 42; }
> 
> int main()
> {
>f();
> }
> 
> #1  0x00f950e8 in instantiate_body (pattern= 0x77ff5080 f>, args=, d= 0x7fffe971e600 f>, nested_p=false) at /home/jason/gt/gcc/cp/pt.cc:26470
> #0  start_preparsed_function (decl1=, attrs=, flags=1) at /home/jason/gt/gcc/cp/decl.cc:17252
> (gdb) p expand_location (input_location)
> $13 = {file = 0x4962370 "wa.C", line = 1, column = 24, data = 0x0, sysp = false}
> (gdb) p expand_location (DECL_SOURCE_LOCATION (DECL_RESULT (DECL_TEMPLATE_RESULT (DECL_TI_TEMPLATE (decl1)
> $14 = {file = 0x4962370 "wa.C", line = 1, column = 20, data = 0x0, sysp = false}

Yes, that works. Sorry if I was not clear: the thing in the cover
letter of this series that does not work is the mini_vector reduced
testcase from libstdc++-v3/include/ext/bitmap_allocator.h.
class template member function return type location, would that be it?

AFAIR the problem was that these member functions get their result
decl late. When they get it, there is no
declspecs->locations[ds_type_spec] around anywhere to attach to the
resdecl. While the result decl is clear, there is no obvious place
to store the ds_type_spec (somewhere in the template, as you told me).

Back then I tried moving the resdecl building from
start_preparsed_function to grokfndecl, but that did not work out easily
IIRC, and I ultimately gave up moving stuff around rather blindly.
I also tried to find a spot where I could store the ds_type_spec locus
somewhere in grokmethod, but I think the problem was the same: I had
just the type, where I cannot store a locus, and did not find a place
where I could smuggle the locus along.

So, to make that clear: your template function (?) works:

$ XXX=1 ./xg++ -B. -S -o /dev/null ../tmp4/return-narrow-2j.cc 
../tmp4/return-narrow-2j.cc: In function ‘int f()’:
../tmp4/return-narrow-2j.cc:1:20: warning: result decl locus sample
1 | template  int f() { return 42; }
  |^~~
  |the return type
../tmp4/return-narrow-2j.cc: In function ‘int main()’:
../tmp4/return-narrow-2j.cc:3:1: warning: result decl locus sample
3 | int main()
  | ^~~
  | the return type
../tmp4/return-narrow-2j.cc: In instantiation of ‘int f() [with T = int]’:
../tmp4/return-narrow-2j.cc:5:10:   required from here
../tmp4/return-narrow-2j.cc:1:20: warning: result decl locus sample
1 | template  int f() { return 42; }
  |^~~
  |the return type


The class member fn not so much (IMHO, see attached):

$ XXX=1 ./xg++ -B. -S -o /dev/null ../tmp4/return-narrow-2.cc 
../tmp4/return-narrow-2.cc: In member function ‘const long unsigned int __mini_vector<  >::_M_space_left()’:
../tmp4/return-narrow-2.cc:9:3: warning: result decl locus sample
9 |   { return _M_finish != 0; }
  |   ^
  |   the return type
../tmp4/return-narrow-2.cc: In instantiation of ‘const long unsigned int __mini_vector<  >::_M_space_left() [with  = std::pair]’:
../tmp4/return-narrow-2.cc:11:17:   required from here
../tmp4/return-narrow-2.cc:9:3: warning: result decl locus sample
9 |   { return _M_finish != 0; }
  |   ^
  |   the return type
../tmp4/return-narrow-2.cc: In instantiation of ‘const long unsigned int __mini_vector<  >::_M_space_left() [with  = int]’:
../tmp4/return-narrow-2.cc:12:17:   required from here
../tmp4/return-narrow-2.cc:9:3: warning: result decl locus sample
9 |   { return _M_finish != 0; }
  |   ^
  |   the return type


> 
> > But really I'm not interested in the template case, i only mentioned
> > them because they don't work and in case somebody wanted to have correct
> > locations.
> > I remember just frustration when i looked at those a year ago.  
> 
> I'd like to get the template case right while we're looking at it.  I 
> guess I can add that myself if you're done trying.
> 
> > Is the hunk for normal functions OK for trunk?  
> 
> You also need a testcase for the desired behavior, with e.g.
> { dg-error "23:" }

I'd have to think about how to test that with trunk, yes.
There are no existing warnings that want to point to the return type,
are there?

Maybe a g++.dg/plugin/result_decl_plugin.c then.

set plugin_test_list [list
hmz. That strikes me as not all that flexible.
We could glob *_plugin.[cC][c]*, and have foo_plugin.lst contain its
files. Whatever.

thanks,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 

Re: [PATCH] [range-ops] Implement sqrt.

2022-11-18 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 18, 2022 at 11:37:42AM +0100, Aldy Hernandez wrote:
> > Practically strictly
> > preserving IEEE exceptions is only important for a very small audience, and
> > for that even INEXACT will matter (but we still have -ftrapping-math
> > by default).
> > For that audience likely all constant / range propagation is futile and thus the
> > easiest thing might be to simply cut that off completely?
> > 
> > I'd say what ranger does is reasonable with -ftrapping-math given the current
> > practice of handling this option.  There's no point in trying to preserve the
> > (by accident) "better" handling without ranger.  Instead as Joseph says somebody
> > would need to sit down, split -ftrapping-math, adjust the default and thoroughly
> > document things (also with -fnon-call-exceptions which magically makes
> > IEEE flag raising operations possibly throw exceptions).  As there's currently
> > no code motion barriers for FP code with respect to exception flag inspection
> > any dead code we preserve is likely going to be unhelpful.
> > 
> > So for now simply amend the documentation as to what -ftrapping-math
> > currently means with respect to range/constant propagation?
> 
> So something like "Even in the presence of -ftrapping-math, VRP may fold
> operations that may cause exceptions.  For example, an addition that is
> guaranteed to produce a NAN may be replaced with a NAN, thus eliding the
> addition.  This may cause any exception that may have been generated by the
> addition to not appear in the final program."
> 
> ??

If we just adjust user expectations for -ftrapping-math, shouldn't we
introduce another option that will make sure we never optimize away floating
point operations which can trap (and probably just disable frange for that
mode)?

Jakub



Re: [PATCH] [range-ops] Implement sqrt.

2022-11-18 Thread Aldy Hernandez via Gcc-patches




On 11/18/22 09:39, Richard Biener wrote:

On Thu, Nov 17, 2022 at 8:38 PM Jakub Jelinek via Gcc-patches
 wrote:


On Thu, Nov 17, 2022 at 06:59:45PM +, Joseph Myers wrote:

On Thu, 17 Nov 2022, Aldy Hernandez via Gcc-patches wrote:


So... is the optimization wrong?  Are we not allowed to substitute
that NAN if we know it's gonna happen?  Should we also allow F F F F F
in the test?  Or something else?


This seems like the usual ambiguity about what transformations
-ftrapping-math (on by default) is meant to prevent.

Generally it's understood to prevent transformations that add *or remove*
exceptions, so folding a case that raises "invalid" to a NaN (with
"invalid" no longer raised) is invalid with -ftrapping-math.  But that
doesn't tend to be applied if the operation raising the exceptions has a
result that is otherwise unused - in such a case the operation may still
be removed completely (the exception isn't properly treated as a side
effect to avoid dead code elimination; cf. Marc Glisse's -ffenv-access
patches from August 2020).  And it may often also not be applied to
"inexact".


The problem is that the above model I'm afraid is largely incompatible with
the optimizations ranger provides.
A strict model where no operations that could raise exceptions are discarded
is easy, we let frange optimize as much as it wants and just tell DCE not to
eliminate operations that can raise exceptions.
But in the model where some exceptions can be discarded if results are unused
but not others where they are used, there is no way to distinguish between
the result of the operation really isn't needed and ranger figured out a
result (or usable range of something) and therefore the result of the
operation isn't needed.
Making frange more limited with -ftrapping-math, making it punt for
operations that could raise an exception would be quite drastic
pessimization.  Perhaps for -ftrapping-math we could say no frange value is
singleton and so at least for most of operations we actually wouldn't
optimize out the whole computation when we know the result?  Still, we could
also just have
r = long_computation (x, y, z);
if (r > 42.0)
and if frange figures out that r must be [256.0, 1024.0] and never NAN, we'd
still happily optimize away the comparison.
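Jakub's point can be made concrete with a toy range model. The sketch below is a hypothetical, heavily simplified stand-in for an frange (an interval plus a "may be NaN" flag); it is not GCC's `frange` class or its API. It shows that `r > 42.0` folds from the interval alone, with no singleton involved:

```c
#include <stdbool.h>

/* Hypothetical model: closed interval [lo, hi] plus a NaN flag.  */
struct frange_model
{
  double lo, hi;
  bool maybe_nan;
};

enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_UNKNOWN };

/* Try to fold the ordered comparison "r > c" from what is known about r.  */
static enum fold_result fold_gt (struct frange_model r, double c)
{
  /* Every non-NaN value exceeds c; NaN > c is false, so NaN must be
     ruled out to fold to true.  */
  if (r.lo > c && !r.maybe_nan)
    return FOLD_TRUE;
  /* No value exceeds c, and NaN > c is false as well.  */
  if (r.hi <= c)
    return FOLD_FALSE;
  return FOLD_UNKNOWN;
}
```

With `r` known to be [256.0, 1024.0] and never NaN, the test folds to true. At that point nothing uses `long_computation`'s result, so under the current DCE behaviour the call (and any exception it raises) may disappear.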


Yes, I don't think singling out the singleton case will help.


There is also simplify_using_ranges::fold_cond() which is used by VRP 
and DOM to fold conditionals.  So twiddling frange::singleton_p will 
have no effect here since FP conditionals results are integers (f > 3.0 
is true or false).


And now that we're on this subject...

We are very careful in frange (range-op-floats.o) to avoid returning 
true/false in relational which may have a NAN.  This keeps us from 
folding conditionals that may result in a trapping NAN.


For example, if we see [if (x_5 unord_lt 10.0)...] and we know x_5 is 
[-INF, -8.0] +-NAN, this conditional is always true, but we return 
VARYING to avoid folding a NAN producing conditional.  I wonder whether 
we're being too conservative?
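Why the `unord_lt` example is always true can be shown with the same kind of toy model. This is a hypothetical sketch, not range-ops code: `x unord_lt c` holds when x is NaN or x < c, so a possible NaN never prevents folding to true here.

```c
#include <stdbool.h>

/* Hypothetical model: closed interval [lo, hi] plus a NaN flag.  */
struct frange_model
{
  double lo, hi;
  bool maybe_nan;
};

enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_UNKNOWN };

/* Try to fold "x unord_lt c", i.e. "x is NaN, or x < c".  */
static enum fold_result fold_unord_lt (struct frange_model x, double c)
{
  /* Every non-NaN value is below c, and NaN makes the test true anyway.  */
  if (x.hi < c)
    return FOLD_TRUE;
  /* Every non-NaN value is >= c; false only if NaN is ruled out.  */
  if (x.lo >= c && !x.maybe_nan)
    return FOLD_FALSE;
  return FOLD_UNKNOWN;
}
```

For x in [-INF, -8.0] +-NAN and c = 10.0, the first check already succeeds. Returning VARYING in that case is a policy choice about exceptions, not a lack of range information.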


An alternative would be:
z_8 = x_5 unord_lt 10.0
goto true_side;

But if DCE is going to clean that up anyhow without regards to 
exceptions, then maybe we can fold these conditionals altogether?  If 
not in this release, then in the next one.


ISTM that range-ops should always tell the truth of what it knows, 
instead of being conservative wrt exceptions.  It should be up to the 
clients (VRP or simplify_using_ranges::fold_cond) to use the information 
correctly.



Practically strictly
preserving IEEE exceptions is only important for a very small audience, and
for that even INEXACT will matter (but we still have -ftrapping-math
by default).
For that audience likely all constant / range propagation is futile and thus the
easiest thing might be to simply cut that off completely?

I'd say what ranger does is reasonable with -ftrapping-math given the current
practice of handling this option.  There's no point in trying to preserve the
(by accident) "better" handling without ranger.  Instead as Joseph says somebody
would need to sit down, split -ftrapping-math, adjust the default and thoroughly
document things (also with -fnon-call-exceptions which magically makes
IEEE flag raising operations possibly throw exceptions).  As there's currently
no code motion barriers for FP code with respect to exception flag inspection
any dead code we preserve is likely going to be unhelpful.

So for now simply amend the documentation as to what -ftrapping-math
currently means with respect to range/constant propagation?


So something like "Even in the presence of -ftrapping-math, VRP may fold
operations that may cause exceptions.  For example, an addition that is
guaranteed to produce a NAN may be replaced with a NAN, thus eliding
the addition.  This may cause any exception that may have been generated 
by the addition to not appear in the final program."


??

Aldy



Re: PING^5 [PATCH] testsuite: Verify that module-mapper is available

2022-11-18 Thread Torbjorn SVENSSON via Gcc-patches




On 2022-11-18 09:14, Richard Biener wrote:

On Thu, Nov 17, 2022 at 6:09 PM Torbjorn SVENSSON via Gcc-patches
 wrote:


Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604895.html

Ok for trunk?


OK.


Pushed.




Kind regards,
Torbjörn

On 2022-11-02 19:13, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602844.html

Ok for trunk?

Kind regards,
Torbjörn

On 2022-10-25 16:24, Torbjorn SVENSSON via Gcc-patches wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603544.html

Kind regards,
Torbjörn

On 2022-10-14 09:42, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602843.html

Kind regards,
Torbjörn

On 2022-10-05 11:17, Torbjorn SVENSSON wrote:

Hi,

Ping,
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602111.html

Kind regards,
Torbjörn

On 2022-09-23 14:03, Torbjörn SVENSSON wrote:

For some test cases, it's required that the optional module mapper
"g++-mapper-server" is built. As the server is not required, the
test cases will fail if it can't be found.
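The patch detects such tests by pulling the program name out of a `-fmodule-mapper=|@PROG` option with the regexp `{(^| )-fmodule-mapper=\|@([^\\ ]*)}`. A C sketch of the same extraction, for illustration only (the function name is hypothetical): the option must start the string or follow a space, and the name ends at a space or backslash.

```c
#include <stddef.h>
#include <string.h>

/* Copy the program named after "-fmodule-mapper=|@" into OUT.
   Returns 1 on success, 0 if no such option is present.  */
static int mapper_prog (const char *opts, char *out, size_t outlen)
{
  static const char key[] = "-fmodule-mapper=|@";
  const char *p = opts;

  while ((p = strstr (p, key)) != NULL)
    {
      /* Like "(^| )" in the regexp.  */
      if (p == opts || p[-1] == ' ')
        break;
      p++;
    }
  if (p == NULL || outlen == 0)
    return 0;

  p += sizeof key - 1;
  /* Like "([^\\ ]*)": stop at a backslash or a space.  */
  size_t n = strcspn (p, "\\ ");
  if (n >= outlen)
    n = outlen - 1;
  memcpy (out, p, n);
  out[n] = '\0';
  return 1;
}
```

For `-fmodules-ts -fmodule-mapper=|@g++-mapper-server\ -t\ ...` this yields `g++-mapper-server`, which is the name then checked with `-print-prog-name`.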

gcc/testsuite/ChangeLog:

 * lib/target-supports.exp (check_is_prog_name_available):
 New.
 * lib/target-supports-dg.exp
 (dg-require-prog-name-available): New.
 * g++.dg/modules/modules.exp: Verify availability of module
 mapper.

Signed-off-by: Torbjörn SVENSSON  
---
   gcc/testsuite/g++.dg/modules/modules.exp | 31
   gcc/testsuite/lib/target-supports-dg.exp | 15 
   gcc/testsuite/lib/target-supports.exp| 15 
   3 files changed, 61 insertions(+)

diff --git a/gcc/testsuite/g++.dg/modules/modules.exp b/gcc/testsuite/g++.dg/modules/modules.exp
index afb323d0efd..4784803742a 100644
--- a/gcc/testsuite/g++.dg/modules/modules.exp
+++ b/gcc/testsuite/g++.dg/modules/modules.exp
@@ -279,6 +279,29 @@ proc module-init { src } {
   return $option_list
   }
+# Return 1 if requirements are met
+proc module-check-requirements { tests } {
+    foreach test $tests {
+        set tmp [dg-get-options $test]
+        foreach op $tmp {
+            switch [lindex $op 0] {
+                "dg-additional-options" {
+                    # Example strings to match:
+                    # -fmodules-ts -fmodule-mapper=|@g++-mapper-server\\ -t\\ [srcdir]/inc-xlate-1.map
+                    # -fmodules-ts -fmodule-mapper=|@g++-mapper-server
+                    if [regexp -- {(^| )-fmodule-mapper=\|@([^\\ ]*)} [lindex $op 2] dummy dummy2 prog] {
+                        verbose "Checking that mapper exists: $prog"
+                        if { ![ check_is_prog_name_available $prog ] } {
+                            return 0
+                        }
+                    }
+                }
+            }
+        }
+    }
+    return 1
+}
+
   # cleanup any detritus from previous run
   cleanup_module_files [find $DEFAULT_REPO *.gcm]
@@ -307,6 +330,14 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
     set tests [lsort [find [file dirname $src] \
                           [regsub {_a.[CHX]$} [file tail $src] {_[a-z].[CHX]}]]]
+    if { ![module-check-requirements $tests] } {
+        set testcase [regsub {_a.[CH]} $src {}]
+        set testcase \
+            [string range $testcase [string length "$srcdir/"] end]
+        unsupported $testcase
+        continue
+    }
+
   set std_list [module-init $src]
   foreach std $std_list {
   set mod_files {}
diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
index aa2164bc789..6ce3b2b1a1b 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -683,3 +683,18 @@ proc dg-require-symver { args } {
   set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
   }
   }
+
+# If this target does not provide prog named "$args", skip this test.
+
+proc dg-require-prog-name-available { args } {
+    # The args are within another list; pull them out.
+    set args [lindex $args 0]
+
+    set prog [lindex $args 1]
+
+    if { ![ check_is_prog_name_available $prog ] } {
+        upvar dg-do-what dg-do-what
+        set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 703aba412a6..c3b7a6c17b3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11928,3 +11928,18 @@ main:
   .byte 0
 } ""]
   }
+
+# Return 1 if this target has prog named "$prog", 0 otherwise.
+
+proc check_is_prog_name_available { prog } {
+    global tool
+
+    set options [list "additional_flags=-print-prog-name=$prog"]
+    set output [lindex [${tool}_target_compile "" "" "none" $options] 0]
+
+    if { $output == $prog } {
+        return 0
+    }
+
+    return 1
+}
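`check_is_prog_name_available` relies on the driver's documented behaviour: `-print-prog-name=PROG` prints a full path when PROG is found and echoes PROG unchanged otherwise. A C sketch of that comparison, for illustration only; the function name is hypothetical, and ignoring a trailing newline in the captured output is an assumption about how the output arrives:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the driver output PRINTED indicates that PROG was
   found, i.e. it is something other than PROG echoed back.  */
static bool prog_found (const char *printed, const char *prog)
{
  /* Compare only up to the first line terminator, if any.  */
  size_t n = strcspn (printed, "\r\n");
  bool echoed = n == strlen (prog) && strncmp (printed, prog, n) == 0;
  return !echoed;
}
```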


Re: [PATCH] 8/19 modula2 front end: libgm2 contents

2022-11-18 Thread Richard Biener via Gcc-patches
On Mon, Oct 10, 2022 at 5:35 PM Gaius Mulley via Gcc-patches
 wrote:
>
>
>
> This patch set consists of the libgm2 makefile, autoconf sources
> necessary to build the libm2pim, libm2iso, libm2min, libm2cor
> and libm2log.

This looks OK.  I suppose it was also tested building a cross-compiler?

Can we get some up-to-date status on the build and support status for the
list of primary and secondary platforms we list on
https://gcc.gnu.org/gcc-13/criteria.html?

Thanks,
Richard.

>
> --8<--8<--8<--8<--8<--8<
> diff -ruw /dev/null gcc-git-devel-modula2/libgm2/ChangeLog
> --- /dev/null   2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/libgm2/ChangeLog  2022-10-07 20:21:18.730097923 +0100
> @@ -0,0 +1,506 @@
> +2022-05-18  Gaius Mulley  
> +
> +   * Corrected dates on all source files.
> +   * libm2pim/Selective.c: Reformatted comments.
> +   * libm2pim/SysExceptions.c: Reformatted comments.
> +   * libm2pim/dtoa.c: Reformatted comments.
> +   * libm2pim/ldtoa.c: Reformatted comments.
> +   * libm2pim/sckt.c: Reformatted comments.
> +   * libm2pim/termios.c: Reformatted comments.
> +   * libm2pim/wrapc.c: Reformatted comments.
> +   * libm2pim/termios.c: Reformatted comments within enum.
> +   * libm2pim/Selective.c: Correct spelling.
> +   * libm2pim/termios.c: Use GNU comment formatting.
> +
> +2022-05-17  Gaius Mulley  
> +
> +   * Corrected dates on all source files.
> +
> +2022-03-02  Gaius Mulley  
> +
> +   * libm2pim/sckt.c (tcpServerEstablishPort): Corrected spelling.
> +   (tcpServerEstablish) Corrected spelling.
> +
> +2021-06-27  Gaius Mulley  
> +
> +   * Makefile.am: renamed getopt.c to cgetopt.c.
> +
> +2021-05-29  Gaius Mulley  
> +
> +   * Makefile.in: (rebuilt).
> +   * aclocal.m4: (rebuilt).
> +   * configure: (rebuilt).
> +   * configure.ac: tidied up messages.  Removed android
> +   from the list of supported hosts.  Corrected a comment
> +   * libm2pim/Makefile.am: Conditionally build.
> +   * libm2cor/Makefile.am: Conditionally build.
> +   * libm2log/Makefile.am: Conditionally build.
> +   * libm2iso/Makefile.am: Conditionally build.
> +   * libm2cor/Makefile.in: (Rebuilt).
> +   * libm2iso/Makefile.in: (Rebuilt).
> +   * libm2log/Makefile.in: (Rebuilt).
> +   * libm2min/Makefile.in: (Rebuilt).
> +   * libm2pim/Makefile.in: (Rebuilt).
> +
> +2021-05-28  Gaius Mulley  
> +
> +   * Makefile.in: (Rebuilt).
> +   * aclocal.m4: (Rebuilt).
> +   * configure: (Rebuilt).
> +   * configure.ac: Introduce checks for supported host
> +   operating system and also detect known target architectures
> +   which are currently restricted to minimal runtime libraries.
> +   * libm2cor/Makefile.in: (Rebuilt).
> +   * libm2iso/Makefile.in: (Rebuilt).
> +   * libm2log/Makefile.in: (Rebuilt).
> +   * libm2min/Makefile.in: (Rebuilt).
> +   * libm2pim/Makefile.in: (Rebuilt).
> +
> +2021-02-12  Gaius Mulley  
> +
> +   * libm2iso/RTco.c: (threadSem) new declaration
> +   and implementation of thread semaphores used internally by
> +   the m2 runtime system.
> +
> +2021-01-13  Gaius Mulley  
> +
> +   * Makefile.am: Updated dates.
> +   * aclocal.m4: (Rebuilt).
> +   * autogen.sh: Updated dates.
> +   * configure: (Rebuilt).
> +   * configure.ac: Updated dates.
> +   * libm2cor/KeyBoardLEDs.c: Updated dates.
> +   * libm2cor/Makefile.am: Updated dates.
> +   * libm2iso/ChanConsts.h: Updated dates.
> +   * libm2iso/ErrnoCategory.c: Updated dates.
> +   * libm2iso/Makefile.am: Updated dates.
> +   * libm2iso/RTco.c: Updated dates.
> +   * libm2iso/wrapsock.c: Updated dates.
> +   * libm2iso/wraptime.c: Updated dates.
> +   * libm2log/Break.c: Updated dates.
> +   * libm2log/Makefile.am: Updated dates.
> +   * libm2min/Makefile.am: Updated dates.
> +   * libm2min/libc.c: Updated dates.
> +   * libm2pim/Makefile.am: Updated dates.
> +   * libm2pim/Selective.c: Updated dates.
> +   * libm2pim/SysExceptions.c: Updated dates.
> +   * libm2pim/UnixArgs.c: Updated dates.
> +   * libm2pim/dtoa.c: Updated dates.
> +   * libm2pim/errno.c: Updated dates.
> +   * libm2pim/getopt.c: Updated dates.
> +   * libm2pim/ldtoa.c: Updated dates.
> +   * libm2pim/sckt.c: Updated dates.
> +   * libm2pim/target.c: Updated dates.
> +   * libm2pim/termios.c: Updated dates.
> +   * libm2pim/wrapc.c: Updated dates.
> +
> +2020-11-20  Gaius Mulley  
> +
> +   * Makefile.in: (Rebuilt).
> +   * aclocal.m4: (Rebuilt).
> +   * configure: (Rebuilt).
> +   * configure.ac: (libtool_VERSION=17.0.0)
> +
> +2020-06-18  Gaius Mulley  
> +
> +   * Makefile.in: (Rebuilt).
> +   * aclocal.m4: (Rebuilt).
> +   * autogen.sh: Execute automake including dependencies.
> +   * 

Re: [PATCH] 3/19 modula2 front end: gm2 driver files.

2022-11-18 Thread Richard Biener via Gcc-patches
On Mon, Oct 10, 2022 at 5:33 PM Gaius Mulley via Gcc-patches wrote:
>
>
>
> This patchset contains the .cc, .h and option-related files necessary
> to build the driver program gm2.  The patch also contains the
> autoconf/configure build infrastructure sources found in
> gcc/m2.  The reviewer might need to look at the 01-02-make patchset.
> The gm2 driver is heavily based on the Fortran driver; it also adds
> the C++ libraries and the Modula-2 search paths and libraries, depending
> upon the dialect, for user convenience.  Users could link Modula-2 objects
> using g++ if they supply the include and link paths.
>
>
> --8<--8<--8<--8<--8<--8<
> diff -ruw /dev/null gcc-git-devel-modula2/gcc/m2/gm2spec.cc
> --- /dev/null   2022-08-24 16:22:16.88870 +0100
> +++ gcc-git-devel-modula2/gcc/m2/gm2spec.cc 2022-10-07 20:21:18.662097087 +0100
> @@ -0,0 +1,937 @@
> +/* gm2spec.cc specific flags and argument handling within GNU Modula-2.
> +
> +Copyright (C) 2007-2022 Free Software Foundation, Inc.
> +Contributed by Gaius Mulley .
> +
> +This file is part of GNU Modula-2.
> +
> +GNU Modula-2 is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GNU Modula-2 is distributed in the hope that it will be useful, but
> +WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GNU Modula-2; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "xregex.h"
> +#include "obstack.h"
> +#include "intl.h"
> +#include "prefix.h"
> +#include "opt-suggestions.h"
> +#include "gcc.h"
> +#include "opts.h"
> +#include "vec.h"
> +
> +#include "m2/gm2config.h"
> +
> +#ifdef HAVE_DIRENT_H
> +#include <dirent.h>
> +#else
> +#ifdef HAVE_SYS_NDIR_H
> +#include <sys/ndir.h>
> +#endif
> +#ifdef HAVE_SYS_DIR_H
> +#include <sys/dir.h>
> +#endif
> +#ifdef HAVE_NDIR_H
> +#include <ndir.h>
> +#endif
> +#endif
> +
> +/* This bit is set if we saw a `-xfoo' language specification.  */
> +#define LANGSPEC   (1<<1)
> +/* This bit is set if they did `-lm' or `-lmath'.  */
> +#define MATHLIB    (1<<2)
> +/* This bit is set if they did `-lc'.  */
> +#define WITHLIBC   (1<<3)
> +/* Skip this option.  */
> +#define SKIPOPT    (1<<4)
> +
> +#ifndef MATH_LIBRARY
> +#define MATH_LIBRARY "m"
> +#endif
> +#ifndef MATH_LIBRARY_PROFILE
> +#define MATH_LIBRARY_PROFILE MATH_LIBRARY
> +#endif
> +
> +#ifndef LIBSTDCXX
> +#define LIBSTDCXX "stdc++"
> +#endif
> +#ifndef LIBSTDCXX_PROFILE
> +#define LIBSTDCXX_PROFILE LIBSTDCXX
> +#endif
> +#ifndef LIBSTDCXX_STATIC
> +#define LIBSTDCXX_STATIC NULL
> +#endif
> +
> +#ifndef LIBCXX
> +#define LIBCXX "c++"
> +#endif
> +#ifndef LIBCXX_PROFILE
> +#define LIBCXX_PROFILE LIBCXX
> +#endif
> +#ifndef LIBCXX_STATIC
> +#define LIBCXX_STATIC NULL
> +#endif
> +
> +#ifndef LIBCXXABI
> +#define LIBCXXABI "c++abi"
> +#endif
> +#ifndef LIBCXXABI_PROFILE
> +#define LIBCXXABI_PROFILE LIBCXXABI
> +#endif
> +#ifndef LIBCXXABI_STATIC
> +#define LIBCXXABI_STATIC NULL
> +#endif
> +
> +/* The values used here must match those of the stdlib_kind enumeration
> +   in c.opt.  */
> +enum stdcxxlib_kind
> +{
> +  USE_LIBSTDCXX = 1,
> +  USE_LIBCXX = 2
> +};
> +
> +#define DEFAULT_DIALECT "pim"
> +#undef DEBUG_ARG
> +
> +typedef enum { iso, pim, min, logitech, pimcoroutine, maxlib } libs;
> +
> +/* These are the library names which are installed as part of gm2 and reflect
> +   -flibs=name.  The -flibs= option provides the user with a short cut to add
> +   libraries without having to know the include and link path.  */
> +
> +static const char *library_name[maxlib]
> += { "m2iso", "m2pim", "m2min", "m2log", "m2cor" };
> +
> +/* They match the installed archive name for example libm2iso.a,
> +   libm2pim.a, libm2min.a, libm2log.a and libm2cor.a.  They also match a
> +   subdirectory name where the definition modules are kept.  The driver
> +   checks the argument to -flibs= for an entry in library_name or
> +   alternatively the existence of the subdirectory (to allow for third
> +   party libraries to coexist).  */
> +
> +static const char *library_abbrev[maxlib]
> += { "iso", "pim", "min", "log", "cor" };
> +
> +/* Users may specify -flibs=pim,iso etc which are mapped onto
> +   -flibs=m2pim,m2iso respectively.  This provides a match between
> +   the dialect of Modula-2 and the library set.  */
> +
> +static const char *add_include (const char *libpath, const char *library);
> +
> +static bool seen_scaffold_static = false;
> +static bool seen_scaffold_dynamic = false;
> +static bool scaffold_static = 

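The gm2spec.cc comments above describe `-flibs=` accepting dialect abbreviations (pim, iso, ...) that map positionally onto the installed library names (m2pim, m2iso, ...). A minimal C++ sketch of that mapping, assuming a simple comma-separated argument; the names `abbrevs`, `libnames`, `expand_one` and `expand_flibs` are hypothetical, this is not the driver's actual code, and it ignores the subdirectory lookup used for third-party libraries:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical tables mirroring library_abbrev[] and library_name[]:
// entry i of one corresponds to entry i of the other.
constexpr std::array<std::string_view, 5> abbrevs = {"iso", "pim", "min", "log", "cor"};
constexpr std::array<std::string_view, 5> libnames = {"m2iso", "m2pim", "m2min", "m2log", "m2cor"};

// Map one abbreviation to its library name; anything else passes through
// unchanged (a full name such as "m2pim" or a third-party library).
std::string_view expand_one (std::string_view lib)
{
  for (std::size_t i = 0; i < abbrevs.size (); ++i)
    if (lib == abbrevs[i])
      return libnames[i];
  return lib;
}

// Expand a comma-separated -flibs= argument, e.g. "pim,iso" -> "m2pim,m2iso".
std::string expand_flibs (std::string_view arg)
{
  std::string result;
  for (bool first = true;; first = false)
    {
      std::size_t comma = arg.find (',');
      if (!first)
        result += ',';
      result += expand_one (arg.substr (0, comma));
      if (comma == std::string_view::npos)
        break;
      arg.remove_prefix (comma + 1);
    }
  return result;
}
```

Keeping the two tables index-aligned, as gm2spec.cc does, means a single loop index serves both the lookup and the result.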
RE: [PATCH 01/35] arm: improve vcreateq* tests

2022-11-18 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 01/35] arm: improve vcreateq* tests
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vcreateq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vcreateq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vcreateq_u8.c: Likewise.
> ---
>  .../arm/mve/intrinsics/vcreateq_f16.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_f32.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_s16.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_s32.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_s64.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_s8.c  | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_u16.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_u32.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_u64.c | 23 ++-
>  .../arm/mve/intrinsics/vcreateq_u8.c  | 23 ++-
>  10 files changed, 220 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> index fb3601edb94..c39303daa03 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f16.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**   vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**   ...

Eventually I'd like to see these tests tightened to match more specific codegen 
for the tests that have only one intrinsic call in their body, but I appreciate 
the codegen for many of these is still immature and there are softfp/hard ABI 
differences as well.
This patch is definitely an improvement over what's there now though, so ok.
Thanks,
Kyrill

> +*/
>  float16x8_t
>  foo (uint64_t a, uint64_t b)
>  {
>return vcreateq_f16 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**   ...
> +**   vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**   vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**   ...
> +*/
> +float16x8_t
> +foo1 ()
> +{
> +  return vcreateq_f16 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> index 4f4da62eed7..ad66f4407cd 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_f32.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**   vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**   ...
> +*/
>  float32x4_t
>  foo (uint64_t a, uint64_t b)
>  {
>return vcreateq_f32 (a, b);
>  }
> 
> -/* { dg-final { scan-assembler "vmov"  }  } */
> +/*
> +**foo1:
> +**   ...
> +**   vmov q[0-9+]\[2\], q[0-9+]\[0\], r[0-9+], r[0-9+]
> +**   vmov q[0-9+]\[3\], q[0-9+]\[1\], r[0-9+], r[0-9+]
> +**   ...
> +*/
> +float32x4_t
> +foo1 ()
> +{
> +  return vcreateq_f32 (1, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> index 103be6310bd..7e70a486513 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcreateq_s16.c
> @@ -1,13 +1,34 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { 

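A side note on the expected-body patterns in these vcreateq tests: `[0-9+]` is a bracket expression, so `q[0-9+]` and `r[0-9+]` each match exactly one character after the letter (a digit or a literal `+`), not one or more digits. That is harmless for q0-q7 and r0-r9 but would not match r10-r12. Tcl's regexp, which the testsuite's body matching is built on, treats bracket expressions the same way; the C++ sketch below uses `std::regex` (ECMAScript grammar) purely to demonstrate the difference, with a hypothetical helper name:

```cpp
#include <regex>
#include <string>

// Count how many of the core register names r0..r12 PATTERN matches in full.
int count_matching_core_regs (const std::string &pattern)
{
  const std::regex re (pattern);
  int count = 0;
  for (int n = 0; n <= 12; ++n)
    if (std::regex_match ("r" + std::to_string (n), re))
      ++count;
  return count;
}
```

With `"r[0-9+]"` this counts 10 (r0 to r9); with `"r[0-9]+"` it counts all 13.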
RE: [PATCH][GCC] arm: Add support for new frame unwinding instruction "0xb5".

2022-11-18 Thread Srinath Parvathaneni via Gcc-patches
Hi,

> -Original Message-
> From: Ramana Radhakrishnan 
> Sent: Thursday, November 17, 2022 8:27 PM
> To: Srinath Parvathaneni 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH][GCC] arm: Add support for new frame unwinding
> instruction "0xb5".
> 
> On Thu, Nov 10, 2022 at 10:38 AM Srinath Parvathaneni via Gcc-patches wrote:
> >
> > Hi,
> >
> > This patch adds support for Arm frame unwinding instruction "0xb5"
> > [1]. When an exception is taken and the "0xb5" instruction is encountered
> > during runtime stack unwinding, we use the effective vsp as the modifier in
> > pointer authentication.
> > On completion of stack unwinding, if the "0xb5" instruction was not
> > encountered, the CFA will be used as the modifier in pointer authentication.
> >
> > [1]
> > https://github.com/ARM-software/abi-aa/releases/download/2022Q3/ehabi32.pdf
> >
> > Regression tested on arm-none-eabi target and found no regressions.
> >
> > Ok for master?
> >
> 
> No, not yet.
> 
> Presumably the logic to produce 0xb5 is in the source base and this was
> tested with suitable options that produce said opcode? I see no logic in place
> to produce the said opcode in the backend in a quick read, as the pacbti
> patches still seem to be in review.
> 
> So what was the test suite run actually testing ?

Sorry for the late response.  The patch supporting the said opcode (directive
".pacspval") is here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605524.html (still
under upstream review)

and the patch to encode ".pacspval" with the mentioned opcode "0xb5" in
binutils is here:
https://sourceware.org/pipermail/binutils/2022-November/124328.html (approved
and committed to binutils).

Regards,
Srinath.

> regards 
> Ramana
> 
> 
> > Regards,
> > Srinath.
> >
> > gcc/ChangeLog:
> >
> > 2022-11-09  Srinath Parvathaneni  
> >
> > * libgcc/config/arm/pr-support.c (__gnu_unwind_execute): Decode
> > opcode "0xb5".
> >
> >
> > ### Attachment also inlined for ease of reply ###
> >
> >
> > diff --git a/libgcc/config/arm/pr-support.c b/libgcc/config/arm/pr-support.c
> > index e48854587c667a959aa66ccc4982231f6ecc..73e4942a39b34a83c2da85def6b13e82ec501552 100644
> > --- a/libgcc/config/arm/pr-support.c
> > +++ b/libgcc/config/arm/pr-support.c
> > @@ -107,7 +107,9 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
> >_uw op;
> >int set_pc;
> >int set_pac = 0;
> > +  int set_pac_sp = 0;
> >_uw reg;
> > +  _uw sp;
> >
> >set_pc = 0;
> >for (;;)
> > @@ -124,10 +126,11 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
> >  #if defined(TARGET_HAVE_PACBTI)
> >   if (set_pac)
> > {
> > - _uw sp;
> >   _uw lr;
> >   _uw pac;
> > - _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, &sp);
> > + if (!set_pac_sp)
> > +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> > +    &sp);
> >   _Unwind_VRS_Get (context, _UVRSC_CORE, R_LR, _UVRSD_UINT32, &lr);
> >   _Unwind_VRS_Get (context, _UVRSC_PAC, R_IP,
> >    _UVRSD_UINT32, &pac);
> > @@ -259,7 +262,19 @@ __gnu_unwind_execute (_Unwind_Context * context, __gnu_unwind_state * uws)
> >   continue;
> > }
> >
> > - if ((op & 0xfc) == 0xb4)  /* Obsolete FPA.  */
> > + /* Use current VSP as modifier in PAC validation.  */
> > + if (op == 0xb5)
> > +   {
> > + if (set_pac)
> > +   _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32,
> > +    &sp);
> > + else
> > +   return _URC_FAILURE;
> > + set_pac_sp = 1;
> > + continue;
> > +   }
> > +
> > + if ((op & 0xfd) == 0xb6)  /* Obsolete FPA.  */
> > return _URC_FAILURE;
> >
> >   /* op & 0xf8 == 0xb8.  */
> >
> >
> >

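One detail in the pr-support.c hunk above is easy to check mechanically: which opcode bytes each `(op & mask) == value` test accepts. The C++ sketch below is a standalone helper, not part of pr-support.c, that enumerates the 0xb0-0xbf group. It shows that the original `(op & 0xfc) == 0xb4` test covers 0xb4-0xb7, and that `(op & 0xfd) == 0xb6` as quoted can never be true, because the mask clears bit 1 while 0xb6 has bit 1 set. If the intent is to keep rejecting 0xb6 and 0xb7 as obsolete FPA opcodes, a mask of 0xfe selects exactly those two.

```cpp
#include <vector>

// Return every opcode in the 0xb0..0xbf group for which (op & mask) == value.
std::vector<unsigned> opcodes_matching (unsigned mask, unsigned value)
{
  std::vector<unsigned> hits;
  for (unsigned op = 0xb0; op <= 0xbf; ++op)
    if ((op & mask) == value)
      hits.push_back (op);
  return hits;
}
```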

Re: [PATCH]AArch64 Fix vector re-interpretation between partial SIMD modes

2022-11-18 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> Tamar Christina  writes:
>> Hi All,
>>
>> While writing a patch series I started getting incorrect codegen out from
>> VEC_PERM on partial struct types.
>>
>> It turns out that this was happening because the TARGET_CAN_CHANGE_MODE_CLASS
>> implementation has a slight bug in it.  The hook only checked for SIMD to
>> Partial but never Partial to SIMD.  This resulted in incorrect subregs being
>> generated from the fallback code in VEC_PERM_EXPR expansions.
>>
>> I have unfortunately not been able to trigger it using a standalone testcase as
>> the mid-end optimizes away the permute every time I try to describe a permute
>> that would result in the bug.
>>
>> The patch now rejects any conversion of partial SIMD struct types, unless they
>> are both partial structures of the same number of registers or one is a SIMD
>> type whose size is less than 8 bytes.
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master? And backport to GCC 12?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64.cc (aarch64_can_change_mode_class): Restrict
>>  conversions between partial struct types properly.
>>
>> --- inline copy of patch -- 
>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> index d3c3650d7d728f56adb65154127dc7b72386c5a7..84dbe2f4ea7d03b424602ed98a34e7824217dc91 100644
>> --- a/gcc/config/aarch64/aarch64.cc
>> +++ b/gcc/config/aarch64/aarch64.cc
>> @@ -26471,9 +26471,10 @@ aarch64_can_change_mode_class (machine_mode from,
>>bool from_pred_p = (from_flags & VEC_SVE_PRED);
>>bool to_pred_p = (to_flags & VEC_SVE_PRED);
>>  
>> -  bool from_full_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT));
>>bool to_partial_advsimd_struct_p = (to_flags == (VEC_ADVSIMD | VEC_STRUCT
>> | VEC_PARTIAL));
>> +  bool from_partial_advsimd_struct_p = (from_flags == (VEC_ADVSIMD | VEC_STRUCT
>> +   | VEC_PARTIAL));
>>  
>>/* Don't allow changes between predicate modes and other modes.
>>   Only predicate registers can hold predicate modes and only
>> @@ -26496,9 +26497,23 @@ aarch64_can_change_mode_class (machine_mode from,
>>  return false;
>>  
>>/* Don't allow changes between partial and full Advanced SIMD structure
>> - modes.  */
>> -  if (from_full_advsimd_struct_p && to_partial_advsimd_struct_p)
>> -return false;
>> + modes unless both are a partial struct with the same number of registers
>> + or the vector bitsizes must be the same.  */
>> +  if (to_partial_advsimd_struct_p ^ from_partial_advsimd_struct_p)
>> +{
>> +  /* If they're both partial structures, allow if they have the same number
>> + of registers.  */
>> +  if (to_partial_advsimd_struct_p == from_partial_advsimd_struct_p)
>> +return known_eq (GET_MODE_SIZE (from), GET_MODE_SIZE (to));
>
> It looks like the ^ makes this line unreachable.  I guess it should
> be a separate top-level condition.
>
>> +  /* If one is a normal SIMD register, allow only if no larger than 64-bit.  */
>> +  if ((to_flags & VEC_ADVSIMD) == to_flags)
>> +return known_le (GET_MODE_SIZE (to), 8);
>> +  else if ((from_flags & VEC_ADVSIMD) == from_flags)
>> +return known_le (GET_MODE_SIZE (from), 8);
>> +
>> +  return false;
>> +}
>
> I don't think we need to restrict this to SIMD modes.  A plain DI would
> be OK too.  So I think it should just be:
>
> return (known_le (GET_MODE_SIZE (to), 8)
>   || known_le (GET_MODE_SIZE (from), 8));

Looking again, all the other tests return false if they found a definite
problem and fall through to later code otherwise.  I think we should do
the same here.

Thanks,
Richard

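Richard's first point can be checked with a truth table: inside `if (a ^ b)` the operands differ, so a nested `a == b` test never succeeds. The C++ sketch below demonstrates that, and then shows one hypothetical way to restructure the check along the lines of the review (separate top-level conditions; return false only for a definite problem, otherwise fall through). It is a simplified model, not GCC code: plain byte sizes stand in for GET_MODE_SIZE, and ordinary comparisons stand in for known_eq/known_le.

```cpp
#include <initializer_list>

// Count how often the nested test in "if (a ^ b) { if (a == b) ... }" can fire.
int dead_branch_hits ()
{
  int hits = 0;
  for (bool a : {false, true})
    for (bool b : {false, true})
      if (a ^ b)
        if (a == b)
          ++hits;
  return hits;
}

// Hypothetical restructured check.  Returning true means "no definite
// problem found here", i.e. fall through to the remaining checks.
bool partial_struct_change_ok (bool from_partial, bool to_partial,
                               unsigned from_size, unsigned to_size)
{
  // Both partial structures: only allow the same size.
  if (from_partial && to_partial && from_size != to_size)
    return false;
  // Exactly one partial structure: allow only if at least one of the two
  // sizes is at most 8 bytes, as in known_le (to, 8) || known_le (from, 8).
  if (from_partial != to_partial && from_size > 8 && to_size > 8)
    return false;
  return true;
}
```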

Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-18 Thread Richard Sandiford via Gcc-patches
Hongtao Liu  writes:
> On Thu, Nov 17, 2022 at 9:59 PM Richard Sandiford
>  wrote:
>>
>> Hongtao Liu  writes:
>> > On Thu, Nov 17, 2022 at 5:39 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Hongtao Liu  writes:
>> >> > On Wed, Nov 16, 2022 at 1:39 AM Richard Sandiford
>> >> >  wrote:
>> >> >>
>> >> >> Tamar Christina  writes:
>> >> >> >> -Original Message-
>> >> >> >> From: Hongtao Liu 
>> >> >> >> Sent: Tuesday, November 15, 2022 9:37 AM
>> >> >> >> To: Tamar Christina 
>> >> >> >> Cc: Richard Sandiford ; Tamar Christina via Gcc-patches ; nd ; rguent...@suse.de
>> >> >> >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
>> >> >> >>
>> >> >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina
>> >> >> >>  wrote:
>> >> >> >> >
>> >> >> >> > > -Original Message-
>> >> >> >> > > From: Hongtao Liu 
>> >> >> >> > > Sent: Tuesday, November 15, 2022 8:36 AM
>> >> >> >> > > To: Tamar Christina 
>> >> >> >> > > Cc: Richard Sandiford ; Tamar Christina via Gcc-patches ; nd ; rguent...@suse.de
>> >> >> >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector
>> >> >> >> > >
>> >> >> >> > > Hi:
>> >> >> >> > >   I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606040.html.
>> >> >> >> > > >  }
>> >> >> >> > > >
>> >> >> >> > > >/* See if we can get a better vector mode before extracting.  */
>> >> >> >> > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
>> >> >> >> > > > index cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9 100644
>> >> >> >> > > > --- a/gcc/optabs.cc
>> >> >> >> > > > +++ b/gcc/optabs.cc
>> >> >> >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
>> >> >> >> > > >v0_qi = gen_lowpart (qimode, v0);
>> >> >> >> > > >v1_qi = gen_lowpart (qimode, v1);
>> >> >> >> > > >if (targetm.vectorize.vec_perm_const != NULL
>> >> >> >> > > > + && targetm.can_change_mode_class (mode, qimode,
>> >> >> >> > > > + ALL_REGS)
>> >> >> >> > > It looks like you want to guard gen_lowpart; shouldn't it be better
>> >> >> >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, target_qi))?
>> >> >> >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not
>> >> >> >> > > to guard gen_lowpart.
>> >> >> >> >
>> >> >> >> > Hmm, I don't think this is quite true; there are existing usages in
>> >> >> >> > expr.cc and rtlanal.cc that do this and aren't part of RA.  As I
>> >> >> >> > mentioned before, the canonicalization of vec_select to subreg
>> >> >> >> > in rtlanal, for instance, uses this.
>> >> >> >> In theory, we need to iterate through all reg classes that can be assigned for
>> >> >> >> both qimode and mode; if any regclass returns true for
>> >> >> >> targetm.can_change_mode_class, the bitcast (validate_subreg) should be ok.
>> >> >> >> Here we just passed ALL_REGS.
>> >> >> >
>> >> >> > Yes, and most targets where this transformation is valid return true here.
>> >> >> >
>> >> >> > I've checked:
>> >> >> >  * alpha
>> >> >> >  * arm
>> >> >> >  * aarch64
>> >> >> >  * rs6000
>> >> >> >  * s390
>> >> >> >  * sparc
>> >> >> >  * pa
>> >> >> >  * mips
>> >> >> >
>> >> >> > And even the default example that other targets use from the documentation
>> >> >> > would return true, as the sizes of the modes are the same.
>> >> >> >
>> >> >> > x86 and RISC-V are the only two targets that I found (but I didn't check all) that
>> >> >> > unconditionally return a result based on just the register classes.
>> >> >> >
>> >> >> > That is to say, there are more targets that adhere to the interpretation that
>> >> >> > rclass here means "should be possible in some class in rclass" rather than
>> >> >> > "should be possible in ALL classes of rclass".
>> >> >>
>> >> >> Yeah, I agree.  A query "can something stored in ALL_REGS change from
>> >> >> mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS
>> >> >> can hold both M1 and M2.  It's then the target's job to answer
>> >> >> conservatively so that the result covers all such R.
>> >> >>
>> >> >> In principle it's OK for a target to err on the side of caution and forbid
>> >> >> things that are actually OK.  But that's going to risk losing performance
>> >> >> in some cases, and sometimes that loss of performance will be unacceptable.
>> >> >> IMO that's what's happening here.  The target is applying x87 rules to
>> >> >> 

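Richard's reading of the hook in this thread can be sketched as a toy model: for a class such as ALL_REGS, only registers that can hold both modes matter, and the answer must be conservative over all of them, so a register that cannot hold one of the modes does not affect the result. The C++ below is a hypothetical illustration with made-up types, not GCC's TARGET_CAN_CHANGE_MODE_CLASS interface:

```cpp
#include <vector>

// Toy model (hypothetical, not GCC code) of one hard register.
struct Reg
{
  bool holds_m1;   // the register can hold mode M1
  bool holds_m2;   // the register can hold mode M2
  bool change_ok;  // reinterpreting M1 as M2 is valid in this register
};

// Conservative answer to "can a value in register class CLS change from M1
// to M2?".  Only registers that can hold both modes are relevant, and every
// one of them must allow the change.  If no register can hold both, this
// model answers false (a modelling choice: the query is then not meaningful).
bool can_change_mode_in_class (const std::vector<Reg> &cls)
{
  bool any_relevant = false;
  for (const Reg &r : cls)
    if (r.holds_m1 && r.holds_m2)
      {
        any_relevant = true;
        if (!r.change_ok)
          return false;
      }
  return any_relevant;
}
```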
Re: [PATCH] RISC-V: branch-(not)equals-zero compares against $zero

2022-11-18 Thread Philipp Tomsich
On Fri, 18 Nov 2022 at 05:53, Palmer Dabbelt  wrote:
>
> On Thu, 17 Nov 2022 14:44:31 PST (-0800), jeffreya...@gmail.com wrote:
> >
> > On 11/8/22 12:55, Philipp Tomsich wrote:
> >> If we are testing a register or a paradoxical subreg (i.e. anything that is not
> >> a partial subreg) for equality/non-equality with zero, we can generate a branch
> >> that compares against $zero.  This will work for QI, HI, SI and DImode, so we
> >> enable this for ANYI.
> >>
> >> 2020-08-30  gcc/ChangeLog:
> >>
> >>  * config/riscv/riscv.md (*branch_equals_zero): Added pattern.
> >
> > I've gone back and forth on this a few times.  As you know, I hate
> > subregs in the target descriptions and I guess I need to extend that to
> > querying if something is a subreg or not rather than just subregs
> > appearing in the RTL.
> >
> >
> > Presumably the idea behind rejecting partial subregs is that the bits outside
> > the partial subreg are unspecified, but that's also going to be true if we're
> > looking at a hardreg in QImode (for example) irrespective of it being
> > wrapped in a subreg.
> >
> >
> > I don't doubt it works the vast majority of the time, but I haven't been
> > able to convince myself it'll work all the time.  How do we ensure that
> > the bits outside the mode are zero?  I've been bitten by this kind of
> > problem before, and it's safe to say it was exceedingly painful to find.
>
> I don't really understand the middle-end issues here (if there are
> any?), but I'm pretty sure code like this has passed by a few times
> before and we've yet to find a reliable way to optimize these cases.
> There's a bunch of patterns where knowing the XLEN-extension of shorter
> values would let us generate better code, but there's also cases where
> we'd generate worse code by ensuring any extension scheme is followed.
>
> Every time I've seen this come up before I've managed to convince myself
> we can't really fix the problem in the backend, though: if we always
> generate extended values in registers then we just push the cost over to
> the other patterns.  The only way I've come up with to handle something
> like this is to push more types into the middle-end so we can track
> these high bits and generate the faster sequences where we know what
> they are.  That seems like a huge mess, though, and every time it comes
> up folks run away ;)

You are perfectly right that this problem can not be fixed in the
backend, at least not in a general manner (i.e., additional patterns
can resolve some of the cases and it is messy in the backend).

In fact, we are looking at fixing this before/during lowering by
avoiding the extension whenever possible (based on the type
information and even value ranges).  However, this work will miss
GCC13 and that is the reason why the band-aid was submitted here.

> Sorry if that's kind of vague, I usually find a way to break these but
> my box isn't cooperating with GCC builds today so I haven't even gotten
> that far yet...

I am gathering the original rationale why this should be safe from our
internal communication (the change is 2 years old, after all) and will
follow up.
If you find a way to break this in the meantime, please let us know.

Philipp.

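Jeff's concern about the bits outside the mode can be illustrated with plain integers. If a 64-bit register holds a QImode value and nothing guarantees that bits 8-63 are a zero- or sign-extension of the low byte, then a branch comparing the whole register against $zero can disagree with a test of the QImode value itself. A minimal C++ model, assuming a 64-bit register and hypothetical helper names:

```cpp
#include <cstdint>

// What the QImode comparison means: only the low 8 bits are the value.
bool qimode_value_is_zero (std::uint64_t reg)
{
  return static_cast<std::uint8_t> (reg) == 0;
}

// What a branch comparing the full register against $zero tests.
bool full_register_is_zero (std::uint64_t reg)
{
  return reg == 0;
}
```

For a register holding 0x100 the QImode value is zero but the full-register test says non-zero; the two tests agree whenever the upper bits are known to be an extension of the low byte.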

[PATCH] c++, v5: Implement C++23 P2647R1 - Permitting static constexpr variables in constexpr functions

2022-11-18 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 17, 2022 at 07:28:56PM -0500, Jason Merrill wrote:
> On 11/17/22 15:42, Jakub Jelinek wrote:
> > On Thu, Nov 17, 2022 at 07:42:40PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > > I thought for older C++ this is to catch
> > > void
> > > foo ()
> > > {
> > >   constexpr int a = ({ static constexpr int b = 2; b; });
> > > }
> > > and for C++23 the only 3 spots that diagnose those.
> > > But perhaps for C++20 or older we can check if the var has a context
> > > of a constexpr function (then assume cp_finish_decl errored or pedwarned
> > > already) and only error or pedwarn otherwise.
> 
> We could, but I wouldn't bother to enforce this specially for
> statement-expressions, which are already an extension.

Ok.

> OTOH, we should test that static constexpr is handled properly for lambdas,
> i.e. this should still fail:
> 
> constexpr int q = [](int i)
> { static constexpr int x = 42; return x+i; }(24);

I guess that is related to how to handle the
lambda-generic-func1.C constexpr-lambda16.C
FAILs.

Attached are 3 patches, one is just an updated version of the previous
patch with simplified constexpr.cc (and fixed the function comment
Marek talked about), this one will accept the statement
expression case with decl_constant_var_p static in it, and passes
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ \
RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='constexpr-nonlit* feat-cxx2b* stmtexpr19.C stmtexpr25.C lambda-generic-func1.C constexpr-lambda16.C'"
except for:
+FAIL: g++.dg/cpp1y/lambda-generic-func1.C  -std=c++17 (test for excess errors)
+FAIL: g++.dg/cpp1y/lambda-generic-func1.C  -std=c++20 (test for excess errors)
+FAIL: g++.dg/cpp1z/constexpr-lambda16.C  -std=c++17 (test for excess errors)
+FAIL: g++.dg/cpp1z/constexpr-lambda16.C  -std=c++20 (test for excess errors)
plus the testcase above needs to be dealt with if we want to pedwarn on it
for older C++.

The third one is if we just want something for C++23 and don't want to touch
C++20 and older at all (that one doesn't regress anything); the second one is
similar but will no longer reject the statement expression cases for C++11
to C++20.

Jakub
2022-11-18  Jakub Jelinek  

gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Bump __cpp_constexpr
value from 202207L to 202211L.
gcc/cp/
* constexpr.cc (cxx_eval_constant_expression): Implement C++23
P2647R1 - Permitting static constexpr variables in constexpr functions.
Allow DECL_EXPRs of decl_constant_var_p static or thread_local vars.
(potential_constant_expression_1): Similarly, except use
decl_maybe_constant_var_p instead of decl_constant_var_p if
processing_template_decl.
* decl.cc (diagnose_static_in_constexpr): New function.
(start_decl): Remove diagnostics of static or thread_local
vars in constexpr or consteval functions.
(cp_finish_decl): Call diagnose_static_in_constexpr.
gcc/testsuite/
* g++.dg/cpp23/constexpr-nonlit17.C: New test.
* g++.dg/cpp23/constexpr-nonlit18.C: New test.
* g++.dg/cpp23/constexpr-nonlit19.C: New test.
* g++.dg/cpp23/constexpr-nonlit20.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Adjust expected __cpp_constexpr
value.
* g++.dg/ext/stmtexpr19.C: Don't expect an error.
* g++.dg/ext/stmtexpr25.C: New test.

--- gcc/c-family/c-cppbuiltin.cc.jj 2022-11-18 09:00:17.102704379 +0100
+++ gcc/c-family/c-cppbuiltin.cc  2022-11-18 09:32:00.389372850 +0100
@@ -1074,7 +1074,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
- cpp_define (pfile, "__cpp_constexpr=202207L");
+ cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
--- gcc/cp/constexpr.cc.jj  2022-11-18 09:00:17.108704295 +0100
+++ gcc/cp/constexpr.cc 2022-11-18 09:35:39.822342414 +0100
@@ -7098,7 +7098,8 @@ cxx_eval_constant_expression (const cons
&& (TREE_STATIC (r)
|| (CP_DECL_THREAD_LOCAL_P (r) && !DECL_REALLY_EXTERN (r)))
/* Allow __FUNCTION__ etc.  */
-   && !DECL_ARTIFICIAL (r))
+   && !DECL_ARTIFICIAL (r)
+   && !decl_constant_var_p (r))
  {
if (!ctx->quiet)
  {
@@ -9586,7 +9587,10 @@ potential_constant_expression_1 (tree t,
 
 case DECL_EXPR:
   tmp = DECL_EXPR_DECL (t);
-  if (VAR_P (tmp) && !DECL_ARTIFICIAL (tmp))
+  if (VAR_P (tmp) && !DECL_ARTIFICIAL (tmp)
+ && (processing_template_decl
+ ? !decl_maybe_constant_var_p (tmp)
+ : !decl_constant_var_p (tmp)))
{
  if 

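For readers unfamiliar with P2647R1, the sketch below shows the kind of code it permits: a `static constexpr` local inside a `constexpr` function, which earlier standards reject. It is gated on the `__cpp_constexpr` value that the c-cppbuiltin.cc hunk above raises to 202211L, with a namespace-scope fallback for older modes. The function and table names are illustrative and are not taken from the patch or its testcases.

```cpp
// Feature-test macro value raised to 202211L by the c-cppbuiltin.cc hunk.
#if defined (__cpp_constexpr) && __cpp_constexpr >= 202211L
// C++23 (P2647R1): a static constexpr variable that is usable in constant
// expressions may be declared inside a constexpr function.
constexpr char hex_digit (int n)
{
  static constexpr char digits[] = "0123456789abcdef";
  return digits[n];
}
#else
// Before P2647R1 the table has to live outside the function.
constexpr char hex_digits_table[] = "0123456789abcdef";
constexpr char hex_digit (int n)
{
  return hex_digits_table[n];
}
#endif

static_assert (hex_digit (10) == 'a', "evaluated at compile time");
```

Either branch gives the same result, so the `static_assert` holds in every language mode from C++11 up; only the C++23 branch exercises the new rule.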