Re: [PATCH] rs6000: suboptimal code for returning bool value on target ppc

2023-03-17 Thread Peter Bergner via Gcc-patches
On 3/17/23 4:20 PM, Peter Bergner via Gcc-patches wrote:
> On 3/16/23 10:37 PM, Surya Kumari Jangala wrote:
>> The issue of suboptimal code exists even for integer return value and not 
>> just bool return value. See 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103784#c9 
>> So the patch would need to take care of integer return values too.
> 
> Correct.  Basically any time we return an integral type (signed or unsigned)
> type is smaller than the hard register we are returning it in, we can get 
> these
> unwanted sign/zero extensions.

I'm sorry, I didn't mean to imply every sign and zero extend is useless.
I just meant to say that there are many cases when these sign and zero extends
are not necessary.

Peter



Re: [PATCH] rs6000: Don't ICE when compiling the __builtin_vec_xst_trunc built-in [PR109178]

2023-03-17 Thread Peter Bergner via Gcc-patches
On 3/17/23 7:17 PM, Peter Bergner wrote:
> On 3/17/23 5:35 PM, Peter Bergner wrote:
>> When we expand the __builtin_vec_xst_trunc built-in, we use the wrong mode
>> for the MEM operand which causes an unrecognizable insn ICE.  The solution
>> is to use the correct TMODE mode.
>>
>> Is this ok for trunk and gcc12 assuming my bootstraps and regtests show
>> no regressions?
> 
> The trunk bootstrap and regtests were clean.  I'm still waiting on the
> backport testing to finish.
...and the gcc12 backported bootstrap and regtest were clean too.

Peter



Re: [committed] libstdc++: Add const to hash>::operator() [PR109165]

2023-03-17 Thread Nathaniel Shead via Gcc-patches
On Sat, Mar 18, 2023 at 7:36 AM Jonathan Wakely via Libstdc++
 wrote:
>
> Tested x86_64-linux. Pushed to trunk. gcc-12 backport needed too.
>
> -- >8 --
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/109165
> * include/std/coroutine (hash<>::operator()): Add const.
> * testsuite/18_support/coroutines/hash.cc: New test.
> ---
>  libstdc++-v3/include/std/coroutine|  2 +-
>  .../testsuite/18_support/coroutines/hash.cc   | 22 +++
>  2 files changed, 23 insertions(+), 1 deletion(-)
>  create mode 100644 libstdc++-v3/testsuite/18_support/coroutines/hash.cc
>
> diff --git a/libstdc++-v3/include/std/coroutine 
> b/libstdc++-v3/include/std/coroutine
> index f6e65566f10..b0ca18949db 100644
> --- a/libstdc++-v3/include/std/coroutine
> +++ b/libstdc++-v3/include/std/coroutine
> @@ -345,7 +345,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct hash>
>  {
>size_t
> -  operator()(const coroutine_handle<_Promise>& __h) noexcept
> +  operator()(const coroutine_handle<_Promise>& __h) const noexcept
>{
> return reinterpret_cast(__h.address());
>}
> diff --git a/libstdc++-v3/testsuite/18_support/coroutines/hash.cc 
> b/libstdc++-v3/testsuite/18_support/coroutines/hash.cc
> new file mode 100644
> index 000..68e5e640477
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/18_support/coroutines/hash.cc
> @@ -0,0 +1,22 @@
> +// { dg-options "-std=gnu++2a" }
> +// { dg-do run { target c++2a } }
> +
> +#include 
> +#include 
> +
> +void
> +test01()
> +{
> +  std::hash h;
> +  std::size_t v = h(std::noop_coroutine());
> +
> +  const auto& ch = h;
> +  std::size_t v2 = h(std::noop_coroutine()); // PR libstdc++/109165

Is this supposed to be `std::size_t v2 = ch(...)`?

> +
> +  VERIFY( v2 == v );
> +}
> +
> +int main()
> +{
> +  test01();
> +}
> --
> 2.39.2
>


Re: [PATCH] rs6000: Don't ICE when compiling the __builtin_vec_xst_trunc built-in [PR109178]

2023-03-17 Thread Peter Bergner via Gcc-patches
On 3/17/23 5:35 PM, Peter Bergner wrote:
> When we expand the __builtin_vec_xst_trunc built-in, we use the wrong mode
> for the MEM operand which causes an unrecognizable insn ICE.  The solution
> is to use the correct TMODE mode.
> 
> Is this ok for trunk and gcc12 assuming my bootstraps and regtests show
> no regressions?

The trunk bootstrap and regtests were clean.  I'm still waiting on the
backport testing to finish.

Peter




[committed] lra: Ignore debug insns and notes in combine_reload_insn [PR109179]

2023-03-17 Thread Peter Bergner via Gcc-patches
We ICE in combine_reload_insn if we've deleted the TO insn operand during
processing, because lra_get_insn_recog_data doesn't expect to see the note
that replaces the deleted insn.  The solution here is to exit early if TO
is a debug insn or note.

This caused a bootstrap issue on powerpc64le-linux.  The fix was approved
by Vlad.  Committed and pushed after bootstrap testing and checking there
were no testsuite regressions when compared to the commit before the
commit that caused the ICE.

Peter


gcc/
PR rtl-optimization/109179
* lra-constraints.cc (combine_reload_insn): Enforce TO is not a debug
insn or note.  Move the tests earlier to guard lra_get_insn_recog_data.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 95b534e1a70..405b8b92f5e 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5014,14 +5014,19 @@ combine_reload_insn (rtx_insn *from, rtx_insn *to)
   enum reg_class to_class, from_class;
   int n, nop;
   signed char changed_nops[MAX_RECOG_OPERANDS + 1];
-  lra_insn_recog_data_t id = lra_get_insn_recog_data (to);
-  struct lra_static_insn_data *static_id = id->insn_static_data;
   
   /* Check conditions for second memory reload and original insn:  */
   if ((targetm.secondary_memory_needed
== hook_bool_mode_reg_class_t_reg_class_t_false)
-  || NEXT_INSN (from) != to || CALL_P (to)
-  || id->used_insn_alternative == LRA_UNKNOWN_ALT
+  || NEXT_INSN (from) != to
+  || !NONDEBUG_INSN_P (to)
+  || CALL_P (to))
+return false;
+
+  lra_insn_recog_data_t id = lra_get_insn_recog_data (to);
+  struct lra_static_insn_data *static_id = id->insn_static_data;
+  
+  if (id->used_insn_alternative == LRA_UNKNOWN_ALT
   || (set = single_set (from)) == NULL_RTX)
 return false;
   from_reg = SET_DEST (set);


[pushed] c++: constant, array, lambda, template [PR108975]

2023-03-17 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

When a lambda refers to a constant local variable in the enclosing scope, we
tentatively capture it, but if we end up pulling out its constant value, we
go back at the end of the lambda and prune any unneeded captures.  Here
while parsing the template we decided that the dim capture was unneeded,
because we folded it away, but then we brought back the use in the template
trees that try to preserve the source representation with added type info.
So then when we tried to instantiate that use, we couldn't find what it was
trying to use, and crashed.

Fixed by not trying to prune when parsing a template; we'll prune at
instantiation time.

PR c++/108975

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Don't bother in a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-const11.C: New test.
---
 gcc/cp/lambda.cc   |  3 +++
 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const11.C | 14 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const11.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index 212990a21bf..9925209b2ed 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1760,6 +1760,9 @@ prune_lambda_captures (tree body)
   if (LAMBDA_EXPR_DEFAULT_CAPTURE_MODE (lam) == CPLD_NONE)
 /* No default captures, and we don't prune explicit captures.  */
 return;
+  /* Don't bother pruning in a template, we'll prune at instantiation time.  */
+  if (dependent_type_p (TREE_TYPE (lam)))
+return;
 
   hash_map const_vars;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const11.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const11.C
new file mode 100644
index 000..26af75bf132
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-const11.C
@@ -0,0 +1,14 @@
+// PR c++/108975
+// { dg-do compile { target c++11 } }
+
+template
+void f() {
+  constexpr int dim = 1;
+  auto l = [&] {
+int n[dim * 1];
+  };
+  // In f, we shouldn't actually capture dim.
+  static_assert (sizeof(l) == 1, "");
+}
+
+template void f();

base-commit: 890043314a7f405081990ea9d0cb577dd44f883f
-- 
2.31.1



[PATCH] rs6000: Don't ICE when compiling the __builtin_vec_xst_trunc built-in [PR109178]

2023-03-17 Thread Peter Bergner via Gcc-patches
When we expand the __builtin_vec_xst_trunc built-in, we use the wrong mode
for the MEM operand which causes an unrecognizable insn ICE.  The solution
is to use the correct TMODE mode.

Is this ok for trunk and gcc12 assuming my bootstraps and regtests show
no regressions?

Peter


gcc/
PR target/109178
* config/rs6000/rs6000-builtin.cc (stv_expand_builtin): Use tmode.

gcc/testsuite/
PR target/109178
* gcc.target/powerpc/pr109178.c: New test.

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 737a5c42bfb..83c28cd8af3 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2933,7 +2933,7 @@ stv_expand_builtin (insn_code icode, rtx *op,
 
   rtx addr;
   if (op[1] == const0_rtx)
-   addr = gen_rtx_MEM (Pmode, op[2]);
+   addr = gen_rtx_MEM (tmode, op[2]);
   else
{
  op[1] = copy_to_mode_reg (Pmode, op[1]);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr109178.c 
b/gcc/testsuite/gcc.target/powerpc/pr109178.c
new file mode 100644
index 000..0f6e2d6b2eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr109178.c
@@ -0,0 +1,13 @@
+/* PR target/109178 */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* Verify we do not ICE on the following.  */
+
+typedef __attribute__ ((altivec (vector__))) signed __int128 v1ti_t;
+
+void
+foo (signed int *dst, v1ti_t src)
+{
+  __builtin_vec_xst_trunc(src, 0, dst);
+}


[PATCH] Fortran: procedures with BIND(C) attribute require explicit interface [PR85877]

2023-03-17 Thread Harald Anlauf via Gcc-patches
Dear all,

the Fortran standard requires an explicit procedure interface in certain
situations, such as when they have a BIND(C) attribute (F2018:15.4.2.2).
The attached patch adds a check for this.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

The PR marks this as a long-time regression, so it might be backported
to all open branches.

Thanks,
Harald

From c48c670ff0ce4f0d2ffb1d43aca2ec1bed1fa2ef Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 17 Mar 2023 22:24:49 +0100
Subject: [PATCH] Fortran: procedures with BIND(C) attribute require explicit
 interface [PR85877]

gcc/fortran/ChangeLog:

	PR fortran/85877
	* resolve.cc (resolve_fl_procedure): Check for an explicit interface
	of procedures with the BIND(C) attribute (F2018:15.4.2.2).

gcc/testsuite/ChangeLog:

	PR fortran/85877
	* gfortran.dg/pr85877.f90: New test.
---
 gcc/fortran/resolve.cc| 10 ++
 gcc/testsuite/gfortran.dg/pr85877.f90 | 17 +
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr85877.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 46585879ddc..7ec65f16240 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -13661,6 +13661,16 @@ check_formal:
 	}
 	}
 }
+
+  /* F2018:15.4.2.2 requires an explicit interface for procedures with the
+ BIND(C) attribute.  */
+  if (sym->attr.is_bind_c && sym->attr.if_source == IFSRC_UNKNOWN)
+{
+  gfc_error ("Interface of %qs at %L must be explicit",
+		 sym->name, >declared_at);
+  return false;
+}
+
   return true;
 }

diff --git a/gcc/testsuite/gfortran.dg/pr85877.f90 b/gcc/testsuite/gfortran.dg/pr85877.f90
new file mode 100644
index 000..675faac0027
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr85877.f90
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! PR fortran/85877
+! A procedure with the bind(c) attribute shall have an explicit interface
+! Contributed by G. Steinmetz
+
+subroutine p
+  bind(c) f ! { dg-error "must be explicit" }
+  x = f()
+end
+
+subroutine s
+  interface
+ function g() bind(c)
+ end function g
+  end interface
+  x = g()
+end
--
2.35.3



[pushed] c++: throw and private destructor [PR109172]

2023-03-17 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Since we aren't going through the normal call machinery, we need to check
the dtor access specifically.

PR c++/109172

gcc/cp/ChangeLog:

* except.cc (build_throw): Check dtor access.

gcc/testsuite/ChangeLog:

* g++.dg/eh/dtor4.C: New test.
---
 gcc/cp/except.cc| 10 --
 gcc/testsuite/g++.dg/eh/dtor4.C | 15 +++
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/eh/dtor4.C

diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
index 916e8189db6..91a5e049860 100644
--- a/gcc/cp/except.cc
+++ b/gcc/cp/except.cc
@@ -639,6 +639,8 @@ build_throw (location_t loc, tree exp)
   tree object, ptr;
   tree allocate_expr;
 
+  tsubst_flags_t complain = tf_warning_or_error;
+
   /* The CLEANUP_TYPE is the internal type of a destructor.  */
   if (!cleanup_type)
{
@@ -759,11 +761,15 @@ build_throw (location_t loc, tree exp)
   cleanup = NULL_TREE;
   if (type_build_dtor_call (TREE_TYPE (object)))
{
- tree dtor_fn = lookup_fnfields (TYPE_BINFO (TREE_TYPE (object)),
+ tree binfo = TYPE_BINFO (TREE_TYPE (object));
+ tree dtor_fn = lookup_fnfields (binfo,
  complete_dtor_identifier, 0,
  tf_warning_or_error);
  dtor_fn = BASELINK_FUNCTIONS (dtor_fn);
- mark_used (dtor_fn);
+ if (!mark_used (dtor_fn)
+ || !perform_or_defer_access_check (binfo, dtor_fn,
+dtor_fn, complain))
+   return error_mark_node;
  if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (object)))
{
  cxx_mark_addressable (dtor_fn);
diff --git a/gcc/testsuite/g++.dg/eh/dtor4.C b/gcc/testsuite/g++.dg/eh/dtor4.C
new file mode 100644
index 000..6c0e804fe8a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/eh/dtor4.C
@@ -0,0 +1,15 @@
+// PR c++/109172
+
+class Demo
+{
+  ~Demo();
+};
+
+int main()
+{
+  try
+{
+  throw *new Demo; // { dg-error private }
+}
+  catch(const Demo& e) { }
+}

base-commit: c48be8298c27143c1a684c0cb9689c88d16f4b49
-- 
2.31.1



Re: [PATCH] rs6000: suboptimal code for returning bool value on target ppc

2023-03-17 Thread Peter Bergner via Gcc-patches
On 3/16/23 10:37 PM, Surya Kumari Jangala wrote:
> The issue of suboptimal code exists even for integer return value and not 
> just bool return value. See 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103784#c9 
> So the patch would need to take care of integer return values too.

Correct.  Basically any time we return an integral type (signed or unsigned)
type is smaller than the hard register we are returning it in, we can get these
unwanted sign/zero extensions.

I'm sure we have quite a few bugzillas mentioning the unneeded sign/zero 
extends.

Peter




[PATCH] json: preserve key-insertion order [PR109163]

2023-03-17 Thread David Malcolm via Gcc-patches
PR other/109163 notes that when we write out JSON files, we traverse
the keys within each object via hash_map iteration, and thus the
ordering is non-deterministic - it can arbitrarily vary from run to
run and from different machines, making it harder for users to compare
results and determine if anything has "really" changed.

I'm running into this issue with SARIF output, but there are several
places where we're currently emitting JSON:

  * -fsave-optimization-record emits SRCFILE.opt-record.json.gz
"This option is experimental and the format of the data within
the compressed JSON file is subject to change."; see
optinfo-emit-json.{h,cc}, dumpfile.cc, etc
  * -fdiagnostics-format= with the various "sarif" and "json" options
  * -fdump-analyzer-json is a developer option in the analyzer
  * gcov has:
 "-j, --json-format: Output JSON intermediate format into
 .gcov.json.gz file"

This patch adds an auto_vec to class json::object to preserve
key-insertion order, and use it when writing out objects.  Potentially
this slightly slows down JSON output, but I believe that this isn't
normally a bottleneck, and that the benefits to the user of
deterministic output are worth it.

I had first attempted to use ordered_hash_map.h for this, but ran into
impenetrable template errors, so this patch uses a simpler approach of
just adding an auto_vec to json::object.

Testing showed a failure of diagnostic-format-json-5.c, which was using
a convoluted set of regexps to consume the output; I believe that this
was brittle, and was intermittently failing for some of the random
orderings of output.  I rewrote these regexps to work with the expected
output order.  The other such tests seem to pass with the
now-deterministic orderings.

Lightly tested with valgrind.
I manually verified that the SARIF output is now deterministic.
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
PR other/109163
* json.cc: Update comments to indicate that we now preserve
insertion order of keys within objects.
(object::print): Traverse keys in insertion order.
(object::set): Preserve insertion order of keys.
(selftest::test_writing_objects): Add an additional key to verify
that we preserve insertion order.
* json.h (object::m_keys): New field.

gcc/testsuite/ChangeLog:
PR other/109163
* c-c++-common/diagnostic-format-json-1.c: Update comment.
* c-c++-common/diagnostic-format-json-2.c: Likewise.
* c-c++-common/diagnostic-format-json-3.c: Likewise.
* c-c++-common/diagnostic-format-json-4.c: Likewise.
* c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
* c-c++-common/diagnostic-format-json-stderr-1.c: Update comment.

Signed-off-by: David Malcolm 
---
 gcc/json.cc   |  40 ---
 gcc/json.h|  10 +-
 .../c-c++-common/diagnostic-format-json-1.c   |   3 +-
 .../c-c++-common/diagnostic-format-json-2.c   |   3 +-
 .../c-c++-common/diagnostic-format-json-3.c   |   3 +-
 .../c-c++-common/diagnostic-format-json-4.c   |   3 +-
 .../c-c++-common/diagnostic-format-json-5.c   | 100 ++
 .../diagnostic-format-json-stderr-1.c |   3 +-
 8 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/gcc/json.cc b/gcc/json.cc
index 01879c9c466..741e97b20e5 100644
--- a/gcc/json.cc
+++ b/gcc/json.cc
@@ -31,8 +31,11 @@ using namespace json;
 /* class json::value.  */
 
 /* Dump this json::value tree to OUTF.
-   No formatting is done.  There are no guarantees about the order
-   in which the key/value pairs of json::objects are printed.  */
+
+   No formatting is done.
+
+   The key/value pairs of json::objects are printed in the order
+   in which the keys were originally inserted.  */
 
 void
 value::dump (FILE *outf) const
@@ -44,7 +47,7 @@ value::dump (FILE *outf) const
 }
 
 /* class json::object, a subclass of json::value, representing
-   an unordered collection of key/value pairs.  */
+   an ordered collection of key/value pairs.  */
 
 /* json:object's dtor.  */
 
@@ -62,14 +65,17 @@ object::~object ()
 void
 object::print (pretty_printer *pp) const
 {
-  /* Note that the order is not guaranteed.  */
   pp_character (pp, '{');
-  for (map_t::iterator it = m_map.begin (); it != m_map.end (); ++it)
+
+  /* Iterate in the order that the keys were inserted.  */
+  unsigned i;
+  const char *key;
+  FOR_EACH_VEC_ELT (m_keys, i, key)
 {
-  if (it != m_map.begin ())
+  if (i > 0)
pp_string (pp, ", ");
-  const char *key = const_cast ((*it).first);
-  value *value = (*it).second;
+  map_t _map = const_cast (m_map);
+  value *value = *mut_map.get (key);
   pp_doublequote (pp);
   pp_string (pp, key); // FIXME: escaping?
   pp_doublequote (pp);
@@ -97,9 +103,13 @@ object::set (const char *key, value *v)
   *ptr = v;
 }
  

Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Andrew Waterman via Gcc-patches
On Fri, Mar 17, 2023 at 6:16 AM Philipp Tomsich
 wrote:
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener  
> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependancy chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
>
> AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
> comparing against a constant requires to load any non-zero value into
> a register first.

Equality comparisons against 0 are also slightly cheaper than equality
comparisons against 1 on x86, though it's a code-size difference, not
an instruction-count difference.  Not sure this changes the
story--just pointing out that this optimization might be slightly more
generally applicable than it initially seems.

>
> This feels a bit like we need to call onto the backend to check
> whether comparisons against 0 are cheaper.
>
> Obviously, the underlying issue become worse if the immediate can not
> be built up in a single instruction.
> Using RISC-V as an example (primarily, as RISC-V makes it particularly
> easy to run into multi-instruction sequences for constants), we can
> construct the following case:
>
>   void f(unsigned int *a)
>   {
> if ((*a += 0x900) == 0x900)
>g();
>   }
>
> which GCC 12.2.0 (trunk may already be small enough to reuse the
> constant once loaded into register, but I did not check…) with -O3
> turns into:
>
> f:
>   lw a4,0(a0)
>   li a5,4096
>   addiw a5,a5,-1792
>   addw a4,a5,a4
>   li a5,4096
>   sw a4,0(a0)
>   addi a5,a5,-1792
>   beq a4,a5,.L4
>   ret
> .L4:
>   tail g
>
> Thanks,
> Philipp.
>
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener  
> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependancy chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
> >
> > We do have quite some number of bugreports with regards to making VRPs
> > life harder when splitting things this way.  It's easier for VRP to handle
> >
> >   _1 = _2 + 1;
> >   if (_1 == 1)
> >
> > than it is
> >
> >   _1 = _2 + 1;
> >   if (_2 == 0)
> >
> > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > the life-range issue there's other side-effects as well.  Maybe ranger 
> > meanwhile
> > can handle the above case?
> >
> > What's the overall effect of the change on a larger code base?
> >
> > Thanks,
> > Richard.
> >
> > >
> > > Example from Aarch64:
> > >
> > > Before
> > > ldr w1, [x0]
> > > add w1, w1, 1
> > > str w1, [x0]
> > > cmp w1, 1
> > > beq .L4
> > > ret
> > >
> > > After
> > > ldr w1, [x0]
> > > add w2, w1, 1
> > > str w2, [x0]
> > > cbz w1, .L4
> > > ret
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > > (forward_propagate_into_comparison_1): Optimize
> > > for zero comparisons.
> > >
> > > Signed-off-by: Manolis Tsamis 
> > > ---
> > >
> > >  gcc/tree-ssa-forwprop.cc | 41 +++-
> > >  1 file changed, 

[committed] libstdc++: Add const to hash>::operator() [PR109165]

2023-03-17 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk. gcc-12 backport needed too.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/109165
* include/std/coroutine (hash<>::operator()): Add const.
* testsuite/18_support/coroutines/hash.cc: New test.
---
 libstdc++-v3/include/std/coroutine|  2 +-
 .../testsuite/18_support/coroutines/hash.cc   | 22 +++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/18_support/coroutines/hash.cc

diff --git a/libstdc++-v3/include/std/coroutine 
b/libstdc++-v3/include/std/coroutine
index f6e65566f10..b0ca18949db 100644
--- a/libstdc++-v3/include/std/coroutine
+++ b/libstdc++-v3/include/std/coroutine
@@ -345,7 +345,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct hash>
 {
   size_t
-  operator()(const coroutine_handle<_Promise>& __h) noexcept
+  operator()(const coroutine_handle<_Promise>& __h) const noexcept
   {
return reinterpret_cast(__h.address());
   }
diff --git a/libstdc++-v3/testsuite/18_support/coroutines/hash.cc 
b/libstdc++-v3/testsuite/18_support/coroutines/hash.cc
new file mode 100644
index 000..68e5e640477
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/coroutines/hash.cc
@@ -0,0 +1,22 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include 
+#include 
+
+void
+test01()
+{
+  std::hash h;
+  std::size_t v = h(std::noop_coroutine());
+
+  const auto& ch = h;
+  std::size_t v2 = h(std::noop_coroutine()); // PR libstdc++/109165
+
+  VERIFY( v2 == v );
+}
+
+int main()
+{
+  test01();
+}
-- 
2.39.2



[PATCH] c++: further -Wdangling-reference refinement [PR107532]

2023-03-17 Thread Marek Polacek via Gcc-patches
Based on ,
it seems like we should treat *any* class with a reference member
as a reference wrapper.  This simplifies the code so I'm happy to
make that change.

The patch, however, does not suppress the warning in

  int i = 42;
  auto const& v = std::get<0>(std::tuple(i));

Since reference_like_class_p already checks for std::pair
maybe it could also check for std::tuple.  I don't know if we
want to make that change in GCC 13, or move -Wdangling-reference to
-Wextra for GCC 13 and perhaps move it back to -Wall in GCC 14.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/107532

gcc/cp/ChangeLog:

* call.cc (reference_like_class_p): Don't look for a constructor.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference11.C: New test.
---
 gcc/cp/call.cc| 35 +++
 .../g++.dg/warn/Wdangling-reference11.C   | 23 
 2 files changed, 35 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference11.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c01e7b82457..00d56a157b6 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13781,8 +13781,9 @@ std_pair_ref_ref_p (tree t)
 
 /* Return true if a class CTYPE is either std::reference_wrapper or
std::ref_view, or a reference wrapper class.  We consider a class
-   a reference wrapper class if it has a reference member and a
-   constructor taking the same reference type.  */
+   a reference wrapper class if it has a reference member.  We no
+   longer check that it has a constructor taking the same reference type
+   since that approach still generated too many false positives.  */
 
 static bool
 reference_like_class_p (tree ctype)
@@ -13798,31 +13799,19 @@ reference_like_class_p (tree ctype)
   if (decl_in_std_namespace_p (tdecl))
 {
   tree name = DECL_NAME (tdecl);
-  return (name
- && (id_equal (name, "reference_wrapper")
- || id_equal (name, "span")
- || id_equal (name, "ref_view")));
+  if (name
+ && (id_equal (name, "reference_wrapper")
+ || id_equal (name, "span")
+ || id_equal (name, "ref_view")))
+   return true;
 }
   for (tree fields = TYPE_FIELDS (ctype);
fields;
fields = DECL_CHAIN (fields))
-{
-  if (TREE_CODE (fields) != FIELD_DECL || DECL_ARTIFICIAL (fields))
-   continue;
-  tree type = TREE_TYPE (fields);
-  if (!TYPE_REF_P (type))
-   continue;
-  /* OK, the field is a reference member.  Do we have a constructor
-taking its type?  */
-  for (tree fn : ovl_range (CLASSTYPE_CONSTRUCTORS (ctype)))
-   {
- tree args = FUNCTION_FIRST_USER_PARMTYPE (fn);
- if (args
- && same_type_p (TREE_VALUE (args), type)
- && TREE_CHAIN (args) == void_list_node)
-   return true;
-   }
-}
+if (TREE_CODE (fields) == FIELD_DECL
+   && !DECL_ARTIFICIAL (fields)
+   && TYPE_REF_P (TREE_TYPE (fields)))
+  return true;
   return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference11.C 
b/gcc/testsuite/g++.dg/warn/Wdangling-reference11.C
new file mode 100644
index 000..667618e7196
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference11.C
@@ -0,0 +1,23 @@
+// PR c++/107532
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wdangling-reference" }
+
+struct R
+{
+int& r;
+int& get() { return r; }
+int&& rget() { return static_cast(r); }
+};
+
+int main()
+{
+int i = 42;
+int& l = R{i}.get(); // { dg-bogus "dangling reference" }
+int const& cl = R{i}.get(); // { dg-bogus "dangling reference" }
+int&& r = R{i}.rget(); // { dg-bogus "dangling reference" }
+int const&& cr = R{i}.rget(); // { dg-bogus "dangling reference" }
+(void) l;
+(void) r;
+(void) cr;
+(void) cl;
+}

base-commit: ae7190e345a8d80310835cb83b3b41ef2aeb0d37
-- 
2.39.2



Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > +   /* Drop the const attribute from the call type (the pure
> > +  attribute is not available on types).  */
> > +   tree fntype = gimple_call_fntype (call);
> > +   if (fntype && TYPE_READONLY (fntype))
> > + gimple_call_set_fntype
> > +   (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> > + & ~TYPE_QUAL_CONST)));
> 
> Sorry, now I am bit confused on why Jakub's fix did not need similar
> fixup.  The flag is only set during the profiling stage and thus I would
> expect it to still run into the problem that VOPs are missing.
> Is it only becuase we do not sanity check?

My patch started from this point ignoring all TYPE_READONLY bits on
all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
show up in any of the gimple_call_fntype types unless it is a direct
call to a const function for which we don't have a body.

In either case, vops are added on the update_stmt a few lines later.

> Here is a testcase that shows the problem:
> 
> include 
> float c;
> 
> __attribute__ ((const))
> float
> instrument(float v)
> {
> return sin (v);
> }
> __attribute__ ((no_profile_instrument_function,const,noinline))
> float noinstrument (float v)
> {
> return instrument (v);
> }
> 
> m()
> {
> c+=instrument(c);
> if (!__builtin_expect (c,1))
> {
>   c+=noinstrument (c);
> }
> c+=instrument(c);
> }
> main()
> {
> m();
> }
> 
> Compiling 
> gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage 
> -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> makes gcov to report 3 executions on instrument while with -O3 it is 2.
> 
> This is because the noinstrument has const flag that is not cleared and
> it becoames essentially non-const because it calls instrument that gets
> intrumented and partially inlined.  

I thought tree-profile.cc right now clears const on all const functions
with body, or is that
!(!node->clone_of || node->decl != node->clone_of->decl)
restricting it to instrumented functions only?

Jakub



Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-17 Thread Jan Hubicka via Gcc-patches
> 
> The following is what I profile-bootstrapped and tested on 
> x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Thu, 16 Mar 2023 13:51:19 +0100
> Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
> To: gcc-patches@gcc.gnu.org
> 
> The following makes sure that after clearing pure/const from
> instrumented function declarations we are adjusting call statements
> fntype as well to handle indirect calls and because gimple_call_flags
> looks at both decl and fntype.
> 
> Like the pure/const flag clearing on decls we refrain from touching
> calls to known functions that do not have a body in the current TU.
> 
>   PR tree-optimization/106912
>   * tree-profile.cc (tree_profiling): Update stmts only when
>   profiling or testing coverage.  Make sure to update calls
>   fntype, stripping 'const' there.
> 
>   * gcc.dg/profile-generate-4.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/profile-generate-4.c | 19 
>  gcc/tree-profile.cc   | 38 +--
>  2 files changed, 47 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
> 
> diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c 
> b/gcc/testsuite/gcc.dg/profile-generate-4.c
> new file mode 100644
> index 000..c2b999fe4cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
> @@ -0,0 +1,19 @@
> +/* PR106912 */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
> +
> +__attribute__ ((__simd__))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
> +double foo (double x);
> +
> +void bar(double *f, int n)
> +{
> +  int i;
> +  for (i = 0; i < n; i++)
> +f[i] = foo(f[i]);
> +}
> +
> +double foo(double x)
> +{
> +  return x * x / 3.0;
> +}
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index ea9d7a23443..7854cd4bc31 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -835,16 +835,34 @@ tree_profiling (void)
>  
>push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>  
> -  FOR_EACH_BB_FN (bb, cfun)
> - {
> -   gimple_stmt_iterator gsi;
> -   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> - {
> -   gimple *stmt = gsi_stmt (gsi);
> -   if (is_gimple_call (stmt))
> - update_stmt (stmt);
> - }
> - }
> +  if (profile_arc_flag || flag_test_coverage)
> + FOR_EACH_BB_FN (bb, cfun)
> +   {
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
> +   {
> + gcall *call = dyn_cast  (gsi_stmt (gsi));
> + if (!call)
> +   continue;
> +
> + /* We do not clear pure/const on decls without body.  */
> + tree fndecl = gimple_call_fndecl (call);
> + if (fndecl && !gimple_has_body_p (fndecl))
> +   continue;
> +
> + /* Drop the const attribute from the call type (the pure
> +attribute is not available on types).  */
> + tree fntype = gimple_call_fntype (call);
> + if (fntype && TYPE_READONLY (fntype))
> +   gimple_call_set_fntype
> + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> +   & ~TYPE_QUAL_CONST)));

Sorry, now I am bit confused on why Jakub's fix did not need similar
fixup.  The flag is only set during the profiling stage and thus I would
expect it to still run into the problem that VOPs are missing.
Is it only becuase we do not sanity check?

Here is a testcase that shows the problem:

include 
float c;

__attribute__ ((const))
float
instrument(float v)
{
return sin (v);
}
__attribute__ ((no_profile_instrument_function,const,noinline))
float noinstrument (float v)
{
return instrument (v);
}

m()
{
c+=instrument(c);
if (!__builtin_expect (c,1))
{
  c+=noinstrument (c);
}
c+=instrument(c);
}
main()
{
m();
}

Compiling 
gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage 
-S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
makes gcov to report 3 executions on instrument while with -O3 it is 2.

This is because the noinstrument has const flag that is not cleared and
it becoames essentially non-const because it calls instrument that gets
intrumented and partially inlined.  

Honza
> +
> + /* Update virtual operands of calls to no longer const/pure
> +functions.  */
> + update_stmt (call);
> +   }
> +   }
>  
>/* re-merge split blocks.  */
>cleanup_tree_cfg ();
> -- 
> 2.35.3
> 


Re: [PATCH V4] Rework 128-bit complex multiply and divide.

2023-03-17 Thread Segher Boessenkool
Hi!

On Thu, Mar 09, 2023 at 08:40:36PM -0500, Michael Meissner wrote:
>   PR target/109067
>   * config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
>   (init_float128_ieee): Delete code to switch complex multiply and divide
>   for long double.
>   (complex_multiply_builtin_code): New helper function.
>   (complex_divide_builtin_code): Likewise.
>   (rs6000_mangle_decl_assembler_name): Add support for mangling the name
>   of complex 128-bit multiply and divide built-in functions.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/divic3-1.c
> +/* { dg-final { scan-assembler "__divtc3" } } */

/* { dg-final { scan-assembler {\m__divtc3\M} } } */

It might well be that we can use a sloppier regexp here, but why would
we do that?  It is a good thing to use the \m and \M constraint escapes
pretty much always.

Similar for the other three testcases of course.

This patch is okay for trunk, if you have tested it on all
configurations (powerpc-linux, powerpc64-linux, powerpc64le-linux with
and without default IEEE128 long double at least).  Thank you!

Does this need backports?


Segher


Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 17, 2023 at 08:09:17PM +0100, Jan Hubicka wrote:
> > 
> > I have in the meantime briefly tested following.
> > 
> > But if you want to the above way, then at least the testcase could be
> > useful.  Though, not sure if the above is all that is needed.  Shouldn't
> > set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
> > TREE_TYPE on the function to
> >   tree fntype = TREE_TYPE (node->decl);
> >   TREE_TYPE (node->decl)
> > = build_qualified_type (fntype,
> > TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
> > too (perhaps guarded on TREE_READONLY (fntype))?
> > 
> > 2023-03-16  Jakub Jelinek  
> > 
> > PR tree-optimization/106912
> > gcc/
> > * calls.h (ignore_const_fntype): Declare.
> > * calls.cc (flags_from_decl_or_type): Ignore TYPE_READONLY
> > on types if ignore_const_fntype.
> > * tree-profile.cc: Include calls.h.
> > (tree_profiling): Set ignore_const_fntype if profile_arc_flag
> > or flag_test_coverage.
> > gcc/lto/
> > * lto-lang.cc (lto_post_options): Set ignore_const_fntype if
> > profile_arc_flag or flag_test_coverage and not flag_auto_profile.
> > gcc/testsuite/
> > * gcc.dg/pr106912.c: New test.
> > 
> > --- gcc/calls.h.jj  2023-01-02 09:32:51.252868185 +0100
> > +++ gcc/calls.h 2023-03-16 12:23:51.632460586 +0100
> > @@ -134,5 +134,6 @@ extern void maybe_complain_about_tail_ca
> >  
> >  extern rtx rtx_for_static_chain (const_tree, bool);
> >  extern bool cxx17_empty_base_field_p (const_tree);
> > +extern bool ignore_const_fntype;
> >  
> >  #endif // GCC_CALLS_H
> > --- gcc/calls.cc.jj 2023-02-21 11:44:48.460464845 +0100
> > +++ gcc/calls.cc2023-03-16 12:27:45.427032110 +0100
> > @@ -800,6 +800,13 @@ is_tm_builtin (const_tree fndecl)
> >return false;
> >  }
> >  
> > +/* Set if flags_from_decl_or_type should ignore TYPE_READONLY of function
> > +   types.  This is used when tree-profile.cc instruments const calls,
> > +   clears TREE_READONLY on FUNCTION_DECLs which have been instrumented, but
> > +   for function types or indirect calls we don't know if the callees have 
> > been
> > +   instrumented or not.  */
> > +bool ignore_const_fntype;
> > +
> >  /* Detect flags (function attributes) from the function decl or type node. 
> >  */
> >  
> >  int
> > @@ -849,7 +856,7 @@ flags_from_decl_or_type (const_tree exp)
> >  }
> >else if (TYPE_P (exp))
> >  {
> > -  if (TYPE_READONLY (exp))
> > +  if (TYPE_READONLY (exp) && !ignore_const_fntype)
> 
> I think we want to ignore PURE flag too: if callee modifies counters
> seen by caller, we need to take that into account when optimizing it.
> 
> It is not very pretty to have global flag for prifled state, but I can't
> think of scenario where this would break, so perhaps it is good way to
> go?

Note, Richi posted what I think is much cleaner patch in
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614074.html

Jakub



Re: [PATCH] tree-optimization/106912 - IPA profile and pure/const

2023-03-17 Thread Jan Hubicka via Gcc-patches
> 
> I have in the meantime briefly tested following.
> 
> But if you want to the above way, then at least the testcase could be
> useful.  Though, not sure if the above is all that is needed.  Shouldn't
> set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
> TREE_TYPE on the function to
> tree fntype = TREE_TYPE (node->decl);
> TREE_TYPE (node->decl)
>   = build_qualified_type (fntype,
>   TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
> too (perhaps guarded on TREE_READONLY (fntype))?
> 
> 2023-03-16  Jakub Jelinek  
> 
>   PR tree-optimization/106912
> gcc/
>   * calls.h (ignore_const_fntype): Declare.
>   * calls.cc (flags_from_decl_or_type): Ignore TYPE_READONLY
>   on types if ignore_const_fntype.
>   * tree-profile.cc: Include calls.h.
>   (tree_profiling): Set ignore_const_fntype if profile_arc_flag
>   or flag_test_coverage.
> gcc/lto/
>   * lto-lang.cc (lto_post_options): Set ignore_const_fntype if
>   profile_arc_flag or flag_test_coverage and not flag_auto_profile.
> gcc/testsuite/
>   * gcc.dg/pr106912.c: New test.
> 
> --- gcc/calls.h.jj2023-01-02 09:32:51.252868185 +0100
> +++ gcc/calls.h   2023-03-16 12:23:51.632460586 +0100
> @@ -134,5 +134,6 @@ extern void maybe_complain_about_tail_ca
>  
>  extern rtx rtx_for_static_chain (const_tree, bool);
>  extern bool cxx17_empty_base_field_p (const_tree);
> +extern bool ignore_const_fntype;
>  
>  #endif // GCC_CALLS_H
> --- gcc/calls.cc.jj   2023-02-21 11:44:48.460464845 +0100
> +++ gcc/calls.cc  2023-03-16 12:27:45.427032110 +0100
> @@ -800,6 +800,13 @@ is_tm_builtin (const_tree fndecl)
>return false;
>  }
>  
> +/* Set if flags_from_decl_or_type should ignore TYPE_READONLY of function
> +   types.  This is used when tree-profile.cc instruments const calls,
> +   clears TREE_READONLY on FUNCTION_DECLs which have been instrumented, but
> +   for function types or indirect calls we don't know if the callees have 
> been
> +   instrumented or not.  */
> +bool ignore_const_fntype;
> +
>  /* Detect flags (function attributes) from the function decl or type node.  
> */
>  
>  int
> @@ -849,7 +856,7 @@ flags_from_decl_or_type (const_tree exp)
>  }
>else if (TYPE_P (exp))
>  {
> -  if (TYPE_READONLY (exp))
> +  if (TYPE_READONLY (exp) && !ignore_const_fntype)

I think we want to ignore PURE flag too: if callee modifies counters
seen by caller, we need to take that into account when optimizing it.

It is not very pretty to have global flag for prifled state, but I can't
think of scenario where this would break, so perhaps it is good way to
go?

Honza


[pushed] c++: namespace-scoped friend in local class [PR69410]

2023-03-17 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

do_friend was only considering class-qualified identifiers for the
qualified-id case, but we also need to skip local scope when there's an
explicit namespace scope.

PR c++/69410

gcc/cp/ChangeLog:

* friend.cc (do_friend): Handle namespace as scope argument.
* decl.cc (grokdeclarator): Pass down in_namespace.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/friend24.C: New test.
---
 gcc/cp/decl.cc |  3 ++-
 gcc/cp/friend.cc   | 21 +
 gcc/testsuite/g++.dg/lookup/friend24.C |  9 +
 3 files changed, 28 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lookup/friend24.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 1d1ae022606..f5785339c5a 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14393,7 +14393,8 @@ grokdeclarator (const cp_declarator *declarator,
cplus_decl_attributes (, *attrlist, 0);
*attrlist = NULL_TREE;
 
-   decl = do_friend (ctype, unqualified_id, decl,
+   tree scope = ctype ? ctype : in_namespace;
+   decl = do_friend (scope, unqualified_id, decl,
  flags, funcdef_flag);
return decl;
  }
diff --git a/gcc/cp/friend.cc b/gcc/cp/friend.cc
index 2d9bd4bb4aa..b36de2b20bb 100644
--- a/gcc/cp/friend.cc
+++ b/gcc/cp/friend.cc
@@ -487,19 +487,32 @@ make_friend_class (tree type, tree friend_type, bool 
complain)
 }
 
 /* Record DECL (a FUNCTION_DECL) as a friend of the
-   CURRENT_CLASS_TYPE.  If DECL is a member function, CTYPE is the
+   CURRENT_CLASS_TYPE.  If DECL is a member function, SCOPE is the
class of which it is a member, as named in the friend declaration.
+   If the friend declaration was explicitly namespace-qualified, SCOPE
+   is that namespace.
DECLARATOR is the name of the friend.  FUNCDEF_FLAG is true if the
friend declaration is a definition of the function.  FLAGS is as
for grokclass fn.  */
 
 tree
-do_friend (tree ctype, tree declarator, tree decl,
+do_friend (tree scope, tree declarator, tree decl,
   enum overload_flags flags,
   bool funcdef_flag)
 {
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
-  gcc_assert (!ctype || MAYBE_CLASS_TYPE_P (ctype));
+
+  tree ctype = NULL_TREE;
+  tree in_namespace = NULL_TREE;
+  if (!scope)
+;
+  else if (MAYBE_CLASS_TYPE_P (scope))
+ctype = scope;
+  else
+{
+  gcc_checking_assert (TREE_CODE (scope) == NAMESPACE_DECL);
+  in_namespace = scope;
+}
 
   /* Friend functions are unique, until proved otherwise.  */
   DECL_UNIQUE_FRIEND_P (decl) = 1;
@@ -609,7 +622,7 @@ do_friend (tree ctype, tree declarator, tree decl,
   parameters.  Instead, we call pushdecl when the class
   is instantiated.  */
decl = push_template_decl (decl, /*is_friend=*/true);
- else if (current_function_decl)
+ else if (current_function_decl && !in_namespace)
/* pushdecl will check there's a local decl already.  */
decl = pushdecl (decl, /*hiding=*/true);
  else
diff --git a/gcc/testsuite/g++.dg/lookup/friend24.C 
b/gcc/testsuite/g++.dg/lookup/friend24.C
new file mode 100644
index 000..9a45410d2a7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/friend24.C
@@ -0,0 +1,9 @@
+// PR c++/69410
+
+void a();
+void f() {
+class A {
+friend void ::a();
+friend class Z;
+};
+}

base-commit: 103d423f6ce72ccb03d55b7b1dfa2dabd5854371
-- 
2.31.1



Re: [PATCH] tree-inline: Fix up multiversioning with vector arguments [PR105554]

2023-03-17 Thread Richard Biener via Gcc-patches



> Am 17.03.2023 um 18:48 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The following testcase ICEs, because we call tree_function_versioning from
> old_decl which has target attributes not supporting V4DImode and so
> DECL_MODE of DECL_ARGUMENTS is BLKmode, while new_decl supports those.
> tree_function_versioning initially copies DECL_RESULT and DECL_ARGUMENTS
> from old_decl to new_decl, then calls initialize_cfun to create cfun
> and only when the cfun is created it can later actually remap_decl
> DECL_RESULT and DECL_ARGUMENTS etc.
> The problem is that initialize_cfun -> push_struct_function ->
> allocate_struct_function calls relayout_decl on DECL_RESULT and
> DECL_ARGUMENTS, which clobbers DECL_MODE of old_decl and we then ICE because
> of it.
> In particular, allocate_struct_function does:
>  if (!abstract_p)
>{
>  /* Now that we have activated any function-specific attributes
> that might affect layout, particularly vector modes, relayout
> each of the parameters and the result.  */
>  relayout_decl (result);
>  for (tree parm = DECL_ARGUMENTS (fndecl); parm;
>   parm = DECL_CHAIN (parm))
>relayout_decl (parm);
> 
>  /* Similarly relayout the function decl.  */
>  targetm.target_option.relayout_function (fndecl);
>}
> 
>  if (!abstract_p && aggregate_value_p (result, fndecl))
>{
> #ifdef PCC_STATIC_STRUCT_RETURN
>  cfun->returns_pcc_struct = 1;
> #endif
>  cfun->returns_struct = 1;
>}
> Now, in the case of tree_function_versioning, I believe all that we need
> from these is possibly the
> targetm.target_option.relayout_function (fndecl);
> call (arm only), we will remap DECL_RESULT and DECL_ARGUMENTS later on
> and copy_decl_for_dup_finish in that case will handle all we need:
>  /* For vector typed decls make sure to update DECL_MODE according
> to the new function context.  */
>  if (VECTOR_TYPE_P (TREE_TYPE (copy)))
>SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));
> We don't need the cfun->returns_*struct either, because we override it
> in initialize_cfun a few lines later:
>  /* Copy items we preserve during cloning.  */
> ...
>  cfun->returns_struct = src_cfun->returns_struct;
>  cfun->returns_pcc_struct = src_cfun->returns_pcc_struct;
> 
> So, to avoid the clobbering of DECL_RESULT/DECL_ARGUMENTS of old_decl,
> the following patch arranges allocate_struct_function to be called with
> abstract_p true and calls targetm.target_option.relayout_function (fndecl);
> by hand.
> 
> The removal of DECL_RESULT/DECL_ARGUMENTS copying at the start of
> initialize_cfun is removed because the only caller -
> tree_function_versioning, does that unconditionally before.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard 

> 2023-03-17  Jakub Jelinek  
> 
>PR target/105554
>* function.h (push_struct_function): Add ABSTRACT_P argument defaulted
>to false.
>* function.cc (push_struct_function): Add ABSTRACT_P argument, pass it
>to allocate_struct_function instead of false.
>* tree-inline.cc (initialize_cfun): Don't copy DECL_ARGUMENTS
>nor DECL_RESULT here.  Pass true as ABSTRACT_P to
>push_struct_function.  Call targetm.target_option.relayout_function
>after it.
>(tree_function_versioning): Formatting fix.
> 
>* gcc.target/i386/pr105554.c: New test.
> 
> --- gcc/function.h.jj2023-01-12 21:04:08.647239183 +0100
> +++ gcc/function.h2023-03-17 10:47:27.489571469 +0100
> @@ -687,7 +687,7 @@ extern void pop_cfun (void);
> extern int get_next_funcdef_no (void);
> extern int get_last_funcdef_no (void);
> extern void allocate_struct_function (tree, bool);
> -extern void push_struct_function (tree fndecl);
> +extern void push_struct_function (tree fndecl, bool = false);
> extern void push_dummy_function (bool);
> extern void pop_dummy_function (void);
> extern void init_dummy_function_start (void);
> --- gcc/function.cc.jj2023-01-14 10:00:20.308560059 +0100
> +++ gcc/function.cc2023-03-17 10:48:10.009945483 +0100
> @@ -4891,7 +4891,7 @@ allocate_struct_function (tree fndecl, b
>instead of just setting it.  */
> 
> void
> -push_struct_function (tree fndecl)
> +push_struct_function (tree fndecl, bool abstract_p)
> {
>   /* When in_dummy_function we might be in the middle of a pop_cfun and
>  current_function_decl and cfun may not match.  */
> @@ -4900,7 +4900,7 @@ push_struct_function (tree fndecl)
>  || (cfun && current_function_decl == cfun->decl));
>   cfun_stack.safe_push (cfun);
>   current_function_decl = fndecl;
> -  allocate_struct_function (fndecl, false);
> +  allocate_struct_function (fndecl, abstract_p);
> }
> 
> /* Reset crtl and other non-struct-function variables to defaults as
> --- gcc/tree-inline.cc.jj2023-01-19 09:58:51.037226790 +0100
> +++ gcc/tree-inline.cc2023-03-17 10:56:50.321285330 +0100
> @@ -2781,16 +2781,12 @@ 

[PATCH] c++: Drop TREE_READONLY on vars (possibly) initialized by tls wrapper [PR109164]

2023-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

The following two testcases are miscompiled, because we keep TREE_READONLY
on the vars even when they are (possibly) dynamically initialized by a TLS
wrapper function.  Normally cp_finish_decl drops TREE_READONLY from vars
which need dynamic initialization, but for TLS we do this kind of
initialization upon every access to those variables.  Keeping them
TREE_READONLY means e.g. PRE can hoist loads from those before loops
which contain the TLS wrapper calls, so we can access the TLS variables
before they are initialized.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-17  Jakub Jelinek  

PR c++/109164
* decl2.cc (get_tls_wrapper_fn): Clear TREE_READONLY on variables for
which a TLS wrapper is added.

* g++.dg/tls/thread_local13.C: New test.
* g++.dg/tls/thread_local13-aux.cc: New file.
* g++.dg/tls/thread_local14.C: New test.
* g++.dg/tls/thread_local14-aux.cc: New file.

--- gcc/cp/decl2.cc.jj  2023-03-07 21:20:31.800491531 +0100
+++ gcc/cp/decl2.cc 2023-03-17 12:20:11.960678291 +0100
@@ -3773,6 +3773,12 @@ get_tls_wrapper_fn (tree var)
   DECL_BEFRIENDING_CLASSES (fn) = var;
 
   set_global_binding (fn);
+
+  /* The variable now needs dynamic initialization by the wrapper
+function, we don't want to hoist accesses to it before the
+wrapper.  */
+  if (TREE_READONLY (var))
+   TREE_READONLY (var) = 0;
 }
   return fn;
 }
--- gcc/testsuite/g++.dg/tls/thread_local13.C.jj2023-03-17 
12:28:24.692427351 +0100
+++ gcc/testsuite/g++.dg/tls/thread_local13.C   2023-03-17 12:30:34.505519746 
+0100
@@ -0,0 +1,21 @@
+// PR c++/109164
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+// { dg-add-options tls }
+// { dg-require-effective-target tls_runtime }
+// { dg-additional-sources "thread_local13-aux.cc" }
+
+struct S { virtual void foo (); int s; };
+extern thread_local S 
+bool bar ();
+
+bool
+baz ()
+{
+  while (1)
+{
+  t.foo ();
+  if (!bar ())
+return false;
+}
+}
--- gcc/testsuite/g++.dg/tls/thread_local13-aux.cc.jj   2023-03-17 
12:28:28.721368058 +0100
+++ gcc/testsuite/g++.dg/tls/thread_local13-aux.cc  2023-03-17 
12:37:53.952070861 +0100
@@ -0,0 +1,35 @@
+// PR c++/109164
+
+struct S { virtual void foo (); int s; };
+extern bool baz ();
+
+void
+S::foo ()
+{
+  if (s != 42)
+__builtin_abort ();
+}
+
+S s;
+
+S &
+qux ()
+{
+  s.s = 42;
+  return s;
+}
+
+thread_local S  = qux ();
+
+bool
+bar ()
+{
+  return false;
+}
+
+int
+main ()
+{
+  if (baz ())
+__builtin_abort ();
+}
--- gcc/testsuite/g++.dg/tls/thread_local14.C.jj2023-03-17 
12:35:48.951905245 +0100
+++ gcc/testsuite/g++.dg/tls/thread_local14.C   2023-03-17 12:49:03.456249628 
+0100
@@ -0,0 +1,19 @@
+// PR c++/109164
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+// { dg-add-options tls }
+// { dg-require-effective-target tls_runtime }
+// { dg-additional-sources "thread_local14-aux.cc" }
+
+extern thread_local const int t;
+bool bar (int);
+
+bool
+baz ()
+{
+  while (1)
+{
+  if (!bar (t))
+return false;
+}
+}
--- gcc/testsuite/g++.dg/tls/thread_local14-aux.cc.jj   2023-03-17 
12:36:58.724881322 +0100
+++ gcc/testsuite/g++.dg/tls/thread_local14-aux.cc  2023-03-17 
12:48:53.914389421 +0100
@@ -0,0 +1,26 @@
+// PR c++/109164
+
+extern bool baz ();
+
+int
+qux ()
+{
+  return 42;
+}
+
+extern thread_local const int t = qux ();
+
+bool
+bar (int x)
+{
+  if (x != 42)
+__builtin_abort ();
+  return false;
+}
+
+int
+main ()
+{
+  if (baz ())
+__builtin_abort ();
+}

Jakub



[PATCH] tree-inline: Fix up multiversioning with vector arguments [PR105554]

2023-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs, because we call tree_function_versioning from
old_decl which has target attributes not supporting V4DImode and so
DECL_MODE of DECL_ARGUMENTS is BLKmode, while new_decl supports those.
tree_function_versioning initially copies DECL_RESULT and DECL_ARGUMENTS
from old_decl to new_decl, then calls initialize_cfun to create cfun
and only when the cfun is created it can later actually remap_decl
DECL_RESULT and DECL_ARGUMENTS etc.
The problem is that initialize_cfun -> push_struct_function ->
allocate_struct_function calls relayout_decl on DECL_RESULT and
DECL_ARGUMENTS, which clobbers DECL_MODE of old_decl and we then ICE because
of it.
In particular, allocate_struct_function does:
  if (!abstract_p)
{
  /* Now that we have activated any function-specific attributes
 that might affect layout, particularly vector modes, relayout
 each of the parameters and the result.  */
  relayout_decl (result);
  for (tree parm = DECL_ARGUMENTS (fndecl); parm;
   parm = DECL_CHAIN (parm))
relayout_decl (parm);

  /* Similarly relayout the function decl.  */
  targetm.target_option.relayout_function (fndecl);
}

  if (!abstract_p && aggregate_value_p (result, fndecl))
{
#ifdef PCC_STATIC_STRUCT_RETURN
  cfun->returns_pcc_struct = 1;
#endif
  cfun->returns_struct = 1;
}
Now, in the case of tree_function_versioning, I believe all that we need
from these is possibly the
targetm.target_option.relayout_function (fndecl);
call (arm only), we will remap DECL_RESULT and DECL_ARGUMENTS later on
and copy_decl_for_dup_finish in that case will handle all we need:
  /* For vector typed decls make sure to update DECL_MODE according
 to the new function context.  */
  if (VECTOR_TYPE_P (TREE_TYPE (copy)))
SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));
We don't need the cfun->returns_*struct either, because we override it
in initialize_cfun a few lines later:
  /* Copy items we preserve during cloning.  */
...
  cfun->returns_struct = src_cfun->returns_struct;
  cfun->returns_pcc_struct = src_cfun->returns_pcc_struct;

So, to avoid the clobbering of DECL_RESULT/DECL_ARGUMENTS of old_decl,
the following patch arranges allocate_struct_function to be called with
abstract_p true and calls targetm.target_option.relayout_function (fndecl);
by hand.

The removal of DECL_RESULT/DECL_ARGUMENTS copying at the start of
initialize_cfun is removed because the only caller -
tree_function_versioning, does that unconditionally before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-17  Jakub Jelinek  

PR target/105554
* function.h (push_struct_function): Add ABSTRACT_P argument defaulted
to false.
* function.cc (push_struct_function): Add ABSTRACT_P argument, pass it
to allocate_struct_function instead of false.
* tree-inline.cc (initialize_cfun): Don't copy DECL_ARGUMENTS
nor DECL_RESULT here.  Pass true as ABSTRACT_P to
push_struct_function.  Call targetm.target_option.relayout_function
after it.
(tree_function_versioning): Formatting fix.

* gcc.target/i386/pr105554.c: New test.

--- gcc/function.h.jj   2023-01-12 21:04:08.647239183 +0100
+++ gcc/function.h  2023-03-17 10:47:27.489571469 +0100
@@ -687,7 +687,7 @@ extern void pop_cfun (void);
 extern int get_next_funcdef_no (void);
 extern int get_last_funcdef_no (void);
 extern void allocate_struct_function (tree, bool);
-extern void push_struct_function (tree fndecl);
+extern void push_struct_function (tree fndecl, bool = false);
 extern void push_dummy_function (bool);
 extern void pop_dummy_function (void);
 extern void init_dummy_function_start (void);
--- gcc/function.cc.jj  2023-01-14 10:00:20.308560059 +0100
+++ gcc/function.cc 2023-03-17 10:48:10.009945483 +0100
@@ -4891,7 +4891,7 @@ allocate_struct_function (tree fndecl, b
instead of just setting it.  */
 
 void
-push_struct_function (tree fndecl)
+push_struct_function (tree fndecl, bool abstract_p)
 {
   /* When in_dummy_function we might be in the middle of a pop_cfun and
  current_function_decl and cfun may not match.  */
@@ -4900,7 +4900,7 @@ push_struct_function (tree fndecl)
  || (cfun && current_function_decl == cfun->decl));
   cfun_stack.safe_push (cfun);
   current_function_decl = fndecl;
-  allocate_struct_function (fndecl, false);
+  allocate_struct_function (fndecl, abstract_p);
 }
 
 /* Reset crtl and other non-struct-function variables to defaults as
--- gcc/tree-inline.cc.jj   2023-01-19 09:58:51.037226790 +0100
+++ gcc/tree-inline.cc  2023-03-17 10:56:50.321285330 +0100
@@ -2781,16 +2781,12 @@ initialize_cfun (tree new_fndecl, tree c
 {
   struct function *src_cfun = DECL_STRUCT_FUNCTION (callee_fndecl);
 
-  if (!DECL_ARGUMENTS (new_fndecl))
-DECL_ARGUMENTS (new_fndecl) = DECL_ARGUMENTS 

Re: [PATCH] [rs6000] adjust return_pc debug attrs

2023-03-17 Thread Segher Boessenkool
On Fri, Mar 17, 2023 at 10:33:22AM -0600, Tom Tromey wrote:
> > "Segher" == Segher Boessenkool  writes:
> 
> Segher> Yes.  On most architectures you can get multiple machine instructions 
> of
> Segher> course (for long calls for example), but on rs6000 (with some ABIs, in
> Segher> some circumstances) we generate a nop insn after calls, so that the
> Segher> linker has a spot to insert fixup code after calls (typically to 
> restore
> Segher> the r2 contents, but it could be anything).
> 
> FWIW I sent a gdb patch to work around this bug.  However, in my
> examples, I only ever saw a nop following the call instruction -- so I
> had gdb check for this.

GCC inserts just a nop in most cases, but the linker or dynamic linker
can replace it.
> Patch is here:
> 
> https://sourceware.org/pipermail/gdb-patches/2023-March/197951.html
> 
> ... but I suppose I should change it to drop the nop check?
> 
> It would of course be better not to have to have gdb work around this
> problem.

Yup.


Segher


Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-03-17 Thread Palmer Dabbelt

On Tue, 14 Mar 2023 10:48:24 PDT (-0700), gcc-patches@gcc.gnu.org wrote:



On 2/23/23 21:04, Kito Cheng wrote:

Hi Jeff:


What I'd been planning to do internally at Ventana was to update our
codebase to gcc-13 once it's released.  Then I'd backport RVV autovec
work from the gcc-14 dev tree into that Ventana branch.

Instead, but along the same lines, we could have a public gcc-13 based
branch which follows that same process and where Rivos, SiFive, Rivai,
Ventana (and potentially others with an interest in this space) could
collaborate.  Essentially it'd be gcc-13 + RVV autovec support.  We'd
probably have to hash out a bit of policy with the shared branch, but
I'd like to think we could make it work.


+1, I like the idea, I could imagine we definitely will do the same
work more than four times by different companies if we don't have a
collaboration branch...

So it looks like there's a general sense that a coordination branch off
gcc-13 is reasonable.  So I'd like to hammer out a few details.


First, I recommend we cut a branch from gcc-13 soon after gcc-13
branches.  That way we've got a place to land the vector work.

Second, I recommend we rebase that branch periodically so that it
follows gcc-13.  That means downstream consumers may have non-ff pulls,
but I think we want to follow gcc-13 fairly closely.  I'm open to other
approaches here.

Third, I was thinking that once a patch related to risc-v vectorization
goes to the trunk, any one of the principals should be able to
cherry-pick that patch onto our branch.


I'm a little bit confused about what the proposal is here: is the idea 
to have a branch based on gcc-13 where we coordinate work before it 
lands on trunk, or a branch based on gcc-13 where we backport 
autovec-related patches once they've landed on trunk?  In my mind those 
are actually two different things and I think they're both useful, maybe 
we should just do both?


Having a shared work-in-progress branch for the autovec stuff makes 
sense to me: it's a big patch set with engineers at multiple companies 
working on it, so having a shared patch stack should help with the 
coordination.  That branch will need to get re-written as patches get 
reviewed/merged, so having it rebase seems reasonable.  I'd have the 
branch based on trunk, as that's the eventual target for the patches, 
but trunk can be unstable so maybe that'll be too much of a headache.


For pretty much every other GCC release we've ended up with a "extra 
RISC-V backports" branch, where we end up with some patches that aren't 
suitable for proper upstream backports (usually because they're a 
performance improvement).  We've always talked about doing that as a FSF 
vendor branch, but I don't think we really ever got organized enough to 
do it.  We're doing that internally anyway at Rivos and I'd bet everyone 
else is too, it'd be great to find some way to share as much of that 
work as we can.


It's sort of a headache to just propose doing everything, but in this 
case I think we're going to end up with various flavors of both of these 
branches internally at the various companies so we might as well just 
try and do that in public where we can.



That implies we need to identify the principals.  I'll suggest Kito,
Juzhe, Michael and myself as the initial list.  I'm certainly open to
others joining.


+Vineet, who's been handling our internal GCC branches.

We'll still have internal branches for 13 regardless of how the autovec 
stuff proceeds, but having any sort of upstream backport branch will 
make life easier as we'll be able to share some of that work.



Other thoughts or suggestions?


Sorry if that throws a bit of a wrench in the works.

Just for context: in Rivos land we don't have any specific timelines 
around 13, so the goal on our end is just to keep the vectorization 
stuff progressing smoothly as we spin up more engineering resources on 
it.  Our aim is just to get everything on trunk eventually, anything 
else is just a stop-gap and we can work around it (though sharing that 
work is always a win).




Jeff


Re: [PATCH] testsuite: Handle default_packed targets in gcc.dg/plugin

2023-03-17 Thread David Malcolm via Gcc-patches
On Fri, 2023-03-17 at 17:10 +0100, Hans-Peter Nilsson wrote:
> > From: David Malcolm 
> > Date: Thu, 16 Mar 2023 14:42:58 -0400
> 
> > I think I prefer the top one-liner dg-skip-if approach you
> > mentioned in
> > your original email; it seems simplest.
> 
> Ok then.  There's also a choice between adding a
> target-specifier (i.e. "{ target { ! default_packed } }") to
> the dg-compile line, but a dg-skip-if is somewhat more
> scalable and can, like here, be a bit more informative.
> 
> This is what I'll commit, as "testsuite: Skip some
> gcc.dg/plugin tests for default_packed targets":

Thanks!
Dave



Re: [PATCH] [rs6000] adjust return_pc debug attrs

2023-03-17 Thread Tom Tromey via Gcc-patches
> "Segher" == Segher Boessenkool  writes:

Segher> Yes.  On most architectures you can get multiple machine instructions of
Segher> course (for long calls for example), but on rs6000 (with some ABIs, in
Segher> some circumstances) we generate a nop insn after calls, so that the
Segher> linker has a spot to insert fixup code after calls (typically to restore
Segher> the r2 contents, but it could be anything).

FWIW I sent a gdb patch to work around this bug.  However, in my
examples, I only ever saw a nop following the call instruction -- so I
had gdb check for this.

Patch is here:

https://sourceware.org/pipermail/gdb-patches/2023-March/197951.html

... but I suppose I should change it to drop the nop check?

It would of course be better not to have to have gdb work around this
problem.

Tom


Re: [PATCH] testsuite: Handle default_packed targets in gcc.dg/plugin

2023-03-17 Thread Hans-Peter Nilsson via Gcc-patches
> From: David Malcolm 
> Date: Thu, 16 Mar 2023 14:42:58 -0400

> I think I prefer the top one-liner dg-skip-if approach you mentioned in
> your original email; it seems simplest.

Ok then.  There's also a choice between adding a
target-specifier (i.e. "{ target { ! default_packed } }") to
the dg-compile line, but a dg-skip-if is somewhat more
scalable and can, like here, be a bit more informative.

This is what I'll commit, as "testsuite: Skip some
gcc.dg/plugin tests for default_packed targets":

Avoid unweildy structure-layout-specific message-matching
expressions by exluding targets that lay out structures as
if they had been specified with __attribute__ ((__packed__)),
for tests where multiple messages depend on the structure
layout.

It's arguably a judgement call whether to skip some of these
tests or add multiple lines of matches depending on the
layout of structures.

* gcc.dg/plugin/infoleak-2.c,
gcc.dg/plugin/infoleak-CVE-2011-1078-1.c,
gcc.dg/plugin/infoleak-CVE-2011-1078-2.c,
gcc.dg/plugin/infoleak-CVE-2017-18549-1.c,
gcc.dg/plugin/infoleak-CVE-2017-18550-1.c,
gcc.dg/plugin/infoleak-antipatterns-1.c,
gcc.dg/plugin/infoleak-fixit-1.c: Skip for default_packed targets.
---
 gcc/testsuite/gcc.dg/plugin/infoleak-2.c| 1 +
 gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-1.c  | 1 +
 gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-2.c  | 1 +
 gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18549-1.c | 1 +
 gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18550-1.c | 1 +
 gcc/testsuite/gcc.dg/plugin/infoleak-antipatterns-1.c   | 1 +
 gcc/testsuite/gcc.dg/plugin/infoleak-fixit-1.c  | 1 +
 7 files changed, 7 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-2.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-2.c
index 252f8f25918a..43ab41b8a97b 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-skip-if "structure layout assumption not met" { default_packed } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-1.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-1.c
index 3616fbe176b3..8afce8eefac2 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-1.c
@@ -9,6 +9,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-skip-if "structure layout assumption not met" { default_packed } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-2.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-2.c
index 2096bda71798..1142cf22655a 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2011-1078-2.c
@@ -3,6 +3,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-skip-if "structure layout assumption not met" { default_packed } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18549-1.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18549-1.c
index 8a1c816cc1b5..239c7d1df5d3 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18549-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18549-1.c
@@ -10,6 +10,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-skip-if "structure layout assumption not met" { default_packed } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18550-1.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18550-1.c
index 4272da96bab0..449348a1017f 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18550-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-CVE-2017-18550-1.c
@@ -10,6 +10,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-skip-if "structure layout assumption not met" { default_packed } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-antipatterns-1.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-antipatterns-1.c
index 500845364388..84789a7f157e 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-antipatterns-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-antipatterns-1.c
@@ -3,6 +3,7 @@
 /* { dg-do compile } */
 /* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-skip-if "structure layout assumption not met" { default_packed } } */
 
 typedef unsigned char u8;
 typedef unsigned __INT16_TYPE__ u16;
diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-fixit-1.c 
b/gcc/testsuite/gcc.dg/plugin/infoleak-fixit-1.c
index 6961b44f76b9..56158c12520a 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-fixit-1.c
+++ 

Re: [PATCH] c++: NTTP constraint depending on outer args [PR109160]

2023-03-17 Thread Patrick Palka via Gcc-patches
On Fri, 17 Mar 2023, Patrick Palka wrote:

> Here we're crashing during satisfaction for the NTTP 'C auto' from
> do_auto_deduction ultimately because convert_template_argument / unify
> don't pass all outer template arguments to do_auto_deduction, and during
> satisfaction we need to know all arguments.  While these callers do
> pass some outer arguments, they are only sufficient to properly
> substitute the 'auto' and are not necessarily the complete set.
> 
> Fortunately it seems it's possible to obtain the full set of outer
> arguments from these callers via convert_template_argument's IN_DECL
> parameter and unify's TPARMS parameter.  So this patch adds a TMPL
> parameter to do_auto_deduction, used only during adc_unify deduction,
> which contains the (partially instantiated) template corresponding to
> this auto and from which we can obtain all outer template arguments for
> satisfaction.
> 
> This patch also adjusts the IN_DECL argument passed to
> coerce_template_parms from tsubst_decl so that we could in turn safely
> assume convert_template_argument's IN_DECL is always a TEMPLATE_DECL,
> and thus could pass it as-is to do_auto_deduction.  (tsubst_decl seems
> to be the only caller that passes a non-empty non-template IN_DECL to
> coerce_template_parms.)
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/12?
> 
>   PR c++/109160
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (do_auto_deduction): Add defaulted TMPL parameter.
>   * pt.cc (convert_template_argument): Pass IN_DECL as TMPL to
>   do_auto_deduction.
>   (tsubst_decl) : Pass TMPL instead of T as
>   IN_DECL to coerce_template_parms.
>   (unify) : Pass the corresponding
>   template as TMPL to do_auto_deduction.
>   (do_auto_deduction): Document default arguments.  Use TMPL
>   to obtain a full set of template arguments for satisfaction
>   in the adc_unify case.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-placeholder12.C: New test.
> ---
>  gcc/cp/cp-tree.h  |  3 +-
>  gcc/cp/pt.cc  | 30 ++-
>  .../g++.dg/cpp2a/concepts-placeholder12.C | 29 ++
>  3 files changed, 53 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder12.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index dfc1c845768..e7190c5cc62 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7324,7 +7324,8 @@ extern tree do_auto_deduction   (tree, 
> tree, tree,
>   auto_deduction_context
>= adc_unspecified,
>tree = NULL_TREE,
> -  int = LOOKUP_NORMAL);
> +  int = LOOKUP_NORMAL,
> +  tree = NULL_TREE);
>  extern tree type_uses_auto   (tree);
>  extern tree type_uses_auto_or_concept(tree);
>  extern void append_type_to_template_for_access_check (tree, tree, tree,
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index ddbd73371b9..6400b686a58 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -8638,7 +8638,7 @@ convert_template_argument (tree parm,
>else if (tree a = type_uses_auto (t))
>   {
> t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
> -  LOOKUP_IMPLICIT);
> +  LOOKUP_IMPLICIT, in_decl);
> if (t == error_mark_node)
>   return error_mark_node;
>   }
> @@ -15243,7 +15243,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t 
> complain)
>the template.  */
> argvec = (coerce_template_parms
>   (DECL_TEMPLATE_PARMS (gen_tmpl),
> -  argvec, t, complain));
> +  argvec, tmpl, complain));
>   if (argvec == error_mark_node)
> RETURN (error_mark_node);
>   hash = spec_hasher::hash (gen_tmpl, argvec);
> @@ -24655,7 +24655,9 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
> int strict,
> if (tree a = type_uses_auto (tparm))
>   {
> tparm = do_auto_deduction (tparm, arg, a,
> -  complain, adc_unify, targs);
> +  complain, adc_unify, targs,
> +  LOOKUP_NORMAL,
> +  TPARMS_PRIMARY_TEMPLATE (tparms));
> if (tparm == error_mark_node)
>   return 1;
>   }
> @@ -30643,13 +30645,20 @@ unparenthesized_id_or_class_member_access_p (tree 
> init)
> adc_requirement contexts to communicate the necessary template arguments
> to satisfaction.  OUTER_TARGS is 

[PATCH] c++: NTTP constraint depending on outer args [PR109160]

2023-03-17 Thread Patrick Palka via Gcc-patches
Here we're crashing during satisfaction for the NTTP 'C auto' from
do_auto_deduction ultimately because convert_template_argument / unify
don't pass all outer template arguments to do_auto_deduction, and during
satisfaction we need to know all arguments.  While these callers do
pass some outer arguments, they are only sufficient to properly
substitute the 'auto' and are not necessarily the complete set.

Fortunately it seems it's possible to obtain the full set of outer
arguments from these callers via convert_template_argument's IN_DECL
parameter and unify's TPARMS parameter.  So this patch adds a TMPL
parameter to do_auto_deduction, used only during adc_unify deduction,
which contains the (partially instantiated) template corresponding to
this auto and from which we can obtain all outer template arguments for
satisfaction.

This patch also adjusts the IN_DECL argument passed to
coerce_template_parms from tsubst_decl so that we could in turn safely
assume convert_template_argument's IN_DECL is always a TEMPLATE_DECL,
and thus could pass it as-is to do_auto_deduction.  (tsubst_decl seems
to be the only caller that passes a non-empty non-template IN_DECL to
coerce_template_parms.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12?

PR c++/109160

gcc/cp/ChangeLog:

* cp-tree.h (do_auto_deduction): Add defaulted TMPL parameter.
* pt.cc (convert_template_argument): Pass IN_DECL as TMPL to
do_auto_deduction.
(tsubst_decl) : Pass TMPL instead of T as
IN_DECL to coerce_template_parms.
(unify) : Pass the corresponding
template as TMPL to do_auto_deduction.
(do_auto_deduction): Document default arguments.  Use TMPL
to obtain a full set of template arguments for satisfaction
in the adc_unify case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-placeholder12.C: New test.
---
 gcc/cp/cp-tree.h  |  3 +-
 gcc/cp/pt.cc  | 30 ++-
 .../g++.dg/cpp2a/concepts-placeholder12.C | 29 ++
 3 files changed, 53 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder12.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index dfc1c845768..e7190c5cc62 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7324,7 +7324,8 @@ extern tree do_auto_deduction   (tree, 
tree, tree,
  auto_deduction_context
 = adc_unspecified,
 tree = NULL_TREE,
-int = LOOKUP_NORMAL);
+int = LOOKUP_NORMAL,
+tree = NULL_TREE);
 extern tree type_uses_auto (tree);
 extern tree type_uses_auto_or_concept  (tree);
 extern void append_type_to_template_for_access_check (tree, tree, tree,
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ddbd73371b9..6400b686a58 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -8638,7 +8638,7 @@ convert_template_argument (tree parm,
   else if (tree a = type_uses_auto (t))
{
  t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
-LOOKUP_IMPLICIT);
+LOOKUP_IMPLICIT, in_decl);
  if (t == error_mark_node)
return error_mark_node;
}
@@ -15243,7 +15243,7 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 the template.  */
  argvec = (coerce_template_parms
(DECL_TEMPLATE_PARMS (gen_tmpl),
-argvec, t, complain));
+argvec, tmpl, complain));
if (argvec == error_mark_node)
  RETURN (error_mark_node);
hash = spec_hasher::hash (gen_tmpl, argvec);
@@ -24655,7 +24655,9 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
  if (tree a = type_uses_auto (tparm))
{
  tparm = do_auto_deduction (tparm, arg, a,
-complain, adc_unify, targs);
+complain, adc_unify, targs,
+LOOKUP_NORMAL,
+TPARMS_PRIMARY_TEMPLATE (tparms));
  if (tparm == error_mark_node)
return 1;
}
@@ -30643,13 +30645,20 @@ unparenthesized_id_or_class_member_access_p (tree 
init)
adc_requirement contexts to communicate the necessary template arguments
to satisfaction.  OUTER_TARGS is ignored in other contexts.
 
-   For partial-concept-ids, extra args may be appended to the list of deduced
-   template arguments prior to determining constraint satisfaction.  

Re: [PATCH] c, ubsan: Instrument even shortened divisions [PR109151]

2023-03-17 Thread Marek Polacek via Gcc-patches
On Fri, Mar 17, 2023 at 09:14:04AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> On the following testcase, the C FE decides to shorten the division because
> it has a guarantee that INT_MIN / -1 division won't be encountered, the
> first operand is widened from narrower unsigned and/or the second operand is
> a constant other than all ones (in this case both are true).
> The problem is that the narrower type in this case is _Bool and
> ubsan_instrument_division only instruments it if op0's type is INTEGER_TYPE
> or REAL_TYPE.  Strangely this doesn't happen in C++ FE.

I was curious.  The difference is because C++ passes this

(gdb) pge op0
(int) (short int) (VIEW_CONVERT_EXPR(d) == 1 | VIEW_CONVERT_EXPR(d) > 
9)

to shorten_binary_op while C passes:

(gdb) pge op0
(int) (<<< Unknown tree: c_maybe_const_expr
  
  d >>> == 1 || <<< Unknown tree: c_maybe_const_expr
  
  d >>> > 9)

so when we remove the '(int)' cast, we have different types underneath,
either short or bool.

In C, the BIT_IOR_EXPR -> TRUTH_OR_EXPR change is because we call c_convert ->
convert_to_integer -> do_narrow.
In C++, we never called do_narrow so shorten_binary_op gets the original tree.

Anyway, thanks for the patch.

Marek



Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Mar 2023, Jakub Jelinek wrote:

> On Fri, Mar 17, 2023 at 01:55:51PM +, Richard Biener wrote:
> > > Anyway, I think it is fine to implement __builtin_expect this way
> > > for now, ERF_RETURNS_ARG will be more important for pointers, especially 
> > > if
> > > we propagate something more than just maybe be/can't be/must be null.
> > > Don't you need to handle BUILT_IN_EXPECT_WITH_PROBABILITY the same though?
> > 
> > Yes, BUILT_IN_ASSUME_ALIGNED would be another candidate.
> 
> BUILT_IN_ASSUME_ALIGNED doesn't do that intentionally.
> E.g. tree-object-size.cc (pass_through_call) comments on this:
>   unsigned rf = gimple_call_return_flags (call);
>   if (rf & ERF_RETURNS_ARG)
> {
>   unsigned argnum = rf & ERF_RETURN_ARG_MASK;
>   if (argnum < gimple_call_num_args (call))
> return gimple_call_arg (call, argnum);
> }
> 
>   /* __builtin_assume_aligned is intentionally not marked RET1.  */
>   if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED))
> return gimple_call_arg (call, 0);
> The reason is that we don't want passes to propagate the argument to the
> return value but use a different SSA_NAME there, so that we can have
> there different alignment info.

Only that we now _do_ have BUILT_IN_ASSUME_ALIGNED marked as RET1 ...

> And as you show on the testcases, it probably isn't a good idea for
> BUILT_IN_EXPECT* either.
> 
> So, perhaps use op_cfn_pass_through_arg1 for the ERF_RETURNS_ARG functions
> and BUILT_IN_EXPECT* ?

But that already causes the problems (I didn't yet finish testing
adding RET1 to BUILT_IN_EXPECT*).  Note FRE currently does not use
returns_arg but I have old patches that do - but those replace
uses after a RET1 function with the return value to also reduce
spilling around a call (they of course CSE equal calls).

Anyway, I suspect the ssa-lim-21.c testcase is just very fragile
while the predict-20.c shows a case which should be handled
better even without the __builtin_expect (which is now gone):

  for (;;)
{
  if (__builtin_expect (b < 0, 0))
break;
  if (__builtin_expect (a, 1))
break;
}
  int d = b < 0;
  if (__builtin_expect (d, 0))
asm("");

we should be able to handle

 _1 = PHI <1, 0>
 if (_1 != 0)

by copying the probabilities from the incoming PHI edges here,
possibly even by using ranger to test for known condition
on some edges.

Of course it's bad to regress more here.  I'm considering
backing out the -Wuse-after-free changes.

Richard.


> > >From feb846cbff9774125d8401dfeacd8a4b9c2dccfa Mon Sep 17 00:00:00 2001
> > From: Richard Biener 
> > Date: Fri, 17 Mar 2023 13:14:49 +0100
> > Subject: [PATCH] tree-optimization/109170 - bogus use-after-free with
> >  __builtin_expect
> > To: gcc-patches@gcc.gnu.org
> > 
> > The following adds a missing range-op for __builtin_expect which
> > helps -Wuse-after-free to detect the case a realloc original
> > pointer is used when the result was NULL.  The implementation
> > should handle all argument one pass-through builtins we handle
> > in the fnspec machinery.
> > 
> > tree-optimization/109170
> > * gimple-range-op.cc (cfn_pass_through_arg1): New.
> > (gimple_range_op_handler::maybe_builtin_call): Handle
> > __builtin_expect and similar via cfn_pass_through_arg1
> > and inspecting the calls fnspec.
> > * builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
> > and BUILT_IN_EXPECT_WITH_PROBABILITY.
> > 
> > * gcc.dg/Wuse-after-free-pr109170.c: New testcase.
> > ---
> >  gcc/builtins.cc   |  2 ++
> >  gcc/gimple-range-op.cc| 32 ++-
> >  .../gcc.dg/Wuse-after-free-pr109170.c | 15 +
> >  3 files changed, 48 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> > 
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index 90246e214d6..56545027297 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -11715,6 +11715,8 @@ builtin_fnspec (tree callee)
> >case BUILT_IN_RETURN_ADDRESS:
> > return ".c";
> >case BUILT_IN_ASSUME_ALIGNED:
> > +  case BUILT_IN_EXPECT:
> > +  case BUILT_IN_EXPECT_WITH_PROBABILITY:
> > return "1cX ";
> >/* But posix_memalign stores a pointer into the memory pointed to
> >  by its first argument.  */
> > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> > index a5d625387e7..1a00f1690e5 100644
> > --- a/gcc/gimple-range-op.cc
> > +++ b/gcc/gimple-range-op.cc
> > @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "range.h"
> >  #include "value-query.h"
> >  #include "gimple-range.h"
> > +#include "attr-fnspec.h"
> >  
> >  // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
> >  // on the statement.  For efficiency, it is an error to not pass in enough
> > @@ -309,6 +310,26 @@ public:
> >}
> >  } op_cfn_constant_p;
> >  
> > +// Implement 

Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Andrew MacLeod via Gcc-patches



On 3/17/23 04:31, Richard Biener wrote:

On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  wrote:

For this C testcase:

void g();
void f(unsigned int *a)
{
   if (++*a == 1)
 g();
}

GCC will currently emit a comparison with 1 by using the value
of *a after the increment. This can be improved by comparing
against 0 and using the value before the increment. As a result
there is a potentially shorter dependancy chain (no need to wait
for the result of +1) and on targets with compare zero instructions
the generated code is one instruction shorter.

The downside is we now need two registers and their lifetime overlaps.

Your patch mixes changing / inverting a parameter (which seems unneeded
for the actual change) with preferring compares against zero.

What's the reason to specifically prefer compares against zero?  On x86
we have add that sets flags, so ++*a == 0 would be preferred, but
for your sequence we'd need a test reg, reg; branch on zero, so we do
not save any instruction.

We do have quite some number of bugreports with regards to making VRPs
life harder when splitting things this way.  It's easier for VRP to handle

   _1 = _2 + 1;
   if (_1 == 1)

than it is

   _1 = _2 + 1;
   if (_2 == 0)

where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides



Heh?

    _1 = *a_5(D);
    b_6 = _1 + 1;
    if (_1 == 0)
  goto ; [INV]
    else
  goto ; [INV]

2->3  (T) _1 :  [irange] unsigned int [0, 0] NONZERO 0x0
2->3  (T) b_6 : [irange] unsigned int [1, 1] NONZERO 0x1
2->4  (F) _1 :  [irange] unsigned int [1, +INF]
2->4  (F) b_6 : [irange] unsigned int [0, 0][2, +INF]

I will grant you that if the definition of b_6 is in a different basic 
block that if (_1 == 0) we may not always get a range for it,  but 
generally this should be OK?  especialyl within a basic block.


I do have a few re-computation cases to improve upon of course :-P

Andrew





Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 17, 2023 at 01:55:51PM +, Richard Biener wrote:
> > Anyway, I think it is fine to implement __builtin_expect this way
> > for now, ERF_RETURNS_ARG will be more important for pointers, especially if
> > we propagate something more than just maybe be/can't be/must be null.
> > Don't you need to handle BUILT_IN_EXPECT_WITH_PROBABILITY the same though?
> 
> Yes, BUILT_IN_ASSUME_ALIGNED would be another candidate.

BUILT_IN_ASSUME_ALIGNED doesn't do that intentionally.
E.g. tree-object-size.cc (pass_through_call) comments on this:
  unsigned rf = gimple_call_return_flags (call);
  if (rf & ERF_RETURNS_ARG)
{
  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
  if (argnum < gimple_call_num_args (call))
return gimple_call_arg (call, argnum);
}

  /* __builtin_assume_aligned is intentionally not marked RET1.  */
  if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED))
return gimple_call_arg (call, 0);
The reason is that we don't want passes to propagate the argument to the
return value but use a different SSA_NAME there, so that we can have
there different alignment info.

And as you show on the testcases, it probably isn't a good idea for
BUILT_IN_EXPECT* either.

So, perhaps use op_cfn_pass_through_arg1 for the ERF_RETURNS_ARG functions
and BUILT_IN_EXPECT* ?

> >From feb846cbff9774125d8401dfeacd8a4b9c2dccfa Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Fri, 17 Mar 2023 13:14:49 +0100
> Subject: [PATCH] tree-optimization/109170 - bogus use-after-free with
>  __builtin_expect
> To: gcc-patches@gcc.gnu.org
> 
> The following adds a missing range-op for __builtin_expect which
> helps -Wuse-after-free to detect the case a realloc original
> pointer is used when the result was NULL.  The implementation
> should handle all argument one pass-through builtins we handle
> in the fnspec machinery.
> 
>   tree-optimization/109170
>   * gimple-range-op.cc (cfn_pass_through_arg1): New.
>   (gimple_range_op_handler::maybe_builtin_call): Handle
>   __builtin_expect and similar via cfn_pass_through_arg1
>   and inspecting the calls fnspec.
>   * builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
>   and BUILT_IN_EXPECT_WITH_PROBABILITY.
> 
>   * gcc.dg/Wuse-after-free-pr109170.c: New testcase.
> ---
>  gcc/builtins.cc   |  2 ++
>  gcc/gimple-range-op.cc| 32 ++-
>  .../gcc.dg/Wuse-after-free-pr109170.c | 15 +
>  3 files changed, 48 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> 
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 90246e214d6..56545027297 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -11715,6 +11715,8 @@ builtin_fnspec (tree callee)
>case BUILT_IN_RETURN_ADDRESS:
>   return ".c";
>case BUILT_IN_ASSUME_ALIGNED:
> +  case BUILT_IN_EXPECT:
> +  case BUILT_IN_EXPECT_WITH_PROBABILITY:
>   return "1cX ";
>/* But posix_memalign stores a pointer into the memory pointed to
>by its first argument.  */
> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index a5d625387e7..1a00f1690e5 100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "range.h"
>  #include "value-query.h"
>  #include "gimple-range.h"
> +#include "attr-fnspec.h"
>  
>  // Given stmt S, fill VEC, up to VEC_SIZE elements, with relevant ssa-names
>  // on the statement.  For efficiency, it is an error to not pass in enough
> @@ -309,6 +310,26 @@ public:
>}
>  } op_cfn_constant_p;
>  
> +// Implement range operator for integral/pointer functions returning
> +// the first argument.
> +class cfn_pass_through_arg1 : public range_operator
> +{
> +public:
> +  using range_operator::fold_range;
> +  virtual bool fold_range (irange , tree, const irange ,
> +const irange &, relation_trio) const
> +  {
> +r = lh;
> +return true;
> +  }
> +  virtual bool op1_range (irange , tree, const irange ,
> +   const irange &, relation_trio) const
> +  {
> +r = lhs;
> +return true;
> +  }
> +} op_cfn_pass_through_arg1;
> +
>  // Implement range operator for CFN_BUILT_IN_SIGNBIT.
>  class cfn_signbit : public range_operator_float
>  {
> @@ -967,6 +988,15 @@ gimple_range_op_handler::maybe_builtin_call ()
>break;
>  
>  default:
> -  break;
> +  {
> + unsigned arg;
> + if (gimple_call_fnspec (call).returns_arg () && arg == 0)
> +   {
> + m_valid = true;
> + m_op1 = gimple_call_arg (call, 0);
> + m_int = _cfn_pass_through_arg1;
> +   }
> + break;
> +  }
>  }
>  }
> diff --git a/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c 
> b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> new file mode 100644
> index 000..fa7dc66d66c
> --- /dev/null
> +++ 

Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, Mar 17, 2023 at 2:15 PM Philipp Tomsich
 wrote:
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener  
> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependancy chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
>
> AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
> comparing against a constant requires to load any non-zero value into
> a register first.
> This feels a bit like we need to call onto the backend to check
> whether comparisons against 0 are cheaper.

Hmm - we should try hard to not introduce too many target dependent IL
differences early.  I do wonder if it's possible to handle this later
(and also the reverse transform).  Ideally we'd know register pressure
and the cost of the constants involved as well.  We do some related
"fixups" when rewriting out-of-SSA, mainly to improve coalescing
around backedges and avoiding overlapping life ranges there for loop
exit compares with live before/after IVs after the loop.

> Obviously, the underlying issue become worse if the immediate can not
> be built up in a single instruction.
> Using RISC-V as an example (primarily, as RISC-V makes it particularly
> easy to run into multi-instruction sequences for constants), we can
> construct the following case:
>
>   void f(unsigned int *a)
>   {
> if ((*a += 0x900) == 0x900)
>g();
>   }
>
> which GCC 12.2.0 (trunk may already be small enough to reuse the
> constant once loaded into register, but I did not check…) with -O3
> turns into:
>
> f:
>   lw a4,0(a0)
>   li a5,4096
>   addiw a5,a5,-1792
>   addw a4,a5,a4
>   li a5,4096
>   sw a4,0(a0)
>   addi a5,a5,-1792
>   beq a4,a5,.L4
>   ret
> .L4:
>   tail g
>
> Thanks,
> Philipp.
>
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener  
> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependancy chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
> >
> > We do have quite some number of bugreports with regards to making VRPs
> > life harder when splitting things this way.  It's easier for VRP to handle
> >
> >   _1 = _2 + 1;
> >   if (_1 == 1)
> >
> > than it is
> >
> >   _1 = _2 + 1;
> >   if (_2 == 0)
> >
> > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > the life-range issue there's other side-effects as well.  Maybe ranger 
> > meanwhile
> > can handle the above case?
> >
> > What's the overall effect of the change on a larger code base?
> >
> > Thanks,
> > Richard.
> >
> > >
> > > Example from Aarch64:
> > >
> > > Before
> > > ldr w1, [x0]
> > > add w1, w1, 1
> > > str w1, [x0]
> > > cmp w1, 1
> > > beq .L4
> > > ret
> > >
> > > After
> > > ldr w1, [x0]
> > > add w2, w1, 1
> > > str w2, [x0]
> > > cbz w1, .L4
> > > ret
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > > (forward_propagate_into_comparison_1): Optimize
> > > for zero comparisons.
> > >
> > 

Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Andrew MacLeod via Gcc-patches



On 3/17/23 08:59, Jakub Jelinek wrote:

On Fri, Mar 17, 2023 at 12:53:48PM +, Richard Biener wrote:

On Fri, 17 Mar 2023, Jakub Jelinek wrote:


On Fri, Mar 17, 2023 at 01:18:32PM +0100, Richard Biener wrote:

The following adds a missing range-op for __builtin_expect which
helps -Wuse-after-free to detect the case a realloc original
pointer is used when the result was NULL.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

PR tree-optimization/109170
* gimple-range-op.cc (cfn_expect): New.
(gimple_range_op_handler::maybe_builtin_call): Handle
__builtin_expect.

* gcc.dg/Wuse-after-free-pr109170.c: New testcase.

Shouldn't that be something we handle generically for all
ERF_RETURNS_ARG calls (and not just for irange, but for any
supported ranges)?

Though, admittedly __builtin_expect probably doesn't set that
and all the other current builtins with ERF_RETURNS_ARG return
pointers I think.

Looking at builtin_fnspec we're indeed missing BUILT_IN_EXPECT,
but we could indeed use gimple_call_fnspec and look for a
returned argument.  If it's not the first handling this
generically is going to be interesting wrt op?_range though,
so we'd need a range operator for each case (returns arg 1,
returns arg 2, more args are not supported?).  Currently

I think fnspec supports 1-4, but nothing actually uses anything but 1
or none; I could be wrong.

Anyway, I think it is fine to implement __builtin_expect this way
for now, ERF_RETURNS_ARG will be more important for pointers, especially if
we propagate something more than just maybe be/can't be/must be null.
Don't you need to handle BUILT_IN_EXPECT_WITH_PROBABILITY the same though?


I think thats fine for now.

Im going to address improving dispatch for range-ops in stage 1 when it 
opens.


we want to handle non-standard ops more generally like we did for 
WIDEN_MULT_EXPR, plus we didnt know the actualy requirements for the 
initial cut of vrange ->irange/frange dispatch.   We'll clean that up to 
make adding more range kinds cleaner.


as for gimple_fnspec, im sure we can do something better than what we 
have.  Current range-ops works only with 2 operands, but via this 
mechanism they can be any 2. (I think :-)


We can fold arbitrary statements in 
gimpe-range-fold::fold_using_range(), so it is only the 
op[12]_range/relation  routines we loose... Im not sure if there is 
anything that critical there, but if we find something, well we can look 
at it.


Andrew



Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Mar 2023, Jakub Jelinek wrote:

> On Fri, Mar 17, 2023 at 12:53:48PM +, Richard Biener wrote:
> > On Fri, 17 Mar 2023, Jakub Jelinek wrote:
> > 
> > > On Fri, Mar 17, 2023 at 01:18:32PM +0100, Richard Biener wrote:
> > > > The following adds a missing range-op for __builtin_expect which
> > > > helps -Wuse-after-free to detect the case a realloc original
> > > > pointer is used when the result was NULL.
> > > > 
> > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> > > > 
> > > > PR tree-optimization/109170
> > > > * gimple-range-op.cc (cfn_expect): New.
> > > > (gimple_range_op_handler::maybe_builtin_call): Handle
> > > > __builtin_expect.
> > > > 
> > > > * gcc.dg/Wuse-after-free-pr109170.c: New testcase.
> > > 
> > > Shouldn't that be something we handle generically for all
> > > ERF_RETURNS_ARG calls (and not just for irange, but for any
> > > supported ranges)?
> > > 
> > > Though, admittedly __builtin_expect probably doesn't set that
> > > and all the other current builtins with ERF_RETURNS_ARG return
> > > pointers I think.
> > 
> > Looking at builtin_fnspec we're indeed missing BUILT_IN_EXPECT,
> > but we could indeed use gimple_call_fnspec and look for a
> > returned argument.  If it's not the first handling this
> > generically is going to be interesting wrt op?_range though,
> > so we'd need a range operator for each case (returns arg 1,
> > returns arg 2, more args are not supported?).  Currently
> 
> I think fnspec supports 1-4, but nothing actually uses anything but 1
> or none; I could be wrong.
> 
> Anyway, I think it is fine to implement __builtin_expect this way
> for now, ERF_RETURNS_ARG will be more important for pointers, especially if
> we propagate something more than just maybe be/can't be/must be null.
> Don't you need to handle BUILT_IN_EXPECT_WITH_PROBABILITY the same though?

Yes, BUILT_IN_ASSUME_ALIGNED would be another candidate.

One issue revealed by testing is that EVRP now propagates

  b.0_1 = b;
  _2 = b.0_1 < 0;
  _3 = (long int) _2;
  _4 = __builtin_expect (_3, 0);
  if (_4 != 0)
...

  b.2_8 = b;
  _9 = b.2_8 < 0;
  d_13 = (int) _9;
  _10 = (long int) _9;
  _11 = __builtin_expect (_10, 0);
  if (_11 != 0)

and thus gcc.dg/predict-20.c FAILs and the change is that we propagate
known true/false into the last compare as

   [local count: 977105059]:
  # _9 = PHI <1(3), 0(4)>
  if (_9 != 0)

and lose the connection to __builtin_expect.

We also FAIL gcc.dg/tree-ssa/ssa-lim-21.c, but that's because

  for (int j = 0; j < m; j++)
if (__builtin_expect (m, 0))

is now optimized (m is [1, +INF] when we enter the loop).  I
have difficulties in restoring the testcase by massaging it,
will try a bit more.

I've implemented the fnspec variant as well now and then we also
CSE the __builtin_expct call, see below for this patch variant.

Richard.

>From feb846cbff9774125d8401dfeacd8a4b9c2dccfa Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 17 Mar 2023 13:14:49 +0100
Subject: [PATCH] tree-optimization/109170 - bogus use-after-free with
 __builtin_expect
To: gcc-patches@gcc.gnu.org

The following adds a missing range-op for __builtin_expect which
helps -Wuse-after-free to detect the case a realloc original
pointer is used when the result was NULL.  The implementation
should handle all argument one pass-through builtins we handle
in the fnspec machinery.

tree-optimization/109170
* gimple-range-op.cc (cfn_pass_through_arg1): New.
(gimple_range_op_handler::maybe_builtin_call): Handle
__builtin_expect and similar via cfn_pass_through_arg1
and inspecting the calls fnspec.
* builtins.cc (builtin_fnspec): Handle BUILT_IN_EXPECT
and BUILT_IN_EXPECT_WITH_PROBABILITY.

* gcc.dg/Wuse-after-free-pr109170.c: New testcase.
---
 gcc/builtins.cc   |  2 ++
 gcc/gimple-range-op.cc| 32 ++-
 .../gcc.dg/Wuse-after-free-pr109170.c | 15 +
 3 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 90246e214d6..56545027297 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -11715,6 +11715,8 @@ builtin_fnspec (tree callee)
   case BUILT_IN_RETURN_ADDRESS:
return ".c";
   case BUILT_IN_ASSUME_ALIGNED:
+  case BUILT_IN_EXPECT:
+  case BUILT_IN_EXPECT_WITH_PROBABILITY:
return "1cX ";
   /* But posix_memalign stores a pointer into the memory pointed to
 by its first argument.  */
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a5d625387e7..1a00f1690e5 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "range.h"
 #include "value-query.h"
 #include "gimple-range.h"
+#include "attr-fnspec.h"
 
 // Given stmt S, fill 

Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Philipp Tomsich
On Fri, 17 Mar 2023 at 09:31, Richard Biener  wrote:
>
> On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> wrote:
> >
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >   if (++*a == 1)
> > g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependancy chain (no need to wait
> > for the result of +1) and on targets with compare zero instructions
> > the generated code is one instruction shorter.
>
> The downside is we now need two registers and their lifetime overlaps.
>
> Your patch mixes changing / inverting a parameter (which seems unneeded
> for the actual change) with preferring compares against zero.
>
> What's the reason to specifically prefer compares against zero?  On x86
> we have add that sets flags, so ++*a == 0 would be preferred, but
> for your sequence we'd need a test reg, reg; branch on zero, so we do
> not save any instruction.

AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
comparing against a constant requires to load any non-zero value into
a register first.
This feels a bit like we need to call onto the backend to check
whether comparisons against 0 are cheaper.

Obviously, the underlying issue become worse if the immediate can not
be built up in a single instruction.
Using RISC-V as an example (primarily, as RISC-V makes it particularly
easy to run into multi-instruction sequences for constants), we can
construct the following case:

  void f(unsigned int *a)
  {
if ((*a += 0x900) == 0x900)
   g();
  }

which GCC 12.2.0 (trunk may already be small enough to reuse the
constant once loaded into register, but I did not check…) with -O3
turns into:

f:
  lw a4,0(a0)
  li a5,4096
  addiw a5,a5,-1792
  addw a4,a5,a4
  li a5,4096
  sw a4,0(a0)
  addi a5,a5,-1792
  beq a4,a5,.L4
  ret
.L4:
  tail g

Thanks,
Philipp.


On Fri, 17 Mar 2023 at 09:31, Richard Biener  wrote:
>
> On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> wrote:
> >
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >   if (++*a == 1)
> > g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependancy chain (no need to wait
> > for the result of +1) and on targets with compare zero instructions
> > the generated code is one instruction shorter.
>
> The downside is we now need two registers and their lifetime overlaps.
>
> Your patch mixes changing / inverting a parameter (which seems unneeded
> for the actual change) with preferring compares against zero.
>
> What's the reason to specifically prefer compares against zero?  On x86
> we have add that sets flags, so ++*a == 0 would be preferred, but
> for your sequence we'd need a test reg, reg; branch on zero, so we do
> not save any instruction.
>
> We do have quite some number of bugreports with regards to making VRPs
> life harder when splitting things this way.  It's easier for VRP to handle
>
>   _1 = _2 + 1;
>   if (_1 == 1)
>
> than it is
>
>   _1 = _2 + 1;
>   if (_2 == 0)
>
> where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> the life-range issue there's other side-effects as well.  Maybe ranger 
> meanwhile
> can handle the above case?
>
> What's the overall effect of the change on a larger code base?
>
> Thanks,
> Richard.
>
> >
> > Example from Aarch64:
> >
> > Before
> > ldr w1, [x0]
> > add w1, w1, 1
> > str w1, [x0]
> > cmp w1, 1
> > beq .L4
> > ret
> >
> > After
> > ldr w1, [x0]
> > add w2, w1, 1
> > str w2, [x0]
> > cbz w1, .L4
> > ret
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > (forward_propagate_into_comparison_1): Optimize
> > for zero comparisons.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/tree-ssa-forwprop.cc | 41 +++-
> >  1 file changed, 28 insertions(+), 13 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > index e34f0888954..93d5043821b 100644
> > --- a/gcc/tree-ssa-forwprop.cc
> > +++ b/gcc/tree-ssa-forwprop.cc
> > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> > the folded result in a form suitable for COND_EXPR_COND or
> > NULL_TREE, if there is no suitable simplified form.  If
> > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > -   considered simplified.  */
> > +   ALWAYS_COMBINE is false then only combine it the resulting
> > +   expression is 

[pushed] [PR109052] LRA: Implement combining secondary memory reload and original insn

2023-03-17 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109052

The patch was successfully bootstrapped and tested on x86-64, i686, 
aarch64, and ppc64le.
commit 57688950b9328cbb4a9c21eb3199f9132b5119d3
Author: Vladimir N. Makarov 
Date:   Fri Mar 17 08:58:58 2023 -0400

LRA: Implement combining secondary memory reload and original insn

LRA creates secondary memory reload insns but do not try to combine it
with the original insn.  This patch implements a simple insn combining
for such cases in LRA.

PR rtl-optimization/109052

gcc/ChangeLog:

* lra-constraints.cc: Include hooks.h.
(combine_reload_insn): New function.
(lra_constraints): Call it.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr109052.c: New.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index c38566a7451..95b534e1a70 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -110,6 +110,7 @@
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "hooks.h"
 #include "target.h"
 #include "rtl.h"
 #include "tree.h"
@@ -5001,6 +5002,96 @@ contains_reloaded_insn_p (int regno)
   return false;
 }
 
+/* Try combine secondary memory reload insn FROM for insn TO into TO insn.
+   FROM should be a load insn (usually a secondary memory reload insn).  Return
+   TRUE in case of success.  */
+static bool
+combine_reload_insn (rtx_insn *from, rtx_insn *to)
+{
+  bool ok_p;
+  rtx_insn *saved_insn;
+  rtx set, from_reg, to_reg, op;
+  enum reg_class to_class, from_class;
+  int n, nop;
+  signed char changed_nops[MAX_RECOG_OPERANDS + 1];
+  lra_insn_recog_data_t id = lra_get_insn_recog_data (to);
+  struct lra_static_insn_data *static_id = id->insn_static_data;
+  
+  /* Check conditions for second memory reload and original insn:  */
+  if ((targetm.secondary_memory_needed
+   == hook_bool_mode_reg_class_t_reg_class_t_false)
+  || NEXT_INSN (from) != to || CALL_P (to)
+  || id->used_insn_alternative == LRA_UNKNOWN_ALT
+  || (set = single_set (from)) == NULL_RTX)
+return false;
+  from_reg = SET_DEST (set);
+  to_reg = SET_SRC (set);
+  /* Ignore optional reloads: */
+  if (! REG_P (from_reg) || ! REG_P (to_reg)
+  || bitmap_bit_p (_optional_reload_pseudos, REGNO (from_reg)))
+return false;
+  to_class = lra_get_allocno_class (REGNO (to_reg));
+  from_class = lra_get_allocno_class (REGNO (from_reg));
+  /* Check that reload insn is a load:  */
+  if (to_class != NO_REGS || from_class == NO_REGS)
+return false;
+  for (n = nop = 0; nop < static_id->n_operands; nop++)
+{
+  if (static_id->operand[nop].type != OP_IN)
+	continue;
+  op = *id->operand_loc[nop];
+  if (!REG_P (op) || REGNO (op) != REGNO (from_reg))
+	continue;
+  *id->operand_loc[nop] = to_reg;
+  changed_nops[n++] = nop;
+}
+  changed_nops[n] = -1;
+  lra_update_dups (id, changed_nops);
+  lra_update_insn_regno_info (to);
+  ok_p = recog_memoized (to) >= 0;
+  if (ok_p)
+{
+  /* Check that combined insn does not need any reloads: */
+  saved_insn = curr_insn;
+  curr_insn = to;
+  curr_id = lra_get_insn_recog_data (curr_insn);
+  curr_static_id = curr_id->insn_static_data;
+  ok_p = !curr_insn_transform (true);
+  curr_insn = saved_insn;
+  curr_id = lra_get_insn_recog_data (curr_insn);
+  curr_static_id = curr_id->insn_static_data;
+}
+  if (ok_p)
+{
+  id->used_insn_alternative = -1;
+  lra_push_insn_and_update_insn_regno_info (to);
+  if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file, "Use combined insn:\n");
+	  dump_insn_slim (lra_dump_file, to);
+	}
+  return true;
+}
+  if (lra_dump_file != NULL)
+{
+  fprintf (lra_dump_file, "Failed combined insn:\n");
+  dump_insn_slim (lra_dump_file, to);
+}
+  for (int i = 0; i < n; i++)
+{
+  nop = changed_nops[i];
+  *id->operand_loc[nop] = from_reg;
+}
+  lra_update_dups (id, changed_nops);
+  lra_update_insn_regno_info (to);
+  if (lra_dump_file != NULL)
+{
+  fprintf (lra_dump_file, "Restoring insn after failed combining:\n");
+  dump_insn_slim (lra_dump_file, to);
+}
+  return false;
+}
+
 /* Entry function of LRA constraint pass.  Return true if the
constraint pass did change the code.	 */
 bool
@@ -5010,6 +5101,7 @@ lra_constraints (bool first_p)
   int i, hard_regno, new_insns_num;
   unsigned int min_len, new_min_len, uid;
   rtx set, x, reg, dest_reg;
+  rtx_insn *original_insn;
   basic_block last_bb;
   bitmap_iterator bi;
 
@@ -5119,6 +5211,7 @@ lra_constraints (bool first_p)
   new_insns_num = 0;
   last_bb = NULL;
   changed_p = false;
+  original_insn = NULL;
   while ((new_min_len = lra_insn_stack_length ()) != 0)
 {
   curr_insn = lra_pop_insn ();
@@ -5133,7 +5226,12 @@ lra_constraints (bool first_p)
 	{
 	  min_len = new_min_len;
 	  new_insns_num = 0;
+	  

Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 17, 2023 at 12:53:48PM +, Richard Biener wrote:
> On Fri, 17 Mar 2023, Jakub Jelinek wrote:
> 
> > On Fri, Mar 17, 2023 at 01:18:32PM +0100, Richard Biener wrote:
> > > The following adds a missing range-op for __builtin_expect which
> > > helps -Wuse-after-free to detect the case a realloc original
> > > pointer is used when the result was NULL.
> > > 
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> > > 
> > >   PR tree-optimization/109170
> > >   * gimple-range-op.cc (cfn_expect): New.
> > >   (gimple_range_op_handler::maybe_builtin_call): Handle
> > >   __builtin_expect.
> > > 
> > >   * gcc.dg/Wuse-after-free-pr109170.c: New testcase.
> > 
> > Shouldn't that be something we handle generically for all
> > ERF_RETURNS_ARG calls (and not just for irange, but for any
> > supported ranges)?
> > 
> > Though, admittedly __builtin_expect probably doesn't set that
> > and all the other current builtins with ERF_RETURNS_ARG return
> > pointers I think.
> 
> Looking at builtin_fnspec we're indeed missing BUILT_IN_EXPECT,
> but we could indeed use gimple_call_fnspec and look for a
> returned argument.  If it's not the first handling this
> generically is going to be interesting wrt op?_range though,
> so we'd need a range operator for each case (returns arg 1,
> returns arg 2, more args are not supported?).  Currently

I think fnspec supports 1-4, but nothing actually uses anything but 1
or none; I could be wrong.

Anyway, I think it is fine to implement __builtin_expect this way
for now, ERF_RETURNS_ARG will be more important for pointers, especially if
we propagate something more than just maybe be/can't be/must be null.
Don't you need to handle BUILT_IN_EXPECT_WITH_PROBABILITY the same though?

> all returns-arg builtins return the first arg, but eventually
> modref or the fortran frontend will end up with calls returning
> sth else.
> 
> If a float arg is returned it also needs to be a frange operator,
> right?

Jakub



Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Mar 2023, Jakub Jelinek wrote:

> On Fri, Mar 17, 2023 at 01:18:32PM +0100, Richard Biener wrote:
> > The following adds a missing range-op for __builtin_expect which
> > helps -Wuse-after-free to detect the case a realloc original
> > pointer is used when the result was NULL.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> > 
> > PR tree-optimization/109170
> > * gimple-range-op.cc (cfn_expect): New.
> > (gimple_range_op_handler::maybe_builtin_call): Handle
> > __builtin_expect.
> > 
> > * gcc.dg/Wuse-after-free-pr109170.c: New testcase.
> 
> Shouldn't that be something we handle generically for all
> ERF_RETURNS_ARG calls (and not just for irange, but for any
> supported ranges)?
> 
> Though, admittedly __builtin_expect probably doesn't set that
> and all the other current builtins with ERF_RETURNS_ARG return
> pointers I think.

Looking at builtin_fnspec we're indeed missing BUILT_IN_EXPECT,
but we could indeed use gimple_call_fnspec and look for a
returned argument.  If it's not the first handling this
generically is going to be interesting wrt op?_range though,
so we'd need a range operator for each case (returns arg 1,
returns arg 2, more args are not supported?).  Currently
all returns-arg builtins return the first arg, but eventually
modref or the fortran frontend will end up with calls returning
sth else.

If a float arg is returned it also needs to be a frange operator,
right?

Richard.

> So, I think LGTM, but give Andrew/Aldy a chance to chime in.
> 
> > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> > index a5d625387e7..f3d6e29e58d 100644
> > --- a/gcc/gimple-range-op.cc
> > +++ b/gcc/gimple-range-op.cc
> > @@ -309,6 +309,25 @@ public:
> >}
> >  } op_cfn_constant_p;
> >  
> > +// Implement range operator for integral CFN_BUILT_IN_EXPECT.
> > +class cfn_expect : public range_operator
> > +{
> > +public:
> > +  using range_operator::fold_range;
> > +  virtual bool fold_range (irange , tree, const irange ,
> > +  const irange &, relation_trio) const
> > +  {
> > +r = lh;
> > +return true;
> > +  }
> > +  virtual bool op1_range (irange , tree, const irange ,
> > + const irange &, relation_trio) const
> > +  {
> > +r = lhs;
> > +return true;
> > +  }
> > +} op_cfn_expect;
> > +
> >  // Implement range operator for CFN_BUILT_IN_SIGNBIT.
> >  class cfn_signbit : public range_operator_float
> >  {
> > @@ -966,6 +985,12 @@ gimple_range_op_handler::maybe_builtin_call ()
> >m_int = _cfn_parity;
> >break;
> >  
> > +case CFN_BUILT_IN_EXPECT:
> > +  m_valid = true;
> > +  m_op1 = gimple_call_arg (call, 0);
> > +  m_int = _cfn_expect;
> > +  break;
> > +
> >  default:
> >break;
> >  }
> > diff --git a/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c 
> > b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> > new file mode 100644
> > index 000..fa7dc66d66c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -Wuse-after-free" } */
> > +
> > +unsigned long bufmax = 0;
> > +unsigned long __open_catalog_bufmax;
> > +void *realloc(void *, __SIZE_TYPE__);
> > +void free(void *);
> > +
> > +void __open_catalog(char *buf)
> > +{
> > +  char *old_buf = buf;
> > +  buf = realloc (buf, bufmax);
> > +  if (__builtin_expect ((buf == ((void *)0)), 0))
> > +free (old_buf); /* { dg-bogus "used after" } */
> > +}
> > -- 
> > 2.35.3
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 17, 2023 at 01:18:32PM +0100, Richard Biener wrote:
> The following adds a missing range-op for __builtin_expect which
> helps -Wuse-after-free to detect the case a realloc original
> pointer is used when the result was NULL.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> 
>   PR tree-optimization/109170
>   * gimple-range-op.cc (cfn_expect): New.
>   (gimple_range_op_handler::maybe_builtin_call): Handle
>   __builtin_expect.
> 
>   * gcc.dg/Wuse-after-free-pr109170.c: New testcase.

Shouldn't that be something we handle generically for all
ERF_RETURNS_ARG calls (and not just for irange, but for any
supported ranges)?

Though, admittedly __builtin_expect probably doesn't set that
and all the other current builtins with ERF_RETURNS_ARG return
pointers I think.

So, I think LGTM, but give Andrew/Aldy a chance to chime in.

> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index a5d625387e7..f3d6e29e58d 100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -309,6 +309,25 @@ public:
>}
>  } op_cfn_constant_p;
>  
> +// Implement range operator for integral CFN_BUILT_IN_EXPECT.
> +class cfn_expect : public range_operator
> +{
> +public:
> +  using range_operator::fold_range;
> +  virtual bool fold_range (irange , tree, const irange ,
> +const irange &, relation_trio) const
> +  {
> +r = lh;
> +return true;
> +  }
> +  virtual bool op1_range (irange , tree, const irange ,
> +   const irange &, relation_trio) const
> +  {
> +r = lhs;
> +return true;
> +  }
> +} op_cfn_expect;
> +
>  // Implement range operator for CFN_BUILT_IN_SIGNBIT.
>  class cfn_signbit : public range_operator_float
>  {
> @@ -966,6 +985,12 @@ gimple_range_op_handler::maybe_builtin_call ()
>m_int = _cfn_parity;
>break;
>  
> +case CFN_BUILT_IN_EXPECT:
> +  m_valid = true;
> +  m_op1 = gimple_call_arg (call, 0);
> +  m_int = _cfn_expect;
> +  break;
> +
>  default:
>break;
>  }
> diff --git a/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c 
> b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> new file mode 100644
> index 000..fa7dc66d66c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wuse-after-free" } */
> +
> +unsigned long bufmax = 0;
> +unsigned long __open_catalog_bufmax;
> +void *realloc(void *, __SIZE_TYPE__);
> +void free(void *);
> +
> +void __open_catalog(char *buf)
> +{
> +  char *old_buf = buf;
> +  buf = realloc (buf, bufmax);
> +  if (__builtin_expect ((buf == ((void *)0)), 0))
> +free (old_buf); /* { dg-bogus "used after" } */
> +}
> -- 
> 2.35.3

Jakub



[PATCH] tree-optimization/109170 - bogus use-after-free with __builtin_expect

2023-03-17 Thread Richard Biener via Gcc-patches
The following adds a missing range-op for __builtin_expect which
helps -Wuse-after-free to detect the case a realloc original
pointer is used when the result was NULL.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

PR tree-optimization/109170
* gimple-range-op.cc (cfn_expect): New.
(gimple_range_op_handler::maybe_builtin_call): Handle
__builtin_expect.

* gcc.dg/Wuse-after-free-pr109170.c: New testcase.
---
 gcc/gimple-range-op.cc| 25 +++
 .../gcc.dg/Wuse-after-free-pr109170.c | 15 +++
 2 files changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index a5d625387e7..f3d6e29e58d 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -309,6 +309,25 @@ public:
   }
 } op_cfn_constant_p;
 
+// Implement range operator for integral CFN_BUILT_IN_EXPECT.
+class cfn_expect : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  virtual bool fold_range (irange , tree, const irange ,
+  const irange &, relation_trio) const
+  {
+r = lh;
+return true;
+  }
+  virtual bool op1_range (irange , tree, const irange ,
+ const irange &, relation_trio) const
+  {
+r = lhs;
+return true;
+  }
+} op_cfn_expect;
+
 // Implement range operator for CFN_BUILT_IN_SIGNBIT.
 class cfn_signbit : public range_operator_float
 {
@@ -966,6 +985,12 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_int = _cfn_parity;
   break;
 
+case CFN_BUILT_IN_EXPECT:
+  m_valid = true;
+  m_op1 = gimple_call_arg (call, 0);
+  m_int = _cfn_expect;
+  break;
+
 default:
   break;
 }
diff --git a/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c 
b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
new file mode 100644
index 000..fa7dc66d66c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wuse-after-free-pr109170.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wuse-after-free" } */
+
+unsigned long bufmax = 0;
+unsigned long __open_catalog_bufmax;
+void *realloc(void *, __SIZE_TYPE__);
+void free(void *);
+
+void __open_catalog(char *buf)
+{
+  char *old_buf = buf;
+  buf = realloc (buf, bufmax);
+  if (__builtin_expect ((buf == ((void *)0)), 0))
+free (old_buf); /* { dg-bogus "used after" } */
+}
-- 
2.35.3


Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Mar 2023, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> I'm only picking this up now. Just going through your earlier comments and
> stuff and I noticed we didn't address the situation with the gimple::build. Do
> you want me to add overloaded static member functions to cover all
> gimple_build_* functions, or just create one to replace vect_gimple_build and
> we create them as needed? It's more work but I think adding them all would be
> better. I'd even argue that it would be nice to replace the old ones with the
> new ones, but I can imagine you might not want that as it makes backporting
> and the likes a bit annoying...
> 
> Let me know what you prefer, I'll go work on your latest comments too.

I think the series was resolved and I approved it.  As for
vect_gimple_build the better way forward would be to use
gimple_build () as existing but add a vect_finish_stmt_* handling
a gimple_seq.

Richard.


Re: [PATCH] rs6000: suboptimal code for returning bool value on target ppc

2023-03-17 Thread Ajit Agarwal via Gcc-patches
Hello Jeff:

On 16/03/23 8:18 pm, Jeff Law wrote:
> 
> 
> On 3/16/23 04:11, Ajit Agarwal via Gcc-patches wrote:
>>
>> Hello Richard:
>>
>> On 16/03/23 3:22 pm, Richard Biener wrote:
>>> On Thu, Mar 16, 2023 at 9:19 AM Ajit Agarwal  wrote:



 On 16/03/23 1:44 pm, Richard Biener wrote:
> On Thu, Mar 16, 2023 at 9:11 AM Ajit Agarwal  
> wrote:
>>
>> Hello Richard:
>>
>> On 16/03/23 1:10 pm, Richard Biener wrote:
>>> On Thu, Mar 16, 2023 at 6:21 AM Ajit Agarwal via Gcc-patches
>>>  wrote:

 Hello All:


 This patch eliminates unnecessary zero extension instruction from 
 power generated assembly.
 Bootstrapped and regtested on powerpc64-linux-gnu.
>>>
>>> What makes this so special that we cannot deal with it from generic 
>>> code?
>>> In particular we do have the REE pass, why is target specific
>>> knowledge neccessary
>>> to eliminate the extension?
>>>
>>
>> For returning bool values and comparision with integers generates the 
>> following by all the rtl passes.
>>
>> set compare (subreg)
>> set if_then_else
>> Convert SImode -> QImode
>> set zero_extend to SImode from QImode
>> set return value 0 in one path of cfg.
>> set return value 1 in other path of cfg.
>>
>> This pass replaces the above zero extension and conversion from QImode 
>> to DImode with copy operation to keep QImode in 64 bit registers in 
>> powerpc target.
>
> Sorry, I can't parse that - as there's no testcase with the patch I
> cannot even try to see what the actual RTL
> looks like (without the pass).
>

 Here is the PR with bugzilla.
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103784

 I can add the attached testcase with this PR in the patch.
>>>
>>> I don't see any zero-extends there.
>>>
>>
>> Here is the testcase.
>>
>>
>> bool (int a, int b)
>> {
>>    if (a > 2)
>>    return false;
>>     if (b < 10)
>>     return true;
>>   return false;
>> }
>>
>> compiled with gcc -O3 -m64 testcase.cc -mcpu=power9 -save-temps.
>>
>> Here is the rtl after cse.
>> (note 12 11 15 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>> (insn 15 12 16 3 (set (reg:CC 123)
>>  (compare:CC (subreg/s/u:SI (reg/v:DI 120 [ b ]) 0)
>>  (const_int 9 [0x9]))) "ext.cc":5:5 796 {*cmpsi_signed}
>>   (expr_list:REG_DEAD (reg/v:DI 120 [ b ])
>>  (nil)))
>> (insn 16 15 17 3 (set (reg:SI 124)
>>  (const_int 1 [0x1])) "ext.cc":5:5 555 {*movsi_internal1}
>>   (nil))
>> (insn 17 16 18 3 (set (reg:SI 122)
>>  (if_then_else:SI (gt (reg:CC 123)
>>  (const_int 0 [0]))
>>  (const_int 0 [0])
>>  (reg:SI 124))) "ext.cc":5:5 344 {isel_cc_si}
>>   (expr_list:REG_DEAD (reg:SI 124)
>>  (expr_list:REG_DEAD (reg:CC 123)
>>  (nil
>> (insn 18 17 32 3 (set (reg:QI 117 [ _1 ])
>>  (subreg:QI (reg:SI 122) 0)) "ext.cc":5:5 562 {*movqi_internal}
>>   (expr_list:REG_DEAD (reg:SI 122)
>>  (nil)))
>>    ; pc falls through to BB 5
>> (code_label 32 18 31 4 3 (nil) [1 uses])
>> (note 31 32 5 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>> (insn 5 31 19 4 (set (reg:QI 117 [ _1 ])
>>  (const_int 0 [0])) "ext.cc":4:16 562 {*movqi_internal}
>>   (nil))
>> (code_label 19 5 20 5 2 (nil) [0 uses])
>> (note 20 19 21 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
>> (insn 21 20 22 5 (set (reg:DI 126 [ _1 ])
>>  (zero_extend:DI (reg:QI 117 [ _1 ]))) "ext.cc":8:1 5 
>> {zero_extendqidi2}
>>   (expr_list:REG_DEAD (reg:QI 117 [ _1 ])
>>  (nil)))
>> (insn 22 21 26 5 (set (reg:DI 118 [  ])
>>  (reg:DI 126 [ _1 ])) "ext.cc":8:1 681 {*movdi_internal64}
>>   (expr_list:REG_DEAD (reg:DI 126 [ _1 ])
>>  (nil)))
>> (insn 26 22 27 5 (set (reg/i:DI 3 3)
>>  (reg:DI 126 [ _1 ])) "ext.cc":8:1 681 {*movdi_internal64}
>>   (expr_list:REG_DEAD (reg:DI 118 [  ])
>>  (nil)))
>> (insn 27 26 0 5 (use (reg/i:DI 3 3)) "ext.cc":8:1 -1
>>   (nil))
> This looks like it'd be better addressed in REE.
> 
> 
> We've got two paths to the zero_extend.  One sets (reg 117) from a constant.  
> The other sets (reg 117) from a (subreg:QI (reg:SI)).
> 
> Handling the constant is trivial.  For the other set, we can replace the 
> subreg with the zero_extend.  Presumably we'd then proceed to try and 
> eliminate the zero-extend by realizing both arms of the conditional move are 
> constants and thus trivially handled.
> 
> While I don't think REE would handle all this today, fixing it to handle this 
> case seems like it'd be better than doing a specialized pass in the ppc 
> backend.
> 
> jeff
> 

Thanks for your advice. At the input of REE pass the RTL has the following 
wherein zero_extend and subreg( reg 117) is converted to and (subreg DI ( reg 
QI 117).

This needs to 

Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2023-03-17 Thread Andre Vieira (lists) via Gcc-patches

Hi Richard,

I'm only picking this up now. Just going through your earlier comments 
and stuff and I noticed we didn't address the situation with the 
gimple::build. Do you want me to add overloaded static member functions 
to cover all gimple_build_* functions, or just create one to replace 
vect_gimple_build and we create them as needed? It's more work but I 
think adding them all would be better. I'd even argue that it would be 
nice to replace the old ones with the new ones, but I can imagine you 
might not want that as it makes backporting and the likes a bit annoying...


Let me know what you prefer, I'll go work on your latest comments too.

Cheers,
Andre


RE: [PATCH] libatomic: Fix SEQ_CST 128-bit atomic load [PR108891]

2023-03-17 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Wilco Dijkstra 
> Sent: Thursday, March 16, 2023 6:22 PM
> To: GCC Patches 
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: Re: [PATCH] libatomic: Fix SEQ_CST 128-bit atomic load [PR108891]
> 
> ping
> 
> 
> From: Wilco Dijkstra
> Sent: 23 February 2023 15:11
> To: GCC Patches 
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [PATCH] libatomic: Fix SEQ_CST 128-bit atomic load [PR108891]
> 
> 
> The LSE2 ifunc for 16-byte atomic load requires a barrier before the LDP -
> without it, it effectively has Load-AcquirePC semantics similar to LDAPR,
> which is less restrictive than what __ATOMIC_SEQ_CST requires.  This patch
> fixes this and adds comments to make it easier to see which sequence is
> used for each case. Use a load/store exclusive loop for store to simplify
> testing memory ordering is correct (it is slightly faster too).
> 
> Passes regress, OK for commit?
> 

Ok.
Thanks,
Kyrill

> libatomic/
> PR libgcc/108891
> config/linux/aarch64/atomic_16.S: Fix libat_load_16_i1.
> Add comments describing the memory order.
> 
> ---
> 
> diff --git a/libatomic/config/linux/aarch64/atomic_16.S
> b/libatomic/config/linux/aarch64/atomic_16.S
> index
> 732c3534a06678664a252bdbc53652eeab0af506..05439ce394b9653c9bcb582
> 761ff7aaa7c8f9643 100644
> --- a/libatomic/config/linux/aarch64/atomic_16.S
> +++ b/libatomic/config/linux/aarch64/atomic_16.S
> @@ -72,33 +72,38 @@ name:   \
> 
>  ENTRY (libat_load_16_i1)
>  cbnzw1, 1f
> +
> +   /* RELAXED.  */
>  ldp res0, res1, [x0]
>  ret
>  1:
> -   cmp w1, ACQUIRE
> -   b.hi2f
> +   cmp w1, SEQ_CST
> +   b.eq2f
> +
> +   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
>  ldp res0, res1, [x0]
>  dmb ishld
>  ret
> -2:
> +
> +   /* SEQ_CST.  */
> +2: ldartmp0, [x0]  /* Block reordering with Store-Release instr. 
>  */
>  ldp res0, res1, [x0]
> -   dmb ish
> +   dmb ishld
>  ret
>  END (libat_load_16_i1)
> 
> 
>  ENTRY (libat_store_16_i1)
>  cbnzw4, 1f
> +
> +   /* RELAXED.  */
>  stp in0, in1, [x0]
>  ret
> -1:
> -   dmb ish
> -   stp in0, in1, [x0]
> -   cmp w4, SEQ_CST
> -   beq 2f
> -   ret
> -2:
> -   dmb ish
> +
> +   /* RELEASE/SEQ_CST.  */
> +1: ldaxp   xzr, tmp0, [x0]
> +   stlxp   w4, in0, in1, [x0]
> +   cbnzw4, 1b
>  ret
>  END (libat_store_16_i1)
> 
> @@ -106,29 +111,33 @@ END (libat_store_16_i1)
>  ENTRY (libat_exchange_16_i1)
>  mov x5, x0
>  cbnzw4, 2f
> -1:
> -   ldxpres0, res1, [x5]
> +
> +   /* RELAXED.  */
> +1: ldxpres0, res1, [x5]
>  stxpw4, in0, in1, [x5]
>  cbnzw4, 1b
>  ret
>  2:
>  cmp w4, ACQUIRE
>  b.hi4f
> -3:
> -   ldaxp   res0, res1, [x5]
> +
> +   /* ACQUIRE/CONSUME.  */
> +3: ldaxp   res0, res1, [x5]
>  stxpw4, in0, in1, [x5]
>  cbnzw4, 3b
>  ret
>  4:
>  cmp w4, RELEASE
>  b.ne6f
> -5:
> -   ldxpres0, res1, [x5]
> +
> +   /* RELEASE.  */
> +5: ldxpres0, res1, [x5]
>  stlxp   w4, in0, in1, [x5]
>  cbnzw4, 5b
>  ret
> -6:
> -   ldaxp   res0, res1, [x5]
> +
> +   /* ACQ_REL/SEQ_CST.  */
> +6: ldaxp   res0, res1, [x5]
>  stlxp   w4, in0, in1, [x5]
>  cbnzw4, 6b
>  ret
> @@ -142,6 +151,8 @@ ENTRY (libat_compare_exchange_16_i1)
>  cbz w4, 2f
>  cmp w4, RELEASE
>  b.hs3f
> +
> +   /* ACQUIRE/CONSUME.  */
>  caspa   exp0, exp1, in0, in1, [x0]
>  0:
>  cmp exp0, tmp0
> @@ -153,15 +164,18 @@ ENTRY (libat_compare_exchange_16_i1)
>  stp exp0, exp1, [x1]
>  mov x0, 0
>  ret
> -2:
> -   caspexp0, exp1, in0, in1, [x0]
> +
> +   /* RELAXED.  */
> +2: caspexp0, exp1, in0, in1, [x0]
>  b   0b
> -3:
> -   b.hi4f
> +
> +   /* RELEASE.  */
> +3: b.hi4f
>  caspl   exp0, exp1, in0, in1, [x0]
>  b   0b
> -4:
> -   caspal  exp0, exp1, in0, in1, [x0]
> +
> +   /* ACQ_REL/SEQ_CST.  */
> +4: caspal  exp0, exp1, in0, in1, [x0]
>  b   0b
>  END (libat_compare_exchange_16_i1)
> 
> @@ -169,15 +183,17 @@ END (libat_compare_exchange_16_i1)
>  ENTRY (libat_fetch_add_16_i1)
>  mov x5, x0
>  cbnzw4, 2f
> -1:
> -   ldxpres0, res1, [x5]
> +
> +   /* RELAXED.  */
> +1: ldxpres0, res1, [x5]
>  addstmplo, reslo, inlo
>  adc tmphi, reshi, inhi
>  stxpw4, tmp0, tmp1, [x5]
>  cbnzw4, 1b
>  ret
> -2:
> -   ldaxp   res0, res1, [x5]
> +
> +   /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST. 

Re: [PATCH] RISC-V: Fix bugs of internal tests.

2023-03-17 Thread Kito Cheng via Gcc-patches
Committed with git log tweak:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=c413abed869e7e34a86855a015413418f3c6b595

On Mon, Mar 13, 2023 at 3:52 PM  wrote:
>
> From: Ju-Zhe Zhong 
>
> Co-authored-by: kito-cheng 
> Co-authored-by: kito-cheng 
>
> This patch fixed a bunch of bugs reported by kito.ch...@sifive.com.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (legitimize_move): Handle undef value.
> * config/riscv/riscv-vector-builtins.cc 
> (function_expander::use_ternop_insn): Fix bugs of ternary intrinsic.
> (function_expander::use_widen_ternop_insn): Ditto.
> * config/riscv/vector.md (@vundefined): New pattern.
> (pred_mul__undef_merge): Fix bugs.
> (*pred_mul__undef_merge_scalar): Ditto.
> (*pred_mul__undef_merge_extended_scalar): Ditto.
> (pred_neg_mul__undef_merge): Ditto.
> (*pred_neg_mul__undef_merge_scalar): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/binop_vv_constraint-4.c: Adapt the test.
> * gcc.target/riscv/rvv/base/binop_vv_constraint-6.c: Ditto.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-127.c: Ditto.
> * g++.target/riscv/rvv/base/bug-1.C: New test.
> * gcc.target/riscv/rvv/base/bug-2.c: New test.
>
> Signed-off-by: Ju-Zhe Zhong 
> Co-authored-by: kito-cheng 
> Co-authored-by: kito-cheng 
>
> ---
>  gcc/config/riscv/riscv-v.cc   |   4 +
>  gcc/config/riscv/riscv-vector-builtins.cc |  17 +-
>  gcc/config/riscv/vector.md| 889 +++---
>  .../g++.target/riscv/rvv/base/bug-1.C |  40 +
>  .../riscv/rvv/base/binop_vv_constraint-4.c|   1 -
>  .../riscv/rvv/base/binop_vv_constraint-6.c|   2 +-
>  .../riscv/rvv/base/binop_vx_constraint-127.c  |   2 +-
>  .../gcc.target/riscv/rvv/base/bug-2.c |  86 ++
>  8 files changed, 482 insertions(+), 559 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/bug-1.C
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-2.c
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index d65c65b26cd..9b83ef6ea5e 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -283,6 +283,10 @@ legitimize_move (rtx dest, rtx src, machine_mode 
> mask_mode)
> emit_move_insn (tmp, src);
>src = tmp;
>  }
> +
> +  if (satisfies_constraint_vu (src))
> +return false;
> +
>emit_vlmax_op (code_for_pred_mov (mode), dest, src, mask_mode);
>return true;
>  }
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 75e65091db3..0df3cd15119 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -3129,7 +3129,6 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
> insn_code icode)
>rtx vd = expand_normal (CALL_EXPR_ARG (exp, arg_offset++));
>rtx vs1 = expand_normal (CALL_EXPR_ARG (exp, arg_offset++));
>rtx vs2 = expand_normal (CALL_EXPR_ARG (exp, arg_offset++));
> -  rtx merge = use_real_merge_p (pred) ? vd : RVV_VUNDEF (mode);
>
>if (VECTOR_MODE_P (GET_MODE (vs1)))
>  {
> @@ -3139,7 +3138,7 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
> insn_code icode)
>add_input_operand (mode, vs2);
>if (vd_accum_p)
> add_input_operand (mode, vd);
> -  add_input_operand (mode, merge);
> +  add_input_operand (mode, vd);
>  }
>else
>  {
> @@ -3154,7 +3153,7 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
> insn_code icode)
>   add_input_operand (mode, vd);
>   add_input_operand (mode, vs2);
> }
> -  add_input_operand (mode, merge);
> +  add_input_operand (mode, vd);
>  }
>
>for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++)
> @@ -3171,8 +3170,6 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
> insn_code icode)
>  rtx
>  function_expander::use_widen_ternop_insn (insn_code icode)
>  {
> -  machine_mode mode = TYPE_MODE (builtin_types[type.index].vector);
> -
>/* Record the offset to get the argument.  */
>int arg_offset = 0;
>
> @@ -3181,16 +3178,8 @@ function_expander::use_widen_ternop_insn (insn_code 
> icode)
>else
>  add_all_one_mask_operand (mask_mode ());
>
> -  rtx merge = RVV_VUNDEF (mode);
> -  if (use_real_merge_p (pred))
> -merge = expand_normal (CALL_EXPR_ARG (exp, arg_offset));
> -
>for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++)
> -{
> -  if (argno == call_expr_nargs (exp) - 1)
> -   add_input_operand (mode, merge);
> -  add_input_operand (argno);
> -}
> +add_input_operand (argno);
>
>add_input_operand (Pmode, get_tail_policy_for_pred (pred));
>add_input_operand (Pmode, get_mask_policy_for_pred (pred));
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index 977d3f2042c..37a539b4852 100644
> --- 

Re: Re: [PATCH] RISC-V: Fix Bug 109092

2023-03-17 Thread Kito Cheng via Gcc-patches
Committed with commit log tweak:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=02880e7803b19c357718abd2f0d567b4a761f318

On Wed, Mar 15, 2023 at 11:06 AM juzhe.zh...@rivai.ai
 wrote:
>
> Yes, I have write access. However, I am new to commit patch to GCC trunk.
> I didn't figure out how to commit patch to GCC trunk.
> And I am afraid of producing a potential risk to GCC trunk during stage 4 in 
> GCC 13.
>
> So I am gonna learn to commit codes when GCC 14 is open.
> And currently, I let Kito commit patch for me.
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-03-15 02:13
> To: juzhe.zhong; gcc-patches
> CC: kito.cheng
> Subject: Re: [PATCH] RISC-V: Fix Bug 109092
>
>
> On 3/13/23 08:17, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > This patch fix bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109092.
> >
> > gcc/ChangeLog:
> >
> >  * config/riscv/riscv.md: Fix subreg bug.
> LGTM.  Do you have write access now?  If so, go ahead and commit this patch.
>
> Do you have any interest in fixing up a similar bug in peephole.md?
>
> Jeff
>


Re: [PATCH] c, ubsan: Instrument even shortened divisions [PR109151]

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase, the C FE decides to shorten the division because
> it has a guarantee that INT_MIN / -1 division won't be encountered, the
> first operand is widened from narrower unsigned and/or the second operand is
> a constant other than all ones (in this case both are true).
> The problem is that the narrower type in this case is _Bool and
> ubsan_instrument_division only instruments it if op0's type is INTEGER_TYPE
> or REAL_TYPE.  Strangely this doesn't happen in C++ FE.
> Anyway, we only shorten divisions if the INT_MIN / -1 case is impossible,
> so I think we should be fine even with -fstrict-enums in C++ in case it
> shortened to ENUMERAL_TYPEs.
> 
> The following patch just instruments those on the ubsan_instrument_division
> side.  Perhaps only the first hunk and testcase might be needed because
> we shouldn't shorten if the other case could be triggered.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk,
> or shall I omit the second hunk?

LGTM as-is

Richard.

> 2023-03-17  Jakub Jelinek  
> 
>   PR c/109151
>   * c-ubsan.cc (ubsan_instrument_division): Handle all scalar integral
>   types rather than just INTEGER_TYPE.
> 
>   * c-c++-common/ubsan/div-by-zero-8.c: New test.
> 
> --- gcc/c-family/c-ubsan.cc.jj2023-02-28 11:38:28.965868044 +0100
> +++ gcc/c-family/c-ubsan.cc   2023-03-16 09:49:51.651126302 +0100
> @@ -53,7 +53,7 @@ ubsan_instrument_division (location_t lo
>op0 = unshare_expr (op0);
>op1 = unshare_expr (op1);
>  
> -  if (TREE_CODE (type) == INTEGER_TYPE
> +  if (INTEGRAL_TYPE_P (type)
>&& sanitize_flags_p (SANITIZE_DIVIDE))
>  t = fold_build2 (EQ_EXPR, boolean_type_node,
>op1, build_int_cst (type, 0));
> @@ -68,7 +68,7 @@ ubsan_instrument_division (location_t lo
>  t = NULL_TREE;
>  
>/* We check INT_MIN / -1 only for signed types.  */
> -  if (TREE_CODE (type) == INTEGER_TYPE
> +  if (INTEGRAL_TYPE_P (type)
>&& sanitize_flags_p (SANITIZE_SI_OVERFLOW)
>&& !TYPE_UNSIGNED (type))
>  {
> --- gcc/testsuite/c-c++-common/ubsan/div-by-zero-8.c.jj   2023-03-16 
> 10:01:31.626824994 +0100
> +++ gcc/testsuite/c-c++-common/ubsan/div-by-zero-8.c  2023-03-16 
> 10:03:05.510443440 +0100
> @@ -0,0 +1,14 @@
> +/* PR c/109151 */
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=integer-divide-by-zero -Wno-div-by-zero 
> -fno-sanitize-recover=integer-divide-by-zero" } */
> +/* { dg-shouldfail "ubsan" } */
> +
> +int d;
> +
> +int
> +main ()
> +{
> +  d = ((short) (d == 1 | d > 9)) / 0;
> +}
> +
> +/* { dg-output "division by zero" } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] testsuite: Fix up forwprop-39.c testcase [PR109145]

2023-03-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Mar 2023, Jakub Jelinek wrote:

> Hi!
> 
> As written in the PR, newlib headers aren't C11 compliant in that they
> don't define CMPLXF macro, and glibc before 2.16 doesn't define that
> either.  I think it is easier to use __builtin_complex directly, over
> another patch which keeps including complex.h but defines CMPLXF if it
> isn't defined, we want to test how forwprop behaves rather than what
> complex.h defines or doesn't define.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2023-03-17  Jakub Jelinek  
> 
>   PR testsuite/109145
>   * gcc.dg/tree-ssa/forwprop-39.c: Don't include complex.h.
>   (foo): Use __builtin_complex rather than CMPLXF.
> 
> --- gcc/testsuite/gcc.dg/tree-ssa/forwprop-39.c.jj2023-03-13 
> 10:18:59.545433477 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-39.c   2023-03-16 
> 18:49:40.563504504 +0100
> @@ -1,14 +1,12 @@
>  /* { dg-do compile } */
>  /* { dg-options "-std=c11 -O2 -fdump-tree-forwprop1 -fdump-tree-optimized" } 
> */
>  
> -#include 
> -
>  extern void push1(void *p, float _Complex x);
>  void foo (void *q, float _Complex *x)
>  {
>float r = __real *x;
>float i = __imag *x;
> -  push1 (q, CMPLXF (r, i));
> +  push1 (q, __builtin_complex (r, i));
>  }
>  
>  /* { dg-final { scan-tree-dump-not "COMPLEX_EXPR" "forwprop1" } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Richard Biener via Gcc-patches
On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  wrote:
>
> For this C testcase:
>
> void g();
> void f(unsigned int *a)
> {
>   if (++*a == 1)
> g();
> }
>
> GCC will currently emit a comparison with 1 by using the value
> of *a after the increment. This can be improved by comparing
> against 0 and using the value before the increment. As a result
> there is a potentially shorter dependancy chain (no need to wait
> for the result of +1) and on targets with compare zero instructions
> the generated code is one instruction shorter.

The downside is we now need two registers and their lifetime overlaps.

Your patch mixes changing / inverting a parameter (which seems unneeded
for the actual change) with preferring compares against zero.

What's the reason to specifically prefer compares against zero?  On x86
we have add that sets flags, so ++*a == 0 would be preferred, but
for your sequence we'd need a test reg, reg; branch on zero, so we do
not save any instruction.

We do have quite some number of bugreports with regards to making VRPs
life harder when splitting things this way.  It's easier for VRP to handle

  _1 = _2 + 1;
  if (_1 == 1)

than it is

  _1 = _2 + 1;
  if (_2 == 0)

where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
the life-range issue there's other side-effects as well.  Maybe ranger meanwhile
can handle the above case?

What's the overall effect of the change on a larger code base?

Thanks,
Richard.

>
> Example from Aarch64:
>
> Before
> ldr w1, [x0]
> add w1, w1, 1
> str w1, [x0]
> cmp w1, 1
> beq .L4
> ret
>
> After
> ldr w1, [x0]
> add w2, w1, 1
> str w2, [x0]
> cbz w1, .L4
> ret
>
> gcc/ChangeLog:
>
> * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> (forward_propagate_into_comparison_1): Optimize
> for zero comparisons.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/tree-ssa-forwprop.cc | 41 +++-
>  1 file changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> index e34f0888954..93d5043821b 100644
> --- a/gcc/tree-ssa-forwprop.cc
> +++ b/gcc/tree-ssa-forwprop.cc
> @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
>  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> the folded result in a form suitable for COND_EXPR_COND or
> NULL_TREE, if there is no suitable simplified form.  If
> -   INVARIANT_ONLY is true only gimple_min_invariant results are
> -   considered simplified.  */
> +   ALWAYS_COMBINE is false then only combine it the resulting
> +   expression is gimple_min_invariant or considered simplified
> +   compared to the original.  */
>
>  static tree
>  combine_cond_expr_cond (gimple *stmt, enum tree_code code, tree type,
> -   tree op0, tree op1, bool invariant_only)
> +   tree op0, tree op1, bool always_combine)
>  {
>tree t;
>
> @@ -398,17 +399,31 @@ combine_cond_expr_cond (gimple *stmt, enum tree_code 
> code, tree type,
>/* Canonicalize the combined condition for use in a COND_EXPR.  */
>t = canonicalize_cond_expr_cond (t);
>
> -  /* Bail out if we required an invariant but didn't get one.  */
> -  if (!t || (invariant_only && !is_gimple_min_invariant (t)))
> +  if (!t)
>  {
>fold_undefer_overflow_warnings (false, NULL, 0);
>return NULL_TREE;
>  }
>
> -  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> -  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> +  if (always_combine || is_gimple_min_invariant (t))
> +{
> +  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> +  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> +  return t;
> +}
>
> -  return t;
> +  /* If the result of folding is a zero comparison treat it preferentially.  
> */
> +  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison
> +  && integer_zerop (TREE_OPERAND (t, 1))
> +  && !integer_zerop (op1))
> +{
> +  bool nowarn = warning_suppressed_p (stmt, OPT_Wstrict_overflow);
> +  fold_undefer_overflow_warnings (!nowarn, stmt, 0);
> +  return t;
> +}
> +
> +  fold_undefer_overflow_warnings (false, NULL, 0);
> +  return NULL_TREE;
>  }
>
>  /* Combine the comparison OP0 CODE OP1 at LOC with the defining statements
> @@ -432,7 +447,7 @@ forward_propagate_into_comparison_1 (gimple *stmt,
>if (def_stmt && can_propagate_from (def_stmt))
> {
>   enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> - bool invariant_only_p = !single_use0_p;
> + bool always_combine = single_use0_p;
>
>   rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
>
> @@ -442,10 +457,10 @@ forward_propagate_into_comparison_1 (gimple *stmt,
>&& TREE_CODE (TREE_TYPE (TREE_OPERAND (rhs0, 0)))
>  

[PATCH] cgraphclones: Fix up target_clones cloning of functions with vector arguments [PR105554]

2023-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

multiple_target.cc is the only caller of create_version_clone_with_body
which calls it with non-NULL target_attributes.  The attributes are
finalized soon after new_decl is created and then tree_function_versioning
is called.  This temporarily sets DECL_RESULT and DECL_ARGUMENTS of
new_decl to the ones of old_decl, after all usually we don't have the
arguments yet and then it initializes cfun for the new function, so that
we can later e.g. remap the arguments which needs the inlining
infrastructure and dest_cfun in id.  Now, initialize_cfun calls
allocate_function which with the new cfun pushed does:
4845  /* Now that we have activated any function-specific attributes
4846 that might affect layout, particularly vector modes, 
relayout
4847 each of the parameters and the result.  */
4848  relayout_decl (result);
4849  for (tree parm = DECL_ARGUMENTS (fndecl); parm;
4850   parm = DECL_CHAIN (parm))
4851relayout_decl (parm);
Normally that is the correct thing to do, but because in this case
DECL_RESULT and DECL_ARGUMENTs are those from old_decl until we change it,
the above actually could have changed DECL_MODE of DECL_RESULT and
DECL_ARGUMENTS if any of that has vector type and whether a vector_type_mode
differs between the old_decl and new_decl enabled ISAs.

I don't have an idea how to cleanly resolve this on the
tree_function_versioning side, plus most of the time this isn't a problem
because usually tree_function_versioning works between old_decl and new_decl
with same target attributes.  So, the following patch instead fixes it up
afterwards, doing the relayout_decl again on old_decl to restore it back.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-17  Jakub Jelinek  

PR target/105554
* cgraphclones.cc: Include stor-layout.h.
(cgraph_node::create_version_clone_with_body): If target_attributes,
relayout DECL_RESULT and DECL_ARGUMENTS of old_decl back after
tree_function_versioning.

* gcc.target/i386/pr105554.c: New test.

--- gcc/cgraphclones.cc.jj  2023-02-24 11:05:19.704595633 +0100
+++ gcc/cgraphclones.cc 2023-03-16 19:46:14.592006501 +0100
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.
 #include "ipa-fnsummary.h"
 #include "symtab-thunks.h"
 #include "symtab-clones.h"
+#include "stor-layout.h"
 
 /* Create clone of edge in the node N represented by CALL_EXPR
the callgraph.  */
@@ -1094,6 +1095,19 @@ cgraph_node::create_version_clone_with_b
   || in_lto_p)
 new_version_node->unique_name = true;
 
+  if (target_attributes)
+{
+  /* tree_function_versioning temporarily copies old_decl's DECL_RESULT
+and DECL_ARGUMENTS to new_decl, then allocate_struct_function
+which will relayout_decl those.  Relayout them back.  */
+  push_cfun (DECL_STRUCT_FUNCTION (old_decl));
+  relayout_decl (DECL_RESULT (old_decl));
+  for (tree parm = DECL_ARGUMENTS (old_decl);
+  parm; parm = DECL_CHAIN (parm))
+   relayout_decl (parm);
+  pop_cfun ();
+}
+
   /* Update the call_expr on the edges to call the new version node. */
   update_call_expr (new_version_node);
 
--- gcc/testsuite/gcc.target/i386/pr105554.c.jj 2023-03-16 19:50:58.126884823 
+0100
+++ gcc/testsuite/gcc.target/i386/pr105554.c2023-03-16 19:50:25.031365400 
+0100
@@ -0,0 +1,10 @@
+/* PR target/105554 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-psabi -mno-sse3" } */
+
+typedef long long v4di __attribute__((__vector_size__(32)));
+
+__attribute__((target_clones ("arch=core-avx2", "default"))) void
+foo (v4di x)
+{
+}

Jakub



[PATCH] testsuite: Fix up forwprop-39.c testcase [PR109145]

2023-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

As written in the PR, newlib headers aren't C11 compliant in that they
don't define CMPLXF macro, and glibc before 2.16 doesn't define that
either.  I think it is easier to use __builtin_complex directly, over
another patch which keeps including complex.h but defines CMPLXF if it
isn't defined, we want to test how forwprop behaves rather than what
complex.h defines or doesn't define.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-03-17  Jakub Jelinek  

PR testsuite/109145
* gcc.dg/tree-ssa/forwprop-39.c: Don't include complex.h.
(foo): Use __builtin_complex rather than CMPLXF.

--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-39.c.jj  2023-03-13 
10:18:59.545433477 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-39.c 2023-03-16 18:49:40.563504504 
+0100
@@ -1,14 +1,12 @@
 /* { dg-do compile } */
 /* { dg-options "-std=c11 -O2 -fdump-tree-forwprop1 -fdump-tree-optimized" } */
 
-#include 
-
 extern void push1(void *p, float _Complex x);
 void foo (void *q, float _Complex *x)
 {
   float r = __real *x;
   float i = __imag *x;
-  push1 (q, CMPLXF (r, i));
+  push1 (q, __builtin_complex (r, i));
 }
 
 /* { dg-final { scan-tree-dump-not "COMPLEX_EXPR" "forwprop1" } } */

Jakub



[PATCH] c, ubsan: Instrument even shortened divisions [PR109151]

2023-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

On the following testcase, the C FE decides to shorten the division because
it has a guarantee that INT_MIN / -1 division won't be encountered, the
first operand is widened from narrower unsigned and/or the second operand is
a constant other than all ones (in this case both are true).
The problem is that the narrower type in this case is _Bool and
ubsan_instrument_division only instruments it if op0's type is INTEGER_TYPE
or REAL_TYPE.  Strangely this doesn't happen in C++ FE.
Anyway, we only shorten divisions if the INT_MIN / -1 case is impossible,
so I think we should be fine even with -fstrict-enums in C++ in case it
shortened to ENUMERAL_TYPEs.

The following patch just instruments those on the ubsan_instrument_division
side.  Perhaps only the first hunk and testcase might be needed because
we shouldn't shorten if the other case could be triggered.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk,
or shall I omit the second hunk?

2023-03-17  Jakub Jelinek  

PR c/109151
* c-ubsan.cc (ubsan_instrument_division): Handle all scalar integral
types rather than just INTEGER_TYPE.

* c-c++-common/ubsan/div-by-zero-8.c: New test.

--- gcc/c-family/c-ubsan.cc.jj  2023-02-28 11:38:28.965868044 +0100
+++ gcc/c-family/c-ubsan.cc 2023-03-16 09:49:51.651126302 +0100
@@ -53,7 +53,7 @@ ubsan_instrument_division (location_t lo
   op0 = unshare_expr (op0);
   op1 = unshare_expr (op1);
 
-  if (TREE_CODE (type) == INTEGER_TYPE
+  if (INTEGRAL_TYPE_P (type)
   && sanitize_flags_p (SANITIZE_DIVIDE))
 t = fold_build2 (EQ_EXPR, boolean_type_node,
 op1, build_int_cst (type, 0));
@@ -68,7 +68,7 @@ ubsan_instrument_division (location_t lo
 t = NULL_TREE;
 
   /* We check INT_MIN / -1 only for signed types.  */
-  if (TREE_CODE (type) == INTEGER_TYPE
+  if (INTEGRAL_TYPE_P (type)
   && sanitize_flags_p (SANITIZE_SI_OVERFLOW)
   && !TYPE_UNSIGNED (type))
 {
--- gcc/testsuite/c-c++-common/ubsan/div-by-zero-8.c.jj 2023-03-16 
10:01:31.626824994 +0100
+++ gcc/testsuite/c-c++-common/ubsan/div-by-zero-8.c2023-03-16 
10:03:05.510443440 +0100
@@ -0,0 +1,14 @@
+/* PR c/109151 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=integer-divide-by-zero -Wno-div-by-zero 
-fno-sanitize-recover=integer-divide-by-zero" } */
+/* { dg-shouldfail "ubsan" } */
+
+int d;
+
+int
+main ()
+{
+  d = ((short) (d == 1 | d > 9)) / 0;
+}
+
+/* { dg-output "division by zero" } */

Jakub



[committed] openmp: Fix up handling of doacross loops with noreturn body in loops [PR108685]

2023-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch fixes an ICE with doacross loops which have a single entry
no exit body, at least one of the ordered > collapse loops isn't guaranteed to
have at least one iteration and the whole doacross loop is inside some other 
loop.
The OpenMP constructs aren't represented by struct loop until the omp 
expansions,
so for a normal doacross loop which doesn't have a noreturn body the entry_bb
with the GOMP_FOR statement and the first bb of the body typically have the
same loop_father, and if the doacross loop isn't inside of some other loop
and the body is noreturn as well, both are part of loop 0.  The problematic
case is when the entry_bb is inside of some deeper loop, but the body, because
it falls through into EXIT, has loop 0 as loop_father.  l0_bb is created by
splitting the entry_bb fallthru edge into l1_bb, and because the two basic 
blocks
have different loop_father, a common loop is found for those (which is loop 0).
Now, if the doacross loop has collapse == ordered or all the ordered > collapse
loops are guaranteed to iterate at least once, all is still fine, because all
enter the l1_bb (body), which doesn't return and so doesn't loop further either.
But, if one of those loops could loop 0 times, the user written body wouldn't be
reached at all, so unlike the expectations the whole construct actually wouldn't
be noreturn if entry_bb is encountered and decides to handle at least one
iteration.

In this case, we need to fix up, move the l0_bb into the same loop as entry_bb
(initially) and for the extra added loops put them as children of that same
loop, rather than of loop 0.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-03-17  Jakub Jelinek  

PR middle-end/108685
* omp-expand.cc (expand_omp_for_ordered_loops): Add L0_BB argument,
use its loop_father rather than BODY_BB's loop_father.
(expand_omp_for_generic): Adjust expand_omp_for_ordered_loops caller.
If broken_loop with ordered > collapse and at least one of those
extra loops aren't guaranteed to have at least one iteration, change
l0_bb's loop_father to entry_bb's loop_father.  Set cont_bb's
loop_father to l0_bb's loop_father rather than l1_bb's.

* c-c++-common/gomp/doacross-8.c: New test.

--- gcc/omp-expand.cc.jj2023-02-16 10:13:01.0 +0100
+++ gcc/omp-expand.cc   2023-03-16 17:19:46.180181287 +0100
@@ -3674,7 +3674,7 @@ expand_omp_ordered_source_sink (struct o
 static basic_block
 expand_omp_for_ordered_loops (struct omp_for_data *fd, tree *counts,
  basic_block cont_bb, basic_block body_bb,
- bool ordered_lastprivate)
+ basic_block l0_bb, bool ordered_lastprivate)
 {
   if (fd->ordered == fd->collapse)
 return cont_bb;
@@ -3783,7 +3783,7 @@ expand_omp_for_ordered_loops (struct omp
  class loop *loop = alloc_loop ();
  loop->header = new_header;
  loop->latch = e2->src;
- add_loop (loop, body_bb->loop_father);
+ add_loop (loop, l0_bb->loop_father);
}
 }
 
@@ -4481,9 +4481,15 @@ expand_omp_for_generic (struct omp_regio
}
  if (i < fd->ordered)
{
+ if (entry_bb->loop_father != l0_bb->loop_father)
+   {
+ remove_bb_from_loops (l0_bb);
+ add_bb_to_loop (l0_bb, entry_bb->loop_father);
+ gcc_assert (single_succ (l0_bb) == l1_bb);
+   }
  cont_bb
= create_empty_bb (EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb);
- add_bb_to_loop (cont_bb, l1_bb->loop_father);
+ add_bb_to_loop (cont_bb, l0_bb->loop_father);
  gimple_stmt_iterator gsi = gsi_after_labels (cont_bb);
  gimple *g = gimple_build_omp_continue (fd->loop.v, fd->loop.v);
  gsi_insert_before (, g, GSI_SAME_STMT);
@@ -4495,7 +4501,7 @@ expand_omp_for_generic (struct omp_regio
}
   expand_omp_ordered_source_sink (region, fd, counts, cont_bb);
   cont_bb = expand_omp_for_ordered_loops (fd, counts, cont_bb, l1_bb,
- ordered_lastprivate);
+ l0_bb, ordered_lastprivate);
   if (counts[fd->collapse - 1])
{
  gcc_assert (fd->collapse == 1);
--- gcc/testsuite/c-c++-common/gomp/doacross-8.c.jj 2023-03-16 
17:57:32.070167415 +0100
+++ gcc/testsuite/c-c++-common/gomp/doacross-8.c2023-03-16 
17:56:21.313199748 +0100
@@ -0,0 +1,17 @@
+/* PR middle-end/108685 */
+/* { dg-do compile } */
+
+void
+foo (int a)
+{
+  for (int m = 0; m < 10; m++)
+#pragma omp for collapse(2) ordered(4)
+for (int i = 0; i < 2; i++)
+  for (int j = 0; j < a; j++)
+   for (int k = 0; k < 2; k++)
+ for (int l = 0; l < a; l++)
+   {
+ #pragma omp ordered depend (source)
+