date:20210625

Re: [PATCH 3/7] Duplicate the range information of the phi onto the new ssa_name

2021-06-25 Thread Andrew Pinski via Gcc-patches

On Thu, Jun 24, 2021 at 8:17 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > Since match_simplify_replacement uses gimple_simplify, there is a new
> > ssa name created sometimes and then we go and replace the phi edge with
> > this new ssa name, the range information on the phi is lost.
> > Placing this in replace_phi_edge_with_variable is the best option instead
> > of doing it in each time replace_phi_edge_with_variable is called which is
> > what is done today.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range
> >   info if we're the only things setting the target PHI.
> >   (value_replacement): Don't duplicate range here.
> >   (minmax_replacement): Likewise.
> > ---
> >   gcc/tree-ssa-phiopt.c | 35 ++-
> >   1 file changed, 18 insertions(+), 17 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> > index 24cbce9955a..147397ad82c 100644
> > --- a/gcc/tree-ssa-phiopt.c
> > +++ b/gcc/tree-ssa-phiopt.c
> > @@ -391,6 +391,24 @@ replace_phi_edge_with_variable (basic_block cond_block,
> > basic_block bb = gimple_bb (phi);
> > basic_block block_to_remove;
> > gimple_stmt_iterator gsi;
> > +  tree phi_result = PHI_RESULT (phi);
> > +
> > +  /* Duplicate range info as needed.  */
> > +  if (TREE_CODE (new_tree) == SSA_NAME
> > +  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
> > +  && INTEGRAL_TYPE_P (TREE_TYPE (phi_result)))
> > +{
> > +  if (!SSA_NAME_RANGE_INFO (new_tree)
> > +   && SSA_NAME_RANGE_INFO (phi_result))
> > + duplicate_ssa_name_range_info (new_tree,
> > +SSA_NAME_RANGE_TYPE (phi_result),
> > +SSA_NAME_RANGE_INFO (phi_result));
> > +  if (SSA_NAME_RANGE_INFO (new_tree)
> > +   && !SSA_NAME_RANGE_INFO (phi_result))
> > + duplicate_ssa_name_range_info (phi_result,
> > +SSA_NAME_RANGE_TYPE (new_tree),
> > +SSA_NAME_RANGE_INFO (new_tree));
> > +}
> I'm not sure you need the EDGE_COUNT test here.

I think we need it. See below for the reason why.

>
> I worry that copying the range info from the argument to the PHI result
> isn't right.  It was OK for the ABS replacement, but I'm not sure that's
> true in general.

Right, I had originally placed the code that was done in
minmax_replacement in match_simplify_replacement.
But Richi wanted me to move the code to replace_phi_edge_with_variable
and have it be able to go both directions for the range.
The reason why the check for the number of incoming edges is needed is
because if we start with:

Range[xyz]
c_1 = PHI 

And we are replacing the variable for edges 4 and 5 only, then we
would get the wrong range on the new SSA name as it would include the
range of a_2 which is not correct.
Also we would not lose the range information in that case either.
This is more about losing the range information.

In the case where this shows up is we have:
bb1:
if (a_1 >= 255) goto bb2; else goto bb3;

bb2:
 a_2 = 255;
goto bb3;

range<-INF,255>
a_3 = PHI

Which gets translated into
bb1:
_4 = min
goto bb2

range<-INF,255>
a_3 = PHI<_4(1)>
bb3:

use(a_3)

And when cfg-cleanup merges the two basic blocks, _4 is propagated
into where a_3 was used and the range information is now gone.

So the idea is to propagate the range info from a_3 back on to _4 if
we know that they propagated later on.
The opposite way (range of _4 on to a_3) actually does not really
matter as we are going to throw away the PHI node anyways.

Thanks,
Andrew Pinski

>
> Jeff
>
>
> >
> > /* Change the PHI argument to new.  */
> > SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree);
> > @@ -1385,16 +1403,6 @@ value_replacement (basic_block cond_bb, basic_block 
> > middle_bb,
> >  :
> >  # u_3 = PHI   */
> > reset_flow_sensitive_info (lhs);
> > -  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
> > - {
> > -   /* If available, we can use VR of phi result at least.  */
> > -   tree phires = gimple_phi_result (phi);
> > -   struct range_info_def *phires_range_info
> > - = SSA_NAME_RANGE_INFO (phires);
> > -   if (phires_range_info)
> > - duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires),
> > -phires_range_info);
> > - }
> > gimple_stmt_iterator gsi_from;
> > for (int i = prep_cnt - 1; i >= 0; --i)
> >   {
> > @@ -1794,13 +1802,6 @@ minmax_replacement (basic_block cond_bb, basic_block 
> > middle_bb,
> > gimple_seq stmts = NULL;
> > tree phi_result = PHI_RESULT (phi);
> > result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, 
> > arg1);
> > -  /* Duplicate ran

[pushed] c++: constexpr aggr init of empty class [PR101040]

2021-06-25 Thread Jason Merrill via Gcc-patches

This is basically the aggregate initializer version of PR97566; as in that
bug, we are trying to initialize empty field 'obj' in 'single' when there's
no CONSTRUCTOR entry for the 'single' base class subobject of 'derived'.  As
with that bug, the fix is to stop trying to add entries for empty fields,
this time in cxx_eval_bare_aggregate.

The change to is_empty_field isn't necessary for this version of
the patch, but seems worthwhile for robustness anyway.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101040
PR c++/97566

gcc/cp/ChangeLog:

* class.c (is_empty_field): Handle null argument.
* constexpr.c (cxx_eval_bare_aggregate): Discard initializer
for empty field.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/no_unique_address13.C: New test.
---
 gcc/cp/class.c|  2 +-
 gcc/cp/constexpr.c|  9 ++-
 .../g++.dg/cpp2a/no_unique_address13.C| 24 +++
 3 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/no_unique_address13.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index c89ffadcef8..33093e1e1ef 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4220,7 +4220,7 @@ field_poverlapping_p (tree decl)
 bool
 is_empty_field (tree decl)
 {
-  if (TREE_CODE (decl) != FIELD_DECL)
+  if (!decl || TREE_CODE (decl) != FIELD_DECL)
 return false;
 
   bool r = (is_empty_class (TREE_TYPE (decl))
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index a26aead6dc2..4cd9db33a1a 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4449,7 +4449,12 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree 
t,
   FOR_EACH_CONSTRUCTOR_ELT (v, i, index, value)
 {
   tree orig_value = value;
-  init_subob_ctx (ctx, new_ctx, index, value);
+  /* Like in cxx_eval_store_expression, omit entries for empty fields.  */
+  bool no_slot = TREE_CODE (type) == RECORD_TYPE && is_empty_field (index);
+  if (no_slot)
+   new_ctx = *ctx;
+  else
+   init_subob_ctx (ctx, new_ctx, index, value);
   int pos_hint = -1;
   if (new_ctx.ctor != ctx->ctor)
{
@@ -4495,6 +4500,8 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
  gcc_assert (is_empty_class (TREE_TYPE (TREE_TYPE (index;
  changed = true;
}
+  else if (no_slot)
+   changed = true;
   else
{
  if (TREE_CODE (type) == UNION_TYPE
diff --git a/gcc/testsuite/g++.dg/cpp2a/no_unique_address13.C 
b/gcc/testsuite/g++.dg/cpp2a/no_unique_address13.C
new file mode 100644
index 000..66b83d68137
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/no_unique_address13.C
@@ -0,0 +1,24 @@
+// PR c++/101040
+// { dg-do compile { target c++11 } }
+
+// This class has to be empty.
+struct empty
+{};
+
+// This class has to be empty.
+struct single
+{
+// This member has to be no_unique_address.
+[[no_unique_address]] empty obj;
+};
+
+// This class has to be empty and derived from single.
+struct derived : single
+{
+// This constructor has to be constexpr and take a forwarding reference.
+template 
+constexpr derived(Arg&& arg) : single{arg}
+{}
+};
+
+auto obj = derived{empty{}};

base-commit: 05516402f8eb8ec282a13fa1e38d9f9c5b3dd3e5
prerequisite-patch-id: 5514b2574504cceea4064fa53a3a97d77c936e66
-- 
2.27.0

[wwwdocs] Update C++ DR table

2021-06-25 Thread Marek Polacek via Gcc-patches

It's been a minute since I've updated our C++ DR table, so this patch
adds news DRs.  I've also written a script that check each DR's status
against the upstream document.  That revealed that many entries in our
table were out of sync, so I've gone through all of them and fixed them.

I've also used "iconv -f UTF-8 -t ASCII//TRANSLIT" to convert all quotes
to ASCII ".

Validated, pushed.

commit 7220e26862b78df86d77a55f514d110c93d230c6
Author: Marek Polacek 
Date:   Fri Jun 25 20:06:43 2021 -0400

cxx-dr-status.html: Add new DRs, update status of older ones

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index fc4864d7..8839a03b 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -15,7 +15,7 @@
 
   This table tracks the implementation status of C++ defect reports in GCC.
   It is based on C++ Standard Core Language Issue Table of Contents, Revision
-  102 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html";>here).
+  104 (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html";>here).
 
   
 
@@ -64,7 +64,8 @@
 
   https://wg21.link/cwg6";>6
   open
-  Should the optimization that allows a class object to alias another 
object also allow the case of a parameter in an inline function to alias its 
argument?
+  Should the optimization that allows a class object to alias another 
+ object also allow the case of a parameter in an inline function to 
alias its argument?
   -
   
 
@@ -271,11 +272,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg36";>36
-  open
+  DRWP
   using-declarations in multiple-declaration contexts
-  -
+  ?
   
 
 
@@ -789,11 +790,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg110";>110
-  open
+  DRWP
   Can template functions and classes be declared in the same 
scope?
-  -
+  ?
   
 
 
@@ -985,11 +986,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg138";>138
-  drafting
+  DRWP
   Friend declaration name lookup
-  -
+  ?
   
 
 
@@ -,11 +1112,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg156";>156
-  drafting
+  NAD
   Name lookup for conversion functions
-  -
+  N/A
   
 
 
@@ -1356,18 +1357,18 @@
   ?
   
 
-
+
   https://wg21.link/cwg191";>191
-  open
+  DRWP
   Name lookup does not handle complex nesting
-  -
+  ?
   
 
-
+
   https://wg21.link/cwg192";>192
-  drafting
+  NAD
   Name lookup in parameters
-  -
+  N/A
   
 
 
@@ -1394,7 +1395,7 @@
 
   https://wg21.link/cwg196";>196
   open
-  Arguments to deallocation functions
+  Arguments to deallocation functions 
   -
   
 
@@ -1807,11 +1808,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg255";>255
-  drafting
+  DRWP
   Placement deallocation functions and lookup ambiguity
-  -
+  ?
   
 
 
@@ -1919,11 +1920,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg271";>271
-  open
+  DRWP
   Explicit instantiation and template argument deduction
-  -
+  ?
   
 
 
@@ -1968,18 +1969,18 @@
   ?
   
 
-
+
   https://wg21.link/cwg278";>278
-  open
+  NAD
   External linkage and nameless entities
-  -
+  N/A
   
 
 
   https://wg21.link/cwg279";>279
-  open
+  DRWP
   Correspondence of "names for linkage purposes"
-  -
+  ?
   
 
 
@@ -2101,11 +2102,11 @@
   Yes
   
 
-
+
   https://wg21.link/cwg297";>297
-  open
+  NAD
   Which template does an explicit specialization specialize?
-  -
+  N/A
   
 
 
@@ -2209,7 +2210,7 @@
 
   https://wg21.link/cwg312";>312
   CD3
-  “use” of invalid pointer value not defined
+  "use" of invalid pointer value not defined
   ?
   
 
@@ -2384,15 +2385,15 @@
 
   https://wg21.link/cwg337";>337
   CD1
-  Attempt to create array of abtract type should cause deduction to 
fail
+  Attempt to create array of abstract type should cause deduction to 
fail
   ?
   
 
-
+
   https://wg21.link/cwg338";>338
-  open
+  DRWP
   Enumerator name with linkage used as class name in other translation 
unit
-  -
+  ?
   
 
 
@@ -2405,7 +2406,7 @@
 
   https://wg21.link/cwg340";>340
   NAD
-  Unclear wording in disambiguation section
+  Unclear wording in disambiguation section 
   ?
   
 
@@ -2542,11 +2543,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg360";>360
-  open
+  DRWP
   Using-declaration that reduces access
-  -
+

[committed] jit: fix test-vector-* failures

2021-06-25 Thread David Malcolm via Gcc-patches

Fix failures seen on i686 due to relying on exact floating-point
equality when testing results of vector division.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 99585d88a090b4c5b7791f7ab62f70eb37b748fa.

gcc/testsuite/ChangeLog:
* jit.dg/test-vector-rvalues.cc (check_div): Add specialization
for v4f, to avoid relying on exact floating-point equality.
* jit.dg/test-vector-types.cc (check_div): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/jit.dg/test-vector-rvalues.cc | 8 
 gcc/testsuite/jit.dg/test-vector-types.cc   | 8 
 2 files changed, 16 insertions(+)

diff --git a/gcc/testsuite/jit.dg/test-vector-rvalues.cc 
b/gcc/testsuite/jit.dg/test-vector-rvalues.cc
index ac230bf774c..02877114e4a 100644
--- a/gcc/testsuite/jit.dg/test-vector-rvalues.cc
+++ b/gcc/testsuite/jit.dg/test-vector-rvalues.cc
@@ -165,6 +165,14 @@ check_div (const V &a, const V &b, const V &c)
 CHECK_VALUE (c[i], a[i] / b[i]);
 }
 
+template <>
+void
+check_div (const v4f &a, const v4f &b, const v4f &c)
+{
+  for (int i = 0; i < 4; i++)
+CHECK_DOUBLE_VALUE (c[i], a[i] / b[i]);
+}
+
 template 
 void
 verify_vec_code (gcc_jit_context *ctxt, gcc_jit_result *result,
diff --git a/gcc/testsuite/jit.dg/test-vector-types.cc 
b/gcc/testsuite/jit.dg/test-vector-types.cc
index 3389e04a082..1f49be6b59f 100644
--- a/gcc/testsuite/jit.dg/test-vector-types.cc
+++ b/gcc/testsuite/jit.dg/test-vector-types.cc
@@ -139,6 +139,14 @@ check_div (const T &a, const T &b, const T &c)
 CHECK_VALUE (c[i], a[i] / b[i]);
 }
 
+template <>
+void
+check_div (const v4f &a, const v4f &b, const v4f &c)
+{
+  for (int i = 0; i < 4; i++)
+CHECK_DOUBLE_VALUE (c[i], a[i] / b[i]);
+}
+
 template 
 void
 verify_vec_code (gcc_jit_context *ctxt, gcc_jit_result *result,
-- 
2.26.3

[committed] jit: fix test-asm failures on i?86

2021-06-25 Thread David Malcolm via Gcc-patches

On i686, test_i386_basic_asm_4 has:
  error: inconsistent operand constraints in an 'asm'
and test_i386_basic_asm_5 has:
  /tmp/libgccjit-9FsLie/fake.s:9: Error: bad register name `%rdi'
  /tmp/libgccjit-9FsLie/fake.s:10: Error: bad register name `%rsi'

This is only intended as a smoketest of asm support, so only run
it on x86_64.

Successfully regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 176289e50e3d379b681f8775cdb0230b1e9f50dc.

gcc/testsuite/ChangeLog:
* jit.dg/test-asm.c: Remove i?86-*-* from target specifier.
* jit.dg/test-asm.cc: Likewise.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/jit.dg/test-asm.c  | 2 +-
 gcc/testsuite/jit.dg/test-asm.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/jit.dg/test-asm.c b/gcc/testsuite/jit.dg/test-asm.c
index eeeb4fe..35a9f9d8605 100644
--- a/gcc/testsuite/jit.dg/test-asm.c
+++ b/gcc/testsuite/jit.dg/test-asm.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile { target x86_64-*-* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/jit.dg/test-asm.cc b/gcc/testsuite/jit.dg/test-asm.cc
index 6f1828060ff..be487e3fb69 100644
--- a/gcc/testsuite/jit.dg/test-asm.cc
+++ b/gcc/testsuite/jit.dg/test-asm.cc
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile { target x86_64-*-* } } */
 
 #include "libgccjit++.h"
 
-- 
2.26.3

[committed] remove a spurious note from calls.c (PR 101216)

2021-06-25 Thread Martin Sebor via Gcc-patches


Yesterday's r12-1805 resulted in an informational note being
printed unconditionally, even when the warning it goes with
is disabled.  It caused an analyzer test to fail but I missed
that in my test results.  I pushed the attached fix to avoid
printing them.

Martin
PR middle-end/101216 - spurious notes for function calls

	PR middle-end/101216

gcc/ChangeLog:

	* calls.c (maybe_warn_rdwr_sizes): Use the no_warning constant.

gcc/testsuite/ChangeLog:

	* gcc.dg/Wnonnull-7.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index 27e8c451635..f8a4b79e7f8 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -2054,7 +2054,7 @@ maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
 	*sizstr = '\0';
 
   /* Set if a warning has been issued for the current argument.  */
-  opt_code arg_warned = N_OPTS;
+  opt_code arg_warned = no_warning;
   location_t loc = EXPR_LOCATION (exp);
   tree ptr = access.second.ptr;
   if (*sizstr
@@ -2080,7 +2080,7 @@ maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
 			   exp, sizidx + 1, sizstr))
 	arg_warned = OPT_Wstringop_overflow_;
 
-	  if (arg_warned != N_OPTS)
+	  if (arg_warned != no_warning)
 	{
 	  append_attrname (access, attrstr, sizeof attrstr);
 	  /* Remember a warning has been issued and avoid warning
@@ -2152,7 +2152,7 @@ maybe_warn_rdwr_sizes (rdwr_map *rwm, tree fndecl, tree fntype, tree exp)
 		arg_warned = OPT_Wnonnull;		
 	}
 
-	  if (arg_warned)
+	  if (arg_warned != no_warning)
 	{
 	  append_attrname (access, attrstr, sizeof attrstr);
 	  /* Remember a warning has been issued and avoid warning
diff --git a/gcc/testsuite/gcc.dg/Wnonnull-7.c b/gcc/testsuite/gcc.dg/Wnonnull-7.c
new file mode 100644
index 000..e7b331a904c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wnonnull-7.c
@@ -0,0 +1,15 @@
+/* PR middle-end/101216 - spurious notes for function calls
+   { dg-do compile }
+   { dg-options "-O2 -w" } */
+
+__attribute__ ((access (write_only, 1, 2))) char*
+getcwd (char *, __SIZE_TYPE__);
+
+char* f (void)
+{
+  char a[8];
+  return getcwd (0, 8);
+}
+
+/* Expect no messages of any kind on output.
+   { dg-bogus "" "" { target *-*-* } 0 } */

Re: [COMMITTED 1/2] Adjust on_entry cache to indicate if the value was set properly.

2021-06-25 Thread Martin Sebor via Gcc-patches


I just glanced at the patch out of curiosity and the first hunk
caught my eye.  There's nothing wrong with the change and I like
how you make the APIs const-correct!  Just a note that it looks
like the const in on the basic_block declaration might be missing
an underscore (it should be const_basic_block).  Otherwise it's
superfluous.  This is repeated in a bunch of places in the file
and its header.

Martin

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 4347485cf98..bdecd212691 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -132,7 +132,7 @@ non_null_ref::process_name (tree name)
 class ssa_block_ranges
 {
 public:
-  virtual void set_bb_range (const basic_block bb, const irange &r) = 0;
+  virtual bool set_bb_range (const basic_block bb, const irange &r) = 0;
   virtual bool get_bb_range (irange &r, const basic_block bb) = 0;
   virtual bool bb_range_p (const basic_block bb) = 0;

On 6/23/21 8:27 AM, Andrew MacLeod via Gcc-patches wrote:
The introduction of the sparse bitmap on-entry cache for large BB 
functions also introduced the concept that the value we ask to be 
written to the cache may not be properly represented.  It is limited to 
15 unique ranges for any given ssa-name, then  it reverts to varying for 
any additional values to be safe.


This patch adds a boolean return value to the cache set routines, 
allowing us to check if the value was properly written.


Other than that, no functional change.

Bootstrap on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-25 Thread Martin Sebor via Gcc-patches


On 6/25/21 4:11 PM, Jason Merrill wrote:

On 6/25/21 4:51 PM, Martin Sebor wrote:

On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:
On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec 
along
with a few simple tests.  It makes auto_vec safe to use in 
containers
that expect copyable and assignable element types and passes 
bootstrap

and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those 
operators?


I would strongly prefer the generic vector class to have the 
properties
expected of any other generic container: copyable and 
assignable.  If
we also want another vector type with this restriction I suggest 
to add
another "noncopyable" type and make that property explicit in 
its name.

I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign). 
Looking around

I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're 
writing C++11.



The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion 
richi mentions.  And many uses of auto_vec will pass them as vec, 
which will still do a shallow copy.  I think it's probably better 
to disable the copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rats nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing 
an auto_vec by value as a vec, and thus doing a shallow copy, would 
be to add a vec ctor taking an auto_vec, and delete that.  This would 
mean if you want to pass an auto_vec to a vec interface, it needs to 
be by reference.  We might as well do the same for operator=, though 
that isn't as important.


Thanks, that sounds like a good idea.  Attached is an implementation
of this change.  Since the auto_vec copy ctor and assignment have
been deleted by someone else in the interim, this patch doesn't
reverse that.  I will propose it separately

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-25 Thread Jason Merrill via Gcc-patches


On 6/25/21 4:51 PM, Martin Sebor wrote:

On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:
On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec 
along
with a few simple tests.  It makes auto_vec safe to use in 
containers
that expect copyable and assignable element types and passes 
bootstrap

and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those 
operators?


I would strongly prefer the generic vector class to have the 
properties
expected of any other generic container: copyable and 
assignable.  If
we also want another vector type with this restriction I suggest 
to add
another "noncopyable" type and make that property explicit in its 
name.

I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign). 
Looking around

I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're 
writing C++11.



The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion 
richi mentions.  And many uses of auto_vec will pass them as vec, 
which will still do a shallow copy.  I think it's probably better to 
disable the copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rats nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing 
an auto_vec by value as a vec, and thus doing a shallow copy, would be 
to add a vec ctor taking an auto_vec, and delete that.  This would 
mean if you want to pass an auto_vec to a vec interface, it needs to 
be by reference.  We might as well do the same for operator=, though 
that isn't as important.


Thanks, that sounds like a good idea.  Attached is an implementation
of this change.  Since the auto_vec copy ctor and assignment have
been deleted by someone else in the interim, this patch doesn't
reverse that.  I will propose it separately after these changes
are finalized.

My ap

Re: [r12-1805 Regression] FAIL: gcc.dg/analyzer/setjmp-2.c (test for excess errors) on Linux/x86_64

2021-06-25 Thread Andrew Pinski via Gcc-patches

On Fri, Jun 25, 2021 at 2:27 PM David Malcolm via Gcc-patches
 wrote:
>
> On Thu, 2021-06-24 at 22:26 -0700, sunil.k.pandey via Gcc-patches
> wrote:
> > On Linux/x86_64,
> >
> > e9e2bad7251477db92ab9ebcdc010f9282dd9890 is the first bad commit
> > commit e9e2bad7251477db92ab9ebcdc010f9282dd9890
> > Author: Martin Sebor 
> > Date:   Thu Jun 24 19:22:06 2021 -0600
> >
> > middle-end: add support for per-location warning groups.
> >
> > caused
> >
> > FAIL: gcc.dg/analyzer/setjmp-2.c (test for excess errors)
> >
> > with GCC configured with
> >
> > ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-
> > bisect-master/master/r12-1805/usr --enable-clocale=gnu --with-system-
> > zlib --with-demangler-in-ld --with-fpmath=sse --enable-
> > languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx
> > x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> > target_board='unix{-m32}'"
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> > target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> > target_board='unix{-m64}'"
> > $ cd {build_dir}/gcc && make check
> > RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> > target_board='unix{-m64\ -march=cascadelake}'"
>
> Thanks.
>
> > (Please do not reply to this email, for question about this report,
> > contact me at skpgkp2 at gmail dot com)
>
> [dropping you from the reply]
>
> I can reproduce this; it seems to be a bug in Martin's changes to
> calls.c, rather than being specific to the analyzer.
>
> I've filed it as:
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101217

It was already reported as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101216 .

Thanks,
Andrew Pinski

>
> Dave
>

Re: [r12-1805 Regression] FAIL: gcc.dg/analyzer/setjmp-2.c (test for excess errors) on Linux/x86_64

2021-06-25 Thread David Malcolm via Gcc-patches

On Thu, 2021-06-24 at 22:26 -0700, sunil.k.pandey via Gcc-patches
wrote:
> On Linux/x86_64,
> 
> e9e2bad7251477db92ab9ebcdc010f9282dd9890 is the first bad commit
> commit e9e2bad7251477db92ab9ebcdc010f9282dd9890
> Author: Martin Sebor 
> Date:   Thu Jun 24 19:22:06 2021 -0600
> 
>     middle-end: add support for per-location warning groups.
> 
> caused
> 
> FAIL: gcc.dg/analyzer/setjmp-2.c (test for excess errors)
> 
> with GCC configured with
> 
> ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-
> bisect-master/master/r12-1805/usr --enable-clocale=gnu --with-system-
> zlib --with-demangler-in-ld --with-fpmath=sse --enable-
> languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx
> x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="analyzer.exp=gcc.dg/analyzer/setjmp-2.c --
> target_board='unix{-m64\ -march=cascadelake}'"

Thanks.

> (Please do not reply to this email, for question about this report,
> contact me at skpgkp2 at gmail dot com)

[dropping you from the reply]

I can reproduce this; it seems to be a bug in Martin's changes to
calls.c, rather than being specific to the analyzer.

I've filed it as:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101217

Dave

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-25 Thread Martin Sebor via Gcc-patches


On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec along
with a few simple tests.  It makes auto_vec safe to use in 
containers
that expect copyable and assignable element types and passes 
bootstrap

and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those 
operators?


I would strongly prefer the generic vector class to have the 
properties

expected of any other generic container: copyable and assignable.  If
we also want another vector type with this restriction I suggest 
to add
another "noncopyable" type and make that property explicit in its 
name.

I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign). 
Looking around

I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're writing 
C++11.



The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion 
richi mentions.  And many uses of auto_vec will pass them as vec, 
which will still do a shallow copy.  I think it's probably better to 
disable the copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rats nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing an 
auto_vec by value as a vec, and thus doing a shallow copy, would be to 
add a vec ctor taking an auto_vec, and delete that.  This would mean if 
you want to pass an auto_vec to a vec interface, it needs to be by 
reference.  We might as well do the same for operator=, though that 
isn't as important.


Thanks, that sounds like a good idea.  Attached is an implementation
of this change.  Since the auto_vec copy ctor and assignment have
been deleted by someone else in the interim, this patch doesn't
reverse that.  I will propose it separately after these changes
are finalized.

My approach was to 1) disable the auto_vec to v

Re: [PATCH v2] c++: Failure to delay noexcept parsing with ptr-operator [PR100752]

2021-06-25 Thread Jason Merrill via Gcc-patches


On 6/25/21 4:42 PM, Marek Polacek wrote:

On Thu, Jun 10, 2021 at 10:31:33PM -0400, Jason Merrill wrote:

On 6/10/21 5:19 PM, Marek Polacek wrote:

On Thu, Jun 10, 2021 at 03:09:29PM -0400, Jason Merrill wrote:

On 6/8/21 8:25 PM, Marek Polacek wrote:

We weren't passing 'flags' to the recursive call to cp_parser_declarator
in the ptr-operator case and as an effect, delayed parsing of noexcept
didn't work as advertised.  The following change passes more than just
CP_PARSER_FLAGS_DELAY_NOEXCEPT but that doesn't seem to break anything.

I'm not passing member_p because I don't need it and because it breaks
a few tests.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/branches?

PR c++/100752

gcc/cp/ChangeLog:

* parser.c (cp_parser_declarator): Pass flags down to
cp_parser_declarator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept69.C: New test.
---
gcc/cp/parser.c |  3 +--
gcc/testsuite/g++.dg/cpp0x/noexcept69.C | 12 
2 files changed, 13 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept69.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d59a829d0b9..5930990ec1c 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -22066,8 +22066,7 @@ cp_parser_declarator (cp_parser* parser,
cp_parser_parse_tentatively (parser);
  /* Parse the dependent declarator.  */
-  declarator = cp_parser_declarator (parser, dcl_kind,
-CP_PARSER_FLAGS_NONE,
+  declarator = cp_parser_declarator (parser, dcl_kind, flags,
 /*ctor_dtor_or_conv_p=*/NULL,
 /*parenthesized_p=*/NULL,
 /*member_p=*/false,


Should the other parameters also be passed down?  I'd think definitely
member_p and static_p, not sure about ctor_dtor_or_conv_p and
parenthesized_p.


Hmm, as I mentioned in the patch description, I tried, but passing member_p
broke a few tests and since it's not needed for this fix I gave up
investigating why.  I could look into it if you're curious :).


Please.


Turns out those were just trivial changes to the expected error messages.
The following patch passes member_p and static_p too:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
We weren't passing 'flags' to the recursive call to cp_parser_declarator
in the ptr-operator case and as an effect, delayed parsing of noexcept
didn't work as advertised.  The following change passes more than just
CP_PARSER_FLAGS_DELAY_NOEXCEPT but that doesn't seem to break anything.

I'm now also passing member_p and static_p, as a consequence, two tests
needed small tweaks.

PR c++/100752

gcc/cp/ChangeLog:

* parser.c (cp_parser_declarator): Pass flags down to
cp_parser_declarator.  Also pass static_p/member_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept69.C: New test.
* g++.dg/parse/saved1.C: Adjust dg-error.
* g++.dg/template/crash50.C: Likewise.
---
  gcc/cp/parser.c |  6 ++
  gcc/testsuite/g++.dg/cpp0x/noexcept69.C | 12 
  gcc/testsuite/g++.dg/parse/saved1.C |  4 ++--
  gcc/testsuite/g++.dg/template/crash50.C |  2 +-
  4 files changed, 17 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept69.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 096580e7e50..02daa7a6f6a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -22170,12 +22170,10 @@ cp_parser_declarator (cp_parser* parser,
cp_parser_parse_tentatively (parser);
  
/* Parse the dependent declarator.  */

-  declarator = cp_parser_declarator (parser, dcl_kind,
-CP_PARSER_FLAGS_NONE,
+  declarator = cp_parser_declarator (parser, dcl_kind, flags,
 /*ctor_dtor_or_conv_p=*/NULL,
 /*parenthesized_p=*/NULL,
-/*member_p=*/false,
-friend_p, /*static_p=*/false);
+member_p, friend_p, static_p);
  
/* If we are parsing an abstract-declarator, we must handle the

 case where the dependent declarator is absent.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept69.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept69.C
new file mode 100644
index 000..9b87ba0cafb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept69.C
@@ -0,0 +1,12 @@
+// PR c++/100752
+// { dg-do compile { target c++11 } }
+
+struct S {
+  void f() noexcept {}
+  S &g() noexcept(noexcept(f())) { f(); return *this; }
+};
+
+struct X {
+  int& f() noexcept(noexcept(i));
+  int i;
+};
diff --git a/gcc/testsuite/g++.dg/parse/saved1.C 
b/gcc/testsuite/g++.dg/parse/saved1.C
index 979a05676d2..1deaa93f516 100644
--- a/gcc/testsu

Re: [PATCH v2] c++: Failure to delay noexcept parsing with ptr-operator [PR100752]

2021-06-25 Thread Marek Polacek via Gcc-patches

On Thu, Jun 10, 2021 at 10:31:33PM -0400, Jason Merrill wrote:
> On 6/10/21 5:19 PM, Marek Polacek wrote:
> > On Thu, Jun 10, 2021 at 03:09:29PM -0400, Jason Merrill wrote:
> > > On 6/8/21 8:25 PM, Marek Polacek wrote:
> > > > We weren't passing 'flags' to the recursive call to cp_parser_declarator
> > > > in the ptr-operator case and as an effect, delayed parsing of noexcept
> > > > didn't work as advertised.  The following change passes more than just
> > > > CP_PARSER_FLAGS_DELAY_NOEXCEPT but that doesn't seem to break anything.
> > > > 
> > > > I'm not passing member_p because I don't need it and because it breaks
> > > > a few tests.
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/branches?
> > > > 
> > > > PR c++/100752
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.c (cp_parser_declarator): Pass flags down to
> > > > cp_parser_declarator.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp0x/noexcept69.C: New test.
> > > > ---
> > > >gcc/cp/parser.c |  3 +--
> > > >gcc/testsuite/g++.dg/cpp0x/noexcept69.C | 12 
> > > >2 files changed, 13 insertions(+), 2 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept69.C
> > > > 
> > > > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> > > > index d59a829d0b9..5930990ec1c 100644
> > > > --- a/gcc/cp/parser.c
> > > > +++ b/gcc/cp/parser.c
> > > > @@ -22066,8 +22066,7 @@ cp_parser_declarator (cp_parser* parser,
> > > > cp_parser_parse_tentatively (parser);
> > > >  /* Parse the dependent declarator.  */
> > > > -  declarator = cp_parser_declarator (parser, dcl_kind,
> > > > -CP_PARSER_FLAGS_NONE,
> > > > +  declarator = cp_parser_declarator (parser, dcl_kind, flags,
> > > >  /*ctor_dtor_or_conv_p=*/NULL,
> > > >  /*parenthesized_p=*/NULL,
> > > >  /*member_p=*/false,
> > > 
> > > Should the other parameters also be passed down?  I'd think definitely
> > > member_p and static_p, not sure about ctor_dtor_or_conv_p and
> > > parenthesized_p.
> > 
> > Hmm, as I mentioned in the patch description, I tried, but passing member_p
> > broke a few tests and since it's not needed for this fix I gave up
> > investigating why.  I could look into it if you're curious :).
> 
> Please.

Turns out those were just trivial changes to the expected error messages.
The following patch passes member_p and static_p too:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
We weren't passing 'flags' to the recursive call to cp_parser_declarator
in the ptr-operator case and as an effect, delayed parsing of noexcept
didn't work as advertised.  The following change passes more than just
CP_PARSER_FLAGS_DELAY_NOEXCEPT but that doesn't seem to break anything.

I'm now also passing member_p and static_p, as a consequence, two tests
needed small tweaks.

PR c++/100752

gcc/cp/ChangeLog:

* parser.c (cp_parser_declarator): Pass flags down to
cp_parser_declarator.  Also pass static_p/member_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept69.C: New test.
* g++.dg/parse/saved1.C: Adjust dg-error.
* g++.dg/template/crash50.C: Likewise.
---
 gcc/cp/parser.c |  6 ++
 gcc/testsuite/g++.dg/cpp0x/noexcept69.C | 12 
 gcc/testsuite/g++.dg/parse/saved1.C |  4 ++--
 gcc/testsuite/g++.dg/template/crash50.C |  2 +-
 4 files changed, 17 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept69.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 096580e7e50..02daa7a6f6a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -22170,12 +22170,10 @@ cp_parser_declarator (cp_parser* parser,
cp_parser_parse_tentatively (parser);
 
   /* Parse the dependent declarator.  */
-  declarator = cp_parser_declarator (parser, dcl_kind,
-CP_PARSER_FLAGS_NONE,
+  declarator = cp_parser_declarator (parser, dcl_kind, flags,
 /*ctor_dtor_or_conv_p=*/NULL,
 /*parenthesized_p=*/NULL,
-/*member_p=*/false,
-friend_p, /*static_p=*/false);
+member_p, friend_p, static_p);
 
   /* If we are parsing an abstract-declarator, we must handle the
 case where the dependent declarator is absent.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept69.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept69.C
new file mode 100644
index 000..9b87ba0cafb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept69.C
@@ -0,0 +1,12 @@
+// PR c++/100752
+// { dg-do compile { target c++11 } }
+
+struct S

Re: [PATCH] c++: access scope during partial spec matching [PR96204]

2021-06-25 Thread Jason Merrill via Gcc-patches


On 6/25/21 1:14 PM, Patrick Palka wrote:

On Fri, 25 Jun 2021, Jason Merrill wrote:


On 6/25/21 11:03 AM, Patrick Palka wrote:

Here, when determining whether the partial specialization matches the
specialization has_set_attr_method, we do so from the scope of
where the template-id appears rather than from the scope of the
specialization, and this causes us to select the partial specialization
(since Child::type is accessible from Parent).  When we later
instantiate this partial specialization, we've entered the scope of the
specialization and so substitution into e.g. the DECL_CONTEXT for
'value' yields access errors for Child::type since the friend
declaration no longer applies.

It seems the appropriate access scope from which to perform partial
specialization matching is the specialization itself (similar to how
we check access of base-clauses), which is what this patch implements.



There's implementation divergence however: Clang accepts both testcases
below whereas MSVC and ICC reject both (indicating that Clang performs
partial spec matching from the scope of the specialization and MSVC/ICC
performs it from whatever scope the template-id appears).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/96204

gcc/cp/ChangeLog:

* pt.c (instantiate_class_template_1): Enter the scope of the
type before calling most_specialized_partial_spec.

gcc/testsuite/ChangeLog:

* g++.dg/template/access40.C: New test.
* g++.dg/template/access40a.C: New test.
---
   gcc/cp/pt.c   |  6 -
   gcc/testsuite/g++.dg/template/access40.C  | 30 +++
   gcc/testsuite/g++.dg/template/access40a.C | 30 +++
   3 files changed, 65 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/template/access40.C
   create mode 100644 gcc/testsuite/g++.dg/template/access40a.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f4e0abe5c1e..5107bfbf9d1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11774,8 +11774,12 @@ instantiate_class_template_1 (tree type)
 deferring_access_check_sentinel acs (dk_no_deferred);
   /* Determine what specialization of the original template to
- instantiate.  */
+ instantiate; do this relative to the scope of the type.  */
+  push_access_scope (TYPE_NAME (type));
+  pushclass (type);


How about replacing these two calls with push_nested_class (type), like we use
later in the function?


That works nicely.  Would the patch be OK with that change?
Bootstrapped and regtested on x86_64-pc-linux-gnu.


OK with that and the matching change...


 t = most_specialized_partial_spec (type, tf_warning_or_error);
+  popclass ();
+  pop_access_scope (TYPE_NAME (type));


...here.


 if (t == error_mark_node)
   return error_mark_node;
 else if (t)
diff --git a/gcc/testsuite/g++.dg/template/access40.C
b/gcc/testsuite/g++.dg/template/access40.C
new file mode 100644
index 000..e0d30779377
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40.C
@@ -0,0 +1,30 @@
+// PR c++/96204
+
+template struct bool_constant;
+
+template
+struct has_type_member {
+  static const bool value = false;
+};
+
+template
+struct has_type_member {
+  static const bool value = true;
+};
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct Parent;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization of has_type_member does not match
+// despite Child::type being accessible from the current scope.
+typedef bool_constant::value> type;
+typedef bool_constant type;
+  }
+};
diff --git a/gcc/testsuite/g++.dg/template/access40a.C
b/gcc/testsuite/g++.dg/template/access40a.C
new file mode 100644
index 000..85138c9e570
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40a.C
@@ -0,0 +1,30 @@
+// PR c++/96204
+
+template struct bool_constant;
+
+template
+struct has_type_member {
+  static const bool value = false;
+};
+
+template
+struct has_type_member {
+  static const bool value = true;
+};
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct has_type_member;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization matches because Child::type is
+// accessible from has_type_member.
+typedef bool_constant::value> type;
+typedef bool_constant type;
+  }
+};

Re: [PATCH] c++: CTAD within alias template [PR91911]

2021-06-25 Thread Jason Merrill via Gcc-patches


On 6/25/21 1:11 PM, Patrick Palka wrote:

On Fri, 25 Jun 2021, Jason Merrill wrote:


On 6/24/21 4:45 PM, Patrick Palka wrote:

In the first testcase below, during parsing of the alias template
ConstSpanType, transparency of alias template specializations means we
replace SpanType with SpanType's substituted definition.  But this
substitution lowers the level of the CTAD placeholder for span(T()) from
2 to 1, and so the later instantiantion of ConstSpanType
erroneously substitutes this CTAD placeholder with the template argument
at level 1 index 0, i.e. with int, before we get a chance to perform the
CTAD.

In light of this, it seems we should avoid level lowering when
substituting through through the type-id of a dependent alias template
specialization.  To that end this patch makes lookup_template_class_1
pass tf_partial to tsubst in this situation.


This makes sense, but what happens if SpanType is a member template, so that
the levels of it and ConstSpanType don't match?  Or the other way around?


If SpanType is a member template of say the class template A (and
thus its level is greater than ConstSpanType):

   template
   struct A {
 template
 using SpanType = decltype(span(T()));
   };

   template
   using ConstSpanType = span::SpanType::value_type>;

   using type = ConstSpanType;

then this case luckily works even without the patch because
instantiate_class_template now reuses the specialization A::SpanType
that was formed earlier during instantiation of A, where we
substitute only a single level of template arguments, so the level of
the CTAD placeholder inside the defining-type-id of this specialization
dropped from 3 to 2, so still more than the level of ConstSpanType.

This luck is short-lived though, because if we replace
A::SpanType with say A::SpanType then the testcase
breaks again (without the patch) because we no longer can reuse that
specialization, so we instead form it on the spot by substituting two
levels of template arguments (U=int,T=T) into the defining-type-id,
causing the level of the placeholder to drop to 1.  I think the patch
causes its level to remain 3 (though I guess it should really be 2).


For the other way around, if ConstSpanType is a member template of
say the class template B (and thus its level is greater than
SpanType):

   template
   using SpanType = decltype(span(T()));

   template
   struct B {
 template
 using ConstSpanType = span::value_type>;
   };

   using type = B::ConstSpanType;

then tf_partial doesn't help here at all; we end up substituting 'int'
for the CTAD placeholder...  What it seems we need is to _increase_ the
level of the CTAD placeholder from 2 to 3 during the dependent
substitution..

Hmm, rather than messing with tf_partial, which is apparently only a
partial solution, maybe we should just make tsubst never substitute a
CTAD placeholder -- they should always be resolved from do_class_deduction,
and their level doesn't really matter otherwise.  (But we'd still want
to substitute into the CLASS_PLACEHOLDER_TEMPLATE of the placeholder in
case it's a template template parm.)  Something like:

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5107bfbf9d1..dead651ed84 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15552,7 +15550,8 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
  
  	levels = TMPL_ARGS_DEPTH (args);

if (level <= levels
-   && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0)
+   && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0
+   && !template_placeholder_p (t))
  {
arg = TMPL_ARG (args, level, idx);
  


seems to work better.


Makes sense.




Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/91911

gcc/cp/ChangeLog:

* pt.c (lookup_template_class_1): When looking up a dependent
alias template specialization, pass tf_partial to tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction92.C: New test.
---
   gcc/cp/pt.c   |  7 +-
   .../g++.dg/cpp1z/class-deduction92.C  | 17 +
   .../g++.dg/cpp1z/class-deduction93.C  | 25 +++
   3 files changed, 48 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction93.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f73c7471a33..23c5f515716 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9954,7 +9954,12 @@ lookup_template_class_1 (tree d1, tree arglist, tree
in_decl, tree context,
template-arguments for the template-parameters in the
type-id of the alias template.  */
   -  t = tsubst (TREE_TYPE (gen_tmpl), arglist, complain, in_decl);
+ /* When substituting a dependent alias template specialization,
+we pass tf_partial to avoid lowering the level of any 'auto'
+in it

[committed] libstdc++: Avoid intercepting exception in ostream::write

2021-06-25 Thread Jonathan Wakely via Gcc-patches

Currently if ostream::write fails and sets badbit and that causes an
exception, we will catch the exception, set badbit again, and rethrow
the exception.

This change delays setting badbit until after the try-catch block, so
that if it causes an exception we don't need to catch and rethrow it.

This removes the last remaining use of _M_write, so it can be made
private (or removed entirely for versioned namespace builds, where ABI
compatibility is not required). All other uses of _M_write were replaced
by calls to __ostream_insert, so make _M_write use that too.

libstdc++-v3/ChangeLog:

* include/bits/ostream.tcc (basic_ostream::write): Call sputn
directly instead of using _M_write. Do setstate(__err) all
outside the try-catch block.
* include/std/ostream (basic_ostream::_M_write): Declare
private. Use __ostream_insert. Do not define for the versioned
namespace.

Tested powerpc64le-linux. Committed to trunk.

commit 4a52cf2eb9d6864ad85674ab56838e0d9ce27926
Author: Jonathan Wakely 
Date:   Fri Jun 25 18:31:23 2021

libstdc++: Avoid intercepting exception in ostream::write

Currently if ostream::write fails and sets badbit and that causes an
exception, we will catch the exception, set badbit again, and rethrow
the exception.

This change delays setting badbit until after the try-catch block, so
that if it causes an exception we don't need to catch and rethrow it.

This removes the last remaining use of _M_write, so it can be made
private (or removed entirely for versioned namespace builds, where ABI
compatibility is not required). All other uses of _M_write were replaced
by calls to __ostream_insert, so make _M_write use that too.

libstdc++-v3/ChangeLog:

* include/bits/ostream.tcc (basic_ostream::write): Call sputn
directly instead of using _M_write. Do setstate(__err) all
outside the try-catch block.
* include/std/ostream (basic_ostream::_M_write): Declare
private. Use __ostream_insert. Do not define for the versioned
namespace.

diff --git a/libstdc++-v3/include/bits/ostream.tcc 
b/libstdc++-v3/include/bits/ostream.tcc
index d3220e8034b..06b2217d216 100644
--- a/libstdc++-v3/include/bits/ostream.tcc
+++ b/libstdc++-v3/include/bits/ostream.tcc
@@ -192,8 +192,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   sentry __cerb(*this);
   if (__cerb)
{
+ ios_base::iostate __err = ios_base::goodbit;
  __try
-   { _M_write(__s, __n); }
+   {
+ if (this->rdbuf()->sputn(__s, __n) != __n)
+   __err = ios_base::badbit;
+   }
  __catch(__cxxabiv1::__forced_unwind&)
{
  this->_M_setstate(ios_base::badbit);  
@@ -201,6 +205,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
  __catch(...)
{ this->_M_setstate(ios_base::badbit); }
+ if (__err)
+   this->setstate(ios_base::badbit);
}
   return *this;
 }
diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 981697324c9..ddb33feb12f 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -308,19 +308,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __ostream_type&
   put(char_type __c);
 
-  /**
-   *  @brief  Core write functionality, without sentry.
-   *  @param  __s  The array to insert.
-   *  @param  __n  Maximum number of characters to insert.
-  */
-  void
-  _M_write(const char_type* __s, streamsize __n)
-  {
-   const streamsize __put = this->rdbuf()->sputn(__s, __n);
-   if (__put != __n)
- this->setstate(ios_base::badbit);
-  }
-
   /**
*  @brief  Character string insertion.
*  @param  __s  The array to insert.
@@ -419,6 +406,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
__ostream_type&
_M_insert(_ValueT __v);
+
+private:
+#if !_GLIBCXX_INLINE_VERSION
+  void
+  _M_write(const char_type* __s, streamsize __n)
+  { std::__ostream_insert(*this, __s, __n); }
+#endif
 };
 
   /**

[committed] libstdc++: Implement LWG 581 for std:ostream::flush()

2021-06-25 Thread Jonathan Wakely via Gcc-patches

LWG 581 changed ostream::flush() to an unformatted output function for
C++11, but it was never implemented in libstdc++.

libstdc++-v3/ChangeLog:

* doc/xml/manual/intro.xml: Document LWG 581 change.
* doc/html/manual/bugs.html: Regenerate.
* include/bits/basic_ios.tcc: Whitespace.
* include/bits/ostream.tcc (basic_ostream::flush()): Construct
sentry.
* testsuite/27_io/basic_ostream/flush/char/2.cc: Check
additional cases.
* testsuite/27_io/basic_ostream/flush/char/exceptions_badbit_throw.cc:
Likewise.
* testsuite/27_io/basic_ostream/flush/wchar_t/2.cc: Likewise.
* 
testsuite/27_io/basic_ostream/flush/wchar_t/exceptions_badbit_throw.cc:
Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit f8c5b542f6cb6a947600e34420565ac67486ea14
Author: Jonathan Wakely 
Date:   Fri Jun 25 18:31:23 2021

libstdc++: Implement LWG 581 for std:ostream::flush()

LWG 581 changed ostream::flush() to an unformatted output function for
C++11, but it was never implemented in libstdc++.

libstdc++-v3/ChangeLog:

* doc/xml/manual/intro.xml: Document LWG 581 change.
* doc/html/manual/bugs.html: Regenerate.
* include/bits/basic_ios.tcc: Whitespace.
* include/bits/ostream.tcc (basic_ostream::flush()): Construct
sentry.
* testsuite/27_io/basic_ostream/flush/char/2.cc: Check
additional cases.
* 
testsuite/27_io/basic_ostream/flush/char/exceptions_badbit_throw.cc:
Likewise.
* testsuite/27_io/basic_ostream/flush/wchar_t/2.cc: Likewise.
* 
testsuite/27_io/basic_ostream/flush/wchar_t/exceptions_badbit_throw.cc:
Likewise.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index 3e7843f58c1..45762caa711 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -743,6 +743,12 @@ requirements of the license of GCC.
 In C++11 mode, remove the pow(float,int), etc., signatures.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#581">581:
+   flush() not unformatted function
+
+Change it to be a unformatted output function (i.e. 
construct a sentry and catch exceptions).
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#586">586:
string inserter not a formatted function
 
diff --git a/libstdc++-v3/include/bits/basic_ios.tcc 
b/libstdc++-v3/include/bits/basic_ios.tcc
index 6285f734031..664a9f22759 100644
--- a/libstdc++-v3/include/bits/basic_ios.tcc
+++ b/libstdc++-v3/include/bits/basic_ios.tcc
@@ -43,7 +43,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (this->rdbuf())
_M_streambuf_state = __state;
   else
- _M_streambuf_state = __state | badbit;
+   _M_streambuf_state = __state | badbit;
   if (this->exceptions() & this->rdstate())
__throw_ios_failure(__N("basic_ios::clear"));
 }
diff --git a/libstdc++-v3/include/bits/ostream.tcc 
b/libstdc++-v3/include/bits/ostream.tcc
index 20585f447ac..d3220e8034b 100644
--- a/libstdc++-v3/include/bits/ostream.tcc
+++ b/libstdc++-v3/include/bits/ostream.tcc
@@ -213,21 +213,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // DR 60. What is a formatted input function?
   // basic_ostream::flush() is *not* an unformatted output function.
-  ios_base::iostate __err = ios_base::goodbit;
-  __try
+  // 581. flush() not unformatted function
+  // basic_ostream::flush() *is* an unformatted output function.
+  if (__streambuf_type* __buf = this->rdbuf())
{
- if (this->rdbuf() && this->rdbuf()->pubsync() == -1)
-   __err |= ios_base::badbit;
+ sentry __cerb(*this);
+ if (__cerb)
+   {
+ ios_base::iostate __err = ios_base::goodbit;
+ __try
+   {
+ if (this->rdbuf()->pubsync() == -1)
+   __err |= ios_base::badbit;
+   }
+ __catch(__cxxabiv1::__forced_unwind&)
+   {
+ this->_M_setstate(ios_base::badbit);
+ __throw_exception_again;
+   }
+ __catch(...)
+   { this->_M_setstate(ios_base::badbit); }
+ if (__err)
+   this->setstate(__err);
+   }
}
-  __catch(__cxxabiv1::__forced_unwind&)
-   {
- this->_M_setstate(ios_base::badbit);  
- __throw_exception_again;
-   }
-  __catch(...)
-   { this->_M_setstate(ios_base::badbit); }
-  if (__err)
-   this->setstate(__err);
   return *this;
 }
 
diff --git a/libstdc++-v3/testsuite/27_io/basic_ostream/flush/char/2.cc 
b/libstdc++-v3/testsuite/27_io/basic_ostream/flush/char/2.cc
index 0b33e60bd08..96969debca7 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ostream/flush/char/2

[committed] libstdc++: Fix exception handling in std::ostream seek functions

2021-06-25 Thread Jonathan Wakely via Gcc-patches

N3168 added the requirement that the [ostream.seeks] functions create a
sentry object. Nothing in the requirements of those functions says
anything about catching exceptions and setting badbit.

libstdc++-v3/ChangeLog:

* include/bits/ostream.tcc (sentry): Only set failbit if badbit
is set, not if eofbit is set.
(tellp, seekp, seekp): Create sentry object. Do not set badbit
on exceptions.
* testsuite/27_io/basic_ostream/seekp/char/exceptions_badbit_throw.cc:
Adjust expected behaviour.
* 
testsuite/27_io/basic_ostream/seekp/wchar_t/exceptions_badbit_throw.cc:
Likewise.
* testsuite/27_io/basic_ostream/tellp/char/exceptions_badbit_throw.cc:
Likewise.
* 
testsuite/27_io/basic_ostream/tellp/wchar_t/exceptions_badbit_throw.cc:
Likewise.
* testsuite/27_io/basic_ostream/seekp/char/n3168.cc: New test.
* testsuite/27_io/basic_ostream/seekp/wchar_t/n3168.cc: New test.
* testsuite/27_io/basic_ostream/tellp/char/n3168.cc: New test.
* testsuite/27_io/basic_ostream/tellp/wchar_t/n3168.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

I've just realised that the description in the commit log is wrong.
After I wrote that I changed the sentry constructor to not set failbit,
so that seeking when eofbit is set doesn't set failbit. Sorry for the
bad commit log. The ChangeLog part is correct.


commit 9b6c65c754f2b2a78f5353219ec62817064e0d24
Author: Jonathan Wakely 
Date:   Fri Jun 25 18:31:23 2021

libstdc++: Fix exception handling in std::ostream seek functions

N3168 added the requirement that the [ostream.seeks] functions create a
sentry object. Nothing in the requirements of those functions says
anything about catching exceptions and setting badbit.

As well as not catching exceptions, this change results in another
observable behaviour change. Previously seeking on a stream with eofbit
set would work (as long as badbit and failbit weren't set). The
construction of a sentry causes failbit to be set when eofbit is set,
which causes the seek to fail. It is necessary to clear the eofbit
before seeking now.

libstdc++-v3/ChangeLog:

* include/bits/ostream.tcc (sentry): Only set failbit if badbit
is set, not if eofbit is set.
(tellp, seekp, seekp): Create sentry object. Do not set badbit
on exceptions.
* 
testsuite/27_io/basic_ostream/seekp/char/exceptions_badbit_throw.cc:
Adjust expected behaviour.
* 
testsuite/27_io/basic_ostream/seekp/wchar_t/exceptions_badbit_throw.cc:
Likewise.
* 
testsuite/27_io/basic_ostream/tellp/char/exceptions_badbit_throw.cc:
Likewise.
* 
testsuite/27_io/basic_ostream/tellp/wchar_t/exceptions_badbit_throw.cc:
Likewise.
* testsuite/27_io/basic_ostream/seekp/char/n3168.cc: New test.
* testsuite/27_io/basic_ostream/seekp/wchar_t/n3168.cc: New test.
* testsuite/27_io/basic_ostream/tellp/char/n3168.cc: New test.
* testsuite/27_io/basic_ostream/tellp/wchar_t/n3168.cc: New test.

diff --git a/libstdc++-v3/include/bits/ostream.tcc 
b/libstdc++-v3/include/bits/ostream.tcc
index 76ca28561c1..20585f447ac 100644
--- a/libstdc++-v3/include/bits/ostream.tcc
+++ b/libstdc++-v3/include/bits/ostream.tcc
@@ -53,7 +53,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   if (__os.good())
_M_ok = true;
-  else
+  else if (__os.bad())
__os.setstate(ios_base::failbit);
 }
 
@@ -236,19 +236,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 basic_ostream<_CharT, _Traits>::
 tellp()
 {
+  sentry __cerb(*this);
   pos_type __ret = pos_type(-1);
-  __try
-   {
- if (!this->fail())
-   __ret = this->rdbuf()->pubseekoff(0, ios_base::cur, ios_base::out);
-   }
-  __catch(__cxxabiv1::__forced_unwind&)
-   {
- this->_M_setstate(ios_base::badbit);  
- __throw_exception_again;
-   }
-  __catch(...)
-   { this->_M_setstate(ios_base::badbit); }
+  if (!this->fail())
+   __ret = this->rdbuf()->pubseekoff(0, ios_base::cur, ios_base::out);
   return __ret;
 }
 
@@ -257,30 +248,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 basic_ostream<_CharT, _Traits>::
 seekp(pos_type __pos)
 {
-  ios_base::iostate __err = ios_base::goodbit;
-  __try
+  sentry __cerb(*this);
+  if (!this->fail())
{
- if (!this->fail())
-   {
- // _GLIBCXX_RESOLVE_LIB_DEFECTS
- // 136.  seekp, seekg setting wrong streams?
- const pos_type __p = this->rdbuf()->pubseekpos(__pos,
-ios_base::out);
+ // _GLIBCXX_RESOLVE_LIB_DEFECTS
+ // 136.  seekp, seekg setting wrong streams?
+ const pos_type __p = this->rdbuf

[committed] libstdc++: Remove noexcept from syncbuf::swap (LWG 3498)

2021-06-25 Thread Jonathan Wakely via Gcc-patches

The proposed resolution for the inconsistent noexcept-specifiers in the
spec is to remove it from bto hthe assignment operator and swap.

libstdc++-v3/ChangeLog:

* include/std/syncstream (basic_syncbuf::swap()): Remove
noexcept, as per LWG 3498.

Tested powerpc64le-linux. Committed to trunk.

commit 7ab7fa1b51cb7227e051842bcb971b95c962327a
Author: Jonathan Wakely 
Date:   Fri Jun 25 18:31:22 2021

libstdc++: Remove noexcept from syncbuf::swap (LWG 3498)

The proposed resolution for the inconsistent noexcept-specifiers in the
spec is to remove it from bto hthe assignment operator and swap.

libstdc++-v3/ChangeLog:

* include/std/syncstream (basic_syncbuf::swap()): Remove
noexcept, as per LWG 3498.

diff --git a/libstdc++-v3/include/std/syncstream 
b/libstdc++-v3/include/std/syncstream
index 299941f651c..db6ebd50471 100644
--- a/libstdc++-v3/include/std/syncstream
+++ b/libstdc++-v3/include/std/syncstream
@@ -114,7 +114,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  swap(basic_syncbuf& __other) noexcept
+  swap(basic_syncbuf& __other)
   {
using _ATr = allocator_traits<_Alloc>;
if constexpr (!_ATr::propagate_on_container_swap::value)

[committed] libstdc++: More workarounds in 17_intro/names.cc test [PR 97088]

2021-06-25 Thread Jonathan Wakely via Gcc-patches

Conditionally #undef some more names that are used in system headers.

libstdc++-v3/ChangeLog:

PR libstdc++/97088
* testsuite/17_intro/names.cc: Undef more names for newlib and
also for arm-none-linux-gnueabi.
* testsuite/experimental/names.cc: Disable PCH.

Tested powerpc64le-linux. Committed to trunk.

commit e83a5a6b6893e910dc0b6b1cd034e1a258406c93
Author: Jonathan Wakely 
Date:   Fri Jun 25 18:31:22 2021

libstdc++: More workarounds in 17_intro/names.cc test [PR 97088]

Conditionally #undef some more names that are used in system headers.

libstdc++-v3/ChangeLog:

PR libstdc++/97088
* testsuite/17_intro/names.cc: Undef more names for newlib and
also for arm-none-linux-gnueabi.
* testsuite/experimental/names.cc: Disable PCH.

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 534dab70ff5..805c1002c3f 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -208,6 +208,11 @@
 #undef r
 #endif
 
+#if defined (__linux__) && defined (__arm__)
+//  defines fpregset_t::fpregs::j
+#undef j
+#endif
+
 #if defined (__linux__) && defined (__powerpc__)
 //  defines __vector128::u
 #undef u
@@ -220,6 +225,15 @@
 #if ! __has_include()
 // newlib's  defines __lockable as a macro, so we can't use it.
 # define __lockablecannot be used as an identifier
+// newlib's  defines __tzrule_type with these members.
+#undef d
+#undef m
+#undef n
+#undef s
+// newlib's  uses this for parameters
+#undef x
+// newlib's  uses this for parameters
+#undef j
 #endif
 
 #ifdef __sun__
diff --git a/libstdc++-v3/testsuite/experimental/names.cc 
b/libstdc++-v3/testsuite/experimental/names.cc
index 34ec3ba5968..d695a258f2c 100644
--- a/libstdc++-v3/testsuite/experimental/names.cc
+++ b/libstdc++-v3/testsuite/experimental/names.cc
@@ -16,6 +16,7 @@
 // .
 
 // { dg-do compile { target c++11 } }
+// { dg-add-options no_pch }
 
 // Define macros for some common variables names that we must not use for
 // naming variables, parameters etc. in the library.

Re: [PATCH] inline: do not inline when no_profile_instrument_function is different

2021-06-25 Thread Nick Desaulniers via Gcc-patches

On Wed, Jun 23, 2021 at 6:15 AM Martin Liška  wrote:
>
> On 6/23/21 2:38 PM, Jan Hubicka wrote:
> > Is there reason to prevent the inlining once instrumentation is done?
>
> No ;)

Here's another case that coincidentally came up yesterday. How should
these attributes behave in the case of __attribute__((flatten)) on the
caller?

Another case which is curious is what happens when the callee has
__attribute__((always_inline)) and there's a conflict?

In LLVM, the current behavior is:
"__attribute__((no_stack_protector)) and
__attribute__((no_profile_instrument_function)) will prevent inline
substitution when the attributes do not match between caller and
callee, unless the callee was attributed with
__attribute__((always_inline)) or the caller was attributed with
__attribute__((flatten))."

We have some overzealous users of always_inline and flatten in
Android. They are currently playing whack-a-mole with
no_stack_protector.  I'm not sure yet how we can better help them self
diagnose, or whether we should consider a change in policy.

I'm also not sure whether GCC's einliner corresponds with
always_inline or not necessarily?
--
Thanks,
~Nick Desaulniers

Re: [PATCH 2/3] Fix IEEE 128-bit min/max test.

2021-06-25 Thread Segher Boessenkool

On Thu, Jun 17, 2021 at 04:11:40PM -0400, Michael Meissner wrote:
> On Thu, Jun 17, 2021 at 01:11:58PM -0500, Segher Boessenkool wrote:
> > > --- a/gcc/testsuite/gcc.target/powerpc/float128-minmax.c
> > > +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax.c
> > > @@ -1,6 +1,5 @@
> > > -/* { dg-do compile { target lp64 } } */
> > 
> > Does that work?  Why was it there before?
> 
> The lp64 eliminates 32-bit, which does not support hardware IEEE 128-bit due 
> to
> the lack of TImode.

I still do not understand this.  Why would support for QP float require
TImode?  "Need an integer mode of the same size" is not a convincing
argument, since double-double is a 16 byte mode as well.

> The test was written before the ppc_float128_hw test.  Now
> that we have ppc_float128_hw, we don't need an explicit lp64.

Ah good, some progress.  Well, it *is* an improvement, a better
abstraction, but on the other hand it only hides the actual problems
deeper :-/

> > >  /* { dg-require-effective-target powerpc_p9vector_ok } */
> > > -/* { dg-require-effective-target float128 } */
> > > +/* { dg-require-effective-target ppc_float128_hw } */
> > 
> > Why is it okay to no longer run this test where it ran before?
> 
> The ppc_float128_hw test is a more precise test than just float128 and power9.

You did not delete the p9 test though.


Segher

Re: [PATCH] c++: access scope during partial spec matching [PR96204]

2021-06-25 Thread Patrick Palka via Gcc-patches

On Fri, 25 Jun 2021, Jason Merrill wrote:

> On 6/25/21 11:03 AM, Patrick Palka wrote:
> > Here, when determining whether the partial specialization matches the
> > specialization has_set_attr_method, we do so from the scope of
> > where the template-id appears rather than from the scope of the
> > specialization, and this causes us to select the partial specialization
> > (since Child::type is accessible from Parent).  When we later
> > instantiate this partial specialization, we've entered the scope of the
> > specialization and so substitution into e.g. the DECL_CONTEXT for
> > 'value' yields access errors for Child::type since the friend
> > declaration no longer applies.
> > 
> > It seems the appropriate access scope from which to perform partial
> > specialization matching is the specialization itself (similar to how
> > we check access of base-clauses), which is what this patch implements.
> 
> > There's implementation divergence however: Clang accepts both testcases
> > below whereas MSVC and ICC reject both (indicating that Clang performs
> > partial spec matching from the scope of the specialization and MSVC/ICC
> > performs it from whatever scope the template-id appears).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > PR c++/96204
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.c (instantiate_class_template_1): Enter the scope of the
> > type before calling most_specialized_partial_spec.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/template/access40.C: New test.
> > * g++.dg/template/access40a.C: New test.
> > ---
> >   gcc/cp/pt.c   |  6 -
> >   gcc/testsuite/g++.dg/template/access40.C  | 30 +++
> >   gcc/testsuite/g++.dg/template/access40a.C | 30 +++
> >   3 files changed, 65 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/template/access40.C
> >   create mode 100644 gcc/testsuite/g++.dg/template/access40a.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index f4e0abe5c1e..5107bfbf9d1 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -11774,8 +11774,12 @@ instantiate_class_template_1 (tree type)
> > deferring_access_check_sentinel acs (dk_no_deferred);
> >   /* Determine what specialization of the original template to
> > - instantiate.  */
> > + instantiate; do this relative to the scope of the type.  */
> > +  push_access_scope (TYPE_NAME (type));
> > +  pushclass (type);
> 
> How about replacing these two calls with push_nested_class (type), like we use
> later in the function?

That works nicely.  Would the patch be OK with that change?
Bootstrapped and regtested on x86_64-pc-linux-gnu.

> 
> > t = most_specialized_partial_spec (type, tf_warning_or_error);
> > +  popclass ();
> > +  pop_access_scope (TYPE_NAME (type));
> > if (t == error_mark_node)
> >   return error_mark_node;
> > else if (t)
> > diff --git a/gcc/testsuite/g++.dg/template/access40.C
> > b/gcc/testsuite/g++.dg/template/access40.C
> > new file mode 100644
> > index 000..e0d30779377
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/template/access40.C
> > @@ -0,0 +1,30 @@
> > +// PR c++/96204
> > +
> > +template struct bool_constant;
> > +
> > +template
> > +struct has_type_member {
> > +  static const bool value = false;
> > +};
> > +
> > +template
> > +struct has_type_member {
> > +  static const bool value = true;
> > +};
> > +
> > +struct Parent;
> > +
> > +struct Child {
> > +private:
> > +  friend struct Parent;
> > +  typedef void type;
> > +};
> > +
> > +struct Parent {
> > +  static void f() {
> > +// The partial specialization of has_type_member does not match
> > +// despite Child::type being accessible from the current scope.
> > +typedef bool_constant::value> type;
> > +typedef bool_constant type;
> > +  }
> > +};
> > diff --git a/gcc/testsuite/g++.dg/template/access40a.C
> > b/gcc/testsuite/g++.dg/template/access40a.C
> > new file mode 100644
> > index 000..85138c9e570
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/template/access40a.C
> > @@ -0,0 +1,30 @@
> > +// PR c++/96204
> > +
> > +template struct bool_constant;
> > +
> > +template
> > +struct has_type_member {
> > +  static const bool value = false;
> > +};
> > +
> > +template
> > +struct has_type_member {
> > +  static const bool value = true;
> > +};
> > +
> > +struct Parent;
> > +
> > +struct Child {
> > +private:
> > +  friend struct has_type_member;
> > +  typedef void type;
> > +};
> > +
> > +struct Parent {
> > +  static void f() {
> > +// The partial specialization matches because Child::type is
> > +// accessible from has_type_member.
> > +typedef bool_constant::value> type;
> > +typedef bool_constant type;
> > +  }
> > +};
> > 
> 
>

Re: [PATCH] c++: CTAD within alias template [PR91911]

2021-06-25 Thread Patrick Palka via Gcc-patches

On Fri, 25 Jun 2021, Jason Merrill wrote:

> On 6/24/21 4:45 PM, Patrick Palka wrote:
> > In the first testcase below, during parsing of the alias template
> > ConstSpanType, transparency of alias template specializations means we
> > replace SpanType with SpanType's substituted definition.  But this
> > substitution lowers the level of the CTAD placeholder for span(T()) from
> > 2 to 1, and so the later instantiantion of ConstSpanType
> > erroneously substitutes this CTAD placeholder with the template argument
> > at level 1 index 0, i.e. with int, before we get a chance to perform the
> > CTAD.
> > 
> > In light of this, it seems we should avoid level lowering when
> > substituting through through the type-id of a dependent alias template
> > specialization.  To that end this patch makes lookup_template_class_1
> > pass tf_partial to tsubst in this situation.
> 
> This makes sense, but what happens if SpanType is a member template, so that
> the levels of it and ConstSpanType don't match?  Or the other way around?

If SpanType is a member template of say the class template A (and
thus its level is greater than ConstSpanType):

  template
  struct A {
template
using SpanType = decltype(span(T()));
  };

  template
  using ConstSpanType = span::SpanType::value_type>;

  using type = ConstSpanType;

then this case luckily works even without the patch because
instantiate_class_template now reuses the specialization A::SpanType
that was formed earlier during instantiation of A, where we
substitute only a single level of template arguments, so the level of
the CTAD placeholder inside the defining-type-id of this specialization
dropped from 3 to 2, so still more than the level of ConstSpanType.

This luck is short-lived though, because if we replace
A::SpanType with say A::SpanType then the testcase
breaks again (without the patch) because we no longer can reuse that
specialization, so we instead form it on the spot by substituting two
levels of template arguments (U=int,T=T) into the defining-type-id,
causing the level of the placeholder to drop to 1.  I think the patch
causes its level to remain 3 (though I guess it should really be 2).

For the other way around, if ConstSpanType is a member template of
say the class template B (and thus its level is greater than
SpanType):

  template
  using SpanType = decltype(span(T()));

  template
  struct B {
template
using ConstSpanType = span::value_type>;
  };

  using type = B::ConstSpanType;

then tf_partial doesn't help here at all; we end up substituting 'int'
for the CTAD placeholder...  What it seems we need is to _increase_ the
level of the CTAD placeholder from 2 to 3 during the dependent
substitution..

Hmm, rather than messing with tf_partial, which is apparently only a
partial solution, maybe we should just make tsubst never substitute a
CTAD placeholder -- they should always be resolved from do_class_deduction,
and their level doesn't really matter otherwise.  (But we'd still want
to substitute into the CLASS_PLACEHOLDER_TEMPLATE of the placeholder in
case it's a template template parm.)  Something like:

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 5107bfbf9d1..dead651ed84 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15552,7 +15550,8 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)

levels = TMPL_ARGS_DEPTH (args);
if (level <= levels
-   && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0)
+   && TREE_VEC_LENGTH (TMPL_ARGS_LEVEL (args, level)) > 0
+   && !template_placeholder_p (t))
  {
arg = TMPL_ARG (args, level, idx);

seems to work better.

> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > PR c++/91911
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.c (lookup_template_class_1): When looking up a dependent
> > alias template specialization, pass tf_partial to tsubst.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1z/class-deduction92.C: New test.
> > ---
> >   gcc/cp/pt.c   |  7 +-
> >   .../g++.dg/cpp1z/class-deduction92.C  | 17 +
> >   .../g++.dg/cpp1z/class-deduction93.C  | 25 +++
> >   3 files changed, 48 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction93.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index f73c7471a33..23c5f515716 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -9954,7 +9954,12 @@ lookup_template_class_1 (tree d1, tree arglist, tree
> > in_decl, tree context,
> > template-arguments for the template-parameters in the
> > type-id of the alias template.  */
> >   -   t = tsubst (TREE_TYPE (gen_tmpl), arglist, complain, in_decl);
> > + /* When substituting a dependent alias template specialization,
> > +we pass tf

[PATCH, libgomp, PR101114, committed] Fix struct-elem-5.c regression

2021-06-25 Thread Chung-Lin Tang


The libgomp.c-c++-common/struct-elem-5.c test which I added for the Structure
element mapping patch, does not properly "fail" for non-shared (unified) address
space cases (like host-fallback).

This was handled inside the testcase for struct-elem-[14].c, but missed this
one due to the dg-shouldfail nature.

Fixed by adding "target offload_device_nonshared_as" to dg-run. This is
quite small and obvious, so directly committed after testing.

Chung-Lin

libgomp/ChangeLog:

PR testsuite/101114
* testsuite/libgomp.c-c++-common/struct-elem-5.c:
Add "target offload_device_nonshared_as" condition for enabling test.

From e0672017370b9a9362fda52ecffe33d1c9c41829 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Sat, 26 Jun 2021 00:42:58 +0800
Subject: [PATCH] testsuite/101114: Adjust libgomp.c-c++-common/struct-elem-5.c
 testcase

The dg-shouldfail testcase libgomp.c-c++-common/struct-elem-5.c does not
properly fail for non-shared address space offloading. Adjust testcase
to limit testing only for "target offload_device_nonshared_as".

libgomp/ChangeLog:

PR testsuite/101114
* testsuite/libgomp.c-c++-common/struct-elem-5.c:
Add "target offload_device_nonshared_as" condition for enabling test.
---
 libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c 
b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
index 814c30120e5..31a2fa5e8cf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-5.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target offload_device_nonshared_as } } */
 
 struct S
 {
-- 
2.17.1

[PATCH 0/4 REVIEW] libtool and libctf fixes for Solaris 11

2021-06-25 Thread Nick Alcock via Gcc-patches

There are three intertwined bugs here, two in libtool.m4, one in libctf.
The underlying "problem" is that libctf tries to version its symbols: if
it can't use a GNU version script it tries to hide unnecessary symbols
using Libtool's --export-symbols-regex flag... and that flag has kinda
rusted.

Firstly, it relies on running nm and knowing what the symbol type codes
it emits (in BSD mode) mean: Solaris 11 has gained 'C' for common
symbols, like GNU nm, but libtool.m4 was never updated to know about
that, so the test failed.

Secondly, it relies on knowing how to switch nm into BSD mode, which it
does by nm'ing a bunch of candidate nms with various flags: but it nms
/dev/null and has not-very-good handling of nms that emit errors if you
do that.

Thirdly, libtool's nm option scanning is entirely suppressed if NM is
set in the environment, on the grounds that the user must have set it --
but libtool already augments other things set in the environment with
extra flags if needed, and the user can hardly be expected to know that
Libtool needs 'nm -p' on Solaris, not just nm. Worse yet, Cygnus
configure sets NM unconditionally, stopping the test dead.  Fixed by
checking flags even if the user overrides nm, but checking the specific
nm they requested.

The first two patches in this series need review because, well, libtool
upstream is dead enough that I don't know if I can even submit it there,
but it should probably go to GCC and binutils-gdb. I have tested it on
Solaris 11.3, Solaris 11.4, FreeBSD, (x86-64 and SPARC) Linux, and
Cygwin, which should be enough coverage, I hope.  libtool upstream has
diverged in this area and the patch would need redoing in any case
(though the conflicts look quite minor, I don't have time right now and
doing that wouldn't help fix any of the reporter's problems in any case).

(The latter two patches, sent to binutils@ alone, enable symbol
versioning on Solaris 11.4, which provides a --version-script flag
almost but not quite compatible with GNU ld's, and regenerate most
relevant configure scripts, skipping only sim/ because it's in a ferment
of change right now.)

Cc: gcc-patches@gcc.gnu.org

Nick Alcock (4):
  libtool.m4: augment symcode for Solaris 11
  libtool.m4: fix nm BSD flag detection
  libctf: try several possibilities for linker versioning flags
  configure: regenerate in all projects that use libtool.m4

 bfd/configure   |  91 -
 binutils/configure  |  91 -
 gas/configure   |  91 -
 gprof/configure |  91 -
 ld/configure|  91 -
 libctf/Makefile.am  |   5 +-
 libctf/Makefile.in  |   6 +-
 libctf/configure| 157 ++--
 libctf/configure.ac |  46 -
 libctf/libctf.ver   |  10 ++-
 libtool.m4  |  90 -
 opcodes/configure   |  91 -
 zlib/configure  |  91 -
 13 files changed, 531 insertions(+), 420 deletions(-)

-- 
2.32.0.255.gd9b1d14a2a

[PATCH 2/4 REVIEW] libtool.m4: fix nm BSD flag detection

2021-06-25 Thread Nick Alcock via Gcc-patches

Libtool needs to get BSD-format (or MS-format) output out of the system
nm, so that it can scan generated object files for symbol names for
-export-symbols-regex support.  Some nms need specific flags to turn on
BSD-formatted output, so libtool checks for this in its AC_PATH_NM.
Unfortunately the code to do this has a pair of interlocking flaws:

 - it runs the test by doing an nm of /dev/null.  Some platforms
   reasonably refuse to do an nm on a device file, but before now this
   has only been worked around by assuming that the error message has a
   specific textual form emitted by Tru64 nm, and that getting this
   error means this is Tru64 nm and that nm -B would work to produce
   BSD-format output, even though the test never actually got anything
   but an error message out of nm -B.  This is fixable by nm'ing *nm
   itself* (since we necessarily have a path to it).

 - the test is entirely skipped if NM is set in the environment, on the
   grounds that the user has overridden the test: but the user cannot
   reasonably be expected to know that libtool wants not only nm but
   also flags forcing BSD-format output.  Worse yet, one such "user" is
   the top-level Cygnus configure script, which neither tests for
   nor specifies any BSD-format flags.  So platforms needing BSD-format
   flags always fail to set them when run in a Cygnus tree, breaking
   -export-symbols-regex on such platforms.  Libtool also needs to
   augment $LD on some platforms, but this is done unconditionally,
   augmenting whatever the user specified: the nm check should do the
   same.

   One wrinkle: if the user has overridden $NM, a path might have been
   provided: so we use the user-specified path if there was one, and
   otherwise do the path search as usual.  (If the nm specified doesn't
   work, this might lead to a few extra pointless path searches -- but
   the test is going to fail anyway, so that's not a problem.)

(Tested with NM unset, and set to nm, /usr/bin/nm, my-nm where my-nm is a
symlink to /usr/bin/nm on the PATH, and /not-on-the-path/my-nm where
*that* is a symlink to /usr/bin/nm.)

Cc: gcc-patches@gcc.gnu.org

ChangeLog
2021-06-22  Nick Alcock  

PR libctf/27482
* libtool.m4 (LT_PATH_NM): Try BSDization flags with a user-provided
NM, if there is one.  Run nm on itself, not on /dev/null, to avoid
errors from nms that refuse to work on non-regular files.  Remove
other workarounds for this problem.  Strip out blank lines from the
nm output.
---
 libtool.m4 | 88 --
 1 file changed, 45 insertions(+), 43 deletions(-)

diff --git a/libtool.m4 b/libtool.m4
index b0a56917497..29dce1f8239 100644
--- a/libtool.m4
+++ b/libtool.m4
@@ -3200,53 +3200,55 @@ _LT_DECL([], [file_magic_cmd], [1],
 
 # LT_PATH_NM
 # --
-# find the pathname to a BSD- or MS-compatible name lister
+# find the pathname to a BSD- or MS-compatible name lister, and any flags
+# needed to make it compatible
 AC_DEFUN([LT_PATH_NM],
 [AC_REQUIRE([AC_PROG_CC])dnl
 AC_CACHE_CHECK([for BSD- or MS-compatible name lister (nm)], lt_cv_path_NM,
 [if test -n "$NM"; then
-  # Let the user override the test.
-  lt_cv_path_NM="$NM"
-else
-  lt_nm_to_check="${ac_tool_prefix}nm"
-  if test -n "$ac_tool_prefix" && test "$build" = "$host"; then
-lt_nm_to_check="$lt_nm_to_check nm"
-  fi
-  for lt_tmp_nm in $lt_nm_to_check; do
-lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR
-for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do
-  IFS="$lt_save_ifs"
-  test -z "$ac_dir" && ac_dir=.
-  tmp_nm="$ac_dir/$lt_tmp_nm"
-  if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext" ; then
-   # Check to see if the nm accepts a BSD-compat flag.
-   # Adding the `sed 1q' prevents false positives on HP-UX, which says:
-   #   nm: unknown option "B" ignored
-   # Tru64's nm complains that /dev/null is an invalid object file
-   case `"$tmp_nm" -B /dev/null 2>&1 | sed '1q'` in
-   */dev/null* | *'Invalid file or object type'*)
- lt_cv_path_NM="$tmp_nm -B"
- break
- ;;
-   *)
- case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in
- */dev/null*)
-   lt_cv_path_NM="$tmp_nm -p"
-   break
-   ;;
- *)
-   lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but
-   continue # so that we can try to find one that supports BSD flags
-   ;;
- esac
- ;;
-   esac
-  fi
-done
-IFS="$lt_save_ifs"
-  done
-  : ${lt_cv_path_NM=no}
-fi])
+   # Let the user override the nm to test.
+   lt_nm_to_check="$NM"
+ else
+   lt_nm_to_check="${ac_tool_prefix}nm"
+   if test -n "$ac_tool_prefix" && test "$build" = "$host"; then
+ lt_nm_to_check="$lt_nm_to_check nm"
+   fi
+ fi
+ for lt_tmp_nm in $lt_nm_to_check; do
+   lt_save_ifs="$IFS"; IFS=$PATH_SEPARATOR
+   for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/b

[PATCH 1/4] libtool.m4: augment symcode for Solaris 11

2021-06-25 Thread Nick Alcock via Gcc-patches

This reports common symbols like GNU nm, via a type code of 'C'.

Cc: gcc-patches@gcc.gnu.org

ChangeLog
2021-06-22  Nick Alcock  

PR libctf/27967
* libtool.m4 (lt_cv_sys_global_symbol_pipe): Augment symcode for
Solaris 11.
---
 libtool.m4 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libtool.m4 b/libtool.m4
index 9a13f3b117a..b0a56917497 100644
--- a/libtool.m4
+++ b/libtool.m4
@@ -3395,7 +3395,7 @@ osf*)
   symcode='[[BCDEGQRST]]'
   ;;
 solaris*)
-  symcode='[[BDRT]]'
+  symcode='[[BCDRT]]'
   ;;
 sco3.2v5*)
   symcode='[[DT]]'
-- 
2.32.0.255.gd9b1d14a2a

[PATCH 7/7] s390: Increase costs for load on condition and change movqicc expander.

2021-06-25 Thread Robin Dapp via Gcc-patches

---
 gcc/config/s390/s390.c  | 2 +-
 gcc/config/s390/s390.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..e3b1f99669f 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3636,7 +3636,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
/* It is going to be a load/store on condition.  Make it
   slightly more expensive than a normal load.  */
-   *total = COSTS_N_INSNS (1) + 1;
+   *total = COSTS_N_INSNS (1) + 2;
 
rtx dst = SET_DEST (x);
rtx then = XEXP (SET_SRC (x), 1);
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 0c5b4dc9029..ee4fac2f9ad 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -7006,9 +7006,9 @@
   if (!CONSTANT_P (els))
 els = simplify_gen_subreg (E_SImode, els, mode, 0);
 
-  rtx tmp_target = gen_reg_rtx (E_SImode);
+  rtx tmp_target = simplify_gen_subreg (E_SImode, operands[0], mode, 0);
+
   emit_insn (gen_movsicc (tmp_target, operands[1], then, els));
-  emit_move_insn (operands[0], gen_lowpart (mode, tmp_target));
   DONE;
 })
 
-- 
2.31.1

[PATCH 3/7] ifcvt: Improve costs handling for noce_convert_multiple.

2021-06-25 Thread Robin Dapp via Gcc-patches

When noce_convert_multiple is called the original costs are not yet
initialized.  Therefore, up to now, costs were only ever unfairly
compared against COSTS_N_INSNS (2).  This would lead to
default_noce_conversion_profitable_p () rejecting all but the most
contrived of sequences.

This patch temporarily initialized the original costs by counting
a compare and all the sets inside the then_bb.
---
 gcc/ifcvt.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 6006055f26a..ac0c142c9fe 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3382,14 +3382,17 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
(SET (REG) (REG)) insns suitable for conversion to a series
of conditional moves.  Also check that we have more than one set
(other routines can handle a single set better than we would), and
-   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
+   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
+   through the insns store the sum of their potential costs in COST.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb, unsigned *cost)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = param_max_rtl_if_conversion_insns;
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+  unsigned potential_cost = 0;
 
   FOR_BB_INSNS (test_bb, insn)
 {
@@ -3425,9 +3428,13 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
test_bb)
   if (!can_conditionally_move_p (GET_MODE (dest)))
return false;
 
+  potential_cost += pattern_cost (set, speed_p);
+
   count++;
 }
 
+  *cost += potential_cost;
+
   /* If we would only put out one conditional move, the other strategies
  this pass tries are better optimized and will be more appropriate.
  Some targets want to strictly limit the number of conditional moves
@@ -3475,11 +3482,23 @@ noce_process_if_block (struct noce_if_info *if_info)
  to calculate a value for x.
  ??? For future expansion, further expand the "multiple X" rules.  */
 
-  /* First look for multiple SETS.  */
+  /* First look for multiple SETS.  The original costs already
+ include a compare that we will be needing either way.  When
+ comparing costs we want to use the branch instruction cost and
+ the sets vs. the cmovs generated here.  Therefore subtract
+ the costs of the compare before checking.
+ ??? Actually, instead of the branch instruction costs we might want
+ to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.*/
+
+  unsigned potential_cost = if_info->original_cost - COSTS_N_INSNS (1);
+  unsigned old_cost = if_info->original_cost;
   if (!else_bb
   && HAVE_conditional_move
-  && bb_ok_for_noce_convert_multiple_sets (then_bb))
+  && bb_ok_for_noce_convert_multiple_sets (then_bb, &potential_cost))
 {
+  /* Temporarily set the original costs to what we estimated so
+we can determine if the transformation is worth it.  */
+  if_info->original_cost = potential_cost;
   if (noce_convert_multiple_sets (if_info))
{
  if (dump_file && if_info->transform_name)
@@ -3487,6 +3506,9 @@ noce_process_if_block (struct noce_if_info *if_info)
 if_info->transform_name);
  return TRUE;
}
+
+  /* Restore the original costs.  */
+  if_info->original_cost = old_cost;
 }
 
   bool speed_p = optimize_bb_for_speed_p (test_bb);
-- 
2.31.1

[PATCH 2/7] ifcvt: Allow constants for noce_convert_multiple.

2021-06-25 Thread Robin Dapp via Gcc-patches

This lifts the restriction of not allowing constants for
noce_convert_multiple.  The code later checks if a valid sequence
is produced anyway.
---
 gcc/ifcvt.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index eef6490626a..6006055f26a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3269,7 +3269,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
 Wrap the two cmove operands into subregs if appropriate to prevent
 that.  */
-  if (GET_MODE (new_val) != GET_MODE (temp))
+
+  if (!CONSTANT_P (new_val)
+ && GET_MODE (new_val) != GET_MODE (temp))
{
  machine_mode src_mode = GET_MODE (new_val);
  machine_mode dst_mode = GET_MODE (temp);
@@ -3280,7 +3282,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
}
  new_val = lowpart_subreg (dst_mode, new_val, src_mode);
}
-  if (GET_MODE (old_val) != GET_MODE (temp))
+  if (!CONSTANT_P (old_val)
+ && GET_MODE (old_val) != GET_MODE (temp))
{
  machine_mode src_mode = GET_MODE (old_val);
  machine_mode dst_mode = GET_MODE (temp);
@@ -3409,9 +3412,9 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
   if (!REG_P (dest))
return false;
 
-  if (!(REG_P (src)
-  || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
-  && subreg_lowpart_p (src
+  if (!((REG_P (src) || CONSTANT_P (src))
+   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
+ && subreg_lowpart_p (src
return false;
 
   /* Destination must be appropriate for a conditional write.  */
-- 
2.31.1

[PATCH 6/7] testsuite/s390: Add tests for noce_convert_multiple.

2021-06-25 Thread Robin Dapp via Gcc-patches

Add new s390-specific tests that check if we convert two SETs into two
loads on condition.  Remove the s390-specific target-check in
gcc.dg/ifcvt-4.c.
---
 gcc/testsuite/gcc.dg/ifcvt-4.c|  2 +-
 .../gcc.target/s390/ifcvt-two-insns-bool.c| 39 +++
 .../gcc.target/s390/ifcvt-two-insns-int.c | 39 +++
 .../gcc.target/s390/ifcvt-two-insns-long.c| 39 +++
 4 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index ec142cfd943..871ab950756 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -2,7 +2,7 @@
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */
 /* { dg-additional-options "-mtune-ctrl=^one_if_conv_insn" { target { i?86-*-* 
x86_64-*-* } } } */
-/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" 
{ "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* } } 
 */
+/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" 
{ "arm*-*-* avr-*-* hppa*64*-*-* visium-*-*" riscv*-*-* msp430-*-* } }  */
 /* { dg-skip-if "" { "s390x-*-*" } { "-m31" } }  */
 
 typedef int word __attribute__((mode(word)));
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
new file mode 100644
index 000..4415cb49f11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "lochinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locrnhe\t.*" } } */
+#include 
+#include 
+#include 
+#include 
+
+__attribute__ ((noinline))
+int foo (int *a, unsigned int n)
+{
+  int min = 99;
+  bool bla = false;
+  for (int i = 0; i < n; i++)
+{
+  if (a[i] < min)
+   {
+ min = a[i];
+ bla = true;
+   }
+}
+
+  if (bla)
+min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a));
+
+  assert (res == (INT_MIN + 1));
+}
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
new file mode 100644
index 000..db8df84cced
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "lochinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locrnhe\t.*" } } */
+#include 
+#include 
+#include 
+#include 
+
+__attribute__ ((noinline))
+int foo (int *a, unsigned int n)
+{
+  int min = 99;
+  int bla = 0;
+  for (int i = 0; i < n; i++)
+{
+  if (a[i] < min)
+   {
+ min = a[i];
+ bla = 1;
+   }
+}
+
+  if (bla)
+min += 1;
+  return min;
+}
+
+int main()
+{
+  int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
+
+  int res = foo (a, sizeof (a));
+
+  assert (res == (INT_MIN + 1));
+}
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c 
b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
new file mode 100644
index 000..bee63328729
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
@@ -0,0 +1,39 @@
+/* Check if conversion for two instructions.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -march=z13 --save-temps" } */
+
+/* { dg-final { scan-assembler "locghinhe\t%r.?,1" } } */
+/* { dg-final { scan-assembler "locgrnhe\t.*" } } */
+#include 
+#include 
+#include 
+#include 
+
+__attribute__ ((noinline))
+long foo (long *a, unsigned long n)
+{
+  long min = 99;
+  long bla = 0;
+  for (int i = 0; i < n; i++)
+{
+  if (a[i] < min)
+   {
+ min = a[i];
+ bla = 1;
+   }
+}
+
+  if (bla)
+min += 1;
+  return min;
+}
+
+int main()
+{
+  long a[] = {2, 1, -13, LONG_MAX, LONG_MIN, 0};
+
+  long res = foo (a, sizeof (a));
+
+  assert (res == (LONG_MIN + 1));
+}
-- 
2.31.1

[PATCH 5/7] ifcvt: Try re-using CC for conditional moves.

2021-06-25 Thread Robin Dapp via Gcc-patches

Following up on the previous patch, this patch makes
noce_convert_multiple emit two cmov sequences:  The same one as before
and a second one that tries to re-use the existing CC.  Then their costs
are compared and the cheaper one is selected.
---
 gcc/ifcvt.c | 94 -
 1 file changed, 86 insertions(+), 8 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index c5b8641e2aa..55405ce9ef7 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3142,6 +3142,47 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx 
cond,
   return false;
 }
 
+/* Helper function to emit a cmov sequence.  */
+
+static rtx_insn*
+try_emit_cmove_seq (struct noce_if_info *if_info, rtx temp,
+   rtx cond, rtx target,
+   rtx new_val, rtx old_val, bool need_cmov,
+   unsigned *cost, rtx *temp_dest,
+   rtx cc_cmp = NULL, rtx rev_cc_cmp = NULL)
+{
+  rtx_insn *seq = NULL;
+  *cost = 0;
+
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  start_sequence ();
+
+  if (need_cmov)
+*temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+x, y, new_val, old_val, cc_cmp, rev_cc_cmp);
+  else
+{
+  *temp_dest = target;
+  if (if_info->then_else_reversed)
+   noce_emit_move_insn (target, old_val);
+  else
+   noce_emit_move_insn (target, new_val);
+}
+
+  if (*temp_dest != NULL_RTX)
+{
+  seq = get_insns ();
+  *cost = seq_cost (seq, if_info->speed_p);
+}
+
+  end_sequence ();
+
+  return seq;
+}
+
 /* We have something like:
 
  if (x > y)
@@ -3199,7 +3240,9 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx cond = noce_get_condition (jump, &cond_earliest, false);
   rtx x = XEXP (cond, 0);
   rtx y = XEXP (cond, 1);
-  rtx_code cond_code = GET_CODE (cond);
+
+  rtx cc_cmp = cond_exec_get_condition (jump);
+  rtx rev_cc_cmp = noce_get_condition (jump, &cond_earliest, true);
 
   /* The true targets for a conditional move.  */
   auto_vec targets;
@@ -3301,18 +3344,52 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
}
 
-  /* Actually emit the conditional move.  */
-  rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
-  x, y, new_val, old_val);
+  /* Try emitting a conditional move passing the backend the
+canonicalized comparison.  The backend is then able to
+recognize expressions like
 
-  /* If we failed to expand the conditional move, drop out and don't
-try to continue.  */
-  if (temp_dest == NULL_RTX)
+  if (x > y)
+y = x;
+
+as min/max emit an insn, accordingly.
+We will still emit a superfluous CC comparison before the
+min/max, though, which complicates costing.  */
+  unsigned cost1 = 0, cost2 = 0;
+  rtx_insn *seq, *seq1, *seq2;
+  rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
+
+  seq1 = try_emit_cmove_seq (if_info, temp, cond, target,
+new_val, old_val, need_cmov, &cost1, &temp_dest1);
+
+  /* Here, we try to pass the backend a non-canonicalized cc comparison
+as well.  This allows the backend to emit a cmov directly without
+creating an additional compare for each.  If successful, costing
+is easier and this sequence is usually preferred.  */
+  seq2 = try_emit_cmove_seq (if_info, target, cond, target,
+new_val, old_val, need_cmov, &cost2, &temp_dest2,
+cc_cmp, rev_cc_cmp);
+
+  /* Check which version is less expensive.  */
+  if (seq1 != NULL_RTX && cost1 <= cost2)
+   {
+ seq = seq1;
+ temp_dest = temp_dest1;
+   }
+  else if (seq2 != NULL_RTX)
+   {
+ seq = seq2;
+ temp_dest = temp_dest2;
+   }
+  else
{
+ /* Nothing worked, bail out.  */
  end_sequence ();
  return FALSE;
}
 
+  /* End the sub sequence and emit to the main sequence.  */
+  emit_insn (seq);
+
   /* Bookkeeping.  */
   count++;
   targets.safe_push (target);
@@ -3326,7 +3403,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   /* Now fixup the assignments.  */
   for (int i = 0; i < count; i++)
-noce_emit_move_insn (targets[i], temporaries[i]);
+if (targets[i] != temporaries[i])
+  noce_emit_move_insn (targets[i], temporaries[i]);
 
   /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
-- 
2.31.1

[PATCH 0/7] ifcvt: Convert multiple

2021-06-25 Thread Robin Dapp via Gcc-patches

Hi,

finally I got around to reworking this patch set for
if conversion so it is rather a v3 already.  Since so
much time has passed, I'm not replying to the old thread.

There are several problems with noce_convert_multiple
and most are due to the instructions costs not being
properly counted:

- We create too many temporaries that, by default,
  are counted as instruction even though they will
  be removed by later passes.
- We emit a new compare for every cmov.
- Original costs are unfairly low.

These are tackled in this patch set.

As mentioned in the last discussion, what still remains
is to properly unify some parts of ifcvt.  We have several
"sub-passes" that can handle similar situations but the
restrictions of each of them are not clearly documented.
Costing also differs in that we sometimes just count
emitted cmovs and sometimes we indeed use pattern_cost
or insn_cost.

Ideally, we will also unify passing value comparisons
as well as CC comparisons to backends in the future.
This will allow a backend to choose whether to emit
a comparison, re-use a comparison or even get rid of
a comparison by emitting a min/max.

Nonetheless, this patch set improves the situation on
s390 and, in general, allows more accurate costing.

Regards
 Robin

Robin Dapp (7):
  ifcvt: Check if cmovs are needed.
  ifcvt: Allow constants for noce_convert_multiple.
  ifcvt: Improve costs handling for noce_convert_multiple.
  ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.
  ifcvt: Try re-using CC for conditional moves.
  testsuite/s390: Add tests for noce_convert_multiple.
  s390: Increase costs for load on condition and change movqicc
expander.

 gcc/config/s390/s390.c|   2 +-
 gcc/config/s390/s390.md   |   4 +-
 gcc/ifcvt.c   | 273 +++---
 gcc/optabs.c  | 163 +++
 gcc/optabs.h  |   1 +
 gcc/testsuite/gcc.dg/ifcvt-4.c|   2 +-
 .../gcc.target/s390/ifcvt-two-insns-bool.c|  39 +++
 .../gcc.target/s390/ifcvt-two-insns-int.c |  39 +++
 .../gcc.target/s390/ifcvt-two-insns-long.c|  39 +++
 9 files changed, 464 insertions(+), 98 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-bool.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
 create mode 100644 gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c

-- 
2.31.1

[PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2021-06-25 Thread Robin Dapp via Gcc-patches

Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump.  Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.

This patch allows to re-use a CC so the costing situation is improved a
bit.
---
 gcc/ifcvt.c  |  16 +++--
 gcc/optabs.c | 163 ++-
 gcc/optabs.h |   1 +
 3 files changed, 121 insertions(+), 59 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index ac0c142c9fe..c5b8641e2aa 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -771,7 +771,7 @@ static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
-   rtx, rtx, rtx);
+   rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
@@ -1710,7 +1710,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 static rtx
 noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
-rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
+rtx rev_cc_cmp)
 {
   rtx target ATTRIBUTE_UNUSED;
   int unsignedp ATTRIBUTE_UNUSED;
@@ -1756,9 +1757,14 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, 
enum rtx_code code,
   unsignedp = (code == LTU || code == GEU
   || code == LEU || code == GTU);
 
-  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
- vtrue, vfalse, GET_MODE (x),
- unsignedp);
+  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
+target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
+   vtrue, vfalse, GET_MODE (x));
+  else
+target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
+   vtrue, vfalse, GET_MODE (x),
+   unsignedp);
+
   if (target)
 return target;
 
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59..6bf486b9b50 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -52,6 +52,8 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, 
rtx *,
 static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
 static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
 
+static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
+
 /* Debug facility for use in GDB.  */
 void debug_optab_libfuncs (void);
 
@@ -4747,7 +4749,6 @@ emit_conditional_move (rtx target, enum rtx_code code, 
rtx op0, rtx op1,
   machine_mode mode, int unsignedp)
 {
   rtx comparison;
-  rtx_insn *last;
   enum insn_code icode;
   enum rtx_code reversed;
 
@@ -4774,6 +4775,7 @@ emit_conditional_move (rtx target, enum rtx_code code, 
rtx op0, rtx op1,
   /* get_condition will prefer to generate LT and GT even if the old
  comparison was against zero, so undo that canonicalization here since
  comparisons against zero are cheaper.  */
+
   if (code == LT && op1 == const1_rtx)
 code = LE, op1 = const0_rtx;
   else if (code == GT && op1 == constm1_rtx)
@@ -4782,17 +4784,29 @@ emit_conditional_move (rtx target, enum rtx_code code, 
rtx op0, rtx op1,
   if (cmode == VOIDmode)
 cmode = GET_MODE (op0);
 
-  enum rtx_code orig_code = code;
+  /* If the first source operand is constant and the second is not, swap
+ it into the second.  In that case we also need to reverse the
+ comparison.  It is possible, though, that the conditional move
+ will not expand with operands in this order, so we might also need
+ to revert to the original comparison and operand order.  */
+
+  rtx rev_comparison = NULL_RTX;
   bool swapped = false;
-  if (swap_commutative_operands_p (op2, op3)
-  && ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
-  != UNKNOWN))
+
+  code = unsignedp ? unsigned_condition (code) : code;
+  comparison = simplify_gen_relational (code, VOIDmode, cmode, op0, op1);
+
+  if ((reversed = reversed_comparison_code_parts (code, op0, op1, NULL))
+  != UNKNOWN)
 {
-  std::swap (op2, op3);
-  code = reversed;
-  swapped = true;
+  reversed = unsignedp ? unsigned_condition (reversed) : reversed;
+  rev_comparison = simplify_gen_relational (reversed, VOIDmode, cmode,
+   op0, op1);
 }
 
+  if (swap_commutative_operands_p (op2, op3) && reversed != UNKNOWN)
+

[PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-06-25 Thread Robin Dapp via Gcc-patches

When if-converting multiple SETs and we encounter a swap-style idiom

  if (a > b)
{
  tmp = c;   // [1]
  c = d;
  d = tmp;
}

ifcvt should not generate a conditional move for the instruction at
[1].

In order to achieve that, this patch goes through all relevant SETs
and marks the relevant instructions.  This helps to evaluate costs.

On top, only generate temporaries if the current cmov is going to
overwrite one of the comparands of the initial compare.
---
 gcc/ifcvt.c | 104 +++-
 1 file changed, 87 insertions(+), 17 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79..eef6490626a 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, 
basic_block,
   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
+static void check_need_cmovs (basic_block, hash_map *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   auto_vec unmodified_insns;
   int count = 0;
 
+  hash_map need_cmovs;
+
+  check_need_cmovs (then_bb, &need_cmovs);
+
   FOR_BB_INSNS (then_bb, insn)
 {
   /* Skip over non-insns.  */
@@ -3213,26 +3218,38 @@ noce_convert_multiple_sets (struct noce_if_info 
*if_info)
   gcc_checking_assert (set);
 
   rtx target = SET_DEST (set);
-  rtx temp = gen_reg_rtx (GET_MODE (target));
+  rtx temp;
   rtx new_val = SET_SRC (set);
   rtx old_val = target;
 
-  /* If we were supposed to read from an earlier write in this block,
-we've changed the register allocation.  Rewire the read.  While
-we are looking, also try to catch a swap idiom.  */
-  for (int i = count - 1; i >= 0; --i)
-   if (reg_overlap_mentioned_p (new_val, targets[i]))
- {
-   /* Catch a "swap" style idiom.  */
-   if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
- /* The write to targets[i] is only live until the read
-here.  As the condition codes match, we can propagate
-the set to here.  */
- new_val = SET_SRC (single_set (unmodified_insns[i]));
-   else
- new_val = temporaries[i];
-   break;
- }
+  /* As we are transforming
+if (x > y)
+  a = b;
+  c = d;
+into
+  a = (x > y) ...
+  c = (x > y) ...
+
+we potentially check x > y before every set here.
+(Even though might be removed by subsequent passes.)
+We cannot transform
+  if (x > y)
+x = y;
+...
+into
+  x = (x > y) ...
+  ...
+since this would invalidate x.  Therefore we introduce a temporary
+every time we are about to overwrite a variable used in the
+check.  Costing of a sequence with these is going to be inaccurate.  */
+  if (reg_overlap_mentioned_p (target, cond))
+   temp = gen_reg_rtx (GET_MODE (target));
+  else
+   temp = target;
+
+  bool need_cmov = true;
+  if (need_cmovs.get (insn))
+   need_cmov = false;
 
   /* If we had a non-canonical conditional jump (i.e. one where
 the fallthrough is to the "else" case) we need to reverse
@@ -3808,6 +3825,59 @@ check_cond_move_block (basic_block bb,
   return TRUE;
 }
 
+/* Find local swap-style idioms in BB and mark the first insn (1)
+   that is only a temporary as not needing a conditional move as
+   it is going to be dead afterwards anyway.
+
+ (1) int tmp = a;
+a = b;
+b = tmp;
+
+ifcvt
+-->
+
+load tmp,a
+cmov a,b
+cmov b,tmp   */
+
+static void
+check_need_cmovs (basic_block bb, hash_map *need_cmov)
+{
+  rtx_insn *insn;
+  int count = 0;
+  auto_vec insns;
+  auto_vec dests;
+
+  FOR_BB_INSNS (bb, insn)
+{
+  rtx set, src, dest;
+
+  if (!active_insn_p (insn))
+   continue;
+
+  set = single_set (insn);
+  if (set == NULL_RTX)
+   continue;
+
+  src = SET_SRC (set);
+  dest = SET_DEST (set);
+
+  for (int i = count - 1; i >= 0; --i)
+   {
+ if (reg_overlap_mentioned_p (src, dests[i])
+ && find_reg_note (insn, REG_DEAD, src) != NULL_RTX)
+   {
+ need_cmov->put (insns[i], false);
+   }
+   }
+
+  insns.safe_push (insn);
+  dests.safe_push (dest);
+
+  count++;
+}
+}
+
 /* Given a basic block BB suitable for conditional move conversion,
a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing
the register values depending on COND, emit the insns in the block as
-- 
2.31.1

Re: [PATCH] c++: access scope during partial spec matching [PR96204]

2021-06-25 Thread Jason Merrill via Gcc-patches


On 6/25/21 11:03 AM, Patrick Palka wrote:

Here, when determining whether the partial specialization matches the
specialization has_set_attr_method, we do so from the scope of
where the template-id appears rather than from the scope of the
specialization, and this causes us to select the partial specialization
(since Child::type is accessible from Parent).  When we later
instantiate this partial specialization, we've entered the scope of the
specialization and so substitution into e.g. the DECL_CONTEXT for
'value' yields access errors for Child::type since the friend
declaration no longer applies.

It seems the appropriate access scope from which to perform partial
specialization matching is the specialization itself (similar to how
we check access of base-clauses), which is what this patch implements.



There's implementation divergence however: Clang accepts both testcases
below whereas MSVC and ICC reject both (indicating that Clang performs
partial spec matching from the scope of the specialization and MSVC/ICC
performs it from whatever scope the template-id appears).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/96204

gcc/cp/ChangeLog:

* pt.c (instantiate_class_template_1): Enter the scope of the
type before calling most_specialized_partial_spec.

gcc/testsuite/ChangeLog:

* g++.dg/template/access40.C: New test.
* g++.dg/template/access40a.C: New test.
---
  gcc/cp/pt.c   |  6 -
  gcc/testsuite/g++.dg/template/access40.C  | 30 +++
  gcc/testsuite/g++.dg/template/access40a.C | 30 +++
  3 files changed, 65 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/template/access40.C
  create mode 100644 gcc/testsuite/g++.dg/template/access40a.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f4e0abe5c1e..5107bfbf9d1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11774,8 +11774,12 @@ instantiate_class_template_1 (tree type)
deferring_access_check_sentinel acs (dk_no_deferred);
  
/* Determine what specialization of the original template to

- instantiate.  */
+ instantiate; do this relative to the scope of the type.  */
+  push_access_scope (TYPE_NAME (type));
+  pushclass (type);


How about replacing these two calls with push_nested_class (type), like 
we use later in the function?



t = most_specialized_partial_spec (type, tf_warning_or_error);
+  popclass ();
+  pop_access_scope (TYPE_NAME (type));
if (t == error_mark_node)
  return error_mark_node;
else if (t)
diff --git a/gcc/testsuite/g++.dg/template/access40.C 
b/gcc/testsuite/g++.dg/template/access40.C
new file mode 100644
index 000..e0d30779377
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40.C
@@ -0,0 +1,30 @@
+// PR c++/96204
+
+template struct bool_constant;
+
+template
+struct has_type_member {
+  static const bool value = false;
+};
+
+template
+struct has_type_member {
+  static const bool value = true;
+};
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct Parent;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization of has_type_member does not match
+// despite Child::type being accessible from the current scope.
+typedef bool_constant::value> type;
+typedef bool_constant type;
+  }
+};
diff --git a/gcc/testsuite/g++.dg/template/access40a.C 
b/gcc/testsuite/g++.dg/template/access40a.C
new file mode 100644
index 000..85138c9e570
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40a.C
@@ -0,0 +1,30 @@
+// PR c++/96204
+
+template struct bool_constant;
+
+template
+struct has_type_member {
+  static const bool value = false;
+};
+
+template
+struct has_type_member {
+  static const bool value = true;
+};
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct has_type_member;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization matches because Child::type is
+// accessible from has_type_member.
+typedef bool_constant::value> type;
+typedef bool_constant type;
+  }
+};

Re: [PATCH] c++: access scope during partial spec matching [PR96204]

2021-06-25 Thread Patrick Palka via Gcc-patches

On Fri, 25 Jun 2021, Patrick Palka wrote:

> Here, when determining whether the partial specialization matches the
> specialization has_set_attr_method, we do so from the scope of

Er, this should say has_type_method.

> where the template-id appears rather than from the scope of the
> specialization, and this causes us to select the partial specialization
> (since Child::type is accessible from Parent).  When we later
> instantiate this partial specialization, we've entered the scope of the
> specialization and so substitution into e.g. the DECL_CONTEXT for
> 'value' yields access errors for Child::type since the friend
> declaration no longer applies.
> 
> It seems the appropriate access scope from which to perform partial
> specialization matching is the specialization itself (similar to how
> we check access of base-clauses), which is what this patch implements.
> 
> There's implementation divergence however: Clang accepts both testcases
> below whereas MSVC and ICC reject both (indicating that Clang performs
> partial spec matching from the scope of the specialization and MSVC/ICC
> performs it from whatever scope the template-id appears).
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/96204
> 
> gcc/cp/ChangeLog:
> 
>   * pt.c (instantiate_class_template_1): Enter the scope of the
>   type before calling most_specialized_partial_spec.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/access40.C: New test.
>   * g++.dg/template/access40a.C: New test.
> ---
>  gcc/cp/pt.c   |  6 -
>  gcc/testsuite/g++.dg/template/access40.C  | 30 +++
>  gcc/testsuite/g++.dg/template/access40a.C | 30 +++
>  3 files changed, 65 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/access40.C
>  create mode 100644 gcc/testsuite/g++.dg/template/access40a.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index f4e0abe5c1e..5107bfbf9d1 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -11774,8 +11774,12 @@ instantiate_class_template_1 (tree type)
>deferring_access_check_sentinel acs (dk_no_deferred);
>  
>/* Determine what specialization of the original template to
> - instantiate.  */
> + instantiate; do this relative to the scope of the type.  */
> +  push_access_scope (TYPE_NAME (type));
> +  pushclass (type);

D'oh, this pair of calls is equivalent to push_nested_class; consider
this hunk replaced by a single call to push_nested_class(type).

>t = most_specialized_partial_spec (type, tf_warning_or_error);
> +  popclass ();
> +  pop_access_scope (TYPE_NAME (type));

Likewise consider this replaced with pop_nested_class().

>if (t == error_mark_node)
>  return error_mark_node;
>else if (t)
> diff --git a/gcc/testsuite/g++.dg/template/access40.C 
> b/gcc/testsuite/g++.dg/template/access40.C
> new file mode 100644
> index 000..e0d30779377
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/access40.C
> @@ -0,0 +1,30 @@
> +// PR c++/96204
> +
> +template struct bool_constant;
> +
> +template
> +struct has_type_member {
> +  static const bool value = false;
> +};
> +
> +template
> +struct has_type_member {
> +  static const bool value = true;
> +};
> +
> +struct Parent;
> +
> +struct Child {
> +private:
> +  friend struct Parent;
> +  typedef void type;
> +};
> +
> +struct Parent {
> +  static void f() {
> +// The partial specialization of has_type_member does not match
> +// despite Child::type being accessible from the current scope.
> +typedef bool_constant::value> type;
> +typedef bool_constant type;
> +  }
> +};
> diff --git a/gcc/testsuite/g++.dg/template/access40a.C 
> b/gcc/testsuite/g++.dg/template/access40a.C
> new file mode 100644
> index 000..85138c9e570
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/access40a.C
> @@ -0,0 +1,30 @@
> +// PR c++/96204
> +
> +template struct bool_constant;
> +
> +template
> +struct has_type_member {
> +  static const bool value = false;
> +};
> +
> +template
> +struct has_type_member {
> +  static const bool value = true;
> +};
> +
> +struct Parent;
> +
> +struct Child {
> +private:
> +  friend struct has_type_member;
> +  typedef void type;
> +};
> +
> +struct Parent {
> +  static void f() {
> +// The partial specialization matches because Child::type is
> +// accessible from has_type_member.
> +typedef bool_constant::value> type;
> +typedef bool_constant type;
> +  }
> +};
> -- 
> 2.32.0.93.g670b81a890
> 
>

Re: [PATCH] c++: CTAD within alias template [PR91911]

2021-06-25 Thread Jason Merrill via Gcc-patches


On 6/24/21 4:45 PM, Patrick Palka wrote:

In the first testcase below, during parsing of the alias template
ConstSpanType, transparency of alias template specializations means we
replace SpanType with SpanType's substituted definition.  But this
substitution lowers the level of the CTAD placeholder for span(T()) from
2 to 1, and so the later instantiantion of ConstSpanType
erroneously substitutes this CTAD placeholder with the template argument
at level 1 index 0, i.e. with int, before we get a chance to perform the
CTAD.

In light of this, it seems we should avoid level lowering when
substituting through through the type-id of a dependent alias template
specialization.  To that end this patch makes lookup_template_class_1
pass tf_partial to tsubst in this situation.


This makes sense, but what happens if SpanType is a member template, so 
that the levels of it and ConstSpanType don't match?  Or the other way 
around?



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/91911

gcc/cp/ChangeLog:

* pt.c (lookup_template_class_1): When looking up a dependent
alias template specialization, pass tf_partial to tsubst.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction92.C: New test.
---
  gcc/cp/pt.c   |  7 +-
  .../g++.dg/cpp1z/class-deduction92.C  | 17 +
  .../g++.dg/cpp1z/class-deduction93.C  | 25 +++
  3 files changed, 48 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction93.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f73c7471a33..23c5f515716 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9954,7 +9954,12 @@ lookup_template_class_1 (tree d1, tree arglist, tree 
in_decl, tree context,
template-arguments for the template-parameters in the
type-id of the alias template.  */
  
-	  t = tsubst (TREE_TYPE (gen_tmpl), arglist, complain, in_decl);

+ /* When substituting a dependent alias template specialization,
+we pass tf_partial to avoid lowering the level of any 'auto'
+in its type-id which might correspond to CTAD placeholders.  */
+ t = tsubst (TREE_TYPE (gen_tmpl), arglist,
+ complain | (is_dependent_type * tf_partial),
+ in_decl);
  /* Note that the call above (by indirectly calling
 register_specialization in tsubst_decl) registers the
 TYPE_DECL representing the specialization of the alias
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
new file mode 100644
index 000..ae3c55508b2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction92.C
@@ -0,0 +1,17 @@
+// PR c++/91911
+// { dg-do compile { target c++17 } }
+
+template
+struct span {
+  using value_type = T;
+  span(T);
+};
+
+template
+using SpanType = decltype(span(T()));
+
+template
+using ConstSpanType = span::value_type>;
+
+using type = ConstSpanType;
+using type = span;
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C
new file mode 100644
index 000..eebc986832e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction93.C
@@ -0,0 +1,25 @@
+// PR c++/98077
+// { dg-do compile { target c++17 } }
+
+template
+struct function {
+  template function(T) { }
+  using type = R;
+};
+
+template function(T) -> function;
+
+template
+struct CallableTrait;
+
+template
+struct CallableTrait> { using ReturnType = R; };
+
+template
+using CallableTraitT = decltype(function{F()});
+
+template
+using ReturnType = typename CallableTraitT::type;
+
+using type = ReturnType;
+using type = int;

Re: [PATCHv3 00/55] Replace the Power target-specific builtin machinery

2021-06-25 Thread Bill Schmidt via Gcc-patches


Ping / beg  :-)

On 6/17/21 10:18 AM, Bill Schmidt via Gcc-patches wrote:

Original patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568840.html

V2 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572231.html

I've made some slight changes to the V2 series because of recent updates
to trunk from Carl Love and Peter Bergner.  Carl added some new P10
builtins, and Peter made some changes to the MMA builtins.  This series
is compatible with all builtins that are currently upstream.

As a reminder, as a result of reviews, the original patch 0001 has been
dropped, so the patch numbering is off by one compared with the original
series.  Status of the remaining patches (using new numbering):

0001: Approved
0002: Approved
0003: Approved
0004: Approved
0005: Needs re-review
0006: Approved
0007: Approved
0008: Approved
0009: Approved
0010-0055: Not yet reviewed

Thanks again for the ongoing reviews!

Bill

Bill Schmidt (55):
   Support scanning of build-time GC roots in gengtype
   rs6000: Initial create of rs6000-gen-builtins.c
   rs6000: Add initial input files
   rs6000: Add file support and functions for diagnostic support
   rs6000: Add helper functions for parsing
   rs6000: Add functions for matching types, part 1 of 3
   rs6000: Add functions for matching types, part 2 of 3
   rs6000: Add functions for matching types, part 3 of 3
   rs6000: Red-black tree implementation for balanced tree search
   rs6000: Main function with stubs for parsing and output
   rs6000: Parsing built-in input file, part 1 of 3
   rs6000: Parsing built-in input file, part 2 of 3
   rs6000: Parsing built-in input file, part 3 of 3
   rs6000: Parsing of overload input file
   rs6000: Build and store function type identifiers
   rs6000: Write output to the builtin definition include file
   rs6000: Write output to the builtins header file
   rs6000: Write output to the builtins init file, part 1 of 3
   rs6000: Write output to the builtins init file, part 2 of 3
   rs6000: Write output to the builtins init file, part 3 of 3
   rs6000: Write static initializations for built-in table
   rs6000: Write static initializations for overload tables
   rs6000: Incorporate new builtins code into the build machinery
   rs6000: Add gengtype handling to the build machinery
   rs6000: Add the rest of the [altivec] stanza to the builtins file
   rs6000: Add VSX builtins
   rs6000: Add available-everywhere and ancient builtins
   rs6000: Add power7 and power7-64 builtins
   rs6000: Add power8-vector builtins
   rs6000: Add Power9 builtins
   rs6000: Add more type nodes to support builtin processing
   rs6000: Add Power10 builtins
   rs6000: Add MMA builtins
   rs6000: Add miscellaneous builtins
   rs6000: Add Cell builtins
   rs6000: Add remaining overloads
   rs6000: Execute the automatic built-in initialization code
   rs6000: Darwin builtin support
   rs6000: Add sanity to V2DI_type_node definitions
   rs6000: Always initialize vector_pair and vector_quad nodes
   rs6000: Handle overloads during program parsing
   rs6000: Handle gimple folding of target built-ins
   rs6000: Support for vectorizing built-in functions
   rs6000: Builtin expansion, part 1
   rs6000: Builtin expansion, part 2
   rs6000: Builtin expansion, part 3
   rs6000: Builtin expansion, part 4
   rs6000: Builtin expansion, part 5
   rs6000: Builtin expansion, part 6
   rs6000: Update rs6000_builtin_decl
   rs6000: Miscellaneous uses of rs6000_builtin_decls_x
   rs6000: Debug support
   rs6000: Update altivec.h for automated interfaces
   rs6000: Test case adjustments
   rs6000: Enable the new builtin support

  gcc/Makefile.in   |5 +-
  gcc/config.gcc|2 +
  gcc/config/rs6000/altivec.h   |  522 +-
  gcc/config/rs6000/darwin.h|8 +-
  gcc/config/rs6000/rbtree.c|  242 +
  gcc/config/rs6000/rbtree.h|   52 +
  gcc/config/rs6000/rs6000-builtin-new.def  | 3998 +++
  gcc/config/rs6000/rs6000-c.c  | 1083 +++
  gcc/config/rs6000/rs6000-call.c   | 3399 -
  gcc/config/rs6000/rs6000-gen-builtins.c   | 2984 
  gcc/config/rs6000/rs6000-overload.def | 6186 +
  gcc/config/rs6000/rs6000.c|  219 +-
  gcc/config/rs6000/rs6000.h|   84 +
  gcc/config/rs6000/t-rs6000|   45 +-
  gcc/gengtype-state.c  |   32 +-
  gcc/gengtype.c|   22 +-
  gcc/gengtype.h|5 +
  .../powerpc/bfp/scalar-extract-exp-2.c|2 +-
  .../powerpc/bfp/scalar-extract-sig-2.c|2 +-
  .../powerpc/bfp/scalar-insert-exp-2.c |2 +-
  .../powerpc/bfp/scalar-insert-exp-5.c |2 +-
  .../powerpc/bfp/scalar-insert-exp-8.c |2 +-
  .../powerpc/bfp/scalar-test-neg-2.c

[PATCH] c++: access scope during partial spec matching [PR96204]

2021-06-25 Thread Patrick Palka via Gcc-patches

Here, when determining whether the partial specialization matches the
specialization has_set_attr_method, we do so from the scope of
where the template-id appears rather than from the scope of the
specialization, and this causes us to select the partial specialization
(since Child::type is accessible from Parent).  When we later
instantiate this partial specialization, we've entered the scope of the
specialization and so substitution into e.g. the DECL_CONTEXT for
'value' yields access errors for Child::type since the friend
declaration no longer applies.

It seems the appropriate access scope from which to perform partial
specialization matching is the specialization itself (similar to how
we check access of base-clauses), which is what this patch implements.

There's implementation divergence however: Clang accepts both testcases
below whereas MSVC and ICC reject both (indicating that Clang performs
partial spec matching from the scope of the specialization and MSVC/ICC
performs it from whatever scope the template-id appears).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/96204

gcc/cp/ChangeLog:

* pt.c (instantiate_class_template_1): Enter the scope of the
type before calling most_specialized_partial_spec.

gcc/testsuite/ChangeLog:

* g++.dg/template/access40.C: New test.
* g++.dg/template/access40a.C: New test.
---
 gcc/cp/pt.c   |  6 -
 gcc/testsuite/g++.dg/template/access40.C  | 30 +++
 gcc/testsuite/g++.dg/template/access40a.C | 30 +++
 3 files changed, 65 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/access40.C
 create mode 100644 gcc/testsuite/g++.dg/template/access40a.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f4e0abe5c1e..5107bfbf9d1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11774,8 +11774,12 @@ instantiate_class_template_1 (tree type)
   deferring_access_check_sentinel acs (dk_no_deferred);
 
   /* Determine what specialization of the original template to
- instantiate.  */
+ instantiate; do this relative to the scope of the type.  */
+  push_access_scope (TYPE_NAME (type));
+  pushclass (type);
   t = most_specialized_partial_spec (type, tf_warning_or_error);
+  popclass ();
+  pop_access_scope (TYPE_NAME (type));
   if (t == error_mark_node)
 return error_mark_node;
   else if (t)
diff --git a/gcc/testsuite/g++.dg/template/access40.C 
b/gcc/testsuite/g++.dg/template/access40.C
new file mode 100644
index 000..e0d30779377
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40.C
@@ -0,0 +1,30 @@
+// PR c++/96204
+
+template struct bool_constant;
+
+template
+struct has_type_member {
+  static const bool value = false;
+};
+
+template
+struct has_type_member {
+  static const bool value = true;
+};
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct Parent;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization of has_type_member does not match
+// despite Child::type being accessible from the current scope.
+typedef bool_constant::value> type;
+typedef bool_constant type;
+  }
+};
diff --git a/gcc/testsuite/g++.dg/template/access40a.C 
b/gcc/testsuite/g++.dg/template/access40a.C
new file mode 100644
index 000..85138c9e570
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40a.C
@@ -0,0 +1,30 @@
+// PR c++/96204
+
+template struct bool_constant;
+
+template
+struct has_type_member {
+  static const bool value = false;
+};
+
+template
+struct has_type_member {
+  static const bool value = true;
+};
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct has_type_member;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization matches because Child::type is
+// accessible from has_type_member.
+typedef bool_constant::value> type;
+typedef bool_constant type;
+  }
+};
-- 
2.32.0.93.g670b81a890

Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline

2021-06-25 Thread Richard Sandiford via Gcc-patches

Xi Ruoyao via Gcc-patches  writes:
> On Fri, 2021-06-25 at 01:02 +0800, Xi Ruoyao wrote:
>> On Thu, 2021-06-24 at 10:48 -0600, Jeff Law wrote:
>> > I'd like to know a bit more here.  mips.exp shouldn't care about the
>> > options passed to the compiler and to the best of my knowledge 
>> > patch itself is wrong, I question if it's necessary and whether or
>> > not
>> > your just papering over some other issue.
>> 
>> There is some logic processing options in mips.exp.  Some options are
>> overrided for multilib.  It seems the mips.exp was originally designed
>> as:
>> 
>> * MIPS options should go in dg-options
>> * Other options should go in dg-additional-options
>> 
>> In d2148424165 marxin merged some dg-additional-options into dg-
>> options,
>> exploited the problem.
>> 
>> And, the "origin" convention seems already broken: there is something
>> like -funroll-loops which is not a MIPS option, but accepted by
>> mips.exp
>> in dg-options.
>> 
>> Possiblities are:
>> 
>> (1) this patch
>> (2) make mips.exp accept -fno-inline as "if it is a MIPS option"
>> (3) refactor mips.exp to pass everything itself doesn't know directly
>> to gcc
>
> Attached a diff for mips.exp trying to make it pass everything in dg-
> options which is not known by itself directly to the compiler.
>
> The "smallest fix" is simply adding -fno-inline into mips.exp.  However
> I don't like it because I agree with you that mips.exp shouldn't care
> about dg-options, at least don't do it too much.

As I said in the other message, I think the smallest fix is the way to
go though.

Thanks,
Richard

[committed] Use right shifts to eliminate redundant test/compare insns on the H8

2021-06-25 Thread Jeff Law

Now we're moving on to the shifts/rotate patterns for redundant 
test/compare removal.  The first patch is trivial and just adds support 
into select_cc_mode for the right shifts which were missing.  This only 
helps the H8/SX as it's the only variant currently supporting using 
shifts to eliminate redundant test/compare insns, but it'll be needed 
for the other H8 variants soon enough.


This has gone through the usual ~24hr test cycle.  Committing to the trunk.

Jeff
commit 3a50aed09edc5e69080a0e49851acdb874227256
Author: Jeff Law 
Date:   Fri Jun 25 09:22:28 2021 -0400

Use right shifts to eliminate redundant test/compare insns on the H8

gcc/
* config/h8300/h8300.c (select_cc_mode): Handle ASHIFTRT and 
LSHIFTRT.

diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index 511c2b28e40..d8b4bfcbdbe 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -1947,9 +1947,10 @@ h8300_select_cc_mode (enum rtx_code cond, rtx op0, rtx 
op1)
   if (op1 == const0_rtx
   && (cond == EQ || cond == NE || cond == LT || cond == GE)
   && (GET_CODE (op0) == PLUS || GET_CODE (op0) == MINUS
-  || GET_CODE (op0) == NEG || GET_CODE (op0) == AND
-  || GET_CODE (op0) == IOR || GET_CODE (op0) == XOR
-  || GET_CODE (op0) == NOT || GET_CODE (op0) == ASHIFT
+ || GET_CODE (op0) == NEG || GET_CODE (op0) == AND
+ || GET_CODE (op0) == IOR || GET_CODE (op0) == XOR
+ || GET_CODE (op0) == NOT || GET_CODE (op0) == ASHIFT
+ || GET_CODE (op0) == ASHIFTRT || GET_CODE (op0) == LSHIFTRT
  || GET_CODE (op0) == MULT || GET_CODE (op0) == SYMBOL_REF
  || GET_CODE (op0) == SIGN_EXTEND || GET_CODE (op0) == ZERO_EXTEND
  || REG_P (op0) || MEM_P (op0)))

Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline

2021-06-25 Thread Xi Ruoyao via Gcc-patches

On Fri, 2021-06-25 at 01:02 +0800, Xi Ruoyao wrote:
> On Thu, 2021-06-24 at 10:48 -0600, Jeff Law wrote:
> > I'd like to know a bit more here.  mips.exp shouldn't care about the
> > options passed to the compiler and to the best of my knowledge 
> > patch itself is wrong, I question if it's necessary and whether or
> > not
> > your just papering over some other issue.
> 
> There is some logic processing options in mips.exp.  Some options are
> overrided for multilib.  It seems the mips.exp was originally designed
> as:
> 
> * MIPS options should go in dg-options
> * Other options should go in dg-additional-options
> 
> In d2148424165 marxin merged some dg-additional-options into dg-
> options,
> exploited the problem.
> 
> And, the "origin" convention seems already broken: there is something
> like -funroll-loops which is not a MIPS option, but accepted by
> mips.exp
> in dg-options.
> 
> Possiblities are:
> 
> (1) this patch
> (2) make mips.exp accept -fno-inline as "if it is a MIPS option"
> (3) refactor mips.exp to pass everything itself doesn't know directly
> to gcc

Attached a diff for mips.exp trying to make it pass everything in dg-
options which is not known by itself directly to the compiler.

The "smallest fix" is simply adding -fno-inline into mips.exp.  However
I don't like it because I agree with you that mips.exp shouldn't care
about dg-options, at least don't do it too much.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp
index 01292316944..3ffcab551be 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -247,7 +247,6 @@ set mips_option_groups {
 mips3d "-mips3d|-mno-mips3d"
 pic "-f(no-|)(pic|PIC)"
 cb "-mcompact-branches=.*"
-profiling "-pg"
 small-data "-G[0-9]+"
 warnings "-w"
 dump "-fdump-.*"
@@ -315,30 +314,6 @@ foreach option {
 lappend mips_option_groups $option "-m$option=.*"
 }
 
-# Add -ffoo/-fno-foo options to mips_option_groups.
-foreach option {
-common
-delayed-branch
-expensive-optimizations
-fast-math
-fat-lto-objects
-finite-math-only
-fixed-hi
-fixed-lo
-lax-vector-conversions
-omit-frame-pointer
-optimize-sibling-calls
-peephole2
-schedule-insns2
-split-wide-types
-tree-vectorize
-unroll-all-loops
-unroll-loops
-ipa-ra
-} {
-lappend mips_option_groups $option "-f(no-|)$option"
-}
-
 # A list of option groups that have an impact on the ABI.
 set mips_abi_groups {
 abi
@@ -571,6 +546,10 @@ proc mips_original_option { group } {
 proc mips_test_option_p { upstatus group } {
 upvar $upstatus status
 
+if {[ string equal $group "" ]} {
+return 0
+}
+
 return $status(test_option_p,$group)
 }
 
@@ -620,7 +599,12 @@ proc mips_have_test_option_p { upstatus option } {
 proc mips_make_test_option { upstatus option args } {
 upvar $upstatus status
 
-set group [mips_option_group $option]
+set group [mips_option_maybe_group $option]
+if { [string equal $group ""] } {
+	lappend status(raw_option) $option
+	return
+}
+
 if { ![mips_test_option_p status $group] } {
 	set status(option,$group) $option
 	set status(test_option_p,$group) 1
@@ -741,6 +725,7 @@ proc mips-dg-init {} {
 
 # Start with a fresh option status.
 array unset mips_base_options
+set mips_base_options(raw_option) {}
 foreach { group regexp } $mips_option_groups {
 	set mips_base_options(option,$group) ""
 	set mips_base_options(explicit_p,$group) 0
@@ -1016,7 +1001,7 @@ proc mips-dg-options { args } {
 # Record the options that this test explicitly needs.
 foreach option [lindex $args 1] {
 	set all_but_p [regexp {^\((.*)\)$} $option dummy option]
-	set group [mips_option_group $option]
+	set group [mips_option_maybe_group $option]
 	if { [mips_test_option_p options $group] } {
 	set old [mips_option options $group]
 	error "Inconsistent $group option: $old vs. $option"
@@ -1025,10 +1010,6 @@ proc mips-dg-options { args } {
 	}
 }
 
-# Handle dependencies between the test options and the optimization ones.
-mips_option_dependency options "-fno-unroll-loops" "-fno-unroll-all-loops"
-mips_option_dependency options "-pg" "-fno-omit-frame-pointer"
-
 # Handle dependencies between options on the left of the
 # dependency diagram.
 mips_option_dependency options "-mips16" "-mno-micromips"
@@ -1505,6 +1486,10 @@ proc mips-dg-options { args } {
 	}
 }
 
+foreach { opt } $options(raw_option) {
+append extra_tool_flags " " $opt
+}
+
 # If the test is marked as requiring standard libraries check
 # that the sysroot has support for the current set of test options.
 if { [mips_have_test_option_p options "REQUIRES_STDLIB"] } {

Re: GCC documentation: porting to Sphinx

2021-06-25 Thread Martin Liška


On 6/25/21 3:11 PM, Martin Liška wrote:

List of known issues (planned to be fixed after merging):


I forgot about:

- diagnostics URL (for e.g. warnings) needs to be adjusted

Martin

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches





On 2021/6/25 18:02, Richard Biener wrote:

On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:




On 2021/6/25 16:54, Richard Biener wrote:

On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
 wrote:


From: Xiong Hu Luo 

adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power.  For example, it generates mismatched address offset after
adjust iv update statement position:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
_34 = ref_180 + 18446744073709551615;
_86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
if (_84 == _86)
goto ; [94.50%]
else
goto ; [5.50%]

Disable it will produce:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
_86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
if (_84 == _86)
goto ; [94.50%]
else
goto ; [5.50%]

Then later pass loop unroll could benefit from same address offset
with different base address and reduces register dependency.
This patch could improve performance by 10% for typical case on Power,
no performance change observed for X86 or Aarch64 due to small loops
not unrolled on these platforms.  Any comments?


The case you quote is special in that if we hoisted the IV update before
the other MEM _also_ used in the condition it would be fine again.


Thanks.  I tried to hoist the IV update statement before the first MEM (Fix 2), 
it
shows even worse performance due to not unroll(two more "base-1" is generated 
in gimple,
then loop->ninsns is 11 so small loops is not unrolled), change the threshold 
from
10 to 12 in rs6000_loop_unroll_adjust would make it also unroll 2 times, the
performance is SAME to the one that IV update statement in the *MIDDLE* (trunk).
 From the ASM, we can see the index register %r4 is used in two iterations which
maybe a bottle neck for hiding instruction latency?

Then it seems reasonable the performance would be better if keep the IV update
statement at *LAST* (Fix 1).

(Fix 2):
[local count: 70988443]:
   ivtmp.30_415 = ivtmp.30_414 + 1;
   _34 = ip_229 + 18446744073709551615;
   _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
   _33 = ref_180 + 18446744073709551615;
   _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
   if (_84 == _86)
 goto ; [94.50%]
   else
 goto ; [5.50%]


.L67:
 lbzx %r12,%r24,%r4
 lbzx %r25,%r7,%r4
 cmpw %cr0,%r12,%r25
 bne %cr0,.L11
 mr %r26,%r4
 addi %r4,%r4,1
 lbzx %r12,%r24,%r4
 lbzx %r25,%r7,%r4
 mr %r6,%r26
 cmpw %cr0,%r12,%r25
 bne %cr0,.L11
 mr %r26,%r4
.L12:
 cmpdi %cr0,%r10,1
 addi %r4,%r26,1
 mr %r6,%r26
 addi %r10,%r10,-1
 bne %cr0,.L67



Now, adjust_iv_update_pos doesn't seem to check that the
condition actually uses the IV use stmt def, so it likely applies to
too many cases.

Unfortunately the introducing rev didn't come with a testcase,
but still I think fixing up adjust_iv_update_pos is better than
introducing a way to short-cut it per target decision.

One "fix" might be to add a check that either the condition
lhs or rhs is the def of the IV use and the other operand
is invariant.  Or if it's of similar structure hoist across the
other iv-use as well.  Not that I understand the argument
about the overlapping life-range.

You also don't provide a complete testcase ...



Attached the test code, will also add it it patch in future version.
The issue comes from a very small hot loop:

 do {
   len++;
 } while(len < maxlen && ip[len] == ref[len]);


unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
{
   unsigned int len = 2;
   do {
   len++;
   }while(len < maxlen && ip[len] == ref[len]);
   return len;
}

I can see the effect on this loop on x86_64 as well, we end up with

.L6:
 movzbl  (%rdi,%rax), %ecx
 addq$1, %rax
 cmpb-1(%rsi,%rax), %cl
 jne .L1
.L3:
 movl%eax, %r8d
 cmpl%edx, %eax
 jb  .L6

but without the trick it is

.L6:
 movzbl  (%rdi,%rax), %r8d
 movzbl  (%rsi,%rax), %ecx
 addq$1, %rax
 cmpb%cl, %r8b
 jne .L1
.L3:
 movl%eax, %r9d
 cmpl%edx, %eax
 jb  .L6

so here you can see the missed fusion.  Of course
in this case the IV update could have been sunk into
the .L3 block and replicated on the exit edge as well.


It is not sunk by the recently added gimple sink2 pass,
reason is the latch block is an empty block.  Remove this
check would cause some SPEC performance regression.

tree-ssa-sink.c
static bool
statement_sink_location (gimple *stmt, basic_block frombb,
 gimple_stmt_iterator *togsi, bool *zero_uses_p)
{
...
  /* If the latch block is empty, don't make it non-empty by sinking
 something into it.  */
  if (sinkbb == frombb-

Re: GCC documentation: porting to Sphinx

2021-06-25 Thread Martin Liška


Hello.

I've got something that is very close to be a patch candidate that can be
eventually merged. Right now, the patches are available here:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/sphinx-v3

Changes since last version:

- gdc manual was ported
- 'make doc' works fine both with and w/o installed sphinx-build
- 'make pdf' and 'make html' works fine
- libgccjit was ported to the shared Makefile and configuration
- likewise for 3 existing Ada manuals
- .texi files are removed

List of known issues (planned to be fixed after merging):
- cross manual references are not working
- update_web_docs_git needs to be changed - will simplify rapidly
- Sphinx warnings should be addressed
- remove texinfo references in Manuals
- list package requirements for Sphinx manual generation

I'm looking forward to a feedback.
Thanks,
Martin

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches


+cc Xinliang David Li since he introduced the function
adjust_iv_update_pos. Looking forward to the input. :)


On 2021/6/25 18:02, Richard Biener wrote:

On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:




On 2021/6/25 16:54, Richard Biener wrote:

On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
 wrote:


From: Xiong Hu Luo 

adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power.  For example, it generates mismatched address offset after
adjust iv update statement position:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
_34 = ref_180 + 18446744073709551615;
_86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
if (_84 == _86)
goto ; [94.50%]
else
goto ; [5.50%]

Disable it will produce:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
_86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
if (_84 == _86)
goto ; [94.50%]
else
goto ; [5.50%]

Then later pass loop unroll could benefit from same address offset
with different base address and reduces register dependency.
This patch could improve performance by 10% for typical case on Power,
no performance change observed for X86 or Aarch64 due to small loops
not unrolled on these platforms.  Any comments?


The case you quote is special in that if we hoisted the IV update before
the other MEM _also_ used in the condition it would be fine again.


Thanks.  I tried to hoist the IV update statement before the first MEM (Fix 2), 
it
shows even worse performance due to not unroll(two more "base-1" is generated 
in gimple,
then loop->ninsns is 11 so small loops is not unrolled), change the threshold 
from
10 to 12 in rs6000_loop_unroll_adjust would make it also unroll 2 times, the
performance is SAME to the one that IV update statement in the *MIDDLE* (trunk).
 From the ASM, we can see the index register %r4 is used in two iterations which
maybe a bottle neck for hiding instruction latency?

Then it seems reasonable the performance would be better if keep the IV update
statement at *LAST* (Fix 1).

(Fix 2):
[local count: 70988443]:
   ivtmp.30_415 = ivtmp.30_414 + 1;
   _34 = ip_229 + 18446744073709551615;
   _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
   _33 = ref_180 + 18446744073709551615;
   _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
   if (_84 == _86)
 goto ; [94.50%]
   else
 goto ; [5.50%]


.L67:
 lbzx %r12,%r24,%r4
 lbzx %r25,%r7,%r4
 cmpw %cr0,%r12,%r25
 bne %cr0,.L11
 mr %r26,%r4
 addi %r4,%r4,1
 lbzx %r12,%r24,%r4
 lbzx %r25,%r7,%r4
 mr %r6,%r26
 cmpw %cr0,%r12,%r25
 bne %cr0,.L11
 mr %r26,%r4
.L12:
 cmpdi %cr0,%r10,1
 addi %r4,%r26,1
 mr %r6,%r26
 addi %r10,%r10,-1
 bne %cr0,.L67



Now, adjust_iv_update_pos doesn't seem to check that the
condition actually uses the IV use stmt def, so it likely applies to
too many cases.

Unfortunately the introducing rev didn't come with a testcase,
but still I think fixing up adjust_iv_update_pos is better than
introducing a way to short-cut it per target decision.

One "fix" might be to add a check that either the condition
lhs or rhs is the def of the IV use and the other operand
is invariant.  Or if it's of similar structure hoist across the
other iv-use as well.  Not that I understand the argument
about the overlapping life-range.

You also don't provide a complete testcase ...



Attached the test code, will also add it it patch in future version.
The issue comes from a very small hot loop:

 do {
   len++;
 } while(len < maxlen && ip[len] == ref[len]);


unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
{
   unsigned int len = 2;
   do {
   len++;
   }while(len < maxlen && ip[len] == ref[len]);
   return len;
}

I can see the effect on this loop on x86_64 as well, we end up with

.L6:
 movzbl  (%rdi,%rax), %ecx
 addq$1, %rax
 cmpb-1(%rsi,%rax), %cl
 jne .L1
.L3:
 movl%eax, %r8d
 cmpl%edx, %eax
 jb  .L6

but without the trick it is

.L6:
 movzbl  (%rdi,%rax), %r8d
 movzbl  (%rsi,%rax), %ecx
 addq$1, %rax
 cmpb%cl, %r8b
 jne .L1
.L3:
 movl%eax, %r9d
 cmpl%edx, %eax
 jb  .L6

so here you can see the missed fusion.  Of course
in this case the IV update could have been sunk into
the .L3 block and replicated on the exit edge as well.

I'm not sure if the motivation for the change introducing this
trick was the above kind of combination or not, but I guess
so.  The dependence distance of the IV increment to the
use is now shorter, so I'm not sure the combined variant is
better.

Richard.




--
Thanks,
Xionghu


--
Thanks,
Xionghu

[PATCH v2] x86: Check AVX512 without mask instructions

2021-06-25 Thread H.J. Lu via Gcc-patches

On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak  wrote:
>
> On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu  wrote:
> >
> > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> >  wrote:
> > >
> > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu  wrote:
> > > >
> > > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > > are enabled, compiler is free to use them in these functions.  Add
> > > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > > to avoid vector instructions.
> > >
> > > These functions are intended to be inlined, so how does target
> > > attribute affect inlining?
> > I guess w/ -O0. they may not be inlined, that's why H.J adds those
> > attributes to those functions.
>
> The problem is not with these functions, but with surrounding checks
> for cpuid features. These checks are implemented with logic
> instructions, and nothing prevents RA from allocating mask registers,
> and consequently mask insn is emitted. Regarding mentioned functions,
> cpuid insn pattern has four GPR single-reg constraints, so mask
> registers can't be allocated here.
>
> > pr96814.dump:
> > 0804aa40 :
> >  804aa40: 8d 4c 24 04  lea0x4(%esp),%ecx
> > ...
> >  804aa63: 6a 07push   $0x7
> >  804aa65: e8 e0 e7 ff ffcall   804924a <__get_cpuid_count>
> >
> > Also we need to add a target attribute to avx512f_os_support (), and
> > that would be enough to fix the AVX512 part.
> >
> > Moreover, all check functions in below files may also need to deal with:
> > adx-check.h
> > aes-avx-check.h
> > aes-check.h
> > amx-check.h
> > attr-nocf-check-1a.c
> > attr-nocf-check-3a.c
> > avx2-check.h
> > avx2-vpop-check.h
> > avx512bw-check.h
> > avx512-check.h
> > avx512dq-check.h
> > avx512er-check.h
> > avx512f-check.h
> > avx512vl-check.h
> > avx-check.h
> > bmi2-check.h
> > bmi-check.h
> > cf_check-1.c
> > cf_check-2.c
> > cf_check-3.c
> > cf_check-4.c
> > cf_check-5.c
> > f16c-check.h
> > fma4-check.h
> > fma-check.h
> > isa-check.h
> > lzcnt-check.h
> > m128-check.h
> > m256-check.h
> > m512-check.h
> > mmx-3dnow-check.h
> > mmx-check.h
> > pclmul-avx-check.h
> > pclmul-check.h
> > pr39315-check.c
> > rtm-check.h
> > sha-check.h
> > spellcheck-options-1.c
> > spellcheck-options-2.c
> > spellcheck-options-3.c
> > spellcheck-options-4.c
> > spellcheck-options-5.c
> > sse2-check.h
> > sse3-check.h
> > sse4_1-check.h
> > sse4_2-check.h
> > sse4a-check.h
> > sse-check.h
> > ssse3-check.h
> > stack-check-11.c
> > stack-check-12.c
> > stack-check-17.c
> > stack-check-18.c
> > stack-check-19.c
> > xop-check.h
>
> True, but this would just paper over the real problem. Now, it is
> expected that the user decorates the function that checks CPUID
> features with the target attribute. I'm not sure if this is OK.
>
> Uros.

CPUID functions are used to detect CPU features.  If mask instructions
are enabled, compiler is free to use them in these functions.  Disable
AVX512F in AVX512 check with target pragma to avoid mask instructions.

OK for master?

Thanks.

-- 
H.J.
From 7961c445f27de3e813d7332afc14e9844c0831f7 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 24 Jun 2021 04:43:41 -0700
Subject: [PATCH v2] x86: Check AVX512 without mask instructions

CPUID functions are used to detect CPU features.  If mask instructions
are enabled, compiler is free to use them in these functions.  Disable
AVX512F in AVX512 check with target pragma to avoid mask instructions.

	PR target/101185
	* gcc.target/i386/avx512-check.h: Disable AVX512F in AVX512 check
	with target pragma.
---
 gcc/testsuite/gcc.target/i386/avx512-check.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..7ccf730c4f1 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -1,5 +1,8 @@
 #include 
+#pragma GCC push_options
+#pragma GCC target ("no-mmx,no-sse")
 #include "cpuid.h"
+#pragma GCC pop_options
 #include "m512-check.h"
 #include "avx512f-os-support.h"
 
@@ -25,6 +28,9 @@ do_test (void)
 }
 #endif
 
+#pragma GCC push_options
+#pragma GCC target ("no-mmx,no-sse")
+
 static int
 check_osxsave (void)
 {
@@ -110,3 +116,4 @@ main ()
 #endif
   return 0;
 }
+#pragma GCC pop_options
-- 
2.31.1

Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast

2021-06-25 Thread Jonathan Wakely via Gcc-patches

On Thu, 24 Jun 2021 at 15:02, Matthias Kretz wrote:
>
> For -ffast-math there was a missing using namespace __proposed left. The
> attached patch resolves the issue.

OK for trunk, please push (after adding yourself to the "Write After
Approval" section of MAINTAINERS as per
https://gcc.gnu.org/gitwrite.html as your first commit).

Thanks!


> From: Matthias Kretz 
>
> The __bit_cast function was a hack to achieve what __builtin_bit_cast
> can do, therefore use __builtin_bit_cast if possible. However,
> __builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
> it isn't trivially copyable (in the language sense — in principle it
> is). Therefore add __proposed::simd_bit_cast to enable the use case
> required in the test framework.
>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> * include/experimental/bits/simd.h (__bit_cast): Implement via
> __builtin_bit_cast #if available.
> (__proposed::simd_bit_cast): Add overloads for simd and
> simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
> available), which return an object of the requested type with
> the same bits as the argument.
> * include/experimental/bits/simd_math.h: Use simd_bit_cast
> instead of __bit_cast to allow casts to fixed_size_simd.
> (copysign): Remove branch that was only required if __bit_cast
> cannot be constexpr.
> * testsuite/experimental/simd/tests/bits/test_values.h: Switch
> from __bit_cast to __proposed::simd_bit_cast since the former
> will not cast fixed_size objects anymore.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 57 ++-
>  .../include/experimental/bits/simd_math.h | 37 ++--
>  .../simd/tests/bits/test_values.h |  8 +--
>  3 files changed, 76 insertions(+), 26 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──

Re: [PATCH] testsuite: mips: use noinline attribute instead of -fno-inline

2021-06-25 Thread Richard Sandiford via Gcc-patches

Xi Ruoyao via Gcc-patches  writes:
> On Thu, 2021-06-24 at 10:48 -0600, Jeff Law wrote:
>> 
>> 
>> On 6/22/2021 1:05 AM, Xi Ruoyao via Gcc-patches wrote:
>> > mips.exp does not support -fno-inline, causing the tests return
>> > "ERROR:
>> > Unrecognised option: -fno-inline for dg-options ... ".
>> > 
>> > Use noinline attribute like other mips target tests, to workaround
>> > it.
>> > 
>> > gcc/testsuite/
>> > 
>> > * gcc.target/mips/cfgcleanup-jalr2.c: Remove -fno-inline and
>> > add
>> >   __attribute__((noinline)).
>> > * gcc.target/mips/cfgcleanup-jalr3.c: Likewise.
>> I'd like to know a bit more here.  mips.exp shouldn't care about the 
>> options passed to the compiler and to the best of my knowledge 
>> -fno-inline is still a valid GCC option.  So while I don't think the 
>> patch itself is wrong, I question if it's necessary and whether or not
>> your just papering over some other issue.
>
> There is some logic processing options in mips.exp.  Some options are
> overrided for multilib.  It seems the mips.exp was originally designed
> as:
>
> * MIPS options should go in dg-options
> * Other options should go in dg-additional-options
>
> In d2148424165 marxin merged some dg-additional-options into dg-options,
> exploited the problem.
>
> And, the "origin" convention seems already broken: there is something
> like -funroll-loops which is not a MIPS option, but accepted by mips.exp
> in dg-options.

Originally the idea was that all options would go into dg-options,
not just MIPS ones.  If a test wanted to pass an option that mips.exp
doesn't currently understand then mips.exp should be extended to
classify the new option appropriately.

Some options require intervention from mips.exp to be usable across all
configurations while others can be passed through as-is.  The idea was
that mips.exp would be the single place that knew which options were which,
rather than having to hard-code that distinction in every test.

So I think the fix for this would be to extend:

# Add -ffoo/-fno-foo options to mips_option_groups.
foreach option {
common
delayed-branch
expensive-optimizations
fast-math
fat-lto-objects
finite-math-only
fixed-hi
fixed-lo
lax-vector-conversions
omit-frame-pointer
optimize-sibling-calls
peephole2
schedule-insns2
split-wide-types
tree-vectorize
unroll-all-loops
unroll-loops
ipa-ra
} {
lappend mips_option_groups $option "-f(no-|)$option"
}

to include "inline" as well.

A rule like “all -f options can be passed through as-is” isn't safe
because (at least) -fpic requires special handling.

Thanks,
Richard

Re: [RFC] ldist: Recognize rawmemchr loop patterns

2021-06-25 Thread Stefan Schulze Frielinghaus via Gcc-patches

On Wed, Jun 16, 2021 at 04:22:35PM +0200, Richard Biener wrote:
> On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus
>  wrote:
> >
> > On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus wrote:
> > [...]
> > > > but we won't ever arrive here because of the niters condition.  But
> > > > yes, doing the pattern matching in the innermost loop processing code
> > > > looks good to me - for the specific case it would be
> > > >
> > > >   /* Don't distribute loop if niters is unknown.  */
> > > >   tree niters = number_of_latch_executions (loop);
> > > >   if (niters == NULL_TREE || niters == chrec_dont_know)
> > > > ---> here?
> > > > continue;
> > >
> > > Right, please find attached a new version of the patch where everything
> > > is included in the loop distribution pass.  I will do a bootstrap and
> > > regtest on IBM Z over night.  If you give me green light I will also do
> > > the same on x86_64.
> >
> > Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> > least the ldist-strlen testcase).  If you are Ok with the patch, then I
> > would rebase and run the testsuites again and post a patch series
> > including the rawmemchr implementation for IBM Z.
> 
> @@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop
> *loop, vec *work_list)
>return work_list->length () > 0;
>  }
> 
> +static void
> +generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
> +   data_reference_p store_dr, tree base, tree 
> pattern,
> +   location_t loc)
> +{
> 
> this new function needs a comment.  Applies to all of the new ones, btw.

Done.

> +  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
> +  && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE 
> (pattern));
> 
> this looks fragile and is probably unnecessary as well.
> 
> +  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));
> 
> in general you want types_compatible_p () checks which for pointers means
> all pointers are compatible ...

True, I removed both asserts.

> (skipping stuff)
> 
> @@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
>   && !optimize_loop_for_speed_p (loop)))
> continue;
> 
> -  /* Don't distribute loop if niters is unknown.  */
> +  /* If niters is unknown don't distribute loop but rather try to 
> transform
> +it to a call to a builtin.  */
>tree niters = number_of_latch_executions (loop);
>if (niters == NULL_TREE || niters == chrec_dont_know)
> -   continue;
> +   {
> + if (transform_reduction_loop (loop))
> +   {
> + changed = true;
> + loops_to_be_destroyed.safe_push (loop);
> + if (dump_file)
> +   fprintf (dump_file, "Loop %d transformed into a
> builtin.\n", loop->num);
> +   }
> + continue;
> +   }
> 
> please look at
> 
>   if (nb_generated_loops + nb_generated_calls > 0)
> {
>   changed = true;
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
>  loc, "Loop%s %d distributed: split to
> %d loops "
>  "and %d library calls.\n", str, loop->num,
>  nb_generated_loops, nb_generated_calls);
> 
> and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
> transforms are reported with -fopt-info-loop

Done.

> +
> +  return transform_reduction_loop_1 (loop, load_dr, store_dr, reduction_var);
> +}
> 
> what's the point in tail-calling here and visually splitting the
> function in half?

In the first place I thought that this is more pleasant since in
transform_reduction_loop_1 it is settled that we have a single load,
store, and reduction variable.  After refactoring this isn't true
anymore and I inlined the function and made this clear via a comment.

> 
> (sorry for picking random pieces now ;))
> 
> +  for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
> +  gsi_next (&bsi), ++ninsns)
> +   {
> 
> this counts debug insns, I guess you want gsi_next_nondebug at least.
> not sure why you are counting PHIs at all btw - for the loops you match
> you are expecting at most two, one IV and eventually one for the virtual
> operand of the store?

Yes, I removed the counting for the phi loop and changed to
gsi_next_nondebug for both loops.

> 
> + if (gimple_has_volatile_ops (phi))
> +   return false;
> 
> PHIs never have volatile ops.
> 
> + if (gimple_clobber_p (phi))
> +   continue;
> 
> or are clobbers.

Removed both.

> Btw, can you factor out a helper from find_single_drs working on a
> stmt to reduce code duplication?

Ahh sorry for that.  I've already done this in one of my first patches
but didn't copy that over.  Although my changes do not require a RDG the
whole pass is base

[PATCH] tree-optimization/101202 - fix ICE with failed backedge SLP nodes

2021-06-25 Thread Richard Biener

This fixes an ICE with failed backedge SLP nodes still in the graph
while doing permute optimization by explicitely handling those.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-25  Richard Biener  

PR tree-optimization/101202
* tree-vect-slp.c (vect_optimize_slp): Explicitely handle
failed nodes.

* gcc.dg/torture/pr101202.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101202.c | 33 +
 gcc/tree-vect-slp.c | 49 +++--
 2 files changed, 62 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101202.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101202.c 
b/gcc/testsuite/gcc.dg/torture/pr101202.c
new file mode 100644
index 000..e76c908f493
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101202.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+int printf(const char *, ...);
+unsigned a, b, d;
+int c, e, f;
+int main()
+{
+  while (a)
+if (b)
+  {
+   f = a;
+   while (e)
+ {
+   int h, i;
+   if (d)
+ {
+   h = a;
+   i = d;
+L:
+   d = a | d && c;
+   if (a)
+ {
+   printf("%d", a);
+   goto L;
+ }
+ }
+   a = h;
+   d = i;
+ }
+  }
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 227d6aa3ee8..17fe5f23c09 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3689,26 +3689,33 @@ vect_optimize_slp (vec_info *vinfo)
 
  vertices[idx].visited = 1;
 
- /* We do not handle stores with a permutation.  */
- stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
- if (STMT_VINFO_DATA_REF (rep)
- && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
-   continue;
- /* We cannot move a permute across an operation that is
-not independent on lanes.  Note this is an explicit
-negative list since that's much shorter than the respective
-positive one but it's critical to keep maintaining it.  */
- if (is_gimple_call (STMT_VINFO_STMT (rep)))
-   switch (gimple_call_combined_fn (STMT_VINFO_STMT (rep)))
- {
- case CFN_COMPLEX_ADD_ROT90:
- case CFN_COMPLEX_ADD_ROT270:
- case CFN_COMPLEX_MUL:
- case CFN_COMPLEX_MUL_CONJ:
- case CFN_VEC_ADDSUB:
+ /* We still eventually have failed backedge SLP nodes in the
+graph, those are only cancelled when analyzing operations.
+Simply treat them as transparent ops, propagating permutes
+through them.  */
+ if (SLP_TREE_DEF_TYPE (node) == vect_internal_def)
+   {
+ /* We do not handle stores with a permutation.  */
+ stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
+ if (STMT_VINFO_DATA_REF (rep)
+ && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
continue;
- default:;
- }
+ /* We cannot move a permute across an operation that is
+not independent on lanes.  Note this is an explicit
+negative list since that's much shorter than the respective
+positive one but it's critical to keep maintaining it.  */
+ if (is_gimple_call (STMT_VINFO_STMT (rep)))
+   switch (gimple_call_combined_fn (STMT_VINFO_STMT (rep)))
+ {
+ case CFN_COMPLEX_ADD_ROT90:
+ case CFN_COMPLEX_ADD_ROT270:
+ case CFN_COMPLEX_MUL:
+ case CFN_COMPLEX_MUL_CONJ:
+ case CFN_VEC_ADDSUB:
+   continue;
+ default:;
+ }
+   }
 
  int perm = -1;
  for (graph_edge *succ = slpg->vertices[idx].succ;
@@ -3812,7 +3819,9 @@ vect_optimize_slp (vec_info *vinfo)
   slp_tree child;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child)
{
- if (!child || SLP_TREE_DEF_TYPE (child) == vect_internal_def)
+ if (!child
+ || (SLP_TREE_DEF_TYPE (child) != vect_constant_def
+ && SLP_TREE_DEF_TYPE (child) != vect_external_def))
continue;
 
  /* If the vector is uniform there's nothing to do.  */
-- 
2.26.2

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Richard Biener via Gcc-patches

On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:
>
>
>
> On 2021/6/25 16:54, Richard Biener wrote:
> > On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
> >  wrote:
> >>
> >> From: Xiong Hu Luo 
> >>
> >> adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
> >> on Power.  For example, it generates mismatched address offset after
> >> adjust iv update statement position:
> >>
> >>  [local count: 70988443]:
> >> _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
> >> ivtmp.30_415 = ivtmp.30_414 + 1;
> >> _34 = ref_180 + 18446744073709551615;
> >> _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
> >> if (_84 == _86)
> >>goto ; [94.50%]
> >>else
> >>goto ; [5.50%]
> >>
> >> Disable it will produce:
> >>
> >>  [local count: 70988443]:
> >> _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
> >> _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
> >> ivtmp.30_415 = ivtmp.30_414 + 1;
> >> if (_84 == _86)
> >>goto ; [94.50%]
> >>else
> >>goto ; [5.50%]
> >>
> >> Then later pass loop unroll could benefit from same address offset
> >> with different base address and reduces register dependency.
> >> This patch could improve performance by 10% for typical case on Power,
> >> no performance change observed for X86 or Aarch64 due to small loops
> >> not unrolled on these platforms.  Any comments?
> >
> > The case you quote is special in that if we hoisted the IV update before
> > the other MEM _also_ used in the condition it would be fine again.
>
> Thanks.  I tried to hoist the IV update statement before the first MEM (Fix 
> 2), it
> shows even worse performance due to not unroll(two more "base-1" is generated 
> in gimple,
> then loop->ninsns is 11 so small loops is not unrolled), change the threshold 
> from
> 10 to 12 in rs6000_loop_unroll_adjust would make it also unroll 2 times, the
> performance is SAME to the one that IV update statement in the *MIDDLE* 
> (trunk).
> From the ASM, we can see the index register %r4 is used in two iterations 
> which
> maybe a bottle neck for hiding instruction latency?
>
> Then it seems reasonable the performance would be better if keep the IV update
> statement at *LAST* (Fix 1).
>
> (Fix 2):
>[local count: 70988443]:
>   ivtmp.30_415 = ivtmp.30_414 + 1;
>   _34 = ip_229 + 18446744073709551615;
>   _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>   _33 = ref_180 + 18446744073709551615;
>   _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
>   if (_84 == _86)
> goto ; [94.50%]
>   else
> goto ; [5.50%]
>
>
> .L67:
> lbzx %r12,%r24,%r4
> lbzx %r25,%r7,%r4
> cmpw %cr0,%r12,%r25
> bne %cr0,.L11
> mr %r26,%r4
> addi %r4,%r4,1
> lbzx %r12,%r24,%r4
> lbzx %r25,%r7,%r4
> mr %r6,%r26
> cmpw %cr0,%r12,%r25
> bne %cr0,.L11
> mr %r26,%r4
> .L12:
> cmpdi %cr0,%r10,1
> addi %r4,%r26,1
> mr %r6,%r26
> addi %r10,%r10,-1
> bne %cr0,.L67
>
> >
> > Now, adjust_iv_update_pos doesn't seem to check that the
> > condition actually uses the IV use stmt def, so it likely applies to
> > too many cases.
> >
> > Unfortunately the introducing rev didn't come with a testcase,
> > but still I think fixing up adjust_iv_update_pos is better than
> > introducing a way to short-cut it per target decision.
> >
> > One "fix" might be to add a check that either the condition
> > lhs or rhs is the def of the IV use and the other operand
> > is invariant.  Or if it's of similar structure hoist across the
> > other iv-use as well.  Not that I understand the argument
> > about the overlapping life-range.
> >
> > You also don't provide a complete testcase ...
> >
>
> Attached the test code, will also add it it patch in future version.
> The issue comes from a very small hot loop:
>
> do {
>   len++;
> } while(len < maxlen && ip[len] == ref[len]);

unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
{
  unsigned int len = 2;
  do {
  len++;
  }while(len < maxlen && ip[len] == ref[len]);
  return len;
}

I can see the effect on this loop on x86_64 as well, we end up with

.L6:
movzbl  (%rdi,%rax), %ecx
addq$1, %rax
cmpb-1(%rsi,%rax), %cl
jne .L1
.L3:
movl%eax, %r8d
cmpl%edx, %eax
jb  .L6

but without the trick it is

.L6:
movzbl  (%rdi,%rax), %r8d
movzbl  (%rsi,%rax), %ecx
addq$1, %rax
cmpb%cl, %r8b
jne .L1
.L3:
movl%eax, %r9d
cmpl%edx, %eax
jb  .L6

so here you can see the missed fusion.  Of course
in this case the IV update could have been sunk into
the .L3 block and replicated on the exit edge as well.

I'm not sure if the motivation for the change introducing this
trick was the above kind of combination or not, but I guess
so.  The dependence distance of the IV increment to the
use is now shorter, so I'm not su

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches




On 2021/6/25 16:54, Richard Biener wrote:

On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
 wrote:


From: Xiong Hu Luo 

adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power.  For example, it generates mismatched address offset after
adjust iv update statement position:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
_34 = ref_180 + 18446744073709551615;
_86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
if (_84 == _86)
   goto ; [94.50%]
   else
   goto ; [5.50%]

Disable it will produce:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
_86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
if (_84 == _86)
   goto ; [94.50%]
   else
   goto ; [5.50%]

Then later pass loop unroll could benefit from same address offset
with different base address and reduces register dependency.
This patch could improve performance by 10% for typical case on Power,
no performance change observed for X86 or Aarch64 due to small loops
not unrolled on these platforms.  Any comments?


The case you quote is special in that if we hoisted the IV update before
the other MEM _also_ used in the condition it would be fine again.


Thanks.  I tried to hoist the IV update statement before the first MEM (Fix 2), 
it
shows even worse performance due to not unroll(two more "base-1" is generated 
in gimple,
then loop->ninsns is 11 so small loops is not unrolled), change the threshold 
from
10 to 12 in rs6000_loop_unroll_adjust would make it also unroll 2 times, the
performance is SAME to the one that IV update statement in the *MIDDLE* (trunk). 

From the ASM, we can see the index register %r4 is used in two iterations which

maybe a bottle neck for hiding instruction latency?

Then it seems reasonable the performance would be better if keep the IV update
statement at *LAST* (Fix 1).

(Fix 2):
  [local count: 70988443]:
 ivtmp.30_415 = ivtmp.30_414 + 1;
 _34 = ip_229 + 18446744073709551615;
 _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
 _33 = ref_180 + 18446744073709551615;
 _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
 if (_84 == _86)
   goto ; [94.50%]
 else
   goto ; [5.50%]


.L67:
   lbzx %r12,%r24,%r4
   lbzx %r25,%r7,%r4
   cmpw %cr0,%r12,%r25
   bne %cr0,.L11
   mr %r26,%r4
   addi %r4,%r4,1
   lbzx %r12,%r24,%r4
   lbzx %r25,%r7,%r4
   mr %r6,%r26
   cmpw %cr0,%r12,%r25
   bne %cr0,.L11
   mr %r26,%r4
.L12:
   cmpdi %cr0,%r10,1
   addi %r4,%r26,1
   mr %r6,%r26
   addi %r10,%r10,-1
   bne %cr0,.L67



Now, adjust_iv_update_pos doesn't seem to check that the
condition actually uses the IV use stmt def, so it likely applies to
too many cases.

Unfortunately the introducing rev didn't come with a testcase,
but still I think fixing up adjust_iv_update_pos is better than
introducing a way to short-cut it per target decision.

One "fix" might be to add a check that either the condition
lhs or rhs is the def of the IV use and the other operand
is invariant.  Or if it's of similar structure hoist across the
other iv-use as well.  Not that I understand the argument
about the overlapping life-range.

You also don't provide a complete testcase ...



Attached the test code, will also add it it patch in future version.
The issue comes from a very small hot loop:

do {
  len++;
} while(len < maxlen && ip[len] == ref[len]);


--
Thanks,
Xionghu
#include 
#include 
#include 

# define HLOG 16
#defineMAX_LIT(1 <<  5)
typedef const uint8_t *LZF_HSLOT;
typedef LZF_HSLOT LZF_STATE[1 << (HLOG)];

int compute_on_bytes(uint8_t *, int, uint8_t *, int );

int main(int argc, char **argv)
{
//Declarations
FILE *fptr;
int len_limit;
uint8_t *inputbuf,*outputbuf;

uint8_t *ip,*op;
int len, i;
long int sum=0;

//randomness
if (argv[1] != NULL )
len=atoi(argv[1]);

//read
fptr = fopen(argv[2],"rb"); 

if(fptr == NULL)
{ 
   printf("Error!");   
   exit(1); 
}

fseek( fptr , 0L , SEEK_END);
len_limit = ftell( fptr );
rewind( fptr );

/* allocate memory for entire input content */
inputbuf = calloc( 1, len_limit+1 );
if( !inputbuf ) fclose(fptr),fputs("memory alloc fails",stderr),exit(1);

/* allocate memory for entire output */
outputbuf = calloc( 1, len_limit+1 );
if( !inputbuf ) fclose(fptr),fputs("memory alloc fails",stderr),exit(1);

/* copy the file into the buffer */
if( 1!=fread( inputbuf , len_limit, 1 , fptr) )
  fclose(fptr),free(inputbuf),fputs("entire read fails",stderr),exit(1);

//compare
ip=inputbuf;
op=outputbuf;

for (i=0; i < len_limit; i=i+(len/4))
sum+=compute_

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Richard Biener via Gcc-patches

On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
 wrote:
>
> From: Xiong Hu Luo 
>
> adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
> on Power.  For example, it generates mismatched address offset after
> adjust iv update statement position:
>
>  [local count: 70988443]:
> _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
> ivtmp.30_415 = ivtmp.30_414 + 1;
> _34 = ref_180 + 18446744073709551615;
> _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
> if (_84 == _86)
>   goto ; [94.50%]
>   else
>   goto ; [5.50%]
>
> Disable it will produce:
>
>  [local count: 70988443]:
> _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
> _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
> ivtmp.30_415 = ivtmp.30_414 + 1;
> if (_84 == _86)
>   goto ; [94.50%]
>   else
>   goto ; [5.50%]
>
> Then later pass loop unroll could benefit from same address offset
> with different base address and reduces register dependency.
> This patch could improve performance by 10% for typical case on Power,
> no performance change observed for X86 or Aarch64 due to small loops
> not unrolled on these platforms.  Any comments?

The case you quote is special in that if we hoisted the IV update before
the other MEM _also_ used in the condition it would be fine again.

Now, adjust_iv_update_pos doesn't seem to check that the
condition actually uses the IV use stmt def, so it likely applies to
too many cases.

Unfortunately the introducing rev didn't come with a testcase,
but still I think fixing up adjust_iv_update_pos is better than
introducing a way to short-cut it per target decision.

One "fix" might be to add a check that either the condition
lhs or rhs is the def of the IV use and the other operand
is invariant.  Or if it's of similar structure hoist across the
other iv-use as well.  Not that I understand the argument
about the overlapping life-range.

You also don't provide a complete testcase ...

Richard.


> .L67:
>   lbzx %r7,%r8,%r6
>   lbzx %r12,%r25,%r4
>   cmpw %cr0,%r7,%r12
>   bne %cr0,.L11
>   lbzx %r7,%r8,%r4
>   mr %r6,%r4
>   addi %r4,%r4,1
>   lbzx %r12,%r25,%r4
>   mr %r11,%r6
>   cmpw %cr0,%r7,%r12
>   bne %cr0,.L11
>   mr %r6,%r4
> .L12:
>   cmpdi %cr0,%r10,1
>   addi %r4,%r6,1
>   mr %r11,%r6
>   addi %r10,%r10,-1
>   bne %cr0,.L67
>
> vs.
>
> .L67:
>   lbzx %r25,%r8,%r6
>   lbzx %r12,%r7,%r6
>   addi %r4,%r6,1
>   cmpw %cr0,%r25,%r12
>   bne %cr0,.L11
>   lbzx %r12,%r8,%r4
>   lbzx %r25,%r7,%r4
>   mr %r6,%r4
>   mr %r11,%r4
>   cmpw %cr0,%r12,%r25
>   bne %cr0,.L11
>   addi %r6,%r4,1
> .L12:
>   cmpdi %cr0,%r10,1
>   mr %r11,%r6
>   addi %r10,%r10,-1
>   bne %cr0,.L67
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (TARGET_ADJUST_IV_UPDATE_POS):
> (rs6000_adjust_iv_update_pos):
> * doc/tm.texi:
> * doc/tm.texi.in:
> * target.def:
> * targhooks.c (default_adjust_iv_update_pos):
> * targhooks.h (default_adjust_iv_update_pos):
> * tree-ssa-loop-ivopts.c (rewrite_use_address):
> ---
>  gcc/config/rs6000/rs6000.c | 11 +++
>  gcc/doc/tm.texi|  5 +
>  gcc/doc/tm.texi.in |  2 ++
>  gcc/target.def |  7 +++
>  gcc/targhooks.c|  6 ++
>  gcc/targhooks.h|  2 ++
>  gcc/tree-ssa-loop-ivopts.c |  3 ++-
>  7 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index cd130dea611..e7725997793 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -1455,6 +1455,9 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  #undef TARGET_LOOP_UNROLL_ADJUST
>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>
> +#undef TARGET_ADJUST_IV_UPDATE_POS
> +#define TARGET_ADJUST_IV_UPDATE_POS rs6000_adjust_iv_update_pos
> +
>  #undef TARGET_INIT_BUILTINS
>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>  #undef TARGET_BUILTIN_DECL
> @@ -5457,6 +5460,14 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct 
> loop *loop)
>return nunroll;
>  }
>
> +/* Implement targetm.adjust_iv_update_pos.  */
> +
> +bool
> +rs6000_adjust_iv_update_pos (void)
> +{
> +  return false;
> +}
> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
> library with vectorized intrinsics.  */
>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index b272fa4806d..07ce40eb053 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -11768,6 +11768,11 @@ By default, the RTL loop optimizer does not use a 
> present doloop pattern for
>  loops containing function calls or branch on table instructions.
>  @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_ADJUST_IV_UPDATE_POS (void)
> +if adjust_iv_update_pos is enabled, reorder the iv update statement,
> + then mem ref uses the iv value after update.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn 
> *@var{insn})
>  Take an instruction in @var{ins

[PATCH] New hook adjust_iv_update_pos

2021-06-25 Thread Xionghu Luo via Gcc-patches

From: Xiong Hu Luo 

adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
on Power.  For example, it generates mismatched address offset after
adjust iv update statement position:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
_34 = ref_180 + 18446744073709551615;
_86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
if (_84 == _86)
  goto ; [94.50%]
  else
  goto ; [5.50%]

Disable it will produce:

 [local count: 70988443]:
_84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
_86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
ivtmp.30_415 = ivtmp.30_414 + 1;
if (_84 == _86)
  goto ; [94.50%]
  else
  goto ; [5.50%]

Then later pass loop unroll could benefit from same address offset
with different base address and reduces register dependency.
This patch could improve performance by 10% for typical case on Power,
no performance change observed for X86 or Aarch64 due to small loops
not unrolled on these platforms.  Any comments?

.L67:
  lbzx %r7,%r8,%r6
  lbzx %r12,%r25,%r4
  cmpw %cr0,%r7,%r12
  bne %cr0,.L11
  lbzx %r7,%r8,%r4
  mr %r6,%r4
  addi %r4,%r4,1
  lbzx %r12,%r25,%r4
  mr %r11,%r6
  cmpw %cr0,%r7,%r12
  bne %cr0,.L11
  mr %r6,%r4
.L12:
  cmpdi %cr0,%r10,1
  addi %r4,%r6,1
  mr %r11,%r6
  addi %r10,%r10,-1
  bne %cr0,.L67

vs.

.L67:
  lbzx %r25,%r8,%r6
  lbzx %r12,%r7,%r6
  addi %r4,%r6,1
  cmpw %cr0,%r25,%r12
  bne %cr0,.L11
  lbzx %r12,%r8,%r4
  lbzx %r25,%r7,%r4
  mr %r6,%r4
  mr %r11,%r4
  cmpw %cr0,%r12,%r25
  bne %cr0,.L11
  addi %r6,%r4,1
.L12:
  cmpdi %cr0,%r10,1
  mr %r11,%r6
  addi %r10,%r10,-1
  bne %cr0,.L67

gcc/ChangeLog:

* config/rs6000/rs6000.c (TARGET_ADJUST_IV_UPDATE_POS):
(rs6000_adjust_iv_update_pos):
* doc/tm.texi:
* doc/tm.texi.in:
* target.def:
* targhooks.c (default_adjust_iv_update_pos):
* targhooks.h (default_adjust_iv_update_pos):
* tree-ssa-loop-ivopts.c (rewrite_use_address):
---
 gcc/config/rs6000/rs6000.c | 11 +++
 gcc/doc/tm.texi|  5 +
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def |  7 +++
 gcc/targhooks.c|  6 ++
 gcc/targhooks.h|  2 ++
 gcc/tree-ssa-loop-ivopts.c |  3 ++-
 7 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index cd130dea611..e7725997793 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1455,6 +1455,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_LOOP_UNROLL_ADJUST
 #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
 
+#undef TARGET_ADJUST_IV_UPDATE_POS
+#define TARGET_ADJUST_IV_UPDATE_POS rs6000_adjust_iv_update_pos
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS rs6000_init_builtins
 #undef TARGET_BUILTIN_DECL
@@ -5457,6 +5460,14 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop 
*loop)
   return nunroll;
 }
 
+/* Implement targetm.adjust_iv_update_pos.  */
+
+bool
+rs6000_adjust_iv_update_pos (void)
+{
+  return false;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
library with vectorized intrinsics.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b272fa4806d..07ce40eb053 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11768,6 +11768,11 @@ By default, the RTL loop optimizer does not use a 
present doloop pattern for
 loops containing function calls or branch on table instructions.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_ADJUST_IV_UPDATE_POS (void)
+if adjust_iv_update_pos is enabled, reorder the iv update statement,
+ then mem ref uses the iv value after update.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_LEGITIMATE_COMBINED_INSN (rtx_insn 
*@var{insn})
 Take an instruction in @var{insn} and return @code{false} if the instruction 
is not appropriate as a combination of two or more instructions.  The default 
is to accept all instructions.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index bf724dc093c..87d02089588 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7979,6 +7979,8 @@ to by @var{ce_info}.
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
 
+@hook TARGET_ADJUST_IV_UPDATE_POS
+
 @hook TARGET_LEGITIMATE_COMBINED_INSN
 
 @hook TARGET_CAN_FOLLOW_JUMP
diff --git a/gcc/target.def b/gcc/target.def
index d7b94bd8e5d..aead7cb79ff 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4398,6 +4398,13 @@ loops containing function calls or branch on table 
instructions.",
  const char *, (const rtx_insn *insn),
  default_invalid_within_doloop)
 
+/* Function to adjust iv update statment position.  */
+DEFHOOK
+(adjust_iv_update_pos,
+ "if adjust_iv_update_pos is enabled, reorder the iv update statement,\n\
+ then mem ref uses the iv value after update.",
+ bool, (void), default_adjust_iv_update_pos)
+
 /* Returns true for a legitimate combined insn.  */
 DEFHOOK
 (legitimate_comb

Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-25 Thread Richard Biener via Gcc-patches

On Thu, Jun 24, 2021 at 6:28 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Wed, Jun 23, 2021 at 12:22 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  
> >> > wrote:
> >> >>
> >> >> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
> >> >> > On 6/21/21 1:15 AM, Richard Biener wrote:
> >> > [...]
> >> >> > >
> >> >> > > But maybe I'm misunderstanding C++ too much :/
> >> >> > >
> >> >> > > Well, I guess b) from above means auto_vec<> passing to
> >> >> > > vec<> taking functions will need changes?
> >> >> >
> >> >> > Converting an auto_vec object to a vec slices off its data members.
> >> >> > The auto_vec specialization has no data members so that's not
> >> >> > a bug in and of itself, but auto_vec does have data members
> >> >> > so that would be a bug.  The risk is not just passing it to
> >> >> > functions by value but also returning it.  That risk was made
> >> >> > worse by the addition of the move ctor.
> >> >>
> >> >> I would agree that the conversion from auto_vec<> to vec<> is
> >> >> questionable, and should get some work at some point, perhaps just
> >> >> passingauto_vec references is good enough, or perhaps there is value in
> >> >> some const_vec view to avoid having to rely on optimizations, I'm not
> >> >> sure without looking more at the usage.
> >> >
> >> > We do need to be able to provide APIs that work with both auto_vec<>
> >> > and vec<>, I agree that those currently taking a vec<> by value are
> >> > fragile (and we've had bugs there before), but I'm not ready to say
> >> > that changing them all to [const] vec<>& is OK.  The alternative
> >> > would be passing a const_vec<> by value, passing that along to
> >> > const vec<>& APIs should be valid then (I can see quite some API
> >> > boundary cleanups being necessary here ...).
> >>
> >> FWIW, as far as const_vec<> goes, we already have array_slice, which is
> >> mostly a version of std::span.  (The only real reason for not using
> >> std::span itself is that it requires a later version of C++.)
> >>
> >> Taking those as arguments has the advantage that we can pass normal
> >> arrays as well.
> >
> > It's not really a "const" thing it seems though it certainly would not 
> > expose
> > any API that would reallocate the vector (which is the problematic part
> > of passing vec<> by value, not necessarily changing elements in-place).
> >
> > Since array_slice doesn't inherit most of the vec<> API transforming an
> > API from vec<> to array_slice<> will be also quite some work.  But I
> > agree it might be useful for generic API stuff.
>
> Which API functions would be the most useful ones to have?  We could
> add them if necessary.

I think we'll see when introducing uses.  I guess that vec<> to const vec<>&
changes will be mostly fine but for vec<> to vec<>& I might prefer
vec<> to array_slice<> since that makes clear the caller won't re-allocate.
We'll see what those APIs would require then.

> There again, for things like searching and sorting, I think it would
> be better to use  where possible.  It should usually be
> more efficient than the void* callback approach that the C code tended
> to use.

Not sure whether  really is better, we've specifically replaced
libc qsort calls with our own sort to avoid all sorts of host isues and
the vec<>::bsearch is inline and thus the callbacks can be inlined.

Richard.

> Thanks,
> Richard

Re: [PATCH 4/7] Allow match-and-simplified phiopt to run in early phiopt

2021-06-25 Thread Richard Biener via Gcc-patches

On Thu, Jun 24, 2021 at 6:24 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > To move a few things more to match-and-simplify from phiopt,
> > we need to allow match_simplify_replacement to run in early
> > phiopt. To do this we add a replacement for gimple_simplify
> > that is explictly for phiopt.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no
> > regressions.
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-phiopt.c (match_simplify_replacement):
> >   Add early_p argument. Call gimple_simplify_phiopt
> >   instead of gimple_simplify.
> >   (tree_ssa_phiopt_worker): Update call to
> >   match_simplify_replacement and allow unconditionally.
> >   (phiopt_early_allow): New function.
> >   (gimple_simplify_phiopt): New function.
> So the two questions on my end are why did we restrict when this could
> run before and why restrict the codes we're willing to optimize in the
> early phase?  Not an ACK or NAK at this point, just trying to understand
> a bit of the history.

I've done this because jump threading likes to see the CFG structure
and some of the testcases use if () return 0/1 which are prone to be
value-replaced by phiopt.  At the point I added the early phiopt I
didn't want to go and fixup all the testcases to avoid the phiopt transforms
nor did I want to investigate the "real" impact on code - the purpose
of early phiopt was exactly to get min/max/abs replacement done so
that was the way of least resistance ...

I'd rather not revisit this decision as part of the match-and-simplify
series but of course if anyone dares to do the detective work she'll be
welcome.

Richard.

>
> jeff
>

[PATCH] Fixup reduction info on addsub SLP pattern

2021-06-25 Thread Richard Biener

gcc.dg/vect/pr96854.c shows we need to copy over reduction info
to the SLP pattern as already done for the complex patterns.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-25  Richard Biener  

* tree-vect-slp-patterns.c (addsub_pattern::build): Copy
STMT_VINFO_REDUC_DEF from the original representative.
---
 gcc/tree-vect-slp-patterns.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index d536494a1bd..2671f91972d 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -1577,11 +1577,13 @@ addsub_pattern::build (vec_info *vinfo)
   (TREE_TYPE (gimple_assign_lhs (rep->stmt;
   gimple_call_set_nothrow (call, true);
   gimple_set_bb (call, gimple_bb (rep->stmt));
-  SLP_TREE_REPRESENTATIVE (node) = vinfo->add_pattern_stmt (call, rep);
-  STMT_VINFO_RELEVANT (SLP_TREE_REPRESENTATIVE (node)) = vect_used_in_scope;
-  STMT_SLP_TYPE (SLP_TREE_REPRESENTATIVE (node)) = pure_slp;
-  STMT_VINFO_VECTYPE (SLP_TREE_REPRESENTATIVE (node)) = SLP_TREE_VECTYPE 
(node);
-  STMT_VINFO_SLP_VECT_ONLY_PATTERN (SLP_TREE_REPRESENTATIVE (node)) = true;
+  stmt_vec_info new_rep = vinfo->add_pattern_stmt (call, rep);
+  SLP_TREE_REPRESENTATIVE (node) = new_rep;
+  STMT_VINFO_RELEVANT (new_rep) = vect_used_in_scope;
+  STMT_SLP_TYPE (new_rep) = pure_slp;
+  STMT_VINFO_VECTYPE (new_rep) = SLP_TREE_VECTYPE (node);
+  STMT_VINFO_SLP_VECT_ONLY_PATTERN (new_rep) = true;
+  STMT_VINFO_REDUC_DEF (new_rep) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (rep));
   SLP_TREE_CODE (node) = ERROR_MARK;
   SLP_TREE_LANE_PERMUTATION (node).release ();
 
-- 
2.26.2

Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-25 Thread Uros Bizjak via Gcc-patches

On Fri, Jun 25, 2021 at 8:48 AM Richard Biener  wrote:
>
> On Thu, 24 Jun 2021, Uros Bizjak wrote:
>
> > On Thu, Jun 24, 2021 at 1:07 PM Richard Biener  wrote:
> >
> > > This addds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
> > > instructions which compute { v0[0] - v1[0], v0[1], + v1[1], ... }
> > > thus subtract, add alternating on lanes, starting with subtract.
> > >
> > > It adds a corresponding optab and direct internal function,
> > > vec_addsub$a3 and renames the existing i386 backend patterns to
> > > the new canonical name.
> > >
> > > The SLP pattern matches the exact alternating lane sequence rather
> > > than trying to be clever and anticipating incoming permutes - we
> > > could permute the two input vectors to the needed lane alternation,
> > > do the addsub and then permute the result vector back but that's
> > > only profitable in case the two input or the output permute will
> > > vanish - something Tamars refactoring of SLP pattern recog should
> > > make possible.
> >
> > Using the attached patch, I was also able to generate addsub for the
> > following testcase:
> >
> > float x[2], y[2], z[2];
> >
> > void foo ()
> > {
> >   x[0] = y[0] - z[0];
> >   x[1] = y[1] + z[1];
> > }
> >
> >vmovq   y(%rip), %xmm0
> >vmovq   z(%rip), %xmm1
> >vaddsubps   %xmm1, %xmm0, %xmm0
> >vmovlps %xmm0, x(%rip)
> >ret
>
> Nice.  But I suppose it can be merged with the other now single
> pattern?

Actually, MMX_WITH_SSE V2SF modes do not support memory operands (they
operate on V2SF subreg of V4SF SSE registers), so a new mode attribute
would be needed to instantiate register operand in case of V2SF.

Uros.
>
> Richard.

Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-25 Thread Uros Bizjak via Gcc-patches

On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu  wrote:
>
> On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu  wrote:
> > >
> > > CPUID functions are used to detect CPU features.  If vector ISAs
> > > are enabled, compiler is free to use them in these functions.  Add
> > > __attribute__ ((target("general-regs-only"))) to CPUID functions
> > > to avoid vector instructions.
> >
> > These functions are intended to be inlined, so how does target
> > attribute affect inlining?
> I guess w/ -O0. they may not be inlined, that's why H.J adds those
> attributes to those functions.

The problem is not with these functions, but with surrounding checks
for cpuid features. These checks are implemented with logic
instructions, and nothing prevents RA from allocating mask registers,
and consequently mask insn is emitted. Regarding mentioned functions,
cpuid insn pattern has four GPR single-reg constraints, so mask
registers can't be allocated here.

> pr96814.dump:
> 0804aa40 :
>  804aa40: 8d 4c 24 04  lea0x4(%esp),%ecx
> ...
>  804aa63: 6a 07push   $0x7
>  804aa65: e8 e0 e7 ff ffcall   804924a <__get_cpuid_count>
>
> Also we need to add a target attribute to avx512f_os_support (), and
> that would be enough to fix the AVX512 part.
>
> Moreover, all check functions in below files may also need to deal with:
> adx-check.h
> aes-avx-check.h
> aes-check.h
> amx-check.h
> attr-nocf-check-1a.c
> attr-nocf-check-3a.c
> avx2-check.h
> avx2-vpop-check.h
> avx512bw-check.h
> avx512-check.h
> avx512dq-check.h
> avx512er-check.h
> avx512f-check.h
> avx512vl-check.h
> avx-check.h
> bmi2-check.h
> bmi-check.h
> cf_check-1.c
> cf_check-2.c
> cf_check-3.c
> cf_check-4.c
> cf_check-5.c
> f16c-check.h
> fma4-check.h
> fma-check.h
> isa-check.h
> lzcnt-check.h
> m128-check.h
> m256-check.h
> m512-check.h
> mmx-3dnow-check.h
> mmx-check.h
> pclmul-avx-check.h
> pclmul-check.h
> pr39315-check.c
> rtm-check.h
> sha-check.h
> spellcheck-options-1.c
> spellcheck-options-2.c
> spellcheck-options-3.c
> spellcheck-options-4.c
> spellcheck-options-5.c
> sse2-check.h
> sse3-check.h
> sse4_1-check.h
> sse4_2-check.h
> sse4a-check.h
> sse-check.h
> ssse3-check.h
> stack-check-11.c
> stack-check-12.c
> stack-check-17.c
> stack-check-18.c
> stack-check-19.c
> xop-check.h

True, but this would just paper over the real problem. Now, it is
expected that the user decorates the function that checks CPUID
features with the target attribute. I'm not sure if this is OK.

Uros.

Re: [[PATCH V9] 1/7] dwarf: add a dwarf2int.h internal interface

2021-06-25 Thread Richard Biener via Gcc-patches

On Thu, Jun 24, 2021 at 5:31 PM Jason Merrill via Gcc-patches
 wrote:
>
> On 6/24/21 11:13 AM, Jose E. Marchesi wrote:
> >
> > This patch introduces a dwarf2int.h header, to be used by code that
> > needs access to the internal DIE structures and their attributes.
> 
>  Why not put these bits in dwarf2out.h?
> >>> We think that it makes sense to have a separated interface file for
> >>> the
> >>> implementation of DWARF-based debug formats.  It is called `internal'
> >>> because it provides access to internal data structures as well as the
> >>> basic accessor functions to the internals of the DWARF DIEs.
> >>
> >> Yes, but "internal data structures" also describes most of the current
> >> dwarf2out.h.
> >
> > Yes right, dwarf2out.h contains a mixture of function prototypes and
> > several data structures, many of them "internal".
> >
> >> I'm not opposed to refactoring the header, but splitting
> >> off a dwarf2cfi.h (for print-rtl.c and final.c) seems like a better
> >> dividing line.
> >
> > Ok, so what about this: at this point we remove dwarf2int.h from our
> > patch, put the definitions in dwarf2out.h instead, and then once the
> > stuff is upstream we can discuss on how better refactor the dwarf2out*
> > stuff.
> >
> > Is that ok?
>
> Sounds good.

Fine with me as well.

Richard.

> Jason
>

Re: [PING][PATCH 9/13] v2 Use new per-location warning APIs in LTO

2021-06-25 Thread Richard Biener via Gcc-patches

On Thu, Jun 24, 2021 at 5:27 PM Martin Sebor  wrote:
>
> On 6/24/21 3:32 AM, Richard Biener wrote:
> > On Mon, Jun 21, 2021 at 11:55 PM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571980.html
> >>
> >> Looking for a review of the LTO changes to switch TREE_NO_WARNING to
> >> the suppress_warning() API.
> >
> > Hmm, since the warning suppressions are on location ad-hoc data the 
> > appropriate
> > thing is to adjust the location streaming and that part seems to be missing?
> >
> > So what you now stream is simply the "everything" fallback, correct?
> >
> > In particular:
> >
> > else
> > -bp_pack_value (bp, TREE_NO_WARNING (expr), 1);
> > +/* FIXME: pack all warning bits.  */
> > +bp_pack_value (bp, warning_suppressed_p (expr), 1);
> >
> > this looks like a wrong comment in that light.
>
> Yes, this is just a fallback.  I haven't thought about how to handle
> the FIXME yet but from your comment it sounds like this code might
> stay the same (or maybe even go back to streaming the flag directly)
> and the nowarn_spec_t bitmap should be streamed elsewhere?

Yes, since the bitmap is per location it should be streamed alongside
locations in lto_{input,output}_location.

> >
> > -  else
> > -compare_values (TREE_NO_WARNING);
> > +  else if (t1->base.nowarning_flag != t2->base.nowarning_flag)
> > +return false;
> >
> > uh.  Can you keep sth like TREE_NO_WARNING_RAW or so?
>
> The flag is used directly in fold-const.c and cp/module.cc so
> this would be in keeping with that, but I also don't mind adding
> a macro for it.  My only concern is with macro getting used to
> inadvertently bypass the API.

There's precedent with other _RAW accessors, it should be
clear that it bypasses accessors and thus review should easily
spot inadverted uses.

Richard.

> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> On 6/4/21 3:43 PM, Martin Sebor wrote:
> >>> The attached patch replaces the uses of TREE_NO_WARNING in the LTO
> >>> front end with the new suppress_warning() API.  It adds a couple of
> >>> FIXMEs that I plan to take care of in a follow up.
> >>
>

Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-25 Thread Richard Biener via Gcc-patches

On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod  wrote:
>
> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
> > On 6/24/21 8:29 AM, Richard Biener wrote:
> >
> >
> > THe original function in EVRP currently looks like:
> >
> >  === BB 2 
> >  :
> > if (a_5(D) == b_6(D))
> >   goto ; [INV]
> > else
> >   goto ; [INV]
> >
> > === BB 8 
> > Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides
> > a_5 and b_6 as equivalences
> >  :
> > goto ; [100.00%]
> >
> > === BB 6 
> >  :
> > # i_1 = PHI <0(8), i_10(5)>
> > if (i_1 < a_5(D))
> >   goto ; [INV]
> > else
> >   goto ; [INV]
> >
> > === BB 3 
> > Relational : (i_1 < a_5(D)) edge 6->3 provides
> > this relation
> >  :
> > if (i_1 == b_6(D))
> >   goto ; [INV]
> > else
> >   goto ; [INV]
> >
> >
> > So It knows that a_5 and b_6 are equivalence, and it knows that i_1 <
> > a_5 in BB3 as well..
> >
> > so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
> > currently aren't.   I think I had turned on equivalence mapping during
> > relational processing, so should be able to tag that without
> > transitive relations...  I'll have a look at why.
> >
> > And once we get a bit further along, you will be able to access this
> > without ranger.. if one wants to simply register the relations directly.
> >
> > Anyway, I'll get back to you why its currently being missed.
> >
> > Andrew
> >
> >
> >
> As promised.  There was a typo in the equivalency comparisons... so it
> was getting missed.  With the fix, the oracle identifies the relation
> and evrp will now fold that expression away and the IL becomes:
>
> :
>if (a_5(D) == b_6(D))
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>i_10 = i_1 + 1;
>
> :
># i_1 = PHI <0(2), i_10(3)>
>if (i_1 < a_5(D))
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>return;
>
> for the other cases you quote, there are no predictions such that if a
> != 0 then this equivalency exists...
>
> +  if (a != 0)
> +{
> +  c = b;
> +}
>
> but the oracle would register that in the TRUE block,  c and b are
> equivalent... so some other pass that was interested in tracking
> conditions that make a block relevant would be able to compare relations...

I guess to fully leverage optimizations for cases like

  if (a != 0)
c = b;
  ...
  if (a != 0)
{
if (c == b)
...
}

one would need to consider the "optimally jump threaded path" to the
program point where the to be optimized stmt resides, making all
originally conditional but on the jump threaded path unconditional
relations and equivalences available.

For VN that could be done by unwinding to the CFG merge after
the first if (a != 0), treating only one of the predecessor edges
as executable and registering the appropriate a != 0 result and
continue VN up to the desired point, committing to the result
until before the CFG merge after the second if (a != 0).  And then
unwinding again for the "else" path.  Sounds like a possible
explosion in complexity as well if second-order opportunities
arise.

That is, we'd do simplifications exposed by jump threading but
without actually doing the jump threading (which will of course
not allow all possible simplifications w/o inserting extra PHIs
for computations we might want to re-use).

Richard.

> Andrew
>
>

66 matches

Mail list logo