date:20200729

[PATCH, rs6000] Add non-relative jump table support for 64bit rs6000

2020-07-29 Thread HAO CHEN GUI via Gcc-patches


David,

Seems there is something wrong with my email box.  I lost your email. I 
reconfigured the box and it should be OK now.


Could you inform me how to exclude AIX from the condition testing? By 
the ABI? Thanks a lot.


Haochen Gui

Re: [PATCH] c++: decl_constant_value and unsharing [PR96197]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/21/20 3:07 PM, Patrick Palka wrote:

In the testcase from the PR we are seeing excessive memory use (> 5GB)
during constexpr evaluation, almost all of which is due to the call to
decl_constant_value in the VAR_DECL/CONST_DECL branch of
cxx_eval_constant_expression.  We reach here every time we evaluate an
ARRAY_REF of a constexpr VAR_DECL, which in this testcase is quite
often, and from there decl_constant_value makes an unshared copy of the
VAR_DECL's initializer, even though the unsharing is not needed at this
call site (because it is up to callers of cxx_eval_constant_expression
to unshare).

To fix this, this patch moves the responsibility of unsharing the result
of decl_constant_value, decl_really_constant_value and
scalar_constant_value from the callee to the caller.


How about creating another entry point that doesn't unshare, and using 
that in constexpr evaluation?



Fortunately there's only six calls to these functions, two of which are
from cxx_eval_constant_expression where the unsharing is undesirable.
And in unify there is one call, to scalar_constant_value, that looks
like:

case CONST_DECL:
  if (DECL_TEMPLATE_PARM_P (parm))
return ...;

if (arg != scalar_constant_value (parm))

return ...;

where we are suspiciously testing for pointer equality despite
scalar_constant_value's unsharing behavior.


Since scalar_constant_value of a CONST_DECL should be its DECL_INITIAL 
INTEGER_CST, that probably did work when we would see unlowered 
enumerators.  But apparently we don't anymore, so simplifying this makes 
sense.



This line seems to be dead
code however, so this patch replaces it with an appropriate gcc_assert.
Finally, this patch adds an explicit call to unshare_expr to the
remaining three callers.

Now that the the calls to decl_constant_value and
decl_really_constant_value from cxx_eval_constant_expression no longer
unshare their result, memory use during constexpr evaluation for the
testcase in the PR falls from 5GB to 15MB according to -ftime-report.

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
cmcstl2 and a number of other libraries.  Does this look OK to commit?

gcc/cp/ChangeLog:

PR c++/96197
* cp-gimplify.c (cp_fold_maybe_rvalue): Call unshare_expr on the
result of decl_constant_value.
* cvt.c: Include gimplify.h.
(ocp_convert): Call unshare_expr on the result of
scalar_constant_value.
* init.c (constant_value_1): Don't call unshare_expr here,
so that callers can choose whether to unshare.
* pt.c (tsubst_copy): Call unshare_expr on the result of
scalar_constant_value.
(unify) : Assert DECL_TEMPLATE_PARM_P and
simplify accordingly.
---
  gcc/cp/cp-gimplify.c | 2 +-
  gcc/cp/cvt.c | 3 ++-
  gcc/cp/init.c| 2 +-
  gcc/cp/pt.c  | 9 +++--
  4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 0e949e29c5c..5c5c44dbc5d 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2433,7 +2433,7 @@ cp_fold_maybe_rvalue (tree x, bool rval)
  tree v = decl_constant_value (x);
  if (v != x && v != error_mark_node)
{
- x = v;
+ x = unshare_expr (v);
  continue;
}
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index c9e7b1ff044..a7e40a1fa51 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "stringpool.h"
  #include "attribs.h"
  #include "escaped_string.h"
+#include "gimplify.h"
  
  static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);

  static tree build_type_conversion (tree, tree);
@@ -725,7 +726,7 @@ ocp_convert (tree type, tree expr, int convtype, int flags,
e = mark_rvalue_use (e);
tree v = scalar_constant_value (e);
if (!error_operand_p (v))
-   e = v;
+   e = unshare_expr (v);
  }
if (error_operand_p (e))
  return error_mark_node;
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index ef4b3c4dc3c..bf229bd2ba5 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2343,7 +2343,7 @@ constant_value_1 (tree decl, bool strict_p, bool 
return_aggregate_cst_ok_p)
  && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl)
  && DECL_NONTRIVIALLY_INITIALIZED_P (decl))
break;
-  decl = unshare_expr (init);
+  decl = init;
  }
return decl;
  }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 34876788a9c..4d3ee099cea 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16368,7 +16368,7 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
  return t;
/* If ARGS is NULL, then T is known to be non-dependent.  */
if (args == NULL_TREE)
- return scalar_constant_value (t);
+ return unshare_expr (scalar_constant_value (t));
  
  	/* Unfortunately, we cannot just call lookup_

[PATCH] [csky] Delete big endian CPUs' mutilib for linux gcc.

2020-07-29 Thread Cooper Qu via Gcc-patches

gcc/
* config/csky/t-csky-linux: Delete big endian CPUs' multilib.

---
 gcc/config/csky/t-csky-linux | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/gcc/config/csky/t-csky-linux b/gcc/config/csky/t-csky-linux
index 16656c3..df471ed 100644
--- a/gcc/config/csky/t-csky-linux
+++ b/gcc/config/csky/t-csky-linux
@@ -20,14 +20,8 @@
 # .
 
 
-# Endiannesses.
-MULTILIB_OPTIONS = mlittle-endian/mbig-endian
-MULTILIB_DIRNAMES= little big
-MULTILIB_MATCHES = mlittle-endian=EL
-MULTILIB_MATCHES = mbig-endian=EB
-
 MULTILIB_EXCEPTIONS  =
-CSKY_MULTILIB_OSDIRNAMES = mbig-endian=/big mlittle-endian=/. 
mhard-float=/hard-fp msoft-float=/. mcpu.ck810f=/. mcpu.ck807f=/ck807
+CSKY_MULTILIB_OSDIRNAMES = mhard-float=/hard-fp msoft-float=/. mcpu.ck810f=/. 
mcpu.ck807f=/ck807
 
 # Arch variants.
 MULTILIB_OPTIONS += mcpu=ck810f/mcpu=ck807f
-- 
2.7.4

RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-07-29 Thread xiezhiheng

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Friday, July 17, 2020 5:04 PM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
>

Cut...

> 
> Thanks, pushed to master.
> 
> Richard

And I have finished the second part.

In function aarch64_general_add_builtin, I add an argument ATTRS to
pass attributes for each built-in function.

And some new functions are added:
aarch64_call_properties: return flags for each built-in function based
on command-line options.  When the built-in function handles
floating-points, add FLAG_FP flag.

aarch64_modifies_global_state_p: True if the function would modify
global states.

aarch64_reads_global_state_p: True if the function would read
global states.

aarch64_could_trap_p: True if the function would raise a signal.

aarch64_add_attribute: Add attributes in ATTRS.

aarch64_get_attributes: return attributes for each built-in functons
based on flags and command-line options.

In function aarch64_init_simd_builtins, attributes are get by flags
and pass them to function aarch64_general_add_builtin.


Bootstrap is tested OK on aarch64 Linux platform, but regression
FAIL one test case  pr93423.f90.
However, I found that this test case would fail randomly in trunk.
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93423
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96041
Some PRs have tracked it.  After my patch, this test case would
always fail.  I guess the syntax errors in fortran crash some structures
result in illegal memory access but I can't find what exactly it is.
But I think my patch should have no influence on it.

Have some further suggestions?

Thanks,
Xiezhiheng



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 871b97c8543..8882ec1d59a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2020-07-30  Zhiheng Xie  
+
+   * config/aarch64/aarch64-builtins.c (aarch64_general_add_builtin):
+   Add new argument ATTRS.
+   (aarch64_call_properties): New function.
+   (aarch64_modifies_global_state_p): Likewise.
+   (aarch64_reads_global_state_p): Likewise.
+   (aarch64_could_trap_p): Likewise.
+   (aarch64_add_attribute): Likewise.
+   (aarch64_get_attributes): Likewise.
+   (aarch64_init_simd_builtins): Add attributes for each built-in function.
+



pr94442-v1.patch
Description: pr94442-v1.patch

Re: [PATCH] c++: Fixing the wording of () aggregate-init [PR92812]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/20/20 7:28 PM, Marek Polacek wrote:

P1975R0 tweaks the static_cast wording: it says that "An expression e can be
explicitly converted to a type T if [...] T is an aggregate type having a first
element x and there is an implicit conversion sequence from e to the type of
x."  This already works for classes, e.g.:

   struct Aggr { int x; int y; };
   Aggr a = static_cast(1);

albeit I noticed a -Wmissing-field-initializer warning which is unlikely to be
helpful in this context, as there's nothing like static_cast(1, 2)
to quash that warning.


You could write Aggr{1,2} instead?

This warning could be helpful if they didn't realize that what they were 
writing is initializing one element of an aggregate.



However, the proposal also mentions "If T is ``array of unknown bound of U'',
this direct-initialization defines the type of the expression as U[1]" which
suggest that this should work for arrays (they're aggregates too, after all).
Ville, can you confirm that these

   int (&&r)[3] = static_cast(42);
   int (&&r2)[1] = static_cast(42);

are supposed to work now?  There's no {} variant to check.  Thanks.


I'll review the discussion of this later.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/92812
* typeck.c (build_static_cast_1): Add warning_sentinel for
-Wmissing-field-initializer.

gcc/testsuite/ChangeLog:

PR c++/92812
* g++.dg/cpp2a/paren-init27.C: New test.
---
  gcc/cp/typeck.c   |  3 +++
  gcc/testsuite/g++.dg/cpp2a/paren-init27.C | 24 +++
  2 files changed, 27 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init27.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 589e014f855..062751a6379 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -7472,6 +7472,9 @@ build_static_cast_1 (location_t loc, tree type, tree 
expr, bool c_cast_p,
   static_cast of the form static_cast(e) if the declaration T
   t(e);" is well-formed, for some invented temporary variable
   t.  */
+
+  /* In C++20, we don't want to warn about static_cast(1).  */
+  warning_sentinel w (warn_missing_field_initializers);
result = perform_direct_initialization_if_possible (type, expr,
  c_cast_p, complain);
if (result)
diff --git a/gcc/testsuite/g++.dg/cpp2a/paren-init27.C 
b/gcc/testsuite/g++.dg/cpp2a/paren-init27.C
new file mode 100644
index 000..a856c7fd7be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/paren-init27.C
@@ -0,0 +1,24 @@
+// PR c++/92812
+// P1975R0
+// { dg-do run { target c++20 } }
+// { dg-options "-Wall -Wextra" }
+
+struct Aggr { int x; int y; };
+struct Base { int i; Base(int i_) : i{i_} { } };
+struct BaseAggr : Base { };
+struct X { };
+struct AggrSDM { static X x; int i; int j; };
+
+int
+main ()
+{
+  Aggr a = static_cast(42);
+  if (a.x != 42 || a.y != 0)
+__builtin_abort ();
+  BaseAggr b = static_cast(42);
+  if (b.i != 42)
+__builtin_abort ();
+  AggrSDM s = static_cast(42);
+  if (s.i != 42 || s.j != 0)
+__builtin_abort ();
+}

base-commit: 932fbc868ad429167a3d4d5625aa9d6dc0b4506b

Re: RFC: Monitoring old PRs, new dg directives

2020-07-29 Thread Martin Sebor via Gcc-patches


I've created a much more rudimentary setup for myself to deal
with the same problem.  I copy tests from Bugzilla, sometimes
with tweaks, and compile them from time to time as I revisit
unresolved bugs.  I've also thought about adding those to
the test suite and marking them XFAIL but I don't think I've
actually done it more than a handful of times.  I was told
adding tests (passing or xfailing) is fine without approval.

I think your proposal to add tests for known failures is a good
idea.  I don't have much of an opinion about extending the test
harness to differentiate other kinds of failures (like ICEs) and
mark them as expected.  I'm not sure I understand the benefit
of adding directives like dg-accepts-invalid over using xfail.

Martin

On 7/28/20 3:44 PM, Marek Polacek via Gcc-patches wrote:

In Bugzilla, for the c++ component, we currently have over 3200 open bugs.  In
my experience, a good amount of them have already been fixed; my periodical
sweeps always turn up a bunch of PRs that had already been fixed previously.
Sometimes my sweeps are more or less random, but more often than not I'm just
looking for duplicates of an existing PR.  Sometimes the reason the already
fixed PRs are still open is because a PR that was fixed had duplicates that we
didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as a
side-effect of fixing another PR.  Manual sweeps are tedious and time-consuming
because often you need to grab the test from the Bugzilla yet again (and
sometimes there are multiple tests).  Even if you find a PR that was fixed, you
still need to bisect the fix and perhaps add the test to our testsuite.  That's
draining and since the number of bugs only increases, never decreases, it is not
sustainable.

So I've started a personal repo where I've gathered dozens of tests and wrote a
script that just compiles every test in the repo and reports if anything
changed.  One line from it:

pr=59798; $cxx $o -c $pr.C 2>&1 | grep -qE 'internal compiler error' || echo -e 
"$pr: ${msg_ice}"

This has major drawbacks: you have to remember to run this manually, keep
updating it, and it's yet another repo that people interested in this would
have to clone, but the worst thing is that typically you would only discover
that a patch fixed a different PR long after the patch was committed.  And
quite likely it wasn't even your patch.  We know that finding problems earlier
in the developer workflow reduces costs; if we can catch this before the
original developer commits & pushes the changes, it's cheaper, because the
developer already understands what the patch does.

A case in point: https://gcc.gnu.org/PR58156 which has been fixed recently
by an unrelated (?) patch.  Knowing that the tsubst_pack_expansion hunk in
the patch had this effect would probably have been very useful.  More testing
will lead to a better compiler.

Another case: https://gcc.gnu.org/35098 which was fixed 12 years (!) after
it was reported by a different change.

Or another: https://gcc.gnu.org/91525 where the patch contained a test, but
that was ice-on-invalid, whereas the test in PR91525 was ice-on-valid.

To alleviate some of these problems, I propose that we introduce a means to our
DejaGNU infrastructure that allows adding tests for old bugs that have not been
fixed yet, and re-introduce the keyword monitored (no longer used for anything
-- I think Volker stepped away) to the GCC Bugzilla to signal that a PR is
tracked in the testsuite.  I don't want any unnecessary moving tests around, so
the tests would go where they would normally go; they have to be reduced and
have proper targets, etc.  Having such tests in the testsuite means that when
something changes, you will know immediately, before you push any changes.

My thinking is that for:

* rejects-valid: use the existing dg-xfail-if
* accepts-valid: use the new dg-accepts-invalid
* ICEs: use the new dg-ice

dg-ice can be used like this:

// { dg-ice "build_over_call" { target c++11 } }

and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
you'll get an XPASS + FAIL.  Then you can close the old PR.

Similarly, dg-accepts-invalid:

// { dg-accepts-invalid "PR86500" }

means that if the test still compiles without errors, you'll get a quiet XFAIL.
If we start giving errors, you'll get an XPASS.

If the bug is fixed, simply remove the directive.

The patch implementing these new directives is appended.  Once/if this is
accepted, I can start adding the old tests we have in our Bugzilla.  (I'm
only concerned about the c++ component, if that wasn't already clear.)

The question is what makes the bug "old": is it one year without it being
assigned?  6 months?  3 months?  Note: I *don't* propose to add every test for
every new PR, just the reasonably old ones that are useful/important.  Such
additions should be done in batches, so that we don't have dozens of commits,
each of

Re: [PATCH] c++: abbreviated function template friend matching [PR96106]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/20/20 4:18 PM, Patrick Palka wrote:

In the below testcase, duplicate_decls wasn't merging the tsubsted
friend declaration for 'void add(auto)' with its definition, because
reduce_template_parm_level (during tsubst_friend_function) lost the
DECL_VIRTUAL_P flag on the invented 'auto' template parameter, which
made template_heads_equivalent_p deem the two template heads as not
equivalent in C++20 mode.

This patch makes reduce_template_parm_level carry over the
DECL_VIRTUAL_P from the original TEMPLATE_PARM_DECL.

Passes 'make check-c++' and the cmcstl2 testsuite.  Does this look OK
for trunk after a full bootstrap and regtest?


OK.


gcc/cp/ChangeLog:

PR c++/96106
* pt.c (reduce_template_parm_level): Carry over DECL_VIRTUAL_P
from the original TEMPLATE_PARM_DECL.

gcc/testsuite/ChangeLog:

PR c++/96106
* g++.dg/concepts/abbrev7.C: New test.
---
  gcc/cp/pt.c |  1 +
  gcc/testsuite/g++.dg/concepts/abbrev7.C | 14 ++
  2 files changed, 15 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/concepts/abbrev7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index defc2a9abd8..3f7b89141b6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -4453,6 +4453,7 @@ reduce_template_parm_level (tree index, tree type, int 
levels, tree args,
  type);
TREE_CONSTANT (decl) = TREE_CONSTANT (orig_decl);
TREE_READONLY (decl) = TREE_READONLY (orig_decl);
+  DECL_VIRTUAL_P (decl) = DECL_VIRTUAL_P (orig_decl);
DECL_ARTIFICIAL (decl) = 1;
SET_DECL_TEMPLATE_PARM_P (decl);
  
diff --git a/gcc/testsuite/g++.dg/concepts/abbrev7.C b/gcc/testsuite/g++.dg/concepts/abbrev7.C

new file mode 100644
index 000..443c1b7871b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/abbrev7.C
@@ -0,0 +1,14 @@
+// PR c++/96106
+// { dg-do compile { target concepts } }
+
+template
+struct number {
+  friend void add(auto);
+};
+
+void add(auto) { }
+
+void foo() {
+  number n;
+  add(n); // { dg-bogus "ambiguous" }
+}

Re: [PATCH] c++: Variable template and template parameter pack [PR96218]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/16/20 11:06 AM, Marek Polacek wrote:

This is DR 2032 which says that the restrictions regarding template
parameter packs and default arguments apply to variable templates as
well, but we weren't detecting that.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

DR 2032
PR c++/96218
* pt.c (check_default_tmpl_args): Also consider variable
templates.

gcc/testsuite/ChangeLog:

DR 2032
PR c++/96218
* g++.dg/cpp1y/var-templ67.C: New test.
---
  gcc/cp/pt.c  |  5 +++--
  gcc/testsuite/g++.dg/cpp1y/var-templ67.C | 16 
  2 files changed, 19 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ67.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4e1c77a6bd7..b74074a092b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5481,14 +5481,15 @@ check_default_tmpl_args (tree decl, tree parms, bool 
is_primary,
   /* Don't complain about an enclosing partial
  specialization.  */
   && parm_level == parms
-  && TREE_CODE (decl) == TYPE_DECL
+  && (TREE_CODE (decl) == TYPE_DECL || VAR_P (decl))


Can we remove this part of the test entirely, since the enclosing if 
already excludes functions?



   && i < ntparms - 1
   && template_parameter_pack_p (TREE_VALUE (parm))
   /* A fixed parameter pack will be partially
  instantiated into a fixed length list.  */
   && !fixed_parameter_pack_p (TREE_VALUE (parm)))
{
- /* A primary class template can only have one
+ /* A primary class template, primary variable template
+(DR 2032), or alias template can only have one
 parameter pack, at the end of the template
 parameter list.  */
  
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ67.C b/gcc/testsuite/g++.dg/cpp1y/var-templ67.C

new file mode 100644
index 000..f36af39bc19
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ67.C
@@ -0,0 +1,16 @@
+// DR 2032 - Default template-arguments of variable templates
+// PR c++/96218
+// { dg-do compile { target c++14 } }
+
+// [temp.param]/14: If a template-parameter of a class template, variable
+// template, or alias template has a default template-argument, each subsequent
+// template-parameter shall either have a default template-argument supplied or
+// be a template parameter pack.
+template
+T vt; // { dg-error "no default argument" }
+
+// [temp.param]/14: If a template-parameter of a primary class template,
+// primary variable template, or alias template is a template parameter pack,
+// it shall be the last template-parameter.
+template // { dg-error "must be at the end" }
+int vt2;

base-commit: 866c5bfd9c3ebc00913f3a84eb5383b51f2aee16

Re: [PATCH] nvptx: Support floating point reciprocal instructions.

2020-07-29 Thread Tom de Vries

On 7/12/20 9:38 PM, Roger Sayle wrote:
> 
> The following patch addds support for PTX's rcp.rn.f32 and rcp.rn.f64
> instructions.  Note that the "rcp.rn" forms of this instruction
> calculate the fully IEEE compliant result for the reciprocal, unlike
> the rcp.approx variants that just provide fast approximations.
> 
> I'm undecided as to whether to prefix this instruction name with
> "*" as this pattern is almost standard, so I can imagine the middle-end
> potentially adding optab support for recip2 expansion at some
> point.  I'll go with the (backend) reviewer's recommendation?
> 
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
> with "make" and "make check" with no new regressions.
> Ok for mainline?
> 
> 2020-07-12  Roger Sayle  
> 
> gcc/ChangeLog
>   * config/nvptx/nvptx.md (recip2): New instruction.
> 
> gcc/testsuite/ChangeLog
>   * gcc.target/nvptx/recip-1.c: New test.
> 

Pushed with some minor nits fixed (not reposting).

Thanks,
- Tom

Re: [PATCH] c++: alias_ctad_tweaks and constrained dguide [PR95486]

2020-07-29 Thread Patrick Palka via Gcc-patches

On Wed, 29 Jul 2020, Jason Merrill wrote:

> On 7/24/20 9:49 AM, Patrick Palka wrote:
> > In the below testcase, we're ICEing from alias_ctad_tweaks ultimately
> > because the implied deduction guide for X's user-defined constructor
> > already has constraints associated with it.  We then carry over these
> > constraints to 'fprime', the overlying deduction guide for the alias
> > template Y, via tsubst_decl from alias_ctad_tweaks.  Later in
> > alias_ctad_tweaks we call get_constraints followed by set_constraints on
> > this overlying deduction guide without doing remove_constraints in
> > between, which triggers the !found assert in set_constraints.
> > 
> > This patch fixes this issue by adding an intervening call to
> > remove_constraints.
> 
> This hunk is OK.
> 
> > And for consistency, it changes the call to
> > get_constraints to go through fprime instead of f.
> 
> This hunk is not, as what we're doing is substituting the constraints of f to
> produce the constraints of f'.  This change makes it less clear.
> 
> OK without that change.

That makes sense, consider that hunk gone then.

> 
> > Since the DECL_TEMPLATE_RESULT of fprime is f, this consistency change makes
> > no functional difference.
> 
> The DECL_TEMPLATE_RESULT of fprime is g, not f.

Ah yes, sorry for this thinko.  I'll commit just the one hunk after a
round of testing; thanks for the review.

> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > commit?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/95486
> > * pt.c (alias_ctad_tweaks): Obtain the associated constraints
> > through fprime instead of through f.  Call remove_constraints
> > before calling set_constraints.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/95486
> > * g++.dg/cpp2a/class-deduction-alias3.C: New test.
> > ---
> >   gcc/cp/pt.c |  7 +--
> >   gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C | 11 +++
> >   2 files changed, 16 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index b6df87a8a2e..4d955c555dc 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -28598,7 +28598,7 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
> >   DECL_PRIMARY_TEMPLATE (fprime) = fprime;
> >   /* Substitute the associated constraints.  */
> > - tree ci = get_constraints (f);
> > + tree ci = get_constraints (fprime);
> >   if (ci)
> > ci = tsubst_constraint_info (ci, targs, complain, in_decl);
> >   if (ci == error_mark_node)
> > @@ -28616,7 +28616,10 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
> > }
> >   if (ci)
> > -   set_constraints (fprime, ci);
> > +   {
> > + remove_constraints (fprime);
> > + set_constraints (fprime, ci);
> > +   }
> > }
> > else
> > {
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C
> > b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C
> > new file mode 100644
> > index 000..318d4c942ce
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C
> > @@ -0,0 +1,11 @@
> > +// PR c++/95486
> > +// { dg-do compile { target c++20 } }
> > +
> > +template
> > +struct X { X(U) requires __is_same(U, int) {} };
> > +
> > +template
> > +using Y = X;
> > +
> > +Y y{1};
> > +Y z{'a'}; // { dg-error "failed|no match" }
> > 
> 
>

[RFC, WIP] introduce attribute exalias

2020-07-29 Thread Alexandre Oliva



This patch introduces an attribute to add extra aliases to a symbol
when its definition is output.  The main goal is to ease interfacing
C++ with Ada, as C++ mangled names have to be named, and in some cases
(e.g. when using stdint.h typedefs in function arguments) the symbol
names may vary across platforms.

The attribute is usable in C and C++, presumably in all C-family
languages.  It can be attached to global variables and functions.  In
C++, it can also be attached to namespace-scoped variables and
functions, static data members, and member functions including
constructors and destructors, including explicit specializations of
template functions.  When applied to cdtors, additional exaliases with
_Base and _Del suffixes are defined when appropriate.

Applying the attribute to types is only valid in C++, and the effect
is to attach the alias to the RTTI object associated with the class
type, including in an explicit template instantiation.

This is still WIP, not fully tested, posted for feedback on the
feature and on the implementation.  I'm particularly unhappy with:

- the complexity of the interface/linkage updates in C++; I wonder why
  the back-propagation through aliases isn't being picked up, and
  whether it would be enough.

- C++ didn't merge attributes of extern local declarations with those
  of the namespace-scoped declaration.  I've added code to merge the
  attributes if there is a namespace-scoped declaration, but if there
  isn't one, there won't be any merging, and the effects are
  noticeable, as in the added attr-weak-1.C.  I'm also slightly
  concerned that an earlier local decl would go out of sync if a
  subsequent local decl, say within the same or even in another
  function, introduces additional attributes in the global decl.

- I'd rather rule out the attribute in template classes and functions,
  and enable it in explicit instantiations or specializations only,
  but there doesn't seem to be any way to test for generics proper (as
  opposed to specializations of generics) within c-attribs.c.

- I noticed an inconsistency between the handling of attribute exalias
  in template classes vs template functions.  Those attached to a
  template class end up affecting all instantiations, whereas those
  attached to template functions seem to be just dropped on the
  floor.  I haven't investigated this difference yet.

- Adding exalias to explicit instantiations of members of a template
  class, and also to the template class proper, pose some
  difficulties, as implicit instantiations sometimes conflict with the
  intent of explicit instantiations.  See attr-exalias-3.C for
  details.  I'm considering enabling attribute exalias in typedefs and
  using declarations, even after template instantiation, to try to
  alleviate this.

- Though exalias associated with a type, particularly a class type,
  makes sense for interfacing with Ada, as the RTTI object needs to be
  named to import a C++ exception type.  I've also considered an
  alternate attribute name, applied to a class and/or to its dtor.  I
  even implemented attribute exrtti for C++ class types, as an
  incremental patch, if that's preferred.

Early feedback is welcome.  Thanks in advance,

---
 .../doc/gnat_rm/interfacing_to_other_languages.rst |6 ++
 .../doc/gnat_ugn/the_gnat_compilation_model.rst|   20 --
 gcc/attribs.c  |   56 +
 gcc/attribs.h  |7 ++
 gcc/c-family/c-ada-spec.c  |7 ++
 gcc/c-family/c-attribs.c   |   42 -
 gcc/c/c-decl.c |2 +
 gcc/cgraph.h   |4 +
 gcc/cgraphunit.c   |2 -
 gcc/cp/class.c |   54 
 gcc/cp/cp-tree.h   |1 
 gcc/cp/decl.c  |4 +
 gcc/cp/decl2.c |   43 +
 gcc/cp/name-lookup.c   |8 ++
 gcc/cp/optimize.c  |6 ++
 gcc/cp/rtti.c  |   13 
 gcc/doc/extend.texi|   52 
 gcc/symtab.c   |   26 
 gcc/testsuite/c-c++-common/attr-weak-1.c   |   19 ++
 .../c-c++-common/torture/attr-exalias-1.c  |   39 
 .../c-c++-common/torture/attr-exalias-2.c  |   13 
 .../c-c++-common/torture/attr-exalias-3.c  |   41 
 .../c-c++-common/torture/attr-exalias-4.c  |   28 
 gcc/testsuite/g++.dg/torture/attr-exalias-1.C  |   66 
 gcc/testsuite/g++.dg/torture/attr-exalias-2.C  |   26 
 gcc/testsuite/g++.dg/torture/attr-exalias-3.C  |   61 ++
 gcc/te

Re: RFC: Monitoring old PRs, new dg directives

2020-07-29 Thread Jason Merrill via Gcc-patches

On Tue, Jul 28, 2020 at 5:45 PM Marek Polacek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> In Bugzilla, for the c++ component, we currently have over 3200 open
> bugs.  In
> my experience, a good amount of them have already been fixed; my periodical
> sweeps always turn up a bunch of PRs that had already been fixed
> previously.
> Sometimes my sweeps are more or less random, but more often than not I'm
> just
> looking for duplicates of an existing PR.  Sometimes the reason the already
> fixed PRs are still open is because a PR that was fixed had duplicates
> that we
> didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as
> a
> side-effect of fixing another PR.  Manual sweeps are tedious and
> time-consuming
> because often you need to grab the test from the Bugzilla yet again (and
> sometimes there are multiple tests).  Even if you find a PR that was
> fixed, you
> still need to bisect the fix and perhaps add the test to our testsuite.
> That's
> draining and since the number of bugs only increases, never decreases, it
> is not
> sustainable.
>
> So I've started a personal repo where I've gathered dozens of tests and
> wrote a
> script that just compiles every test in the repo and reports if anything
> changed.  One line from it:
>
> pr=59798; $cxx $o -c $pr.C 2>&1 | grep -qE 'internal compiler error' ||
> echo -e "$pr: ${msg_ice}"
>
> This has major drawbacks: you have to remember to run this manually, keep
> updating it, and it's yet another repo that people interested in this would
> have to clone, but the worst thing is that typically you would only
> discover
> that a patch fixed a different PR long after the patch was committed.  And
> quite likely it wasn't even your patch.  We know that finding problems
> earlier
> in the developer workflow reduces costs; if we can catch this before the
> original developer commits & pushes the changes, it's cheaper, because the
> developer already understands what the patch does.
>
> A case in point: https://gcc.gnu.org/PR58156 which has been fixed recently
> by an unrelated (?) patch.  Knowing that the tsubst_pack_expansion hunk in
> the patch had this effect would probably have been very useful.  More
> testing
> will lead to a better compiler.
>
> Another case: https://gcc.gnu.org/35098 which was fixed 12 years (!) after
> it was reported by a different change.
>
> Or another: https://gcc.gnu.org/91525 where the patch contained a test,
> but
> that was ice-on-invalid, whereas the test in PR91525 was ice-on-valid.
>
> To alleviate some of these problems, I propose that we introduce a means
> to our
> DejaGNU infrastructure that allows adding tests for old bugs that have not
> been
> fixed yet, and re-introduce the keyword monitored (no longer used for
> anything
> -- I think Volker stepped away) to the GCC Bugzilla to signal that a PR is
> tracked in the testsuite.  I don't want any unnecessary moving tests
> around, so
> the tests would go where they would normally go; they have to be reduced
> and
> have proper targets, etc.  Having such tests in the testsuite means that
> when
> something changes, you will know immediately, before you push any changes.
>
> My thinking is that for:
>
> * rejects-valid: use the existing dg-xfail-if
>

Or dg-excess-errors, or xfailed dg-bogus.


> * accepts-valid: use the new dg-accepts-invalid
>

xfailed dg-error should cover this case.


> * ICEs: use the new dg-ice.
>

This seems like a good addition.


> dg-ice can be used like this:
>
> // { dg-ice "build_over_call" { target c++11 } }
>
> and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
> ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
> you'll get an XPASS + FAIL.  Then you can close the old PR.
>
> Similarly, dg-accepts-invalid:
>
> // { dg-accepts-invalid "PR86500" }
>
> means that if the test still compiles without errors, you'll get a quiet
> XFAIL.
> If we start giving errors, you'll get an XPASS.
>
> If the bug is fixed, simply remove the directive.
>
> The patch implementing these new directives is appended.  Once/if this is
> accepted, I can start adding the old tests we have in our Bugzilla.  (I'm
> only concerned about the c++ component, if that wasn't already clear.)
>
> The question is what makes the bug "old": is it one year without it being
> assigned?  6 months?  3 months?  Note: I *don't* propose to add every test
> for
> every new PR, just the reasonably old ones that are useful/important.  Such
> additions should be done in batches, so that we don't have dozens of
> commits,
> each of them merely adding a single test.
>
> We will still have a surfeit of bugs that we've given short shrift to, but
> let's at least automate what we can.  The initial addition of the relevant
> old(-ish) tests won't of course happen automagically, but it's a price I'm
> willing to pay.  My goal here isn't merely to reduce the number of open
> PRs;
> it is to improve the testing of the compiler

Re: [PATCH] avoid -Wnonnull on synthesized condition in static_cast (PR 96003)

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/23/20 3:08 PM, Martin Sebor wrote:

Another test case added to the bug since I posted the patch shows
that the no-warning bit set by the C++ front end is easily lost
during early folding, triggering the -Wnonull again despite
the suppression.  The attached revision tweaks the folder in just
this one case to let the suppression take effect in this situation.

In light of how pervasive this potential for stripping looks (none
of the other folders preserves the bit) the tweak feels like a band-
aid rather than a general solution.  But implementing something
better (and mainly developing tests to verify that it works in
general) would probably be quite the project.  So I leave it for
some other time.


How about in check_function_arguments_recurse COND_EXPR handling, 
checking for TREE_NO_WARNING on the condition?  Then we wouldn't need to 
deal with setting and copying it on the COND_EXPR itself.




On 7/17/20 1:00 PM, Martin Sebor wrote:

The recent enhancement to treat the implicit this pointer argument
as nonnull in member functions triggers spurious -Wnonnull for
the synthesized conditional expression the C++ front end replaces
the pointer with in some static_cast expressions.  The front end
already sets the no-warning bit for the test but not for the whole
conditional expression, so the attached fix extends the same solution
to it.

The consequence of this fix is that user-written code like this:

   static_cast(p ? p : 0)->f ();
or
   static_cast(p ? p : nullptr)->f ();

don't trigger the warning because they are both transformed into
the same expression as:

   static_cast(p)->f ();

What still does trigger it is this:

   static_cast(p ? p : (T*)0)->f ();

because here it's the inner COND_EXPR's no-warning bit that's set
(the outer one is clear), whereas in the former expressions it's
the other way around.  It would be nice if this worked consistently
but I didn't see an easy way to do that and more than a quick fix
seems outside the scope for this bug.

Another case reported by someone else in the same bug involves
a dynamic_cast.  A simplified test case goes something like this:

   if (dynamic_cast(p))
 dynamic_cast(p)->f ();

The root cause is the same: the front end emitting the COND_EXPR

   ((p != 0) ? ((T*)__dynamic_cast(p, (& _ZTI1B), (& _ZTI1C), 0)) : 0)

I decided not to suppress the warning in this case because doing
so would also suppress it in unconditional calls with the result
of the cast:

   dynamic_cast(p)->f ();

and that doesn't seem helpful.  Instead, I'd suggest to make
the second cast in the if statement to reference to T&:

   if (dynamic_cast(p))
 dynamic_cast(*p).f ();

Martin

Re: [PATCH] c++: alias_ctad_tweaks and constrained dguide [PR95486]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/24/20 9:49 AM, Patrick Palka wrote:

In the below testcase, we're ICEing from alias_ctad_tweaks ultimately
because the implied deduction guide for X's user-defined constructor
already has constraints associated with it.  We then carry over these
constraints to 'fprime', the overlying deduction guide for the alias
template Y, via tsubst_decl from alias_ctad_tweaks.  Later in
alias_ctad_tweaks we call get_constraints followed by set_constraints on
this overlying deduction guide without doing remove_constraints in
between, which triggers the !found assert in set_constraints.

This patch fixes this issue by adding an intervening call to
remove_constraints.


This hunk is OK.


And for consistency, it changes the call to
get_constraints to go through fprime instead of f.


This hunk is not, as what we're doing is substituting the constraints of 
f to produce the constraints of f'.  This change makes it less clear.


OK without that change.


Since the DECL_TEMPLATE_RESULT of fprime is f, this consistency change makes
no functional difference.


The DECL_TEMPLATE_RESULT of fprime is g, not f.


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
commit?

gcc/cp/ChangeLog:

PR c++/95486
* pt.c (alias_ctad_tweaks): Obtain the associated constraints
through fprime instead of through f.  Call remove_constraints
before calling set_constraints.

gcc/testsuite/ChangeLog:

PR c++/95486
* g++.dg/cpp2a/class-deduction-alias3.C: New test.
---
  gcc/cp/pt.c |  7 +--
  gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C | 11 +++
  2 files changed, 16 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b6df87a8a2e..4d955c555dc 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28598,7 +28598,7 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  DECL_PRIMARY_TEMPLATE (fprime) = fprime;
  
  	  /* Substitute the associated constraints.  */

- tree ci = get_constraints (f);
+ tree ci = get_constraints (fprime);
  if (ci)
ci = tsubst_constraint_info (ci, targs, complain, in_decl);
  if (ci == error_mark_node)
@@ -28616,7 +28616,10 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
}
  
  	  if (ci)

-   set_constraints (fprime, ci);
+   {
+ remove_constraints (fprime);
+ set_constraints (fprime, ci);
+   }
}
else
{
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C
new file mode 100644
index 000..318d4c942ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias3.C
@@ -0,0 +1,11 @@
+// PR c++/95486
+// { dg-do compile { target c++20 } }
+
+template
+struct X { X(U) requires __is_same(U, int) {} };
+
+template
+using Y = X;
+
+Y y{1};
+Y z{'a'}; // { dg-error "failed|no match" }

Re: [PATCH] c++: overload sets and placeholder return type [PR64194]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/29/20 3:23 PM, Patrick Palka wrote:

On Wed, 29 Jul 2020, Patrick Palka wrote:


In the testcase below, template argument deduction for the call
g(id) goes wrong because the functions in the overload set id
each have a yet-undeduced auto return type, and this undeduced return
type makes try_one_overload fail to match up any of these functions with
g's parameter type, leading to g's template parameter going undeduced
and to the overload set going unresolved.

This patch fixes this issue by performing return type deduction via
instantiation before doing try_one_overload, in a manner similar to what
resolve_address_of_overloaded_function does.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?

gcc/cp/ChangeLog:

PR c++/64194
* pt.c (resolve_overloaded_unification): If the function
template specialization has a placeholder return type,
then instantiate it before attempting unification.

gcc/testsuite/ChangeLog:

PR c++/64194
* g++.dg/cpp1y/auto-fn60.C: New test.
---
  gcc/cp/pt.c| 11 ++-
  gcc/testsuite/g++.dg/cpp1y/auto-fn60.C | 11 +++
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn60.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4d955c555dc..abb74520fc9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22118,7 +22118,16 @@ resolve_overloaded_unification (tree tparms,
  if (subargs != error_mark_node
  && !any_dependent_template_arguments_p (subargs))
{
- elem = TREE_TYPE (instantiate_template (fn, subargs, tf_none));
+ fn = instantiate_template (fn, subargs, tf_none);
+ if (undeduced_auto_decl (fn))
+   {
+ /* Instantiate the function to deduce its return type.  */
+ ++function_depth;
+ instantiate_decl (fn, /*defer*/false, /*class*/false);
+ --function_depth;
+   }
+
+ elem = TREE_TYPE (fn);
  if (try_one_overload (tparms, targs, tempargs, parm,
elem, strict, sub_strict, addr_p, explain_p)
  && (!goodfn || !same_type_p (goodfn, elem)))
diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C 
b/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
new file mode 100644
index 000..237868c076b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
@@ -0,0 +1,11 @@
+// PR c++/64194
+// { dg-do compile { target c++14 } }
+
+template  void g(void (*)(T)) { }
+
+template  auto id(int) { }
+template  auto id(char) { return 0; }
+
+int main() {
+  g(id);


Oops, this is supposed to use id instead of id to match up
with the commit message (or vice versa).


OK.

Jason

Re: [PATCH] Require CET support only for the final GCC build

2020-07-29 Thread Joseph Myers

On Wed, 29 Jul 2020, Richard Biener wrote:

> Note that any workable solution is fine with me, I just
> don't feel comfortable approving the solution involving
> ../curr_stage and friends.  Joseph, would HJs latest
> patch be OK technically?

Yes, I think that's OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] c++: Use error_at rather than warning_at for missing return in constexpr functions [PR96182]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/14/20 4:50 AM, Jakub Jelinek wrote:

Hi!

For C++11 we already emit an error if a constexpr function doesn't contain
a return statement, because in C++11 that is the only thing it needs to
contain, but for C++14 we would normally issue a -Wreturn-type warning.

As mentioned by Jonathan, such constexpr functions are invalid, no
diagnostics required, because there doesn't exist any arguments for
which it would result in valid constant expression.

This raises it to an error in such cases.  The !LAMBDA_TYPE_P case
is to avoid error on g++.dg/pr81194.C where the user didn't write
constexpr anywhere and the operator() is compiler generated.


We set DECL_DECLARED_CONSTEXPR_P on lambdas earlier in finish_function 
if suitable; can we move this diagnostic up before that?



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-07-14  Jakub Jelinek  

PR c++/96182
* decl.c (finish_function): In constexpr functions other than
lambdas use for C++14 and later error instead of warning if no
return statement is present and diagnose it regardless of
warn_return_type.

* g++.dg/cpp1y/constexpr-96182.C: New test.
* g++.dg/other/error35.C (S::g()): Add return statement.
* g++.dg/cpp1y/pr63996.C (foo): Likewise.
* g++.dg/cpp1y/constexpr-return2.C (f): Likewise.
* g++.dg/cpp1y/var-templ44.C (make_array): Add throw 1.

--- gcc/cp/decl.c.jj2020-07-13 19:09:27.258953908 +0200
+++ gcc/cp/decl.c   2020-07-13 22:25:42.437062842 +0200
@@ -17164,7 +17164,10 @@ finish_function (bool inline_p)
BLOCK_SUPERCONTEXT (DECL_INITIAL (fndecl)) = fndecl;
  
/* Complain if there's just no return statement.  */

-  if (warn_return_type
+  if ((warn_return_type
+   || (cxx_dialect >= cxx14
+  && DECL_DECLARED_CONSTEXPR_P (fndecl)
+  && !LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl
&& !VOID_TYPE_P (TREE_TYPE (fntype))
&& !dependent_type_p (TREE_TYPE (fntype))
&& !current_function_returns_value && !current_function_returns_null
@@ -17196,8 +17199,14 @@ finish_function (bool inline_p)
global_dc->option_state))
add_return_star_this_fixit (&richloc, fndecl);
}
-  if (warning_at (&richloc, OPT_Wreturn_type,
- "no return statement in function returning non-void"))
+  if (cxx_dialect >= cxx14
+ && DECL_DECLARED_CONSTEXPR_P (fndecl)
+ && !LAMBDA_TYPE_P (CP_DECL_CONTEXT (fndecl)))
+   error_at (&richloc, "no return statement in % function "
+   "returning non-void");
+  else if (warning_at (&richloc, OPT_Wreturn_type,
+  "no return statement in function returning "
+  "non-void"))
TREE_NO_WARNING (fndecl) = 1;
  }
  
--- gcc/testsuite/g++.dg/other/error35.C.jj	2020-01-12 11:54:37.214401324 +0100

+++ gcc/testsuite/g++.dg/other/error35.C2020-07-13 22:35:55.359228614 
+0200
@@ -9,6 +9,6 @@ template  struct S {
  enum S::E;
  template  enum S::E : int { b };
  template 
-constexpr int S::g() const { b; } // { dg-error "not declared" }
+constexpr int S::g() const { b; if (false) return 0; } // { dg-error "not 
declared" }
  static_assert(S().g() == 1, ""); // { dg-error "" }
  // { dg-message "in .constexpr. expansion of" "" { target *-*-* } .-1 }
--- gcc/testsuite/g++.dg/cpp1y/pr63996.C.jj 2020-01-12 11:54:37.0 
+0100
+++ gcc/testsuite/g++.dg/cpp1y/pr63996.C2020-07-13 22:17:39.034004329 
+0200
@@ -5,6 +5,7 @@ constexpr int
  foo (int i)
  {
int a[i] = { }; // { dg-error "7:ISO C\\+\\+ forbids variable length array 
.a" }
+  if (i == 23) return 0;
  }
  
  constexpr int j = foo (1); // { dg-error "flows off the end|in .constexpr. expansion of" }

--- gcc/testsuite/g++.dg/cpp1y/constexpr-96182.C.jj 2020-07-13 
19:16:42.742936492 +0200
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-96182.C2020-07-13 
19:16:12.264357640 +0200
@@ -0,0 +1,6 @@
+// PR c++/96182
+// { dg-do compile { target c++11 } }
+
+constexpr int foo () {} // { dg-error "no return statement in 'constexpr' function returning 
non-void" "" { target c++14 } }
+// { dg-error "body of 'constexpr' function 'constexpr int foo\\\(\\\)' not a 
return-statement" "" { target c++11_only } .-1 }
+// { dg-warning "no return statement in function returning non-void" "" { 
target c++11_only } .-2 }
--- gcc/testsuite/g++.dg/cpp1y/constexpr-return2.C.jj   2020-01-12 
11:54:37.115402818 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-return2.C  2020-07-13 
22:17:03.582513397 +0200
@@ -3,6 +3,7 @@
  
  constexpr int f (int i)

  {
+  if (i == -1) return 0;
  }
  
  constexpr int i = f(42);	// { dg-error "flows off the end|in .constexpr. expansion of " }

--- gcc/testsuite/g++.dg/cpp1y/var-templ44.C.jj 2020-01-12 11:54:37.123402697 
+0100
+++ gcc/testsuite/g++.dg/cpp1y/var-templ44.C2020-07-13 22:35:03.322980157 
+0200
@@ -26,5

Re: [PATCH] c++: overload sets and placeholder return type [PR64194]

2020-07-29 Thread Patrick Palka via Gcc-patches

On Wed, 29 Jul 2020, Patrick Palka wrote:

> In the testcase below, template argument deduction for the call
> g(id) goes wrong because the functions in the overload set id
> each have a yet-undeduced auto return type, and this undeduced return
> type makes try_one_overload fail to match up any of these functions with
> g's parameter type, leading to g's template parameter going undeduced
> and to the overload set going unresolved.
> 
> This patch fixes this issue by performing return type deduction via
> instantiation before doing try_one_overload, in a manner similar to what
> resolve_address_of_overloaded_function does.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
> commit?
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/64194
>   * pt.c (resolve_overloaded_unification): If the function
>   template specialization has a placeholder return type,
>   then instantiate it before attempting unification.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c++/64194
>   * g++.dg/cpp1y/auto-fn60.C: New test.
> ---
>  gcc/cp/pt.c| 11 ++-
>  gcc/testsuite/g++.dg/cpp1y/auto-fn60.C | 11 +++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 4d955c555dc..abb74520fc9 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -22118,7 +22118,16 @@ resolve_overloaded_unification (tree tparms,
> if (subargs != error_mark_node
> && !any_dependent_template_arguments_p (subargs))
>   {
> -   elem = TREE_TYPE (instantiate_template (fn, subargs, tf_none));
> +   fn = instantiate_template (fn, subargs, tf_none);
> +   if (undeduced_auto_decl (fn))
> + {
> +   /* Instantiate the function to deduce its return type.  */
> +   ++function_depth;
> +   instantiate_decl (fn, /*defer*/false, /*class*/false);
> +   --function_depth;
> + }
> +
> +   elem = TREE_TYPE (fn);
> if (try_one_overload (tparms, targs, tempargs, parm,
>   elem, strict, sub_strict, addr_p, explain_p)
> && (!goodfn || !same_type_p (goodfn, elem)))
> diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C 
> b/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
> new file mode 100644
> index 000..237868c076b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
> @@ -0,0 +1,11 @@
> +// PR c++/64194
> +// { dg-do compile { target c++14 } }
> +
> +template  void g(void (*)(T)) { }
> +
> +template  auto id(int) { }
> +template  auto id(char) { return 0; }
> +
> +int main() {
> +  g(id);

Oops, this is supposed to use id instead of id to match up
with the commit message (or vice versa).

> +}
> -- 
> 2.28.0.rc1
> 
>

[PATCH] c++: overload sets and placeholder return type [PR64194]

2020-07-29 Thread Patrick Palka via Gcc-patches

In the testcase below, template argument deduction for the call
g(id) goes wrong because the functions in the overload set id
each have a yet-undeduced auto return type, and this undeduced return
type makes try_one_overload fail to match up any of these functions with
g's parameter type, leading to g's template parameter going undeduced
and to the overload set going unresolved.

This patch fixes this issue by performing return type deduction via
instantiation before doing try_one_overload, in a manner similar to what
resolve_address_of_overloaded_function does.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?

gcc/cp/ChangeLog:

PR c++/64194
* pt.c (resolve_overloaded_unification): If the function
template specialization has a placeholder return type,
then instantiate it before attempting unification.

gcc/testsuite/ChangeLog:

PR c++/64194
* g++.dg/cpp1y/auto-fn60.C: New test.
---
 gcc/cp/pt.c| 11 ++-
 gcc/testsuite/g++.dg/cpp1y/auto-fn60.C | 11 +++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn60.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4d955c555dc..abb74520fc9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22118,7 +22118,16 @@ resolve_overloaded_unification (tree tparms,
  if (subargs != error_mark_node
  && !any_dependent_template_arguments_p (subargs))
{
- elem = TREE_TYPE (instantiate_template (fn, subargs, tf_none));
+ fn = instantiate_template (fn, subargs, tf_none);
+ if (undeduced_auto_decl (fn))
+   {
+ /* Instantiate the function to deduce its return type.  */
+ ++function_depth;
+ instantiate_decl (fn, /*defer*/false, /*class*/false);
+ --function_depth;
+   }
+
+ elem = TREE_TYPE (fn);
  if (try_one_overload (tparms, targs, tempargs, parm,
elem, strict, sub_strict, addr_p, explain_p)
  && (!goodfn || !same_type_p (goodfn, elem)))
diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C 
b/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
new file mode 100644
index 000..237868c076b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn60.C
@@ -0,0 +1,11 @@
+// PR c++/64194
+// { dg-do compile { target c++14 } }
+
+template  void g(void (*)(T)) { }
+
+template  auto id(int) { }
+template  auto id(char) { return 0; }
+
+int main() {
+  g(id);
+}
-- 
2.28.0.rc1

Re: [PATCH] c++: constraints and explicit instantiation [PR96164]

2020-07-29 Thread Jason Merrill via Gcc-patches


On 7/13/20 5:19 PM, Patrick Palka wrote:

When considering to instantiate a member of a class template as part of an
explicit instantiation of the class template, we need to first check the
member's constraints before proceeding with the instantiation of the member.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to commit
to trunk (and perhaps to the 10 branch)?


Let's change constraints_satisfied_p to return true if !flag_concepts, 
so you don't need to check the flag here.  OK for both with that change.



gcc/cp/ChangeLog:

PR c++/96164
* pt.c (do_type_instantiation): Update a paragraph taken from
[temp.explicit] to reflect the latest specification.  Don't
instantiate a member with unsatisfied constraints.

gcc/testsuite/ChangeLog:

PR c++/96164
* g++.dg/cpp2a/concepts-explicit-inst5.C: New test.
---
  gcc/cp/pt.c   | 23 +--
  .../g++.dg/cpp2a/concepts-explicit-inst5.C| 14 +++
  2 files changed, 25 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 61f22733858..c518f221f2f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -24902,23 +24902,22 @@ do_type_instantiation (tree t, tree storage, 
tsubst_flags_t complain)
  
  	 [temp.explicit]
  
-	 The explicit instantiation of a class template specialization

-implies the instantiation of all of its members not
-previously explicitly specialized in the translation unit
-containing the explicit instantiation.
-
- Of course, we can't instantiate member template classes, since we
- don't have any arguments for them.  Note that the standard is
- unclear on whether the instantiation of the members are
- *explicit* instantiations or not.  However, the most natural
- interpretation is that it should be an explicit
- instantiation.  */
+An explicit instantiation that names a class template
+specialization is also an explicit instantiation of the same
+kind (declaration or definition) of each of its members (not
+including members inherited from base classes and members
+that are templates) that has not been previously explicitly
+specialized in the translation unit containing the explicit
+instantiation, provided that the associated constraints, if
+any, of that member are satisfied by the template arguments
+of the explicit instantiation.  */
for (tree fld = TYPE_FIELDS (t); fld; fld = DECL_CHAIN (fld))
  if ((VAR_P (fld)
 || (TREE_CODE (fld) == FUNCTION_DECL
 && !static_p
 && user_provided_p (fld)))
-   && DECL_TEMPLATE_INSTANTIATION (fld))
+   && DECL_TEMPLATE_INSTANTIATION (fld)
+   && (!flag_concepts || constraints_satisfied_p (fld)))
{
mark_decl_instantiated (fld, extern_p);
if (! extern_p)
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst5.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst5.C
new file mode 100644
index 000..05959a8972c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-explicit-inst5.C
@@ -0,0 +1,14 @@
+// PR c++/96164
+// { dg-do compile { target concepts } }
+
+template 
+struct A {
+void f() requires (N == 3) { static_assert(N == 3); }
+};
+template struct A<2>;
+
+template 
+struct B {
+void f() requires (N == 2) { static_assert(N == 3); } // { dg-error 
"assert" }
+};
+template struct B<2>;

New Swedish PO file for 'gcc' (version 10.2.0)

2020-07-29 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-10.2.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH v2] RS6000 Add testlsbb (Test LSB by Byte) operations

2020-07-29 Thread Segher Boessenkool

Hi Will,

On Wed, Jul 29, 2020 at 12:40:08PM -0500, will schmidt wrote:
> V2 has completed tests on powerpc64le-unknown-linux-gnu Power8LE, with
> other regression tests still in progress on some other powerpc platforms.

Thanks for that :-)

> +;; xvtlsbb BF,XB
> +;; Set the CR field BF to indicate if bit 7 of every byte element in VSR[XB]
> +;; is equal to 1 (ALL_TRUE) or equal to 0 (ALL_FALSE).

That now does not say which bit in the CR field means which.  And, do you
think "bit 7" is clearer than "lowest bit"?

> +(define_insn "*xvtlsbb_internal"
> +  [(set (match_operand:CC 0 "cc_reg_operand" "=y")
> + (unspec:CC [(match_operand:V16QI 1 "vsx_register_operand" "wa")]
> +  UNSPEC_XVTLSBB))]
> +  "TARGET_POWER10"
> +  "xvtlsbb %0,%x1"
> +  [(set_attr "type" "logical")])

> +;; Vector Test Least Significant Bit by Byte
> +;; for the implementation of the builtin
> +;; __builtin_vec_test_lsbb_all_ones
> +;; int vec_test_lsbb_all_ones (vector unsigned char);
> +;; and
> +;; __builtin_vec_test_lsbb_all_zeros
> +;; int vec_test_lsbb_all_zeros (vector unsigned char);
> +(define_expand "xvtlsbbo"
> +  [(set (match_dup 2)
> + (unspec:CC [(match_operand:V16QI 1 "vsx_register_operand" "v")]
> +  UNSPEC_XVTLSBB))
> + (set (match_operand:SI 0 "gpc_reg_operand" "=r")
> + (lt:SI (match_dup 2) (const_int 0)))]
> +  "TARGET_POWER10"
> +  {
> +  operands[2] = gen_reg_rtx (CCmode);
> +  })

Indentation is wrong still:

(define_expand "xvtlsbbo"
  [(set (match_dup 2)
(unspec:CC [(match_operand:V16QI 1 "vsx_register_operand" "v")]
 UNSPEC_XVTLSBB))
   (set (match_operand:SI 0 "gpc_reg_operand" "=r")
(lt:SI (match_dup 2) (const_int 0)))]
  "TARGET_POWER10"
{
  operands[2] = gen_reg_rtx (CCmode);
})

(The C block should be indented like C, withe the outer braces flush
left; the arms of the (implicit) parallel should be indented the same;
if writing some RTX over multiple lines, things should be indented the
same as the previous thing at the same level, if it isn't the first
itself.  But none of these rules are hard rules ;-) )

Other than those nits, this looks fine.  Okay for trunk.  Thanks!


Segher

[pushed] c++: Implement C++20 implicit move changes. [PR91427]

2020-07-29 Thread Jason Merrill via Gcc-patches

P1825R0 extends the C++11 implicit move on return by removing the
constraints on the called constructor: previously, it needed to take an
rvalue reference to the type of the returned variable.  The paper also
allows move on throw of parameters and implicit move of rvalue references.

Discussion on the CWG reflector about how to avoid breaking the PR91212 test
in the new model settled on the model of doing only a single overload
resolution, with the variable treated as an xvalue that can bind to
non-const lvalue references.  So this patch implements that approach.  The
implementation does not use the existing LOOKUP_PREFER_RVALUE flag, but
instead sets a flag on the representation of the static_cast turning the
variable into an xvalue.

For the time being I'm limiting the new semantics to C++20 mode; since it
was moved as a DR, we will probably want to apply the change to other
standard modes as well once we have a better sense of the impact on existing
code, probably in GCC 12.

gcc/cp/ChangeLog:

PR c++/91427
* cp-tree.h (IMPLICIT_RVALUE_P): New.
(enum cp_lvalue_kind_flags): Add clk_implicit_rval.
(implicit_rvalue_p, set_implicit_rvalue_p): New.
* call.c (reference_binding): Check clk_implicit_rval.
(build_over_call): Adjust C++20 implicit move.
* coroutines.cc (finish_co_return_stmt): Simplify implicit move.
* except.c (build_throw): Adjust C++20 implicit move.
* pt.c (tsubst_copy_and_build) [STATIC_CAST_EXPR]: Propagate
IMPLICIT_RVALUE_P.
* tree.c (lvalue_kind): Set clk_implicit_rval.
* typeck.c (treat_lvalue_as_rvalue_p): Overhaul.
(maybe_warn_pessimizing_move): Adjust.
(check_return_expr): Adjust C++20 implicit move.

gcc/testsuite/ChangeLog:

PR c++/91427
* g++.dg/coroutines/co-return-syntax-10-movable.C: Extend.
* g++.dg/cpp0x/Wredundant-move1.C: Adjust for C++20.
* g++.dg/cpp0x/Wredundant-move7.C: Adjust for C++20.
* g++.dg/cpp0x/Wredundant-move9.C: Adjust for C++20.
* g++.dg/cpp0x/elision_neg.C: Adjust for C++20.
* g++.dg/cpp0x/move-return2.C: Adjust for C++20.
* g++.dg/cpp0x/ref-qual20.C: Adjust for C++20.
* g++.dg/cpp2a/implicit-move1.C: New test.
* g++.dg/cpp2a/implicit-move2.C: New test.
* g++.dg/cpp2a/implicit-move3.C: New test.
---
 gcc/cp/cp-tree.h  | 35 ++-
 gcc/cp/call.c |  6 +-
 gcc/cp/coroutines.cc  | 30 ++
 gcc/cp/except.c   | 28 +++---
 gcc/cp/pt.c   |  2 +
 gcc/cp/tree.c |  7 +-
 gcc/cp/typeck.c   | 95 ++-
 .../coroutines/co-return-syntax-10-movable.C  | 12 ++-
 gcc/testsuite/g++.dg/cpp0x/Wredundant-move1.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/Wredundant-move7.C |  6 +-
 gcc/testsuite/g++.dg/cpp0x/Wredundant-move9.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/elision_neg.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/move-return2.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/ref-qual20.C   |  5 +-
 gcc/testsuite/g++.dg/cpp2a/implicit-move1.C   | 17 
 gcc/testsuite/g++.dg/cpp2a/implicit-move2.C   | 49 ++
 gcc/testsuite/g++.dg/cpp2a/implicit-move3.C   | 49 ++
 17 files changed, 277 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/implicit-move1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/implicit-move2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/implicit-move3.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2377fc052bb..ea4871f836a 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -466,7 +466,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   IMPLICIT_CONV_EXPR_BRACED_INIT (in IMPLICIT_CONV_EXPR)
   TINFO_VAR_DECLARED_CONSTINIT (in TEMPLATE_INFO)
   CALL_FROM_NEW_OR_DELETE_P (in CALL_EXPR)
-   3: (TREE_REFERENCE_EXPR) (in NON_LVALUE_EXPR) (commented-out).
+   3: IMPLICIT_RVALUE_P (in NON_LVALUE_EXPR or STATIC_CAST_EXPR)
   ICS_BAD_FLAG (in _CONV)
   FN_TRY_BLOCK_P (in TRY_BLOCK)
   BIND_EXPR_BODY_BLOCK (in BIND_EXPR)
@@ -3803,6 +3803,11 @@ struct GTY(()) lang_decl {
&& TREE_TYPE (TREE_OPERAND (NODE, 0))   \
&& TYPE_REF_P (TREE_TYPE (TREE_OPERAND ((NODE), 0
 
+/* True iff this represents an lvalue being treated as an rvalue during return
+   or throw as per [class.copy.elision].  */
+#define IMPLICIT_RVALUE_P(NODE) \
+  TREE_LANG_FLAG_3 (TREE_CHECK2 ((NODE), NON_LVALUE_EXPR, STATIC_CAST_EXPR))
+
 #define NEW_EXPR_USE_GLOBAL(NODE) \
   TREE_LANG_FLAG_0 (NEW_EXPR_CHECK (NODE))
 #define DELETE_EXPR_USE_GLOBAL(NODE) \
@@ -5184,7 +5189,8 @@ enum cp_lvalue_kind_flags {
   clk_rvalueref = 2,/* An xvalue (rvalue formed using an rvalue reference) */
   clk_class = 4,/* A prvalue of class or array type.  */
   clk_bitfield = 8, /* An lvalue for a bit-field.  */

[pushed] c++: Avoid calling const copy ctor on implicit move. [PR91212]

2020-07-29 Thread Jason Merrill via Gcc-patches

Our implementation of C++11 implicit move was wrong for return; we didn't
actually hit the check for the type of the first parameter of the selected
constructor, because we didn't see LOOKUP_PREFER_RVALUE set properly.

Fixing that to look at the right flags fixed the issue for this testcase,
but broke implicit move for a by-value converting constructor (PR58051).  I
think this was not allowed in C++17, but it is allowed under the implicit
move changes from C++20, and those changes were voted to apply as a DR to
earlier standards as well, so I don't want to break it now.

So after fixing the flags check I changed the test to allow value
parameters.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/91212
* call.c (build_over_call): Don't call a const ref
overload for implicit move.

gcc/testsuite/ChangeLog:

PR c++/91212
* g++.dg/cpp0x/move-return3.C: New test.
---
 gcc/cp/call.c |  9 ++---
 gcc/testsuite/g++.dg/cpp0x/move-return3.C | 23 +++
 2 files changed, 29 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/move-return3.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f9508ae3635..e283d635d60 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8678,15 +8678,18 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
  parm = TREE_CHAIN (parm);
}
 
-  if (flags & LOOKUP_PREFER_RVALUE)
+  if (cand->flags & LOOKUP_PREFER_RVALUE)
{
  /* The implicit move specified in 15.8.3/3 fails "...if the type of
 the first parameter of the selected constructor is not an rvalue
 reference to the object's type (possibly cv-qualified)" */
  gcc_assert (!(complain & tf_error));
  tree ptype = convs[0]->type;
- if (!TYPE_REF_P (ptype)
- || !TYPE_REF_IS_RVALUE (ptype)
+ /* Allow calling a by-value converting constructor even though it
+isn't permitted by the above, because we've allowed it since GCC 5
+(PR58051) and it's allowed in C++20.  But don't call a copy
+constructor.  */
+ if ((TYPE_REF_P (ptype) && !TYPE_REF_IS_RVALUE (ptype))
  || CONVERSION_RANK (convs[0]) > cr_exact)
return error_mark_node;
}
diff --git a/gcc/testsuite/g++.dg/cpp0x/move-return3.C 
b/gcc/testsuite/g++.dg/cpp0x/move-return3.C
new file mode 100644
index 000..c79f0591936
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/move-return3.C
@@ -0,0 +1,23 @@
+// PR c++/91212
+// Test that C++11 implicit move semantics don't call the const copy.
+// { dg-do link }
+
+struct T { int i; };
+
+struct X {
+  X(T&) { }// #1
+  X(const T&); // #2
+};
+
+X
+fn ()
+{
+  T buf;
+  return buf;
+}
+
+int
+main()
+{
+  X c = fn ();
+}

base-commit: d8140b9ed3c0fed041aedaff3fa4a603984ca10f
-- 
2.18.1

Re: [PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-07-29 Thread Joseph Myers

On Tue, 28 Jul 2020, Martin Sebor via Gcc-patches wrote:

> +  /* A list of VLA variable bounds or null if not specified.  */
> +  tree vbchain = NULL_TREE;
> +  if (parm->declarator->kind == cdk_array)

> +   if (pd->kind != cdk_array)
> + break;

> +   /* Skip all constant bounds except the most significant
> +  one.  The interior ones are included in the array type.  */
> +   if (next && next->kind == cdk_array)
> + continue;

Anything working with declarators should typically have logic to skip 
cdk_attrs declarators.

For example, a parameter is declared as an array using [] in that 
declarator if the innermost c_declarator that is not cdk_id or cdk_attrs 
is of kind cdk_array.  (It's the innermost not the outermost because of C 
declarator syntax.)  The array bounds for the parameter array itself (as 
opposed to any other bounds if the parameter is e.g. an array of pointers 
to arrays) are then those in all the cdk_array declarators after the last 
declarator (if any) that's not cdk_array, cdk_attrs or cdk_id (cdk_id only 
comes in the last place).

If e.g. the parameter has the nested cdk_declarator sequence

cdk_function
cdk_pointer
cdk_array
cdk_attrs
cdk_array
cdk_pointer
cdk_array
cdk_attrs
cdk_array
cdk_array
cdk_id

then it's a three-dimensional array of pointers to two-dimensional arrays 
of pointers to functions.

I don't see anything in the tests in this patch to cover this sort of case 
(arrays of pointers, including arrays of pointers to arrays etc.).

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH v2] RS6000 Add testlsbb (Test LSB by Byte) operations

2020-07-29 Thread will schmidt via Gcc-patches

[PATCH] RS6000 Add testlsbb by Byte operations

Hi,

Add support for new instructions to test LSB by Byte.

[v2] Additional updates per feedback.  Including adding _all
to the internal name, typos and cosmetic fixups throughout, extraneous
-mvsx removed from tests.

V2 has completed tests on powerpc64le-unknown-linux-gnu Power8LE, with
other regression tests still in progress on some other powerpc platforms.
OK for trunk? 

Thanks,
-Will

[gcc]

2020-07-29  Will Schmidt  

* config/rs6000/altivec.h (vec_test_lsbb_all_ones): New define.
(vec_test_lsbb_all_zeros): New define.
* config/rs6000/rs6000-builtin.def (BU_P10_VSX_1): New built-in
handling macro.
(XVTLSBB_ZEROS, XVTLSBB_ONES): New builtin defines.
(xvtlsbb_all_zeros, xvtlsbb_all_ones): New builtin overloads.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VEC_XVTLSBB_ZEROS,
P10_BUILTIN_VEC_XVTLSBB_ONES): New altivec_builtin_types entries.
* config/rs6000/rs6000.md (UNSPEC_XVTLSBB):  New unspec.
* config/rs6000/vsx.md (*xvtlsbb_internal): New instruction define.
(xvtlsbbo, xvtlsbbz): New instruction expands.

[testsuite]
* testsuite/gcc.target/powerpc/lsbb-runnable.c: New test.
* testsuite/gcc.target/powerpc/lsbb.c: New test.

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 6c43124..119fb1c 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -491,10 +491,13 @@
 #define vec_cmpnez __builtin_vec_vcmpnez
 
 #define vec_cntlz_lsbb __builtin_vec_vclzlsbb
 #define vec_cnttz_lsbb __builtin_vec_vctzlsbb
 
+#define vec_test_lsbb_all_ones __builtin_vec_xvtlsbb_all_ones
+#define vec_test_lsbb_all_zeros __builtin_vec_xvtlsbb_all_zeros
+
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index f703755..38f859f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1060,10 +1060,18 @@
RS6000_BTM_P10, /* MASK */  \
(RS6000_BTC_ ## ATTR/* ATTR */  \
 | RS6000_BTC_QUATERNARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+#define BU_P10_VSX_1(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_1 (P10_BUILTIN_ ## ENUM,  /* ENUM */  \
+   "__builtin_vsx_" NAME,  /* NAME */  \
+   RS6000_BTM_P10, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_UNARY),   \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
 #define BU_P10_OVERLOAD_1(ENUM, NAME)  \
   RS6000_BUILTIN_1 (P10_BUILTIN_VEC_ ## ENUM,  /* ENUM */  \
"__builtin_vec_" NAME,  /* NAME */  \
RS6000_BTM_P10, /* MASK */  \
(RS6000_BTC_OVERLOADED  /* ATTR */  \
@@ -2734,10 +2742,13 @@ BU_P10V_1 (VSTRIHL, "vstrihl", CONST, vstril_v8hi)
 BU_P10V_1 (VSTRIBR_P, "vstribr_p", CONST, vstrir_p_v16qi)
 BU_P10V_1 (VSTRIHR_P, "vstrihr_p", CONST, vstrir_p_v8hi)
 BU_P10V_1 (VSTRIBL_P, "vstribl_p", CONST, vstril_p_v16qi)
 BU_P10V_1 (VSTRIHL_P, "vstrihl_p", CONST, vstril_p_v8hi)
 
+BU_P10_VSX_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros", CONST, xvtlsbbz)
+BU_P10_VSX_1 (XVTLSBB_ONES, "xvtlsbb_all_ones", CONST, xvtlsbbo)
+
 BU_P10V_1 (MTVSRBM, "mtvsrbm", CONST, vec_mtvsr_v16qi)
 BU_P10V_1 (MTVSRHM, "mtvsrhm", CONST, vec_mtvsr_v8hi)
 BU_P10V_1 (MTVSRWM, "mtvsrwm", CONST, vec_mtvsr_v4si)
 BU_P10V_1 (MTVSRDM, "mtvsrdm", CONST, vec_mtvsr_v2di)
 BU_P10V_1 (MTVSRQM, "mtvsrqm", CONST, vec_mtvsr_v1ti)
@@ -2769,10 +2780,14 @@ BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
 BU_P10_OVERLOAD_1 (VSTRIR, "strir")
 BU_P10_OVERLOAD_1 (VSTRIL, "stril")
 
 BU_P10_OVERLOAD_1 (VSTRIR_P, "strir_p")
 BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
+
+BU_P10_OVERLOAD_1 (XVTLSBB_ZEROS, "xvtlsbb_all_zeros")
+BU_P10_OVERLOAD_1 (XVTLSBB_ONES, "xvtlsbb_all_ones")
+
 
 BU_P10_OVERLOAD_1 (MTVSRBM, "mtvsrbm")
 BU_P10_OVERLOAD_1 (MTVSRHM, "mtvsrhm")
 BU_P10_OVERLOAD_1 (MTVSRWM, "mtvsrwm")
 BU_P10_OVERLOAD_1 (MTVSRDM, "mtvsrdm")
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 5ec3f2c..ece8d76 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5679,10 +5679,15 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
   { P10_BUILTIN_VEC_VEXTRACTM, P10_BUILTIN_VEXTRACTMD,
 RS6000_BTI_INTSI, RS6000_BTI_unsigned_V2DI, 0, 0 },
   { P10_BUILTIN_VEC_VEXTRACTM, P10_BUILTIN_VEXTRACTMQ,
 RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, 0, 0 },
 
+ { P10_BUILTIN_VEC_XVTLSBB_ZEROS, P10

Re: [Patch] OpenMP: Handle order(concurrent) clause in gfortran

2020-07-29 Thread Jakub Jelinek via Gcc-patches

On Wed, Jul 29, 2020 at 06:30:16PM +0200, Tobias Burnus wrote:
> Adds 'order(concurrent)'. OpenMP 5.0 also permits it
> for 'loop' but gfortran does not yet support 'loop'.
> 
> (That the argument is passed on to the ME can be
> seen by the testcases as the errors are emitted
> by the ME.)
> 
> OK?

Ok, thanks.

Jakub

[Patch] OpenMP: Handle order(concurrent) clause in gfortran

2020-07-29 Thread Tobias Burnus


Adds 'order(concurrent)'. OpenMP 5.0 also permits it
for 'loop' but gfortran does not yet support 'loop'.

(That the argument is passed on to the ME can be
seen by the testcases as the errors are emitted
by the ME.)

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Handle order(concurrent) clause in gfortran

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Handle order(concurrent).
	* gfortran.h (struct gfc_omp_clauses): Add order_concurrent.
	* openmp.c (enum omp_mask1, OMP_DO_CLAUSES, OMP_SIMD_CLAUSES):
	Add OMP_CLAUSE_ORDER.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses):
	Handle order(concurrent) clause.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/order-3.f90: New test.
	* gfortran.dg/gomp/order-4.f90: New test.

 gcc/fortran/dump-parse-tree.c  |   2 +
 gcc/fortran/gfortran.h |   2 +-
 gcc/fortran/openmp.c   |  12 +-
 gcc/fortran/trans-openmp.c |  12 ++
 gcc/testsuite/gfortran.dg/gomp/order-3.f90 | 227 +
 gcc/testsuite/gfortran.dg/gomp/order-4.f90 |  34 +
 6 files changed, 286 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 2a02bc871bc..71d0e7d00f5 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1552,6 +1552,8 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 fputs (" SEQ", dumpfile);
   if (omp_clauses->independent)
 fputs (" INDEPENDENT", dumpfile);
+  if (omp_clauses->order_concurrent)
+fputs (" ORDER(CONCURRENT)", dumpfile);
   if (omp_clauses->ordered)
 {
   if (omp_clauses->orderedc)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 20cce5cf39b..48b2ab14fdb 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1365,7 +1365,7 @@ typedef struct gfc_omp_clauses
   bool nowait, ordered, untied, mergeable;
   bool inbranch, notinbranch, defaultmap, nogroup;
   bool sched_simd, sched_monotonic, sched_nonmonotonic;
-  bool simd, threads, depend_source;
+  bool simd, threads, depend_source, order_concurrent;
   enum gfc_omp_cancel_kind cancel;
   enum gfc_omp_proc_bind_kind proc_bind;
   struct gfc_expr *safelen_expr;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 16f39a4e086..ec116206a5c 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -766,6 +766,7 @@ enum omp_mask1
   OMP_CLAUSE_NUM_THREADS,
   OMP_CLAUSE_SCHEDULE,
   OMP_CLAUSE_DEFAULT,
+  OMP_CLAUSE_ORDER,
   OMP_CLAUSE_ORDERED,
   OMP_CLAUSE_COLLAPSE,
   OMP_CLAUSE_UNTIED,
@@ -1549,6 +1550,13 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	continue;
 	  break;
 	case 'o':
+	  if ((mask & OMP_CLAUSE_ORDER)
+	  && !c->order_concurrent
+	  && gfc_match ("order ( concurrent )") == MATCH_YES)
+	{
+	  c->order_concurrent = true;
+	  continue;
+	}
 	  if ((mask & OMP_CLAUSE_ORDERED)
 	  && !c->ordered
 	  && gfc_match ("ordered") == MATCH_YES)
@@ -2575,7 +2583,7 @@ cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
| OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_REDUCTION			\
| OMP_CLAUSE_SCHEDULE | OMP_CLAUSE_ORDERED | OMP_CLAUSE_COLLAPSE	\
-   | OMP_CLAUSE_LINEAR)
+   | OMP_CLAUSE_LINEAR | OMP_CLAUSE_ORDER)
 #define OMP_SECTIONS_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
| OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_REDUCTION)
@@ -2583,7 +2591,7 @@ cleanup:
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_LASTPRIVATE		\
| OMP_CLAUSE_REDUCTION | OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_SAFELEN	\
| OMP_CLAUSE_LINEAR | OMP_CLAUSE_ALIGNED | OMP_CLAUSE_SIMDLEN	\
-   | OMP_CLAUSE_IF)
+   | OMP_CLAUSE_IF | OMP_CLAUSE_ORDER)
 #define OMP_TASK_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
| OMP_CLAUSE_SHARED | OMP_CLAUSE_IF | OMP_CLAUSE_DEFAULT		\
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f6a39edf121..076efb03831 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3371,6 +3371,12 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
   omp_clauses = gfc_trans_add_clause (c, omp_clauses);
 }
 
+  if (clauses->order_concurrent)
+{
+  c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_ORDER);
+  omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+}
+
   if (clauses->untied)
 {
   c = build_omp_clause (gfc_get_location (&where), OMP_CLAUSE_UNTIED);
@@ -4970,6 +4976,8 @@ gfc_split_omp_clauses (gfc_code *code,
 	  /* Duplicate collapse.  */
 	  clausesa[GFC_OMP_SPLIT_DISTRIBUTE].collapse
 	= code->ext.omp_clauses->collapse;
+	  clausesa[GFC_OMP_SPLIT_DISTRIBUTE].order_concurrent
+	= code->ext.omp_clauses->order_concurrent;
 	}
   if (mask &

Re: [PATCH 2/2] Tune memcpy and memset for Zen cores.

2020-07-29 Thread Jan Hubicka

> Based on the collected numbers in PR95435, I suggest the following
> tuning changes:
> 
> gcc/ChangeLog:
> 
>   PR target/95435
>   * config/i386/x86-tune-costs.h: Use libcall for large sizes for
>   -m32. Start using libcall from 128+ bytes.

OK,
thanks!
Honza
> ---
>  gcc/config/i386/x86-tune-costs.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/i386/x86-tune-costs.h 
> b/gcc/config/i386/x86-tune-costs.h
> index 1169178433f..3207404e514 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1314,20 +1314,20 @@ static stringop_algs znver1_memcpy[2] = {
>/* 32-bit tuning.  */
>{libcall, {{6, loop, false},
>{14, unrolled_loop, false},
> -  {-1, rep_prefix_4_byte, false}}},
> +  {-1, libcall, false}}},
>/* 64-bit tuning.  */
>{libcall, {{16, loop, false},
> -  {8192, rep_prefix_8_byte, false},
> +  {128, rep_prefix_8_byte, false},
>{-1, libcall, false;
>  static stringop_algs znver1_memset[2] = {
>/* 32-bit tuning.  */
>{libcall, {{8, loop, false},
>{24, unrolled_loop, false},
> -  {2048, rep_prefix_4_byte, false},
> +  {128, rep_prefix_4_byte, false},
>{-1, libcall, false}}},
>/* 64-bit tuning.  */
>{libcall, {{48, unrolled_loop, false},
> -  {8192, rep_prefix_8_byte, false},
> +  {128, rep_prefix_8_byte, false},
>{-1, libcall, false;
>  struct processor_costs znver1_cost = {
>{
> @@ -1460,7 +1460,7 @@ static stringop_algs znver2_memcpy[2] = {
>/* 32-bit tuning.  */
>{libcall, {{6, loop, false},
>{14, unrolled_loop, false},
> -  {-1, rep_prefix_4_byte, false}}},
> +  {-1, libcall, false}}},
>/* 64-bit tuning.  */
>{libcall, {{16, loop, false},
>{64, rep_prefix_4_byte, false},
> @@ -1469,7 +1469,7 @@ static stringop_algs znver2_memset[2] = {
>/* 32-bit tuning.  */
>{libcall, {{8, loop, false},
>{24, unrolled_loop, false},
> -  {2048, rep_prefix_4_byte, false}
> +  {128, rep_prefix_4_byte, false},
>{-1, libcall, false}}},
>/* 64-bit tuning.  */
>{libcall, {{24, rep_prefix_4_byte, false},
> -- 
> 2.26.2
>

Re: [PATCH 1/2] Re-format zen memcpy/memset costs.

2020-07-29 Thread Jan Hubicka

> The patch improves readability of the memcpy and memset
> expansion strategies.
> 
> gcc/ChangeLog:
> 
>   * config/i386/x86-tune-costs.h: Change code formatting.

OK,
thanks!
Honza
> ---
>  gcc/config/i386/x86-tune-costs.h | 38 +++-
>  1 file changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/i386/x86-tune-costs.h 
> b/gcc/config/i386/x86-tune-costs.h
> index c73917e5a62..1169178433f 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -1311,14 +1311,23 @@ const struct processor_costs bdver_cost = {
>  very small blocks it is better to use loop.  For large blocks, libcall
>  can do nontemporary accesses and beat inline considerably.  */
>  static stringop_algs znver1_memcpy[2] = {
> -  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
> +  /* 32-bit tuning.  */
> +  {libcall, {{6, loop, false},
> +  {14, unrolled_loop, false},
>{-1, rep_prefix_4_byte, false}}},
> -  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
> +  /* 64-bit tuning.  */
> +  {libcall, {{16, loop, false},
> +  {8192, rep_prefix_8_byte, false},
>{-1, libcall, false;
>  static stringop_algs znver1_memset[2] = {
> -  {libcall, {{8, loop, false}, {24, unrolled_loop, false},
> -  {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
> -  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
> +  /* 32-bit tuning.  */
> +  {libcall, {{8, loop, false},
> +  {24, unrolled_loop, false},
> +  {2048, rep_prefix_4_byte, false},
> +  {-1, libcall, false}}},
> +  /* 64-bit tuning.  */
> +  {libcall, {{48, unrolled_loop, false},
> +  {8192, rep_prefix_8_byte, false},
>{-1, libcall, false;
>  struct processor_costs znver1_cost = {
>{
> @@ -1448,14 +1457,23 @@ struct processor_costs znver1_cost = {
>  very small blocks it is better to use loop.  For large blocks, libcall
>  can do nontemporary accesses and beat inline considerably.  */
>  static stringop_algs znver2_memcpy[2] = {
> -  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
> +  /* 32-bit tuning.  */
> +  {libcall, {{6, loop, false},
> +  {14, unrolled_loop, false},
>{-1, rep_prefix_4_byte, false}}},
> -  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
> +  /* 64-bit tuning.  */
> +  {libcall, {{16, loop, false},
> +  {64, rep_prefix_4_byte, false},
>{-1, libcall, false;
>  static stringop_algs znver2_memset[2] = {
> -  {libcall, {{8, loop, false}, {24, unrolled_loop, false},
> -  {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
> -  {libcall, {{24, rep_prefix_4_byte, false}, {128, rep_prefix_8_byte, false},
> +  /* 32-bit tuning.  */
> +  {libcall, {{8, loop, false},
> +  {24, unrolled_loop, false},
> +  {2048, rep_prefix_4_byte, false}
> +  {-1, libcall, false}}},
> +  /* 64-bit tuning.  */
> +  {libcall, {{24, rep_prefix_4_byte, false},
> +  {128, rep_prefix_8_byte, false},
>{-1, libcall, false;
>  struct processor_costs znver2_cost = {
> -- 
> 2.26.2
> 
>

Re: [PATCH] Don't make -gsplit-dwarf imply -g

2020-07-29 Thread David Blaikie via Gcc-patches

On Tue, Jul 28, 2020, 10:24 PM Fāng-ruì Sòng  wrote:

> On Thu, May 14, 2020 at 12:16 AM Richard Biener
>  wrote:
> >
> > On Wed, May 13, 2020 at 5:50 PM Fangrui Song  wrote:
> > >
> > > On 2020-05-13, Eric Botcazou wrote:
> > > >> Did I mention I dislike -fsplit-dwarf? ;)
> > > >
> > > >Seconded, this will be confusing for almost all users.  Since the
> option only
> > > >affects debug info generation, it should be prefixed with 'g' in any
> case.
> > >
> > > Updating the semantics of -gsplit-dwarf is actually my favorite as
> > > well:)
> > >
> > > -gsplit-dwarf is not common. Many uses have separate -g. Let's change
> it.
> > >
> > > Attached the patch.
> >
> > OK if there are no objections over the weekend.  I guess this change
> needs
> > documenting in gcc-11/changes.html (which probably does not exist yet,
> > will take care of that).
> >
> > Thanks,
> > Richard.
> >
> > >
> > > (I also wish -gdwarf-5 did not imply -g but the ship may have shipped.)
>
> Richard, are you still going to make this change?
> (If you do it, I'll happy to ask folks to move forward with the clang
> patch https://reviews.llvm.org/D80391 )
>
> I've added a note from the original implementer (Cary Coutant) here:
> https://gcc.gnu.org/pipermail/gcc/2020-July/233074.html
>
> On the clang side, I don't think anyone has expressed that they would
> be upset by a behavior change.
>

I think I've repeatedly expressed this. My concerns are both the break in
backwards compatibility and the general direction of -g flags that don't
enable debug info emission which at least myself and a couple of other long
time dwarf contributors (Eric Christopher and Cary Contant) have been
surprised by/not the semantics that many debug info flags have.



Several folks have expressed that the semantics are complex, though,
> e.g. https://github.com/ccache/ccache/issues/393
>

Re: [PATCH v3] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-29 Thread Segher Boessenkool

Hi!

On Wed, Jul 29, 2020 at 05:02:04PM +0800, Jojo R wrote:
> 在 2020年7月24日 +0800 PM9:19，Segher Boessenkool ，写道：
> > On Fri, Jul 24, 2020 at 12:03:16PM +0200, Richard Biener via Gcc-patches 
> > wrote:
> > > It will also produce different build results depending on the build host
> > > which is even more undesirable.
> >
> > Yeah, it has to be the same number everywhere. Something high enough
> > that bigger machines benefit (so 100 or so), but not so high that the
> > overhead increases too much... It shouldn't be quite as high for
> > smaller backends... so some fixed ratio of number of define_insns
> > perhaps, something like that?
> 
> We can get bigger benefit with much more huge file in this simple solution,
> Indeed, it does not cover smaller backends.
> 
> Yes, we should consider to target all backends, I thinks there is no high 
> enough
> benefit for smaller backends, is it necessary ?

It should work everywhere.  It does not matter much at all how much it
can speed up tiny backends, but there is a large spread in how big the
bigger backends are, too.  So either we make it adjust itself, or we
will have to tune it manually, and continuously.

> > Something else. Does this make the generated compiler slower? It will
> > at least potentially have fewer inlining opportunities, but does that
> > matter?
> 
> Yes, some machine will take > 30 minutes in my testcase,
> It’s really matter in gcc development stage :(

That wasn't really my question (or I don't understand your answer).


Segher

[PATCH] OpenACC: Support GOMP_MAP_ZERO_LEN_ARRAY_SECTION

2020-07-29 Thread Andrew Stubbs

This patch adds support for zero-length arrays in OpenACC data 
transfers. Previously, trying to use an array section with zero length 
would cause a fatal error at runtime.


This patch requires that my other patch "OpenACC: Separate enter/exit 
data APIs" is already applied.


Unfortunately, because the reference counting is handled by the code 
shared with OpenMP, and because the semantics there appear to be a 
little bit different (or broken?), I've been unable to get 
acc_is_present to return true for zero-length arrays created by pragmas 
(those created via acc_create are fine). That issue will require a 
another patch, probably with more invasive changes.


The test case should cover all the main uses of zero-length arrays, and 
I've added an xfail message to highlight the known deficiency.


OK for mainline (and backport to OG10)?

Andrew
OpenACC: Support GOMP_MAP_ZERO_LEN_ARRAY_SECTION

The shared code with OpenMP use special map kinds for zero-length arrays
(detected at runtime), but the OpenACC specific code doesn't know what to do
with them.

This patch implements support for GOMP_MAP_ZERO_LEN_ARRAY_SECTION and
GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION throughout.

The last remaining problem case -- acc_is_present not reporting the array
present -- is highlighted in the testcase so it doesn't get forgotten, but will
need to be solved another time.

libgomp/ChangeLog:

	* libgomp.h (splay_compare): Ensure that distinct zero-length mappings
	aren't confused.
	* oacc-mem.c (acc_is_present): Don't reject zero-sized queries.
	(goacc_enter_datum): Likewise.
	(update_dev_host): Don't actual copy zero-length arrays.
	(goacc_enter_data_internal): Allow tgt to be null.
	(goacc_exit_data_internal): Handle GOMP_MAP_ZERO_LEN_ARRAY_SECTION and
	GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION.
	* oacc-parallel.c (GOACC_update): Handle
	  GOMP_MAP_ZERO_LEN_ARRAY_SECTION.
	* testsuite/libgomp.oacc-c/zerolengtharray.c: New test.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index f9080e9f70f..e0426acdbfe 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1026,7 +1026,8 @@ struct splay_tree_key_s {
 static inline int
 splay_compare (splay_tree_key x, splay_tree_key y)
 {
-  if (x->host_start == x->host_end
+  if (x->host_start == y->host_start
+  && x->host_start == x->host_end
   && y->host_start == y->host_end)
 return 0;
   if (x->host_end <= y->host_start)
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 45162d24786..965c81ddbd7 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -322,7 +322,7 @@ acc_is_present (void *h, size_t s)
 {
   splay_tree_key n;
 
-  if (!s || !h)
+  if (!h)
 return 0;
 
   goacc_lazy_initialize ();
@@ -534,7 +534,7 @@ goacc_enter_datum (void **hostaddrs, size_t *sizes, void *kinds, int async)
   void *d;
   splay_tree_key n;
 
-  if (!hostaddrs[0] || !sizes[0])
+  if (!hostaddrs[0])
 gomp_fatal ("[%p,+%d] is a bad range", hostaddrs[0], (int) sizes[0]);
 
   goacc_lazy_initialize ();
@@ -849,6 +849,10 @@ update_dev_host (int is_dev, void *h, size_t s, int async)
   if (h == NULL)
 return;
 
+  /* Zero length arrays registered via gomp_map_vars don't show as mapped.  */
+  if (s == 0)
+return;
+
   acc_prof_info prof_info;
   acc_api_info api_info;
   bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info);
@@ -1203,16 +1207,17 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	= gomp_map_vars_async (acc_dev, aq, groupnum, &hostaddrs[i], NULL,
    &sizes[i], &kinds[i], true,
    GOMP_MAP_VARS_ENTER_DATA);
-	  assert (tgt);
 
 	  gomp_mutex_lock (&acc_dev->lock);
 
-	  for (size_t j = 0; j < tgt->list_count; j++)
-	{
-	  n = tgt->list[j].key;
-	  if (n && !tgt->list[j].is_attach)
-		n->dynamic_refcount++;
-	}
+	  /* TGT can be null for zero-length arrays.  */
+	  if (tgt)
+	for (size_t j = 0; j < tgt->list_count; j++)
+	  {
+		n = tgt->list[j].key;
+		if (n && !tgt->list[j].is_attach)
+		  n->dynamic_refcount++;
+	  }
 	}
 
   i = group_last;
@@ -1276,6 +1281,8 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_DELETE:
 	case GOMP_MAP_RELEASE:
+	case GOMP_MAP_ZERO_LEN_ARRAY_SECTION:
+	case GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION:
 	  {
 	struct splay_tree_key_s cur_node;
 	size_t size;
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index bca31b51427..d3277e60404 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -647,6 +647,7 @@ GOACC_update (int flags_m, size_t mapnum,
 	{
 	case GOMP_MAP_POINTER:
 	case GOMP_MAP_TO_PSET:
+	case GOMP_MAP_ZERO_LEN_ARRAY_SECTION:
 	  break;
 
 	case GOMP_MAP_ALWAYS_POINTER:
diff --git a/libgomp/testsuite/libgomp.oacc-c/zerolengtharray.c b/libgomp/testsuite/libgomp.oacc-c/zerolengtharray.c
new file mode 100644
index 000..cae102cb580
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c/zerolengtharray.

[PATCH] OpenACC: Separate enter/exit data APIs

2020-07-29 Thread Andrew Stubbs

This patch does not implement anything new, but simply separates OpenACC 
'enter data' and 'exit data' into two libgomp API functions.  The 
original API name is kept for backward compatibility, but no longer 
referenced by the compiler.


The previous implementation assumed that it would always be possible to 
infer which kind of pragma it was dealing with from the context, but 
there are a few exceptions, and I want to add one more: zero-length arrays.


By cleaning this up I will be free to add the new feature without the 
reference counting getting broken.


OK for mainline (and backport to OG10)?

Andrew
OpenACC: Separate enter/exit data APIs

Move the OpenACC enter and exit data directives from using a single builtin
to having one each.  For most purposes it was easy to tell which was which,
from the directives given, but there are some exceptions.  In particular,
zero-length array copies are indistiguishable, but we still want reference
counting to work.

gcc/ChangeLog:

	* gimple-pretty-print.c (dump_gimple_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	* gimple.h (enum gf_mask): Likewise.
	(is_gimple_omp_oacc): Likewise.
	* gimplify.c (gimplify_omp_target_update): Likewise.
	* omp-builtins.def (BUILT_IN_GOACC_ENTER_EXIT_DATA): Delete.
	(BUILT_IN_GOACC_ENTER_DATA): Add new.
	(BUILT_IN_GOACC_EXIT_DATA): Add new.
	* omp-expand.c (expand_omp_target): Replace
	GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA with
	GF_OMP_TARGET_KIND_OACC_ENTER_DATA and
	GF_OMP_TARGET_KIND_OACC_EXIT_DATA.
	(build_omp_regions_1): Likewise.
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (check_omp_nesting_restrictions): Likewise.
	(lower_omp_target): Likewise.

libgomp/ChangeLog:

	* libgomp.map: Add GOACC_enter_data and GOACC_exit_data.
	* libgomp_g.h (GOACC_enter_exit_data): Delete.
	(GOACC_enter_data): New prototype.
	(GOACC_exit_data) New prototype.:
	* oacc-mem.c (GOACC_enter_exit_data): Move most of the content ...
	(GOACC_enter_exit_data_internal): ... here.
	(GOACC_enter_data): New function.
	(GOACC_exit_data) New function.:
	* oacc-parallel.c (GOACC_declare): Replace GOACC_enter_exit_data with
	  GOACC_enter_data and GOACC_exit_data.

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index e05b770138e..a4dfc493588 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1694,8 +1694,11 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
 case GF_OMP_TARGET_KIND_OACC_UPDATE:
   kind = " oacc_update";
   break;
-case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
-  kind = " oacc_enter_exit_data";
+case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+  kind = " oacc_enter_data";
+  break;
+case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
+  kind = " oacc_exit_data";
   break;
 case GF_OMP_TARGET_KIND_OACC_DECLARE:
   kind = " oacc_declare";
diff --git a/gcc/gimple.h b/gcc/gimple.h
index d64c47a7d0d..681000d2ae4 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -180,9 +180,10 @@ enum gf_mask {
 GF_OMP_TARGET_KIND_OACC_SERIAL = 7,
 GF_OMP_TARGET_KIND_OACC_DATA = 8,
 GF_OMP_TARGET_KIND_OACC_UPDATE = 9,
-GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
+GF_OMP_TARGET_KIND_OACC_ENTER_DATA = 10,
 GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
 GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
+GF_OMP_TARGET_KIND_OACC_EXIT_DATA = 13,
 GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
 GF_OMP_TEAMS_HOST		= 1 << 1,
 
@@ -6587,7 +6588,8 @@ is_gimple_omp_oacc (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_UPDATE:
-	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
+	case GF_OMP_TARGET_KIND_OACC_ENTER_DATA:
+	case GF_OMP_TARGET_KIND_OACC_EXIT_DATA:
 	case GF_OMP_TARGET_KIND_OACC_DECLARE:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	  return true;
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 15dfee903ab..6948c1ccdbb 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12950,8 +12950,11 @@ gimplify_omp_target_update (tree *expr_p, gimple_seq *pre_p)
   switch (TREE_CODE (expr))
 {
 case OACC_ENTER_DATA:
+  kind = GF_OMP_TARGET_KIND_OACC_ENTER_DATA;
+  ort = ORT_ACC;
+  break;
 case OACC_EXIT_DATA:
-  kind = GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA;
+  kind = GF_OMP_TARGET_KIND_OACC_EXIT_DATA;
   ort = ORT_ACC;
   break;
 case OACC_UPDATE:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index f461d60e52b..ab45eecb752 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -35,7 +35,10 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_START, "GOACC_data_start",
 		   BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, "GOACC_data_end",
 		   BT_FN_VOID, ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data",
+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENT

Re: [PATCH] Use exact=false for vec_safe_grow{,_cleared} functions.

2020-07-29 Thread Richard Biener via Gcc-patches

On Tue, Jul 28, 2020 at 12:23 PM Martin Liška  wrote:
>
> On 7/27/20 1:31 PM, Richard Biener wrote:
> > No, add gro*_exact variants and replace existing ones with this, then switch
> > to exact = false for the non-_exact variants.  Or add a exact=false argument
> > to all of them and make all existing calls explicitly passing true.
>
> -EBITLAZY

Indeed you are - you're missing to update a lot of vec_safe_grow[_cleared]
calls.  Esp. the lto-streamer ones warrant a , true arg since we explicitly
streamed the number of elements.

How did you select the ones you updated and those you did not? (which
is why I asked for a boiler-plate replacement of , true first, then making
the parameter defaulted to false and adjusting those cases you found)

> Anyway, I prepared a patch that adds exact=false for all the grow functions.
> Apart from selective scheduling, all other consumers seems happy. And I 
> removed
> the artificial capacity growth calculations at places mentioned by Honza.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin

New Japanese PO file for 'gcc' (version 10.2.0)

2020-07-29 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Japanese team of translators.  The file is available at:

https://translationproject.org/latest/gcc/ja.po

(This file, 'gcc-10.2.0.ja.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH][www] Document -gsplit-dwarf behavioral change

2020-07-29 Thread Richard Biener

This mentions that -gsplit-dwarf now requires -g to enable debug info
generation.

Pushed.

* gcc-11/changes.html: Mention -gsplit-dwarf behavioral change.
---
 htdocs/gcc-11/changes.html | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index d3fb9882..96955345 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -43,6 +43,9 @@ a work-in-progress.
   See https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546494.html";>the
   patch, particularly its textual description, for more
   details about the changes.
+
+  -gsplit-dwarf no longer enables debug info generation
+  on its own but requires a separate -g for this.
 
 
 
-- 
2.26.2

Re: [PATCH] Don't make -gsplit-dwarf imply -g

2020-07-29 Thread Richard Biener via Gcc-patches

On Wed, Jul 29, 2020 at 7:24 AM Fāng-ruì Sòng  wrote:
>
> On Thu, May 14, 2020 at 12:16 AM Richard Biener
>  wrote:
> >
> > On Wed, May 13, 2020 at 5:50 PM Fangrui Song  wrote:
> > >
> > > On 2020-05-13, Eric Botcazou wrote:
> > > >> Did I mention I dislike -fsplit-dwarf? ;)
> > > >
> > > >Seconded, this will be confusing for almost all users.  Since the option 
> > > >only
> > > >affects debug info generation, it should be prefixed with 'g' in any 
> > > >case.
> > >
> > > Updating the semantics of -gsplit-dwarf is actually my favorite as
> > > well:)
> > >
> > > -gsplit-dwarf is not common. Many uses have separate -g. Let's change it.
> > >
> > > Attached the patch.
> >
> > OK if there are no objections over the weekend.  I guess this change needs
> > documenting in gcc-11/changes.html (which probably does not exist yet,
> > will take care of that).
> >
> > Thanks,
> > Richard.
> >
> > >
> > > (I also wish -gdwarf-5 did not imply -g but the ship may have shipped.)
>
> Richard, are you still going to make this change?

Sorry - I thought you'd commit the patch.  I've now done that and will
amend changes.html.

Richard.

> (If you do it, I'll happy to ask folks to move forward with the clang
> patch https://reviews.llvm.org/D80391 )
>
> I've added a note from the original implementer (Cary Coutant) here:
> https://gcc.gnu.org/pipermail/gcc/2020-July/233074.html
>
> On the clang side, I don't think anyone has expressed that they would
> be upset by a behavior change.
> Several folks have expressed that the semantics are complex, though,
> e.g. https://github.com/ccache/ccache/issues/393

Re: [PATCH] Require CET support only for the final GCC build

2020-07-29 Thread Richard Biener

On Wed, 29 Jul 2020, Richard Biener wrote:

> On Wed, 29 Jul 2020, H.J. Lu wrote:
> 
> > On Wed, Jul 29, 2020 at 4:01 AM Richard Biener
> >  wrote:
> > >
> > > On Fri, Jul 17, 2020 at 5:32 PM H.J. Lu via Gcc-patches
> > >  wrote:
> > > >
> > > > On Fri, Jul 17, 2020 at 6:27 AM Richard Biener  
> > > > wrote:
> > > > >
> > > > > On Fri, 17 Jul 2020, H.J. Lu wrote:
> > > > >
> > > > > > On Fri, Jul 17, 2020 at 12:08 AM Richard Biener  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, 15 Jul 2020, Joseph Myers wrote:
> > > > > > >
> > > > > > > > On Wed, 15 Jul 2020, Richard Biener wrote:
> > > > > > > >
> > > > > > > > > But note one of the issues is that when not cross-compiling 
> > > > > > > > > we're
> > > > > > > > > using a single libiberty for target and host objects (likewise
> > > > > > > >
> > > > > > > > There shouldn't be a target libiberty, since commit
> > > > > > > > 8499116aa30a46993deff5acf73985df6b16fb8b (re PR 
> > > > > > > > regression/47836 (Some
> > > > > > > > Cross Compiler can't build target-libiberty or target-zlib), 
> > > > > > > > Wed Jun 22
> > > > > > > > 19:40:45 2011 +).  If something is causing target libiberty 
> > > > > > > > to be
> > > > > > > > built, that's a bug that should be fixed.
> > > > > > > >
> > > > > > > > > That said, giving configury an idea whether it configures for
> > > > > > > > > the host, the target or the build would be required here - 
> > > > > > > > > Joseph,
> > > > > > > > > is there an existing mechanism for example libiberty can use
> > > > > > > > > here?
> > > > > > > >
> > > > > > > > Makefile.def has some settings specific to host or build, e.g.
> > > > > > > >
> > > > > > > > build_modules= { module= libcpp;
> > > > > > > >  extra_configure_flags='--disable-nls 
> > > > > > > > am_cv_func_iconv=no';};
> > > > > > > >
> > > > > > > > or
> > > > > > > >
> > > > > > > > host_modules= { module= libiberty; bootstrap=true;
> > > > > > > > 
> > > > > > > > extra_configure_flags='@extra_host_libiberty_configure_flags@';};
> > > > > > >
> > > > > > > Ah, OK.  Looks like we should be able to add a
> > > > > > > @extra_target_cet_configure_flags@, funnel that to the 
> > > > > > > target_modules
> > > > > > > and keep CET disabled by default in the modules configury.
> > > > > > >
> > > > > > > Similarly (if HJ is correct) we'd need to add
> > > > > > > @extra_{host,build}_cet_configure_flags@ for the purpose of 
> > > > > > > lto-plugin
> > > > > > > which only has a host module (and for bootstrap host == build, so 
> > > > > > > it's
> > > > > > > shared there but we still have separate libiberties for 
> > > > > > > host/build...)
> > > > > > >
> > > > > >
> > > > > > We need -fcf-protection only on object files which will be dlopened 
> > > > > > on
> > > > > > CET enabled build and host.
> > > > >
> > > > > Why is there a distinction between dlopen and execution?  IIRC
> > > > > ld falls back to non-CET operation when dlopening a non-CET shared 
> > > > > object?
> > > >
> > > > BTW, ld.so refuses to dlopen a legacy shared object after CET has been 
> > > > enabled.
> > > > This behavior can be controlled when configuring glibc:
> > > >
> > > > '--enable-cet'
> > > > '--enable-cet=permissive'
> > > >  Enable Intel Control-flow Enforcement Technology (CET) support.
> > > >  When the GNU C Library is built with '--enable-cet' or
> > > >  '--enable-cet=permissive', the resulting library is protected with
> > > >  indirect branch tracking (IBT) and shadow stack (SHSTK).  When CET
> > > >  is enabled, the GNU C Library is compatible with all existing
> > > >  executables and shared libraries.  This feature is currently
> > > >  supported on i386, x86_64 and x32 with GCC 8 and binutils 2.29 or
> > > >  later.  Note that when CET is enabled, the GNU C Library requires
> > > >  CPUs capable of multi-byte NOPs, like x86-64 processors as well as
> > > >  Intel Pentium Pro or newer.  With '--enable-cet', it is an error to
> > > >  dlopen a non CET enabled shared library in CET enabled application.
> > > >  With '--enable-cet=permissive', CET is disabled when dlopening a
> > > >  non CET enabled shared library in CET enabled application.
> > >
> > > So getting back to this one of the issues is that --enable-cet is used
> > > for both GCC_CET_FLAGS and GCC_CET_HOST_FLAGS where
> > > for the host flag part I'd use --enable-cet=auto but for the target 
> > > library
> > > part I definitely want to know if --enable-cet cannot be honored.
> > >
> > > Your current patch would still prohibit a non-bootstrap build with a host
> > > compiler not supporting CET and requesting CET enabled target libraries,
> > > thus
> > >
> > > ../configure --enable-cet --disable-bootstrap
> > >
> > > would fail.  Shouldn't we - for the host part - simply treat 'yes' equal
> > > to 'auto'?  If not, then we should have --enable-cet-host.  Which would
> > > be somewhat misleading since cc1 i

Re: [PATCH 0/2] cpp: fix __has_include in traditional mode

2020-07-29 Thread Tiziano Müller


Hi Nathan,

On 29.07.20 14:52, Nathan Sidwell wrote:

On 7/28/20 3:28 AM, Tiziano Müller wrote:

Hi there,

this patch intends to fix the regression introduced in gcc 10 with 
__has_include,

see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95889


Thanks, do you have write access?  I don't see your name in MAINTAINERS.


No, I don't.

[for avoidance of doubt, this is sufficiently restricted I'm not 
concerned with copyright assignment, there is only one way to write this 
change]


Me neither, just glad to get this fixed ;-)

Best,
Tiziano

--
Tiziano Müller
University of Zurich
Department of Chemistry
Winterthurerstrasse 190
CH-8057 Zürich

Tel: +41 44 63 54234
www.chem.uzh.ch
tiziano.muel...@chem.uzh.ch

RE: [PATCH v2][GCC] arm: Enable no-writeback vldr.16/vstr.16.

2020-07-29 Thread Kyrylo Tkachov

Hi Joe,

> -Original Message-
> From: Gcc-patches  On Behalf Of Joe
> Ramsay
> Sent: 29 July 2020 11:45
> To: François Dumont via Gcc-patches 
> Subject: [PATCH v2][GCC] arm: Enable no-writeback vldr.16/vstr.16.
> 
> Hi,
> 
> There was previously no way to specify that a register operand cannot
> have any writeback modifiers, and as a result the argument to vldr.16
> and vstr.16 could be erroneously output with post-increment. This
> change adds a constraint which forbids all writeback, and
> selects it in the relevant case for vldr.16 and vstr.16
> 
> Bootstrapped on arm-linux, gcc and CMSIS-DSP testsuites are clean.
> Is this patch OK for trunk? If yes, please commit on my behalf as I don't
> have commit rights.
> 

Thanks, I've pushed this to master for you.
For future patches can you please apply for write-after-approval commit access 
using the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing my 
email from the MAINTAINERS file as sponsor.

Kyrill


> Thanks,
> Joe
> 
> gcc/ChangeLog:
> 
> 2020-05-20  Joe Ramsay  
> 
> * config/arm/arm-protos.h (arm_coproc_mem_operand_no_writeback):
> Declare prototype.
> (arm_mve_mode_and_operands_type_check): Declare prototype.
> * config/arm/arm.c (arm_coproc_mem_operand): Refactor to use
> _arm_coproc_mem_operand.
> (arm_coproc_mem_operand_wb): New function to cover full, limited
> and no writeback.
> (arm_coproc_mem_operand_no_writeback): New constraint for
> memory operand with no writeback.
> (arm_print_operand): Extend 'E' specifier for memory operand that does
> not support
> writeback.
> (arm_mve_mode_and_operands_type_check): New constraint check for
> MVE memory operands.
> * config/arm/constraints.md: Add Uj constraint for VFP vldr.16 and
> vstr.16.
> * config/arm/vfp.md (*mov_load_vfp_hf16): New pattern for vldr.16.
> (*mov_store_vfp_hf16): New pattern for vstr.16.
> (*mov_vfp_16): Remove MVE moves.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-05-20  Joe Ramsay  
> 
> * gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: New test.
> ---
>  gcc/config/arm/arm-protos.h|  3 +
>  gcc/config/arm/arm.c   | 74 
> ++
>  gcc/config/arm/constraints.md  |  7 ++
>  gcc/config/arm/vfp.md  | 26 +---
>  .../arm/mve/intrinsics/mve-vldstr16-no-writeback.c | 17 +
>  5 files changed, 105 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-
> vldstr16-no-writeback.c
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 33d162c..e811da4 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -115,8 +115,11 @@ extern enum reg_class
> coproc_secondary_reload_class (machine_mode, rtx,
>  extern bool arm_tls_referenced_p (rtx);
> 
>  extern int arm_coproc_mem_operand (rtx, bool);
> +extern int arm_coproc_mem_operand_no_writeback (rtx);
> +extern int arm_coproc_mem_operand_wb (rtx, int);
>  extern int neon_vector_mem_operand (rtx, int, bool);
>  extern int mve_vector_mem_operand (machine_mode, rtx, bool);
> +bool arm_mve_mode_and_operands_type_check (machine_mode, rtx, rtx);
>  extern int neon_struct_mem_operand (rtx);
> 
>  extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 6b7ca82..63e052f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -13217,13 +13217,14 @@ neon_element_bits (machine_mode mode)
>  /* Predicates for `match_operand' and `match_operator'.  */
> 
>  /* Return TRUE if OP is a valid coprocessor memory address pattern.
> -   WB is true if full writeback address modes are allowed and is false
> +   WB level is 2 if full writeback address modes are allowed, 1
> if limited writeback address modes (POST_INC and PRE_DEC) are
> -   allowed.  */
> +   allowed and 0 if no writeback at all is supported.  */
> 
>  int
> -arm_coproc_mem_operand (rtx op, bool wb)
> +arm_coproc_mem_operand_wb (rtx op, int wb_level)
>  {
> +  gcc_assert (wb_level == 0 || wb_level == 1 || wb_level == 2);
>rtx ind;
> 
>/* Reject eliminable registers.  */
> @@ -13256,16 +13257,18 @@ arm_coproc_mem_operand (rtx op, bool wb)
> 
>/* Autoincremment addressing modes.  POST_INC and PRE_DEC are
>   acceptable in any case (subject to verification by
> - arm_address_register_rtx_p).  We need WB to be true to accept
> + arm_address_register_rtx_p).  We need full writeback to accept
> + PRE_INC and POST_DEC, and at least restricted writeback for
>   PRE_INC and POST_DEC.  */
> -  if (GET_CODE (ind) == POST_INC
> -  || GET_CODE (ind) == PRE_DEC
> -  || (wb
> -   && (GET_CODE (ind) == PRE_INC
> -   || GET_CODE (ind) == POST_DEC)))
> +  if (wb_level > 0
> +  && (GET_CODE (ind) == POST_INC
> +   || GET_CO

Re: [Patch] OpenMP: Permit in Fortran omp target data without map

2020-07-29 Thread Jakub Jelinek via Gcc-patches

On Wed, Jul 29, 2020 at 02:57:25PM +0200, Tobias Burnus wrote:
> I have missed this OpenMP 5 feature already a couple of
> times myself ...
> (The other related OpenMP 5 feature is that 'map(x) use_device_ptr(x)'
> is permitted on the same directive; but that already works and
> is tested for in the large use_device*.f90 libgomp tests.)
> 
> OK?

Ok, thanks.

Jakub

[Patch] OpenMP: Permit in Fortran omp target data without map

2020-07-29 Thread Tobias Burnus


I have missed this OpenMP 5 feature already a couple of
times myself ...
(The other related OpenMP 5 feature is that 'map(x) use_device_ptr(x)'
is permitted on the same directive; but that already works and
is tested for in the large use_device*.f90 libgomp tests.)

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Permit in Fortran omp target data without map

gcc/fortran/ChangeLog:

	* openmp.c (resolve_omp_clauses): Permit 'omp target data' without
	map if use_device_{addr,ptr} is present.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/map-3.f90: New test.
	* gfortran.dg/gomp/map-4.f90: New test.

 gcc/fortran/openmp.c | 12 +++---
 gcc/testsuite/gfortran.dg/gomp/map-3.f90 | 38 
 gcc/testsuite/gfortran.dg/gomp/map-4.f90 |  7 ++
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 0fd998839b2..16f39a4e086 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5330,17 +5330,23 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   if (omp_clauses->depend_source && code->op != EXEC_OMP_ORDERED)
 gfc_error ("SOURCE dependence type only allowed "
 	   "on ORDERED directive at %L", &code->loc);
-  if (!openacc && code && omp_clauses->lists[OMP_LIST_MAP] == NULL)
+  if (!openacc
+  && code
+  && omp_clauses->lists[OMP_LIST_MAP] == NULL
+  && omp_clauses->lists[OMP_LIST_USE_DEVICE_PTR] == NULL
+  && omp_clauses->lists[OMP_LIST_USE_DEVICE_ADDR] == NULL)
 {
   const char *p = NULL;
   switch (code->op)
 	{
-	case EXEC_OMP_TARGET_DATA: p = "TARGET DATA"; break;
 	case EXEC_OMP_TARGET_ENTER_DATA: p = "TARGET ENTER DATA"; break;
 	case EXEC_OMP_TARGET_EXIT_DATA: p = "TARGET EXIT DATA"; break;
 	default: break;
 	}
-  if (p)
+  if (code->op == EXEC_OMP_TARGET_DATA)
+	gfc_error ("TARGET DATA must contain at least one MAP, USE_DEVICE_PTR, "
+		   "or USE_DEVICE_ADDR clause at %L", &code->loc);
+  else if (p)
 	gfc_error ("%s must contain at least one MAP clause at %L",
 		   p, &code->loc);
 }
diff --git a/gcc/testsuite/gfortran.dg/gomp/map-3.f90 b/gcc/testsuite/gfortran.dg/gomp/map-3.f90
new file mode 100644
index 000..13f63647bda
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/map-3.f90
@@ -0,0 +1,38 @@
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine bar
+integer, target :: x
+integer, allocatable, target :: y(:,:), z(:,:)
+x = 7
+!$omp target enter data map(to:x)
+
+x = 8
+!$omp target data map(always, to: x)
+call foo(x)
+!$omp end target data
+
+!$omp target data use_device_ptr(x)
+call foo2(x)
+!$omp end target data
+
+!$omp target data use_device_addr(x)
+call foo2(x)
+!$omp end target data
+!$omp target exit data map(release:x)
+
+!$omp target data map(y) use_device_addr(y)
+call foo3(y)
+!$omp end target data
+
+!$omp target data map(z) use_device_ptr(z)
+call foo3(z)
+!$omp end target data
+end 
+
+! { dg-final { scan-tree-dump-times "#pragma omp target enter data map\\(to:x\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(always,to:x\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data use_device_ptr\\(x\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data use_device_addr\\(x\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target exit data map\\(release:x\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:\\*\\(c_char \\*\\) y.data \\\[len: .*\\) map\\(to:y \\\[pointer set, len: .*\\) map\\(alloc:.*y.data \\\[pointer assign, bias: 0\\\]\\) use_device_addr\\(y\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target data map\\(tofrom:\\*\\(c_char \\*\\) z.data \\\[len: .*\\) map\\(to:z \\\[pointer set, len: .*\\) map\\(alloc:.*z.data \\\[pointer assign, bias: 0\\\]\\) use_device_ptr\\(z\\)" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/gomp/map-4.f90 b/gcc/testsuite/gfortran.dg/gomp/map-4.f90
new file mode 100644
index 000..3aa1c8096d7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/map-4.f90
@@ -0,0 +1,7 @@
+!$omp target enter data device(1) if (.true.) nowait ! { dg-error "TARGET ENTER DATA must contain at least one MAP clause" }
+
+!$omp target data device(1)  ! { dg-error "TARGET DATA must contain at least one MAP, USE_DEVICE_PTR, or USE_DEVICE_ADDR clause" }
+!$omp endtarget data
+
+!$omp target exit data device(1) if (.true.) nowait ! { dg-error "TARGET EXIT DATA must contain at least one MAP clause" }
+end

Re: [PATCH 0/2] cpp: fix __has_include in traditional mode

2020-07-29 Thread Nathan Sidwell


On 7/28/20 3:28 AM, Tiziano Müller wrote:

Hi there,

this patch intends to fix the regression introduced in gcc 10 with 
__has_include,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95889


Thanks, do you have write access?  I don't see your name in MAINTAINERS.

[for avoidance of doubt, this is sufficiently restricted I'm not 
concerned with copyright assignment, there is only one way to write this 
change]


nathan


Best,
Tiziano

Tiziano Müller (2):
   libcpp: fix __has_include handling with traditional-cpp
   [testsuite] add tests for __has_include with traditional-cpp

  gcc/testsuite/ChangeLog   |  4 ++
  .../cpp/has-include-1-traditional.c   | 38 +++
  libcpp/init.c |  3 ++
  libcpp/traditional.c  |  4 +-
  4 files changed, 48 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/c-c++-common/cpp/has-include-1-traditional.c




--
Nathan Sidwell

Re: [PATCH v2] [RISC-V] Add support for TLS stack protector canary access

2020-07-29 Thread Cooper Qu via Gcc-patches

Sorry for later replay, I will add testcases on a following patch if the 
patch is accepted.



Regards,

Cooper

On 2020/7/28 上午9:23, Kito Cheng wrote:

Add testcase later is OK to me.

On Tue, Jul 28, 2020 at 6:55 AM Jim Wilson  wrote:

On Sun, Jul 19, 2020 at 7:04 PM cooper  wrote:

Ping

On 2020/7/13 下午4:15, cooper wrote:

gcc/
   * config/riscv/riscv-opts.h (stack_protector_guard): New enum.
   * config/riscv/riscv.c (riscv_option_override): Handle
   the new options.
   * config/riscv/riscv.md (stack_protect_set): New pattern to handle
   flexible stack protector guard settings.
   (stack_protect_set_): Ditto.
   (stack_protect_test): Ditto.
   (stack_protect_test_): Ditto.
   * config/riscv/riscv.opt (mstack-protector-guard=,
   mstack-protector-guard-reg=, mstack-protector-guard-offset=): New
   options.
   * doc/invoke.texi (Option Summary) [RISC-V Options]:
   Add -mstack-protector-guard=, -mstack-protector-guard-reg=, and
   -mstack-protector-guard-offset=.
   (RISC-V Options): Ditto.

The v2 patch looks fine to me.  Meanwhile, Kito asked for testcases
which would be nice to have but I don't think is critical considering
that this has already been tested with a kernel build.  Maybe the
testcases can be a follow on patch?  I'd like to see forward movement
on this, even if we accept a patch without the testcases.

Jim

[PATCH] gcc-changelog: fix combining of arguments.

2020-07-29 Thread Martin Liška


One obvious fix we've just hit with Honza.

Martin

contrib/ChangeLog:

2020-07-29  Martin Liska  

* git-backport.py: fix how are ChangeLog paths combined.
---
 contrib/git-backport.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/git-backport.py b/contrib/git-backport.py
index 3a9413dcd27..2b8e4686719 100755
--- a/contrib/git-backport.py
+++ b/contrib/git-backport.py
@@ -46,7 +46,7 @@ if __name__ == '__main__':
 conflicts = out.strip().split('\n')
 changelogs = [c for c in conflicts if c.endswith('ChangeLog')]
 if changelogs:
-cmd = 'git checkout --theirs %s' % '\n'.join(changelogs)
+cmd = 'git checkout --theirs %s' % ' '.join(changelogs)
 subprocess.check_output(cmd, shell=True)
 # 2) remove all ChangeLog files from index
 cmd = 'git diff --name-only --diff-filter=M HEAD'
--
2.27.0

Re: [PATCH] Require CET support only for the final GCC build

2020-07-29 Thread Richard Biener

On Wed, 29 Jul 2020, H.J. Lu wrote:

> On Wed, Jul 29, 2020 at 4:01 AM Richard Biener
>  wrote:
> >
> > On Fri, Jul 17, 2020 at 5:32 PM H.J. Lu via Gcc-patches
> >  wrote:
> > >
> > > On Fri, Jul 17, 2020 at 6:27 AM Richard Biener  wrote:
> > > >
> > > > On Fri, 17 Jul 2020, H.J. Lu wrote:
> > > >
> > > > > On Fri, Jul 17, 2020 at 12:08 AM Richard Biener  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 15 Jul 2020, Joseph Myers wrote:
> > > > > >
> > > > > > > On Wed, 15 Jul 2020, Richard Biener wrote:
> > > > > > >
> > > > > > > > But note one of the issues is that when not cross-compiling 
> > > > > > > > we're
> > > > > > > > using a single libiberty for target and host objects (likewise
> > > > > > >
> > > > > > > There shouldn't be a target libiberty, since commit
> > > > > > > 8499116aa30a46993deff5acf73985df6b16fb8b (re PR regression/47836 
> > > > > > > (Some
> > > > > > > Cross Compiler can't build target-libiberty or target-zlib), Wed 
> > > > > > > Jun 22
> > > > > > > 19:40:45 2011 +).  If something is causing target libiberty 
> > > > > > > to be
> > > > > > > built, that's a bug that should be fixed.
> > > > > > >
> > > > > > > > That said, giving configury an idea whether it configures for
> > > > > > > > the host, the target or the build would be required here - 
> > > > > > > > Joseph,
> > > > > > > > is there an existing mechanism for example libiberty can use
> > > > > > > > here?
> > > > > > >
> > > > > > > Makefile.def has some settings specific to host or build, e.g.
> > > > > > >
> > > > > > > build_modules= { module= libcpp;
> > > > > > >  extra_configure_flags='--disable-nls 
> > > > > > > am_cv_func_iconv=no';};
> > > > > > >
> > > > > > > or
> > > > > > >
> > > > > > > host_modules= { module= libiberty; bootstrap=true;
> > > > > > > 
> > > > > > > extra_configure_flags='@extra_host_libiberty_configure_flags@';};
> > > > > >
> > > > > > Ah, OK.  Looks like we should be able to add a
> > > > > > @extra_target_cet_configure_flags@, funnel that to the 
> > > > > > target_modules
> > > > > > and keep CET disabled by default in the modules configury.
> > > > > >
> > > > > > Similarly (if HJ is correct) we'd need to add
> > > > > > @extra_{host,build}_cet_configure_flags@ for the purpose of 
> > > > > > lto-plugin
> > > > > > which only has a host module (and for bootstrap host == build, so 
> > > > > > it's
> > > > > > shared there but we still have separate libiberties for 
> > > > > > host/build...)
> > > > > >
> > > > >
> > > > > We need -fcf-protection only on object files which will be dlopened on
> > > > > CET enabled build and host.
> > > >
> > > > Why is there a distinction between dlopen and execution?  IIRC
> > > > ld falls back to non-CET operation when dlopening a non-CET shared 
> > > > object?
> > >
> > > BTW, ld.so refuses to dlopen a legacy shared object after CET has been 
> > > enabled.
> > > This behavior can be controlled when configuring glibc:
> > >
> > > '--enable-cet'
> > > '--enable-cet=permissive'
> > >  Enable Intel Control-flow Enforcement Technology (CET) support.
> > >  When the GNU C Library is built with '--enable-cet' or
> > >  '--enable-cet=permissive', the resulting library is protected with
> > >  indirect branch tracking (IBT) and shadow stack (SHSTK).  When CET
> > >  is enabled, the GNU C Library is compatible with all existing
> > >  executables and shared libraries.  This feature is currently
> > >  supported on i386, x86_64 and x32 with GCC 8 and binutils 2.29 or
> > >  later.  Note that when CET is enabled, the GNU C Library requires
> > >  CPUs capable of multi-byte NOPs, like x86-64 processors as well as
> > >  Intel Pentium Pro or newer.  With '--enable-cet', it is an error to
> > >  dlopen a non CET enabled shared library in CET enabled application.
> > >  With '--enable-cet=permissive', CET is disabled when dlopening a
> > >  non CET enabled shared library in CET enabled application.
> >
> > So getting back to this one of the issues is that --enable-cet is used
> > for both GCC_CET_FLAGS and GCC_CET_HOST_FLAGS where
> > for the host flag part I'd use --enable-cet=auto but for the target library
> > part I definitely want to know if --enable-cet cannot be honored.
> >
> > Your current patch would still prohibit a non-bootstrap build with a host
> > compiler not supporting CET and requesting CET enabled target libraries,
> > thus
> >
> > ../configure --enable-cet --disable-bootstrap
> >
> > would fail.  Shouldn't we - for the host part - simply treat 'yes' equal
> > to 'auto'?  If not, then we should have --enable-cet-host.  Which would
> > be somewhat misleading since cc1 isn't built CET enabled, just
> > lto-plugin.so is, so better --enable-lto-plugin-cet?
> >
> > Thus I'd go with the simpler of both:
> >
> > diff --git a/config/cet.m4 b/config/cet.m4
> > index d9608699cd5..fb4e4275413 100644
> > --- a/config/cet.m4
> > +++ b/config/cet.m4
> > @@ -64

Re: [PATCH] Require CET support only for the final GCC build

2020-07-29 Thread H.J. Lu via Gcc-patches

On Wed, Jul 29, 2020 at 4:01 AM Richard Biener
 wrote:
>
> On Fri, Jul 17, 2020 at 5:32 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > On Fri, Jul 17, 2020 at 6:27 AM Richard Biener  wrote:
> > >
> > > On Fri, 17 Jul 2020, H.J. Lu wrote:
> > >
> > > > On Fri, Jul 17, 2020 at 12:08 AM Richard Biener  
> > > > wrote:
> > > > >
> > > > > On Wed, 15 Jul 2020, Joseph Myers wrote:
> > > > >
> > > > > > On Wed, 15 Jul 2020, Richard Biener wrote:
> > > > > >
> > > > > > > But note one of the issues is that when not cross-compiling we're
> > > > > > > using a single libiberty for target and host objects (likewise
> > > > > >
> > > > > > There shouldn't be a target libiberty, since commit
> > > > > > 8499116aa30a46993deff5acf73985df6b16fb8b (re PR regression/47836 
> > > > > > (Some
> > > > > > Cross Compiler can't build target-libiberty or target-zlib), Wed 
> > > > > > Jun 22
> > > > > > 19:40:45 2011 +).  If something is causing target libiberty to 
> > > > > > be
> > > > > > built, that's a bug that should be fixed.
> > > > > >
> > > > > > > That said, giving configury an idea whether it configures for
> > > > > > > the host, the target or the build would be required here - Joseph,
> > > > > > > is there an existing mechanism for example libiberty can use
> > > > > > > here?
> > > > > >
> > > > > > Makefile.def has some settings specific to host or build, e.g.
> > > > > >
> > > > > > build_modules= { module= libcpp;
> > > > > >  extra_configure_flags='--disable-nls 
> > > > > > am_cv_func_iconv=no';};
> > > > > >
> > > > > > or
> > > > > >
> > > > > > host_modules= { module= libiberty; bootstrap=true;
> > > > > > 
> > > > > > extra_configure_flags='@extra_host_libiberty_configure_flags@';};
> > > > >
> > > > > Ah, OK.  Looks like we should be able to add a
> > > > > @extra_target_cet_configure_flags@, funnel that to the target_modules
> > > > > and keep CET disabled by default in the modules configury.
> > > > >
> > > > > Similarly (if HJ is correct) we'd need to add
> > > > > @extra_{host,build}_cet_configure_flags@ for the purpose of lto-plugin
> > > > > which only has a host module (and for bootstrap host == build, so it's
> > > > > shared there but we still have separate libiberties for host/build...)
> > > > >
> > > >
> > > > We need -fcf-protection only on object files which will be dlopened on
> > > > CET enabled build and host.
> > >
> > > Why is there a distinction between dlopen and execution?  IIRC
> > > ld falls back to non-CET operation when dlopening a non-CET shared object?
> >
> > BTW, ld.so refuses to dlopen a legacy shared object after CET has been 
> > enabled.
> > This behavior can be controlled when configuring glibc:
> >
> > '--enable-cet'
> > '--enable-cet=permissive'
> >  Enable Intel Control-flow Enforcement Technology (CET) support.
> >  When the GNU C Library is built with '--enable-cet' or
> >  '--enable-cet=permissive', the resulting library is protected with
> >  indirect branch tracking (IBT) and shadow stack (SHSTK).  When CET
> >  is enabled, the GNU C Library is compatible with all existing
> >  executables and shared libraries.  This feature is currently
> >  supported on i386, x86_64 and x32 with GCC 8 and binutils 2.29 or
> >  later.  Note that when CET is enabled, the GNU C Library requires
> >  CPUs capable of multi-byte NOPs, like x86-64 processors as well as
> >  Intel Pentium Pro or newer.  With '--enable-cet', it is an error to
> >  dlopen a non CET enabled shared library in CET enabled application.
> >  With '--enable-cet=permissive', CET is disabled when dlopening a
> >  non CET enabled shared library in CET enabled application.
>
> So getting back to this one of the issues is that --enable-cet is used
> for both GCC_CET_FLAGS and GCC_CET_HOST_FLAGS where
> for the host flag part I'd use --enable-cet=auto but for the target library
> part I definitely want to know if --enable-cet cannot be honored.
>
> Your current patch would still prohibit a non-bootstrap build with a host
> compiler not supporting CET and requesting CET enabled target libraries,
> thus
>
> ../configure --enable-cet --disable-bootstrap
>
> would fail.  Shouldn't we - for the host part - simply treat 'yes' equal
> to 'auto'?  If not, then we should have --enable-cet-host.  Which would
> be somewhat misleading since cc1 isn't built CET enabled, just
> lto-plugin.so is, so better --enable-lto-plugin-cet?
>
> Thus I'd go with the simpler of both:
>
> diff --git a/config/cet.m4 b/config/cet.m4
> index d9608699cd5..fb4e4275413 100644
> --- a/config/cet.m4
> +++ b/config/cet.m4
> @@ -64,7 +64,7 @@ case "$host" in
>  save_CFLAGS="$CFLAGS"
>  CFLAGS="$CFLAGS -fcf-protection"
>  case "$enable_cet" in
> -  auto)
> +  auto|yes)
> # Check if target supports multi-byte NOPs
> # and if assembler supports CET insn.
> AC_COMPILE_IFELSE(
> @@ -80,15 +80,6 @@ asm ("setssbsy");
>  [e

Re: [PATCH] Require CET support only for the final GCC build

2020-07-29 Thread Richard Biener

On Wed, 29 Jul 2020, Richard Biener wrote:

> On Fri, Jul 17, 2020 at 5:32 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > On Fri, Jul 17, 2020 at 6:27 AM Richard Biener  wrote:
> > >
> > > On Fri, 17 Jul 2020, H.J. Lu wrote:
> > >
> > > > On Fri, Jul 17, 2020 at 12:08 AM Richard Biener  
> > > > wrote:
> > > > >
> > > > > On Wed, 15 Jul 2020, Joseph Myers wrote:
> > > > >
> > > > > > On Wed, 15 Jul 2020, Richard Biener wrote:
> > > > > >
> > > > > > > But note one of the issues is that when not cross-compiling we're
> > > > > > > using a single libiberty for target and host objects (likewise
> > > > > >
> > > > > > There shouldn't be a target libiberty, since commit
> > > > > > 8499116aa30a46993deff5acf73985df6b16fb8b (re PR regression/47836 
> > > > > > (Some
> > > > > > Cross Compiler can't build target-libiberty or target-zlib), Wed 
> > > > > > Jun 22
> > > > > > 19:40:45 2011 +).  If something is causing target libiberty to 
> > > > > > be
> > > > > > built, that's a bug that should be fixed.
> > > > > >
> > > > > > > That said, giving configury an idea whether it configures for
> > > > > > > the host, the target or the build would be required here - Joseph,
> > > > > > > is there an existing mechanism for example libiberty can use
> > > > > > > here?
> > > > > >
> > > > > > Makefile.def has some settings specific to host or build, e.g.
> > > > > >
> > > > > > build_modules= { module= libcpp;
> > > > > >  extra_configure_flags='--disable-nls 
> > > > > > am_cv_func_iconv=no';};
> > > > > >
> > > > > > or
> > > > > >
> > > > > > host_modules= { module= libiberty; bootstrap=true;
> > > > > > 
> > > > > > extra_configure_flags='@extra_host_libiberty_configure_flags@';};
> > > > >
> > > > > Ah, OK.  Looks like we should be able to add a
> > > > > @extra_target_cet_configure_flags@, funnel that to the target_modules
> > > > > and keep CET disabled by default in the modules configury.
> > > > >
> > > > > Similarly (if HJ is correct) we'd need to add
> > > > > @extra_{host,build}_cet_configure_flags@ for the purpose of lto-plugin
> > > > > which only has a host module (and for bootstrap host == build, so it's
> > > > > shared there but we still have separate libiberties for host/build...)
> > > > >
> > > >
> > > > We need -fcf-protection only on object files which will be dlopened on
> > > > CET enabled build and host.
> > >
> > > Why is there a distinction between dlopen and execution?  IIRC
> > > ld falls back to non-CET operation when dlopening a non-CET shared object?
> >
> > BTW, ld.so refuses to dlopen a legacy shared object after CET has been 
> > enabled.
> > This behavior can be controlled when configuring glibc:
> >
> > '--enable-cet'
> > '--enable-cet=permissive'
> >  Enable Intel Control-flow Enforcement Technology (CET) support.
> >  When the GNU C Library is built with '--enable-cet' or
> >  '--enable-cet=permissive', the resulting library is protected with
> >  indirect branch tracking (IBT) and shadow stack (SHSTK).  When CET
> >  is enabled, the GNU C Library is compatible with all existing
> >  executables and shared libraries.  This feature is currently
> >  supported on i386, x86_64 and x32 with GCC 8 and binutils 2.29 or
> >  later.  Note that when CET is enabled, the GNU C Library requires
> >  CPUs capable of multi-byte NOPs, like x86-64 processors as well as
> >  Intel Pentium Pro or newer.  With '--enable-cet', it is an error to
> >  dlopen a non CET enabled shared library in CET enabled application.
> >  With '--enable-cet=permissive', CET is disabled when dlopening a
> >  non CET enabled shared library in CET enabled application.
> 
> So getting back to this one of the issues is that --enable-cet is used
> for both GCC_CET_FLAGS and GCC_CET_HOST_FLAGS where
> for the host flag part I'd use --enable-cet=auto but for the target library
> part I definitely want to know if --enable-cet cannot be honored.
> 
> Your current patch would still prohibit a non-bootstrap build with a host
> compiler not supporting CET and requesting CET enabled target libraries,
> thus
> 
> ../configure --enable-cet --disable-bootstrap
> 
> would fail.  Shouldn't we - for the host part - simply treat 'yes' equal
> to 'auto'?  If not, then we should have --enable-cet-host.  Which would
> be somewhat misleading since cc1 isn't built CET enabled, just
> lto-plugin.so is, so better --enable-lto-plugin-cet?
> 
> Thus I'd go with the simpler of both:
> 
> diff --git a/config/cet.m4 b/config/cet.m4
> index d9608699cd5..fb4e4275413 100644
> --- a/config/cet.m4
> +++ b/config/cet.m4
> @@ -64,7 +64,7 @@ case "$host" in
>  save_CFLAGS="$CFLAGS"
>  CFLAGS="$CFLAGS -fcf-protection"
>  case "$enable_cet" in
> -  auto)
> +  auto|yes)
> # Check if target supports multi-byte NOPs
> # and if assembler supports CET insn.
> AC_COMPILE_IFELSE(
> @@ -80,15 +80,6 @@ asm ("setssbsy");
>  [enable_c

Re: [PATCH] Require CET support only for the final GCC build

2020-07-29 Thread Richard Biener via Gcc-patches

On Fri, Jul 17, 2020 at 5:32 PM H.J. Lu via Gcc-patches
 wrote:
>
> On Fri, Jul 17, 2020 at 6:27 AM Richard Biener  wrote:
> >
> > On Fri, 17 Jul 2020, H.J. Lu wrote:
> >
> > > On Fri, Jul 17, 2020 at 12:08 AM Richard Biener  wrote:
> > > >
> > > > On Wed, 15 Jul 2020, Joseph Myers wrote:
> > > >
> > > > > On Wed, 15 Jul 2020, Richard Biener wrote:
> > > > >
> > > > > > But note one of the issues is that when not cross-compiling we're
> > > > > > using a single libiberty for target and host objects (likewise
> > > > >
> > > > > There shouldn't be a target libiberty, since commit
> > > > > 8499116aa30a46993deff5acf73985df6b16fb8b (re PR regression/47836 (Some
> > > > > Cross Compiler can't build target-libiberty or target-zlib), Wed Jun 
> > > > > 22
> > > > > 19:40:45 2011 +).  If something is causing target libiberty to be
> > > > > built, that's a bug that should be fixed.
> > > > >
> > > > > > That said, giving configury an idea whether it configures for
> > > > > > the host, the target or the build would be required here - Joseph,
> > > > > > is there an existing mechanism for example libiberty can use
> > > > > > here?
> > > > >
> > > > > Makefile.def has some settings specific to host or build, e.g.
> > > > >
> > > > > build_modules= { module= libcpp;
> > > > >  extra_configure_flags='--disable-nls 
> > > > > am_cv_func_iconv=no';};
> > > > >
> > > > > or
> > > > >
> > > > > host_modules= { module= libiberty; bootstrap=true;
> > > > > 
> > > > > extra_configure_flags='@extra_host_libiberty_configure_flags@';};
> > > >
> > > > Ah, OK.  Looks like we should be able to add a
> > > > @extra_target_cet_configure_flags@, funnel that to the target_modules
> > > > and keep CET disabled by default in the modules configury.
> > > >
> > > > Similarly (if HJ is correct) we'd need to add
> > > > @extra_{host,build}_cet_configure_flags@ for the purpose of lto-plugin
> > > > which only has a host module (and for bootstrap host == build, so it's
> > > > shared there but we still have separate libiberties for host/build...)
> > > >
> > >
> > > We need -fcf-protection only on object files which will be dlopened on
> > > CET enabled build and host.
> >
> > Why is there a distinction between dlopen and execution?  IIRC
> > ld falls back to non-CET operation when dlopening a non-CET shared object?
>
> BTW, ld.so refuses to dlopen a legacy shared object after CET has been 
> enabled.
> This behavior can be controlled when configuring glibc:
>
> '--enable-cet'
> '--enable-cet=permissive'
>  Enable Intel Control-flow Enforcement Technology (CET) support.
>  When the GNU C Library is built with '--enable-cet' or
>  '--enable-cet=permissive', the resulting library is protected with
>  indirect branch tracking (IBT) and shadow stack (SHSTK).  When CET
>  is enabled, the GNU C Library is compatible with all existing
>  executables and shared libraries.  This feature is currently
>  supported on i386, x86_64 and x32 with GCC 8 and binutils 2.29 or
>  later.  Note that when CET is enabled, the GNU C Library requires
>  CPUs capable of multi-byte NOPs, like x86-64 processors as well as
>  Intel Pentium Pro or newer.  With '--enable-cet', it is an error to
>  dlopen a non CET enabled shared library in CET enabled application.
>  With '--enable-cet=permissive', CET is disabled when dlopening a
>  non CET enabled shared library in CET enabled application.

So getting back to this one of the issues is that --enable-cet is used
for both GCC_CET_FLAGS and GCC_CET_HOST_FLAGS where
for the host flag part I'd use --enable-cet=auto but for the target library
part I definitely want to know if --enable-cet cannot be honored.

Your current patch would still prohibit a non-bootstrap build with a host
compiler not supporting CET and requesting CET enabled target libraries,
thus

../configure --enable-cet --disable-bootstrap

would fail.  Shouldn't we - for the host part - simply treat 'yes' equal
to 'auto'?  If not, then we should have --enable-cet-host.  Which would
be somewhat misleading since cc1 isn't built CET enabled, just
lto-plugin.so is, so better --enable-lto-plugin-cet?

Thus I'd go with the simpler of both:

diff --git a/config/cet.m4 b/config/cet.m4
index d9608699cd5..fb4e4275413 100644
--- a/config/cet.m4
+++ b/config/cet.m4
@@ -64,7 +64,7 @@ case "$host" in
 save_CFLAGS="$CFLAGS"
 CFLAGS="$CFLAGS -fcf-protection"
 case "$enable_cet" in
-  auto)
+  auto|yes)
# Check if target supports multi-byte NOPs
# and if assembler supports CET insn.
AC_COMPILE_IFELSE(
@@ -80,15 +80,6 @@ asm ("setssbsy");
 [enable_cet=yes],
 [enable_cet=no])
;;
-  yes)
-   # Check if assembler supports CET.
-   AC_COMPILE_IFELSE(
-[AC_LANG_PROGRAM(
- [],
- [asm ("setssbsy");])],
-[],
-[AC_MSG_ERROR([assembler with CET support is required for
--enable-cet])]

[PATCH v2][GCC] arm: Enable no-writeback vldr.16/vstr.16.

2020-07-29 Thread Joe Ramsay

Hi,

There was previously no way to specify that a register operand cannot
have any writeback modifiers, and as a result the argument to vldr.16
and vstr.16 could be erroneously output with post-increment. This
change adds a constraint which forbids all writeback, and
selects it in the relevant case for vldr.16 and vstr.16

Bootstrapped on arm-linux, gcc and CMSIS-DSP testsuites are clean.
Is this patch OK for trunk? If yes, please commit on my behalf as I don't
have commit rights.

Thanks,
Joe

gcc/ChangeLog:

2020-05-20  Joe Ramsay  

* config/arm/arm-protos.h (arm_coproc_mem_operand_no_writeback): 
Declare prototype.
(arm_mve_mode_and_operands_type_check): Declare prototype.
* config/arm/arm.c (arm_coproc_mem_operand): Refactor to use 
_arm_coproc_mem_operand.
(arm_coproc_mem_operand_wb): New function to cover full, limited and no 
writeback.
(arm_coproc_mem_operand_no_writeback): New constraint for memory 
operand with no writeback.
(arm_print_operand): Extend 'E' specifier for memory operand that does 
not support
writeback.
(arm_mve_mode_and_operands_type_check): New constraint check for MVE 
memory operands.
* config/arm/constraints.md: Add Uj constraint for VFP vldr.16 and 
vstr.16.
* config/arm/vfp.md (*mov_load_vfp_hf16): New pattern for vldr.16.
(*mov_store_vfp_hf16): New pattern for vstr.16.
(*mov_vfp_16): Remove MVE moves.

gcc/testsuite/ChangeLog:

2020-05-20  Joe Ramsay  

* gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c: New test.
---
 gcc/config/arm/arm-protos.h|  3 +
 gcc/config/arm/arm.c   | 74 ++
 gcc/config/arm/constraints.md  |  7 ++
 gcc/config/arm/vfp.md  | 26 +---
 .../arm/mve/intrinsics/mve-vldstr16-no-writeback.c | 17 +
 5 files changed, 105 insertions(+), 22 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/mve-vldstr16-no-writeback.c

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 33d162c..e811da4 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -115,8 +115,11 @@ extern enum reg_class coproc_secondary_reload_class 
(machine_mode, rtx,
 extern bool arm_tls_referenced_p (rtx);
 
 extern int arm_coproc_mem_operand (rtx, bool);
+extern int arm_coproc_mem_operand_no_writeback (rtx);
+extern int arm_coproc_mem_operand_wb (rtx, int);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int mve_vector_mem_operand (machine_mode, rtx, bool);
+bool arm_mve_mode_and_operands_type_check (machine_mode, rtx, rtx);
 extern int neon_struct_mem_operand (rtx);
 
 extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6b7ca82..63e052f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13217,13 +13217,14 @@ neon_element_bits (machine_mode mode)
 /* Predicates for `match_operand' and `match_operator'.  */
 
 /* Return TRUE if OP is a valid coprocessor memory address pattern.
-   WB is true if full writeback address modes are allowed and is false
+   WB level is 2 if full writeback address modes are allowed, 1
if limited writeback address modes (POST_INC and PRE_DEC) are
-   allowed.  */
+   allowed and 0 if no writeback at all is supported.  */
 
 int
-arm_coproc_mem_operand (rtx op, bool wb)
+arm_coproc_mem_operand_wb (rtx op, int wb_level)
 {
+  gcc_assert (wb_level == 0 || wb_level == 1 || wb_level == 2);
   rtx ind;
 
   /* Reject eliminable registers.  */
@@ -13256,16 +13257,18 @@ arm_coproc_mem_operand (rtx op, bool wb)
 
   /* Autoincremment addressing modes.  POST_INC and PRE_DEC are
  acceptable in any case (subject to verification by
- arm_address_register_rtx_p).  We need WB to be true to accept
+ arm_address_register_rtx_p).  We need full writeback to accept
+ PRE_INC and POST_DEC, and at least restricted writeback for
  PRE_INC and POST_DEC.  */
-  if (GET_CODE (ind) == POST_INC
-  || GET_CODE (ind) == PRE_DEC
-  || (wb
- && (GET_CODE (ind) == PRE_INC
- || GET_CODE (ind) == POST_DEC)))
+  if (wb_level > 0
+  && (GET_CODE (ind) == POST_INC
+ || GET_CODE (ind) == PRE_DEC
+ || (wb_level > 1
+ && (GET_CODE (ind) == PRE_INC
+ || GET_CODE (ind) == POST_DEC
 return arm_address_register_rtx_p (XEXP (ind, 0), 0);
 
-  if (wb
+  if (wb_level > 1
   && (GET_CODE (ind) == POST_MODIFY || GET_CODE (ind) == PRE_MODIFY)
   && arm_address_register_rtx_p (XEXP (ind, 0), 0)
   && GET_CODE (XEXP (ind, 1)) == PLUS
@@ -13287,6 +13290,25 @@ arm_coproc_mem_operand (rtx op, bool wb)
   return FALSE;
 }
 
+/* Return TRUE if OP is a valid coprocessor memory address pattern.
+   WB is true if full writeback address modes are allowed and is false
+   if limited writeback address modes (POST

[PATCH] verify SCEV cache for stale entries

2020-07-29 Thread Richard Biener

This adds verification for the SCEV cache to avoid stale entries
that when picked up will lead to ICEs or other surprises.

Bootstrapped / tested on x86_64-unknown-linux-gnu with the two
previous fixes.

Posted for reference and archival purposes, I'm not going to push
this at this point.

2020-07-29  Richard Biener  

* passes.c (execute_function_todo): Call verify_scev_cache
if it is initialized.
* tree-scalar-evolution.h (verify_scev_cache): Declare.
* tree-scalar-evolution.c (verify_scev_cache_r): New.
(verify_scev_cache): Likewise.
---
 gcc/passes.c| 11 ---
 gcc/tree-scalar-evolution.c | 29 +
 gcc/tree-scalar-evolution.h |  1 +
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/gcc/passes.c b/gcc/passes.c
index a5da9a46f4e..c3c3a086236 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-core.h" /* for fnotice */
 #include "stringpool.h"
 #include "attribs.h"
+#include "tree-scalar-evolution.h" /* for verify_scev_cache.  */
 
 using namespace gcc;
 
@@ -1994,9 +1995,13 @@ execute_function_todo (function *fn, void *data)
verify_gimple_in_seq (gimple_body (cfun->decl));
}
  if (cfun->curr_properties & PROP_ssa)
-   /* IPA passes leave stmts to be fixed up, so make sure to
-  not verify SSA operands whose verifier will choke on that.  */
-   verify_ssa (true, !from_ipa_pass);
+   {
+ /* IPA passes leave stmts to be fixed up, so make sure to
+not verify SSA operands whose verifier will choke on that.  */
+ verify_ssa (true, !from_ipa_pass);
+ if (scev_initialized_p ())
+   verify_scev_cache ();
+   }
  /* IPA passes leave basic-blocks unsplit, so make sure to
 not trip on that.  */
  if ((cfun->curr_properties & PROP_cfg)
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index edab778277b..9645fed980c 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -284,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "diagnostic.h"
 
 static tree analyze_scalar_evolution_1 (class loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (class loop *loop,
@@ -3025,6 +3026,34 @@ scev_reset (void)
 }
 }
 
+/* CHREC walker for verify_scev_cache.  */
+
+static tree
+verify_scev_cache_r (tree *tp, int *walk_subtrees, void *)
+{
+  if (TREE_CODE (*tp) == SSA_NAME
+  && SSA_NAME_IN_FREE_LIST (*tp))
+internal_error ("stale SCEV hash table entries");
+  if (!EXPR_P (*tp))
+*walk_subtrees = 0;
+  return NULL_TREE;
+}
+
+/* Verify the SCEV cache has no stale entries.  */
+
+void
+verify_scev_cache ()
+{
+  for (auto i = scalar_evolution_info->begin ();
+   i != scalar_evolution_info->end (); ++i)
+{
+  if ((*i)->name_version >= num_ssa_names
+ || ! ssa_name ((*i)->name_version))
+   internal_error ("stale SCEV hash table entries");
+  walk_tree (&(*i)->chrec, verify_scev_cache_r, NULL, NULL);
+}
+}
+
 /* Return true if the IV calculation in TYPE can overflow based on the 
knowledge
of the upper bound on the number of iterations of LOOP, the BASE and STEP
of IV.
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index e2fbfb55bd0..d0c578708ac 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -42,6 +42,7 @@ extern bool simple_iv (class loop *, class loop *, tree, 
struct affine_iv *,
   bool);
 extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
+extern void verify_scev_cache ();
 
 /* Returns the basic block preceding LOOP, or the CFG entry block when
the loop is function's body.  */
-- 
2.26.2

[PATCH] more SCEV cache clearing

2020-07-29 Thread Richard Biener

This fixes two more places, in loop interchange and in the
vectorizer where the SCEV verifier sees stale entries.

Boostrapped/tested on x86_64-unknown-linux-gnu, pushed.

2020-07-29  Richard Biener  

* tree-vectorizer.c (vectorize_loops): Reset the SCEV
cache if we removed any SIMD UID SSA defs.
* gimple-loop-interchange.cc (pass_linterchange::execute):
Reset the scev cache if we interchanged a loop.
---
 gcc/gimple-loop-interchange.cc | 2 ++
 gcc/tree-vectorizer.c  | 6 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 2379848808f..1656004ecf0 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2084,6 +2084,8 @@ pass_linterchange::execute (function *fun)
   loop_nest.release ();
 }
 
+  if (changed_p)
+scev_reset ();
   return changed_p ? (TODO_update_ssa_only_virtuals) : 0;
 }
 
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 26a184696aa..2a60d37bb87 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1283,7 +1283,11 @@ vectorize_loops (void)
 
   /* Fold IFN_GOMP_SIMD_{VF,LANE,LAST_LANE,ORDERED_{START,END}} builtins.  */
   if (cfun->has_simduid_loops)
-adjust_simduid_builtins (simduid_to_vf_htab);
+{
+  adjust_simduid_builtins (simduid_to_vf_htab);
+  /* Avoid stale SCEV cache entries for the SIMD_LANE defs.  */
+  scev_reset ();
+}
 
   /* Shrink any "omp array simd" temporary arrays to the
  actual vectorization factors.  */
-- 
2.26.2

[PATCH] tree-optimization/95679 - properly signal changes from propagate_into_phi_args

2020-07-29 Thread Richard Biener

This restores a lost setting of something_changed with the
recent refactoring of the substitute and fold engine.  The
reported ICE in the PR was meanwhile mitigated in other ways
but the issue can still result in missed optimizations via
failed runs of CFG cleanup.

Bootstrapped/tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-07-29  Richard Biener  

PR tree-optimization/95679
* tree-ssa-propagate.h
(substitute_and_fold_engine::propagate_into_phi_args): Return
whether anything changed.
* tree-ssa-propagate.c
(substitute_and_fold_engine::propagate_into_phi_args): Likewise.
(substitute_and_fold_dom_walker::before_dom_children): Update
something_changed.
---
 gcc/tree-ssa-propagate.c | 15 +++
 gcc/tree-ssa-propagate.h |  2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
index 01ee7fd33eb..1e057284154 100644
--- a/gcc/tree-ssa-propagate.c
+++ b/gcc/tree-ssa-propagate.c
@@ -1017,11 +1017,13 @@ substitute_and_fold_dom_walker::foreach_new_stmt_in_bb
 }
 }
 
-void
+bool
 substitute_and_fold_engine::propagate_into_phi_args (basic_block bb)
 {
   edge e;
   edge_iterator ei;
+  bool propagated = false;
+
   /* Visit BB successor PHI nodes and replace PHI args.  */
   FOR_EACH_EDGE (e, ei, bb->succs)
 {
@@ -1035,11 +1037,16 @@ substitute_and_fold_engine::propagate_into_phi_args 
(basic_block bb)
  || virtual_operand_p (arg))
continue;
  tree val = get_value (arg, phi);
- if (val && is_gimple_min_invariant (val)
+ if (val
+ && is_gimple_min_invariant (val)
  && may_propagate_copy (arg, val))
-   propagate_value (use_p, val);
+   {
+ propagate_value (use_p, val);
+ propagated = true;
+   }
}
 }
+  return propagated;
 }
 
 edge
@@ -1229,7 +1236,7 @@ substitute_and_fold_dom_walker::before_dom_children 
(basic_block bb)
}
 }
 
-  substitute_and_fold_engine->propagate_into_phi_args (bb);
+  something_changed |= substitute_and_fold_engine->propagate_into_phi_args 
(bb);
 
   return NULL;
 }
diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h
index 24de43ebc6c..9406cdf8f51 100644
--- a/gcc/tree-ssa-propagate.h
+++ b/gcc/tree-ssa-propagate.h
@@ -115,7 +115,7 @@ class substitute_and_fold_engine
   virtual void pre_fold_stmt (gimple *) { }
   virtual void post_new_stmt (gimple *) { }
 
-  void propagate_into_phi_args (basic_block);
+  bool propagate_into_phi_args (basic_block);
 
   /* Users like VRP can set this when they want to perform
  folding for every propagation.  */
-- 
2.26.2

Re: [Patch] OpenMP: Add 'omp requires' to Fortran (mostly parsing)

2020-07-29 Thread Tobias Burnus


Committed as r11-2395, 
https://gcc.gnu.org/g:269322ece17202632bc354e9c510e4a5bd6ad84b
with review comments applied plus a follow-up commit for an indentation issue I 
missed:
r11-2399, https://gcc.gnu.org/g:6de5600a8bd1ef0ad3d57670efdcc68bb3484276

Tobias

On 7/29/20 10:23 AM, Jakub Jelinek wrote:


On Wed, Jul 29, 2020 at 10:10:35AM +0200, Tobias Burnus wrote:

+case AB_OMP_REQ_REVERSE_OFFLOAD:
+   gfc_omp_requires_add_clause (OMP_REQ_REVERSE_OFFLOAD,
+"reverse_offload",
+&gfc_current_locus,
+   module_name);

Even visually something is wrong with the indentation here and in a few
others, where I'd expect all the arguments to be in the same column (even
with the > + before it, but they are not.  E.g. above I think the
gfc_omp_... is indented by 3 instead of 2 columns from case, and "rev... and
&gfc... match that, but module_name is probably correct.


+  break;
+case AB_OMP_REQ_UNIFIED_ADDRESS:
+  gfc_omp_requires_add_clause (OMP_REQ_UNIFIED_ADDRESS,
+   "unified_address",
+&gfc_current_locus,

And e.g. here the &gfc... is off.

--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "diagnostic.h"
  #include "gomp-constants.h"

+

Unnecessary.

  /* Match an end of OpenMP directive.  End of OpenMP directive is optional
 whitespace, followed by '\n' or comment '!'.  */

@@ -3745,6 +3970,26 @@ gfc_match_omp_oacc_atomic (bool omp_p)
new_st.op = (omp_p ? EXEC_OMP_ATOMIC : EXEC_OACC_ATOMIC);
if (seq_cst)
  op = (gfc_omp_atomic_op) (op | GFC_OMP_ATOMIC_SEQ_CST);
+  else

I wonder if this shouldn't be else if (omp_p), I'd think
OpenACC atomics shouldn't be affected by OpenMP requires directive.


for (gfc_current_ns = gfc_global_ns_list; gfc_current_ns;
 gfc_current_ns = gfc_current_ns->sibling)
-gfc_check_externals (gfc_current_ns);
+ gfc_check_omp_requires (gfc_current_ns, omp_requires);

Again indentation looks weird.

Otherwise LGTM.

  Jakub


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter

Re: [PATCH][Hashtable 0/6] Code review

2020-07-29 Thread François Dumont via Gcc-patches


On 17/07/20 12:11 pm, Jonathan Wakely wrote:

N.B. the 0/6 entry of a patch series is supposed to be a cover letter
describing the changes in the series, not one of the patches.

If you have patches 0/6, 1/6, 2/6 ... 6/6 then you actually have seven
patches, not six!

Anyway ...

On 19/12/19 20:17 +0100, FranÃ§ois Dumont wrote:
diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h

index f5809c7443a..b9320506bb2 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -172,6 +172,14 @@ namespace __detail

  // Auxiliary types used for all instantiations of _Hashtable nodes
  // and iterators.
+  using __unique_keys_t = true_type;
+  using __multi_keys_t = false_type;
+
+  using __constant_iterators_t = true_type;
+  using __mutable_iterators_t = false_type;
+
+  using __hash_cached_t = true_type;
+  using __hash_not_cached_t = false_type;


This is an ABI change, and the benefit doesn't seem large enough to
justify it. It will result in code size increases for anything that
links objects compiled before and after the change.

If we wanted to do this, I think it would be better to use enums, so:

enum _Unique_keys { __unique_keys, __multi_keys };

Otherwise you can use __hash_cached_t where __unique_keys_t is meant
to be used, so all you've done is introduce more names for the same
things.

If you want to replace the 'true' and 'false' literals with something
more descriptive, can't you just use constants?

constexpr bool __hash_cached = true;
constexpr bool __hash_not_cached = false;

Or just use comments:

    struct _Local_iterator_base<_Key, _Value, _ExtractKey,
    _H1, _H2, _Hash, /*cached=*/ false>



Ok, I'll review this patch with this simpler approach.

I even started to add comments in the patch you already approved.

Re: [PATCH][Hashtable 3/6] Fix noexcept qualifications

2020-07-29 Thread François Dumont via Gcc-patches

I eventually committed the attached patch which consider all your 
remarks and moreover put back the move constructor implementation where 
it used to be (not inline). It limits the size of the patch.


I also added comments on true_type/false_type new usages like you 
advised on [Hashtable 0/6] thread.


François

On 17/07/20 11:55 am, Jonathan Wakely wrote:

On 17/11/19 21:59 +0100, FranÃ§ois Dumont wrote:
This patch adds noexcept qualification on allocator aware 
constructors and fix the one on the default constructor.


Â Â Â  * include/bits/hashtable.h
Â Â Â  (_Hashtable(_Hashtable&& __ht, __node_alloc_type&& __a, 
true_type)):

Â Â Â  Add noexcept qualification.
Â Â Â  (_Hashtable(_Hashtable&&)): Fix noexcept qualification.
Â Â Â  (_Hashtable(_Hashtable&&, const allocator_type&)): Add noexcept
Â Â Â  qualification.
Â Â Â  * include/bits/unordered_map.h
Â Â Â  (unordered_map(unordered_map&&, const allocator_type&)): Add 
noexcept

Â Â Â  qualification.
Â Â Â  (unordered_multimap(unordered_multimap&&, const 
allocator_type&)): Add

Â Â Â  noexcept qualification.
Â Â Â  * include/bits/unordered_set.h
Â Â Â  (unordered_set(unordered_set&&, const allocator_type&)): Add 
noexcept

Â Â Â  qualification.
Â Â Â  (unordered_multiset(unordered_multiset&&, const 
allocator_type&)): Add

Â Â Â  noexcept qualification.
Â Â Â  * 
testsuite/23_containers/unordered_map/allocator/default_init.cc:

Â Â Â  New.
Â Â Â  * testsuite/23_containers/unordered_map/cons/
Â Â Â  noexcept_default_construct.cc: New.
Â Â Â  * testsuite/23_containers/unordered_map/cons/
Â Â Â  noexcept_move_construct.cc: New.
Â Â Â  * testsuite/23_containers/unordered_map/modifiers/move_assign.cc:
Â Â Â  New.
Â Â Â  * testsuite/23_containers/unordered_multimap/cons/
Â Â Â  noexcept_default_construct.cc: New.
Â Â Â  * testsuite/23_containers/unordered_multimap/cons/
Â Â Â  noexcept_move_construct.cc: New.
Â Â Â  * testsuite/23_containers/unordered_multiset/cons/
Â Â Â  noexcept_default_construct.cc: New.
Â Â Â  * testsuite/23_containers/unordered_multiset/cons/
Â Â Â  noexcept_move_construct.cc: New.
Â Â Â  * 
testsuite/23_containers/unordered_set/allocator/default_init.cc: New.

Â Â Â  * testsuite/23_containers/unordered_set/cons/
Â Â Â  noexcept_default_construct.cc: New.
Â Â Â  * 
testsuite/23_containers/unordered_set/cons/noexcept_move_construct.cc:

Â Â Â  New.

Tested under Linux x86_64.

FranÃ§ois



diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h

index 5f785d4904d..ad07a36eb83 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -463,6 +463,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__hashtable_alloc(__node_alloc_type(__a))
  { }

+  _Hashtable(_Hashtable&& __ht, __node_alloc_type&& __a, true_type)
+ noexcept(std::is_nothrow_copy_constructible<_H1>::value &&
+ std::is_nothrow_copy_constructible<_Equal>::value)
+  : __hashtable_base(__ht),
+    __map_base(__ht),
+    __rehash_base(__ht),
+    __hashtable_alloc(std::move(__a)),
+    _M_buckets(__ht._M_buckets),
+    _M_bucket_count(__ht._M_bucket_count),
+    _M_before_begin(__ht._M_before_begin._M_nxt),
+    _M_element_count(__ht._M_element_count),
+    _M_rehash_policy(__ht._M_rehash_policy)
+  {
+    // Update, if necessary, buckets if __ht is using its single 
bucket.


I know the comment was already present, but I can't parse this.

There are two 'if' conditions, but don't they refer to the same thing?
It's necessary if __ht is using a single bucket, so shouldn't it be
either "Update buckets if __ht is using its single bucket" or "Update
buckets if necessary"?

"Update buckets if __ht is using its single bucket" seems preferable.


+    if (__ht._M_uses_single_bucket())
+  {
+    _M_buckets = &_M_single_bucket;
+    _M_single_bucket = __ht._M_single_bucket;
+  }
+
+    // Fix bucket containing the _M_before_begin pointer that can't be
+    // moved.
+    _M_update_bbegin();
+
+    __ht._M_reset();
+  }
+
+  _Hashtable(_Hashtable&&, __node_alloc_type&&, false_type);
+
  template
_Hashtable(_InputIterator __first, _InputIterator __last,
   size_type __bkt_count_hint,
@@ -489,11 +518,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  _Hashtable(const _Hashtable&);

-  _Hashtable(_Hashtable&&) noexcept;
+  _Hashtable(_Hashtable&& __ht)
+    noexcept( noexcept(
+  _Hashtable(std::declval<_Hashtable&&>(),


The && here is redundant, declval adds && to the type anyway.


+ std::declval<__node_alloc_type&&>(), std::declval())) )


Same for __node_alloc_type&& here.

And true_type{} is shorter and simpler than std::declval().
We know is_nothrow_default_constructible is true, so we can
just use that.

+  : _Hashtable(std::move(__ht), 
std::move(__ht._M_node_allocator()),

+   true_type{})
+  { }

  _Hashtable(const _Hashtable&, const allocator_type&);

-  _Hashtable(_Hashtable&&, const allocator_type&);
+  _Hashtable(_Has

Re: [PATCH] libstdc++ testsuite: atomic_float/value_init.cc requires libatomic

2020-07-29 Thread Rainer Orth

Jonathan Wakely via Gcc-patches  writes:

>>diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp
>>b/libstdc++-v3/testsuite/lib/dg-options.exp
>>index 9bfae71adf3..bf6219c3042 100644
>>--- a/libstdc++-v3/testsuite/lib/dg-options.exp
>>+++ b/libstdc++-v3/testsuite/lib/dg-options.exp
>>@@ -260,7 +260,8 @@ proc add_options_for_net_ts { flags } {
>> # Add to FLAGS all the target-specific flags to link to libatomic, if 
>> required.
>>
>> proc add_options_for_libatomic { flags } {
>>-if { [istarget hppa*-*-hpux*] || [istarget riscv*-*-*] } {
>>+if { [istarget hppa*-*-hpux*] || [istarget riscv*-*-*]
>>+|| [istarget powerpc-ibm-aix*] } {
>
> Could you add || [istarget powerpc*-darwin*] here too please?
>
> Iain agreed that's needed.
>
> OK with that change, thanks.

And please keep the target list sorted alphabetically.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH] libstdc++ testsuite: atomic_float/value_init.cc requires libatomic

2020-07-29 Thread Jonathan Wakely via Gcc-patches


On 28/07/20 21:44 -0400, David Edelsohn via Libstdc++ wrote:

atomic_float/value_init.cc requires libatomic on some targets, i.e.,
when it tries to perform an atomic operation with a 64 bit floating
point double type on a 32 bit target.  This patch adds AIX to the list
of targets that require the libatomic option and adds the option to
the atomic_float/value_init.cc testcase.

Bootstrapped on powerpc-ibm-aix7.2.0.0

Okay?

Thanks, David

   2020-07-28  David Edelsohn  
   Jonathan Wakely  

   * testsuite/lib/dg-options.exp (add_options_for_libatomic): Add
   target powerpc-ibm-aix*.
   * testsuite/29_atomics/atomic_float/value_init.cc: Add options
   for libatomic.

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_float/value_init.cc
b/libstdc++-v3/testsuite/29_atomics/atomic_float/value_init.cc
index 237c0dd13ed..38af9bdc8d4 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_float/value_init.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/value_init.cc
@@ -17,6 +17,7 @@

// { dg-options "-std=gnu++2a" }
// { dg-do run { target c++2a } }
+// { dg-add-options libatomic }

#include 
#include 

diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp
b/libstdc++-v3/testsuite/lib/dg-options.exp
index 9bfae71adf3..bf6219c3042 100644
--- a/libstdc++-v3/testsuite/lib/dg-options.exp
+++ b/libstdc++-v3/testsuite/lib/dg-options.exp
@@ -260,7 +260,8 @@ proc add_options_for_net_ts { flags } {
# Add to FLAGS all the target-specific flags to link to libatomic, if required.

proc add_options_for_libatomic { flags } {
-if { [istarget hppa*-*-hpux*] || [istarget riscv*-*-*] } {
+if { [istarget hppa*-*-hpux*] || [istarget riscv*-*-*]
+|| [istarget powerpc-ibm-aix*] } {


Could you add || [istarget powerpc*-darwin*] here too please?

Iain agreed that's needed.

OK with that change, thanks.



   return "$flags -L../../libatomic/.libs -latomic"
}
return $flags

Re: [PATCH v3] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-29 Thread Jojo R

在 2020年7月24日 +0800 PM9:19，Segher Boessenkool ，写道：
> On Fri, Jul 24, 2020 at 12:03:16PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > On Fri, Jul 24, 2020 at 10:13 AM Rainer Orth
> >  wrote:
> > > Jojo R  writes:
> > > > +insn-generated-split-num = $(shell grep -c ^processor /proc/cpuinfo)
> > >
> > > This is highly unportable, probably Linux-only. It certainly doesn't
> > > exist on at least Solaris and macOS. Maybe gnulib has something here.
> >
> > It will also produce different build results depending on the build host
> > which is even more undesirable.
>
> Yeah, it has to be the same number everywhere. Something high enough
> that bigger machines benefit (so 100 or so), but not so high that the
> overhead increases too much... It shouldn't be quite as high for
> smaller backends... so some fixed ratio of number of define_insns
> perhaps, something like that?
>

We can get bigger benefit with much more huge file in this simple solution,
Indeed, it does not cover smaller backends.

Yes, we should consider to target all backends, I thinks there is no high enough
benefit for smaller backends, is it necessary ?
> Something else. Does this make the generated compiler slower? It will
> at least potentially have fewer inlining opportunities, but does that
> matter?
>

Yes, some machine will take > 30 minutes in my testcase,
It’s really matter in gcc development stage :(
> Thanks for working on this,
>
>
> Segher

Re: RFC: Monitoring old PRs, new dg directives

2020-07-29 Thread Richard Sandiford

Thanks for doing this.  +1 for the best fix being to add XFAILing tests
to the main testsute, enabled by default.  I don't see any other realistic
way of ensuring that fixes are matched with PRs at the time that the fix
is made (rather than some time after the fact).

Marek Polacek via Gcc-patches  writes:
> […]
> My thinking is that for:
>
> * rejects-valid: use the existing dg-xfail-if
> * accepts-valid: use the new dg-accepts-invalid
> * ICEs: use the new dg-ice
>
> dg-ice can be used like this:
>
> // { dg-ice "build_over_call" { target c++11 } }
>
> and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
> ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
> you'll get an XPASS + FAIL.  Then you can close the old PR.

This is long overdue IMO, thanks for adding it.

> Similarly, dg-accepts-invalid:
>
> // { dg-accepts-invalid "PR86500" }
>
> means that if the test still compiles without errors, you'll get a quiet 
> XFAIL.
> If we start giving errors, you'll get an XPASS.
>
> If the bug is fixed, simply remove the directive.
>
> The patch implementing these new directives is appended.  Once/if this is
> accepted, I can start adding the old tests we have in our Bugzilla.  (I'm
> only concerned about the c++ component, if that wasn't already clear.)
>
> The question is what makes the bug "old": is it one year without it being
> assigned?  6 months?  3 months?  Note: I *don't* propose to add every test for
> every new PR, just the reasonably old ones that are useful/important.  Such
> additions should be done in batches, so that we don't have dozens of commits,
> each of them merely adding a single test.

IMO it should be OK to add a testcase for any open PR, if someone think
it's useful, regardless of age and without being forced to batch the
commits.  I.e. I think it should come under the “obvious” rule and
people should just use their judgement about when it's appropriate.
Adding XFAILing tests shouldn't disturb anyone else very much.

I guess there's a possibility that some tests happen to pass already
on some targets.  That's more likely with middle-end and backend bugs
rather than frontend stuff though.  Perhaps for those it would make
sense to have a convention in which the failing testcase is restricted
(at the whole-test level) to the targets that the person committing the
testcase has actually tried.  Maybe with a comment on the dg-ice etc.
to remind people to reconsider the main target selector when un-XFAILing
the test.

Richard

Re: [PATCH v3] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-29 Thread Jojo R

在 2020年7月24日 +0800 PM9:57，Joseph Myers ，写道：
> On Fri, 24 Jul 2020, Jojo R wrote:
>
> > + -csplit insn-$*.c /parallel\ compilation/ -k -s 
> > {$(insn-generated-split-num)} -f insn-$* -b "%d.c"
> > + -( [ ! -s insn-$*0.c ] && for i in {1..$(insn-generated-split-num)}; do 
> > touch insn-$*$$i.c; done && echo "" > insn-$*.c)
>
> Ignoring errors (disk full, out of memory, etc.) here is not appropriate.
> If csplit fails, the build must fail.

Thank you and it’s fixed in the next version patch
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

Re: [Patch] OpenMP: Add 'omp requires' to Fortran (mostly parsing)

2020-07-29 Thread Jakub Jelinek via Gcc-patches

On Wed, Jul 29, 2020 at 10:10:35AM +0200, Tobias Burnus wrote:
> + case AB_OMP_REQ_REVERSE_OFFLOAD:
> +gfc_omp_requires_add_clause (OMP_REQ_REVERSE_OFFLOAD,
> + "reverse_offload",
> + &gfc_current_locus,
> +module_name);

Even visually something is wrong with the indentation here and in a few
others, where I'd expect all the arguments to be in the same column (even
with the > + before it, but they are not.  E.g. above I think the
gfc_omp_... is indented by 3 instead of 2 columns from case, and "rev... and
&gfc... match that, but module_name is probably correct.

> +   break;
> + case AB_OMP_REQ_UNIFIED_ADDRESS:
> +   gfc_omp_requires_add_clause (OMP_REQ_UNIFIED_ADDRESS,
> +"unified_address",
> + &gfc_current_locus,

And e.g. here the &gfc... is off.
> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "diagnostic.h"
>  #include "gomp-constants.h"
>  
> +

Unnecessary.
>  /* Match an end of OpenMP directive.  End of OpenMP directive is optional
> whitespace, followed by '\n' or comment '!'.  */
>  
> @@ -3745,6 +3970,26 @@ gfc_match_omp_oacc_atomic (bool omp_p)
>new_st.op = (omp_p ? EXEC_OMP_ATOMIC : EXEC_OACC_ATOMIC);
>if (seq_cst)
>  op = (gfc_omp_atomic_op) (op | GFC_OMP_ATOMIC_SEQ_CST);
> +  else

I wonder if this shouldn't be else if (omp_p), I'd think
OpenACC atomics shouldn't be affected by OpenMP requires directive.

>for (gfc_current_ns = gfc_global_ns_list; gfc_current_ns;
> gfc_current_ns = gfc_current_ns->sibling)
> -gfc_check_externals (gfc_current_ns);
> + gfc_check_omp_requires (gfc_current_ns, omp_requires);

Again indentation looks weird.

Otherwise LGTM.

Jakub

Re: [PATCH v3] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-29 Thread Jojo R

在 2020年7月24日 +0800 PM6:03，Richard Biener ，写道：
> On Fri, Jul 24, 2020 at 10:13 AM Rainer Orth
>  wrote:
> >
> > Jojo R  writes:
> >
> > > gcc/ChangeLog:
> > >
> > > * genemit.c (main): Print 'split line'.
> > > * Makefile.in (insn-emit.c): Define split count and file
> > [...]
> > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > index 2ba76656dbf..75841e49127 100644
> > > --- a/gcc/Makefile.in
> > > +++ b/gcc/Makefile.in
> > > @@ -1253,6 +1253,13 @@ ANALYZER_OBJS = \
> > > # We put the *-match.o and insn-*.o files first so that a parallel make
> > > # will build them sooner, because they are large and otherwise tend to be
> > > # the last objects to finish building.
> > > +
> > > +insn-generated-split-num = $(shell grep -c ^processor /proc/cpuinfo)
> >
> > This is highly unportable, probably Linux-only. It certainly doesn't
> > exist on at least Solaris and macOS. Maybe gnulib has something here.
>
> It will also produce different build results depending on the build host
> which is even more undesirable.
>

Theoretically possible it’s the highest performance for building insn-emit.c if 
all processors are used.
User should not pay much more attention at auto-generated file like 
insn-emit.c, it’s right ?
> Richard,
>
> > Rainer
> >
> > --
> > -
> > Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH v3] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-29 Thread Martin Liška


On 7/29/20 10:01 AM, Jojo R wrote:

Thank you and it’s fixed in the next version patch


Hey.

One small note: can you please link the next v4 email thread to this one?
It's much easier to follow all the discussion related to your patch set.

Thanks,
Martin

Re: [Patch] OpenMP: Add 'omp requires' to Fortran (mostly parsing)

2020-07-29 Thread Tobias Burnus


And another update; I removed the global variable, which
was only used for checking at the end – and
reconstruct its value in a local variable just before calling
gfc_check_omp_requires.

(Following a suggestion by Jakub and removing a kind-of left
over from the first implementation.)

Tobias

On 7/28/20 5:12 PM, Tobias Burnus wrote:

I just realized that supporting 'acq_rel' is simple; while
'omp atomic' parsing needs to be updated quite a bit for the
OpenMP 5 changes, just adding ACQ_REL support for 'requires'
is trivial.

Hence, I updated the requires-9.c testcase for 'acq_rel',
adjusted trans-openmp.c and did some openmp.c adjustments.

Thus: New version with this trivial change and being
able to tick-off the 'atomic_default_mem_order' clause :-)

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
OpenMP: Add 'omp requires' to Fortran (mostly parsing)

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_statement): Add ST_OMP_REQUIRES.
	(enum gfc_omp_requires_kind): New.
	(enum gfc_omp_atomic_op): Add GFC_OMP_ATOMIC_ACQ_REL.
	(struct gfc_namespace): Add omp_requires and omp_target_seen.
	(gfc_omp_requires_add_clause,
	(gfc_check_omp_requires): New.
	* match.h (gfc_match_omp_requires): New.
	* module.c (enum ab_attribute, attr_bits): Add omp requires clauses.
	(mio_symbol_attribute): Read/write them.
	* openmp.c (gfc_check_omp_requires, (gfc_omp_requires_add_clause,
	gfc_match_omp_requires): New.
	(gfc_match_omp_oacc_atomic): Use requires's default mem-order.
	* parse.c (decode_omp_directive): Match requires, set omp_target_seen.
	(gfc_ascii_statement): Handle ST_OMP_REQUIRES.
	* trans-openmp.c (gfc_trans_omp_atomic): Handle GFC_OMP_ATOMIC_ACQ_REL.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/requires-1.f90: New test.
	* gfortran.dg/gomp/requires-2.f90: New test.
	* gfortran.dg/gomp/requires-3.f90: New test.
	* gfortran.dg/gomp/requires-4.f90: New test.
	* gfortran.dg/gomp/requires-5.f90: New test.
	* gfortran.dg/gomp/requires-6.f90: New test.
	* gfortran.dg/gomp/requires-7.f90: New test.
	* gfortran.dg/gomp/requires-8.f90: New test.
	* gfortran.dg/gomp/requires-9.f90: New test.

 gcc/fortran/gfortran.h|  30 +++-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/module.c  |  73 +++-
 gcc/fortran/openmp.c  | 245 ++
 gcc/fortran/parse.c   |  53 +-
 gcc/fortran/trans-openmp.c|  10 +-
 gcc/testsuite/gfortran.dg/gomp/requires-1.f90 |  13 ++
 gcc/testsuite/gfortran.dg/gomp/requires-2.f90 |  14 ++
 gcc/testsuite/gfortran.dg/gomp/requires-3.f90 |   4 +
 gcc/testsuite/gfortran.dg/gomp/requires-4.f90 |  36 
 gcc/testsuite/gfortran.dg/gomp/requires-5.f90 |  16 ++
 gcc/testsuite/gfortran.dg/gomp/requires-6.f90 |  16 ++
 gcc/testsuite/gfortran.dg/gomp/requires-7.f90 |  41 +
 gcc/testsuite/gfortran.dg/gomp/requires-8.f90 |  22 +++
 gcc/testsuite/gfortran.dg/gomp/requires-9.f90 |  85 +
 15 files changed, 649 insertions(+), 10 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5fa86aa4e30..20cce5cf39b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -263,7 +263,7 @@ enum gfc_statement
   ST_OMP_TARGET_SIMD, ST_OMP_END_TARGET_SIMD,
   ST_OMP_TASKLOOP, ST_OMP_END_TASKLOOP,
   ST_OMP_TASKLOOP_SIMD, ST_OMP_END_TASKLOOP_SIMD, ST_OMP_ORDERED_DEPEND,
-  ST_PROCEDURE, ST_GENERIC, ST_CRITICAL, ST_END_CRITICAL,
+  ST_OMP_REQUIRES, ST_PROCEDURE, ST_GENERIC, ST_CRITICAL, ST_END_CRITICAL,
   ST_GET_FCN_CHARACTERISTICS, ST_LOCK, ST_UNLOCK, ST_EVENT_POST,
   ST_EVENT_WAIT, ST_FAIL_IMAGE, ST_FORM_TEAM, ST_CHANGE_TEAM,
   ST_END_TEAM, ST_SYNC_TEAM, ST_NONE
@@ -1334,6 +1334,24 @@ enum gfc_omp_if_kind
   OMP_IF_LAST
 };
 
+enum gfc_omp_requires_kind
+{
+  /* Keep in sync with gfc_namespace, esp. with omp_req_mem_order.  */
+  OMP_REQ_ATOMIC_MEM_ORDER_SEQ_CST = 1,  /* 01 */
+  OMP_REQ_ATOMIC_MEM_ORDER_ACQ_REL = 2,  /* 10 */
+  OMP_REQ_ATOMIC_MEM_ORDER_RELAXED = 3,  /* 11 */
+  OMP_REQ_REVERSE_OFFLOAD = (1 << 2),
+  OMP_REQ_UNIFIED_ADDRESS = (1 << 3),
+  OMP_REQ_UNIFIED_SHARED_MEMORY = (1 << 4),
+  OMP_REQ_DYNAMIC_ALLOCATORS = (1 << 5),
+  OMP_REQ_TARGET_MASK = (OMP_REQ_REVERSE_OFFLOAD
+			 | OMP_REQ_UNIFIED_ADDRESS
+			 | OMP_REQ_UNIFIED_SHARED_MEMORY),
+  OMP_REQ_ATOMIC_MEM_ORDER_MASK = (OMP_REQ_ATOMIC_MEM_ORDER_SEQ_CST
+   | OMP_REQ_ATOMIC_MEM_ORDER_ACQ_REL
+   | OMP_REQ_ATOMIC_MEM_ORDER_RELAXED)
+};
+
 typedef struct gfc_omp_clauses
 {
   struct gfc_expr *if_expr;
@@ -1915,6 +1933,10 @@ typedef struct gfc_namespace
 
   /* Set to 1 if there are any calls to procedures with implicit interface.  */
   unsigned implicit_interface_calls:1;
+
+  /* OpenMP requires. */
+  unsigned omp_requires:6;
+  unsigned omp_target_seen:1;
 }
 gfc_namespace;
 
@@ -2645,7 +2667,8 @@

Re: [PATCH] [RFC] vect: Fix infinite loop while determining peeling amount

2020-07-29 Thread Richard Biener via Gcc-patches

On Wed, Jul 29, 2020 at 9:49 AM Stefan Schulze Frielinghaus
 wrote:
>
> On Wed, Jul 29, 2020 at 09:11:12AM +0200, Richard Biener wrote:
> > On Tue, Jul 28, 2020 at 5:36 PM Stefan Schulze Frielinghaus
> >  wrote:
> > >
> > > On Tue, Jul 28, 2020 at 08:55:57AM +0200, Richard Biener wrote:
> > > > On Mon, Jul 27, 2020 at 4:20 PM Stefan Schulze Frielinghaus
> > > >  wrote:
> > > > >
> > > > > On Mon, Jul 27, 2020 at 12:29:11PM +0200, Richard Biener wrote:
> > > > > > On Mon, Jul 27, 2020 at 11:45 AM Richard Sandiford
> > > > > >  wrote:
> > > > > > >
> > > > > > > Richard Biener  writes:
> > > > > > > > On Mon, Jul 27, 2020 at 11:09 AM Richard Sandiford
> > > > > > > >  wrote:
> > > > > > > >>
> > > > > > > >> Richard Biener via Gcc-patches  
> > > > > > > >> writes:
> > > > > > > >> > On Wed, Jul 22, 2020 at 5:18 PM Stefan Schulze Frielinghaus 
> > > > > > > >> > via
> > > > > > > >> > Gcc-patches  wrote:
> > > > > > > >> >>
> > > > > > > >> >> This is a follow up to commit 5c9669a0e6c respectively 
> > > > > > > >> >> discussion
> > > > > > > >> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549132.html
> > > > > > > >> >>
> > > > > > > >> >> In case that an alignment constraint is less than the size 
> > > > > > > >> >> of a
> > > > > > > >> >> corresponding scalar type, ensure that we advance at least 
> > > > > > > >> >> by one
> > > > > > > >> >> iteration.  For example, on s390x we have for a long double 
> > > > > > > >> >> an alignment
> > > > > > > >> >> constraint of 8 bytes whereas the size is 16 bytes.  
> > > > > > > >> >> Therefore,
> > > > > > > >> >> TARGET_ALIGN / DR_SIZE equals zero resulting in an infinite 
> > > > > > > >> >> loop which
> > > > > > > >> >> can be reproduced by the following MWE:
> > > > > > > >> >
> > > > > > > >> > But we guard this case with vector_alignment_reachable_p, so 
> > > > > > > >> > we shouldn't
> > > > > > > >> > have ended up here and the patch looks bogus.
> > > > > > > >>
> > > > > > > >> The above sounds like it ought to count as reachable alignment 
> > > > > > > >> though.
> > > > > > > >> If a type requires a lower alignment than its size, then 
> > > > > > > >> that's even
> > > > > > > >> more easily reachable than a type that requires the same 
> > > > > > > >> alignment as
> > > > > > > >> the size.  I guess at one extreme, a target alignment of 1 is 
> > > > > > > >> always
> > > > > > > >> reachable.
> > > > > > > >
> > > > > > > > Well, if the element alignment is 8 but its size is 16 then 
> > > > > > > > when presumably
> > > > > > > > the desired vector alignment is a multiple of 16 we can never 
> > > > > > > > reach it.
> > > > > > > > Isn't this the case here?
> > > > > > >
> > > > > > > If the desired vector alignment (TARGET_ALIGN) is a multiple of 
> > > > > > > 16 then
> > > > > > > TARGET_ALIGN / DR_SIZE will be nonzero and the problem the patch 
> > > > > > > is
> > > > > > > fixing wouldn't occur.  I agree that we might never be able to 
> > > > > > > reach
> > > > > > > that alignment if the pointer starts out misaligned by 8 bytes.
> > > > > > >
> > > > > > > But I think that's why it makes sense for the target to only ask
> > > > > > > for 8-byte alignment for vectors too, if it can cope with it.  
> > > > > > > 8-byte
> > > > > > > alignment should always be achievable if the scalars are 
> > > > > > > ABI-aligned.
> > > > > > > And if the target does ask for only 8-byte alignment, 
> > > > > > > TARGET_ALIGN /
> > > > > > > DR_SIZE would be zero and the loop would never progress, which is 
> > > > > > > the
> > > > > > > problem that the patch is fixing.
> > > > > > >
> > > > > > > It would even make sense for the target to ask for 1-byte 
> > > > > > > alignment,
> > > > > > > if the target doesn't care about alignment at all.
> > > > > >
> > > > > > Hmm, OK.  Guess I still think we should detect this somewhere upward
> > > > > > and avoid this peeling compute at all.  Somehow.
> > > > >
> > > > > I've been playing around with another solution which works for me by
> > > > > changing vector_alignment_reachable_p to return also false if the
> > > > > alignment requirements are already satisfied, i.e., by adding:
> > > > >
> > > > > if (known_alignment_for_access_p (dr_info) && aligned_access_p 
> > > > > (dr_info))
> > > > >   return false;
> > > >
> > > > That sounds wrong, instead ...
> > >
> > > Can you elaborate on that?  A similar test exists for predicate
> > > vector_alignment_reachable_p where the second conjunct is the same but
> > > negated in order to test for the case where a misalignment is known:
> > > https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/tree-vect-data-refs.c;h=e35a215e042478d11d6545f1f829d816d0c3620f;hb=refs/heads/master#l1263
> > > Therefore, I'm wondering why the non-negated case should be wrong.
> > >
> > > > > Though, I'm not entirely sure whether this makes it better or not.
> > > > > Strictly speaking if the alignment was reachable before peeling, then
> > > > > reaching alignment with pe

Re: [PATCH v3] genemit.c (main): split insn-emit.c for compiling parallelly

2020-07-29 Thread Jojo R

在 2020年7月24日 +0800 PM4:12，Rainer Orth ，写道：
> Jojo R  writes:
>
> > gcc/ChangeLog:
> >
> > * genemit.c (main): Print 'split line'.
> > * Makefile.in (insn-emit.c): Define split count and file
> [...]
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 2ba76656dbf..75841e49127 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1253,6 +1253,13 @@ ANALYZER_OBJS = \
> > # We put the *-match.o and insn-*.o files first so that a parallel make
> > # will build them sooner, because they are large and otherwise tend to be
> > # the last objects to finish building.
> > +
> > +insn-generated-split-num = $(shell grep -c ^processor /proc/cpuinfo)
>
> This is highly unportable, probably Linux-only. It certainly doesn't
> exist on at least Solaris and macOS. Maybe gnulib has something here.
>

Thank you and it’s fixed in the next version patch
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH] [RFC] vect: Fix infinite loop while determining peeling amount

2020-07-29 Thread Stefan Schulze Frielinghaus via Gcc-patches

On Wed, Jul 29, 2020 at 09:11:12AM +0200, Richard Biener wrote:
> On Tue, Jul 28, 2020 at 5:36 PM Stefan Schulze Frielinghaus
>  wrote:
> >
> > On Tue, Jul 28, 2020 at 08:55:57AM +0200, Richard Biener wrote:
> > > On Mon, Jul 27, 2020 at 4:20 PM Stefan Schulze Frielinghaus
> > >  wrote:
> > > >
> > > > On Mon, Jul 27, 2020 at 12:29:11PM +0200, Richard Biener wrote:
> > > > > On Mon, Jul 27, 2020 at 11:45 AM Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Richard Biener  writes:
> > > > > > > On Mon, Jul 27, 2020 at 11:09 AM Richard Sandiford
> > > > > > >  wrote:
> > > > > > >>
> > > > > > >> Richard Biener via Gcc-patches  writes:
> > > > > > >> > On Wed, Jul 22, 2020 at 5:18 PM Stefan Schulze Frielinghaus via
> > > > > > >> > Gcc-patches  wrote:
> > > > > > >> >>
> > > > > > >> >> This is a follow up to commit 5c9669a0e6c respectively 
> > > > > > >> >> discussion
> > > > > > >> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549132.html
> > > > > > >> >>
> > > > > > >> >> In case that an alignment constraint is less than the size of 
> > > > > > >> >> a
> > > > > > >> >> corresponding scalar type, ensure that we advance at least by 
> > > > > > >> >> one
> > > > > > >> >> iteration.  For example, on s390x we have for a long double 
> > > > > > >> >> an alignment
> > > > > > >> >> constraint of 8 bytes whereas the size is 16 bytes.  
> > > > > > >> >> Therefore,
> > > > > > >> >> TARGET_ALIGN / DR_SIZE equals zero resulting in an infinite 
> > > > > > >> >> loop which
> > > > > > >> >> can be reproduced by the following MWE:
> > > > > > >> >
> > > > > > >> > But we guard this case with vector_alignment_reachable_p, so 
> > > > > > >> > we shouldn't
> > > > > > >> > have ended up here and the patch looks bogus.
> > > > > > >>
> > > > > > >> The above sounds like it ought to count as reachable alignment 
> > > > > > >> though.
> > > > > > >> If a type requires a lower alignment than its size, then that's 
> > > > > > >> even
> > > > > > >> more easily reachable than a type that requires the same 
> > > > > > >> alignment as
> > > > > > >> the size.  I guess at one extreme, a target alignment of 1 is 
> > > > > > >> always
> > > > > > >> reachable.
> > > > > > >
> > > > > > > Well, if the element alignment is 8 but its size is 16 then when 
> > > > > > > presumably
> > > > > > > the desired vector alignment is a multiple of 16 we can never 
> > > > > > > reach it.
> > > > > > > Isn't this the case here?
> > > > > >
> > > > > > If the desired vector alignment (TARGET_ALIGN) is a multiple of 16 
> > > > > > then
> > > > > > TARGET_ALIGN / DR_SIZE will be nonzero and the problem the patch is
> > > > > > fixing wouldn't occur.  I agree that we might never be able to reach
> > > > > > that alignment if the pointer starts out misaligned by 8 bytes.
> > > > > >
> > > > > > But I think that's why it makes sense for the target to only ask
> > > > > > for 8-byte alignment for vectors too, if it can cope with it.  
> > > > > > 8-byte
> > > > > > alignment should always be achievable if the scalars are 
> > > > > > ABI-aligned.
> > > > > > And if the target does ask for only 8-byte alignment, TARGET_ALIGN /
> > > > > > DR_SIZE would be zero and the loop would never progress, which is 
> > > > > > the
> > > > > > problem that the patch is fixing.
> > > > > >
> > > > > > It would even make sense for the target to ask for 1-byte alignment,
> > > > > > if the target doesn't care about alignment at all.
> > > > >
> > > > > Hmm, OK.  Guess I still think we should detect this somewhere upward
> > > > > and avoid this peeling compute at all.  Somehow.
> > > >
> > > > I've been playing around with another solution which works for me by
> > > > changing vector_alignment_reachable_p to return also false if the
> > > > alignment requirements are already satisfied, i.e., by adding:
> > > >
> > > > if (known_alignment_for_access_p (dr_info) && aligned_access_p 
> > > > (dr_info))
> > > >   return false;
> > >
> > > That sounds wrong, instead ...
> >
> > Can you elaborate on that?  A similar test exists for predicate
> > vector_alignment_reachable_p where the second conjunct is the same but
> > negated in order to test for the case where a misalignment is known:
> > https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/tree-vect-data-refs.c;h=e35a215e042478d11d6545f1f829d816d0c3620f;hb=refs/heads/master#l1263
> > Therefore, I'm wondering why the non-negated case should be wrong.
> >
> > > > Though, I'm not entirely sure whether this makes it better or not.
> > > > Strictly speaking if the alignment was reachable before peeling, then
> > > > reaching alignment with peeling is also possible but probably not what
> > > > was intended.  So I guess returning false in this case is sensible.  Any
> > > > comments?
> > >
> > > ... why is the DR considered for peeling at all?  If it is already
> > > aligned there's
> > > no point to do that.
> >
> > Isn't the whole point of vector_alignment_reachable_p to check DRs in
> > order to

Re: [PATCH] [RFC] vect: Fix infinite loop while determining peeling amount

2020-07-29 Thread Richard Biener via Gcc-patches

On Tue, Jul 28, 2020 at 5:36 PM Stefan Schulze Frielinghaus
 wrote:
>
> On Tue, Jul 28, 2020 at 08:55:57AM +0200, Richard Biener wrote:
> > On Mon, Jul 27, 2020 at 4:20 PM Stefan Schulze Frielinghaus
> >  wrote:
> > >
> > > On Mon, Jul 27, 2020 at 12:29:11PM +0200, Richard Biener wrote:
> > > > On Mon, Jul 27, 2020 at 11:45 AM Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Richard Biener  writes:
> > > > > > On Mon, Jul 27, 2020 at 11:09 AM Richard Sandiford
> > > > > >  wrote:
> > > > > >>
> > > > > >> Richard Biener via Gcc-patches  writes:
> > > > > >> > On Wed, Jul 22, 2020 at 5:18 PM Stefan Schulze Frielinghaus via
> > > > > >> > Gcc-patches  wrote:
> > > > > >> >>
> > > > > >> >> This is a follow up to commit 5c9669a0e6c respectively 
> > > > > >> >> discussion
> > > > > >> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549132.html
> > > > > >> >>
> > > > > >> >> In case that an alignment constraint is less than the size of a
> > > > > >> >> corresponding scalar type, ensure that we advance at least by 
> > > > > >> >> one
> > > > > >> >> iteration.  For example, on s390x we have for a long double an 
> > > > > >> >> alignment
> > > > > >> >> constraint of 8 bytes whereas the size is 16 bytes.  Therefore,
> > > > > >> >> TARGET_ALIGN / DR_SIZE equals zero resulting in an infinite 
> > > > > >> >> loop which
> > > > > >> >> can be reproduced by the following MWE:
> > > > > >> >
> > > > > >> > But we guard this case with vector_alignment_reachable_p, so we 
> > > > > >> > shouldn't
> > > > > >> > have ended up here and the patch looks bogus.
> > > > > >>
> > > > > >> The above sounds like it ought to count as reachable alignment 
> > > > > >> though.
> > > > > >> If a type requires a lower alignment than its size, then that's 
> > > > > >> even
> > > > > >> more easily reachable than a type that requires the same alignment 
> > > > > >> as
> > > > > >> the size.  I guess at one extreme, a target alignment of 1 is 
> > > > > >> always
> > > > > >> reachable.
> > > > > >
> > > > > > Well, if the element alignment is 8 but its size is 16 then when 
> > > > > > presumably
> > > > > > the desired vector alignment is a multiple of 16 we can never reach 
> > > > > > it.
> > > > > > Isn't this the case here?
> > > > >
> > > > > If the desired vector alignment (TARGET_ALIGN) is a multiple of 16 
> > > > > then
> > > > > TARGET_ALIGN / DR_SIZE will be nonzero and the problem the patch is
> > > > > fixing wouldn't occur.  I agree that we might never be able to reach
> > > > > that alignment if the pointer starts out misaligned by 8 bytes.
> > > > >
> > > > > But I think that's why it makes sense for the target to only ask
> > > > > for 8-byte alignment for vectors too, if it can cope with it.  8-byte
> > > > > alignment should always be achievable if the scalars are ABI-aligned.
> > > > > And if the target does ask for only 8-byte alignment, TARGET_ALIGN /
> > > > > DR_SIZE would be zero and the loop would never progress, which is the
> > > > > problem that the patch is fixing.
> > > > >
> > > > > It would even make sense for the target to ask for 1-byte alignment,
> > > > > if the target doesn't care about alignment at all.
> > > >
> > > > Hmm, OK.  Guess I still think we should detect this somewhere upward
> > > > and avoid this peeling compute at all.  Somehow.
> > >
> > > I've been playing around with another solution which works for me by
> > > changing vector_alignment_reachable_p to return also false if the
> > > alignment requirements are already satisfied, i.e., by adding:
> > >
> > > if (known_alignment_for_access_p (dr_info) && aligned_access_p (dr_info))
> > >   return false;
> >
> > That sounds wrong, instead ...
>
> Can you elaborate on that?  A similar test exists for predicate
> vector_alignment_reachable_p where the second conjunct is the same but
> negated in order to test for the case where a misalignment is known:
> https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/tree-vect-data-refs.c;h=e35a215e042478d11d6545f1f829d816d0c3620f;hb=refs/heads/master#l1263
> Therefore, I'm wondering why the non-negated case should be wrong.
>
> > > Though, I'm not entirely sure whether this makes it better or not.
> > > Strictly speaking if the alignment was reachable before peeling, then
> > > reaching alignment with peeling is also possible but probably not what
> > > was intended.  So I guess returning false in this case is sensible.  Any
> > > comments?
> >
> > ... why is the DR considered for peeling at all?  If it is already
> > aligned there's
> > no point to do that.
>
> Isn't the whole point of vector_alignment_reachable_p to check DRs in
> order to decide whether peeling should be done or not?  At least this is
> my intuition and the reason why I was suggesting to return false in case
> it is aligned.

Doh, you are right - I confused the function to be a mere wrapper
around the VECTOR_ALIGNMENT_REACHABLE target hook.  But
yes, it's exactly what you say.  But with your suggested extra check

70 matches

Mail list logo