Re: make sincos take type from intrinsic formal, not from result assignment

2020-10-05 Thread Richard Biener via Gcc-patches
On October 6, 2020 3:15:02 AM GMT+02:00, Alexandre Oliva wrote:
>
>This is a first step towards enabling the sincos optimization in Ada.
>
>The issue this patch solves is that sincos takes the type to be looked
>up with mathfn_built_in from variables or temporaries in which results
>of sin and cos are stored.  In Ada, sin and cos are declared in an
>internal aux package, with uses thereof in a standard generic package,
>which ensures that the types are not what mathfn_built_in expects.

But are they not compatible? 

Richard. 

>Taking the type from the intrinsic's formal parameter, as in the
>patch, ensures we get the type associated with the intrinsics,
>regardless of the types used to declare and import them, so the lookup
>of the CEXPI intrinsic for the same type finds it.
>
>
>for  gcc/ChangeLog
>
>   * tree-ssa-math-opts.c (execute_cse_sincos_1): Take the type
>   for the cexpi/sincos intrinsic interface from formals of other
>   intrinsics.
>---
> gcc/tree-ssa-math-opts.c |   26 ++
> 1 file changed, 22 insertions(+), 4 deletions(-)
>
>diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
>index 8423caa..31fd241 100644
>--- a/gcc/tree-ssa-math-opts.c
>+++ b/gcc/tree-ssa-math-opts.c
>@@ -1139,7 +1139,7 @@ execute_cse_sincos_1 (tree name)
> {
>   gimple_stmt_iterator gsi;
>   imm_use_iterator use_iter;
>-  tree fndecl, res, type;
>+  tree fndecl = NULL_TREE, res, type = NULL_TREE;
>   gimple *def_stmt, *use_stmt, *stmt;
>   int seen_cos = 0, seen_sin = 0, seen_cexpi = 0;
>   auto_vec stmts;
>@@ -1147,7 +1147,6 @@ execute_cse_sincos_1 (tree name)
>   int i;
>   bool cfg_changed = false;
> 
>-  type = TREE_TYPE (name);
>   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
> {
>   if (gimple_code (use_stmt) != GIMPLE_CALL
>@@ -1169,15 +1168,34 @@ execute_cse_sincos_1 (tree name)
> break;
> 
>   default:;
>+continue;
>   }
>-}
> 
>+  tree t = TREE_VALUE (TYPE_ARG_TYPES (gimple_call_fntype (use_stmt)));
>+  if (!type)
>+  type = t;
>+  else if (t != type)
>+  {
>+if (!tree_nop_conversion_p (type, t))
>+  return false;
>+/* If there is more than one type to choose from, prefer one
>+   that has a CEXPI builtin.  */
>+else if (!fndecl
>+ && (fndecl = mathfn_built_in (t, BUILT_IN_CEXPI)))
>+  type = t;
>+  }
>+}
>   if (seen_cos + seen_sin + seen_cexpi <= 1)
> return false;
> 
>+  if (type != TREE_TYPE (name)
>+  && !tree_nop_conversion_p (type, TREE_TYPE (name)))
>+return false;
>+
> /* Simply insert cexpi at the beginning of top_bb but not earlier than
>  the name def statement.  */
>-  fndecl = mathfn_built_in (type, BUILT_IN_CEXPI);
>+  if (!fndecl)
>+fndecl = mathfn_built_in (type, BUILT_IN_CEXPI);
>   if (!fndecl)
> return false;
>   stmt = gimple_build_call (fndecl, 1, name);



Re: [PATCH] c++: Verify 'this' of NS member functions in constexpr [PR97230]

2020-10-05 Thread Jason Merrill via Gcc-patches

On 10/1/20 1:08 PM, Marek Polacek wrote:

This PR points out that when we're invoking a non-static member function
on a null instance during constant evaluation, we should reject.
cxx_eval_call_expression calls cxx_bind_parameters_in_call which
evaluates function arguments, but it won't detect problems like these.

Well, ok, so use integer_zerop to detect a null 'this'.  This also
detects member calls on a variable whose lifetime has ended, because
check_return_expr creates an artificial nullptr:
10195   else if (!processing_template_decl
10196&& maybe_warn_about_returning_address_of_local (retval, 
loc)
10197&& INDIRECT_TYPE_P (valtype))
10198 retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
10199  build_zero_cst (TREE_TYPE (retval)));
It would be great if we could somehow distinguish between those two
cases, but experiments with setting TREE_THIS_VOLATILE on the zero
didn't work, so I left it be.

But by the same token, we should detect out-of-bounds accesses.  For
this I'm (ab)using eval_and_check_array_index so that I don't have
to reimplement bounds checking yet again.  But this only works for
ARRAY_REFs, so won't detect

   X x;
   (&x)[0].foo(); // ok
   (&x)[1].foo(); // bad

so I've added a special handling of POINTER_PLUS_EXPRs.

While here, we should also detect using an inactive union member.  For
that, I'm using cxx_eval_component_reference.
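
A minimal sketch (hypothetical, not the new testcase; needs -std=c++14 or
later) of the three kinds of calls this is meant to reject during constant
evaluation:

  // Hypothetical sketch of the calls being rejected.
  struct X { constexpr int foo () const { return 42; } };
  union U { int i; X x; };

  constexpr int call (int which)
  {
    X x{};
    U u{ 1 };                          // activates u.i, so u.x is inactive
    X *p = which == 0 ? nullptr : &x;
    switch (which)
      {
      case 0: return p->foo ();        // member call on a null pointer
      case 1: return (&x)[1].foo ();   // out-of-bounds access through 'this'
      case 2: return u.x.foo ();       // inactive member of a union
      default: return x.foo ();        // fine
      }
  }

  static_assert (call (3) == 42, "valid call is still a constant expression");
  // static_assert (call (0) == 42);   // now rejected: null 'this'
  // static_assert (call (1) == 42);   // now rejected: out of bounds
  // static_assert (call (2) == 42);   // now rejected: inactive union member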

Does this approach seem sensible?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97230
* constexpr.c (eval_and_check_array_index): Forward declare.
(cxx_eval_component_reference): Likewise.
(cxx_eval_call_expression): Verify the 'this' pointer for
non-static member functions.

gcc/testsuite/ChangeLog:

PR c++/97230
* g++.dg/cpp0x/constexpr-member-fn1.C: New test.
---
  gcc/cp/constexpr.c| 72 ++-
  .../g++.dg/cpp0x/constexpr-member-fn1.C   | 44 
  2 files changed, 115 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-member-fn1.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index a118f8a810b..f62f37ce384 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -2181,6 +2181,11 @@ cxx_eval_thunk_call (const constexpr_ctx *ctx, tree t, 
tree thunk_fndecl,
   non_constant_p, overflow_p);
  }
  
+static tree eval_and_check_array_index (const constexpr_ctx *, tree, bool,

+   bool *, bool *);
+static tree cxx_eval_component_reference (const constexpr_ctx *, tree,
+ bool, bool *, bool *);
+
  /* Subroutine of cxx_eval_constant_expression.
 Evaluate the call expression tree T in the context of OLD_CALL expression
 evaluation.  */
@@ -2467,6 +2472,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
if (*non_constant_p)
  return t;
  
+  tree result = NULL_TREE;

depth_ok = push_cx_call_context (t);
  
/* Remember the object we are constructing.  */

@@ -2496,8 +2502,72 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
new_obj = NULL_TREE;
}
  }
+   /* Verify that the object we're invoking the function on is sane.  */
+  else if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fun)
+  /* maybe_add_lambda_conv_op creates a null 'this' pointer.  */
+  && !LAMBDA_TYPE_P (CP_DECL_CONTEXT (fun)))


Let's look at lambda_static_thunk_p of ctx->call->fundef->decl instead; 
we don't want to allow calling a lambda op() with a null object from 
other contexts.



+{
+  tree thisarg = TREE_VEC_ELT (new_call.bindings, 0);
+  if (integer_zerop (thisarg))
+   {
+ if (!ctx->quiet)
+   error_at (loc, "member call on null pointer is not allowed "
+ "in a constant expression");
+ *non_constant_p = true;
+ result = error_mark_node;
+   }
+  else
+   {
+ STRIP_NOPS (thisarg);
+ if (TREE_CODE (thisarg) == ADDR_EXPR)
+   thisarg = TREE_OPERAND (thisarg, 0);
+ /* Detect out-of-bounds accesses.  */
+ if (TREE_CODE (thisarg) == ARRAY_REF)
+   {
+ eval_and_check_array_index (ctx, thisarg, /*allow_one_past*/false,
+ non_constant_p, overflow_p);
+ if (*non_constant_p)
+   result = error_mark_node;
+   }
+ /* Detect using an inactive member of a union.  */
+ else if (TREE_CODE (thisarg) == COMPONENT_REF)
+   {
+ cxx_eval_component_reference (ctx, thisarg, /*lval*/false,
+   non_constant_p, overflow_p);
+ if (*non_constant_p)
+   result = error_mark_node;
+   }
+ /* Detect other invalid accesses like
  
-  

Re: [committed] libstdc++: Add deduction guide for std::ranges::join_view [LWG 3474]

2020-10-05 Thread Tim Song via Gcc-patches
I thought LWG approved the other option in the PR (changing views::join to
not use CTAD)?

On Mon, Aug 24, 2020 at 10:22 AM Jonathan Wakely via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> This implements the proposed resolution for LWG 3474.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (join_view): Add deduction guide (LWG 3474).
> * testsuite/std/ranges/adaptors/join_lwg3474.cc: New test.
>
> Tested powerpc64le-linux. Committed to trunk.
>
>


Re: [patch] convert -Wrestrict pass to ranger

2020-10-05 Thread Andrew MacLeod via Gcc-patches

On 10/5/20 4:16 PM, Martin Sebor wrote:

On 10/5/20 8:50 AM, Aldy Hernandez via Gcc-patches wrote:

[Martin, as the original author of this pass, do you have any concerns?]



@@ -1270,7 +1271,21 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

    enum value_range_kind range_type;

    if (integral)
-    range_type = determine_value_range (exp, , );
+    {
+  if (query)
+    {
+  value_range vr;
+  gcc_assert (TREE_CODE (exp) == SSA_NAME
+  || TREE_CODE (exp) == INTEGER_CST);
+  gcc_assert (query->range_of_expr (vr, exp, stmt));


Will the call to the function in the assert above not be eliminated
if the assert is turned into a no-op?  If it can't happen (it looks
like it shouldn't anymore), it would still be nice to break it out
of the macro.  Those of us used to the semantics of the C standard
assert macro might wonder.

The contents of gcc_assert are not eliminated in a release compiler; only
the actual check of the return value is.  The body of the assert will
continue to be executed.


This exists because if we were to try to check the return value, we'd 
have to do something like

  bool ret = range_of_expr (...);
  gcc_checking_assert (ret);

and when the checking assert goes away, we're left with an unused
variable 'ret' warning.  The gcc_assert () resolves that issue.
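
A standalone illustration of the unused-variable problem, using the standard
assert macro as a stand-in (the real gcc_assert/gcc_checking_assert macros in
gcc/system.h differ in detail; notably, gcc_assert keeps evaluating its
argument in release builds, as described above):

  // Toy illustration; 'compute_range' stands in for query->range_of_expr (...).
  #include <cassert>

  static bool compute_range () { return true; }

  void caller ()
  {
    // Split form: with assertions disabled (-DNDEBUG here, disabled checking
    // in GCC's case) nothing reads 'ok', which is the unused-variable
    // warning mentioned above.
    bool ok = compute_range ();
    assert (ok);

    // Folded form: this is the shape the patch uses, keeping the call and
    // the check on one line.
    assert (compute_range ());
  }

  int main () { caller (); return 0; }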


I'm not a huge fan of them, but they do serve a purpose and seem better 
than the alternatives :-P


The first assert should, however, be a gcc_checking_assert since it's just
a check, and then it will go away in a release compiler.



-/* Execute the pass for function FUN, walking in dominator order.  */
-
  unsigned
  pass_wrestrict::execute (function *fun)
  {
-  calculate_dominance_info (CDI_DOMINATORS);
-
-  wrestrict_dom_walker walker;
-  walker.walk (ENTRY_BLOCK_PTR_FOR_FN (fun));
+  gimple_ranger ranger;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    wrestrict_walk (, bb);

    return 0;
  }
@@ -159,11 +144,14 @@ public:
   only the destination reference is.  */
    bool strbounded_p;

-  builtin_memref (tree, tree);
+  builtin_memref (range_query *, gimple *, tree, tree);

    tree offset_out_of_bounds (int, offset_int[3]) const;

  private:
+  gimple *stmt;
+
+  range_query *query;


Also please add a comment above STMT to make it clear it's the call
statement to the builtin.

For QUERY, I'm not sure adding a member to every class that needs
to compute ranges is the right design.  At the same time, adding
an argument to every function that computes ranges isn't great
either.  It seems like there should be one shared ranger instance
that could be used by all clients/passes as well as utility
functions called from them.  It could be a global object set/pushed
by each pass when it starts and popped when it ends, and managed by
the ranger API.  Say a static member function:

  range_query* range_query::instance ();
  range_query* range_query::push_instance (range_query*);
  range_query* range_query::pop_instance ();
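
A rough sketch of the push/pop idea (all names hypothetical; this is not an
existing GCC API):

  // All names hypothetical; not an existing GCC API.
  #include <cassert>
  #include <vector>

  class range_query { /* range_of_expr, etc.  */ };

  class query_stack
  {
    static std::vector<range_query *> s_stack;

  public:
    static range_query *instance ()
    { return s_stack.empty () ? nullptr : s_stack.back (); }

    static void push_instance (range_query *q)
    { s_stack.push_back (q); }

    static range_query *pop_instance ()
    {
      assert (!s_stack.empty ());
      range_query *q = s_stack.back ();
      s_stack.pop_back ();
      return q;                // caller deletes or reuses as appropriate
    }
  };

  std::vector<range_query *> query_stack::s_stack;

  // A pass would push its ranger in execute() and pop it before returning;
  // utility code like get_size_range could then consult
  // query_stack::instance() without a new parameter or member.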

As some background, when I wrote the builtin_access class
I envisioned using it as a general-purpose class in other similar
contexts.  That hasn't quite happened yet but there is a class
kind of like it that might eventually end up taking the place of
builtin_access.  It's access_ref in builtins.h.  And while neither
class creates a lot of instances so far, I'm about to post a patch
that does create one or two instances of access_ref per SSA_NAME
of pointer type.  Having an extra member in each instance just
to gain access to an API would be excessive.

I'm not saying all this as an objection to the change but more
as something to think about going forward.


Long term, I would expect we might have some sort of general access,
probably through cfun, so any pass or routine would just ask for

    RANGE_INFO (cfun)->range_of_expr()

The default would be a general value_range implementation which probably
implements something like determine_value_range_1 (), and if a pass
wants to use a ranger, it could register one and delete it when it's
done.  And it would just work for everyone everywhere.


But we're not there yet, and we haven't worked out the details :-)  For
the moment, passes need to figure out for themselves how to access the
ranger they create if they want one.  They had to manage a
range_analyzer before if they used anything other than global ranges, so
that is nothing new.





make sincos take type from intrinsic formal, not from result assignment

2020-10-05 Thread Alexandre Oliva


This is a first step towards enabling the sincos optimization in Ada.

The issue this patch solves is that sincos takes the type to be looked
up with mathfn_built_in from variables or temporaries in which results
of sin and cos are stored.  In Ada, sin and cos are declared in an
internal aux package, with uses thereof in a standard generic package,
which ensures that the types are not what mathfn_built_in expects.

Taking the type from the intrinsic's formal parameter, as in the
patch, ensures we get the type associated with the intrinsics,
regardless of the types used to declare and import them, so the lookup
of the CEXPI intrinsic for the same type finds it.
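
To make the transformation concrete, here is a minimal, self-contained
sketch (not taken from the patch or the testsuite) of the pattern
execute_cse_sincos_1 looks for: sin and cos applied to the same value, which
the pass may rewrite into a single cexpi call whose real and imaginary parts
feed both uses.

  #include <cmath>
  #include <cstdio>

  double f (double x)
  {
    double s = std::sin (x);   // both calls take the same argument 'x',
    double c = std::cos (x);   // so the pair is a candidate for the rewrite
    return s + c;
  }

  int main ()
  {
    std::printf ("%f\n", f (0.5));
    return 0;
  }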


for  gcc/ChangeLog

* tree-ssa-math-opts.c (execute_cse_sincos_1): Take the type
for the cexpi/sincos intrinsic interface from formals of other
intrinsics.
---
 gcc/tree-ssa-math-opts.c |   26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 8423caa..31fd241 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1139,7 +1139,7 @@ execute_cse_sincos_1 (tree name)
 {
   gimple_stmt_iterator gsi;
   imm_use_iterator use_iter;
-  tree fndecl, res, type;
+  tree fndecl = NULL_TREE, res, type = NULL_TREE;
   gimple *def_stmt, *use_stmt, *stmt;
   int seen_cos = 0, seen_sin = 0, seen_cexpi = 0;
   auto_vec stmts;
@@ -1147,7 +1147,6 @@ execute_cse_sincos_1 (tree name)
   int i;
   bool cfg_changed = false;
 
-  type = TREE_TYPE (name);
   FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, name)
 {
   if (gimple_code (use_stmt) != GIMPLE_CALL
@@ -1169,15 +1168,34 @@ execute_cse_sincos_1 (tree name)
  break;
 
default:;
+ continue;
}
-}
 
+  tree t = TREE_VALUE (TYPE_ARG_TYPES (gimple_call_fntype (use_stmt)));
+  if (!type)
+   type = t;
+  else if (t != type)
+   {
+ if (!tree_nop_conversion_p (type, t))
+   return false;
+ /* If there is more than one type to choose from, prefer one
+that has a CEXPI builtin.  */
+ else if (!fndecl
+  && (fndecl = mathfn_built_in (t, BUILT_IN_CEXPI)))
+   type = t;
+   }
+}
   if (seen_cos + seen_sin + seen_cexpi <= 1)
 return false;
 
+  if (type != TREE_TYPE (name)
+  && !tree_nop_conversion_p (type, TREE_TYPE (name)))
+return false;
+
   /* Simply insert cexpi at the beginning of top_bb but not earlier than
  the name def statement.  */
-  fndecl = mathfn_built_in (type, BUILT_IN_CEXPI);
+  if (!fndecl)
+fndecl = mathfn_built_in (type, BUILT_IN_CEXPI);
   if (!fndecl)
 return false;
   stmt = gimple_build_call (fndecl, 1, name);


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Go patch committed: correct file reading logic in Stream_from_file

2020-10-05 Thread Ian Lance Taylor via Gcc-patches
This Go frontend patch by Nikhil Benesch fixes the file reading logic
in the Stream_from_file class.  That class is almost never used, and I
guess nobody noticed these problems.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
500c2690e24054730a2ecf9989720e9d501c9eb1
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 94827406df1..701b2d427e3 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-801c458a562d22260ff176c26d65639dd32c8a90
+d00febdab0535546ccbf1ef634be1f23b09c8b77
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/import.cc b/gcc/go/gofrontend/import.cc
index c63ae24f533..081afefa083 100644
--- a/gcc/go/gofrontend/import.cc
+++ b/gcc/go/gofrontend/import.cc
@@ -1487,7 +1487,7 @@ Stream_from_file::~Stream_from_file()
 bool
 Stream_from_file::do_peek(size_t length, const char** bytes)
 {
-  if (this->data_.length() <= length)
+  if (this->data_.length() >= length)
 {
   *bytes = this->data_.data();
   return true;
@@ -1504,7 +1504,7 @@ Stream_from_file::do_peek(size_t length, const char** 
bytes)
   return false;
 }
 
-  if (lseek(this->fd_, - got, SEEK_CUR) != 0)
+  if (lseek(this->fd_, - got, SEEK_CUR) < 0)
 {
   if (!this->saw_error())
go_fatal_error(Linemap::unknown_location(), "lseek failed: %m");
@@ -1524,7 +1524,7 @@ Stream_from_file::do_peek(size_t length, const char** 
bytes)
 void
 Stream_from_file::do_advance(size_t skip)
 {
-  if (lseek(this->fd_, skip, SEEK_CUR) != 0)
+  if (lseek(this->fd_, skip, SEEK_CUR) < 0)
 {
   if (!this->saw_error())
go_fatal_error(Linemap::unknown_location(), "lseek failed: %m");
@@ -1532,7 +1532,7 @@ Stream_from_file::do_advance(size_t skip)
 }
   if (!this->data_.empty())
 {
-  if (this->data_.length() < skip)
+  if (this->data_.length() > skip)
this->data_.erase(0, skip);
   else
this->data_.clear();
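
For context on the lseek changes above: lseek returns the resulting file
offset on success and (off_t)-1 on error, so comparing against 0
misclassifies any successful seek to a nonzero position as a failure.  A
small standalone demonstration (hypothetical, not part of the patch):

  #include <fcntl.h>
  #include <unistd.h>
  #include <cstdio>

  int main ()
  {
    int fd = open ("lseek-demo.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
      return 1;
    if (write (fd, "hello", 5) != 5)
      return 1;
    off_t pos = lseek (fd, -2, SEEK_CUR);     // back up two bytes
    std::printf ("lseek returned %lld\n", (long long) pos);   // prints 3
    if (pos < 0)                              // correct error check: < 0
      std::perror ("lseek");
    close (fd);
    unlink ("lseek-demo.tmp");
    return 0;
  }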


Re: [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)

2020-10-05 Thread Jonathan Wakely via Gcc-patches

On 06/10/20 00:25 +0100, Jonathan Wakely wrote:

I'm sorry it's taken a year to review this properly. Comments below ...

On 27/09/19 14:18 -0400, Daniel Lemire wrote:

(This is a revised patch proposal. I am revising both the description
and the code itself.)

Even on recent processors, integer division is relatively expensive.
The current implementation of  std::uniform_int_distribution typically
requires two divisions by invocation:

  // downscaling
   const __uctype __uerange = __urange + 1; // __urange can be zero
   const __uctype __scaling = __urngrange / __uerange;
   const __uctype __past = __uerange * __scaling;
   do
 __ret = __uctype(__urng()) - __urngmin;
   while (__ret >= __past);
   __ret /= __scaling;

We can achieve the same algorithmic result with at most one division,
and typically no division at all without requiring more calls to the
random number generator.
This was recently added to Swift (https://github.com/apple/swift/pull/25286)

The main challenge is that we need to be able to compute the "full"
product. E.g., given two 64-bit integers, we want the 128-bit result;
given two 32-bit integers we want the 64-bit result. This is fast on
common processors.
The 128-bit product is not natively supported in C/C++ but can be
achieved with the
__int128 extension when it is available. The patch checks for
__int128 support; when
support is lacking, we fallback on the existing approach which uses
two divisions per
call.

For example, if we replace the above code by the following, we get a substantial
performance boost on skylake microarchitectures. E.g., it can
be twice as fast to shuffle arrays of 1 million elements (e.g., using
the following benchmark: https://github.com/lemire/simple_cpp_shuffle_benchmark )


unsigned __int128 __product = (unsigned
__int128)(__uctype(__urng()) - __urngmin) * uint64_t(__uerange);
uint64_t __lsb = uint64_t(__product);
if(__lsb < __uerange)
{
  uint64_t __threshold = -uint64_t(__uerange) % uint64_t(__uerange);
  while (__lsb < __threshold)
  {
__product = (unsigned __int128)(__uctype(__urng()) -
__urngmin) * (unsigned __int128)(__uerange);
__lsb = uint64_t(__product);
  }
}
__ret = __product >> 64;

Included is a patch that would bring better performance (e.g., 2x gain) to
std::uniform_int_distribution  in some cases. Here are some actual numbers...

With this patch:

std::shuffle(testvalues, testvalues + size, g)  :  7952091
ns total,  7.95 ns per input key

Before this patch:

std::shuffle(testvalues, testvalues + size, g)  :
14954058 ns total,  14.95 ns per input key


Compiler: GNU GCC 8.3 with -O3, hardware: Skylake (i7-6700).

Furthermore, the new algorithm is unbiased, so the randomness of the
result is not affected.

I ran both performance and biases tests with the proposed patch.


This patch proposal was improved following feedback by Jonathan
Wakely. An earlier version used the __uint128_t type, which is widely
supported but not used in the C++ library; instead we now use unsigned
__int128. Furthermore, the previous patch was accidentally broken: it
was not computing the full product since a rhs cast was missing. These
issues are fixed and verified.


After looking at GCC's internals, it looks like __uint128_t is
actually fine to use, even though we never currently use it in the
library. I didn't even know it was supported for C++ mode, sorry!


Reference: Fast Random Integer Generation in an Interval, ACM Transactions on
Modeling and Computer Simulation 29 (1), 2019 https://arxiv.org/abs/1805.10941



Index: libstdc++-v3/include/bits/uniform_int_dist.h
===
--- libstdc++-v3/include/bits/uniform_int_dist.h(revision 276183)
+++ libstdc++-v3/include/bits/uniform_int_dist.h(working copy)
@@ -33,7 +33,8 @@

#include 
#include 
-
+#include 
+#include 
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -239,18 +240,61 @@
  = __uctype(__param.b()) - __uctype(__param.a());

__uctype __ret;
-
-   if (__urngrange > __urange)
+   if (__urngrange > __urange)
  {
-   // downscaling
-   const __uctype __uerange = __urange + 1; // __urange can be zero
-   const __uctype __scaling = __urngrange / __uerange;
-   const __uctype __past = __uerange * __scaling;
-   do
- __ret = __uctype(__urng()) - __urngmin;
-   while (__ret >= __past);
-   __ret /= __scaling;
- }
+   const __uctype __uerange = __urange + 1; // __urange can be zero
+#if _GLIBCXX_USE_INT128 == 1
+if(sizeof(__uctype) == sizeof(uint64_t) and
+  (__urngrange == numeric_limits::max()))
+{
+  // 64-bit case
+  // reference: Fast Random Integer Generation in an Interval
+  // ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
+  

Re: [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)

2020-10-05 Thread Jonathan Wakely via Gcc-patches

I'm sorry it's taken a year to review this properly. Comments below ...

On 27/09/19 14:18 -0400, Daniel Lemire wrote:

(This is a revised patch proposal. I am revising both the description
and the code itself.)

Even on recent processors, integer division is relatively expensive.
The current implementation of  std::uniform_int_distribution typically
requires two divisions by invocation:

   // downscaling
const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
  __ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;

We can achieve the same algorithmic result with at most one division,
and typically no division at all without requiring more calls to the
random number generator.
This was recently added to Swift (https://github.com/apple/swift/pull/25286)

The main challenge is that we need to be able to compute the "full"
product. E.g., given two 64-bit integers, we want the 128-bit result;
given two 32-bit integers we want the 64-bit result. This is fast on
common processors.
The 128-bit product is not natively supported in C/C++ but can be
achieved with the
__int128 extension when it is available. The patch checks for
__int128 support; when
support is lacking, we fallback on the existing approach which uses
two divisions per
call.

For example, if we replace the above code by the following, we get a substantial
performance boost on skylake microarchitectures. E.g., it can
be twice as fast to shuffle arrays of 1 million elements (e.g., using
the following benchmark: https://github.com/lemire/simple_cpp_shuffle_benchmark )


 unsigned __int128 __product = (unsigned
__int128)(__uctype(__urng()) - __urngmin) * uint64_t(__uerange);
 uint64_t __lsb = uint64_t(__product);
 if(__lsb < __uerange)
 {
   uint64_t __threshold = -uint64_t(__uerange) % uint64_t(__uerange);
   while (__lsb < __threshold)
   {
 __product = (unsigned __int128)(__uctype(__urng()) -
__urngmin) * (unsigned __int128)(__uerange);
 __lsb = uint64_t(__product);
   }
 }
 __ret = __product >> 64;
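
For reference, the same rejection method written as a standalone function at
32-bit width (64-bit product), independent of the library internals; the
names here are illustrative only, not the library code:

  #include <cstdint>
  #include <cstdio>
  #include <random>

  std::uint32_t
  bounded (std::mt19937 &gen, std::uint32_t s)        // result in [0, s), s > 0
  {
    std::uint64_t product = std::uint64_t (gen ()) * std::uint64_t (s);
    std::uint32_t low = std::uint32_t (product);
    if (low < s)                                      // possible bias region
      {
        std::uint32_t threshold = std::uint32_t (-s) % s;   // 2^32 mod s
        while (low < threshold)                       // reject and redraw
          {
            product = std::uint64_t (gen ()) * std::uint64_t (s);
            low = std::uint32_t (product);
          }
      }
    return std::uint32_t (product >> 32);             // unbiased result
  }

  int main ()
  {
    std::mt19937 gen (42);
    for (int i = 0; i < 5; ++i)
      std::printf ("%u\n", (unsigned) bounded (gen, 10));
    return 0;
  }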

Included is a patch that would bring better performance (e.g., 2x gain) to
std::uniform_int_distribution  in some cases. Here are some actual numbers...

With this patch:

std::shuffle(testvalues, testvalues + size, g)  :  7952091
ns total,  7.95 ns per input key

Before this patch:

std::shuffle(testvalues, testvalues + size, g)  :
14954058 ns total,  14.95 ns per input key


Compiler: GNU GCC 8.3 with -O3, hardware: Skylake (i7-6700).

Furthermore, the new algorithm is unbiased, so the randomness of the
result is not affected.

I ran both performance and biases tests with the proposed patch.


This patch proposal was improved following feedback by Jonathan
Wakely. An earlier version used the __uint128_t type, which is widely
supported but not used in the C++ library; instead we now use unsigned
__int128. Furthermore, the previous patch was accidentally broken: it
was not computing the full product since a rhs cast was missing. These
issues are fixed and verified.


After looking at GCC's internals, it looks like __uint128_t is
actually fine to use, even though we never currently use it in the
library. I didn't even know it was supported for C++ mode, sorry!


Reference: Fast Random Integer Generation in an Interval, ACM Transactions on
Modeling and Computer Simulation 29 (1), 2019 https://arxiv.org/abs/1805.10941



Index: libstdc++-v3/include/bits/uniform_int_dist.h
===
--- libstdc++-v3/include/bits/uniform_int_dist.h(revision 276183)
+++ libstdc++-v3/include/bits/uniform_int_dist.h(working copy)
@@ -33,7 +33,8 @@

#include 
#include 
-
+#include 
+#include 
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -239,18 +240,61 @@
  = __uctype(__param.b()) - __uctype(__param.a());

__uctype __ret;
-
-   if (__urngrange > __urange)
+   if (__urngrange > __urange)
  {
-   // downscaling
-   const __uctype __uerange = __urange + 1; // __urange can be zero
-   const __uctype __scaling = __urngrange / __uerange;
-   const __uctype __past = __uerange * __scaling;
-   do
- __ret = __uctype(__urng()) - __urngmin;
-   while (__ret >= __past);
-   __ret /= __scaling;
- }
+   const __uctype __uerange = __urange + 1; // __urange can be zero
+#if _GLIBCXX_USE_INT128 == 1
+if(sizeof(__uctype) == sizeof(uint64_t) and
+  (__urngrange == numeric_limits::max()))
+{
+  // 64-bit case
+  // reference: Fast Random Integer Generation in an Interval
+  // ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
+  // 

[PATCH] RFC: add "deallocated_by" attribute for use by analyzer

2020-10-05 Thread David Malcolm via Gcc-patches
This work-in-progress patch generalizes the malloc/free problem-checking
in -fanalyzer so that it can work on arbitrary acquire/release API pairs.

It adds a new __attribute__((deallocated_by(FOO))) that could be used
like this in a library header:

  struct foo;

  extern void foo_release (struct foo *);

  extern struct foo *foo_acquire (void)
__attribute__ ((deallocated_by(foo_release)));

In theory, the analyzer then "knows" these functions are an
acquire/release pair, and can emit diagnostics for leaks, double-frees,
use-after-frees, mismatching deallocations, etc.
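
A hypothetical sketch (not one of the new tests) of the kind of error-path
leak the pairing is meant to catch; it only has an effect with the proposed
attribute applied, and the function names are made up:

  struct foo;

  extern void foo_release (struct foo *);
  extern struct foo *foo_acquire (void)
    __attribute__ ((deallocated_by (foo_release)));

  int use_foo (int flag)
  {
    struct foo *f = foo_acquire ();
    if (!f)
      return -1;
    if (flag)
      return -2;          /* leak: 'f' is never passed to foo_release here */
    foo_release (f);
    return 0;
  }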

My hope was that this would provide a minimal level of markup that would
support library-checking without requiring lots of further markup.
I attempted to use this to detect a memory leak within a Linux
driver (CVE-2019-19078), by adding the attribute to mark these fns:
  extern struct urb *usb_alloc_urb(int iso_packets, gfp_t mem_flags);
  extern void usb_free_urb(struct urb *urb);
where there is a leak of a "urb" on an error-handling path.
Unfortunately I ran into the problem that there are various other fns
that take "struct urb *" and the analyzer conservatively assumes that a
urb passed to them might or might not be freed and thus stops tracking
state for them.

So I don't know how much use this feature would be as-is.
(without either requiring lots of additional attributes for marking
fndecl args as being merely borrowed, or simply assuming that they
are borrowed in the absence of a function body to analyze)

Thoughts?
Dave

gcc/analyzer/ChangeLog:
* region-model-impl-calls.cc
(region_model::impl_deallocation_call): New.
* region-model.cc: Include "attribs.h".
(region_model::on_call_post): Handle fndecls referenced by
__attribute__((deallocated_by(FOO))).
* region-model.h (region_model::impl_deallocation_call): New decl.
* sm-malloc.cc: Include "stringpool.h" and "attribs.h".
(enum wording): Add WORDING_DEALLOCATED.
(malloc_state_machine::custom_api_map_t): New typedef.
(malloc_state_machine::m_custom_apis): New field.
(start_p): New.
(use_after_free::describe_state_change): Handle
WORDING_DEALLOCATED.
(use_after_free::describe_final_event): Likewise.
(malloc_leak::describe_state_change): Only emit "allocated here" on
a start->nonnull transition, rather than on other transitions to
nonnull.
(malloc_state_machine::~malloc_state_machine): New.
(malloc_state_machine::on_stmt): Handle
"__attribute__((deallocated_by(FOO)))", and the special attribute
set on FOO.
(malloc_state_machine::get_or_create_api): New.
(malloc_state_machine::on_allocator_call): Add "returns_nonnull"
param and use it to affect which state to transition to.

gcc/c-family/ChangeLog:
* c-attribs.c (c_common_attribute_table): Add entry for
"deallocated_by".
(matching_deallocator_type_p): New.
(maybe_add_deallocator_attribute): New.
(handle_deallocated_by_attribute): New.

gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Add
"deallocated_by".

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/attr-deallocated_by-1.c: New test.
* gcc.dg/analyzer/attr-deallocated_by-1a.c: New test.
* gcc.dg/analyzer/attr-deallocated_by-2.c: New test.
* gcc.dg/analyzer/attr-deallocated_by-3.c: New test.
* gcc.dg/analyzer/attr-deallocated_by-4.c: New test.
* gcc.dg/analyzer/attr-deallocated_by-CVE-2019-19078-usb-leak.c:
New test.
* gcc.dg/analyzer/attr-deallocated_by-misuses.c: New test.
---
 gcc/analyzer/region-model-impl-calls.cc   |   9 +
 gcc/analyzer/region-model.cc  |  10 +
 gcc/analyzer/region-model.h   |   1 +
 gcc/analyzer/sm-malloc.cc | 107 -
 gcc/c-family/c-attribs.c  | 116 ++
 gcc/doc/extend.texi   |  89 
 .../gcc.dg/analyzer/attr-deallocated_by-1.c   |  74 +++
 .../gcc.dg/analyzer/attr-deallocated_by-1a.c  |  69 ++
 .../gcc.dg/analyzer/attr-deallocated_by-2.c   |  24 ++
 .../gcc.dg/analyzer/attr-deallocated_by-3.c   |  31 +++
 .../gcc.dg/analyzer/attr-deallocated_by-4.c   |  21 ++
 ...r-deallocated_by-CVE-2019-19078-usb-leak.c | 208 ++
 .../analyzer/attr-deallocated_by-misuses.c|  26 +++
 13 files changed, 779 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-deallocated_by-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-deallocated_by-1a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-deallocated_by-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-deallocated_by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/attr-deallocated_by-4.c
 create mode 100644 
gcc/testsuite/gcc.dg/analyzer/attr-deallocated_by-CVE-2019-19078-usb-leak.c
 create mode 

[PATCH] c++: typename in out-of-class member function definitions [PR97297]

2020-10-05 Thread Marek Polacek via Gcc-patches
I was notified that our P0634R3 (Down with typename) implementation has
a flaw: when we have an out-of-class member function definition, we
still required 'typename' for its parameters.  For example here:

  template  struct S {
int simple(T::type);
  };
  template 
  int S::simple(/* typename */T::type) { return 0; }

the 'typename' isn't necessary per [temp.res]/5.2.4.  We have a qualified
name here ("S::simple") so we know it's already been declared so we
can look it up to see if it's a function template or a variable
template.

In this case, the P0634R3 code in cp_parser_direct_declarator wasn't
looking into uninstantiated templates and didn't find the member
function 'simple' -- cp_parser_lookup_name returned a SCOPE_REF which
means that the qualifying scope was dependent.  With this fix, we find
the BASELINK for 'simple', don't clear CP_PARSER_FLAGS_TYPENAME_OPTIONAL
from the flags, and the typename is implicitly assumed.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/97297
* parser.c (cp_parser_direct_declarator): When checking if a
name is a function template declaration for the P0634R3 case,
look in uninstantiated templates too.

gcc/testsuite/ChangeLog:

PR c++/97297
* g++.dg/cpp2a/typename18.C: New test.
---
 gcc/cp/parser.c | 10 --
 gcc/testsuite/g++.dg/cpp2a/typename18.C | 21 +
 2 files changed, 29 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/typename18.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cb4422764ed..2002c05fdb5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -21788,8 +21788,14 @@ cp_parser_direct_declarator (cp_parser* parser,
   templates, assume S::p to name a type.  Otherwise,
   don't.  */
tree decl
- = cp_parser_lookup_name_simple (parser, unqualified_name,
- token->location);
+ = cp_parser_lookup_name (parser, unqualified_name,
+  none_type,
+  /*is_template=*/false,
+  /*is_namespace=*/false,
+  /*check_dependency=*/false,
+  /*ambiguous_decls=*/NULL,
+  token->location);
+
if (!is_overloaded_fn (decl)
/* Allow
   template
diff --git a/gcc/testsuite/g++.dg/cpp2a/typename18.C 
b/gcc/testsuite/g++.dg/cpp2a/typename18.C
new file mode 100644
index 000..99468661491
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/typename18.C
@@ -0,0 +1,21 @@
+// PR c++/97297
+// { dg-do compile { target c++20 } }
+
+template 
+struct S {
+int simple(T::type);
+
+template 
+int member(U::type);
+};
+
+template 
+int S::simple(T::type) {
+return 1;
+}
+
+template 
+template 
+int S::member(U::type) {
+return 2;
+}

base-commit: ea6da7f50fe2adc3a09fc10a3f437902c40ebff9
-- 
2.26.2



[committed] libstdc++: Reduce uses of std::numeric_limits

2020-10-05 Thread Jonathan Wakely via Gcc-patches
This avoids unnecessary instantiations of std::numeric_limits or
inclusion of <limits> when a more lightweight alternative would work.
Some uses can be replaced with __gnu_cxx::__int_traits and some can just
use size_t(-1) directly where SIZE_MAX is needed.
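
A quick sanity check of the size_t(-1) idiom used here, assuming only the
standard headers named below:

  #include <cstddef>
  #include <cstdint>
  #include <cstdio>

  int main ()
  {
    static_assert (std::size_t (-1) == SIZE_MAX,
                   "unsigned wrap-around makes size_t(-1) equal SIZE_MAX");
    std::printf ("%zu\n", std::size_t (-1));
    return 0;
  }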

libstdc++-v3/ChangeLog:

* include/bits/regex.h: Use __int_traits instead of
std::numeric_limits.
* include/bits/uniform_int_dist.h: Use __int_traits::__max
instead of std::numeric_limits::max().
* include/bits/hashtable_policy.h: Use size_t(-1) instead of
std::numeric_limits::max().
* include/std/regex: Include .
* include/std/string_view: Use typedef for __int_traits.
* src/c++11/hashtable_c++0x.cc: Use size_t(-1) instead of
std::numeric_limits::max().
* testsuite/std/ranges/iota/96042.cc: Include .
* testsuite/std/ranges/iota/difference_type.cc: Likewise.
* testsuite/std/ranges/subrange/96042.cc: Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit 9af65c2b9047168f14e623d55f87beda33ba1503
Author: Jonathan Wakely 
Date:   Tue Oct 6 00:05:11 2020

libstdc++: Reduce uses of std::numeric_limits

This avoids unnecessary instantiations of std::numeric_limits or
    inclusion of <limits> when a more lightweight alternative would work.
Some uses can be replaced with __gnu_cxx::__int_traits and some can just
use size_t(-1) directly where SIZE_MAX is needed.

libstdc++-v3/ChangeLog:

* include/bits/regex.h: Use __int_traits instead of
std::numeric_limits.
* include/bits/uniform_int_dist.h: Use __int_traits::__max
instead of std::numeric_limits::max().
* include/bits/hashtable_policy.h: Use size_t(-1) instead of
std::numeric_limits::max().
* include/std/regex: Include .
* include/std/string_view: Use typedef for __int_traits.
* src/c++11/hashtable_c++0x.cc: Use size_t(-1) instead of
std::numeric_limits::max().
* testsuite/std/ranges/iota/96042.cc: Include .
* testsuite/std/ranges/iota/difference_type.cc: Likewise.
* testsuite/std/ranges/subrange/96042.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
b/libstdc++-v3/include/bits/hashtable_policy.h
index 0109ef86a7b..31ff4f16579 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -32,8 +32,8 @@
 #define _HASHTABLE_POLICY_H 1
 
 #include// for std::tuple, std::forward_as_tuple
-#include   // for std::numeric_limits
 #include  // for std::min, std::is_permutation.
+#include // for __gnu_cxx::__int_traits
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -506,6 +506,7 @@ namespace __detail
   inline std::size_t
   __clp2(std::size_t __n) noexcept
   {
+using __gnu_cxx::__int_traits;
 // Equivalent to return __n ? std::bit_ceil(__n) : 0;
 if (__n < 2)
   return __n;
@@ -513,7 +514,7 @@ namespace __detail
   ? __builtin_clzll(__n - 1ull)
   : __builtin_clzl(__n - 1ul);
 // Doing two shifts avoids undefined behaviour when __lz == 0.
-return (size_t(1) << (numeric_limits::digits - __lz - 1)) << 1;
+return (size_t(1) << (__int_traits::__digits - __lz - 1)) << 1;
   }
 
   /// Rehash policy providing power of 2 bucket numbers. Avoids modulo
@@ -556,7 +557,7 @@ namespace __detail
// Set next resize to the max value so that we never try to rehash again
// as we already reach the biggest possible bucket number.
// Note that it might result in max_load_factor not being respected.
-   _M_next_resize = numeric_limits::max();
+   _M_next_resize = size_t(-1);
   else
_M_next_resize
  = __builtin_floorl(__res * (long double)_M_max_load_factor);
diff --git a/libstdc++-v3/include/bits/regex.h 
b/libstdc++-v3/include/bits/regex.h
index 31ebcc1eb86..15e4289bf95 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -973,11 +973,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  if (const size_t __n = std::min(_M_len, __s._M_len))
if (int __ret = traits_type::compare(_M_data, __s._M_data, __n))
  return __ret;
+ using __limits = __gnu_cxx::__int_traits;
  const difference_type __diff = _M_len - __s._M_len;
- if (__diff > std::numeric_limits::max())
-   return std::numeric_limits::max();
- if (__diff < std::numeric_limits::min())
-   return std::numeric_limits::min();
+ if (__diff > __limits::__max)
+   return __limits::__max;
+ if (__diff < __limits::__min)
+   return __limits::__min;
  return static_cast(__diff);
}
 
diff --git a/libstdc++-v3/include/bits/uniform_int_dist.h 
b/libstdc++-v3/include/bits/uniform_int_dist.h
index e3d7934e997..6e1e3d5fc5f 100644
--- 

[PATCH] t/trodgers/c2a_synchronization

2020-10-05 Thread Thomas Rodgers
From: Thomas Rodgers 

This *should* be the correct patch this time.

Add support for -
  * atomic_flag::wait/notify_one/notify_all
  * atomic::wait/notify_one/notify_all
  * counting_semaphore
  * binary_semaphore
  * latch
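
A small usage sketch of two of the facilities listed above, using the
standard C++20 interfaces (illustrative only, not code from the patch);
build with -std=c++20 -pthread:

  #include <latch>
  #include <semaphore>
  #include <thread>
  #include <cstdio>

  std::binary_semaphore token (0);   // start unavailable
  std::latch finished (1);

  int main ()
  {
    std::thread worker ([] {
      token.acquire ();              // wait until main() releases the token
      std::puts ("worker running");
      finished.count_down ();        // let main() continue
    });

    token.release ();                // wake the worker
    finished.wait ();                // wait for it to finish its work
    worker.join ();
    return 0;
  }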

libstdc++-v3/ChangeLog:

* include/Makefile.am (bits_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/bits/atomic_base.h (__atomic_flag::wait): Define.
(__atomic_flag::notify_one): Likewise.
(__atomic_flag::notify_all): Likewise.
(__atomic_base<_Itp>::wait): Likewise.
(__atomic_base<_Itp>::notify_one): Likewise.
(__atomic_base<_Itp>::notify_all): Likewise.
(__atomic_base<_Ptp*>::wait): Likewise.
(__atomic_base<_Ptp*>::notify_one): Likewise.
(__atomic_base<_Ptp*>::notify_all): Likewise.
(__atomic_impl::wait): Likewise.
(__atomic_impl::notify_one): Likewise.
(__atomic_impl::notify_all): Likewise.
(__atomic_float<_Fp>::wait): Likewise.
(__atomic_float<_Fp>::notify_one): Likewise.
(__atomic_float<_Fp>::notify_all): Likewise.
(__atomic_ref<_Tp>::wait): Likewise.
(__atomic_ref<_Tp>::notify_one): Likewise.
(__atomic_ref<_Tp>::notify_all): Likewise.
(atomic_wait<_Tp>): Likewise.
(atomic_wait_explicit<_Tp>): Likewise.
(atomic_notify_one<_Tp>): Likewise.
(atomic_notify_all<_Tp>): Likewise.
* include/bits/atomic_wait.h: New file.
* include/bits/atomic_timed_wait.h: New file.
* include/bits/semaphore_base.h: New file.
* include/std/atomic (atomic::wait): Define.
(atomic::wait_one): Likewise.
(atomic::wait_all): Likewise.
(atomic<_Tp>::wait): Likewise.
(atomic<_Tp>::wait_one): Likewise.
(atomic<_Tp>::wait_all): Likewise.
(atomic<_Tp*>::wait): Likewise.
(atomic<_Tp*>::wait_one): Likewise.
(atomic<_Tp*>::wait_all): Likewise.
* include/std/latch: New file.
* include/std/semaphore: New file.
* include/std/version: Add __cpp_lib_semaphore and
__cpp_lib_latch defines.
* testsuite/29_atomic/atomic/wait_notify/bool.cc: New test.
* testsuite/29_atomic/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomic/atomic/wait_notify/generic.cc: Liekwise.
* testsuite/29_atomics/atomic_flag/wait_notify/1.cc: Likewise.
* testsuite/29_atomic/atomic_float/wait_notify.cc: Likewise.
* testsuite/29_atomic/atomic_integral/wait_notify.cc: Likewise.
* testsuite/29_atomic/atomic_ref/wait_notify.cc: Likewise.
* testsuite/30_thread/semaphore/1.cc: New test.
* testsuite/30_thread/semaphore/2.cc: Likewise.
* testsuite/30_thread/semaphore/least_max_value_neg.cc: Likewise.
* testsuite/30_thread/semaphore/try_acquire.cc: Likewise.
* testsuite/30_thread/semaphore/try_acquire_for.cc: Likewise.
* testsuite/30_thread/semaphore/try_acquire_posix.cc: Likewise.
* testsuite/30_thread/semaphore/try_acquire_until.cc: Likewise.
* testsuite/30_thread/latch/1.cc: New test.
* testsuite/30_thread/latch/2.cc: New test.
* testsuite/30_thread/latch/3.cc: New test.
* testsuite/util/atomic/wait_notify_util.h: New File.

---
 libstdc++-v3/include/Makefile.am   |   5 +
 libstdc++-v3/include/Makefile.in   |   5 +
 libstdc++-v3/include/bits/atomic_base.h| 195 -
 libstdc++-v3/include/bits/atomic_timed_wait.h  | 288 +++
 libstdc++-v3/include/bits/atomic_wait.h| 306 +
 libstdc++-v3/include/bits/semaphore_base.h | 298 
 libstdc++-v3/include/std/atomic|  78 ++
 libstdc++-v3/include/std/latch |  91 ++
 libstdc++-v3/include/std/semaphore |  92 +++
 libstdc++-v3/include/std/version   |   2 +
 .../29_atomics/atomic/wait_notify/bool.cc  |  59 
 .../29_atomics/atomic/wait_notify/generic.cc   |  31 +++
 .../29_atomics/atomic/wait_notify/pointers.cc  |  59 
 .../29_atomics/atomic_flag/wait_notify/1.cc|  61 
 .../29_atomics/atomic_float/wait_notify.cc |  32 +++
 .../29_atomics/atomic_integral/wait_notify.cc  |  65 +
 .../testsuite/29_atomics/atomic_ref/wait_notify.cc | 103 +++
 libstdc++-v3/testsuite/30_threads/latch/1.cc   |  27 ++
 libstdc++-v3/testsuite/30_threads/latch/2.cc   |  27 ++
 libstdc++-v3/testsuite/30_threads/latch/3.cc   |  69 +
 libstdc++-v3/testsuite/30_threads/semaphore/1.cc   |  27 ++
 libstdc++-v3/testsuite/30_threads/semaphore/2.cc   |  27 ++
 .../30_threads/semaphore/least_max_value_neg.cc|  30 ++
 .../testsuite/30_threads/semaphore/try_acquire.cc  |  55 
 .../30_threads/semaphore/try_acquire_for.cc|  85 ++
 .../30_threads/semaphore/try_acquire_posix.cc 

[pushed] c++: Fix typo in NON_UNION_CLASS_TYPE_P.

2020-10-05 Thread Marek Polacek via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* cp-tree.h (NON_UNION_CLASS_TYPE_P): Fix typo in a comment.
---
 gcc/cp/cp-tree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c9ad75117ad..c7b5e7915ae 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2064,7 +2064,7 @@ enum languages { lang_c, lang_cplusplus };
 #define CLASS_TYPE_P(T) \
   (RECORD_OR_UNION_CODE_P (TREE_CODE (T)) && TYPE_LANG_FLAG_5 (T))
 
-/* Nonzero if T is a class type but not an union.  */
+/* Nonzero if T is a class type but not a union.  */
 #define NON_UNION_CLASS_TYPE_P(T) \
   (TREE_CODE (T) == RECORD_TYPE && TYPE_LANG_FLAG_5 (T))
 

base-commit: 1c72f460e9e4fce1220e426989226dfeb0db816e
-- 
2.26.2



[committed] libstdc++: Minor header cleanup in <numeric>

2020-10-05 Thread Jonathan Wakely via Gcc-patches
When adding new features to <numeric> I included the required headers
adjacent to the new code. This cleans it up by moving all the includes
to the start of the file.

libstdc++-v3/ChangeLog:

* include/std/numeric: Move all #include directives to the top
of the header.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error line
numbers.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit 1c72f460e9e4fce1220e426989226dfeb0db816e
Author: Jonathan Wakely 
Date:   Mon Oct 5 22:45:27 2020

    libstdc++: Minor header cleanup in <numeric>

    When adding new features to <numeric> I included the required headers
adjacent to the new code. This cleans it up by moving all the includes
to the start of the file.

libstdc++-v3/ChangeLog:

* include/std/numeric: Move all #include directives to the top
of the header.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error line
numbers.
* testsuite/26_numerics/lcm/lcm_neg.cc: Likewise.

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index 2de6aaf06ec..af1e7884f6a 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -60,12 +60,24 @@
 #include 
 #include 
 #include 
-#include 
 
 #ifdef _GLIBCXX_PARALLEL
 # include 
 #endif
 
+#if __cplusplus >= 201402L
+# include 
+# include 
+#endif
+
+#if __cplusplus >= 201703L
+# include 
+#endif
+
+#if __cplusplus > 201703L
+# include 
+#endif
+
 /**
  * @defgroup numerics Numerics
  *
@@ -74,14 +86,11 @@
  * arrays, generalized numeric algorithms, and mathematical special functions.
  */
 
-#if __cplusplus >= 201402L
-#include 
-#include 
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+#if __cplusplus >= 201402L
 namespace __detail
 {
   // std::abs is not constexpr, doesn't support unsigned integers,
@@ -181,18 +190,10 @@ namespace __detail
 }
 
 #endif // C++17
-
-_GLIBCXX_END_NAMESPACE_VERSION
-} // namespace std
-
 #endif // C++14
 
 #if __cplusplus > 201703L
-#include 
 
-namespace std _GLIBCXX_VISIBILITY(default)
-{
-_GLIBCXX_BEGIN_NAMESPACE_VERSION
   // midpoint
 # define __cpp_lib_interpolate 201902L
 
@@ -241,17 +242,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert( sizeof(_Tp) != 0, "type must be complete" );
   return __a  + (__b - __a) / 2;
 }
-_GLIBCXX_END_NAMESPACE_VERSION
-} // namespace std
-
 #endif // C++20
 
-#if __cplusplus > 201402L
-#include 
-
-namespace std _GLIBCXX_VISIBILITY(default)
-{
-_GLIBCXX_BEGIN_NAMESPACE_VERSION
+#if __cplusplus >= 201703L
 
 #if __cplusplus > 201703L
 #define __cpp_lib_constexpr_numeric 201911L
@@ -720,10 +713,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   // @} group numeric_ops
+#endif // C++17
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
+#if __cplusplus >= 201703L
 // Parallel STL algorithms
 # if _PSTL_EXECUTION_POLICIES_DEFINED
 // If  has already been included, pull in implementations
diff --git a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc 
b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
index fa559b6f475..4294a2c69ce 100644
--- a/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/gcd/gcd_neg.cc
@@ -46,9 +46,9 @@ test01()
   std::gcd(0.1, 0.1);   // { dg-error "from here" }
 }
 
-// { dg-error "must be integers" "" { target *-*-* } 160 }
-// { dg-error "must be integers" "" { target *-*-* } 161 }
-// { dg-error "must not be bool" "" { target *-*-* } 162 }
-// { dg-error "must not be bool" "" { target *-*-* } 163 }
+// { dg-error "must be integers" "" { target *-*-* } 169 }
+// { dg-error "must be integers" "" { target *-*-* } 170 }
+// { dg-error "must not be bool" "" { target *-*-* } 171 }
+// { dg-error "must not be bool" "" { target *-*-* } 172 }
 // { dg-prune-output "deleted function" }
 // { dg-prune-output "incomplete type .*make_unsigned" }
diff --git a/libstdc++-v3/testsuite/26_numerics/lcm/lcm_neg.cc 
b/libstdc++-v3/testsuite/26_numerics/lcm/lcm_neg.cc
index 7e36c2654b0..114995cf0b9 100644
--- a/libstdc++-v3/testsuite/26_numerics/lcm/lcm_neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/lcm/lcm_neg.cc
@@ -46,9 +46,9 @@ test01()
   std::lcm(0.1, 0.1);   // { dg-error "from here" }
 }
 
-// { dg-error "must be integers" "" { target *-*-* } 174 }
-// { dg-error "must be integers" "" { target *-*-* } 175 }
-// { dg-error "must not be bool" "" { target *-*-* } 176 }
-// { dg-error "must not be bool" "" { target *-*-* } 177 }
+// { dg-error "must be integers" "" { target *-*-* } 183 }
+// { dg-error "must be integers" "" { target *-*-* } 184 }
+// { dg-error "must not be bool" "" { target *-*-* } 185 }
+// { dg-error "must not be bool" "" { target *-*-* } 186 }
 // { dg-prune-output "deleted function" }
 // { dg-prune-output "incomplete type .*make_unsigned" }


Re: Fix handling of stores in modref_summary::useful_p

2020-10-05 Thread Jan Hubicka
> The 10/05/2020 17:28, Szabolcs Nagy via Gcc-patches wrote:
> > The 10/05/2020 12:52, Vaseeharan Vinayagamoorthy wrote:
> > > Hi,
> > > 
> > > After this patch, I am noticing that some glibc crypto tests get stuck in 
> > > scanf which goes into busy loop.
> > > 
> > > My build/host/target setup is:
> > > Build: aarch64-none-linux-gnu
> > > Host: aarch64-none-linux-gnu
> > > Target: aarch64-none-linux-gnu
> > 
> > i can reproduce this on aarch64, i'm looking at it:
> > 
> > if i compile glibc with gcc trunk after this commit i see
> > 
> > $ ./testrun.sh crypt/cert < $glibcsrc/crypt/cert.input
> >  K:  P:  C:  Encrypt FAIL
> >  K:  P:  C:  Encrypt FAIL
> >  K:  P:  C:  Encrypt FAIL
> >  K:  P:  C:  Encrypt FAIL
> >  K:  P:  C:  Encrypt FAIL
> >  K:  P:  C:  Encrypt FAIL
> > ...
> > 
> > it just keeps printing this.
> > 
> > same test binary with glibc code compiled with an
> > older gcc works, so something in glibc gets miscompiled.
> > 
> > i will have to do more digging to figure out what.
> 
> minimal reproducer:
> 
> #include <stdio.h>
> int main()
> {
> int r,t;
> r = sscanf("01", "%2x", &t);
> printf("scanf: %d  %02x\n", r, t);
> return 0;
> }
> 
> should print
> 
> scanf: 1  01
> 
> but when glibc is compiled with gcc trunk on aarch64 it prints
> 
> scanf: 0  00
> 
> i will continute the debugging from here tomorrow.

There is a report on a glibc issue here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97264
It turned out to be a latent glibc bug type-punning const char * and
const unsigned char *.

I wonder if it is the same as the problem you are seeing?
Honza
> 
> 
> > > On 27/09/2020, 22:46, "Gcc-patches on behalf of Jan Hubicka" 
> > >  wrote:
> > > 
> > > Hi,
> > > this patch fixes a pasto in modref_summary::useful_p that made
> > > ipa-modref to give up on tracking stores when all load info got lost.
> > > 
> > > Bootstrapped/regtested x86_64-linux, comitted.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2020-09-27  Jan Hubicka  
> > > 
> > > * ipa-modref.c (modref_summary::useful_p): Fix testing of stores.
> > > 
> > > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> > > index 728c6c1523d..6225552e41a 100644
> > > --- a/gcc/ipa-modref.c
> > > +++ b/gcc/ipa-modref.c
> > > @@ -135,7 +135,7 @@ modref_summary::useful_p (int ecf_flags)
> > >  return true;
> > >if (ecf_flags & ECF_PURE)
> > >  return false;
> > > -  return stores && !loads->every_base;
> > > +  return stores && !stores->every_base;
> > >  }
> > > 
> > >  /* Dump A to OUT.  */
> > > 


Re: [PATCH] c++: ICE in dependent_type_p with constrained auto [PR97052]

2020-10-05 Thread Patrick Palka via Gcc-patches
On Wed, 30 Sep 2020, Jason Merrill wrote:

> On 9/29/20 5:01 PM, Patrick Palka wrote:
> > This patch fixes an "unguarded" call to coerce_template_parms in
> > build_standard_check: processing_template_decl could be zero if we
> > we get here during processing of the first 'auto' parameter of an
> > abbreviated function template.  In the testcase below, this leads to an
> > ICE when coerce_template_parms substitutes into C's dependent default
> > template argument.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu and tested by building
> > cmcstl2 and range-v3.  Does this look OK for trunk?
> 
> This looks OK, but is there a place higher in the call stack where we should
> have already set processing_template_decl?

The call stack at that point is:

  build_variable_check
  build_concept_check
  build_type_constraint
  finish_type_constraints
  cp_parser_placeholder_type_specifier
  cp_parser_simple_type_specifier
  ...

So it seems the most natural place to set processing_template_decl would
be in build_type_constraint, around the call to build_concept_check,
since that's where we create the WILDCARD_DECL that eventually reaches
coerce_template_parms.

And in order to additionally avoid a similar ICE when processing the
type constraint of a non-templated variable, we also need to guard the
call to build_concept check in make_constrained_placeholder_type.  The
testcase below now contains such an example.

So something like this perhaps:

-- >8 --

Subject: [PATCH] c++: ICE in dependent_type_p with constrained auto [PR97052]

This patch fixes an "unguarded" call to coerce_template_parms in
build_standard_check: processing_template_decl could be zero if we
get here during processing of the first 'auto' parameter of an
abbreviated function template, or if we're processing the type
constraint of a non-templated variable.  In the testcase below, this
leads to an ICE when coerce_template_parms instantiates C's dependent
default template argument.

gcc/cp/ChangeLog:

PR c++/97052
* constraint.cc (build_type_constraint): Temporarily increment
processing_template_decl before calling build_concept_check.
* pt.c (make_constrained_placeholder_type): Likewise.

gcc/testsuite/ChangeLog:

PR c++/97052
* g++.dg/cpp2a/concepts-defarg2.C: New test.
---
 gcc/cp/constraint.cc  |  2 ++
 gcc/cp/pt.c   |  2 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C | 13 +
 3 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index d49957a6c4a..050b55ce092 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1427,7 +1427,9 @@ tree
 build_type_constraint (tree decl, tree args, tsubst_flags_t complain)
 {
   tree wildcard = build_nt (WILDCARD_DECL);
+  ++processing_template_decl;
   tree check = build_concept_check (decl, wildcard, args, complain);
+  --processing_template_decl;
   if (check == error_mark_node)
 return error_mark_node;
   return unpack_concept_check (check);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 72efecff37f..efdd017a4d5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27914,7 +27914,9 @@ make_constrained_placeholder_type (tree type, tree con, 
tree args)
   tree expr = tmpl;
   if (TREE_CODE (con) == FUNCTION_DECL)
 expr = ovl_make (tmpl);
+  ++processing_template_decl;
   expr = build_concept_check (expr, type, args, tf_warning_or_error);
+  --processing_template_decl;
 
   PLACEHOLDER_TYPE_CONSTRAINTS (type) = expr;
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C
new file mode 100644
index 000..a63ca4e133d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-defarg2.C
@@ -0,0 +1,13 @@
+// PR c++/97052
+// { dg-do compile { target c++20 } }
+
+template
+concept C = true;
+
+constexpr bool f(C auto) {
+  return true;
+}
+
+static_assert(f(0));
+
+C auto x = 0;
-- 
2.28.0.715.gab4691b67b



Re: [PATCH] rs6000: Fix extraneous characters in the documentation

2020-10-05 Thread Tulio Magno Quites Machado Filho via Gcc-patches
Ping?

Tulio Magno Quites Machado Filho via Gcc-patches  
writes:

> Replace them with a whitespace in order to avoid artifacts in the HTML
> document.
>
> 2020-08-19  Tulio Magno Quites Machado Filho  
>
> gcc/
>   * doc/extend.texi (PowerPC Built-in Functions): Replace
>   extraneous characters with whitespace.
> ---
>  gcc/doc/extend.texi | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index bcc251481ca..0c380322280 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21538,10 +21538,10 @@ void amo_stdat_smin (int64_t *, int64_t);
>  ISA 3.1 of the PowerPC added new Matrix-Multiply Assist (MMA) instructions.
>  GCC provides support for these instructions through the following built-in
>  functions which are enabled with the @code{-mmma} option.  The vec_t type
> -below is defined to be a normal vector unsigned char type.  The uint2, uint4
> +below is defined to be a normal vector unsigned char type.  The uint2, uint4
>  and uint8 parameters are 2-bit, 4-bit and 8-bit unsigned integer constants
> -respectively.  The compiler will verify that they are constants and that
> -their values are within range. 
> +respectively.  The compiler will verify that they are constants and that
> +their values are within range.
>  
>  The built-in functions supported are:
>  
> -- 
> 2.25.4
>

-- 
Tulio Magno


Re: [patch] convert -Wrestrict pass to ranger

2020-10-05 Thread Martin Sebor via Gcc-patches

On 10/5/20 8:50 AM, Aldy Hernandez via Gcc-patches wrote:

[Martin, as the original author of this pass, do you have any concerns?]



No concerns, just a few minor things.

This patch converts the -Wrestrict pass to use an on-demand ranger 
instead of global ranges.


No effort was made to convert value_range's into multi-ranges. 
Basically, the places that were using value_range's, and looking at 
kind(), are still doing so.  This can be fixed as a follow-up patch, but 
it's not high on my list.


Note that there are still calls into get_range_info (global range info) 
when no ranger has been passed, because some of these functions are 
called from gimple fold during gimple lowering (builtin expansion as 
well??).


This patch depends on the ranger, and will likely be tweaked before 
going in.


Aldy

     gcc/ChangeLog:

     * calls.c (get_size_range): Adjust to work with ranger.
     * calls.h (get_size_range): Add ranger argument to prototype.
     * gimple-ssa-warn-restrict.c (class wrestrict_dom_walker): 
Remove.

     (check_call): Pull out of wrestrict_dom_walker into a
     static function.
     (wrestrict_dom_walker::before_dom_children): Rename to...
     (wrestrict_walk): ...this.
     (pass_wrestrict::execute): Instantiate ranger.
     (class builtin_memref): Add stmt and query fields.
     (builtin_access::builtin_access): Add range_query field.
     (builtin_memref::builtin_memref): Same.
     (builtin_memref::extend_offset_range): Same.
     (builtin_access::builtin_access): Make work with ranger.
     (wrestrict_dom_walker::check_call): Pull out into...
     (check_call): ...here.
     (check_bounds_or_overlap): Add range_query argument.
     * gimple-ssa-warn-restrict.h (check_bounds_or_overlap):
     Add range_query and gimple stmt arguments.

     gcc/testsuite/ChangeLog:

     * gcc.dg/Wrestrict-22.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index ed4363811c8..c9c71657e54 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "attribs.h"
  #include "builtins.h"
  #include "gimple-fold.h"
-
+#include "value-query.h"
  #include "tree-pretty-print.h"

  /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits.  */
@@ -1251,7 +1251,8 @@ alloc_max_size (void)
     functions like memset.  */

  bool
-get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
+get_size_range (range_query *query, tree exp, gimple *stmt, tree range[2],
+    bool allow_zero /* = false */)
  {
    if (!exp)
  return false;
@@ -1270,7 +1271,21 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

    enum value_range_kind range_type;

    if (integral)
-    range_type = determine_value_range (exp, &min, &max);
+    {
+  if (query)
+    {
+  value_range vr;
+  gcc_assert (TREE_CODE (exp) == SSA_NAME
+  || TREE_CODE (exp) == INTEGER_CST);
+  gcc_assert (query->range_of_expr (vr, exp, stmt));


Will the call to the function in the assert above not be eliminated
if the assert is turned into a no-op?  If it can't happen (it looks
like it shouldn't anymore), it would still be nice to break it out
of the macro.  Those of us used to the semantics of the C standard
assert macro might wonder.
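Something along these lines would keep the side effect out of the macro
(just a sketch of the suggested restructuring, reusing the names from the
hunk above):

  value_range vr;
  bool ok = query->range_of_expr (vr, exp, stmt);
  gcc_assert (ok);  /* The query now runs even when asserts are compiled out.  */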


+  range_type = vr.kind ();
+  min = wi::to_wide (vr.min ());
+  max = wi::to_wide (vr.max ());
+    }
+  else
+    range_type = determine_value_range (exp, &min, &max);
+
+    }
    else
  range_type = VR_VARYING;

@@ -1351,6 +1366,13 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

    return true;
  }

+bool
+get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
+{
+  return get_size_range (/*query=*/NULL, exp, /*stmt=*/NULL, range,
+ allow_zero);
+}
+


I realize its purpose is obvious from the context but can you please
add a brief comment above the new function?
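Perhaps something along these lines (only a suggestion):

  /* Wrapper for the overload above for callers without a range_query
     object or statement context; ranges then come from the global
     range information.  */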


  /* Diagnose a call EXP to function FN decorated with attribute alloc_size
     whose argument numbers given by IDX with values given by ARGS exceed
     the maximum object size or cause an unsigned oveflow (wrapping) when
diff --git a/gcc/calls.h b/gcc/calls.h
index dfb951ca45b..ab56b48fee4 100644
--- a/gcc/calls.h
+++ b/gcc/calls.h
@@ -134,6 +134,8 @@ extern void maybe_warn_alloc_args_overflow (tree, 
tree, tree[2], int[2]);

  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
  extern bool maybe_warn_nonstring_arg (tree, tree);
  extern bool get_size_range (tree, tree[2], bool = false);
+extern bool get_size_range (class range_query *, tree, gimple *,
+    tree[2], bool = false);
  extern rtx rtx_for_static_chain (const_tree, bool);
  extern bool cxx17_empty_base_field_p (const_tree);

diff --git a/gcc/gimple-ssa-warn-restrict.c 
b/gcc/gimple-ssa-warn-restrict.c

index 

Re: [PATCH] libstdc++: Diagnose visitors with different return types [PR95904]

2020-10-05 Thread Ville Voutilainen via Gcc-patches
On Mon, 5 Oct 2020 at 01:15, Ville Voutilainen
 wrote:
> The patch is borked, doesn't pass tests, fixing...

Unborked, ok for trunk if full testsuite passes?

2020-10-05  Ville Voutilainen  

PR libstdc++/95904
* include/std/variant (__deduce_visit_result): Add a nested ::type.
(__gen_vtable_impl::_S_apply):
Check the visitor return type.
(__same_types): New.
(__check_visitor_result): Likewise.
(__check_visitor_results): Likewise.
(visit(_Visitor&&, _Variants&&...)): Use __check_visitor_results
in case we're visiting just one variant.
* testsuite/20_util/variant/visit_neg.cc: Adjust.
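For reference, the kind of visitor the new check rejects looks roughly
like this (an illustrative C++17 example, not part of the patch):

#include <variant>
#include <type_traits>

int main()
{
  std::variant<int, double> v = 42;
  // Returns int for one alternative and double for the other, so a single
  // return type cannot be deduced; the new static_assert now reports
  // "std::visit requires the visitor to have the same return type for all
  // alternatives of a variant" instead of the old "invalid conversion".
  std::visit([](auto x) {
    if constexpr (std::is_same_v<decltype(x), int>)
      return 1;
    else
      return 2.0;
  }, v);
}
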
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index dd8847cf829..b32e564fd41 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -182,7 +182,7 @@ namespace __variant
   // used for raw visitation with indices passed in
   struct __variant_idx_cookie { using type = __variant_idx_cookie; };
   // Used to enable deduction (and same-type checking) for std::visit:
-  template struct __deduce_visit_result { };
+  template struct __deduce_visit_result { using type = _Tp; };
 
   // Visit variants that might be valueless.
   template
@@ -1017,7 +1017,26 @@ namespace __variant
 
   static constexpr auto
   _S_apply()
-  { return _Array_type{&__visit_invoke}; }
+  {
+	if constexpr (_Array_type::__result_is_deduced::value)
+	  {
+	constexpr bool __visit_ret_type_mismatch =
+	  !is_same_v(),
+std::declval<_Variants>()...))>;
+	if constexpr (__visit_ret_type_mismatch)
+	  {
+		static_assert(!__visit_ret_type_mismatch,
+		  "std::visit requires the visitor to have the same "
+		  "return type for all alternatives of a variant");
+		return __nonesuch{};
+	  }
+	else
+	  return _Array_type{&__visit_invoke};
+	  }
+	else
+	  return _Array_type{&__visit_invoke};
+  }
 };
 
   template
@@ -1692,6 +1711,26 @@ namespace __variant
 			   std::forward<_Variants>(__variants)...);
 }
 
+  template
+ constexpr inline bool __same_types = (is_same_v<_Tp, _Types> && ...);
+
+  template 
+decltype(auto)
+__check_visitor_result(_Visitor&& __vis, _Variant&& __variant)
+{
+  return std::__invoke(std::forward<_Visitor>(__vis),
+			   std::get<_Idx>(std::forward<_Variant>(__variant)));
+}
+
+  template 
+constexpr bool __check_visitor_results(std::index_sequence<_Idxs...>)
+{
+  return __same_types(
+	std::declval<_Visitor>(),
+	std::declval<_Variant>()))...>;
+}
+
+
   template
 constexpr decltype(auto)
 visit(_Visitor&& __visitor, _Variants&&... __variants)
@@ -1704,8 +1743,28 @@ namespace __variant
 
   using _Tag = __detail::__variant::__deduce_visit_result<_Result_type>;
 
-  return std::__do_visit<_Tag>(std::forward<_Visitor>(__visitor),
-   std::forward<_Variants>(__variants)...);
+  if constexpr (sizeof...(_Variants) == 1)
+	{
+	  constexpr bool __visit_rettypes_match =
+	__check_visitor_results<_Visitor, _Variants...>(
+	  std::make_index_sequence<
+	std::variant_size...>::value>());
+	  if constexpr (!__visit_rettypes_match)
+	{
+	  static_assert(__visit_rettypes_match,
+			  "std::visit requires the visitor to have the same "
+			  "return type for all alternatives of a variant");
+	  return;
+	}
+	  else
+	return std::__do_visit<_Tag>(
+	  std::forward<_Visitor>(__visitor),
+	  std::forward<_Variants>(__variants)...);
+	}
+  else
+	return std::__do_visit<_Tag>(
+	  std::forward<_Visitor>(__visitor),
+	  std::forward<_Variants>(__variants)...);
 }
 
 #if __cplusplus > 201703L
diff --git a/libstdc++-v3/testsuite/20_util/variant/visit_neg.cc b/libstdc++-v3/testsuite/20_util/variant/visit_neg.cc
index 6279dec5aa2..748eb21c1ad 100644
--- a/libstdc++-v3/testsuite/20_util/variant/visit_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/variant/visit_neg.cc
@@ -21,7 +21,7 @@
 #include 
 #include 
 
-// { dg-error "invalid conversion" "" { target *-*-* } 0 }
+// { dg-error "same return type for all alternatives" "" { target *-*-* } 0 }
 // { dg-prune-output "in 'constexpr' expansion" }
 
 void


Re: [PATCH 3/5] Add TI to TD (128-bit DFP) and TD to TI support

2020-10-05 Thread Carl Love via Gcc-patches
Will, Segher:

Add support for converting to/from 128-bit integers and 128-bit 
decimal floating point formats.
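As a usage sketch (not part of the patch), with -mcpu=power10 the plain
conversions below are expected to map to the new instructions:

_Decimal128 to_td (__int128 x)   { return (_Decimal128) x; }  /* dcffixqq */
__int128 from_td (_Decimal128 d) { return (__int128) d; }     /* dctfixqq */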

The updates from the previous version of the patch:

Just a fix for the change log per Will's comments.

No regression failures were found when run on a P9.

Please let me know if this is ready for mainline. 

   Carl

--


gcc/ChangeLog

2020-10-05  Carl Love  
* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
* config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P, 
P10V_BUILTIN_VCMPAET_P):
New overloaded definitions.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c:  Update test.
---
 gcc/config/rs6000/dfp.md  | 14 +
 gcc/config/rs6000/rs6000-call.c   |  4 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  | 62 +++
 3 files changed, 80 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 8f822732bac..0e82e315fee 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+   (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,13 @@
   "TARGET_DFP"
   "dctfix %0,%1"
   [(set_attr "type" "dfp")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+   (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 
 ;; Decimal builtin support
 
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 87fff5c1c80..8d00a25d806 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -4967,6 +4967,8 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
 RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPNE_P, P10V_BUILTIN_VCMPNET_P,
+RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
 
   { P9V_BUILTIN_VEC_VCMPNE_P, P9V_BUILTIN_VCMPNEFP_P,
 RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
@@ -5074,6 +5076,8 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_V2DI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
 RS6000_BTI_INTSI, RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0 },
+  { P9V_BUILTIN_VEC_VCMPAE_P, P10V_BUILTIN_VCMPAET_P,
+RS6000_BTI_INTSI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEFP_P,
 RS6000_BTI_INTSI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { P9V_BUILTIN_VEC_VCMPAE_P, P9V_BUILTIN_VCMPAEDP_P,
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 85ad544e22b..ec3dcf3dff1 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include 
 #include 
+#include 
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+__uint128_t u128;
+_Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
   vector unsigned long long int vec_uresult_di;
@@ -2249,6 +2257,60 @@ int main ()
 abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Can't get printing of DFP values to work.  Print the DFP value as an
+ unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208ULL) &&
+  ((conv.u128 & 0x) != 0x4ULL)) {
+#if DEBUG
+printf("ERROR:  convert int128 value ");
+print_i128 (arg1);
+conv.d128 = result_dfp128;
+printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+  (unsigned long long)((conv.u128) >>64),
+  (unsigned long long)((conv.u128) & 0x));
+
+conv.d128 = expected_result_dfp128;
+printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+  (unsigned long 

Re: [PATCH 5/5] Conversions between 128-bit integer and floating point values.

2020-10-05 Thread Carl Love via Gcc-patches
Will, Segher:

This patch adds support for converting to/from 128-bit integers and
128-bit IEEE binary floating point values using the new P10 instructions
xscvsqqp, xscvuqqp, xscvqpsqz and xscvqpuqz.  The new instructions are only
used on P10 HW; otherwise the conversions continue to use the existing SW
routines.
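As a usage sketch (not part of the patch), with -mcpu=power10 and
-mfloat128 the casts below should map directly to the new instructions,
while older targets keep using the libgcc software routines:

__float128 to_kf (__int128 x)    { return (__float128) x; }  /* xscvsqqp */
__int128 from_kf (__float128 f)  { return (__int128) f; }    /* xscvqpsqz */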

The changes from the previous version include:

Fixed up the change log entry issues noted by Will.

Regression tests reran on Power 9 LE with no regression errors.

Please let me know if it looks OK to commit to mainline.

  Carl 
-

gcc/ChangeLog

2020-10-05  Carl Love  
* config/rs6000/rs6000.md (floatti2, floatunsti2,
fix_truncti2, fixuns_truncti2): Add
define_insn for mode IEEE 128.
* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
Update source function name.  White space fixes.
* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
Update source function name.  White space fixes.
* libgcc/config/rs6000/float128-hw.c (__floattikf_hw,
__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw):
New functions.
* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1):
New macro.
(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
__fixunskfti_resolve): Add resolve functions.
(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New
functions.
* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
__fixtfti, __fixunstfti): Add editor commands to change
names.
* libgcc/config/rs6000/float128-sed-hw (__floattitf,
__floatuntitf, __fixtfti, __fixunstfti): Add editor commands
to change names.
* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
* libgcc/config/rs6000/quad-float128.h (__floattikf_sw,
__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
file names to fp128_ppc_funcs.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  
* gcc.target/powerpc/fp128_conversions.c: New file.
---
 gcc/config/rs6000/rs6000.md   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c| 286 ++
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   7 +-
 libgcc/config/rs6000/float128-hw.c|  24 ++
 libgcc/config/rs6000/float128-ifunc.c |  44 ++-
 libgcc/config/rs6000/float128-sed |   4 +
 libgcc/config/rs6000/float128-sed-hw  |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}|   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}   |   4 +-
 libgcc/config/rs6000/quad-float128.h  |  17 +-
 libgcc/config/rs6000/t-float128   |   3 +-
 12 files changed, 417 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (90%)
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename libgcc/config/rs6000/{floatuntikf.c => floatuntikf-sw.c} (96%)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 694ff70635e..5db5d0b4505 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6390,6 +6390,42 @@
xscvsxddp %x0,%x1"
   [(set_attr "type" "fp")])
 
+(define_insn "floatti2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+   (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvsqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "floatunsti2"
+  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
+   (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" 
"v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvuqqp %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fix_truncti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+   (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpsqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
+(define_insn "fixuns_truncti2"
+  [(set (match_operand:TI 0 "vsx_register_operand" "=v")
+   (unsigned_fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
+  "TARGET_POWER10"
+{
+  return  "xscvqpuqz %0,%1";
+}
+  [(set_attr "type" "fp")])
+
 ; Allow the combiner to merge source memory operands to the conversion so that
 ; the 

Re: [PATCH 4/5] Test 128-bit shifts for just the int128 type.

2020-10-05 Thread Carl Love via Gcc-patches
Will, Segher:

Patch 4 adds the vector 128-bit integer shift instruction support for
the V1TI type.

The changes from the previous version include:

Fixed up the change log entry issues noted by Will.

Regression tests reran on Power 9 LE with no regression errors.

Please let me know if it looks OK to commit to mainline.

  Carl 
-

gcc/ChangeLog

2020-10-05  Carl Love  
* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
Rename to altivec_vslq_, altivec_vsrq_, mode VEC_TI.
* config/rs6000/vector.md (VEC_TI): New name for VSX_TI iterator.
(vashlv1ti3): Change to vashl3, mode VEC_TI.
(vlshrv1ti3): Change to vlshr3, mode VEC_TI.
* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  
gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
tests.
---
 gcc/config/rs6000/altivec.md  | 16 -
 gcc/config/rs6000/vector.md   | 27 ---
 gcc/config/rs6000/vsx.md  | 33 +--
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 34a4731342a..5db3de3cc9f 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2219,10 +2219,10 @@
   "vsl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-(match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+(match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2236,10 +2236,10 @@
   "vsr %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-  (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+  (match_operand:VEC_TI 2 "vsx_register_operand" 
"v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0cca4232619..3ea3a91845a 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
  V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-(match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+(match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (mode);
 
   emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn(gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_ (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-  (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+  (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (mode);
 
   emit_insn(gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn(gen_altivec_vsrq (operands[0], 

Re: [PATCH 2b/5] RS6000 add 128-bit Integer Operations

2020-10-05 Thread Carl Love via Gcc-patches
Will and Segher:

This is the rest of the second patch; it adds support for the 128-bit
integer divide, modulo, shift, and compare instructions, along with the
corresponding builtin support.
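As an illustration of the new builtin surface, a hypothetical usage
sketch (the exact prototypes are the ones this patch adds to altivec.h):

#include <altivec.h>

vector signed __int128
mod_q (vector signed __int128 a, vector signed __int128 b)
{
  return vec_mod (a, b);  /* expected to generate vmodsq on power10 */
}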

In the last round of changes, the flag for the 128-bit operations was
removed.  Per Will's comments, the  BU_P10_128BIT_* builtin definitions
can be removed.  Instead we can just use P10V_BUILTIN. Similarly for
the BU_P10_P builtin definition.  The commit log was updated to reflect
the change.  There were a few change log entries for the 128-bit
operations flag that needed removing, as well as other fixes noted by
Will.

The changes are all name changes not functional changes.  

No regression failures were found when run on a P9.

Please let me know if this is ready for mainline.  

   Carl



gcc/ChangeLog

2020-10-05  Carl Love  
* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
for new builtins.
* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
define_insn.
(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
altivec_vrlqnm): New define_expands.
* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
VCMPGTUT_P): Add macro expansions.
(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
P10V_BUILTIN_128BIT_DIV_V1TI, P10V_BUILTIN_128BIT_UDIV_V1TI,
P10V_BUILTIN_128BIT_VMULESD, P10V_BUILTIN_128BIT_VMULEUD,
P10V_BUILTIN_128BIT_VMULOSD, P10V_BUILTIN_128BIT_VMULOUD,
P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
P10V_BUILTIN_128BIT_VRLQ, P10V_BUILTIN_128BIT_VRLQMI,
P10V_BUILTIN_128BIT_VRLQNM, P10V_BUILTIN_128BIT_VSLQ,
P10V_BUILTIN_128BIT_VSRQ, P10V_BUILTIN_128BIT_VSRAQ,
P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
P10V_BUILTIN_128BIT_VSIGNEXTSD2Q, P10V_BUILTIN_128BIT_DIVES_V1TI,
P10V_BUILTIN_128BIT_MODS_V1TI, P10V_BUILTIN_128BIT_MODU_V1TI):
New overloaded definitions.
(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
P10_BUILTIN_CMPLE_U1TI]: New case statements.
(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
New assignments.
(altivec_init_builtins): New E_V1TImode case statement.
(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
* config/rs6000/r6000.c (rs6000_handle_altivec_attribute)[E_TImode,
E_V1TImode]: New case statements.
* config/rs6000/r6000.h (RS6000_BTM_TI_VECTOR_OPS): New defines.
(rs6000_builtin_type_index): New enum value RS6000_BTI_bool_V1TI.
* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
vlshrv1ti3, vashrv1ti3): New define_expands.
* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
UNSPEC_VSX_MODUQ, UNSPEC_XXSWAPD_V1TI): New unspecs.
(vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti, vsx_diveu_v1ti,
vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti, vsx_sign_extend_v2di_v1ti):
New define_insns.

Re: [PATCH 2a/5] rs6000, vec_rlnm builtin fix arguments

2020-10-05 Thread Carl Love via Gcc-patches
Will, Segher:



The following changes were made from the previous version:

Per Will's comments, I split the bug fix from patch 2 into a separate
patch.  This patch is the bug fix for the vec_rlnm builtin.

Regression tests reran on Power 9 LE with no regression errors.

Please let me know if it looks OK to commit to mainline.

  Carl 

--


gcc/ChangeLog

2020-10-05  Carl Love  

* config/rs6000/altivec.h (vec_rlnm): Fix bug in argument generation.
---
 gcc/config/rs6000/altivec.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 8a2dcda0144..f7720d136c9 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -183,7 +183,7 @@
 #define vec_recipdiv __builtin_vec_recipdiv
 #define vec_rlmi __builtin_vec_rlmi
 #define vec_vrlnm __builtin_vec_rlnm
-#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((b)<<8)|(c)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_signed __builtin_vec_vsigned
-- 
2.17.1




Re: [PATCH 1/5] RS6000 Add 128-bit Binary Integer sign extend operations

2020-10-05 Thread Carl Love via Gcc-patches
Will, Segher:

Patch 1 adds the 128-bit sign extension instruction support and
corresponding builtin support.

I updated the change log per the comments from Will.

Patch has been retested on Power 9 LE.

Let me know if it is ready to commit to mainline.

 Carl 

---


gcc/ChangeLog

2020-10-05  Carl Love  
* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
for new builtins.
* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
overloaded builtin definitions.
(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D):
Add builtin expansions.
* config/rs6000/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
visible.
* doc/extend.texi:  Add documentation for the vec_signexti and
vec_signextll builtins.

gcc/testsuite/ChangeLog

2020-10-05  Carl Love  
* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h   |   3 +
 gcc/config/rs6000/rs6000-builtin.def  |   9 ++
 gcc/config/rs6000/rs6000-call.c   |  13 ++
 gcc/config/rs6000/vsx.md  |   2 +-
 gcc/doc/extend.texi   |  15 ++
 .../powerpc/p9-sign_extend-runnable.c | 128 ++
 6 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index f7720d136c9..cfa5eda4cd5 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -494,6 +494,9 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
+
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index e91a48ddf5f..4c2e9460949 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2715,6 +2715,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,   "vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,   "vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,  "vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL, "vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,"byte_in_range",CONST,  cmprb)
@@ -2726,6 +2728,13 @@ BU_P9_OVERLOAD_2 (CMPRB, "byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,  "byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,  "byte_in_set")
 
+/* Sign extend builtins that work on ISA 3.0, but not defined until ISA 3.1.  
*/
+BU_P9V_AV_1 (VSIGNEXTSB2W, "vsignextsb2w", CONST,  
vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W, "vsignextsh2w", CONST,  
vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D, "vsignextsb2d", CONST,  
vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D, "vsignextsh2d", CONST,  
vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D, "vsignextsw2d", CONST,  
vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8b520834c7..9e514a01012 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5527,6 +5527,19 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
 RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
 RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 4ff52455fd3..31fcffe8f33 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4787,7 +4787,7 @@
   "vextsh2 %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_insn 

Re: [patch] Rework CPP_BUILTINS_SPEC for powerpc-vxworks

2020-10-05 Thread Olivier Hainque
Hi Segher,

> On 3 Oct 2020, at 00:43, Segher Boessenkool  
> wrote:
> 
> Hi Olivier,
> 
> On Thu, Oct 01, 2020 at 11:30:55AM +0200, Olivier Hainque wrote:
>> This change reworks CPP_BUILTINS_SPEC for powerpc-vxworks to
>> prepare for the upcoming addition of 32 and 64 bit ports for
>> VxWorks 7r2.
> 
> Cool, looking forward to it!

:-)

> Your attachment is not quotable (it is application/octet-stream),

Ah, I thought I had addressed that. Will review.

> so
> I'll paste it in here, hopefully correct:
> 
> --- a/gcc/config/rs6000/vxworks.h
> +++ b/gcc/config/rs6000/vxworks.h
> @@ -26,21 +26,56 @@ along with GCC; see the file COPYING3.  If not see
> /* CPP predefined macros.  */
> 
> #undef TARGET_OS_CPP_BUILTINS
> -#define TARGET_OS_CPP_BUILTINS() \
> -  do \
> -{\
> -  builtin_define ("__ppc");  \
> -  builtin_define ("__PPC__");\
> -  builtin_define ("__EABI__");   \
> -  builtin_define ("__ELF__");\
> -  if (!TARGET_SOFT_FLOAT)\
> - builtin_define ("__hardfp");\
> +#define TARGET_OS_CPP_BUILTINS()\
> 
> Hrm, you changed a lot of white space, was that on purpose?

Not really, I was just refactoring the whole block and thought
that seeing the entire new block as one piece was of possible
interest.

I can adjust to keep the few unchanged pieces unchanged.

> +  do\
> +{   \
> +  /* CPU macros.  */ \
> +  builtin_define ("__ppc"); \
> +  builtin_define ("__ppc__");   \
> +  builtin_define ("__PPC"); \
> +  builtin_define ("__PPC__");   \
> +  builtin_define ("__powerpc"); \
> +  builtin_define ("__powerpc__");   \
> +  if (TARGET_64BIT) \
> +{   \
> +  builtin_define ("__ppc64");   \
> +  builtin_define ("__ppc64__"); \
> +  builtin_define ("__PPC64");\
> +  builtin_define ("__PPC64__"); \
> +  builtin_define ("__powerpc64");\
> +  builtin_define ("__powerpc64__"); \
> + }   \
> 
> Are all those new names actually defined by your ABIs?  If not, this is
> counter-productive: it does not help anyone if there are six ways to
> write things, where not all ways are supported by all compilers!
> (Including older versions of the same compilers.)

This is inspired from a mix of what the system compilers do
and what we can find in the system headers across the two versions
(vx6 and 7), with the intent to come up with a consistent and
not too convoluted overall logic.

I'll see if differentiating 6 and 7 can tighten at least
some of the subsets.

> -  /* C89 namespace violation! */ \
> -  builtin_define ("CPU_FAMILY=PPC"); \
> 
> +  builtin_define ("CPU_FAMILY=PPC"); \
> 
> You removed the comment, but it is rather important still?  Of course
> the "C89" part of it is dated, but it is true for all newer language
> standards just the same.

I'm happy to add the comment back, even without the C89 specialization.
It's just one macro that the VxWorks header files rely on pretty
heavily.

I'll post an updated version, thanks for the comments.

Cheers,

Olivier



Re: [patch] convert -Walloca pass to ranger

2020-10-05 Thread Martin Sebor via Gcc-patches

On 10/5/20 3:51 AM, Aldy Hernandez via Gcc-patches wrote:
The walloca pass is a mess.  It has all sorts of heuristics to divine 
problematic ranges fed into alloca, none of them very good, and all of 
them unreadable.  The mess therein was actually one of the original 
motivators for the ranger project (along with array bounds checking).


The attached patch is a conversion of the pass to ranger.  It's mostly 
an exercise in removing code.  The entire pass almost reduces to:


+  // If the user specified a limit, use it.
+  int_range_max r;
+  if (warn_limit_specified_p (is_vla)
+  && TREE_CODE (len) == SSA_NAME
+  && query.range_of_expr (r, len, stmt)
+  && !r.varying_p ())
+    {
+  // The invalid bits are anything outside of [0, MAX_SIZE].
+  static int_range<2> invalid_range (build_int_cst (size_type_node, 0),
+    build_int_cst (size_type_node,
+   max_size),
+    VR_ANTI_RANGE);
+
+  r.intersect (invalid_range);
+  if (r.undefined_p ())
+   return alloca_type_and_limit (ALLOCA_OK);
+
+  return alloca_type_and_limit (ALLOCA_BOUND_MAYBE_LARGE,
+   wi::to_wide (integer_zero_node));
  }

That is, if the range of the integer passed to alloca is outside of 
[0,MAX_SIZE], warn, otherwise it's ok.  Plain and simple.


It looks like a nice simplification to me!  You're the author
of the alloca pass so I have no concerns with the changes (but
I appreciate the heads up).  That said, I do want to respond
to your commentary and add a few notes of my own.

Eventually, I'd like all the -Wfoo-larger-than= warnings to use
the same "core logic" when deciding whether to warn or not.  As it
is, they're not completely consistent with each other and some warn
when others don't and vice versa.  For instance:

$ cat z.c && gcc -O2 -S -Wall -Walloca-larger-than=1000 
-Walloc-size-larger-than=1000 z.c

void f0 (void*);

void f1 (int n)
{
  f0 (__builtin_alloca (n * sizeof (int)));   // warning
}

void f2 (int n)
{
  f0 (__builtin_malloc (n * sizeof (int)));   // silence
}

z.c: In function ‘f1’:
z.c:5:3: warning: argument to ‘alloca’ may be too large [-Walloca-larger-than=]
5 |   f0 (__builtin_alloca (n * sizeof (int)));   // warning
  |   ^~~~
z.c:5:3: note: limit is 1000 bytes, but argument may be as large as 18446744073709551612


That's because -Walloca-larger-than= (and -Wvla-larger-than=) is
designed to warn when it can't prove that the argument isn't too
large, while the others use the conventional approach of warning
only when they can prove that the argument is excessive.

On the other hand, without -Walloca-larger-than=, for the example
below GCC issues -Walloc-size-larger-than= but not the former, even
though the alloca call is more likely to cause serious trouble than
the one to malloc.

$ cat z.c && gcc -O2 -S -Wall z.c
void f0 (void*);

void f1 (int n)
{
  if (n >= 0) n = -1;
  f0 (__builtin_alloca (n * sizeof (int)));
}

void f2 (int n)
{
  if (n >= 0) n = -1;
  f0 (__builtin_malloc (n * sizeof (int)));
}

z.c: In function ‘f2’:
z.c:12:3: warning: argument 1 range [18446744065119617024, 18446744073709551612] exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
   12 |   f0 (__builtin_malloc (n * sizeof (int)));
  |   ^~~~
z.c:12:3: note: in a call to built-in allocation function ‘__builtin_malloc’

(This seems like a bug/missing feature in -Walloca-larger-than=.)

You will notice I removed the nuanced errors we gave before-- like 
trying to guess whether the problematic range came by virtue of a signed 
cast conversion.  These specific errors were never part of the original 
design, they were just stuff we could guess by how the IL looked.  It 
was non-exact and fragile.  Now we just say the alloca argument may be 
too large, period.


That makes sense to me, as long as the note following the warning
still mentions the limit in effect and the range (or at least
the bound in excess of the limit).



In the future, I would even like to remove the specific range the ranger 
was able to compute from the error message itself.  As will become 
obvious, the ranger can get pretty outrageous ranges that are entirely 
non-obvious by looking at the code.  Peppering the error messages with 
these ranges will ultimately just confuse the user.  But alas, that's a 
problem for another patch to solve.


I agree that when it comes to sizes where just one bound of the range
is used to decide whether or not to warn (the lower bound in the case
of most warnings but, as the example above shows, the upper bound for
-Walloca-larger-than=), printing multiple subranges is unnecessary
and could easily be confusing.  Even printing the very large bounds
(in decimal) in the warning above may be too much.  At the same time,
simply 

Re: Fix handling of stores in modref_summary::useful_p

2020-10-05 Thread Szabolcs Nagy via Gcc-patches
The 10/05/2020 17:28, Szabolcs Nagy via Gcc-patches wrote:
> The 10/05/2020 12:52, Vaseeharan Vinayagamoorthy wrote:
> > Hi,
> > 
> > After this patch, I am noticing that some glibc crypto tests get stuck in 
> > scanf, which goes into a busy loop.
> > 
> > My build/host/target setup is:
> > Build: aarch64-none-linux-gnu
> > Host: aarch64-none-linux-gnu
> > Target: aarch64-none-linux-gnu
> 
> i can reproduce this on aarch64, i'm looking at it:
> 
> if i compile glibc with gcc trunk after this commit i see
> 
> $ ./testrun.sh crypt/cert < $glibcsrc/crypt/cert.input
>  K:  P:  C:  Encrypt FAIL
>  K:  P:  C:  Encrypt FAIL
>  K:  P:  C:  Encrypt FAIL
>  K:  P:  C:  Encrypt FAIL
>  K:  P:  C:  Encrypt FAIL
>  K:  P:  C:  Encrypt FAIL
> ...
> 
> it just keeps printing this.
> 
> same test binary with glibc code compiled with an
> older gcc works, so something in glibc gets miscompiled.
> 
> i will have to do more digging to figure out what.

minimal reproducer:

#include <stdio.h>
int main()
{
int r,t;
r = sscanf("01", "%2x", &t);
printf("scanf: %d  %02x\n", r, t);
return 0;
}

should print

scanf: 1  01

but when glibc is compiled with gcc trunk on aarch64 it prints

scanf: 0  00

i will continue the debugging from here tomorrow.


> > On 27/09/2020, 22:46, "Gcc-patches on behalf of Jan Hubicka" 
> >  wrote:
> > 
> > Hi,
> > this patch fixes a pasto in modref_summary::useful_p that made
> > ipa-modref to give up on tracking stores when all load info got lost.
> > 
> > Bootstrapped/regtested x86_64-linux, committed.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-09-27  Jan Hubicka  
> > 
> > * ipa-modref.c (modref_summary::useful_p): Fix testing of stores.
> > 
> > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> > index 728c6c1523d..6225552e41a 100644
> > --- a/gcc/ipa-modref.c
> > +++ b/gcc/ipa-modref.c
> > @@ -135,7 +135,7 @@ modref_summary::useful_p (int ecf_flags)
> >  return true;
> >if (ecf_flags & ECF_PURE)
> >  return false;
> > -  return stores && !loads->every_base;
> > +  return stores && !stores->every_base;
> >  }
> > 
> >  /* Dump A to OUT.  */
> > 


[PATCH] Overflow-trapping integer arithmetic routines

2020-10-05 Thread Stefan Kanthak
The implementation of the functions __absv?i2(), __addv?i3() etc. for
trapping integer overflow provided in libgcc2.c is rather bad.
Same for __cmp?i2() and __ucmp?i2().

At least for AMD64 and i386 processors GCC creates awful to horrible
code for them: see 
for some examples as well as the expected assembly.

The attached diff/patch provides better implementations.
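Since the attached diff is not reproduced inline, here is only a sketch
of the general technique, using GCC's __builtin_add_overflow primitive
and plain 'int' in place of libgcc's SItype:

#include <stdlib.h>

int
addv_sketch (int a, int b)
{
  int sum;
  if (__builtin_add_overflow (a, b, &sum))
    abort ();  /* trap on signed overflow */
  return sum;
}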

Stefan

libgcc2.diff
Description: Binary data


Re: [r11-3641 Regression] FAIL: gcc.dg/torture/pta-ptrarith-1.c -Os scan-tree-dump alias "ESCAPED = {[^\n}]* i f [^\n}]*}" on Linux/x86_64 (-m32 -march=cascadelake)

2020-10-05 Thread Joseph Myers
On Sun, 4 Oct 2020, H.J. Lu via Gcc-patches wrote:

> This email is generated by an automated script.  Does GCC BZ have
> an email gateway?

Bugzilla has a REST API that you can use to interact with it via JSON 
messages over HTTP.  contrib/mark_spam.py has an example to mark bugs as 
spam.  glibc's scripts/list-fixed-bugs.py has an example extracting bug 
data for bugs matching a given search.  There are lots of other things you 
can do with that API, including filing new bugs.  (You probably want to 
make sure you e.g. only file one bug for a commit, not 1000 bugs for a 
commit that introduces 1000 test failures.)

https://bugzilla.readthedocs.io/en/latest/api/

-- 
Joseph S. Myers
jos...@codesourcery.com


[PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-10-05 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555019.html

On 9/28/20 4:01 PM, Martin Sebor wrote:

On 9/25/20 11:17 PM, Jason Merrill wrote:

On 9/22/20 4:05 PM, Martin Sebor wrote:

The rebased and retested patches are attached.

On 9/21/20 3:17 PM, Martin Sebor wrote:
Ping: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html


(I'm working on rebasing the patch on top of the latest trunk which
has changed some of the same code but it'd be helpful to get a go-
ahead on substance the changes.  I don't expect the rebase to
require any substantive modifications.)

Martin

On 9/14/20 4:01 PM, Martin Sebor wrote:

On 9/4/20 11:14 AM, Jason Merrill wrote:

On 9/3/20 2:44 PM, Martin Sebor wrote:

On 9/1/20 1:22 PM, Jason Merrill wrote:

On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:
-Wplacement-new handles array indices and pointer offsets the same:
by adjusting them by the size of the element.  That's correct for
the latter but wrong for the former, causing false positives when
the element size is greater than one.

In addition, the warning doesn't even attempt to handle arrays of
arrays.  I'm not sure if I forgot or if I simply didn't think of
it.

The attached patch corrects these oversights by replacing most
of the -Wplacement-new code with a call to compute_objsize which
handles all this correctly (plus more), and is also better tested.
But even compute_objsize has bugs: it trips up while converting
wide_int to offset_int for some pointer offset ranges.  Since
handling the C++ IL required changes in this area the patch also
fixes that.

For review purposes, the patch affects just the middle end.
The C++ diff pretty much just removes code from the front end.


The C++ changes are OK.


Thank you for looking at the rest as well.




-compute_objsize (tree ptr, int ostype, access_ref *pref,
-    bitmap *visited, const vr_values *rvals /* = NULL */)
+compute_objsize (tree ptr, int ostype, access_ref *pref, bitmap *visited,
+    const vr_values *rvals)


This reformatting seems unnecessary, and I prefer to keep the 
comment about the default argument.


This overload doesn't take a default argument.  (There was a stray
declaration of a similar function at the top of the file that had
one.  I've removed it.)


Ah, true.


-  if (!size || TREE_CODE (size) != INTEGER_CST)
-   return false;

 >...

You change some failure cases in compute_objsize to return 
success with a maximum range, while others continue to return 
failure. This needs commentary about the design rationale.


This is too much for a comment in the code but the background is
this: compute_objsize initially returned the object size as a 
constant.
Recently, I have enhanced it to return a range to improve 
warnings for
allocated objects.  With that, a failure can be turned into 
success by
having the function set the range to that of the largest object. 
That

should simplify the function's callers and could even improve
the detection of some invalid accesses.  Once this change is made
it might even be possible to change its return type to void.

The change that caught your eye is necessary to make the function
a drop-in replacement for the C++ front end code which makes this
same assumption.  Without it, a number of test cases that exercise
VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:

   void f (int n)
   {
 char a[n];
 new (a - 1) int ();
   }

Changing any of the other places isn't necessary for existing tests
to pass (and I didn't want to introduce too much churn).  But I do
want to change the rest of the function along the same lines at some
point.


Please do change the other places to be consistent; better to have 
more churn than to leave the function half-updated.  That can be a 
separate patch if you prefer, but let's do it now rather than later.


I've made most of these changes in the other patch (also attached).
I'm quite happy with the result but it turned out to be a lot more
work than either of us expected, mostly due to the amount of testing.

I've left a couple of failing cases in place mainly as reminders
to handle them better (which means I also didn't change the caller
to avoid testing for failures).  I've also added TODO notes with
reminders to handle some of the new codes more completely.




+  special_array_member sam{ };


sam is always set by component_ref_size, so I don't think it's 
necessary to initialize it at the declaration.


I find initializing pass-by-pointer local variables helpful but
I don't insist on it.




@@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
   tree last_type = TREE_TYPE (last);
   if (TREE_CODE (last_type) != ARRAY_TYPE
   || TYPE_SIZE (last_type))
-    return size;
+    return size ? size : TYPE_SIZE_UNIT (type);


This change seems to violate the comment for the function.


By my reading (and writing) the change is covered by the first
sentence:

    Returns the size of 

[PING][PATCH] make handling of zero-length arrays in C++ pretty printer more robust (PR 97201)

2020-10-05 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554893.html

On 9/25/20 12:58 PM, Martin Sebor wrote:

The C and C++ representations of zero-length arrays are different:
C uses a null upper bound of the type's domain while C++ uses
SIZE_MAX.  This makes the middle end logic more complicated (and
prone to mistakes) because it has to be prepared for both.  A recent
change to -Warray-bounds has the middle end create a zero-length
array to print in a warning message.  I forgot about this gotcha
and, as a result, when the warning triggers under these conditions
in C++, it causes an ICE in the C++ pretty printer that in turn
isn't prepared for the C form of the domain.

In my mind, the "right fix" is to make the representation the same
between the front ends, but I'm certain that such a change would
cause more problems before it solved them.  Another solution might
be to provide APIs for creating (and querying) arrays and have them
call language hooks in cases where the representation might differ.
But that would likely be quite intrusive as well.  So with that in
mind, for the time being, the attached patch just continues to deal
with the difference by teaching the C++ pretty printer to also
recognize the C form of the zero-length domain.
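Concretely, middle-end code that wants to treat the two forms alike ends
up with checks of roughly this shape (a sketch using the usual tree
accessors, not code from the patch):

static bool
zero_length_domain_p (tree domain)
{
  if (!domain)
    return false;
  tree maxval = TYPE_MAX_VALUE (domain);
  /* C front end: a zero-length array has no upper bound at all.  */
  if (!maxval)
    return true;
  /* C++ front end: the upper bound is SIZE_MAX, i.e. all ones.  */
  return integer_all_onesp (maxval);
}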

While testing the one line fix I noticed that -Warray-bounds (and
therefore, I assume also all other warnings that detect out of bounds
accesses to allocated objects) triggers only for the ordinary form of
operator new and not for the nothrow overload, for instance.  That's
because the ordinary form is recognized as a built-in which has
the alloc_size attribute attached to it.  But because the other forms
are neither built-in nor declared in  with the same attribute,
the warning doesn't trigger.  So the patch also adds the attribute
to the declarations of these overloads in .  In addition, it
adds attribute malloc to a couple of overloads of the operator that
it's missing from.
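
To illustrate the idea (this is only a sketch of the shape of the
declarations; the actual <new> changes are spelled with libstdc++'s own
macros):

  void *operator new (std::size_t, const std::nothrow_t &) noexcept
    __attribute__ ((__alloc_size__ (1), __malloc__));
  void *operator new[] (std::size_t, const std::nothrow_t &) noexcept
    __attribute__ ((__alloc_size__ (1), __malloc__));

With alloc_size in place the middle end can see the size of the
allocated object for these overloads too, so the out-of-bounds warnings
get a chance to trigger on them as well.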

Tested on x86_64-linux.

Martin




Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-05 Thread Ian Lance Taylor via Gcc-patches
On Mon, Oct 5, 2020 at 9:09 AM Martin Liška  wrote:
>
> The previous patch was not correct. This one should be.
>
> Ready for master?

I don't understand why this code uses symtab_indices_shndx at all.
There should only be one SHT_SYMTAB_SHNDX section.  There shouldn't be
any need for the symtab_indices_shndx vector.

But in any case this patch looks OK.

Thanks.

Ian


Re: Fix handling of stores in modref_summary::useful_p

2020-10-05 Thread Szabolcs Nagy via Gcc-patches
The 10/05/2020 12:52, Vaseeharan Vinayagamoorthy wrote:
> Hi,
> 
> After this patch, I am noticing that some glibc crypto tests get stuck in 
> scanf which goes into busy loop.
> 
> My build/host/target setup is:
> Build: aarch64-none-linux-gnu
> Host: aarch64-none-linux-gnu
> Target: aarch64-none-linux-gnu

i can reproduce this on aarch64, i'm looking at it:

if i compile glibc with gcc trunk after this commit i see

$ ./testrun.sh crypt/cert < $glibcsrc/crypt/cert.input
 K:  P:  C:  Encrypt FAIL
 K:  P:  C:  Encrypt FAIL
 K:  P:  C:  Encrypt FAIL
 K:  P:  C:  Encrypt FAIL
 K:  P:  C:  Encrypt FAIL
 K:  P:  C:  Encrypt FAIL
...

it just keeps printing this.

same test binary with glibc code compiled with an
older gcc works, so something in glibc gets miscompiled.

i will have to do more digging to figure out what.




> 
> 
> 
> Kind regards
> Vasee
> 
> 
> On 27/09/2020, 22:46, "Gcc-patches on behalf of Jan Hubicka" 
>  wrote:
> 
> Hi,
> this patch fixes a pasto in modref_summary::useful_p that made
> ipa-modref to give up on tracking stores when all load info got lost.
> 
> Bootstrapped/regtested x86_64-linux, comitted.
> 
> gcc/ChangeLog:
> 
> 2020-09-27  Jan Hubicka  
> 
> * ipa-modref.c (modref_summary::useful_p): Fix testing of stores.
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 728c6c1523d..6225552e41a 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -135,7 +135,7 @@ modref_summary::useful_p (int ecf_flags)
>  return true;
>if (ecf_flags & ECF_PURE)
>  return false;
> -  return stores && !loads->every_base;
> +  return stores && !stores->every_base;
>  }
> 
>  /* Dump A to OUT.  */
> 

-- 


Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-05 Thread Martin Liška

Hi.

The previous patch was not correct. This one should be.

Ready for master?
Thanks,
Martin
From a96f7ae39b5d56ce886edf1bfb9ca6475a857652 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 5 Oct 2020 18:03:08 +0200
Subject: [PATCH] lto: fix LTO debug sections copying.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

readelf -S prints:

There are 81999 section headers, starting at offset 0x1f488060:

Section Headers:
  [Nr] Name  TypeAddress  OffSize   ES Flg Lk Inf Al
  [ 0]   NULL 00 01404f 00 81998   0  0
  [ 1] .groupGROUP    40 08 04 81995 105027  4
...
  [81995] .symtab   SYMTAB   d5d9298 2db310 18 81997 105026  8
  [81996] .symtab_shndx SYMTAB SECTION INDICES  d8b45a8 079dd8 04 81995   0  4
  [81997] .strtab   STRTAB   d92e380 80460c 00  0   0  1
...

Looking at the documentation:
Table 7–15 ELF sh_link and sh_info Interpretation

sh_type - sh_link
SHT_SYMTAB - The section header index of the associated string table.
SHT_SYMTAB_SHNDX - The section header index of the associated symbol table.

As seen, sh_link of a SHT_SYMTAB always points to a .strtab and readelf
confirms that.

So we need to use the reverse mapping, taken from
  [81996] .symtab_shndx SYMTAB SECTION INDICES  d8b45a8 079dd8 04 81995   0  4

where sh_link points to 81995.

libiberty/ChangeLog:

PR lto/97290
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
Use sh_link of a .symtab_shndx section.
---
 libiberty/simple-object-elf.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 7c9d492f6a4..37e73348cb7 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1191,7 +1191,7 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	  unsigned int sh_link;
 	  sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
  shdr, sh_link, Elf_Word);
-	  symtab_indices_shndx[sh_link - 1] = i;
+	  symtab_indices_shndx[sh_link - 1] = i - 1;
 	  /* Always discard the extended index sections, after
 	 copying it will not be needed.  This way we don't need to
 	 update it and deal with the ordering constraints of
@@ -1372,19 +1372,22 @@ simple_object_elf_copy_lto_debug_sections (simple_object_read *sobj,
 	{
 	  unsigned entsize = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	  shdr, sh_entsize, Elf_Addr);
-	  unsigned strtab = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
-	 shdr, sh_link, Elf_Word);
 	  size_t prevailing_name_idx = 0;
 	  unsigned char *ent;
 	  unsigned *shndx_table = NULL;
 	  /* Read the section index table if present.  */
 	  if (symtab_indices_shndx[i - 1] != 0)
 	{
-	  unsigned char *sidxhdr = shdrs + (strtab - 1) * shdr_size;
+	  unsigned char *sidxhdr = shdrs + symtab_indices_shndx[i - 1] * shdr_size;
 	  off_t sidxoff = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	   sidxhdr, sh_offset, Elf_Addr);
 	  size_t sidxsz = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 	   sidxhdr, sh_size, Elf_Addr);
+	  unsigned int shndx_type
+		= ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+   sidxhdr, sh_type, Elf_Word);
+	  if (shndx_type != SHT_SYMTAB_SHNDX)
+		return "Wrong section type of a SYMTAB SECTION INDICES section";
 	  shndx_table = (unsigned *)XNEWVEC (char, sidxsz);
 	  simple_object_internal_read (sobj->descriptor,
 	   sobj->offset + sidxoff,
-- 
2.28.0



Re: [patch] cleanup legacy_union and legacy intersect in value_range.

2020-10-05 Thread Andrew MacLeod via Gcc-patches

On 10/5/20 11:51 AM, Aldy Hernandez wrote:
More changes from the ranger branch that have been tested and retested,
including a full Fedora build.


These are cleanups so that multi-range union/intersect doesn't have to 
deal with legacy code.  Instead, these should be done in legacy mode.


OK pending new tests against trunk?

gcc/ChangeLog:

* value-range.cc (irange::legacy_intersect): Only handle
legacy ranges.
(irange::legacy_union): Same.
(irange::union_): When unioning legacy with non-legacy,
first convert to legacy and do everything in legacy mode.
(irange::intersect): Same, but for intersect.
* range-op.cc (range_tests): Adjust for above changes.


OK.



[patch] cleanup legacy_union and legacy intersect in value_range.

2020-10-05 Thread Aldy Hernandez via Gcc-patches
More changes from the ranger branch that have been tested and retested,
including a full Fedora build.


These are cleanups so that multi-range union/intersect doesn't have to 
deal with legacy code.  Instead, these should be done in legacy mode.


OK pending new tests against trunk?

gcc/ChangeLog:

* value-range.cc (irange::legacy_intersect): Only handle
legacy ranges.
(irange::legacy_union): Same.
(irange::union_): When unioning legacy with non-legacy,
first convert to legacy and do everything in legacy mode.
(irange::intersect): Same, but for intersect.
* range-op.cc (range_tests): Adjust for above changes.
---
 gcc/range-op.cc| 21 +++
 gcc/value-range.cc | 67 +++---
 2 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3ab268f101e..2f6cab44ee7 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -3837,6 +3837,27 @@ range_tests ()
   r0.invert ();
   ASSERT_TRUE (r0.nonzero_p ());

+  // test legacy interaction
+  // r0 = ~[1,1]
+  r0 = int_range<1> (UINT (1), UINT (1), VR_ANTI_RANGE);
+  // r1 = ~[3,3]
+  r1 = int_range<1> (UINT (3), UINT (3), VR_ANTI_RANGE);
+
+  // vv = [0,0][2,2][4, MAX]
+  int_range<3> vv = r0;
+  vv.intersect (r1);
+
+  ASSERT_TRUE (vv.contains_p (UINT (2)));
+  ASSERT_TRUE (vv.num_pairs () == 3);
+
+  // create r0 as legacy [1,1]
+  r0 = int_range<1> (UINT (1), UINT (1));
+  // And union it with  [0,0][2,2][4,MAX] multi range
+  r0.union_ (vv);
+  // The result should be [0,2][4,MAX], or ~[3,3]  but it must contain 2
+  ASSERT_TRUE (r0.contains_p (UINT (2)));
+
+
   multi_precision_range_tests ();
   int_range_max_tests ();
   operator_tests ();
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index ed2c322ded9..cdcc6c65594 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1093,19 +1093,14 @@ intersect_ranges (enum value_range_kind *vr0type,
 void
 irange::legacy_intersect (irange *vr0, const irange *vr1)
 {
+  gcc_checking_assert (vr0->legacy_mode_p ());
+  gcc_checking_assert (vr1->legacy_mode_p ());
   /* If either range is VR_VARYING the other one wins.  */
   if (vr1->varying_p ())
 return;
   if (vr0->varying_p ())
 {
-  /* Avoid the full copy if we already know both sides are simple
-and can be trivially copied.  */
-  if (vr1->legacy_mode_p ())
-   {
- vr0->set (vr1->min (), vr1->max (), vr1->kind ());
- return;
-   }
-  *vr0 = *vr1;
+  vr0->set (vr1->min (), vr1->max (), vr1->kind ());
   return;
 }

@@ -1122,17 +1117,9 @@ irange::legacy_intersect (irange *vr0, const 
irange *vr1)

   value_range_kind vr0kind = vr0->kind ();
   tree vr0min = vr0->min ();
   tree vr0max = vr0->max ();
-  /* Handle multi-ranges that can be represented as anti-ranges.  */
-  if (!vr1->legacy_mode_p () && vr1->maybe_anti_range ())
-{
-  int_range<3> tmp (*vr1);
-  tmp.invert ();
-  intersect_ranges (&vr0kind, &vr0min, &vr0max,
-   VR_ANTI_RANGE, tmp.min (), tmp.max ());
-}
-  else
-intersect_ranges (&vr0kind, &vr0min, &vr0max,
- vr1->kind (), vr1->min (), vr1->max ());
+
+  intersect_ranges (&vr0kind, &vr0min, &vr0max,
+   vr1->kind (), vr1->min (), vr1->max ());

   /* Make sure to canonicalize the result though as the inversion of a
  VR_RANGE can still be a VR_RANGE.  */
@@ -1427,6 +1414,9 @@ give_up:
 void
 irange::legacy_union (irange *vr0, const irange *vr1)
 {
+  gcc_checking_assert (vr0->legacy_mode_p ());
+  gcc_checking_assert (vr1->legacy_mode_p ());
+
   /* VR0 has the resulting range if VR1 is undefined or VR0 is 
varying.  */

   if (vr1->undefined_p ()
   || vr0->varying_p ())
@@ -1435,16 +1425,10 @@ irange::legacy_union (irange *vr0, const irange 
*vr1)
   /* VR1 has the resulting range if VR0 is undefined or VR1 is 
varying.  */

   if (vr0->undefined_p ())
 {
-  /* Avoid the full copy if we already know both sides are simple
-and can be trivially copied.  */
-  if (vr1->legacy_mode_p ())
-   {
- vr0->set (vr1->min (), vr1->max (), vr1->kind ());
- return;
-   }
-  *vr0 = *vr1;
+  vr0->set (vr1->min (), vr1->max (), vr1->kind ());
   return;
 }
+
   if (vr1->varying_p ())
 {
   vr0->set_varying (vr1->type ());
@@ -1454,17 +1438,9 @@ irange::legacy_union (irange *vr0, const irange *vr1)
   value_range_kind vr0kind = vr0->kind ();
   tree vr0min = vr0->min ();
   tree vr0max = vr0->max ();
-  /* Handle multi-ranges that can be represented as anti-ranges.  */
-  if (!vr1->legacy_mode_p () && vr1->maybe_anti_range ())
-{
-  int_range<3> tmp (*vr1);
-  tmp.invert ();
-  union_ranges (&vr0kind, &vr0min, &vr0max,
-   VR_ANTI_RANGE, tmp.min (), tmp.max ());
-}
-  else
-union_ranges (&vr0kind, &vr0min, &vr0max,
- vr1->kind (), vr1->min (), vr1->max ());
+
+  union_ranges (&vr0kind, &vr0min, &vr0max,
+   vr1->kind (), vr1->min (), vr1->max ());

   if 

Re: UX ideas for diagnostics involving ranges (was Re: [patch] convert -Walloca pass to ranger)

2020-10-05 Thread Andrew MacLeod via Gcc-patches

On 10/5/20 11:28 AM, David Malcolm via Gcc-patches wrote:

On Mon, 2020-10-05 at 11:51 +0200, Aldy Hernandez via Gcc-patches
wrote:

The walloca pass is a mess.  It has all sorts of heuristics to
divine
problematic ranges fed into alloca, none of them very good, and all
of
them unreadable.  The mess therein was actually one of the original
motivators for the ranger project (along with array bounds checking).

The attached patch is a conversion of the pass to ranger.  It's
mostly
an exercise in removing code.  The entire pass almost reduces to:

+  // If the user specified a limit, use it.
+  int_range_max r;
+  if (warn_limit_specified_p (is_vla)
+  && TREE_CODE (len) == SSA_NAME
+  && query.range_of_expr (r, len, stmt)
+  && !r.varying_p ())
+{
+  // The invalid bits are anything outside of [0, MAX_SIZE].
+  static int_range<2> invalid_range (build_int_cst
(size_type_node, 0),
+build_int_cst
(size_type_node,
+   max_size),
+VR_ANTI_RANGE);
+
+  r.intersect (invalid_range);
+  if (r.undefined_p ())
+   return alloca_type_and_limit (ALLOCA_OK);
+
+  return alloca_type_and_limit (ALLOCA_BOUND_MAYBE_LARGE,
+   wi::to_wide (integer_zero_node));
   }

That is, if the range of the integer passed to alloca is outside of
[0,MAX_SIZE], warn, otherwise it's ok.  Plain and simple.

You will notice I removed the nuanced errors we gave before-- like
trying to guess whether the problematic range came by virtue of a
signed
cast conversion.  These specific errors were never part of the
original
design, they were just stuff we could guess by how the IL
looked.  It
was non-exact and fragile.  Now we just say the alloca argument may
be
too large, period.

In the future, I would even like to remove the specific range the
ranger
was able to compute from the error message itself.  As will become
obvious, the ranger can get pretty outrageous ranges that are
entirely
non-obvious by looking at the code.  Peppering the error messages
with
these ranges will ultimately just confuse the user.  But alas, that's
a
problem for another patch to solve.

I can't comment on the content of the patch itself, but with my "user
experience hat" on I'm wondering what diagnostic messages involving
ranges ought to look like.  I worry that if we simply drop range
information altogether from the messages, we're not giving the user
enough information to make a judgment call on whether to pay attention
to the diagnostic.  That said, I'm unhappy with the status quo of our
range-based messages, so I don't object to the patch.


I also don't know what the right answer is regarding dumping of ranges.
Most of the time I suspect the error only needs the end points, not the
fun stuff in the middle if it's a multi-range... but I suppose that's up
to the actual issuer of the message to decide if that's important.


Some possible ideas:

I added support for capturing and printing control-flow paths for
diagnostics in gcc 10; see gcc/diagnostic-path.h.  I added this for the
analyzer code, but it's usable outside of it.  It's possible to use
this to associate a chain of events with a diagnostic, by building a
diagnostic_path and associating it with a rich_location.  Perhaps the
ranger code could have a mode which captures diagnostic_paths, where
the events in the path give pertinent information about ranges - e.g.
the various conditionals that lead to a particular range.  There's a
pre-canned simple_diagnostic_path that can be populated with
simple_diagnostic_event, or you could create your own ranger-specific
event and path subclasses.  These can be temporary objects that live on
the stack next to the rich_location, just for the lifetime of emitting
the diagnostic.  Perhaps a "class ranger_rich_location" which
implicitly supplies an instance of "class ranger_diagnostic_path" to
the diagnostic, describing where the range information comes from?
For debugging I have a trace, but it doesn't log the trace; it just sort of
logs as it does the walk backwards to determine things.  I use it to
figure out where things have gone amok.


  This sort of thing would probably be best tackled by simply
inheriting from a ranger and overloading the core routines to log them;
that's effectively what the tracer does, it just emits info before and after
each of the core 5 calls.  I suppose one could just as easily log the
info that comes back, and with a little bit of extra understanding,
determine whether this is a "new" value or a previously computed one and
automatically do what I am doing manually when determining the
origination of something.




Here are some more out-there ideas I had some years ago (I can't
remember if these ever made it to the mailing list).  These ideas focus
more on how the ranges are used, rather than where the ranges come
from.  There's an 

Re: c++: Make spell corrections consistent

2020-10-05 Thread Nathan Sidwell

On 10/5/20 10:17 AM, Jakub Jelinek wrote:

On Mon, Oct 05, 2020 at 09:39:01AM -0400, Nathan Sidwell wrote:

My change to namespace-scope spell corrections ignored the issue that
different targets might have different builtins, and therefore perturb
iteration order.  This fixes it by using an intermediate array of
identifiers, which we sort before considering.

 gcc/cp/
 * name-lookup.c (maybe_add_fuzzy_decl): New.
 (maybe_add_fuzzy_binding): New.
 (consider_binding_level): Use intermediate sortable vector for
 namespace bindings.
 gcc/testsuite/
 * c-c++-common/spellcheck-reserved.c: Restore diagnostic.


Won't that be unnecessarily expensive?
I mean, as implemented, it will push into the vector all non-artificial
decls, so perhaps tens of thousands of them into the vector, then qsort the 
vector
and only then consider it.


Well, we're giving a diagnostic, so expensive isn't really a consideration


So, either the code could hand-inline parts of what consider method
considers and only push into vectors decls that have the currently best
distance (and when encountering a better one truncate the vector before
pushing in).
Or add spellcheck.h consider alternative that would from candidates with the
same distance choose some particular one (e.g. the one where strcmp says it
compares earlier).
The
 if (min_candidate_distance >= m_best_distance)
would probably need changing to > and then have dist == m_best_distance
handling.


I tried fiddling with the distance metric, but that wasn't promising. 
It's simply too coarse.


nathan

--
Nathan Sidwell


Re: [PATCH v2] builtins: rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2020-10-05 Thread Segher Boessenkool
Hi!

On Sun, Oct 04, 2020 at 09:56:01PM -0400, Hans-Peter Nilsson wrote:
> Please excuse a comment from the gallery:
> 
> On Mon, 28 Sep 2020, will schmidt via Gcc-patches wrote:
> > On Fri, 2020-09-04 at 12:52 -0300, Raoni Fassina Firmino via Gcc-patches 
> > wrote:
> > > +(define_expand "feraiseexceptsi"
> > > +  [(use (match_operand:SI 1 "const_int_operand" "n"))
> > > +   (set (match_operand:SI 0 "gpc_reg_operand")
> > > + (const_int 0))]
> > > +  "TARGET_HARD_FLOAT"
> > > +{
> > > +  switch (INTVAL (operands[1]))
> > > +{
> > > +case 0x200:  /* FE_INEXACT */
> > > +case 0x400:  /* FE_DIVBYZERO */
> > > +case 0x800:  /* FE_UNDERFLOW */
> > > +case 0x1000: /* FE_OVERFLOW */
> > > +  break;
> > > +default:
> > > +  FAIL;
> > > +}
> > > +
> > > +  rtx tmp = gen_rtx_CONST_INT (SImode, __builtin_clz (INTVAL 
> > > (operands[1])));
> 
> This doesn't appear to be very portable, to any-cxx11-compiler
> that doesn't pretend to be gcc-intrinsics-compatible.

Yeah, very good point!
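
Since operands[1] is known at that point to have exactly one bit set,
one portable spelling would be something along these lines (just a
sketch; exact_log2 is the host-side helper from hwint.h):

  rtx tmp = GEN_INT (31 - exact_log2 (INTVAL (operands[1])));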

Should this pattern not allow setting more than one exception bit at
once, btw?

So you can first see if any out-of-range bits are set:

  unsigned int fe = INTVAL (operands[1]);
  if ((fe & 0x1e00) != fe)
FAIL;

and then see if more than one bit is set:

  if (fe & (fe - 1))
FAIL;

but also disallow zero:

  if (!fe)
FAIL;
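
Putting those three checks together, the start of the expander body
could look roughly like this (a sketch only, with the same raw FPSCR
mask bits as the switch above):

  unsigned HOST_WIDE_INT fe = INTVAL (operands[1]);
  /* Require a nonzero value, no bits outside of
     FE_INEXACT|FE_DIVBYZERO|FE_UNDERFLOW|FE_OVERFLOW, and at most one
     bit set.  */
  if (fe == 0 || (fe & 0x1e00) != fe || (fe & (fe - 1)) != 0)
    FAIL;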

Or, you can just put the bit number in the switch cases for the four
bits.  There are only four, after all.

Thanks,


Segher


UX ideas for diagnostics involving ranges (was Re: [patch] convert -Walloca pass to ranger)

2020-10-05 Thread David Malcolm via Gcc-patches
On Mon, 2020-10-05 at 11:51 +0200, Aldy Hernandez via Gcc-patches
wrote:
> The walloca pass is a mess.  It has all sorts of heuristics to
> divine 
> problematic ranges fed into alloca, none of them very good, and all
> of 
> them unreadable.  The mess therein was actually one of the original 
> motivators for the ranger project (along with array bounds checking).
> 
> The attached patch is a conversion of the pass to ranger.  It's
> mostly 
> an exercise in removing code.  The entire pass almost reduces to:
> 
> +  // If the user specified a limit, use it.
> +  int_range_max r;
> +  if (warn_limit_specified_p (is_vla)
> +  && TREE_CODE (len) == SSA_NAME
> +  && query.range_of_expr (r, len, stmt)
> +  && !r.varying_p ())
> +{
> +  // The invalid bits are anything outside of [0, MAX_SIZE].
> +  static int_range<2> invalid_range (build_int_cst
> (size_type_node, 0),
> +build_int_cst
> (size_type_node,
> +   max_size),
> +VR_ANTI_RANGE);
> +
> +  r.intersect (invalid_range);
> +  if (r.undefined_p ())
> +   return alloca_type_and_limit (ALLOCA_OK);
> +
> +  return alloca_type_and_limit (ALLOCA_BOUND_MAYBE_LARGE,
> +   wi::to_wide (integer_zero_node));
>   }
> 
> That is, if the range of the integer passed to alloca is outside of 
> [0,MAX_SIZE], warn, otherwise it's ok.  Plain and simple.
> 
> You will notice I removed the nuanced errors we gave before-- like 
> trying to guess whether the problematic range came by virtue of a
> signed 
> cast conversion.  These specific errors were never part of the
> original 
> design, they were just stuff we could guess by how the IL
> looked.  It 
> was non-exact and fragile.  Now we just say the alloca argument may
> be 
> too large, period.
> 
> In the future, I would even like to remove the specific range the
> ranger 
> was able to compute from the error message itself.  As will become 
> obvious, the ranger can get pretty outrageous ranges that are
> entirely 
> non-obvious by looking at the code.  Peppering the error messages
> with 
> these ranges will ultimately just confuse the user.  But alas, that's
> a 
> problem for another patch to solve.

I can't comment on the content of the patch itself, but with my "user
experience hat" on I'm wondering what diagnostic messages involving
ranges ought to look like.  I worry that if we simply drop range
information altogether from the messages, we're not giving the user
enough information to make a judgment call on whether to pay attention
to the diagnostic.  That said, I'm unhappy with the status quo of our
range-based messages, so I don't object to the patch.

Some possible ideas:

I added support for capturing and printing control-flow paths for
diagnostics in gcc 10; see gcc/diagnostic-path.h.  I added this for the
analyzer code, but it's usable outside of it.  It's possible to use
this to associate a chain of events with a diagnostic, by building a
diagnostic_path and associating it with a rich_location.  Perhaps the
ranger code could have a mode which captures diagnostic_paths, where
the events in the path give pertinent information about ranges - e.g.
the various conditionals that lead to a particular range.  There's a
pre-canned simple_diagnostic_path that can be populated with
simple_diagnostic_event, or you could create your own ranger-specific
event and path subclasses.  These can be temporary objects that live on
the stack next to the rich_location, just for the lifetime of emitting
the diagnostic.  Perhaps a "class ranger_rich_location" which
implicitly supplies an instance of "class ranger_diagnostic_path" to
the diagnostic, describing where the range information comes from?


Here are some more out-there ideas I had some years ago (I can't
remember if these ever made it to the mailing list).  These ideas focus
more on how the ranges are used, rather than where the ranges come
from.  There's an interaction, though, so I think it's on-topic - can
the ranger code supply this kind of information?

Consider this problematic call to sprintf:

$ cat demo.c
#include 

const char *test_1 (const char *msg)
{
  static char buf[16];
  sprintf (buf, "msg: %s\n", msg);
  return buf; 
}

void test_2 ()
{
  test_1 ("this is long enough to cause trouble");
}

Right now, we emit this (this is trunk, plus some fixes for line-
numbering bugs):

$ ./xgcc -B. -c demo.c  -Wall -O2
demo.c: In function ‘test_2’:
demo.c:6:23: warning: ‘%s’ directive writing 36 bytes into a region of
size 11 [-Wformat-overflow=]
 6 |   sprintf (buf, "msg: %s\n", msg);
   |   ^~
demo.c:12:11:
12 |   test_1 ("this is long enough to cause trouble");
   |   ~~
demo.c:6:3: note: ‘sprintf’ output 43 bytes into a destination of size
16
 6 |   sprintf (buf, "msg: %s\n", 

Re: [patch] Import various range-op fixes from the ranger branch.

2020-10-05 Thread Aldy Hernandez via Gcc-patches

And now with changelog entry :).

gcc/ChangeLog:

* range-op.cc (operator_div::wi_fold): Return varying for
division by zero.
(class operator_rshift): Move class up.
(operator_abs::wi_fold): Return [-MIN,-MIN] for ABS([-MIN,-MIN]).
(operator_tests): Adjust tests.
---
 gcc/range-op.cc | 164 +---
 1 file changed, 114 insertions(+), 50 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3ab268f101e..11e847f02c1 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1317,10 +1317,10 @@ operator_div::wi_fold (irange , tree type,
   const wide_int _lb, const wide_int _ub,
   const wide_int _lb, const wide_int _ub) const
 {
-  // If we know we will divide by zero, return undefined.
+  // If we know we will divide by zero...
   if (rh_lb == 0 && rh_ub == 0)
 {
-  r.set_undefined ();
+  r.set_varying (type);
   return;
 }

@@ -1430,6 +1430,27 @@ public:
const wide_int &) const;
 } op_lshift;

+class operator_rshift : public cross_product_operator
+{
+public:
+  virtual bool fold_range (irange , tree type,
+  const irange ,
+  const irange ) const;
+  virtual void wi_fold (irange , tree type,
+   const wide_int _lb,
+   const wide_int _ub,
+   const wide_int _lb,
+   const wide_int _ub) const;
+  virtual bool wi_op_overflows (wide_int ,
+   tree type,
+   const wide_int ,
+   const wide_int ) const;
+  virtual bool op1_range (irange &, tree type,
+ const irange ,
+ const irange ) const;
+} op_rshift;
+
+
 bool
 operator_lshift::fold_range (irange , tree type,
 const irange ,
@@ -1546,60 +1567,47 @@ operator_lshift::op1_range (irange ,
   tree shift_amount;
   if (op2.singleton_p (_amount))
 {
-  int_range<1> shifted (shift_amount, shift_amount), ub, lb;
-  const range_operator *rshift_op = range_op_handler (RSHIFT_EXPR, 
type);

-  rshift_op->fold_range (ub, type, lhs, shifted);
-  if (TYPE_UNSIGNED (type))
+  wide_int shift = wi::to_wide (shift_amount);
+  gcc_checking_assert (wi::gt_p (shift, 0, SIGNED));
+
+  // Work completely in unsigned mode to start.
+  tree utype = type;
+  if (TYPE_SIGN (type) == SIGNED)
{
- r = ub;
- return true;
+ int_range_max tmp = lhs;
+ utype = unsigned_type_for (type);
+ range_cast (tmp, utype);
+ op_rshift.fold_range (r, utype, tmp, op2);
}
-  // For signed types, we can't just do an arithmetic rshift,
-  // because that will propagate the sign bit.
-  //
-  //  LHS
-  // 1110 = OP1 << 1
-  //
-  // Assuming a 4-bit signed integer, a right shift will result in
-  // OP1=, but OP1 could have also been 0111.  What we want is
-  // a range from 0111 to .  That is, a range from the logical
-  // rshift (0111) to the arithmetic rshift ().
-  //
-  // Perform a logical rshift by doing the rshift as unsigned.
-  tree unsigned_type = unsigned_type_for (type);
-  int_range_max unsigned_lhs = lhs;
-  range_cast (unsigned_lhs, unsigned_type);
-  rshift_op = range_op_handler (RSHIFT_EXPR, unsigned_type);
-  rshift_op->fold_range (lb, unsigned_type, unsigned_lhs, shifted);
-  range_cast (lb, type);
-  r = lb;
-  r.union_ (ub);
+  else
+   op_rshift.fold_range (r, utype, lhs, op2);
+
+  // Start with ranges which can produce the LHS by right shifting the
+  // result by the shift amount.
+  // ie   [0x08, 0xF0] = op1 << 2 will start with
+  //  [1000, ] = op1 << 2
+  //  [0x02, 0x4C] aka [0010, 0000]
+
+  // Then create a range from the LB with the least significant 
upper bit

+  // set, to the upper bound with all the bits set.
+  // This would be [0x42, 0xFC] aka [0110, 1100].
+
+  // Ideally we do this for each subrange, but just lump them all 
for now.

+  unsigned low_bits = TYPE_PRECISION (utype)
+ - TREE_INT_CST_LOW (shift_amount);
+  wide_int up_mask = wi::mask (low_bits, true, TYPE_PRECISION (utype));
+  wide_int new_ub = wi::bit_or (up_mask, r.upper_bound ());
+  wide_int new_lb = wi::set_bit (r.lower_bound (), low_bits);
+  int_range<2> fill_range (utype, new_lb, new_ub);
+  r.union_ (fill_range);
+
+  if (utype != type)
+   range_cast (r, type);
   return true;
 }
   return false;
 }

-
-class operator_rshift : public cross_product_operator
-{
-public:
-  virtual bool fold_range (irange , tree type,
-  const irange ,
-  const irange ) 

Re: [patch] Import various range-op fixes from the ranger branch.

2020-10-05 Thread Andrew MacLeod via Gcc-patches

On 10/5/20 11:19 AM, Aldy Hernandez wrote:

This patch imports three fixes from the ranger branch:

1. Fold division by zero into varying instead of undefined.
This provides compatibility with existing stuff on trunk.

2. Solver changes for lshift and rshift.
This should not affect anything on trunk, as it only involves
the GORI solver which is yet to be contributed.

3. Preserve existing behavior for ABS([-MIN,-MIN]).
This is actually unrepresentable, but trunk has traditionally
treated this as [-MIN,-MIN] so this patch just syncs range-ops
with the rest of trunk.

Approved off-line by Andrew.

I will commit once tests finish.

Aldy

gcc/ChangeLog:



OK with an actual ChangeLog.

Andrew



Re: [PATCH] lto: fix LTO debug sections copying.

2020-10-05 Thread Martin Liška

Adding Ian (and Richi) to CC.

On 10/5/20 5:20 PM, Martin Liška wrote:

As seen in the PR, we get to a situation where we have a big number
of symbols (~125K) and thus we reach .symtab_shndx section usage.
For .symtab we get the following sh_link:

(gdb) p strtab
$1 = 81997

readelf -S prints:

There are 81999 section headers, starting at offset 0x1f488060:

Section Headers:
   [Nr] Name  Type    Address  Off    Size   ES Flg 
Lk Inf Al
   [ 0]   NULL     00 01404f 00 
81998   0  0
   [ 1] .group    GROUP    40 08 04 
81995 105027  4
...
   [81995] .symtab   SYMTAB   d5d9298 2db310 18 
    81997 105026  8
   [81996] .symtab_shndx SYMTAB SECTION INDICES  d8b45a8 
079dd8 04 81995   0  4
...

Apparently the index starts from 1, and as we skip the first section

│   1118  /* Read the section headers.  We skip section 0, which is not 
a
│   1119 useful section.  */

thus we need to subtract 2.

I run lto.exp and it's fine.
Ready for master?
Thanks,
Martin

libiberty/ChangeLog:

 PR lto/97290
 * simple-object-elf.c (simple_object_elf_copy_lto_debug_sections): Fix 
off-by-one error.
---
  libiberty/simple-object-elf.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 7c9d492f6a4..ce3e809e1e0 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1380,11 +1380,16 @@ simple_object_elf_copy_lto_debug_sections 
(simple_object_read *sobj,
    /* Read the section index table if present.  */
    if (symtab_indices_shndx[i - 1] != 0)
  {
-  unsigned char *sidxhdr = shdrs + (strtab - 1) * shdr_size;
+  unsigned char *sidxhdr = shdrs + (strtab - 2) * shdr_size;
    off_t sidxoff = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
     sidxhdr, sh_offset, Elf_Addr);
    size_t sidxsz = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
     sidxhdr, sh_size, Elf_Addr);
+  unsigned int shndx_type
+    = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+   sidxhdr, sh_type, Elf_Word);
+  if (shndx_type != SHT_SYMTAB_SHNDX)
+    return "Wrong section type of a SYMTAB SECTION INDICES section";
    shndx_table = (unsigned *)XNEWVEC (char, sidxsz);
    simple_object_internal_read (sobj->descriptor,
     sobj->offset + sidxoff,




[PATCH] lto: fix LTO debug sections copying.

2020-10-05 Thread Martin Liška

As seen in the PR, we get to a situation where we have a big number
of symbols (~125K) and thus we reach .symtab_shndx section usage.
For .symtab we get the following sh_link:

(gdb) p strtab
$1 = 81997

readelf -S prints:

There are 81999 section headers, starting at offset 0x1f488060:

Section Headers:
  [Nr] Name  TypeAddress  OffSize   ES Flg 
Lk Inf Al
  [ 0]   NULL 00 01404f 00 
81998   0  0
  [ 1] .groupGROUP    40 08 04 
81995 105027  4
...
  [81995] .symtab   SYMTAB   d5d9298 2db310 18  
   81997 105026  8
  [81996] .symtab_shndx SYMTAB SECTION INDICES  d8b45a8 
079dd8 04 81995   0  4
...

Apparently the index starts from 1, and as we skip the first section

│   1118  /* Read the section headers.  We skip section 0, which is not 
a
│   1119 useful section.  */

thus we need to subtract 2.

I run lto.exp and it's fine.
Ready for master?
Thanks,
Martin

libiberty/ChangeLog:

PR lto/97290
* simple-object-elf.c (simple_object_elf_copy_lto_debug_sections): Fix 
off-by-one error.
---
 libiberty/simple-object-elf.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
index 7c9d492f6a4..ce3e809e1e0 100644
--- a/libiberty/simple-object-elf.c
+++ b/libiberty/simple-object-elf.c
@@ -1380,11 +1380,16 @@ simple_object_elf_copy_lto_debug_sections 
(simple_object_read *sobj,
  /* Read the section index table if present.  */
  if (symtab_indices_shndx[i - 1] != 0)
{
- unsigned char *sidxhdr = shdrs + (strtab - 1) * shdr_size;
+ unsigned char *sidxhdr = shdrs + (strtab - 2) * shdr_size;
  off_t sidxoff = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
   sidxhdr, sh_offset, Elf_Addr);
  size_t sidxsz = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
   sidxhdr, sh_size, Elf_Addr);
+ unsigned int shndx_type
+   = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
+  sidxhdr, sh_type, Elf_Word);
+ if (shndx_type != SHT_SYMTAB_SHNDX)
+   return "Wrong section type of a SYMTAB SECTION INDICES section";
  shndx_table = (unsigned *)XNEWVEC (char, sidxsz);
  simple_object_internal_read (sobj->descriptor,
   sobj->offset + sidxoff,
--
2.28.0



Re: [r11-3641 Regression] FAIL: gcc.dg/torture/pta-ptrarith-1.c -Os scan-tree-dump alias "ESCAPED = {[^\n}]* i f [^\n}]*}" on Linux/x86_64 (-m32 -march=cascadelake)

2020-10-05 Thread Segher Boessenkool
On Sun, Oct 04, 2020 at 09:51:23AM -0700, H.J. Lu wrote:
> On Sat, Oct 3, 2020 at 5:57 PM Segher Boessenkool
>  wrote:
> > On Sat, Oct 03, 2020 at 12:21:04PM -0700, sunil.k.pandey via Gcc-patches 
> > wrote:
> > > On Linux/x86_64,
> > >
> > > c34db4b6f8a5d80367c709309f9b00cb32630054 is the first bad commit
> > > commit c34db4b6f8a5d80367c709309f9b00cb32630054
> > > Author: Jan Hubicka 
> > > Date:   Sat Oct 3 17:20:16 2020 +0200
> > >
> > > Track access ranges in ipa-modref
> > >
> > > caused
> >
> > [ ... ]
> >
> > This isn't a patch.  Wrong mailing list?
> 
> I view this as a follow up of
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555314.html

But it *isn't* a follow-up of that mail.  That is my point.  Most of
these messages do not finger any particular patch even, I think?

> What do people think about this kind of followups?  Is this appropriate
> for this mailing list?

Please just use bugzilla.  And report bugs there the way they should be
reported: full command lines, full description of the errors, and
everything else needed to easily reproduce the problem.

*Actually* following up to the patch mail could be useful (but you can
then just point to the bugzilla).  Sending spam to gcc-patches@ is not
useful for most users of the list.


Segher


[patch] Import various range-op fixes from the ranger branch.

2020-10-05 Thread Aldy Hernandez via Gcc-patches

This patch imports three fixes from the ranger branch:

1. Fold division by zero into varying instead of undefined.
This provides compatibility with existing stuff on trunk.

2. Solver changes for lshift and rshift.
This should not affect anything on trunk, as it only involves
the GORI solver which is yet to be contributed.

3. Preserve existing behavior for ABS([-MIN,-MIN]).
This is actually unrepresentable, but trunk has traditionally
treated this as [-MIN,-MIN] so this patch just syncs range-ops
with the rest of trunk.

Approved off-line by Andrew.

I will commit once tests finish.

Aldy

gcc/ChangeLog:

* range-op.cc (operator_div::wi_fold):
(class operator_rshift):
(operator_abs::wi_fold):
(operator_tests):
---
 gcc/range-op.cc | 164 +---
 1 file changed, 114 insertions(+), 50 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3ab268f101e..11e847f02c1 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1317,10 +1317,10 @@ operator_div::wi_fold (irange , tree type,
   const wide_int _lb, const wide_int _ub,
   const wide_int _lb, const wide_int _ub) const
 {
-  // If we know we will divide by zero, return undefined.
+  // If we know we will divide by zero...
   if (rh_lb == 0 && rh_ub == 0)
 {
-  r.set_undefined ();
+  r.set_varying (type);
   return;
 }

@@ -1430,6 +1430,27 @@ public:
const wide_int &) const;
 } op_lshift;

+class operator_rshift : public cross_product_operator
+{
+public:
+  virtual bool fold_range (irange , tree type,
+  const irange ,
+  const irange ) const;
+  virtual void wi_fold (irange , tree type,
+   const wide_int _lb,
+   const wide_int _ub,
+   const wide_int _lb,
+   const wide_int _ub) const;
+  virtual bool wi_op_overflows (wide_int ,
+   tree type,
+   const wide_int ,
+   const wide_int ) const;
+  virtual bool op1_range (irange &, tree type,
+ const irange ,
+ const irange ) const;
+} op_rshift;
+
+
 bool
 operator_lshift::fold_range (irange , tree type,
 const irange ,
@@ -1546,60 +1567,47 @@ operator_lshift::op1_range (irange ,
   tree shift_amount;
   if (op2.singleton_p (_amount))
 {
-  int_range<1> shifted (shift_amount, shift_amount), ub, lb;
-  const range_operator *rshift_op = range_op_handler (RSHIFT_EXPR, 
type);

-  rshift_op->fold_range (ub, type, lhs, shifted);
-  if (TYPE_UNSIGNED (type))
+  wide_int shift = wi::to_wide (shift_amount);
+  gcc_checking_assert (wi::gt_p (shift, 0, SIGNED));
+
+  // Work completely in unsigned mode to start.
+  tree utype = type;
+  if (TYPE_SIGN (type) == SIGNED)
{
- r = ub;
- return true;
+ int_range_max tmp = lhs;
+ utype = unsigned_type_for (type);
+ range_cast (tmp, utype);
+ op_rshift.fold_range (r, utype, tmp, op2);
}
-  // For signed types, we can't just do an arithmetic rshift,
-  // because that will propagate the sign bit.
-  //
-  //  LHS
-  // 1110 = OP1 << 1
-  //
-  // Assuming a 4-bit signed integer, a right shift will result in
-  // OP1=, but OP1 could have also been 0111.  What we want is
-  // a range from 0111 to .  That is, a range from the logical
-  // rshift (0111) to the arithmetic rshift ().
-  //
-  // Perform a logical rshift by doing the rshift as unsigned.
-  tree unsigned_type = unsigned_type_for (type);
-  int_range_max unsigned_lhs = lhs;
-  range_cast (unsigned_lhs, unsigned_type);
-  rshift_op = range_op_handler (RSHIFT_EXPR, unsigned_type);
-  rshift_op->fold_range (lb, unsigned_type, unsigned_lhs, shifted);
-  range_cast (lb, type);
-  r = lb;
-  r.union_ (ub);
+  else
+   op_rshift.fold_range (r, utype, lhs, op2);
+
+  // Start with ranges which can produce the LHS by right shifting the
+  // result by the shift amount.
+  // ie   [0x08, 0xF0] = op1 << 2 will start with
+  //  [1000, ] = op1 << 2
+  //  [0x02, 0x4C] aka [0010, 0000]
+
+  // Then create a range from the LB with the least significant 
upper bit

+  // set, to the upper bound with all the bits set.
+  // This would be [0x42, 0xFC] aka [0110, 1100].
+
+  // Ideally we do this for each subrange, but just lump them all 
for now.

+  unsigned low_bits = TYPE_PRECISION (utype)
+ - TREE_INT_CST_LOW (shift_amount);
+  wide_int up_mask = wi::mask (low_bits, true, TYPE_PRECISION (utype));
+  wide_int new_ub = wi::bit_or (up_mask, r.upper_bound ());
+  

Re: [PATCH] calls.c:can_implement_as_sibling_call_p REG_PARM_STACK_SPACE check

2020-10-05 Thread Segher Boessenkool
Hi!

On Sun, Oct 04, 2020 at 11:09:11PM +1030, Alan Modra wrote:
> On Fri, Oct 02, 2020 at 01:50:24PM -0500, Segher Boessenkool wrote:
> > > +  /* If reg parm stack space increases, we cannot sibcall.  */
> > > +  if (REG_PARM_STACK_SPACE (decl ? decl : fntype)
> > > +  > REG_PARM_STACK_SPACE (current_function_decl))
> > > +{
> > > +  maybe_complain_about_tail_call (exp,
> > > +   "inconsistent size of stack space"
> > > +   " allocated for arguments which are"
> > > +   " passed in registers");
> > > +  return false;
> > > +}
> > 
> > Maybe change the message?  You allow all sizes smaller or equal than
> > the current size, "inconsistent" isn't very great for that.
> 
> We're talking about just two sizes here.  For 64-bit ELFv2 the reg
> parm save size is either 0 or 64 bytes.  Yes, a better message would
> be "caller lacks stack space allocated for arguments passed in
> registers, required by callee".

Something like that yes.  However:

> Note that I'll likely be submitting a further patch that removes the
> above code in rs6000-logue.c.  I thought is safer to only make a small
> change at the same time as moving code around.

Yeah, just keep it then.


Segher


[committed] libstdc++: Make allocators throw bad_array_new_length on overflow [LWG 3190]

2020-10-05 Thread Jonathan Wakely via Gcc-patches
std::allocator and std::pmr::polymorphic_allocator should throw
std::bad_array_new_length from their allocate member functions if the
number of bytes required cannot be represented in std::size_t.
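
As a quick illustration of the new behaviour, something along the lines
of the new lwg3190.cc test (this is a sketch, not a copy of it) is now
expected to throw std::bad_array_new_length instead of std::bad_alloc:

  #include <cstddef>
  #include <memory>
  #include <new>

  int main()
  {
    std::allocator<int> a;
    try {
      (void) a.allocate (std::size_t (-1)); // n * sizeof(int) overflows size_t
    } catch (const std::bad_array_new_length &) {
      return 0;  // expected after this change
    }
    return 1;
  }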

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Add new symbol.
* include/bits/functexcept.h (__throw_bad_array_new_length):
Declare new function.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Throw bad_array_new_length for impossible sizes (LWG 3190).
* include/ext/new_allocator.h (new_allocator::allocate):
Likewise.
* include/std/memory_resource (polymorphic_allocator::allocate)
(polymorphic_allocator::allocate_object): Use new function,
__throw_bad_array_new_length.
* src/c++11/functexcept.cc (__throw_bad_array_new_length):
Define.
* testsuite/20_util/allocator/lwg3190.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit f92a504fdd943527687faf9557e0b39ff7fe6125
Author: Jonathan Wakely 
Date:   Mon Oct 5 15:16:58 2020

libstdc++: Make allocators throw bad_array_new_length on overflow [LWG 3190]

std::allocator and std::pmr::polymorphic_allocator should throw
std::bad_array_new_length from their allocate member functions if the
number of bytes required cannot be represented in std::size_t.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Add new symbol.
* include/bits/functexcept.h (__throw_bad_array_new_length):
Declare new function.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Throw bad_array_new_length for impossible sizes (LWG 3190).
* include/ext/new_allocator.h (new_allocator::allocate):
Likewise.
* include/std/memory_resource (polymorphic_allocator::allocate)
(polymorphic_allocator::allocate_object): Use new function,
__throw_bad_array_new_length.
* src/c++11/functexcept.cc (__throw_bad_array_new_length):
Define.
* testsuite/20_util/allocator/lwg3190.cc: New test.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 87a48a21f53..6a2b2da33f5 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2322,6 +2322,9 @@ GLIBCXX_3.4.29 {
 # std::__atomic_futex_unsigned_base::_M_futex_wait_until_steady
 _ZNSt28__atomic_futex_unsigned_base26_M_futex_wait_until_steady*;
 
+# std::__throw_bad_array_new_length()
+_ZSt28__throw_bad_array_new_lengthv;
+
 } GLIBCXX_3.4.28;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/bits/functexcept.h 
b/libstdc++-v3/include/bits/functexcept.h
index 52eef2bb2c6..f6079e2a535 100644
--- a/libstdc++-v3/include/bits/functexcept.h
+++ b/libstdc++-v3/include/bits/functexcept.h
@@ -51,6 +51,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   __throw_bad_alloc(void) __attribute__((__noreturn__));
 
+  void
+  __throw_bad_array_new_length(void) __attribute__((__noreturn__));
+
   // Helper for exception objects in 
   void
   __throw_bad_cast(void) __attribute__((__noreturn__));
diff --git a/libstdc++-v3/include/ext/malloc_allocator.h 
b/libstdc++-v3/include/ext/malloc_allocator.h
index 366c766f25b..dd45470c456 100644
--- a/libstdc++-v3/include/ext/malloc_allocator.h
+++ b/libstdc++-v3/include/ext/malloc_allocator.h
@@ -102,8 +102,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Tp*
   allocate(size_type __n, const void* = 0)
   {
-   if (__n > this->_M_max_size())
- std::__throw_bad_alloc();
+   if (__builtin_expect(__n > this->_M_max_size(), false))
+ {
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3190. allocator::allocate sometimes returns too little storage
+   if (__n > (std::size_t(-1) / sizeof(_Tp)))
+ std::__throw_bad_array_new_length();
+   std::__throw_bad_alloc();
+ }
 
_Tp* __ret = 0;
 #if __cpp_aligned_new
diff --git a/libstdc++-v3/include/ext/new_allocator.h 
b/libstdc++-v3/include/ext/new_allocator.h
index 2e21a98409f..a43c8d9b6fb 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -102,8 +102,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX_NODISCARD _Tp*
   allocate(size_type __n, const void* = static_cast(0))
   {
-   if (__n > this->_M_max_size())
- std::__throw_bad_alloc();
+   if (__builtin_expect(__n > this->_M_max_size(), false))
+ {
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3190. allocator::allocate sometimes returns too little storage
+   if (__n > (std::size_t(-1) / sizeof(_Tp)))
+ std::__throw_bad_array_new_length();
+   std::__throw_bad_alloc();
+ }
 
 #if __cpp_aligned_new
if (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
diff --git 

[patch] convert -Wrestrict pass to ranger

2020-10-05 Thread Aldy Hernandez via Gcc-patches

[Martin, as the original author of this pass, do you have any concerns?]

This patch converts the -Wrestrict pass to use an on-demand ranger 
instead of global ranges.


No effort was made to convert value_range's into multi-ranges. 
Basically, the places that were using value_range's, and looking at 
kind(), are still doing so.  This can be fixed as a follow-up patch, but 
it's not high on my list.


Note that there are still calls into get_range_info (global range info) 
when no ranger has been passed, because some of these functions are 
called from gimple fold during gimple lowering (builtin expansion as 
well??).


This patch depends on the ranger, and will likely be tweaked before 
going in.


Aldy

gcc/ChangeLog:

* calls.c (get_size_range): Adjust to work with ranger.
* calls.h (get_size_range): Add ranger argument to prototype.
* gimple-ssa-warn-restrict.c (class wrestrict_dom_walker): 
Remove.

(check_call): Pull out of wrestrict_dom_walker into a
static function.
(wrestrict_dom_walker::before_dom_children): Rename to...
(wrestrict_walk): ...this.
(pass_wrestrict::execute): Instantiate ranger.
(class builtin_memref): Add stmt and query fields.
(builtin_access::builtin_access): Add range_query field.
(builtin_memref::builtin_memref): Same.
(builtin_memref::extend_offset_range): Same.
(builtin_access::builtin_access): Make work with ranger.
(wrestrict_dom_walker::check_call): Pull out into...
(check_call): ...here.
(check_bounds_or_overlap): Add range_query argument.
* gimple-ssa-warn-restrict.h (check_bounds_or_overlap):
Add range_query and gimple stmt arguments.

gcc/testsuite/ChangeLog:

* gcc.dg/Wrestrict-22.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index ed4363811c8..c9c71657e54 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "builtins.h"
 #include "gimple-fold.h"
-
+#include "value-query.h"
 #include "tree-pretty-print.h"

 /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits.  */
@@ -1251,7 +1251,8 @@ alloc_max_size (void)
functions like memset.  */

 bool
-get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
+get_size_range (range_query *query, tree exp, gimple *stmt, tree range[2],
+   bool allow_zero /* = false */)
 {
   if (!exp)
 return false;
@@ -1270,7 +1271,21 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

   enum value_range_kind range_type;

   if (integral)
-range_type = determine_value_range (exp, &min, &max);
+{
+  if (query)
+   {
+ value_range vr;
+ gcc_assert (TREE_CODE (exp) == SSA_NAME
+ || TREE_CODE (exp) == INTEGER_CST);
+ gcc_assert (query->range_of_expr (vr, exp, stmt));
+ range_type = vr.kind ();
+ min = wi::to_wide (vr.min ());
+ max = wi::to_wide (vr.max ());
+   }
+  else
+   range_type = determine_value_range (exp, &min, &max);
+
+}
   else
 range_type = VR_VARYING;

@@ -1351,6 +1366,13 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

   return true;
 }

+bool
+get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
+{
+  return get_size_range (/*query=*/NULL, exp, /*stmt=*/NULL, range,
+allow_zero);
+}
+
 /* Diagnose a call EXP to function FN decorated with attribute alloc_size
whose argument numbers given by IDX with values given by ARGS exceed
the maximum object size or cause an unsigned oveflow (wrapping) when
diff --git a/gcc/calls.h b/gcc/calls.h
index dfb951ca45b..ab56b48fee4 100644
--- a/gcc/calls.h
+++ b/gcc/calls.h
@@ -134,6 +134,8 @@ extern void maybe_warn_alloc_args_overflow (tree, 
tree, tree[2], int[2]);

 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 extern bool maybe_warn_nonstring_arg (tree, tree);
 extern bool get_size_range (tree, tree[2], bool = false);
+extern bool get_size_range (class range_query *, tree, gimple *,
+   tree[2], bool = false);
 extern rtx rtx_for_static_chain (const_tree, bool);
 extern bool cxx17_empty_base_field_p (const_tree);

diff --git a/gcc/gimple-ssa-warn-restrict.c b/gcc/gimple-ssa-warn-restrict.c
index 512fc138528..7961c51c5b0 100644
--- a/gcc/gimple-ssa-warn-restrict.c
+++ b/gcc/gimple-ssa-warn-restrict.c
@@ -25,7 +25,6 @@
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
-#include "domwalk.h"
 #include "tree-pass.h"
 #include "builtins.h"
 #include "ssa.h"
@@ -41,6 +40,7 @@
 #include "calls.h"
 #include "cfgloop.h"
 #include "intl.h"
+#include "gimple-range.h"

 namespace {

@@ -77,21 +77,10 @@ pass_wrestrict::gate (function *fun ATTRIBUTE_UNUSED)
   return warn_array_bounds || 

Ping: [PATCH] PR rtl-optimization/96791 Check precision of partial modes

2020-10-05 Thread Aaron Sawdey via Gcc-patches
Not exactly a patch ping, but I was hoping we could re-engage the discussion on 
this and figure out how we can make POImode work for powerpc.

How does x86 solve this? There was some suggestion that it has some similar 
situations? 

Thanks,
   

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Sep 9, 2020, at 1:27 PM, Aaron Sawdey  wrote:
> 
> Now that the documentation for partial modes says they have a known
> number of bits of precision, would it make sense for extract_low_bits to
> check this before attempting to extract the bits?
> 
> This would solve the problem we have been having with POImode and
> extract_low_bits -- DSE tries to use it to extract part of a POImode
> register used in a previous store. We do not want to supply any patterns
> to make POImode (or OImode) used like a regular integer mode.
> 
> This patch adds such a check, and sets the precision of POImode to one
> bit, which resolves the problems of PR/96791 for ppc64 target.
> 
> Bootstrap passes on ppc64le and x86_64.
> 
> Thanks,
>   Aaron
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-modes.def (POImode): Change precision.
>   * expmed.c (extract_low_bits): Check precision.
> ---
> gcc/config/rs6000/rs6000-modes.def | 2 +-
> gcc/expmed.c   | 3 +++
> 2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-modes.def 
> b/gcc/config/rs6000/rs6000-modes.def
> index ddb218b3fba..aa7d60dd835 100644
> --- a/gcc/config/rs6000/rs6000-modes.def
> +++ b/gcc/config/rs6000/rs6000-modes.def
> @@ -90,5 +90,5 @@ INT_MODE (OI, 32);
> INT_MODE (XI, 64);
> 
> /* Modes used by __vector_pair and __vector_quad.  */
> -PARTIAL_INT_MODE (OI, 256, POI); /* __vector_pair.  */
> +PARTIAL_INT_MODE (OI, 1, POI);   /* __vector_pair.  */
> PARTIAL_INT_MODE (XI, 512, PXI);  /* __vector_quad.  */
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index d34f0fb0b54..23ca181afa6 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -2396,6 +2396,9 @@ extract_low_bits (machine_mode mode, machine_mode 
> src_mode, rtx src)
>   if (GET_MODE_CLASS (mode) == MODE_CC || GET_MODE_CLASS (src_mode) == 
> MODE_CC)
> return NULL_RTX;
> 
> +  if (known_lt (GET_MODE_PRECISION (src_mode), GET_MODE_BITSIZE (mode)))
> +return NULL_RTX;
> +
>   if (known_eq (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
>   && targetm.modes_tieable_p (mode, src_mode))
> {
> -- 
> 2.17.1
> 



[PATCH][omp, ftracer] Improve comment in ignore_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 10:51 AM, Alexander Monakov wrote:
> On Mon, 5 Oct 2020, Tom de Vries wrote:
> 
>> I've had to modify this patch in two ways:
>> - the original test-case stopped failing, though not the
>>   minimized one, so I added that one as a test-case
>> - only testing for ENTER_ALLOC and EXIT, and not explicitly for VOTE_ANY
>>   in ignore_bb_p also stopped working, so I've added that now.
>>
>> Re-tested and committed.
> 
> I don't understand, was the patch already approved somewhere?

Not explicitly, no.  But it was sent two weeks ago, and all the review
comments were related to compile-time slow-down, which I deal with in
separate patches.  So, based on that and the fact that this fixes a
problem for nvptx offloading currently triggering in the libgomp
test-suite, I decided to commit.

> It has some
> issues.
> 

OK, thanks for the review.

>> --- a/gcc/tracer.c
>> +++ b/gcc/tracer.c
>> @@ -108,6 +108,24 @@ ignore_bb_p (const_basic_block bb)
>>  return true;
>>  }
>>  
>> +  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
>> +   !gsi_end_p (gsi); gsi_next ())
>> +{
>> +  gimple *g = gsi_stmt (gsi);
>> +
>> +  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
>> + duplicated as part of its group, or not at all.
> 
> What does "its group" stand for? It seems obviously copy-pasted from the
> description of IFN_UNIQUE treatment, where it is even less clear what the
> "group" is.
> 
> (I know what it means, but the comment is not explaining things well at all)
> 
>> + The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
>> + so the same holds there, but it could be argued that the
>> + IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
>> + in which case it could be duplicated.  */
> 

How about this?

Thanks,
- Tom
[omp, ftracer] Improve comment in ignore_bb_p

Improve comment in ignore_bb_p related to the group marked by
IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (ignore_bb_p): Improve comment.

---
 gcc/tracer.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 7f32ccb7e21..cdda535ce9d 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -112,11 +112,15 @@ ignore_bb_p (const_basic_block bb)
!gsi_end_p (gsi); gsi_next ())
 {
   gimple *g = gsi_stmt (gsi);
-
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
-	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
-	 group, so the same holds there.  */
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT pair marks a SESE
+	 region.  When entering the region, per-lane storage is allocated,
+	 and non-uniform execution is started.  When exiting the region,
+	 non-uniform execution is stopped and per-lane storage is deallocated.
+	 These calls can be duplicated safely, if the entire region is
+	 duplicated, otherwise not.  So, since we're not asked about such a
+	 region here, conservatively return false.
+	 The IFN_GOMP_SIMT_VOTE_ANY and SIMT_XCHG_* are part of such
+	 a region, so the same holds there.  */
   if (is_gimple_call (g)
 	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
 	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)


Re: c++: Make spell corrections consistent

2020-10-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 05, 2020 at 09:39:01AM -0400, Nathan Sidwell wrote:
> My change to namespace-scope spell corrections ignored the issue that
> different targets might have different builtins, and therefore perturb
> iteration order.  This fixes it   by using an intermediate array of
> identifier, which we sort before considering.
> 
> gcc/cp/
> * name-lookup.c (maybe_add_fuzzy_decl): New.
> (maybe_add_fuzzy_binding): New.
> (consider_binding_level): Use intermediate sortable vector for
> namespace bindings.
> gcc/testsuite/
> * c-c++-common/spellcheck-reserved.c: Restore diagnostic.

Won't that be unnecessarily expensive?
I mean, as implemented, it will push all non-artificial decls into the vector,
so perhaps tens of thousands of them, then qsort the vector and only then
consider it.
So, either the code could hand-inline parts of what the consider method does
and only push into the vector decls that have the currently best distance
(truncating the vector whenever a better one is encountered).
Or add a spellcheck.h consider alternative that would, from candidates with the
same distance, choose some particular one (e.g. the one that strcmp says
compares earlier).
The
if (min_candidate_distance >= m_best_distance)
would probably need changing to > and then handle dist == m_best_distance.
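
For instance, something along these lines (just a sketch with made-up names,
not a patch; DIST is whatever edit distance the spellcheck.h machinery
computes for ID):

static void
maybe_keep_candidate (auto_vec<tree> &best, unsigned &best_dist,
		      tree id, unsigned dist)
{
  /* Keep only the candidates at the current best distance; truncate
     the vector whenever a strictly better candidate arrives, so it
     never grows beyond the current set of ties.  */
  if (dist > best_dist)
    return;
  if (dist < best_dist)
    {
      best.truncate (0);
      best_dist = dist;
    }
  best.safe_push (id);
}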

Jakub



[committed][omp, ftracer] Ignore IFN_GOMP_SIMT_XCHG_* in ignore_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 10:51 AM, Alexander Monakov wrote:
>> +  if (is_gimple_call (g)
>> +  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
>> +  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
>> +  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
> 
> Hm? So you are leaving SIMT_XCHG_* be until the next testcase breaks?
> 

Fixed in patch below, committed.

Thanks,
- Tom
[omp, ftracer] Ignore IFN_GOMP_SIMT_XCHG_* in ignore_bb_p

As IFN_GOMP_SIMT_XCHG_* are part of the group marked by
IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT, handle them conservatively
in ignore_bb_p.

Build on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_XCHG_*.

---
 gcc/tracer.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 5ee66511f8d..7f32ccb7e21 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -115,12 +115,14 @@ ignore_bb_p (const_basic_block bb)
 
   /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY is part of such a group, so the same holds
-	 there.  */
+	 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
+	 group, so the same holds there.  */
   if (is_gimple_call (g)
 	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
 	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
 	return true;
 }
 


c++: Make spell corrections consistent

2020-10-05 Thread Nathan Sidwell


My change to namespace-scope spell corrections ignored the issue that
different targets might have different builtins, and therefore perturb
iteration order.  This fixes it by using an intermediate array of
identifier, which we sort before considering.

gcc/cp/
* name-lookup.c (maybe_add_fuzzy_decl): New.
(maybe_add_fuzzy_binding): New.
(consider_binding_level): Use intermediate sortable vector for
namespace bindings.
gcc/testsuite/
* c-c++-common/spellcheck-reserved.c: Restore diagnostic.

pushing to trunk

nathan

--
Nathan Sidwell
diff --git i/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index 190b56bf4dd..774c4473390 100644
--- i/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -6077,6 +6077,9 @@ qualified_namespace_lookup (tree scope, name_lookup *lookup)
   return found;
 }
 
+/* If DECL is suitably visible to the user, consider its name for
+   spelling correction.  */
+
 static void
 consider_decl (tree decl,  best_match <tree, const char *> &bm,
 	   bool consider_impl_names)
@@ -6110,6 +6113,65 @@ consider_decl (tree decl,  best_match <tree, const char *> &bm,
   bm.consider (suggestion_str);
 }
 
+/* If DECL is suitably visible to the user, add its name to VEC and
+   return true.  Otherwise return false.  */
+
+static bool
+maybe_add_fuzzy_decl (auto_vec<tree> &vec, tree decl)
+{
+  /* Skip compiler-generated variables (e.g. __for_begin/__for_end
+ within range for).  */
+  if (TREE_CODE (decl) == VAR_DECL && DECL_ARTIFICIAL (decl))
+return false;
+
+  tree suggestion = DECL_NAME (decl);
+  if (!suggestion)
+return false;
+
+  /* Don't suggest names that are for anonymous aggregate types, as
+ they are an implementation detail generated by the compiler.  */
+  if (IDENTIFIER_ANON_P (suggestion))
+return false;
+
+  vec.safe_push (suggestion);
+
+  return true;
+}
+
+/* Examine the namespace binding BINDING, and add at most one instance
+   of the name, if it contains a visible entity of interest.  */
+
+void
+maybe_add_fuzzy_binding (auto_vec<tree> &vec, tree binding,
+			  lookup_name_fuzzy_kind kind)
+{
+  tree value = NULL_TREE;
+
+  if (STAT_HACK_P (binding))
+{
+  if (!STAT_TYPE_HIDDEN_P (binding)
+	  && STAT_TYPE (binding))
+	{
+	  if (maybe_add_fuzzy_decl (vec, STAT_TYPE (binding)))
+	return;
+	}
+  else if (!STAT_DECL_HIDDEN_P (binding))
+	value = STAT_DECL (binding);
+}
+  else
+value = binding;
+
+  value = ovl_skip_hidden (value);
+  if (value)
+{
+  value = OVL_FIRST (value);
+  if (kind != FUZZY_LOOKUP_TYPENAME
+	  || TREE_CODE (STRIP_TEMPLATE (value)) == TYPE_DECL)
+	if (maybe_add_fuzzy_decl (vec, value))
+	  return;
+}
+}
+
 /* Helper function for lookup_name_fuzzy.
Traverse binding level LVL, looking for good name matches for NAME
(and BM).  */
@@ -6157,38 +6219,46 @@ consider_binding_level (tree name, best_match <tree, const char *> &bm,
   }
   else
 {
-  /* Iterate over the namespace hash table, that'll have fewer
-	 entries than the decl list.  */
+  /* We need to iterate over the namespace hash table, in order to
+ not mention hidden entities.  But hash table iteration is
+ (essentially) unpredictable, our correction-distance measure
+ is very granular, and we pick the first of equal distances.
+ Hence, we need to call the distance-measurer in a predictable
+ order.  So, iterate over the namespace hash, inserting
+ visible names into a vector.  Then sort the vector.  Then
+ determine spelling distance.  */
+  
   tree ns = lvl->this_entity;
+  auto_vec<tree> vec;
 
   hash_table<named_decl_hash>::iterator end
 	(DECL_NAMESPACE_BINDINGS (ns)->end ());
   for (hash_table<named_decl_hash>::iterator iter
 	 (DECL_NAMESPACE_BINDINGS (ns)->begin ()); iter != end; ++iter)
+	maybe_add_fuzzy_binding (vec, *iter, kind);
+
+  vec.qsort ([] (const void *a_, const void *b_)
+		 {
+		   return strcmp (IDENTIFIER_POINTER (*(const tree *)a_),
+  IDENTIFIER_POINTER (*(const tree *)b_));
+		 });
+
+  /* Examine longest to shortest.  */
+  for (unsigned ix = vec.length (); ix--;)
 	{
-	  tree binding = *iter;
-	  tree value = NULL_TREE;
+	  const char *str = IDENTIFIER_POINTER (vec[ix]);
 
-	  if (STAT_HACK_P (binding))
-	{
-	  if (!STAT_TYPE_HIDDEN_P (binding)
-		  && STAT_TYPE (binding))
-		consider_decl (STAT_TYPE (binding), bm,
-			   consider_implementation_names);
-	  else if (!STAT_DECL_HIDDEN_P (binding))
-		value = STAT_DECL (binding);
-	}
-	  else
-	value = binding;
+	  /* Ignore internal names with spaces in them.  */
+	  if (strchr (str, ' '))
+	continue;
 	  
-	  value = ovl_skip_hidden (value);
-	  if (value)
-	{
-	  value = OVL_FIRST (value);
-	  if (!(kind == FUZZY_LOOKUP_TYPENAME
-		&& TREE_CODE (STRIP_TEMPLATE (value)) != TYPE_DECL))
-		consider_decl (value, bm, consider_implementation_names);
-	}
+	  /* Don't suggest names that are reserved for use by the
+	 implementation, unless NAME began with an 

[PATCH] arm: [MVE] Add vqdmlashq intrinsics

2020-10-05 Thread Christophe Lyon via Gcc-patches
This patch adds:
vqdmlashq_m_n_s16
vqdmlashq_m_n_s32
vqdmlashq_m_n_s8
vqdmlashq_n_s16
vqdmlashq_n_s32
vqdmlashq_n_s8
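
For example (not from the patch itself; the prototype is assumed to follow
the existing vqdmlahq_n_s16 intrinsic), a typical use looks like:

#include "arm_mve.h"

int16x8_t
foo (int16x8_t a, int16x8_t b, int16_t c)
{
  /* Expected to compile to a single vqdmlash.s16 instruction.  */
  return vqdmlashq_n_s16 (a, b, c);
}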

2020-10-05  Christophe Lyon  

gcc/
* config/arm/arm_mve.h (vqdmlashq, vqdmlashq_m): Define.
* config/arm/arm_mve_builtins.def (vqdmlashq_n_s, vqdmlashq_m_n_s,): 
New.
* config/arm/mve.md (VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New unspec.
(VQDMLASHQ_N_S, VQDMLASHQ_M_N_S): New attributes.
(VQDMLASHQ_N): New iterator.
(mve_vqdmlashq_n_, mve_vqdmlashq_m_n_s): New patterns.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: New test.
* gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: New test.
---
 gcc/config/arm/arm_mve.h   | 116 +
 gcc/config/arm/arm_mve_builtins.def|   2 +
 gcc/config/arm/mve.md  |  42 +++-
 .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c |  23 
 .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c |  23 
 .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c  |  23 
 .../arm/mve/intrinsics/vqdmlashq_n_s16.c   |  21 
 .../arm/mve/intrinsics/vqdmlashq_n_s32.c   |  21 
 .../gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c |  21 
 9 files changed, 290 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index ecff3d1..59460ef 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -141,6 +141,7 @@
 #define vrev64q_m(__inactive, __a, __p) __arm_vrev64q_m(__inactive, __a, __p)
 #define vqrdmlashq(__a, __b, __c) __arm_vqrdmlashq(__a, __b, __c)
 #define vqrdmlahq(__a, __b, __c) __arm_vqrdmlahq(__a, __b, __c)
+#define vqdmlashq(__a, __b, __c) __arm_vqdmlashq(__a, __b, __c)
 #define vqdmlahq(__a, __b, __c) __arm_vqdmlahq(__a, __b, __c)
 #define vmvnq_m(__inactive, __a, __p) __arm_vmvnq_m(__inactive, __a, __p)
 #define vmlasq(__a, __b, __c) __arm_vmlasq(__a, __b, __c)
@@ -260,6 +261,7 @@
 #define vorrq_m(__inactive, __a, __b, __p) __arm_vorrq_m(__inactive, __a, __b, 
__p)
 #define vqaddq_m(__inactive, __a, __b, __p) __arm_vqaddq_m(__inactive, __a, 
__b, __p)
 #define vqdmladhq_m(__inactive, __a, __b, __p) __arm_vqdmladhq_m(__inactive, 
__a, __b, __p)
+#define vqdmlashq_m(__a, __b, __c, __p) __arm_vqdmlashq_m(__a, __b, __c, __p)
 #define vqdmladhxq_m(__inactive, __a, __b, __p) __arm_vqdmladhxq_m(__inactive, 
__a, __b, __p)
 #define vqdmlahq_m(__a, __b, __c, __p) __arm_vqdmlahq_m(__a, __b, __c, __p)
 #define vqdmlsdhq_m(__inactive, __a, __b, __p) __arm_vqdmlsdhq_m(__inactive, 
__a, __b, __p)
@@ -1307,6 +1309,7 @@
 #define vqdmlsdhxq_s8(__inactive, __a, __b) __arm_vqdmlsdhxq_s8(__inactive, 
__a, __b)
 #define vqdmlsdhq_s8(__inactive, __a, __b) __arm_vqdmlsdhq_s8(__inactive, __a, 
__b)
 #define vqdmlahq_n_s8(__a, __b, __c) __arm_vqdmlahq_n_s8(__a, __b, __c)
+#define vqdmlashq_n_s8(__a, __b, __c) __arm_vqdmlashq_n_s8(__a, __b, __c)
 #define vqdmladhxq_s8(__inactive, __a, __b) __arm_vqdmladhxq_s8(__inactive, 
__a, __b)
 #define vqdmladhq_s8(__inactive, __a, __b) __arm_vqdmladhq_s8(__inactive, __a, 
__b)
 #define vmlsdavaxq_s8(__a, __b, __c) __arm_vmlsdavaxq_s8(__a, __b, __c)
@@ -1391,6 +1394,7 @@
 #define vqrdmladhq_s16(__inactive, __a, __b) __arm_vqrdmladhq_s16(__inactive, 
__a, __b)
 #define vqdmlsdhxq_s16(__inactive, __a, __b) __arm_vqdmlsdhxq_s16(__inactive, 
__a, __b)
 #define vqdmlsdhq_s16(__inactive, __a, __b) __arm_vqdmlsdhq_s16(__inactive, 
__a, __b)
+#define vqdmlashq_n_s16(__a, __b, __c) __arm_vqdmlashq_n_s16(__a, __b, __c)
 #define vqdmlahq_n_s16(__a, __b, __c) __arm_vqdmlahq_n_s16(__a, __b, __c)
 #define vqdmladhxq_s16(__inactive, __a, __b) __arm_vqdmladhxq_s16(__inactive, 
__a, __b)
 #define vqdmladhq_s16(__inactive, __a, __b) __arm_vqdmladhq_s16(__inactive, 
__a, __b)
@@ -1476,6 +1480,7 @@
 #define vqrdmladhq_s32(__inactive, __a, __b) __arm_vqrdmladhq_s32(__inactive, 
__a, __b)
 #define vqdmlsdhxq_s32(__inactive, __a, __b) __arm_vqdmlsdhxq_s32(__inactive, 
__a, __b)
 #define vqdmlsdhq_s32(__inactive, __a, __b) __arm_vqdmlsdhq_s32(__inactive, 
__a, __b)
+#define vqdmlashq_n_s32(__a, __b, __c) __arm_vqdmlashq_n_s32(__a, __b, __c)
 #define 

Re: [PATCH 2/6] gimple-range-edge

2020-10-05 Thread Andrew MacLeod via Gcc-patches

On 10/5/20 8:09 AM, Jakub Jelinek wrote:

On Fri, Oct 02, 2020 at 12:59:54PM -0400, Andrew MacLeod via Gcc-patches wrote:

The ranger is needed to map those values to the switch variable, and also
apply any previous ranges or derived values (i.e., if you ask for the range of
'y' in case 2, it will return unsigned int [6,6]).
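
For instance, with a made-up testcase like the one below (not the one from
the original mail), the case 2 edge implies y - 4 == 2, so ranger resolves
y to [6, 6] there:

unsigned int y;
void keep (unsigned int);

void
foo (void)
{
  switch (y - 4)
    {
    case 2:
      keep (y);	/* Range of 'y' on this edge: unsigned int [6, 6].  */
      break;
    default:
      break;
    }
}
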
* gimple-range-edge.h: New File.
(outgoing_range): Calculate/cache constant outgoing edge ranges.

* gimple-range-edge.cc: New file.
(gimple_outgoing_range_stmt_p): New.  Find control statement.
(outgoing_range::outgoing_range): New.
(outgoing_range::~outgoing_range): New.
(outgoing_range::get_edge_range): New.  Internal switch edge query.
(outgoing_range::calc_switch_ranges): New.  Calculate switch ranges.
(outgoing_range::edge_range_p): New.  Find constant range on edge.

Just a ChangeLog comment (ditto for several other patches).
When you add a new file, just say that and nothing else, i.e.
* gimple-range-edge.h: New File.
* gimple-range-edge.cc: New file.
and that's it.  Everything in the new file is new, no need to state it
explicitly.

Jakub
Really? huh. ok.  Figured it was useful for anyone looking up a routine 
name. That will dramatically shrink these changelogs :-)


Andrew



Re: [PATCH] xfail and improve some failing libgomp tests

2020-10-05 Thread Tom de Vries
On 2/7/20 4:29 PM, Jakub Jelinek wrote:
> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>> * {target-32.c, thread-limit-2.c}:
>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
> 
> Please don't, I want to deal with that using declare variant, just didn't
> get yet around to finishing the last patch needed for that.  Will try next 
> week.
> 

Hi Jakub,

Ping, any update on this?

Thanks,
- Tom


[committed][omp, ftracer] Remove incorrect suggestion in ignore_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]
On 10/5/20 10:51 AM, Alexander Monakov wrote:
>> + The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
>> + so the same holds there, but it could be argued that the
>> + IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
>> + in which case it could be duplicated.  */
> No, something like that cannot be argued, as VOTE_ANY may have data
> dependencies to storage that is deallocated by SIMT_EXIT. You seem to be
> claiming something that is simply not possible with the current design.
> 

Fixed in patch below, committed.

Thanks,
- Tom
[omp, ftracer] Remove incorrect suggestion in ignore_bb_p

In commit ab3f4b27abe "[omp, ftracer] Don't duplicate blocks in SIMT region" I
added a comment in ignore_bb_p suggesting a reordering of SIMT_VOTE_ANY and
SIMT_EXIT, which is not possible since VOTE_ANY may have data dependencies to
storage that is deallocated by SIMT_EXIT.

I've now opened a PR (PR97291) to describe the problem the reordering was
intended to fix.

Remove the incorrect suggestion.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (ignore_bb_p): Remove incorrect suggestion.

---
 gcc/tracer.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 5e51752d89f..5ee66511f8d 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -115,10 +115,8 @@ ignore_bb_p (const_basic_block bb)
 
   /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
-	 so the same holds there, but it could be argued that the
-	 IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
-	 in which case it could be duplicated.  */
+	 The IFN_GOMP_SIMT_VOTE_ANY is part of such a group, so the same holds
+	 there.  */
   if (is_gimple_call (g)
 	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
 	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)


Re: [PATCH 2/6] gimple-range-edge

2020-10-05 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 02, 2020 at 12:59:54PM -0400, Andrew MacLeod via Gcc-patches wrote:
> 
> The ranger is needed to map those values to the switch variable, and also
> apply any previous ranges or derived values (i.e., if you ask for the range of
> 'y' in case 2, it will return unsigned int [6,6]).

> 
>   * gimple-range-edge.h: New File.
>   (outgoing_range): Calculate/cache constant outgoing edge ranges.
> 
>   * gimple-range-edge.cc: New file.
>   (gimple_outgoing_range_stmt_p): New.  Find control statement.
>   (outgoing_range::outgoing_range): New.
>   (outgoing_range::~outgoing_range): New.
>   (outgoing_range::get_edge_range): New.  Internal switch edge query.
>   (outgoing_range::calc_switch_ranges): New.  Calculate switch ranges.
>   (outgoing_range::edge_range_p): New.  Find constant range on edge.

Just a ChangeLog comment (ditto for several other patches).
When you add a new file, just say that and nothing else, i.e.
* gimple-range-edge.h: New File.
* gimple-range-edge.cc: New file.
and that's it.  Everything in the new file is new, no need to state it
explicitly.

Jakub



Re: [PATCH] S/390: Do not turn maybe-uninitialized warnings into errors

2020-10-05 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Tue, Sep 22, 2020 at 02:59:30PM +0200, Andreas Krebbel wrote:
> On 15.09.20 17:02, Stefan Schulze Frielinghaus wrote:
> > Over the last couple of months quite a few warnings about uninitialized
> > variables were raised while building GCC.  A reason why these warnings
> > show up on S/390 only is due to the aggressive inlining settings here.
> > Some of these warnings (2c832ffedf0, b776bdca932, 2786c0221b6,
> > 1657178f59b) could be fixed or in case of a false positive silenced by
> > initializing the corresponding variable.  Since the latter reoccurs and
> > while bootstrapping such warnings are turned into errors bootstrapping
> > fails on S/390 consistently.  Therefore, for the moment do not turn
> > those warnings into errors.
> > 
> > config/ChangeLog:
> > 
> > * warnings.m4: Do not turn maybe-uninitialized warnings into errors
> > on S/390.
> > 
> > fixincludes/ChangeLog:
> > 
> > * configure: Regenerate.
> > 
> > gcc/ChangeLog:
> > 
> > * configure: Regenerate.
> > 
> > libcc1/ChangeLog:
> > 
> > * configure: Regenerate.
> > 
> > libcpp/ChangeLog:
> > 
> > * configure: Regenerate.
> > 
> > libdecnumber/ChangeLog:
> > 
> > * configure: Regenerate.
> 
> That change looks good to me. Could a global reviewer please comment!

Ping

> 
> Andreas
> 
> > ---
> >  config/warnings.m4 | 20 ++--
> >  fixincludes/configure  |  8 +++-
> >  gcc/configure  | 12 +---
> >  libcc1/configure   |  8 +++-
> >  libcpp/configure   |  8 +++-
> >  libdecnumber/configure |  8 +++-
> >  6 files changed, 51 insertions(+), 13 deletions(-)
> > 
> > diff --git a/config/warnings.m4 b/config/warnings.m4
> > index ce007f9b73e..d977bfb20af 100644
> > --- a/config/warnings.m4
> > +++ b/config/warnings.m4
> > @@ -101,8 +101,10 @@ AC_ARG_ENABLE(werror-always,
> >  AS_HELP_STRING([--enable-werror-always],
> >[enable -Werror despite compiler version]),
> >  [], [enable_werror_always=no])
> > -AS_IF([test $enable_werror_always = yes],
> > -  [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > +AS_IF([test $enable_werror_always = yes], [dnl
> > +  acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > +  AS_CASE([$host], [s390*-*-*],
> > +  [acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
> >   m4_if($1, [manual],,
> >   [AS_VAR_PUSHDEF([acx_GCCvers], [acx_cv_prog_cc_gcc_$1_or_newer])dnl
> >AC_CACHE_CHECK([whether $CC is GCC >=$1], acx_GCCvers,
> > @@ -116,7 +118,9 @@ AS_IF([test $enable_werror_always = yes],
> > [AS_VAR_SET(acx_GCCvers, yes)],
> > [AS_VAR_SET(acx_GCCvers, no)])])
> >   AS_IF([test AS_VAR_GET(acx_GCCvers) = yes],
> > -   [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > +   [acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > +AS_CASE([$host], [s390*-*-*],
> > +[acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
> >AS_VAR_POPDEF([acx_GCCvers])])
> >  m4_popdef([acx_Var])dnl
> >  AC_LANG_POP(C)
> > @@ -205,8 +209,10 @@ AC_ARG_ENABLE(werror-always,
> >  AS_HELP_STRING([--enable-werror-always],
> >[enable -Werror despite compiler version]),
> >  [], [enable_werror_always=no])
> > -AS_IF([test $enable_werror_always = yes],
> > -  [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > +AS_IF([test $enable_werror_always = yes], [dnl
> > +  acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > +  AS_CASE([$host], [s390*-*-*],
> > +  [strict_warn="$strict_warn -Wno-error=maybe-uninitialized"])])
> >   m4_if($1, [manual],,
> >   [AS_VAR_PUSHDEF([acx_GXXvers], [acx_cv_prog_cxx_gxx_$1_or_newer])dnl
> >AC_CACHE_CHECK([whether $CXX is G++ >=$1], acx_GXXvers,
> > @@ -220,7 +226,9 @@ AS_IF([test $enable_werror_always = yes],
> > [AS_VAR_SET(acx_GXXvers, yes)],
> > [AS_VAR_SET(acx_GXXvers, no)])])
> >   AS_IF([test AS_VAR_GET(acx_GXXvers) = yes],
> > -   [acx_Var="$acx_Var${acx_Var:+ }-Werror"])
> > +   [acx_Var="$acx_Var${acx_Var:+ }-Werror"
> > +AS_CASE([$host], [s390*-*-*],
> > +[acx_Var="$acx_Var -Wno-error=maybe-uninitialized"])])
> >AS_VAR_POPDEF([acx_GXXvers])])
> >  m4_popdef([acx_Var])dnl
> >  AC_LANG_POP(C++)
> > diff --git a/fixincludes/configure b/fixincludes/configure
> > index 6e2d67b655b..e0d679cc18e 100755
> > --- a/fixincludes/configure
> > +++ b/fixincludes/configure
> > @@ -4753,7 +4753,13 @@ else
> >  fi
> >  
> >  if test $enable_werror_always = yes; then :
> > -  WERROR="$WERROR${WERROR:+ }-Werror"
> > +WERROR="$WERROR${WERROR:+ }-Werror"
> > +  case $host in #(
> > +  s390*-*-*) :
> > +WERROR="$WERROR -Wno-error=maybe-uninitialized" ;; #(
> > +  *) :
> > + ;;
> > +esac
> >  fi
> >  
> >  ac_ext=c
> > diff --git a/gcc/configure b/gcc/configure
> > index 0a09777dd42..ea03581537a 100755
> > --- a/gcc/configure
> > +++ b/gcc/configure
> > @@ -7064,7 +7064,13 @@ else
> >  fi
> >  
> >  if test $enable_werror_always = yes; then :
> > -  strict_warn="$strict_warn${strict_warn:+ }-Werror"
> > +

Re: Patch ping

2020-10-05 Thread Nathan Sidwell

On 10/5/20 5:09 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

I'd like to ping a few patches:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554845.html
   - PR97197 - support TARGET_MEM_REF in C/C++ error pretty-printing


ok, but could you add a comment on what it's printing out.  As you say, 
it's not the original source :)


nathan

--
Nathan Sidwell


Re: Fix handling of stores in modref_summary::useful_p

2020-10-05 Thread Vaseeharan Vinayagamoorthy via Gcc-patches
Hi,

After this patch, I am noticing that some glibc crypto tests get stuck in
scanf, which goes into a busy loop.

My build/host/target setup is:
Build: aarch64-none-linux-gnu
Host: aarch64-none-linux-gnu
Target: aarch64-none-linux-gnu



Kind regards
Vasee


On 27/09/2020, 22:46, "Gcc-patches on behalf of Jan Hubicka" 
 wrote:

Hi,
this patch fixes a pasto in modref_summary::useful_p that made
ipa-modref give up on tracking stores when all load info got lost.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

2020-09-27  Jan Hubicka  

* ipa-modref.c (modref_summary::useful_p): Fix testing of stores.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 728c6c1523d..6225552e41a 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -135,7 +135,7 @@ modref_summary::useful_p (int ecf_flags)
 return true;
   if (ecf_flags & ECF_PURE)
 return false;
-  return stores && !loads->every_base;
+  return stores && !stores->every_base;
 }

 /* Dump A to OUT.  */



Re: c++: Hash table iteration for namespace-member spelling suggestions

2020-10-05 Thread Nathan Sidwell

On 10/5/20 5:48 AM, Christophe Lyon wrote:

On Fri, 2 Oct 2020 at 20:23, Nathan Sidwell  wrote:





Hi Nathan,

This is causing regressions on aarch64 and arm when using
-mfloat-abi=hard (or configuring for arm-linux-gnueabihf).
The logs says:
FAIL: c-c++-common/spellcheck-reserved.c  -std=gnu++98 (test for excess errors)
Excess errors:
/gcc/testsuite/c-c++-common/spellcheck-reserved.c:31:3: error:
'__builtin_strtchr' was not declared in this scope; did you mean
'__builtin_strrchr'?

The test still passes on arm with -mfloat-abi=soft


thanks.  What I'd ignored is that other targets (or abi-variants) might 
have different builtins.  Those would perturb the hash table order. 
I'll have to do it right ...


nathan

--
Nathan Sidwell


Re: libstdc++: Fix chrono::__detail::ceil to work with C++11 (was Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486])

2020-10-05 Thread Jonathan Wakely via Gcc-patches

On 19/09/20 11:37 +0100, Mike Crowe wrote:

On Friday 11 September 2020 at 19:59:36 +0100, Jonathan Wakely wrote:

commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147
Author: Jonathan Wakely 
Date:   Fri Sep 11 19:59:11 2020

libstdc++: Fix chrono::__detail::ceil to work with C++11

In C++11 constexpr functions can only have a return statement, so we
need to fix __detail::ceil to make it valid in C++11. This can be done
by moving the comparison and increment into a new function, __ceil_impl,
and calling that with the result of the duration_cast.

This would mean the standard C++17 std::chrono::ceil function would make
two further calls, which would add too much overhead when not inlined.
For C++17 and later use a using-declaration to add chrono::ceil to
namespace __detail. For C++11 and C++14 define chrono::__detail::__ceil
as a C++11-compatible constexpr function template.
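
The shape of the change is roughly as follows (a simplified sketch,
namespaces omitted; see the actual <chrono> change for the real code):

template<typename _To, typename _Rep, typename _Period>
  constexpr _To
  __ceil_impl(const _To& __t, const std::chrono::duration<_Rep, _Period>& __d)
  {
    // A single return statement, so this is a valid C++11 constexpr function.
    return (__t < __d) ? (__t + _To{1}) : __t;
  }

template<typename _To, typename _Rep, typename _Period>
  constexpr _To
  __ceil(const std::chrono::duration<_Rep, _Period>& __d)
  {
    // Do the cast once, then let __ceil_impl compare and increment.
    return __ceil_impl(std::chrono::duration_cast<_To>(__d), __d);
  }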


libstdc++-v3/ChangeLog:

* include/std/chrono [C++17] (chrono::__detail::ceil): Add
using declaration to make chrono::ceil available for internal
use with a consistent name.
(chrono::__detail::__ceil_impl): New function template.
(chrono::__detail::ceil): Use __ceil_impl to compare and
increment the value. Remove SFINAE constraint.


This change introduces a new implementation of ceil that, as far as I can
tell, has no tests. A patch is attached to add the equivalent of the
existing chrono::ceil tests for chrono::__detail::ceil. The tests fail to
compile if I run them without 53ad6b1979f4bd7121e977c4a44151b14d8a0147 as
expected due to the previous non-C++11-compliant implementation.


Pushed to master, thanks.


From b9dffbf4f1bc05a887269ea95a3b86d5a611e720 Mon Sep 17 00:00:00 2001
From: Mike Crowe 
Date: Wed, 16 Sep 2020 15:31:28 +0100
Subject: [PATCH 1/2] libstdc++: Test C++11 implementation of
std::chrono::__detail::ceil

Commit 53ad6b1979f4bd7121e977c4a44151b14d8a0147 split the implementation
of std::chrono::__detail::ceil so that when compiling for C++17 and
later std::chrono::ceil is used but when compiling for earlier versions
a separate implementation is used to comply with C++11's limited
constexpr rules. Let's run the equivalent of the existing
std::chrono::ceil test cases on std::chrono::__detail::ceil too to make
sure that it doesn't get broken.

libstdc++-v3/ChangeLog:

* testsuite/20_util/duration_cast/rounding_c++11.cc: Copy
   rounding.cc and alter to support compilation for C++11 and to
   test std::chrono::__detail::ceil.
---
.../20_util/duration_cast/rounding_c++11.cc   | 43 +++
1 file changed, 43 insertions(+)
create mode 100644 
libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc

diff --git a/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc 
b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
new file mode 100644
index 000..f10d27fd082
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/duration_cast/rounding_c++11.cc
@@ -0,0 +1,43 @@
+// Copyright (C) 2016-2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile { target c++11 } }
+
+#include <chrono>
+
+using std::chrono::seconds;
+using std::chrono::milliseconds;
+
+using fp_seconds = std::chrono::duration<float>;
+
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1000))
+  == seconds(1) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1001))
+  == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1500))
+  == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(1999))
+  == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2000))
+  == seconds(2) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2001))
+  == seconds(3) );
+static_assert( std::chrono::__detail::ceil<seconds>(milliseconds(2500))
+  == seconds(3) );
+static_assert( std::chrono::__detail::ceil<fp_seconds>(milliseconds(500))
+  == fp_seconds{0.5f} );
--
2.28.0





Re: [PATCH v5 6/8] libstdc++ atomic_futex: Avoid rounding errors in std::future::wait_* [PR91486]

2020-10-05 Thread Jonathan Wakely via Gcc-patches

On 19/09/20 11:50 +0100, Mike Crowe wrote:

On 29/05/20 07:17 +0100, Mike Crowe via Libstdc++ wrote:

> > diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
> > index 5f95ade..aa137a7 100644
> > --- a/libstdc++-v3/include/bits/atomic_futex.h
> > +++ b/libstdc++-v3/include/bits/atomic_futex.h
> > @@ -219,8 +219,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  _M_load_when_equal_for(unsigned __val, memory_order __mo,
> > const chrono::duration<_Rep, _Period>& __rtime)
> >  {
> > + using __dur = typename __clock_t::duration;
> >   return _M_load_when_equal_until(__val, __mo,
> > - __clock_t::now() + __rtime);
> > + __clock_t::now() + 
chrono::__detail::ceil<__dur>(__rtime));
> >  }
> >
> >// Returns false iff a timeout occurred.
> > @@ -233,7 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   do {
> > const __clock_t::time_point __s_entry = __clock_t::now();
> > const auto __delta = __atime - __c_entry;
> > -   const auto __s_atime = __s_entry + __delta;
> > +   const auto __s_atime = __s_entry +
> > +   chrono::__detail::ceil<_Duration>(__delta);


On Friday 11 September 2020 at 18:22:04 +0100, Jonathan Wakely wrote:

I'm testing the attached patch to fix the C++11 constexpr error, but
while re-looking at the uses of __detail::ceil I noticed this is using
_Duration as the target type. Shouldn't that be __clock_t::duration
instead? Why do we care about the duration of the user's time_point
here, rather than the preferred duration of the clock we're about to
wait against?


I think you're right. I've attached a patch to fix it and also add a test
that would have failed at least some of the time if run on a machine with
an uptime greater than 208.5 days with:

void test_pr91486_wait_until(): Assertion 'float_steady_clock::call_count <= 3' 
failed.

If we implement the optimisation to not re-check against the custom clock
when the wait is complete if is_steady == true then the test would have
started failing due to the wait not being long enough.

(I used a couple of the GCC farm machines that have high uptimes to test
this.)


Also pushed to master. Thanks!



Thanks.

Mike.



From fa4decc00698785fb9e07aa36c0d862414ca5ff9 Mon Sep 17 00:00:00 2001
From: Mike Crowe 
Date: Wed, 16 Sep 2020 16:55:11 +0100
Subject: [PATCH 2/2] libstdc++: Use correct duration for atomic_futex wait on
custom clock [PR 91486]

As Jonathan Wakely pointed out[1], my change in commit
f9ddb696a289cc48d24d3d23c0b324cb88de9573 should have been rounding to
the target clock duration type rather than the input clock duration type
in __atomic_futex_unsigned::_M_load_when_equal_until just as (e.g.)
condition_variable does.

As well as fixing this, let's create a rather contrived test that fails
with the previous code, but unfortunately only when run on a machine
with an uptime of over 208.5 days, and even then not always.

[1] https://gcc.gnu.org/pipermail/libstdc++/2020-September/051004.html

libstdc++-v3/ChangeLog:

* include/bits/atomic_futex.h:
   (__atomic_futex_unsigned::_M_load_when_equal_until): Use
   target clock duration type when rounding.

   * testsuite/30_threads/async/async.cc: (test_pr91486_wait_for)
   rename from test_pr91486.  (float_steady_clock) new class for
   test.  (test_pr91486_wait_until) new test.
---
libstdc++-v3/include/bits/atomic_futex.h  |  2 +-
.../testsuite/30_threads/async/async.cc   | 62 ++-
2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index aa137a7b64e..6093be0fbc7 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -235,7 +235,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  const __clock_t::time_point __s_entry = __clock_t::now();
  const auto __delta = __atime - __c_entry;
  const auto __s_atime = __s_entry +
- chrono::__detail::ceil<_Duration>(__delta);
+ chrono::__detail::ceil<__clock_t::duration>(__delta);
  if (_M_load_when_equal_until(__val, __mo, __s_atime))
return true;
  __c_entry = _Clock::now();
diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 46f8d2f327d..1c779bfbcad 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -157,7 +157,7 @@ void test04()
  }
}

-void test_pr91486()
+void test_pr91486_wait_for()
{
  future<void> f1 = async(launch::async, []() {
  std::this_thread::sleep_for(std::chrono::seconds(1));
@@ -171,6 +171,63 @@ void test_pr91486()
  VERIFY( elapsed_steady >= std::chrono::seconds(1) );
}

+// This is a clock with a very recent epoch which ensures that the difference
+// between now() and one second in the future 

Re: [PATCH] dwarf: Multi-register CFI address support

2020-10-05 Thread Andrew Stubbs

Ping.

On 21/09/2020 14:51, Andrew Stubbs wrote:

Ping.

On 03/09/2020 16:29, Andrew Stubbs wrote:

On 28/08/2020 13:04, Andrew Stubbs wrote:

Hi all,

This patch introduces DWARF CFI support for architectures that 
require multiple registers to hold pointers, such as the stack 
pointer, frame pointer, and return address. The motivating case is 
the AMD GCN architecture which has 64-bit address pointers, but 
32-bit registers.


The current implementation permits program variables to span as many 
registers as they need, but assumes that CFI expressions will only 
need a single register for each frame value.


To be fair, the DWARF standard makes a similar assumption; the 
engineers working on LLVM and GDB, at AMD, have therefore invented 
some new DWARF operators that they plan to propose for a future 
standard. Only one is relevant here, however: DW_OP_LLVM_piece_end. 
(Unfortunately this clashes with an AArch64 extension, but I think we 
can cope using an alias -- only GCC dumps will be confusing.)


My approach is to change the type representing a DWARF register 
throughout the CFI code. This permits the register span information 
to propagate to where it is needed.


I've taken advantage of C++ struct copies and operator== to minimize 
the amount of refactoring required. I'm not sure this meets the GCC 
guidelines exactly, but if not I can change that once the basic form 
is agreed. (I also considered an operator= to make assigning single 
dwreg values transparent, but that hid too many invalid assumptions.)
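
In outline, the new register type looks something like this (field and type
names invented here for illustration; the patch has the real definition):

/* A DWARF frame value may now span several consecutive hard registers.  */
struct cfi_dwreg
{
  unsigned int reg;	      /* First DWARF register number.  */
  unsigned short span;	      /* Number of registers forming the value.  */
  unsigned short span_width;  /* Size of each component, in bytes.  */

  bool operator== (const cfi_dwreg &other) const
  {
    return (reg == other.reg
	    && span == other.span
	    && span_width == other.span_width);
  }
  bool operator!= (const cfi_dwreg &other) const
  {
    return !(*this == other);
  }
};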


OK to commit? (Although, I'll hold off until AMD release the 
compatible GDB.)


Minor patch update, following Tom's feedback.

Andrew







Re: c++: Hash table iteration for namespace-member spelling suggestions

2020-10-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 05, 2020 at 11:48:32AM +0200, Christophe Lyon via Gcc-patches wrote:
> > For 'no such binding' errors, we iterate over binding levels to find a
> > close match.  At the namespace level we were using DECL_ANTICIPATED to
> > skip undeclared builtins.  But (a) there are other unnameable things
> > there and (b) decl-anticipated is about to go away.  This changes the
> > namespace scanning to iterate over the hash table, and look at
> > non-hidden bindings.  This does mean we look at fewer strings
> (hurrah), but the order we meet them is somewhat 'random'.  Our
> > distance measure is not very fine grained, and a couple of testcases
> > change their suggestion.  I notice for the c/c++ common one, we now
> > match the output of the C compiler.  For the other one we think 'int'
> > and 'int64_t' have the same distance from 'int64', and now meet the
> > former first.  That's a little unfortunate.  If it's too problematic I
> > suppose we could sort the strings via an intermediate array before
> > measuring distance.
> >
> >  gcc/cp/
> >  * name-lookup.c (consider_decl): New, broken out of ...
> >  (consider_binding_level): ... here.  Iterate the hash table for
> >  namespace bindings.
> >  gcc/testsuite/
> >  * c-c++-common/spellcheck-reserved.c: Adjust diagnostic.
> >  * g++.dg/spellcheck-typenames.C: Adjust diagnostic.
> >
> > pushing to trunk
> >
> 
> Hi Nathan,
> 
> This is causing regressions on aarch64 and arm when using
> -mfloat-abi=hard (or configuring for arm-linux-gnueabihf).
> The logs says:
> FAIL: c-c++-common/spellcheck-reserved.c  -std=gnu++98 (test for excess 
> errors)
> Excess errors:
> /gcc/testsuite/c-c++-common/spellcheck-reserved.c:31:3: error:
> '__builtin_strtchr' was not declared in this scope; did you mean
> '__builtin_strrchr'?

Yeah, ditto on i686-linux (but passes on x86_64-linux).
Iterating over a hash table is fine, but we should not generate random
results.  So, the function that decides what is better and worse needs to
have some additional rules, so that no two different decl names are considered
equal.  Whether it is DECL_UID, or string length of the name, or some hash
of the string characters doesn't matter that much.
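
That is, something along these lines (a sketch only, with made-up names):

/* Tie-break equal edit distances deterministically, so hash-table
   iteration order cannot change which suggestion wins.  */
static bool
better_suggestion_p (tree a, unsigned dist_a, tree b, unsigned dist_b)
{
  if (dist_a != dist_b)
    return dist_a < dist_b;
  return strcmp (IDENTIFIER_POINTER (a), IDENTIFIER_POINTER (b)) < 0;
}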

Jakub



[PATCH] arm: [MVE] Add missing __arm_vcvtnq_u32_f32 intrinsic (PR 96914)

2020-10-05 Thread Christophe Lyon via Gcc-patches
__arm_vcvtnq_u32_f32 was missing from arm_mve.h, although the s32_f32 and
[su]16_f16 versions were present.

This patch adds the missing version and testcase, which are
cut-and-paste from the other versions.

2020-10-05  Christophe Lyon  

gcc/
* config/arm/arm_mve.h (__arm_vcvtnq_u32_f32): New.

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c: New test.
---
 gcc/config/arm/arm_mve.h|  8 
 .../gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c  | 13 +
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 99cff41..ecff3d1 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -643,6 +643,7 @@
 #define vcvtpq_u16_f16(__a) __arm_vcvtpq_u16_f16(__a)
 #define vcvtpq_u32_f32(__a) __arm_vcvtpq_u32_f32(__a)
 #define vcvtnq_u16_f16(__a) __arm_vcvtnq_u16_f16(__a)
+#define vcvtnq_u32_f32(__a) __arm_vcvtnq_u32_f32(__a)
 #define vcvtmq_u16_f16(__a) __arm_vcvtmq_u16_f16(__a)
 #define vcvtmq_u32_f32(__a) __arm_vcvtmq_u32_f32(__a)
 #define vcvtaq_u16_f16(__a) __arm_vcvtaq_u16_f16(__a)
@@ -17012,6 +17013,13 @@ __arm_vcvtnq_u16_f16 (float16x8_t __a)
   return __builtin_mve_vcvtnq_uv8hi (__a);
 }
 
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vcvtnq_u32_f32 (float32x4_t __a)
+{
+  return __builtin_mve_vcvtnq_uv4si (__a);
+}
+
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vcvtmq_u16_f16 (float16x8_t __a)
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c
new file mode 100644
index 000..b6d5eb9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vcvtnq_u32_f32.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O2" } */
+
+#include "arm_mve.h"
+
+uint32x4_t
+foo (float32x4_t a)
+{
+  return vcvtnq_u32_f32 (a);
+}
+
+/* { dg-final { scan-assembler "vcvtn.u32.f32"  }  } */
-- 
2.7.4



[patch] convert -Walloca pass to ranger

2020-10-05 Thread Aldy Hernandez via Gcc-patches
The walloca pass is a mess.  It has all sorts of heuristics to divine 
problematic ranges fed into alloca, none of them very good, and all of 
them unreadable.  The mess therein was actually one of the original 
motivators for the ranger project (along with array bounds checking).


The attached patch is a conversion of the pass to ranger.  It's mostly 
an exercise in removing code.  The entire pass almost reduces to:


+  // If the user specified a limit, use it.
+  int_range_max r;
+  if (warn_limit_specified_p (is_vla)
+  && TREE_CODE (len) == SSA_NAME
+  && query.range_of_expr (r, len, stmt)
+  && !r.varying_p ())
+{
+  // The invalid bits are anything outside of [0, MAX_SIZE].
+  static int_range<2> invalid_range (build_int_cst (size_type_node, 0),
+build_int_cst (size_type_node,
+   max_size),
+VR_ANTI_RANGE);
+
+  r.intersect (invalid_range);
+  if (r.undefined_p ())
+   return alloca_type_and_limit (ALLOCA_OK);
+
+  return alloca_type_and_limit (ALLOCA_BOUND_MAYBE_LARGE,
+   wi::to_wide (integer_zero_node));
 }

That is, if the range of the integer passed to alloca is outside of 
[0,MAX_SIZE], warn, otherwise it's ok.  Plain and simple.


You will notice I removed the nuanced errors we gave before-- like 
trying to guess whether the problematic range came by virtue of a signed 
cast conversion.  These specific errors were never part of the original 
design, they were just stuff we could guess by how the IL looked.  It 
was non-exact and fragile.  Now we just say the alloca argument may be 
too large, period.


In the future, I would even like to remove the specific range the ranger 
was able to compute from the error message itself.  As will become 
obvious, the ranger can get pretty outrageous ranges that are entirely 
non-obvious by looking at the code.  Peppering the error messages with 
these ranges will ultimately just confuse the user.  But alas, that's a 
problem for another patch to solve.


This patch goes on top of the ranger which Andrew just posted.  It's 
likely to be adjusted as the ranger is committed.


gcc/ChangeLog:

* gimple-ssa-warn-alloca.c (enum alloca_type): Remove
ALLOCA_BOUND_UNKNOWN and ALLOCA_CAST_FROM_SIGNED.
(warn_limit_specified_p): New.
(alloca_call_type_by_arg): Remove.
(cast_from_signed_p): Remove.
(is_max): Remove.
(alloca_call_type): Remove heuristics and replace with call into ranger.
(pass_walloca::execute): Instantiate ranger.

gcc/testsuite/ChangeLog:

* gcc.dg/Walloca-1.c: Adjust for ranger.
* gcc.dg/Walloca-12.c: Same.
* gcc.dg/Walloca-13.c: Same.
* gcc.dg/Walloca-2.c: Same.
* gcc.dg/Walloca-3.c: Same.
* gcc.dg/Walloca-6.c: Same.

diff --git a/gcc/gimple-ssa-warn-alloca.c b/gcc/gimple-ssa-warn-alloca.c
index 9e80e5dbbd9..33824a7a091 100644
--- a/gcc/gimple-ssa-warn-alloca.c
+++ b/gcc/gimple-ssa-warn-alloca.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "cfgloop.h"
 #include "intl.h"
+#include "gimple-range.h"

 static unsigned HOST_WIDE_INT adjusted_warn_limit (bool);

@@ -99,12 +100,6 @@ enum alloca_type {
   // Alloca argument may be too large.
   ALLOCA_BOUND_MAYBE_LARGE,

-  // Alloca argument is bounded but of an indeterminate size.
-  ALLOCA_BOUND_UNKNOWN,
-
-  // Alloca argument was casted from a signed integer.
-  ALLOCA_CAST_FROM_SIGNED,
-
   // Alloca appears in a loop.
   ALLOCA_IN_LOOP,

@@ -135,6 +130,15 @@ public:
   }
 };

+/* Return TRUE if the user specified a limit for either VLAs or ALLOCAs.  */
+
+static bool
+warn_limit_specified_p (bool is_vla)
+{
+  unsigned HOST_WIDE_INT max = is_vla ? warn_vla_limit : warn_alloca_limit;
+  return max != HOST_WIDE_INT_MAX;
+}
+
 /* Return the value of the argument N to -Walloca-larger-than= or
-Wvla-larger-than= adjusted for the target data model so that
when N == HOST_WIDE_INT_MAX, the adjusted value is set to
@@ -158,183 +162,15 @@ adjusted_warn_limit (bool idx)
   return limits[idx];
 }

-
-// NOTE: When we get better range info, this entire function becomes
-// irrelevant, as it should be possible to get range info for an SSA
-// name at any point in the program.
-//
-// We have a few heuristics up our sleeve to determine if a call to
-// alloca() is within bounds.  Try them out and return the type of
-// alloca call with its assumed limit (if applicable).
-//
-// Given a known argument (ARG) to alloca() and an EDGE (E)
-// calculating said argument, verify that the last statement in the BB
-// in E->SRC is a gate comparing ARG to an acceptable bound for
-// alloca().  See examples below.
-//
-// If set, ARG_CASTED is the possible unsigned 

[PATCH] PR target/96307: Fix KASAN option checking.

2020-10-05 Thread Kito Cheng
 - Disable KASAN if the target is unsupported and -fasan-shadow-offset= is not
   given, no matter whether `--param asan-stack=1` is given or not.

 - Move the KASAN option-checking testcases to gcc.dg; those testcases could be
   useful for all other targets which do not support ASAN.

 - Verified on riscv and x86.

gcc/ChangeLog:

PR target/96307
* toplev.c (process_options): Remove param_asan_stack checking for kasan
option checking.

gcc/testsuite/ChangeLog:

PR target/96307
* gcc.dg/pr96307.c: New.
* gcc.target/riscv/pr96260.c: Move this test case from here to ...
* gcc.dg/pr96260.c: ... here.
* gcc.target/riscv/pr91441.c: Move this test case from here to ...
* gcc.dg/pr91441.c: ... here.
* lib/target-supports.exp (check_effective_target_no_fsanitize_address):
New proc.
---
 .../{gcc.target/riscv => gcc.dg}/pr91441.c|  1 +
 .../{gcc.target/riscv => gcc.dg}/pr96260.c|  1 +
 gcc/testsuite/gcc.dg/pr96307.c| 25 +++
 gcc/testsuite/lib/target-supports.exp | 11 
 gcc/toplev.c  |  1 -
 5 files changed, 38 insertions(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.target/riscv => gcc.dg}/pr91441.c (85%)
 rename gcc/testsuite/{gcc.target/riscv => gcc.dg}/pr96260.c (77%)
 create mode 100644 gcc/testsuite/gcc.dg/pr96307.c

diff --git a/gcc/testsuite/gcc.target/riscv/pr91441.c 
b/gcc/testsuite/gcc.dg/pr91441.c
similarity index 85%
rename from gcc/testsuite/gcc.target/riscv/pr91441.c
rename to gcc/testsuite/gcc.dg/pr91441.c
index b55df5e7f00c..4f7a8fbec5e9 100644
--- a/gcc/testsuite/gcc.target/riscv/pr91441.c
+++ b/gcc/testsuite/gcc.dg/pr91441.c
@@ -1,5 +1,6 @@
 /* PR target/91441 */
 /* { dg-do compile  } */
+/* { dg-require-effective-target no_fsanitize_address }*/
 /* { dg-options "--param asan-stack=1 -fsanitize=kernel-address" } */
 
 int *bar(int *);
diff --git a/gcc/testsuite/gcc.target/riscv/pr96260.c 
b/gcc/testsuite/gcc.dg/pr96260.c
similarity index 77%
rename from gcc/testsuite/gcc.target/riscv/pr96260.c
rename to gcc/testsuite/gcc.dg/pr96260.c
index 229997f877b7..734832f021e3 100644
--- a/gcc/testsuite/gcc.target/riscv/pr96260.c
+++ b/gcc/testsuite/gcc.dg/pr96260.c
@@ -1,5 +1,6 @@
 /* PR target/96260 */
 /* { dg-do compile } */
+/* { dg-require-effective-target no_fsanitize_address }*/
 /* { dg-options "--param asan-stack=1 -fsanitize=kernel-address 
-fasan-shadow-offset=0x10" } */
 
 int *bar(int *);
diff --git a/gcc/testsuite/gcc.dg/pr96307.c b/gcc/testsuite/gcc.dg/pr96307.c
new file mode 100644
index ..cd1c17c9661b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96307.c
@@ -0,0 +1,25 @@
+/* PR target/96307 */
+/* { dg-do compile } */
+/* { dg-require-effective-target no_fsanitize_address }*/
+/* { dg-additional-options "-fsanitize=kernel-address 
--param=asan-instrumentation-with-call-threshold=8" } */
+
+#include <limits.h>
+enum a {test1, test2, test3=INT_MAX};
+enum a a;
+enum a *b;
+
+void reset (void);
+
+void
+t()
+{
+  if (a != test2)
+__builtin_abort ();
+  if (*b != test2)
+__builtin_abort ();
+  reset ();
+  if (a != test1)
+__builtin_abort ();
+  if (*b != test1)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 8314e443c437..e80b71a2110c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -10552,3 +10552,14 @@ proc check_effective_target_ident_directive {} {
int i;
 }]
 }
+
+# Return 1 if the target does not support the address sanitizer, 0 otherwise.
+
+proc check_effective_target_no_fsanitize_address {} {
+if ![check_no_compiler_messages fsanitize_address executable {
+   int main (void) { return 0; }
+}] {
+   return 1;
+}
+return 0;
+}
diff --git a/gcc/toplev.c b/gcc/toplev.c
index a4cb8bb262ed..540e131d963d 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1842,7 +1842,6 @@ process_options (void)
 
   if ((flag_sanitize & SANITIZE_KERNEL_ADDRESS)
   && (targetm.asan_shadow_offset == NULL
- && param_asan_stack
  && !asan_shadow_offset_set_p ()))
 {
   warning_at (UNKNOWN_LOCATION, 0,
-- 
2.28.0



Re: c++: Hash table iteration for namespace-member spelling suggestions

2020-10-05 Thread Christophe Lyon via Gcc-patches
On Fri, 2 Oct 2020 at 20:23, Nathan Sidwell  wrote:
>
>
> For 'no such binding' errors, we iterate over binding levels to find a
> close match.  At the namespace level we were using DECL_ANTICIPATED to
> skip undeclared builtins.  But (a) there are other unnameable things
> there and (b) decl-anticipated is about to go away.  This changes the
> namespace scanning to iterate over the hash table, and look at
> non-hidden bindings.  This does mean we look at fewer strings
> (hurrah), but the order we meet them is somewhat 'random'.  Our
> distance measure is not very fine grained, and a couple of testcases
> change their suggestion.  I notice for the c/c++ common one, we now
> match the output of the C compiler.  For the other one we think 'int'
> and 'int64_t' have the same distance from 'int64', and now meet the
> former first.  That's a little unfortunate.  If it's too problematic I
> suppose we could sort the strings via an intermediate array before
> measuring distance.
>
>  gcc/cp/
>  * name-lookup.c (consider_decl): New, broken out of ...
>  (consider_binding_level): ... here.  Iterate the hash table for
>  namespace bindings.
>  gcc/testsuite/
>  * c-c++-common/spellcheck-reserved.c: Adjust diagnostic.
>  * g++.dg/spellcheck-typenames.C: Adjust diagnostic.
>
> pushing to trunk
>

Hi Nathan,

This is causing regressions on aarch64 and arm when using
-mfloat-abi=hard (or configuring for arm-linux-gnueabihf).
The logs says:
FAIL: c-c++-common/spellcheck-reserved.c  -std=gnu++98 (test for excess errors)
Excess errors:
/gcc/testsuite/c-c++-common/spellcheck-reserved.c:31:3: error:
'__builtin_strtchr' was not declared in this scope; did you mean
'__builtin_strrchr'?

The test still passes on arm with -mfloat-abi=soft

Christophe

> nathan
>
> --
> Nathan Sidwell


Re: Track access ranges in ipa-modref

2020-10-05 Thread Christophe Lyon via Gcc-patches
On Fri, 2 Oct 2020 at 10:37, Jan Hubicka  wrote:
>
> Hi,
> this patch implements tracking of access ranges.  This is only applied when
> the base pointer is an argument. Incrementally I will extend it to also track
> the TBAA basetype so we can disambiguate ranges for accesses to the same
> basetype (which makes it quite a bit more effective). For this reason I track
> the access offset separately from the parameter offset (the second tracks
> combined adjustments to the parameter). This is, I think, the last feature I
> would like to add to the memory access summary this stage1.
>
> Further work will be needed to optimize the summary and merge adjacent
> ranges/make collapsing more intelligent (so we do not lose track that often),
> but I wanted to keep the basic patch simple.
>
> According to the cc1plus stats:
>
> Alias oracle query stats:
>   refs_may_alias_p: 64108082 disambiguations, 74386675 queries
>   ref_maybe_used_by_call_p: 142319 disambiguations, 65004781 queries
>   call_may_clobber_ref_p: 23587 disambiguations, 29420 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 38117 queries
>   nonoverlapping_refs_since_match_p: 19489 disambiguations, 55748 must 
> overlaps, 76044 queries
>   aliasing_component_refs_p: 54763 disambiguations, 755876 queries
>   TBAA oracle: 24184658 disambiguations 56823187 queries
>16260329 are in alias set 0
>10617146 queries asked about the same object
>125 queries asked about the same alias set
>0 access volatile
>3960555 are dependent in the DAG
>1800374 are aritificially in conflict with void *
>
> Modref stats:
>   modref use: 10656 disambiguations, 47037 queries
>   modref clobber: 1473322 disambiguations, 1961464 queries
>   5027242 tbaa queries (2.563005 per modref query)
>   649087 base compares (0.330920 per modref query)
>
> PTA query stats:
>   pt_solution_includes: 977385 disambiguations, 13609749 queries
>   pt_solutions_intersect: 1032703 disambiguations, 13187507 queries
>
> Which should still compare with
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554930.html
> there are about 2% more load disambiguations and 3.6% more store
> disambiguations, which is not great, but the TBAA part helps noticeably more
> and this should also help with -fno-strict-aliasing.
>
> I plan to work on improving param tracking too.
>
> Bootstrapped/regtested x86_64-linux with the other changes, OK?
>

Hi,
Since this was committed, I've noticed regressions on arm:
gcc.c-torture/execute/pta-field-2.c   -O1  execution test
gcc.c-torture/execute/pta-field-2.c   -O2  execution test
gcc.c-torture/execute/pta-field-2.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
gcc.c-torture/execute/pta-field-2.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  execution test
gcc.c-torture/execute/pta-field-2.c   -O3 -g  execution test
gcc.c-torture/execute/pta-field-2.c   -Os  execution test
gcc.dg/ipa/ipa-pta-15.c execution test
gcc.dg/torture/pta-ptrarith-1.c   -O1   scan-tree-dump alias
"ESCAPED = {[^\n}]* i f [^\n}]*}"
gcc.dg/torture/pta-ptrarith-1.c   -O1  execution test
gcc.dg/torture/pta-ptrarith-1.c   -O2   scan-tree-dump alias
"ESCAPED = {[^\n}]* i f [^\n}]*}"
gcc.dg/torture/pta-ptrarith-1.c   -O2  execution test
gcc.dg/torture/pta-ptrarith-1.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none   scan-tree-dump alias "ESCAPED = {[^\n}]* i f
[^\n}]*}"
gcc.dg/torture/pta-ptrarith-1.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
gcc.dg/torture/pta-ptrarith-1.c   -O3 -g   scan-tree-dump alias
"ESCAPED = {[^\n}]* i f [^\n}]*}"
gcc.dg/torture/pta-ptrarith-1.c   -O3 -g  execution test
gcc.dg/torture/pta-ptrarith-1.c   -Os   scan-tree-dump alias
"ESCAPED = {[^\n}]* i f [^\n}]*}"
gcc.dg/torture/pta-ptrarith-1.c   -Os  execution test

I think there are similar problems on x86 since I saw regression emails.

Can you check?

Thanks,

Christophe

> 2020-10-02  Jan Hubicka  
>
> * ipa-modref-tree.c (test_insert_search_collapse): Update andling
> of accesses.
> (test_merge): Likewise.
> * ipa-modref-tree.h (struct modref_access_node): Add offset, size,
> max_size, parm_offset and parm_offset_known.
> (modref_access_node::useful_p): Constify.
> (modref_access_node::range_info_useful_p): New predicate.
> (modref_access_node::operator==): New.
> (struct modref_parm_map): New structure.
> (modref_tree::merge): Update for racking parameters)
> * ipa-modref.c (dump_access): Dump new fields.
> (get_access): Fill in new fields.
> (merge_call_side_effects): Update handling of parm map.
> (write_modref_records): Stream new fields.
> (read_modref_records): Stream new fields.
> (compute_parm_map): Update for new parm map.
> (ipa_merge_modref_summary_after_inlining): Update.

Re: [PATCH] arm: Add missing vec_cmp and vcond patterns

2020-10-05 Thread Christophe Lyon via Gcc-patches
On Thu, 1 Oct 2020 at 16:10, Richard Sandiford via Gcc-patches
 wrote:
>
> This patch does several things at once:
>
> (1) Add vector compare patterns (vec_cmp and vec_cmpu).
>
> (2) Add vector selects between floating-point modes when the
> values being compared are integers (affects vcond and vcondu).
>
> (3) Add vector selects between integer modes when the values being
> compared are floating-point (affects vcond).
>
> (4) Add standalone vector select patterns (vcond_mask).
>
> (5) Tweak the handling of compound comparisons with zeros.
>
> Unfortunately it proved too difficult (for me) to separate this
> out into a series of smaller patches, since everything is so
> inter-related.  Defining only some of the new patterns does
> not leave things in a happy state.
>
> The handling of comparisons is mostly taken from the vcond patterns.
> This means that it remains non-compliant with IEEE: “quiet” comparisons
> use signalling instructions.  But that shouldn't matter for floats,
> since we require -funsafe-math-optimizations to vectorize for them
> anyway.
>
> It remains the case that comparisons and selects aren't implemented
> at all for HF vectors.  Implementing those feels like separate work.
>
> Tested on arm-linux-gnueabihf and arm-eabi (for MVE).  OK to install?
>
> Richard
>

Hi Richard,

This patch enables a few more tests on armeb-linux-gnueabihf
--with-cpu cortex-a9
--with-fpu neon-fp16, with these failures:
gcc.dg/vect/slp-cond-2-big-array.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 3
gcc.dg/vect/slp-cond-2-big-array.c scan-tree-dump-times vect
"vectorizing stmts using SLP" 3
gcc.dg/vect/slp-cond-2.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 3
gcc.dg/vect/slp-cond-2.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 3
gcc.dg/vect/vect-cond-10.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 8
gcc.dg/vect/vect-cond-10.c scan-tree-dump-times vect "vectorized 1 loops" 8
gcc.dg/vect/vect-cond-8.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 5
gcc.dg/vect/vect-cond-8.c scan-tree-dump-times vect "vectorized 1 loops" 5
gcc.dg/vect/vect-cond-9.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 10
gcc.dg/vect/vect-cond-9.c scan-tree-dump-times vect "vectorized 1 loops" 10

I guess this is expected since vectorization does not work well on
armeb in general?

Thanks,

Christophe



>
> gcc/
> PR target/96528
> PR target/97288
> * config/arm/arm-protos.h (arm_expand_vector_compare): Declare.
> (arm_expand_vcond): Likewise.
> * config/arm/arm.c (arm_expand_vector_compare): New function.
> (arm_expand_vcond): Likewise.
> * config/arm/neon.md (vec_cmp): New pattern.
> (vec_cmpu): Likewise.
> (vcond): Require operand 5 to be a register
> or zero.  Use arm_expand_vcond.
> (vcond): New pattern.
> (vcondu): Generalize to...
> (vcondu): ...this.  Require operand 5 to be a register or zero.
> Use arm_expand_vcond.
> (vcond_mask_): New pattern.
> (neon_vc, neon_vc_insn): Add "@" marker.
> (neon_vbsl): Likewise.
> (neon_vcu): Reexpress as...
> (@neon_vc): ...this.
>
> gcc/testsuite/
> * lib/target-supports.exp (check_effective_target_vect_cond_mixed): 
> Add
> arm neon targets.
> * gcc.target/arm/neon-compare-1.c: New test.
> * gcc.target/arm/neon-compare-2.c: Likewise.
> * gcc.target/arm/neon-compare-3.c: Likewise.
> * gcc.target/arm/neon-compare-4.c: Likewise.
> * gcc.target/arm/neon-compare-5.c: Likewise.
> * gcc.target/arm/neon-vcond-gt.c: Expect comparisons with zero.
> * gcc.target/arm/neon-vcond-ltgt.c: Likewise.
> * gcc.target/arm/neon-vcond-unordered.c: Likewise.
> ---
>  gcc/config/arm/arm-protos.h   |   2 +
>  gcc/config/arm/arm.c  | 121 
>  gcc/config/arm/neon.md| 281 --
>  gcc/testsuite/gcc.target/arm/neon-compare-1.c |  84 ++
>  gcc/testsuite/gcc.target/arm/neon-compare-2.c |  45 +++
>  gcc/testsuite/gcc.target/arm/neon-compare-3.c |  44 +++
>  gcc/testsuite/gcc.target/arm/neon-compare-4.c |  38 +++
>  gcc/testsuite/gcc.target/arm/neon-compare-5.c |  37 +++
>  gcc/testsuite/gcc.target/arm/neon-vcond-gt.c  |   2 +-
>  .../gcc.target/arm/neon-vcond-ltgt.c  |   3 +-
>  .../gcc.target/arm/neon-vcond-unordered.c |   4 +-
>  gcc/testsuite/lib/target-supports.exp |   2 +
>  12 files changed, 442 insertions(+), 221 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/neon-compare-1.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/neon-compare-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/neon-compare-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/neon-compare-4.c
>  

Re: [PATCH v2] PR target/96759 - Handle global variable assignment from misaligned structure/PARALLEL return values.

2020-10-05 Thread Kito Cheng
ping.


On Fri, Sep 25, 2020 at 2:33 PM Richard Biener  wrote:

> On Fri, 25 Sep 2020, Kito Cheng wrote:
>
> > In g:70cdb21e579191fe9f0f1d45e328908e59c0179e, misaligned stores to a
> > DECL/global variable were handled, but PARALLEL values were not.  Looking
> > at the other parts of this function, I found that PARALLEL values need to
> > be handled by the emit_group_* functions, so I added a check and use
> > emit_group_store when storing a PARALLEL value.  I also checked that this
> > change doesn't break the testcase (gcc.target/arm/unaligned-argument-3.c)
> > added by the original change.
> >
> > For the riscv64 target, struct S {int a; double b;} is packed into a
> > PARALLEL value for the return, and it has TImode when misaligned access
> > is supported.  However, TImode requires 16-byte alignment while the value
> > is only 8-byte aligned, so it takes the misaligned-stores path, which
> > then tries to generate a move instruction from a PARALLEL value.
> >
> > Tested on the following targets without introducing new regressions:
> >   - riscv32/riscv64 elf
> >   - x86_64-linux
> >   - arm-eabi
>
> OK if Eric says so.
>
> Thanks,
> Richard.
>
> > v2 changes:
> >   - Use maybe_emit_group_store instead of emit_group_store.
> >   - Remove push_temp_slots/pop_temp_slots; emit_group_store only requires
> > a stack temp slot when dst is CONCAT or PARALLEL, however
> > maybe_emit_group_store will always use a REG for dst if needed.
> >
> > gcc/ChangeLog:
> >
> >   PR target/96759
> >   * expr.c (expand_assignment): Handle misaligned stores with
> PARALLEL
> >   value.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/96759
> >   * g++.target/riscv/pr96759.C: New.
> >   * gcc.target/riscv/pr96759.c: New.
> > ---
> >  gcc/expr.c   |  2 ++
> >  gcc/testsuite/g++.target/riscv/pr96759.C |  8 
> >  gcc/testsuite/gcc.target/riscv/pr96759.c | 13 +
> >  3 files changed, 23 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.target/riscv/pr96759.C
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/pr96759.c
> >
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index 1a15f24b3979..6eb13a12c8c5 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -5168,6 +5168,8 @@ expand_assignment (tree to, tree from, bool
> nontemporal)
> >rtx reg, mem;
> >
> >reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> > +  /* Handle PARALLEL.  */
> > +  reg = maybe_emit_group_store (reg, TREE_TYPE (from));
> >reg = force_not_mem (reg);
> >mem = expand_expr (to, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >if (TREE_CODE (to) == MEM_REF && REF_REVERSE_STORAGE_ORDER (to))
> > diff --git a/gcc/testsuite/g++.target/riscv/pr96759.C
> b/gcc/testsuite/g++.target/riscv/pr96759.C
> > new file mode 100644
> > index ..673999a4baf7
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/riscv/pr96759.C
> > @@ -0,0 +1,8 @@
> > +/* { dg-options "-mno-strict-align -std=gnu++17" } */
> > +/* { dg-do compile } */
> > +struct S {
> > +  int a;
> > +  double b;
> > +};
> > +S GetNumbers();
> > +auto [globalC, globalD] = GetNumbers();
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr96759.c
> b/gcc/testsuite/gcc.target/riscv/pr96759.c
> > new file mode 100644
> > index ..621c39196fca
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr96759.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-options "-mno-strict-align" } */
> > +/* { dg-do compile } */
> > +
> > +struct S {
> > +  int a;
> > +  double b;
> > +};
> > +struct S GetNumbers();
> > +struct S g;
> > +
> > +void foo(){
> > +  g = GetNumbers();
> > +}
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imend
>


Patch ping (Re: [PATCH] options, lto: Optimize streaming of optimization nodes)

2020-10-05 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 14, 2020 at 11:47:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Sep 14, 2020 at 11:02:26AM +0200, Jan Hubicka wrote:
> > Especially for the new param machinery, most of the streamed values are
> > probably going to be the default values.  Perhaps somehow we could
> > stream them more effectively.
> 
> Ah, that seems like a good idea, that brings further savings, the size
> goes down from 574 bytes to 273 bytes, i.e. less than half.
> Here is an updated version that does that.  Not trying to handle
> enums because the code doesn't know if (enum ...) 10 is even valid,
> similarly non-parameters because those really generally don't have large
> initializers, and params without Init (those are 0 initialized and thus
> don't need to be handled).

Here is an updated version of that on top of what has been committed.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
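To illustrate the trick, here is a standalone toy model in plain C (my own
sketch, not the generated optc-save-gen.awk output; varint_len and the
function names are made up for illustration).  Streamed values use a
variable-length encoding where small magnitudes are cheap, and most params
still hold their default, so xor-ing each value with its default maps the
common case to 0:

#include <assert.h>

/* Stand-in for the bitpack's variable-length encoding: bytes needed for a
   7-bit-per-byte varint.  Not GCC code.  */
static unsigned varint_len (unsigned v)
{
  unsigned len = 1;
  while (v >= 0x80)
    {
      v >>= 7;
      len++;
    }
  return len;
}

/* Mirror the generated code: only params whose default is > 10 get the xor
   treatment; everything else is streamed as-is.  */
static unsigned encode_param (unsigned value, unsigned default_value)
{
  return default_value > 10 ? value ^ default_value : value;
}

static unsigned decode_param (unsigned streamed, unsigned default_value)
{
  return default_value > 10 ? streamed ^ default_value : streamed;
}

int main (void)
{
  unsigned def = 536870912;                    /* a large param default */
  unsigned enc = encode_param (def, def);      /* param left at its default */
  assert (enc == 0 && varint_len (enc) == 1);  /* 1 byte instead of 5 */
  assert (decode_param (enc, def) == def);     /* round-trips exactly */
  return 0;
}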

2020-09-16  Jakub Jelinek  

* optc-save-gen.awk: Initialize var_opt_init.  In
cl_optimization_stream_out for params with default values larger than
10, xor the default value with the actual parameter value.  In
cl_optimization_stream_in repeat the above xor.

--- gcc/optc-save-gen.awk.jj2020-09-14 10:51:54.493740942 +0200
+++ gcc/optc-save-gen.awk   2020-09-14 11:39:39.441602594 +0200
@@ -1186,6 +1186,7 @@ for (i = 0; i < n_opts; i++) {
var_opt_val_type[n_opt_val] = otype;
var_opt_val[n_opt_val] = "x_" name;
var_opt_hash[n_opt_val] = flag_set_p("Optimization", flags[i]);
+   var_opt_init[n_opt_val] = opt_args("Init", flags[i]);
n_opt_val++;
}
 }
@@ -1257,10 +1258,21 @@ for (i = 0; i < n_opt_val; i++) {
otype = var_opt_val_type[i];
if (otype ~ "^const char \\**$")
print "  bp_pack_string (ob, bp, ptr->" name", true);";
-   else if (otype ~ "^unsigned")
-   print "  bp_pack_var_len_unsigned (bp, ptr->" name");";
-   else
-   print "  bp_pack_var_len_int (bp, ptr->" name");";
+   else {
+   if (otype ~ "^unsigned") {
+   sgn = "unsigned";
+   } else {
+   sgn = "int";
+   }
+   if (name ~ "^x_param" && !(otype ~ "^enum ") && 
var_opt_init[i]) {
+   print "  if (" var_opt_init[i] " > (" 
var_opt_val_type[i] ") 10)";
+   print "bp_pack_var_len_" sgn " (bp, ptr->" name" ^ 
" var_opt_init[i] ");";
+   print "  else";
+   print "bp_pack_var_len_" sgn " (bp, ptr->" name");";
+   } else {
+   print "  bp_pack_var_len_" sgn " (bp, ptr->" name");";
+   }
+   }
 }
 print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
(ptr->explicit_mask[0]); i++)";
 print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
@@ -1281,10 +1293,18 @@ for (i = 0; i < n_opt_val; i++) {
print "  if (ptr->" name")";
print "ptr->" name" = xstrdup (ptr->" name");";
}
-   else if (otype ~ "^unsigned")
-   print "  ptr->" name" = (" var_opt_val_type[i] ") 
bp_unpack_var_len_unsigned (bp);";
-   else
-   print "  ptr->" name" = (" var_opt_val_type[i] ") 
bp_unpack_var_len_int (bp);";
+   else {
+   if (otype ~ "^unsigned") {
+   sgn = "unsigned";
+   } else {
+   sgn = "int";
+   }
+   print "  ptr->" name" = (" var_opt_val_type[i] ") 
bp_unpack_var_len_" sgn " (bp);";
+   if (name ~ "^x_param" && !(otype ~ "^enum ") && 
var_opt_init[i]) {
+   print "  if (" var_opt_init[i] " > (" 
var_opt_val_type[i] ") 10)";
+   print "ptr->" name" ^= " var_opt_init[i] ";";
+   }
+   }
 }
 print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
(ptr->explicit_mask[0]); i++)";
 print "ptr->explicit_mask[i] = bp_unpack_value (bp, 64);";


Jakub



Patch ping

2020-10-05 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping a few patches:

https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552451.html
  - allow plugins to deal with global_options layout changes

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553420.html
  - --enable-link-serialization{,=N} support

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553992.html
  - pass -gdwarf-5 to assembler for -gdwarf-5 if possible

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554845.html
  - PR97197 - support TARGET_MEM_REF in C/C++ error pretty-printing

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554804.html
  - PR97164 - reject forming arrays with elt sizes not divisible by elt 
alignment

Thanks

Jakub



Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-10-05 Thread Alexander Monakov via Gcc-patches
On Mon, 5 Oct 2020, Tom de Vries wrote:

> I've had to modify this patch in two ways:
> - the original test-case stopped failing, though not the
>   minimized one, so I added that one as a test-case
> - only testing for ENTER_ALLOC and EXIT, and not explicitly for VOTE_ANY
>   in ignore_bb_p also stopped working, so I've added that now.
> 
> Re-tested and committed.

I don't understand, was the patch already approved somewhere? It has some
issues.


> --- a/gcc/tracer.c
> +++ b/gcc/tracer.c
> @@ -108,6 +108,24 @@ ignore_bb_p (const_basic_block bb)
>   return true;
>  }
>  
> +  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
> +   !gsi_end_p (gsi); gsi_next (&gsi))
> +{
> +  gimple *g = gsi_stmt (gsi);
> +
> +  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
> +  duplicated as part of its group, or not at all.

What does "its group" stand for? It seems obviously copy-pasted from the
description of IFN_UNIQUE treatment, where it is even less clear what the
"group" is.

(I know what it means, but the comment is not explaining things well at all)

> +  The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
> +  so the same holds there, but it could be argued that the
> +  IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
> +  in which case it could be duplicated.  */

No, something like that cannot be argued, as VOTE_ANY may have data
dependencies to storage that is deallocated by SIMT_EXIT. You seem to be
claiming something that is simply not possible with the current design.

> +  if (is_gimple_call (g)
> +   && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
> +   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> +   || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))


Hm? So you are leaving SIMT_XCHG_* be until the next testcase breaks?

> + return true;
> +}
> +
>return false;
>  }

Alexander


Re: [PING 2][PATCH 2/5] C front end support to detect out-of-bounds accesses to array parameters

2020-10-05 Thread Szabolcs Nagy via Gcc-patches
The 09/23/2020 21:45, Jeff Law wrote:
> On 9/23/20 11:45 AM, Martin Sebor via Gcc-patches wrote:
> > On 9/23/20 9:44 AM, Szabolcs Nagy wrote:
> > > The 09/23/2020 09:22, Szabolcs Nagy wrote:
> > > > The 09/21/2020 12:45, Martin Sebor via Gcc-patches wrote:
> > > > > On 9/21/20 12:20 PM, Vaseeharan Vinayagamoorthy wrote:
> > > > > > After this patch, I am seeing this -Warray-parameter error:
> > > > > > 
> > > > > > In file included from ../include/pthread.h:1,
> > > > > >    from ../sysdeps/nptl/thread_db.h:25,
> > > > > >    from ../nptl/descr.h:32,
> > > > > >    from ../sysdeps/aarch64/nptl/tls.h:44,
> > > > > >    from ../include/errno.h:25,
> > > > > >    from ../sysdeps/unix/sysv/linux/sysdep.h:23,
> > > > > >    from
> > > > > > ../sysdeps/unix/sysv/linux/generic/sysdep.h:22,
> > > > > >    from
> > > > > > ../sysdeps/unix/sysv/linux/aarch64/sysdep.h:24,
> > > > > >    from :1:
> > > > > > ../sysdeps/nptl/pthread.h:734:47: error: argument 1 of
> > > > > > type ‘struct __jmp_buf_tag *’ declared as a pointer
> > > > > > [-Werror=array-parameter=]
> > > > > >     734 | extern int __sigsetjmp (struct __jmp_buf_tag
> > > > > > *__env, int __savemask) __THROWNL;
> > > > > >     | ~~^
> > > > > > In file included from ../include/setjmp.h:2,
> > > > > >    from ../nptl/descr.h:24,
> > > > > >    from ../sysdeps/aarch64/nptl/tls.h:44,
> > > > > >    from ../include/errno.h:25,
> > > > > >    from ../sysdeps/unix/sysv/linux/sysdep.h:23,
> > > > > >    from
> > > > > > ../sysdeps/unix/sysv/linux/generic/sysdep.h:22,
> > > > > >    from
> > > > > > ../sysdeps/unix/sysv/linux/aarch64/sysdep.h:24,
> > > > > >    from :1:
> > > > > > ../setjmp/setjmp.h:54:46: note: previously declared as
> > > > > > an array ‘struct __jmp_buf_tag[1]’
> > > > > >  54 | extern int __sigsetjmp (struct __jmp_buf_tag
> > > > > > __env[1], int __savemask) __THROWNL;
> > > > > >     | ~^~~~
> > > > > > cc1: all warnings being treated as errors
> > > > > 
> > > > > The warning flags differences between the forms of array parameters
> > > > > in redeclarations of the same function, including pointers vs arrays
> > > > > as in this instance.  It needs to be suppressed in glibc, either by
> > > > > making the function declarations consistent, or by #pragma diagnostic.
> > > > > (IIRC, the pointer declaration comes before struct __jmp_buf_tag has
> > > > > been defined so simply using the array form there doesn't work without
> > > > > defining the type first.)
> > > > > 
> > > > > I would expect the warning to be suppressed when using the installed
> > > > > library thanks to -Wno-system-headers.
> > > > 
> > > > why is this a warning? i'm not convinced it
> > > > should be in -Wall.
> > 
> > The main motivation for the warning is to detect unintentional
> > inconsistencies between function redeclarations that make deriving
> > the true intent difficult or impossible (e.g., T[3] vs T[1], or
> > T[] vs T[1], or equivalently T* vs T[1]).
> > 
> > One goal is to support the convention where a constant array bound
> > in a function array parameter is used in lieu of the [static N]
> > notation (i.e., the minimum number of elements the caller is
> > expected to  make available).  The [static N] notation is little
> > known, used only exceedingly rarely, and isn't available in C++.
> > The array notation is used more often, although by no means common.
> > 
> > The ultimate goal is to motivate users to take advantage of GCC's
> > ability to check ordinary functions for out-of-bounds accesses to
> > array arguments.  The checking is only feasible if all declarations
> > of the same function, including its definition, use a consistent
> > notation to specify the same bound.  Including the strict
> > -Warray-parameter=2 setting in -Wall helps support this goal
> > (-Warray-parameter=1 doesn't warn for mismatches in the forms
> > of ordinary array bounds without [static].)
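As a hedged illustration of the mismatches described above (hypothetical C
declarations of my own, not taken from glibc or the testsuite; which pair is
diagnosed at which -Warray-parameter level follows the rules sketched here):

/* Redeclarations of the same function with differing array-parameter
   forms; the bound documents how many elements the caller must supply.  */
void f (int a[1]);
void f (int a[3]);          /* T[1] vs T[3]: the stated bounds disagree.  */

void g (int a[1]);
void g (int *a);            /* T[1] vs T*: the glibc __sigsetjmp shape.  */

/* C99's [static N] makes the minimum explicit, but is rarely used and not
   available in C++ ...  */
void h (int a[static 4]);
/* ... so an ordinary bound can serve the same role, provided every
   declaration (and the definition) spells it consistently.  */
void h (int a[4]);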
> > 
> > I mentioned the results of testing the patch with a number of
> > packages, including Glibc, Binutils/GDB, Glibc, and the kernel,
> > in the overview of the patch series:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550920.html
> > It explains why I chose not to relax the warning to accommodate
> > the Glibc use case.
> > 
> > Based on my admittedly limited testing I'm not concerned about
> > the warning having adverse effects.  But if broader exposure
> > shows that it is prone to some, it can certainly be adjusted.
> > Jeff does periodic mass rebuilds of all of Fedora with the top
> > of GCC trunk so we should know soon.
> 
> Yea.  If the patch was in last week's snapshot, then it should start
> spinning the 9000 Fedora packages later tonight alongside some Ranger 

[committed] store-merging: Fix up -Wnarrowing warning

2020-10-05 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed a -Wnarrowing warning on gimple-ssa-store-merging.c; this
change fixes it up.
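A minimal reproducer (my own sketch, assuming the file is compiled as C++,
as GCC sources are):

/* ~0 has type int and value -1; in a C++ braced initializer, converting it
   to unsigned is a narrowing conversion, which -Wnarrowing reports (and
   -Werror then breaks the build).  ~0U is already unsigned, so no
   narrowing occurs.  */
unsigned warned[4] = { ~0, ~0, ~0, ~0 };
unsigned clean[4]  = { ~0U, ~0U, ~0U, ~0U };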

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious.

2020-10-05  Jakub Jelinek  

* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Use ~0U instead of ~0 in
unsigned int array initializer.

--- gcc/gimple-ssa-store-merging.c.jj   2020-09-16 09:35:51.273221896 +0200
+++ gcc/gimple-ssa-store-merging.c  2020-10-04 21:12:11.359915122 +0200
@@ -3804,7 +3804,7 @@ imm_store_chain_info::output_merged_stor
 Similarly, if there is a whole region clear first, prefer expanding
 it together compared to expanding clear first followed by merged
 further stores.  */
-  unsigned cnt[4] = { ~0, ~0, ~0, ~0 };
+  unsigned cnt[4] = { ~0U, ~0U, ~0U, ~0U };
   int pass_min = 0;
   for (int pass = 0; pass < 4; ++pass)
{


Jakub



Re: [PATCH 1/4] system_data_types.7: Add '__int128'

2020-10-05 Thread Florian Weimer via Gcc-patches
* Paul Eggert:

> On 10/2/20 12:01 PM, Alejandro Colomar wrote:
>> If you propose not to document the stdint types either,
>
> This is not a stdint.h issue. __int128 is not in stdint.h and is not a
> system data type in any real sense; it's purely a compiler
> issue. Besides, do we start repeating the GCC manual too, while we're
> at it? At some point we need to restrain ourselves and stay within the
> scope of the man pages.

The manual pages also duplicate the glibc manual, and as far as I know,
it's what programmers actually read.  (Downstream, we receive many more
man-pages bugs than glibc or GCC manual bugs.)  Most developers use
distributions which do not ship the glibc or GCC manual for licensing
policy reasons, so the GNU manuals are not installed locally.

> PS. Have you ever tried to use __int128 in real code? I have, to my
> regret. It's a portability and bug minefield and should not be used
> unless you really know what you're doing, which most people do not.

Doesn't this suggest we need improved documentation?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-10-05 Thread Richard Biener
On October 5, 2020 9:08:41 AM GMT+02:00, Jakub Jelinek  wrote:
>On Sun, Oct 04, 2020 at 09:16:00PM +0200, Jakub Jelinek via Gcc-patches
>wrote:
>> On Sun, Oct 04, 2020 at 09:13:29AM +0200, Andreas Schwab wrote:
>> > This breaks ia64:
>> > 
>> > In file included from ./tm.h:23,
>> >  from ../../gcc/gencheck.c:23:
>> > ./options.h:7816:40: error: ISO C++ forbids zero-size array
>'explicit_mask' [-Werror=pedantic]
>> >  7816 |   unsigned HOST_WIDE_INT explicit_mask[0];
>> >   |^
>> > ./options.h:7816:26: error: zero-size array member
>'cl_target_option::explicit_mask' not at end of 'struct
>cl_target_option' [-Werror=pedantic]
>> >  7816 |   unsigned HOST_WIDE_INT explicit_mask[0];
>> >   |  ^
>> > ./options.h:7812:16: note: in the definition of 'struct
>cl_target_option'
>> >  7812 | struct GTY(()) cl_target_option
>> >   |^~~~
>> 
>> Oops, sorry.
>> 
>> The following patch should fix that and should also fix streaming of
>the
>> new explicit_mask_* members.
>> I'll bootstrap/regtest on x86_64-linux and i686-linux tonight, but
>have no
>> way to test it on ia64-linux (well, tested that x86_64-linux ->
>ia64-linux
>> cross builds cc1 with it).
>
>Successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK. 

Richard. 

>> 2020-10-04  Jakub Jelinek  
>> 
>>  * opth-gen.awk: Don't emit explicit_mask array if n_target_explicit
>>  is equal to n_target_explicit_mask.
>>  * optc-save-gen.awk: Compute has_target_explicit_mask and if false,
>>  don't emit code iterating over explicit_mask array elements.  Stream
>>  also explicit_mask_* target members.
>> 
>> --- gcc/opth-gen.awk.jj  2020-10-03 21:21:59.727862692 +0200
>> +++ gcc/opth-gen.awk 2020-10-04 11:12:51.851906413 +0200
>> @@ -291,7 +291,10 @@ for (i = 0; i < n_target_char; i++) {
>>  }
>>  
>>  print "  /* " n_target_explicit - n_target_explicit_mask " members
>*/";
>> -print "  unsigned HOST_WIDE_INT explicit_mask[" int
>((n_target_explicit - n_target_explicit_mask + 63) / 64) "];";
>> +if (n_target_explicit > n_target_explicit_mask) {
>> +print "  unsigned HOST_WIDE_INT explicit_mask[" \
>> +  int ((n_target_explicit - n_target_explicit_mask + 63) / 64)
>"];";
>> +}
>>  
>>  for (i = 0; i < n_target_explicit_mask; i++) {
>>  print "  " var_target_explicit_mask[i] ";";
>> --- gcc/optc-save-gen.awk.jj 2020-10-03 21:21:59.728862678 +0200
>> +++ gcc/optc-save-gen.awk2020-10-04 21:03:31.619462434 +0200
>> @@ -689,6 +689,10 @@ for (i = 0; i < n_target_string; i++) {
>>  if (j != 0) {
>>  print "  ptr->explicit_mask[" k "] = mask;";
>>  }
>> +has_target_explicit_mask = 0;
>> +if (j != 0 || k != 0) {
>> +has_target_explicit_mask = 1;
>> +}
>>  
>>  print "}";
>>  
>> @@ -1075,9 +1079,11 @@ for (i = 0; i < n_target_val; i++) {
>>  print "return false;";
>>  }
>>  
>> -print "  for (size_t i = 0; i < sizeof (ptr1->explicit_mask) /
>sizeof (ptr1->explicit_mask[0]); i++)";
>> -print "if (ptr1->explicit_mask[i] != ptr2->explicit_mask[i])";
>> -print "  return false;"
>> +if (has_target_explicit_mask) {
>> +print "  for (size_t i = 0; i < sizeof (ptr1->explicit_mask) /
>sizeof (ptr1->explicit_mask[0]); i++)";
>> +print "if (ptr1->explicit_mask[i] != ptr2->explicit_mask[i])";
>> +print "  return false;"
>> +}
>>  
>>  for (i = 0; i < n_target_other; i++) {
>>  if (var_target_other[i] in var_target_explicit_mask) {
>> @@ -1121,8 +1127,10 @@ for (i = 0; i < n_target_val; i++) {
>>  name = var_target_val[i]
>>  print "  hstate.add_hwi (ptr->" name");";
>>  }
>> -print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof
>(ptr->explicit_mask[0]); i++)";
>> -print "hstate.add_hwi (ptr->explicit_mask[i]);";
>> +if (has_target_explicit_mask) {
>> +print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) /
>sizeof (ptr->explicit_mask[0]); i++)";
>> +print "hstate.add_hwi (ptr->explicit_mask[i]);";
>> +}
>>  
>>  for (i = 0; i < n_target_other; i++) {
>>  if (var_target_other[i] in var_target_explicit_mask)
>> @@ -1159,8 +1167,22 @@ for (i = 0; i < n_target_val; i++) {
>>  print "  bp_pack_value (bp, ptr->" name", 64);";
>>  }
>>  
>> -print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof
>(ptr->explicit_mask[0]); i++)";
>> -print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
>> +if (has_target_explicit_mask) {
>> +print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) /
>sizeof (ptr->explicit_mask[0]); i++)";
>> +print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
>> +}
>> +
>> +for (i = 0; i < n_target_other; i++) {
>> +if (var_target_other[i] in var_target_explicit_mask) {
>> +print "  bp_pack_value (bp, ptr->explicit_mask_"
>var_target_other[i] ", 64);";
>> +}
>> +}
>> +
>> +for (i = 0; i < n_target_int; i++) {
>> +if 

Re: [PATCH] options: Save and restore opts_set for Optimization and Target options

2020-10-05 Thread Jakub Jelinek via Gcc-patches
On Sun, Oct 04, 2020 at 09:16:00PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Sun, Oct 04, 2020 at 09:13:29AM +0200, Andreas Schwab wrote:
> > This breaks ia64:
> > 
> > In file included from ./tm.h:23,
> >  from ../../gcc/gencheck.c:23:
> > ./options.h:7816:40: error: ISO C++ forbids zero-size array 'explicit_mask' 
> > [-Werror=pedantic]
> >  7816 |   unsigned HOST_WIDE_INT explicit_mask[0];
> >   |^
> > ./options.h:7816:26: error: zero-size array member 
> > 'cl_target_option::explicit_mask' not at end of 'struct cl_target_option' 
> > [-Werror=pedantic]
> >  7816 |   unsigned HOST_WIDE_INT explicit_mask[0];
> >   |  ^
> > ./options.h:7812:16: note: in the definition of 'struct cl_target_option'
> >  7812 | struct GTY(()) cl_target_option
> >   |^~~~
> 
> Oops, sorry.
> 
> The following patch should fix that and should also fix streaming of the
> new explicit_mask_* members.
> I'll bootstrap/regtest on x86_64-linux and i686-linux tonight, but have no
> way to test it on ia64-linux (well, tested that x86_64-linux -> ia64-linux
> cross builds cc1 with it).

Successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?
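For reference, a minimal sketch of what the quoted ia64 error objects to (an
illustrative struct of my own, not the generated options.h):

/* A zero-length array member is a GNU extension (pedantic on its own), and
   it is only accepted as the last member.  On ia64 the generated
   explicit_mask[] ended up with size 0 while further explicit_mask_*
   members followed it, so -Wpedantic with -Werror rejected the struct.  */
struct target_option_like
{
  int some_flag;
  unsigned long explicit_mask[0];   /* zero-size array ...            */
  unsigned long explicit_mask_var;  /* ... followed by more members.  */
};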

> 2020-10-04  Jakub Jelinek  
> 
>   * opth-gen.awk: Don't emit explicit_mask array if n_target_explicit
>   is equal to n_target_explicit_mask.
>   * optc-save-gen.awk: Compute has_target_explicit_mask and if false,
>   don't emit code iterating over explicit_mask array elements.  Stream
>   also explicit_mask_* target members.
> 
> --- gcc/opth-gen.awk.jj   2020-10-03 21:21:59.727862692 +0200
> +++ gcc/opth-gen.awk  2020-10-04 11:12:51.851906413 +0200
> @@ -291,7 +291,10 @@ for (i = 0; i < n_target_char; i++) {
>  }
>  
>  print "  /* " n_target_explicit - n_target_explicit_mask " members */";
> -print "  unsigned HOST_WIDE_INT explicit_mask[" int ((n_target_explicit - 
> n_target_explicit_mask + 63) / 64) "];";
> +if (n_target_explicit > n_target_explicit_mask) {
> + print "  unsigned HOST_WIDE_INT explicit_mask[" \
> +   int ((n_target_explicit - n_target_explicit_mask + 63) / 64) "];";
> +}
>  
>  for (i = 0; i < n_target_explicit_mask; i++) {
>   print "  " var_target_explicit_mask[i] ";";
> --- gcc/optc-save-gen.awk.jj  2020-10-03 21:21:59.728862678 +0200
> +++ gcc/optc-save-gen.awk 2020-10-04 21:03:31.619462434 +0200
> @@ -689,6 +689,10 @@ for (i = 0; i < n_target_string; i++) {
>  if (j != 0) {
>   print "  ptr->explicit_mask[" k "] = mask;";
>  }
> +has_target_explicit_mask = 0;
> +if (j != 0 || k != 0) {
> + has_target_explicit_mask = 1;
> +}
>  
>  print "}";
>  
> @@ -1075,9 +1079,11 @@ for (i = 0; i < n_target_val; i++) {
>   print "return false;";
>  }
>  
> -print "  for (size_t i = 0; i < sizeof (ptr1->explicit_mask) / sizeof 
> (ptr1->explicit_mask[0]); i++)";
> -print "if (ptr1->explicit_mask[i] != ptr2->explicit_mask[i])";
> -print "  return false;"
> +if (has_target_explicit_mask) {
> + print "  for (size_t i = 0; i < sizeof (ptr1->explicit_mask) / sizeof 
> (ptr1->explicit_mask[0]); i++)";
> + print "if (ptr1->explicit_mask[i] != ptr2->explicit_mask[i])";
> + print "  return false;"
> +}
>  
>  for (i = 0; i < n_target_other; i++) {
>   if (var_target_other[i] in var_target_explicit_mask) {
> @@ -1121,8 +1127,10 @@ for (i = 0; i < n_target_val; i++) {
>   name = var_target_val[i]
>   print "  hstate.add_hwi (ptr->" name");";
>  }
> -print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
> -print "hstate.add_hwi (ptr->explicit_mask[i]);";
> +if (has_target_explicit_mask) {
> + print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
> + print "hstate.add_hwi (ptr->explicit_mask[i]);";
> +}
>  
>  for (i = 0; i < n_target_other; i++) {
>   if (var_target_other[i] in var_target_explicit_mask)
> @@ -1159,8 +1167,22 @@ for (i = 0; i < n_target_val; i++) {
>   print "  bp_pack_value (bp, ptr->" name", 64);";
>  }
>  
> -print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
> -print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
> +if (has_target_explicit_mask) {
> + print "  for (size_t i = 0; i < sizeof (ptr->explicit_mask) / sizeof 
> (ptr->explicit_mask[0]); i++)";
> + print "bp_pack_value (bp, ptr->explicit_mask[i], 64);";
> +}
> +
> +for (i = 0; i < n_target_other; i++) {
> + if (var_target_other[i] in var_target_explicit_mask) {
> + print "  bp_pack_value (bp, ptr->explicit_mask_" 
> var_target_other[i] ", 64);";
> + }
> +}
> +
> +for (i = 0; i < n_target_int; i++) {
> + if (var_target_int[i] in var_target_explicit_mask) {
> + print "  bp_pack_value (bp, ptr->explicit_mask_" 
> var_target_int[i] ", 

[PATCH][ftracer] Add caching of can_duplicate_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 9:05 AM, Tom de Vries wrote:
> Ack, updated the patch accordingly, and split it up in two bits, one
> that does refactoring, and one that adds the actual caching:
> - [ftracer] Factor out can_duplicate_bb_p
> - [ftracer] Add caching of can_duplicate_bb_p
> 
> I'll post these in reply to this email.

OK?

Thanks,
- Tom
[ftracer] Add caching of can_duplicate_bb_p

The fix "[omp, ftracer] Don't duplicate blocks in SIMT region" adds iteration
over insns in ignore_bb_p, which makes it more expensive.

Counteract this by piggybacking the computation of can_duplicate_bb_p onto
count_insns, which is called at the start of ftracer.

Bootstrapped and reg-tested on x86_64-linux.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (count_insns): Rename to ...
	(analyze_bb): ... this.
	(cache_can_duplicate_bb_p, cached_can_duplicate_bb_p): New function.
	(ignore_bb_p): Use cached_can_duplicate_bb_p.
	(tail_duplicate): Call cache_can_duplicate_bb_p.

---
 gcc/tracer.c | 47 +--
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index c0e888f6b03..0f69b335b8c 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -53,7 +53,7 @@
 #include "fibonacci_heap.h"
 #include "tracer.h"
 
-static int count_insns (basic_block);
+static void analyze_bb (basic_block, int *);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -143,6 +143,33 @@ can_duplicate_bb_p (const_basic_block bb)
   return true;
 }
 
+static sbitmap can_duplicate_bb;
+
+/* Cache VAL as value of can_duplicate_bb_p for BB.  */
+static inline void
+cache_can_duplicate_bb_p (const_basic_block bb, bool val)
+{
+  if (val)
+bitmap_set_bit (can_duplicate_bb, bb->index);
+}
+
+/* Return cached value of can_duplicate_bb_p for BB.  */
+static bool
+cached_can_duplicate_bb_p (const_basic_block bb)
+{
+  if (can_duplicate_bb)
+{
+  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
+  if ((unsigned int)bb->index < size)
+	return bitmap_bit_p (can_duplicate_bb, bb->index);
+
+  /* Assume added bb's should not be duplicated.  */
+  return false;
+}
+
+  return can_duplicate_bb_p (bb);
+}
+
 /* Return true if we should ignore the basic block for purposes of tracing.  */
 bool
 ignore_bb_p (const_basic_block bb)
@@ -152,24 +179,27 @@ ignore_bb_p (const_basic_block bb)
   if (optimize_bb_for_size_p (bb))
 return true;
 
-  return !can_duplicate_bb_p (bb);
+  return !cached_can_duplicate_bb_p (bb);
 }
 
 /* Return number of instructions in the block.  */
 
-static int
-count_insns (basic_block bb)
+static void
+analyze_bb (basic_block bb, int *count)
 {
   gimple_stmt_iterator gsi;
   gimple *stmt;
   int n = 0;
+  bool can_duplicate = can_duplicate_bb_no_insn_iter_p (bb);
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   stmt = gsi_stmt (gsi);
   n += estimate_num_insns (stmt, &eni_size_weights);
+  can_duplicate = can_duplicate && can_duplicate_insn_p (stmt);
 }
-  return n;
+  *count = n;
+  cache_can_duplicate_bb_p (bb, can_duplicate);
 }
 
 /* Return true if E1 is more frequent than E2.  */
@@ -317,6 +347,8 @@ tail_duplicate (void)
  resize it.  */
   bb_seen = sbitmap_alloc (last_basic_block_for_fn (cfun) * 2);
   bitmap_clear (bb_seen);
+  can_duplicate_bb = sbitmap_alloc (last_basic_block_for_fn (cfun));
+  bitmap_clear (can_duplicate_bb);
   initialize_original_copy_tables ();
 
   if (profile_info && profile_status_for_fn (cfun) == PROFILE_READ)
@@ -330,7 +362,8 @@ tail_duplicate (void)
 
   FOR_EACH_BB_FN (bb, cfun)
 {
-  int n = count_insns (bb);
+  int n;
+  analyze_bb (bb, &n);
   if (!ignore_bb_p (bb))
 	blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), bb);
 
@@ -420,6 +453,8 @@ tail_duplicate (void)
 
   free_original_copy_tables ();
   sbitmap_free (bb_seen);
+  sbitmap_free (can_duplicate_bb);
+  can_duplicate_bb = NULL;
   free (trace);
   free (counts);
 


[PATCH][ftracer] Factor out can_duplicate_bb_p

2020-10-05 Thread Tom de Vries
[ was: Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region ]

On 10/5/20 9:05 AM, Tom de Vries wrote:
> Ack, updated the patch accordingly, and split it up in two bits, one
> that does refactoring, and one that adds the actual caching:
> - [ftracer] Factor out can_duplicate_bb_p
> - [ftracer] Add caching of can_duplicate_bb_p
> 
> I'll post these in reply to this email.
> 

OK?

Thanks,
- Tom
[ftracer] Factor out can_duplicate_bb_p

Factor out can_duplicate_bb_p out of ignore_bb_p.

Also factor out can_duplicate_insn_p and can_duplicate_bb_no_insn_iter_p to
expose the parts of can_duplicate_bb_p that are per-bb and per-insn.

Bootstrapped and reg-tested on x86_64-linux.

gcc/ChangeLog:

2020-10-05  Tom de Vries  

	* tracer.c (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p)
	(can_duplicate_bb_p): New function, factored out of ...
	(ignore_bb_p): ... here.

---
 gcc/tracer.c | 74 
 1 file changed, 50 insertions(+), 24 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 5e51752d89f..c0e888f6b03 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -84,49 +84,75 @@ bb_seen_p (basic_block bb)
   return bitmap_bit_p (bb_seen, bb->index);
 }
 
-/* Return true if we should ignore the basic block for purposes of tracing.  */
-bool
-ignore_bb_p (const_basic_block bb)
+/* Return true if gimple stmt G can be duplicated.  */
+static bool
+can_duplicate_insn_p (gimple *g)
+{
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+ duplicated as part of its group, or not at all.
+ The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
+ so the same holds there, but it could be argued that the
+ IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
+ in which case it could be duplicated.  */
+  if (is_gimple_call (g)
+  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
+	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
+return false;
+
+  return true;
+}
+
+/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
+static bool
+can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
 {
   if (bb->index < NUM_FIXED_BLOCKS)
-return true;
-  if (optimize_bb_for_size_p (bb))
-return true;
+return false;
 
   if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
 {
   /* A transaction is a single entry multiple exit region.  It
 	 must be duplicated in its entirety or not at all.  */
   if (gimple_code (g) == GIMPLE_TRANSACTION)
-	return true;
+	return false;
 
   /* An IFN_UNIQUE call must be duplicated as part of its group,
 	 or not at all.  */
   if (is_gimple_call (g)
 	  && gimple_call_internal_p (g)
 	  && gimple_call_internal_unique_p (g))
-	return true;
+	return false;
 }
 
+  return true;
+}
+
+/* Return true if BB can be duplicated.  */
+static bool
+can_duplicate_bb_p (const_basic_block bb)
+{
+  if (!can_duplicate_bb_no_insn_iter_p (bb))
+return false;
+
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
!gsi_end_p (gsi); gsi_next (&gsi))
-{
-  gimple *g = gsi_stmt (gsi);
-
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
-	 duplicated as part of its group, or not at all.
-	 The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
-	 so the same holds there, but it could be argued that the
-	 IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
-	 in which case it could be duplicated.  */
-  if (is_gimple_call (g)
-	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
-	  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)))
-	return true;
-}
+if (!can_duplicate_insn_p (gsi_stmt (gsi)))
+  return false;
+
+  return true;
+}
+
+/* Return true if we should ignore the basic block for purposes of tracing.  */
+bool
+ignore_bb_p (const_basic_block bb)
+{
+  if (bb->index < NUM_FIXED_BLOCKS)
+return true;
+  if (optimize_bb_for_size_p (bb))
+return true;
 
-  return false;
+  return !can_duplicate_bb_p (bb);
 }
 
 /* Return number of instructions in the block.  */


Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-10-05 Thread Tom de Vries
On 9/24/20 2:44 PM, Richard Biener wrote:
> On Thu, 24 Sep 2020, Tom de Vries wrote:
> 
>> On 9/24/20 1:42 PM, Richard Biener wrote:
>>> On Wed, 23 Sep 2020, Tom de Vries wrote:
>>>
 On 9/23/20 9:28 AM, Richard Biener wrote:
> On Tue, 22 Sep 2020, Tom de Vries wrote:
>
>> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
>> with SIMT LANE [PR95654] ]
>>
>> On 9/16/20 8:20 PM, Alexander Monakov wrote:
>>>
>>>
>>> On Wed, 16 Sep 2020, Tom de Vries wrote:
>>>
 [ cc-ing author omp support for nvptx. ]
>>>
>>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
>>> recognized it too for their GPU targets). In an attempt to get agreement
>>> to fix the issue "properly" for GCC I found a similar issue that affects
>>> all targets, not just offloading, and filed it as PR 80053.
>>>
>>> (yes, there are no addressable labels involved in offloading, but 
>>> nevertheless
>>> the nature of the middle-end issue is related)
>>
>> Hi Alexander,
>>
>> thanks for looking into this.
>>
>> Seeing that the attempt to fix things properly is stalled, for now I'm
>> proposing a point-fix, similar to the original patch proposed by Tobias.
>>
>> Richi, Jakub, OK for trunk?
>
> I notice that we call ignore_bb_p many times in tracer.c but one call
> is conveniently early in tail_duplicate (void):
>
>   int n = count_insns (bb);
>   if (!ignore_bb_p (bb))
> blocks[bb->index] = heap.insert (-bb->count.to_frequency (cfun), 
> bb);
>
> where count_insns already walks all stmts in the block.  It would be
> nice to avoid repeatedly walking all stmts, maybe adjusting the above
> call is enough and/or count_insns can compute this and/or the ignore_bb_p
> result can be cached (optimize_bb_for_size_p might change though,
> but maybe all other ignore_bb_p calls effectively just are that,
> checks for blocks that became optimize_bb_for_size_p).
>

 This untested follow-up patch tries something in that direction.

 Is this what you meant?
>>>
>>> Yeah, sort of.
>>>
>>> +static bool
>>> +cached_can_duplicate_bb_p (const_basic_block bb)
>>> +{
>>> +  if (can_duplicate_bb)
>>>
>>> is there any path where can_duplicate_bb would be NULL?
>>>
>>
>> Yes, ignore_bb_p is called from gimple-ssa-split-paths.c.
> 
> Oh, that was probably done because of the very same OMP issue ...
> 
>>> +{
>>> +  unsigned int size = SBITMAP_SIZE (can_duplicate_bb);
>>> +  /* Assume added bb's should be ignored.  */
>>> +  if ((unsigned int)bb->index < size
>>> + && bitmap_bit_p (can_duplicate_bb_computed, bb->index))
>>> +   return !bitmap_bit_p (can_duplicate_bb, bb->index);
>>>
>>> yes, newly added bbs should be ignored so,
>>>
>>>  }
>>>  
>>> -  return false;
>>> +  bool val = compute_can_duplicate_bb_p (bb);
>>> +  if (can_duplicate_bb)
>>> +cache_can_duplicate_bb_p (bb, val);
>>>
>>> no need to compute & cache for them, just return true (because
>>> we did duplicate them)?
>>>
>>
>> Also the case for gimple-ssa-split-paths.c.?
> 
> If it had the bitmap then yes ... since it doesn't the early
> out should be in the conditional above only.
> 

Ack, updated the patch accordingly, and split it up in two bits, one
that does refactoring, and one that adds the actual caching:
- [ftracer] Factor out can_duplicate_bb_p
- [ftracer] Add caching of can_duplicate_bb_p

I'll post these in reply to this email.

Thanks,
- Tom


Re: [PATCH][omp, ftracer] Don't duplicate blocks in SIMT region

2020-10-05 Thread Tom de Vries
On 9/22/20 6:38 PM, Tom de Vries wrote:
> [ was: Re: [Patch] [middle-end & nvptx] gcc/tracer.c: Don't split BB
> with SIMT LANE [PR95654] ]
> 
> On 9/16/20 8:20 PM, Alexander Monakov wrote:
>>
>>
>> On Wed, 16 Sep 2020, Tom de Vries wrote:
>>
>>> [ cc-ing author omp support for nvptx. ]
>>
>> The issue looks familiar. I recognized it back in 2017 (and LLVM people
>> recognized it too for their GPU targets). In an attempt to get agreement
>> to fix the issue "properly" for GCC I found a similar issue that affects
>> all targets, not just offloading, and filed it as PR 80053.
>>
>> (yes, there are no addressable labels involved in offloading, but 
>> nevertheless
>> the nature of the middle-end issue is related)
> 
> Hi Alexander,
> 
> thanks for looking into this.
> 
> Seeing that the attempt to fix things properly is stalled, for now I'm
> proposing a point-fix, similar to the original patch proposed by Tobias.
> 
> Richi, Jakub, OK for trunk?
> 

I've had to modify this patch in two ways:
- the original test-case stopped failing, though not the
  minimized one, so I added that one as a test-case
- testing only for ENTER_ALLOC and EXIT in ignore_bb_p, without an explicit
  test for VOTE_ANY, also stopped working, so I've added that test now.

Re-tested and committed.

Thanks,
- Tom
[omp, ftracer] Don't duplicate blocks in SIMT region

When running the libgomp testsuite on x86_64-linux with nvptx accelerator on
the test-case included in this patch, we run into:
...
FAIL: libgomp.fortran/pr95654.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The test-case is a minimal version of this FAIL:
...
FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...
but that one has stopped failing at commit c2ebf4f10de "openmp: Add support
for non-rect simd and improve collapsed simd support".

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*--+
|   \
|\
| v
| *
v bb8
*<*
bb5(VOTE_ANY)
*-+
| |
| |
| |
| |
| v
| *
v bb7(XCHG_IDX)
*<*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an operation that requires the executing warp to be
convergent.  The warp diverges at bb4, reconverges at bb5, and does not
diverge again by going to bb7, so the shfl is indeed executed by a convergent
warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*--+
|   \
|\
| \
|  \
v   v
*   *
bb5(VOTE_ANY)   bb8(VOTE_ANY)
*   *
|\ /|
| \  ++ |
|  \/   |
|  /\   |
| /  +--v
|/  *
v   bb7(XCHG_IDX)
*<--*
bb6(EXIT)
...

The warp diverges again at bb5, but does not reconverge again before bb6, so
the shfl is executed by a divergent warp, which causes the FAIL.

Fix this by making ftracer ignore blocks containing ENTER_ALLOC, VOTE_ANY and
EXIT, effectively treating the SIMT region conservatively.

An argument can be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.

Bootstrapped and reg-tested on x86_64-linux.

Build on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

	PR fortran/95654
	* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC,
	GOMP_SIMT_VOTE_ANY and GOMP_SIMT_EXIT.

libgomp/ChangeLog:

2020-10-05  Tom de Vries  

	PR fortran/95654
	* testsuite/libgomp.fortran/pr95654.f90: New test.

---
 gcc/tracer.c  | 18 ++
 libgomp/testsuite/libgomp.fortran/pr95654.f90 | 11 +++
 2 files changed, 29 insertions(+)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index 82ede722534..5e51752d89f 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -108,6 +108,24 @@ ignore_bb_p (const_basic_block bb)
 	return true;
 }
 
+  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
+   !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *g = gsi_stmt (gsi);
+
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+	 duplicated as part of its group, or not at all.
+	 The IFN_GOMP_SIMT_VOTE_ANY is currently part of such a group,
+	 so the same holds there, but it could be argued that the
+	 IFN_GOMP_SIMT_VOTE_ANY could be generated after that group,
+	 in which case it could be duplicated.  */
+  if (is_gimple_call (g)
+	  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+