RE: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-10-09 Thread duanbo (C)


> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Wednesday, September 30, 2020 6:38 PM
> To: duanbo (C) 
> Cc: GCC Patches 
> Subject: Re: [PATCH PR96757] aarch64: ICE during GIMPLE pass: vect
> 
> Thanks for the update, looks good apart from…
> 
> "duanbo (C)"  writes:
> > @@ -4361,7 +4391,7 @@ vect_recog_mask_conversion_pattern (vec_info
> *vinfo,
> >if (known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
> > TYPE_VECTOR_SUBPARTS (vectype2))
> >   && (TREE_CODE (rhs1) == SSA_NAME
> > - || rhs1_type == TREE_TYPE (TREE_OPERAND (rhs1, 0
> > + || !rhs1_op0_type || !rhs1_op1_type))
> > return NULL;
> 
> …I think this should be:
> 
> && (TREE_CODE (rhs1) == SSA_NAME
> || (!rhs1_op0_type && !rhs1_op1_type))
> 
> i.e. punt only if both types are already OK.  If one operand wants a specific
> mask type, we should continue to the code below and attach the chosen
> type to the comparison.
> 
> Although I guess this simplifies to:
> 
>   if (known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
> TYPE_VECTOR_SUBPARTS (vectype2))
>   && !rhs1_op0_type
>   && !rhs1_op1_type)
> return NULL;
> 
> (I think the comment above the code is still accurate with this change.)
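[Editor's note: Richard's simplification relies on the invariant that when rhs1 is an SSA_NAME there is no embedded comparison, so both operand types are null; under that invariant the disjunct collapses. The equivalence can be checked exhaustively (a sketch; plain booleans stand in for the tree tests):

```python
from itertools import product

def condition_equivalent():
    # Invariant from the pattern: an SSA_NAME rhs1 has no embedded
    # comparison, so rhs1_op0_type and rhs1_op1_type are both null.
    for is_ssa, op0_t, op1_t in product([False, True], repeat=3):
        if is_ssa and (op0_t or op1_t):
            continue  # excluded by the invariant
        original = is_ssa or (not op0_t and not op1_t)
        simplified = (not op0_t) and (not op1_t)
        if original != simplified:
            return False
    return True

assert condition_equivalent()
```
]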
> 
> > @@ -4393,7 +4423,16 @@ vect_recog_mask_conversion_pattern
> (vec_info *vinfo,
> >if (TREE_CODE (rhs1) != SSA_NAME)
> > {
> >   tmp = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
> > - pattern_stmt = gimple_build_assign (tmp, rhs1);
> > + if (rhs1_op0_type && TYPE_PRECISION (rhs1_op0_type)
> > +   != TYPE_PRECISION (rhs1_type))
> > +   rhs1_op0 = build_mask_conversion (vinfo, rhs1_op0,
> > + vectype2, stmt_vinfo);
> > + if (rhs1_op1_type && TYPE_PRECISION (rhs1_op1_type)
> > +   != TYPE_PRECISION (rhs1_type))
> 
> Very minor -- I would have fixed this up before committing if it wasn't for 
> the
> above -- but: GCC formatting is instead:
> 
> if (rhs1_op1_type
> && TYPE_PRECISION (rhs1_op1_type) != TYPE_PRECISION
> (rhs1_type))
> 
> LGTM with those changes, thanks.
> 
> Richard
> 
> > +   rhs1_op1 = build_mask_conversion (vinfo, rhs1_op1,
> > + vectype2, stmt_vinfo);
> > + pattern_stmt = gimple_build_assign (tmp, TREE_CODE (rhs1),
> > + rhs1_op0, rhs1_op1);
> >   rhs1 = tmp;
> >   append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt,
> vectype2,
> >   rhs1_type);

Sorry for the late reply.
I have modified the patch according to your suggestion, and it works well.
Ok for trunk?

Thanks,
Duan bo



pr96757-v3.patch
Description: pr96757-v3.patch


[PATCH] random memory leak fixes

2020-10-09 Thread Richard Biener
This fixes leaks discovered while checking whether I introduced new ones
with the last vectorizer changes.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.  Parts
are also applicable for branches.

2020-10-09  Richard Biener  

* cgraphunit.c (expand_all_functions): Free tp_first_run_order.
* ipa-modref.c (pass_ipa_modref::execute): Free order.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Free
loop body.
* tree-vect-data-refs.c (vect_find_stmt_data_reference): Free
data references upon failure.
* tree-vect-loop.c (update_epilogue_loop_vinfo): Free BBs
array of the original loop.
* tree-vect-slp.c (vect_slp_bbs): Use an auto_vec for
dataref_groups to release its memory.
---
 gcc/cgraphunit.c  |  1 +
 gcc/ipa-modref.c  |  1 +
 gcc/tree-ssa-loop-niter.c |  1 +
 gcc/tree-vect-data-refs.c | 37 +
 gcc/tree-vect-loop.c  |  1 +
 gcc/tree-vect-slp.c   |  2 +-
 6 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index bedb6e2eea1..19ae8763373 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2494,6 +2494,7 @@ expand_all_functions (void)
   delete ipa_saved_clone_sources;
   ipa_saved_clone_sources = NULL;
   free (order);
+  free (tp_first_run_order);
 }
 
 /* This is used to sort the node types by the cgraph order number.  */
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 5868aa97484..c22c0d233f7 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1748,6 +1748,7 @@ pass_ipa_modref::execute (function *)
 }
   ((modref_summaries *)summaries)->ipa = false;
   ipa_free_postorder_info ();
+  free (order);
   return 0;
 }
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 45747e150f4..697d30fb989 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -4305,6 +4305,7 @@ estimate_numbers_of_iterations (class loop *loop)
 
   if (flag_aggressive_loop_optimizations)
 infer_loop_bounds_from_undefined (loop, body);
+  free (body);
 
   discover_iteration_bound_by_body_walk (loop);
 
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 5bf93e2942b..676182c0888 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4045,29 +4045,42 @@ vect_find_stmt_data_reference (loop_p loop, gimple 
*stmt,
 return opt_result::success ();
 
   if (refs.length () > 1)
-return opt_result::failure_at (stmt,
-  "not vectorized:"
-  " more than one data ref in stmt: %G", stmt);
+{
+  while (!refs.is_empty ())
+   free_data_ref (refs.pop ());
+  return opt_result::failure_at (stmt,
+"not vectorized: more than one "
+"data ref in stmt: %G", stmt);
+}
 
+  data_reference_p dr = refs.pop ();
   if (gcall *call = dyn_cast  (stmt))
 if (!gimple_call_internal_p (call)
|| (gimple_call_internal_fn (call) != IFN_MASK_LOAD
&& gimple_call_internal_fn (call) != IFN_MASK_STORE))
-  return opt_result::failure_at (stmt,
-"not vectorized: dr in a call %G", stmt);
+  {
+   free_data_ref (dr);
+   return opt_result::failure_at (stmt,
+  "not vectorized: dr in a call %G", stmt);
+  }
 
-  data_reference_p dr = refs.pop ();
   if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
   && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
-return opt_result::failure_at (stmt,
-  "not vectorized:"
-  " statement is bitfield access %G", stmt);
+{
+  free_data_ref (dr);
+  return opt_result::failure_at (stmt,
+"not vectorized:"
+" statement is bitfield access %G", stmt);
+}
 
   if (DR_BASE_ADDRESS (dr)
   && TREE_CODE (DR_BASE_ADDRESS (dr)) == INTEGER_CST)
-return opt_result::failure_at (stmt,
-  "not vectorized:"
-  " base addr of dr is a constant\n");
+{
+  free_data_ref (dr);
+  return opt_result::failure_at (stmt,
+"not vectorized:"
+" base addr of dr is a constant\n");
+}
 
   /* Check whether this may be a SIMD lane access and adjust the
  DR to make it easier for us to handle it.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ce5d95d7277..0a315e6 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -8817,6 +8817,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree 
advance)
   basic_block *epilogue_bbs = get_loop_body (epilogue);
   unsigned i;
 
+  free (LOOP_VINFO_BBS (epilogue_vinfo));
   LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_b

[r10-8871 Regression] FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 578437695752307201" 2 on Linux/x86_64

2020-10-09 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

e4c9aac98611f63847ef6c57916808d9a2d7abcb is the first bad commit
commit e4c9aac98611f63847ef6c57916808d9a2d7abcb
Author: Martin Sebor 
Date:   Thu Oct 8 12:35:01 2020 -0600

Correct handling of constant representations containing embedded nuls 
(backport from trunk).

caused

FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
1976943448883713" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
576467370915332609" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
578431098682540545" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
578437695685198337" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
578437695752110593" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
578437695752306689" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
578437695752307200" 1
FAIL: gcc.target/i386/memcpy-pr95886.c scan-rtl-dump-times expand "const_int 
578437695752307201" 2
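[Editor's note: the magic constants in these scans are just the copied byte sequences packed into little-endian 64-bit immediates; e.g. the last scanned value decodes as the ascending bytes 0x01..0x08 (a sketch; which byte patterns the test actually stores is in the testcase itself):

```python
# A const_int scanned by the test is a memcpy'd byte string packed
# into a little-endian 64-bit immediate.
value = int.from_bytes(bytes([1, 2, 3, 4, 5, 6, 7, 8]), "little")
assert value == 578437695752307201  # matches the scanned const_int
```
]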

with GCC configured with

Configured with: ../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-10/releases/gcc-10/r10-8871/usr
 --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/memcpy-pr95886.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/memcpy-pr95886.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] Move pr97315-1.c test to g++.dg/opt/.

2020-10-09 Thread Aldy Hernandez via Gcc-patches
OK for trunk?

gcc/testsuite/ChangeLog:

PR testsuite/97337
* gcc.dg/pr97315-1.c: Moved to...
* g++.dg/opt/pr97315-1.C: ...here.
---
 gcc/testsuite/{gcc.dg/pr97315-1.c => g++.dg/opt/pr97315-1.C} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.dg/pr97315-1.c => g++.dg/opt/pr97315-1.C} (95%)

diff --git a/gcc/testsuite/gcc.dg/pr97315-1.c 
b/gcc/testsuite/g++.dg/opt/pr97315-1.C
similarity index 95%
rename from gcc/testsuite/gcc.dg/pr97315-1.c
rename to gcc/testsuite/g++.dg/opt/pr97315-1.C
index 250e0e9ecbb..5a618d8e1e8 100644
--- a/gcc/testsuite/gcc.dg/pr97315-1.c
+++ b/gcc/testsuite/g++.dg/opt/pr97315-1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fno-exceptions" } */
 
 typedef struct tree_node *tree;
 enum tree_code { RECORD_TYPE, QUAL_UNION_TYPE };
-- 
2.26.2



Re: [PATCH] Move pr97315-1.c test to g++.dg/opt/.

2020-10-09 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 09, 2020 at 10:11:29AM +0200, Aldy Hernandez wrote:
> OK for trunk?

Sure.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR testsuite/97337
>   * gcc.dg/pr97315-1.c: Moved to...
>   * g++.dg/opt/pr97315-1.C: ...here.
> ---
>  gcc/testsuite/{gcc.dg/pr97315-1.c => g++.dg/opt/pr97315-1.C} | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>  rename gcc/testsuite/{gcc.dg/pr97315-1.c => g++.dg/opt/pr97315-1.C} (95%)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr97315-1.c 
> b/gcc/testsuite/g++.dg/opt/pr97315-1.C
> similarity index 95%
> rename from gcc/testsuite/gcc.dg/pr97315-1.c
> rename to gcc/testsuite/g++.dg/opt/pr97315-1.C
> index 250e0e9ecbb..5a618d8e1e8 100644
> --- a/gcc/testsuite/gcc.dg/pr97315-1.c
> +++ b/gcc/testsuite/g++.dg/opt/pr97315-1.C
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3" } */
> +/* { dg-options "-O3 -fno-exceptions" } */
>  
>  typedef struct tree_node *tree;
>  enum tree_code { RECORD_TYPE, QUAL_UNION_TYPE };
> -- 
> 2.26.2

Jakub



Re: [PATCH] adjust BB vectorization dump scanning

2020-10-09 Thread Richard Biener
On Thu, 8 Oct 2020, Thomas Schwinge wrote:

> Hi Richard!
> 
> On 2020-10-08T13:34:02+0200, Richard Biener  wrote:
> > It might be interesting to work on adding sth like
> > dg-warning to look for -fopt-info-{optimized,missing} so
> > we could directly annotate (not) vectorized loops instead of
> > relying on fragile counts.
> 
> I'm maybe (likely?) misunderstanding what you're looking for, but just in
> case I'm not, the following works already:
> 
> --- gcc/testsuite/gcc.dg/vect/bb-slp-1.c
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fopt-info-optimized-vec" } */
> 
>  #include 
>  #include "tree-vect.h"
> @@ -17,7 +18,7 @@ main1 (int dummy)
> 
>for (i = 0; i < N; i++)
>  {
> -  *pout++ = *pin++;
> +  *pout++ = *pin++; /* { dg-message "optimized: basic block part 
> vectorized" } */

Oh, nice.  OK, well - it doesn't help with extra spurious vectorizations
of course.  I'd also have to check for duplicate messages on the same line
caused by unrolling.

But yeah, I'll see whether this makes it easier to follow what we actually
expect to be vectorized...

Thanks,
Richard.

>*pout++ = *pin++;
>*pout++ = *pin++;
>*pout++ = *pin++;
> @@ -55,4 +56,3 @@ int main (void)
>  }
> 
>  /* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */
> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } 
> } */
> 
> 
> Grüße
>  Thomas
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, 
> Alexander Walter
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Re: [PING][PATCH v2] combine: Don't turn (mult (extend x) 2^n) into extract [PR96998]

2020-10-09 Thread Alex Coplan via Gcc-patches
Hi Segher,

On 08/10/2020 15:20, Segher Boessenkool wrote:
> On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> > Ping. The kernel is still broken on AArch64.
> 
> You *cannot* fix a correctness bug with a combine addition.

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
explains why we do precisely that.

Also, as your own testing confirmed, the patch does indeed fix the issue.

> So please fix the target bug first.

I think the problem here -- the reason that we're talking past each
other -- is that there are (at least) two parts of the codebase that can
be blamed for the ICE here:

1. aarch64: "The insn is unrecognized, so it's a backend bug
(missing pattern or similar)."

2. combine: "Combine produces a non-canonical insn, so the backend
(correctly) doesn't recognise it, and combine is at fault."

Now I initially (naively) took interpretation 1 here and tried to fix
the ICE by adding a pattern to recognise the sign_extract insn that
combine is producing here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553605.html

However, in the review of that patch, Richard opened my eyes to
interpretation 2, which in hindsight is clearly a better way to fix the
issue.

Combine already does the canonicalisation for the (ashift x n) case, so
it seems like an obvious improvement to do the same for the (mult x 2^n)
case, as this is how shifts are represented inside mem rtxes.

Again, please see Richard's comments here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
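[Editor's note: the underlying identity is that a left shift by n is a multiplication by 2^n; inside a mem address RTL canonically uses the mult form, which is why combine should produce (mult (extend x) 2^n) there rather than an extract. A loose sketch of the equivalence:

```python
def shift_is_mult():
    # (ashift x n) and (mult x 2**n) compute the same value.
    for x in range(-8, 8):
        for n in range(5):
            if x << n != x * (1 << n):
                return False
    return True

assert shift_is_mult()
```
]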

> 
> I haven't had time to look at your patch yet, sorry.

Not to worry. Hopefully this clears up any confusion around what we're
trying to do here and why.

Thanks,
Alex


Re: [PATCH] c++: Distinguish btw. alignof and __alignof__ in cp_tree_equal [PR97273]

2020-10-09 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 06, 2020 at 03:40:52PM -0400, Jason Merrill via Gcc-patches wrote:
> On 10/4/20 11:28 PM, Patrick Palka wrote:
> > cp_tree_equal currently considers alignof the same as __alignof__, but
> > these operators are semantically different ever since r8-7957.  In the
> > testcase below, this causes the second static_assert to fail on targets
> > where alignof(double) != __alignof__(double) because the specialization
> > cache (which uses cp_tree_equal as the equality predicate) conflates the
> > two dependent specializations integral_constant<__alignof__(T)> and
> > integral_constant.
> > 
> > This patch makes cp_tree_equal distinguish between these two operators
> > by inspecting the ALIGNOF_EXPR_STD_P flag.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, and also verified
> > that we now correctly compile the  PR97273 testcase, does this look OK
> > for trunk and the release branches?
> 
> OK.

Shouldn't we then mangle alignof and __alignof__ differently though?

Jakub



[PUSHED] Fix for PR97317.

2020-10-09 Thread Aldy Hernandez via Gcc-patches
As discussed in the PR, this fixes the range-ops cast code to handle
casts where the precision of the RHS is only 1 greater than the precision
of the LHS.
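[Editor's note: concretely, the failure mode is constructing an inverted range: when the cast drops exactly one bit, the lower bound `lim` can already be the signed minimum of the outer type, so `[min, lim - 1]` would degenerate to `[lim, lim - 1]`. A rough sketch of the guard the patch adds (not the exact range-ops computation; `MIN17` is illustrative for a 17-bit type as in the testcase):

```python
MIN17 = -(1 << 16)  # signed minimum of a 17-bit type

def neg_fill_range(lim, min_val=MIN17):
    # Mirrors the guard added in operator_cast::op1_range: skip
    # building the range when it would be the invalid [lim, lim - 1].
    if lim == min_val:
        return None
    return (min_val, lim - 1)

assert neg_fill_range(MIN17) is None
assert neg_fill_range(-32768) == (MIN17, -32769)
```
]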

This is actually Andrew's patch.

Tested on x86-64 Linux.

Pushed.

gcc/ChangeLog:

PR tree-optimization/97317
* range-op.cc (operator_cast::op1_range): Do not create the
invalid range [lim, lim - 1] when the LHS has only 1 bit less
precision than the RHS.

gcc/testsuite/ChangeLog:

* gcc.dg/pr97317.c: New test.
---
 gcc/range-op.cc| 25 ++---
 gcc/testsuite/gcc.dg/pr97317.c | 11 +++
 2 files changed, 29 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr97317.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 22bc23c1bbf..d1a11b34894 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1849,14 +1849,25 @@ operator_cast::op1_range (irange &r, tree type,
  type,
  converted_lhs,
  lim_range);
- // And union this with the entire outer types negative range.
- int_range_max neg (type,
-wi::min_value (TYPE_PRECISION (type),
-   SIGNED),
-lim - 1);
- neg.union_ (lhs_neg);
+ // lhs_neg now has all the negative versions of the LHS.
+ // Now union in all the values from SIGNED MIN (0x8) to
+ // lim-1 in order to fill in all the ranges with the upper
+ // bits set.
+
+ // PR 97317.  If the lhs has only 1 bit less precision than the rhs,
+ // we don't need to create a range from min to lim-1;
+ // calculating the neg range would trap trying to create [lim, lim - 1].
+ wide_int min_val = wi::min_value (TYPE_PRECISION (type), SIGNED);
+ if (lim != min_val)
+   {
+ int_range_max neg (type,
+wi::min_value (TYPE_PRECISION (type),
+   SIGNED),
+lim - 1);
+ lhs_neg.union_ (neg);
+   }
  // And finally, munge the signed and unsigned portions.
- r.union_ (neg);
+ r.union_ (lhs_neg);
}
   // And intersect with any known value passed in the extra operand.
   r.intersect (op2);
diff --git a/gcc/testsuite/gcc.dg/pr97317.c b/gcc/testsuite/gcc.dg/pr97317.c
new file mode 100644
index 000..f07327ac9a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97317.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct a {
+  unsigned c : 17;
+};
+struct a b;
+int d(void) {
+  short e = b.c;
+  return e ? 0 : b.c;
+}
-- 
2.26.2



Re: [PATCH] IBM Z: Change vector copysign to use bitwise operations

2020-10-09 Thread Andreas Krebbel via Gcc-patches
On 08.10.20 11:38, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  OK for master?
> 
> The vector copysign pattern incorrectly assumes that vector
> if_then_else operates on bits, not on elements.  This can theoretically
> mislead the optimizers.  Fix by changing it to use bitwise operations,
> like commit 2930bb321794 ("PR94613: Fix vec_sel builtin for IBM Z") did
> for vec_sel builtin.
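[Editor's note: the bitwise formulation referred to here is the standard one: take the magnitude bits from one operand and the sign bit from the other. Per element, for a single-precision float, it can be sketched as:

```python
import struct

def copysign_bitwise(mag, sgn):
    # result = (mag & ~SIGNBIT) | (sgn & SIGNBIT) on the IEEE-754
    # single-precision bit patterns.
    m = struct.unpack("<I", struct.pack("<f", mag))[0]
    s = struct.unpack("<I", struct.pack("<f", sgn))[0]
    r = (m & 0x7FFFFFFF) | (s & 0x80000000)
    return struct.unpack("<f", struct.pack("<I", r))[0]

assert copysign_bitwise(2.0, -1.0) == -2.0
```
]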
> 
> gcc/ChangeLog:
> 
> 2020-10-07  Ilya Leoshkevich  
> 
>   * config/s390/s390-protos.h (s390_build_signbit_mask): New
>   function.
>   * config/s390/s390.c (s390_tointvec): New function.
>   (s390_contiguous_bitmask_vector_p): Bitcast the argument to
>   an integral mode.
>   (s390_expand_vec_init): Do not call
>   s390_contiguous_bitmask_vector_p with a scalar argument.
>   (s390_build_signbit_mask): New function.
>   * config/s390/vector.md (copysign3): Use bitwise
>   operations.

Couldn't s390_tointvec be implemented/replaced with related_int_vector_mode?

Ok, Thanks!

Andreas

> ---
>  gcc/config/s390/s390-protos.h |  1 +
>  gcc/config/s390/s390.c| 92 ---
>  gcc/config/s390/vector.md | 31 
>  3 files changed, 95 insertions(+), 29 deletions(-)
> 
> diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
> index 6f1bc07db17..029f7289fac 100644
> --- a/gcc/config/s390/s390-protos.h
> +++ b/gcc/config/s390/s390-protos.h
> @@ -121,6 +121,7 @@ extern void s390_expand_vec_compare_cc (rtx, enum 
> rtx_code, rtx, rtx, bool);
>  extern enum rtx_code s390_reverse_condition (machine_mode, enum rtx_code);
>  extern void s390_expand_vcond (rtx, rtx, rtx, enum rtx_code, rtx, rtx);
>  extern void s390_expand_vec_init (rtx, rtx);
> +extern rtx s390_build_signbit_mask (machine_mode);
>  extern rtx s390_return_addr_rtx (int, rtx);
>  extern rtx s390_back_chain_rtx (void);
>  extern rtx_insn *s390_emit_call (rtx, rtx, rtx, rtx);
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 93894307d62..554c1adf40a 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -2450,6 +2450,54 @@ s390_contiguous_bitmask_p (unsigned HOST_WIDE_INT in, 
> bool wrap_p,
>return b;
>  }
>  
> +/* Return the associated integral mode of VEC_MODE.  Must be in sync with
> +   tointvec mode_attr.  */
> +static machine_mode
> +s390_tointvec (machine_mode vec_mode)
> +{
> +  switch (vec_mode)
> +{
> +case V1QImode:
> +  return V1QImode;
> +case V2QImode:
> +  return V2QImode;
> +case V4QImode:
> +  return V4QImode;
> +case V8QImode:
> +  return V8QImode;
> +case V16QImode:
> +  return V16QImode;
> +case V1HImode:
> +  return V1HImode;
> +case V2HImode:
> +  return V2HImode;
> +case V4HImode:
> +  return V4HImode;
> +case V8HImode:
> +  return V8HImode;
> +case V1SImode:
> +case V1SFmode:
> +  return V1SImode;
> +case V2SImode:
> +case V2SFmode:
> +  return V2SImode;
> +case V4SImode:
> +case V4SFmode:
> +  return V4SImode;
> +case V1DImode:
> +case V1DFmode:
> +  return V1DImode;
> +case V2DImode:
> +case V2DFmode:
> +  return V2DImode;
> +case V1TImode:
> +case V1TFmode:
> +  return V1TImode;
> +default:
> +  gcc_unreachable ();
> +}
> +}
> +
>  /* Return true if OP contains the same contiguous bitfield in *all*
> its elements.  START and END can be used to obtain the start and
> end position of the bitfield.
> @@ -2467,6 +2515,9 @@ s390_contiguous_bitmask_vector_p (rtx op, int *start, 
> int *end)
>rtx elt;
>bool b;
>  
> +  /* Handle floats by bitcasting them to ints.  */
> +  op = gen_lowpart (s390_tointvec (GET_MODE (op)), op);
> +
>gcc_assert (!!start == !!end);
>if (!const_vec_duplicate_p (op, &elt)
>|| !CONST_INT_P (elt))
> @@ -6863,15 +6914,16 @@ s390_expand_vec_init (rtx target, rtx vals)
>  }
>  
>/* Use vector gen mask or vector gen byte mask if possible.  */
> -  if (all_same && all_const_int
> -  && (XVECEXP (vals, 0, 0) == const0_rtx
> -   || s390_contiguous_bitmask_vector_p (XVECEXP (vals, 0, 0),
> -NULL, NULL)
> -   || s390_bytemask_vector_p (XVECEXP (vals, 0, 0), NULL)))
> +  if (all_same && all_const_int)
>  {
> -  emit_insn (gen_rtx_SET (target,
> -   gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0;
> -  return;
> +  rtx vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
> +  if (XVECEXP (vals, 0, 0) == const0_rtx
> +   || s390_contiguous_bitmask_vector_p (vec, NULL, NULL)
> +   || s390_bytemask_vector_p (vec, NULL))
> + {
> +   emit_insn (gen_rtx_SET (target, vec));
> +   return;
> + }
>  }
>  
>/* Use vector replicate instructions.  vlrep/vrepi/vrep  */
> @@ -6949,6 +7001,30 @@ s390_expand_vec_init (rtx target, rtx vals)
>  }
>  }
>

Re: [committed][nvptx] Split up function ref plus const

2020-10-09 Thread Thomas Schwinge
Hi Tom!

On 2020-09-23T22:46:34+0200, Tom de Vries  wrote:
> With test-case gcc.c-torture/compile/pr92231.c, we run into:

"Interesting" testcase...  ;-)

> ...
> nvptx-as: ptxas terminated with signal 11 [Segmentation fault], core dumped^M

Confirmed with:

$ ptxas --version
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:15_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

..., and:

$ ptxas --version
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Dec__1_00:57:38_CST_2017
Cuda compilation tools, release 9.1, V9.1.108

Have you reported this to Nvidia?

> compiler exited with status 1
> FAIL: gcc.c-torture/compile/pr92231.c   -O0  (test for excess errors)
> ...
> due to using a function reference plus constant as operand:
> ...
>   mov.u64 %r24,bar+4096';
> ...
>
> Fix this by splitting such an insn into:
> ...
>   mov.u64 %r24,bar';
>   add.u64 %r24,%r24,4096';
> ...

(Spurious single-quote characters in PTX code?)


Grüße
 Thomas


> Tested on nvptx.
>
> Committed to trunk.
>
> Thanks,
> - Tom
>
> [nvptx] Split up function ref plus const
>
> gcc/ChangeLog:
>
>   * config/nvptx/nvptx.md: Don't allow operand containing sum of
>   function ref and const.
>
> ---
>  gcc/config/nvptx/nvptx.md | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 6178e6a0f77..035f6e0151b 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -146,6 +146,13 @@
>return true;
>  })
>
> +;; Test for a function symbol ref operand
> +(define_predicate "symbol_ref_function_operand"
> +  (match_code "symbol_ref")
> +{
> +  return SYMBOL_REF_FUNCTION_P (op);
> +})
> +
>  (define_attr "predicable" "false,true"
>(const_string "true"))
>
> @@ -241,6 +248,17 @@
>  }
>[(set_attr "subregs_ok" "true")])
>
> +;; ptxas segfaults on 'mov.u64 %r24,bar+4096', so break it up.
> +(define_split
> +  [(set (match_operand:DI 0 "nvptx_register_operand")
> + (const:DI (plus:DI (match_operand:DI 1 "symbol_ref_function_operand")
> +(match_operand 2 "const_int_operand"]
> +  ""
> +  [(set (match_dup 0) (match_dup 1))
> +   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 2)))
> +  ]
> +  "")
> +
>  (define_insn "*mov_insn"
>[(set (match_operand:SDFM 0 "nonimmediate_operand" "=R,R,m")
>   (match_operand:SDFM 1 "general_operand" "RF,m,R"))]
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH] tree-optimization/97347 - fix another SLP constant insertion issue

2020-10-09 Thread Richard Biener
Just use edge insertion which will appropriately handle the situation
from botan.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

2020-10-09  Richard Biener  

PR tree-optimization/97347
* tree-vect-slp.c (vect_create_constant_vectors): Use
edge insertion when inserting on the fallthru edge,
appropriately insert at the start of BBs when inserting
after PHIs.

* g++.dg/vect/pr97347.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr97347.cc | 41 
 gcc/tree-vect-slp.c  | 19 +
 2 files changed, 54 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr97347.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr97347.cc 
b/gcc/testsuite/g++.dg/vect/pr97347.cc
new file mode 100644
index 000..6a9116c412a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr97347.cc
@@ -0,0 +1,41 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+
+inline namespace __cxx11 {}
+typedef int size_t;
+class MessageAuthenticationCode;
+class __uniq_ptr_impl {
+  struct _Ptr {
+using type = MessageAuthenticationCode *;
+  };
+public:
+  using pointer = _Ptr::type;
+};
+class unique_ptr {
+public:
+  using pointer = __uniq_ptr_impl::pointer;
+  unique_ptr(pointer);
+};
+namespace __cxx11 {
+class basic_string {
+public:
+  basic_string(char *);
+  ~basic_string();
+};
+} // namespace __cxx11
+class MessageAuthenticationCode {};
+class SCAN_Name {
+public:
+  SCAN_Name(basic_string);
+  size_t arg_as_integer();
+};
+class SipHash : public MessageAuthenticationCode {
+public:
+  SipHash(size_t c, size_t d) : m_C(c), m_D(d) {}
+  size_t m_C, m_D;
+};
+void create(basic_string algo_spec, char *s) {
+  basic_string provider = s;
+  SCAN_Name req(algo_spec);
+  unique_ptr(new SipHash(req.arg_as_integer(), req.arg_as_integer()));
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 77ea4d0eb51..479c3eeaec7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4145,9 +4145,17 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree 
op_node)
{
  gimple_stmt_iterator gsi;
  if (gimple_code (insert_after->stmt) == GIMPLE_PHI)
-   gsi = gsi_after_labels (gimple_bb (insert_after->stmt));
+   {
+ gsi = gsi_after_labels (gimple_bb 
(insert_after->stmt));
+ gsi_insert_seq_before (&gsi, ctor_seq,
+GSI_CONTINUE_LINKING);
+   }
  else if (!stmt_ends_bb_p (insert_after->stmt))
-   gsi = gsi_for_stmt (insert_after->stmt);
+   {
+ gsi = gsi_for_stmt (insert_after->stmt);
+ gsi_insert_seq_after (&gsi, ctor_seq,
+   GSI_CONTINUE_LINKING);
+   }
  else
{
  /* When we want to insert after a def where the
@@ -4155,11 +4163,10 @@ vect_create_constant_vectors (vec_info *vinfo, slp_tree 
op_node)
 edge.  */
  edge e = find_fallthru_edge
 (gimple_bb (insert_after->stmt)->succs);
- gcc_assert (single_pred_p (e->dest));
- gsi = gsi_after_labels (e->dest);
+ basic_block new_bb
+   = gsi_insert_seq_on_edge_immediate (e, ctor_seq);
+ gcc_assert (!new_bb);
}
- gsi_insert_seq_after (&gsi, ctor_seq,
-   GSI_CONTINUE_LINKING);
}
  else
vinfo->insert_seq_on_entry (NULL, ctor_seq);
-- 
2.26.2


RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-10-09 Thread xiezhiheng
> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, August 27, 2020 4:08 PM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng  writes:
> > I made two separate patches for these two groups for review purposes.
> >
> > Note: Patch for min/max intrinsics should be applied before the patch for
> rounding intrinsics
> >
> > Bootstrapped and tested on aarch64 Linux platform.
> 
> Thanks, LGTM.  Pushed to master.
> 
> Richard

I made the patch for the multiply and multiply-accumulate intrinsics.

Note that the bfmmlaq intrinsic is special because this instruction ignores the 
FPCR and does not update the FPSR exception status.
  
https://developer.arm.com/docs/ddi0596/h/simd-and-floating-point-instructions-alphabetic-order/bfmmla-bfloat16-floating-point-matrix-multiply-accumulate-into-2x2-matrix
So I set it to the AUTO_FP flag.

Bootstrapped and tested on aarch64 Linux platform.

Thanks,
Xie Zhiheng


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 75b62b590e2..8ca9746189a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2020-10-09  Zhiheng Xie  
+   Nannan Zheng  
+
+   * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
+   for mul/mla/mls intrinsics.
+


pr94442-v1.patch
Description: pr94442-v1.patch


Re: [committed][nvptx] Split up function ref plus const

2020-10-09 Thread Tom de Vries
On 10/9/20 11:03 AM, Thomas Schwinge wrote:
> Hi Tom!
> 
> On 2020-09-23T22:46:34+0200, Tom de Vries  wrote:
>> With test-case gcc.c-torture/compile/pr92231.c, we run into:
> 
> "Interesting" testcase...  ;-)
> 
>> ...
>> nvptx-as: ptxas terminated with signal 11 [Segmentation fault], core dumped^M
> 
> Confirmed with:
> 
> $ ptxas --version
> ptxas: NVIDIA (R) Ptx optimizing assembler
> Copyright (c) 2005-2014 NVIDIA Corporation
> Built on Thu_Jul_17_21:41:15_CDT_2014
> Cuda compilation tools, release 6.5, V6.5.12
> 
> ..., and:
> 
> $ ptxas --version
> ptxas: NVIDIA (R) Ptx optimizing assembler
> Copyright (c) 2005-2017 NVIDIA Corporation
> Built on Fri_Dec__1_00:57:38_CST_2017
> Cuda compilation tools, release 9.1, V9.1.108
> 
> Have you reported this to Nvidia?
> 

No, it's on my list to report, but I'm locked out of my nvidia account
for a while now, and nvidia remains unresponsive.

>> compiler exited with status 1
>> FAIL: gcc.c-torture/compile/pr92231.c   -O0  (test for excess errors)
>> ...
>> due to using a function reference plus constant as operand:
>> ...
>>   mov.u64 %r24,bar+4096';
>> ...
>>
>> Fix this by splitting such an insn into:
>> ...
>>   mov.u64 %r24,bar';
>>   add.u64 %r24,%r24,4096';
>> ...
> 
> (Spurious single-quote characters in PTX code?)
> 

Yeah, I think that was a copy-paste thing, the compiler DTRT.

Thanks,
- Tom


[PATCH 2/2] reset edge probability and BB-count for peeled/unrolled loop

2020-10-09 Thread guojiufu via Gcc-patches
Hi,
PR68212 mentioned that the COUNT of the unrolled loop was not correct, and
comments in this PR also mentioned that the loop becomes 'cold'.

This patch fixes the wrong COUNT/PROB of unrolled loop.  And the
patch handles the case where unrolling in unreliable count number can
cause a loop to no longer look hot and therefor not get aligned.  This
patch scale by profile_probability::likely () if unrolled count gets
unrealistically small.
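For reference, the guard described above can be sketched in plain C with simplified types; GCC's profile_count / profile_probability are fixed-point rather than double, and the 0.999 value and 1/10 threshold below are only stand-ins for profile_probability::likely () and apply_scale (1, 10):

```c
#include <stdint.h>

/* Sketch of the guard: when the incoming count is unreliable and the
   scaled count would be more than 10x smaller than the preheader
   count, fall back to a fixed "likely" probability instead of an
   unrealistically small one.  */
static double
choose_scale_main (double scale_main, uint64_t count_in,
                   uint64_t preheader_count, int count_reliable)
{
  if (!count_reliable)
    {
      uint64_t new_count_in = (uint64_t) (count_in * scale_main);
      if (new_count_in / 10 < preheader_count)  /* apply_scale (1, 10) */
        return 0.999;  /* stand-in for profile_probability::likely () */
    }
  return scale_main;
}
```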

Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?

Jiufu Guo

gcc/ChangeLog:
2020-10-09  Jiufu Guo   
Pat Haugen  

PR rtl-optimization/68212
* cfgloopmanip.c (duplicate_loop_to_header_edge): Reset probability
of unrolled/peeled loop.

testsuite/ChangeLog:
2020-10-09  Jiufu Guo   
Pat Haugen  
PR rtl-optimization/68212
* gcc.dg/pr68212.c: New test.


---
 gcc/cfgloopmanip.c | 31 +--
 gcc/testsuite/gcc.dg/pr68212.c | 13 +
 2 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr68212.c

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index b0ca82a67fd..d3c95498402 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1260,14 +1260,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
  /* If original loop is executed COUNT_IN times, the unrolled
 loop will account SCALE_MAIN_DEN times.  */
  scale_main = count_in.probability_in (scale_main_den);
+
+ /* If we are guessing at the number of iterations and count_in
+becomes unrealistically small, reset probability.  */
+ if (!(count_in.reliable_p () || loop->any_estimate))
+   {
+ profile_count new_count_in = count_in.apply_probability (scale_main);
+ profile_count preheader_count = loop_preheader_edge (loop)->count ();
+ if (new_count_in.apply_scale (1, 10) < preheader_count)
+   scale_main = profile_probability::likely ();
+   }
+
  scale_act = scale_main * prob_pass_main;
}
   else
{
+ profile_count new_loop_count;
  profile_count preheader_count = e->count ();
- for (i = 0; i < ndupl; i++)
-   scale_main = scale_main * scale_step[i];
  scale_act = preheader_count.probability_in (count_in);
+ /* Compute final preheader count after peeling NDUPL copies.  */
+ for (i = 0; i < ndupl; i++)
+   preheader_count = preheader_count.apply_probability (scale_step[i]);
+ /* Subtract out exit(s) from peeled copies.  */
+ new_loop_count = count_in - (e->count () - preheader_count);
+ scale_main = new_loop_count.probability_in (count_in);
}
 }
 
@@ -1383,6 +1399,17 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
  scale_bbs_frequencies (new_bbs, n, scale_act);
  scale_act = scale_act * scale_step[j];
}
+
+  /* Need to update PROB of exit edge and corresponding COUNT.  */
+  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
+ && bbs_to_scale)
+   {
+ edge new_exit = new_spec_edges[SE_ORIG];
+ profile_count new_count = new_exit->src->count;
+ profile_count exit_count = loop_preheader_edge (loop)->count ();
+ profile_probability prob = exit_count.probability_in (new_count);
+ recompute_loop_frequencies (loop, prob);
+   }
 }
   free (new_bbs);
   free (orig_loops);
diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
new file mode 100644
index 000..e0cf71d5202
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
+
-- 
2.25.1



[PATCH 1/2] correct BB frequencies after loop changed

2020-10-09 Thread guojiufu via Gcc-patches
When investigating the issue from
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html,
I found that the BB COUNTs of a loop are not accurate in some cases.
For example:

In the figure below:


   COUNT:268435456  pre-header
|
|  ..
|  ||
V  v|
   COUNT:805306369|
   / \  |
   33%/   \ |
 / \|
v   v   |
COUNT:268435456  COUNT:536870911  | 
exit-edge |   latch |
  ._.

Those COUNTs have below equations:
COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33%
COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
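These equations can be checked numerically; the helper below is hypothetical (not GCC code), TOL absorbs the rounding inherent in fixed-point profile counts, and EXIT_PROB is the real exit probability that the figure displays rounded as "33%":

```c
#include <stdint.h>
#include <stdlib.h>

/* Return nonzero iff the loop counts satisfy the three relations
   from the figure, up to TOL counts of rounding error:
     COUNT(exit-edge) == COUNT(pre-header)
     COUNT(exit-edge) ~= COUNT(header) * EXIT_PROB
     COUNT(header)    ~= COUNT(pre-header) + COUNT(latch)  */
static int
counts_consistent (int64_t preheader, int64_t header, int64_t latch,
                   int64_t exit_edge, double exit_prob, int64_t tol)
{
  return exit_edge == preheader
         && llabs (exit_edge - (int64_t) (header * exit_prob)) <= tol
         && llabs (header - (preheader + latch)) <= tol;
}
```

The counts in the first figure satisfy all three relations, while the post-pcom counts quoted below do not.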


While after pcom:

   COUNT:268435456  pre-header
|
|  ..
|  ||
V  v|
   COUNT:268435456|
   / \  |
   50%/   \ |
 / \|
v   v   |
COUNT:134217728  COUNT:134217728  | 
exit-edge |   latch |
  ._.

COUNT of header != COUNT of pre-header + COUNT of latch
COUNT of exit-edge != COUNT of pre-header

In some cases, the probability of the exit-edge is easy to estimate, and
then the COUNTs of the other BBs in the loop can be re-calculated.
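A minimal sketch of that re-calculation, using double where GCC uses profile_probability: since the loop is exited as many times as it is entered, a known exit-edge probability P determines the header and latch counts.

```c
#include <stdint.h>

/* Given the exit-edge probability EXIT_PROB and the pre-header count,
   derive the consistent header and latch counts:
     COUNT(header) = COUNT(pre-header) / P
     COUNT(latch)  = COUNT(header) - COUNT(pre-header)  */
static void
recompute_counts (double exit_prob, int64_t preheader,
                  int64_t *header, int64_t *latch)
{
  *header = (int64_t) (preheader / exit_prob);
  *latch = *header - preheader;
}
```

For the post-pcom figure above, an exit probability of 50% would give a header count of 536870912 and a latch count of 268435456, restoring the COUNT equations.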

Bootstrap and regtest pass on ppc64le. Is this ok for trunk?

Jiufu

gcc/ChangeLog:
2020-10-09  Jiufu Guo   

* cfgloopmanip.h (recompute_loop_frequencies): New function.
* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
recompute_loop_frequencies.

---
 gcc/cfgloopmanip.c| 53 +++
 gcc/cfgloopmanip.h|  2 +-
 gcc/tree-ssa-loop-manip.c | 28 +++--
 3 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 73134a20e33..b0ca82a67fd 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify-me.h"
 #include "tree-ssa-loop-manip.h"
 #include "dumpfile.h"
+#include "cfgrtl.h"
 
 static void copy_loops_to (class loop **, int,
   class loop *);
@@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
 
   return nloop;
 }
+
+/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
+   is NEW_PROB.  */
+
+bool
+recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
+{
+  edge exit = single_exit (loop);
+  if (!exit)
+return false;
+
+  edge e;
+  edge_iterator ei;
+  edge non_exit;
+  basic_block * bbs;
+  profile_count exit_count = loop_preheader_edge (loop)->count ();
+  profile_probability exit_p = exit_count.probability_in (loop->header->count);
+  profile_count base_count = loop->header->count;
+  profile_count after_num = base_count.apply_probability (exit_p);
+  profile_count after_den = base_count.apply_probability (new_prob);
+
+  /* Update BB counts in loop body.
+ COUNT = COUNT
+ COUNT = COUNT * exit_edge_probability
+ The COUNT = COUNT * old_exit_p / new_prob.  */
+  bbs = get_loop_body (loop);
+  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
+after_den);
+  free (bbs);
+
+  /* Update probability and count of the BB besides exit edge (maybe latch).  */
+  FOR_EACH_EDGE (e, ei, exit->src->succs)
+if (e != exit)
+  break;
+  non_exit = e;
+
+  non_exit->probability = new_prob.invert ();
+  non_exit->dest->count = profile_count::zero ();
+  FOR_EACH_EDGE (e, ei, non_exit->dest->preds)
+non_exit->dest->count += e->src->count.apply_probability (e->probability);
+
+  /* Update probability and count of exit destination.  */
+  exit->probability = new_prob;
+  exit->dest->count = profile_count::zero ();
+  FOR_EACH_EDGE (e, ei, exit->dest->preds)
+exit->dest->count += e->src->count.apply_probability (e->probability);
+
+  if (current_ir_type () != IR_GIMPLE)
+update_br_prob_note (exit->src);
+
+  return true;
+}
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 7331e574e2f..d55bab17f65 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -62,5 +62,5 @@ class loop * loop_version (class loop *, void *,
basic_block *,
profile_probability, profile_probability,
 

[PATCH] IPA modref: fix miscompilation in clone when IPA modref is used

2020-10-09 Thread Martin Liška

Hello.

There's a tested patch from Honza for the two PRs related to SPEC benchmarks.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests with
one exception:

FAIL: gcc.dg/lto/modref-1 c_lto_modref-1_0.o-c_lto_modref-1_1.o execute -O2 -flto-partition=max -flto

It's a run-time test that tests:
  if (!__builtin_constant_p (z))
__builtin_abort ();

I guess Honza can incrementally fix the test? I'm going to install the patch.

Thanks,
Martin

gcc/ChangeLog:

PR ipa/97292
PR ipa/97335
* ipa-modref-tree.h (copy_from): Drop summary in a
clone.
---
 gcc/ipa-modref-tree.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index b37280d18c7..8d7f2864793 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -496,7 +496,8 @@ struct GTY((user)) modref_tree
   /* Copy OTHER to THIS.  */
   void copy_from (modref_tree  *other)
   {
-merge (other, NULL);
+auto_vec  parm_map;
+merge (other, &parm_map);
   }
 
   /* Search BASE in tree; return NULL if failed.  */

--
2.28.0



[committed] libstdc++: Fix unused variable warning

2020-10-09 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_performance.h (report_header): Remove
unused variable.

Tested powerpc64le-linux. Committed to trunk.

commit afcbeb35e0b9fb0251d04362a1bd4031520ff7f8
Author: Jonathan Wakely 
Date:   Fri Oct 9 11:52:56 2020

libstdc++: Fix unused variable warning

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_performance.h (report_header): Remove
unused variable.

diff --git a/libstdc++-v3/testsuite/util/testsuite_performance.h b/libstdc++-v3/testsuite/util/testsuite_performance.h
index 8927a4129da..9b69f5f51f7 100644
--- a/libstdc++-v3/testsuite/util/testsuite_performance.h
+++ b/libstdc++-v3/testsuite/util/testsuite_performance.h
@@ -249,7 +249,6 @@ namespace __gnu_test
   void
   report_header(const std::string file, const std::string header)
   {
-const char space = ' ';
 const char tab = '\t';
 const char* name = "libstdc++-performance.sum";
 std::string::const_iterator i = file.begin() + file.find_last_of('/') + 1;


Re: [PATCH] IPA: merge profiles more sensitively

2020-10-09 Thread Martin Liška

On 10/7/20 1:56 PM, Martin Liška wrote:

So what do you suggest to fix it :P?


Note that we have a nice reduced test-case here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97295#c6

It's really a mixture of -O1 -fprofile-use and -O0.

Martin


Re: [r11-3723 Regression] FAIL: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized: basic block" 2 on Linux/x86_64

2020-10-09 Thread Richard Biener
On Thu, 8 Oct 2020, sunil.k.pandey wrote:

> On Linux/x86_64,
> 
> 532e882f8872b1b4437e3a0fa8c61d2af2d999d4 is the first bad commit
> commit 532e882f8872b1b4437e3a0fa8c61d2af2d999d4
> Author: Richard Biener 
> Date:   Thu Oct 8 11:53:51 2020 +0200
> 
> adjust BB vectorization dump scanning
> 
> caused
> 
> FAIL: gcc.dg/vect/bb-slp-pr78205.c -flto -ffat-lto-objects  
> scan-tree-dump-times slp2 "optimized: basic block" 3
> FAIL: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "optimized: 
> basic block" 3
> FAIL: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects  
> scan-tree-dump-times slp2 "optimized: basic block" 2
> FAIL: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized: 
> basic block" 2

I filed PR97351 and PR97352 for this, it's really pre-existing issues.

Richard.

> with GCC configured with
> 
> Configured with: ../../gcc/configure 
> --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3723/usr
>  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr78205.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr78205.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-3.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-subgroups-3.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at skpgkp2 at gmail dot com)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


[PATCH] tree-optimization/97334 - improve BB SLP discovery

2020-10-09 Thread Richard Biener
We're running into a multiplication with one unvectorizable
operand that we expect to build from scalars, but SLP discovery
fatally fails the build of both operands since one stmt is commutated:

  _60 = _58 * _59;
  _63 = _59 * _62;
  _66 = _59 * _65;
...

where _59 is the "bad" operand.  The following patch makes the
case work where the first stmt has a good operand by not fatally
failing the SLP build for the operand but communicating upwards
how to commutate.
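A rough model of the control flow of the fix (illustrative only; vect_build_slp_tree_1 operates on stmt_vec_infos, not booleans): a bad lane 0 remains a fatal mismatch, but when BB-vectorizing, a bad later lane is merely recorded so the caller can try commutating that lane's operands.

```c
#include <stdbool.h>
#include <stddef.h>

/* Walk the N lanes of a would-be SLP node.  A mismatch in lane 0 is
   fatal; with BB_VECT set, a mismatch in a later lane is recorded in
   MATCHES and discovery continues so the caller may commutate.  */
static bool
build_lanes (const bool *lane_ok, bool *matches, size_t n, bool bb_vect)
{
  for (size_t i = 0; i < n; i++)
    if (!lane_ok[i])
      {
        if (bb_vect && i != 0)
          {
            matches[i] = false;  /* record; caller may commutate */
            continue;
          }
        matches[0] = false;      /* fatal mismatch */
        return false;
      }
  return true;
}
```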

Bootstrapped / tested on x86_64-unknown-linux-gnu, pushed.

2020-10-09  Richard Biener  

PR tree-optimization/97334
* tree-vect-slp.c (vect_build_slp_tree_1): Do not fatally
fail lanes other than zero when BB vectorizing.

* gcc.dg/vect/bb-slp-pr65935.c: Amend.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c |  3 +++
 gcc/tree-vect-slp.c| 22 ++
 2 files changed, 25 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
index 4e3448eccd7..ea37e4e614c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
@@ -60,3 +60,6 @@ int main()
 /* We should also be able to use 2-lane SLP to initialize the real and
imaginary components in the first loop of main.  */
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 10 "slp1" } } */
+/* We should see the s->phase[dir] operand and only that operand built
+   from scalars.  See PR97334.  */
+/* { dg-final { scan-tree-dump-times "Building vector operands from scalars" 1 "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 479c3eeaec7..495fb970e24 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -773,6 +773,12 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: unvectorizable statement %G",
 stmt);
+ /* ???  For BB vectorization we want to commutate operands in a way
+to shuffle all unvectorizable defs into one operand and have
+the other still vectorized.  The following doesn't reliably
+work for this though but it's the easiest we can do here.  */
+ if (is_a  (vinfo) && i != 0)
+   continue;
  /* Fatal mismatch.  */
  matches[0] = false;
   return false;
@@ -785,6 +791,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: not GIMPLE_ASSIGN nor "
 "GIMPLE_CALL %G", stmt);
+ if (is_a  (vinfo) && i != 0)
+   continue;
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
@@ -797,6 +805,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  && !vect_record_max_nunits (vinfo, stmt_info, group_size,
  nunits_vectype, max_nunits)))
{
+ if (is_a  (vinfo) && i != 0)
+   continue;
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
@@ -823,6 +833,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: unsupported call type %G",
 call_stmt);
+ if (is_a  (vinfo) && i != 0)
+   continue;
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
@@ -865,6 +877,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: no optab.\n");
+ if (is_a  (vinfo) && i != 0)
+   continue;
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
@@ -876,6 +890,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "Build SLP failed: "
 "op not supported by target.\n");
+ if (is_a  (vinfo) && i != 0)
+   continue;
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
@@ -900,6 +916,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
  if (TREE_CODE (vec) != SSA_NAME
  || !types_compatible_p (vectype, TREE_TYPE (vec)))
{
+ if (is_a  (vinfo) && i != 0)
+   continue;
  

[PATCH] Fixup gcc.dg/vect/pr65947-3.c when masked loads are available

2020-10-09 Thread Richard Biener
The following adds an effective target to properly allow
the gcc.dg/vect/pr65947-3.c expected vectorization to be adjusted
when run with, say, -march=cascadelake.

Tested on x86_64-unknown-linux-gnu, pushed.

2020-10-09  Richard Biener  

gcc/
* doc/sourcebuild.texi (vect_masked_load): Document.

gcc/testsuite
* lib/target-supports.exp (check_effective_target_vect_masked_load):
New effective target.
* gcc.dg/vect/pr65947-3.c: Update.
---
 gcc/doc/sourcebuild.texi  | 3 +++
 gcc/testsuite/gcc.dg/vect/pr65947-3.c | 9 +
 gcc/testsuite/lib/target-supports.exp | 8 
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index b625f1e9f68..49316a5d0ff 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1527,6 +1527,9 @@ optabs on vectors.
 Target supports fully-masked (also known as fully-predicated) loops,
 so that vector loops can handle partial as well as full vectors.
 
+@item vect_masked_load
+Target supports vector masked loads.
+
 @item vect_masked_store
 Target supports vector masked stores.
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-3.c b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
index 8a2608cf0f1..f1bfad65c22 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-3.c
@@ -51,9 +51,10 @@ main (void)
   return 0;
 }
 
-/* XFAILed because of the fix for PR97307 which sinks the load of a[i], preventing
-   if-conversion to happen.  */
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
+/* Since the fix for PR97307 which sinks the load of a[i], preventing
+   if-conversion to happen, targets that cannot do masked loads only
+   vectorize the inline copy.  */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target vect_masked_load } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { ! vect_masked_load } } } } */
 /* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 15f0649f8ae..ecf8be3e567 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7194,6 +7194,14 @@ proc check_effective_target_vect_load_lanes { } {
 || [istarget aarch64*-*-*] }}]
 }
 
+# Return 1 if the target supports vector masked loads.
+
+proc check_effective_target_vect_masked_load { } {
+return [expr { [check_avx_available]
+  || [check_effective_target_aarch64_sve]
+  || [istarget amdgcn*-*-*] } ]
+}
+
 # Return 1 if the target supports vector masked stores.
 
 proc check_effective_target_vect_masked_store { } {
-- 
2.26.2


Re: [ Preprocessor ] [ Common ] Feature: Macros for identifying the wide and narrow execution string literal encoding

2020-10-09 Thread Bernhard Reutner-Fischer via Gcc-patches
On 8 October 2020 23:39:15 CEST, JeanHeyd Meneide via Gcc-patches 
 wrote:
>Dear Joseph,
>
>On Thu, Oct 8, 2020 at 1:36 PM Joseph Myers 
>wrote:
>>
>> This documentation doesn't seem sufficient to use the macros.  Do
>they
>> expand to (narrow) string literals?  To an unquoted sequence of
>> characters?  I think from the implementation that the answer is
>strings
>> (so, in particular, not usable for testing anything in #if
>conditionals),
>> but the documentation ought to say so.  The test ought to verify the
>form
>> of the expansion as well (even if it can't do anything useful at
>execution
>> time, because if you make the macros reflect the command-line options
>they
>> are character set names that are meaningful on the host, and any
>> conversion functionality on the target may not use the same names as
>the
>> host).
>
> You're right; sorry about that, I should have been more thorough!
>I thought about adding a test to check the name itself (e.g, for
>"UTF-8"), but that might make tests fail on platforms where the
>default SOURCE_CHARSET from the dev files is not, in fact, UTF-8. I
>could also try to pass some options but then I'd have to guarantee
>that the encoding was available on all testable platforms, too...!
>
>In the end, for the tests, I just initialize two "const char[]"
>directly from the macro expansions to make sure we are getting
>strings. It seems to work okay. Attached is the revised patch with
>better docs and test!

Typo:  comple-time

>2020-10-08  JeanHeyd "ThePhD" Meneide  
>
>* gcc/c-family/c-cppbuiltin.c: Add predefined macro
>definitions for charsets

I think you should put the macro names in braces after the filename and drop 
the trailing "for charsets".

>* gcc/doc/cpp.texi: Document new predefined macro.
>* gcc/testsuite/c-c++-common/cpp/wide-narrow-predef-macros.c (new):

I think you should drop "(new)" above.
thanks,

>  New test for macro definitions to always exist.
>* libcpp/include/cpplib.h: Add functions declarations for
>  retrieving charset names
>* libcpp/directives.c: Add function definitions to retrieve charset
>  names.
>* libcpp/internal.h: Add to/from name preservations


[committed][nvptx] Set -misa=sm_35 by default

2020-10-09 Thread Tom de Vries
Hi,

The nvptx-as assembler verifies the ptx code using ptxas, if there's any
in the PATH.

The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
latest cuda release (11.1) no longer supports sm_30.

Consequently we cannot build gcc against that release (although we should
still be able to build without any cuda release).

Fix this by setting -misa=sm_35 by default.

Tested check-gcc on nvptx.

Tested libgomp on x86_64-linux with an nvptx accelerator.

Both build against cuda 9.1.

Committed to trunk.

Thanks,
- Tom

[nvptx] Set -misa=sm_35 by default

gcc/ChangeLog:

2020-10-09  Tom de Vries  

PR target/97348
* config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
default is used.
* config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.

---
 gcc/config/nvptx/nvptx.h   | 5 -
 gcc/config/nvptx/nvptx.opt | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 6ebcc760771..17fe157058c 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,7 +29,10 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-#define ASM_SPEC "%{misa=*:-m %*}"
+/* Default needs to be in sync with default for misa in nvptx.opt.
+   We add a default here to work around a hard-coded sm_30 default in
+   nvptx-as.  */
+#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
 
 #define TARGET_CPU_CPP_BUILTINS()  \
   do   \
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 75c3d54864e..d6910a96cf0 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -59,6 +59,7 @@ Enum(ptx_isa) String(sm_30) Value(PTX_ISA_SM30)
 EnumValue
 Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 
+; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
 misa=
-Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM30)
+Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.


Re: [committed][nvptx] Set -misa=sm_35 by default

2020-10-09 Thread Tobias Burnus

Hi,

On 10/9/20 1:56 PM, Tom de Vries wrote:

The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
latest cuda release (11.1) no longer supports sm_30.


Interestingly, at
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes__ptx-release-history
they still claim to support everything down to sm_10. (They talk about
supported targets, but still.)


Fix this by setting -misa=sm_35 by default.


Can you update the release notes?

The other question is whether and, if so, how we want to add support for
newer PTX ISA versions than 3.1 = CUDA 5.0. In terms of PTX ISA itself,
moving to 6.3 would be a great step forward but requires at least CUDA
10. Hence, it could either a bump of the minimal CUDA version or to have
some way to specify the PTX version.  Thoughts?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH] Refactor range handling of builtins in vr_values and ranger.

2020-10-09 Thread Aldy Hernandez via Gcc-patches
Hi Jakub.

As the last known expert in this area, would you review this, please? :)

This sets things up so we can share range handling of builtins between
vr_values and ranger.  It is meant to refactor the code so that we can
verify that both implementations yield the same results.

First, we abstract out gimple_ranger::range_of_builtin_call into an externally
visible counterpart that can be called from vr_values.  It will take a
range_query since both ranger and vr_values inherit from this base class.

Then we abstract out all the builtin handling in vr_values into a separate
method that is easier to compare against.

Finally, we call the ranger version from vr_values and compare it with the
vr_values version.  Since this proves both versions return the same,
we can remove vr_values::extract_range_builtin in a follow-up patch.

The vr_values::range_of_expr change brings the vr_values version up to par
with the ranger version.  It should've handled non-SSAs.  This was
a small oversight that went unnoticed because the vr_values version isn't
stressed nearly as much as the ranger version.  The change is needed because
the ranger code handling builtin calls may call it for integer arguments
in range_of_builtin_ubsan_call.

There should be no change in functionality.
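The "pretend the arithmetic is wrapping" idea used by the ubsan range code can be illustrated outside GCC like this (a sketch, not the patch's code), using the GCC/Clang builtin __builtin_add_overflow as the oracle for whether the wrapped result differs from the infinite-precision one:

```c
#include <limits.h>

/* Compute A + B in wrapping unsigned arithmetic (the value the ubsan
   builtin hands back) and return 1 iff the wrapped result differs
   from what infinite-precision signed arithmetic would give, i.e.
   whether the sanitizer check would fire.  */
static int
wrapping_add_overflows (int a, int b, int *wrapped)
{
  int exact;
  *wrapped = (int) ((unsigned) a + (unsigned) b);  /* -fwrapv-style result */
  return __builtin_add_overflow (a, b, &exact);    /* 1 iff it wrapped */
}
```

(The conversion back to int on wraparound is implementation-defined in ISO C, but GCC defines it as modulo reduction, which is what the wrapv pretense relies on.)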

Tested on x86_64, with aarch64 tests still going.

OK provided aarch64 tests finish this century?

gcc/ChangeLog:

* gimple-range.cc (gimple_ranger::range_of_builtin_ubsan_call):
Make externally visible...
(range_of_builtin_ubsan_call): ...here.  Add range_query argument.
(gimple_ranger::range_of_builtin_call): Make externally visible...
(range_of_builtin_call): ...here.  Add range_query argument.
* gimple-range.h (range_of_builtin_call): Move out from class and
make externally visible.
* vr-values.c (vr_values::extract_range_basic): Abstract out
builtin handling to...
(vr_values::range_of_expr): Handle non SSAs.
(vr_values::extract_range_builtin): ...here.
* vr-values.h (class vr_values): Add extract_range_builtin.
(range_of_expr): Rename NAME to EXPR.
---
 gcc/gimple-range.cc |  36 ++--
 gcc/gimple-range.h  |   4 +-
 gcc/vr-values.c | 508 +++-
 gcc/vr-values.h |   3 +-
 4 files changed, 293 insertions(+), 258 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 2ca86ed0e4c..a72919fc6c5 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -546,10 +546,13 @@ gimple_ranger::range_of_call (irange &r, gcall *call)
   return true;
 }
 
+// Return the range of a __builtin_ubsan* in CALL and set it in R.
+// CODE is the type of ubsan call (PLUS_EXPR, MINUS_EXPR or
+// MULT_EXPR).
 
-void
-gimple_ranger::range_of_builtin_ubsan_call (irange &r, gcall *call,
-   tree_code code)
+static void
+range_of_builtin_ubsan_call (range_query &query, irange &r, gcall *call,
+tree_code code)
 {
   gcc_checking_assert (code == PLUS_EXPR || code == MINUS_EXPR
   || code == MULT_EXPR);
@@ -559,8 +562,8 @@ gimple_ranger::range_of_builtin_ubsan_call (irange &r, gcall *call,
   int_range_max ir0, ir1;
   tree arg0 = gimple_call_arg (call, 0);
   tree arg1 = gimple_call_arg (call, 1);
-  gcc_assert (range_of_expr (ir0, arg0, call));
-  gcc_assert (range_of_expr (ir1, arg1, call));
+  gcc_assert (query.range_of_expr (ir0, arg0, call));
+  gcc_assert (query.range_of_expr (ir1, arg1, call));
 
   bool saved_flag_wrapv = flag_wrapv;
   // Pretend the arithmetic is wrapping.  If there is any overflow,
@@ -576,9 +579,11 @@ gimple_ranger::range_of_builtin_ubsan_call (irange &r, gcall *call,
 r.set_varying (type);
 }
 
+// For a builtin in CALL, return a range in R if known and return
+// TRUE.  Otherwise return FALSE.
 
 bool
-gimple_ranger::range_of_builtin_call (irange &r, gcall *call)
+range_of_builtin_call (range_query &query, irange &r, gcall *call)
 {
   combined_fn func = gimple_call_combined_fn (call);
   if (func == CFN_LAST)
@@ -599,7 +604,7 @@ gimple_ranger::range_of_builtin_call (irange &r, gcall *call)
  return true;
}
   arg = gimple_call_arg (call, 0);
-  if (range_of_expr (r, arg, call) && r.singleton_p ())
+  if (query.range_of_expr (r, arg, call) && r.singleton_p ())
{
  r.set (build_one_cst (type), build_one_cst (type));
  return true;
@@ -613,7 +618,7 @@ gimple_ranger::range_of_builtin_call (irange &r, gcall *call)
   prec = TYPE_PRECISION (TREE_TYPE (arg));
   mini = 0;
   maxi = prec;
-  gcc_assert (range_of_expr (r, arg, call));
+  gcc_assert (query.range_of_expr (r, arg, call));
   // If arg is non-zero, then ffs or popcount are non-zero.
   if (!range_includes_zero_p (&r))
mini = 1;
@@ -657,7 +662,7 @@ gimple_ranger::range_of_builtin_call (irange &r, gcall *call)
}
}
 
-   

Re: [committed][nvptx] Set -misa=sm_35 by default

2020-10-09 Thread Tom de Vries
On 10/9/20 2:19 PM, Tobias Burnus wrote:
> Hi,
> 
> On 10/9/20 1:56 PM, Tom de Vries wrote:
>> The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas
>> of the
>> latest cuda release (11.1) no longer supports sm_30.
> 
> Interestingly, at
> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes__ptx-release-history
> 
> they still claim to support everything down to sm_10. (They talk about
> supported targets, but still.)
>

Hi,

ha, funny.  Well, ptxas is pretty convinced it doesn't want to support it ;)

>> Fix this by setting -misa=sm_35 by default.
> 
> Can you update the release notes?
> 
> The other question is whether and, if so, how we want to add support for
> newer PTX ISA versions than 3.1 = CUDA 5.0. In terms of PTX ISA itself,
> moving to 6.3 would be a great step forward but requires at least CUDA
> 10. Hence, it could either a bump of the minimal CUDA version or to have
> some way to specify the PTX version. Thoughty?

A PR is open for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96005.

FWIW, I'm not planning to work on this short term, my plate is pretty full.

Thanks,
- Tom



Re: [ Preprocessor ] [ Common ] Feature: Macros for identifying the wide and narrow execution string literal encoding

2020-10-09 Thread JeanHeyd Meneide via Gcc-patches
Hello,

> Typo:  comple-time
>
> >2020-10-08  JeanHeyd "ThePhD" Meneide  
> >
> >* gcc/c-family/c-cppbuiltin.c: Add predefined macro
> >definitions for charsets
>
> I think you should put the macro names in braces after the filename and drop 
> the trailing "for charsets".

 Can do!

>
> >* gcc/doc/cpp.texi: Document new predefined macro.
> >* gcc/testsuite/c-c++-common/cpp/wide-narrow-predef-macros.c (new):
>
> I think you should drop "(new)" above.
> thanks,

 I saw that in previous changelogs, but I can change it! Fixed up
the typos, too.

Sincerely,
JeanHeyd

2020-10-09  JeanHeyd "ThePhD" Meneide  

* gcc/c-family/c-cppbuiltin.c: Add predefined
  {__GNUC_EXECUTION_CHARSET_NAME} and
  {__GNUC_WIDE_EXECUTION_CHARSET_NAME} macros
* gcc/doc/cpp.texi: Document above new predefined macros
* gcc/testsuite/c-c++-common/cpp/wide-narrow-predef-macros.c:
  New test for macro definitions to always exist and be strings
* libcpp/include/cpplib.h: Add functions declarations for
  retrieving charset names
* libcpp/directives.c: Add function definitions to retrieve charset
  names
* libcpp/internal.h: Add to/from name preservations
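
For illustration, user code could consume the new macros along these lines; the #ifndef fallbacks are only so the sketch builds on compilers without this patch, and their values ("UTF-8", "UTF-32LE") are hypothetical defaults, not part of the proposal:

```c
#include <string.h>

/* Fallbacks for compilers without the patch; on a patched GCC both
   macros are predefined.  */
#ifndef __GNUC_EXECUTION_CHARSET_NAME
#define __GNUC_EXECUTION_CHARSET_NAME "UTF-8"
#endif
#ifndef __GNUC_WIDE_EXECUTION_CHARSET_NAME
#define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE"
#endif

/* Both macros expand to narrow string literals, so they can initialize
   arrays and take part in literal concatenation -- but they are not
   usable in #if conditionals.  */
const char narrow_name[] = __GNUC_EXECUTION_CHARSET_NAME;
const char wide_name[] = "wide: " __GNUC_WIDE_EXECUTION_CHARSET_NAME;

static int
charset_names_usable (void)
{
  return narrow_name[0] != '\0'
         && strncmp (wide_name, "wide: ", 6) == 0;
}
```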
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 74ecca8de8e..8de25786592 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -866,6 +866,13 @@ c_cpp_builtins (cpp_reader *pfile)
 
   define_language_independent_builtin_macros (pfile);
 
+  /* encoding definitions used by users and libraries  */
+  builtin_define_with_value ("__GNUC_EXECUTION_CHARSET_NAME",
+cpp_get_narrow_charset_name (pfile), 1);
+  builtin_define_with_value ("__GNUC_WIDE_EXECUTION_CHARSET_NAME",
+cpp_get_wide_charset_name (pfile), 1);
+
+
   if (c_dialect_cxx ())
   {
 int major;
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 33f876ab706..90f1162add1 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -2451,6 +2451,15 @@ features are supported by GCC.
 @item __NO_MATH_ERRNO__
 This macro is defined if @option{-fno-math-errno} is used, or enabled
 by another option such as @option{-ffast-math} or by default.
+
+@item __GNUC_EXECUTION_CHARSET_NAME
+@itemx __GNUC_WIDE_EXECUTION_CHARSET_NAME
+These macros are defined to expand to a narrow string literal of
+the name of the narrow and wide compile-time execution character
+set used.  It directly reflects the name passed to the options
+@option{-fexec-charset} and @option{-fwide-exec-charset}, or the defaults
+documented for those options (that is, it can expand to something like 
+@code{"UTF-8"}).  @xref{Invocation}.
 @end table
 
 @node System-specific Predefined Macros
diff --git a/gcc/testsuite/c-c++-common/cpp/wide-narrow-predef-macros.c b/gcc/testsuite/c-c++-common/cpp/wide-narrow-predef-macros.c
new file mode 100644
index 000..d5440f8a61e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/wide-narrow-predef-macros.c
@@ -0,0 +1,13 @@
+/*
+   { dg-do compile }
+ */
+
+#if !defined(__GNUC_EXECUTION_CHARSET_NAME)
+#error "Required implementation macro for compile-time charset name is not present"
+#endif
+#if !defined(__GNUC_WIDE_EXECUTION_CHARSET_NAME)
+#error "Required implementation macro for wide compile-time charset name is not present"
+#endif
+
+const char narrow_name[] = __GNUC_EXECUTION_CHARSET_NAME;
+const char wide_name[] = __GNUC_WIDE_EXECUTION_CHARSET_NAME;
diff --git a/libcpp/charset.c b/libcpp/charset.c
index 28b81c9c864..3e5578b1390 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -638,6 +638,9 @@ init_iconv_desc (cpp_reader *pfile, const char *to, const char *from)
   char *pair;
   size_t i;
 
+  ret.to = to;
+  ret.from = from;
+
   if (!strcasecmp (to, from))
 {
   ret.func = convert_no_conversion;
diff --git a/libcpp/directives.c b/libcpp/directives.c
index f59718708e4..ad540872581 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -2571,6 +2571,20 @@ cpp_set_callbacks (cpp_reader *pfile, cpp_callbacks *cb)
   pfile->cb = *cb;
 }
 
+/* The narrow character set identifier.  */
+const char *
+cpp_get_narrow_charset_name (cpp_reader *pfile)
+{
+  return pfile->narrow_cset_desc.to;
+}
+
+/* The wide character set identifier.  */
+const char *
+cpp_get_wide_charset_name (cpp_reader *pfile)
+{
+  return pfile->wide_cset_desc.to;
+}
+
 /* The dependencies structure.  (Creates one if it hasn't already been.)  */
 class mkdeps *
 cpp_get_deps (cpp_reader *pfile)
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 8e398863cf6..69a5042d0bf 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -971,6 +971,11 @@ extern cpp_callbacks *cpp_get_callbacks (cpp_reader *) ATTRIBUTE_PURE;
 extern void cpp_set_callbacks (cpp_reader *, cpp_callbacks *);
 extern class mkdeps *cpp_get_deps (cpp_reader *) ATTRIBUTE_PURE;
 
+/* Call these to get name data about the various compile-time
+   charsets.  */
+extern const c

[committed] libstdc++: Pass CXXFLAGS to check_performance script

2020-10-09 Thread Jonathan Wakely via Gcc-patches
It looks like our check-performance target runs completely unoptimized,
which is a bit silly. This exports the CXXFLAGS from the parent make
process to the check_performance script.

libstdc++-v3/ChangeLog:

* scripts/check_performance: Use gnu++11 instead of gnu++0x.
* testsuite/Makefile.am (check-performance): Export CXXFLAGS to
child process.
* testsuite/Makefile.in: Regenerate.

Tested powerpc64le-linux. Committed to trunk.

commit 7e7eef2a1bb45c61ee26936ccaab7159dcceca94
Author: Jonathan Wakely 
Date:   Fri Oct 9 13:59:27 2020

libstdc++: Pass CXXFLAGS to check_performance script

It looks like our check-performance target runs completely unoptimized,
which is a bit silly. This exports the CXXFLAGS from the parent make
process to the check_performance script.

libstdc++-v3/ChangeLog:

* scripts/check_performance: Use gnu++11 instead of gnu++0x.
* testsuite/Makefile.am (check-performance): Export CXXFLAGS to
child process.
* testsuite/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/scripts/check_performance b/libstdc++-v3/scripts/check_performance
index 3fa927480c9..cde3874741c 100755
--- a/libstdc++-v3/scripts/check_performance
+++ b/libstdc++-v3/scripts/check_performance
@@ -32,7 +32,7 @@ SH_FLAG="-Wl,--rpath -Wl,$BUILD_DIR/../../gcc \
  -Wl,--rpath -Wl,$BUILD_DIR/src/.libs"
 ST_FLAG="-static"
 LINK=$SH_FLAG
-CXX="$COMPILER $INCLUDES $FLAGS -std=gnu++0x $CXXFLAGS $LINK"
+CXX="$COMPILER $INCLUDES $FLAGS -std=gnu++11 $CXXFLAGS $LINK"
 LIBS="./libtestc++.a"
 TESTS_FILE="testsuite_files_performance"
 
diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index 9cef1e65e1b..2fca179fca4 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -182,6 +182,7 @@ check-compile: testsuite_files ${compile_script}
 check_performance_script=${glibcxx_srcdir}/scripts/check_performance
 check-performance: testsuite_files_performance ${performance_script}
-@(chmod + ${check_performance_script}; \
+ export CXXFLAGS="$(CXXFLAGS)"; \
  ${check_performance_script} ${glibcxx_srcdir} ${glibcxx_builddir})
 
 # Runs the testsuite in debug mode.


[committed] libstdc++: Add performance test for <random>

2020-10-09 Thread Jonathan Wakely via Gcc-patches
This tests std::uniform_int_distribution with various parameters and
engines.

libstdc++-v3/ChangeLog:

* testsuite/performance/26_numerics/random_dist.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit f9919ba717dfaf6018b7e625bebc84a461477b52
Author: Jonathan Wakely 
Date:   Fri Oct 9 12:07:36 2020

libstdc++: Add performance test for <random>

This tests std::uniform_int_distribution with various parameters and
engines.

libstdc++-v3/ChangeLog:

* testsuite/performance/26_numerics/random_dist.cc: New test.

diff --git a/libstdc++-v3/testsuite/performance/26_numerics/random_dist.cc b/libstdc++-v3/testsuite/performance/26_numerics/random_dist.cc
new file mode 100644
index 000..673992d1949
--- /dev/null
+++ b/libstdc++-v3/testsuite/performance/26_numerics/random_dist.cc
@@ -0,0 +1,102 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+
+#include <random>
+#include <testsuite_performance.h>
+
+namespace counters
+{
+  __gnu_test::time_counter time;
+  __gnu_test::resource_counter resource;
+}
+
+
+template<typename Dist, typename Engine>
+void do_fill_with_uniform_ints(std::string name, Dist& d, Engine& e)
+{
+  using counters::time;
+  using counters::resource;
+
+  std::vector<typename Dist::result_type> r;
+  int n = 1000;
+  {
+const auto suffix = "-e" + std::to_string((int)std::log10(n));
+r.resize(n);
+
+start_counters(time, resource);
+for (auto& x : r)
+  x = d(e);
+stop_counters(time, resource);
+report_performance(__FILE__, name+suffix, time, resource);
+clear_counters(time, resource);
+
+d.reset();
+
+start_counters(time, resource);
+d.__generate(begin(r), end(r), e);
+stop_counters(time, resource);
+report_performance(__FILE__, name+"-range"+suffix, time, resource);
+clear_counters(time, resource);
+  }
+}
+
+template<typename Engine>
+void fill_with_uniform_ints(std::string name, Engine& e)
+{
+  using Dist = std::uniform_int_distribution<unsigned>;
+  using param_type = typename Dist::param_type;
+
+  unsigned maxima[]{6, 10, 32, 100, 1000, 1024, (1<<16)-1, 1<<16, 1<<20, -1u};
+  for (auto hi : maxima)
+  {
+Dist dist(param_type{0, hi});
+std::ostringstream s;
+s << name << "-uniform_int-" << (dist.max() - dist.min());
+do_fill_with_uniform_ints(s.str(), dist, e);
+  }
+}
+
+int main()
+{
+  using namespace std;
+
+  std::mt19937 mt;
+  fill_with_uniform_ints("mt19937", mt);
+  std::mt19937_64 mt64;
+  fill_with_uniform_ints("mt19937_64", mt64);
+
+  // Same as std::mt19937 but using uint32_t not uint_fast32_t for result_type
+  using mt19937_32 = std::mersenne_twister_engine<uint32_t, 32, 624, 397, 31, 0x9908b0df, 11, 0xffffffff, 7, 0x9d2c5680, 15, 0xefc60000, 18, 1812433253>;
+  mt19937_32 mt32;
+  fill_with_uniform_ints("mt19937_32", mt32);
+
+  std::minstd_rand0 lcg;
+  fill_with_uniform_ints("minstd_rand0", lcg);
+
+  // Same as std::minstd_rand0 but using uint32_t not uint_fast32_t
+  using minstd_rand0_32 = std::linear_congruential_engine<uint32_t, 16807, 0, 2147483647>;
+  minstd_rand0_32 lcg_32;
+  fill_with_uniform_ints("minstd_rand0_32", lcg_32);
+
+  return 0;
+}
+


Re: [committed] libstdc++: Pass CXXFLAGS to check_performance script

2020-10-09 Thread Jonathan Wakely via Gcc-patches

On 09/10/20 14:02 +0100, Jonathan Wakely wrote:

It looks like our check-performance target runs completely unoptimized,
which is a bit silly. This exports the CXXFLAGS from the parent make
process to the check_performance script.

libstdc++-v3/ChangeLog:

* scripts/check_performance: Use gnu++11 instead of gnu++0x.
* testsuite/Makefile.am (check-performance): Export CXXFLAGS to
child process.
* testsuite/Makefile.in: Regenerate.


A small adjustment to that last patch.

Tested powerpc64le-linux. Committed to trunk.



commit 6ce2cb116af6e0965ff0dd69e7fd1925cf5dc68c
Author: Jonathan Wakely 
Date:   Fri Oct 9 14:07:22 2020

libstdc++: Adjust variable export in makefile

We usually export variables in recipes this way. I'm not sure it's
necessary, but it's consistent.

libstdc++-v3/ChangeLog:

* testsuite/Makefile.am: Set and export variable separately.
* testsuite/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index 2fca179fca4..7b412411bfe 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -182,7 +182,7 @@ check-compile: testsuite_files ${compile_script}
 check_performance_script=${glibcxx_srcdir}/scripts/check_performance
 check-performance: testsuite_files_performance ${performance_script}
 	-@(chmod + ${check_performance_script}; \
-	  export CXXFLAGS="$(CXXFLAGS)"; \
+	  CXXFLAGS='$(CXXFLAGS)'; export CXXFLAGS; \
 	  ${check_performance_script} ${glibcxx_srcdir} ${glibcxx_builddir})
 
 # Runs the testsuite in debug mode.


[PATCH] x86-64: Check CMPXCHG16B for x86-64-v[234]

2020-10-09 Thread H.J. Lu via Gcc-patches
x86-64-v2 includes CMPXCHG16B.  Since -mcx16 enables CMPXCHG16B and
defines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, check it in x86-64-v[234]
tests.

PR target/97250
* gcc.target/i386/x86-64-v2.c: Verify that
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 is defined.
* gcc.target/i386/x86-64-v3.c: Likewise.
* gcc.target/i386/x86-64-v4.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/x86-64-v2.c | 3 +++
 gcc/testsuite/gcc.target/i386/x86-64-v3.c | 3 +++
 gcc/testsuite/gcc.target/i386/x86-64-v4.c | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/x86-64-v2.c b/gcc/testsuite/gcc.target/i386/x86-64-v2.c
index 0f3df3605b5..f17a15de9b6 100644
--- a/gcc/testsuite/gcc.target/i386/x86-64-v2.c
+++ b/gcc/testsuite/gcc.target/i386/x86-64-v2.c
@@ -12,6 +12,9 @@
 #ifndef __SSE2__
 # error __SSE2__ not defined
 #endif
+#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+# error __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 not defined
+#endif
 #ifndef __LAHF_SAHF__
 # error __LAHF_SAHF__ not defined
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/x86-64-v3.c b/gcc/testsuite/gcc.target/i386/x86-64-v3.c
index 16a94b18021..784202fb26f 100644
--- a/gcc/testsuite/gcc.target/i386/x86-64-v3.c
+++ b/gcc/testsuite/gcc.target/i386/x86-64-v3.c
@@ -12,6 +12,9 @@
 #ifndef __SSE2__
 # error __SSE2__ not defined
 #endif
+#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+# error __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 not defined
+#endif
 #ifndef __LAHF_SAHF__
 # error __LAHF_SAHF__ not defined
 #endif
diff --git a/gcc/testsuite/gcc.target/i386/x86-64-v4.c b/gcc/testsuite/gcc.target/i386/x86-64-v4.c
index 48e928c2955..7c202a42068 100644
--- a/gcc/testsuite/gcc.target/i386/x86-64-v4.c
+++ b/gcc/testsuite/gcc.target/i386/x86-64-v4.c
@@ -12,6 +12,9 @@
 #ifndef __SSE2__
 # error __SSE2__ not defined
 #endif
+#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+# error __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 not defined
+#endif
 #ifndef __LAHF_SAHF__
 # error __LAHF_SAHF__ not defined
 #endif
-- 
2.26.2
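
For background (not part of the patch): when the target supports CMPXCHG16B, GCC defines `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16`, which user code can test to select a 16-byte compare-and-swap. A hedged sketch, with a non-atomic fallback so it still builds without `-mcx16`:

```cpp
#include <cstdint>

// Hedged sketch: a 16-byte compare-and-swap using the CMPXCHG16B-backed
// builtin when GCC advertises it, with a NON-atomic fallback purely so
// the example compiles on configurations without -mcx16.
struct alignas(16) Pair { std::uint64_t lo, hi; };

inline bool cas_pair(Pair* p, Pair expected, Pair desired)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16) && defined(__SIZEOF_INT128__)
  auto pack = [](Pair v) {
    return (static_cast<unsigned __int128>(v.hi) << 64) | v.lo;
  };
  return __sync_bool_compare_and_swap(
      reinterpret_cast<unsigned __int128*>(p), pack(expected), pack(desired));
#else
  // Fallback for illustration only: NOT atomic.
  if (p->lo == expected.lo && p->hi == expected.hi)
    {
      *p = desired;
      return true;
    }
  return false;
#endif
}
```

This is exactly the property the new `#ifndef` checks in the x86-64-v[234] tests assert: at those micro-architecture levels the macro must be predefined.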



Re: [PATCH, libstdc++] Improve the performance of std::uniform_int_distribution (fewer divisions)

2020-10-09 Thread Jonathan Wakely via Gcc-patches

On 06/10/20 15:55 -0400, Daniel Lemire via Libstdc++ wrote:

The updated patch looks good to me. It is indeed cleaner to have a separate
(static) function.

It might be nice to add a comment to explain the _S_nd function maybe with
a comment like "returns a random value in [0,__range)
without any bias" (or something to that effect).

Otherwise, it is algorithmically correct.


Here's what I've just committed and pushed to the master branch.

As expected, this shows significant improvements for some (but not
all) of the cases in the new test I added earlier today,
testsuite/performance/26_numerics/random_dist.cc

Thanks again for the patch.


commit 98c37d3bacbb2f8bbbe56ed53a9547d3be01b66b
Author: Daniel Lemire 
Date:   Fri Oct 9 14:09:36 2020

libstdc++: Optimize uniform_int_distribution using Lemire's algorithm

Co-authored-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/uniform_int_dist.h (uniform_int_distribution::_S_nd):
New member function implementing Lemire's "nearly divisionless"
algorithm.
(uniform_int_distribution::operator()): Use _S_nd when the range
of the URBG is the full width of the result type.

diff --git a/libstdc++-v3/include/bits/uniform_int_dist.h b/libstdc++-v3/include/bits/uniform_int_dist.h
index 6e1e3d5fc5f..ecb8574864a 100644
--- a/libstdc++-v3/include/bits/uniform_int_dist.h
+++ b/libstdc++-v3/include/bits/uniform_int_dist.h
@@ -234,6 +234,34 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 			const param_type& __p);
 
   param_type _M_param;
+
+  // Lemire's nearly divisionless algorithm.
+  // Returns an unbiased random number from __g downscaled to [0,__range)
+  // using an unsigned type _Wp twice as wide as unsigned type _Up.
+  template<typename _Wp, typename _Urbg, typename _Up>
+	static _Up
+	_S_nd(_Urbg& __g, _Up __range)
+	{
+	  using __gnu_cxx::__int_traits;
+	  static_assert(!__int_traits<_Up>::__is_signed, "U must be unsigned");
+	  static_assert(!__int_traits<_Wp>::__is_signed, "W must be unsigned");
+
+	  // reference: Fast Random Integer Generation in an Interval
+	  // ACM Transactions on Modeling and Computer Simulation 29 (1), 2019
+	  // https://arxiv.org/abs/1805.10941
+	  _Wp __product = _Wp(__g()) * _Wp(__range);
+	  _Up __low = _Up(__product);
+	  if (__low < __range)
+	{
+	  _Up __threshold = -__range % __range;
+	  while (__low < __threshold)
+		{
+		  __product = _Wp(__g()) * _Wp(__range);
+		  __low = _Up(__product);
+		}
+	}
+	  return __product >> __gnu_cxx::__int_traits<_Up>::__digits;
+	}
 };
 
   template
@@ -256,17 +284,36 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  = __uctype(__param.b()) - __uctype(__param.a());
 
 	__uctype __ret;
-
 	if (__urngrange > __urange)
 	  {
 	// downscaling
+
 	const __uctype __uerange = __urange + 1; // __urange can be zero
-	const __uctype __scaling = __urngrange / __uerange;
-	const __uctype __past = __uerange * __scaling;
-	do
-	  __ret = __uctype(__urng()) - __urngmin;
-	while (__ret >= __past);
-	__ret /= __scaling;
+
+	using __gnu_cxx::__int_traits;
+#if __SIZEOF_INT128__
+	if (__int_traits<__uctype>::__digits == 64
+		&& __urngrange == __int_traits<__uctype>::__max)
+	  {
+		__ret = _S_nd(__urng, __uerange);
+	  }
+	else
+#endif
+	if (__int_traits<__uctype>::__digits == 32
+		&& __urngrange == __int_traits<__uctype>::__max)
+	  {
+		__ret = _S_nd<__UINT64_TYPE__>(__urng, __uerange);
+	  }
+	else
+	  {
+		// fallback case (2 divisions)
+		const __uctype __scaling = __urngrange / __uerange;
+		const __uctype __past = __uerange * __scaling;
+		do
+		  __ret = __uctype(__urng()) - __urngmin;
+		while (__ret >= __past);
+		__ret /= __scaling;
+	  }
 	  }
 	else if (__urngrange < __urange)
 	  {

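Stripped of the libstdc++ uglified names, the `_S_nd` technique can be sketched as follows (a sketch of the algorithm, not the committed code):

```cpp
#include <cstdint>
#include <random>

// Sketch of Lemire's "nearly divisionless" bounded-integer technique.
// Assumes range >= 1 and a URBG producing the full 32-bit range, as
// std::mt19937 does.  No division on the common fast path; at most one
// % per call on the slow path.
inline std::uint32_t lemire_bounded(std::mt19937& gen, std::uint32_t range)
{
  std::uint64_t product = std::uint64_t(gen()) * range;
  std::uint32_t low = std::uint32_t(product);
  if (low < range)            // slow path, probability < range / 2^32
    {
      std::uint32_t threshold = -range % range;   // 2^32 mod range
      while (low < threshold)                     // reject biased draws
        {
          product = std::uint64_t(gen()) * range;
          low = std::uint32_t(product);
        }
    }
  return product >> 32;       // unbiased result in [0, range)
}
```

Here `range` plays the role of the patch's `__uerange` (upper bound plus one); `_S_nd` additionally abstracts the double-width type `_Wp` so the same code serves 32-bit results via `uint64_t` and, where `__int128` exists, 64-bit results too.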

[RFC][gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p

2020-10-09 Thread Tom de Vries
Hi,

The function gimple_can_duplicate_bb_p currently always returns true.

The presence of can_duplicate_bb_p in tracer.c however suggests that
there are cases when bb's indeed cannot be duplicated.

Move the implementation of can_duplicate_bb_p to gimple_can_duplicate_bb_p.

Bootstrapped and reg-tested on x86_64-linux.

Build x86_64-linux with nvptx accelerator and tested libgomp.

No issues found.

As corner-case check, bootstrapped and reg-tested a patch that makes
gimple_can_duplicate_bb_p always return false, resulting in
PR97333 - "[gimple_can_duplicate_bb_p == false, tree-ssa-threadupdate]
ICE in duplicate_block, at cfghooks.c:1093".

Any comments?

Thanks,
- Tom

[gimple] Move can_duplicate_bb_p to gimple_can_duplicate_bb_p

gcc/ChangeLog:

2020-10-09  Tom de Vries  

* tracer.c (cached_can_duplicate_bb_p): Use can_duplicate_block_p
instead of can_duplicate_bb_p.
(can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p): Move ...
* tree-cfg.c: ... here.
* tracer.c (can_duplicate_bb_p): Move ...
* tree-cfg.c (gimple_can_duplicate_bb_p): ... here.
* tree-cfg.h (can_duplicate_insn_p, can_duplicate_bb_no_insn_iter_p):
Declare.

---
 gcc/tracer.c   | 61 +-
 gcc/tree-cfg.c | 54 ++-
 gcc/tree-cfg.h |  2 ++
 3 files changed, 56 insertions(+), 61 deletions(-)

diff --git a/gcc/tracer.c b/gcc/tracer.c
index e1c2b9527e5..16b46c65b14 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -84,65 +84,6 @@ bb_seen_p (basic_block bb)
   return bitmap_bit_p (bb_seen, bb->index);
 }
 
-/* Return true if gimple stmt G can be duplicated.  */
-static bool
-can_duplicate_insn_p (gimple *g)
-{
-  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
- duplicated as part of its group, or not at all.
- The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
- group, so the same holds there.  */
-  if (is_gimple_call (g)
-  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
-return false;
-
-  return true;
-}
-
-/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
-static bool
-can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
-{
-  if (bb->index < NUM_FIXED_BLOCKS)
-return false;
-
-  if (gimple *g = last_stmt (CONST_CAST_BB (bb)))
-{
-  /* A transaction is a single entry multiple exit region.  It
-must be duplicated in its entirety or not at all.  */
-  if (gimple_code (g) == GIMPLE_TRANSACTION)
-   return false;
-
-  /* An IFN_UNIQUE call must be duplicated as part of its group,
-or not at all.  */
-  if (is_gimple_call (g)
- && gimple_call_internal_p (g)
- && gimple_call_internal_unique_p (g))
-   return false;
-}
-
-  return true;
-}
-
-/* Return true if BB can be duplicated.  */
-static bool
-can_duplicate_bb_p (const_basic_block bb)
-{
-  if (!can_duplicate_bb_no_insn_iter_p (bb))
-return false;
-
-  for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
-   !gsi_end_p (gsi); gsi_next (&gsi))
-if (!can_duplicate_insn_p (gsi_stmt (gsi)))
-  return false;
-
-  return true;
-}
-
 static sbitmap can_duplicate_bb;
 
 /* Cache VAL as value of can_duplicate_bb_p for BB.  */
@@ -167,7 +108,7 @@ cached_can_duplicate_bb_p (const_basic_block bb)
   return false;
 }
 
-  return can_duplicate_bb_p (bb);
+  return can_duplicate_block_p (bb);
 }
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5caf3b62d69..a5677859ffc 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6208,11 +6208,63 @@ gimple_split_block_before_cond_jump (basic_block bb)
 }
 
 
+/* Return true if gimple stmt G can be duplicated.  */
+bool
+can_duplicate_insn_p (gimple *g)
+{
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+ duplicated as part of its group, or not at all.
+ The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
+ group, so the same holds there.  */
+  if (is_gimple_call (g)
+  && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
+ || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)))
+return false;
+
+  return true;
+}
+
+/* Return true if BB can be duplicated.  Avoid iterating over the insns.  */
+bool
+can_duplicate_bb_no_insn_iter_p (const_basic_block bb)
+{
+  if (bb->index < NUM_FIXED_BLOCKS)
+return false;
+
+  if

[PATCH] arc: Improve/add instruction patterns to better use MAC instructions.

2020-10-09 Thread Claudiu Zissulescu via Gcc-patches
From: Claudiu Zissulescu 

ARC MYP7+ instructions add MAC instructions for vector and scalar data
types. This patch adds a madd pattern for 16-bit data that uses the
32-bit MAC instruction, and dot_prod patterns for v4hi vector
types. The 64-bit moves are also upgraded by using the vadd2 instruction.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.c (arc_split_move): Recognize vadd2 instructions.
* config/arc/arc.md (movdi_insn): Update pattern to use vadd2
instructions.
(movdf_insn): Likewise.
(maddhisi4): New pattern.
(umaddhisi4): Likewise.
* config/arc/simdext.md (mov_int): Update pattern to use
vadd2.
(sdot_prodv4hi): New pattern.
(udot_prodv4hi): Likewise.
(arc_vec_mac_hi_v4hi): Update/renamed to
arc_vec_mac_v2hiv2si.
(arc_vec_mac_v2hiv2si_zero): New pattern.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config/arc/arc.c  |  8 
 gcc/config/arc/arc.md | 71 ---
 gcc/config/arc/constraints.md |  5 ++
 gcc/config/arc/simdext.md | 90 +++
 4 files changed, 147 insertions(+), 27 deletions(-)
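
As a hedged model of what the new expanders compute (independent of the ARC instruction encoding): widen two HImode operands, multiply, and accumulate into an SImode value, i.e. one 32-bit MAC step.

```cpp
#include <cstdint>

// Model of the maddhisi4 expander: sign-extend two 16-bit operands,
// multiply, and add the 32-bit accumulator (the MAC instruction's
// low-accumulator behaviour).
inline std::int32_t maddhisi4_model(std::int16_t a, std::int16_t b, std::int32_t acc)
{
  return std::int32_t(a) * std::int32_t(b) + acc;
}

// The unsigned variant, matching the MACU-based umaddhisi4 expander.
inline std::uint32_t umaddhisi4_model(std::uint16_t a, std::uint16_t b, std::uint32_t acc)
{
  return std::uint32_t(a) * std::uint32_t(b) + acc;
}
```

The dot_prod patterns do the analogous thing element-wise over v4hi vectors, accumulating pairs of 16x16 products into 32-bit lanes.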

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index ec55cfde87a9..d5b521e75e67 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -10202,6 +10202,14 @@ arc_split_move (rtx *operands)
   return;
 }
 
+  if (TARGET_PLUS_QMACW
+  && even_register_operand (operands[0], mode)
+  && even_register_operand (operands[1], mode))
+{
+  emit_move_insn (operands[0], operands[1]);
+  return;
+}
+
   if (TARGET_PLUS_QMACW
   && GET_CODE (operands[1]) == CONST_VECTOR)
 {
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index f9fc11e51a85..1720e8cd2f6f 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -1345,8 +1345,8 @@ archs4x, archs4xd"
   ")
 
 (define_insn_and_split "*movdi_insn"
-  [(set (match_operand:DI 0 "move_dest_operand"  "=w, w,r,   m")
-   (match_operand:DI 1 "move_double_src_operand" "c,Hi,m,cCm3"))]
+  [(set (match_operand:DI 0 "move_dest_operand"  "=r, r,r,   m")
+   (match_operand:DI 1 "move_double_src_operand" "r,Hi,m,rCm3"))]
   "register_operand (operands[0], DImode)
|| register_operand (operands[1], DImode)
|| (satisfies_constraint_Cm3 (operands[1])
@@ -1358,6 +1358,13 @@ archs4x, archs4xd"
 default:
   return \"#\";
 
+case 0:
+if (TARGET_PLUS_QMACW
+   && even_register_operand (operands[0], DImode)
+   && even_register_operand (operands[1], DImode))
+  return \"vadd2\\t%0,%1,0\";
+return \"#\";
+
 case 2:
 if (TARGET_LL64
 && memory_operand (operands[1], DImode)
@@ -1374,7 +1381,7 @@ archs4x, archs4xd"
 return \"#\";
 }
 }"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
arc_split_move (operands);
@@ -1420,15 +1427,24 @@ archs4x, archs4xd"
   "if (prepare_move_operands (operands, DFmode)) DONE;")
 
 (define_insn_and_split "*movdf_insn"
-  [(set (match_operand:DF 0 "move_dest_operand"  "=D,r,c,c,r,m")
-   (match_operand:DF 1 "move_double_src_operand" "r,D,c,E,m,c"))]
-  "register_operand (operands[0], DFmode) || register_operand (operands[1], DFmode)"
+  [(set (match_operand:DF 0 "move_dest_operand"  "=D,r,r,r,r,m")
+   (match_operand:DF 1 "move_double_src_operand" "r,D,r,E,m,r"))]
+  "register_operand (operands[0], DFmode)
+   || register_operand (operands[1], DFmode)"
   "*
 {
  switch (which_alternative)
{
 default:
   return \"#\";
+
+case 2:
+if (TARGET_PLUS_QMACW
+   && even_register_operand (operands[0], DFmode)
+   && even_register_operand (operands[1], DFmode))
+  return \"vadd2\\t%0,%1,0\";
+return \"#\";
+
 case 4:
 if (TARGET_LL64
&& ((even_register_operand (operands[0], DFmode)
@@ -6177,6 +6193,49 @@ archs4x, archs4xd"
   [(set_attr "length" "0")])
 
 ;; MAC and DMPY instructions
+
+; Use MAC instruction to emulate 16bit mac.
+(define_expand "maddhisi4"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:HI 1 "register_operand" "")
+   (match_operand:HI 2 "extend_operand"   "")
+   (match_operand:SI 3 "register_operand" "")]
+  "TARGET_PLUS_DMPY"
+  "{
+   rtx acc_reg = gen_rtx_REG (DImode, ACC_REG_FIRST);
+   rtx tmp1 = gen_reg_rtx (SImode);
+   rtx tmp2 = gen_reg_rtx (SImode);
+   rtx accl = gen_lowpart (SImode, acc_reg);
+
+   emit_move_insn (accl, operands[3]);
+   emit_insn (gen_rtx_SET (tmp1, gen_rtx_SIGN_EXTEND (SImode, operands[1])));
+   emit_insn (gen_rtx_SET (tmp2, gen_rtx_SIGN_EXTEND (SImode, operands[2])));
+   emit_insn (gen_mac (tmp1, tmp2));
+   emit_move_insn (operands[0], accl);
+   DONE;
+  }")
+
+; The same for the unsigned variant, but using MACU instruction.
+(define_expand "umaddhisi4"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:HI 1 "register_operand" "")
+   (match_operand:HI 2 "extend_operand"  

[PUSHED] Patch to fix a LRA ICE [PR 97313]

2020-10-09 Thread Vladimir Makarov via Gcc-patches

The following patch has been committed into the main line.  The patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97313

The patch was successfully bootstrapped and tested on x86-64.


gcc/ChangeLog:

2020-10-09  Vladimir Makarov  

	PR rtl-optimization/97313
	* lra-constraints.c (match_reload): Don't keep strict_low_part in
	reloads for non-registers.

gcc/testsuite/ChangeLog:

2020-10-09  Vladimir Makarov  

	PR rtl-optimization/97313
	* gcc.target/i386/pr97313.c: New.
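
For context on the fix (a simplified model, not GCC's RTL): a `(strict_low_part ...)` destination writes only the low part of a register and must leave the remaining bits untouched, roughly:

```cpp
#include <cstdint>

// Model of strict_low_part semantics: store a 16-bit value into the low
// half of a 32-bit "register" while preserving the upper bits.  This
// read-modify-write behaviour only makes sense for registers, which is
// why the patch drops STRICT_LOW_PART from reloads of non-registers.
inline std::uint32_t set_low_half(std::uint32_t reg, std::uint16_t low)
{
  return (reg & 0xffff0000u) | low;
}
```

A memory destination has no "other bits of the same register" to preserve, so keeping the wrapper there produced a reload insn no pattern could match, hence the ICE.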

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 301c912cb21..f761d7dfe3c 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1132,8 +1132,13 @@ match_reload (signed char out, signed char *ins, signed char *outs,
   narrow_reload_pseudo_class (out_rtx, goal_class);
   if (find_reg_note (curr_insn, REG_UNUSED, out_rtx) == NULL_RTX)
 {
+  reg = SUBREG_P (out_rtx) ? SUBREG_REG (out_rtx) : out_rtx;
   start_sequence ();
-  if (out >= 0 && curr_static_id->operand[out].strict_low)
+  /* If we had strict_low_part, use it also in reload to keep other
+	 parts unchanged but do it only for regs as strict_low_part
+	 has no sense for memory and probably there is no insn pattern
+	 to match the reload insn in memory case.  */
+  if (out >= 0 && curr_static_id->operand[out].strict_low && REG_P (reg))
 	out_rtx = gen_rtx_STRICT_LOW_PART (VOIDmode, out_rtx);
   lra_emit_move (out_rtx, copy_rtx (new_out_reg));
   emit_insn (*after);
diff --git a/gcc/testsuite/gcc.target/i386/pr97313.c b/gcc/testsuite/gcc.target/i386/pr97313.c
new file mode 100644
index 000..ef93cf1cca8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr97313.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIE" } */
+
+typedef struct {
+  int unspecified : 1;
+  int secure : 1;
+} MemTxAttrs;
+
+enum { MSCAllowNonSecure } tz_msc_read_pdata;
+
+int tz_msc_read_s_0;
+int tz_msc_check();
+int address_space_ldl_le();
+
+void tz_msc_read(MemTxAttrs attrs) {
+  int as = tz_msc_read_s_0;
+  long long data;
+  switch (tz_msc_check()) {
+  case MSCAllowNonSecure:
+attrs.secure = attrs.unspecified = 0;
+data = address_space_ldl_le(as, attrs);
+  }
+  tz_msc_read_pdata = data;
+}


Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-10-09 Thread Martin Sebor via Gcc-patches

On 10/8/20 1:40 PM, Jason Merrill wrote:

On 10/8/20 3:18 PM, Martin Sebor wrote:

On 10/7/20 3:01 PM, Jason Merrill wrote:

On 10/7/20 4:11 PM, Martin Sebor wrote:

...

For the various member functions, please include the comments 
with the definition as well as the in-class declaration.


Only one access_ref member function is defined out-of-line: 
offset_bounded().  I've adjusted the comment and copied it above

the function definition.


And size_remaining, as quoted above?


I have this in my tree:

/* Return the maximum amount of space remaining and if non-null, set
    argument to the minimum.  */

I'll add it when I commit the patch.



I also don't see a comment above the definition of offset_bounded in 
the new patch?


There is a comment in the latest patch.

...

The goal of conditionals is to avoid overwhelming the user with
excessive numbers that may not be meaningful or even relevant
to the warning.  I've corrected the function body, tweaked and
renamed the get_range function to get_offset_range to do a better
job of extracting ranges from the types of some nonconstant
expressions the front end passes it, and added a new test for
all this.  Attached is the new revision.


offset_bounded looks unchanged in the new patch.  It still returns 
true iff either the range is a single value or one of the bounds are 
unrepresentable in ptrdiff_t.  I'm still unclear how this corresponds 
to "Return true if OFFRNG is bounded to a subrange of possible offset 
values."


I don't think you're looking at the latest patch.  It has this:

+/* Return true if OFFRNG is bounded to a subrange of offset values
+   valid for the largest possible object.  */
+
  bool
  access_ref::offset_bounded () const
  {
-  if (offrng[0] == offrng[1])
-    return false;
-
    tree min = TYPE_MIN_VALUE (ptrdiff_type_node);
    tree max = TYPE_MAX_VALUE (ptrdiff_type_node);
-  return offrng[0] <= wi::to_offset (min) || offrng[1] >= wi::to_offset (max);
+  return wi::to_offset (min) <= offrng[0] && offrng[1] <= wi::to_offset (max);

  }

Here's a link to it in the archive:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555019.html
https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200928/9026783a/attachment-0003.bin 



Ah, yes, there are two patches in that email; the first introduces the 
broken offset_bounded, and the second one fixes it without mentioning 
that in the ChangeLog.  How about moving the fix to the first patch?


Sure, I can do that.  Anything else or is the final version okay
to commit with this adjustment?

Martin
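
In plain C++ terms, the corrected predicate asks whether the whole offset range fits in ptrdiff_t; a model using `int32_t` as a stand-in for a 32-bit ptrdiff_t (GCC's `offset_int` is wider than the target's ptrdiff_t, so both comparisons are meaningful):

```cpp
#include <cstdint>
#include <limits>

// Model of the corrected access_ref::offset_bounded: true iff the
// offset range [lo, hi] is entirely representable in ptrdiff_t.
// int32_t stands in for ptrdiff_t on an ILP32 target.
inline bool offset_bounded_model(long long lo, long long hi)
{
  const long long min = std::numeric_limits<std::int32_t>::min();
  const long long max = std::numeric_limits<std::int32_t>::max();
  return min <= lo && hi <= max;
}
```

The broken version returned true when a bound *escaped* the representable range (or when the range was a single value), which is the inversion the follow-up patch fixes.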




Jason





Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

2020-10-09 Thread Jason Merrill via Gcc-patches

On 10/9/20 10:51 AM, Martin Sebor wrote:

On 10/8/20 1:40 PM, Jason Merrill wrote:

On 10/8/20 3:18 PM, Martin Sebor wrote:

On 10/7/20 3:01 PM, Jason Merrill wrote:

On 10/7/20 4:11 PM, Martin Sebor wrote:

...

For the various member functions, please include the 
comments with the definition as well as the in-class 
declaration.


Only one access_ref member function is defined out-of-line: 
offset_bounded().  I've adjusted the comment and copied it above

the function definition.


And size_remaining, as quoted above?


I have this in my tree:

/* Return the maximum amount of space remaining and if non-null, set
    argument to the minimum.  */

I'll add it when I commit the patch.



I also don't see a comment above the definition of offset_bounded in 
the new patch?


There is a comment in the latest patch.

...

The goal of conditionals is to avoid overwhelming the user with
excessive numbers that may not be meaningful or even relevant
to the warning.  I've corrected the function body, tweaked and
renamed the get_range function to get_offset_range to do a better
job of extracting ranges from the types of some nonconstant
expressions the front end passes it, and added a new test for
all this.  Attached is the new revision.


offset_bounded looks unchanged in the new patch.  It still returns 
true iff either the range is a single value or one of the bounds is 
unrepresentable in ptrdiff_t.  I'm still unclear how this 
corresponds to "Return true if OFFRNG is bounded to a subrange of 
possible offset values."


I don't think you're looking at the latest patch.  It has this:

+/* Return true if OFFRNG is bounded to a subrange of offset values
+   valid for the largest possible object.  */
+
  bool
  access_ref::offset_bounded () const
  {
-  if (offrng[0] == offrng[1])
-    return false;
-
    tree min = TYPE_MIN_VALUE (ptrdiff_type_node);
    tree max = TYPE_MAX_VALUE (ptrdiff_type_node);
-  return offrng[0] <= wi::to_offset (min) || offrng[1] >= wi::to_offset (max);
+  return wi::to_offset (min) <= offrng[0] && offrng[1] <= wi::to_offset (max);

  }
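For clarity, the behavioral change can be modeled in isolation.  The sketch below uses hypothetical names and narrows the representable ptrdiff_t range to 32 bits inside 64-bit offsets so the bounds actually matter; it is an illustration of the predicate's semantics, not GCC code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: offsets are 64-bit, and the representable
// ptrdiff_t range is modeled as [INT32_MIN, INT32_MAX].
constexpr int64_t ptrdiff_min = INT32_MIN;
constexpr int64_t ptrdiff_max = INT32_MAX;

// Old predicate: false for a singleton range, true when a bound
// escapes the representable range.
constexpr bool offset_bounded_old(int64_t lo, int64_t hi)
{
  if (lo == hi)
    return false;
  return lo <= ptrdiff_min || hi >= ptrdiff_max;
}

// New predicate: true iff the whole range fits in the representable
// range, matching the comment "bounded to a subrange of offset values
// valid for the largest possible object".
constexpr bool offset_bounded_new(int64_t lo, int64_t hi)
{
  return ptrdiff_min <= lo && hi <= ptrdiff_max;
}
```

Note the two predicates answer nearly opposite questions, which is why the comment only matches the patched version.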

Here's a link to it in the archive:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555019.html
https://gcc.gnu.org/pipermail/gcc-patches/attachments/20200928/9026783a/attachment-0003.bin 




Ah, yes, there are two patches in that email; the first introduces the 
broken offset_bounded, and the second one fixes it without mentioning 
that in the ChangeLog.  How about moving the fix to the first patch?


Sure, I can do that.  Anything else or is the final version okay
to commit with this adjustment?


OK with that adjustment.

Jason



Re: [PATCH] c++: Distinguish btw. alignof and __alignof__ in cp_tree_equal [PR97273]

2020-10-09 Thread Jason Merrill via Gcc-patches

On 10/9/20 4:48 AM, Jakub Jelinek wrote:

On Tue, Oct 06, 2020 at 03:40:52PM -0400, Jason Merrill via Gcc-patches wrote:

On 10/4/20 11:28 PM, Patrick Palka wrote:

cp_tree_equal currently considers alignof the same as __alignof__, but
these operators are semantically different ever since r8-7957.  In the
testcase below, this causes the second static_assert to fail on targets
where alignof(double) != __alignof__(double) because the specialization
cache (which uses cp_tree_equal as the equality predicate) conflates the
two dependent specializations integral_constant<__alignof__(T)> and
integral_constant<alignof(T)>.

This patch makes cp_tree_equal distinguish between these two operators
by inspecting the ALIGNOF_EXPR_STD_P flag.

Bootstrapped and regtested on x86_64-pc-linux-gnu, and also verified
that we now correctly compile the  PR97273 testcase, does this look OK
for trunk and the release branches?


OK.


Shouldn't we then mangle alignof and __alignof__ differently though?


Good point.  Then I guess __alignof__ should be mangled as v111__alignof__

Jason



[PATCH v2] IBM Z: Change vector copysign to use bitwise operations

2020-10-09 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  OK for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555782.html
v1 -> v2: Use related_int_vector_mode.



The vector copysign pattern incorrectly assumes that vector
if_then_else operates on bits, not on elements.  This can theoretically
mislead the optimizers.  Fix by changing it to use bitwise operations,
like commit 2930bb321794 ("PR94613: Fix vec_sel builtin for IBM Z") did
for vec_sel builtin.
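The element-wise operation the pattern now expands to can be sketched in scalar form.  This is an illustrative model of the bitwise selection (the helper name `copysign_bits` and the scalar framing are mine, not the s390 implementation):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of bitwise copysign: keep every bit of X except the
// sign, take the sign bit from Y, using a mask with only the sign bit
// set (what s390_build_signbit_mask builds per vector element).
static float copysign_bits(float x, float y)
{
  uint32_t xi, yi;
  std::memcpy(&xi, &x, sizeof xi);
  std::memcpy(&yi, &y, sizeof yi);
  const uint32_t signbit_mask = UINT32_C(1) << 31;
  const uint32_t ri = (xi & ~signbit_mask) | (yi & signbit_mask);
  float r;
  std::memcpy(&r, &ri, sizeof r);
  return r;
}
```

Expressing the pattern this way, per element, is what lets the optimizers reason about it as plain bit operations rather than an element-wise if_then_else.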

gcc/ChangeLog:

2020-10-07  Ilya Leoshkevich  

* config/s390/s390-protos.h (s390_build_signbit_mask): New
function.
* config/s390/s390.c (s390_contiguous_bitmask_vector_p):
Bitcast the argument to an integral mode.
(s390_expand_vec_init): Do not call
s390_contiguous_bitmask_vector_p with a scalar argument.
(s390_build_signbit_mask): New function.
* config/s390/vector.md (copysign<mode>3): Use bitwise
operations.
---
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.c| 44 ---
 gcc/config/s390/vector.md | 28 +++---
 3 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 6f1bc07db17..029f7289fac 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -121,6 +121,7 @@ extern void s390_expand_vec_compare_cc (rtx, enum rtx_code, rtx, rtx, bool);
 extern enum rtx_code s390_reverse_condition (machine_mode, enum rtx_code);
 extern void s390_expand_vcond (rtx, rtx, rtx, enum rtx_code, rtx, rtx);
 extern void s390_expand_vec_init (rtx, rtx);
+extern rtx s390_build_signbit_mask (machine_mode);
 extern rtx s390_return_addr_rtx (int, rtx);
 extern rtx s390_back_chain_rtx (void);
 extern rtx_insn *s390_emit_call (rtx, rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 93894307d62..dbb541bbea7 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2467,6 +2467,9 @@ s390_contiguous_bitmask_vector_p (rtx op, int *start, int *end)
   rtx elt;
   bool b;
 
+  /* Handle floats by bitcasting them to ints.  */
+  op = gen_lowpart (related_int_vector_mode (GET_MODE (op)).require (), op);
+
   gcc_assert (!!start == !!end);
   if (!const_vec_duplicate_p (op, &elt)
   || !CONST_INT_P (elt))
@@ -6863,15 +6866,16 @@ s390_expand_vec_init (rtx target, rtx vals)
 }
 
   /* Use vector gen mask or vector gen byte mask if possible.  */
-  if (all_same && all_const_int
-  && (XVECEXP (vals, 0, 0) == const0_rtx
- || s390_contiguous_bitmask_vector_p (XVECEXP (vals, 0, 0),
-  NULL, NULL)
- || s390_bytemask_vector_p (XVECEXP (vals, 0, 0), NULL)))
+  if (all_same && all_const_int)
 {
-  emit_insn (gen_rtx_SET (target,
- gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0))));
-  return;
+  rtx vec = gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0));
+  if (XVECEXP (vals, 0, 0) == const0_rtx
+ || s390_contiguous_bitmask_vector_p (vec, NULL, NULL)
+ || s390_bytemask_vector_p (vec, NULL))
+   {
+ emit_insn (gen_rtx_SET (target, vec));
+ return;
+   }
 }
 
   /* Use vector replicate instructions.  vlrep/vrepi/vrep  */
@@ -6949,6 +6953,30 @@ s390_expand_vec_init (rtx target, rtx vals)
 }
 }
 
+/* Emit a vector constant that contains 1s in each element's sign bit position
+   and 0s in other positions.  MODE is the desired constant's mode.  */
+extern rtx
+s390_build_signbit_mask (machine_mode mode)
+{
+  /* Generate the integral element mask value.  */
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+  int inner_bitsize = GET_MODE_BITSIZE (inner_mode);
+  wide_int mask_val = wi::set_bit_in_zero (inner_bitsize - 1, inner_bitsize);
+
+  /* Emit the element mask rtx.  Use gen_lowpart in order to cast the integral
+ value to the desired mode.  */
+  machine_mode int_mode = related_int_vector_mode (mode).require ();
+  rtx mask = immed_wide_int_const (mask_val, GET_MODE_INNER (int_mode));
+  mask = gen_lowpart (inner_mode, mask);
+
+  /* Emit the vector mask rtx by replicating the element mask rtx.  */
+  int nunits = GET_MODE_NUNITS (mode);
+  rtvec v = rtvec_alloc (nunits);
+  for (int i = 0; i < nunits; i++)
+RTVEC_ELT (v, i) = mask;
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
+
 /* Structure to hold the initial parameters for a compare_and_swap operation
in HImode and QImode.  */
 
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 2573b7d980a..e9332bad0fd 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1425,28 +1425,16 @@ (define_insn_and_split "*sminv2df3_vx"
 
 ; Vector copysign, implement using vector select
(define_expand "copysign<mode>3"
-  [(set (match_operand:VFT 0 "register_operand" "")
-   (if_then_else:VFT
-(eq (match_dup 3)
-(match_dup 4))
-(match_operan

[committed] libstdc++: Fix incorrect results in std::seed_seq::generate [PR 97311]

2020-10-09 Thread Jonathan Wakely via Gcc-patches
This ensures that intermediate results are done in uint32_t values,
meeting the requirement for operations to be done modulo 2^32.

If the target doesn't define __UINT32_TYPE__ then substitute uint32_t
with a class type that uses uint_least32_t and masks the value to
UINT32_MAX.
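The wraparound requirement is easy to see in isolation.  A small sketch (hypothetical helper names) of why the intermediate type matters:

```cpp
#include <cassert>
#include <cstdint>

// seed_seq::generate must do its arithmetic modulo 2^32.  Unsigned
// 32-bit addition wraps at 2^32 by definition; a wider intermediate
// type does not, which is the bug this change fixes for result types
// wider than 32 bits.
constexpr uint32_t add_mod32(uint32_t a, uint32_t b)
{
  return a + b;  // wraps modulo 2^32
}

constexpr uint64_t add_wide(uint64_t a, uint64_t b)
{
  return a + b;  // keeps the carry past bit 31
}
```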

I've also split the first loop that goes from k=0 to k<m into three loops.

commit 3ee44d4c518d61c6bbf75fcf280edc6ce5326ce0
Author: Jonathan Wakely 
Date:   Fri Oct 9 16:10:31 2020

libstdc++: Fix incorrect results in std::seed_seq::generate [PR 97311]

This ensures that intermediate results are done in uint32_t values,
meeting the requirement for operations to be done modulo 2^32.

If the target doesn't define __UINT32_TYPE__ then substitute uint32_t
with a class type that uses uint_least32_t and masks the value to
UINT32_MAX.

I've also split the first loop that goes from k=0 to k<m into three loops.

- _Type __arg = (__begin[__k % __n]
-^ __begin[(__k + __p) % __n]
-^ __begin[(__k - 1) % __n]);
- _Type __r1 = __arg ^ (__arg >> 27);
- __r1 = __detail::__mod<_Type,
-   __detail::_Shift<_Type, 32>::__value>(1664525u * __r1);
- _Type __r2 = __r1;
- if (__k == 0)
-   __r2 += __s;
- else if (__k <= __s)
-   __r2 += __k % __n + _M_v[__k - 1];
- else
-   __r2 += __k % __n;
- __r2 = __detail::__mod<_Type,
-  __detail::_Shift<_Type, 32>::__value>(__r2);
- __begin[(__k + __p) % __n] += __r1;
- __begin[(__k + __q) % __n] += __r2;
- __begin[__k % __n] = __r2;
+ uint32_t __r1 = 1371501266u;
+ uint32_t __r2 = __r1 + __s;
+ __begin[__p] += __r1;
+ __begin[__q] = (uint32_t)__begin[__q] + __r2;
+ __begin[0] = __r2;
+   }
+
+  for (size_t __k = 1; __k <= __s; ++__k)
+   {
+ const size_t __kn = __k % __n;
+ const size_t __kpn = (__k + __p) % __n;
+ const size_t __kqn = (__k + __q) % __n;
+ uint32_t __arg = (__begin[__kn]
+   ^ __begin[__kpn]
+   ^ __begin[(__k - 1) % __n]);
+ uint32_t __r1 = 1664525u * (__arg ^ (__arg >> 27));
+ uint32_t __r2 = __r1 + (uint32_t)__kn + _M_v[__k - 1];
+ __begin[__kpn] = (uint32_t)__begin[__kpn] + __r1;
+ __begin[__kqn] = (uint32_t)__begin[__kqn] + __r2;
+ __begin[__kn] = __r2;
+   }
+
+  for (size_t __k = __s + 1; __k < __m; ++__k)
+   {
+ const size_t __kn = __k % __n;
+ const size_t __kpn = (__k + __p) % __n;
+ const size_t __kqn = (__k + __q) % __n;
+ uint32_t __arg = (__begin[__kn]
+^ __begin[__kpn]
+^ __begin[(__k - 1) % __n]);
+ uint32_t __r1 = 1664525u * (__arg ^ (__arg >> 27));
+ uint32_t __r2 = __r1 + (uint32_t)__kn;
+ __begin[__kpn] = (uint32_t)__begin[__kpn] + __r1;
+ __begin[__kqn] = (uint32_t)__begin[__kqn] + __r2;
+ __begin[__kn] = __r2;
}
 
   for (size_t __k = __m; __k < __m + __n; ++__k)
{
- _Type __arg = (__begin[__k % __n]
-+ __begin[(__k + __p) % __n]
-+ __begin[(__k - 1) % __n]);
- _Type __r3 = __arg ^ (__arg >> 27);
- __r3 = __detail::__mod<_Type,
-  __detail::_Shift<_Type, 32>::__value>(1566083941u * __r3);
- _Type __r4 = __r3 - __k % __n;
- __r4 = __detail::__mod<_Type,
-  __detail::_Shift<_Type, 32>::__value>(__r4);
- __begin[(__k + __p) % __n] ^= __r3;
- __begin[(__k + __q) % __n] ^= __r4;
- __begin[__k % __n] = __r4;
+ const size_t __kn = __k % __n;
+ const size_t __kpn = (__k + __p) % __n;
+ const size_t __kqn = (__k + __q) % __n;
+ uint32_t __arg = (__begin[__kn]
+   + __begin[__kpn]
+   + __begin[(__k - 1) % __n]);
+ uint32_t __r3 = 1566083941u * (__arg ^ (__arg >> 27));
+ uint32_t __r4 = __r3 - __kn;
+ __begin[__kpn] ^= __r3;
+ __begin[__kqn] ^= __r4;
+ __begin[__kn] = __r4;
}
 }
 
diff --git a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc 
b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
index 9cffc3d06f9..0b5f597040b 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
@@ -12,4 +12,4 @@ auto x = std::generate_canonical
+// see <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  // PR libstdc++/97311
+
+  using i64 = std::int_least64_t; // can hold all values of uint32_t
+  std::vector<i64> v(10);
+  std::seed_seq s;
+  s.generate(v.begin(), v.end());
+
+  const std::vector<i64> expected{
+0xbc199682,
+0x7a094407,
+0xac05bf42,
+0x10baa2f4,
+0x822d6fde,
+0xf08cdc22,
+0x30382aee,
+0xbd5fb4aa,
+0xb26c5a35,
+0xb9619724
+  };
+  VERIFY( v == expected );
+}
+
+int
+main()
+{
+  test01();
+}


[PATCH] libstdc++: Simplify metaprogramming in

2020-10-09 Thread Jonathan Wakely via Gcc-patches
This removes the __detail::_Shift class template, replacing it with a
constexpr function template __pow2m1. Instead of using the _Mod class
template to calculate a modulus just perform a bitwise AND with the
result of __pow2m1. This works because the places that change all
perform a modulus operation with a power of two, x mod 2^w, which can be
replaced with x & (2^w - 1).
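The identity, and the shift trick __pow2m1 relies on to avoid shifting a 1 past the type's width, can be sketched as follows (free functions mirroring the patch's idea, not the library code itself):

```cpp
#include <cassert>
#include <cstdint>

// 2^w - 1 as an all-ones mask of width w.  Computing it as a right
// shift of ~0 stays well-defined even for w equal to the type's full
// width, where (T(1) << w) would be undefined behavior.
constexpr uint32_t pow2m1(unsigned w)
{
  return ~uint32_t{0} >> (32 - w);  // assumes 1 <= w <= 32
}

// x mod 2^w == x & (2^w - 1) for unsigned x.
constexpr uint32_t mod_pow2(uint32_t x, unsigned w)
{
  return x & pow2m1(w);
}
```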

The _Mod class template is still needed by linear_congruential_engine
which needs to calculate (a * x + c) % m without overflow.
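The reason _Mod must stay: for a 64-bit modulus the product a * x overflows the result type, so the naive expression computes the wrong residue.  A sketch of the requirement using a double-width intermediate (GCC/Clang's unsigned __int128; the helper name is illustrative, not the library's):

```cpp
#include <cassert>
#include <cstdint>

// One LCG step computed with a 128-bit intermediate so a * x + c
// cannot overflow before the reduction mod m.
constexpr uint64_t lcg_step(uint64_t a, uint64_t x, uint64_t c,
                            uint64_t m)
{
  return static_cast<uint64_t>(((unsigned __int128)a * x + c) % m);
}
```

A bitwise AND only works when m is a power of two, which is exactly the case the patch restricts itself to.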

I'm not committing this yet, please review and check I've not broken
anything.


commit 707145dfc0a034df1be027080c2f7c9dfe314c2b
Author: Jonathan Wakely 
Date:   Fri Oct 9 18:06:00 2020

libstdc++: Simplify metaprogramming in 

This removes the __detail::_Shift class template, replacing it with a
constexpr function template __pow2m1. Instead of using the _Mod class
template to calculate a modulus just perform a bitwise AND with the
result of __pow2m1. This works because the places that change all
perform a modulus operation with a power of two, x mod 2^w, which can be
replaced with x & (2^w - 1).

The _Mod class template is still needed by linear_congruential_engine
which needs to calculate (a * x + c) % m without overflow.

libstdc++-v3/ChangeLog:

* include/bits/random.h (__detail::_Shift): Remove.
(__detail::_Select_uint_least_t<__s, 1>): Allow __int128 to be
used when supported by the compiler, even if _GLIBCXX_USE_INT128
is not defined.
(__pow2m1): New constexpr function template for 2^w - 1.
(__detail::__mod): Remove.
(_Adaptor::min(), _Adaptor::max()): Add constexpr.
(linear_congruential_engine::operator()): Use _Mod::__calc
directly instead of __mod.
(mersenne_twister_engine): Assert 2u < w. Use max() in
assertions.
(mersenne_twister_engine::max()): Use __pow2m1.
(subtract_with_carry_engine::max()): Likewise.
(independent_bits_engine::max()): Likewise.
(seed_seq::seed_seq(initializer_list)): Define inline,
using constructor delegation.
* include/bits/random.tcc (__detail::_Mod<>::__calc): Add
constexpr.
(linear_congruential_engine::seed(result_type)): Replace uses
of __mod function with explicit % operations.
(linear_congruential_engine::seed(Sseq&)): Remove factor
variable and replace multiplications by shifts.
(mersenne_twister_engine::seed(result_type)): Replace uses of
__mod and _Shift by % and & operations.
(mersenne_twister_engine::seed(Sseq&)): Likewise. Replace
multiplications by shifts.
(subtract_with_carry_engine::seed(result_type)): Likewise.
(subtract_with_carry_engine::seed(Sseq&)): Likewise.
(subtract_with_carry_engine::operator()()): Replace _Shift with
__pow2m1.
(seed_seq::seed_seq(initializer_list)): Remove
out-of-line definition.
(seed_seq::seed_seq(InputIterator, InputIterator)): Replace
__mod and _Shift by bitwise AND.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.

diff --git a/libstdc++-v3/include/bits/random.h 
b/libstdc++-v3/include/bits/random.h
index 0be1191e07d..4be1819d465 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -65,16 +65,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   namespace __detail
   {
-template<typename _UIntType, size_t __w,
-	 bool = __w < static_cast<size_t>
-		      (std::numeric_limits<_UIntType>::digits)>
-  struct _Shift
-  { static const _UIntType __value = 0; };
-
-template<typename _UIntType, size_t __w>
-  struct _Shift<_UIntType, __w, true>
-  { static const _UIntType __value = _UIntType(1) << __w; };
-
 template<int __s>
   struct _Select_uint_least_t<__s, 2>
   { typedef unsigned long long type; };
 
-#ifdef _GLIBCXX_USE_INT128
+#ifdef __SIZEOF_INT128__
 template<int __s>
   struct _Select_uint_least_t<__s, 1>
   { typedef unsigned __int128 type; };
 #endif
 
+// `_Mod::__calc(x)` returns `(a x + c) mod m`.
 // Assume a != 0, a < m, c < m, x < m.
-template<typename _Tp, _Tp __m, _Tp __a = 1, _Tp __c = 0>
-  inline _Tp
-  __mod(_Tp __x)
+// Returns `static_cast<_Tp>(pow(2, w) - 1)`
+template<typename _Tp>
+  constexpr _Tp
+  __pow2m1(_Tp __w)
   {
-   if _GLIBCXX17_CONSTEXPR (__a == 0)
- return __c;
-   else
- {
-   // _Mod must not be instantiated with a == 0
-   constexpr _Tp __a1 = __a ? __a : 1;
-   return _Mod<_Tp, __m, __a1, __c>::__calc(__x);
- }
+   static_assert(!numeric_limits<_Tp>::is_signed,
+ "type must be unsigned");
+
+#if __cplusplus >= 201402L
+   if (__w > numeric_limits<_Tp>::digits)
+ __builtin_abort();
+#endif
+
+   return ~_Tp(0) >> (numeric_limits<_Tp>::digits - __w);
   }
 
 /*
@@ -171

Re: Problem with static const objects and LTO

2020-10-09 Thread Jeff Law via Gcc-patches


On 10/7/20 5:12 PM, H.J. Lu wrote:
> On Wed, Oct 7, 2020 at 3:09 PM Jeff Law via Gcc-patches
>  wrote:
>> Adding the testcase...
>>
>> On 10/7/20 4:08 PM, Jeff Law wrote:
>>> On 9/17/20 1:03 PM, Jakub Jelinek wrote:
>>> [ ... Big snip, starting over ... ]
>>>
>>> I may have not explained things too well.  So I've put together a small
>>> example that folks can play with to show the underlying issue.
>>>
>>>
>>> There's a static library libfu.a.  In this static library we have a hunk
>>> of local static data (utf8_sb_map) and two functions.  One function puts
>>> the address utf8_sb_map into a structure (rpl_regcomp), the other
>>> verifies that the address stored in the structure is the same as the
>>> address of utf8_sb_map (rpl_regfree).
>>>
>>>
>>> That static library is linked into DSO libdso.so.  The DSO sources
>>> define a single function xregcomp which calls rpl_regcomp, but
>>> references  nothing else from the static library.  Since libfu.a was
>>> linked into the library we actually get a *copy* of rpl_regcomp in the
>>> DSO.  In fact, we get a copy of the entire .o file from libfu.a, which
>>> matches traditional linkage models where the .o file is an atomic unit
>>> for linking.
>>>
>>>
>>> The main program calls xregcomp which is defined in the DSO and calls
>>> rpl_regfree.  The main program links against libdso.so and libfu.a.
>>> Because it links libfu.a, it gets  a copy of rpl_regfree, but *only*
>>> rpl_regfree.  That copy of rpl_regfree references a new and distinct
>>> copy of utf8_sb_map.  Naturally the address of utf8_sb_map in the main
>>> program is different from the one in libdso.so and the test aborts.
>>>
>>>
>>> Without LTO the main program would still reference rpl_regfree, but the
>>> main program would not have its own copy.  rpl_regfree and rpl_regcomp
>>> would both be satisfied by the DSO (which remember has a complete copy
>>> of the .o file from libfu.a).  Thus there would be only one utf8_sb_map
>>> as well and naturally the program will exit normally.
>>>
>>>
>>> So I've got a bunch of thoughts here, but will defer sharing them
>>> immediately so as not to unduly influence anyone.
>>>
>>>
>>> I don't have a sense of how pervasive this issue is.   I know it affects
>>> man-db, but there could well be others.
> This is:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26530
> https://sourceware.org/bugzilla/show_bug.cgi?id=26314

Just to close the loop here.  There was a followup patch from Alan
attached to 26314 that wasn't in Fedora.  Adding that to Fedora fixes
the man-db issues.


Thanks!


jeff



[r11-3750 Regression] FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "Building vector operands from scalars" 1 on Linux/x86_64

2020-10-09 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

16760e5bf7028dfa36b39af305d05cdf2c15b3a9 is the first bad commit
commit 16760e5bf7028dfa36b39af305d05cdf2c15b3a9
Author: Richard Biener 
Date:   Fri Oct 9 12:24:46 2020 +0200

tree-optimization/97334 - improve BB SLP discovery

caused

FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  
scan-tree-dump-times slp1 "Building vector operands from scalars" 1
FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "Building vector 
operands from scalars" 1

with GCC configured with

Configured with: ../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3750/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r11-3748 Regression] FAIL: gcc.dg/tree-ssa/modref-2.c execution test on Linux/x86_64

2020-10-09 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

ffe8baa996486fa0313aa804a064a58b0b161f07 is the first bad commit
commit ffe8baa996486fa0313aa804a064a58b0b161f07
Author: Jan Hubicka 
Date:   Fri Oct 9 11:29:58 2020 +0200

IPA modref: fix miscompilation in clone when IPA modref is used

caused

FAIL: gcc.c-torture/execute/ieee/acc1.c execution,  -O1 
FAIL: gcc.c-torture/execute/ieee/acc2.c execution,  -O1 
FAIL: gcc.dg/tree-ssa/modref-2.c execution test

with GCC configured with

Configured with: ../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-3748/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gcc.c-torture/execute/ieee/acc1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gcc.c-torture/execute/ieee/acc1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gcc.c-torture/execute/ieee/acc2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gcc.c-torture/execute/ieee/acc2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gcc.c-torture/execute/ieee/acc2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gcc.c-torture/execute/ieee/acc2.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-2.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-2.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/modref-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PING][PATCH v2] combine: Don't turn (mult (extend x) 2^n) into extract [PR96998]

2020-10-09 Thread Segher Boessenkool
On Fri, Oct 09, 2020 at 09:38:09AM +0100, Alex Coplan wrote:
> Hi Segher,
> 
> On 08/10/2020 15:20, Segher Boessenkool wrote:
> > On Thu, Oct 08, 2020 at 11:21:26AM +0100, Alex Coplan wrote:
> > > Ping. The kernel is still broken on AArch64.
> > 
> > You *cannot* fix a correctness bug with a combine addition.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/555158.html
> explains why we do precisely that.

And it still is wrong.

> Also, as your own testing confirmed, the patch does indeed fix the issue.

No, it did not.  It showed that before the patch the bug was hit, and
after it it was not.  It does not show the bug was solved.

> > So please fix the target bug first.
> 
> I think the problem here -- the reason that we're talking past each
> other -- is that there are (at least) two parts of the codebase that can
> be blamed for the ICE here:
> 
> 1. aarch64: "The insn is unrecognized, so it's a backend bug
> (missing pattern or similar)."
> 
> 2. combine: "Combine produces a non-canonical insn, so the backend
> (correctly) doesn't recognise it, and combine is at fault."

That cannot cause ICEs!  *That* is the issue.  It is *normal* for insns
combine comes up with to not be allowed by the backend (or recognised
even); in that case, that instruction combination is simply not made.
The code was valid before, and stays valid.  That is all that combine
does: it may or may not change some instructions to semantically
equivalent instructions that the cost model thinks are better to have.

You *cannot* depend on combine to make *any* particular combination or
simplification or anything.  There are hundreds of reasons to abort a
combination.

3.  The insn is not valid for the target, so the target does *not*
recognise it, and combine does *not* make that combination, *and all is
good*.

That is how it is supposed to work (and how it *does* work).

If the input to combine contains invalid instructions, you have to fix a
bug in some earlier pass (or the backend).  All instructions in the
instruction stream have to be valid (some time shortly after expand, and
passes can do crazy stuff internally, so let's say "between passes").

> Now I initially (naively) took interpretation 1 here and tried to fix
> the ICE by adding a pattern to recognise the sign_extract insn that
> combine is producing here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553605.html
> 
> Howerver, in the review of that patch, Richard opened my eyes to
> interpretation 2, which in hindsight is clearly a better way to fix the
> issue.
> 
> Combine already does the canonicalisation for the (ashift x n) case, so
> it seems like an obvious improvement to do the same for the (mult x 2^n)
> case, as this is how shifts are represented inside mem rtxes.

An improvement perhaps, but there is still a bug in the backend.  It is
*normal* for combine to create code invalid for the target, including
non-canonical code.  You should not recognise non-canonical code if you
cannot actually handle it correctly.

> Again, please see Richard's comments here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554518.html
> 
> > 
> > I haven't had time to look at your patch yet, sorry.
> 
> Not to worry. Hopefully this clears up any confusion around what we're
> trying to do here and why.

It doesn't, unfortunately.  But maybe you understand now?


Segher


Re: [PATCH] libstdc++: Implement C++20 features for

2020-10-09 Thread Thomas Rodgers via Gcc-patches


Jonathan Wakely writes:

> On 07/10/20 18:15 -0700, Thomas Rodgers wrote:
>>@@ -500,6 +576,40 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>>   }
>> #endif
>>
>>+#if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
>>+  basic_istringstream(ios_base::openmode __mode, const allocator_type& 
>>__a)
>>+  : __istream_type(), _M_stringbuf(__mode | ios_base::in, __a)
>>+  { this->init(&_M_stringbuf); }
>
> All these & operators need to be std::__addressof(_M_stringbuf)
> instead. _M_stringbuf potentially depends on program-defined types
> (the traits and allocator classes) which means user namespaces are
> considered for ADL and they could define an operator& that gets used.
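The ADL hazard is easy to demonstrate; here is a minimal sketch using std::addressof (which __addressof underpins) and a deliberately hostile, hypothetical type:

```cpp
#include <cassert>
#include <memory>

// A program-defined type that hijacks unary & -- exactly what makes
// `&_M_stringbuf` unsafe when the member's type involves user-provided
// traits or allocator classes.
struct Evil
{
  int* operator&() { return nullptr; }  // plain & now lies
};

// std::addressof bypasses the overload and returns the real address.
static bool plain_amp_is_hijacked()
{
  Evil e;
  return &e == nullptr && std::addressof(e) != nullptr;
}
```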
>
>
>>+
>>+  explicit basic_istringstream(__string_type&& __str,
>>+ios_base::openmode __mode = ios_base::in )
>>+  : __istream_type(), _M_stringbuf(std::move(__str), __mode | 
>>ios_base::in)
>>+  { this->init(&_M_stringbuf); }
>>+
>>+  template
>>+ basic_istringstream(const basic_string<_CharT, _Traits, _SAlloc>& __str,
>>+ const allocator_type& __a)
>>+ : basic_istringstream(__str, ios_base::in, __a)
>>+ { }
>>+
>>+  using __sv_type = basic_string_view<_CharT, _Traits>;
>
> This typedef seems to only be used once. Might as well just use
> basic_string_view directly in the return type
> of view().
>
> Similarly in basic_ostringstream and basic_stringstream.
>
>>diff --git a/libstdc++-v3/src/c++20/Makefile.in 
>>b/libstdc++-v3/src/c++20/Makefile.in
>>new file mode 100644
>>index 000..0e2de19ae59
>>diff --git a/libstdc++-v3/src/c++20/sstream-inst.cc 
>>b/libstdc++-v3/src/c++20/sstream-inst.cc
>>new file mode 100644
>>index 000..c419176ae8e
>>--- /dev/null
>>+++ b/libstdc++-v3/src/c++20/sstream-inst.cc
>>@@ -0,0 +1,111 @@
>>+// Explicit instantiation file.
>>+
>>+// Copyright (C) 1997-2020 Free Software Foundation, Inc.
>
> Just 2020 here.
>
>>+//
>>+// This file is part of the GNU ISO C++ Library.  This library is free
>>+// software; you can redistribute it and/or modify it under the
>>+// terms of the GNU General Public License as published by the
>>+// Free Software Foundation; either version 3, or (at your option)
>>+// any later version.
>>+
>>+// This library is distributed in the hope that it will be useful,
>>+// but WITHOUT ANY WARRANTY; without even the implied warranty of
>>+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>+// GNU General Public License for more details.
>>+
>>+// Under Section 7 of GPL version 3, you are granted additional
>>+// permissions described in the GCC Runtime Library Exception, version
>>+// 3.1, as published by the Free Software Foundation.
>>+
>>+// You should have received a copy of the GNU General Public License and
>>+// a copy of the GCC Runtime Library Exception along with this program;
>>+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>+// .
>>+
>>+//
>>+// ISO C++ 14882:
>>+//
>>+
>>+#ifndef _GLIBCXX_USE_CXX11_ABI
>>+// Instantiations in this file use the new SSO std::string ABI unless 
>>included
>>+// by another file which defines _GLIBCXX_USE_CXX11_ABI=0.
>
> This copy&pasted comment is misleading now if we're not actually going
> to include it from another file to generate the old ABI symbols.
>
> I think just define it unconditionally and add a comment saying that
> these new symbols are only defines for the SSO string ABI.
>
>>+# define _GLIBCXX_USE_CXX11_ABI 1
>>+#endif
>>+#include 
>>+
>>+namespace std _GLIBCXX_VISIBILITY(default)
>>+{
>>+_GLIBCXX_BEGIN_NAMESPACE_VERSION
>>+
>>+template basic_stringbuf<char>::basic_stringbuf(const allocator_type&);
>>+template basic_stringbuf<char>::basic_stringbuf(ios_base::openmode,
>>+  const allocator_type&);
>>+template basic_stringbuf<char>::basic_stringbuf(__string_type&&,
>>+  ios_base::openmode);
>>+template basic_stringbuf<char>::basic_stringbuf(basic_stringbuf&&,
>>+  const allocator_type&);
>>+template basic_stringbuf<char>::allocator_type
>>+basic_stringbuf<char>::get_allocator() const noexcept;
>>+template basic_stringbuf<char>::__sv_type
>
> Looks like this would be a bit simpler if it just used string_view
> here, not basic_stringbuf<char>::__sv_type, and wstring_view below
> for the wchar_t specializations.
>
> And you could use allocator<char> instead of
> basic_stringbuf<char>::allocator_type.
>
> That looks a little cleaner to me, but it's a matter of opinion.
>
> That would be necessary anyway for the basic_*stringstream types if
> they don't have the __sv_type any more.
>
>
>>diff --git a/libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/1.cc 
>>b/libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/1.cc
>>new file mode 100644
>>index 000..d93141fc232
>>--- /dev/null
>>+++ b/libstdc++-v3/testsuite/27_io/basic_istringstream/cons/char/1.cc
>>@@ -0,0 +1,85 @@
>>+// Copyright (C)