date:20200207

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-07 Thread Segher Boessenkool

On Fri, Feb 07, 2020 at 05:47:32PM +0100, Ulrich Weigand wrote:
> > but what happens to -fsignalling-nans -ffast-math then?  Better leave those
> > in I'd say.
> 
> Ah, it seems I was confused about the intended semantics here.
> 
> I thought that a *more specific* option like -fsignalling-nans was always
> intended to override a more generic option like -ffast-math, no matter
> whether it comes before or after it on the command line.

-fsignaling-nans is an independent option.  Signaling NaNs don't do much
at all when you also have -ffinite-math-only, of course, which says you
don't have *any* NaNs.


Segher

Re: [PATCH][AArch64] Improve clz patterns

2020-02-07 Thread Segher Boessenkool

On Fri, Feb 07, 2020 at 06:01:44PM +, Richard Sandiford wrote:
> Wilco Dijkstra  writes:
> > Although GCC should understand the limited range of clz/ctz/cls results,
> > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> > sign extension.  Avoid this by adding an explicit AND with 127 in the
> > patterns. Deepsjeng performance improves by ~0.6%.

> > PR rtl-optimization/93565

> So I don't think this is OK even as a workaround.  We should fix it in
> target-independent code instead.

Yes, but combine shouldn't do CSE; combine does the *opposite* of CSE,
in some regards.  You shouldn't do CSE without a cost model for it, in
any case.


Segher

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-07 Thread Segher Boessenkool

On Tue, Feb 04, 2020 at 12:06:54PM +, Richard Sandiford wrote:
> > Also, you don't need to emit "#" in output template. This is just for
> > extra safety, we can live without. Please see e.g. "*tzcnt_1".
> 
> IMO it's better to keep it here, where we're relying on the split
> happening for correctness.  There's no theoretical guarantee that
> a split ever happens otherwise.

A split will always happen if you have the "length" attribute and no
register stacks (split_for_shorten_branches, "split5", almost directly
before final).  This has been true for so long that lots of things are
bound to rely on it.  "#" only has real meaning in final, of course
(but is nice documentation for everything else).

Maybe we should just always run split5, and not do it in final at all
anymore?  Or at least always for targets with the "length" attribute?

Segher

Re: [PATCH] rs6000: Fix PR93136, gcc.dg/vmx/ops.c and several other test break after r279772

2020-02-07 Thread Peter Bergner

On 1/9/20 6:29 PM, Peter Bergner wrote:
> On 1/9/20 4:51 PM, Segher Boessenkool wrote:
>> Splitting out separate functions in the testcase shouldn't be so much
>> work?  Or am I too optimistic :-)
>>
>> This should make the test a good deal less prone to random changes in
>> output caused by the lunar cycle.
> 
> Ok, let me take a stab at rewriting the tests to be more similar to the
> pr92923-[12].c and see how much work that is.  I do agree that it would
> be nice not having the insn counts be so fragile.

Sorry for taking so long to get back to this.  I split the functions into
smaller chunks, but didn't need to go all the way to one function per
builtin call.  Does this look better?  The POWER7 and POWER8 files are
identical, except for their -mcpu= option.  The POWER9 file has some slight
differences to the other files, because of new instructions added that we
make sure of.

I've confirmed the updated test cases now pass on both LE and BE.
Ok for trunk?

Peter

PR target/93136
* gcc.dg/vmx/ops.c: Add -flax-vector-conversions to dg-options.
* gcc.target/powerpc/vsx-vector-6.h: Split tests into smaller functions.
* gcc.target/powerpc/vsx-vector-6.p7.c: Adjust scan-assembler-times
regex directives.  Adjust expected instruction counts.
* gcc.target/powerpc/vsx-vector-6.p8.c: Likewise.
* gcc.target/powerpc/vsx-vector-6.p9.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/vmx/ops.c b/gcc/testsuite/gcc.dg/vmx/ops.c
index 6aff80bbd1a..4a0650c06bd 100644
--- a/gcc/testsuite/gcc.dg/vmx/ops.c
+++ b/gcc/testsuite/gcc.dg/vmx/ops.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-maltivec -mabi=altivec -std=gnu99 -mno-vsx -Wno-deprecated" 
} */
+/* { dg-options "-maltivec -mabi=altivec -std=gnu99 -mno-vsx -Wno-deprecated 
-flax-vector-conversions" } */
 #include 
 #include 
 extern char * *var_char_ptr;
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
index a891b64e6fa..0106e8d2901 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
@@ -1,167 +1,154 @@
-/* This test code is included into vsx-vector-6-be.c and vsx-vector-6-le.c.  
-   The two files have the tests for the number of instructions generated for
-   LE versus BE.  */
+/* This test code is included into vsx-vector-6.p7.c, vsx-vector-6.p8.c
+   and vsx-vector-6.p9.c.  The .c files have the tests for the number
+   of instructions generated for each cpu type.  */
 
 #include 
 
-void foo (vector double *out, vector double *in, vector long *p_l, vector bool 
long *p_b,
- vector unsigned char *p_uc, int *i, vector float *p_f,
- vector bool char *outbc, vector bool int *outbi,
- vector bool short *outbsi, vector int *outsi,
- vector unsigned int *outui, vector signed char *outsc,
- vector unsigned char *outuc)
+typedef struct {
+  vector double d;
+  vector float f;
+  vector long sl;
+  vector int si;
+  vector short ss;
+  vector char sc;
+  vector unsigned int ui;
+  vector unsigned short int us;
+  vector unsigned char uc;
+  vector bool long long bll;
+  vector bool long bl;
+  vector bool int bi;
+  vector bool short bs;
+  vector bool char bc;
+} opnd_t;
+
+void
+func_1op (opnd_t *dst, opnd_t *src)
 {
-  vector double in0 = in[0];
-  vector double in1 = in[1];
-  vector double in2 = in[2];
-  vector long inl = *p_l;
-  vector bool long inb = *p_b;
-  vector bool long long inbl0;
-  vector bool long long inbl1;
-  vector unsigned char uc = *p_uc;
-  vector float inf0;
-  vector float inf1;
-  vector float inf2;
-  vector char inc0;
-  vector char inc1;
-  vector bool char inbc0;
-  vector bool char inbc1;
-  vector bool short inbs0;
-  vector bool short inbs1;
-  vector bool int inbi0;
-  vector bool int inbi1;
-  vector signed short int inssi0, inssi1;
-  vector unsigned short int inusi0, inusi1;
-  vector signed int insi0, insi1;
-  vector unsigned int inui0, inui1;
-  vector unsigned char inuc0, inuc1;
-  
-  *out++ = vec_abs (in0);
-  *out++ = vec_add (in0, in1);
-  *out++ = vec_and (in0, in1);
-  *out++ = vec_and (in0, inb);
-  *out++ = vec_and (inb, in0);
-  *out++ = vec_andc (in0, in1);
-  *out++ = vec_andc (in0, inb);
-  *out++ = vec_andc (inb, in0);
-  *out++ = vec_andc (inbl0, in0);
-  *out++ = vec_andc (in0, inbl0);
-
-  *out++ = vec_ceil (in0);
-  *p_b++ = vec_cmpeq (in0, in1);
-  *p_b++ = vec_cmpgt (in0, in1);
-  *p_b++ = vec_cmpge (in0, in1);
-  *p_b++ = vec_cmplt (in0, in1);
-  *p_b++ = vec_cmple (in0, in1);
-  *out++ = vec_div (in0, in1);
-  *out++ = vec_floor (in0);
-  *out++ = vec_madd (in0, in1, in2);
-  *out++ = vec_msub (in0, in1, in2);
-  *out++ = vec_max (in0, in1);
-  *out++ = vec_min (in0, in1);
-  *out++ = vec_msub (in0, in1, in2);
-  *out++ = vec_mul (in0, in1);
-  *out++ = vec_nearbyint (in0);
-  *out++ = vec_nmadd (in0, in1, in2);
-  *out++ = vec_nmsub (in0, in1, in2);
-  *out++ = vec_nor (in0, in1

Re: [PATCH] RISC-V: Fix combined tree builds.

2020-02-07 Thread Jim Wilson

ping

Jim

On Thu, Jan 30, 2020 at 2:36 PM Jim Wilson  wrote:
>
> The RISC-V toolchain doesn't support leb128 because of linker relaxation
> to reduce code size.  This prevents us from computing the leb128 size of a
> value at compile time.  So do a configure time gas feature check regardless
> of gas version.
>
> The libiconv configure change comes from the recent config/lib-link.m4
> patch.  The gcc configure wasn't rebuilt after this change as this was
> intended for library configure files.
>
> Tested with riscv32-elf and arm-eabi combined tree builds.  The riscv build
> fails without the patch and works with the patch where HAVE_AS_LEB128 is now
> false.  The arm build still works and still has HAVE_AS_LEB128 true.
>
> OK?
>
> Jim
>
> gcc/
> PR target/91602
> * configure.ac (HAVE_AS_LEB128): Delete gas version number in
> gcc_GAS_CHECK_FEATURE call.
> * configure: Regenerated.
> ---
>  gcc/configure| 39 +++
>  gcc/configure.ac |  5 -
>  2 files changed, 31 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/configure b/gcc/configure
> index e2c8fc71772..2f57fbf3223 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -974,6 +974,7 @@ with_zstd_include
>  with_zstd_lib
>  enable_rpath
>  with_libiconv_prefix
> +with_libiconv_type
>  enable_sjlj_exceptions
>  with_gcc_major_version_only
>  enable_secureplt
> @@ -1811,6 +1812,7 @@ Optional Packages:
>--with-gnu-ld   assume the C compiler uses GNU ld default=no
>--with-libiconv-prefix[=DIR]  search for libiconv in DIR/include and 
> DIR/lib
>--without-libiconv-prefix don't search for libiconv in includedir and 
> libdir
> +  --with-libiconv-type=TYPE type of library to search for 
> (auto/static/shared)
>--with-gcc-major-version-only
>use only GCC major number in filesystem paths
>--with-pic  try to use only PIC/non-PIC objects [default=use
> @@ -10730,6 +10732,16 @@ if test "${with_libiconv_prefix+set}" = set; then :
>
>  fi
>
> +
> +# Check whether --with-libiconv-type was given.
> +if test "${with_libiconv_type+set}" = set; then :
> +  withval=$with_libiconv_type;  with_libiconv_type=$withval
> +else
> +   with_libiconv_type=auto
> +fi
> +
> +  lib_type=`eval echo \$with_libiconv_type`
> +
>LIBICONV=
>LTLIBICONV=
>INCICONV=
> @@ -10767,13 +10779,13 @@ fi
>found_so=
>found_a=
>if test $use_additional = yes; then
> -if test -n "$shlibext" && test -f 
> "$additional_libdir/lib$name.$shlibext"; then
> +if test -n "$shlibext" && test -f 
> "$additional_libdir/lib$name.$shlibext" && test x$lib_type != xstatic; then
>found_dir="$additional_libdir"
>found_so="$additional_libdir/lib$name.$shlibext"
>if test -f "$additional_libdir/lib$name.la"; then
>  found_la="$additional_libdir/lib$name.la"
>fi
> -else
> +elif test x$lib_type != xshared; then
>if test -f "$additional_libdir/lib$name.$libext"; then
>  found_dir="$additional_libdir"
>  found_a="$additional_libdir/lib$name.$libext"
> @@ -10797,13 +10809,13 @@ fi
>case "$x" in
>  -L*)
>dir=`echo "X$x" | sed -e 's/^X-L//'`
> -  if test -n "$shlibext" && test -f 
> "$dir/lib$name.$shlibext"; then
> +  if test -n "$shlibext" && test -f 
> "$dir/lib$name.$shlibext" && test x$lib_type != xstatic; then
>  found_dir="$dir"
>  found_so="$dir/lib$name.$shlibext"
>  if test -f "$dir/lib$name.la"; then
>found_la="$dir/lib$name.la"
>  fi
> -  else
> +  elif test x$lib_type != xshared; then
>  if test -f "$dir/lib$name.$libext"; then
>found_dir="$dir"
>found_a="$dir/lib$name.$libext"
> @@ -11031,8 +11043,13 @@ fi
>done
>  fi
>else
> -
> LIBICONV="${LIBICONV}${LIBICONV:+ }-l$name"
> -LTLIBICONV="${LTLIBICONV}${LTLIBICONV:+ }-l$name"
> +if x$lib_type = 
> xauto || x$lib_type = xshared; then
> +  LIBICONV="${LIBICONV}${LIBICONV:+ }-l$name"
> +  LTLIBICONV="${LTLIBICONV}${LTLIBICONV:+ }-l$name"
> +else
> +  LIBICONV="${LIBICONV}${LIBICONV:+ }-l:lib$name.$libext"
> +  LTLIBICONV="${LTLIBICONV}${LTLIBICONV:+ }-l:lib$name.$libext"
> +fi
>fi
>  fi
>fi
> @@ -23695,18 +23712,16 @@ _ACEOF
>
>
>  # Check if we have .[us]leb128, and support symbol arithmetic with it.
> +# Targets with aggressive linker rel

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-07 Thread Joseph Myers

On Fri, 7 Feb 2020, Ulrich Weigand wrote:

> I thought that a *more specific* option like -fsignalling-nans was always
> intended to override a more generic option like -ffast-math, no matter
> whether it comes before or after it on the command line.

Yes, that's correct.  (There are cases where it's ambiguous which option 
is more specific - see what I said in 
 about -Werror= - but I 
don't think -ffast-math involves such cases.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-07 Thread Joseph Myers

On Fri, 7 Feb 2020, Richard Biener wrote:

> To me it's a QOI question that depends on the actual case.
> Turning memcpy (p, p, N) into a no-op is the correct thing,
> even though with (too) large N it might trap.  Folding
> a read from outside of an object to zero might be OK
> (it's undefined), but the choice of zero is arbitrary and it's
> clearly not what the writer of the code intended.  Folding
> it do a NaT and do further optimizations based on that
> would be even worse.  Currently we leave the undefined
> code in place but IMHO another sensible choice would
> be to replace it with an explicit trap or unreachable
> (and the choice between those could be dependent on
> a command-line option).

Calling va_arg with a type changed by the default argument promotions 
(float, char, etc.) is another case where there would be a reasonably safe 
fallback (making gimplify_va_arg_expr adjust the type itself to the one it 
mentions in the warning message) as an alternative to generating a trap, 
possibly depending on command-line options to say how the user wishes to 
handle such cases.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [committed] avoid pedantic warning in C++ 98 mode: comma at end of enumerator list

2020-02-07 Thread Joseph Myers

On Wed, 5 Feb 2020, Martin Sebor wrote:

> I removed the trailing comma and (after a few false starts) managed
> to push the change in commit
>   r10-6466-g297aa668293d55ffe100d810e92fbe592f262557.
> 
> I got the error below for my first few attempts.  The message had
> the expected format so I wasn't sure what the problem was until
> I removed the "[-Wpedantic]" part at the end.  Was it looking for
> a bug id and getting confused?

No, this error looks like there not being a blank line after the first 
line of the original commit message (a single-line commit message like you 
ended up with is OK for simple commits where a single line is sufficient 
description, but if there's more then one line there must be a blank line 
after the first line, which should act as a self-contained summary for 
tools such as "git log --oneline" to work well).

> remote: *** Below are the first few lines of the revision history:
> remote: *** | Remove trailing comma to avoid pedantic warning in C++ 98 mode:
> remote: *** |   comma at end of enumerator list [-Wpedantic]

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Use a non-empty test program to test ability to link

2020-02-07 Thread Joseph Myers

On Wed, 5 Feb 2020, Sandra Loosemore wrote:

> This patch is for PR 79193 and 88999, problems where libstdc++ is
> mis-configuring itself when building for a bare-metal target because it thinks
> it can link programs without pulling in the BSP that provides low-level I/O
> support.  (Specifically, this was observed on nios2-elf with Newlib and GDB
> semihosting.)  It'll build just fine if it recognizes that it can only compile
> programs and not link them, but it's confused because using an empty program
> to test for ability to link succeeds.
> 
> Is this configure change OK, and suitable for stage 4?
> 
> BTW, I did run autoconf in every subdirectory that contains a configure.ac,
> but it appears only libstc++-v3 actually uses this test; all the other
> regenerated configure scripts were unchanged.

There's some problem with that regeneration, then; every directory where 
configure.ac mentions GCC_NO_EXECUTABLES should have a resulting change (I 
tested regenerating in libgcc/ with this patch applied and got such a 
change, for example).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Patch to fix PR93561

2020-02-07 Thread Segher Boessenkool

Hi!

On Thu, Feb 06, 2020 at 05:16:14PM -0500, Vladimir Makarov wrote:
> --- a/gcc/lra-assigns.c
> +++ b/gcc/lra-assigns.c
> @@ -964,6 +964,8 @@ spill_for (int regno, bitmap spilled_pseudo_bitmap, bool 
> first_p)
>bitmap_clear (&spill_pseudos_bitmap);
>for (j = hard_regno_nregs (hard_regno, mode) - 1; j >= 0; j--)
>   {
> +  if (hard_regno + j >= FIRST_PSEUDO_REGISTER)
> + break;

if (!HARD_REGISTER_NUM_P (hard_regno + j))
?


Segher

Re: [PATCH] c++: Fix paren init of aggregates in unevaluated context [PR92947]

2020-02-07 Thread Jason Merrill


On 2/7/20 3:29 PM, Marek Polacek wrote:

When I implemented C++20 parenthesized initialization of aggregates
I introduced this bogus cp_unevaluated_operand check, thus disabling
this feature in unevaluated context.  Oop.

Removing the check turned up another bug: I wasn't checking the
return value of digest_init.  So when constructible_expr called
build_new_method_call_1 to see if we can construct one type from
another, it got back a bogus INIT_EXPR that looked something like
*(struct T &) 1 = <<< error >>>.  But that isn't the error_mark_node,
so constructible_expr thought we had been successful in creating the
ctor call, and it gave the wrong answer.  Covered by paren-init17.C.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.


PR c++/92947 - Paren init of aggregates in unevaluated context.
* call.c (build_new_method_call_1): Don't check
cp_unevaluated_operand.  Check the return value of digest_init.

* g++.dg/cpp2a/paren-init21.C: New test.
---
  gcc/cp/call.c |  3 ++-
  gcc/testsuite/g++.dg/cpp2a/paren-init21.C | 14 ++
  2 files changed, 16 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init21.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index fde29f8f631..51621b7dd87 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -10179,7 +10179,6 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
 the two.  */
if (DECL_CONSTRUCTOR_P (fn)
  && !(flags & LOOKUP_ONLYCONVERTING)
- && !cp_unevaluated_operand
  && cxx_dialect >= cxx2a
  && CP_AGGREGATE_TYPE_P (basetype)
  && !user_args->is_empty ())
@@ -10194,6 +10193,8 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
  else
{
  ctor = digest_init (basetype, ctor, complain);
+ if (ctor == error_mark_node)
+   return error_mark_node;
  ctor = build2 (INIT_EXPR, TREE_TYPE (instance), instance, ctor);
  TREE_SIDE_EFFECTS (ctor) = true;
  return ctor;
diff --git a/gcc/testsuite/g++.dg/cpp2a/paren-init21.C 
b/gcc/testsuite/g++.dg/cpp2a/paren-init21.C
new file mode 100644
index 000..cfd37193f66
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/paren-init21.C
@@ -0,0 +1,14 @@
+// PR c++/92947 - Paren init of aggregates in unevaluated context.
+// { dg-do compile { target c++2a } }
+
+struct A {
+  int a;
+  int b;
+};
+
+int main()
+{
+  static_assert(__is_constructible(A, int, int));
+  decltype(A(1,2)) foo;
+  bool b = noexcept(A(1,2));
+}

base-commit: 572992c8920d5339a3ac28d442c436d6daa0bfae

[PATCH] c++: Fix paren init of aggregates in unevaluated context [PR92947]

2020-02-07 Thread Marek Polacek

When I implemented C++20 parenthesized initialization of aggregates
I introduced this bogus cp_unevaluated_operand check, thus disabling
this feature in unevaluated context.  Oop.

Removing the check turned up another bug: I wasn't checking the
return value of digest_init.  So when constructible_expr called
build_new_method_call_1 to see if we can construct one type from
another, it got back a bogus INIT_EXPR that looked something like
*(struct T &) 1 = <<< error >>>.  But that isn't the error_mark_node,
so constructible_expr thought we had been successful in creating the
ctor call, and it gave the wrong answer.  Covered by paren-init17.C.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

PR c++/92947 - Paren init of aggregates in unevaluated context.
* call.c (build_new_method_call_1): Don't check
cp_unevaluated_operand.  Check the return value of digest_init.

* g++.dg/cpp2a/paren-init21.C: New test.
---
 gcc/cp/call.c |  3 ++-
 gcc/testsuite/g++.dg/cpp2a/paren-init21.C | 14 ++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/paren-init21.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index fde29f8f631..51621b7dd87 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -10179,7 +10179,6 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
 the two.  */
   if (DECL_CONSTRUCTOR_P (fn)
  && !(flags & LOOKUP_ONLYCONVERTING)
- && !cp_unevaluated_operand
  && cxx_dialect >= cxx2a
  && CP_AGGREGATE_TYPE_P (basetype)
  && !user_args->is_empty ())
@@ -10194,6 +10193,8 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
  else
{
  ctor = digest_init (basetype, ctor, complain);
+ if (ctor == error_mark_node)
+   return error_mark_node;
  ctor = build2 (INIT_EXPR, TREE_TYPE (instance), instance, ctor);
  TREE_SIDE_EFFECTS (ctor) = true;
  return ctor;
diff --git a/gcc/testsuite/g++.dg/cpp2a/paren-init21.C 
b/gcc/testsuite/g++.dg/cpp2a/paren-init21.C
new file mode 100644
index 000..cfd37193f66
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/paren-init21.C
@@ -0,0 +1,14 @@
+// PR c++/92947 - Paren init of aggregates in unevaluated context.
+// { dg-do compile { target c++2a } }
+
+struct A {
+  int a;
+  int b;
+};
+
+int main()
+{
+  static_assert(__is_constructible(A, int, int));
+  decltype(A(1,2)) foo;
+  bool b = noexcept(A(1,2));
+}

base-commit: 572992c8920d5339a3ac28d442c436d6daa0bfae
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA

[COMMITTED] c++: Fix use of local in constexpr if.

2020-02-07 Thread Jason Merrill

extract_local_specs wasn't finding the mention of 'an' as a template
argument because we weren't walking into template arguments.  So here I
changed cp_walk_subtrees to do so--only walking into template arguments in
the spelling of the type or expression, not any hidden behind typedefs.  The
change to use typedef_variant_p avoids looking through typedefs spelled with
'typedef' as well as those spelled with 'using'.  And then I removed some
now-redundant code for walking into template arguments in a couple of
walk_tree callbacks.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/92654
* tree.c (cp_walk_subtrees): Walk into type template arguments.
* cp-tree.h (TYPE_TEMPLATE_INFO_MAYBE_ALIAS): Use typedef_variant_p
instead of TYPE_ALIAS_P.
* pt.c (push_template_decl_real): Likewise.
(find_parameter_packs_r): Likewise.  Remove dead code.
* error.c (find_typenames_r): Remove dead code.
---
 gcc/cp/cp-tree.h  |  2 +-
 gcc/cp/error.c|  6 ---
 gcc/cp/pt.c   | 40 ---
 gcc/cp/tree.c |  5 +++
 .../g++.dg/cpp1z/constexpr-if-lambda2.C   | 12 ++
 5 files changed, 25 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda2.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b8035b4360d..c46d1e92b3b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -3482,7 +3482,7 @@ struct GTY(()) lang_decl {
for the alias template (if any).  Otherwise behave as
TYPE_TEMPLATE_INFO.  */
 #define TYPE_TEMPLATE_INFO_MAYBE_ALIAS(NODE)   \
-  (TYPE_ALIAS_P (NODE) \
+  (typedef_variant_p (NODE)\
? TYPE_ALIAS_TEMPLATE_INFO (NODE)   \
: TYPE_TEMPLATE_INFO (NODE))
 
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 973b3034e12..ab8638fbaec 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1526,12 +1526,6 @@ find_typenames_r (tree *tp, int *walk_subtrees, void 
*data)
   if (mv && (mv == *tp || !d->p_set->add (mv)))
 vec_safe_push (d->typenames, mv);
 
-  /* Search into class template arguments, which cp_walk_subtrees
- doesn't do.  */
-  if (CLASS_TYPE_P (*tp) && CLASSTYPE_TEMPLATE_INFO (*tp))
-cp_walk_tree (&CLASSTYPE_TI_ARGS (*tp), find_typenames_r,
- data, d->p_set);
-
   return NULL_TREE;
 }
 
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 01bade85cdf..2fb52caa5d4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3818,9 +3818,12 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 (struct find_parameter_pack_data*)data;
   bool parameter_pack_p = false;
 
-  /* Handle type aliases/typedefs.  */
-  if (TYPE_ALIAS_P (t))
+  /* Don't look through typedefs; we are interested in whether a
+ parameter pack is actually written in the expression/type we're
+ looking at, not the target type.  */
+  if (TYPE_P (t) && typedef_variant_p (t))
 {
+  /* But do look at arguments for an alias template.  */
   if (tree tinfo = TYPE_ALIAS_TEMPLATE_INFO (t))
cp_walk_tree (&TI_ARGS (tinfo),
  &find_parameter_packs_r,
@@ -3903,27 +3906,13 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
  &find_parameter_packs_r, ppd, ppd->visited);
 
   /* This switch statement will return immediately if we don't find a
- parameter pack.  */
+ parameter pack.  ??? Should some of these be in cp_walk_subtrees?  */
   switch (TREE_CODE (t))
 {
-case TEMPLATE_PARM_INDEX:
-  return NULL_TREE;
-
 case BOUND_TEMPLATE_TEMPLATE_PARM:
   /* Check the template itself.  */
   cp_walk_tree (&TREE_TYPE (TYPE_TI_TEMPLATE (t)),
&find_parameter_packs_r, ppd, ppd->visited);
-  /* Check the template arguments.  */
-  cp_walk_tree (&TYPE_TI_ARGS (t), &find_parameter_packs_r, ppd,
-   ppd->visited);
-  *walk_subtrees = 0;
-  return NULL_TREE;
-
-case TEMPLATE_TYPE_PARM:
-case TEMPLATE_TEMPLATE_PARM:
-  return NULL_TREE;
-
-case PARM_DECL:
   return NULL_TREE;
 
 case DECL_EXPR:
@@ -3932,20 +3921,6 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
*walk_subtrees = 0;
   return NULL_TREE;
 
-case RECORD_TYPE:
-  if (TYPE_PTRMEMFUNC_P (t))
-   return NULL_TREE;
-  /* Fall through.  */
-
-case UNION_TYPE:
-case ENUMERAL_TYPE:
-  if (TYPE_TEMPLATE_INFO (t))
-   cp_walk_tree (&TYPE_TI_ARGS (t),
- &find_parameter_packs_r, ppd, ppd->visited);
-
-  *walk_subtrees = 0;
-  return NULL_TREE;
-
 case TEMPLATE_DECL:
   if (!DECL_TEMPLATE_TEMPLATE_PARM_P (t))
return NULL_TREE;
@@ -5794,8 +5769,7 @@ push_template_decl_real (tree decl, bool is_friend)

[PATCH] libstdc++: Make sure iterator_traits is populated

2020-02-07 Thread Patrick Palka

Since basic_istream_view::iterator is neither a cpp17 iterator (because it's
move-only) nor does it define all four of the types {difference_type,
value_type, reference, iterator_category}, then by the rule in
[iterator.traits], its iterator_traits has no members.

More concretely this means that it can't e.g. be composed with a
views::transform or a views::filter because they look at the iterator_traits of
the underlying view.

This seems to be a defect in the current spec.

libstdc++-v3/ChangeLog:

* include/std/ranges (ranges::basic_istream_view::iterator::reference):
Define it so that this iterator's iterator_traits is populated.
* testsuite/std/ranges/istream_view.cc: Augment test.
---
 libstdc++-v3/include/std/ranges   |  1 +
 libstdc++-v3/testsuite/std/ranges/istream_view.cc | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 891ecf75eff..6e982d6ded3 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -996,6 +996,7 @@ namespace views
using iterator_category = input_iterator_tag;
using difference_type = ptrdiff_t;
using value_type = _Val;
+   using reference = _Val&;
 
_Iterator() = default;
 
diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc 
b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
index 1729459bce3..7feb3ff4b0a 100644
--- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
@@ -68,10 +68,21 @@ test03()
   VERIFY( ranges::equal(v, (int[]){0,1,2,3,4}) );
 }
 
+void
+test04()
+{
+  std::string s = "0123456789";
+  auto ss = std::istringstream{s};
+  static_assert(ranges::input_range(ss))>);
+  auto v = ranges::istream_view(ss) | views::transform(&X::c);
+  VERIFY( ranges::equal(v, s) );
+}
+
 int
 main()
 {
   test01();
   test02();
   test03();
+  test04();
 }
-- 
2.25.0.191.gde93cc14ab

Re: [PATCH][AArch64] Improve clz patterns

2020-02-07 Thread Richard Sandiford

Wilco Dijkstra  writes:
> Hi Richard,
>
>> Could you go into more detail about what the before and after code
>> looks like, and what combine is doing?  Like you say, this sounds
>> like a target-independent thing on face value.
>
> It is indeed, but it seems specific to instructions where we have range
> information which allows it to remove a redundant sign-extend.
>
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565 for the full details.
>
>> Either way, something like this needs a testcase.
>
> Sure I've added the testcase from pr93565, see below.
>
> Cheers,
> Wilco
>
>
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> sign extension.  Avoid this by adding an explicit AND with 127 in the
> patterns. Deepsjeng performance improves by ~0.6%.
>
> Bootstrap OK.
>
> ChangeLog:
> 2020-02-04  Wilco Dijkstra  
>
> PR rtl-optimization/93565
> * config/aarch64/aarch64.md (clz2): Mask the clz result.
> (clrsb2): Likewise.
> (ctz2): Likewise.
>
> * gcc.target/aarch64/pr93565.c: New test.

I think this only works because the (and ...) wrapper stops combine
from matching a plain ctz:

Failed to match this instruction:
(set (reg:DI 100 [ _9+-4 ])
(ctz:DI (reg/v:DI 97 [ x ])))

I.e. the patch doesn't work by making the range explicit.  It just
works by wrapping the pattern in something that combine will never
generate naturally.  Something like (ashift:DI ... (const_int 0))
would give the same result.

If I add a second pattern that is never generated directly but can be
matched by combine:

(define_insn_and_split "*ctz2"
 [(set (match_operand:GPI   0 "register_operand" "=r")
   (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
  ""
  "#"
  "reload_completed"
  [(const_int 0)]
  "
  emit_insn (gen_rbit2 (operands[0], operands[1]));
  emit_insn (gen_clz2 (operands[0], operands[0]));
  DONE;
")

then we get the double ctz again.

So I don't think this is OK even as a workaround.  We should fix it in
target-independent code instead.

Thanks,
Richard

Re: [PATCH 2/4] Use const for some function arguments.

2020-02-07 Thread Segher Boessenkool

On Tue, Feb 04, 2020 at 02:55:14PM +0100, Martin Liska wrote:
> 
> gcc/ChangeLog:
> 
> 2020-02-04  Martin Liska  
> 
>   PR c/92472.

That trailing dot should not be there (in some other patches as well).


Segher

Re: [PATCH] issues with configure --enable-checking option

2020-02-07 Thread Richard Sandiford

Roman Zhuykov  writes:
> Hi!
> I've investigated a bit, because some of the following confused me while 
> working with some local 9.2-based branch.
>
> Documentation issues:
> (0) See patch for install.texi at the bottom, two possible values are 
> not documented. Ok for master? Backports?
> (1) For me it seems confusing to have 'tree' and 'gimple' values, but 
> not sure how to solve this.
> (2) Developer has to look into configure scripts to understand which 
> macro will be defined. E.q. 'misc' means "CHECKING_P".
> (3) Install.texi IMHO doesn't properly describe what happens when 
> --enable-checking is used without "=list". Maybe we should explicitly 
> tell this means same as "=yes".
> (4) Statement "This is ‘yes,extra’ by default when building from the 
> source repository or snapshots." is wrong, because 'snapshots' may 
> relate to released branches (e.q. GCC 9-20200125 Snapshot), and 
> gcc/DEV-PHASE is empty there.
> (5) Statement "‘extra’ adds for ‘misc’ checking extra checks ..." is 
> also confusing, one can run 'configure --enable-checking=extra' and will 
> have only ENABLE_EXTRA_CHECKING, but not CHECKING_P, and in common.opt 
> flag_checking will have Init(0).
>
> Behavior issues:
> (6) It is not obvious to have default --enable-checking=release on any 
> commit in releases/* branches (DEV-PHASE is empty there). IMHO it's 
> enough 'experimental' when picking for example some commit between 9.1 
> and 9.2. This also can confuse 'git bisect run' scenario when good 
> revision is old enough and bad revision is on release branch. See also (4).
> (7) Running "configure --enable-checking" means less (!) checks on 
> master branch (where DEV-PHASE is experimental). Default is "yes+extra" 
> and you get only "yes" with that option.
> (8) Why we always start with "release" values ('assert'+'runtime') as 
> default? If developer runs "configure --enable-checking=df,rtl,tree" 
> probably it should mean only picked values are enabled. Why we silently 
> add 'assert' and 'runtime' into that set?
>
> I haven't tried to find additional issues with related 
> '--enable-stage1-checking' option.
>
> Roman
>
> PS. I see some lines have more than 80 chars in install.texi, few of 
> them were added recently while cleaning references to SVN. Patch fixes 
> this only forthe paragraph it touches.
> --
>
> gcc/ChangeLog:
>
> 2020-01-29  Roman Zhuykov 
>
>      * doc/install.texi: Document 'types' and 'gimple' values for
>      '--enable-checking' configure option.
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1845,19 +1845,19 @@ consistency checks of the requested complexity.  
> This does not change the
>   generated code, but adds error checking within the compiler.  This will
>   slow down the compiler and may only work properly if you are building
>   the compiler with GCC@.  This is @samp{yes,extra} by default when building
> -from the source repository or snapshots, but @samp{release} for 
> releases.  The default
> -for building the stage1 compiler is @samp{yes}.  More control
> +from the source repository or snapshots, but @samp{release} for releases.
> +The default for building the stage1 compiler is @samp{yes}.  More control

Pre-existing problem, but: it looks like the current default is yes,types:

[if test "x$enable_checking" = xno || test "x$enable_checking" = x; then
  # For --disable-checking or implicit --enable-checking=release, avoid
  # setting --enable-checking=gc in the default stage1 checking for LTO
  # bootstraps.  See PR62077.
  case $BUILD_CONFIG in
*lto*)
  stage1_checking=--enable-checking=release,misc,gimple,rtlflag,tree,types;;
*)
  stage1_checking=--enable-checking=yes,types;;
  esac
  if test "x$enable_checking" = x && \
 test -d ${srcdir}/gcc && \
 test x"`cat ${srcdir}/gcc/DEV-PHASE`" = xexperimental; then
stage1_checking=--enable-checking=yes,types,extra
  fi
else
  stage1_checking=--enable-checking=$enable_checking,types
fi])

Could you fix that while you're there?

>   over the checks may be had by specifying @var{list}.  The categories of
>   checks available are @samp{yes} (most common checks
> -@samp{assert,misc,tree,gc,rtlflag,runtime}), @samp{no} (no checks at
> -all), @samp{all} (all but @samp{valgrind}), @samp{release} (cheapest
> +@samp{assert,misc,tree,gc,gimple,rtlflag,runtime,types}), @samp{no} (no 
> checks
> +at all), @samp{all} (all but @samp{valgrind}), @samp{release} (cheapest
>   checks @samp{assert,runtime}) or @samp{none} (same as @samp{no}).
>   Individual checks can be enabled with these flags @samp{assert},
> -@samp{df}, @samp{fold}, @samp{gc}, @samp{gcac}, @samp{misc}, @samp{rtl},
> -@samp{rtlflag}, @samp{runtime}, @samp{tree}, @samp{extra} and 
> @samp{valgrind}.
> -@samp{extra} adds for @samp{misc} checking extra checks that might affect
> -code generation and should therefore not differ between stage1 and later
> -stages.
> +@samp{df}, @samp{f

[PATCH] libstdc++: Implement P1878R1 "Constraining Readable Types"

2020-02-07 Thread Jonathan Wakely

* include/bits/iterator_concepts.h (iter_difference_t, iter_value_t):
Use remove_cvref_t.
(readable_traits): Rename to indirectly_readable_traits.
(readable): Rename to indirectly_readable.
(writable): Rename to indirectly_writable.
(__detail::__iter_exchange_move): Do not use remove_reference_t.
(indirectly_swappable): Adjust requires expression parameter types.
expression.
* include/bits/ranges_algo.h (ranges::transform, ranges::replace)
(ranges::replace_if, ranges::generate_n, ranges::generate)
(ranges::remove): Use new name for writable.
* include/bits/stl_iterator.h (__detail::__common_iter_has_arrow):
Use new name for readable.
* include/ext/pointer.h (readable_traits<_Pointer_adapter>): Use
new name for readable_traits.
* testsuite/24_iterators/associated_types/readable.traits.cc: Likewise.
* testsuite/24_iterators/indirect_callable/projected.cc: Adjust for
new definition of indirectly_readable.

Tested powerpc64le-linux, committed to master.

I think this finishes our C++20 Ranges implementation!

commit c8dd2446f597e6d1581414a9c02ff329285181a9
Author: Jonathan Wakely 
Date:   Thu Feb 6 11:21:25 2020 +

libstdc++: Implement P1878R1 "Constraining Readable Types"

* include/bits/iterator_concepts.h (iter_difference_t, 
iter_value_t):
Use remove_cvref_t.
(readable_traits): Rename to indirectly_readable_traits.
(readable): Rename to indirectly_readable.
(writable): Rename to indirectly_writable.
(__detail::__iter_exchange_move): Do not use remove_reference_t.
(indirectly_swappable): Adjust requires expression parameter types.
expression.
* include/bits/ranges_algo.h (ranges::transform, ranges::replace)
(ranges::replace_if, ranges::generate_n, ranges::generate)
(ranges::remove): Use new name for writable.
* include/bits/stl_iterator.h (__detail::__common_iter_has_arrow):
Use new name for readable.
* include/ext/pointer.h (readable_traits<_Pointer_adapter>): Use
new name for readable_traits.
* testsuite/24_iterators/associated_types/readable.traits.cc: 
Likewise.
* testsuite/24_iterators/indirect_callable/projected.cc: Adjust for
new definition of indirectly_readable.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index d9b8958d0a7..04c862a4b97 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -173,11 +173,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // ITER_TRAITS
 template
   using __iter_traits = typename __iter_traits_impl<_Iter, _Tp>::type;
+
+template
+  using __iter_diff_t = typename
+   __iter_traits<_Tp, incrementable_traits<_Tp>>::difference_type;
   } // namespace __detail
 
   template
-using iter_difference_t = typename
-  __detail::__iter_traits<_Tp, incrementable_traits<_Tp>>::difference_type;
+using iter_difference_t = __detail::__iter_diff_t>;
 
   namespace __detail
   {
@@ -188,35 +191,41 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { using value_type = remove_cv_t<_Tp>; };
   } // namespace __detail
 
-  template struct readable_traits { };
+  template struct indirectly_readable_traits { };
 
   template
-struct readable_traits<_Tp*>
+struct indirectly_readable_traits<_Tp*>
 : __detail::__cond_value_type<_Tp>
 { };
 
   template requires is_array_v<_Iter>
-struct readable_traits<_Iter>
+struct indirectly_readable_traits<_Iter>
 { using value_type = remove_cv_t>; };
 
   template
-struct readable_traits
-: readable_traits<_Iter>
+struct indirectly_readable_traits
+: indirectly_readable_traits<_Iter>
 { };
 
   template requires requires { typename _Tp::value_type; }
-struct readable_traits<_Tp>
+struct indirectly_readable_traits<_Tp>
 : __detail::__cond_value_type
 { };
 
   template requires requires { typename _Tp::element_type; }
-struct readable_traits<_Tp>
+struct indirectly_readable_traits<_Tp>
 : __detail::__cond_value_type
 { };
 
+  namespace __detail
+  {
+template
+  using __iter_value_t = typename
+   __iter_traits<_Tp, indirectly_readable_traits<_Tp>>::value_type;
+  } // namespace __detail
+
   template
-using iter_value_t = typename
-  __detail::__iter_traits<_Tp, readable_traits<_Tp>>::value_type;
+using iter_value_t = __detail::__iter_value_t>;
 
   namespace __detail
   {
@@ -235,11 +244,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
&& requires(_Iter __it)
{
  typename incrementable_traits<_Iter>::difference_type;
- typename readable_traits<_Iter>::value_type;
+ typename indirectly_readable_traits<_Iter>::value_type;

[committed] libstdc++: Fix bug in iterator_traits>

2020-02-07 Thread Jonathan Wakely

The wrong type was being used in the __common_iter_has_arrow constraint,
creating a circular dependency where the iterator_traits specialization
was needed before it was complete. The correct parameter for the
__common_iter_has_arrow concept is the first template argument of  the
common_iterator, not the common_iterator itself.

* include/bits/stl_iterator.h (__detail::__common_iter_ptr): Change
to take parameters of common_iterator, instead of the common_iterator
type itself. Fix argument for __common_iter_has_arrow constraint.
(iterator_traits>::pointer): Adjust.

Tested powerpc64le-linux, committed to master.

commit d222d8ec3c100f5c0a6974e7dcee16903f6f0e3a
Author: Jonathan Wakely 
Date:   Fri Feb 7 16:46:42 2020 +

libstdc++: Fix bug in iterator_traits>

The wrong type was being used in the __common_iter_has_arrow constraint,
creating a circular dependency where the iterator_traits specialization
was needed before it was complete. The correct parameter for the
__common_iter_has_arrow concept is the first template argument of  the
common_iterator, not the common_iterator itself.

* include/bits/stl_iterator.h (__detail::__common_iter_ptr): Change
to take parameters of common_iterator, instead of the 
common_iterator
type itself. Fix argument for __common_iter_has_arrow constraint.
(iterator_traits>::pointer): Adjust.

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 69c6ae66cdf..46804656801 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1738,17 +1738,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
 // FIXME: This has to be at namespace-scope because of PR 92103.
-template
+template
   struct __common_iter_ptr
   {
using type = void;
   };
 
-template
-  requires __detail::__common_iter_has_arrow<_Iter>
-  struct __common_iter_ptr<_Iter>
+template
+  requires __detail::__common_iter_has_arrow<_It>
+  struct __common_iter_ptr<_It, _Sent>
   {
-   using type = decltype(std::declval().operator->());
+   using common_iterator = std::common_iterator<_It, _Sent>;
+
+   using type
+ = decltype(std::declval().operator->());
   };
   } // namespace __detail
 
@@ -1762,8 +1765,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
forward_iterator_tag, input_iterator_tag>;
   using value_type = iter_value_t<_It>;
   using difference_type = iter_difference_t<_It>;
-  using pointer = typename
-   __detail::__common_iter_ptr>::type;
+  using pointer = typename __detail::__common_iter_ptr<_It, _Sent>::type;
   using reference = iter_reference_t<_It>;
 };

Re: [ARM] Fix -mpure-code for v6m

2020-02-07 Thread Richard Earnshaw (lists)


On 07/02/2020 16:43, Christophe Lyon wrote:

On Fri, 7 Feb 2020 at 14:49, Richard Earnshaw (lists)
 wrote:


On 07/02/2020 13:19, Christophe Lyon wrote:

When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
for cortex-m0, I noticed that some testcases were failing because we
still generate "ldr rX, .LCY", which is what we want to avoid with
-mpure-code. This is latent since a recent improvement in fwprop
(PR88833).

In this patch I change the thumb1_movsi_insn pattern so that it emits
the desired instruction sequence when arm_disable_literal_pool is set.

I tried to add a define_split instead, but couldn't make it work: the
compiler then complains it cannot split the instruction, while my new
define_split accepts the same operand types as thumb1_movsi_insn:

c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: could not split insn
(insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342])
  (symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 {*thumb1_movsi_insn}
   (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") [flags 0x2])
  (nil)))
during RTL pass: final

(define_split
[(set (match_operand:SI 0 "register_operand" "")
  (match_operand:SI 1 "general_operand" ""))]
"TARGET_THUMB1
 && arm_disable_literal_pool
 && GET_CODE (operands[1]) == SYMBOL_REF"
[(clobber (const_int 0))]
"
  gen_thumb1_movsi_symbol_ref(operands[0], operands[1]);
  DONE;
"
)
and I put this in thumb1_movsi_insn:
if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool)
{
  return \"#\";
}
return \"ldr\\t%0, %1\";

2020-02-07  Christophe Lyon  

  * config/arm/thumb1.md (thumb1_movsi_insn): Fix ldr alternative to
  work with -mpure-code.



+case 0:
+case 1:
+  return \"movs%0, %1\";
+case 2:
+  return \"movw%0, %1\";

This is OK, but please replace the hard tab in the strings for MOVS/MOVW
with \\t.



OK that was merely a cut & paste from the existing code.

I'm concerned that the length attribute is becoming wrong with my
patch, isn't this a problem?



Potentially yes.  The branch range code needs this to handle overly long 
jumps correctly.


R.


Thanks,

Christophe


R.

Re: [PATCH] alias: Fix offset checks involving section anchors [PR92294]

2020-02-07 Thread Richard Sandiford

Richard Biener  writes:
> On Tue, Feb 4, 2020 at 6:44 PM Richard Sandiford
>  wrote:
>>
>> Richard Sandiford  writes:
>> > [...]
>> >> I'm not sure given the issues you've introduced if I could actually
>> >> fill out the matrix of answers without more underlying information.
>> >> ie, when can we get symbols without source level decls,
>> >> anchors+interposition issues, etc.
>> >
>> > OK.  In that case, I wonder whether it would be safer to have a
>> > fourth state on top of the three above:
>> >
>> >   - known distance apart
>> >   - independent
>> >   - known distance apart or independent
>> >   - don't know
>> >
>> > with "don't know" being anything that involves bare symbols?
>> >
>> > Richard
>>
>> How does this look?  Tested on aarch64-linux-gnu and
>> x86_64-linux-gnu.
>>
>> Full description from scratch:
>>
>> memrefs_conflict_p has a slightly odd structure.  It first checks
>> whether two addresses based on SYMBOL_REFs refer to the same object,
>> with a tristate result:
>>
>>   int cmp = compare_base_symbol_refs (x,y);
>>
>> If the addresses do refer to the same object, we can use offset-based checks:
>>
>>   /* If both decls are the same, decide by offsets.  */
>>   if (cmp == 1)
>> return offset_overlap_p (c, xsize, ysize);
>>
>> But then, apart from the special case of forced address alignment,
>> we use an offset-based check even if we don't know whether the
>> addresses refer to the same object:
>>
>>   /* Assume a potential overlap for symbolic addresses that went
>>  through alignment adjustments (i.e., that have negative
>>  sizes), because we can't know how far they are from each
>>  other.  */
>>   if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
>> return -1;
>>   /* If decls are different or we know by offsets that there is no 
>> overlap,
>>  we win.  */
>>   if (!cmp || !offset_overlap_p (c, xsize, ysize))
>> return 0;
>>
>> This somewhat contradicts:
>>
>>   /* In general we assume that memory locations pointed to by different 
>> labels
>>  may overlap in undefined ways.  */
>
> Sorry for not chiming in earlier but isn't the bug that
>
>
>
>> at the end of compare_base_symbol_refs.  In other words, we're taking -1
>> to mean that either (a) the symbols are equal (via aliasing) or (b) the
>> references access non-overlapping objects.
>>
>> But even assuming that's true for normal symbols, it doesn't cope
>> correctly with section anchors.  If a symbol X at ANCHOR+OFFSET
>> is preemptible, either (a) X = ANDHOR+OFFSET or (b) X and ANCHOR
>> reference non-overlapping objects.
>>
>> And an offset-based comparison makes no sense for an anchor symbol
>> vs. a bare symbol with no decl.  If the bare symbol is allowed to
>> alias other symbols then it can surely alias any symbol in the
>> anchor's block, so there are multiple anchor offsets that might
>> induce an alias.
>
> But then isn't the simple fix to honor the -1 and do
>
> diff --git a/gcc/alias.c b/gcc/alias.c
> index 3794f9b6a9e..bf13d37c0f7 100644
> --- a/gcc/alias.c
> +++ b/gcc/alias.c
> @@ -2490,9 +2490,8 @@ memrefs_conflict_p (poly_int64 xsize, rtx x,
> poly_int64 ysize, rtx y,
>  other.  */
>if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
> return -1;
> -  /* If decls are different or we know by offsets that there is no 
> overlap,
> -we win.  */
> -  if (!cmp || !offset_overlap_p (c, xsize, ysize))
> +  /* If decls are different, we win.  */
> +  if (cmp == 0)
> return 0;
>/* Decls may or may not be different and offsets overlap*/
>return -1;
>
> ?

The code was deliberately written this way from the ouset though
(rather than it ending up like this through many cuts).  It was
added in g:54363f8a92920f5559c83ddd53e480a27205e6b7:

2015-12-08  Jan Hubicka  

PR ipa/61886
PR middle-end/25140
* tree-ssa-alias.c (ptr_deref_may_alias_decl_p): Use compare_base_decls
(nonoverlapping_component_refs_of_decl_p): Update sanity check.
(decl_refs_may_alias_p): Use compare_base_decls.
* alias.c: Include cgraph.h
(get_alias_set): Add cut-off for recursion.
(rtx_equal_for_memref_p): Use rtx_equal_for_memref_p.
(compare_base_decls): New function.
(base_alias_check): Likewise.
(memrefs_conflict_p): Likewise.
(nonoverlapping_memrefs_p): Likewise.
* alias.h (compare_base_decls): Declare.

which included:

-  if (rtx_equal_for_memref_p (x, y))
+  if (GET_CODE (x) == SYMBOL_REF && GET_CODE (y) == SYMBOL_REF)
+{
+  tree x_decl = SYMBOL_REF_DECL (x);
+  tree y_decl = SYMBOL_REF_DECL (y);
+  int cmp;
+
+  if (!x_decl || !y_decl)
+   {
+ /* Label and normal symbol are never the same. */
+ if (x_decl != y_decl)
+   return 0;
+ return offset_overlap_p (c, xsize, ysize);
+   }
+  else
+cmp = compare_base_decls (x_decl,

Re: [PATCH] Fix -ffast-math flags handling inconsistencies

2020-02-07 Thread Ulrich Weigand

Richard Biener wrote:
> On Fri, Jan 31, 2020 at 6:01 PM Ulrich Weigand  wrote:
> > The overall effect of this patch is that now all component flags of
> > -ffast-math are treated exactly equivalently:
> >   * they are set (changed from their default) with -ffast-math
> >   * they are reset to their default with -fno-fast-math
> >   * __FAST_MATH__ is only defined if the value of the flag matches
> > what -ffast-math would have set it to
> 
> The last part is not obviously correct to me since it doesn't match
> documentation which says
> 
> @item -ffast-math
> @opindex ffast-math
> Sets the options @option{-fno-math-errno}, 
> @option{-funsafe-math-optimizations},
> @option{-ffinite-math-only}, @option{-fno-rounding-math},
> @option{-fno-signaling-nans}, @option{-fcx-limited-range} and
> @option{-fexcess-precision=fast}.
> 
> This option causes the preprocessor macro @code{__FAST_MATH__} to be defined.
> 
> to me this reads as -ffast-math -fexcess-precision=standard defines
> __FAST_MATH__.
> The only relevant part to define __FAST_MATH__ is specifying -ffast-math, 
> other
> options are not relevant (which of course is contradicted by
> implementation - where
> I didn't actually follow its history in that respect).  So can you
> adjust documentation
> as to when exactly __FAST_MATH__ is defined?

Agreed.  This should probably be worded something along the lines of:
"Whenever all of those options are set to these values as listed above,
the preprocessor macro @code{__FAST_MATH__} will be defined."

> > -  if (!opts->frontend_set_flag_signaling_nans)
> > -   opts->x_flag_signaling_nans = 0;
> > -  if (!opts->frontend_set_flag_rounding_math)
> > -   opts->x_flag_rounding_math = 0;
[...]
> > +  // -ffast-math should also reset -fsignaling-nans and -frounding-math,
> > +  // but since those are off by default, there's nothing to do for now.
> ...
> 
> but what happens to -fsignalling-nans -ffast-math then?  Better leave those
> in I'd say.

Ah, it seems I was confused about the intended semantics here.

I thought that a *more specific* option like -fsignalling-nans was always
intended to override a more generic option like -ffast-math, no matter
whether it comes before or after it on the command line.

Is it instead the case that the intended behavior is the generic option
overrides the specific option if the generic option comes later on the
command line?

In that case, I agree that those should be left in.

However, then I think it would be good to add tests for 
!flag_signaling_nans / !flag_rounding_math to fast_math_flags_set_p,
because with these options on, we aren't really in fast-math territory
any more ...

> Note frontends come into play with what is considered -ffast-math
> and -fno-fast-math but below flags are tested irrespectively of that
> interpretation.
> 
> Note there's -fcx-fortran-rules similar to -fcx-limited-range but not tested
> above.  The canonical middle-end "flag" to look at is flag_complex_method.
> Somehow -fcx-fortran-rules doesn't come into play at all above but it
> affects -fcx-limited-range in another inconsistent way in that
> -fcx-limited-range -fcx-fortran-rules and -fcx-fortran-rules 
> -fcx-limited-range
> behave the same (-fcx-fortran-rules takes precedence...).  I guess
> -fcomplex-method=ENUM should be exposed and -fcx-* made
> appropriate aliases here.

I agree it would probably be the best if to use a -fcomplex-method=...
flag, handled analogously to -fexcess-precision in that it defaults to
an explicit DEFAULT value; unless this is overridden by an explicit
command line flag, the language front-end can then choose what DEFAULT
means for this particular language.

However, I'd prefer to not include that change into this patch as well :-)
This patch only changes the behavior related to -fcx-limited-range in one
very special case (-Ofast -fno-fast-math), where it makes it strictly
better.  It should not affect the (existing) interaction with
-fcx-fortran-rules at all -- so anything to improve this can be done
in a separate patch, I think.

> You're tapping into a mine-field ;)

That's for sure :-)

Thanks,
Ulrich


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [ARM] Fix -mpure-code for v6m

2020-02-07 Thread Christophe Lyon

On Fri, 7 Feb 2020 at 14:49, Richard Earnshaw (lists)
 wrote:
>
> On 07/02/2020 13:19, Christophe Lyon wrote:
> > When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
> > for cortex-m0, I noticed that some testcases were failing because we
> > still generate "ldr rX, .LCY", which is what we want to avoid with
> > -mpure-code. This is latent since a recent improvement in fwprop
> > (PR88833).
> >
> > In this patch I change the thumb1_movsi_insn pattern so that it emits
> > the desired instruction sequence when arm_disable_literal_pool is set.
> >
> > I tried to add a define_split instead, but couldn't make it work: the
> > compiler then complains it cannot split the instruction, while my new
> > define_split accepts the same operand types as thumb1_movsi_insn:
> >
> > c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: could not split 
> > insn
> > (insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342])
> >  (symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 {*thumb1_movsi_insn}
> >   (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") [flags 0x2])
> >  (nil)))
> > during RTL pass: final
> >
> > (define_split
> >[(set (match_operand:SI 0 "register_operand" "")
> >  (match_operand:SI 1 "general_operand" ""))]
> >"TARGET_THUMB1
> > && arm_disable_literal_pool
> > && GET_CODE (operands[1]) == SYMBOL_REF"
> >[(clobber (const_int 0))]
> >"
> >  gen_thumb1_movsi_symbol_ref(operands[0], operands[1]);
> >  DONE;
> >"
> > )
> > and I put this in thumb1_movsi_insn:
> > if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool)
> >{
> >  return \"#\";
> >}
> >return \"ldr\\t%0, %1\";
> >
> > 2020-02-07  Christophe Lyon  
> >
> >  * config/arm/thumb1.md (thumb1_movsi_insn): Fix ldr alternative to
> >  work with -mpure-code.
> >
>
> +case 0:
> +case 1:
> +  return \"movs%0, %1\";
> +case 2:
> +  return \"movw%0, %1\";
>
> This is OK, but please replace the hard tab in the strings for MOVS/MOVW
> with \\t.
>

OK that was merely a cut & paste from the existing code.

I'm concerned that the length attribute is becoming wrong with my
patch, isn't this a problem?

Thanks,

Christophe

> R.

Re: [PATCH] Improve splitX passes management

2020-02-07 Thread Segher Boessenkool

On Thu, Feb 06, 2020 at 12:13:35PM +0100, Uros Bizjak wrote:
> The names of split_before_sched2 ("split4") and split_before_regstack
> ("split3") do not reflect their insertion point in the sequence of passes,
> where split_before_regstack follows split_before_sched2. Reorder the code
> and rename the passes to reflect the reality.

Renaming them to other splitN doesn't help much :-/  Having stable names
is more important (some archs actually use these names), I'd say.  But
it's hard to come up with shortish more meaningful names.

There is no real need for the N in splitN to be in order, but sure it
can be surprising otherwise.

> +bool
> +pass_split_before_regstack::gate (function *)
> +{
> +#if HAVE_ATTR_length && defined (STACK_REGS)
> +  /* If flow2 creates new instructions which need splitting
> + and scheduling after reload is not done, they might not be
> + split until final which doesn't allow splitting
> + if HAVE_ATTR_length.  */
> +  return !enable_split_before_sched2 ();
> +#else
> +  return false;
> +#endif
> +}

flow.c was deleted in 2006...  Is this split still needed at all?  If
so, change the comment please?  :-)

Segher

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-07 Thread Jeff Law

On Thu, 2020-02-06 at 14:16 +0100, Richard Biener wrote:
> On Thu, Feb 6, 2020 at 2:00 PM Jeff Law  wrote:
> > On Wed, 2020-02-05 at 09:19 +0100, Richard Biener wrote:
> > > On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
> > > > On 2/4/20 2:31 PM, Jeff Law wrote:
> > > > > On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> > > > > > On 2/4/20 12:15 PM, Richard Biener wrote:
> > > > > > > On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law 
> > > > > > >  wrote:
> > > > > > > > On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > > > > > > > > On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor 
> > > > > > > > >  wrote:
> > > > > > > > > > PR 93519 reports a false positive -Wrestrict issued for an 
> > > > > > > > > > inlined
> > > > > > > > call
> > > > > > > > > > to strcpy that carefully guards against self-copying.  This 
> > > > > > > > > > is
> > > > > > > > caused
> > > > > > > > > > by the caller's arguments substituted into the call during 
> > > > > > > > > > inlining
> > > > > > > > and
> > > > > > > > > > before dead code elimination.
> > > > > > > > > > 
> > > > > > > > > > The attached patch avoids this by removing -Wrestrict from 
> > > > > > > > > > the
> > > > > > > > folder
> > > > > > > > > > and deferring folding perfectly overlapping (and so 
> > > > > > > > > > undefined)
> > > > > > > > calls
> > > > > > > > > > to strcpy (and mempcpy, but not memcpy) until much later.  
> > > > > > > > > > Calls to
> > > > > > > > > > perfectly overlapping calls to memcpy are still folded 
> > > > > > > > > > early.
> > > > > > > > > 
> > > > > > > > > Why do we bother to warn at all for this case?  Just DWIM 
> > > > > > > > > here.
> > > > > > > > Warnings like
> > > > > > > > > this can be emitted from the analyzer?
> > > > > > > > They potentially can, but the analyzer is and will almost always
> > > > > > > > certainly be considerably slower.  I would not expect it to be 
> > > > > > > > used
> > > > > > > > nearly as much as the core compiler.
> > > > > > > > 
> > > > > > > > WHether or not a particular warning makes sense in the core 
> > > > > > > > compiler or
> > > > > > > > analyzer would seem to me to depend on whether or not we can 
> > > > > > > > reasonably
> > > > > > > > issue warnings without interprocedural analysis.  double-free
> > > > > > > > realistically requires interprocedural analysis to be 
> > > > > > > > effective.  I'm
> > > > > > > > not sure Wrestrict really does.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > That is, I suggest to simply remove the bogus warning code 
> > > > > > > > > from
> > > > > > > > folding
> > > > > > > > > (and _not_ fail the folding).
> > > > > > > > I haven't looked at the patch, but if we can get the warning 
> > > > > > > > out of the
> > > > > > > > folder that's certainly preferable.  And we could investigate 
> > > > > > > > deferring
> > > > > > > > self-copy removal.
> > > > > > > 
> > > > > > > I think the issue is as usual, warning for code we'll later 
> > > > > > > remove as dead. Warning at folding is almost always premature.
> > > > > > 
> > > > > > In this instance the code is reachable (or isn't obviously 
> > > > > > unreachable).
> > > > > > GCC doesn't remove it, but provides benign (and reasonable) 
> > > > > > semantics
> > > > > > for it(*).  To me, that's one aspect of quality.  Letting the user 
> > > > > > know
> > > > > > that the code is buggy is another.  I view that as at least as 
> > > > > > important
> > > > > > as folding the ill-effects away because it makes it possible to fix
> > > > > > the problem so the code works correctly even with compilers that 
> > > > > > don't
> > > > > > provide these benign semantics.
> > > > > If you look at the guts of what happens at the point where we issue 
> > > > > the
> > > > > warning from within gimple_fold_builtin_strcpy we have:
> > > > > 
> > > > > > DCH_to_char (char * in, char * out, int collid)
> > > > > > {
> > > > > >int type;
> > > > > >char * D.2148;
> > > > > >char * dest;
> > > > > >char * num;
> > > > > >long unsigned int _4;
> > > > > >char * _5;
> > > > > > 
> > > > > > ;;   basic block 2, loop depth 0
> > > > > > ;;pred:   ENTRY
> > > > > > ;;succ:   4
> > > > > > 
> > > > > > ;;   basic block 4, loop depth 0
> > > > > > ;;pred:   2
> > > > > > ;;succ:   5
> > > > > > 
> > > > > > ;;   basic block 5, loop depth 0
> > > > > > ;;pred:   4
> > > > > > ;;succ:   6
> > > > > > 
> > > > > > ;;   basic block 6, loop depth 0
> > > > > > ;;pred:   5
> > > > > >if (0 != 0)
> > > > > >  goto ; [53.47%]
> > > > > >else
> > > > > >  goto ; [46.53%]
> > > > > > ;;succ:   7
> > > > > > ;;8
> > > > > > 
> > > > > > ;;   basic block 7, loop depth 0
> > > > > > ;;pred:   6
> > > > > >strcpy (out_1(D), out_1(D));
> > > > > > ;;succ:   8
> > > > > > 
> > > > > > ;;   basic block 8, loop depth 0
> > > > > > ;;pred:

Re: [PATCH 4/3] Add [range.istream]

2020-02-07 Thread Jonathan Wakely


On 07/02/20 09:46 -0500, Patrick Palka wrote:

Fixed and committed with that change.  Thanks for the review!


I've just tested and pushed this fix.

commit 572992c8920d5339a3ac28d442c436d6daa0bfae
Author: Jonathan Wakely 
Date:   Fri Feb 7 16:06:43 2020 +

libstdc++ Fix missing return in istream_view iterator

* include/std/ranges (iota_view): Add braces to prevent -Wempty-body
warning.
(basic_istream_view::_Iterator::operator++()): Add missing return.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index dd0c5cf6aa7..891ecf75eff 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -872,7 +872,9 @@ namespace ranges
   : _M_value(__value), _M_bound(__bound)
   {
 	if constexpr (totally_ordered_with<_Winc, _Bound>)
-	  __glibcxx_assert( bool(__value <= __bound) );
+	  {
+	__glibcxx_assert( bool(__value <= __bound) );
+	  }
   }
 
   constexpr _Iterator
@@ -1012,6 +1014,7 @@ namespace views
 	{
 	  __glibcxx_assert(_M_parent->_M_stream != nullptr);
 	  *_M_parent->_M_stream >> _M_parent->_M_object;
+	  return *this;
 	}
 
 	void

Re: [PATCH] libstdc++: Optimize C++20 comparison category types

2020-02-07 Thread Daniel Krügler

Am Fr., 7. Feb. 2020 um 15:23 Uhr schrieb Jonathan Wakely :
>
> On 07/02/20 10:04 +0100, Daniel Krügler wrote:
> >Am Do., 6. Feb. 2020 um 15:28 Uhr schrieb Jonathan Wakely 
> >:
> >>
> >> On 06/02/20 13:53 +, Jonathan Wakely wrote:
> >> >On 06/02/20 13:40 +, Jonathan Wakely wrote:
> >> >>This reduces sizeof(std::partial_ordering) and optimizes conversion and
> >> >>comparison operators to avoid conditional branches where possible.
> >> >>
> >> >>  * libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 
> >> >> 2.
> >> >>  (partial_ordering::_M_is_ordered): Remove data member.
> >> >>  (partial_ordering): Use second bit of _M_value for unordered. 
> >> >> Adjust
> >> >>  comparison operators.
> >> >>  (weak_ordering::operator partial_ordering): Simplify to remove
> >> >>  branches.
> >> >>  (operator<=>(unspecified, weak_ordering)): Likewise.
> >> >>  (strong_ordering::operator partial_ordering): Likewise.
> >> >>  (strong_ordering::operator weak_ordering): Likewise.
> >> >>  (operator<=>(unspecified, strong_ordering)): Likewise.
> >> >>  * testsuite/18_support/comparisons/categories/partialord.cc: New 
> >> >> test.
> >> >>  * testsuite/18_support/comparisons/categories/strongord.cc: New 
> >> >> test.
> >> >>  * testsuite/18_support/comparisons/categories/weakord.cc: New test.
> >> >>
> >> >>Tested powerpc64le-linux and x86_64-linux.
> >> >>
> >> >>This is an ABI change for the partial_ordering type, but that is why I
> >> >>think we should do it now, not after GCC 10 is released. The sooner
> >> >>the better, before these types are being widely used.
> >> >>
> >> >>I plan to commit this in the next 12 hours or so, unless there are
> >> >>(valid :-) objections.
> >> >>
> >> >>Thanks to Barry Revzin for pointing out there was room for these
> >> >>operators to be improved.
> >> >
> >> >We could also change the int _M_value data member of all three
> >> >comparison category types to be a signed char instead of int. That
> >> >would reduce the size further.
> >>
> >> Or maybe std::int_fast8_t is the right type here.
> >
> >I like this suggestion.
>
> After discussing it with Jakub, I've decided to just use signed char.
>
> In theory int_fast8_t seems like the right choice, but it depends on
> how "fast" is interpreted. Fast for which operations? Is it actually
> going to be faster for the simple comparisons to constants and the
> bit masks that I use in ?
>
> On a number of operating systems int_fast8_t is always int, even
> though byte accesses are no slower on many of the processors whre that
> OS runs.
>
> So we decided that unconditionally using a single byte will give us a
> small size and alignment, without sacrificing any performance.

Yes, I agree with your decision.

- Daniel

Re: [RFA] [PR rtl-optimization/90275] Handle nop reg->reg copies in cse

2020-02-07 Thread Jeff Law

On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:
> On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> > Yea, it's closely related.  In your case you need to effectively ignore
> > the nop insn to get the optimization you want.  In mine that nop insn
> > causes an ICE.
> > 
> > I think we could take your cse bits + adding a !CALL_P separately from
> > the simplify-rtx stuff which Segher objected to.  THat'd likely solve
> > the ARM ICEs and take you a tiny step forward on optimizing that SVE
> > case.  Thoughts?
> 
> CSE should consistently keep track of what insns are no-op moves (in its
> definition, all passes have a slightly different definition of this),
> and use that everywhere consistently.
So does that mean you object to the cse.c portion of Richard's patch?

Jeff
>

Re: [PATCH] xfail and improve some failing libgomp tests

2020-02-07 Thread Jakub Jelinek

On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
> * {target-32.c, thread-limit-2.c}:
> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690

Please don't, I want to deal with that using declare variant, just didn't
get yet around to finishing the last patch needed for that.  Will try next week.

> * target-{33,34}.c:
> no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. 
> https://gcc.gnu.org/PR81688
> 
> * target-link-1.c:
> omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689

I guess this is ok, though of course the right thing would be to implement
both.  There has been even in some PR a suggestion that instead of failing
in nvptx async_run we should just ignore the nowait clause if the plugin
doesn't implement it properly.

Jakub

Re: [PATCH][AArch64] ACLE 8-bit integer matrix multiply-accumulate intrinsics

2020-02-07 Thread Dennis Zhang


Hi all,

On 27/01/2020 13:01, Richard Sandiford wrote:

Dennis Zhang  writes:

[...]
gcc/ChangeLog:

2020-01-23  Dennis Zhang  

* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SSUS): New macro.
* config/aarch64/aarch64-simd-builtins.def (simd_smmla): New.
(simd_ummla, simd_usmmla): New.
* config/aarch64/aarch64-simd.md (aarch64_simd_mmlav16qi): New.
* config/aarch64/arm_neon.h (vmmlaq_s32, vmmlaq_u32): New.
(vusmmlaq_s32): New.
* config/aarch64/iterators.md (unspec): Add UNSPEC_SMATMUL,
UNSPEC_UMATMUL, and UNSPEC_USMATMUL.
(sur): Likewise.
(MATMUL): New iterator.

gcc/testsuite/ChangeLog:

2020-01-23  Dennis Zhang  

* gcc.target/aarch64/simd/vmmla.c: New test.


OK, thanks.

One note below...


diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index f0e0461b7f0..033a6d4e92f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -176,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned,
qualifier_unsigned, qualifier_immediate };
  #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
  
  
  static enum aarch64_type_qualifiers

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 57fc5933b43..885c2540514 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -682,3 +682,8 @@
BUILTIN_VSFDF (UNOP, frint32x, 0)
BUILTIN_VSFDF (UNOP, frint64z, 0)
BUILTIN_VSFDF (UNOP, frint64x, 0)
+
+  /* Implemented by aarch64_simd_mmlav16qi.  */
+  VAR1 (TERNOP, simd_smmla, 0, v16qi)
+  VAR1 (TERNOPU, simd_ummla, 0, v16qi)
+  VAR1 (TERNOP_SSUS, simd_usmmla, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 2989096b170..b7659068b7d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7025,3 +7025,15 @@
"xtn\t%0., %1."
[(set_attr "type" "neon_shift_imm_narrow_q")]
  )
+
+;; 8-bit integer matrix multiply-accumulate
+(define_insn "aarch64_simd_mmlav16qi"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+   (plus:V4SI
+(unspec:V4SI [(match_operand:V16QI 2 "register_operand" "w")
+  (match_operand:V16QI 3 "register_operand" "w")] MATMUL)
+(match_operand:V4SI 1 "register_operand" "0")))]
+  "TARGET_I8MM"
+  "mmla\\t%0.4s, %2.16b, %3.16b"
+  [(set_attr "type" "neon_mla_s_q")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index eaba156e26c..918000d98dc 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34609,6 +34609,36 @@ vrnd64xq_f64 (float64x2_t __a)
  
  #pragma GCC pop_options
  
+/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */

+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+/* Matrix Multiply-Accumulate.  */
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_smmlav16qi (__r, __a, __b);
+}
+
+__extension__ extern __inline uint32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmmlaq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
+{
+  return __builtin_aarch64_simd_ummlav16qi_ (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusmmlaq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
+{
+  return __builtin_aarch64_simd_usmmlav16qi_ssus (__r, __a, __b);
+}
+
+#pragma GCC pop_options
+
  #include "arm_bf16.h"
  
  #undef __aarch64_vget_lane_any

diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index b9843b83c5f..57aca36f646 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -581,6 +581,9 @@
  UNSPEC_FMLSL  ; Used in aarch64-simd.md.
  UNSPEC_FMLAL2 ; Used in aarch64-simd.md.
  UNSPEC_FMLSL2 ; Used in aarch64-simd.md.
+UNSPEC_SMATMUL ; Used in aarch64-simd.md.
+UNSPEC_UMATMUL ; Used in aarch64-simd.md.
+UNSPEC_USMATMUL; Used in aarch64-simd.md.
  UNSPEC_ADR; Used in aarch64-sve.md.
  UNSPEC_SEL; Used in aarch64-sve.md.
  UNSPEC_BRKA   ; Used in aarch64-sve.md.
@@ -2531,6 +2534,8 @@
  
  (define_int_iterator SVE_PITER [UNSPEC_PFIRST UNSPEC_PNEXT])
  
+(define_int_iterator MATMUL [UNSPEC_SMATMUL UNSPEC_UMATMUL UNSPEC_USMATMUL])

+
  ;

Re: [PATCH 4/3] Add [range.istream]

2020-02-07 Thread Patrick Palka

On Fri, 7 Feb 2020, Jonathan Wakely wrote:

> On 06/02/20 19:52 -0500, Patrick Palka wrote:
> > This patch adds ranges::basic_istream_view and ranges::istream_view.  This
> > seems
> > to be the last missing part of the ranges header.
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * include/std/ranges (ranges::__detail::__stream_extractable,
> > ranges::basic_istream_view, ranges::istream_view): Define.
> > * testsuite/std/ranges/istream_view: New test.
> > ---
> > libstdc++-v3/include/std/ranges   | 94 +++
> > .../testsuite/std/ranges/istream_view.cc  | 76 +++
> > 2 files changed, 170 insertions(+)
> > create mode 100644 libstdc++-v3/testsuite/std/ranges/istream_view.cc
> > 
> > diff --git a/libstdc++-v3/include/std/ranges
> > b/libstdc++-v3/include/std/ranges
> > index 8a8fefb6f19..88b98310ef9 100644
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -951,6 +951,100 @@ namespace views
> >   inline constexpr _Iota iota{};
> > } // namespace views
> > 
> > +  namespace __detail
> > +  {
> > +template
> > +  concept __stream_extractable
> > +   = requires(basic_istream<_CharT, _Traits>& is, _Val& t) { is >> t; };
> 
> I was going to ask for "is" and "t" to use reserved names, but those
> names are actually reserved. std::ctype::is is present since C++98 and
> std::binomial_distribution::t() since C++11. So the names are OK.

Phew! :)  I just forgot to uglify those names.

> 
> > +  } // namespace __detail
> > +
> > +  template
> > +requires default_initializable<_Val>
> > +  && __detail::__stream_extractable<_Val, _CharT, _Traits>
> > +class basic_istream_view
> > +: public view_interface>
> > +{
> > +public:
> > +  basic_istream_view() = default;
> > +
> > +  constexpr explicit
> > +  basic_istream_view(basic_istream<_CharT, _Traits>& __stream)
> > +   : _M_stream(std::__addressof(__stream))
> > +  { }
> > +
> > +  constexpr auto
> > +  begin()
> > +  {
> > +   if (_M_stream != nullptr)
> > + *_M_stream >> _M_object;
> > +   return _Iterator{*this};
> > +  }
> > +
> > +  constexpr default_sentinel_t
> > +  end() const noexcept
> > +  { return default_sentinel; }
> > +
> > +private:
> > +  basic_istream<_CharT, _Traits>* _M_stream = nullptr;
> > +  _Val _M_object = _Val();
> > +
> > +  struct _Iterator
> > +  {
> > +  public:
> > +   using iterator_category = input_iterator_tag;
> > +   using difference_type = ptrdiff_t;
> > +   using value_type = _Val;
> > +
> > +   _Iterator() = default;
> > +
> > +   constexpr explicit
> > +   _Iterator(basic_istream_view& __parent) noexcept
> > + : _M_parent(std::__addressof(__parent))
> > +   { }
> > +
> > +   _Iterator(const _Iterator&) = delete;
> > +   _Iterator(_Iterator&&) = default;
> > +   _Iterator& operator=(const _Iterator&) = delete;
> > +   _Iterator& operator=(_Iterator&&) = default;
> > +
> > +   _Iterator&
> > +   operator++()
> > +   {
> > + __glibcxx_assert(_M_parent->_M_stream != nullptr);
> > + *_M_parent->_M_stream >> _M_parent->_M_object;
> > +   }
> > +
> > +   void
> > +   operator++(int)
> > +   { ++*this; }
> > +
> > +   _Val&
> > +   operator*() const
> > +   {
> > + __glibcxx_assert(_M_parent->_M_stream != nullptr);
> > + return _M_parent->_M_object;
> > +   }
> > +
> > +   friend bool
> > +   operator==(const _Iterator& __x, default_sentinel_t)
> > +   { return __x.__at_end(); }
> > +
> > +  private:
> > +   basic_istream_view* _M_parent = nullptr;
> > +
> > +   bool
> > +   __at_end() const
> 
> Please rename this to _M_at_end for consistency with the rest of the
> library.
> 
> OK for master with that tweak, thanks.

Fixed and committed with that change.  Thanks for the review!

Re: [PATCH 3/3] libstdc++: Implement C++20 range adaptors

2020-02-07 Thread Patrick Palka

On Fri, 7 Feb 2020, Jonathan Wakely wrote:

> On 06/02/20 18:53 -0500, Patrick Palka wrote:
> > On Thu, 6 Feb 2020, Jonathan Wakely wrote:
> > > > +#ifdef __cpp_lib_threeway_comparison
> > > 
> > > This macro is mispelled, should be three_way with an underscore.
> > 
> > Oops!  It looks like it's also mispelled in the definition of iota_view
> > earlier in this file.
> 
> Oops, yes, my fault then. Fixed with this patch, tested
> powerpc64le-linux and committed to master. I've reported the defect
> with the return types to the LWG chair.
> 
> Your incremental changes look good, please squash them into the base
> patch and push that to master. Thanks.

Thanks for the review.  I just committed the squashed version after a
successful regtest.

Re: [PATCH] libstdc++: Optimize C++20 comparison category types

2020-02-07 Thread Jonathan Wakely

On 07/02/20 10:04 +0100, Daniel Krügler wrote:

Am Do., 6. Feb. 2020 um 15:28 Uhr schrieb Jonathan Wakely :

On 06/02/20 13:53 +, Jonathan Wakely wrote:
>On 06/02/20 13:40 +, Jonathan Wakely wrote:
>>This reduces sizeof(std::partial_ordering) and optimizes conversion and
>>comparison operators to avoid conditional branches where possible.
>>
>>  * libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 2.
>>  (partial_ordering::_M_is_ordered): Remove data member.
>>  (partial_ordering): Use second bit of _M_value for unordered. Adjust
>>  comparison operators.
>>  (weak_ordering::operator partial_ordering): Simplify to remove
>>  branches.
>>  (operator<=>(unspecified, weak_ordering)): Likewise.
>>  (strong_ordering::operator partial_ordering): Likewise.
>>  (strong_ordering::operator weak_ordering): Likewise.
>>  (operator<=>(unspecified, strong_ordering)): Likewise.
>>  * testsuite/18_support/comparisons/categories/partialord.cc: New test.
>>  * testsuite/18_support/comparisons/categories/strongord.cc: New test.
>>  * testsuite/18_support/comparisons/categories/weakord.cc: New test.
>>
>>Tested powerpc64le-linux and x86_64-linux.
>>
>>This is an ABI change for the partial_ordering type, but that is why I
>>think we should do it now, not after GCC 10 is released. The sooner
>>the better, before these types are being widely used.
>>
>>I plan to commit this in the next 12 hours or so, unless there are
>>(valid :-) objections.
>>
>>Thanks to Barry Revzin for pointing out there was room for these
>>operators to be improved.
>
>We could also change the int _M_value data member of all three
>comparison category types to be a signed char instead of int. That
>would reduce the size further.

Or maybe std::int_fast8_t is the right type here.

I like this suggestion.

After discussing it with Jakub, I've decided to just use signed char.

In theory int_fast8_t seems like the right choice, but it depends on
how "fast" is interpreted. Fast for which operations? Is it actually
going to be faster for the simple comparisons to constants and the
bit masks that I use in ?

On a number of operating systems int_fast8_t is always int, even
though byte accesses are no slower on many of the processors whre that
OS runs.

So we decided that unconditionally using a single byte will give us a
small size and alignment, without sacrificing any performance.

Here's what I've tested (powerpc64le-linux and x86_64-linux) and
committed to master.

commit 0d57370c9cc3c1fb68be96b8cc15b92496c4dd21
Author: Jonathan Wakely 
Date:   Thu Feb 6 13:31:36 2020 +

libstdc++: Optimize C++20 comparison category types

This reduces the size and alignment of all three comparison category
types to a single byte. The partial_ordering::_M_is_ordered flag is
replaced by the value 0x02 in the _M_value member.

This also optimizes conversion and comparison operators to avoid
conditional branches where possible, by comparing _M_value to constants
or using bitwise operations to correctly handle the unordered state.

* libsupc++/compare (__cmp_cat::type): Define typedef for underlying
type of enumerations and comparison category types.
(__cmp_cat::_Ord, __cmp_cat::_Ncmp): Add underlying type.
(__cmp_cat::_Ncmp::unordered): Change value to 2.
(partial_ordering::_M_value, weak_ordering::_M_value)
(strong_ordering::_M_value): Change type to __cmp_cat::type.
(partial_ordering::_M_is_ordered): Remove data member.
(partial_ordering): Use second bit of _M_value for unordered. Adjust
comparison operators.
(weak_ordering::operator partial_ordering): Simplify to remove
branches.
(operator<=>(unspecified, weak_ordering)): Likewise.
(strong_ordering::operator partial_ordering): Likewise.
(strong_ordering::operator weak_ordering): Likewise.
(operator<=>(unspecified, strong_ordering)): Likewise.
* testsuite/18_support/comparisons/categories/partialord.cc: New test.
* testsuite/18_support/comparisons/categories/strongord.cc: New test.
* testsuite/18_support/comparisons/categories/weakord.cc: New test.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index a7a29ef0440..b2d64ef74a4 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -48,9 +48,11 @@ namespace std

   namespace __cmp_cat
   {
-enum class _Ord { equivalent = 0, less = -1, greater = 1 };
+using type = signed char;

-enum class _Ncmp { _Unordered = -127 };
+enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
+
+enum class _Ncmp : type { _Unordered = 2 };

 struct __unspec
 {
@@ -60,19 +62,22 @@ namespace std

   class partial_ordering
   {
-int _M_value;
-bool _M_is_ordered;

Re: [PATCH][AARCH64] Fix for PR86901

2020-02-07 Thread Wilco Dijkstra

Hi,

Richard wrote:
> However, inside the compiler we really want to represent this as a 
>shift.
...
> Ideally this would be handled inside the mid-end expansion of an 
> extract, but in the absence of that I think this is best done inside the 
> extv expansion so that we never end up with a real extract in that case.

Yes the mid-end could be improved - it turns out it is due to expansion of
bitfields, all variations of (x & mask) >> N are optimized into shifts early on.

However it turns out Combine can already transform these zero/sign_extends
into shifts, so we do end up with good code. With the latest patch I get:

typedef struct { int x : 6, y : 6, z : 20; } X;

int f (int x, X *p) { return x + p->z; }

ldr w1, [x1]
add w0, w0, w1, asr 12
ret

So this case looks alright.

> Sounds good. I'll get those setup and running and will report back on 
> findings. What's
> the preferred way to measure codesize? I'm assuming by default the code pages 
> are 
> aligned so smaller differences would need to trip over the boundary to 
> actually show up.

You can use the size command on the binaries:

>size /bin/ls
   textdata bss dec hex filename
 10727120243472  112767   1b87f /bin/ls

As you can see it shows the text size in bytes. It is not rounded up to a page, 
so it is an 
accurate measure of the codesize. Generally -O2 size is most useful to check 
(since that
is what most applications build with), but -Ofast -flto can be useful as well 
(the global 
inlining means you get instruction combinations which appear less often with 
-O2).

Cheers,
Wilco

[COMMITTED] c++: Fix ICE on nonsense requires-clause.

2020-02-07 Thread Jason Merrill

Here we were swallowing all the syntax errors by parsing tentatively, and
returning error_mark_node without ever actually giving an error.  Fixed by
using save_tokens/rollback_tokens instead.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/92517
* parser.c (cp_parser_constraint_primary_expression): Do the main
parse non-tentatively.
---
 gcc/cp/parser.c   | 17 +++--
 gcc/testsuite/g++.dg/cpp2a/concepts-syntax1.C |  9 +
 2 files changed, 16 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-syntax1.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e0f72302e5e..d4c9523289f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -27478,7 +27478,7 @@ cp_parser_constraint_primary_expression (cp_parser 
*parser, bool lambda_p)
   return e;
 }
 
-  cp_parser_parse_tentatively (parser);
+  cp_lexer_save_tokens (parser->lexer);
   cp_id_kind idk;
   location_t loc = input_location;
   cp_expr expr = cp_parser_primary_expression (parser,
@@ -27494,19 +27494,16 @@ cp_parser_constraint_primary_expression (cp_parser 
*parser, bool lambda_p)
   /* The primary-expression could be part of an unenclosed non-logical
 compound expression.  */
   pce = cp_parser_constraint_requires_parens (parser, lambda_p);
-  if (pce != pce_ok)
-   cp_parser_simulate_error (parser);
-  else
-   expr = finish_constraint_primary_expr (expr);
 }
-  if (cp_parser_parse_definitely (parser))
-return expr;
-  if (expr == error_mark_node)
-return error_mark_node;
+  if (pce == pce_ok)
+{
+  cp_lexer_commit_tokens (parser->lexer);
+  return finish_constraint_primary_expr (expr);
+}
 
   /* Retry the parse at a lower precedence. If that succeeds, diagnose the
  error, but return the expression as if it were valid.  */
-  gcc_assert (pce != pce_ok);
+  cp_lexer_rollback_tokens (parser->lexer);
   cp_parser_parse_tentatively (parser);
   if (pce == pce_maybe_operator)
 expr = cp_parser_assignment_expression (parser, NULL, false, false);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-syntax1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-syntax1.C
new file mode 100644
index 000..0a47682c456
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-syntax1.C
@@ -0,0 +1,9 @@
+// PR c++/92517
+// { dg-do compile { target concepts } }
+
+template 
+concept C = true;
+
+template
+requires C decltype // { dg-error "" }
+void f() {}

base-commit: 3c7a03bc360c3511fae3747a71e579e9fd0824f9
-- 
2.18.1

Re: [PATCH 2/3] libstdc++: Implement C++20 constrained algorithms

2020-02-07 Thread Jonathan Wakely


On 03/02/20 21:07 -0500, Patrick Palka wrote:

+  template
+struct binary_transform_result
+{
+  [[no_unique_address]] _Iter1 in1;
+  [[no_unique_address]] _Iter2 in2;
+  [[no_unique_address]] _Out  out;
+
+  template
+   requires convertible_to &&
+ && convertible_to


WHAT IS HAPPENING HERE?!

Notice we have requires A && && B

I'm fixing it, but that needs following up to see if there's a
compiler bug!

Re: [ARM] Fix -mpure-code for v6m

2020-02-07 Thread Richard Earnshaw (lists)


On 07/02/2020 13:19, Christophe Lyon wrote:

When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
for cortex-m0, I noticed that some testcases were failing because we
still generate "ldr rX, .LCY", which is what we want to avoid with
-mpure-code. This is latent since a recent improvement in fwprop
(PR88833).

In this patch I change the thumb1_movsi_insn pattern so that it emits
the desired instruction sequence when arm_disable_literal_pool is set.

I tried to add a define_split instead, but couldn't make it work: the
compiler then complains it cannot split the instruction, while my new
define_split accepts the same operand types as thumb1_movsi_insn:

c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: could not split insn
(insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342])
 (symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 {*thumb1_movsi_insn}
  (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") [flags 0x2])
 (nil)))
during RTL pass: final

(define_split
   [(set (match_operand:SI 0 "register_operand" "")
 (match_operand:SI 1 "general_operand" ""))]
   "TARGET_THUMB1
&& arm_disable_literal_pool
&& GET_CODE (operands[1]) == SYMBOL_REF"
   [(clobber (const_int 0))]
   "
 gen_thumb1_movsi_symbol_ref(operands[0], operands[1]);
 DONE;
   "
)
and I put this in thumb1_movsi_insn:
if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool)
   {
 return \"#\";
   }
   return \"ldr\\t%0, %1\";

2020-02-07  Christophe Lyon  

 * config/arm/thumb1.md (thumb1_movsi_insn): Fix ldr alternative to
 work with -mpure-code.



+case 0:
+case 1:
+  return \"movs   %0, %1\";
+case 2:
+  return \"movw   %0, %1\";

This is OK, but please replace the hard tab in the strings for MOVS/MOVW 
with \\t.


R.

[ARM] Fix -mpure-code for v6m

2020-02-07 Thread Christophe Lyon

When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
for cortex-m0, I noticed that some testcases were failing because we
still generate "ldr rX, .LCY", which is what we want to avoid with
-mpure-code. This is latent since a recent improvement in fwprop
(PR88833).

In this patch I change the thumb1_movsi_insn pattern so that it emits
the desired instruction sequence when arm_disable_literal_pool is set.

I tried to add a define_split instead, but couldn't make it work: the
compiler then complains it cannot split the instruction, while my new
define_split accepts the same operand types as thumb1_movsi_insn:

c-c++-common/torture/complex-sign-mixed-add.c:41:1: error: could not split insn
(insn 2989 425 4844 (set (reg/f:SI 3 r3 [1342])
(symbol_ref/u:SI ("*.LC6") [flags 0x2])) 836 {*thumb1_movsi_insn}
 (expr_list:REG_EQUIV (symbol_ref/u:SI ("*.LC6") [flags 0x2])
(nil)))
during RTL pass: final

(define_split
  [(set (match_operand:SI 0 "register_operand" "")
(match_operand:SI 1 "general_operand" ""))]
  "TARGET_THUMB1
   && arm_disable_literal_pool
   && GET_CODE (operands[1]) == SYMBOL_REF"
  [(clobber (const_int 0))]
  "
gen_thumb1_movsi_symbol_ref(operands[0], operands[1]);
DONE;
  "
)
and I put this in thumb1_movsi_insn:
if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool)
  {
return \"#\";
  }
  return \"ldr\\t%0, %1\";

2020-02-07  Christophe Lyon  

* config/arm/thumb1.md (thumb1_movsi_insn): Fix ldr alternative to
work with -mpure-code.
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 613cf9c..a722194 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -696,17 +696,43 @@
   "TARGET_THUMB1
&& (   register_operand (operands[0], SImode)
|| register_operand (operands[1], SImode))"
-  "@
-   movs%0, %1
-   movs%0, %1
-   movw%0, %1
-   #
-   #
-   ldmia\\t%1, {%0}
-   stmia\\t%0, {%1}
-   ldr\\t%0, %1
-   str\\t%1, %0
-   mov\\t%0, %1"
+  "*
+  switch (which_alternative)
+  {
+case 0:
+case 1:
+  return \"movs%0, %1\";
+case 2:
+  return \"movw%0, %1\";
+case 3:
+case 4:
+  return \"#\";
+case 5:
+  return \"ldmia\\t%1, {%0}\";
+case 6:
+  return \"stmia\\t%0, {%1}\";
+case 7:
+  /* Cannot load it directly, split to build it via MOV / LSLS / ADDS.  */
+  if (GET_CODE (operands[1]) == SYMBOL_REF && arm_disable_literal_pool)
+{
+  output_asm_insn (\"movs\\t%0, #:upper8_15:%1\", operands);
+  output_asm_insn (\"lsls\\t%0, #8\", operands);
+  output_asm_insn (\"adds\\t%0, #:upper0_7:%1\", operands);
+  output_asm_insn (\"lsls\\t%0, #8\", operands);
+  output_asm_insn (\"adds\\t%0, #:lower8_15:%1\", operands);
+  output_asm_insn (\"lsls\\t%0, #8\", operands);
+  output_asm_insn (\"adds\\t%0, #:lower0_7:%1\", operands);
+  return \"\";
+}
+  else
+return \"ldr\\t%0, %1\";
+case 8:
+  return \"str\\t%1, %0\";
+case 9:
+  return \"mov\\t%0, %1\";
+default:
+  gcc_unreachable ();
+  }"
   [(set_attr "length" "2,2,4,4,4,2,2,2,2,2")
(set_attr "type" 
"mov_reg,mov_imm,mov_imm,multiple,multiple,load_4,store_4,load_4,store_4,mov_reg")
(set_attr "pool_range" "*,*,*,*,*,*,*,1018,*,*")

Re: [PATCH 4/3] Add [range.istream]

2020-02-07 Thread Jonathan Wakely


On 06/02/20 19:52 -0500, Patrick Palka wrote:

This patch adds ranges::basic_istream_view and ranges::istream_view.  This seems
to be the last missing part of the ranges header.

libstdc++-v3/ChangeLog:

* include/std/ranges (ranges::__detail::__stream_extractable,
ranges::basic_istream_view, ranges::istream_view): Define.
* testsuite/std/ranges/istream_view: New test.
---
libstdc++-v3/include/std/ranges   | 94 +++
.../testsuite/std/ranges/istream_view.cc  | 76 +++
2 files changed, 170 insertions(+)
create mode 100644 libstdc++-v3/testsuite/std/ranges/istream_view.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 8a8fefb6f19..88b98310ef9 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -951,6 +951,100 @@ namespace views
  inline constexpr _Iota iota{};
} // namespace views

+  namespace __detail
+  {
+template
+  concept __stream_extractable
+   = requires(basic_istream<_CharT, _Traits>& is, _Val& t) { is >> t; };


I was going to ask for "is" and "t" to use reserved names, but those
names are actually reserved. std::ctype::is is present since C++98 and
std::binomial_distribution::t() since C++11. So the names are OK.


+  } // namespace __detail
+
+  template
+requires default_initializable<_Val>
+  && __detail::__stream_extractable<_Val, _CharT, _Traits>
+class basic_istream_view
+: public view_interface>
+{
+public:
+  basic_istream_view() = default;
+
+  constexpr explicit
+  basic_istream_view(basic_istream<_CharT, _Traits>& __stream)
+   : _M_stream(std::__addressof(__stream))
+  { }
+
+  constexpr auto
+  begin()
+  {
+   if (_M_stream != nullptr)
+ *_M_stream >> _M_object;
+   return _Iterator{*this};
+  }
+
+  constexpr default_sentinel_t
+  end() const noexcept
+  { return default_sentinel; }
+
+private:
+  basic_istream<_CharT, _Traits>* _M_stream = nullptr;
+  _Val _M_object = _Val();
+
+  struct _Iterator
+  {
+  public:
+   using iterator_category = input_iterator_tag;
+   using difference_type = ptrdiff_t;
+   using value_type = _Val;
+
+   _Iterator() = default;
+
+   constexpr explicit
+   _Iterator(basic_istream_view& __parent) noexcept
+ : _M_parent(std::__addressof(__parent))
+   { }
+
+   _Iterator(const _Iterator&) = delete;
+   _Iterator(_Iterator&&) = default;
+   _Iterator& operator=(const _Iterator&) = delete;
+   _Iterator& operator=(_Iterator&&) = default;
+
+   _Iterator&
+   operator++()
+   {
+ __glibcxx_assert(_M_parent->_M_stream != nullptr);
+ *_M_parent->_M_stream >> _M_parent->_M_object;
+   }
+
+   void
+   operator++(int)
+   { ++*this; }
+
+   _Val&
+   operator*() const
+   {
+ __glibcxx_assert(_M_parent->_M_stream != nullptr);
+ return _M_parent->_M_object;
+   }
+
+   friend bool
+   operator==(const _Iterator& __x, default_sentinel_t)
+   { return __x.__at_end(); }
+
+  private:
+   basic_istream_view* _M_parent = nullptr;
+
+   bool
+   __at_end() const


Please rename this to _M_at_end for consistency with the rest of the
library.

OK for master with that tweak, thanks.

coroutines: Update to n4849 allocation/deallocation.

2020-02-07 Thread Iain Sandoe

Hi,

This is the first of the (small number of) anticipated changes to bring
the implementation into line with the latest published C++20 draft (this
is now expected to be very close to the final, although some wording
might still be adjusted).

The allocation for the coroutine state frame is quite flexible - which means
in reality a number of permutations to cater for and to test for.

Partial or complete adoption of p2014r0 would be additive to this, but it
seems better to me to separate changes for that to a future patch.

Because of the number of permutations, the error messages are somewhat
long-winded in an effort to point the user to the specific problem.

OK for trunk?
thanks
Iain



This updates the coroutine frame allocation and deallocation usage to
match n4849.

[dcl.fct.def.coroutine] /9, /10, /12.

9 An implementation may need to allocate additional storage for a coroutine.
  This storage is known as the coroutine state and is obtained by calling a
  non-array allocation function.  The allocation function’s name is looked up
  in the scope of the promise type.  If this lookup fails, the allocation
  function’s name is looked up in the global scope.  If the lookup finds an
  allocation function in the scope of the promise type, overload resolution
  is performed on a function call created by assembling an argument list.
  The first argument is the amount of space requested, and has type
  std::size_t.  The lvalues p1 . . . pn are the succeeding [user's function]
  arguments. If no viable function is found, overload resolution is performed
  again on a function call created by passing just the amount of space required
  as an argument of type std::size_t.

10 The unqualified-id get_return_object_on_allocation_failure is looked up in
  the scope of the promise type by class member access lookup. If any
  declarations are found, then the result of a call to an allocation function
  used to obtain storage for the coroutine state is assumed to return nullptr
  if it fails to obtain storage, and if a global allocation function is
  selected, the ::operator new(size_t, nothrow_t) form is used. The allocation
  function used in this case shall have a non-throwing noexcept-specification.
  If the allocation function returns nullptr, the coroutine returns control to
  the caller of the coroutine and the return value is obtained by a call to
  T::get_return_object_on_allocation_failure(), where T is the promise type.

12 The deallocation function’s name is looked up in the scope of the promise
  type. If this lookup fails, the deallocation function’s name is looked up in
  the global scope.  If deallocation function lookup finds both a usual
  deallocation function with only a pointer parameter and a usual deallocation
  function with both a pointer parameter and a size parameter, then the
  selected deallocation function shall be the one with two parameters.
  Otherwise, the selected deallocation function shall be the function with one
  parameter. If no usual deallocation function is found, the program is ill-
  formed.  The selected deallocation function shall be called with the address
  of the block of storage to be reclaimed as its first argument.  If a
  deallocation function with a parameter of type std::size_t is used, the size
  of the block is passed as the corresponding argument.

gcc/cp/ChangeLog:

2020-02-07  Iain Sandoe  

* coroutines.cc (build_actor_fn): Implement deallocation function
selection per n4849, dcl.fct.def.coroutine bullet 12.
(morph_fn_to_coro): Implement allocation function selection per
n4849, dcl.fct.def.coroutine bullets 9 and 10.

gcc/testsuite/ChangeLog:

2020-02-07  Iain Sandoe  

* g++.dg/coroutines/coro1-allocators.h: New.
* g++.dg/coroutines/coro-bad-alloc-00-bad-op-new.C: New test.
* g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: New test.
* g++.dg/coroutines/coro-bad-alloc-02-no-op-new-nt.C: New test.
* g++.dg/coroutines/torture/alloc-00-gro-on-alloc-fail.C: Use new
coro1-allocators.h header.
* g++.dg/coroutines/torture/alloc-01-overload-newdel.C: Likewise.
* g++.dg/coroutines/torture/alloc-02-fail-new-grooaf-check.C: New.
* g++.dg/coroutines/torture/alloc-03-overload-new-1.C: New test.
* g++.dg/coroutines/torture/alloc-04-overload-del-use-two-args.C:New.
---
 gcc/cp/coroutines.cc  | 248 --
 .../coroutines/coro-bad-alloc-00-bad-op-new.C |  12 +
 .../coroutines/coro-bad-alloc-01-bad-op-del.C |  13 +
 .../coro-bad-alloc-02-no-op-new-nt.C  |  15 ++
 .../g++.dg/coroutines/coro1-allocators.h  | 184 +
 .../torture/alloc-00-gro-on-alloc-fail.C  |  98 +--
 .../torture/alloc-01-overload-newdel.C|  81 +-
 .../torture/alloc-02-fail-new-grooaf-check.C  |  41 +++
 .../torture/alloc-03-overload-new-1.C |  55 
 .../alloc-04-overload-del-use-two-args.C  |  60

Re: [PATCH] x86-64: Pass aggregates with only float/double in GPRs for MS_ABI

2020-02-07 Thread H.J. Lu

On Fri, Feb 7, 2020 at 2:14 AM JonY <10wa...@gmail.com> wrote:
>
> On 2/7/20 3:23 AM, H.J. Lu wrote:
> > On Wed, Feb 05, 2020 at 09:51:14PM +0100, Uros Bizjak wrote:
> >> On Wed, Feb 5, 2020 at 6:59 PM H.J. Lu  wrote:
> >>>
> >>> MS_ABI requires passing aggregates with only float/double in integer
> >>> registers.  Checked gcc outputs against Clang and fixed:
> >>>
> >>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
> >>> -Wno-unused-variable -Wno-unused-parameter
> >>> -Wno-unused-but-set-variable -Wno-uninitialized -O0
> >>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
> >>> -Wno-unused-variable -Wno-unused-parameter
> >>> -Wno-unused-but-set-variable -Wno-uninitialized -O2
> >>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55
> >>> -Wno-unused-variable -Wno-unused-parameter
> >>> -Wno-unused-but-set-variable -Wno-uninitialized -O0
> >>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55
> >>> -Wno-unused-variable -Wno-unused-parameter
> >>> -Wno-unused-but-set-variable -Wno-uninitialized -O2
> >>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56
> >>> -Wno-unused-variable -Wno-unused-parameter
> >>> -Wno-unused-but-set-variable -Wno-uninitialized -O0
> >>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56
> >>> -Wno-unused-variable -Wno-unused-parameter
> >>> -Wno-unused-but-set-variable -Wno-uninitialized -O2
> >>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
> >>>
> >>> in libffi testsuite.
> >>>
> >>> OK for master and backports to GCC 8/9 branches?
> >>>
> >>> gcc/
> >>>
> >>> PR target/85667
> >>> * config/i386/i386.c (function_arg_ms_64): Add a type argument.
> >>> Don't return aggregates with only SFmode and DFmode in SSE
> >>> register.
> >>> (ix86_function_arg): Pass arg.type to function_arg_ms_64.
> >>>
> >>> gcc/testsuite/
> >>>
> >>> PR target/85667
> >>> * gcc.target/i386/pr85667-10.c: New test.
> >>> * gcc.target/i386/pr85667-7.c: Likewise.
> >>> * gcc.target/i386/pr85667-8.c: Likewise.
> >>> * gcc.target/i386/pr85667-9.c: Likewise.
> >>
> >> LGTM, but should really be reviewed by cygwin, mingw-w64 maintainer (CC'd).
> >>
> >
> > I checked the result against MSVC v19.10 at
> >
> > https://godbolt.org/z/2NPygd
> >
> > My patch matches MSVC v19.10.  I am checking it in tomorrow unless
> > mingw-w64 maintainer objects.
> >
>
> Please go ahead and thanks.
>

I checked it into master and backported it to releases/gcc-9 branch.
No plan to fix releases/gcc-8 branch.

Thanks.

-- 
H.J.

Re: [PATCH] middle-end/93519 - avoid folding stmts in obviously unreachable code

2020-02-07 Thread Richard Biener

On Thu, 6 Feb 2020, Martin Sebor wrote:

> On 2/6/20 7:52 AM, Richard Biener wrote:
> > The inliner folds stmts delayed, the following arranges things so
> > to not fold stmts that are obviously not reachable to avoid warnings
> > from those code regions.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> It fixes the reported problem so it works for me.
> 
> The tests I submitted with my patch fail a number of cases because
> along with strcpy it also deferred folding overlapping mempcpy calls.
> That was not strictly part of the regression so I'm okay with deferring
> it until GCC 11.  I will resubmit an updated patch to defer the folding
> then.

The following is what I have applied, fixing the quadraticness with
switch stmt edges, trading it with possible re-allocation of the
worker stack.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

>From ed41d9926a45d5ebc50d3721dde564cabd36f861 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 6 Feb 2020 15:49:57 +0100
Subject: [PATCH] middle-end/93519 - avoid folding stmts in obviously
 unreachable code

The inliner folds stmts delayed, the following arranges things so
to not fold stmts that are obviously not reachable to avoid warnings
from those code regions.

2020-02-06  Richard Biener  

PR middle-end/93519
* tree-inline.c (fold_marked_statements): Do a PRE walk,
skipping unreachable regions.
(optimize_inline_calls): Skip folding stmts when we didn't
inline.

* gcc.dg/Wrestrict-21.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/Wrestrict-21.c 
b/gcc/testsuite/gcc.dg/Wrestrict-21.c
new file mode 100644
index 000..e300663758e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wrestrict-21.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wrestrict" } */
+
+static char *
+str_numth(char *dest, char *num, int type)
+{
+  if (dest != num)
+__builtin_strcpy(dest, num); /* { dg-bogus "is the same" } */
+  __builtin_strcat(dest, "foo");
+  return dest;
+}
+
+void
+DCH_to_char(char *in, char *out, int collid)
+{
+  char *s = out;
+  str_numth(s, s, 42);
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 5b0050a53d2..23941dade55 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5261,86 +5261,117 @@ static void
 fold_marked_statements (int first, hash_set *statements)
 {
   auto_bitmap to_purge;
-  for (; first < last_basic_block_for_fn (cfun); first++)
-if (BASIC_BLOCK_FOR_FN (cfun, first))
-  {
-gimple_stmt_iterator gsi;
 
-   for (gsi = gsi_start_bb (BASIC_BLOCK_FOR_FN (cfun, first));
-!gsi_end_p (gsi);
-gsi_next (&gsi))
- if (statements->contains (gsi_stmt (gsi)))
-   {
- gimple *old_stmt = gsi_stmt (gsi);
- tree old_decl
-   = is_gimple_call (old_stmt) ? gimple_call_fndecl (old_stmt) : 0;
+  auto_vec stack (n_basic_blocks_for_fn (cfun) + 2);
+  auto_sbitmap visited (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
 
- if (old_decl && fndecl_built_in_p (old_decl))
-   {
- /* Folding builtins can create multiple instructions,
-we need to look at all of them.  */
- gimple_stmt_iterator i2 = gsi;
- gsi_prev (&i2);
- if (fold_stmt (&gsi))
-   {
- gimple *new_stmt;
- /* If a builtin at the end of a bb folded into nothing,
-the following loop won't work.  */
- if (gsi_end_p (gsi))
-   {
- cgraph_update_edges_for_call_stmt (old_stmt,
-old_decl, NULL);
- break;
-   }
- if (gsi_end_p (i2))
-   i2 = gsi_start_bb (BASIC_BLOCK_FOR_FN (cfun, first));
- else
+  stack.quick_push (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+  while (!stack.is_empty ())
+{
+  /* Look at the edge on the top of the stack.  */
+  edge e = stack.pop ();
+  basic_block dest = e->dest;
+
+  if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
+ || bitmap_bit_p (visited, dest->index))
+   continue;
+
+  bitmap_set_bit (visited, dest->index);
+
+  if (dest->index >= first)
+   for (gimple_stmt_iterator gsi = gsi_start_bb (dest);
+!gsi_end_p (gsi); gsi_next (&gsi))
+ {
+   if (!statements->contains (gsi_stmt (gsi)))
+ continue;
+
+   gimple *old_stmt = gsi_stmt (gsi);
+   tree old_decl = (is_gimple_call (old_stmt)
+? gimple_call_fndecl (old_stmt) : 0);
+   if (old_decl && fndecl_built_in_p (old_decl))
+ {
+   /* Folding builtins can create multiple instructions,
+  we need to look at all of them.  *

Re: [PATCH 3/3] libstdc++: Implement C++20 range adaptors

2020-02-07 Thread Jonathan Wakely


On 06/02/20 18:53 -0500, Patrick Palka wrote:

On Thu, 6 Feb 2020, Jonathan Wakely wrote:

> +#ifdef __cpp_lib_threeway_comparison

This macro is mispelled, should be three_way with an underscore.


Oops!  It looks like it's also mispelled in the definition of iota_view
earlier in this file.


Oops, yes, my fault then. Fixed with this patch, tested
powerpc64le-linux and committed to master. I've reported the defect
with the return types to the LWG chair.

Your incremental changes look good, please squash them into the base
patch and push that to master. Thanks.

commit 5713834e4b99e4c4c99eef15698a497f091b7dc4
Author: Jonathan Wakely 
Date:   Fri Feb 7 11:31:12 2020 +

libstdc++: Enable three-way comparison for iota_view iterators

The declaration of operator<=> was disabled due to a typo in the macro.
The declaration was also ill-formed when three_way_comparable<_Winc> is
not satisfied, which is a defect in the C++20 draft.

* include/std/ranges (iota_view::_Iterator): Fix typo in name of
__cpp_lib_three_way_comparison macro and use deduced return type for
operator<=>.
* testsuite/std/ranges/iota/iterator.cc: New test.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 860f7283be5..dc277a74fb6 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -776,8 +776,8 @@ namespace ranges
 	  requires totally_ordered<_Winc>
 	{ return !(__x < __y); }
 
-#ifdef __cpp_lib_threeway_comparison
-	friend constexpr compare_three_way_result_t<_Winc>
+#ifdef __cpp_lib_three_way_comparison
+	friend constexpr auto
 	operator<=>(const _Iterator& __x, const _Iterator& __y)
 	  requires totally_ordered<_Winc> && three_way_comparable<_Winc>
 	{ return __x._M_value <=> __y._M_value; }
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/iterator.cc b/libstdc++-v3/testsuite/std/ranges/iota/iterator.cc
new file mode 100644
index 000..4d471431eae
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/iota/iterator.cc
@@ -0,0 +1,35 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+
+auto i = std::ranges::iota_view{}.begin();
+static_assert( std::three_way_comparable );
+
+struct Inc {
+  Inc& operator++();
+  Inc operator++(int);
+  friend long operator-(Inc, Inc);
+};
+static_assert( ! std::three_way_comparable );
+
+// Instantiating iterator type must be valid despite !three_way_comparable
+auto j = std::ranges::iota_view{}.begin();
+static_assert( ! std::three_way_comparable );

Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-07 Thread Jakub Jelinek

On Fri, Feb 07, 2020 at 10:57:22AM +, JonY wrote:
> >> Is this patch testing still required? I just got back from traveling.
> > 
> > Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
> > (not preserved over calls), while in gcc they are currently handled as
> > preserved across the calls.

The other parts are I guess mainly about SEH.  Consider e.g.
void
foo (void)
{
  register double x __asm ("xmm14");
  register double y __asm ("xmm18");
  asm ("" : "=x" (x));
  asm ("" : "=v" (y));
  x += y;
  y += x;
  asm ("" : : "x" (x));
  asm ("" : : "v" (y));
}
looking at cross-compiler output, with -O2 -mavx512f this emits
.file   "abcdeq.c"
.text
.align 16
.globl  foo
.deffoo;.scl2;  .type   32; .endef
.seh_proc   foo
foo:
subq$40, %rsp
.seh_stackalloc 40
vmovaps %xmm14, (%rsp)
.seh_savexmm%xmm14, 0
vmovaps %xmm18, 16(%rsp)
.seh_savexmm%xmm18, 16
.seh_endprologue
vaddsd  %xmm18, %xmm14, %xmm14
vaddsd  %xmm18, %xmm14, %xmm18
vmovaps (%rsp), %xmm14
vmovaps 16(%rsp), %xmm18
addq$40, %rsp
ret
.seh_endproc
.ident  "GCC: (GNU) 10.0.1 20200207 (experimental)"
Does whatever assembler mingw64 uses even assemble this (I mean the
.seh_savexmm %xmm16, 16 could be problematic)?
I can find e.g.
https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
which then links to
https://gcc.gnu.org/PR65782

So, I'd say we want to add PR target/65782 to the ChangeLog entry in the
patch, and likely add a testcase like the above, so like below?

Have you tested the earlier version of the patch on mingw64 or cygwin?
If yes, can you just test the testcase from the following patch, without and
with the i386.h part and verify it FAILs without and PASSes with it?
Just make check-gcc RUNTESTFLAGS=i386.exp=pr65782.c should do?
If not, can you please test the whole patch?

2020-02-07  Uroš Bizjak  
Jakub Jelinek  

PR target/65782
* config/i386/config/i386/i386.h (CALL_USED_REGISTERS): Make
xmm16-xmm31 call-used even in 64-bit ms-abi.

* gcc.target/i386/pr65782.c: New test.

--- gcc/config/i386/config/i386/i386.h.jj   2020-01-22 10:19:24.199221986 
+0100
+++ gcc/config/i386/config/i386/i386.h  2020-02-04 12:09:12.338341003 +0100
@@ -1128,9 +1128,9 @@ extern const char *host_detect_local_cpu
 /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/  \
  6,   6,6,6,6,6,6,6,   \
 /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/\
- 6,6, 6,6,6,6,6,6, \
+ 1,1, 1,1,1,1,1,1, \
 /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/\
- 6,6, 6,6,6,6,6,6, \
+ 1,1, 1,1,1,1,1,1, \
  /* k0,  k1,  k2,  k3,  k4,  k5,  k6,  k7*/\
  1,   1,   1,   1,   1,   1,   1,   1 }
 
--- gcc/testsuite/gcc.target/i386/pr65782.c.jj  2020-02-07 12:21:09.472819018 
+0100
+++ gcc/testsuite/gcc.target/i386/pr65782.c 2020-02-07 12:24:06.820154495 
+0100
@@ -0,0 +1,16 @@
+/* PR target/65782 */
+/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
+/* { dg-options "-O2 -mavx512vl" } */
+
+void
+foo (void)
+{
+  register double x __asm ("xmm14");
+  register double y __asm ("xmm18");
+  asm ("" : "=x" (x));
+  asm ("" : "=v" (y));
+  x += y;
+  y += x;
+  asm ("" : : "x" (x));
+  asm ("" : : "v" (y));
+}


Jakub

Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-07 Thread JonY

On 2/6/20 6:07 AM, Jakub Jelinek wrote:
> On Thu, Feb 06, 2020 at 01:00:36AM +, JonY wrote:
>> On 2/4/20 11:42 AM, Jakub Jelinek wrote:
>>> Hi!
>>>
>>> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
 I guess that Comment #9 patch form the PR should be trivially correct,
 but althouhg it looks obvious, I don't want to propose the patch since
 I have no means of testing it.
>>>
>>> I don't have means of testing it either.
>>> https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
>>> is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
>>> 128-bits only) are call preserved.
>>>
>>> Jonathan, could you please test this if it is sufficient to just change
>>> CALL_USED_REGISTERS or if e.g. something in the pro/epilogue needs tweaking
>>> too?  Thanks.
>>
>> Is this patch testing still required? I just got back from traveling.
> 
> Yes, our reading of the MS ABI docs show that xmm16-31 are to be call used
> (not preserved over calls), while in gcc they are currently handled as
> preserved across the calls.
> 
>   Jakub
> 


--- original.s  2020-02-06 09:00:02.014638069 +
+++ new.s   2020-02-07 10:28:55.678317667 +
@@ -7,23 +7,23 @@
 qux:
subq$72, %rsp
.seh_stackalloc 72
-   vmovaps %xmm18, 48(%rsp)
-   .seh_savexmm%xmm18, 48
+   vmovaps %xmm6, 48(%rsp)
+   .seh_savexmm%xmm6, 48
.seh_endprologue
callbar
vmovapd %xmm0, %xmm1
-   vmovapd %xmm1, %xmm18
+   vmovapd %xmm1, %xmm6
callfoo
leaq32(%rsp), %rcx
-   vmovapd %xmm18, %xmm0
-   vmovaps %xmm0, 32(%rsp)
+   vmovapd %xmm6, %xmm0
+   vmovapd %xmm0, 32(%rsp)
callbaz
nop
-   vmovaps 48(%rsp), %xmm18
+   vmovaps 48(%rsp), %xmm6
addq$72, %rsp
ret
.seh_endproc
-   .ident  "GCC: (GNU) 10.0.0 20191024 (experimental)"
+   .ident  "GCC: (GNU) 10.0.1 20200206 (experimental)"
.defbar;.scl2;  .type   32; .endef
.deffoo;.scl2;  .type   32; .endef
.defbaz;.scl2;  .type   32; .endef

GCC with the patch now seems to put the variables in xmm6, unfortunately
I don't know enough of AVX or stack setups to know if that's all that is
needed.



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] x86-64: Pass aggregates with only float/double in GPRs for MS_ABI

2020-02-07 Thread JonY

On 2/7/20 3:23 AM, H.J. Lu wrote:
> On Wed, Feb 05, 2020 at 09:51:14PM +0100, Uros Bizjak wrote:
>> On Wed, Feb 5, 2020 at 6:59 PM H.J. Lu  wrote:
>>>
>>> MS_ABI requires passing aggregates with only float/double in integer
>>> registers.  Checked gcc outputs against Clang and fixed:
>>>
>>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
>>> -Wno-unused-variable -Wno-unused-parameter
>>> -Wno-unused-but-set-variable -Wno-uninitialized -O0
>>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
>>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
>>> -Wno-unused-variable -Wno-unused-parameter
>>> -Wno-unused-but-set-variable -Wno-uninitialized -O2
>>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
>>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55
>>> -Wno-unused-variable -Wno-unused-parameter
>>> -Wno-unused-but-set-variable -Wno-uninitialized -O0
>>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
>>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=55
>>> -Wno-unused-variable -Wno-unused-parameter
>>> -Wno-unused-but-set-variable -Wno-uninitialized -O2
>>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
>>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56
>>> -Wno-unused-variable -Wno-unused-parameter
>>> -Wno-unused-but-set-variable -Wno-uninitialized -O0
>>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
>>> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=56
>>> -Wno-unused-variable -Wno-unused-parameter
>>> -Wno-unused-but-set-variable -Wno-uninitialized -O2
>>> -DABI_NUM=FFI_GNUW64 -DABI_ATTR=MSABI execution test
>>>
>>> in libffi testsuite.
>>>
>>> OK for master and backports to GCC 8/9 branches?
>>>
>>> gcc/
>>>
>>> PR target/85667
>>> * config/i386/i386.c (function_arg_ms_64): Add a type argument.
>>> Don't return aggregates with only SFmode and DFmode in SSE
>>> register.
>>> (ix86_function_arg): Pass arg.type to function_arg_ms_64.
>>>
>>> gcc/testsuite/
>>>
>>> PR target/85667
>>> * gcc.target/i386/pr85667-10.c: New test.
>>> * gcc.target/i386/pr85667-7.c: Likewise.
>>> * gcc.target/i386/pr85667-8.c: Likewise.
>>> * gcc.target/i386/pr85667-9.c: Likewise.
>>
>> LGTM, but should really be reviewed by cygwin, mingw-w64 maintainer (CC'd).
>>
> 
> I checked the result against MSVC v19.10 at
> 
> https://godbolt.org/z/2NPygd
> 
> My patch matches MSVC v19.10.  I am checking it in tomorrow unless
> mingw-w64 maintainer objects.
> 

Please go ahead and thanks.




signature.asc
Description: OpenPGP digital signature

Re: [PATCH][AARCH64] Fix for PR86901

2020-02-07 Thread Richard Earnshaw (lists)


On 07/02/2020 02:12, Modi Mo via gcc-patches wrote:

I did a quick bootstrap, this shows several failures like:

gcc/builtins.c:9427:1: error: unrecognizable insn:
  9427 | }
   | ^
(insn 212 211 213 24 (set (reg:SI 207)
 (zero_extract:SI (reg:SI 206)
 (const_int 26 [0x1a])
 (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
  (nil))

The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
Currently cases like this are split into a right shift in aarch64.md around line
5569:


Appreciate you taking a look and the validation. I've gotten access to an 
aarch64 server and the bootstrap demonstrated the issue you saw. This was 
caused by my re-definition of the pattern to:
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (mode))
+FAIL;

Which meant for SImode only a sum of >32 bit actually triggers the fail condition 
for the define_expand whereas the existing define_insn fails on >=32 bit. I looked 
into the architecture reference manual and the bits are available for ubfx/sbfx for 
that type of encoding and the documentation says you can use [lsb, 32-lsb] for SImode 
as a legal pair. Checking with the GNU assembler it does accept a sum of 32 but 
transforms it into a LSR:

Assembly file:
ubfxw0, w0, 24, 8

Disassembly of section .text:

 :
0:   53187c00lsr w0, w0, #24


This is how LSR is implemented in AArch64, as an extract.  So the 
disassembler is showing the canonical representation.


However, inside the compiler we really want to represent this as a 
shift.  Why?  Because there are cases where we can then merge a shift 
into other operations when we can't merge the more general extract 
operation.  For example, we can merge an LSR into a subsequent 'AND' 
operation, but can't merge a more general extract into an AND. 
Essentially, we want the compiler to describe this in the canonical 
shift form rather than the extract form.


Ideally this would be handled inside the mid-end expansion of an 
extract, but in the absence of that I think this is best done inside the 
extv expansion so that we never end up with a real extract in that case.





Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other 
assemblers could trip over, I've attached a new patch that allows this encoding 
and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's 
better to explicitly do the transformation in GCC.


;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
instruction taking advantage of the implicit zero-extension of the X-reg.
(define_split
   [(set (match_operand:DI 0 "register_operand")
 (zero_extract:DI (match_operand:DI 1 "register_operand")
  (match_operand 2
"aarch64_simd_shift_imm_offset_di")
  (match_operand 3
"aarch64_simd_shift_imm_di")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
  GET_MODE_BITSIZE (DImode) - 1)
&& (INTVAL (operands[2]) + INTVAL (operands[3]))
== GET_MODE_BITSIZE (SImode)"
   [(set (match_dup 0)
 (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3]
   {
 operands[4] = gen_lowpart (SImode, operands[1]);
   }

However that only supports DImode, not SImode, so it needs to be changed
to be more general using GPI.

Your new extv patterns should replace the magic patterns above it:


With the previous discovery that a sum of 32/64 will trigger LSR in the 
assembler I was curious what would happen if I remove this pattern. Turns out, 
you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the 
test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which 
doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. 
So this pattern still has value but I don't think it's necessary to extend it 
to DI since that'll automatically get turned into a LSR by the assembler as I 
previously mentioned.



;; ---
;; Bitfields
;; ---

(define_expand ""

These are the current extv/extzv patterns, but without a mode. They are no
longer used when we start using the new ones.

Note you can write  to combine the extzv and extz patterns.
But please add a comment mentioning the pattern names so they are easy to
find!


Good call here, made this change in the new patch. I did see the define_insn of these 
guys below it but somehow missed that they were expanded just above. I believe, from 
my understanding of GCC, that the matching pattern below the first line is what 
constrains  into just extv/extsv from the long list of iterators it 
belongs to. Still, I see that there's constrained iterators elsewhere like:

;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend ""

Re: [PATCH] arm: Fix up arm installed unwind.h for use in pedantic modes [PR93615]

2020-02-07 Thread Ramana Radhakrishnan

On Fri, Feb 7, 2020 at 8:19 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the following testcase shows, unwind.h on ARM can't be (starting with GCC
> 10) compiled with -std=c* modes, only -std=gnu* modes.
> The problem is it uses asm keyword, which isn't a keyword in those modes
> (system headers vs. non-system ones don't make a difference here).
> glibc and other installed headers use __asm or __asm__ keywords instead that
> work fine in both standard and gnu modes.
>
> While there, as it is an installed header, I think it is also wrong to
> completely ignore any identifier namespace rules.
> The generic unwind.h defines just _Unwind* namespace identifiers plus
> _sleb128_t/_uleb128_t (but e.g. unlike libstdc++/glibc headers doesn't
> uglify operand names), the ARM unwind.h is much worse here.  I've just
> changed the gnu_Unwind_Find_got function at least not be in user identifier
> namespace, but perhaps it would be good to go further and rename e.g.
> #define UNWIND_STACK_REG 13
> #define UNWIND_POINTER_REG 12
> #define FDPIC_REGNUM 9
> #define STR(x) #x
> #define XSTR(x) STR(x)
> or e.g.
>   typedef _Unwind_Reason_Code (*personality_routine) (_Unwind_State,
>   _Unwind_Control_Block *, _Unwind_Context *);
> in unwind-arm-common.h.
>
> Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?
>
> 2020-02-07  Jakub Jelinek  
>
> PR target/93615
> * config/arm/unwind-arm.h (gnu_Unwind_Find_got): Rename to ...
> (_Unwind_gnu_Find_got): ... this.  Use __asm instead of asm.  Remove
> trailing :s in asm.  Formatting fixes.
> (_Unwind_decode_typeinfo_ptr): Adjust caller.
>
> * gcc.dg/pr93615.c: New test.
>

Ok, thanks jakub

Ramana

> --- libgcc/config/arm/unwind-arm.h.jj   2020-01-12 11:54:38.616380172 +0100
> +++ libgcc/config/arm/unwind-arm.h  2020-02-06 16:16:54.244624408 +0100
> @@ -43,19 +43,15 @@ extern "C" {
>  #endif
>  _Unwind_Ptr __attribute__((weak)) __gnu_Unwind_Find_got (_Unwind_Ptr);
>
> -static inline _Unwind_Ptr gnu_Unwind_Find_got (_Unwind_Ptr ptr)
> +static inline _Unwind_Ptr _Unwind_gnu_Find_got (_Unwind_Ptr ptr)
>  {
>  _Unwind_Ptr res;
>
>  if (__gnu_Unwind_Find_got)
> -   res =  __gnu_Unwind_Find_got (ptr);
> +   res = __gnu_Unwind_Find_got (ptr);
>  else
> -  {
> -   asm volatile ("mov %[result], r" XSTR(FDPIC_REGNUM)
> - : [result]"=r" (res)
> - :
> - :);
> -  }
> +   __asm volatile ("mov %[result], r" XSTR(FDPIC_REGNUM)
> +   : [result] "=r" (res));
>
>  return res;
>  }
> @@ -75,7 +71,7 @@ static inline _Unwind_Ptr gnu_Unwind_Fin
>  #if __FDPIC__
>/* For FDPIC, we store the offset of the GOT entry.  */
>/* So, first get GOT from dynamic linker and then use indirect access. 
>  */
> -  tmp += gnu_Unwind_Find_got (ptr);
> +  tmp += _Unwind_gnu_Find_got (ptr);
>tmp = *(_Unwind_Word *) tmp;
>  #elif (defined(linux) && !defined(__uClinux__)) || defined(__NetBSD__) \
>  || defined(__FreeBSD__) || defined(__fuchsia__)
> --- gcc/testsuite/gcc.dg/pr93615.c.jj   2020-02-06 22:40:00.921472574 +0100
> +++ gcc/testsuite/gcc.dg/pr93615.c  2020-02-06 22:39:52.937591443 +0100
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c11" } */
> +/* { dg-require-effective-target exceptions } */
> +
> +#include 
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
>
> Jakub
>

Re: [PATCH] avoid issuing -Wrestrict from folder (PR 93519)

2020-02-07 Thread Richard Biener

On Fri, Feb 7, 2020 at 12:08 AM Martin Sebor  wrote:
>
> On 2/6/20 6:16 AM, Richard Biener wrote:
> > On Thu, Feb 6, 2020 at 2:00 PM Jeff Law  wrote:
> >>
> >> On Wed, 2020-02-05 at 09:19 +0100, Richard Biener wrote:
> >>> On Tue, Feb 4, 2020 at 11:02 PM Martin Sebor  wrote:
>  On 2/4/20 2:31 PM, Jeff Law wrote:
> > On Tue, 2020-02-04 at 13:08 -0700, Martin Sebor wrote:
> >> On 2/4/20 12:15 PM, Richard Biener wrote:
> >>> On February 4, 2020 5:30:42 PM GMT+01:00, Jeff Law  
> >>> wrote:
>  On Tue, 2020-02-04 at 10:34 +0100, Richard Biener wrote:
> > On Tue, Feb 4, 2020 at 1:44 AM Martin Sebor  
> > wrote:
> >> PR 93519 reports a false positive -Wrestrict issued for an inlined
>  call
> >> to strcpy that carefully guards against self-copying.  This is
>  caused
> >> by the caller's arguments substituted into the call during inlining
>  and
> >> before dead code elimination.
> >>
> >> The attached patch avoids this by removing -Wrestrict from the
>  folder
> >> and deferring folding perfectly overlapping (and so undefined)
>  calls
> >> to strcpy (and mempcpy, but not memcpy) until much later.  Calls to
> >> perfectly overlapping calls to memcpy are still folded early.
> >
> > Why do we bother to warn at all for this case?  Just DWIM here.
>  Warnings like
> > this can be emitted from the analyzer?
>  They potentially can, but the analyzer is and will almost always
>  certainly be considerably slower.  I would not expect it to be used
>  nearly as much as the core compiler.
> 
>  WHether or not a particular warning makes sense in the core compiler 
>  or
>  analyzer would seem to me to depend on whether or not we can 
>  reasonably
>  issue warnings without interprocedural analysis.  double-free
>  realistically requires interprocedural analysis to be effective.  I'm
>  not sure Wrestrict really does.
> 
> 
> > That is, I suggest to simply remove the bogus warning code from
>  folding
> > (and _not_ fail the folding).
>  I haven't looked at the patch, but if we can get the warning out of 
>  the
>  folder that's certainly preferable.  And we could investigate 
>  deferring
>  self-copy removal.
> >>>
> >>> I think the issue is as usual, warning for code we'll later remove as 
> >>> dead. Warning at folding is almost always premature.
> >>
> >> In this instance the code is reachable (or isn't obviously 
> >> unreachable).
> >> GCC doesn't remove it, but provides benign (and reasonable) semantics
> >> for it(*).  To me, that's one aspect of quality.  Letting the user know
> >> that the code is buggy is another.  I view that as at least as 
> >> important
> >> as folding the ill-effects away because it makes it possible to fix
> >> the problem so the code works correctly even with compilers that don't
> >> provide these benign semantics.
> > If you look at the guts of what happens at the point where we issue the
> > warning from within gimple_fold_builtin_strcpy we have:
> >
> >> DCH_to_char (char * in, char * out, int collid)
> >> {
> >> int type;
> >> char * D.2148;
> >> char * dest;
> >> char * num;
> >> long unsigned int _4;
> >> char * _5;
> >>
> >> ;;   basic block 2, loop depth 0
> >> ;;pred:   ENTRY
> >> ;;succ:   4
> >>
> >> ;;   basic block 4, loop depth 0
> >> ;;pred:   2
> >> ;;succ:   5
> >>
> >> ;;   basic block 5, loop depth 0
> >> ;;pred:   4
> >> ;;succ:   6
> >>
> >> ;;   basic block 6, loop depth 0
> >> ;;pred:   5
> >> if (0 != 0)
> >>   goto ; [53.47%]
> >> else
> >>   goto ; [46.53%]
> >> ;;succ:   7
> >> ;;8
> >>
> >> ;;   basic block 7, loop depth 0
> >> ;;pred:   6
> >> strcpy (out_1(D), out_1(D));
> >> ;;succ:   8
> >>
> >> ;;   basic block 8, loop depth 0
> >> ;;pred:   6
> >> ;;7
> >> _4 = __builtin_strlen (out_1(D));
> >> _5 = out_1(D) + _4;
> >> __builtin_memcpy (_5, "foo", 4);
> >> ;;succ:   3
> >>
> >> ;;   basic block 3, loop depth 0
> >> ;;pred:   8
> >> return;
> >> ;;succ:   EXIT
> >>
> >> }
> >>
> >
> > Which shows the code is obviously unreachable in the case we're warning
> > about.  You can't see this in the dumps because it's exposed by
> > inlining, then cleaned up before writing the dump file.
> 
>  In the specific case of the bug the co

Re: [PATCH] libstdc++: Optimize C++20 comparison category types

2020-02-07 Thread Daniel Krügler

Am Do., 6. Feb. 2020 um 15:28 Uhr schrieb Jonathan Wakely :
>
> On 06/02/20 13:53 +, Jonathan Wakely wrote:
> >On 06/02/20 13:40 +, Jonathan Wakely wrote:
> >>This reduces sizeof(std::partial_ordering) and optimizes conversion and
> >>comparison operators to avoid conditional branches where possible.
> >>
> >>  * libsupc++/compare (__cmp_cat::_Ncmp::unordered): Change value to 2.
> >>  (partial_ordering::_M_is_ordered): Remove data member.
> >>  (partial_ordering): Use second bit of _M_value for unordered. Adjust
> >>  comparison operators.
> >>  (weak_ordering::operator partial_ordering): Simplify to remove
> >>  branches.
> >>  (operator<=>(unspecified, weak_ordering)): Likewise.
> >>  (strong_ordering::operator partial_ordering): Likewise.
> >>  (strong_ordering::operator weak_ordering): Likewise.
> >>  (operator<=>(unspecified, strong_ordering)): Likewise.
> >>  * testsuite/18_support/comparisons/categories/partialord.cc: New test.
> >>  * testsuite/18_support/comparisons/categories/strongord.cc: New test.
> >>  * testsuite/18_support/comparisons/categories/weakord.cc: New test.
> >>
> >>Tested powerpc64le-linux and x86_64-linux.
> >>
> >>This is an ABI change for the partial_ordering type, but that is why I
> >>think we should do it now, not after GCC 10 is released. The sooner
> >>the better, before these types are being widely used.
> >>
> >>I plan to commit this in the next 12 hours or so, unless there are
> >>(valid :-) objections.
> >>
> >>Thanks to Barry Revzin for pointing out there was room for these
> >>operators to be improved.
> >
> >We could also change the int _M_value data member of all three
> >comparison category types to be a signed char instead of int. That
> >would reduce the size further.
>
> Or maybe std::int_fast8_t is the right type here.

I like this suggestion.

- Daniel

[PATCH] xfail and improve some failing libgomp tests

2020-02-07 Thread Harwath, Frederik

Hi,
the libgomp testsuite contains some test cases (all in 
/libgomp/testsuite/libgomp.c/)
which fail with nvptx offloading because of some long standing issues:

* {target-32.c, thread-limit-2.c}:
no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690

* target-{33,34}.c:
no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. 
https://gcc.gnu.org/PR81688

* target-link-1.c:
omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689


All these issues have been known, at least, since 2016:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00972.html

As suggested in this mail:
 "Short term, it should be possible to implement something like -foffload=^nvptx
to skip PTX (and only PTX) offloading on those tests."

Well, we can now skip/xfail tests for nvptx offloading using the effective 
target
"offload_target_nvptx" and the present patch uses this to xfail the tests for 
which
no short-term solution is in sight, i.e. the GOMP_OFFLOAD_async_run and the 
"target link"
related failures.

Regarding the "usleep" issue, I have decided to follow Jakub's suggestion
(cf. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01026.html) to
replace usleep by busy waiting. As noted by Tobias
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690#c4), this involves creating 
separate
test files for the cases with and without usleep. This solution is a bit 
cumbersome but I
think we can live with it, in particular, since the actual test case 
implementations do not
get duplicated (they have been moved into auxiliary header files which are 
shared by both
variants of the corresponding tests).

Since the "usleep" issue also concerns amdgcn, I have introduced an effective 
target
"offload_target_amdgcn" to add xfails for this offloading target, too. This 
behaves like
"offload_target_nvptx" but for amdgcn. Note that the existing amdgcn effective 
targets
cannot be used for our purpose since they are OpenACC-specific.

The new thread-limit-2-nosleep.c should now pass for both nvptx and amdgcn 
offloading
whereas thread-limit-2.c should xfail. The new target-32-nosleep.c passes with 
amdgcn
offloading, but xfails with nvptx offloading, because it also needs the 
unimplemented
GOMP_OFFLOAD_async_run.

With the patch, the detailed test summary now looks as follows for me:

nvptx offloading:

// Expected execution failures due to missing usleep
PASS: libgomp.c/target-32-nosleep.c (test for excess errors)
XFAIL: libgomp.c/target-32-nosleep.c execution test// missing 
GOMP_OFFLOAD_async_run
XFAIL: libgomp.c/target-32.c (test for excess errors)
UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable

PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors)
PASS: libgomp.c/thread-limit-2-nosleep.c execution test
XFAIL: libgomp.c/thread-limit-2.c (test for excess errors)
UNRESOLVED: libgomp.c/thread-limit-2.c compilation failed to produce executable

// Expected execution failures due to missing GOMP_OFFLOAD_async_run
PASS: libgomp.c/target-33.c (test for excess errors)
XFAIL: libgomp.c/target-33.c execution test
PASS: libgomp.c/target-34.c (test for excess errors)
XFAIL: libgomp.c/target-34.c execution test

// Expected compilation failures due to missing target link
XFAIL: libgomp.c/target-link-1.c (test for excess errors)
UNRESOLVED: libgomp.c/target-link-1.c compilation failed to produce executable


amdgcn offloading:

// Tests using usleep
PASS: libgomp.c/target-32-nosleep.c (test for excess errors)
PASS: libgomp.c/target-32-nosleep.c execution test
XFAIL: libgomp.c/target-32.c 7 blank line(s) in output
XFAIL: libgomp.c/target-32.c (test for excess errors)
UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable

PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors)
PASS: libgomp.c/thread-limit-2-nosleep.c execution test
XFAIL: libgomp.c/thread-limit-2.c 1 blank line(s) in output
XFAIL: libgomp.c/thread-limit-2.c (test for excess errors)

// No failures since GOMP_OFFLOAD_async_run works on amdgcn
PASS: libgomp.c/target-33.c (test for excess errors)
PASS: libgomp.c/target-33.c execution test
PASS: libgomp.c/target-34.c (test for excess errors)
PASS: libgomp.c/target-34.c execution test

// No xfail here
PASS: libgomp.c/target-link-1.c (test for excess errors)
FAIL: libgomp.c/target-link-1.c execution test

Note that target-link-1.c execution does also fail on amdgcn.
Since - in contrast to nvptx - it seems that the cause of this failure
has not yet been investigated and discussed, I have not added an xfail
for amdgcn to this test.

All testing has been done with a x86_64-linux-gnu host and target.

Ok to commit this patch?

Best regards,
Frederik





From 6e5e2d45f02235a0f72e6130dcd8d52f88f7b126 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 7 Feb 2020 08:03:00 +0100
Subject: [PATCH] xfail and improve some failing libgomp tests

* libgomp.c/{target-32.c,thread-limit-2.c}

Regarding failures because "no usleep implemented for nvpt

[PATCH] arm: Fix up arm installed unwind.h for use in pedantic modes [PR93615]

2020-02-07 Thread Jakub Jelinek

Hi!

As the following testcase shows, unwind.h on ARM can't be (starting with GCC
10) compiled with -std=c* modes, only -std=gnu* modes.
The problem is it uses asm keyword, which isn't a keyword in those modes
(system headers vs. non-system ones don't make a difference here).
glibc and other installed headers use __asm or __asm__ keywords instead that
work fine in both standard and gnu modes.

While there, as it is an installed header, I think it is also wrong to
completely ignore any identifier namespace rules.
The generic unwind.h defines just _Unwind* namespace identifiers plus
_sleb128_t/_uleb128_t (but e.g. unlike libstdc++/glibc headers doesn't
uglify operand names), the ARM unwind.h is much worse here.  I've just
changed the gnu_Unwind_Find_got function at least not be in user identifier
namespace, but perhaps it would be good to go further and rename e.g.
#define UNWIND_STACK_REG 13
#define UNWIND_POINTER_REG 12
#define FDPIC_REGNUM 9
#define STR(x) #x
#define XSTR(x) STR(x)
or e.g.
  typedef _Unwind_Reason_Code (*personality_routine) (_Unwind_State,
  _Unwind_Control_Block *, _Unwind_Context *);
in unwind-arm-common.h.

Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk?

2020-02-07  Jakub Jelinek  

PR target/93615
* config/arm/unwind-arm.h (gnu_Unwind_Find_got): Rename to ...
(_Unwind_gnu_Find_got): ... this.  Use __asm instead of asm.  Remove
trailing :s in asm.  Formatting fixes.
(_Unwind_decode_typeinfo_ptr): Adjust caller.

* gcc.dg/pr93615.c: New test.

--- libgcc/config/arm/unwind-arm.h.jj   2020-01-12 11:54:38.616380172 +0100
+++ libgcc/config/arm/unwind-arm.h  2020-02-06 16:16:54.244624408 +0100
@@ -43,19 +43,15 @@ extern "C" {
 #endif
 _Unwind_Ptr __attribute__((weak)) __gnu_Unwind_Find_got (_Unwind_Ptr);
 
-static inline _Unwind_Ptr gnu_Unwind_Find_got (_Unwind_Ptr ptr)
+static inline _Unwind_Ptr _Unwind_gnu_Find_got (_Unwind_Ptr ptr)
 {
 _Unwind_Ptr res;
 
 if (__gnu_Unwind_Find_got)
-   res =  __gnu_Unwind_Find_got (ptr);
+   res = __gnu_Unwind_Find_got (ptr);
 else
-  {
-   asm volatile ("mov %[result], r" XSTR(FDPIC_REGNUM)
- : [result]"=r" (res)
- :
- :);
-  }
+   __asm volatile ("mov %[result], r" XSTR(FDPIC_REGNUM)
+   : [result] "=r" (res));
 
 return res;
 }
@@ -75,7 +71,7 @@ static inline _Unwind_Ptr gnu_Unwind_Fin
 #if __FDPIC__
   /* For FDPIC, we store the offset of the GOT entry.  */
   /* So, first get GOT from dynamic linker and then use indirect access.  
*/
-  tmp += gnu_Unwind_Find_got (ptr);
+  tmp += _Unwind_gnu_Find_got (ptr);
   tmp = *(_Unwind_Word *) tmp;
 #elif (defined(linux) && !defined(__uClinux__)) || defined(__NetBSD__) \
 || defined(__FreeBSD__) || defined(__fuchsia__)
--- gcc/testsuite/gcc.dg/pr93615.c.jj   2020-02-06 22:40:00.921472574 +0100
+++ gcc/testsuite/gcc.dg/pr93615.c  2020-02-06 22:39:52.937591443 +0100
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c11" } */
+/* { dg-require-effective-target exceptions } */
+
+#include 
+
+int
+main ()
+{
+  return 0;
+}

Jakub

Re: [PATCH] i386: Better patch to improve avx* vector concatenation [PR93594]

2020-02-07 Thread Uros Bizjak

On Fri, Feb 7, 2020 at 9:05 AM Jakub Jelinek  wrote:
>
> Hi!
>
> After thinking some more on this, we can do better; rather than having to
> add a new prereload splitter pattern to catch all other cases where it might
> be beneficial to fold first part of an UNSPEC_CAST back to the unspec
> operand, this patch reverts the *.md changes I've made yesterday and instead
> tweaks the patterns, so that simplify-rtx.c can optimize those on its own.
> Instead of the whole SET_SRC being an UNSPEC through which simplify-rtx.c
> obviously can't optimize anything, this represents those patterns through a
> VEC_CONCAT (or two nested ones for the 128-bit -> 512-bit casts) with the
> operand as the low part of it and UNSPEC representing just the high part of
> it (the undefined, to be ignored, bits).  While richi suggested using
> already in GIMPLE for those using a SSA_NAME default definition (i.e.
> clearly uninitialized use), I'd say that uninit pass would warn about those,
> but more importantly, in RTL it would probably force zero initialization of
> that or use or an uninitialized pseudo, all of which is hard to match in an
> pattern, so I think an UNSPEC is better for that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-02-07  Jakub Jelinek  
>
> PR target/93594
> * config/i386/predicates.md (avx_identity_operand): Remove.
> * config/i386/sse.md (*avx_vec_concat_1): Remove.
> (avx__,
> avx512f__256): Change patterns to
> a VEC_CONCAT of the operand and UNSPEC_CAST.
> (avx512f__): Change pattern to
> a VEC_CONCAT of VEC_CONCAT of the operand and UNSPEC_CAST with
> UNSPEC_CAST.

OK.

Thanks,
Uros.

> --- gcc/config/i386/predicates.md.jj2020-02-06 11:08:30.298433916 +0100
> +++ gcc/config/i386/predicates.md   2020-02-06 13:34:50.807095773 +0100
> @@ -1584,19 +1584,6 @@ (define_predicate "palignr_operand"
>return true;
>  })
>
> -;; Return true if OP is a parallel for identity permute.
> -(define_predicate "avx_identity_operand"
> -  (and (match_code "parallel")
> -   (match_code "const_int" "a"))
> -{
> -  int i, nelt = XVECLEN (op, 0);
> -
> -  for (i = 0; i < nelt; ++i)
> -if (INTVAL (XVECEXP (op, 0, i)) != i)
> -  return false;
> -  return true;
> -})
> -
>  ;; Return true if OP is a proper third operand to vpblendw256.
>  (define_predicate "avx2_pblendw_operand"
>(match_code "const_int")
> --- gcc/config/i386/sse.md.jj   2020-02-06 11:08:30.327433479 +0100
> +++ gcc/config/i386/sse.md  2020-02-06 13:40:27.485007762 +0100
> @@ -21157,9 +21157,9 @@ (define_expand "cbranch4"
>
>  (define_insn_and_split "avx__"
>[(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
> -   (unspec:AVX256MODE2P
> - [(match_operand: 1 "nonimmediate_operand" "xm,x")]
> - UNSPEC_CAST))]
> +   (vec_concat:AVX256MODE2P
> + (match_operand: 1 "nonimmediate_operand" "xm,x")
> + (unspec: [(const_int 0)] UNSPEC_CAST)))]
>"TARGET_AVX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
>"#"
>"&& reload_completed"
> @@ -21358,24 +21358,6 @@ (define_insn "avx_vec_concat"
> (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>
> -(define_insn_and_split "*avx_vec_concat_1"
> -  [(set (match_operand:V_256_512 0 "register_operand")
> -   (vec_concat:V_256_512
> - (vec_select:
> -   (unspec:V_256_512
> - [(match_operand: 1 "nonimmediate_operand")]
> - UNSPEC_CAST)
> -   (match_parallel 3 "avx_identity_operand"
> - [(match_operand 4 "const_int_operand")]))
> - (match_operand: 2 "nonimm_or_0_operand")))]
> -  "TARGET_AVX
> -   && (operands[2] == CONST0_RTX (mode)
> -   || !MEM_P (operands[1]))
> -   && ix86_pre_reload_split ()"
> -  "#"
> -  "&& 1"
> -  [(set (match_dup 0) (vec_concat:V_256_512 (match_dup 1) (match_dup 2)))])
> -
>  (define_insn "vcvtph2ps"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
> (vec_select:V4SF
> @@ -22198,9 +22180,11 @@ (define_insn "sha256rnds2"
>
>  (define_insn_and_split "avx512f__"
>[(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
> -   (unspec:AVX512MODE2P
> - [(match_operand: 1 "nonimmediate_operand" "xm,x")]
> - UNSPEC_CAST))]
> +   (vec_concat:AVX512MODE2P
> + (vec_concat:
> +   (match_operand: 1 "nonimmediate_operand" "xm,x")
> +   (unspec: [(const_int 0)] UNSPEC_CAST))
> + (unspec: [(const_int 0)] UNSPEC_CAST)))]
>"TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
>"#"
>"&& reload_completed"
> @@ -22215,9 +22199,9 @@ (define_insn_and_split "avx512f_
>  (define_insn_and_split "avx512f__256"
>[(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
> -   (unspec:AVX512MODE2P
> - [(match_operand: 1 "nonimmediate_operand" "xm,x")]
> - UNSPEC_CAST))]
> +   (vec_concat:AVX512

[PATCH] i386: Better patch to improve avx* vector concatenation [PR93594]

2020-02-07 Thread Jakub Jelinek

Hi!

After thinking some more on this, we can do better; rather than having to
add a new prereload splitter pattern to catch all other cases where it might
be beneficial to fold first part of an UNSPEC_CAST back to the unspec
operand, this patch reverts the *.md changes I've made yesterday and instead
tweaks the patterns, so that simplify-rtx.c can optimize those on its own.
Instead of the whole SET_SRC being an UNSPEC through which simplify-rtx.c
obviously can't optimize anything, this represents those patterns through a
VEC_CONCAT (or two nested ones for the 128-bit -> 512-bit casts) with the
operand as the low part of it and UNSPEC representing just the high part of
it (the undefined, to be ignored, bits).  While richi suggested using
already in GIMPLE for those using a SSA_NAME default definition (i.e.
clearly uninitialized use), I'd say that uninit pass would warn about those,
but more importantly, in RTL it would probably force zero initialization of
that or use or an uninitialized pseudo, all of which is hard to match in an
pattern, so I think an UNSPEC is better for that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-07  Jakub Jelinek  

PR target/93594
* config/i386/predicates.md (avx_identity_operand): Remove.
* config/i386/sse.md (*avx_vec_concat_1): Remove.
(avx__, 
avx512f__256): Change patterns to
a VEC_CONCAT of the operand and UNSPEC_CAST.
(avx512f__): Change pattern to
a VEC_CONCAT of VEC_CONCAT of the operand and UNSPEC_CAST with
UNSPEC_CAST.

--- gcc/config/i386/predicates.md.jj2020-02-06 11:08:30.298433916 +0100
+++ gcc/config/i386/predicates.md   2020-02-06 13:34:50.807095773 +0100
@@ -1584,19 +1584,6 @@ (define_predicate "palignr_operand"
   return true;
 })
 
-;; Return true if OP is a parallel for identity permute.
-(define_predicate "avx_identity_operand"
-  (and (match_code "parallel")
-   (match_code "const_int" "a"))
-{
-  int i, nelt = XVECLEN (op, 0);
-
-  for (i = 0; i < nelt; ++i)
-if (INTVAL (XVECEXP (op, 0, i)) != i)
-  return false;
-  return true;
-})
-
 ;; Return true if OP is a proper third operand to vpblendw256.
 (define_predicate "avx2_pblendw_operand"
   (match_code "const_int")
--- gcc/config/i386/sse.md.jj   2020-02-06 11:08:30.327433479 +0100
+++ gcc/config/i386/sse.md  2020-02-06 13:40:27.485007762 +0100
@@ -21157,9 +21157,9 @@ (define_expand "cbranch4"
 
 (define_insn_and_split "avx__"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
-   (unspec:AVX256MODE2P
- [(match_operand: 1 "nonimmediate_operand" "xm,x")]
- UNSPEC_CAST))]
+   (vec_concat:AVX256MODE2P
+ (match_operand: 1 "nonimmediate_operand" "xm,x")
+ (unspec: [(const_int 0)] UNSPEC_CAST)))]
   "TARGET_AVX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
@@ -21358,24 +21358,6 @@ (define_insn "avx_vec_concat"
(set_attr "prefix" "maybe_evex")
(set_attr "mode" "")])
 
-(define_insn_and_split "*avx_vec_concat_1"
-  [(set (match_operand:V_256_512 0 "register_operand")
-   (vec_concat:V_256_512
- (vec_select:
-   (unspec:V_256_512
- [(match_operand: 1 "nonimmediate_operand")]
- UNSPEC_CAST)
-   (match_parallel 3 "avx_identity_operand"
- [(match_operand 4 "const_int_operand")]))
- (match_operand: 2 "nonimm_or_0_operand")))]
-  "TARGET_AVX
-   && (operands[2] == CONST0_RTX (mode)
-   || !MEM_P (operands[1]))
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(set (match_dup 0) (vec_concat:V_256_512 (match_dup 1) (match_dup 2)))])
-
 (define_insn "vcvtph2ps"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(vec_select:V4SF
@@ -22198,9 +22180,11 @@ (define_insn "sha256rnds2"
 
 (define_insn_and_split "avx512f__"
   [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
-   (unspec:AVX512MODE2P
- [(match_operand: 1 "nonimmediate_operand" "xm,x")]
- UNSPEC_CAST))]
+   (vec_concat:AVX512MODE2P
+ (vec_concat:
+   (match_operand: 1 "nonimmediate_operand" "xm,x")
+   (unspec: [(const_int 0)] UNSPEC_CAST))
+ (unspec: [(const_int 0)] UNSPEC_CAST)))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"
@@ -22215,9 +22199,9 @@ (define_insn_and_split "avx512f__256"
   [(set (match_operand:AVX512MODE2P 0 "nonimmediate_operand" "=x,m")
-   (unspec:AVX512MODE2P
- [(match_operand: 1 "nonimmediate_operand" "xm,x")]
- UNSPEC_CAST))]
+   (vec_concat:AVX512MODE2P
+ (match_operand: 1 "nonimmediate_operand" "xm,x")
+ (unspec: [(const_int 0)] UNSPEC_CAST)))]
   "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "#"
   "&& reload_completed"

Jakub

Re: [PATCH] i386: Fix splitters that call extract_insn_cached [PR93611]

2020-02-07 Thread Uros Bizjak

On Fri, Feb 7, 2020 at 8:58 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs.  The generated split_insns starts
> with recog_data.insn = NULL and then tries to put various operands into
> recog_data.operand array and checks various splitter conditions.
> The problem is that some atom related tuning splitters indirectly call
> extract_insn_cached on the insn they are used in.  This can change
> recog_data.operand, but most likely it will just keep it as is, but
> sets recog_data.insn to the current instruction.  If that splitter doesn't
> match, we continue trying some other split conditions and modify
> recog_data.operand array again.  If even that doesn't find any usable
> splitter, we punt, but at that point recog_data.insn says that recog_data
> is valid for that particular instruction, even when recog_data.operand array
> can be anything.
> The safest thing would be to copy whole recog_data to a temporary object
> before doing the calls that can call extract_insn_cached and restore it
> afterwards, but it would be also very costly, recog_data has 1280 bytes.
> So, this patch just makes sure to clear recog_data.insn if it has changed
> during the extract_insn_cached call, which means if we extract_insn_cached
> later, we'll extract it properly, while if we call it say from some other
> context than splitter conditions, the insn is already cached, we don't reset
> the cache.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-02-07  Jakub Jelinek  
>
> PR target/93611
> * config/i386/i386.c (ix86_lea_outperforms): Make sure to clear
> recog_data.insn if distance_non_agu_define changed it.
>
> * gcc.target/i386/pr93611.c: New test.

LGTM.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2020-01-28 08:45:56.781090684 +0100
> +++ gcc/config/i386/i386.c  2020-02-06 16:29:35.548663197 +0100
> @@ -14459,9 +14459,18 @@ ix86_lea_outperforms (rtx_insn *insn, un
>return true;
>  }
>
> +  rtx_insn *rinsn = recog_data.insn;
> +
>dist_define = distance_non_agu_define (regno1, regno2, insn);
>dist_use = distance_agu_use (regno0, insn);
>
> +  /* distance_non_agu_define can call extract_insn_cached.  If this function
> + is called from define_split conditions, that can break insn splitting,
> + because split_insns works by clearing recog_data.insn and then modifying
> + recog_data.operand array and match the various split conditions.  */
> +  if (recog_data.insn != rinsn)
> +recog_data.insn = NULL;
> +
>if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
>  {
>/* If there is no non AGU operand definition, no AGU
> --- gcc/testsuite/gcc.target/i386/pr93611.c.jj  2020-02-06 12:24:28.005976435 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr93611.c 2020-02-06 12:24:17.685131826 
> +0100
> @@ -0,0 +1,5 @@
> +/* PR target/93611 */
> +/* { dg-do compile } */
> +/* { dg-options "-fira-algorithm=priority -O3 -mtune=bonnell" } */
> +
> +#include "../../gcc.dg/vect/pr58508.c"
>
> Jakub
>

55 matches

Mail list logo