Re: [PATCH V2] correct COUNT and PROB for unrolled loop

2020-02-27 Thread Jiufu Guo
Jiufu Guo  writes:

Hi!

I'd like to ping the following patch, just in case it makes sense to
include it in GCC 10. Thanks!

https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00927.html

Jiufu

> Hi Honza and all,
>
> I updated the patch a little as below. Bootstrap and regtest are ok
> on powerpc64le.
>
> Is it OK for trunk?
>
> Thanks for comments.
> Jiufu
>
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 727e951..ded0046 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify-me.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "dumpfile.h"
> +#include "cfgrtl.h"
>  
>  static void copy_loops_to (class loop **, int,
>  class loop *);
> @@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
> /* If original loop is executed COUNT_IN times, the unrolled
>loop will account SCALE_MAIN_DEN times.  */
> scale_main = count_in.probability_in (scale_main_den);
> +
> +   /* If we are guessing at the number of iterations and count_in
> +  becomes unrealistically small, reset probability.  */
> +   if (!(count_in.reliable_p () || loop->any_estimate))
> + {
> +   profile_count new_count_in = count_in.apply_probability (scale_main);
> +   profile_count preheader_count = loop_preheader_edge (loop)->count ();
> +   if (new_count_in.apply_scale (1, 10) < preheader_count)
> + scale_main = profile_probability::likely ();
> + }
> +
> scale_act = scale_main * prob_pass_main;
>   }
>else
>   {
> +   profile_count new_loop_count;
> profile_count preheader_count = e->count ();
> -   for (i = 0; i < ndupl; i++)
> - scale_main = scale_main * scale_step[i];
> scale_act = preheader_count.probability_in (count_in);
> +   /* Compute final preheader count after peeling NDUPL copies.  */
> +   for (i = 0; i < ndupl; i++)
> + preheader_count = preheader_count.apply_probability (scale_step[i]);
> +   /* Subtract out exit(s) from peeled copies.  */
> +   new_loop_count = count_in - (e->count () - preheader_count);
> +   scale_main = new_loop_count.probability_in (count_in);
>   }
>  }
>  
> @@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, edge e,
> scale_bbs_frequencies (new_bbs, n, scale_act);
> scale_act = scale_act * scale_step[j];
>   }
> +
> +  /* Need to update PROB of exit edge and corresponding COUNT.  */
> +  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
> +   && bbs_to_scale)
> + {
> +   edge new_exit = new_spec_edges[SE_ORIG];
> +   profile_count new_count_in = new_exit->src->count;
> +   profile_count preheader_count = loop_preheader_edge (loop)->count ();
> +   edge e;
> +   edge_iterator ei;
> +
> +   FOR_EACH_EDGE (e, ei, new_exit->src->succs)
> + if (e != new_exit)
> +   break;
> +
> +   gcc_assert (e && e != new_exit);
> +
> +   new_exit->probability = preheader_count.probability_in (new_count_in);
> +   e->probability = new_exit->probability.invert ();
> +
> +   profile_count new_latch_count
> + = new_exit->src->count.apply_probability (e->probability);
> +   profile_count old_latch_count = e->dest->count;
> +
> +   EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
> + scale_bbs_frequencies_profile_count (new_bbs + i, 1,
> +  new_latch_count,
> +  old_latch_count);
> +
> +   if (current_ir_type () != IR_GIMPLE)
> + update_br_prob_note (e->src);
> + }
>  }
>free (new_bbs);
>free (orig_loops);
> diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
> new file mode 100644
> index 000..f3b7c22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68212.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param max-unroll-times=4 -fdump-rtl-alignments" } */
> +
> +void foo(long int *a, long int *b, long int n)
> +{
> +  long int i;
> +
> +  for (i = 0; i < n; i++)
> +a[i] = *b;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 "alignments"} } */
> +
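
The arithmetic in the peeling branch of the patch above can be followed with plain numbers. The sketch below is only an illustrative model in Python, not GCC code: `peeled_loop_scale` and its parameter names are hypothetical, and exact fractions stand in for GCC's `profile_count` and `profile_probability` types.

```python
from fractions import Fraction


def peeled_loop_scale(count_in, entry_count, scale_step):
    """Model of the peeling branch: fraction of COUNT_IN kept by the main loop.

    count_in    -- header count of the original loop (plain number, assumption)
    entry_count -- count of the edge entering the loop, e->count () in the patch
    scale_step  -- per-copy probabilities of continuing past each peeled copy
    """
    # Preheader count after peeling all copies: apply each step probability.
    remaining = Fraction(entry_count)
    for step in scale_step:
        remaining *= Fraction(step)

    # Executions that left through the exits of the peeled copies.
    exited = Fraction(entry_count) - remaining

    # Subtract those exits from the loop count, then express it relative
    # to COUNT_IN, mirroring new_loop_count.probability_in (count_in).
    new_loop_count = Fraction(count_in) - exited
    return new_loop_count / Fraction(count_in)
```

For example, with `count_in = 1000`, an entry count of 100 and two peeled copies that each continue with probability 9/10, 19 executions leave through the peeled copies, so the main loop keeps 981/1000 of its original count.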


Re: collect2.exe errors not pruned

2020-02-27 Thread Andrew Pinski
On Thu, Feb 27, 2020 at 9:17 PM Alexandre Oliva  wrote:
>
>
> Testing on platforms with an executable suffix gets a few excess output
> failures because e.g. collect2.exe errors do not get pruned.
>
> I'm not sure it's appropriate for the error to not omit the host
> platform's executable suffix, just as it omits directory components from
> argv[0], so I'm undecided between fixing collect2.c's initialization of
> progname or extending the regexp, as in the (untested) patchlet below.
>
> Any preferences?
>
>
> diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
> index eea4bf3..6d6a7fe 100644
> --- a/gcc/testsuite/lib/prune.exp
> +++ b/gcc/testsuite/lib/prune.exp
> @@ -38,7 +38,7 @@ proc prune_gcc_output { text } {
>  regsub -all "(^|\n)\[^\n\]*:   in .constexpr. expansion \[^\n\]*" $text "" text
>  regsub -all "(^|\n)\[^\n\]*:   in requirements \[^\n\]*" $text "" text
>  regsub -all "(^|\n)inlined from \[^\n\]*" $text "" text
> -regsub -all "(^|\n)collect2: error: ld returned \[^\n\]*" $text "" text
> +regsub -all "(^|\n)collect2(\.exe)?: error: ld returned \[^\n\]*" $text "" text
If you touch that line
You may as well also touch this line too:
>  regsub -all "(^|\n)collect: re(compiling|linking)\[^\n\]*" $text "" text

Thanks,
Andrew Pinski

>  regsub -all "(^|\n)Please submit.*instructions\[^\n\]*" $text "" text
>  regsub -all "(^|\n)\[0-9\]\[0-9\]* errors\." $text "" text
>
> --
> Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo/
> Free Software Evangelist  Stallman was right, but he's left :(
> GNU Toolchain Engineer   The darkest places in hell are reserved for those
> who maintain their neutrality in times of moral crisis. -- Dante Alighieri


[PATCH], PR target/93937, Fix variable vec_extract insn that will never match

2020-02-27 Thread Michael Meissner
As part of my work in adding support for -mcpu=future, I noticed an insn that
would never match.

Here is the insn:

(define_insn_and_split "*vsx_extract__mode_var"
  [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
(zero_extend:
 (unspec:
  [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
  UNSPEC_VSX_EXTRACT)))
   (clobber (match_scratch:DI 3 "=r,r,"))
   (clobber (match_scratch:V2DI 4 "=X,,X"))]
  "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  machine_mode smode = mode;
  rs6000_split_vec_extract_var (gen_rtx_REG (smode, REGNO (operands[0])),
operands[1], operands[2],
operands[3], operands[4]);
  DONE;
}
  [(set_attr "isa" "p9v,*,*")])

It will never match, because the zero_extend result is the same mode as the
input, so the machine independent parts of the compiler would never insert the
zero extend.  Instead, an explicit extend is generated to convert the type to
DImode.  In addition, the insn needs to be split into two parts, one that
handles the register case and one that handles the memory optimization, as was
done for the other 3 variable insns as part of PR target/93932.

There is a wider issue to optimize all cases of vec_extract to do the sign,
zero, and float extension automatically when we are loading from memory, which
is PR target/93230.  I have patches for all of the cases for 93230, but they
will need to wait until GCC 11 opens up.

But it irks me to have this pattern that was mostly correct but would never
match.  As I see it, there are 4 options:

1) Delete the insn completely, since it doesn't match, and then put in code
later to cover the case when PR target/93230 is done.

2) Ignore the patterns in the source code, accepting that they are useless, and
will be fixed some day.

3) Patch the existing insns so that they do match, but don't add all of the
other options that could be added (adding sign extension, adding the ability to
load the value into vector registers with ISA 2.07, optimizing vec_extract
being converted to floating point to avoid direct moves, etc.).

4) Do all of the possibilities now.  The trouble is that, due to the number of
special cases, this can add up to a number of new insns (for example, dealing
with HImode/QImode also being zero extended to SImode in addition to DImode,
dealing with QImode not having a sign extending load, dealing with SImode that
can load into the vector registers at ISA 2.05/2.06 while HI/QI modes need
2.07, etc.).

Given we are in stage 4, I think #4 is not appropriate (but if you want, I can
do the patches).

I would prefer to either delete the patterns (#1) or fix them to a limited
extent (#3).  I would prefer not to leave them unchanged in vsx.md, but
obviously that is the simplest approach.

This patch implements #3, fixing the insns as written, but not extending them
to handle all of the other special cases.

I have built compilers on both big endian and little endian PowerPC Linux
systems.  These patches fix the instruction counts of the three tests that now
eliminate the zero_extend operation.  Can I check these patches into the master
branch?

[gcc]
2020-02-28  Michael Meissner  

PR target/93937
* config/rs6000/vsx.md (vsx_extract__mode_var):
Delete, replace with vsx_extract__uns_di_var.
(vsx_extract__uns_di_var): Replacement insn that will match
zero extensions properly.  Restrict the vector to be in a
register.
(vsx_extract__uns_di_var_load): New insn to handle variable
vector extract from memory and combine it with a zero extend to
DImode.

[gcc/testsuite]
2020-02-28  Michael Meissner  

PR target/93937
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Update
instruction counts.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Update
instruction counts.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Update
instruction counts.

--- /tmp/DHFA3L_vsx.md  2020-02-27 23:48:34.871654766 -0500
+++ gcc/config/rs6000/vsx.md2020-02-27 16:26:51.538724961 -0500
@@ -3749,15 +3749,16 @@ (define_insn_and_split "*vsx_extract__mode_var"
-  [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
-   (zero_extend:
-(unspec:
- [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
-  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+;; Variable V16QI/V8HI/V4SI extract from a register and zero extend to DImode.
+(define_insn_and_split "*vsx_extract__uns_di_var"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+   (zero_extend:DI
+(unspec:
+ [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v")
+  (match_operand:DI 2 "gpc_reg_operand" "r,r")]
  UNSPEC_VSX_EXTRACT)))
-   (clobber (match_scratch:DI 3 "=r,r,"))
-   (clobber 

Re: maxval on -inf and nan in Fortran

2020-02-27 Thread Steve Kargl
On Fri, Feb 28, 2020 at 01:02:28PM +0800, Jiufu Guo wrote:
> 
> With -ffast-math -O3, this case `STOP 3` on a few platforms, e.g. ppc64le/x86.
> 

IMHO, using -ffast-math with Fortran code is never correct.
With this option, you got exactly what you wanted.

-- 
Steve


collect2.exe errors not pruned

2020-02-27 Thread Alexandre Oliva


Testing on platforms with an executable suffix gets a few excess output
failures because e.g. collect2.exe errors do not get pruned.

I'm not sure it's appropriate for the error to not omit the host
platform's executable suffix, just as it omits directory components from
argv[0], so I'm undecided between fixing collect2.c's initialization of
progname or extending the regexp, as in the (untested) patchlet below.

Any preferences?


diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
index eea4bf3..6d6a7fe 100644
--- a/gcc/testsuite/lib/prune.exp
+++ b/gcc/testsuite/lib/prune.exp
@@ -38,7 +38,7 @@ proc prune_gcc_output { text } {
 regsub -all "(^|\n)\[^\n\]*:   in .constexpr. expansion \[^\n\]*" $text "" text
 regsub -all "(^|\n)\[^\n\]*:   in requirements \[^\n\]*" $text "" text
 regsub -all "(^|\n)inlined from \[^\n\]*" $text "" text
-regsub -all "(^|\n)collect2: error: ld returned \[^\n\]*" $text "" text
+regsub -all "(^|\n)collect2(\.exe)?: error: ld returned \[^\n\]*" $text "" text
 regsub -all "(^|\n)collect: re(compiling|linking)\[^\n\]*" $text "" text
 regsub -all "(^|\n)Please submit.*instructions\[^\n\]*" $text "" text
 regsub -all "(^|\n)\[0-9\]\[0-9\]* errors\." $text "" text
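
The effect of the extended regexp can be checked outside DejaGnu. The sketch below uses Python's `re` module as a stand-in for Tcl's `regsub -all`; the patterns are transcribed from the hunk above without the Tcl bracket escaping, and `OLD_PATTERN`, `NEW_PATTERN`, `prune` and the sample outputs are hypothetical names and data chosen for illustration.

```python
import re

# Patterns from the prune.exp hunk, rewritten as Python raw strings.
OLD_PATTERN = r"(^|\n)collect2: error: ld returned [^\n]*"
NEW_PATTERN = r"(^|\n)collect2(\.exe)?: error: ld returned [^\n]*"


def prune(pattern, text):
    """Delete every match of PATTERN from TEXT, as `regsub -all ... "" text` would."""
    return re.sub(pattern, "", text)


# Sample linker output with and without the host executable suffix.
unix_output = "ld: cannot find -lm\ncollect2: error: ld returned 1 exit status\n"
exe_output = "ld: cannot find -lm\ncollect2.exe: error: ld returned 1 exit status\n"
```

In this model the old pattern leaves the `collect2.exe` line in place, which is what shows up as an excess-error failure, while the new pattern removes the line in both forms.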

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   The darkest places in hell are reserved for those
who maintain their neutrality in times of moral crisis. -- Dante Alighieri


maxval on -inf and nan in Fortran

2020-02-27 Thread Jiufu Guo
Hi,

While checking PR93709, I found that the test cases maxlocval_4.f90 and
minlocval_4.f90 check `maxval/minval` on `-inf` and `nan`.
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob_plain;f=gcc/testsuite/gfortran.dg/maxlocval_4.f90;hb=HEAD
There is code like:
```
l5 = .true.
l5(1,1) = .false.
l5(1,2) = .false.
l5(2,3) = .false.
a = reshape ((/ nan, nan, nan, minf, minf, minf, minf, pinf, minf /), (/ 3, 3 /))
b = maxval (a, dim = 1)
if (.not.isnan(b(1))) STOP 3
a = nan
a(1,3) = minf
if (maxval (a).ne.minf) STOP 65
if (maxval (a, mask = l5).ne.minf) STOP 70
```

As the gfortran manual says in section
5.4 MAX and MIN intrinsics with REAL NaN arguments:
https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gfortran/MAX-and-MIN-intrinsics-with-REAL-NaN-arguments.html#MAX-and-MIN-intrinsics-with-REAL-NaN-arguments

```
The Fortran standard does not specify what the result of the MAX and MIN
intrinsics are if one of the arguments is a NaN.  Accordingly, the GNU
Fortran compiler does not specify that either, as this allows for faster
and more compact code to be generated.  If the programmer wishes to take
some specific action in case one of the arguments is a NaN, it is necessary
to explicitly test the arguments before calling MAX or MIN, e.g. with the
IEEE_IS_NAN function from the intrinsic module IEEE_ARITHMETIC.
```

The test case does not check for NaN explicitly.  So, strictly speaking,
this code may need a stronger NaN check; otherwise it may STOP
during execution, and this STOP is acceptable. Right?
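
To illustrate why a comparison-based maximum is fragile with NaN, and what the test case seems to expect instead, here is a small Python sketch. It is not gfortran's implementation: `maxval_ignoring_nan` is a hypothetical helper that models the behaviour the quoted checks rely on (NaNs are skipped unless every element is NaN), and the empty-array case of MAXVAL is not modelled.

```python
import math


def maxval_ignoring_nan(values):
    """Maximum that skips NaNs and returns NaN only if all elements are NaN."""
    non_nan = [v for v in values if not math.isnan(v)]
    if not non_nan:
        return math.nan
    return max(non_nan)


# A plain comparison-based max depends on argument order when a NaN is
# involved, because every comparison with NaN is false.
first_is_nan = max(math.nan, 1.0)   # keeps the NaN it started with
second_is_nan = max(1.0, math.nan)  # keeps 1.0

# Shapes taken from the quoted test: an all-NaN column, and an array that
# is NaN everywhere except for a single -inf element.
all_nan_column = [math.nan, math.nan, math.nan]
nan_with_minf = [math.nan] * 8 + [-math.inf]
```

With explicit NaN tests of this kind the result no longer depends on how the compiler orders or simplifies the comparisons, which is the property options such as -ffast-math do not preserve.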

With -ffast-math -O3, this case hits `STOP 3` on a few platforms, e.g. ppc64le/x86.


Thanks,

Jiufu



[PATCH v3 4/4] libgomp/test: Remove a build sysroot fix regression

2020-02-27 Thread Maciej W. Rozycki
Fix a problem with commit c8e759b4215b ("libgomp/test: Fix compilation 
for build sysroot") that caused a regression in some standalone test 
environments where testsuite/libgomp-test-support.exp is used, but the 
compiler is expected to be determined by `[find_gcc]', and set the 
GCC_UNDER_TEST TCL variable in testsuite/libgomp-site-extra.exp instead.

libgomp/
* configure.ac: Add testsuite/libgomp-site-extra.exp to output 
files.
* configure: Regenerate.
* testsuite/libgomp-site-extra.exp.in: New file.
* testsuite/libgomp-test-support.exp.in (GCC_UNDER_TEST): Remove 
variable.
* testsuite/Makefile.am (EXTRA_DEJAGNU_SITE_CONFIG): New
variable.
* testsuite/Makefile.in: Regenerate.
---
Changes from v2:

- Do not use `--tool_exec' with AM_RUNTESTFLAGS.

- Move the definition of GCC_UNDER_TEST from 
  testsuite/libgomp-test-support.exp to 
  testsuite/libgomp-site-extra.exp.

Applies on top of v1.
---
 libgomp/configure |3 +++
 libgomp/configure.ac  |1 +
 libgomp/testsuite/Makefile.am |2 ++
 libgomp/testsuite/Makefile.in |6 +-
 libgomp/testsuite/libgomp-site-extra.exp.in   |1 +
 libgomp/testsuite/libgomp-test-support.exp.in |2 --
 6 files changed, 12 insertions(+), 3 deletions(-)

gcc-test-libgomp-site-extra.diff
Index: gcc/libgomp/configure
===
--- gcc.orig/libgomp/configure
+++ gcc/libgomp/configure
@@ -17047,6 +17047,8 @@ ac_config_files="$ac_config_files Makefi
 
 ac_config_files="$ac_config_files 
testsuite/libgomp-test-support.pt.exp:testsuite/libgomp-test-support.exp.in"
 
+ac_config_files="$ac_config_files testsuite/libgomp-site-extra.exp"
+
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
 # tests run on this system so they can be shared between configure
@@ -18200,6 +18202,7 @@ do
 "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
 "libgomp.spec") CONFIG_FILES="$CONFIG_FILES libgomp.spec" ;;
 "testsuite/libgomp-test-support.pt.exp") CONFIG_FILES="$CONFIG_FILES 
testsuite/libgomp-test-support.pt.exp:testsuite/libgomp-test-support.exp.in" ;;
+"testsuite/libgomp-site-extra.exp") CONFIG_FILES="$CONFIG_FILES 
testsuite/libgomp-site-extra.exp" ;;
 
   *) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
   esac
Index: gcc/libgomp/configure.ac
===
--- gcc.orig/libgomp/configure.ac
+++ gcc/libgomp/configure.ac
@@ -436,4 +436,5 @@ GCC_BASE_VER
 AC_CONFIG_FILES(omp.h omp_lib.h omp_lib.f90 libgomp_f.h)
 AC_CONFIG_FILES(Makefile testsuite/Makefile libgomp.spec)
 
AC_CONFIG_FILES([testsuite/libgomp-test-support.pt.exp:testsuite/libgomp-test-support.exp.in])
+AC_CONFIG_FILES([testsuite/libgomp-site-extra.exp])
 AC_OUTPUT
Index: gcc/libgomp/testsuite/Makefile.am
===
--- gcc.orig/libgomp/testsuite/Makefile.am
+++ gcc/libgomp/testsuite/Makefile.am
@@ -12,6 +12,8 @@ _RUNTEST = $(shell if test -f $(top_srcd
 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir
 
+EXTRA_DEJAGNU_SITE_CONFIG = libgomp-site-extra.exp
+
 # Instead of directly in ../testsuite/libgomp-test-support.exp.in, the
 # following variables have to be "routed through" this Makefile, for expansion
 # of the several (Makefile) variables used therein.
Index: gcc/libgomp/testsuite/Makefile.in
===
--- gcc.orig/libgomp/testsuite/Makefile.in
+++ gcc/libgomp/testsuite/Makefile.in
@@ -111,7 +111,8 @@ am__configure_deps = $(am__aclocal_m4_de
 DIST_COMMON = $(srcdir)/Makefile.am
 mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
 CONFIG_HEADER = $(top_builddir)/config.h
-CONFIG_CLEAN_FILES = libgomp-test-support.pt.exp
+CONFIG_CLEAN_FILES = libgomp-test-support.pt.exp \
+   libgomp-site-extra.exp
 CONFIG_CLEAN_VPATH_FILES =
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -310,6 +311,7 @@ _RUNTEST = $(shell if test -f $(top_srcd
 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 
 RUNTESTDEFAULTFLAGS = --tool $$tool --srcdir $$srcdir
+EXTRA_DEJAGNU_SITE_CONFIG = libgomp-site-extra.exp
 all: all-am
 
 .SUFFIXES:
@@ -344,6 +346,8 @@ $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(
 $(am__aclocal_m4_deps):
 libgomp-test-support.pt.exp: $(top_builddir)/config.status 
$(srcdir)/libgomp-test-support.exp.in
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@
+libgomp-site-extra.exp: $(top_builddir)/config.status 
$(srcdir)/libgomp-site-extra.exp.in
+   cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@
 
 mostlyclean-libtool:
-rm -f *.lo
Index: 

[PATCH v3 3/4] libgo/test: Complement compilation fix for build sysroot

2020-02-27 Thread Maciej W. Rozycki
Complement commit b72813a68c94 ("libgo: fix DejaGNU testsuite compiler 
when using build sysroot") and move testsuite/libgo-test-support.exp.in 
to testsuite/libgo-site-extra.exp.in.  Update testsuite/lib/libgo.exp to 
handle the `--tool_exec' option to `runtest' as with other top-level GCC 
target libraries, by using the TOOL_EXECUTABLE TCL variable.

libgo/
* configure.ac: Produce testsuite/libgo-site-extra.exp rather 
than testsuite/libgo-test-support.exp.
* configure: Regenerate.
* testsuite/libgo-test-support.exp.in: Rename file to...
* testsuite/libgo-site-extra.exp.in: ... this.
* testsuite/Makefile.am: Use libgo-site-extra.exp rather than 
libgo-test-support.exp.
* testsuite/Makefile.in: Regenerate.
* testsuite/lib/libgo.exp: Handle TOOL_EXECUTABLE.
---
Changes from v2:

- Rename testsuite/libgo-test-support.exp.in to 
  testsuite/libgo-site-extra.exp.in.

Applies on top of v1.
---
 libgo/configure   |4 ++--
 libgo/configure.ac|2 +-
 libgo/testsuite/Makefile.am   |2 +-
 libgo/testsuite/Makefile.in   |6 +++---
 libgo/testsuite/lib/libgo.exp |   12 
 libgo/testsuite/libgo-site-extra.exp.in   |   17 +
 libgo/testsuite/libgo-test-support.exp.in |   17 -
 7 files changed, 32 insertions(+), 28 deletions(-)

gcc-test-libgo-site-extra.diff
Index: gcc/libgo/configure
===
--- gcc.orig/libgo/configure
+++ gcc/libgo/configure
@@ -15880,7 +15880,7 @@ else
   multilib_arg=
 fi
 
-ac_config_files="$ac_config_files Makefile testsuite/Makefile 
testsuite/libgo-test-support.exp"
+ac_config_files="$ac_config_files Makefile testsuite/Makefile 
testsuite/libgo-site-extra.exp"
 
 
 ac_config_commands="$ac_config_commands default"
@@ -17061,7 +17061,7 @@ do
 "libtool") CONFIG_COMMANDS="$CONFIG_COMMANDS libtool" ;;
 "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
 "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
-"testsuite/libgo-test-support.exp") CONFIG_FILES="$CONFIG_FILES 
testsuite/libgo-test-support.exp" ;;
+"testsuite/libgo-site-extra.exp") CONFIG_FILES="$CONFIG_FILES 
testsuite/libgo-site-extra.exp" ;;
 "default") CONFIG_COMMANDS="$CONFIG_COMMANDS default" ;;
 
   *) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
Index: gcc/libgo/configure.ac
===
--- gcc.orig/libgo/configure.ac
+++ gcc/libgo/configure.ac
@@ -889,7 +889,7 @@ else
   multilib_arg=
 fi
 
-AC_CONFIG_FILES(Makefile testsuite/Makefile testsuite/libgo-test-support.exp)
+AC_CONFIG_FILES(Makefile testsuite/Makefile testsuite/libgo-site-extra.exp)
 
 AC_CONFIG_COMMANDS([default],
 [if test -n "$CONFIG_FILES"; then
Index: gcc/libgo/testsuite/Makefile.am
===
--- gcc.orig/libgo/testsuite/Makefile.am
+++ gcc/libgo/testsuite/Makefile.am
@@ -11,7 +11,7 @@ RUNTEST = `if [ -f $(top_srcdir)/../deja
   echo $(top_srcdir)/../dejagnu/runtest ; \
else echo runtest; fi`
 
-EXTRA_DEJAGNU_SITE_CONFIG = libgo-test-support.exp
+EXTRA_DEJAGNU_SITE_CONFIG = libgo-site-extra.exp
 
 # When running the tests we set GCC_EXEC_PREFIX to the install tree so that
 # files that have already been installed there will be found.  The -B option
Index: gcc/libgo/testsuite/Makefile.in
===
--- gcc.orig/libgo/testsuite/Makefile.in
+++ gcc/libgo/testsuite/Makefile.in
@@ -107,7 +107,7 @@ am__configure_deps = $(am__aclocal_m4_de
 DIST_COMMON = $(srcdir)/Makefile.am
 mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
 CONFIG_HEADER = $(top_builddir)/config.h
-CONFIG_CLEAN_FILES = libgo-test-support.exp
+CONFIG_CLEAN_FILES = libgo-site-extra.exp
 CONFIG_CLEAN_VPATH_FILES =
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -300,7 +300,7 @@ RUNTEST = `if [ -f $(top_srcdir)/../deja
   echo $(top_srcdir)/../dejagnu/runtest ; \
else echo runtest; fi`
 
-EXTRA_DEJAGNU_SITE_CONFIG = libgo-test-support.exp
+EXTRA_DEJAGNU_SITE_CONFIG = libgo-site-extra.exp
 
 # When running the tests we set GCC_EXEC_PREFIX to the install tree so that
 # files that have already been installed there will be found.  The -B option
@@ -340,7 +340,7 @@ $(top_srcdir)/configure: @MAINTAINER_MOD
 $(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
 $(am__aclocal_m4_deps):
-libgo-test-support.exp: $(top_builddir)/config.status 
$(srcdir)/libgo-test-support.exp.in
+libgo-site-extra.exp: $(top_builddir)/config.status 
$(srcdir)/libgo-site-extra.exp.in
cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@
 
 mostlyclean-libtool:

[PATCH v3 0/4] Fix library testsuite compilation for build sysroot

2020-02-27 Thread Maciej W. Rozycki
Hi,

 This is v3 of patch series, originally posted here:






and then reposted as v2 here:






meant to address a problem with the testsuite compiler being set up across 
libatomic, libffi, libgo, libgomp with no correlation whatsoever to the 
target compiler being used in GCC compilation.  Consequently there is no 
arrangement made to set up the compilation sysroot according to the build 
sysroot specified for GCC compilation, causing a catastrophic failure 
across the testsuites affected from the inability to link executables.

 There were concerns raised by Julian and Chung-Lin about the libgomp 
change in v1 where an issue triggered in their standalone test environment 
and the wrong compiler executable was chosen.  To address this issue in v2 
I proposed to use the `--tool_exec' option to `runtest' to choose the 
compiler, however Mike expressed concerns about this approach causing 
troubles where `runtest' is invoked standalone rather than via `make'.

 I have outlined yet another (third) approach in:



and I have since realised that the generated 
`libgomp/testsuite/libgomp-test-support.exp' configuration file is not 
used with the usual automake's mechanism defined to supply configuration 
files to DejaGNU.  Consequently this v3 of the series implements my third 
approach and I am fairly sure (and certainly do hope) it will satisfy 
everyone involved.

 This goes back to v1 for most of the matter and brings back the use of 
GCC_UNDER_TEST (or GOC_UNDER_TEST) supplied via `site.exp' to choose the 
compiler to use for testing.  However for consistency the file to keep 
this setting is in v3 called `*-site-extra.exp' rather than 
`*-test-support.exp'.
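
The intended selection logic can be sketched as follows. This is a Python model under stated assumptions, not the Tcl code from the library `.exp` files: `select_compiler` and its parameters are hypothetical, and the precedence shown (a GCC_UNDER_TEST value from the site configuration first, then TOOL_EXECUTABLE from `--tool_exec', then `[find_gcc]') is an assumption made for illustration.

```python
def select_compiler(site_vars, runtest_vars, find_gcc):
    """Pick the compiler for a testsuite run (illustrative model only).

    site_vars    -- variables set by site.exp and the *-site-extra.exp file,
                    empty for a standalone run without `make'
    runtest_vars -- variables set from the runtest command line, such as
                    TOOL_EXECUTABLE for --tool_exec
    find_gcc     -- callable standing in for DejaGnu's [find_gcc]
    """
    # Assumed precedence: the configured compiler wins when it is available.
    if "GCC_UNDER_TEST" in site_vars:
        return site_vars["GCC_UNDER_TEST"]
    # Otherwise honour an explicit --tool_exec from a standalone run.
    if "TOOL_EXECUTABLE" in runtest_vars:
        return runtest_vars["TOOL_EXECUTABLE"]
    # Last resort: let the framework search for a compiler.
    return find_gcc()
```

Keeping GCC_UNDER_TEST in a file that only reaches `runtest' via `site.exp' means a standalone environment that never loads it still falls through to `[find_gcc]', which is the case raised against v1.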

 For 1/4 (libatomic) and 2/4 (libffi) this is the only change made 
compared to v1, and v1 series has already been approved by Mike here:



and Jeff here:



These two patches were uncontroversial and the file name change can be 
considered trivial, so I am going to apply v3 of these patches shortly 
unless I hear objections.

 For 3/4 (libgo), where v1 has been already applied, v3 renames 
`libgo-test-support.exp' to `libgo-site-extra.exp' and also brings support 
for the `--tool_exec' option to `runtest', to keep testsuite things 
consistent across top-level target libraries.  I'm leaving it up to Ian to 
decide if he wants things arranged like this in libgo.

 For 4/4 (libgomp) some Makefile infrastructure changes are required along 
the lines of 1/4 and 2/4 to wire in `libgomp-site-extra.exp'; for these I 
request explicit approval before I push them.

 Verified with a cross-compiler configured for the `riscv-linux-gnu' 
target and the `x86_64-linux-gnu' host and using RISC-V/Linux QEMU in the 
user emulation mode as the target board.  Also no change in results with 
`x86_64-linux-gnu' native regression testing.

 See individual change descriptions for details.

 I'm assuming Ian will take care of the 3/4 libgo change; any objections 
about 1/4 and 2/4, and OK to apply 4/4 to the GCC repo?

 Finally, apologies for the confusion my earlier oversight with 
`libgomp/testsuite/libgomp-test-support.exp' wiring may have caused.

  Maciej


[PATCH v3 2/4] libffi/test: Fix compilation for build sysroot

2020-02-27 Thread Maciej W. Rozycki
Fix a problem with the libffi testsuite using a method to determine the 
compiler to use resulting in the tool being different from one the 
library has been built with, and causing a catastrophic failure from the 
inability to actually choose any compiler at all in a cross-compilation 
configuration.

Address this problem by providing a DejaGNU configuration file defining 
the compiler to use, via the GCC_UNDER_TEST TCL variable, set from $CC 
by autoconf, which will have all the required options set for the target 
compiler to build executables in the environment configured, removing 
failures like:

FAIL: libffi.call/closure_fn0.c -W -Wall -Wno-psabi -O0 (test for excess errors)
Excess errors:
default_target_compile: No compiler to compile with
UNRESOLVED: libffi.call/closure_fn0.c -W -Wall -Wno-psabi -O0 compilation failed to produce executable

and bringing overall test results for the `riscv64-linux-gnu' target 
(here with the `x86_64-linux-gnu' host and RISC-V QEMU in the Linux user 
emulation mode as the target board) from:

=== libffi Summary ===

# of unexpected failures    708
# of unresolved testcases   708
# of unsupported tests      30

to:

=== libffi Summary ===

# of expected passes        1934
# of unsupported tests      28

Also respect the TOOL_EXECUTABLE TCL variable for a standalone run via 
`runtest' and remove an unused TOOL_OPTIONS TCL variable instance.

libffi/
* configure.ac: Add testsuite/libffi-site-extra.exp to output 
files.
* configure: Regenerate.
* testsuite/libffi-site-extra.exp.in: New file.
* testsuite/Makefile.am (EXTRA_DEJAGNU_SITE_CONFIG): New 
variable.
* testsuite/Makefile.in: Regenerate.
* testsuite/lib/libffi.exp (libffi-init): Handle GCC_UNDER_TEST.
(libffi_target_compile): Likewise.
---
Changes from v2:

- Revert to v1.

- Rename testsuite/libffi-test-support.exp.in to 
  testsuite/libffi-site-extra.exp.in.

Changes from v1:

- Remove testsuite/libffi-test-support.exp.in and the associated changes.

- Pass $(CC) via `--tool_exec' in $(AM_RUNTESTFLAGS).
---
 libffi/configure  |3 ++-
 libffi/configure.ac   |2 +-
 libffi/testsuite/Makefile.am  |2 ++
 libffi/testsuite/Makefile.in  |5 -
 libffi/testsuite/lib/libffi.exp   |   16 ++--
 libffi/testsuite/libffi-site-extra.exp.in |1 +
 6 files changed, 24 insertions(+), 5 deletions(-)

gcc-test-libffi-gcc-under-test.diff
Index: gcc/libffi/configure
===
--- gcc.orig/libffi/configure
+++ gcc/libffi/configure
@@ -16662,7 +16662,7 @@ ac_config_commands="$ac_config_commands
 ac_config_links="$ac_config_links 
include/ffitarget.h:src/$TARGETDIR/ffitarget.h"
 
 
-ac_config_files="$ac_config_files include/Makefile include/ffi.h Makefile 
testsuite/Makefile man/Makefile libffi.pc"
+ac_config_files="$ac_config_files include/Makefile include/ffi.h Makefile 
testsuite/Makefile testsuite/libffi-site-extra.exp man/Makefile libffi.pc"
 
 
 cat >confcache <<\_ACEOF
@@ -17829,6 +17829,7 @@ do
 "include/ffi.h") CONFIG_FILES="$CONFIG_FILES include/ffi.h" ;;
 "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
 "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
+"testsuite/libffi-site-extra.exp") CONFIG_FILES="$CONFIG_FILES 
testsuite/libffi-site-extra.exp" ;;
 "man/Makefile") CONFIG_FILES="$CONFIG_FILES man/Makefile" ;;
 "libffi.pc") CONFIG_FILES="$CONFIG_FILES libffi.pc" ;;
 
Index: gcc/libffi/configure.ac
===
--- gcc.orig/libffi/configure.ac
+++ gcc/libffi/configure.ac
@@ -377,6 +377,6 @@ test -d src/$TARGETDIR || mkdir src/$TAR
 
 AC_CONFIG_LINKS(include/ffitarget.h:src/$TARGETDIR/ffitarget.h)
 
-AC_CONFIG_FILES(include/Makefile include/ffi.h Makefile testsuite/Makefile 
man/Makefile libffi.pc)
+AC_CONFIG_FILES(include/Makefile include/ffi.h Makefile testsuite/Makefile 
testsuite/libffi-site-extra.exp man/Makefile libffi.pc)
 
 AC_OUTPUT
Index: gcc/libffi/testsuite/Makefile.am
===
--- gcc.orig/libffi/testsuite/Makefile.am
+++ gcc/libffi/testsuite/Makefile.am
@@ -11,6 +11,8 @@ RUNTEST = `if [ -f $(top_srcdir)/../deja
   echo $(top_srcdir)/../dejagnu/runtest ; \
else echo runtest; fi`
 
+EXTRA_DEJAGNU_SITE_CONFIG = libffi-site-extra.exp
+
 AM_RUNTESTFLAGS =
 
 CLEANFILES = *.exe core* *.log *.sum
Index: gcc/libffi/testsuite/Makefile.in
===
--- gcc.orig/libffi/testsuite/Makefile.in
+++ gcc/libffi/testsuite/Makefile.in
@@ -106,7 +106,7 @@ am__configure_deps = $(am__aclocal_m4_de
 DIST_COMMON = $(srcdir)/Makefile.am
 mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
 

[PATCH v3 1/4] libatomic/test: Fix compilation for build sysroot

2020-02-27 Thread Maciej W. Rozycki
Fix a problem with the libatomic testsuite, which determines the compiler 
to use by a method that can select a tool different from the one the 
library has been built with.  Where the `--with-build-sysroot=' 
configuration option has been used to build the compiler, the resulting 
lack of a suitable `--sysroot=' option causes a catastrophic failure: 
executables cannot be linked.

Address this problem by providing a DejaGNU configuration file that 
defines the compiler to use via the GCC_UNDER_TEST TCL variable, set from 
$CC by autoconf.  $CC carries all the options the target compiler needs 
to build executables in the configured environment, which removes 
failures like:

.../bin/riscv64-linux-gnu-ld: cannot find crt1.o: No such file or directory
.../bin/riscv64-linux-gnu-ld: cannot find -lm
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libatomic.c/atomic-compare-exchange-1.c (test for excess errors)
Excess errors:
.../bin/riscv64-linux-gnu-ld: cannot find crt1.o: No such file or directory
.../bin/riscv64-linux-gnu-ld: cannot find -lm

UNRESOLVED: libatomic.c/atomic-compare-exchange-1.c compilation failed to 
produce executable

and bringing overall test results for the `riscv64-linux-gnu' target 
(here with the `x86_64-linux-gnu' host and RISC-V QEMU in the Linux user 
emulation mode as the target board) from:

=== libatomic Summary ===

# of unexpected failures    27
# of unresolved testcases   27

to:

=== libatomic Summary ===

# of expected passes        54

libatomic/
* configure.ac: Add testsuite/libatomic-site-extra.exp to output 
files.
* configure: Regenerate.
* testsuite/libatomic-site-extra.exp.in: New file.
* testsuite/Makefile.am (EXTRA_DEJAGNU_SITE_CONFIG): New 
variable.
* testsuite/Makefile.in: Regenerate.
---
Changes from v2:

- Revert to v1.

- Rename testsuite/libatomic-test-support.exp.in to 
  testsuite/libatomic-site-extra.exp.in.

Changes from v1:

- Remove testsuite/libatomic-test-support.exp.in and the associated
  changes.

- Pass $(CC) via `--tool_exec' in $(AM_RUNTESTFLAGS).
---
 libatomic/configure |3 +++
 libatomic/configure.ac  |1 +
 libatomic/testsuite/Makefile.am |2 ++
 libatomic/testsuite/Makefile.in |5 -
 libatomic/testsuite/libatomic-site-extra.exp.in |1 +
 5 files changed, 11 insertions(+), 1 deletion(-)

gcc-test-libatomic-gcc-under-test.diff
Index: gcc/libatomic/configure
===
--- gcc.orig/libatomic/configure
+++ gcc/libatomic/configure
@@ -15728,6 +15728,8 @@ fi
 
 ac_config_files="$ac_config_files Makefile testsuite/Makefile"
 
+ac_config_files="$ac_config_files testsuite/libatomic-site-extra.exp"
+
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
 # tests run on this system so they can be shared between configure
@@ -16799,6 +16801,7 @@ do
 "gstdint.h") CONFIG_COMMANDS="$CONFIG_COMMANDS gstdint.h" ;;
 "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
 "testsuite/Makefile") CONFIG_FILES="$CONFIG_FILES testsuite/Makefile" ;;
+"testsuite/libatomic-site-extra.exp") CONFIG_FILES="$CONFIG_FILES 
testsuite/libatomic-site-extra.exp" ;;
 
   *) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
   esac
Index: gcc/libatomic/configure.ac
===
--- gcc.orig/libatomic/configure.ac
+++ gcc/libatomic/configure.ac
@@ -288,4 +288,5 @@ else
 fi
 
 AC_CONFIG_FILES(Makefile testsuite/Makefile)
+AC_CONFIG_FILES(testsuite/libatomic-site-extra.exp)
 AC_OUTPUT
Index: gcc/libatomic/testsuite/Makefile.am
===
--- gcc.orig/libatomic/testsuite/Makefile.am
+++ gcc/libatomic/testsuite/Makefile.am
@@ -11,3 +11,5 @@ EXPECT = $(shell if test -f $(top_buildd
 _RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \
 echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTEST = $(_RUNTEST) $(AM_RUNTESTFLAGS)
+
+EXTRA_DEJAGNU_SITE_CONFIG = libatomic-site-extra.exp
Index: gcc/libatomic/testsuite/Makefile.in
===
--- gcc.orig/libatomic/testsuite/Makefile.in
+++ gcc/libatomic/testsuite/Makefile.in
@@ -109,7 +109,7 @@ am__configure_deps = $(am__aclocal_m4_de
 DIST_COMMON = $(srcdir)/Makefile.am
 mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
 CONFIG_HEADER = $(top_builddir)/auto-config.h
-CONFIG_CLEAN_FILES =
+CONFIG_CLEAN_FILES = libatomic-site-extra.exp
 CONFIG_CLEAN_VPATH_FILES =
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -278,6 +278,7 @@ _RUNTEST = $(shell if test -f $(top_srcd
 echo 

[committed] correct -Wbuiltin-declaration-mismatch default in manual

2020-02-27 Thread Martin Sebor

Pushed.

Martin

commit ab466f73bb3bd24965cb2c7635b0339509dafbe3 (HEAD -> master)
Author: Martin Sebor 
Date:   Thu Feb 27 16:53:01 2020 -0700

Document that -Wbuiltin-declaration-mismatch is enabled by default.

gcc/ChangeLog:

* doc/invoke.texi (-Wbuiltin-declaration-mismatch): Fix a typo.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f5d4e6dd582..1992369d068 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2020-02-27  Martin Sebor  
+
+   * doc/invoke.texi (-Wbuiltin-declaration-mismatch): Fix a typo.
+
 2020-02-27  Michael Meissner  

PR target/93932
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e70ece6d492..4f88fe68999 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7683,7 +7683,7 @@ or as a non-function, or when a built-in function declared with a type
 that does not include a prototype is called with arguments whose promoted
 types do not match those expected by the function.  When @option{-Wextra}
 is specified, also warn when a built-in function that takes arguments is
-declared without a prototype.  The @option{-Wno-builtin-declaration-mismatch}
+declared without a prototype.  The @option{-Wbuiltin-declaration-mismatch}
 warning is enabled by default.  To avoid the warning include the appropriate
 header to bring the prototypes of built-in functions into scope.


Re: [PATCH], PR target/93932, GCC 9 backport, Do not use input_operand for variable vector extract insns on PowerPC

2020-02-27 Thread Michael Meissner
On Thu, Feb 27, 2020 at 04:57:28PM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Thu, Feb 27, 2020 at 03:38:54PM -0500, Michael Meissner wrote:
> > Here are the equivalent changes for PR target/93932 for the GCC 9 branch.  I
> > have built both big and little endian PowerPC linux compilers and both
> > bootstrapped.  The make check actually fixes the tests that were broken by the
> > register allocation behavior.  Can I check these patches into GCC 9?
> 
> So what is different in this backport?
> 
> Either way, it needs some soaking time first.  This patch is not much
> safer than average.

Fair enough.

The difference is GCC 9 has all of the old constraints that were eliminated in
the current master branch.

I.e. the first patch for the current master, uses "wa" for the second
constraint, while GCC 9 used "" (which in GCC 9 expanded to "wa" in both
cases).  Here is the patch from the master branch.

--- /tmp/TMHdwO_vsx.md  2020-02-26 13:25:27.250209645 -0500
+++ gcc/config/rs6000/vsx.md2020-02-26 13:25:21.357125563 -0500
@@ -3245,14 +3245,14 @@ (define_insn "vsx_vslo_"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract from a register
 (define_insn_and_split "vsx_extract__var"
-  [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
-   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
-(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand: 0 "gpc_reg_operand" "=v")
+   (unspec: [(match_operand:VSX_D 1 "gpc_reg_operand" "v")
+(match_operand:DI 2 "gpc_reg_operand" "r")]
UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,,"))
-   (clobber (match_scratch:V2DI 4 "=,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "="))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"

while the patch for GCC 9 is:

--- /tmp/E213hs_vsx.md  2020-02-26 16:55:13.792745200 -0500
+++ gcc/config/rs6000/vsx.md2020-02-26 16:50:50.614817018 -0500
@@ -3292,14 +3292,14 @@ (define_insn "vsx_vslo_"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract from a register
 (define_insn_and_split "vsx_extract__var"
-  [(set (match_operand: 0 "gpc_reg_operand" "=v,,r")
-   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
-(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand: 0 "gpc_reg_operand" "=v")
+   (unspec: [(match_operand:VSX_D 1 "gpc_reg_operand" "v")
+(match_operand:DI 2 "gpc_reg_operand" "r")]
UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,,"))
-   (clobber (match_scratch:V2DI 4 "=,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "="))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: gcov: reduce code quality loss by reproducible topn merging [PR92924]

2020-02-27 Thread Gerald Pfeifer
On Thu, 27 Feb 2020, Gerald Pfeifer wrote:
>> This (or rather its predecessor?) breaks bootstrap on 32-bit 
>> i386-unknown-freebsd11.3.
>> 
>> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c: In function 'void dump_histogram_value(FILE*, histogram_value)':
>> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c:268:28: error: format '%lld' expects argument of type 'long long int', but argument 3 has type 'int' [-Werror=format=]
>>   268 |fprintf (dump_file, " all: %" PRId64 "%s, values: ",
>>   |^~~
>>   269 |  abs ((int64_t) hist->hvalue.counters[0]),
>>   |  
>>   |  |
>>   |  int
>> 
>> (I'm not sure why my nightly tester has not caught this, but only
>> the snapshot did.)
> This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93962 .

And Andrew had a very good hint there (thanks!).  

The patch below indeed restores the build on i386-unknown-freebsd11.

Okay?  Or does this qualify as obvious?

Gerald


2020-02-28  Gerald Pfeifer  
Andrew Pinski  

PR bootstrap/93962
* value-prof.c (dump_histogram_value): Use std::abs instead of
abs.
 
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 8e9f129708a..585b909096f 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -266,7 +266,7 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)
  if (hist->hvalue.counters)
{
  fprintf (dump_file, " all: %" PRId64 "%s, values: ",
-  abs ((int64_t) hist->hvalue.counters[0]),
+  std::abs ((int64_t) hist->hvalue.counters[0]),
   hist->hvalue.counters[0] < 0
   ? " (values missing)": "");
  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
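
For readers following the fix, here is a minimal standalone sketch of why
std::abs helps.  It assumes only the standard <cstdlib> overloads of
std::abs for int, long and long long; format_counter is a hypothetical
helper written for this illustration and is not part of value-prof.c.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>

// std::abs is overloaded for int, long and long long, so when applied to
// an int64_t it returns a 64-bit result whether int64_t is long (LP64)
// or long long (ILP32).  The C function abs only takes and returns int.
static_assert(sizeof(decltype(std::abs(std::int64_t{}))) == sizeof(std::int64_t),
              "std::abs keeps the 64-bit width of its argument");

// Hypothetical helper: formats a counter in the same style as the dump
// code in the patch, so the argument type matches the PRId64 directive.
std::string format_counter(std::int64_t counter)
{
  char buf[64];
  std::snprintf(buf, sizeof buf, " all: %" PRId64 "%s",
                std::abs(counter),
                counter < 0 ? " (values missing)" : "");
  return buf;
}
```

A value that does not fit in 32 bits, such as 5000000000, survives the
round trip here, which it would not if the argument were narrowed to int.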


Re: [PATCH], PR target/93932, GCC 9 backport, Do not use input_operand for variable vector extract insns on PowerPC

2020-02-27 Thread Segher Boessenkool
Hi,

On Thu, Feb 27, 2020 at 03:38:54PM -0500, Michael Meissner wrote:
> Here are the equivalent changes for PR target/93932 for the GCC 9 branch.  I
> have built both big and little endian PowerPC linux compilers and both
> bootstrapped.  The make check actually fixes the tests that were broken by the
> register allocation behavior.  Can I check these patches into GCC 9?

So what is different in this backport?

Either way, it needs some soaking time first.  This patch is not much
safer than average.


Segher


[PATCH] tighten up validation of built-in redeclarations (PR 93926)

2020-02-27 Thread Martin Sebor

GCC considers valid explicit declarations of built-ins whose return
types match in their modes, even if the types themselves are
incompatible (say integer and pointer of the same size).  This is
more permissive than for argument types where a pointer/integer
mismatch disqualifies the redeclaration.

With -Wextra enabled, -Wbuiltin-declaration-mismatch diagnoses these
"benign" mismatches in return types, but the C front-end still considers
the mismatched declaration to be that of the built-in.  That can lead to
problems down the line when the middle end attempts to do its own sanity
checking based on some simple and reasonable notion of type compatibility
(like a malloc kind of function returning a pointer).

The attached patch tightens up the requirements a declaration has to
meet in order to match a built-in, applying the same matching to return
types as to argument types.
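
As a rough illustration of the rule, the sketch below models the predicate
outside of GCC.  TypeModel, Mode and the sample types are hypothetical
stand-ins invented for this example; the real code queries tree nodes with
TYPE_MODE, POINTER_TYPE_P and FUNCTION_POINTER_TYPE_P.  The sample values
assume an LP64 target where pointers and long share a 64-bit mode.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine mode: 32-bit and 64-bit integers.
enum class Mode { SI, DI };

// Hypothetical model of the three type properties the predicate compares.
struct TypeModel
{
  Mode mode;
  bool is_pointer;
  bool is_function_pointer;
};

// Same shape as the predicate in the patch: the modes must match, and the
// two types must agree on being a pointer and on being a function pointer.
constexpr bool types_close_enough(const TypeModel &a, const TypeModel &b)
{
  return a.mode == b.mode
         && a.is_pointer == b.is_pointer
         && a.is_function_pointer == b.is_function_pointer;
}

// Sample types as they might look on an LP64 target (an assumption).
constexpr TypeModel int_type   {Mode::SI, false, false};
constexpr TypeModel long_type  {Mode::DI, false, false};
constexpr TypeModel llong_type {Mode::DI, false, false};
constexpr TypeModel void_ptr   {Mode::DI, true,  false};
constexpr TypeModel func_ptr   {Mode::DI, true,  true};
```

Under this model long and long long still match, while a 64-bit integer no
longer matches a pointer of the same size, which is the return-type case
the patch starts rejecting.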

Tested on x86_64-linux.

Martin
PR middle-end/93926 - ICE on a built-in redeclaration returning an integer instead of a pointer

gcc/c/ChangeLog:

	PR middle-end/93926
	* c-decl.c (types_close_enough_to_match): New function.
	(match_builtin_function_types): Call it.
	(diagnose_mismatched_decls): Add missing inform call to a warning.

gcc/testsuite/ChangeLog:

	PR middle-end/93926
	* gcc.dg/Wbuiltin-declaration-mismatch-13.c: New test.


diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 1aa410db6e4..da276e981fa 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -1649,6 +1649,18 @@ c_bind (location_t loc, tree decl, bool is_global)
 
 static GTY(()) tree last_structptr_types[6];
 
+/* Returns true if types T1 and T2 representing return types or types
+   of function arguments are close enough to be considered interchangeable
+   in redeclarations of built-in functions.  */
+
+static bool
+types_close_enough_to_match (tree t1, tree t2)
+{
+  return (TYPE_MODE (t1) == TYPE_MODE (t2)
+	  && POINTER_TYPE_P (t1) == POINTER_TYPE_P (t2)
+	  && FUNCTION_POINTER_TYPE_P (t1) == FUNCTION_POINTER_TYPE_P (t2));
+}
+
 /* Subroutine of compare_decls.  Allow harmless mismatches in return
and argument types provided that the type modes match.  Set *STRICT
and *ARGNO to the expected argument type and number in case of
@@ -1659,16 +1671,19 @@ static tree
 match_builtin_function_types (tree newtype, tree oldtype,
 			  tree *strict, unsigned *argno)
 {
-  /* Accept the return type of the new declaration if same modes.  */
-  tree oldrettype = TREE_TYPE (oldtype);
-  tree newrettype = TREE_TYPE (newtype);
-
   *argno = 0;
   *strict = NULL_TREE;
 
-  if (TYPE_MODE (oldrettype) != TYPE_MODE (newrettype))
+  /* Accept the return type of the new declaration if it has the same
+ mode and if they're both pointers or if neither is.  */
+  tree oldrettype = TREE_TYPE (oldtype);
+  tree newrettype = TREE_TYPE (newtype);
+
+  if (!types_close_enough_to_match (oldrettype, newrettype))
 return NULL_TREE;
 
+  /* Check that the return types are compatible but don't fail if they
+ are not (e.g., int vs long in ILP32) and just let the caller know.  */
   if (!comptypes (TYPE_MAIN_VARIANT (oldrettype),
 		  TYPE_MAIN_VARIANT (newrettype)))
 *strict = oldrettype;
@@ -1692,15 +1707,7 @@ match_builtin_function_types (tree newtype, tree oldtype,
   tree oldtype = TYPE_MAIN_VARIANT (TREE_VALUE (oldargs));
   tree newtype = TYPE_MAIN_VARIANT (TREE_VALUE (newargs));
 
-  /* Fail for types with incompatible modes/sizes.  */
-  if (TYPE_MODE (TREE_VALUE (oldargs))
-	  != TYPE_MODE (TREE_VALUE (newargs)))
-	return NULL_TREE;
-
-  /* Fail for function and object pointer mismatches.  */
-  if ((FUNCTION_POINTER_TYPE_P (oldtype)
-	   != FUNCTION_POINTER_TYPE_P (newtype))
-	  || POINTER_TYPE_P (oldtype) != POINTER_TYPE_P (newtype))
+  if (!types_close_enough_to_match (oldtype, newtype))
 	return NULL_TREE;
 
   unsigned j = (sizeof (builtin_structptr_types)
@@ -1957,11 +1964,10 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 	  && !C_DECL_DECLARED_BUILTIN (olddecl))
 	{
 	  /* Accept "harmless" mismatches in function types such
-	 as missing qualifiers or pointer vs same size integer
-	 mismatches.  This is for the ffs and fprintf builtins.
-	 However, with -Wextra in effect, diagnose return and
-	 argument types that are incompatible according to
-	 language rules.  */
+	 as missing qualifiers or int vs long when they're the same
+	 size.  However, with -Wextra in effect, diagnose return and
+	 argument types that are incompatible according to language
+	 rules.  */
 	  tree mismatch_expect;
 	  unsigned mismatch_argno;
 
@@ -1999,16 +2005,25 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 	  /* If types match only loosely, print a warning but accept
 		 the redeclaration.  */
 	  location_t newloc = DECL_SOURCE_LOCATION (newdecl);
+	  bool warned = false;
 	  if (mismatch_argno)
-		warning_at (newloc, OPT_Wbuiltin_declaration_mismatch,
-			"mismatch in 

Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt

On 2/27/20 2:21 PM, Bill Schmidt wrote:



On 2/27/20 12:48 PM, GT wrote:


Done.

The updated document is at:
https://sourceware.org/glibc/wiki/HomePage?action=AttachFile=view=powerarchvectfuncabi.html


Looks good.  Can you please also remove the 'c' ABI from the mangling, as 
earlier agreed?

Thanks!
Bill



[committed] libstdc++: Disable diagnostic URLs in testsuite

2020-02-27 Thread Jonathan Wakely
* testsuite/lib/libstdc++.exp (v3_target_compile): Add
-fdiagnostics-urls=never to options.

Tested x86_64-linux, committed to master.

commit 449494943e65e4c9cf668a566b0da13e44d79f3b
Author: Jonathan Wakely 
Date:   Thu Feb 27 17:45:06 2020 +

libstdc++: Disable diagnostic URLs in testsuite

* testsuite/lib/libstdc++.exp (v3_target_compile): Add
-fdiagnostics-urls=never to options.

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 94f3fdb2bc8..10a7e748464 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -482,7 +482,7 @@ proc v3_target_compile { source dest type options } {
 global STATIC_LIBCXXFLAGS
 global tool
 
-lappend options "additional_flags=-fno-diagnostics-show-caret 
-fdiagnostics-color=never"
+lappend options "additional_flags=-fno-diagnostics-show-caret 
-fdiagnostics-color=never -fdiagnostics-urls=never"
 
 if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
lappend options "libs=${gluefile}"


[PATCH], PR target/93932, GCC 9 backport, Do not use input_operand for variable vector extract insns on PowerPC

2020-02-27 Thread Michael Meissner
Here are the equivalent changes for PR target/93932 for the GCC 9 branch.  I
have built both big and little endian PowerPC linux compilers and both
bootstrapped.  The make check actually fixes the tests that were broken by the
register allocation behavior.  Can I check these patches into GCC 9?

Note, GCC 8 does not need these patches.

2020-02-27  Michael Meissner  

Back port from trunk
2020-02-26  Michael Meissner  

PR target/93932
* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
Split the insn into two parts.  This insn only does variable
extract from a register.
(vsx_extract__var_load, VSX_D iterator): New insn, do
variable extract from memory.
(vsx_extract_v4sf_var): Split the insn into two parts.  This insn
only does variable extract from a register.
(vsx_extract_v4sf_var_load): New insn, do variable extract from
memory.
(vsx_extract__var, VSX_EXTRACT_I iterator): Split the insn
into two parts.  This insn only does variable extract from a
register.
(vsx_extract__var_load, VSX_EXTRACT_I iterator): New insn,
do variable extract from memory.

--- /tmp/E213hs_vsx.md  2020-02-26 16:55:13.792745200 -0500
+++ gcc/config/rs6000/vsx.md2020-02-26 16:50:50.614817018 -0500
@@ -3292,14 +3292,14 @@ (define_insn "vsx_vslo_"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract from a register
 (define_insn_and_split "vsx_extract__var"
-  [(set (match_operand: 0 "gpc_reg_operand" "=v,,r")
-   (unspec: [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
-(match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand: 0 "gpc_reg_operand" "=v")
+   (unspec: [(match_operand:VSX_D 1 "gpc_reg_operand" "v")
+(match_operand:DI 2 "gpc_reg_operand" "r")]
UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,,"))
-   (clobber (match_scratch:V2DI 4 "=,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "="))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3310,6 +3310,23 @@ (define_insn_and_split "vsx_extract__var_load"
+  [(set (match_operand: 0 "gpc_reg_operand" "=,r")
+   (unspec: [(match_operand:VSX_D 1 "memory_operand" "Q,Q")
+(match_operand:DI 2 "gpc_reg_operand" "r,r")]
+   UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=,"))]
+  "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], 
operands[2],
+  operands[3], mode);
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Extract a SF element from V4SF
 (define_insn_and_split "vsx_extract_v4sf"
   [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
@@ -3361,14 +3378,14 @@ (define_insn_and_split "*vsx_extract_v4s
   [(set_attr "type" "fpload,fpload,fpload,load")
(set_attr "length" "8")])
 
-;; Variable V4SF extract
+;; Variable V4SF extract from a register
 (define_insn_and_split "vsx_extract_v4sf_var"
-  [(set (match_operand:SF 0 "gpc_reg_operand" "=ww,ww,?r")
-   (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
-   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=ww")
+   (unspec:SF [(match_operand:V4SF 1 "gpc_reg_operand" "v")
+   (match_operand:DI 2 "gpc_reg_operand" "r")]
   UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,,"))
-   (clobber (match_scratch:V2DI 4 "=,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "="))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3379,6 +3396,24 @@ (define_insn_and_split "vsx_extract_v4sf
   DONE;
 })
 
+;; Variable V4SF extract from memory
+(define_insn_and_split "*vsx_extract_v4sf_var_load"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=ww,?r")
+   (unspec:SF [(match_operand:V4SF 1 "input_operand" "Q,Q")
+   (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+  UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=,"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rs6000_split_vec_extract_var (operands[0], operands[1], operands[2],
+   operands[3], operands[4]);
+  DONE;
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Expand the builtin form of xxpermdi to canonical rtl.
 (define_expand "vsx_xxpermdi_"
   [(match_operand:VSX_L 0 "vsx_register_operand")
@@ -3720,15 +3755,15 @@ (define_insn_and_split "*vsx_extract__var"
-  [(set (match_operand: 0 "gpc_reg_operand" 

Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt



On 2/27/20 12:48 PM, GT wrote:
> ‐‐‐ Original Message ‐‐‐
> On Thursday, February 27, 2020 9:26 AM, Bill Schmidt wrote:
>
>> Upon reflection, I agree.  Bert, we need to make changes to the document to
>> reflect this:
>>
>> (1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
>> powerpc64le.
>
> Done. Have provided names and links to respective ABI documents but no longer
> explicitly refer to ELF version.
>
>> (2) "Vector Length" should remove bullet 3, strike the word
>> "nonhomogeneous" in bullet 4, and strike the parenthetical clause in
>> bullet 4.
>> (3) "Ordering of Vector Arguments" should remove the example involving
>> homogeneous aggregates.
>
> Done.
>
>> It also occurs to me that for bullets 4 and 5 in "Vector Length", the
>> CDT should be long long, not int, since we pass aggregates in pieces in
>> 64-bit registers and/or chunks of memory.
>
> That determination of Vector Length is common for all architectures and is
> implemented in function simd_clone_compute_base_data_type. If we do really
> need PPC64 to be different, we'll have to allow the function to be replaced
> by architecture-specific versions. Before we do that, do you have
> an example of code which ends up with incorrect vectorization with the
> existing CDT of int?

No, and I'll withdraw the suggestion.  It seems rather arbitrary in any event.

Thanks for the updates!

Bill

>> Other small bugs:
>>   - Bullet 4 says "the CDT determine by a) or b) above", but the referents
>> should be "(1) or (2)" instead.
>>   - First line of "Compiler generated variants of vector functions" has
>> a typo ("umasked").
>
> Done.
>
> The updated document is at:
> https://sourceware.org/glibc/wiki/HomePage?action=AttachFile=view=powerarchvectfuncabi.html


Re: gcov: reduce code quality loss by reproducible topn merging [PR92924]

2020-02-27 Thread Gerald Pfeifer
On Mon, 24 Feb 2020, Gerald Pfeifer wrote:
> This (or rather its predecessor?) breaks bootstrap on 32-bit 
> i386-unknown-freebsd11.3.
> 
> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c: In function 'void dump_histogram_value(FILE*, histogram_value)':
> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c:268:28: error: format '%lld' expects argument of type 'long long int', but argument 3 has type 'int' [-Werror=format=]
>   268 |fprintf (dump_file, " all: %" PRId64 "%s, values: ",
>   |^~~
>   269 |  abs ((int64_t) hist->hvalue.counters[0]),
>   |  
>   |  |
>   |  int
> 
> (I'm not sure why my nightly tester has not caught this, but only
> the snapshot did.)

This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93962 .

Gerald


Re: [PATCH] libstdc++: Memoize {drop,drop_while,filter,reverse}_view::begin

2020-02-27 Thread Patrick Palka
On Wed, 26 Feb 2020, Patrick Palka wrote:

> On Tue, 11 Feb 2020, Patrick Palka wrote:
> 
> > This patch adds memoization for these four views so that their begin() has the
> > required constant time amortized complexity.
> > 
> > In the general case we use std::optional to cache the result.  When the
> > underlying range is a random_access_range then we store the cache as an offset
> > from the beginning of the range, which should be more compact.  And when the
> > underlying iterator is not copyable, then we completely disable the cache.
> > 
> > Using std::optional in the cache is not ideal though because it means that the
> > cache can't be utilized during constexpr evaluation.  If instead of
> > std::optional we store a separate flag to denote an empty cache then we'll be
> > able to use the cache during constexpr evaluation at the cost of an extra byte or
> > so.  I am not sure which design to settle on.
> 
> Here's v2 of this patch which uses the new helper
> __detail::__maybe_empty_t and provides a more descriptive commit
> message.  It also refines the constraints on the partial specializations
> of _CachedPosition.
> 
> -- >8 --
> 

Here's v3 of this patch which takes advantage of the fact that
value-initialized forward iterators can be compared for equality.  This
means we can cache the bare iterator instead of having to use
std::optional or an external flag denoting the empty state of the cache,
which is both optimal space-wise and constexpr safe!

-- >8 --

Subject: [PATCH] libstdc++: Memoize
 {drop,drop_while,filter,reverse}_view::begin

This patch adds memoization to these four views so that their begin() has the
required amortized constant time complexity.

The cache is enabled only for forward_ranges and above because we need the
underlying iterator to be copyable and multi-pass in order for the cache to be
useful.  In the general case we store the cached result of begin() as a bare
iterator by taking advantage of the fact that value-initialized forward
iterators can be compared for equality as per N3644, so we can use a
value-initialized iterator to denote the "empty" state of the cache.

As a special case, when the underlying range models random_access_range and when
it's profitable size-wise, then we cache the offset of the iterator from the
beginning of the range instead of caching the iterator itself.

Additionally, in drop_view and reverse_view we disable the cache when the
underlying range models random_access_range, because in these cases recomputing
begin() takes O(1) time anyway.
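
To make the offset-caching special case concrete, here is a small sketch
outside of libstdc++.  It is not the _CachedPosition implementation from
the patch: FilteredInts and count_predicate_calls are hypothetical names,
the view is hard-wired to std::vector<int>, and an offset of -1 is simply
this sketch's way of marking the cache as empty.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified filter-like view over a std::vector<int>.
// The first begin() call scans for the first element satisfying the
// predicate and remembers its offset; later calls reuse that offset.
class FilteredInts
{
public:
  FilteredInts(const std::vector<int> &data, std::function<bool(int)> pred)
    : data_(&data), pred_(std::move(pred)) {}

  std::vector<int>::const_iterator begin()
  {
    if (cached_offset_ < 0)
      {
        // Cache is empty: do the linear scan once.
        auto it = data_->begin();
        while (it != data_->end() && !pred_(*it))
          ++it;
        cached_offset_ = it - data_->begin();
      }
    // Random access makes turning the offset back into an iterator O(1).
    return data_->begin() + cached_offset_;
  }

private:
  const std::vector<int> *data_;
  std::function<bool(int)> pred_;
  std::ptrdiff_t cached_offset_ = -1;  // -1 denotes "nothing cached yet"
};

// Returns how many times the predicate ran across two begin() calls.
int count_predicate_calls()
{
  std::vector<int> v{1, 3, 5, 6, 7};
  int calls = 0;
  FilteredInts evens(v, [&calls](int x) { ++calls; return x % 2 == 0; });
  auto first = evens.begin();
  auto second = evens.begin();
  assert(first == second && *first == 6);
  return calls;
}
```

The predicate runs only during the first begin() call; the second call is
answered from the cached offset, which is the amortized constant time
behaviour the commit message describes.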

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::_CachedPosition): New struct.
(views::filter_view::_S_needs_cached_begin): New member variable.
(views::filter_view::_M_cached_begin): New member variable.
(views::filter_view::begin): Use _M_cached_begin to cache its
result.
(views::drop_view::_S_needs_cached_begin): New static member variable.
(views::drop_view::_M_cached_begin): New member variable.
(views::drop_view::begin): Use _M_cached_begin to cache its result
when _S_needs_cached_begin.
(views::drop_while_view::_M_cached_begin): New member variable.
(views::drop_while_view::begin): Use _M_cached_begin to cache its
result.
(views::reverse_view::_S_needs_cached_begin): New static member
variable.
(views::reverse_view::_M_cached_begin): New member variable.
(views::reverse_view::begin): Use _M_cached_begin to cache its result
when _S_needs_cached_begin.
* testsuite/std/ranges/adaptors/drop.cc: Augment test to check that
drop_view::begin caches its result.
* testsuite/std/ranges/adaptors/drop_while.cc: Augment test to check
that drop_while_view::begin caches its result.
* testsuite/std/ranges/adaptors/filter.cc: Augment test to check that
filter_view::begin caches its result.
* testsuite/std/ranges/adaptors/reverse.cc: Augment test to check that
reverse_view::begin caches its result.
---
 libstdc++-v3/include/std/ranges   | 136 --
 .../testsuite/std/ranges/adaptors/drop.cc |  57 
 .../std/ranges/adaptors/drop_while.cc |  38 -
 .../testsuite/std/ranges/adaptors/filter.cc   |  36 +
 .../testsuite/std/ranges/adaptors/reverse.cc  |  56 
 5 files changed, 308 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 38d497ec88e..22a494ae495 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1334,6 +1334,83 @@ namespace views
   }
   } // namespace __detail
 
+  namespace __detail
+  {
+template
+  struct _CachedPosition
+  {
+   constexpr bool
+   _M_has_value() const
+   { return false; }
+
+   constexpr iterator_t<_Range>
+   _M_get(const _Range&) const
+   {
+ 

Re: GLIBC libmvec status

2020-02-27 Thread GT
‐‐‐ Original Message ‐‐‐
On Thursday, February 27, 2020 9:26 AM, Bill Schmidt  
wrote:

>
> Upon reflection, I agree.  Bert, we need to make changes to the document to
> reflect this:
>
> (1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
> powerpc64le.

Done. Have provided names and links to respective ABI documents but no longer
explicitly refer to ELF version.

> (2) "Vector Length" should remove bullet 3, strike the word
> "nonhomogeneous" in bullet 4, and strike the parenthetical clause in
> bullet 4.
> (3) "Ordering of Vector Arguments" should remove the example involving
> homogeneous aggregates.
>

Done.

> It also occurs to me that for bullets 4 and 5 in "Vector Length", the
> CDT should be long long, not int, since we pass aggregates in pieces in
> 64-bit registers and/or chunks of memory.
>

That determination of Vector Length is common to all architectures and is
implemented in the function simd_clone_compute_base_data_type. If we really do
need PPC64 to be different, we'll have to allow the function to be replaced
by architecture-specific versions. Before we do that, do you have
an example of code that ends up vectorized incorrectly with the
existing CDT of int?

> Other small bugs:
>  - Bullet 4 says "the CDT determine by a) or b) above", but the referents
> should be "(1) or (2)" instead.
>  - First line of "Compiler generated variants of vector functions" has
> a typo ("umasked").
>

Done.

The updated document is at:
https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html


[PATCH] Limit includes in hashtable_policy.h

2020-02-27 Thread François Dumont
When I started using std::is_permutation in hashtable_policy.h I included
stl_algo.h, which is a large header. No other header in include/bits does
this, and I would prefer not to be the first to do so.
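For readers unfamiliar with the algorithm being moved, here is a minimal sketch of what std::is_permutation does. It is not taken from the patch: same_elements is a hypothetical helper, and the link to order-insensitive container comparison is my assumption about why the hashtable code needs it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of libstdc++): returns true when both
// vectors hold the same elements with the same multiplicities, in any
// order.  The size check matters because the three-iterator overload of
// std::is_permutation assumes the second range is at least as long as
// the first.
bool same_elements(const std::vector<int>& a, const std::vector<int>& b)
{
  return a.size() == b.size()
         && std::is_permutation(a.begin(), a.end(), b.begin());
}
```

Only the standard algorithm is used here, so the sketch behaves the same whichever internal header ends up providing it.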


As it is a recent change I prefer to submit this patch now.

Git commit message:

    libstdc++ Hashtable: Move std::is_permutation to limit includes

    * include/bits/stl_algo.h (__find_if, __count_if, 
std::is_permutation): Move...

    * include/bits/stl_algobase.h: ...here.
    * include/bits/hashtable_policy.h: Remove <bits/stl_algo.h> include.

testsuite/23_containers/unordered* tested under Linux x86_64; I'll run the
full testsuite before any commit.


Ok to commit now?

Ok to commit once back in stage 1?

François

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 22bc4472e32..ef120134914 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -33,8 +33,7 @@
 
 #include <tuple>		// for std::tuple, std::forward_as_tuple
 #include <limits>		// for std::numeric_limits
-#include <bits/stl_algobase.h>	// for std::min.
-#include <bits/stl_algo.h>	// for std::is_permutation.
+#include <bits/stl_algobase.h>	// for std::min, std::is_permutation.
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 6503d1518d3..932ece55529 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -96,76 +96,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	std::iter_swap(__result, __b);
 }
 
-  /// This is an overload used by find algos for the Input Iterator case.
-  template<typename _InputIterator, typename _Predicate>
-_GLIBCXX20_CONSTEXPR
-inline _InputIterator
-__find_if(_InputIterator __first, _InputIterator __last,
-	  _Predicate __pred, input_iterator_tag)
-{
-  while (__first != __last && !__pred(__first))
-	++__first;
-  return __first;
-}
-
-  /// This is an overload used by find algos for the RAI case.
-  template<typename _RandomAccessIterator, typename _Predicate>
-_GLIBCXX20_CONSTEXPR
-_RandomAccessIterator
-__find_if(_RandomAccessIterator __first, _RandomAccessIterator __last,
-	  _Predicate __pred, random_access_iterator_tag)
-{
-  typename iterator_traits<_RandomAccessIterator>::difference_type
-	__trip_count = (__last - __first) >> 2;
-
-  for (; __trip_count > 0; --__trip_count)
-	{
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-	}
-
-  switch (__last - __first)
-	{
-	case 3:
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-	case 2:
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-	case 1:
-	  if (__pred(__first))
-	return __first;
-	  ++__first;
-	case 0:
-	default:
-	  return __last;
-	}
-}
-
-  template<typename _Iterator, typename _Predicate>
-_GLIBCXX20_CONSTEXPR
-inline _Iterator
-__find_if(_Iterator __first, _Iterator __last, _Predicate __pred)
-{
-  return __find_if(__first, __last, __pred,
-		   std::__iterator_category(__first));
-}
-
   /// Provided for stable_partition to use.
   template
 _GLIBCXX20_CONSTEXPR
@@ -3279,18 +3209,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  __new_value);
 }
 
-  template<typename _InputIterator, typename _Predicate>
-_GLIBCXX20_CONSTEXPR
-typename iterator_traits<_InputIterator>::difference_type
-__count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred)
-{
-  typename iterator_traits<_InputIterator>::difference_type __n = 0;
-  for (; __first != __last; ++__first)
-	if (__pred(__first))
-	  ++__n;
-  return __n;
-}
-
 #if __cplusplus >= 201103L
   /**
*  @brief  Determines whether the elements of a sequence are sorted.
@@ -3588,74 +3506,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return std::make_pair(*__p.first, *__p.second);
 }
 
-  template<typename _ForwardIterator1, typename _ForwardIterator2,
-	   typename _BinaryPredicate>
-_GLIBCXX20_CONSTEXPR
-bool
-__is_permutation(_ForwardIterator1 __first1, _ForwardIterator1 __last1,
-		 _ForwardIterator2 __first2, _BinaryPredicate __pred)
-{
-  // Efficiently compare identical prefixes:  O(N) if sequences
-  // have the same elements in the same order.
-  for (; __first1 != __last1; ++__first1, (void)++__first2)
-	if (!__pred(__first1, __first2))
-	  break;
-
-  if (__first1 == __last1)
-	return true;
-
-  // Establish __last2 assuming equal ranges by iterating over the
-  // rest of the list.
-  _ForwardIterator2 __last2 = __first2;
-  std::advance(__last2, std::distance(__first1, __last1));
-  for (_ForwardIterator1 __scan = __first1; __scan != __last1; ++__scan)
-	{
-	  if (__scan != std::__find_if(__first1, __scan,
-			  __gnu_cxx::__ops::__iter_comp_iter(__pred, __scan)))
-	continue; // We've seen this one before.
-	  
-	  auto __matches
-	= std::__count_if(__first2, __last2,
-			__gnu_cxx::__ops::__iter_comp_iter(__pred, __scan));
-	  if (0 == __matches ||
-	  

Re: [PATCH] libstdc++: -D_GLIBCXX_DEBUG fixes in the constrained algos tests

2020-02-27 Thread Jonathan Wakely

On 27/02/20 10:42 -0500, Patrick Palka wrote:

This fixes some failures in the constrained algos tests when run in debug mode.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/copy/constrained.cc: Don't assume that the
base() of a vector<>::iterator is a pointer.
* testsuite/25_algorithms/copy_backward/constrained.cc: Likewise.
* testsuite/25_algorithms/move/constrained.cc: Likewise.
* testsuite/25_algorithms/move_backward/constrained.cc: Likewise.
* testsuite/25_algorithms/inplace_merge/constrained.cc: Use foo.data()
instead of &foo[0] when foo is a vector.
* testsuite/25_algorithms/partial_sort/constrained.cc: Likewise.
* testsuite/25_algorithms/partial_sort_copy/constrained.cc: Likewise.
* testsuite/25_algorithms/shuffle/constrained.cc: Likewise.
* testsuite/25_algorithms/sort/constrained.cc: Likewise.
* testsuite/25_algorithms/stable_sort/constrained.cc: Likewise.


OK, thanks.
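As a small illustration of the data() point in the ChangeLog above: the sketch below is not from the patch, the helper names are hypothetical, and the empty-vector case is only my assumption about one situation where the two spellings differ. What is certain is that data() is valid on an empty vector, while indexing an empty vector is undefined and is diagnosed in debug mode.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: obtain a pointer to the vector's storage without
// indexing it.  data() is well defined even when the vector is empty,
// whereas foo[0] on an empty vector is undefined behaviour and trips the
// bounds check when compiled with -D_GLIBCXX_DEBUG.
int* storage_of(std::vector<int>& foo)
{
  return foo.data();
}

// For a non-empty vector both spellings name the same element.
bool same_address(std::vector<int>& foo)
{
  return !foo.empty() && storage_of(foo) == &foo[0];
}
```

Note that same_address checks empty() before indexing, so the sketch itself never evaluates foo[0] on an empty vector.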




Re: [PATCH] libstdc++: Add missing friend declarations in some range adaptors

2020-02-27 Thread Jonathan Wakely

On 27/02/20 11:27 -0500, Patrick Palka wrote:

Some of the range adaptors have distinct constant and non-constant
iterator/sentinel types, along with converting constructors that can convert a
non-constant iterator/sentinel to a constant iterator/sentinel.  This patch adds
the missing appropriate friend declarations in order to make these converting
constructors well formed.

Strictly speaking it seems the friendship relation doesn't need to go both ways
-- we could get away with declaring e.g. friend _Iterator<false>; instead of
friend _Iterator<!_Const>; but the spec and the reference implementations all
seem to use the latter symmetric form anyway.


I think at least one of those friend declarations in the spec was
recently removed, because the class it was in is presented for
exposition only, so doesn't actually need to declare other
non-existent classes as friends. But it's certainly not a problem to
use _Iterator<!_Const> rather than _Iterator<false>.

OK for master, thanks.
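To make the access problem concrete, here is a minimal sketch of the pattern under discussion. Iter is a hypothetical stand-in, not the libstdc++ class, and it uses enable_if instead of a requires-clause so that it also builds before C++20. The converting constructor of the const flavour has to read a private member of the non-const flavour, which is a different class, hence the friend declaration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an adaptor's iterator with a const and a
// non-const flavour; the names do not match the real library.
template<bool Const>
  class Iter
  {
    int pos_ = 0;   // private state the conversion needs to read

  public:
    Iter() = default;
    explicit Iter(int pos) : pos_(pos) { }

    // Converting constructor, enabled only for the const flavour:
    // Iter<true> can be built from Iter<false>, never the reverse.
    template<bool C2 = Const, typename = std::enable_if_t<C2>>
      Iter(const Iter<!C2>& other)
      : pos_(other.pos_)   // reads a private member of another class
      { }

    int pos() const { return pos_; }

    // Symmetric form: each flavour befriends the other.  Without it,
    // Iter<true> may not access Iter<false>::pos_ above.
    friend Iter<!Const>;
  };
```

With this in place, Iter<true> c = Iter<false>(3); compiles and c.pos() is 3, while the conversion in the other direction is rejected at compile time.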




[committed] amdgcn: sub-dword vector min/max/shift/bit operators

2020-02-27 Thread Andrew Stubbs
This patch adds V64QI and V64HI implementations of smin, umin, smax, 
umax, ashift, ashiftrt, lshiftrt, and, ior, xor, not, and popcount.


None of these operators have a specific machine instruction, so they 
need to use V64SI instructions.  For scalar code expr.c can DTRT 
automatically, but not so for vector operations.


The min/max and shift operators emit explicit extends and truncates 
around the actual operator. I don't believe those are needed for the bit 
operators, but they can easily be added if needed.
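The extend/operate/truncate scheme can be modelled on a single lane in plain C++. This is only a scalar sketch under my own assumptions, not GCN code: the function names are hypothetical, shift counts are assumed to be below 32, and the arithmetic shift relies on the usual (pre-C++20 implementation-defined) behaviour of >> on negative values.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of one 8-bit lane using only 32-bit arithmetic:
// zero-extend, operate in 32 bits, truncate.
std::uint8_t lshr8_via_32(std::uint8_t x, unsigned n)
{
  std::uint32_t wide = x;                 // zero-extend
  std::uint32_t out = wide >> n;          // the "real" 32-bit operation
  return static_cast<std::uint8_t>(out);  // truncate
}

// Arithmetic right shift needs a sign-extend instead, otherwise the
// sign bit of the narrow value is lost before the shift.
std::int8_t ashr8_via_32(std::int8_t x, unsigned n)
{
  std::int32_t wide = x;                  // sign-extend
  std::int32_t out = wide >> n;           // arithmetic shift on GCC
  return static_cast<std::int8_t>(out);   // truncate
}

// Signed minimum: the comparison is only right if both inputs were
// sign-extended; zero-extending -1 would turn it into 255.
std::int8_t smin8_via_32(std::int8_t a, std::int8_t b)
{
  std::int32_t wa = a, wb = b;            // sign-extend both inputs
  std::int32_t out = wa < wb ? wa : wb;   // 32-bit compare and select
  return static_cast<std::int8_t>(out);   // truncate
}
```

The bit operators (and, ior, xor, not) give the same low bits whatever the upper bits hold, which is consistent with the remark that they do not need the explicit extends.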


More optimal implementations are possible in future, but right now I'm 
interested in correctness. For example, some of the instructions can 
have the extend and/or truncate combined into one "DPP" instruction, so 
I intend to add patterns for the combine pass to use. Similarly, there 
are load instructions with built-in extends, and I can change the 
representation of the stores to allow combining truncates.


Andrew
amdgcn: sub-dword vector min/max/shift/bit operators

2020-02-27  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md (VEC_SUBDWORD_MODE): New mode iterator.
	(2): Change modes to VEC_ALL1REG_INT_MODE.
	(3): Likewise.
	(3): New.
	(v3): New.
	(3): New.
	(3): Rename to ...
	(v64si3): ... this, and change modes to V64SI.
	* config/gcn/gcn.md (mnemonic): Use '%B' for not.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index a0cc9a2d8fc..40e864a8de7 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -16,6 +16,10 @@
 
 ;; {{{ Vector iterators
 
+; Vector modes for sub-dword modes
+(define_mode_iterator VEC_SUBDWORD_MODE
+		  [V64QI V64HI])
+
 ; Vector modes for one vector register
 (define_mode_iterator VEC_1REG_MODE
 		  [V64SI V64HF V64SF])
@@ -1881,20 +1885,20 @@
 (define_code_iterator minmaxop [smin smax umin umax])
 
 (define_insn "2"
-  [(set (match_operand:VEC_1REG_INT_MODE 0 "gcn_valu_dst_operand""=  v")
-	(bitunop:VEC_1REG_INT_MODE
-	  (match_operand:VEC_1REG_INT_MODE 1 "gcn_valu_src0_operand" "vSvB")))]
+  [(set (match_operand:VEC_ALL1REG_INT_MODE 0 "gcn_valu_dst_operand""=  v")
+	(bitunop:VEC_ALL1REG_INT_MODE
+	  (match_operand:VEC_ALL1REG_INT_MODE 1 "gcn_valu_src0_operand" "vSvB")))]
   ""
   "v_0\t%0, %1"
   [(set_attr "type" "vop1")
(set_attr "length" "8")])
 
 (define_insn "3"
-  [(set (match_operand:VEC_1REG_INT_MODE 0 "gcn_valu_dst_operand" "=  v,RD")
-	(bitop:VEC_1REG_INT_MODE
-	  (match_operand:VEC_1REG_INT_MODE 1 "gcn_valu_src0_operand"
+  [(set (match_operand:VEC_ALL1REG_INT_MODE 0 "gcn_valu_dst_operand" "=  v,RD")
+	(bitop:VEC_ALL1REG_INT_MODE
+	  (match_operand:VEC_ALL1REG_INT_MODE 1 "gcn_valu_src0_operand"
   "%  v, 0")
-	  (match_operand:VEC_1REG_INT_MODE 2 "gcn_valu_src1com_operand"
+	  (match_operand:VEC_ALL1REG_INT_MODE 2 "gcn_valu_src1com_operand"
   "vSvB, v")))]
   ""
   "@
@@ -1967,6 +1971,27 @@
   [(set_attr "type" "vmult,ds")
(set_attr "length" "16,8")])
 
+(define_expand "3"
+  [(set (match_operand:VEC_SUBDWORD_MODE 0 "register_operand"  "= v")
+	(shiftop:VEC_SUBDWORD_MODE
+	  (match_operand:VEC_SUBDWORD_MODE 1 "gcn_alu_operand" "  v")
+	  (vec_duplicate:VEC_SUBDWORD_MODE
+	(match_operand:SI 2 "gcn_alu_operand"	   "SvB"))))]
+  ""
+  {
+enum {ashift, lshiftrt, ashiftrt};
+bool unsignedp = (<code> == lshiftrt);
+rtx insi1 = gen_reg_rtx (V64SImode);
+rtx insi2 = gen_reg_rtx (SImode);
+rtx outsi = gen_reg_rtx (V64SImode);
+
+convert_move (insi1, operands[1], unsignedp);
+convert_move (insi2, operands[2], unsignedp);
+emit_insn (gen_v64si3 (outsi, insi1, insi2));
+convert_move (operands[0], outsi, unsignedp);
+DONE;
+  })
+
 (define_insn "v64si3"
   [(set (match_operand:V64SI 0 "register_operand"  "= v")
 	(shiftop:V64SI
@@ -1978,6 +2003,26 @@
   [(set_attr "type" "vop2")
(set_attr "length" "8")])
 
+(define_expand "v3"
+  [(set (match_operand:VEC_SUBDWORD_MODE 0 "register_operand"  "=v")
+	(shiftop:VEC_SUBDWORD_MODE
+	  (match_operand:VEC_SUBDWORD_MODE 1 "gcn_alu_operand" " v")
+	  (match_operand:VEC_SUBDWORD_MODE 2 "gcn_alu_operand" "vB")))]
+  ""
+  {
+enum {ashift, lshiftrt, ashiftrt};
+bool unsignedp = (<code> == ashift || <code> == ashiftrt);
+rtx insi1 = gen_reg_rtx (V64SImode);
+rtx insi2 = gen_reg_rtx (V64SImode);
+rtx outsi = gen_reg_rtx (V64SImode);
+
+convert_move (insi1, operands[1], unsignedp);
+convert_move (insi2, operands[2], unsignedp);
+emit_insn (gen_vv64si3 (outsi, insi1, insi2));
+convert_move (operands[0], outsi, unsignedp);
+DONE;
+  })
+
 (define_insn "vv64si3"
   [(set (match_operand:V64SI 0 "register_operand"  "=v")
 	(shiftop:V64SI
@@ -1988,13 +2033,31 @@
   [(set_attr "type" "vop2")
(set_attr "length" "8")])
 
-(define_insn "3"
-  [(set (match_operand:VEC_1REG_INT_MODE 0 "gcn_valu_dst_operand" "=  v,RD")
-	(minmaxop:VEC_1REG_INT_MODE
-	  (match_operand:VEC_1REG_INT_MODE 1 "gcn_valu_src0_operand"
-  "%  v, 0")
-	  (match_operand:VEC_1REG_INT_MODE 2 

Re: [GCC][PATCH][ARM] Add multilib mapping for Armv8.1-M+MVE with -mfloat-abi=hard

2020-02-27 Thread Kyrill Tkachov

Hi Mihail,

On 2/20/20 4:15 PM, Mihail Ionescu wrote:

Hi,

This patch adds a new multilib for armv8.1-m.main+mve with hard float abi.
For armv8.1-m.main+mve soft and softfp, the v8-M multilibs will be reused.
The following mappings are also updated:
"-mfloat-abi=hard -march=armv8.1-m.main+mve.fp -> armv8-m.main+fp/hard"
"-mfloat-abi=softfp -march=armv8.1-m.main+mve.fp -> armv8-m.main+fp/softfp"
"-mfloat-abi=soft -march=armv8.1-m.main+mve.fp -> armv8-m.main/nofp"

The patch also includes a libgcc change to prevent cmse_nonsecure_call.S from
being compiled for v8.1-M. v8.1-M doesn't need it since the same behaviour is
achieved during code generation by using the new instructions[1].

[1] https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01654.html

Tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-20  Mihail Ionescu  

    * config/arm/t-rmprofile: Create new multilib for
    armv8.1-m.main+mve hard float and reuse v8-m.main ones for
    v8.1-m.main+mve.

gcc/testsuite/ChangeLog:

2020-02-20  Mihail Ionescu  

    * testsuite/gcc.target/arm/multilib.exp: Add new v8.1-M entry.



No testsuite/ in the prefix here.



2020-02-20  Mihail Ionescu  

libgcc/ChangeLog:

    * config/arm/t-arm: Do not compile cmse_nonsecure_call.S for v8.1-m.


Ok for trunk?


Ok.

Thanks,

Kyrill




Regards,
Mihail


### Attachment also inlined for ease of reply ###



diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index 
0fb3084c8b20f16ccadba632fc55162b196651d5..16e368f25cc2e3ad341adc2752120ad0defdf2a4 
100644

--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -27,8 +27,8 @@

 # Arch and FPU variants to build libraries with

-MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp
-MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base 
v8-m.main v8-m.main+fp v8-m.main+dp
+MULTI_ARCH_OPTS_RM = 
march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7e-m+fp/march=armv7e-m+fp.dp/march=armv8-m.base/march=armv8-m.main/march=armv8-m.main+fp/march=armv8-m.main+fp.dp/march=armv8.1-m.main+mve
+MULTI_ARCH_DIRS_RM = v6-m v7-m v7e-m v7e-m+fp v7e-m+dp v8-m.base 
v8-m.main v8-m.main+fp v8-m.main+dp v8.1-m.main+mve


 # Base M-profile (no fp)
 MULTILIB_REQUIRED   += mthumb/march=armv6s-m/mfloat-abi=soft
@@ -48,8 +48,7 @@ MULTILIB_REQUIRED += 
mthumb/march=armv8-m.main+fp/mfloat-abi=hard

 MULTILIB_REQUIRED   += mthumb/march=armv8-m.main+fp/mfloat-abi=softfp
 MULTILIB_REQUIRED   += 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=hard
 MULTILIB_REQUIRED   += 
mthumb/march=armv8-m.main+fp.dp/mfloat-abi=softfp

-
-
+MULTILIB_REQUIRED  += mthumb/march=armv8.1-m.main+mve/mfloat-abi=hard

 # Arch Matches
 MULTILIB_MATCHES    += march?armv6s-m=march?armv6-m
@@ -66,12 +65,14 @@ MULTILIB_MATCHES    += 
march?armv7e-m+fp=march?armv7e-m+fpv5
 MULTILIB_REUSE  += $(foreach ARCH, armv6s-m armv7-m armv7e-m 
armv8-m\.base armv8-m\.main, \

mthumb/march.$(ARCH)/mfloat-abi.soft=mthumb/march.$(ARCH)/mfloat-abi.softfp)

+
 # Map v8.1-M to v8-M.
 MULTILIB_MATCHES    += march?armv8-m.main=march?armv8.1-m.main
 MULTILIB_MATCHES    += march?armv8-m.main=march?armv8.1-m.main+dsp
-MULTILIB_MATCHES   += march?armv8-m.main=march?armv8.1-m.main+mve
+MULTILIB_REUSE += 
mthumb/march.armv8-m\.main/mfloat-abi.soft=mthumb/march.armv8\.1-m\.main+mve/mfloat-abi.soft
+MULTILIB_REUSE += 
mthumb/march.armv8-m\.main/mfloat-abi.soft=mthumb/march.armv8\.1-m\.main+mve/mfloat-abi.softfp


-v8_1m_sp_variants = +fp +dsp+fp +mve.fp
+v8_1m_sp_variants = +fp +dsp+fp +mve.fp +fp+mve
 v8_1m_dp_variants = +fp.dp +dsp+fp.dp +fp.dp+mve +fp.dp+mve.fp

 # Map all v8.1-m.main FP sp variants down to v8-m.
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
b/gcc/testsuite/gcc.target/arm/multilib.exp
index 
67d00266f6b5e69aa2a7831cfb9a4353ac4f4340..42aaebfabdf76c45a1909b2aaa1651d3c42ee4b7 
100644

--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -813,6 +813,9 @@ if {[multilib_config "rmprofile"] } {
 {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=soft} 
"thumb/v8-m.main/nofp"
 {-march=armv8.1-m.main+mve -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main/nofp"
 {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main+fp/softfp"
+   {-march=armv8.1-m.main+mve -mfpu=auto -mfloat-abi=hard} 
"thumb/v8.1-m.main+mve/hard"
+   {-march=armv8.1-m.main+mve+fp -mfpu=auto -mfloat-abi=hard} 
"thumb/v8-m.main+fp/hard"
+   {-march=armv8.1-m.main+mve+fp -mfpu=auto -mfloat-abi=softfp} 
"thumb/v8-m.main+fp/softfp"
 {-march=armv8.1-m.main+mve.fp -mfpu=auto -mfloat-abi=hard} 
"thumb/v8-m.main+fp/hard"
 {-march=armv8.1-m.main+mve+fp.dp -mfpu=auto -mfloat-abi=soft} 

Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-02-27 Thread Kyrill Tkachov

Hi Mihail,

On 2/27/20 2:44 PM, Mihail Ionescu wrote:

Hi Kyrill,

On 02/27/2020 11:09 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 2/27/20 10:27 AM, Mihail Ionescu wrote:

Hi,

This patch adds support for the bf16 vector create, get, set,
duplicate and reinterpret intrinsics.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-27  Mihail Ionescu  

    * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the
    beginning of the file.
    (vcreate_bf16, vcombine_bf16): New.
    (vdup_n_bf16, vdupq_n_bf16): New.
    (vdup_lane_bf16, vdup_laneq_bf16): New.
    (vdupq_lane_bf16, vdupq_laneq_bf16): New.
    (vduph_lane_bf16, vduph_laneq_bf16): New.
    (vset_lane_bf16, vsetq_lane_bf16): New.
    (vget_lane_bf16, vgetq_lane_bf16): New.
    (vget_high_bf16, vget_low_bf16): New.
    (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New.
    (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New.
    (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New.
    (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New.
    (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New.
    (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New.
    (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New.
    (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New.
    (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New.
    (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New.
    (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New.
    (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New.
    (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New.
    (vreinterpretq_bf16_p128): New.
    (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New.
    (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New.
    (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New.
    (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New.
    (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New.
    (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New.
    (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New.
    (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New.
    (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New.
    (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New.
    (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New.
    (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New.
    (vreinterpretq_p128_bf16): New.
    * config/arm/arm_neon_builtins.def (VDX): Add V4BF.
    (V_elem): Likewise.
    (V_elem_l): Likewise.
    (VD_LANE): Likewise.
    (VQX): Add V8BF.
    (V_DOUBLE): Likewise.
    (VDQX): Add V4BF and V8BF.
    (V_two_elem, V_three_elem, V_four_elem): Likewise.
    (V_reg): Likewise.
    (V_HALF): Likewise.
    (V_double_vector_mode): Likewise.
    (V_cmp_result): Likewise.
    (V_uf_sclr): Likewise.
    (V_sz_elem): Likewise.
    (Is_d_reg): Likewise.
    (V_mode_nunits): Likewise.
    * config/arm/neon.md (neon_vdup_lane): Enable for BFloat.

gcc/testsuite/ChangeLog:

2020-02-27  Mihail Ionescu  

    * gcc.target/arm/bf16_dup.c: New test.
    * gcc.target/arm/bf16_reinterpret.c: Likewise.

Is it ok for trunk?


This looks mostly ok with a few nits...




Regards,
Mihail


### Attachment also inlined for ease of reply ###



diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,15 @@ extern "C" {
 #include 
 #include 

+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 
1))
+#define __arm_laneq(__vec, __idx) (__idx ^ 
(__ARM_NUM_LANES(__vec)/2 - 1))

+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
+#endif
+
 typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
@@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   /* For big-endian, GCC's vector indices are reversed within each 64
  bits compared to the architectural lane indices used by Neon
  intrinsics.  */



Please move this comment as well.
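For context on what that comment documents, the index flip done by the big-endian versions of the macros can be sketched as ordinary functions. The names flip_lane and flip_laneq are hypothetical, and the sketch assumes a power-of-two lane count, which is what makes the XOR act as a reversal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constexpr counterpart of the big-endian __arm_lane macro:
// for a power-of-two lane count, XOR with (lanes - 1) reverses the index
// across the whole (64-bit) vector.
constexpr std::size_t flip_lane(std::size_t lanes, std::size_t idx)
{
  return idx ^ (lanes - 1);
}

// Counterpart of __arm_laneq: a 128-bit vector is reversed within each
// 64-bit half, so only the low bits of the index are flipped.
constexpr std::size_t flip_laneq(std::size_t lanes, std::size_t idx)
{
  return idx ^ (lanes / 2 - 1);
}
```

For a four-lane vector this maps lane 0 to 3 and lane 1 to 2; for an eight-lane q vector, lane 5 maps to 6 and stays in the upper half.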



-#ifdef __ARM_BIG_ENDIAN
-#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
-#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 
1))
-#define __arm_laneq(__vec, __idx) (__idx ^ 
(__ARM_NUM_LANES(__vec)/2 - 1))

-#else
-#define __arm_lane(__vec, __idx) __idx
-#define __arm_laneq(__vec, __idx) __idx
-#endif

 #define vget_lane_f16(__v, __idx)   \
__extension__ \
@@ -14476,6 +14477,15 @@ vreinterpret_p16_u32 

[PATCH] libstdc++: Add missing friend declarations in some range adaptors

2020-02-27 Thread Patrick Palka
Some of the range adaptors have distinct constant and non-constant
iterator/sentinel types, along with converting constructors that can convert a
non-constant iterator/sentinel to a constant iterator/sentinel.  This patch adds
the missing appropriate friend declarations in order to make these converting
constructors well formed.

Strictly speaking it seems the friendship relation doesn't need to go both ways
-- we could get away with declaring e.g. friend _Iterator<false>; instead of
friend _Iterator<!_Const>; but the spec and the reference implementations all
seem to use the latter symmetric form anyway.

libstdc++-v3/ChangeLog:

* include/std/ranges (transform_view::_Iterator<_Const>): Befriend
_Iterator<!_Const>.
(transform_view::_Sentinel<_Const>): Befriend _Sentinel<!_Const>.
(take_view::_Sentinel<_Const>): Likewise.
(take_while_view::_Sentinel<_Const>): Likewise.
(split_view::_OuterIter<_Const>): Befriend _OuterIter<!_Const>.
* std/ranges/adaptors/split.cc: Augment test.
* std/ranges/adaptors/take.cc: Augment test.
* std/ranges/adaptors/take_while.cc: Augment test.
* std/ranges/adaptors/transform.cc: Augment test.
---
 libstdc++-v3/include/std/ranges   |  8 +++
 .../testsuite/std/ranges/adaptors/split.cc| 14 +++
 .../testsuite/std/ranges/adaptors/take.cc | 16 +
 .../std/ranges/adaptors/take_while.cc | 17 ++
 .../std/ranges/adaptors/transform.cc  | 23 +++
 5 files changed, 78 insertions(+)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 2e3e298adcc..2f08cfd7f16 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1844,6 +1844,7 @@ namespace views
requires indirectly_swappable<_Base_iter>
  { return ranges::iter_swap(__x._M_current, __y._M_current); }
 
+ friend _Iterator<!_Const>;
  friend _Sentinel<_Const>;
};
 
@@ -1896,6 +1897,8 @@ namespace views
  operator-(const _Sentinel& __y, const _Iterator<_Const>& __x)
requires sized_sentinel_for, iterator_t<_Base>>
  { return __y.__distance_from(__x); }
+
+ friend _Sentinel<!_Const>;
};
 
   _Vp _M_base = _Vp();
@@ -2001,6 +2004,8 @@ namespace views
 
  friend constexpr bool operator==(const _CI& __y, const _Sentinel& __x)
  { return __y.count() == 0 || __y.base() == __x._M_end; }
+
+ friend _Sentinel<!_Const>;
};
 
   _Vp _M_base = _Vp();
@@ -2140,6 +2145,8 @@ namespace views
  friend constexpr bool
  operator==(const iterator_t<_Base>& __x, const _Sentinel& __y)
  { return __y._M_end == __x || !std::__invoke(*__y._M_pred, *__x); }
+
+ friend _Sentinel<!_Const>;
};
 
   _Vp _M_base = _Vp();
@@ -2831,6 +2838,7 @@ namespace views
  operator==(const _OuterIter& __x, default_sentinel_t)
  { return __x.__at_end(); };
 
+ friend _OuterIter<!_Const>;
  friend _InnerIter<_Const>;
};
 
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
index 52b015cf0c6..e7556725e4f 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/split.cc
@@ -108,6 +108,19 @@ test05()
str | views::filter(not_space_p)) );
 }
 
+void
+test06()
+{
+  std::string str = "hello world";
+  auto v = str | views::transform(std::identity{}) | views::split(' ');
+
+  // Verify that _Iterator<false> is implicitly convertible to _Iterator<true>.
+  static_assert(!std::same_as<decltype(ranges::begin(v)),
+			      decltype(ranges::cbegin(v))>);
+  auto b = ranges::cbegin(v);
+  b = ranges::begin(v);
+}
+
 int
 main()
 {
@@ -116,4 +129,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/take.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/take.cc
index e2d2edbe0a8..c42505b44cb 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/take.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/take.cc
@@ -19,6 +19,7 @@
 // { dg-do run { target c++2a } }
 
 #include 
+#include <forward_list>
 #include 
 #include 
 #include 
@@ -85,6 +86,20 @@ test04()
   VERIFY( ranges::equal(v | views::take(5), (int[]){1,2,3}) );
 }
 
+void
+test05()
+{
+  std::forward_list x = {1,2,3,4,5};
+  auto v = x | views::transform(std::negate{}) | views::take(4);
+
+  // Verify that _Sentinel<false> is implicitly convertible to _Sentinel<true>.
+  static_assert(!ranges::common_range<decltype(v)>);
+  static_assert(!std::same_as<decltype(ranges::end(v)),
+			      decltype(ranges::cend(v))>);
+  auto b = ranges::cend(v);
+  b = ranges::end(v);
+}
+
 int
 main()
 {
@@ -92,4 +107,5 @@ main()
   test02();
   test03();
   test04();
+  test05();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/take_while.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/take_while.cc
index b261ffd1aae..d587127b97e 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/take_while.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/take_while.cc
@@ 

New French PO file for 'gcc' (version 10.1-b20200209)

2020-02-27 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/gcc/fr.po

(This file, 'gcc-10.1-b20200209.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH], PR target/93932, Do not use input_operand for variable vector extract insns on PowerPC

2020-02-27 Thread Segher Boessenkool
Hi!

On Wed, Feb 26, 2020 at 05:32:23PM -0500, Michael Meissner wrote:
> What is happening is in some instances, we want to do a vector extract with a
> variable element where the vector is in a register:
> 
>   #include <altivec.h>
> 
>   long long
>   foo (vector long long v, unsigned long n)
>   {
> return vec_extract (v, n);
>   }
> 
> During the reload pass, the register allocator decides that it should spill the
> insn to the stack, and then do the vector extract from memory (which is an
> optimization to prevent loading the vector in case we only want one element).

And that causes LHS/SHL, very undesirable.  Okay.

> Note, there is a 4th place that uses input_operand for variable vector extracts
> that is not touched by this patch.

There are more places that use input_operand.  input_operand is meant to
be used for the RHS of mov patterns (with a reg as LHS), and it isn't
good to use it anywhere else, because it allows too much: *all*
memory, many constants, and datums that can only go into some kinds of
regs (while you might have another kind).

In pretty much all cases reload/LRA can fix things, but you get worse
code that way.

rs6000_expand_vector_extract is always called with a register as second
operand, so the changes you made are safe.

The "reload_completed"s now have no function other than to force worse
code to be generated; you might want to do something about that as a
follow-up.

> -// P8 (LE) variables: addi,xxpermdi,mr,stxvd2x|stxvd4x,rldicl,sldi,ldx,blr
> -// P8 (BE) constants: mfvsrd
> -// P8 (BE) Variables: addi,xxpermdi,rldicl,mr,stxvd2x|stxvd4x,sldi,ldx,blr
> +// P8 (LE) variables: xori, rldic, mtvsrd, xxpermdi, vslo, mfvsrd
> +// P8 (BE) constants: xxpermdi, mfvsrd
> +// P8 (BE) Variables:   rldic, mtvsrd, xxpermdi, vslo, mfvsrd
>  
> -/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {\maddi\M} 3 { target lp64 } } } */
> +/* results. */
> +/* { dg-final { scan-assembler-times {\mxori\M} 3 { target le } } } */
> +/* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target 
> ilp32 } } } */
>  /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {\mlwz\M} 11 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target le } } } */
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 6 { target { be && lp64 
> } } } } */
>  /* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 { target { be && ilp32 
> } } } } */
> -/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target { be && lp64 
> } } } } */
> -/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 3 { target 
> lp64 } } } */
> -/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target 
> ilp32 } } } */
> -/* { dg-final { scan-assembler-times {\mrldicl\M|\mrldic\M|\mrlwinm\M} 3 } } 
> */
> -/* { dg-final { scan-assembler-times {\mmfvsrd\M} 3 { target { lp64 } } } } 
> */
> -/* { dg-final { scan-assembler-times {\mmfvsrd\M} 0 { target { be && ilp32 } 
> } } } */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { lp64 } } } } 
> */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { be && ilp32 } 
> } } } */
> +/* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */

All this stays super fragile.

Okay for trunk.  Thanks!


Segher


Re: structurally compare type_arg_packs [93933]

2020-02-27 Thread Jason Merrill

On 2/27/20 10:33 AM, Nathan Sidwell wrote:

On 2/26/20 5:00 PM, Jason Merrill wrote:

On 2/25/20 4:09 PM, Nathan Sidwell wrote:
We consider all TYPE_ARGUMENT_PACKS distinct types, leading to 
problems with redeclarations.


I'd think that the bug is that we're treating them as types in the 
first place; they aren't types, so they shouldn't reach comptypes.  
I'd lean toward adding an assert to that effect and fixing the caller 
to use e.g. template_args_equal.


Thanks, this patch implements that approach.


That TYPE_ARGUMENT_PACKS are not types, suggests to me that 
NONTYPE_ARGUMENT_PACKS are not expressions.  Perhaps their 
representation should be unified -- I keep encountering code handling 
them essentially doing the same thing for both kinds.  But that's a 
GCC-11 thing at least.


Agreed.  I took a step in that direction when I removed TREE_TYPE from 
NONTYPE_ARGUMENT_PACK, but going on to unify the packs makes sense to me.


Jason



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt



On 2/27/20 9:30 AM, Jakub Jelinek wrote:

On Thu, Feb 27, 2020 at 09:19:25AM -0600, Bill Schmidt wrote:

On 2/27/20 8:52 AM, Jakub Jelinek wrote:

On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:

But is this actually a good idea?  It seems to me this will generate lousy
code in the absence of hardware support.  Won't we be better off warning and
ignoring the directive, leaving the code in scalar form?

Depends on the exact code, I think sometimes it will be just fine and will
allow vectorizing something that really couldn't be otherwise.
Isn't it better to leave it for the user to decide?
They can always ask for it not to be generated (add notinbranch) if it isn't
worthwhile.

You need a high ratio of unguarded code to guarded code in order to pay for all
those vector extract and reconstruct operations.  Sure, some code will be fine,
but a lot of code will be lousy.  This will be particularly true on older
hardware with a less exhaustive set of vector operations.

Why?  E.g. for integral code other than division or memory loads/stores where
nothing will really trap, you can just perform it unguarded.
Just use whatever the vectorizer does right now for conditional code, and if
that isn't as efficient as it could be given a particular HW/ISA, try to improve
it?


If that's how the vectorizer is working today, then my concerns are certainly
lessened.  It's been a while since I've seen how the vectorizer and
if-conversion interact, so my perspective is probably outdated.  We'll take a
look at it.

Thanks for the discussion!

Bill



I really don't see how it is different from, say, SSE2 on x86 or even AVX.

Jakub



[PATCH] libstdc++: -D_GLIBCXX_DEBUG fixes in the constrained algos tests

2020-02-27 Thread Patrick Palka
This fixes some failures in the constrained algos tests when run in debug mode.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/copy/constrained.cc: Don't assume that the
base() of a vector<>::iterator is a pointer.
* testsuite/25_algorithms/copy_backward/constrained.cc: Likewise.
* testsuite/25_algorithms/move/constrained.cc: Likewise.
* testsuite/25_algorithms/move_backward/constrained.cc: Likewise.
* testsuite/25_algorithms/inplace_merge/constrained.cc: Use foo.data()
instead of &foo[0] when foo is a vector.
* testsuite/25_algorithms/partial_sort/constrained.cc: Likewise.
* testsuite/25_algorithms/partial_sort_copy/constrained.cc: Likewise.
* testsuite/25_algorithms/shuffle/constrained.cc: Likewise.
* testsuite/25_algorithms/sort/constrained.cc: Likewise.
* testsuite/25_algorithms/stable_sort/constrained.cc: Likewise.
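
The two kinds of change listed above can be sketched in isolation.  This is
only an illustration and not code from the patch: the function names are made
up, and std::copy stands in for ranges::copy so the sketch builds before
C++20.  The point is that a result iterator is compared against another
iterator, and that data() is used where the address of the storage is wanted,
since in debug mode the iterator is a wrapper class and operator[] is
range-checked.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns true when the iterator returned by
// std::copy is one past the last element written.  The check compares
// iterator with iterator and never assumes an iterator is a pointer.
bool copy_result_matches_iterator()
{
  std::vector<int> x = { 1, 2, 3 };
  std::vector<int> y(3);
  auto out = std::copy(x.begin(), x.end(), y.begin());
  // Writing `out.base() == y.data() + 3` instead would depend on an
  // implementation detail of the iterator type.
  return out == y.begin() + 3 && y == x;
}

// Hypothetical helper: address of the storage.  data() is well defined
// even for an empty vector, whereas &v[0] is not.
const int* first_element(const std::vector<int>& v)
{
  return v.data();
}
```

For a non-empty vector both spellings give the same address, so switching to
data() should not change what the tests check in normal mode.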
---
 .../testsuite/25_algorithms/copy/constrained.cc  | 16 
 .../25_algorithms/copy_backward/constrained.cc   | 16 
 .../25_algorithms/inplace_merge/constrained.cc   |  2 +-
 .../testsuite/25_algorithms/move/constrained.cc  | 16 
 .../25_algorithms/move_backward/constrained.cc   | 16 
 .../25_algorithms/partial_sort/constrained.cc|  4 ++--
 .../partial_sort_copy/constrained.cc |  8 
 .../25_algorithms/shuffle/constrained.cc |  4 ++--
 .../testsuite/25_algorithms/sort/constrained.cc  |  4 ++--
 .../25_algorithms/stable_sort/constrained.cc |  4 ++--
 10 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc 
b/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc
index 85f7d649608..aafe845db3a 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/constrained.cc
@@ -70,8 +70,8 @@ test01()
   std::vector y(3);
   const int z[3] = { 1, 2, 3 };
   auto [in, out] = ranges::copy(x, ranges::begin(y));
-  VERIFY( in.base() == x.data()+3 );
-  VERIFY( out.base() == y.data()+3 );
+  VERIFY( in == x.begin()+3 );
+  VERIFY( out == y.begin()+3 );
   VERIFY( ranges::equal(y, z) && ranges::equal(x, z) );
 }
 
@@ -81,8 +81,8 @@ test01()
   std::vector y(3);
   const int z[3] = { 1, 2, 3 };
   auto [in, out] = ranges::copy(x, ranges::begin(y));
-  VERIFY( in.base() == x.data()+3 );
-  VERIFY( out.base() == y.data()+3 );
+  VERIFY( in == x.begin()+3 );
+  VERIFY( out == y.begin()+3 );
   VERIFY( ranges::equal(y, z) && ranges::equal(x, z) );
 }
 
@@ -93,8 +93,8 @@ test01()
   auto [in,out] = ranges::copy(make_reverse_iterator(x.end()),
   make_reverse_iterator(x.begin()),
   make_reverse_iterator(y.end()));
-  VERIFY( in.base().base() == x.data()+3 );
-  VERIFY( out.base().base() == y.data() );
+  VERIFY( in.base() == x.begin()+3 );
+  VERIFY( out.base() == y.begin() );
   VERIFY( ranges::equal(y, z) && ranges::equal(x, z) );
 }
 
@@ -105,8 +105,8 @@ test01()
   auto [in,out] = ranges::copy(make_reverse_iterator(x.end()),
   make_reverse_iterator(x.begin()),
   make_reverse_iterator(y.end()));
-  VERIFY( in.base().base() == x.data()+3 );
-  VERIFY( out.base().base() == y.data() );
+  VERIFY( in.base() == x.begin()+3 );
+  VERIFY( out.base() == y.begin() );
   VERIFY( ranges::equal(y, z) && ranges::equal(x, z) );
 }
 }
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy_backward/constrained.cc 
b/libstdc++-v3/testsuite/25_algorithms/copy_backward/constrained.cc
index 900f78aaa73..9df2a2ff593 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy_backward/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy_backward/constrained.cc
@@ -57,8 +57,8 @@ test01()
   std::vector y(3);
   const int z[3] = { 1, 2, 3 };
   auto [in, out] = ranges::copy_backward(x, ranges::end(y));
-  VERIFY( in.base() == x.data()+3 );
-  VERIFY( out.base() == y.data() );
+  VERIFY( in == x.begin()+3 );
+  VERIFY( out == y.begin() );
   VERIFY( ranges::equal(y, z) && ranges::equal(x, z) );
 }
 
@@ -68,8 +68,8 @@ test01()
   std::vector y(3);
   const int z[3] = { 1, 2, 3 };
   auto [in, out] = ranges::copy_backward(x, ranges::end(y));
-  VERIFY( in.base() == x.data()+3 );
-  VERIFY( out.base() == y.data() );
+  VERIFY( in == x.begin()+3 );
+  VERIFY( out == y.begin() );
   VERIFY( ranges::equal(y, z) && ranges::equal(x, z) );
 }
 
@@ -80,8 +80,8 @@ test01()
   auto [in,out] = ranges::copy_backward(make_reverse_iterator(x.end()),
make_reverse_iterator(x.begin()),

Re: structurally compare type_arg_packs [93933]

2020-02-27 Thread Nathan Sidwell

On 2/26/20 5:00 PM, Jason Merrill wrote:

On 2/25/20 4:09 PM, Nathan Sidwell wrote:
We consider all TYPE_ARGUMENT_PACKS distinct types, leading to 
problems with redeclarations.


I'd think that the bug is that we're treating them as types in the first 
place; they aren't types, so they shouldn't reach comptypes.  I'd lean 
toward adding an assert to that effect and fixing the caller to use e.g. 
template_args_equal.


Thanks, this patch implements that approach.

That TYPE_ARGUMENT_PACKS are not types, suggests to me that 
NONTYPE_ARGUMENT_PACKS are not expressions.  Perhaps their 
representation should be unified -- I keep encountering code handling 
them essentially doing the same thing for both kinds.  But that's a 
GCC-11 thing at least.


nathan

--
Nathan Sidwell
2020-02-27  Nathan Sidwell  

	PR c++/93933
	* pt.c (template_args_equal): Pass ARGUMENT_PACKS through to
	cp_tree_equal.
	* tree.c (cp_tree_equal): Compare ARGUMENT_PACKS here.
	* typeck.c (comptypes): Assert we don't get any argument packs.
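
The shape of the change can be modelled outside the compiler.  The sketch
below is a toy, not GCC code: Arg, packs_equal, types_equal and args_equal are
hypothetical stand-ins for tree nodes, the new cp_tree_equal case, comptypes
and template_args_equal respectively.  It shows packs being compared
structurally (same length, then element by element) and an assertion that a
pack never reaches the type comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a template argument: either a single
// argument identified by name, or a pack holding further arguments.
struct Arg
{
  bool is_pack;
  std::string name;        // meaningful when !is_pack
  std::vector<Arg> elems;  // meaningful when is_pack
};

bool args_equal(const Arg& a, const Arg& b);

// Structural comparison of two packs: same length, then element-wise.
bool packs_equal(const Arg& a, const Arg& b)
{
  assert(a.is_pack && b.is_pack);
  if (a.elems.size() != b.elems.size())
    return false;
  for (std::size_t i = 0; i != a.elems.size(); ++i)
    if (!args_equal(a.elems[i], b.elems[i]))
      return false;
  return true;
}

// Stand-in for the type comparison: packs are not types and must not
// be passed here.
bool types_equal(const Arg& a, const Arg& b)
{
  assert(!a.is_pack && !b.is_pack);
  return a.name == b.name;
}

// Stand-in for the argument comparison: routes packs to the structural
// comparison and everything else to the type comparison.
bool args_equal(const Arg& a, const Arg& b)
{
  if (a.is_pack || b.is_pack)
    return a.is_pack && b.is_pack && packs_equal(a, b);
  return types_equal(a, b);
}
```

With this routing, two separately built packs holding the same arguments
compare equal, which is what a redeclaration needs.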

diff --git c/gcc/cp/pt.c w/gcc/cp/pt.c
index 6c9abb8f3d3..622c70b352f 100644
--- c/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -8999,25 +8999,8 @@ template_args_equal (tree ot, tree nt, bool partial_order /* = false */)
 PACK_EXPANSION_PATTERN (nt))
 	&& template_args_equal (PACK_EXPANSION_EXTRA_ARGS (ot),
 PACK_EXPANSION_EXTRA_ARGS (nt)));
-  else if (ARGUMENT_PACK_P (ot))
-{
-  int i, len;
-  tree opack, npack;
-
-  if (!ARGUMENT_PACK_P (nt))
-	return 0;
-
-  opack = ARGUMENT_PACK_ARGS (ot);
-  npack = ARGUMENT_PACK_ARGS (nt);
-  len = TREE_VEC_LENGTH (opack);
-  if (TREE_VEC_LENGTH (npack) != len)
-	return 0;
-  for (i = 0; i < len; ++i)
-	if (!template_args_equal (TREE_VEC_ELT (opack, i),
-  TREE_VEC_ELT (npack, i)))
-	  return 0;
-  return 1;
-}
+  else if (ARGUMENT_PACK_P (ot) || ARGUMENT_PACK_P (nt))
+return cp_tree_equal (ot, nt);
   else if (ot && TREE_CODE (ot) == ARGUMENT_PACK_SELECT)
 gcc_unreachable ();
   else if (TYPE_P (nt))
diff --git c/gcc/cp/tree.c w/gcc/cp/tree.c
index 72b3a720ee8..3fc6287d566 100644
--- c/gcc/cp/tree.c
+++ w/gcc/cp/tree.c
@@ -3857,12 +3857,27 @@ cp_tree_equal (tree t1, tree t2)
 			 DEFERRED_NOEXCEPT_PATTERN (t2))
 	  && comp_template_args (DEFERRED_NOEXCEPT_ARGS (t1),
  DEFERRED_NOEXCEPT_ARGS (t2)));
-  break;
 
 case LAMBDA_EXPR:
   /* Two lambda-expressions are never considered equivalent.  */
   return false;
 
+case TYPE_ARGUMENT_PACK:
+case NONTYPE_ARGUMENT_PACK:
+  {
+	tree p1 = ARGUMENT_PACK_ARGS (t1);
+	tree p2 = ARGUMENT_PACK_ARGS (t2);
+	int len = TREE_VEC_LENGTH (p1);
+	if (TREE_VEC_LENGTH (p2) != len)
+	  return false;
+
+	for (int ix = 0; ix != len; ix++)
+	  if (!template_args_equal (TREE_VEC_ELT (p1, ix),
+TREE_VEC_ELT (p2, ix)))
+	return false;
+	return true;
+  }
+
 default:
   break;
 }
diff --git c/gcc/cp/typeck.c w/gcc/cp/typeck.c
index 42d0b47cf1b..2a3243f3e81 100644
--- c/gcc/cp/typeck.c
+++ w/gcc/cp/typeck.c
@@ -1485,6 +1485,10 @@ comptypes (tree t1, tree t2, int strict)
 {
   gcc_checking_assert (t1 && t2);
 
+  /* TYPE_ARGUMENT_PACKS are not really types.  */
+  gcc_checking_assert (TREE_CODE (t1) != TYPE_ARGUMENT_PACK
+		   && TREE_CODE (t2) != TYPE_ARGUMENT_PACK);
+
   if (strict == COMPARE_STRICT && comparing_specializations
   && (t1 != TYPE_CANONICAL (t1) || t2 != TYPE_CANONICAL (t2)))
 /* If comparing_specializations, treat dependent aliases as distinct.  */
diff --git c/gcc/testsuite/g++.dg/concepts/pr93933.C w/gcc/testsuite/g++.dg/concepts/pr93933.C
new file mode 100644
index 000..b4f2c36374d
--- /dev/null
+++ w/gcc/testsuite/g++.dg/concepts/pr93933.C
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++17 } }
+// { dg-options "-fconcepts" }
+
+// distilled from , via header units
+
+template
+struct is_invocable;
+
+template
+concept invocable = is_invocable<_Args...>::value;
+
+template
+requires invocable<_Is>
+class BUG;
+
+template
+requires invocable<_Is>
+class BUG {}; // { dg-bogus "different constraints" }
+
+template struct is_invocable_NT;
+
+template
+concept invocable_NT = is_invocable_NT::value;
+
+template
+requires invocable_NT<_Is>
+class BUG_NT;
+
+template
+requires invocable_NT<_Is>
+class BUG_NT {};


Re: GLIBC libmvec status

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 09:19:25AM -0600, Bill Schmidt wrote:
> On 2/27/20 8:52 AM, Jakub Jelinek wrote:
> > On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:
> > > But is this actually a good idea?  It seems to me this will generate lousy
> > > > code in the absence of hardware support.  Won't we be better off warning
> > > > and ignoring the directive, leaving the code in scalar form?
> > Depends on the exact code, I think sometimes it will be just fine and will
> > allow vectorizing something that really couldn't be otherwise.
> > Isn't it better to leave it for the user to decide?
> > They can always ask for it not to be generated (add notinbranch) if it isn't
> > worthwhile.
> 
> You need a high ratio of unguarded code to guarded code in order to pay for
> all those vector extract and reconstruct operations.  Sure, some code will be
> fine, but a lot of code will be lousy.  This will be particularly true on older
> hardware with a less exhaustive set of vector operations.

Why?  E.g. for integral code other than division or memory loads/stores where
nothing will really trap, you can just perform it unguarded.
Just use whatever the vectorizer does right now for conditional code, and if
that isn't as efficient as it could be given a particular HW/ISA, try to improve
it?

I really don't see how it is different from, say, SSE2 on x86 or even AVX.

Jakub
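
The point about performing non-trapping integral code unguarded can be
illustrated with a scalar sketch.  This is an assumption-laden illustration
and not what the vectorizer literally emits: the function names are made up,
and unsigned arithmetic is used so the computation is well defined for every
input.  The second function computes the value for every element and then
selects per element, which is the form that maps onto a vector compute
followed by a vector select.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Guarded form: the arithmetic only executes where the mask is set.
void guarded(std::vector<unsigned>& out, const std::vector<unsigned>& a,
             const std::vector<unsigned>& mask)
{
  for (std::size_t i = 0; i < out.size(); ++i)
    if (mask[i])
      out[i] = a[i] * 3u + 1u;
}

// Unguarded form: compute for every element, then select.  This is
// only valid because the computation cannot trap or fault.
void unguarded(std::vector<unsigned>& out, const std::vector<unsigned>& a,
               const std::vector<unsigned>& mask)
{
  for (std::size_t i = 0; i < out.size(); ++i)
    {
      const unsigned computed = a[i] * 3u + 1u;
      out[i] = mask[i] ? computed : out[i];
    }
}
```

Division and memory accesses are the cases where this rewrite is not
automatically safe, which matches the exceptions named above.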



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt

On 2/27/20 8:52 AM, Jakub Jelinek wrote:

On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:

But is this actually a good idea?  It seems to me this will generate lousy
code in the absence of hardware support.  Won't we be better off warning and
ignoring the directive, leaving the code in scalar form?

Depends on the exact code, I think sometimes it will be just fine and will
allow vectorizing something that really couldn't be otherwise.
Isn't it better to leave it for the user to decide?
They can always ask for it not to be generated (add notinbranch) if it isn't
worthwhile.


You need a high ratio of unguarded code to guarded code in order to pay for all
those vector extract and reconstruct operations.  Sure, some code will be fine,
but a lot of code will be lousy.  This will be particularly true on older
hardware with a less exhaustive set of vector operations.

In the lousy-code case, my concern is that the user won't be savvy enough to
understand they should add notinbranch.  They'll just notice that their code
runs badly on Power and either complain (good, then we can explain it) or
abandon porting existing code to Power (very bad, and we may never know).
I don't like the downside, and the upside is quite unpredictable.

Bill



Jakub



[committed] libstdc++: Fix debug mode test failures

2020-02-27 Thread Jonathan Wakely

Three fixes for test failures when debug mode is enabled.

Tested x86_64-linux, committed to master.


commit ae7051590d4bf9b844874e727791f236315c835a
Author: Jonathan Wakely 
Date:   Thu Feb 27 15:13:16 2020 +

libstdc++: Define <=> for Debug Mode array

This fixes a test failure with -D_GLIBCXX_DEBUG:

FAIL: 23_containers/array/comparison_operators/constexpr.cc (test for excess errors)

* include/debug/array (operator<=>): Define for C++20.
* testsuite/23_containers/array/tuple_interface/get_debug_neg.cc:
Adjust dg-error line numbers.
* testsuite/23_containers/array/tuple_interface/
tuple_element_debug_neg.cc: Likewise.
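
The element-wise loop in the new operator can be sketched without C++20.
This is a simplified illustration, not the library code: compare3 and
lexicographic3 are hypothetical names, and an int (negative, zero, positive)
stands in for the comparison category that operator<=> returns.  The first
unequal pair of elements decides the result; if none differs the arrays are
equal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Three-way comparison of two elements: negative, zero or positive.
template<typename T>
int compare3(const T& a, const T& b)
{
  return a < b ? -1 : (b < a ? 1 : 0);
}

// Lexicographic three-way comparison of two arrays of equal size.
template<typename T, std::size_t N>
int lexicographic3(const std::array<T, N>& a, const std::array<T, N>& b)
{
  for (std::size_t i = 0; i < N; ++i)
    {
      const int c = compare3(a[i], b[i]);
      if (c != 0)
        return c;  // first difference decides
    }
  return 0;        // all elements equal
}
```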

diff --git a/libstdc++-v3/include/debug/array b/libstdc++-v3/include/debug/array
index 3f87e98fe8d..dd4044c9c7b 100644
--- a/libstdc++-v3/include/debug/array
+++ b/libstdc++-v3/include/debug/array
@@ -239,6 +239,25 @@ namespace __debug
 operator==(const array<_Tp, _Nm>& __one, const array<_Tp, _Nm>& __two)
 { return std::equal(__one.begin(), __one.end(), __two.begin()); }
 
+#if __cpp_lib_three_way_comparison && __cpp_lib_concepts
+  template
+constexpr __detail::__synth3way_t<_Tp>
+operator<=>(const array<_Tp, _Nm>& __a, const array<_Tp, _Nm>& __b)
+{
+  if constexpr (_Nm && __is_byte<_Tp>::__value)
+	return __builtin_memcmp(__a.data(), __b.data(), _Nm) <=> 0;
+  else
+	{
+	  for (size_t __i = 0; __i < _Nm; ++__i)
+	{
+	  auto __c = __detail::__synth3way(__a[__i], __b[__i]);
+	  if (__c != 0)
+		return __c;
+	}
+	}
+  return strong_ordering::equal;
+}
+#else
   template
 _GLIBCXX20_CONSTEXPR
 inline bool
@@ -271,6 +290,7 @@ namespace __debug
 inline bool
 operator>=(const array<_Tp, _Nm>& __one, const array<_Tp, _Nm>& __two)
 { return !(__one < __two); }
+#endif // three_way_comparison && concepts
 
   // Specialized algorithms.
 
diff --git a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc
index 2736d060aed..0a9525e9654 100644
--- a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_debug_neg.cc
@@ -27,6 +27,6 @@ int n1 = std::get<1>(a);
 int n2 = std::get<1>(std::move(a));
 int n3 = std::get<1>(ca);
 
-// { dg-error "static assertion failed" "" { target *-*-* } 295 }
-// { dg-error "static assertion failed" "" { target *-*-* } 304 }
-// { dg-error "static assertion failed" "" { target *-*-* } 312 }
+// { dg-error "static assertion failed" "" { target *-*-* } 315 }
+// { dg-error "static assertion failed" "" { target *-*-* } 324 }
+// { dg-error "static assertion failed" "" { target *-*-* } 332 }
diff --git a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc
index bca290b3625..0bd5989c04a 100644
--- a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/tuple_element_debug_neg.cc
@@ -22,4 +22,4 @@
 
 typedef std::tuple_element<1, std::array>::type type;
 
-// { dg-error "static assertion failed" "" { target *-*-* } 377 }
+// { dg-error "static assertion failed" "" { target *-*-* } 397 }

commit b112e3cb6025938ef9d8568d318e23e44c0c8fdd
Author: Jonathan Wakely 
Date:   Thu Feb 27 15:13:16 2020 +

libstdc++: Fix std::span test failures with _GLIBCXX_ASSERTIONS

This fixes several failures with -D_GLIBCXX_ASSERTIONS added to the
testsuite flags, such as:

FAIL: 23_containers/span/back_assert_neg.cc (test for excess errors)

* testsuite/23_containers/span/back_assert_neg.cc: Add #undef before
defining _GLIBCXX_ASSERTIONS.
* testsuite/23_containers/span/first_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/first_assert_neg.cc: Likewise.
* testsuite/23_containers/span/front_assert_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_assert_neg.cc: Likewise.
* testsuite/23_containers/span/last_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/last_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_3_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_4_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_5_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_6_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_assert_neg.cc: Likewise.
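
Why the #undef matters can be shown with a stand-in macro.  This is a sketch
under assumptions: DEMO_ASSERTIONS is a made-up name used instead of the real
macro, and the first #define imitates what a -D option on the command line
does (it defines the macro as 1).  Defining the macro again with an empty
body would then be an incompatible redefinition and draw a diagnostic, which
the testsuite counts as an excess error.

```cpp
#include <cassert>

// Imitates -DDEMO_ASSERTIONS on the command line, which defines the
// macro with the value 1.
#define DEMO_ASSERTIONS 1

// Without this #undef the following #define (empty body) would not
// match the earlier definition and the preprocessor would complain.
#undef DEMO_ASSERTIONS
#define DEMO_ASSERTIONS

// Hypothetical helper reporting whether the macro ended up defined.
bool demo_assertions_enabled()
{
#ifdef DEMO_ASSERTIONS
  return true;
#else
  return false;
#endif
}
```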

diff --git a/libstdc++-v3/testsuite/23_containers/span/back_assert_neg.cc b/libstdc++-v3/testsuite/23_containers/span/back_assert_neg.cc
index 

Re: patch to fix PR93564

2020-02-27 Thread Vladimir Makarov



On 2020-02-27 7:33 a.m., Andrew Stubbs wrote:

On 26/02/2020 15:16, Andrew Stubbs wrote:
The problem appears to be that the high-part of a register pair is 
not marked as "ever live".  I'm trying to figure out whether this is 
some kind of target-specific issue that has merely been exposed, but 
it's difficult to see what's going on. I'm pretty sure I've never 
seen this one before.


I'm now pretty sure your patch didn't cause this issue so much as 
expose it.


Either way, it's fixed now.

Thank you for informing me about this.  Such heuristic changes should 
not affect generated code correctness unless they trigger a hidden bug 
in RA or machine-dependent code used by RA.





Re: GLIBC libmvec status

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 08:47:19AM -0600, Bill Schmidt wrote:
> But is this actually a good idea?  It seems to me this will generate lousy
> code in the absence of hardware support.  Won't we be better off warning and
> ignoring the directive, leaving the code in scalar form?

Depends on the exact code, I think sometimes it will be just fine and will
allow vectorizing something that really couldn't be otherwise.
Isn't it better to leave it for the user to decide?
They can always ask for it not to be generated (add notinbranch) if it isn't
worthwhile.

Jakub
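
For reference, opting out looks roughly like this.  It is a minimal sketch
with made-up function names: with the notinbranch clause only the unmasked
vector variant of the function is requested, so the function must not be
called from conditional code inside a SIMD loop.  Built without OpenMP
support the pragmas are simply ignored and the code runs in scalar form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// notinbranch: request only the unmasked vector variant.
#pragma omp declare simd notinbranch
double scale(double x)
{
  return 2.0 * x + 1.0;
}

// The call is unconditional, as notinbranch requires.
void apply_unconditional(std::vector<double>& v)
{
  const std::size_t n = v.size();
  #pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    v[i] = scale(v[i]);
}
```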



Re: [PATCH 01/10] i386: Properly encode vector registers in vector move

2020-02-27 Thread H.J. Lu
On Wed, Feb 26, 2020 at 4:24 PM Jeff Law  wrote:
>
> On Wed, 2020-02-26 at 16:02 -0800, H.J. Lu wrote:
> > On Wed, Feb 26, 2020 at 2:42 PM Jeff Law  wrote:
> > > On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > > > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > > > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> > > >
> > > >0: c5 f9 6f d1 vmovdqa %xmm1,%xmm2
> > > >4: 62 f1 fd 08 6f d1   vmovdqa64 %xmm1,%xmm2
> > > >
> > > > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > > > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > > > > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > > > > indicate target preferences of vector move encoding.  For vector
> > > > > register to vector register move, we can use 512-bit vector move
> > > > > instructions to move 128-bit/256-bit vector if AVX512VL isn't available.
> > > > > With AVX512F and AVX512VL, we should use VEX encoding for 128-bit/256-bit
> > > > > vector moves if upper 16 vector registers aren't used.  This patch adds a
> > > > > function, ix86_output_ssemov, to generate vector moves:
> > > >
> > > > 1. If zmm registers are used, use EVEX encoding.
> > > > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > > > will be generated.
> > > > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > > >a. With AVX512VL, AVX512VL vector moves will be generated.
> > > >b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > > >   move will be done with zmm register move.
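
The three rules above amount to a small decision function.  The sketch below
is not the code from the patch: MoveInfo, MoveEncoding and choose_encoding
are hypothetical names, and the Unsupported result for a memory move
involving xmm16-xmm31/ymm16-ymm31 without AVX512VL is my reading of rule 3b,
which only covers register to register moves.

```cpp
#include <cassert>

enum class MoveEncoding
{
  Evex512,     // rule 1: zmm operands
  SseOrVex,    // rule 2: only the low 16 vector registers
  EvexVl,      // rule 3a: AVX512VL move
  EvexViaZmm,  // rule 3b: register move widened to zmm
  Unsupported  // assumption: no rule applies
};

// Hypothetical description of one vector move.
struct MoveInfo
{
  bool uses_zmm;       // a zmm register is an operand
  bool uses_ext_regs;  // xmm16-xmm31 or ymm16-ymm31 is an operand
  bool has_avx512vl;   // AVX512VL is enabled
  bool reg_to_reg;     // both operands are registers
};

MoveEncoding choose_encoding(const MoveInfo& m)
{
  if (m.uses_zmm)
    return MoveEncoding::Evex512;
  if (!m.uses_ext_regs)
    return MoveEncoding::SseOrVex;
  if (m.has_avx512vl)
    return MoveEncoding::EvexVl;
  if (m.reg_to_reg)
    return MoveEncoding::EvexViaZmm;
  return MoveEncoding::Unsupported;
}
```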
> > > >
> > > >
> > > [ ... ]
> > >
> > > > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > > > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > > > +   TARGET_AVX512VL or it is a register to register move which can
> > > > +   be done with zmm register move. */
> > > > +
> > > > +static const char *
> > > > +ix86_get_ssemov (rtx *operands, unsigned size,
> > > > +  enum attr_mode insn_mode, machine_mode mode)
> > > > +{
> > > > +  char buf[128];
> > > > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > > > +|| misaligned_operand (operands[1], mode));
> > > > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > > > +  || EXT_REX_SSE_REG_P (operands[1]));
> > > > +  machine_mode scalar_mode;
> > > > +
> > > > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > > > +{
> > > > +  switch (scalar_mode)
> > > > + {
> > > > + case E_QImode:
> > > > +   if (size == 64)
> > > > + opcode = (misaligned_p
> > > > +   ? (TARGET_AVX512BW
> > > > +  ? "vmovdqu8"
> > > > +  : "vmovdqu64")
> > > > +   : "vmovdqa64");
> > > > +   else if (evex_reg_p)
> > > > + {
> > > > +   if (TARGET_AVX512VL)
> > > > + opcode = (misaligned_p
> > > > +   ? (TARGET_AVX512BW
> > > > +  ? "vmovdqu8"
> > > > +  : "vmovdqu64")
> > > > +   : "vmovdqa64");
> > > > + }
> > > > +   else
> > > > + opcode = (misaligned_p
> > > > +   ? (TARGET_AVX512BW
> > > > +  ? "vmovdqu8"
> > > > +  : "%vmovdqu")
> > > > +   : "%vmovdqa");
> > > > +   break;
> > > > + case E_HImode:
> > > > +   if (size == 64)
> > > > + opcode = (misaligned_p
> > > > +   ? (TARGET_AVX512BW
> > > > +  ? "vmovdqu16"
> > > > +  : "vmovdqu64")
> > > > +   : "vmovdqa64");
> > > > +   else if (evex_reg_p)
> > > > + {
> > > > +   if (TARGET_AVX512VL)
> > > > + opcode = (misaligned_p
> > > > +   ? (TARGET_AVX512BW
> > > > +  ? "vmovdqu16"
> > > > +  : "vmovdqu64")
> > > > +   : "vmovdqa64");
> > > > + }
> > > > +   else
> > > > + opcode = (misaligned_p
> > > > +   ? (TARGET_AVX512BW
> > > > +  ? "vmovdqu16"
> > > > +  : "%vmovdqu")
> > > > +   : "%vmovdqa");
> > > > +   break;
> > > > + case E_SImode:
> > > > +   if (size == 64)
> > > > + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > > +   else if (evex_reg_p)
> > > > + {
> > > > +   if (TARGET_AVX512VL)
> > > > + opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > > + }
> > > > +   else
> > > > + opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > > +   break;
> > > > + case E_DImode:
> > > > + case E_TImode:
> > > > + case E_OImode:
> > > > +   if (size == 64)
> > > > + opcode = misaligned_p 

Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt

On 2/26/20 8:31 AM, Jakub Jelinek wrote:

On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:

The hope is that we can create a vectorized version that returns values
in registers rather than the by-ref parameters, and add code to GCC to
copy things around correctly following the call.  Ideally the signature of
the vectorized version would be something like

   struct retval {vector double, vector double};
   retval vecsincos (vector double);

In the typical case where calls to sincos are of the form

   sincos (val[i], [i], [i]);

this would allow us to only store the values in the caller upon return,
rather than store them in the callee and potentially reload them
immediately in the caller.  On some Power CPUs, the latter behavior can
result in somewhat costly stalls if the consecutive accesses hit a timing
window.

But can't you do
#pragma omp declare simd linear(sinp, cosp)
void sincos (double x, double *sinp, double *cosp);
?
That is something the vectorizer code could handle and for
   for (int i = 0; i < 1024; i++)
 sincos (val[i], [i], [i]);
just vectorize it as
   for (int i = 0; i < 1024; i += vf)
 _ZGVbN8vl8l8_sincos (*(vector double *)[i], [i], [i]);
Anything else will need specialized code to handle sincos specially in the
vectorizer.


After reading all the discussion on this thread, yes, I agree for now.
It will be good for everybody if we can get the vectorized cexpi sorted
out at some point, which will give us a superior interface.
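
To make the two interfaces concrete, here is a scalar sketch of both.  All
names are made up for illustration (sincos_ref, sincos_val, SinCos, and the
two fill functions), and the by-value struct is only a stand-in for whatever
a vectorized cexpi would eventually return.  The pragma is the one suggested
above; without OpenMP support it is ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// By-reference interface: results are stored through the pointers,
// which advance linearly from one iteration to the next.
#pragma omp declare simd linear(sinp, cosp)
void sincos_ref(double x, double* sinp, double* cosp)
{
  *sinp = std::sin(x);
  *cosp = std::cos(x);
}

// By-value interface: both results come back in the return value and
// the caller decides where to store them.
struct SinCos
{
  double sin_value;
  double cos_value;
};

SinCos sincos_val(double x)
{
  SinCos r = { std::sin(x), std::cos(x) };
  return r;
}

void fill_by_ref(const std::vector<double>& val, std::vector<double>& s,
                 std::vector<double>& c)
{
  for (std::size_t i = 0; i < val.size(); ++i)
    sincos_ref(val[i], &s[i], &c[i]);
}

void fill_by_val(const std::vector<double>& val, std::vector<double>& s,
                 std::vector<double>& c)
{
  for (std::size_t i = 0; i < val.size(); ++i)
    {
      const SinCos r = sincos_val(val[i]);
      s[i] = r.sin_value;  // stores happen in the caller
      c[i] = r.cos_value;
    }
}
```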


If you feel it isn't possible to do this, then we can abandon it.  Right
now my understanding is that GCC doesn't vectorize calls to sincos yet
for any targets, so it would be moot except that we really should define
what happens for the future.

This calling convention would also be useful in the future for vectorizing
functions that return complex values either by value or by reference.

Only by value, you really don't know what the code does if something is
passed by reference, whether it is read, written into, or both etc.
And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
pass them, just GCC isn't able to do that right now.


Per the fork of the thread with Segher, I've cried uncle on the specifics
of the calling convention. :)




Well, as a matter of practicality, we don't have any of that implemented
in the rs6000 back end, and we don't have any free resources to do that
in GCC 11.  Is there any documentation about what needs to be done to
support this?  I've always been under the impression that vectorizing for
masking when there isn't any hardware support is a losing proposition, so
we've not investigated it.

You don't need to do pretty much anything, except set
clonei->mask_mode = VOIDmode; I think the generic code should handle
everything beyond that, in particular adding the mask argument and using it
both on the caller side and in the expansion of the to-be-vectorized clone.


But is this actually a good idea?  It seems to me this will generate lousy
code in the absence of hardware support.  Won't we be better off warning and
ignoring the directive, leaving the code in scalar form?

If and when we have hardware support for vector masking, I'll be happy to
remove this restriction, but I need more convincing to do it now.

Thanks,
Bill



Jakub



Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-02-27 Thread Mihail Ionescu

Hi Kyrill,

On 02/27/2020 11:09 AM, Kyrill Tkachov wrote:

Hi Mihail,

On 2/27/20 10:27 AM, Mihail Ionescu wrote:

Hi,

This patch adds support for the bf16 vector create, get, set,
duplicate and reinterpret intrinsics.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-27  Mihail Ionescu  

    * (__ARM_NUM_LANES, __arm_lane, __arm_lane_q): Move to the
    beginning of the file.
    (vcreate_bf16, vcombine_bf16): New.
    (vdup_n_bf16, vdupq_n_bf16): New.
    (vdup_lane_bf16, vdup_laneq_bf16): New.
    (vdupq_lane_bf16, vdupq_laneq_bf16): New.
    (vduph_lane_bf16, vduph_laneq_bf16): New.
    (vset_lane_bf16, vsetq_lane_bf16): New.
    (vget_lane_bf16, vgetq_lane_bf16): New.
    (vget_high_bf16, vget_low_bf16): New.
    (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New.
    (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New.
    (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New.
    (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New.
    (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New.
    (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New.
    (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New.
    (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New.
    (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New.
    (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New.
    (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New.
    (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New.
    (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New.
    (vreinterpretq_bf16_p128): New.
    (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New.
    (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New.
    (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New.
    (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New.
    (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New.
    (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New.
    (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New.
    (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New.
    (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New.
    (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New.
    (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New.
    (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New.
    (vreinterpretq_p128_bf16): New.
    * config/arm/arm_neon_builtins.def (VDX): Add V4BF.
    (V_elem): Likewise.
    (V_elem_l): Likewise.
    (VD_LANE): Likewise.
    (VQX): Add V8BF.
    (V_DOUBLE): Likewise.
    (VDQX): Add V4BF and V8BF.
    (V_two_elem, V_three_elem, V_four_elem): Likewise.
    (V_reg): Likewise.
    (V_HALF): Likewise.
    (V_double_vector_mode): Likewise.
    (V_cmp_result): Likewise.
    (V_uf_sclr): Likewise.
    (V_sz_elem): Likewise.
    (Is_d_reg): Likewise.
    (V_mode_nunits): Likewise.
    * config/arm/neon.md (neon_vdup_lane): Enable for BFloat.

gcc/testsuite/ChangeLog:

2020-02-27  Mihail Ionescu  

    * gcc.target/arm/bf16_dup.c: New test.
    * gcc.target/arm/bf16_reinterpret.c: Likewise.

Is it ok for trunk?


This looks mostly ok with a few nits...




Regards,
Mihail


### Attachment also inlined for ease of reply ###



diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 
09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 
100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,15 @@ extern "C" {
 #include 
 #include 

+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
+#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))

+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
+#endif
+
 typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
@@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   /* For big-endian, GCC's vector indices are reversed within each 64
  bits compared to the architectural lane indices used by Neon
  intrinsics.  */



Please move this comment as well.



-#ifdef __ARM_BIG_ENDIAN
-#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
-#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
-#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
-#else
-#define __arm_lane(__vec, __idx) __idx
-#define __arm_laneq(__vec, __idx) __idx
-#endif

 #define vget_lane_f16(__v, __idx)   \
__extension__ \
@@ -14476,6 +14477,15 @@ vreinterpret_p16_u32 (uint32x2_t __a)
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined 

Re: [GCC] Fix misleading aarch64 mcpu/march warning string

2020-02-27 Thread Kyrill Tkachov

Hi Joel,

On 2/27/20 2:31 PM, Joel Hutton wrote:

The message for conflicting mcpu and march previously printed the
architecture of the CPU instead of the CPU name, as well as omitting the
extensions to the march string. This patch corrects both errors. This
patch fixes PR target/87612.


before:
$ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve
-mcpu=cortex-a76 foo.c

cc1: warning: switch '-mcpu=armv8.2-a' conflicts with '-march=armv8-a'
switch

after:
$ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve
-mcpu=cortex-a76 foo.c

cc1: warning: switch '-mcpu=cortex-a76' conflicts with
'-march=armv8-a+sve' switch


gcc/ChangeLog:

2020-02-27  Joel Hutton  
    PR target/87612
    * config/aarch64/aarch64.c (aarch64_override_options): Fix
misleading warning string.



Newline after the Name/email line in the ChangeLog.

This is okay for trunk.

Do you have commit access?

If not, please follow the steps at 
https://gcc.gnu.org/gitwrite.html#authenticated listing myself as approver.


Then you can commit this yourself.

Thanks,

Kyrill



[PATCH] Fix broken assert

2020-02-27 Thread Nathan Sidwell
In implementing Jason's suggested direction for 93933, the compiler 
exploded in a surprising way.  Turns out an assert had been passing 
NULLS to comptypes, and therefore not checking what it intended.


Further, comptypes could silently accept such nulls under most 
circumstances.


Applying this to fix the original assert, and to assert that nulls never 
make it to comptypes.
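
A minimal sketch of the failure mode, not the GCC code itself.  ParmNode,
Type, same_type and the check functions are hypothetical stand-ins for the
parameter-list node, its TREE_TYPE / TREE_VALUE fields and same_type_p.  It
assumes the wrong accessor yields null for both operands and that the
comparison short-circuits on pointer identity, so the original-style check
passes without inspecting any type:

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's tree nodes; not the real data structures.
struct Type { int id; };

struct ParmNode {
  const Type *type;   // models TREE_TYPE: assumed unset (null) on a list node
  const Type *value;  // models TREE_VALUE: the parameter's actual type
};

// Models a comparison that short-circuits on pointer identity, so two
// nulls compare "equal" without any type ever being inspected.
inline bool same_type(const Type *a, const Type *b) {
  if (a == b)
    return true;
  if (!a || !b)
    return false;
  return a->id == b->id;
}

// Checked variant: reject nulls up front, as the patch does for comptypes.
inline bool same_type_checked(const Type *a, const Type *b) {
  assert(a && b);
  return same_type(a, b);
}

// Original-style check: reads the wrong field, so it is vacuously true.
inline bool vacuous_check(const ParmNode &x, const ParmNode &y) {
  return same_type(x.type, y.type);
}

// Corrected check: compares the fields the assert meant to compare.
inline bool intended_check(const ParmNode &x, const ParmNode &y) {
  return same_type_checked(x.value, y.value);
}
```

With two nodes holding different value types, vacuous_check still returns
true while intended_check returns false, which is the gap the patch closes.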


nathan
--
Nathan Sidwell
2020-02-27  Nathan Sidwell  

	* class.c (adjust_clone_args): Correct arg-checking assert.
	* typeck.c (comptypes): Assert not nulls.

diff --git i/gcc/cp/class.c w/gcc/cp/class.c
index 6b779da0495..b3787f75d7b 100644
--- i/gcc/cp/class.c
+++ w/gcc/cp/class.c
@@ -4900,8 +4900,8 @@ adjust_clone_args (tree decl)
 	  break;
 	}
 
-	  gcc_assert (same_type_p (TREE_TYPE (decl_parms),
-   TREE_TYPE (clone_parms)));
+	  gcc_checking_assert (same_type_p (TREE_VALUE (decl_parms),
+	TREE_VALUE (clone_parms)));
 
 	  if (TREE_PURPOSE (decl_parms) && !TREE_PURPOSE (clone_parms))
 	{
diff --git i/gcc/cp/typeck.c w/gcc/cp/typeck.c
index 103a1a439ec..42d0b47cf1b 100644
--- i/gcc/cp/typeck.c
+++ w/gcc/cp/typeck.c
@@ -1483,10 +1483,13 @@ structural_comptypes (tree t1, tree t2, int strict)
 bool
 comptypes (tree t1, tree t2, int strict)
 {
+  gcc_checking_assert (t1 && t2);
+
   if (strict == COMPARE_STRICT && comparing_specializations
   && (t1 != TYPE_CANONICAL (t1) || t2 != TYPE_CANONICAL (t2)))
 /* If comparing_specializations, treat dependent aliases as distinct.  */
 strict = COMPARE_STRUCTURAL;
+
   if (strict == COMPARE_STRICT)
 {
   if (t1 == t2)


[GCC] Fix misleading aarch64 mcpu/march warning string

2020-02-27 Thread Joel
The message for conflicting mcpu and march previously printed the
architecture of the CPU instead of the CPU name, as well as omitting the
extensions to the march string. This patch corrects both errors. This
patch fixes PR target/87612.


before:
$ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve
-mcpu=cortex-a76 foo.c

cc1: warning: switch '-mcpu=armv8.2-a' conflicts with '-march=armv8-a'
switch

after:
$ aarch64-unknown-linux-gnu-gcc -S -O3 -march=armv8-a+sve
-mcpu=cortex-a76 foo.c

cc1: warning: switch '-mcpu=cortex-a76' conflicts with
'-march=armv8-a+sve' switch
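
The difference between the two messages can be sketched as follows.  This is
an illustration only: the Selection struct and its field names are
hypothetical and do not correspond to the variables in aarch64.c; only the
message text is taken from the output above.

```cpp
#include <string>

// Hypothetical model of what the driver knows; field names are illustrative.
struct Selection {
  std::string cpu_option;      // what the user wrote after -mcpu=
  std::string arch_option;     // what the user wrote after -march=
  std::string cpu_base_arch;   // architecture implemented by that CPU
  std::string arch_base_name;  // -march= value with extensions stripped
};

// Old behaviour: prints derived names, hiding what the user actually typed.
inline std::string old_warning(const Selection &s) {
  return "switch '-mcpu=" + s.cpu_base_arch + "' conflicts with '-march=" +
         s.arch_base_name + "' switch";
}

// New behaviour: echoes the option strings exactly as given.
inline std::string new_warning(const Selection &s) {
  return "switch '-mcpu=" + s.cpu_option + "' conflicts with '-march=" +
         s.arch_option + "' switch";
}
```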


gcc/ChangeLog:

2020-02-27  Joel Hutton  
PR target/87612
* config/aarch64/aarch64.c (aarch64_override_options): Fix
misleading warning string.
From 67e2be75db63238bb8d4418db70fb5876465f9f7 Mon Sep 17 00:00:00 2001
From: Joel Hutton 
Date: Thu, 27 Feb 2020 12:02:09 +
Subject: [PATCH] Fix aarch64 warning for conflicting mcpu/march

The message for conflicting cpu and march previously printed the
architecture of the CPU instead of the CPU name, as well as omitting the
extensions to the march string. This patch corrects both errors.
---
 gcc/config/aarch64/aarch64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f53c98e73765387974cc14f3d3ab4840a9331a08..4b9747b4c5e70432e900b4087eaefab6da6e162a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14131,8 +14131,8 @@ aarch64_override_options (void)
   if (selected_arch->arch != selected_cpu->arch)
 	{
 	  warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
-		   all_architectures[selected_cpu->arch].name,
-		   selected_arch->name);
+		   aarch64_cpu_string,
+		   aarch64_arch_string);
 	}
   aarch64_isa_flags = arch_isa;
   explicit_arch = selected_arch->arch;
-- 
2.17.1



Re: GLIBC libmvec status

2020-02-27 Thread Bill Schmidt



On 2/27/20 4:52 AM, Segher Boessenkool wrote:

On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:

The reason that homogeneous aggregates matter (at least somewhat) is that
the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
mangling formula that includes the length of parameters and return values.
Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
has a superior one, we chose to use ELFv2's calling convention and make use
of homogeneous aggregates for return values in registers for the case of
vectorized sincos.

Please look at the document to see the constraints we're under to fit into
the different OpenMP clauses and attributes.  It seems to me that we can
only define this for both powerpc64 and powerpc64le by establishing two
different calling conventions, which provides two different vector length
calculations for the sincos return value, and therefore requires two
different function implementations with different mangled names.  (Either
that, or we cripple vectorized sincos by requiring it to return values
through memory.)

I still don't see it.  For all ABIs the length of the arguments and
return value is the same, and homogeneous aggregates don't factor
in at all; that is just a detail whether something is passed in
registers or memory (as we have with many other ABIs as well, fwiw).

So why make this part of the mangling rules?

It is perfectly fine to design this with ELFv2 in mind, of course, but
making a dependency on the (current!) (very complex!) ELFv2 rules for
absolutely no reason at all is a mistake, in my opinion.


Upon reflection, I agree.  Bert, we need to make changes to the document to
reflect this:

(1) "Calling convention" should refer to ELFv1 for powerpc64 and ELFv2 for
powerpc64le.
(2) "Vector Length" should remove bullet 3, strike the word
"nonhomogeneous" in bullet 4, and strike the parenthetical clause in
bullet 4.
(3) "Ordering of Vector Arguments" should remove the example involving
homogeneous aggregates.

It also occurs to me that for bullets 4 and 5 in "Vector Length", the
CDT should be long long, not int, since we pass aggregates in pieces in
64-bit registers and/or chunks of memory.

Other small bugs:
 - Bullet 4 says "the CDT determine by a) or b) above", but the referents
should be "(1) or (2)" instead.
 - First line of "Compiler generated variants of vector functions" has
a typo ("umasked").

Segher, thanks for smacking my recalcitrant head until it understands...

Thanks,
Bill




Segher


Re: [PATCH, v3] wwwdocs: e-mail subject lines for contributions

2020-02-27 Thread Nathan Sidwell

On 2/3/20 6:41 AM, Richard Earnshaw (lists) wrote:

On 22/01/2020 17:45, Richard Earnshaw (lists) wrote:


[updated based on v2 discussions]

This patch proposes some new (additional) rules for email subject lines
when contributing to GCC.  The goal is to make sure that, as far as
possible, the subject for a patch will form a good summary when the
message is committed to the repository if applied with 'git am'.  Where
possible, I've tried to align these rules with those already in
use for glibc, so that the differences are minimal and only where
necessary.

Some things that differ from existing practice (at least by some people)
are:

- Use ':' rather than '[]'
   - This is more git friendly and works with 'git am'.
- Put bug numbers at the end of the line rather than the beginning.
   - The bug number is useful, but not as useful as the brief summary.
 Also, use the shortened form, as the topic part is more usefully
 conveyed in the proper topic field (see above).


I've not seen any follow-up to this version.  Should we go ahead and 
adopt this?


do it!

do it! do it! do it!

nathan
--
Nathan Sidwell


[PATCH] Improvements to valid range checks in debug mode

2020-02-27 Thread Jonathan Wakely

These should wait for stage 1 but I'm posting them now for comment.

With the change to __gnu_debug::__valid_range we now get a debug
assertion for:

  std::string s;
  std::min_element(std::string::iterator{}, s.end());

where previously it would just crash with undefined behaviour.

commit 77a610b7e88635ee7c63d82cc30fad9c80abebea
Author: Jonathan Wakely 
Date:   Thu Feb 27 11:17:31 2020 +

libstdc++: Minor optimization for min/max/minmax

By calling the internal __min_element (or __max_element or
__minmax_element) function directly we avoid a function call and the
valid range checks that are redundant when the range is defined by an
initializer_list.

* include/bits/stl_algo.h (min(initializer_list))
(min(initializer_list, Compare)): Call __min_element directly to
avoid redundant debug checks for valid ranges.
(max(initializer_list), max(initializer_list, Compare)):
Likewise, for __max_element.
(minmax(initializer_list), minmax(initializer_list, Compare)):
Likewise, for __minmax_element.
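
For reference, a small sketch of the public overloads this commit touches.
The wrapper names are made up for the example; the point is only that both
ends of the range come from a single initializer_list object, so the pair
cannot be an invalid range (the list must still be non-empty):

```cpp
#include <algorithm>
#include <initializer_list>
#include <utility>

// Hypothetical wrappers around the initializer_list overloads.  The range
// [l.begin(), l.end()) is derived from one object, so a valid-range check
// on it cannot fail.
inline int smallest(std::initializer_list<int> l) { return std::min(l); }
inline int largest(std::initializer_list<int> l) { return std::max(l); }

inline std::pair<int, int> both(std::initializer_list<int> l) {
  return std::minmax(l);
}
```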

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 6503d1518d3..d5eed9c47f6 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3543,38 +3543,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    __gnu_cxx::__ops::__iter_comp_iter(__comp));
 }
 
-  // N2722 + DR 915.
-  template<typename _Tp>
-_GLIBCXX14_CONSTEXPR
-inline _Tp
-min(initializer_list<_Tp> __l)
-{ return *std::min_element(__l.begin(), __l.end()); }
-
-  template<typename _Tp, typename _Compare>
-_GLIBCXX14_CONSTEXPR
-inline _Tp
-min(initializer_list<_Tp> __l, _Compare __comp)
-{ return *std::min_element(__l.begin(), __l.end(), __comp); }
-
-  template<typename _Tp>
-_GLIBCXX14_CONSTEXPR
-inline _Tp
-max(initializer_list<_Tp> __l)
-{ return *std::max_element(__l.begin(), __l.end()); }
-
-  template<typename _Tp, typename _Compare>
-_GLIBCXX14_CONSTEXPR
-inline _Tp
-max(initializer_list<_Tp> __l, _Compare __comp)
-{ return *std::max_element(__l.begin(), __l.end(), __comp); }
-
   template<typename _Tp>
 _GLIBCXX14_CONSTEXPR
 inline pair<_Tp, _Tp>
 minmax(initializer_list<_Tp> __l)
 {
+  __glibcxx_requires_irreflexive(__l.begin(), __l.end());
   pair<const _Tp*, const _Tp*> __p =
-	std::minmax_element(__l.begin(), __l.end());
+	std::__minmax_element(__l.begin(), __l.end(),
+			  __gnu_cxx::__ops::__iter_less_iter());
   return std::make_pair(*__p.first, *__p.second);
 }
 
@@ -3583,8 +3560,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline pair<_Tp, _Tp>
 minmax(initializer_list<_Tp> __l, _Compare __comp)
 {
+  __glibcxx_requires_irreflexive_pred(__l.begin(), __l.end(), __comp);
   pair<const _Tp*, const _Tp*> __p =
-	std::minmax_element(__l.begin(), __l.end(), __comp);
+	std::__minmax_element(__l.begin(), __l.end(),
+			  __gnu_cxx::__ops::__iter_comp_iter(__comp));
   return std::make_pair(*__p.first, *__p.second);
 }
 
@@ -3959,7 +3938,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   for (_RandomAccessIterator __i = __first + 1; __i != __last; ++__i)
 	std::iter_swap(__i, __first + __d(__g, __p_type(0, __i - __first)));
 }
-#endif
+#endif // USE C99_STDINT
 
 #endif // C++11
 
@@ -5902,6 +5881,49 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 __gnu_cxx::__ops::__iter_comp_iter(__comp));
 }
 
+#if __cplusplus >= 201103L
+  // N2722 + DR 915.
+  template<typename _Tp>
+_GLIBCXX14_CONSTEXPR
+inline _Tp
+min(initializer_list<_Tp> __l)
+{
+  __glibcxx_requires_irreflexive(__l.begin(), __l.end());
+  return *_GLIBCXX_STD_A::__min_element(__l.begin(), __l.end(),
+	  __gnu_cxx::__ops::__iter_less_iter());
+}
+
+  template<typename _Tp, typename _Compare>
+_GLIBCXX14_CONSTEXPR
+inline _Tp
+min(initializer_list<_Tp> __l, _Compare __comp)
+{
+  __glibcxx_requires_irreflexive_pred(__l.begin(), __l.end(), __comp);
+  return *_GLIBCXX_STD_A::__min_element(__l.begin(), __l.end(),
+	  __gnu_cxx::__ops::__iter_comp_iter(__comp));
+}
+
+  template<typename _Tp>
+_GLIBCXX14_CONSTEXPR
+inline _Tp
+max(initializer_list<_Tp> __l)
+{
+  __glibcxx_requires_irreflexive(__l.begin(), __l.end());
+  return *_GLIBCXX_STD_A::__max_element(__l.begin(), __l.end(),
+	  __gnu_cxx::__ops::__iter_less_iter());
+}
+
+  template<typename _Tp, typename _Compare>
+_GLIBCXX14_CONSTEXPR
+inline _Tp
+max(initializer_list<_Tp> __l, _Compare __comp)
+{
+  __glibcxx_requires_irreflexive_pred(__l.begin(), __l.end(), __comp);
+  return *_GLIBCXX_STD_A::__max_element(__l.begin(), __l.end(),
+	  __gnu_cxx::__ops::__iter_comp_iter(__comp));
+}
+#endif // C++11
+
 #if __cplusplus >= 201402L
   /// Reservoir sampling algorithm.
   template
Date:   Thu Feb 27 11:20:54 2020 +

libstdc++: Improve check for valid forward iterator range

Since C++14 we can assume that value-initialized forward iterators are
not part of a valid range (except the special case of an empty range
defined by two value-initialized 

[committed] libstdc++: Support N3644 "Null Forward Iterators" for testsuite iterators

2020-02-27 Thread Jonathan Wakely
Comparing value-initialized forward_iterator_wrapper objects fails an
assertion, but should be valid in C++14 and later.

* testsuite/util/testsuite_iterators.h (forward_iterator_wrapper): Add
equality comparisons that support value-initialized iterators.

Tested powerpc64le-linux, committed to master.
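
A toy model of the comparison rule, assuming a wrapper that stores an owner
pointer (playing the role of SharedInfo) and a position.  Bounds and FwdIter
are hypothetical names; this is not the testsuite class, only a sketch of the
same logic:

```cpp
#include <cassert>

// Hypothetical stand-ins: `owner` plays the role of SharedInfo (the bounds
// of the underlying sequence) and `ptr` is the current position.
struct Bounds { const int *first; const int *last; };

struct FwdIter {
  const Bounds *owner;
  const int *ptr;

  bool operator==(const FwdIter &other) const noexcept {
    // A value-initialized iterator has no owner; it compares equal only
    // to another value-initialized iterator.
    if (owner == nullptr || other.owner == nullptr)
      return owner == other.owner && ptr == other.ptr;
    // Otherwise both iterators must refer to the same sequence.
    assert(owner == other.owner);
    return ptr == other.ptr;
  }

  bool operator!=(const FwdIter &other) const noexcept {
    return !(*this == other);
  }
};
```

Here FwdIter{} == FwdIter{} holds, and comparing a value-initialized iterator
with one that points into a sequence yields false rather than tripping the
same-owner assertion.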


commit e94f2542305ccb5c4a3c4e5e8212713747623417
Author: Jonathan Wakely 
Date:   Thu Feb 27 13:01:14 2020 +

libstdc++: Support N3644 "Null Forward Iterators" for testsuite iterators

Comparing value-initialized forward_iterator_wrapper objects fails an
assertion, but should be valid in C++14 and later.

* testsuite/util/testsuite_iterators.h (forward_iterator_wrapper): 
Add
equality comparisons that support value-initialized iterators.

diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index 7b7093919b7..417dff23c50 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -337,6 +337,26 @@ namespace __gnu_test
   ++*this;
   return tmp;
 }
+
+#if __cplusplus >= 201402L
+bool
+operator==(const forward_iterator_wrapper& it) const noexcept
+{
+  // Since C++14 value-initialized forward iterators are comparable.
+  if (this->SharedInfo == nullptr || it.SharedInfo == nullptr)
+   return this->SharedInfo == it.SharedInfo && this->ptr == it.ptr;
+
+  const input_iterator_wrapper& base_this = *this;
+  const input_iterator_wrapper& base_that = it;
+  return base_this == base_that;
+}
+
+bool
+operator!=(const forward_iterator_wrapper& it) const noexcept
+{
+  return !(*this == it);
+}
+#endif
   };
 
   /**


Re: [PATCH] tree-optimization/93508 - make VN translate through _chk and valueize length

2020-02-27 Thread Richard Biener
On Thu, 30 Jan 2020, Richard Biener wrote:

> 
> Value-numbering failed to handle __builtin_{memcpy,memset,...}_chk
> variants when removing abstraction and also failed to use the
> value-numbering lattice when requiring the length argument of the
> call to be constant.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for GCC 11
> unless somebody thinks we want this right now.

I decided to push this now in the wake of the VN changes going in.

Richard.


Re: patch to fix PR93564

2020-02-27 Thread Andrew Stubbs

On 26/02/2020 15:16, Andrew Stubbs wrote:
The problem appears to be that the high-part of a register pair is not 
marked as "ever live".  I'm trying to figure out whether this is some 
kind of target-specific issue that has merely been exposed, but it's 
difficult to see what's going on. I'm pretty sure I've never seen this 
one before.


I'm now pretty sure your patch didn't cause this issue so much as expose it.

Either way, it's fixed now.

Andrew



[committed] amdgcn: fix ICE on subreg of BI reg

2020-02-27 Thread Andrew Stubbs
This patch fixes an LRA ICE that was exposed by another patch on Sunday. 
I can't see any reason why that patch should cause an ICE, so presumably 
it merely perturbed something.


The problem was that LRA with checking enabled was confirming that all 
the registers it had allocated are considered "ever live", and found 
that the high-part of the VCC register was not live.


The reason was that it had created an instruction like this:

  (set (reg:SI s2)
   (subreg:SI (reg:BI VCC) 0))

This seems like it ought to be fine, except that gcn.c defines (reg:BI 
VCC) to have nregs == 2, whereas (reg:SI VCC) nregs == 1.  The checking 
code uses nregs from the inner register mode, and DF uses nregs from the 
outer subreg mode, and the mismatch causes the ICE.


Using 64-bits for a BImode register is unusual but makes sense because 
instructions writing BImode condition codes to VCC will normally clobber 
the entire DImode register pair, whereas SImode register modes only 
touch one of the two registers.


The solution is to transform the instruction like this:

  (set (subreg:BI (reg:SI s2) 0)
   (reg:BI VCC))

This says approximately the same thing, but now nregs is firmly "2", and 
the ICE goes away.
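
The disagreement can be modelled in a few lines.  This is a toy model under
stated assumptions: the enum, the struct and the function names are invented
for the illustration, and the only facts taken from the description above are
that BImode on VCC counts as two registers, SImode as one, and that one
consumer looks at the inner mode while the other looks at the outer mode.

```cpp
// Toy model: names and numbers are illustrative assumptions, not gcn.c.
enum class Mode { BI, SI };

// On VCC, BImode claims the whole register pair; SImode claims one register.
constexpr int nregs_on_vcc(Mode m) { return m == Mode::BI ? 2 : 1; }

struct Access {
  Mode inner;  // mode of the register itself
  Mode outer;  // mode it is accessed in (equal to inner when no subreg)
};

// One consumer counts registers by the inner mode, the other by the outer.
constexpr int nregs_inner_view(Access a) { return nregs_on_vcc(a.inner); }
constexpr int nregs_outer_view(Access a) { return nregs_on_vcc(a.outer); }

constexpr bool views_agree(Access a) {
  return nregs_inner_view(a) == nregs_outer_view(a);
}

// (subreg:SI (reg:BI VCC) 0): inner mode BI, outer mode SI.
constexpr Access before{Mode::BI, Mode::SI};
// After the transform, VCC is read as plain (reg:BI VCC).
constexpr Access after{Mode::BI, Mode::BI};

static_assert(!views_agree(before), "2 vs 1: the mismatch behind the ICE");
static_assert(views_agree(after), "both views now count 2 registers");
```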


Andrew
amdgcn: fix ICE on subreg of BI reg.

BImode usually only requires one bit, but instructions that write to VCC also
clobber the rest of the DImode register pair, so gcn_class_max_nregs reports
that two registers are needed for BImode.  Paradoxically, accessing VCC via
SImode therefore uses fewer registers than accessing via BImode.

The LRA checking code takes this into account, but the DF liveness data also
looks at the subreg, so it says (subreg:SI (reg:BI VCC) 0) only makes the low
part live.  Both are "correct", but they disagree, which causes an ICE.

This doesn't happen when writing conditions to VCC; it happens when accessing
VCC_LO via a regular move to a regular SImode register.

If we transform the subregs so that BImode is always the outer mode then it
basically means the same thing, except that now both LRA and DF calculate nregs
the same way, and the ICE goes away.

As soon as LRA is done the subregs all evaporate anyway.

2020-02-27  Andrew Stubbs  

	gcc/
	* config/gcn/gcn.md (mov): Add transformations for BI subregs.

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index b527d9a7a8b..d8b49dfd640 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -395,6 +395,31 @@
 	(match_operand:MOV_MODE 1 "general_operand"))]
   ""
   {
+if (SUBREG_P (operands[1])
+	&& GET_MODE (operands[1]) == SImode
+	&& GET_MODE (SUBREG_REG (operands[1])) == BImode)
+{
+  /* (reg:BI VCC) has nregs==2 to ensure it gets clobbered as a whole,
+	 but (subreg:SI (reg:BI VCC)) doesn't, which causes the LRA liveness
+	 checks to assert.  Transform this:
+	   (set (reg:SI) (subreg:SI (reg:BI)))
+	 to this:
+	   (set (subreg:BI (reg:SI)) (reg:BI))  */
+  operands[0] = gen_rtx_SUBREG (BImode, operands[0], 0);
+  operands[1] = SUBREG_REG (operands[1]);
+}
+if (SUBREG_P (operands[0])
+	&& GET_MODE (operands[0]) == SImode
+	&& GET_MODE (SUBREG_REG (operands[0])) == BImode)
+  {
+	/* Likewise, transform this:
+	 (set (subreg:SI (reg:BI)) (reg:SI))
+	   to this:
+	 (set (reg:BI) (subreg:BI (reg:SI))) */
+	operands[0] = SUBREG_REG (operands[0]);
+	operands[1] = gen_rtx_SUBREG (BImode, operands[1], 0);
+  }
+
 if (MEM_P (operands[0]))
   operands[1] = force_reg (mode, operands[1]);
 


Re: [PATCH Coroutines]Insert the default return_void call at correct position

2020-02-27 Thread Nathan Sidwell

On 2/3/20 12:55 AM, bin.cheng wrote:

Hi,

Exceptions in a coroutine are not handled correctly because the default
return_void call is currently inserted before the final suspend point,
rather than at the end of the original coroutine body.  This patch
fixes the issue by generating the following code:
   co_await promise.initial_suspend();
   try {
     // The original coroutine body

     promise.return_void(); // The default return_void call.
   } catch (...) {
     promise.unhandled_exception();
   }
   final_suspend:
   // ...

Bootstrap and test on x86_64.  Is it OK?

Thanks,
bin

gcc/cp
2020-02-03  Bin Cheng  

 * coroutines.cc (build_actor_fn): Factor out code inserting the
 default return_void call to...
 (morph_fn_to_coro): ...here, also hoist local var declarations.

gcc/testsuite
2020-02-03  Bin Cheng  

 * g++.dg/coroutines/torture/co-ret-15-default-return_void.C: New.


ok, thanks!

nathan

--
Nathan Sidwell


[Patch, fortran] PR fortran/93957 - [10 Regression] ICE (regression) passing assumed rank arrays with bind(c)

2020-02-27 Thread José Rui Faustino de Sousa

Hi all!

Proposed patch to solve ICE.

Patch tested only on x86_64-pc-linux-gnu.

The code currently calls gfc_trans_deferred_array even when it is not 
necessary, triggering an assertion error inside gfc_trans_deferred_array.


Please notice the addition of "sym->ts.type == BT_CLASS" to the 
definition of "alloc_comp_or_fini". Instead of only accepting BT_DERIVED 
it will now also accept BT_CLASS types. It seems to be missing but I may 
be wrong.


Thank you very much.

Best regards,
José Rui

2020-2-27  José Rui Faustino de Sousa  

 PR fortran/93957
 * trans-decl.c (gfc_trans_deferred_vars): Change definition of
 alloc_comp_or_fini logical variable to also accept class type.
 Add if clause guarding the call to gfc_trans_deferred_array.

2020-2-27  José Rui Faustino de Sousa  

 PR fortran/92621
 * PR93957.f90: New test.


diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index e91a279..822cb3e 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -4645,7 +4645,7 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)

   for (sym = proc_sym->tlink; sym != proc_sym; sym = sym->tlink)
 {
-  bool alloc_comp_or_fini = (sym->ts.type == BT_DERIVED)
+  bool alloc_comp_or_fini = (sym->ts.type == BT_DERIVED || sym->ts.type == BT_CLASS)
&& (sym->ts.u.derived->attr.alloc_comp
|| gfc_is_finalizable (sym->ts.u.derived,
   NULL));
@@ -4859,8 +4859,11 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)

case AS_ASSUMED_RANK:
case AS_DEFERRED:
- seen_trans_deferred_array = true;
- gfc_trans_deferred_array (sym, block);
+	  if (sym->attr.pointer || sym->attr.allocatable || alloc_comp_or_fini)
+   {
+ seen_trans_deferred_array = true;
+ gfc_trans_deferred_array (sym, block);
+   }
  if (sym->ts.type == BT_CHARACTER && sym->ts.deferred
  && sym->attr.result)
{
diff --git a/gcc/testsuite/gfortran.dg/PR93957.f90 b/gcc/testsuite/gfortran.dg/PR93957.f90
new file mode 100644
index 000..c403e15
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR93957.f90
@@ -0,0 +1,39 @@
+! { dg-do run }
+!
+! PR fortran/93957
+!
+
+function f_ice(this) result(that) bind(c)
+  use, intrinsic :: iso_c_binding, only: c_int
+
+  implicit none
+
+  integer(kind=c_int), intent(in) :: this(..)
+  integer(kind=c_int) :: that
+
+  that = size(this)
+  return
+end function f_ice
+
+program ice_p
+
+  use, intrinsic :: iso_c_binding, only: c_int
+
+  implicit none
+
+  interface
+function f_ice(this) result(that) bind(c)
+  use, intrinsic :: iso_c_binding, only: c_int
+  integer(kind=c_int), intent(in) :: this(..)
+  integer(kind=c_int) :: that
+end function f_ice
+  end interface
+
+  integer(kind=c_int), parameter :: n = 10
+
+  integer(kind=c_int) :: intp(n)
+
+  if(size(intp)/=n)  stop 1
+  if(f_ice(intp)/=n) stop 2
+
+end program ice_p


Re: GLIBC libmvec status

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 11:56:49AM +0100, Richard Biener wrote:
> > > This calling convention would also be useful in the future for vectorizing
> > > functions that return complex values either by value or by reference.
> >
> > Only by value, you really don't know what the code does if something is
> > passed by reference, whether it is read, written into, or both etc.
> > And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> > pass them, just GCC isn't able to do that right now.
> 
> Ah, ok.  So what's missing is the standard function cexpi both GCC and
> libmvec can use.

That, plus adjust omp-simd-clone.c and the backends so that they do support
the complex modes and essentially transform those into passing/returning of
either vector of the complex elts with twice as many subparts, or twice as
many vectors, like e.g. the Intel ABI specifies.  E.g. for return type
adjustment, right now we have:
  t = TREE_TYPE (TREE_TYPE (fndecl));
  if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
    veclen = node->simdclone->vecsize_int;
  else
    veclen = node->simdclone->vecsize_float;
  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
  if (veclen > node->simdclone->simdlen)
    veclen = node->simdclone->simdlen;
  if (POINTER_TYPE_P (t))
    t = pointer_sized_int_node;
  if (veclen == node->simdclone->simdlen)
    t = build_vector_type (t, node->simdclone->simdlen);
  else
    {
      t = build_vector_type (t, veclen);
      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
    }
and we'd need to deal with the complex types accordingly.
And of course then to teach the vectorizer.

The Intel ABI e.g. for SSE2 (their 'x' letter, which roughly matches our 'b'
letter) they have:
                sizeof  VLEN=2   VLEN=4   VLEN=8   VLEN=16
float           4       1*MS128  1*MS128  2*MS128  4*MS128
double          8       1*MD128  2*MD128  4*MD128  8*MD128
float complex   8       1*MS128  2*MS128  4*MS128  8*MS128
double complex  16      2*MD128  4*MD128  8*MD128  16*MD128
__attribute__((vector_size (16))) and double __attribute__((vector_size (16))).
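
The arithmetic behind that table can be sketched with plain integers.  This
is only an illustration: Layout and simd_layout are invented names, and
treating a complex value as a single element of twice the scalar size is an
assumption of the sketch (it happens to reproduce the vector counts in the
SSE2 table), not something omp-simd-clone.c does today.

```cpp
// Sketch only: mirrors the veclen computation with plain integers.
struct Layout {
  int lanes_per_vector;  // veclen after clamping to simdlen
  int num_vectors;       // vectors needed to carry one argument or result
};

// vecsize_bits: vector register width (128 for SSE2)
// elem_bits:    size of one element; for complex types this sketch assumes
//               the real/imag pair is one element of twice the scalar size
// simdlen:      VLEN of the clone
inline Layout simd_layout(int vecsize_bits, int elem_bits, int simdlen) {
  int veclen = vecsize_bits / elem_bits;
  if (veclen > simdlen)
    veclen = simdlen;
  Layout result;
  result.lanes_per_vector = veclen;
  // A single vector when everything fits, otherwise an array of vectors.
  result.num_vectors = (veclen == simdlen) ? 1 : simdlen / veclen;
  return result;
}
```

For example, simd_layout (128, 64, 8) gives four vectors, matching both the
double row and the float complex row at VLEN=8.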

I'll need to check ICC on godbolt how they actually pass the complex,
whether it is real0 imag0 real1 imag1 real2 imag2 real3 imag3 or
real0 real1 real2 real3 imag0 imag1 imag2 imag3.

Jakub



Re: [GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-02-27 Thread Kyrill Tkachov

Hi Mihail,

On 2/27/20 10:27 AM, Mihail Ionescu wrote:

Hi,

This patch adds support for the bf16 vector create, get, set,
duplicate and reinterpret intrinsics.
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-02-27  Mihail Ionescu  

    * config/arm/arm_neon.h (__ARM_NUM_LANES, __arm_lane, __arm_laneq): Move
    to the beginning of the file.
    (vcreate_bf16, vcombine_bf16): New.
    (vdup_n_bf16, vdupq_n_bf16): New.
    (vdup_lane_bf16, vdup_laneq_bf16): New.
    (vdupq_lane_bf16, vdupq_laneq_bf16): New.
    (vduph_lane_bf16, vduph_laneq_bf16): New.
    (vset_lane_bf16, vsetq_lane_bf16): New.
    (vget_lane_bf16, vgetq_lane_bf16): New.
    (vget_high_bf16, vget_low_bf16): New.
    (vreinterpret_bf16_u8, vreinterpretq_bf16_u8): New.
    (vreinterpret_bf16_u16, vreinterpretq_bf16_u16): New.
    (vreinterpret_bf16_u32, vreinterpretq_bf16_u32): New.
    (vreinterpret_bf16_u64, vreinterpretq_bf16_u64): New.
    (vreinterpret_bf16_s8, vreinterpretq_bf16_s8): New.
    (vreinterpret_bf16_s16, vreinterpretq_bf16_s16): New.
    (vreinterpret_bf16_s32, vreinterpretq_bf16_s32): New.
    (vreinterpret_bf16_s64, vreinterpretq_bf16_s64): New.
    (vreinterpret_bf16_p8, vreinterpretq_bf16_p8): New.
    (vreinterpret_bf16_p16, vreinterpretq_bf16_p16): New.
    (vreinterpret_bf16_p64, vreinterpretq_bf16_p64): New.
    (vreinterpret_bf16_f32, vreinterpretq_bf16_f32): New.
    (vreinterpret_bf16_f64, vreinterpretq_bf16_f64): New.
    (vreinterpretq_bf16_p128): New.
    (vreinterpret_s8_bf16, vreinterpretq_s8_bf16): New.
    (vreinterpret_s16_bf16, vreinterpretq_s16_bf16): New.
    (vreinterpret_s32_bf16, vreinterpretq_s32_bf16): New.
    (vreinterpret_s64_bf16, vreinterpretq_s64_bf16): New.
    (vreinterpret_u8_bf16, vreinterpretq_u8_bf16): New.
    (vreinterpret_u16_bf16, vreinterpretq_u16_bf16): New.
    (vreinterpret_u32_bf16, vreinterpretq_u32_bf16): New.
    (vreinterpret_u64_bf16, vreinterpretq_u64_bf16): New.
    (vreinterpret_p8_bf16, vreinterpretq_p8_bf16): New.
    (vreinterpret_p16_bf16, vreinterpretq_p16_bf16): New.
    (vreinterpret_p64_bf16, vreinterpretq_p64_bf16): New.
    (vreinterpret_f32_bf16, vreinterpretq_f32_bf16): New.
    (vreinterpretq_p128_bf16): New.
    * config/arm/arm_neon_builtins.def (VDX): Add V4BF.
    (V_elem): Likewise.
    (V_elem_l): Likewise.
    (VD_LANE): Likewise.
    (VQX): Add V8BF.
    (V_DOUBLE): Likewise.
    (VDQX): Add V4BF and V8BF.
    (V_two_elem, V_three_elem, V_four_elem): Likewise.
    (V_reg): Likewise.
    (V_HALF): Likewise.
    (V_double_vector_mode): Likewise.
    (V_cmp_result): Likewise.
    (V_uf_sclr): Likewise.
    (V_sz_elem): Likewise.
    (Is_d_reg): Likewise.
    (V_mode_nunits): Likewise.
    * config/arm/neon.md (neon_vdup_lane): Enable for BFloat.

gcc/testsuite/ChangeLog:

2020-02-27  Mihail Ionescu  

    * gcc.target/arm/bf16_dup.c: New test.
    * gcc.target/arm/bf16_reinterpret.c: Likewise.

Is it ok for trunk?


This looks mostly ok with a few nits...




Regards,
Mihail


### Attachment also inlined for ease of reply ###



diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,15 @@ extern "C" {
 #include 
 #include 

+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
+#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
+#endif
+
 typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
@@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   /* For big-endian, GCC's vector indices are reversed within each 64
  bits compared to the architectural lane indices used by Neon
  intrinsics.  */



Please move this comment as well.
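
For readers following the macros, the index mapping they perform can be
sketched as two small functions.  The function names are illustrative; only
the XOR arithmetic is taken from the patch, and the sketch assumes the lane
count is a power of two.

```cpp
// Sketch of the lane index mapping; names are illustrative only.

// 64-bit vectors: on big-endian, reverse the index across all lanes.
constexpr unsigned lane(unsigned num_lanes, unsigned idx, bool big_endian) {
  return big_endian ? (idx ^ (num_lanes - 1)) : idx;
}

// 128-bit vectors: on big-endian, reverse only within each 64-bit half.
constexpr unsigned laneq(unsigned num_lanes, unsigned idx, bool big_endian) {
  return big_endian ? (idx ^ (num_lanes / 2 - 1)) : idx;
}

static_assert(lane(4, 0, true) == 3, "first lane maps to last");
static_assert(laneq(8, 5, true) == 6, "index stays within the upper half");
```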



-#ifdef __ARM_BIG_ENDIAN
-#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
-#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
-#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
-#else
-#define __arm_lane(__vec, __idx) __idx
-#define __arm_laneq(__vec, __idx) __idx
-#endif

 #define vget_lane_f16(__v, __idx)   \
__extension__ \
@@ -14476,6 +14477,15 @@ vreinterpret_p16_u32 (uint32x2_t __a)
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern 

Re: GLIBC libmvec status

2020-02-27 Thread Richard Biener
On Wed, Feb 26, 2020 at 3:31 PM Jakub Jelinek  wrote:
>
> On Wed, Feb 26, 2020 at 07:55:53AM -0600, Bill Schmidt wrote:
> > The hope is that we can create a vectorized version that returns values
> > in registers rather than the by-ref parameters, and add code to GCC to
> > copy things around correctly following the call.  Ideally the signature of
> > the vectorized version would be sth like
> >
> >   struct retval {vector double, vector double};
> >   retval vecsincos (vector double);
> >
> > In the typical case where calls to sincos are of the form
> >
> >   sincos (val[i], &sinval[i], &cosval[i]);
> >
> > this would allow us to only store the values in the caller upon return,
> > rather than store them in the callee and potentially reload them
> > immediately in the caller.  On some Power CPUs, the latter behavior can
> > result in somewhat costly stalls if the consecutive accesses hit a timing
> > window.
>
> But can't you do
> #pragma omp declare simd linear(sinp, cosp)
> void sincos (double x, double *sinp, double *cosp);
> ?
> That is something the vectorizer code could handle and for
>   for (int i = 0; i < 1024; i++)
>     sincos (val[i], &sinval[i], &cosval[i]);
> just vectorize it as
>   for (int i = 0; i < 1024; i += vf)
>     _ZGVbN8vl8l8_sincos (*(vector double *)&val[i], &sinval[i], &cosval[i]);
> Anything else will need specialized code to handle sincos specially in the
> vectorizer.

I guess we'll need special code in the vectorizer anyway because in
GIMPLE we'll have

  for (int i = 0; i < 1024; i++)
   {
      _Complex double tem = __builtin_cexpi (val[i]);
      sinval[i] = __imag tem;
      cosval[i] = __real tem;
   }

we'd have to promote tem back to memory and the call to
sincos (val[i], &__imag tem, &__real tem) virtually or
explicitly.  The vectorizer is currently not happy seeing
_Complex (but dataref analysis would not be happy to see
sincos).  So we do need changes to the vectorizer.
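
As a scalar illustration of the loop shape described above, here is a small sketch. It is not GCC code: `cexpi_sketch` is a hypothetical stand-in for `__builtin_cexpi` built on `std::polar`, and it relies on the mathematical identity cexpi (x) = cos (x) + i sin (x), so the sine is the imaginary part and the cosine is the real part.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical stand-in for __builtin_cexpi: returns cos (x) + i*sin (x).
inline std::complex<double> cexpi_sketch (double x)
{
  return std::polar (1.0, x);
}

// The loop after sincos has been rewritten in terms of one complex value:
// both results come back by value and are stored by the caller.
inline void sincos_loop (const double *val, double *sinval, double *cosval,
                         std::size_t n)
{
  assert (val != nullptr && sinval != nullptr && cosval != nullptr);
  for (std::size_t i = 0; i < n; i++)
    {
      std::complex<double> tem = cexpi_sketch (val[i]);
      sinval[i] = tem.imag ();
      cosval[i] = tem.real ();
    }
}
```

The point of the sketch is only that the callee returns both results in one value, which is the property a vectorized variant would need to preserve.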

> > If you feel it isn't possible to do this, then we can abandon it.  Right
> > now my understanding is that GCC doesn't vectorize calls to sincos yet
> > for any targets, so it would be moot except that we really should define
> > what happens for the future.
> >
> > This calling convention would also be useful in the future for vectorizing
> > functions that return complex values either by value or by reference.
>
> Only by value, you really don't know what the code does if something is
> passed by reference, whether it is read, written into, or both etc.
> And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> pass them, just GCC isn't able to do that right now.

Ah, ok.  So what's missing is the standard function cexpi both GCC and
libmvec can use.

> > Well, as a matter of practicality, we don't have any of that implemented
> > in the rs6000 back end, and we don't have any free resources to do that
> > in GCC 11.  Is there any documentation about what needs to be done to
> > support this?  I've always been under the impression that vectorizing for
> > masking when there isn't any hardware support is a losing proposition, so
> > we've not investigated it.
>
> You don't need to do pretty much anything, except set
> clonei->mask_mode = VOIDmode; I think the generic code should handle
> everything beyond that, in particular adding the mask argument and using it
> both on the caller side and on the expansion of the to-be-vectorized clone.
>
> Jakub
>


[PATCH] libstdc++: Make _GLIBCXX_CONCEPT_CHECKS more constexpr-friendly

2020-02-27 Thread Jonathan Wakely
Although most of the old-style "concept checks" are only really usable
with C++98 because they enforce the wrong things, this is a simple
change that makes them a bit more useful for C++14 and up.

* include/bits/boost_concept_check.h (__function_requires): Add
_GLIBCXX14_CONSTEXPR.
* testsuite/25_algorithms/min/concept_checks.cc: New test.

Tested powerpc64le-linux. These checks aren't enabled by default so
this is safe to change for stage4. Committed to master.
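
A minimal sketch of why the `constexpr` matters, assuming C++14 or later. The names `function_requires`, `less_than_comparable` and `checked_min` are simplified, hypothetical stand-ins and not the libstdc++ internals; the sketch only shows that a constexpr algorithm can call a checker function when that checker is itself constexpr.

```cpp
#include <cassert>

// Simplified stand-in for the checker: taking the address of the
// constraints member forces it to be instantiated, and so type-checked.
template <class Concept>
constexpr void function_requires ()
{
  void (Concept::*x) () = &Concept::constraints;
  (void) x;
}

// Simplified concept class: compiles only if T supports operator<.
template <class T>
struct less_than_comparable
{
  void constraints ()
  {
    bool r = a < b;
    (void) r;
  }
  T a, b;
};

// A constexpr algorithm that runs the check before doing its work.
template <class T>
constexpr const T &checked_min (const T &a, const T &b)
{
  function_requires<less_than_comparable<T>> ();
  return b < a ? b : a;
}

// Usable in a constant expression because the checker is constexpr too.
constexpr int smaller = checked_min (1, 2);
static_assert (smaller == 1, "checked_min is usable at compile time");
```

If `function_requires` were a plain `inline void` function, the initialization of `smaller` would be rejected, because a constant expression cannot call a non-constexpr function.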

commit eb8e6a30a442c4c12dc903d6e1817b223bbed4a3
Author: Jonathan Wakely 
Date:   Thu Feb 27 10:52:28 2020 +

libstdc++: Make _GLIBCXX_CONCEPT_CHECKS more constexpr-friendly

Although most of the old-style "concept checks" are only really usable
with C++98 because they enforce the wrong things, this is a simple
change that makes them a bit more useful for C++14 and up.

* include/bits/boost_concept_check.h (__function_requires): Add
_GLIBCXX14_CONSTEXPR.
* testsuite/25_algorithms/min/concept_checks.cc: New test.

diff --git a/libstdc++-v3/include/bits/boost_concept_check.h b/libstdc++-v3/include/bits/boost_concept_check.h
index a555b3be568..f12c1bdc213 100644
--- a/libstdc++-v3/include/bits/boost_concept_check.h
+++ b/libstdc++-v3/include/bits/boost_concept_check.h
@@ -57,7 +57,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // as possible at runtime, use as few resources as possible, and hopefully
 // be elided out of existence... hmmm.
 template <class _Concept>
-inline void __function_requires()
+_GLIBCXX14_CONSTEXPR inline void __function_requires()
 {
   void (_Concept::*__x)() _IsUnused = &_Concept::__constraints;
 }
diff --git a/libstdc++-v3/testsuite/25_algorithms/min/concept_checks.cc b/libstdc++-v3/testsuite/25_algorithms/min/concept_checks.cc
new file mode 100644
index 000..ba61ceff340
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/min/concept_checks.cc
@@ -0,0 +1,23 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do compile { target c++14 } }
+
+#define _GLIBCXX_CONCEPT_CHECKS 1
+#include <algorithm>
+
+constexpr int i = std::min(1, 2);


Re: GLIBC libmvec status

2020-02-27 Thread Segher Boessenkool
On Tue, Feb 25, 2020 at 07:43:09PM -0600, Bill Schmidt wrote:
> The reason that homogeneous aggregates matter (at least somewhat) is that
> the ABI ^H^H^H^HAPI requires establishing a calling convention and a name-
> mangling formula that includes the length of parameters and return values.
> Since ELFv2 and ELFv1 do not have the same calling convention, and ELFv2
> has a superior one, we chose to use ELFv2's calling convention and make use
> of homogeneous aggregates for return values in registers for the case of
> vectorized sincos.
> 
> Please look at the document to see the constraints we're under to fit into
> the different OpenMP clauses and attributes.  It seems to me that we can
> only define this for both powerpc64 and powerpc64le by establishing two
> different calling conventions, which provides two different vector length
> calculations for the sincos return value, and therefore requires two
> different function implementations with different mangled names.  (Either
> that, or we cripple vectorized sincos by requiring it to return values
> through memory.)

I still don't see it.  For all ABIs the length of the arguments and
return value is the same, and homogeneous aggregates doesn't factor
in at all; that is just a detail whether something is passed in
registers or memory (as we have with many other ABIs as well, fwiw).

So why make this part of the mangling rules?

It is perfectly fine to design this with ELFv2 in mind, of course, but
making a dependency on the (current!) (very complex!) ELFv2 rules for
absolutely no reason at all is a mistake, in my opinion.


Segher


[PATCH] tree-optimization/93953 - avoid reference into hash-map

2020-02-27 Thread Richard Biener
When possibly expanding a hash-map avoid keeping a reference to an
entry.
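
The hazard can be sketched outside GCC as follows. This is an illustration under stated assumptions: `flat_map` is a hypothetical container backed by `std::vector` (not GCC's `hash_map`), chosen because, like an open-addressing table that expands, it may move its entries on insertion and so invalidate references to them.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical map whose entries live in a std::vector: any insertion may
// reallocate the storage and invalidate references to existing entries.
struct flat_map
{
  std::vector<std::pair<int, int>> entries;

  int &get_or_insert (int key, bool *existed)
  {
    for (auto &e : entries)
      if (e.first == key)
        {
          *existed = true;
          return e.second;
        }
    *existed = false;
    entries.emplace_back (key, 0);
    return entries.back ().second;
  }
};

// Memoised "copy" of the chain KEY, KEY - 1, ..., 0.  Returns the value
// recorded for KEY.
inline int copy_chain (int key, flat_map &map)
{
  assert (key >= 0);
  bool existed;
  int &slot_ref = map.get_or_insert (key, &existed);
  if (existed)
    return slot_ref;

  // Write through the reference immediately, then keep a plain copy.
  slot_ref = key * 10;
  int value = slot_ref;

  // The recursive call may grow the map, so SLOT_REF must not be used
  // from here on; VALUE is safe because it is a copy.
  if (key > 0)
    copy_chain (key - 1, map);
  return value;
}
```

The pattern is the same as in the patch: the reference is used only until the first point at which the container might grow, and a local copy is used afterwards.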

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2020-02-27  Richard Biener  

PR tree-optimization/93953
* tree-vect-slp.c (slp_copy_subtree): Avoid keeping a reference
to the hash-map entry.

* gcc.dg/pr93953.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr93953.c | 17 +
 gcc/tree-vect-slp.c|  7 ---
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr93953.c

diff --git a/gcc/testsuite/gcc.dg/pr93953.c b/gcc/testsuite/gcc.dg/pr93953.c
new file mode 100644
index 000..bf85c146cd9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr93953.c
@@ -0,0 +1,17 @@
+/* PR tree-optimization/93953 */
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=ggc-min-expand=0 --param=ggc-min-heapsize=0" } */
+
+int *b, c, e;
+float d, g, f;
+
+void
+foo (int l)
+{
+  for (; l; ++l)
+{
+  float a = g > l;
+  d += a * b[4 * (l + c * e)];
+  f += a * b[4 * (l + c * e) + 1];
+}
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c7ddd94b39f..9d17e3386fa 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1764,11 +1764,12 @@ slp_copy_subtree (slp_tree node, hash_map )
   unsigned i;
 
   bool existed_p;
-  slp_tree &copy = map.get_or_insert (node, &existed_p);
+  slp_tree &copy_ref = map.get_or_insert (node, &existed_p);
   if (existed_p)
-return copy;
+return copy_ref;
 
-  copy = XNEW (_slp_tree);
+  copy_ref = XNEW (_slp_tree);
+  slp_tree copy = copy_ref;
   memcpy (copy, node, sizeof (_slp_tree));
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 {
-- 
2.16.4


Re: [PATCH] fix -fdebug-prefix-map without gas .file support

2020-02-27 Thread Richard Biener
On Wed, 26 Feb 2020, Jason Merrill wrote:

> On 2/21/20 5:02 AM, Richard Biener wrote:
> > This applies file mapping when emitting the directory table
> > directly instead of using the assemblers .file directive where
> > we already correctly apply the map.  Notably the non-assembler
> > path is used for the early debug emission for LTO.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK for trunk?
> 
> OK.

I've pushed this variant which makes sure to apply -fdebug-prefix-map
and friends consistently for LTO in locations as well.
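
For context, the kind of rewriting a prefix map performs can be sketched as below. This is not GCC's `remap_debug_filename`: the `prefix_map` type, the function name and the first-match-wins order are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical representation of one OLD=NEW mapping.
struct prefix_map
{
  std::string old_prefix;
  std::string new_prefix;
};

// Replace the first matching OLD prefix of FILENAME with NEW; return the
// name unchanged if no mapping applies.
inline std::string remap_filename_sketch (const std::string &filename,
                                          const std::vector<prefix_map> &maps)
{
  for (const prefix_map &m : maps)
    {
      assert (!m.old_prefix.empty ());
      if (filename.compare (0, m.old_prefix.size (), m.old_prefix) == 0)
        return m.new_prefix + filename.substr (m.old_prefix.size ());
    }
  return filename;
}
```

Applying the same function to every file name that reaches the output, whether it is written through the assembler, the directory table or the LTO stream, is what keeps the results consistent.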

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2020-02-26  Mark Williams  

* dwarf2out.c (file_name_acquire): Call remap_debug_filename.
* lto-opts.c (lto_write_options): Drop -fdebug-prefix-map,
-ffile-prefix-map and -fmacro-prefix-map.
* lto-streamer-out.c: Include file-prefix-map.h.
(lto_output_location): Remap the file part of locations.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 8da1ad053f6..38b16add568 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -12205,8 +12205,9 @@ file_name_acquire (dwarf_file_data **slot, file_name_acquire_data *fnad)
 
   fi = fnad->files + fnad->used_files++;
 
+  f = remap_debug_filename (d->filename);
+
   /* Skip all leading "./".  */
-  f = d->filename;
   while (f[0] == '.' && IS_DIR_SEPARATOR (f[1]))
 f += 2;
 
diff --git a/gcc/lto-opts.c b/gcc/lto-opts.c
index 87e916a2741..2512560cc6d 100644
--- a/gcc/lto-opts.c
+++ b/gcc/lto-opts.c
@@ -131,6 +131,9 @@ lto_write_options (void)
case OPT_SPECIAL_input_file:
case OPT_dumpdir:
case OPT_fresolution_:
+   case OPT_fdebug_prefix_map_:
+   case OPT_ffile_prefix_map_:
+   case OPT_fmacro_prefix_map_:
  continue;
 
default:
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 1faf31c0551..cea5e71cffb 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-offload.h"
 #include "print-tree.h"
 #include "tree-dfa.h"
+#include "file-prefix-map.h" /* remap_debug_filename()  */
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -200,7 +201,7 @@ lto_output_location (struct output_block *ob, struct bitpack_d *bp,
 
   if (ob->current_file != xloc.file)
 {
-  bp_pack_string (ob, bp, xloc.file, true);
+  bp_pack_string (ob, bp, remap_debug_filename (xloc.file), true);
   bp_pack_value (bp, xloc.sysp, 1);
 }
   ob->current_file = xloc.file;


Re: PowerPC Add has_arch_pwr* checks

2020-02-27 Thread Segher Boessenkool
Hi!

On Tue, Feb 25, 2020 at 05:02:06PM -0600, will schmidt wrote:
>This adds some procs to target-supports.exp that will allow
> our testcases to accurately determine which -mcpu= option
> is enabled at the time of our testcase compile.

Thanks!  I had to think about the names a bit (this will be
*everywhere*), but I think I like it quite well: it is clear, it is
short, and other names that do not refer to the somewhat weird macro
names are not as clear about what they actually mean.  Importantly,
these effective target names make it clear they mean "at least this
version", not "exactly this version" (to people who know the _ARCH_PWRn
macros, at least).

> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" { target has_arch_pwr8 } } } */

Yeah, I like this :-)

> +# return 1 if our compiler returns the ARCH_PWR defines with the options
> +# as provided by the test.
> +proc check_effective_target_has_arch_pwr5 { } {
> + return [check_no_compiler_messages arch_pwr5 assembly {
> + #ifndef _ARCH_PWR5
> + #error does not have power5 support.
> + #else
> + /* "has power5 support" */
> + #endif
> + }]
> +}
> +proc check_effective_target_has_arch_pwr6 { } {

Please put empty lines between the functions.

With that, okay for trunk.  Thanks!


Segher


[GCC][PATCH][ARM] Add vreinterpret, vdup, vget and vset bfloat16 intrinsic

2020-02-27 Thread Mihail Ionescu
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 09297831cdcd6e695843c17b7724c114f3a129fe..5901a8f1fb84f204ae95f0ccc97bf5ae944c482c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,15 @@ extern "C" {
 #include 
 #include 
 
+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
+#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
+#else
+#define __arm_lane(__vec, __idx) __idx
+#define __arm_laneq(__vec, __idx) __idx
+#endif
+
 typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
@@ -6147,14 +6156,6 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   /* For big-endian, GCC's vector indices are reversed within each 64
  bits compared to the architectural lane indices used by Neon
  intrinsics.  */
-#ifdef __ARM_BIG_ENDIAN
-#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
-#define __arm_lane(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec) - 1))
-#define __arm_laneq(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
-#else
-#define __arm_lane(__vec, __idx) __idx
-#define __arm_laneq(__vec, __idx) __idx
-#endif
 
 #define vget_lane_f16(__v, __idx)  \
   __extension__\
@@ -14476,6 +14477,15 @@ vreinterpret_p16_u32 (uint32x2_t __a)
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vreinterpret_f16_bf16 (bfloat16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vreinterpret_f16_p8 (poly8x8_t __a)
 {
   return (float16x4_t) __a;
@@ -15688,6 +15698,15 @@ vreinterpretq_f16_p16 (poly16x8_t __a)
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vreinterpretq_f16_bf16 (bfloat16x8_t __a)
+{
+  return (float16x8_t) __a;
+}
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vreinterpretq_f16_f32 (float32x4_t __a)
 {
   return (float16x8_t) __a;
@@ -18750,6 +18769,492 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
 #pragma GCC push_options
 #pragma GCC target ("arch=armv8.2-a+bf16")
 
+__extension__ extern __inline bfloat16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vcreate_bf16 (uint64_t __a)
+{
+  return (bfloat16x4_t) __a;
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vdup_n_bf16 (bfloat16_t __a)
+{
+  return __builtin_neon_vdup_nv4bf (__a);
+}
+
+__extension__ extern __inline bfloat16x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vdupq_n_bf16 (bfloat16_t __a)
+{
+  return __builtin_neon_vdup_nv8bf (__a);
+}
+
+__extension__ extern __inline bfloat16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vdup_lane_bf16 (bfloat16x4_t __a, const int __b)
+{
+  return __builtin_neon_vdup_lanev4bf (__a, __b);
+}
+
+__extension__ extern __inline bfloat16x8_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vdupq_lane_bf16 (bfloat16x4_t __a, const int __b)
+{
+  return __builtin_neon_vdup_lanev8bf (__a, __b);
+}
+
+#define vset_lane_bf16(__e, __v, __idx)\
+  __extension__\
+  ({   \
+bfloat16_t __elem = (__e); \
+bfloat16x4_t __vec = (__v);\
+__builtin_arm_lane_check (4, __idx);   \
+__vec[__arm_lane(__vec, __idx)] = __elem;  \
+__vec; \
+  })
+
+#define vsetq_lane_bf16(__e, __v, __idx)   \
+  __extension__\
+  ({   \
+bfloat16_t __elem = (__e); \
+bfloat16x8_t __vec = (__v);\
+__builtin_arm_lane_check (8, __idx);   \
+__vec[__arm_laneq(__vec, __idx)] = __elem; \
+__vec; \
+  })
+
+#define vget_lane_bf16(__v, __idx) \
+  __extension__\
+  ({   \
+bfloat16x4_t __vec = (__v);\
+__builtin_arm_lane_check (4, __idx);   

Re: [PATCH] sccvn: Punt on overflows during vn_reference_lookup_3

2020-02-27 Thread Richard Biener
On Thu, 27 Feb 2020, Jakub Jelinek wrote:

> On Thu, Feb 27, 2020 at 10:30:21AM +0100, Richard Biener wrote:
> > Obviously I don't like the repetitive boiler-plate after
> > the ranges_known_overlap_p checks so I wonder if we can at least
> > factor those into a VN-local ranges_known_overlap_for_pd_p predicate.
> 
> I can do that.

OK.

> > Wouldn't it be possible to simply constrain both sizes to half
> > of the address space?  Then the ranges_known_overlap_p should
> > guarantee that we can represent the difference between the
> > offsets in a signed HWI?
> 
> Well, it is already 1/8th of address space for 64-bit VA, so that would
> be 1/16th then.  Doing it in ranges_known_overlap_p might be too
> restrictive, other places might handle those fine.
> 
> Perhaps we should also rule out the case when pd.offset would be minimum,
> because we e.g. use -pd.offset, or when it or pd.size would be maximum
> (as we e.g. use r->size + 1).
> 
> I'm also a little bit worried about possible overflows in
>   r->size = MAX (r->offset + r->size, newr.offset + newr.size) - r->offset;
> or
>   r->size = MAX (r->offset + r->size,
>  rafter->offset + rafter->size) - r->offset;
> While in ranges_known_overlap_for_pd_p we'd ensure that *->offset + *->size
> doesn't overflow, the subtraction still could.

Hmm, maybe.  So basically

memset (p - large, 0, even-larger);  [p - large, p + 1]
memset (p, 0, large);  [p, p + large]

and a read [p, p + 1] then we'll have very small (negative) r->offset
and rafter->offset + rafter->size will be very large and we'll get
up to SHWI_MAX - SHWI_MIN here.

Now the thing here would be to note that the [p, p + 1] lookup
is constraining what we need to track and we could prune partial-defs
easily (the uniform ones).  Of course that's extra code with possible
sources of bugs.  Or we'll bite the bullet and use widest_int
(or offset_int?) for pd_range/pd_data offset/size pairs.

Richard.


Re: [PATCH] sccvn: Punt on ref->size not multiple of 8 for memset (, 123, ) in 9.x [PR93945]

2020-02-27 Thread Richard Biener
On Thu, 27 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> And here is the corresponding 9.x change where the patch just punts if
> ref->size is not whole bytes, like we already punt if offseti is not byte
> aligned.
> 
> Tested on x86_64-linux and powerpc64-linux, ok for 9.3?

OK.

Thanks,
Richard.

> 2020-02-27  Jakub Jelinek  
> 
>   PR tree-optimization/93945
>   * tree-ssa-sccvn.c (vn_reference_lookup_3): For memset with non-zero
>   second operand, require ref->size to be a multiple of BITS_PER_UNIT.
> 
>   * gcc.c-torture/execute/pr93945.c: New test.
> 
> --- gcc/tree-ssa-sccvn.c.jj   2020-01-12 12:17:01.031158921 +0100
> +++ gcc/tree-ssa-sccvn.c  2020-02-27 10:55:16.226236453 +0100
> @@ -2113,7 +2113,8 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>  || (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8)))
> && CHAR_BIT == 8 && BITS_PER_UNIT == 8
> && offset.is_constant ()
> -   && offseti % BITS_PER_UNIT == 0))
> +   && offseti % BITS_PER_UNIT == 0
> +   && multiple_p (ref->size, BITS_PER_UNIT)))
>&& poly_int_tree_p (gimple_call_arg (def_stmt, 2))
>&& (TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR
> || TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME))
> --- gcc/testsuite/gcc.c-torture/execute/pr93945.c.jj  2020-02-27 10:54:21.234060635 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr93945.c 2020-02-27 10:54:21.234060635 +0100
> @@ -0,0 +1,45 @@
> +/* PR tree-optimization/93945 */
> +
> +union U { char a[8]; struct S { unsigned int b : 8, c : 13, d : 11; } e; } u;
> +
> +__attribute__((noipa)) int
> +foo (void)
> +{
> +  __builtin_memset (&u, 0xf4, sizeof (u.a));
> +  return u.e.c;
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> +  asm volatile ("" : : "g" (&u) : "memory");
> +  return u.e.c;
> +}
> +
> +__attribute__((noipa)) int
> +baz (void)
> +{
> +  __builtin_memset (&u, 0xf4, sizeof (u.a));
> +  return u.e.d;
> +}
> +
> +__attribute__((noipa)) int
> +qux (void)
> +{
> +  asm volatile ("" : : "g" (&u) : "memory");
> +  return u.e.d;
> +}
> +
> +int
> +main ()
> +{
> +  int a = foo ();
> +  int b = bar ();
> +  if (a != b)
> +__builtin_abort ();
> +  a = baz ();
> +  b = qux ();
> +  if (a != b)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[PATCH] sccvn: Punt on ref->size not multiple of 8 for memset (, 123, ) in 9.x [PR93945]

2020-02-27 Thread Jakub Jelinek
Hi!

And here is the corresponding 9.x change where the patch just punts if
ref->size is not whole bytes, like we already punt if offseti is not byte
aligned.
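
To illustrate why a read that is not a whole number of bytes needs care, here is a small model of what a bit-field sees after a memset with a non-zero byte. It is a sketch under assumptions, not the sccvn code: it assumes the common layouts in which bit-fields are allocated from the least significant bit on little-endian and from the most significant bit on big-endian, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value read from an NBITS-wide bit-field that starts BIT_OFFSET bits into
// storage previously filled by memset (..., BYTE, 8).
inline uint64_t field_after_memset (uint8_t byte, unsigned bit_offset,
                                    unsigned nbits, bool big_endian)
{
  assert (nbits > 0 && nbits < 64 && bit_offset + nbits <= 64);
  // Eight copies of BYTE; the word is the same in either byte order.
  uint64_t splat = byte * UINT64_C (0x0101010101010101);
  uint64_t mask = (UINT64_C (1) << nbits) - 1;
  // Little-endian counts the offset from the low end, big-endian from
  // the high end of the word.
  unsigned shift = big_endian ? 64 - bit_offset - nbits : bit_offset;
  return (splat >> shift) & mask;
}
```

For the 13-bit field `c` of the testcase (offset 8) and the byte 0xf4, this model gives 0x14f4 on little-endian but 0x1e9e on big-endian, whereas a whole-byte field reads 0xf4 either way. Simply truncating the repeated byte to the field width is therefore only right for sizes that are multiples of a byte.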

Tested on x86_64-linux and powerpc64-linux, ok for 9.3?

2020-02-27  Jakub Jelinek  

PR tree-optimization/93945
* tree-ssa-sccvn.c (vn_reference_lookup_3): For memset with non-zero
second operand, require ref->size to be a multiple of BITS_PER_UNIT.

* gcc.c-torture/execute/pr93945.c: New test.

--- gcc/tree-ssa-sccvn.c.jj 2020-01-12 12:17:01.031158921 +0100
+++ gcc/tree-ssa-sccvn.c2020-02-27 10:55:16.226236453 +0100
@@ -2113,7 +2113,8 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   || (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8)))
  && CHAR_BIT == 8 && BITS_PER_UNIT == 8
  && offset.is_constant ()
- && offseti % BITS_PER_UNIT == 0))
+ && offseti % BITS_PER_UNIT == 0
+ && multiple_p (ref->size, BITS_PER_UNIT)))
   && poly_int_tree_p (gimple_call_arg (def_stmt, 2))
   && (TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR
  || TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME))
--- gcc/testsuite/gcc.c-torture/execute/pr93945.c.jj  2020-02-27 10:54:21.234060635 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr93945.c 2020-02-27 10:54:21.234060635 +0100
@@ -0,0 +1,45 @@
+/* PR tree-optimization/93945 */
+
+union U { char a[8]; struct S { unsigned int b : 8, c : 13, d : 11; } e; } u;
+
+__attribute__((noipa)) int
+foo (void)
+{
+  __builtin_memset (&u, 0xf4, sizeof (u.a));
+  return u.e.c;
+}
+
+__attribute__((noipa)) int
+bar (void)
+{
+  asm volatile ("" : : "g" (&u) : "memory");
+  return u.e.c;
+}
+
+__attribute__((noipa)) int
+baz (void)
+{
+  __builtin_memset (&u, 0xf4, sizeof (u.a));
+  return u.e.d;
+}
+
+__attribute__((noipa)) int
+qux (void)
+{
+  asm volatile ("" : : "g" (&u) : "memory");
+  return u.e.d;
+}
+
+int
+main ()
+{
+  int a = foo ();
+  int b = bar ();
+  if (a != b)
+__builtin_abort ();
+  a = baz ();
+  b = qux ();
+  if (a != b)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [PATCH] optabs: Don't use scalar conversions for vectors [PR93843]

2020-02-27 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Wed, Feb 26, 2020 at 11:43:10AM +, Richard Sandiford wrote:
>> In this PR we had a conversion between two integer vectors that
>> both had scalar integer modes.  We then tried to implement the
>> conversion using the scalar optab for those modes, instead of
>> doing the conversion elementwise.
>> 
>> I wondered about letting through scalar modes for single-element
>> vectors, but I don't have any evidence that that's useful/necessary,
>> so it seemed better to keep things simple.
>> 
>> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
>
> Won't this prevent even say __v4qi to __v4uqi and similar conversions
> with scalar modes for those where we don't need any kind of extensions,
> just reinterpret the bits?

Those kinds of conversions aren't accepted by the function anyway,
even for vector modes, since there's no associated optab that they
could use.  The vectoriser handles them using VCEs instead, via
vectorizable_assignment rather than vectorizable_conversion.
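
The difference between an elementwise conversion and a scalar conversion of the same bits, which is what the quoted description is about, can be sketched as follows. The packing (lane 0 in the low bits) and the function names are assumptions for illustration; the code models a vector of four signed 8-bit elements held in a 32-bit integer being widened to four signed 16-bit elements.

```cpp
#include <cassert>
#include <cstdint>

// Widen each signed 8-bit lane of V to a signed 16-bit lane.
inline uint64_t widen_elementwise (uint32_t v)
{
  uint64_t result = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      int8_t elt = static_cast<int8_t> ((v >> (8 * lane)) & 0xff);
      // Sign-extend the element on its own, then place it in its lane.
      uint16_t wide = static_cast<uint16_t> (static_cast<int16_t> (elt));
      result |= static_cast<uint64_t> (wide) << (16 * lane);
    }
  return result;
}

// What a scalar sign-extension of the same 32 bits would produce.
inline uint64_t widen_as_scalar (uint32_t v)
{
  return static_cast<uint64_t> (
    static_cast<int64_t> (static_cast<int32_t> (v)));
}
```

For the input 0x000000ff (lane 0 holds -1), the elementwise result is 0xffff in the low lane, while the scalar extension yields 0xff, so the two operations are not interchangeable even though the modes have the same size.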

I guess for GCC 11 we could allow NOP_EXPR to be used for reinterpreting
signedness if that seems like the right thing to do.

Thanks,
Richard



Re: [PATCH] sccvn: Punt on overflows during vn_reference_lookup_3

2020-02-27 Thread Jakub Jelinek
On Thu, Feb 27, 2020 at 10:30:21AM +0100, Richard Biener wrote:
> Obviously I don't like the repetitive boiler-plate after
> the ranges_known_overlap_p checks so I wonder if we can at least
> factor those into a VN-local ranges_known_overlap_for_pd_p predicate.

I can do that.

> Wouldn't it be possible to simply constrain both sizes to half
> of the address space?  Then the ranges_known_overlap_p should
> guarantee that we can represent the difference between the
> offsets in a signed HWI?

Well, it is already 1/8th of address space for 64-bit VA, so that would
be 1/16th then.  Doing it in ranges_known_overlap_p might be too
restrictive, other places might handle those fine.

Perhaps we should also rule out the case when pd.offset would be minimum,
because we e.g. use -pd.offset, or when it or pd.size would be maximum
(as we e.g. use r->size + 1).

I'm also a little bit worried about possible overflows in
  r->size = MAX (r->offset + r->size, newr.offset + newr.size) - r->offset;
or
  r->size = MAX (r->offset + r->size,
 rafter->offset + rafter->size) - r->offset;
While in ranges_known_overlap_for_pd_p we'd ensure that *->offset + *->size
doesn't overflow, the subtraction still could.
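
A sketch of the checks being discussed, written with plain comparisons so that it does not depend on any builtin. The helper names are hypothetical, and `int64_t` stands in for a signed HOST_WIDE_INT.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if A + B does not fit in int64_t.
inline bool sadd_overflow_p (int64_t a, int64_t b)
{
  const int64_t max = std::numeric_limits<int64_t>::max ();
  const int64_t min = std::numeric_limits<int64_t>::min ();
  return b > 0 ? a > max - b : a < min - b;
}

// True if A - B does not fit in int64_t.
inline bool ssub_overflow_p (int64_t a, int64_t b)
{
  const int64_t max = std::numeric_limits<int64_t>::max ();
  const int64_t min = std::numeric_limits<int64_t>::min ();
  return b < 0 ? a > max + b : a < min + b;
}

// Computes MAX (off1 + size1, off2 + size2) - off1 into *SIZE.  Returns
// false, leaving *SIZE untouched, if any intermediate step would overflow.
inline bool merged_size (int64_t off1, int64_t size1,
                         int64_t off2, int64_t size2, int64_t *size)
{
  assert (size != nullptr);
  if (sadd_overflow_p (off1, size1) || sadd_overflow_p (off2, size2))
    return false;
  int64_t end1 = off1 + size1;
  int64_t end2 = off2 + size2;
  int64_t end = end1 > end2 ? end1 : end2;
  // Both ends are representable, but their distance from OFF1 may not be.
  if (ssub_overflow_p (end, off1))
    return false;
  *size = end - off1;
  return true;
}
```

The second check is the one the paragraph above worries about: with a very negative `off1` and a second range ending near the maximum, both sums are fine but the final subtraction is not.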

Jakub



Re: [PATCH] gimplify: Don't optimize register const vars to static [PR93949]

2020-02-27 Thread Richard Biener
On Thu, 27 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is rejected, while it was accepted in 3.4 and earlier
> (before tree-ssa merge).
> The problem is that we decide to promote the const variable to TREE_STATIC,
> but TREE_STATIC DECL_REGISTER VAR_DECLs may only be the global register vars
> and so assemble_variable/make_decl_rtl diagnoses it.
> 
> Either we do what the following patch does, where we could consider
> register as a hint the user doesn't want such optimization, because if
> something is forced static, it is not "register" anymore and register static
> is not valid in C either, or we could clear DECL_REGISTER instead, but would
> still need to punt at least on DECL_HARD_REGISTER cases.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-02-27  Jakub Jelinek  
> 
>   PR c/93949
>   * gimplify.c (gimplify_init_constructor): Don't promote readonly
>   DECL_REGISTER variables to TREE_STATIC.
> 
>   * gcc.c-torture/compile/pr93949.c: New test.
> 
> --- gcc/gimplify.c.jj 2020-02-25 13:54:02.087091120 +0100
> +++ gcc/gimplify.c2020-02-26 19:30:57.466490166 +0100
> @@ -4923,6 +4923,7 @@ gimplify_init_constructor (tree *expr_p,
>   && num_nonzero_elements > 1
>   && TREE_READONLY (object)
>   && VAR_P (object)
> + && !DECL_REGISTER (object)
>   && (flag_merge_constants >= 2 || !TREE_ADDRESSABLE (object))
>   /* For ctors that have many repeated nonzero elements
>  represented through RANGE_EXPRs, prefer initializing
> --- gcc/testsuite/gcc.c-torture/compile/pr93949.c.jj  2020-02-26 19:42:15.754530691 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr93949.c 2020-02-26 19:42:08.153642329 +0100
> @@ -0,0 +1,7 @@
> +/* PR c/93949 */
> +
> +void
> +foo (void)
> +{
> +  register const double d[3] = { 0., 1., 2. };
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] sccvn: Punt on overflows during vn_reference_lookup_3

2020-02-27 Thread Richard Biener
On Thu, 27 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> I admit I don't have testcases, but I'm afraid very bad things will happen
> if either the offset2i - offseti or pd.offset + pd.size computations
> overflow.  All are computed in (signed) HOST_WIDE_INT, and at least for the
> memset or CONSTRUCTOR cases I'd fear the stores could be extremely large.
> 
> Or shall we introduce some system.h macros for this, say
> HWI_SADD_OVERFLOW_P and HWI_SSUB_OVERFLOW_P that could use
> __builtin_{add,sub}_overflow_p when available or do those
> comparison checks and use those here?
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux.

Obviously I don't like the repetitive boiler-plate after
the ranges_known_overlap_p checks so I wonder if we can at least
factor those into a VN-local ranges_known_overlap_for_pd_p predicate.

Wouldn't it be possible to simply constrain both sizes to half
of the address space?  Then the ranges_known_overlap_p should
guarantee that we can represent the difference between the
offsets in a signed HWI?

Richard.

> 2020-02-27  Jakub Jelinek  
> 
>   * tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Punt if
>   pd.offset + pd.size would overflow.
>   (vn_reference_lookup_3): Punt if offset2i - offseti would overflow.
> 
> --- gcc/tree-ssa-sccvn.c.jj   2020-02-26 13:38:01.899937521 +0100
> +++ gcc/tree-ssa-sccvn.c  2020-02-26 14:39:26.472695329 +0100
> @@ -1778,7 +1778,9 @@ vn_walk_cb_data::push_partial_def (const
>|| CHAR_BIT != 8
>|| BITS_PER_UNIT != 8
>/* Not prepared to handle PDP endian.  */
> -  || BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
> +  || BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
> +  /* Punt on overflows during pd.offset + pd.size computation.  */
> +  || pd.offset > INTTYPE_MAXIMUM (HOST_WIDE_INT) - pd.size)
>  return (void *)-1;
>  
>bool pd_constant_p = (TREE_CODE (pd.rhs) == CONSTRUCTOR
> @@ -2680,7 +2682,13 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>  && offset2.is_constant ()
>  && maxsize.is_constant ()
>  && ranges_known_overlap_p (offseti, maxsizei, offset2i,
> -   leni << LOG2_BITS_PER_UNIT))
> +   leni << LOG2_BITS_PER_UNIT)
> +/* Punt on overflows.  */
> +&& !((offseti > 0
> +  && offset2i < INTTYPE_MINIMUM (HOST_WIDE_INT) + offseti)
> + || (offseti < 0
> + && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
> ++ offseti
>   {
> pd_data pd;
> pd.rhs = build_constructor (NULL_TREE, NULL);
> @@ -2733,7 +2741,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>  && offset2.is_constant ()
>  && size2.is_constant ()
>  && ranges_known_overlap_p (offseti, maxsizei,
> -   offset2i, size2i))
> +   offset2i, size2i)
> +/* Punt on overflows.  */
> +&& !((offseti > 0
> +  && offset2i < (INTTYPE_MINIMUM (HOST_WIDE_INT)
> + + offseti))
> + || (offseti < 0
> + && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
> ++ offseti
>   {
> /* Let clobbers be consumed by the partial-def tracker
>which can choose to ignore them if they are shadowed
> @@ -2886,7 +2901,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>   }
>   }
> else if (ranges_known_overlap_p (offseti, maxsizei, offset2i,
> -size2i))
> +size2i)
> +/* Punt on overflows.  */
> +&& !((offseti > 0
> +  && offset2i < (INTTYPE_MINIMUM (HOST_WIDE_INT)
> + + offseti))
> + || (offseti < 0
> + && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
> +    + offseti))))
>   {
> pd_data pd;
> tree rhs = gimple_assign_rhs1 (def_stmt);
> @@ -2969,7 +2991,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>  && offset.is_constant ()
>  && offset2.is_constant ()
>  && size2.is_constant ()
> -&& ranges_known_overlap_p (offset, maxsize, offset2, size2))
> +&& ranges_known_overlap_p (offset, maxsize, offset2, size2)
> +/* Punt on overflows.  */
> +&& !((offseti > 0
> +  && offset2i < (INTTYPE_MINIMUM (HOST_WIDE_INT)
> + + offseti))
> + || (offseti < 0
> + && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
> + 

Re: [PATCH] sccvn: Handle non-byte aligned offset or size for memset (, 123, ) [PR93945]

2020-02-27 Thread Richard Biener
On Thu, 27 Feb 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following is the last spot in vn_reference_lookup_3 that didn't allow
> non-byte aligned offsets or sizes.  To be precise, it did allow a size that
> wasn't a multiple of the byte size and that caused a wrong-code issue on
> big-endian, as the pr93945.c testcase shows, so for GCC 9 we should add
> && multiple_p (ref->size, BITS_PER_UNIT) check instead.
> For the memset with SSA_NAME middle-argument, it still requires byte-aligned
> offset, as we'd otherwise need to rotate the value at runtime.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-02-27  Jakub Jelinek  
> 
>   PR tree-optimization/93582
>   PR tree-optimization/93945
>   * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle memset with
>   non-zero INTEGER_CST second argument and ref->offset or ref->size
>   not a multiple of BITS_PER_UNIT.
> 
>   * gcc.dg/tree-ssa/pr93582-9.c: New test.
>   * gcc.c-torture/execute/pr93945.c: New test.
> 
> --- gcc/tree-ssa-sccvn.c.jj   2020-02-24 12:55:32.619143689 +0100
> +++ gcc/tree-ssa-sccvn.c  2020-02-26 13:38:01.899937521 +0100
> @@ -2386,7 +2386,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>vn_reference_t vr = data->vr;
>gimple *def_stmt = SSA_NAME_DEF_STMT (vuse);
>tree base = ao_ref_base (ref);
> -  HOST_WIDE_INT offseti, maxsizei;
> +  HOST_WIDE_INT offseti = 0, maxsizei, sizei = 0;
>static vec lhs_ops;
>ao_ref lhs_ref;
>bool lhs_ref_ok = false;
> @@ -2541,9 +2541,13 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>&& (integer_zerop (gimple_call_arg (def_stmt, 1))
> || ((TREE_CODE (gimple_call_arg (def_stmt, 1)) == INTEGER_CST
>  || (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8)))
> -   && CHAR_BIT == 8 && BITS_PER_UNIT == 8
> +   && CHAR_BIT == 8
> +   && BITS_PER_UNIT == 8
> +   && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
> && offset.is_constant ()
> -   && offseti % BITS_PER_UNIT == 0))
> +   && ref->size.is_constant ()
> +   && (offseti % BITS_PER_UNIT == 0
> +   || TREE_CODE (gimple_call_arg (def_stmt, 1)) == INTEGER_CST)))
>&& poly_int_tree_p (gimple_call_arg (def_stmt, 2))
>&& (TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR
> || TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME))
> @@ -2604,7 +2608,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>else
>   return (void *)-1;
>tree len = gimple_call_arg (def_stmt, 2);
> -  HOST_WIDE_INT leni, offset2i, offseti;
> +  HOST_WIDE_INT leni, offset2i;
>/* Sometimes the above trickery is smarter than alias analysis.  Take
>   advantage of that.  */
>if (!ranges_maybe_overlap_p (offset, maxsize, offset2,
> @@ -2618,7 +2622,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree
> tree val;
> if (integer_zerop (gimple_call_arg (def_stmt, 1)))
>   val = build_zero_cst (vr->type);
> -   else if (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8))
> +   else if (INTEGRAL_TYPE_P (vr->type)
> +&& known_eq (ref->size, 8)
> +&& offseti % BITS_PER_UNIT == 0)
>   {
> gimple_match_op res_op (gimple_match_cond::UNCOND, NOP_EXPR,
> vr->type, gimple_call_arg (def_stmt, 1));
> @@ -2630,10 +2636,34 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>   }
> else
>   {
> -   unsigned buflen = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (vr->type));
> -   unsigned buflen = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (vr->type));
> +   unsigned buflen = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (vr->type)) + 1;
> +   if (INTEGRAL_TYPE_P (vr->type))
> + buflen = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (vr->type)) + 1;
> unsigned char *buf = XALLOCAVEC (unsigned char, buflen);
> memset (buf, TREE_INT_CST_LOW (gimple_call_arg (def_stmt, 1)),
> buflen);
> +   if (BYTES_BIG_ENDIAN)
> + {
> +   unsigned int amnt
> + = (((unsigned HOST_WIDE_INT) offseti + sizei)
> +% BITS_PER_UNIT);
> +   if (amnt)
> + {
> +   shift_bytes_in_array_right (buf, buflen,
> +   BITS_PER_UNIT - amnt);
> +   buf++;
> +   buflen--;
> + }
> + }
> +   else if (offseti % BITS_PER_UNIT != 0)
> + {
> +   unsigned int amnt
> + = BITS_PER_UNIT - ((unsigned HOST_WIDE_INT) offseti
> +% BITS_PER_UNIT);
> +   shift_bytes_in_array_left (buf, buflen, amnt);
> +   buf++;
> +   buflen--;
> + }
> val = native_interpret_expr (vr->type, buf, buflen);
> if (!val)
>   return (void *)-1;
> --- 

[PATCH] gimplify: Don't optimize register const vars to static [PR93949]

2020-02-27 Thread Jakub Jelinek
Hi!

The following testcase is rejected, while it was accepted in 3.4 and earlier
(before tree-ssa merge).
The problem is that we decide to promote the const variable to TREE_STATIC,
but TREE_STATIC DECL_REGISTER VAR_DECLs may only be the global register vars
and so assemble_variable/make_decl_rtl diagnoses it.

Either we do what the following patch does and treat register as a hint
that the user doesn't want such an optimization (if something is forced
static, it is not "register" anymore, and register static is not valid in C
either), or we could clear DECL_REGISTER instead, but would still need to
punt at least on DECL_HARD_REGISTER cases.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-27  Jakub Jelinek  

PR c/93949
* gimplify.c (gimplify_init_constructor): Don't promote readonly
DECL_REGISTER variables to TREE_STATIC.

* gcc.c-torture/compile/pr93949.c: New test.

--- gcc/gimplify.c.jj   2020-02-25 13:54:02.087091120 +0100
+++ gcc/gimplify.c  2020-02-26 19:30:57.466490166 +0100
@@ -4923,6 +4923,7 @@ gimplify_init_constructor (tree *expr_p,
&& num_nonzero_elements > 1
&& TREE_READONLY (object)
&& VAR_P (object)
+   && !DECL_REGISTER (object)
&& (flag_merge_constants >= 2 || !TREE_ADDRESSABLE (object))
/* For ctors that have many repeated nonzero elements
   represented through RANGE_EXPRs, prefer initializing
--- gcc/testsuite/gcc.c-torture/compile/pr93949.c.jj   2020-02-26 19:42:15.754530691 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr93949.c   2020-02-26 19:42:08.153642329 +0100
@@ -0,0 +1,7 @@
+/* PR c/93949 */
+
+void
+foo (void)
+{
+  register const double d[3] = { 0., 1., 2. };
+}

Jakub



Re: middle-end: Fix wrong code caused by disagreement between FRE and access path oracle [PR 92152]

2020-02-27 Thread Richard Biener
On Wed, 26 Feb 2020, Jan Hubicka wrote:

> Hi,
> these are the alias and TBAA stats for building cc1 with -flto-partition=none.
> 
> From:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 46099243 disambiguations, 55677716 queries
>   ref_maybe_used_by_call_p: 124351 disambiguations, 46883813 queries
>   call_may_clobber_ref_p: 12673 disambiguations, 17133 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 3803 queries
>   nonoverlapping_refs_since_match_p: 19034 disambiguations, 46849 must overlaps, 67934 queries
>   aliasing_component_refs_p: 76737 disambiguations, 300234 queries
>   TBAA oracle: 15816119 disambiguations 39888339 queries
>12364426 are in alias set 0
>7655945 queries asked about the same object
>178 queries asked about the same alias set
>0 access volatile
>2963837 are dependent in the DAG
>1087834 are aritificially in conflict with void *
> 
> PTA query stats:
>   pt_solution_includes: 904096 disambiguations, 9062937 queries
>   pt_solutions_intersect: 853990 disambiguations, 10098128 queries
> 
> to:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 48168904 disambiguations, 57845554 queries
>   ref_maybe_used_by_call_p: 124062 disambiguations, 48953042 queries
>   call_may_clobber_ref_p: 12658 disambiguations, 17137 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 3312 queries
>   nonoverlapping_refs_since_match_p: 18997 disambiguations, 45778 must overlaps, 67109 queries
>   aliasing_component_refs_p: 58756 disambiguations, 296126 queries
>   TBAA oracle: 16036749 disambiguations 40132907 queries
>12352609 are in alias set 0
>7697466 queries asked about the same object
>178 queries asked about the same alias set
>0 access volatile
>2964615 are dependent in the DAG
>1081290 are aritificially in conflict with void *
> 
> PTA query stats:
>   pt_solution_includes: 826579 disambiguations, 8987330 queries
>   pt_solutions_intersect: 841758 disambiguations, 10078495 queries
> 
> So aliasing_component_refs_p drops from 25% disambiguation rate to 19% which
> is quite noticeable. I will run SPEC benchmarks.

OTOH the overall TBAA oracle rate goes from 39.65% to 39.96%, and the
overall refs_may_alias_p disambiguation rate goes up too!  So I'm not sure
you can compare those numbers since the set of queries in both is
different and possibly unrelated enough...

Different early opt and thus different partitioning/inlining might
also lead to a not meaningful comparison.
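
For reference, the disambiguation rates can be recomputed from the counts in
the two quoted dumps.  The following is only a small sketch for doing that
arithmetic; the struct and function names are made up for illustration and
are not part of GCC.  With these counts aliasing_component_refs_p comes out
at roughly 25.6% before and 19.8% after.

```c
#include <stdio.h>

/* Disambiguation rate in percent; returns 0 when there were no queries.  */
static double
rate_percent (unsigned long long disambiguations, unsigned long long queries)
{
  return queries ? 100.0 * (double) disambiguations / (double) queries : 0.0;
}

/* One row per oracle, counts copied from the "From:" and "to:" dumps.  */
struct oracle_stat
{
  const char *name;
  unsigned long long dis_before, queries_before;
  unsigned long long dis_after, queries_after;
};

static const struct oracle_stat oracle_stats[] = {
  { "refs_may_alias_p", 46099243ULL, 55677716ULL, 48168904ULL, 57845554ULL },
  { "aliasing_component_refs_p", 76737ULL, 300234ULL, 58756ULL, 296126ULL },
  { "TBAA oracle", 15816119ULL, 39888339ULL, 16036749ULL, 40132907ULL },
};

/* Print the before/after rate for every row of the table.  */
static void
print_rates (void)
{
  for (size_t i = 0; i < sizeof oracle_stats / sizeof oracle_stats[0]; i++)
    printf ("%-28s %6.2f%% -> %6.2f%%\n", oracle_stats[i].name,
	    rate_percent (oracle_stats[i].dis_before,
			  oracle_stats[i].queries_before),
	    rate_percent (oracle_stats[i].dis_after,
			  oracle_stats[i].queries_after));
}
```

Calling print_rates () from a small driver prints one line per oracle; since
the query sets differ between the two builds, the rates are only indicative.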

Richard.


[PATCH] sccvn: Punt on overflows during vn_reference_lookup_3

2020-02-27 Thread Jakub Jelinek
Hi!

I admit I don't have testcases, but I'm afraid very bad things will happen
if either the offset2i - offseti or pd.offset + pd.size computations
overflow.  All are computed in (signed) HOST_WIDE_INT, and at least for the
memset or CONSTRUCTOR cases I'd fear the stores could be extremely large.

Or shall we introduce some system.h macros for this, say
HWI_SADD_OVERFLOW_P and HWI_SSUB_OVERFLOW_P that could use
__builtin_{add,sub}_overflow_p when available or do those
comparison checks and use those here?
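
A minimal sketch of what such macros could look like, assuming a 64-bit
HOST_WIDE_INT (modelled here as int64_t).  The macro names are the ones
floated above; they, the hwi typedef and the fallback helpers are
illustrative only and are not existing system.h definitions.  The fallback
branch uses the same style of comparison checks as the patch below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for HOST_WIDE_INT; assumed to be 64 bits wide in this sketch.  */
typedef int64_t hwi;

#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 7
/* GCC 7 and later provide type-generic overflow predicates; the third
   argument only supplies the type the result is checked against.  */
# define HWI_SADD_OVERFLOW_P(a, b) \
  __builtin_add_overflow_p ((hwi) (a), (hwi) (b), (hwi) 0)
# define HWI_SSUB_OVERFLOW_P(a, b) \
  __builtin_sub_overflow_p ((hwi) (a), (hwi) (b), (hwi) 0)
#else
/* Portable fallback: compare against the limits without ever computing
   the possibly overflowing sum or difference.  */
static inline bool
hwi_sadd_overflow_p (hwi a, hwi b)
{
  return (b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b);
}

static inline bool
hwi_ssub_overflow_p (hwi a, hwi b)
{
  return (b > 0 && a < INT64_MIN + b) || (b < 0 && a > INT64_MAX + b);
}

# define HWI_SADD_OVERFLOW_P(a, b) hwi_sadd_overflow_p ((a), (b))
# define HWI_SSUB_OVERFLOW_P(a, b) hwi_ssub_overflow_p ((a), (b))
#endif
```

With something like this, the checks in the patch would read as
HWI_SSUB_OVERFLOW_P (offset2i, offseti) and
HWI_SADD_OVERFLOW_P (pd.offset, pd.size) instead of the open-coded
comparisons.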

Bootstrapped/regtested on x86_64-linux and i686-linux.

2020-02-27  Jakub Jelinek  

* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Punt if
pd.offset + pd.size would overflow.
(vn_reference_lookup_3): Punt if offset2i - offseti would overflow.

--- gcc/tree-ssa-sccvn.c.jj 2020-02-26 13:38:01.899937521 +0100
+++ gcc/tree-ssa-sccvn.c    2020-02-26 14:39:26.472695329 +0100
@@ -1778,7 +1778,9 @@ vn_walk_cb_data::push_partial_def (const
   || CHAR_BIT != 8
   || BITS_PER_UNIT != 8
   /* Not prepared to handle PDP endian.  */
-  || BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
+  || BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
+  /* Punt on overflows during pd.offset + pd.size computation.  */
+  || pd.offset > INTTYPE_MAXIMUM (HOST_WIDE_INT) - pd.size)
 return (void *)-1;
 
   bool pd_constant_p = (TREE_CODE (pd.rhs) == CONSTRUCTOR
@@ -2680,7 +2682,13 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   && offset2.is_constant ()
   && maxsize.is_constant ()
   && ranges_known_overlap_p (offseti, maxsizei, offset2i,
- leni << LOG2_BITS_PER_UNIT))
+ leni << LOG2_BITS_PER_UNIT)
+  /* Punt on overflows.  */
+  && !((offseti > 0
+&& offset2i < INTTYPE_MINIMUM (HOST_WIDE_INT) + offseti)
+   || (offseti < 0
+   && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
+  + offseti))))
{
  pd_data pd;
  pd.rhs = build_constructor (NULL_TREE, NULL);
@@ -2733,7 +2741,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   && offset2.is_constant ()
   && size2.is_constant ()
   && ranges_known_overlap_p (offseti, maxsizei,
- offset2i, size2i))
+ offset2i, size2i)
+  /* Punt on overflows.  */
+  && !((offseti > 0
+&& offset2i < (INTTYPE_MINIMUM (HOST_WIDE_INT)
+   + offseti))
+   || (offseti < 0
+   && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
+  + offseti))))
{
  /* Let clobbers be consumed by the partial-def tracker
 which can choose to ignore them if they are shadowed
@@ -2886,7 +2901,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
}
}
  else if (ranges_known_overlap_p (offseti, maxsizei, offset2i,
-  size2i))
+  size2i)
+  /* Punt on overflows.  */
+  && !((offseti > 0
+&& offset2i < (INTTYPE_MINIMUM (HOST_WIDE_INT)
+   + offseti))
+   || (offseti < 0
+   && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
+  + offseti))))
{
  pd_data pd;
  tree rhs = gimple_assign_rhs1 (def_stmt);
@@ -2969,7 +2991,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   && offset.is_constant ()
   && offset2.is_constant ()
   && size2.is_constant ()
-  && ranges_known_overlap_p (offset, maxsize, offset2, size2))
+  && ranges_known_overlap_p (offset, maxsize, offset2, size2)
+  /* Punt on overflows.  */
+  && !((offseti > 0
+&& offset2i < (INTTYPE_MINIMUM (HOST_WIDE_INT)
+   + offseti))
+   || (offseti < 0
+   && offset2i > (INTTYPE_MAXIMUM (HOST_WIDE_INT)
+  + offseti))))
{
  pd_data pd;
  pd.rhs = SSA_VAL (def_rhs);

Jakub



[PATCH] sccvn: Handle non-byte aligned offset or size for memset (, 123, ) [PR93945]

2020-02-27 Thread Jakub Jelinek
Hi!

The following is the last spot in vn_reference_lookup_3 that didn't allow
non-byte aligned offsets or sizes.  To be precise, it did allow a size that
wasn't a multiple of the byte size and that caused a wrong-code issue on
big-endian, as the pr93945.c testcase shows, so for GCC 9 we should add
&& multiple_p (ref->size, BITS_PER_UNIT) check instead.
For the memset with SSA_NAME middle-argument, it still requires byte-aligned
offset, as we'd otherwise need to rotate the value at runtime.
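
To illustrate why a constant fill byte can still be folded at a non-byte
aligned offset, here is a small standalone sketch.  It is not the code from
the patch: the helper names are hypothetical, it models little-endian bit
numbering only, and it reads at most 32 bits.  Reading 8 bits at bit offset
3 from memory filled with 123 (0x7b) yields 0x7b rotated right by 3, which
is why a non-constant (SSA_NAME) fill value would need a rotate at runtime.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read NBITS (at most 32) bits starting at bit
   offset BITOFF from BUF, where bit K of the memory image is bit K % 8
   of byte K / 8 (little-endian bit numbering).  */
static uint32_t
read_bits_le (const unsigned char *buf, unsigned bitoff, unsigned nbits)
{
  uint32_t val = 0;
  for (unsigned j = 0; j < nbits; j++)
    {
      unsigned k = bitoff + j;
      uint32_t bit = (buf[k / 8] >> (k % 8)) & 1u;
      val |= bit << j;
    }
  return val;
}

/* Value seen by an NBITS-wide read at bit offset BITOFF after
   memset (p, C, n).  Precondition: BITOFF < 8 and NBITS <= 32, so the
   read stays inside the 8-byte scratch buffer.  */
static uint32_t
fold_memset_read_le (unsigned char c, unsigned bitoff, unsigned nbits)
{
  unsigned char buf[8];
  memset (buf, c, sizeof buf);
  return read_bits_le (buf, bitoff, nbits);
}
```

For a byte-aligned read, fold_memset_read_le (123, 0, 8) is simply 0x7b;
the unaligned fold_memset_read_le (123, 3, 8) gives 0x6f.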

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-27  Jakub Jelinek  

PR tree-optimization/93582
PR tree-optimization/93945
* tree-ssa-sccvn.c (vn_reference_lookup_3): Handle memset with
non-zero INTEGER_CST second argument and ref->offset or ref->size
not a multiple of BITS_PER_UNIT.

* gcc.dg/tree-ssa/pr93582-9.c: New test.
* gcc.c-torture/execute/pr93945.c: New test.

--- gcc/tree-ssa-sccvn.c.jj 2020-02-24 12:55:32.619143689 +0100
+++ gcc/tree-ssa-sccvn.c    2020-02-26 13:38:01.899937521 +0100
@@ -2386,7 +2386,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   vn_reference_t vr = data->vr;
   gimple *def_stmt = SSA_NAME_DEF_STMT (vuse);
   tree base = ao_ref_base (ref);
-  HOST_WIDE_INT offseti, maxsizei;
+  HOST_WIDE_INT offseti = 0, maxsizei, sizei = 0;
   static vec<vn_reference_op_s> lhs_ops;
   ao_ref lhs_ref;
   bool lhs_ref_ok = false;
@@ -2541,9 +2541,13 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   && (integer_zerop (gimple_call_arg (def_stmt, 1))
  || ((TREE_CODE (gimple_call_arg (def_stmt, 1)) == INTEGER_CST
   || (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8)))
- && CHAR_BIT == 8 && BITS_PER_UNIT == 8
+ && CHAR_BIT == 8
+ && BITS_PER_UNIT == 8
+ && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
  && offset.is_constant ()
- && offseti % BITS_PER_UNIT == 0))
+ && ref->size.is_constant ()
+ && (offseti % BITS_PER_UNIT == 0
+ || TREE_CODE (gimple_call_arg (def_stmt, 1)) == INTEGER_CST)))
   && poly_int_tree_p (gimple_call_arg (def_stmt, 2))
   && (TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR
  || TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME))
@@ -2604,7 +2608,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   else
return (void *)-1;
   tree len = gimple_call_arg (def_stmt, 2);
-  HOST_WIDE_INT leni, offset2i, offseti;
+  HOST_WIDE_INT leni, offset2i;
   /* Sometimes the above trickery is smarter than alias analysis.  Take
  advantage of that.  */
   if (!ranges_maybe_overlap_p (offset, maxsize, offset2,
@@ -2618,7 +2622,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree
  tree val;
  if (integer_zerop (gimple_call_arg (def_stmt, 1)))
val = build_zero_cst (vr->type);
- else if (INTEGRAL_TYPE_P (vr->type) && known_eq (ref->size, 8))
+ else if (INTEGRAL_TYPE_P (vr->type)
+  && known_eq (ref->size, 8)
+  && offseti % BITS_PER_UNIT == 0)
{
  gimple_match_op res_op (gimple_match_cond::UNCOND, NOP_EXPR,
  vr->type, gimple_call_arg (def_stmt, 1));
@@ -2630,10 +2636,34 @@ vn_reference_lookup_3 (ao_ref *ref, tree
}
  else
{
- unsigned buflen = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (vr->type));
+ unsigned buflen = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (vr->type)) + 1;
+ if (INTEGRAL_TYPE_P (vr->type))
+   buflen = GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (vr->type)) + 1;
  unsigned char *buf = XALLOCAVEC (unsigned char, buflen);
  memset (buf, TREE_INT_CST_LOW (gimple_call_arg (def_stmt, 1)),
  buflen);
+ if (BYTES_BIG_ENDIAN)
+   {
+ unsigned int amnt
+   = (((unsigned HOST_WIDE_INT) offseti + sizei)
+  % BITS_PER_UNIT);
+ if (amnt)
+   {
+ shift_bytes_in_array_right (buf, buflen,
+ BITS_PER_UNIT - amnt);
+ buf++;
+ buflen--;
+   }
+   }
+ else if (offseti % BITS_PER_UNIT != 0)
+   {
+ unsigned int amnt
+   = BITS_PER_UNIT - ((unsigned HOST_WIDE_INT) offseti
+  % BITS_PER_UNIT);
+ shift_bytes_in_array_left (buf, buflen, amnt);
+ buf++;
+ buflen--;
+   }
  val = native_interpret_expr (vr->type, buf, buflen);
  if (!val)
return (void *)-1;
--- gcc/testsuite/gcc.dg/tree-ssa/pr93582-9.c.jj   2020-02-26 13:47:42.246393489 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr93582-9.c   2020-02-26 

Re: Binaries page modifications

2020-02-27 Thread Jonathan Wakely

On 27/02/20 08:23 +0000, CHIGOT, CLEMENT wrote:

Hi everyone,

I'm one of the owners of the BullFreeware website and I see that, in
https://gcc.gnu.org/install/binaries.html, our website is described as "Bull’s Open Source
Software Archive for for AIX 5L and AIX 6;". Would it be possible to change it to
"Bull’s Open Source Software Archive for AIX 6 and AIX 7;", as we're no longer working on
AIX 5?


Thanks for the mail. I've made that change with this patch, committed
to master as obvious.

By the way, I noticed that the "About" text on the front page of
bullfreeware.com says "appplication".


commit 4fd9efc8877814e8cda506563d0282a267c562c8
Author: Jonathan Wakely 
Date:   Thu Feb 27 08:42:05 2020 +0000

doc: Update description of BullFreeware

* doc/install.texi (Binaries): Update description of BullFreeware.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 9b24a06d961..92961833ef6 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3274,7 +3274,7 @@ AIX:
 @itemize
 @item
 @uref{http://www.bullfreeware.com,,Bull's Open Source Software Archive for
-for AIX 5L and AIX 6};
+for AIX 6 and AIX 7};
 
 @item
 @uref{http://www.perzl.org/aix/,,AIX Open Source Packages (AIX5L AIX 6.1