Re: [C++] Don't fold __builtin_constant_p prematurely

2019-09-02 Thread Marc Glisse

On Fri, 2 Aug 2019, Marc Glisse wrote:


Ping

On Tue, 16 Jul 2019, Marc Glisse wrote:


Adding a C++ maintainer in Cc:
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00808.html

On Wed, 10 Jul 2019, Marc Glisse wrote:


Hello,

this avoids folding __builtin_constant_p to 0 early when we are not forced 
to do so. Clearly this has an effect, since it uncovered a bug in 
wi::lshift, fixed today ;-)
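To illustrate the kind of code this affects, here is a reduced sketch
(hypothetical, not the actual pr85746.C testcase):

constexpr bool is_const (int i) { return __builtin_constant_p (i); }

int f (int i)
{
  /* Folding __builtin_constant_p (i) to 0 during speculative constexpr
     evaluation would throw information away; with the patch the call
     survives unless the context is manifestly constant-evaluated, so
     later passes can still fold it to 1 once inlining makes i constant.  */
  return is_const (i) ? 1 : 0;
}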


I wasn't sure about using |= or just =, the first one seemed more 
conservative.


Bootstrap+regtest on x86_64-pc-linux-gnu.

2019-07-11  Marc Glisse  

gcc/cp/
* constexpr.c (cxx_eval_builtin_function_call): Only set
force_folding_builtin_constant_p if manifestly_const_eval.

gcc/testsuite/
* g++.dg/pr85746.C: New file.


--
Marc Glisse


[PATCH] [LIBPHOBOS] Fix multi-lib RUNTESTFLAGS handling

2019-09-02 Thread Bernd Edlinger
Hi,


I've noticed that testing libphobos fails for multi-lib configs:

$ make check-target-libphobos RUNTESTFLAGS="--target_board=unix\{-m32,\}"

fails for every 32-bit execution test, because the host libgcc_s.so is used,
which is not the correct version:

spawn [open ...]
./test_aa.exe: /lib/i386-linux-gnu/libgcc_s.so.1: version `GCC_7.0.0' not found (required by ./test_aa.exe)
FAIL: libphobos.aa/test_aa.d execution test

This can be fixed by adding a few lines from 
libstdc++/testsuite/lib/libstdc++.exp
to libphobos/testsuite/lib/libphobos.exp, see attached patch.
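For reference, the added loop parses the output of gdc --print-multi-lib,
which on a typical bi-arch x86_64 toolchain looks roughly like this
(illustrative, the exact set depends on the configuration):

.;
32;@m32

Each line names a multilib directory followed by a ;-separated option list;
the loop strips everything from the ; onwards, skips the default "."
directory, and appends ${gccdir}/${mldir} to ld_library_path whenever a
libgcc_s is present there.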


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

2019-09-02  Bernd Edlinger  

	* testsuite/lib/libphobos.exp (libphobos_init): Add multi-lib libgcc dirs
	to the ld_library_path var.

Index: libphobos/testsuite/lib/libphobos.exp
===
--- libphobos/testsuite/lib/libphobos.exp	(revision 275320)
+++ libphobos/testsuite/lib/libphobos.exp	(working copy)
@@ -170,6 +170,25 @@ proc libphobos_init { args } {
 	append ld_library_path ":${blddir}/src/.libs"
 }
 
+# Compute what needs to be added to the existing LD_LIBRARY_PATH.
+if {$gccdir != ""} {
+	set compiler ${gccdir}/gdc
+
+	if { [is_remote host] == 0 && [which $compiler] != 0 } {
+	  foreach i "[exec $compiler --print-multi-lib]" {
+	set mldir ""
+	regexp -- "\[a-z0-9=_/\.-\]*;" $i mldir
+	set mldir [string trimright $mldir "\;@"]
+	if { "$mldir" == "." } {
+	  continue
+	}
+	if { [llength [glob -nocomplain ${gccdir}/${mldir}/libgcc_s*.so.*]] >= 1 } {
+	  append ld_library_path ":${gccdir}/${mldir}"
+	}
+	  }
+	}
+}
+
 set_ld_library_path_env_vars
 
 libphobos_maybe_build_wrapper "${objdir}/testglue.o"


Re: [x86 testsuite] preserve full register across main

2019-09-02 Thread Alexandre Oliva
On Aug 24, 2019, Uros Bizjak  wrote:

> Can __attribute__ ((mode (__word__))) be used here?

Oh, nice, yes, thanks!

> Otherwise OK.

Here's what I'm installing.


for  gcc/testsuite/ChangeLog

* gcc.target/i386/20020616-1.c: Preserve full register across
main.
---
 gcc/testsuite/gcc.target/i386/20020616-1.c |   14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/20020616-1.c 
b/gcc/testsuite/gcc.target/i386/20020616-1.c
index 5641826b4837c..48dea27956ce2 100644
--- a/gcc/testsuite/gcc.target/i386/20020616-1.c
+++ b/gcc/testsuite/gcc.target/i386/20020616-1.c
@@ -2,12 +2,16 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 
+/* We need this type to be as wide as the register chosen below, so
+   that, when we preserve it across main, we preserve all of it.  */
+typedef int __attribute__ ((mode (__word__))) reg_type;
+
 #if !__PIC__
-register int k asm("%ebx");
+register reg_type k asm("%ebx");
 #elif __amd64
-register int k asm("%r12");
+register reg_type k asm("%r12");
 #else
-register int k asm("%esi");
+register reg_type k asm("%esi");
 #endif
 
 void __attribute__((noinline))
@@ -18,7 +22,7 @@ foo()
 
 void test()
 {
-  int i;
+  reg_type i;
   for (i = 0; i < 10; i += k)
 {
   k = 0;
@@ -28,7 +32,7 @@ void test()
 
 int main()
 {
-  int old = k;
+  reg_type old = k;
   test();
   k = old;
   return 0;


-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!   FSF.org & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Hongtao Liu
On Mon, Sep 2, 2019 at 6:23 PM Richard Biener
 wrote:
>
> On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu  wrote:
> >
> > > which is not the case with core_cost (and similar with skylake_cost):
> > >
> > >   2, 2, 4,/* cost of moving XMM,YMM,ZMM register */
> > >   {6, 6, 6, 6, 12},/* cost of loading SSE registers
> > >in 32,64,128,256 and 512-bit */
> > >   {6, 6, 6, 6, 12},/* cost of storing SSE registers
> > >in 32,64,128,256 and 512-bit */
> > >   2, 2,/* SSE->integer and integer->SSE moves */
> > >
> > > We have the same cost of moving between integer registers (by default
> > > set to 2), between SSE registers and between integer and SSE register
> > > sets. I think that at least the cost of moves between regsets should
> > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > that would translate to the value of 6. The value should be low enough
> > > to keep the cost below the value that forces move through the memory.
> > > Changing core register allocation cost of SSE <-> integer to:
> > >
> > > --cut here--
> > > Index: config/i386/x86-tune-costs.h
> > > ===
> > > --- config/i386/x86-tune-costs.h(revision 275281)
> > > +++ config/i386/x86-tune-costs.h(working copy)
> > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > >in 32,64,128,256 and 512-bit */
> > >{6, 6, 6, 6, 12},/* cost of storing SSE registers
> > >in 32,64,128,256 and 512-bit */
> > > -  2, 2,/* SSE->integer and
> > > integer->SSE moves */
> > > +  6, 6,/* SSE->integer and
> > > integer->SSE moves */
> > >/* End of register allocator costs.  */
> > >},
> > >
> > > --cut here--
> > >
> > > still produces direct move in gcc.target/i386/minmax-6.c
> > >
> > > I think that in addition to attached patch, values between 2 and 6
> > > should be considered in benchmarking. Unfortunately, without access to
> > > regressed SPEC tests, I can't analyse these changes by myself.
> > >
> > > Uros.
> >
> > Apply similar change to skylake_cost, on skylake workstation we got
> > performance like:
> > ---------------------------------------------------------------
> > version                                     | 548_exchange_r score
> > ---------------------------------------------------------------
> > gcc10_20180822                              | 10
> > apply remove_max8                           | 8.9
> > also apply increase integer_tofrom_sse cost | 9.69
> > ---------------------------------------------------------------
> > Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> > libgfortran.so.5.0.0.
> >
> > I found suspicious code as below, does it have an effect?
>
> This should be fixed after
>
> 2019-08-27  Richard Biener  
>
> * config/i386/i386-features.h
> (general_scalar_chain::~general_scalar_chain): Add.
> (general_scalar_chain::insns_conv): New bitmap.
> (general_scalar_chain::n_sse_to_integer): New.
> (general_scalar_chain::n_integer_to_sse): Likewise.
> (general_scalar_chain::make_vector_copies): Adjust signature.
> * config/i386/i386-features.c
> (general_scalar_chain::general_scalar_chain): Outline,
> initialize new members.
> (general_scalar_chain::~general_scalar_chain): New.
> (general_scalar_chain::mark_dual_mode_def): Record insns
> we need to insert conversions at and count them.
> (general_scalar_chain::compute_convert_gain): Account
> for conversion instructions at chain boundary.
> (general_scalar_chain::make_vector_copies): Generate a single
> copy for a def by a specific insn.
> (general_scalar_chain::convert_registers): First populate
> defs_map, then make copies at out-of chain insns.
>
> where the only ???  is that we have
>
>   const int sse_to_integer; /* cost of moving SSE register to integer.  */
>
> but not integer_to_sse.  In the hard_register sub-struct of processor_cost
Yes.
> we have both:
>
>   const int sse_to_integer; /* cost of moving SSE register to integer.  */
>   const int integer_to_sse; /* cost of moving integer register to SSE. */
>
> IMHO that we have mostly the same kind of costs two times is odd.
They are used for different purposes (one for register allocation, one
for rtx_cost).  Changing the cost for register allocation shouldn't
affect rtx_cost, which is used elsewhere.
> And the compute_convert_gain function adds up apples and oranges.
>
> > --
> > modified   gcc/config/i386/i386-features.c
> > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
> >if (dump_file)
> >  fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> >
> > -  /* ???  W

Re: [RFA][3/3] Remove Cell Broadband Engine SPU targets: libstdc++

2019-09-02 Thread Jonathan Wakely

On 02/09/19 22:19 +0200, Ulrich Weigand wrote:

[RFA][3/3] Remove Cell Broadband Engine SPU targets: libstdc++

Remove all references to spu from the libstdc++ directory.

Note that libstdc++ currently considers "__ea" a reserved word
(because it was for the SPU target), and therefore specifically
avoids using it in include/tr1/ell_integral.tcc.  This patch
removes this workaround (which is not strictly necessary, but
seems the way to go ...).

Tested on s390x-ibm-linux.

OK for mainline?


Yes, OK, thanks.




[PATCH] Use type alignment in get_builtin_sync_mem

2019-09-02 Thread Ulrich Weigand
Hello,

on s390x the 128-bit integer type is only aligned to 8 bytes by default,
but lock-free atomic operations can only be performed on objects
aligned to 16 bytes.  However, we've noticed that GCC sometimes falls
back to library calls *even if* the object is actually 16-byte aligned,
and GCC could easily know this from type alignment information.

However, it turns out that get_builtin_sync_mem *ignores* type alignment
data, and only looks at *mode* alignment and pointer alignment data.
This is a problem since mode alignment doesn't know about user-provided
alignment attributes, and while pointer alignment does sometimes know
about those, this usually only happens with optimization on and also
if the optimizers can trace pointer assignments to the final object.

One important use case where the latter does not happen is with the
C++ atomic classes, because here the object is accessed via the "this"
pointer.  Pointer alignment tracking at this point never knows the
final object, and thus we always get a library call *even though*
libstdc++ has actually marked the class type with the correct alignment
attribute.
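A reduced sketch of that pattern (hypothetical, not taken from the
testsuite):

#include <atomic>

/* libstdc++ gives std::atomic<__int128> the required 16-byte alignment,
   so the type itself carries the alignment guarantee.  */
bool f (std::atomic<__int128> *p)
{
  /* The object is only reachable through a pointer here, just like the
     "this" pointer inside the atomic member functions; pointer alignment
     tracking cannot prove 16-byte alignment, so this became a library
     call even though the type alignment already guarantees it.  */
  return p->load () == 0;
}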

Now one question might be, why does get_pointer_alignment not take
type alignment into account by itself?  This appears to be deliberate
to avoid considering numeric pointer values to be aligned when they
are not for target-specific reasons (e.g. the low bit that indicates
Thumb on ARM).

However, this is not an issue in get_builtin_sync_mem, where we are
actually interested in the alignment of the MEM we're just about to
generate, so it should be fine to check type alignment here.

This patch does just that, fixing the issue we've been seeing.

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich


ChangeLog:

* builtins.c (get_builtin_sync_mem): Respect type alignment.

testsuite/ChangeLog:

* gcc.target/s390/md/atomic_exchange-1.c: Do not use -latomic.
(aligned_int128): New data type.
(ae_128_0): Use it instead of __int128 to ensure proper alignment
for atomic accesses.
(ae_128_1): Likewise.
(g128): Likewise.
(main): Likewise.

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 274142)
+++ gcc/builtins.c  (working copy)
@@ -6001,9 +6001,16 @@
 
   mem = validize_mem (mem);
 
-  /* The alignment needs to be at least according to that of the mode.  */
-  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
-  get_pointer_alignment (loc)));
+  /* The alignment needs to be at least according to that of the mode.
+ Also respect alignment requirements of the type, and alignment
+ info that may be deduced from the expression itself.  */
+  unsigned int align = GET_MODE_ALIGNMENT (mode);
+  if (POINTER_TYPE_P (TREE_TYPE (loc)))
+{
+  unsigned int talign = min_align_of_type (TREE_TYPE (TREE_TYPE (loc)));
+  align = MAX (align, talign * BITS_PER_UNIT);
+}
+  set_mem_align (mem, MAX (align, get_pointer_alignment (loc)));
   set_mem_alias_set (mem, ALIAS_SET_MEMORY_BARRIER);
   MEM_VOLATILE_P (mem) = 1;
 
Index: gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c
===
--- gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c	(revision 274142)
+++ gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c	(working copy)
@@ -1,7 +1,7 @@
 /* Machine description pattern tests.  */
 
 /* { dg-do compile } */
-/* { dg-options "-lpthread -latomic" } */
+/* { dg-options "-lpthread" } */
 /* { dg-do run { target { s390_useable_hw } } } */
 
 /**/
@@ -119,19 +119,21 @@
 /**/
 
 #ifdef __s390x__
+typedef __int128 __attribute__((aligned(16))) aligned_int128;
+
 __int128
-ae_128_0 (__int128 *lock)
+ae_128_0 (aligned_int128 *lock)
 {
   return __atomic_exchange_n (lock, 0, 2);
 }
 
 __int128
-ae_128_1 (__int128 *lock)
+ae_128_1 (aligned_int128 *lock)
 {
   return __atomic_exchange_n (lock, 1, 2);
 }
 
-__int128 g128;
+aligned_int128 g128;
 
 __int128
 ae_128_g_0 (void)
@@ -274,7 +276,7 @@
 
 #ifdef __s390x__
   {
-   __int128 lock;
+   aligned_int128 lock;
__int128 rval;
 
lock = oval;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFA][3/3] Remove Cell Broadband Engine SPU targets: libstdc++

2019-09-02 Thread Ulrich Weigand
[RFA][3/3] Remove Cell Broadband Engine SPU targets: libstdc++

Remove all references to spu from the libstdc++ directory.

Note that libstdc++ currently considers "__ea" a reserved word
(because it was for the SPU target), and therefore specifically
avoids using it in include/tr1/ell_integral.tcc.  This patch
removes this workaround (which is not strictly necessary, but
seems the way to go ...).

Tested on s390x-ibm-linux.

OK for mainline?

Bye,
Ulrich


libstdc++-v3/ChangeLog:

* crossconfig.m4: Remove references to spu.
* configure: Regenerate.
* doc/xml/manual/appendix_contributing.xml: Remove references
to __ea as "badword" for spu.
* doc/html/manual/source_code_style.html: Regenerate.
* include/tr1/ell_integral.tcc (__ellint_rd): Do not attempt
to avoid __ea (as "badword" for spu).
(__ellint_rj): Likewise.

Index: libstdc++-v3/crossconfig.m4
===
--- libstdc++-v3/crossconfig.m4 (revision 275321)
+++ libstdc++-v3/crossconfig.m4 (working copy)
@@ -54,14 +54,6 @@
 AC_DEFINE(HAVE_SQRTF)
 ;;
 
-  spu-*-elf*)
-GLIBCXX_CHECK_COMPILER_FEATURES
-GLIBCXX_CHECK_LINKER_FEATURES
-GLIBCXX_CHECK_MATH_SUPPORT
-GLIBCXX_CHECK_STDLIB_SUPPORT
-AM_ICONV
-;;
-
   *-aix*)
 GLIBCXX_CHECK_LINKER_FEATURES
 GLIBCXX_CHECK_MATH_SUPPORT
Index: libstdc++-v3/doc/html/manual/source_code_style.html
===
--- libstdc++-v3/doc/html/manual/source_code_style.html (revision 275321)
+++ libstdc++-v3/doc/html/manual/source_code_style.html (working copy)
@@ -48,9 +48,6 @@
       _res_ext
       __tg_*
 
-      SPU adds:
-      __ea
-
       For GCC:
 
       [Note that this list is out of date. It applies to the 
old
Index: libstdc++-v3/doc/xml/manual/appendix_contributing.xml
===
--- libstdc++-v3/doc/xml/manual/appendix_contributing.xml   (revision 
275321)
+++ libstdc++-v3/doc/xml/manual/appendix_contributing.xml   (working copy)
@@ -463,9 +463,6 @@
   _res_ext
   __tg_*
 
-  SPU adds:
-  __ea
-
   For GCC:
 
   [Note that this list is out of date. It applies to the old
Index: libstdc++-v3/include/tr1/ell_integral.tcc
===
--- libstdc++-v3/include/tr1/ell_integral.tcc   (revision 275321)
+++ libstdc++-v3/include/tr1/ell_integral.tcc   (working copy)
@@ -370,11 +370,10 @@
   __zn = __c0 * (__zn + __lambda);
 }
 
- // Note: __ea is an SPU badname.
-  _Tp __eaa = __xndev * __yndev;
+  _Tp __ea = __xndev * __yndev;
   _Tp __eb = __zndev * __zndev;
-  _Tp __ec = __eaa - __eb;
-  _Tp __ed = __eaa - _Tp(6) * __eb;
+  _Tp __ec = __ea - __eb;
+  _Tp __ed = __ea - _Tp(6) * __eb;
   _Tp __ef = __ed + __ec + __ec;
   _Tp __s1 = __ed * (-__c1 + __c3 * __ed
/ _Tp(3) - _Tp(3) * __c4 * __zndev * __ef
@@ -381,7 +380,7 @@
/ _Tp(2));
   _Tp __s2 = __zndev
* (__c2 * __ef
-+ __zndev * (-__c3 * __ec - __zndev * __c4 - __eaa));
++ __zndev * (-__c3 * __ec - __zndev * __c4 - __ea));
 
   return _Tp(3) * __sigma + __power4 * (_Tp(1) + __s1 + __s2)
 / (__mu * std::sqrt(__mu));
@@ -634,17 +633,16 @@
   __pn = __c0 * (__pn + __lambda);
 }
 
- // Note: __ea is an SPU badname.
-  _Tp __eaa = __xndev * (__yndev + __zndev) + __yndev * __zndev;
+  _Tp __ea = __xndev * (__yndev + __zndev) + __yndev * __zndev;
   _Tp __eb = __xndev * __yndev * __zndev;
   _Tp __ec = __pndev * __pndev;
-  _Tp __e2 = __eaa - _Tp(3) * __ec;
-  _Tp __e3 = __eb + _Tp(2) * __pndev * (__eaa - __ec);
+  _Tp __e2 = __ea - _Tp(3) * __ec;
+  _Tp __e3 = __eb + _Tp(2) * __pndev * (__ea - __ec);
   _Tp __s1 = _Tp(1) + __e2 * (-__c1 + _Tp(3) * __c3 * __e2 / _Tp(4)
 - _Tp(3) * __c4 * __e3 / _Tp(2));
   _Tp __s2 = __eb * (__c2 / _Tp(2)
+ __pndev * (-__c3 - __c3 + __pndev * __c4));
-  _Tp __s3 = __pndev * __eaa * (__c2 - __pndev * __c3)
+  _Tp __s3 = __pndev * __ea * (__c2 - __pndev * __c3)
- __c2 * __pndev * __ec;
 
   return _Tp(3) * __sigma + __power4 * (__s1 + __s2 + __s3)
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[RFA][2/3] Remove Cell Broadband Engine SPU targets: testsuite

2019-09-02 Thread Ulrich Weigand
[RFA][2/3] Remove Cell Broadband Engine SPU targets: testsuite

Remove all references to spu from the testsuite directory.

Tested on s390x-ibm-linux.

OK for mainline?

(Deleted directories omitted from patch.)

Bye,
Ulrich



gcc/testsuite/ChangeLog:

* lib/compat.exp: Remove references to spu.
* lib/fortran-torture.exp: Likewise.
* lib/gcc-dg.exp: Likewise.
* lib/gfortran.exp: Likewise.
* lib/target-supports.exp: Likewise.
* lib/target-utils.exp: Likewise.

* c-c++-common/torture/complex-sign-add.c: Remove references to spu.
* c-c++-common/torture/complex-sign-mixed-add.c: Likewise.
* c-c++-common/torture/complex-sign-mixed-div.c: Likewise.
* c-c++-common/torture/complex-sign-mixed-mul.c: Likewise.
* c-c++-common/torture/complex-sign-mixed-sub.c: Likewise.
* c-c++-common/torture/complex-sign-mul-minus-one.c: Likewise.
* c-c++-common/torture/complex-sign-mul-one.c: Likewise.
* c-c++-common/torture/complex-sign-mul.c: Likewise.
* c-c++-common/torture/complex-sign-sub.c: Likewise.

* g++.dg/opt/temp1.C: Remove references to spu.
* g++.dg/opt/vt1.C: Likewise.
* g++.dg/torture/type-generic-1.C: Likewise.
* g++.dg/warn/pr30551-2.C: Likewise.
* g++.dg/warn/pr30551.C: Likewise.
* g++.old-deja/g++.jason/thunk2.C: Likewise.
* g++.old-deja/g++.other/comdat5.C: Likewise.
* g++.old-deja/g++.other/local-alloc1.C: Likewise.

* gcc.c-torture/compile/20001226-1.c: Remove references to spu.
* gcc.c-torture/execute/20030222-1.c: Likewise.
* gcc.c-torture/execute/20031003-1.c: Likewise.
* gcc.c-torture/execute/20101011-1.c: Likewise.
* gcc.c-torture/execute/conversion.c: Likewise.
* gcc.c-torture/execute/ieee/compare-fp-4.x: Likewise.
* gcc.c-torture/execute/ieee/fp-cmp-2.x: Likewise.
* gcc.c-torture/execute/ieee/inf-1.c: Likewise.
* gcc.c-torture/execute/ieee/inf-2.c: Likewise.
* gcc.c-torture/execute/ieee/mul-subnormal-single-1.x: Likewise.
* gcc.c-torture/execute/ieee/rbug.c: Likewise.
* gcc.c-torture/execute/pr39228.c: Likewise.
* gcc.c-torture/execute/ieee/20010114-2.x: Remove file.
* gcc.c-torture/execute/ieee/20030331-1.x: Remove file.
* gcc.c-torture/execute/ieee/920518-1.x: Remove file.
* gcc.c-torture/execute/ieee/compare-fp-1.x: Remove file.
* gcc.c-torture/execute/ieee/fp-cmp-4f.x: Remove file.
* gcc.c-torture/execute/ieee/fp-cmp-8f.x: Remove file.

* gcc.dg/20020312-2.c: Remove references to spu.
* gcc.dg/20030702-1.c: Likewise.
* gcc.dg/and-1.c: Likewise.
* gcc.dg/builtin-inf-1.c: Likewise.
* gcc.dg/builtins-1.c: Likewise.
* gcc.dg/builtins-43.c: Likewise.
* gcc.dg/builtins-44.c: Likewise.
* gcc.dg/builtins-45.c: Likewise.
* gcc.dg/float-range-1.c: Likewise.
* gcc.dg/float-range-3.c: Likewise.
* gcc.dg/float-range-4.c: Likewise.
* gcc.dg/float-range-5.c: Likewise.
* gcc.dg/fold-overflow-1.c: Likewise.
* gcc.dg/format/ms_unnamed-1.c: Likewise.
* gcc.dg/format/unnamed-1.c: Likewise.
* gcc.dg/hex-round-1.c: Likewise.
* gcc.dg/hex-round-2.c: Likewise.
* gcc.dg/lower-subreg-1.c: Likewise.
* gcc.dg/nrv3.c: Likewise.
* gcc.dg/pr15784-3.c: Likewise.
* gcc.dg/pr27095.c: Likewise.
* gcc.dg/pr28243.c: Likewise.
* gcc.dg/pr28796-2.c: Likewise.
* gcc.dg/pr30551-3.c: Likewise.
* gcc.dg/pr30551-6.c: Likewise.
* gcc.dg/pr30551.c: Likewise.
* gcc.dg/pr70317.c: Likewise.
* gcc.dg/sms-1.c: Likewise.
* gcc.dg/sms-2.c: Likewise.
* gcc.dg/sms-3.c: Likewise.
* gcc.dg/sms-4.c: Likewise.
* gcc.dg/sms-5.c: Likewise.
* gcc.dg/sms-6.c: Likewise.
* gcc.dg/sms-7.c: Likewise.
* gcc.dg/stack-usage-1.c: Likewise.
* gcc.dg/strlenopt-73.c: Likewise.
* gcc.dg/titype-1.c: Likewise.
* gcc.dg/tls/thr-cse-1.c: Likewise.
* gcc.dg/torture/builtin-attr-1.c: Likewise.
* gcc.dg/torture/builtin-complex-1.c: Likewise.
* gcc.dg/torture/builtin-cproj-1.c: Likewise.
* gcc.dg/torture/builtin-frexp-1.c: Likewise.
* gcc.dg/torture/builtin-ldexp-1.c: Likewise.
* gcc.dg/torture/builtin-logb-1.c: Likewise.
* gcc.dg/torture/builtin-math-2.c: Likewise.
* gcc.dg/torture/builtin-math-5.c: Likewise.
* gcc.dg/torture/builtin-modf-1.c: Likewise.
* gcc.dg/torture/fp-int-convert.h: Likewise.
* gcc.dg/torture/pr25947-1.c: Likewise.
* gcc.dg/torture/type-generic-1.c: Likewise.
* gcc.dg/tree-ssa/20040204-1.c: Likewise.
* gcc.dg/tree-ssa/ivopts-1.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-3.c: Likewise.
* gcc.dg/tree-ssa/vector-6.c: Li

[RFA][1/3] Remove Cell Broadband Engine SPU targets

2019-09-02 Thread Ulrich Weigand
Hello,

as announced here: https://gcc.gnu.org/ml/gcc/2019-04/msg00023.html
we have declared the spu-elf target obsolete in GCC 9 with the goal
of removing support in GCC 10.  Nobody has stepped up to take over
maintainership of the target.

This patch set therefore removes this target and all references
to it from the GCC source tree.  (libstdc++ and testsuite are
done as separate patches, this patch handles everything else.)

Tested on s390x-ibm-linux.

OK for mainline?

(Deleted directories omitted from patch.)

Bye,
Ulrich


ChangeLog:

* MAINTAINERS: Remove spu port maintainers.

contrib/ChangeLog:

* compare-all-tests (all_targets): Remove references to spu.
* config-list.mk (LIST): Likewise.

contrib/header-tools/ChangeLog:

* README: Remove references to spu.
* reduce-headers: Likewise.

libbacktrace/ChangeLog:

* configure.ac: Remove references to spu.
* configure: Regenerate.

libcpp/ChangeLog:

* directives.c: Remove references to spu from comments.
* expr.c: Likewise.

libgcc/ChangeLog:

* config.host: Remove references to spu.
* config/spu/: Remove directory.

gcc/ChangeLog:

* config.gcc: Obsolete spu target.  Remove references to spu.
* configure.ac: Remove references to spu.
* configure: Regenerate.
* config/spu/: Remove directory.
* common/config/spu/: Remove directory.

* doc/extend.texi: Remove references to spu.
* doc/invoke.texi: Likewise.
* doc/md.texi: Likewise.
* doc/sourcebuild.texi: Likewise.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 275321)
+++ MAINTAINERS (working copy)
@@ -109,9 +109,6 @@
 sh portOleg Endo   
 sparc port David S. Miller 
 sparc port Eric Botcazou   
-spu port   Trevor Smigiel  

-spu port   David Edelsohn  
-spu port   Ulrich Weigand  
 tilegx portWalter Lee  
 tilepro port   Walter Lee  
 v850 port  Nick Clifton
Index: contrib/compare-all-tests
===
--- contrib/compare-all-tests   (revision 275321)
+++ contrib/compare-all-tests   (working copy)
@@ -34,7 +34,7 @@
 sh_opts='-m3 -m3e -m4 -m4a -m4al -m4/-mieee -m1 -m1/-mno-cbranchdi -m2a 
-m2a/-mieee -m2e -m2e/-mieee'
 sparc_opts='-mcpu=v8/-m32 -mcpu=v9/-m32 -m64'
 
-all_targets='alpha arm avr bfin cris fr30 frv h8300 ia64 iq2000 m32c m32r m68k 
mcore mips mmix mn10300 pa pdp11 ppc sh sparc spu v850 vax xstormy16 xtensa' # 
e500 
+all_targets='alpha arm avr bfin cris fr30 frv h8300 ia64 iq2000 m32c m32r m68k 
mcore mips mmix mn10300 pa pdp11 ppc sh sparc v850 vax xstormy16 xtensa' # e500 
 
 test_one_file ()
 {
Index: contrib/config-list.mk
===
--- contrib/config-list.mk  (revision 275321)
+++ contrib/config-list.mk  (working copy)
@@ -90,7 +90,7 @@
   sparc-leon3-linux-gnuOPT-enable-target=all sparc-netbsdelf \
   
sparc64-sun-solaris2.11OPT-with-gnu-ldOPT-with-gnu-asOPT-enable-threads=posix \
   sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \
-  sparc64-netbsd sparc64-openbsd spu-elf \
+  sparc64-netbsd sparc64-openbsd \
   tilegx-linux-gnu tilegxbe-linux-gnu tilepro-linux-gnu \
   v850e-elf v850-elf v850-rtems vax-linux-gnu \
   vax-netbsdelf vax-openbsd visium-elf x86_64-apple-darwin \
Index: contrib/header-tools/README
===
--- contrib/header-tools/README (revision 275321)
+++ contrib/header-tools/README (working copy)
@@ -203,7 +203,7 @@
   these targets.  They are also known to the tool.  When building targets it
   will check those targets before the rest.  
   This coverage can be achieved by building config-list.mk with :
-  LIST="aarch64-linux-gnu arm-netbsdelf c6x-elf epiphany-elf hppa2.0-hpux10.1 
i686-mingw32crt i686-pc-msdosdjgpp mipsel-elf powerpc-eabisimaltivec 
rs6000-ibm-aix5.1.0 sh-superh-elf sparc64-elf spu-elf"
+  LIST="aarch64-linux-gnu arm-netbsdelf c6x-elf epiphany-elf hppa2.0-hpux10.1 
i686-mingw32crt i686-pc-msdosdjgpp mipsel-elf powerpc-eabisimaltivec 
rs6000-ibm-aix5.1.0 sh-superh-elf sparc64-elf"
 
   -b specifies the native bootstrapped build root directory
   -t specifies a target build root directory that config-list.mk was run from
Index: contrib/header-tools/reduce-headers
===
--- contrib/header-tools/reduce-headers (revision 275321)
+++ contrib/header-tools/reduce-headers (working copy)
@@ -32,8 +32,7 @@
 "powerpc-eabisimaltivec",
 "rs6000-ibm-aix5.1.0",
 "sh-superh-elf",
-"sparc64-elf",
-"spu-elf"
+"sparc64-elf"
 ]
 
 
Index: libbacktrace/co

Re: [ARM/FDPIC v5 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2019-09-02 Thread Christophe Lyon
On Mon, 2 Sep 2019 at 18:12, Richard Sandiford
 wrote:
>
> Sorry for the slow reply.
>
> Christophe Lyon  writes:
> > On 16/07/2019 13:58, Richard Sandiford wrote:
> >> Christophe Lyon  writes:
> >>> +(define_insn "*restore_pic_register_after_call"
> >>> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")
> >>> +  (match_operand:SI 1 "nonimmediate_operand" "r,m")]
> >>> +  UNSPEC_PIC_RESTORE)
> >>> + (use (match_dup 1))
> >>> + (clobber (match_dup 0))])
> >>> +  ]
> >>> +  ""
> >>> +  "@
> >>> +  mov\t%0, %1
> >>> +  ldr\t%0, %1"
> >>> +)
> >>> +
> >>>   (define_expand "call_internal"
> >>> [(parallel [(call (match_operand 0 "memory_operand" "")
> >>> (match_operand 1 "general_operand" ""))
> >>
> >> Since operand 0 is significant after the instruction, I think this
> >> should be:
> >>
> >> (define_insn "*restore_pic_register_after_call"
> >>[(set (match_operand:SI 0 "s_register_operand" "+r,r")
> >>  (unspec:SI [(match_dup 0)
> >>  (match_operand:SI 1 "nonimmediate_operand" "r,m")]
> >> UNSPEC_PIC_RESTORE))]
> >>...
> >>
> >> The (use (match_dup 1)) looks redundant, since the unspec itself
> >> uses operand 1.
> >>
> > When I try that, I have cases where the restore instruction is discarded, 
> > when the call happens just before function return. Since r9 is 
> > caller-saved, it should be restored but after dse2 the dumps say:
> > (insn (set (reg:SI 9 r9)
> > (unspec:SI [
> > (reg:SI 9 r9)
> > (reg:SI 4 r4 [121])
> >   ] UNSPEC_PIC_RESTORE))
> > (expr_list:REG_UNUSED (reg:SI 9 r9) (nil
> >
> > and this is later removed by cprop_hardreg (which says the exit block uses 
> > r4, sp, and lr: should I make it use r9?)
>
> But if it's caller-saved (i.e. call-clobbered), function A shouldn't
> need to restore r9 after a call unless A needs the value of r9 for
> something.  I.e. A shouldn't need to restore r9 for A's own caller,
> because the caller should be doing that iself.
>
> So if r9 is caller-saved and not referenced between the call and
> function exit, deleting the restore sounds like the right thing to do.
>

Of course! I should have found that myself: I tried this change before I removed
an "optimization" we had that avoided restoring r9 before calling
functions in the same module... thus breaking the ABI.

Since the previous patch I sent didn't have this "optimization", the
above now works.
New patch attached.

Thanks!

Christophe


> Richard
commit 7b606c39834c5fefbde30a3131f9b123d02e7491
Author: Christophe Lyon 
Date:   Thu Feb 8 11:10:51 2018 +0100

[ARM] FDPIC: Add support for FDPIC for arm architecture

The FDPIC register is hard-coded to r9, as defined in the ABI.

We have to disable tailcall optimizations if we don't know if the
target function is in the same module. If not, we have to set r9 to
the value associated with the target module.

When generating a symbol address, we have to take into account whether
it is a pointer to data or to a function, because different
relocations are needed.

2019-XX-XX  Christophe Lyon  
	Mickaël Guêné 

	gcc/
	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
	in FDPIC mode.
	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
	new function.
	* config/arm/arm.c (arm_option_override): Define pic register to
	FDPIC_REGNUM.
	(arm_function_ok_for_sibcall): Disable sibcall optimization if we
	have no decl or go through PLT.
	(calculate_pic_address_constant): New function.
	(legitimize_pic_address): Call calculate_pic_address_constant.
	(arm_load_pic_register): Handle TARGET_FDPIC.
	(arm_is_segment_info_known): New function.
	(arm_pic_static_addr): Add support for FDPIC.
	(arm_load_function_descriptor): New function.
	(arm_emit_call_insn): Add support for FDPIC.
	(arm_assemble_integer): Add support for FDPIC.
	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
	Define. (FDPIC_REGNUM): New define.
	* config/arm/arm.md (call): Add support for FDPIC.
	(call_value): Likewise.
	(restore_pic_register_after_call): New pattern.
	(untyped_call): Disable if FDPIC.
	(untyped_return): Likewise.
	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

	gcc/testsuite/
	* gcc.target/arm/fp16-aapcs-2.c: Adjust scan-assembler-times.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.

Change-Id: I1e96d260074ab7b75d36cdff5d34ad898f35c66f

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6e256ee..34695fa 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   builtin_define ("__ARM_EABI__");
 }
 
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
+
   def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
  

Re: [PATCH][AArch64] Add support for missing CPUs

2019-09-02 Thread James Greenhalgh
On Thu, Aug 22, 2019 at 12:03:33PM +0100, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 8/21/19 10:27 AM, Dennis Zhang wrote:
> > Hi all,
> >
> > This patch adds '-mcpu' options for following CPUs:
> > Cortex-A77, Cortex-A76AE, Cortex-A65, Cortex-A65AE, and Cortex-A34.
> >
> > Related specifications are as following:
> > https://developer.arm.com/ip-products/processors/cortex-a
> >
> > Bootstrapped/regtested for aarch64-none-linux-gnu.
> >
> > Please help check whether it's ready.
> >
> This looks ok to me but you'll need maintainer approval.

At this point Kyrill, I fully trust your OK without looking at the
patch in any more detail...

I think at Cauldron we ought to add some time during the Arm/AArch64 BoF
to discuss what the community would like us to do about maintainership in
AArch64. It seems clear to me that I'm just slowing you and others down now
by rubberstamping your decisions.

To be clear, this particular patch is OK for trunk - but I think it is
time to have a conversation about how we can make this experience easier
for everyone.

Thanks,
James

> 
> Thanks,
> 
> Kyrill
> 
> 
> > Many thanks!
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2019-08-21  Dennis Zhang  
> >
> >     * config/aarch64/aarch64-cores.def (AARCH64_CORE): New entries
> >     for Cortex-A77, Cortex-A76AE, Cortex-A65, Cortex-A65AE, and
> >     Cortex-A34.
> >     * config/aarch64/aarch64-tune.md: Regenerated.
> >     * doc/invoke.texi: Document the new processors.


[PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.

2019-09-02 Thread Barnaby Wilks
Hello,

This patch introduces an optimization for narrowing binary and builtin
math operations to the smallest type when unsafe math optimizations are
enabled (typically -Ofast or -ffast-math).

Consider the example:

   float f (float x) {
 return 1.0 / sqrt (x);
   }

   f:
 fcvt   d0, s0
 fmov   d1, 1.0e+0
 fsqrt  d0, d0
 fdiv   d0, d1, d0
 fcvt   s0, d0
 ret

Given that all outputs are of float type, we can do the whole 
calculation in single precision and avoid any potentially expensive 
conversions between single and double precision.

That is, the expression would end up looking more like

   float f (float x) {
 return 1.0f / sqrtf (x);
   }

   f:
 fsqrt  s0, s0
 fmov   s1, 1.0e+0
 fdiv   s0, s1, s0
 ret

This optimization narrows casts around math builtins, and also stops
looking for the widest type for calculations when processing binary
math operations (if unsafe math optimizations are enabled).
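For the binary-operation side, a sketch of the intended effect
(illustrative):

float g (float x, float y)
{
  return (double) x + (double) y;
}

With -ffast-math the addition can now be performed directly in single
precision instead of widening both operands to double and truncating the
sum afterwards.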

Added tests to verify that narrower math builtins are chosen and
no unnecessary casts are introduced when appropriate.

Bootstrapped and regtested on aarch64 and x86_64 with no regressions.

I don't have write access, so if OK for trunk then can someone commit on 
my behalf?

Regards,
Barney

gcc/ChangeLog:

2019-09-02  Barnaby Wilks  

* builtins.c (mathfn_built_in): Expose find implicit builtin parameter.
* builtins.h (mathfn_built_in): Likewise.
* match.pd: Add expressions for simplifying builtin and binary
math expressions.

gcc/testsuite/ChangeLog:

2019-09-02  Barnaby Wilks  

* gcc.dg/fold-single-precision.c: New test.
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 
1ffb491d7850366c74bd694bf9e1c277bcde1da9..5cd02af3be55b041918ad6f1a44d5520f5689fee
 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -108,6 +108,7 @@ extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
 extern tree mathfn_built_in (tree, enum built_in_function fn);
+extern tree mathfn_built_in (tree, enum built_in_function fn, bool implicit);
 extern tree mathfn_built_in (tree, combined_fn);
 extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
 extern rtx builtin_memset_read_str (void *, HOST_WIDE_INT, scalar_int_mode);
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 
695a9d191af4c4922351e3e59601a87b3fedda5c..6cfd7f4af54110fec9f53ddaf71408e7efc329da
 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -2137,6 +2137,12 @@ mathfn_built_in (tree type, enum built_in_function fn)
   return mathfn_built_in_1 (type, as_combined_fn (fn), /*implicit=*/ 1);
 }
 
+tree
+mathfn_built_in (tree type, enum built_in_function fn, bool implicit)
+{
+  return mathfn_built_in_1 (type, as_combined_fn (fn), implicit);
+}
+
 /* If BUILT_IN_NORMAL function FNDECL has an associated internal function,
return its code, otherwise return IFN_LAST.  Note that this function
only tests whether the function is defined in internals.def, not whether
diff --git a/gcc/match.pd b/gcc/match.pd
index 
0317bc704f771f626ab72189b3a54de00087ad5a..3562548de3ebcb986da20986b868d9a3d318c4ee
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5004,10 +5004,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  && newtype == type
  && types_match (newtype, type))
(op (convert:newtype @1) (convert:newtype @2))
-   (with { if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
+   (with
+ {
+   if (!flag_unsafe_math_optimizations)
+ {
+   if (TYPE_PRECISION (ty1) > TYPE_PRECISION (newtype))
  newtype = ty1;
+
if (TYPE_PRECISION (ty2) > TYPE_PRECISION (newtype))
- newtype = ty2; }
+ newtype = ty2;
+ }
+ }
+
   /* Sometimes this transformation is safe (cannot
  change results through affecting double rounding
  cases) and sometimes it is not.  If NEWTYPE is
@@ -5654,3 +5662,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
  @0)
+
+/* Convert expressions of the form
+   (x) math_call1 ((y) z) where (x) and z are the same type, into
+   math_call2 (z), where math_call2 is the math builtin for
+   type x.  Type x (and therefore type of z) must be a lower precision
+   than y/math_call1.  */
+(if (flag_unsafe_math_optimizations && !flag_errno_math)
+  (for op (COSH EXP EXP10 EXP2 EXPM1 GAMMA J0 J1 LGAMMA
+  POW10 SINH TGAMMA Y0 Y1 ACOS ACOSH ASIN ASINH
+  ATAN ATANH CBRT COS ERF ERFC LOG LOG10 LOG2
+  LOG1P SIN TAN TANH SQRT FABS LOGB)
+(simplify
+  (convert (op@0 (convert@1 @2)))
+   (if (SCALAR_FLOAT_TYPE_P (type) && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1))
+ && SCALAR_FLOAT_TYPE_P (TREE_TYPE (@2))
+

Re: [PATCH][AArch64] Add Linux hwcap strings for some extensions

2019-09-02 Thread James Greenhalgh
On Fri, Aug 23, 2019 at 05:42:30PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch adds feature strings for some of the extensions. This string 
> is what is read from /proc/cpuinfo on Linux systems
> and used during -march=native detection.
> 
> The strings are taken from the kernel source tree at:
> https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinfo.c#L45
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

OK.

Thanks,
James

> Thanks,
> Kyrill
> 
> 2019-08-23  Kyrylo Tkachov  
> 
>      * config/aarch64/aarch64-option-extensions.def (sb): Add feature
>      string.
>      (ssbs): Likewise.
>      (sve2): Likewise.
>      (sve2-sm4): Likewise.
>      (sveaes): Likewise.
>      (svesha3): Likewise.
>      (svebitperm): Likewise.
> 


Re: [PATCH][AArch64] Add support for __jcvt intrinsic

2019-09-02 Thread James Greenhalgh
On Mon, Sep 02, 2019 at 01:16:32PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch implements the __jcvt ACLE intrinsic [1] that maps down to 
> the FJCVTZS [2] instruction from Armv8.3-a.
> No fancy mode iterators or nothing. Just a single builtin, UNSPEC and 
> define_insn and the associate plumbing.
> This patch also defines __ARM_FEATURE_JCVT to indicate when the 
> intrinsic is available.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

OK.

Thanks,
James

> Thanks,
> Kyrill
> 
> [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
> [2] 
> https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero
> 
> 2019-09-02  Kyrylo Tkachov  
> 
>      * config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
>      (aarch64_fjcvtzs): New define_insn.
>      * config/aarch64/aarch64.h (TARGET_JSCVT): Define.
>      * config/aarch64/aarch64-builtins.c (aarch64_builtins):
>      Add AARCH64_JSCVT.
>      (aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
>      (aarch64_expand_builtin): Handle AARCH64_JSCVT.
>      * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>      __ARM_FEATURE_JCVT where appropriate.
>      * config/aarch64/arm_acle.h (__jcvt): Define.
> 
> 2019-09-02  Kyrylo Tkachov  
> 
>      * gcc.target/aarch64/acle/jcvt_1.c: New test.
> 
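(For readers unfamiliar with the intrinsic, usage is roughly the following
sketch, based on the ACLE document linked above rather than on the patch
itself:

#include <arm_acle.h>

int32_t
f (double x)
{
  return __jcvt (x);   /* maps to a single FJCVTZS instruction */
}

The intrinsic is only declared when __ARM_FEATURE_JCVT is defined.)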


Re: [PATCH] builtin fadd variants implementation

2019-09-02 Thread Joseph Myers
On Mon, 2 Sep 2019, Tejas Joshi wrote:

> Hello.
> Should a result like 1.4 be considered as inexact if truncating
> (narrowing?) from double to float? (due to loss of trailing bits)

If the mathematical result of the arithmetic operation is literally the 
decimal number 1.4, as opposed to the double value represented by the C 
constant 1.4 (which is actually 0x1.6666666666666p+0), then it is inexact
regardless of the (non-decimal) types involved.  For example, fdiv (7, 5), 
ddivl (7, 5), etc. are always inexact.

If the mathematical result of the arithmetic operation is 
0x1.6666666666666p+0, the closest approximation to 1.4 in IEEE binary64,
then it is inexact for result formats narrower than binary64 and exact for 
result formats that can represent that value.  For example, fadd (1.4, 
0.0) is inexact (the truncation to float is inexact although the addition 
is exact).  But daddl (1.4, 0.0) - note the arguments are double 
constants, not long double - is exact, because the mathematical result is 
exactly representable in double.  Whereas daddl (1.4L, 0.0L) would be 
inexact if long double is wider than double.

The question is always whether the infinite-precision mathematical result 
of the arithmetic operation - which takes values representable in its 
argument types - is exactly representable in the final result type.
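In code, assuming a libm that provides the TS 18661-1/C2x narrowing
functions (illustrative):

#include <math.h>

void examples (void)
{
  float a = fadd (1.4, 0.0);    /* addition exact; truncation to float
                                   raises "inexact" */
  double b = daddl (1.4, 0.0);  /* double arguments, result exactly
                                   representable in double: exact */
  (void) a; (void) b;
}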

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions

2019-09-02 Thread Ilya Leoshkevich
> On 02.09.2019 at 12:37, Richard Biener wrote:
> 
> On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich  wrote:
>> 
>>> On 30.08.2019 at 16:40, Ilya Leoshkevich wrote:
>>> 
 On 30.08.2019 at 09:12, Richard Biener wrote:
 
 On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich  
 wrote:
> 
>> On 22.08.2019 at 15:45, Ilya Leoshkevich wrote:
>> 
>> Bootstrap and regtest running on x86_64-redhat-linux and
>> s390x-redhat-linux.
>> 
>> This patch series adds signaling FP comparison support (both scalar and
>> vector) to s390 backend.
> 
> I'm running into a problem on ppc64 with this patch, and it would be
> great if someone could help me figure out the best way to resolve it.
> 
> vector36.C test is failing because gimplifier produces the following
> 
> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> 
> from
> 
> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>{ -1, -1, -1, -1 } ,
>{ 0, 0, 0, 0 } >
> 
> Since the comparison tree code is now hidden behind a temporary, my code
> does not have anything to pass to the backend.  The reason for creating
> a temporary is that the comparison can trap, and so the following check
> in gimplify_expr fails:
> 
> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
>  goto out;
> 
> gimple_test_f is is_gimple_condexpr, and it eventually calls
> operation_could_trap_p (GT).
> 
> My current solution is to simply state that backend does not support
> SSA_NAME in vector comparisons, however, I don't like it, since it may
> cause performance regressions due to having to fall back to scalar
> comparisons.
> 
> I was thinking about two other possible solutions:
> 
> 1. Change the gimplifier to allow trapping vector comparisons.  That's
> a bit complicated, because tree_could_throw_p checks not only for
> floating point traps, but also e.g. for array index out of bounds
> traps.  So I would have to create a tree_could_throw_p version which
> disregards specific kinds of traps.
> 
> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
> its tree_code instead of SSA_NAME.  The potential problem I see with
> this is that there appears to be no guarantee that _5 will be inlined
> into _6 at a later point.  So if we say that we don't need to fall
> back to scalar comparisons based on availability of vector >
> instruction and inlining does not happen, then what's actually will
> be required is vector selection (vsel on S/390), which might not be
> available in general case.
> 
> What would be a better way to proceed here?
 
 On GIMPLE there isn't a good reason to split out trapping comparisons
 from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
 where it is important because we'd have no way to represent EH info
 when not done.  It might be a bit awkward to preserve EH across RTL
 expansion though in case the [VEC_]COND_EXPR are not expanded
 as a single pattern, but I'm not sure.
>>> 
>>> Ok, so I'm testing the following now - for the problematic test that
>>> helped:
>>> 
>>> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
>>> index b0c9f9b671a..940aa394769 100644
>>> --- a/gcc/gimple-expr.c
>>> +++ b/gcc/gimple-expr.c
>>> @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
>>>|| TREE_CODE (t) == BIT_FIELD_REF);
>>> }
>>> 
>>> -/*  Return true if T is a GIMPLE condition.  */
>>> +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr.  */
>>> 
>>> -bool
>>> -is_gimple_condexpr (tree t)
>>> +static bool
>>> +is_gimple_condexpr_1 (tree t, bool allow_traps)
>>> {
>>>  return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
>>> - && !tree_could_throw_p (t)
>>> + && (allow_traps || !tree_could_throw_p (t))
>>>  && is_gimple_val (TREE_OPERAND (t, 0))
>>>  && is_gimple_val (TREE_OPERAND (t, 1))));
>>> }
>>> 
>>> +/*  Return true if T is a GIMPLE condition.  */
>>> +
>>> +bool
>>> +is_gimple_condexpr (tree t)
>>> +{
>>> +  return is_gimple_condexpr_1 (t, false);
>>> +}
>>> +
>>> +/* Like is_gimple_condexpr, but allow the T to trap.  */
>>> +
>>> +bool
>>> +is_possibly_trapping_gimple_condexpr (tree t)
>>> +{
>>> +  return is_gimple_condexpr_1 (t, true);
>>> +}
>>> +
>>> /* Return true if T is a gimple address.  */
>>> 
>>> bool
>>> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
>>> index 1ad1432bd17..20546ca5b99 100644
>>> --- a/gcc/gimple-expr.h
>>> +++ b/gcc/gimple-expr.h
>>> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum 
>>> tree_code *, tree *,
>>> tree *);
>>>

Re: [ARM/FDPIC v5 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2019-09-02 Thread Richard Sandiford
Sorry for the slow reply.

Christophe Lyon  writes:
> On 16/07/2019 13:58, Richard Sandiford wrote:
>> Christophe Lyon  writes:
>>> +(define_insn "*restore_pic_register_after_call"
>>> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")
>>> +  (match_operand:SI 1 "nonimmediate_operand" "r,m")]
>>> +  UNSPEC_PIC_RESTORE)
>>> + (use (match_dup 1))
>>> + (clobber (match_dup 0))])
>>> +  ]
>>> +  ""
>>> +  "@
>>> +  mov\t%0, %1
>>> +  ldr\t%0, %1"
>>> +)
>>> +
>>>   (define_expand "call_internal"
>>> [(parallel [(call (match_operand 0 "memory_operand" "")
>>> (match_operand 1 "general_operand" ""))
>> 
>> Since operand 0 is significant after the instruction, I think this
>> should be:
>> 
>> (define_insn "*restore_pic_register_after_call"
>>[(set (match_operand:SI 0 "s_register_operand" "+r,r")
>>  (unspec:SI [(match_dup 0)
>>  (match_operand:SI 1 "nonimmediate_operand" "r,m")]
>> UNSPEC_PIC_RESTORE))]
>>...
>> 
>> The (use (match_dup 1)) looks redundant, since the unspec itself
>> uses operand 1.
>> 
> When I try that, I have cases where the restore instruction is discarded, 
> when the call happens just before function return. Since r9 is caller-saved, 
> it should be restored but after dse2 the dumps say:
> (insn (set (reg:SI 9 r9)
> (unspec:SI [
> (reg:SI 9 r9)
> (reg:SI 4 r4 [121])
>   ] UNSPEC_PIC_RESTORE))
> (expr_list:REG_UNUSED (reg:SI 9 r9) (nil
>
> and this is later removed by cprop_hardreg (which says the exit block uses 
> r4, sp, and lr: should I make it use r9?)

But if it's caller-saved (i.e. call-clobbered), function A shouldn't
need to restore r9 after a call unless A needs the value of r9 for
something.  I.e. A shouldn't need to restore r9 for A's own caller,
because the caller should be doing that iself.

So if r9 is caller-saved and not referenced between the call and
function exit, deleting the restore sounds like the right thing to do.

Richard


Re: [PATCH] S/390: Fix failing RTL check in s390_canonicalize_comparison

2019-09-02 Thread Andreas Krebbel
On 02.09.19 16:46, Ilya Leoshkevich wrote:
> Bootstrap and regtest running on s390x-redhat-linux.
> 
> The new sigfpe-eh.c fails with
> 
> internal compiler error: RTL check: expected elt 0 type 'e' or 'u', have 
> 'w' (rtx const_int)
> 
> This is most likely due to a typo: XEXP (*op1, 0) was used, when
> XEXP (*op0, 1) was intended.  This did not cause any user-visible
> problems, because reversed_comparison_code_parts ignores the
> respective argument, and the release compiler is built without RTL
> checks.
> 
> gcc/ChangeLog:
> 
> 2019-09-02  Ilya Leoshkevich  
> 
>   * config/s390/s390.c (s390_canonicalize_comparison): Use XEXP
>   (*op0, 1) instead of XEXP (*op1, 0).
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-09-02  Ilya Leoshkevich  
> 
>   * gcc.target/s390/sigfpe-eh.c: New test.

Ok. Thanks!

Andreas



[PATCH] S/390: Fix failing RTL check in s390_canonicalize_comparison

2019-09-02 Thread Ilya Leoshkevich
Bootstrap and regtest running on s390x-redhat-linux.

The new sigfpe-eh.c fails with

internal compiler error: RTL check: expected elt 0 type 'e' or 'u', have 
'w' (rtx const_int)

This is most likely due to a typo: XEXP (*op1, 0) was used, when
XEXP (*op0, 1) was intended.  This did not cause any user-visible
problems, because reversed_comparison_code_parts ignores the
respective argument, and the release compiler is built without RTL
checks.

gcc/ChangeLog:

2019-09-02  Ilya Leoshkevich  

* config/s390/s390.c (s390_canonicalize_comparison): Use XEXP
(*op0, 1) instead of XEXP (*op1, 0).

gcc/testsuite/ChangeLog:

2019-09-02  Ilya Leoshkevich  

* gcc.target/s390/sigfpe-eh.c: New test.
---
 gcc/config/s390/s390.c|  2 +-
 gcc/testsuite/gcc.target/s390/sigfpe-eh.c | 10 ++
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/sigfpe-eh.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index fa17d7d5d08..24784266848 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1783,7 +1783,7 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx 
*op1,
   if (*code == EQ)
new_code = reversed_comparison_code_parts (GET_CODE (*op0),
   XEXP (*op0, 0),
-  XEXP (*op1, 0), NULL);
+  XEXP (*op0, 1), NULL);
   else
new_code = GET_CODE (*op0);
 
diff --git a/gcc/testsuite/gcc.target/s390/sigfpe-eh.c 
b/gcc/testsuite/gcc.target/s390/sigfpe-eh.c
new file mode 100644
index 000..52b0bf39d9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/sigfpe-eh.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=z196 -O2 -fexceptions -fnon-call-exceptions" } */
+
+extern float f (void);
+extern float g (void);
+
+float h (float x, float y)
+{
+  return x < y ? f () : g ();
+}
-- 
2.21.0



Re: [PATCH] [AARCH64] Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2019-09-02 Thread Kyrill Tkachov

Hi Shaokun

On 8/31/19 8:12 AM, Shaokun Zhang wrote:

The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
Let's support the two bits: if they are set, the CPU core will
not execute the unnecessary DCache clean or ICache invalidation
instructions.

2019-08-31  Shaokun Zhang 

    * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC in
__aarch64_sync_cache_range function.


Sorry, I just tried compiling this to look at the assembly output. I 
think there's a couple of issues...




---
 libgcc/config/aarch64/sync-cache.c | 56 
--

 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/aarch64/sync-cache.c 
b/libgcc/config/aarch64/sync-cache.c

index 791f5e42ff44..0b057efbdcab 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along 
with this program;

 see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 . */

+#define CTR_IDC_SHIFT   28
+#define CTR_DIC_SHIFT   29
+
 void __aarch64_sync_cache_range (const void *, const void *);

 void
@@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, 
const void *end)

   icache_lsize = 4 << (cache_info & 0xF);
   dcache_lsize = 4 << ((cache_info >> 16) & 0xF);

-  /* Loop over the address range, clearing one cache line at once.
- Data cache must be flushed to unification first to make sure the
- instruction cache fetches the updated data.  'end' is exclusive,
- as per the GNU definition of __clear_cache.  */
+  /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification is
+     not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) {



By the C precedence rules, == binds more tightly than &, so this parses as
(cache_info >> CTR_IDC_SHIFT) & (0x1 == 0x0), i.e. as ... & 0, which always
evaluates to 0 and the whole path is eliminated. What you want here is:


  if (((cache_info >> CTR_IDC_SHIFT) & 0x1) == 0x0)



+    /* Loop over the address range, clearing one cache line at once.
+   Data cache must be flushed to unification first to make sure the
+   instruction cache fetches the updated data.  'end' is exclusive,
+   as per the GNU definition of __clear_cache.  */

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));

-  for (; address < (const char *) end; address += dcache_lsize)
-    asm volatile ("dc\tcvau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += dcache_lsize)
+  asm volatile ("dc\tcvau, %0"
+   :
+   : "r" (address)
+   : "memory");
+  }

   asm volatile ("dsb\tish" : : : "memory");

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
+  /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
+     Unification is not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) {


Same here, this should be:

  if (((cache_info >> CTR_DIC_SHIFT) & 0x1) == 0x0)


Thanks,

Kyrill


+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));

-  for (; address < (const char *) end; address += icache_lsize)
-    asm volatile ("ic\tivau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += icache_lsize)
+  asm volatile ("ic\tivau, %0"
+   :
+   : "r" (address)
+   : "memory");

-  asm volatile ("dsb\tish; isb" : : : "memory");
+    asm volatile ("dsb\tish" : : : "memory");
+  }
+  asm volatile("isb" : : : "memory")
 }
--
2.7.4



Bunch of location improvements

2019-09-02 Thread Paolo Carlini

Hi,

all should be more or less straightforward. I also propose to use an 
additional range for that error message about constinit && constexpr 
mentioned to Marek a few days ago. Tested x86_64-linux.


Thanks, Paolo.

/

/cp
2019-09-02  Paolo Carlini  

* decl.c (has_designator_problem): Use cp_expr_loc_or_input_loc
in error_at.
(build_enumerator): Likewise.
(cp_finish_decl): Use DECL_SOURCE_LOCATION.
(grokdeclarator): Use id_loc in two error_at; change the error
message about constinit together with constexpr to use two ranges.

/testsuite
2019-09-02  Paolo Carlini  

* g++.dg/cpp0x/enum29.C: Test location(s) too.
* g++.dg/cpp0x/lambda/lambda-ice10.C: Likewise.
* g++.dg/cpp2a/constinit3.C: Likewise.
* g++.dg/ext/desig4.C: Likewise.
* g++.dg/ext/label10.C: Likewise.
* g++.old-deja/g++.other/dtor3.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 275318)
+++ cp/decl.c   (working copy)
@@ -6108,8 +6108,9 @@ has_designator_problem (reshape_iter *d, tsubst_fl
   if (d->cur->index)
 {
   if (complain & tf_error)
-   error ("C99 designator %qE outside aggregate initializer",
-  d->cur->index);
+   error_at (cp_expr_loc_or_input_loc (d->cur->index),
+ "C99 designator %qE outside aggregate initializer",
+ d->cur->index);
   else
return true;
 }
@@ -7282,8 +7283,9 @@ cp_finish_decl (tree decl, tree init, bool init_co
   if ((flags & LOOKUP_CONSTINIT)
  && !(dk == dk_thread || dk == dk_static))
{
- error ("% can only be applied to a variable with static "
-"or thread storage duration");
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "% can only be applied to a variable with "
+   "static or thread storage duration");
  return;
}
 
@@ -10622,8 +10624,9 @@ grokdeclarator (const cp_declarator *declarator,
 && !uniquely_derived_from_p (ctype,
  current_class_type))
  {
-   error ("invalid use of qualified-name %<%T::%D%>",
-  qualifying_scope, decl);
+   error_at (id_declarator->id_loc,
+ "invalid use of qualified-name %<%T::%D%>",
+ qualifying_scope, decl);
return error_mark_node;
  }
  }
@@ -10810,8 +10813,9 @@ grokdeclarator (const cp_declarator *declarator,
  keywords shall appear in a decl-specifier-seq."  */
   if (constinit_p && constexpr_p)
 {
-  error_at (min_location (declspecs->locations[ds_constinit],
- declspecs->locations[ds_constexpr]),
+  gcc_rich_location richloc (declspecs->locations[ds_constinit]);
+  richloc.add_range (declspecs->locations[ds_constexpr]);
+  error_at (&richloc,
"can use at most one of the % and % "
"specifiers");
   return error_mark_node;
@@ -11815,7 +11819,8 @@ grokdeclarator (const cp_declarator *declarator,
&& inner_declarator->u.id.sfk == sfk_destructor
&& arg_types != void_list_node)
  {
-   error ("destructors may not have parameters");
+   error_at (declarator->id_loc,
+ "destructors may not have parameters");
arg_types = void_list_node;
parms = NULL_TREE;
  }
@@ -15155,8 +15160,9 @@ build_enumerator (tree name, tree value, tree enum
  if (! INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P
  (TREE_TYPE (value)))
{
- error ("enumerator value for %qD must have integral or "
-"unscoped enumeration type", name);
+ error_at (cp_expr_loc_or_input_loc (value),
+   "enumerator value for %qD must have integral or "
+   "unscoped enumeration type", name);
  value = NULL_TREE;
}
  else
Index: testsuite/g++.dg/cpp0x/enum29.C
===
--- testsuite/g++.dg/cpp0x/enum29.C (revision 275318)
+++ testsuite/g++.dg/cpp0x/enum29.C (working copy)
@@ -38,7 +38,7 @@ enum E0 { e0 = X0() };
 enum E1 { e1 = X1() };
 enum E2 { e2 = X2() };
 enum E3 { e3 = X3() };
-enum E4 { e4 = X4() };  // { dg-error "integral" }
+enum E4 { e4 = X4() };  // { dg-error "16:enumerator value for .e4. must have integral" }
 enum E5 { e5 = X5() };  // { dg-error "ambiguous" }
 
 enum F0 : int { f0 = X0() };
Index: testsuite/g++.dg/cpp0x/lambda/lambda-ice10.C
===

Re: [PATCH] [AARCH64] Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2019-09-02 Thread Kyrill Tkachov

Hi Shaokun,

On 8/31/19 8:12 AM, Shaokun Zhang wrote:

The DCache clean & ICache invalidation requirements for instructions
to be data coherent are discoverable through new fields in CTR_EL0.
Let's support the two bits: if they are enabled, the CPU core will
not execute the unnecessary DCache clean or ICache invalidation
instructions.

2019-08-31  Shaokun Zhang 

    * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC in
__aarch64_sync_cache_range function.


As mentioned in the RFC, I think this is ok but...



---
 libgcc/config/aarch64/sync-cache.c | 56 +++++++++++++++++++++++++++++++++++---------------------

 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/aarch64/sync-cache.c b/libgcc/config/aarch64/sync-cache.c
index 791f5e42ff44..0b057efbdcab 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 <http://www.gnu.org/licenses/>.  */

+#define CTR_IDC_SHIFT   28
+#define CTR_DIC_SHIFT   29
+
 void __aarch64_sync_cache_range (const void *, const void *);

 void
@@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, const void *end)
   icache_lsize = 4 << (cache_info & 0xF);
   dcache_lsize = 4 << ((cache_info >> 16) & 0xF);

-  /* Loop over the address range, clearing one cache line at once.
- Data cache must be flushed to unification first to make sure the
- instruction cache fetches the updated data.  'end' is exclusive,
- as per the GNU definition of __clear_cache.  */
+  /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification
+     is not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) {
+    /* Loop over the address range, clearing one cache line at once.
+   Data cache must be flushed to unification first to make sure the
+   instruction cache fetches the updated data.  'end' is exclusive,
+   as per the GNU definition of __clear_cache.  */

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));

-  for (; address < (const char *) end; address += dcache_lsize)
-    asm volatile ("dc\tcvau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += dcache_lsize)
+  asm volatile ("dc\tcvau, %0"
+   :
+   : "r" (address)
+   : "memory");
+  }

   asm volatile ("dsb\tish" : : : "memory");

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
+  /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
+     Unification is not required for instruction to data coherence.  */
+
+  if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) {
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));

-  for (; address < (const char *) end; address += icache_lsize)
-    asm volatile ("ic\tivau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += icache_lsize)
+  asm volatile ("ic\tivau, %0"
+   :
+   : "r" (address)
+   : "memory");

-  asm volatile ("dsb\tish; isb" : : : "memory");
+    asm volatile ("dsb\tish" : : : "memory");
+  }
+  asm volatile("isb" : : : "memory")


... a semicolon has gone missing here after the asm volatile, so this
will not compile. A problem with the mail client?


Thanks,

Kyrill




 }
--
2.7.4



Re: [PATCH] Disable postreload GCSE on large code

2019-09-02 Thread Richard Sandiford
Richard Biener  writes:
> This disables postreload GCSE the same way we disable GCSE/cprop.
> On the PR36262 testcase this removes
>
>  load CSE after reload  : 129.00 ( 72%)   0.08 (  5%) 130.50 ( 72%)   6 kB (  0%)
>
> With a smaller testcase both PRE and postreload GCSE still run
> and GCSE shows itself roughly 2x the cost of postreload GCSE there
> (still wondering why we have two implementations of the same thing?!)
>
> I've seen postreload CSE pop up a lot on larger testcases while
> PRE turns itself off.
>
> So, does this look reasonable?
>
> Thanks,
> Richard.
>
> 2019-09-02  Richard Biener  
>
>   PR rtl-optimization/36262
>   * postreload-gcse.c: Include intl.h and gcse.h.
>   (gcse_after_reload_main): Skip pass if gcse_or_cprop_is_too_expensive
>   says so.

LGTM.  At first I thought:

  unsigned int memory_request = (n_basic_blocks_for_fn (cfun)
 * SBITMAP_SET_SIZE (max_reg_num ())
 * sizeof (SBITMAP_ELT_TYPE));

might be a bad approximation after reload, but it looks like the
sbitmap sizes are O(ninsns), and so it's probably pretty good after all.
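
For a rough sense of scale (hypothetical numbers, and assuming I'm reading
the 128 MB default of --param max-gcse-memory correctly): 50,000 blocks
and 100,000 pseudos would give

  50000 * (100000 / 64) * 8 bytes ~= 625 MB

which is well past the threshold, so a testcase of that shape now makes
the pass bail out early.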

Richard


Re: [PATCH] Fix PR 91605

2019-09-02 Thread Richard Biener
On Mon, 2 Sep 2019, Bernd Edlinger wrote:

> On 9/2/19 9:50 AM, Richard Biener wrote:
> > On Sun, 1 Sep 2019, Bernd Edlinger wrote:
> > 
> >> Hi,
> >>
> >> this fixes an oversight in r274986.
> >> We need to avoid using movmisalign on DECL_P which are not in memory,
> >> similar to the !mem_ref_refers_to_non_mem_p which unfortunately can't
> >> handle DECL_P.
> >>
> > 
> > But
> > 
> > -  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
> > +  && (DECL_P (to) ? MEM_P (DECL_RTL (to))
> > + : !mem_ref_refers_to_non_mem_p (to))
> > 
> > and in mem_ref_refers_to_non_mem_p we do
> > 
> >   if (!DECL_RTL_SET_P (base))
> > return nortl;
> > 
> >   return (!MEM_P (DECL_RTL (base)));
> > 
> > so when !DECL_RTL_SET_P (t) we can go full speed ahead?  That said,
> > can we refactor addr_expr_of_non_mem_decl_p_1 to put
> > 
> 
> Ah, I was not aware that DECL_RTL has a side-effect if !DECL_RTL_SET_P.
> 
> 
> >   if (TREE_CODE (addr) != ADDR_EXPR)
> > return false;
> > 
> >   tree base = TREE_OPERAND (addr, 0);
> > 
> > into the single caller and re-use it then also for the DECL_P case?
> > 
> 
> Yes, that is probably better then.
> 
> So how about this?
> Is it OK?

OK.

Thanks,
Richard.


Re: [PATCH] Fix PR 91605

2019-09-02 Thread Bernd Edlinger
On 9/2/19 9:50 AM, Richard Biener wrote:
> On Sun, 1 Sep 2019, Bernd Edlinger wrote:
> 
>> Hi,
>>
>> this fixes an oversight in r274986.
>> We need to avoid using movmisalign on DECL_P which are not in memory,
>> similar to the !mem_ref_refers_to_non_mem_p which unfortunately can't
>> handle DECL_P.
>>
> 
> But
> 
> -  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
> +  && (DECL_P (to) ? MEM_P (DECL_RTL (to))
> + : !mem_ref_refers_to_non_mem_p (to))
> 
> and in mem_ref_refers_to_non_mem_p we do
> 
>   if (!DECL_RTL_SET_P (base))
> return nortl;
> 
>   return (!MEM_P (DECL_RTL (base)));
> 
> so when !DECL_RTL_SET_P (t) we can go full speed ahead?  That said,
> can we refactor addr_expr_of_non_mem_decl_p_1 to put
> 

Ah, I was not aware that DECL_RTL has a side-effect if !DECL_RTL_SET_P.


>   if (TREE_CODE (addr) != ADDR_EXPR)
> return false;
> 
>   tree base = TREE_OPERAND (addr, 0);
> 
> into the single caller and re-use it then also for the DECL_P case?
> 

Yes, that is probably better then.

So how about this?
Is it OK?


Thanks
Bernd.
2019-09-01  Bernd Edlinger  

	PR middle-end/91605
	* expr.c (addr_expr_of_non_mem_decl_p_1): Refactor into...
	(non_mem_decl_p): ...this.
	(mem_ref_refers_to_non_mem_p): Handle DECL_P as well as MEM_REF.
	(expand_assignment): Call mem_ref_refers_to_non_mem_p
	unconditionally as before.

testsuite:
2019-09-01  Bernd Edlinger  

	PR middle-end/91605
	* g++.target/i386/pr91605.C: New test.

Index: gcc/expr.c
===
--- gcc/expr.c	(revision 275279)
+++ gcc/expr.c	(working copy)
@@ -4942,18 +4942,13 @@ get_bit_range (poly_uint64_pod *bitstart, poly_uin
   *bitend = *bitstart + tree_to_poly_uint64 (DECL_SIZE (repr)) - 1;
 }
 
-/* Returns true if ADDR is an ADDR_EXPR of a DECL that does not reside
-   in memory and has non-BLKmode.  DECL_RTL must not be a MEM; if
-   DECL_RTL was not set yet, return NORTL.  */
+/* Returns true if BASE is a DECL that does not reside in memory and
+   has non-BLKmode.  DECL_RTL must not be a MEM; if
+   DECL_RTL was not set yet, return false.  */
 
 static inline bool
-addr_expr_of_non_mem_decl_p_1 (tree addr, bool nortl)
+non_mem_decl_p (tree base)
 {
-  if (TREE_CODE (addr) != ADDR_EXPR)
-return false;
-
-  tree base = TREE_OPERAND (addr, 0);
-
   if (!DECL_P (base)
   || TREE_ADDRESSABLE (base)
   || DECL_MODE (base) == BLKmode)
@@ -4960,19 +4955,33 @@ static inline bool
 return false;
 
   if (!DECL_RTL_SET_P (base))
-return nortl;
+return false;
 
   return (!MEM_P (DECL_RTL (base)));
 }
 
-/* Returns true if the MEM_REF REF refers to an object that does not
+/* Returns true if REF refers to an object that does not
reside in memory and has non-BLKmode.  */
 
 static inline bool
 mem_ref_refers_to_non_mem_p (tree ref)
 {
-  tree base = TREE_OPERAND (ref, 0);
-  return addr_expr_of_non_mem_decl_p_1 (base, false);
+  tree base;
+
+  if (TREE_CODE (ref) == MEM_REF
+  || TREE_CODE (ref) == TARGET_MEM_REF)
+{
+  tree addr = TREE_OPERAND (ref, 0);
+
+  if (TREE_CODE (addr) != ADDR_EXPR)
+	return false;
+
+  base = TREE_OPERAND (addr, 0);
+}
+  else
+base = ref;
+
+  return non_mem_decl_p (base);
 }
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
@@ -5004,7 +5013,7 @@ expand_assignment (tree to, tree from, bool nontem
|| TREE_CODE (to) == TARGET_MEM_REF
|| DECL_P (to))
   && mode != BLKmode
-  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
+  && !mem_ref_refers_to_non_mem_p (to)
   && ((align = get_object_alignment (to))
 	  < GET_MODE_ALIGNMENT (mode))
   && (((icode = optab_handler (movmisalign_optab, mode))
Index: gcc/testsuite/g++.target/i386/pr91605.C
===
--- gcc/testsuite/g++.target/i386/pr91605.C	(revision 0)
+++ gcc/testsuite/g++.target/i386/pr91605.C	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-fpack-struct -mavx" } */
+
+struct A {
+  __attribute__((__vector_size__(4 * sizeof(double)))) double data;
+};
+struct B {
+  A operator*(B);
+};
+void fn1() {
+  B x, y;
+  x *y;
+}


Re: [PATCH] Fix up go regressions caused by my recent switchconv changes (PR go/91617)

2019-09-02 Thread Jakub Jelinek
On Mon, Sep 02, 2019 at 01:29:24AM -0700, Andrew Pinski wrote:
> Seems like this would fix PR91632 also.
> Which has a C testcase included.

Indeed, I've committed the following after testing it with the
patch reverted as well as with current trunk where it doesn't FAIL anymore.

2019-09-02  Jakub Jelinek  

PR tree-optimization/91632
* gcc.c-torture/execute/pr91632.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr91632.c.jj	2019-09-02 15:28:10.598774511 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr91632.c	2019-09-02 15:28:00.540925398 +0200
@@ -0,0 +1,30 @@
+/* PR tree-optimization/91632 */
+/* { dg-additional-options "-fwrapv" } */
+
+static int
+__attribute__((noipa))
+foo (char x)
+{
+  switch (x)
+{
+case '"':
+case '<':
+case '>':
+case '\\':
+case '^':
+case '`':
+case '{':
+case '|':
+case '}':
+  return 0;
+}
+  return 1;
+}
+
+int
+main ()
+{
+  if (foo ('h') == 0)
+__builtin_abort ();
+  return 0;
+}


Jakub


[PATCH] Disable postreload GCSE on large code

2019-09-02 Thread Richard Biener


This disables postreload GCSE the same way we disable GCSE/cprop.
On the PR36262 testcase this removes

 load CSE after reload  : 129.00 ( 72%)   0.08 (  5%) 130.50 ( 72%)   6 kB (  0%)

With a smaller testcase both PRE and postreload GCSE still run
and GCSE shows itself roughly 2x the cost of postreload GCSE there
(still wondering why we have two implementations of the same thing?!)

I've seen postreload CSE pop up a lot on larger testcases while
PRE turns itself off.

So, does this look reasonable?

Thanks,
Richard.

2019-09-02  Richard Biener  

PR rtl-optimization/36262
* postreload-gcse.c: Include intl.h and gcse.h.
(gcse_after_reload_main): Skip pass if gcse_or_cprop_is_too_expensive
says so.

Index: gcc/postreload-gcse.c
===
--- gcc/postreload-gcse.c   (revision 275294)
+++ gcc/postreload-gcse.c   (working copy)
@@ -38,7 +38,9 @@ along with GCC; see the file COPYING3.
 #include "params.h"
 #include "tree-pass.h"
 #include "dbgcnt.h"
+#include "intl.h"
 #include "gcse-common.h"
+#include "gcse.h"
 
 /* The following code implements gcse after reload, the purpose of this
pass is to cleanup redundant loads generated by reload and other
@@ -1371,6 +1373,10 @@ delete_redundant_insns (void)
 static void
 gcse_after_reload_main (rtx f ATTRIBUTE_UNUSED)
 {
+  /* Return if it is too expensive.  */
+  if (gcse_or_cprop_is_too_expensive (_("load CSE after register allocation "
+   "disabled")))
+return;
 
   memset (&stats, 0, sizeof (stats));
 


Re: [libgomp, GSoC'19] Work-stealing task scheduling

2019-09-02 Thread Ray Kim
Hi Jakub,

> Your mailer sadly made the patch not applicable, I had to spend quite a long
> time to just undo the undesirable line breaking.  Please fix your mailer not
> to break lines automatically, or at least send patches as attachments on
> which hopefully it will not do that on.

Sorry for wasting your time again.
I just changed my mailer but missed changing the wrapping option.
This will not occur from now on.

> > This patch implemented work-stealing task scheduling for GSoC'19 final
> > evaluations.
> > Currently there are some issues that need to be further addressed,
> > however I think it is functional.
> > 
> > Things that could be improved are as follows:
> > 1. Currently the threads busy wait even when the task queues are empty
> > until the stopping criteria are met.
> > The way task waits were previously implemented does not work well with
> > this implementation.
> > So I initially removed the task wait feature all together.
> 
> For the GSoC submission I guess I can live with that, but not long term,
> many apps oversubscribe the machine, run more threads than the hw has
> available and unbounded busy waiting in that case is unacceptable.
> For latency reasons some busy waiting is performed, but the user should have
> control over how long that busy waiting is done and then have a fallback to
> sleeping.  See e.g. OMP_WAIT_POLICY or GOMP_SPINCOUNT env variables that
> allow to control that behavior.

I totally agree with you.
This was a quick hack to match the GSoC submission.
I think though, that implementing the waiting will take some additional effort.

> E.g. for the taskwait case, when you run gomp_execute_task and it returns
> false, hasn't executed any task, can't you perhaps task the depend_lock
> on the paren't or just atomically set some state bit that you are about to
> sleep and go to sleep, and then on the other side when doing
> __atomic_sub_fetch (&parent->num_children, 1, MEMMODEL_ACQ_REL);
> check the return value and if num_children is 0, verify if the parent isn't
> sleeping and if it could be sleeping, wake it up.
> Guess similarly for taskgroup wait or waiting for dependencies.

This is close to how taskwait was originally implemented.
However, a deadlock kept causing the threads to never wake up.
I suppose waking up the parent from the task has some issue in the 
work-stealing context?
Also, I think the threads should be woken up whenever a new task is added to 
the team.
Since we're doing work-stealing, any thread can run the newly added task.
This however, wasn't possible in the previous implementation,
since the parent task of the newly added task cannot access the tasks of the 
sleeping threads.
I think I'll probably have to figure out a clearer way to implement this.

> > 2. The barrier interfaces use a big bold lock.
> > Because the barrier related functions such as gomp_team_barrier_done
> > are not atomic (as far as I believe), they are protected using team-
> > >barrier_lock.
> 
> Can't those operations that need protection just be turned into __atomic_*
> instead?
> 

Yes, I'll work on this.
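
For instance, gomp_team_barrier_cancel could drop the new barrier_lock
entirely with a fetch-or, assuming the other writers of barrier.generation
become atomic too (untested sketch):

  void
  gomp_team_barrier_cancel (struct gomp_team *team)
  {
    unsigned gen = __atomic_fetch_or (&team->barrier.generation,
				      BAR_CANCELLED, MEMMODEL_ACQ_REL);
    if (gen & BAR_CANCELLED)
      return;  /* Someone else already cancelled.  */
    futex_wake ((int *) &team->barrier.generation, INT_MAX);
  }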
 
> Have you managed to run some benchmarks?  E.g. the EPCC microbenchmarks,
> LCALS/CLOMP, etc.?

I didn't run any serious benchmark but plan to run the Barcelona OMP task 
benchmark.
Benchmarks on simple tasks revealed that the work-stealing version is a
little bit slower than the GSoC 2nd evaluation version.
This could probably be caused by the busy waiting.
I'll post benchmark results once the other issues are resolved.

> I must say I don't understand why do you use atomics on
> gomp_task_state state as kind of critical section, instead of say
> depend_lock mutex?  I'd note that e.g. the data structures layout needs to
> be carefully considered so that there is no cache line ping-pong on often
> atomically modified cache lines from other threads.

This was actually the part I struggled with the most.
Using depend_lock is not possible since either the last child,
or the thread executing the parent will free the task.
Once the task is freed, accessing the depend_lock will cause a segfault.
I really couldn't come up with a more safer, clearer solution.
If you have any better way to implement this, please let me know.
I'll be really grateful.

I'll try to improve the issues you raised.
Thanks for the thorough review.

Ray Kim






Re: [PATCH][AArch64] Implement ACLE intrinsics for FRINT[32,64][Z,X]

2019-09-02 Thread Kyrill Tkachov



On 9/2/19 1:16 PM, Kyrill Tkachov wrote:

Hi all,

This patch implements the ACLE intrinsics to access the
FRINT[32,64][Z,X] scalar[1] and vector[2][3] instructions
from Armv8.5-a. These are enabled when the __ARM_FEATURE_FRINT macro is
defined.

They're added in a fairly standard way through builtins and unspecs at
the RTL level.
The scalar intrinsics


Sorry, some malfunction occurred.

The scalar intrinsics are available through <arm_acle.h> whereas the Adv
SIMD ones are in <arm_neon.h>.
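
For example, a minimal usage sketch (intrinsic names as per the ChangeLog
below):

  #include <arm_acle.h>
  #include <arm_neon.h>

  double
  scalar_rint64x (double x)
  {
    return __rint64x (x);	/* FRINT64X */
  }

  float32x4_t
  vector_rnd32z (float32x4_t v)
  {
    return vrnd32zq_f32 (v);	/* vector FRINT32Z */
  }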


Thanks,

Kyrill




Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

[1] 
https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

[2]
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?page=2&search=vrnd32
[3]
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?page=2&search=vrnd64

2019-09-02  Kyrylo Tkachov  

 * config/aarch64/aarch64.md ("unspec"): Add UNSPEC_FRINT32Z,
 UNSPEC_FRINT32X, UNSPEC_FRINT64Z, UNSPEC_FRINT64X.
 (aarch64_<frintnzs_op><mode>): New define_insn.
 * config/aarch64/aarch64.h (TARGET_FRINT): Define.
 * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
 __ARM_FEATURE_FRINT when appropriate.
 * config/aarch64/aarch64-simd-builtins.def: Add builtins for 
frint32z,

 frint32x, frint64z, frint64x.
 * config/aarch64/arm_acle.h (__rint32zf, __rint32z, __rint64zf,
 __rint64z, __rint32xf, __rint32x, __rint64xf, __rint64x): Define.
 * config/aarch64/arm_neon.h (vrnd32z_f32, vrnd32zq_f32, vrnd32z_f64,
 vrnd32zq_f64, vrnd32x_f32, vrnd32xq_f32, vrnd32x_f64, vrnd32xq_f64,
 vrnd64z_f32, vrnd64zq_f32, vrnd64z_f64, vrnd64zq_f64, vrnd64x_f32,
 vrnd64xq_f32, vrnd64x_f64, vrnd64xq_f64): Define.
 * config/aarch64/iterators.md (VSFDF): Define.
 (FRINTNZX): Likewise.
 (frintnzs_op): Likewise.

2019-09-02  Kyrylo Tkachov  

 * gcc.target/aarch64/acle/rintnzx_1.c: New test.
 * gcc.target/aarch64/simd/vrndnzx_1.c: Likewise.



Re: [libgomp, GSoC'19] Work-stealing task scheduling

2019-09-02 Thread Jakub Jelinek
On Mon, Aug 26, 2019 at 01:08:39AM +0900, Ray Kim wrote:
> This patch implemented work-stealing task scheduling for GSoC'19 final
> evaluations.
> Currently there are some issues that need to be further addressed,
> however I think it is functional.
> 
> Things that could be improved are as follows:
> 1. Currently the threads busy wait even when the task queues are empty
> until the stopping criteria are met.
> The way task waits were previously implemented does not work well with
> this implementation.
> So I initially removed the task wait feature all together.

For the GSoC submission I guess I can live with that, but not long term,
many apps oversubscribe the machine, run more threads than the hw has
available and unbounded busy waiting in that case is unacceptable.
For latency reasons some busy waiting is performed, but the user should have
control over how long that busy waiting is done and then have a fallback to
sleeping.  See e.g. OMP_WAIT_POLICY or GOMP_SPINCOUNT env variables that
allow to control that behavior.
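
Roughly the shape I'd expect (sketch only; gomp_spin_count_var and
cpu_relax are existing libgomp pieces, the other helpers are hypothetical):

  unsigned long long i;
  for (i = 0; i < gomp_spin_count_var; i++)
    {
      if (gomp_execute_task (team, thr, task))	/* ran or stole something */
	i = 0;					/* progress: reset spin budget */
      else if (done_condition ())		/* hypothetical */
	return;
      else
	cpu_relax ();
    }
  /* Spin budget exhausted: publish that we sleep, then futex_wait as in
     config/linux/wait.h's do_wait.  */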

E.g. for the taskwait case, when you run gomp_execute_task and it returns
false, hasn't executed any task, can't you perhaps take the depend_lock
on the parent or just atomically set some state bit that you are about to
sleep and go to sleep, and then on the other side when doing
__atomic_sub_fetch (&parent->num_children, 1, MEMMODEL_ACQ_REL);
check the return value and if num_children is 0, verify if the parent isn't
sleeping and if it could be sleeping, wake it up.
Guess similarly for taskgroup wait or waiting for dependencies.
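
As a concrete sketch of that handshake (field and helper names are
hypothetical, not from the patch):

  /* Waiter side, after gomp_execute_task returned false:  */
  __atomic_store_n (&parent->sleeping, 1, MEMMODEL_RELEASE);
  while (__atomic_load_n (&parent->num_children, MEMMODEL_ACQUIRE) != 0
	 && __atomic_load_n (&parent->sleeping, MEMMODEL_ACQUIRE))
    futex_wait ((int *) &parent->sleeping, 1);

  /* Last child, when finishing:  */
  if (__atomic_sub_fetch (&parent->num_children, 1, MEMMODEL_ACQ_REL) == 0
      && __atomic_load_n (&parent->sleeping, MEMMODEL_ACQUIRE))
    {
      __atomic_store_n (&parent->sleeping, 0, MEMMODEL_RELEASE);
      futex_wake ((int *) &parent->sleeping, INT_MAX);
    }

Having the waker clear the futex word closes the lost-wakeup window
between the num_children check and the futex_wait call.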

> 2. The barrier interfaces use a big bold lock.
> Because the barrier related functions such as gomp_team_barrier_done
> are not atomic (as far as I believe), they are protected using
> team->barrier_lock.

Can't those operations that need protection just be turned into __atomic_*
instead?

> 3. The currently used work-stealing algorithm is extremely basic.
> Currently, if the thread dedicated queue is empty, work is stolen by
>  uniformly picking a random queue. While this is the way libomp does
> things, it could be improved by applying more advanced algorithms.
> Depth-first scheduling and locality-aware work-stealing, for example.
> 
> I'll run some benchmarks and post the results.

Have you managed to run some benchmarks?  E.g. the EPCC microbenchmarks,
LCALS/CLOMP, etc.?

As for the implementation, e.g.
  /* This array contains the taskqueues and the structures for implicit tasks.
 The implicit tasks start from
 &implicit_task + sizeof (gomp_taskqueue) * num_taskqueue.  */
  struct gomp_task taskqueues_and_implicit_tasks[];
is too ugly, you chose to put first the taskqueues which have
struct gomp_taskqueue type and only after that the struct gomp_task ones, so
using struct gomp_task for the flexible array member is wrong.
Either swap those two, put implicit tasks first and after that the
taskqueues, then the implicit tasks can be accessed through
->implicit_tasks[num].  Or change the type to struct gomp_taskqueue and
access the taskqueues directly and for the implicit tasks use the inline
accessor.  Or perhaps best, use a new type that puts struct gomp_task and
struct gomp_taskqueue next to each other and use that type as the type
of the flexible array member, then use ->implicit_tasks[i].task
for what was previously ->implicit_tasks[i] and ->implicit_tasks[i].queue
for the queues.

I must say I don't understand why do you use atomics on
gomp_task_state state as kind of critical section, instead of say
depend_lock mutex?  I'd note that e.g. the data structures layout needs to
be carefully considered so that there is no cache line ping-pong on often
atomically modified cache lines from other threads.

Further incremental changes what I've noticed when going through the code:

--- libgomp/libgomp.h   2019-09-02 13:02:20.187427096 +0200
+++ libgomp/libgomp.h   2019-09-02 14:13:18.376464048 +0200
@@ -460,7 +460,7 @@
 
   /* Mutex for protecting the dependency hash table and the lifecycle of the
  task.  The lock is taken whenever dependencies are updated and the
- a task lifecycle related critical section is entered (e.g. num_children
+ task lifecycle related critical section is entered (e.g. num_children
  becomes 0).  */
   gomp_mutex_t depend_lock;
 

--- libgomp/task.c  2019-09-02 12:54:27.879483614 +0200
+++ libgomp/task.c  2019-09-02 13:57:33.900760986 +0200
@@ -819,15 +819,11 @@
 requeuing here.  */
   if (__atomic_load_n (&ttask->state, MEMMODEL_ACQUIRE)
  == GOMP_TARGET_TASK_FINISHED)
-   {
- gomp_target_task_completion (team, task, thr);
-   }
+   gomp_target_task_completion (team, task, thr);
   else if (__atomic_exchange_n (&ttask->state, GOMP_TARGET_TASK_RUNNING,
MEMMODEL_ACQ_REL)
   == GOMP_TARGET_TASK_FINISHED)
-   {
- gomp_target_task_completion (team, task, thr);
-   }
+   gomp_target_task_completion

[PATCH][AArch64] Add support for __jcvt intrinsic

2019-09-02 Thread Kyrill Tkachov

Hi all,

This patch implements the __jcvt ACLE intrinsic [1] that maps down to 
the FJCVTZS [2] instruction from Armv8.3-a.
No fancy mode iterators or anything like that. Just a single builtin,
UNSPEC and define_insn, and the associated plumbing.
This patch also defines __ARM_FEATURE_JCVT to indicate when the 
intrinsic is available.
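
For reference, a minimal use (ACLE declares it as int32_t __jcvt (double)):

  #include <arm_acle.h>

  int32_t
  js_convert (double x)
  {
    return __jcvt (x);	/* single FJCVTZS instruction */
  }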


Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] 
https://developer.arm.com/docs/ddi0596/latest/simd-and-floating-point-instructions-alphabetic-order/fjcvtzs-floating-point-javascript-convert-to-signed-fixed-point-rounding-toward-zero


2019-09-02  Kyrylo Tkachov  

    * config/aarch64/aarch64.md (UNSPEC_FJCVTZS): Define.
    (aarch64_fjcvtzs): New define_insn.
    * config/aarch64/aarch64.h (TARGET_JSCVT): Define.
    * config/aarch64/aarch64-builtins.c (aarch64_builtins):
    Add AARCH64_JSCVT.
    (aarch64_init_builtins): Initialize __builtin_aarch64_jcvtzs.
    (aarch64_expand_builtin): Handle AARCH64_JSCVT.
    * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
    __ARM_FEATURE_JCVT where appropriate.
    * config/aarch64/arm_acle.h (__jcvt): Define.

2019-09-02  Kyrylo Tkachov  

    * gcc.target/aarch64/acle/jcvt_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a20a2ae1acc1ea8951d899431b57be3bd8c9ad3e..9424916d2466aa9f014ce7c0a13667ccc8eeb9ed 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -438,6 +438,8 @@ enum aarch64_builtins
   /* Special cased Armv8.3-A Complex FMA by Lane quad Builtins.  */
   AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
   AARCH64_SIMD_FCMLA_LANEQ_BUILTINS
+  /* Builtin for Armv8.3-a Javascript conversion instruction.  */
+  AARCH64_JSCVT,
   /* TME builtins.  */
   AARCH64_TME_BUILTIN_TSTART,
   AARCH64_TME_BUILTIN_TCOMMIT,
@@ -1150,6 +1152,12 @@ aarch64_init_builtins (void)
   aarch64_init_builtin_rsqrt ();
   aarch64_init_rng_builtins ();
 
+  tree ftype_jcvt
+= build_function_type_list (intSI_type_node, double_type_node, NULL);
+  aarch64_builtin_decls[AARCH64_JSCVT]
+= add_builtin_function ("__builtin_aarch64_jcvtzs", ftype_jcvt,
+			AARCH64_JSCVT, BUILT_IN_MD, NULL, NULL_TREE);
+
   /* Initialize pointer authentication builtins which are backed by instructions
  in NOP encoding space.
 
@@ -1739,6 +1747,16 @@ aarch64_expand_builtin (tree exp,
 
   return target;
 
+case AARCH64_JSCVT:
+  arg0 = CALL_EXPR_ARG (exp, 0);
+  op0 = force_reg (DFmode, expand_normal (arg0));
+  if (!target)
+	target = gen_reg_rtx (SImode);
+  else
+	target = force_reg (SImode, target);
+  emit_insn (GEN_FCN (CODE_FOR_aarch64_fjcvtzs) (target, op0));
+  return target;
+
 case AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF:
 case AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF:
 case AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF:
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index c05efeda820f4428eace6e57020eed1b288032e9..137aa18af4620d4cefce1dfe5d92e4df67a278ba 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -110,6 +110,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_CRC32, "__ARM_FEATURE_CRC32", pfile);
   aarch64_def_or_undef (TARGET_DOTPROD, "__ARM_FEATURE_DOTPROD", pfile);
   aarch64_def_or_undef (TARGET_COMPLEX, "__ARM_FEATURE_COMPLEX", pfile);
+  aarch64_def_or_undef (TARGET_JSCVT, "__ARM_FEATURE_JCVT", pfile);
 
   cpp_undef (pfile, "__AARCH64_CMODEL_TINY__");
   cpp_undef (pfile, "__AARCH64_CMODEL_SMALL__");
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 67279b44198be1ea0e950c80504e948d3af504f9..de270e3bf818ea0a2096abc3529abf129d822e88 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -289,6 +289,9 @@ extern unsigned aarch64_architecture_version;
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3	(AARCH64_ISA_V8_3)
 
+/* Javascript conversion instruction from Armv8.3-a.  */
+#define TARGET_JSCVT	(TARGET_FLOAT && AARCH64_ISA_V8_3)
+
 /* Armv8.3-a Complex number extension to AdvSIMD extensions.  */
 #define TARGET_COMPLEX (TARGET_SIMD && TARGET_ARMV8_3)
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d933916e519feac38b79e6d42ff4f0a340de67c6..13e09e0a40aae9993b7a2f8f6c6e10a929994677 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -141,6 +141,7 @@
 UNSPEC_CRC32X
 UNSPEC_FCVTZS
 UNSPEC_FCVTZU
+UNSPEC_FJCVTZS
 UNSPEC_FRINT32Z
 UNSPEC_FRINT32X
 UNSPEC_FRINT64Z
@@ -6925,6 +6926,15 @@
   [(set_attr "length" "0")]
 )
 
+(define_insn "aarch64_fjcvtzs"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:DF 1 "register_operand" "w")]
+		   UNSPEC_FJCVTZS))]
+  "TARGET_JSCVT"
+  "fjcvtzs\\t%w0, %d1"
+  [(set_attr "type"

[PATCH][AArch64] Implement ACLE intrinsics for FRINT[32,64][Z,X]

2019-09-02 Thread Kyrill Tkachov

Hi all,

This patch implements the ACLE intrinsics to access the 
FRINT[32,64][Z,X] scalar[1] and vector[2][3] instructions
from Armv8.5-a. These are enabled when the __ARM_FEATURE_FRINT macro is 
defined.


They're added in a fairly standard way through builtins and unspecs at 
the RTL level.

The scalar intrinsics

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?page=2&search=vrnd32
[3] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?page=2&search=vrnd64


2019-09-02  Kyrylo Tkachov  

    * config/aarch64/aarch64.md ("unspec"): Add UNSPEC_FRINT32Z,
    UNSPEC_FRINT32X, UNSPEC_FRINT64Z, UNSPEC_FRINT64X.
    (aarch64_<frintnzs_op><mode>): New define_insn.
    * config/aarch64/aarch64.h (TARGET_FRINT): Define.
    * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
    __ARM_FEATURE_FRINT when appropriate.
    * config/aarch64/aarch64-simd-builtins.def: Add builtins for frint32z,
    frint32x, frint64z, frint64x.
    * config/aarch64/arm_acle.h (__rint32zf, __rint32z, __rint64zf,
    __rint64z, __rint32xf, __rint32x, __rint64xf, __rint64x): Define.
    * config/aarch64/arm_neon.h (vrnd32z_f32, vrnd32zq_f32, vrnd32z_f64,
    vrnd32zq_f64, vrnd32x_f32, vrnd32xq_f32, vrnd32x_f64, vrnd32xq_f64,
    vrnd64z_f32, vrnd64zq_f32, vrnd64z_f64, vrnd64zq_f64, vrnd64x_f32,
    vrnd64xq_f32, vrnd64x_f64, vrnd64xq_f64): Define.
    * config/aarch64/iterators.md (VSFDF): Define.
    (FRINTNZX): Likewise.
    (frintnzs_op): Likewise.

2019-09-02  Kyrylo Tkachov  

    * gcc.target/aarch64/acle/rintnzx_1.c: New test.
    * gcc.target/aarch64/simd/vrndnzx_1.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index e532c6cd142f64f050d7b5da8ab01e1f5ac3b909..c05efeda820f4428eace6e57020eed1b288032e9 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -157,6 +157,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SM4, "__ARM_FEATURE_SM4", pfile);
   aarch64_def_or_undef (TARGET_F16FML, "__ARM_FEATURE_FP16_FML", pfile);
 
+  aarch64_def_or_undef (TARGET_FRINT, "__ARM_FEATURE_FRINT", pfile);
   aarch64_def_or_undef (TARGET_TME, "__ARM_FEATURE_TME", pfile);
 
   /* Not for ACLE, but required to keep "float.h" correct if we switch
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 779111a486dc63cb2618629435f19592ed1dc9e9..f4ca35a59704c761fe2ac2b6d401fff7c8aba80d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -676,3 +676,9 @@
   /* Implemented by aarch64_fmllq_laneq_highv4sf.  */
   VAR1 (QUADOP_LANE, fmlalq_laneq_high, 0, v4sf)
   VAR1 (QUADOP_LANE, fmlslq_laneq_high, 0, v4sf)
+
+  /* Implemented by aarch64_<frintnzs_op><mode>.  */
+  BUILTIN_VSFDF (UNOP, frint32z, 0)
+  BUILTIN_VSFDF (UNOP, frint32x, 0)
+  BUILTIN_VSFDF (UNOP, frint64z, 0)
+  BUILTIN_VSFDF (UNOP, frint64x, 0)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 41de3cb8831cd1a9476fe835816367c6579212d5..67279b44198be1ea0e950c80504e948d3af504f9 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -292,6 +292,9 @@ extern unsigned aarch64_architecture_version;
 /* Armv8.3-a Complex number extension to AdvSIMD extensions.  */
 #define TARGET_COMPLEX (TARGET_SIMD && TARGET_ARMV8_3)
 
+/* Floating-point rounding instructions from Armv8.5-a.  */
+#define TARGET_FRINT (AARCH64_ISA_V8_5 && TARGET_FLOAT)
+
 /* TME instructions are enabled.  */
 #define TARGET_TME (AARCH64_ISA_TME)
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9a64caff2436a0c648890b551cf09b1b4ac852d6..d933916e519feac38b79e6d42ff4f0a340de67c6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -141,6 +141,10 @@
 UNSPEC_CRC32X
 UNSPEC_FCVTZS
 UNSPEC_FCVTZU
+UNSPEC_FRINT32Z
+UNSPEC_FRINT32X
+UNSPEC_FRINT64Z
+UNSPEC_FRINT64X
 UNSPEC_URECPE
 UNSPEC_FRECPE
 UNSPEC_FRECPS
@@ -7344,6 +7348,16 @@
(set_attr "speculation_barrier" "true")]
 )
 
+(define_insn "aarch64_<frintnzs_op><mode>"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+	(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+		  FRINTNZX))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (<MODE>mode) && !TARGET_SIMD)"
+  "<frintnzs_op>\\t%<v>0<Vmtype>, %<v>1<Vmtype>"
+  [(set_attr "type" "f_rint<stype>")]
+)
+
 ;; Transactional Memory Extension (TME) instructions.
 
 (define_insn "tstart"
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 0427ec8b02111fc6991eb98b8ffb6d8ed8dd3a3f..0347d1d36a39d65ff264e2fbda45c4daad33a2c9 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -130,6 +130,59 @@ __ttest (void)
 #pragma GCC pop_options
 #endif
 
+#

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-09-02 Thread Wilco Dijkstra
ping


 
   
 Currently the Arm backend selects the alternative sched pressure algorithm.
  The issue is that this doesn't take register pressure into account, and so
  it causes significant additional spilling on Arm where there are only 14
  allocatable registers.  SPEC2006 shows significant codesize reduction
  with the default pressure algorithm, so switch back to that.  PR77308 shows
  ~800 fewer instructions.
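 
   (For experimenting without the patch: the parameter is user-visible, so
   the two algorithms can be compared directly from the command line, e.g.
   "-O2 -fsched-pressure --param sched-pressure-algorithm=1" selects the
   default weighted algorithm and "=2" the model algorithm that this patch
   stops defaulting to.)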
  
  SPECINT2006 is ~0.6% faster on Cortex-A57 together with the other DImode
  patches. Overall SPEC codesize is 1.1% smaller.
  
  Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
  
  ChangeLog:
  2019-07-29  Wilco Dijkstra  
  
  * config/arm/arm.c (arm_option_override): Don't override sched
  pressure algorithm.
  
  --
  
  diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
  index 
81286cadf32f908e045d704128c5e06842e0cc92..628cf02f23fb29392a63d87f561c3ee2fb73a515
 100644
  --- a/gcc/config/arm/arm.c
  +++ b/gcc/config/arm/arm.c
  @@ -3575,11 +3575,6 @@ arm_option_override (void)
     if (use_neon_for_64bits == 1)
    prefer_neon_for_64bits = true;
   
  -  /* Use the alternative scheduling-pressure algorithm by default.  */
  -  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 
SCHED_PRESSURE_MODEL,
  -    global_options.x_param_values,
  -    global_options_set.x_param_values);
  -
     /* Look through ready list and all of queue for instructions
    relevant for L2 auto-prefetcher.  */
     int param_sched_autopref_queue_depth;
  
  

Re: [PATCH][AArch64] Fix symbol offset limit

2019-09-02 Thread Wilco Dijkstra
     
 ping
     
   
  In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations. 
   This means the offset can use all of the +/-4GB offset, leaving no offset 
available
   for the symbol itself.  This results in relocation overflow and link-time 
errors
for simple expressions like &global_char + 0xff000000.
   
   To avoid this, limit the offset to +/-1MB so that the symbol needs to be 
within a
   3.9GB offset from its references.  For the tiny code model use a 64KB 
offset, allowing
   most of the 1MB range for code/data between the symbol and its references.
   
   Bootstrapped on AArch64, passes regress, OK for commit?
   
   ChangeLog:
   2018-11-09  Wilco Dijkstra  
   
   gcc/
   * config/aarch64/aarch64.c (aarch64_classify_symbol):
   Apply reasonable limit to symbol offsets.
   
   testsuite/
   * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
   * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.
   
   --
   
   diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
    index 83453d03095018eddd1801e71ef3836849267444..0023cb37bbae5afe9387840c1bb6b43586d4fac2 100644
   --- a/gcc/config/aarch64/aarch64.c
   +++ b/gcc/config/aarch64/aarch64.c
    @@ -13047,26 +13047,26 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
     the offset does not cause overflow of the final address.  But
     we have no way of knowing the address of symbol at compile time
     so we can't accurately say if the distance between the PC and
   -    symbol + offset is outside the addressible range of +/-1M in the
   -    TINY code model.  So we rely on images not being greater than
   -    1M and cap the offset at 1M and anything beyond 1M will have to
   -    be loaded using an alternative mechanism.  Furthermore if the
   -    symbol is a weak reference to something that isn't known to
   -    resolve to a symbol in this module, then force to memory.  */
   +    symbol + offset is outside the addressible range of +/-1MB in 
the
   +    TINY code model.  So we limit the maximum offset to +/-64KB and
   +    assume the offset to the symbol is not larger than +/-(1MB - 
64KB).
   +    Furthermore force to memory if the symbol is a weak reference to
   +    something that doesn't resolve to a symbol in this module.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
   - || !IN_RANGE (offset, -1048575, 1048575))
    + || !IN_RANGE (offset, -0x10000, 0x10000))
    return SYMBOL_FORCE_TO_MEM;
   +
  return SYMBOL_TINY_ABSOLUTE;
    
    case AARCH64_CMODEL_SMALL:
  /* Same reasoning as the tiny code model, but the offset cap here 
is
   -    4G.  */
   +    1MB, allowing +/-3.9GB for the offset to the symbol.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
   - || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
   -   HOST_WIDE_INT_C (4294967264)))
    + || !IN_RANGE (offset, -0x100000, 0x100000))
    return SYMBOL_FORCE_TO_MEM;
   +
  return SYMBOL_SMALL_ABSOLUTE;
    
    case AARCH64_CMODEL_TINY_PIC:
    diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
    index d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5 100644
   --- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
   +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
   @@ -1,12 +1,12 @@
   -/* { dg-do compile } */
   +/* { dg-do link } */
    /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
    
    -int fixed_regs[0x00200000];
    +char fixed_regs[0x00200000];
    
    int
   -foo()
   +main ()
    {
    -  return fixed_regs[0x00080000];
   +  return fixed_regs[0x000ff000];
    }
    
    /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
    diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
    index 6574cf4310430b847e77ea56bf8f20ef312d53e4..75c87c12f08004c153efc5192e5cfab566c089db 100644
   --- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
   +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
   @@ -1,12 +1,12 @@
   -/* { dg-do compile } */
   +/* { dg-do link } */
    /* { dg-options "-O3 -save-temps -mcmodel=small" } */
    
    -int fixed_regs[0x200000000ULL];
    +char fixed_regs[0x200000000ULL];
    
    int
   -foo()
   +main ()
    {
    -  return fixed_regs[0x100000000ULL];
    +  return fixed_regs[0xf0000000ULL];
    }
    
    /* { dg-final { scan-assembler-not "adrp\tx\[0-9\]+, fixed_regs\\\+" } } */
   
   

Re: [PATCH] Add .pd extension to c_exts.

2019-09-02 Thread Martin Liška
On 9/2/19 1:08 PM, Alexander Monakov wrote:
> On Mon, 2 Sep 2019, Martin Liška wrote:
> 
>>> If that's the case, we should look into overriding 'tabstop' for all files 
>>> in
>>> the gcc tree, including .md files, not just .pd and C/C++ files, right?
>>
>> Can be done but we don't have any 'au BufRead *.md' rule right now.
> 
> The solution I had in mind was to set expected formatting for all files in our
> local vimrc like this:
> 
> --- a/contrib/vimrc
> +++ b/contrib/vimrc
> @@ -31,17 +31,17 @@ function! SetStyle()
>if stridx(l:fname, 'libsanitizer') != -1
>  return
>endif
> +  setlocal tabstop=8
> +  setlocal softtabstop=2
> +  setlocal shiftwidth=2
> +  setlocal noexpandtab
> +  setlocal textwidth=80
> +  setlocal formatoptions-=ro formatoptions+=cqlt
>let l:ext = fnamemodify(l:fname, ":e")
>let l:c_exts = ['c', 'h', 'cpp', 'cc', 'C', 'H', 'def', 'java']
>if index(l:c_exts, l:ext) != -1
>  setlocal cindent
> -setlocal tabstop=8
> -setlocal softtabstop=2
> -setlocal shiftwidth=2
> -setlocal noexpandtab
>  setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0
> -setlocal textwidth=80
> -setlocal formatoptions-=ro formatoptions+=cqlt
>endif
>  endfunction
> 
> 
> Alexander
> 

Works for me, please include the reversion of r275295.

Thanks,
Martin


Re: [PATCH] builtin fadd variants implementation

2019-09-02 Thread Tejas Joshi
Hello.
Should a result like 1.4 be considered inexact when truncating
(narrowing?) from double to float, due to the loss of trailing bits?
The comments on real_arithmetic say that it returns TRUE if the result
is inexact. There's another function, exact_real_truncate, which
returns TRUE if the truncation is exact. Why do these functions return
different results for the same input, like 1.4 + 0.0? (real_arithmetic
returns false (exact), and exact_real_truncate also returns false, but
for the latter false means inexact.) Does real_arithmetic consider only
exactness within the range of its arguments? To check for the
inexactness of narrowing, which result should I consider?
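
My (possibly wrong) reading, as a worked example:

  /* 1.4 is rounded once when parsed into the REAL_VALUE_TYPE, so the
     addition 1.4 + 0.0 is exact in the internal format; the loss only
     happens on the narrowing step:  */
  double d = 1.4;		/* nearest double to 1.4 */
  float  f = (float) d;	/* drops trailing bits: (double) f != d */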

Thanks,
Tejas


[PATCH] Use __constinit keyword in libstdc++ sources

2019-09-02 Thread Jonathan Wakely

Now that Marek has implemented constinit (and made it available
pre-C++20 as __constinit) we can use it to enforce constant init for
the globals in src/c++17/memory_resource.cc
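
As a reminder of what the keyword buys us (a compile-time check, not a
codegen change):

  __constinit int ok = 42;	// OK, constant-initialized
  extern int f();
  __constinit int bad = f();	// error: dynamic initialization

so if one of these globals ever regressed to dynamic initialization we'd
get a hard error instead of a silent static-init-order hazard.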

Tested x86_64-linux, committed to trunk.

commit 0956a4644f960ad4d537e901d64ec69186ed46b4
Author: redi 
Date:   Mon Sep 2 11:31:34 2019 +

Use __constinit keyword in libstdc++ sources

* src/c++17/memory_resource.cc: Use __constinit keyword.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@275315 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index b6698011f5c..2b64039d280 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -85,8 +85,8 @@ namespace pmr
~constant_init() { /* do nothing, union member is not destroyed */ }
   };
 
-constant_init newdel_res{};
-constant_init null_res{};
+__constinit constant_init newdel_res{};
+__constinit constant_init null_res{};
 #if ATOMIC_POINTER_LOCK_FREE == 2
 using atomic_mem_res = atomic;
 # define _GLIBCXX_ATOMIC_MEM_RES_CAN_BE_CONSTANT_INITIALIZED
@@ -139,7 +139,7 @@ namespace pmr
 #endif // ATOMIC_POINTER_LOCK_FREE == 2
 
 #ifdef _GLIBCXX_ATOMIC_MEM_RES_CAN_BE_CONSTANT_INITIALIZED
-constant_init default_res{&newdel_res.obj};
+__constinit constant_init default_res{&newdel_res.obj};
 #else
 # include "default_resource.h"
 #endif


Re: [libstdc++,doc] xml/manual/policy_data_structures_biblio.xml

2019-09-02 Thread Jonathan Wakely

On 01/09/19 20:42 +0800, Gerald Pfeifer wrote:

microsoft.com redirects the existing link and changed the title of
the document; this adjust both.

Committed.

Jonathan(?), if you could regenerate the libstdc++ online docs, that
would be nice.


Done as part of r275314.



Re: [PATCH] PR libstdc++/91067 add more missing exports for directory iterators

2019-09-02 Thread Jonathan Wakely

On 29/08/19 13:16 +0100, Jonathan Wakely wrote:

PR libstdc++/91067
* acinclude.m4 (libtool_VERSION): Bump to 6:28:0.
* configure: Regenerate.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.28): Add new version. Export
missing symbols.
* testsuite/27_io/filesystem/iterators/91067.cc: Test move
constructors.
* testsuite/util/testsuite_abi.cc: Add new symbol version.

As mentioned yesterday, we need to add some more exports for
std::filesystem directory iterators. As discussed in PR 91067 Clang
inlines the move constructor and optimises it to a tail call to the C2
move constructor of __shared_ptr, which wasn't exported.
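
(A quick way to spot this class of bug, assuming a GNU binutils setup, is
to compare the undefined symbols of the client object against the
library's dynamic symbol table, e.g.

  nm -C -u app.o | grep __shared_ptr
  nm -C -D libstdc++.so.6 | grep __shared_ptr

anything in the first list missing from the second needs an export.)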

Tested x86_64-linux, i686-linux, powerpc64-linux. Committing to trunk
and (later today) gcc-9-branch.


This documents the change, committed to trunk and gcc-9-branch.

commit ebd01e7d6cc42e4e46a9dd567f46edbb5a4e05e9
Author: Jonathan Wakely 
Date:   Mon Sep 2 11:38:15 2019 +0100

Update libstdc++ docs for library version bumps

* doc/xml/manual/abi.xml: Document 9.x library versions.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml b/libstdc++-v3/doc/xml/manual/abi.xml
index d1e6b989a71..969edd7c834 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -268,7 +268,9 @@ compatible.
 GCC 7.1.0: libstdc++.so.6.0.23
 GCC 7.2.0: libstdc++.so.6.0.24
 GCC 8.0.0: libstdc++.so.6.0.25
-GCC 9.0.0: libstdc++.so.6.0.26
+GCC 9.1.0: libstdc++.so.6.0.26
+GCC 9.2.0: libstdc++.so.6.0.27
+GCC 9.3.0: libstdc++.so.6.0.28
 
 
   Note 1: Error should be libstdc++.so.3.0.3.


Re: [PATCH] Optimize to_chars

2019-09-02 Thread Jonathan Wakely

On 30/08/19 17:08 +0100, Jonathan Wakely wrote:

On 30/08/19 17:01 +0100, Jonathan Wakely wrote:

On 30/08/19 17:27 +0300, Antony Polukhin wrote:

A bunch of micro-optimizations for std::to_chars:
* For base == 8, replacing the lookup in the __digits table with arithmetic
computations leads to the same CPU cycle count for the loop (exchanging two
movzx with 3 bit ops, https://godbolt.org/z/RTui7m ). However this
saves 129 bytes of data and totally avoids a chance of cache misses on
__digits.
* For base == 16, replacing the lookup in the __digits table with
arithmetic computations leads to a few additional instructions, but
totally avoids a chance of cache misses on __digits (- ~9 cache misses
for worst case) and saves 513 bytes of const data.
* Replacing __first[pos] and __first[pos - 1] with __first[1] and
__first[0] on final iterations saves ~2% of code size.
* Removing trailing '\0' from arrays of digits allows the linker to
merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and
"0123456789abcdef" could share the same address). This improves data
locality and reduces binary sizes.
* Using __detail::__to_chars_len_2 instead of a generic
__detail::__to_chars_len makes the operation O(1) instead of O(N). It
also makes the code two times shorter (https://godbolt.org/z/Peq_PG).

In sum: this significantly reduces the size of a binary (for about
4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals
with latency (CPU cache misses) without changing the iterations count
and without adding costly instructions into the loops.
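
For readers wanting the shape of the table-free approach, a self-contained
sketch (illustrative only, not the exact libstdc++ code):

  #include <cstdint>
  #include <cstddef>

  // Emit VAL in octal; one digit per step keeps the sketch simple.
  char *
  to_chars_8 (char *first, char *last, std::uint32_t val)
  {
    unsigned len = 1;
    for (std::uint32_t v = val; v >= 8; v >>= 3)
      ++len;		// the real code computes this in O(1) via clz
    if (last - first < static_cast<std::ptrdiff_t> (len))
      return nullptr;
    char *p = first + len;
    do
      {
	*--p = '0' + (val & 7);	// arithmetic digit, no lookup table
	val >>= 3;
      }
    while (val != 0);
    return first + len;
  }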


This is great, thanks.

Have you tried comparing the improved code to libc++'s implementation?
I believe they use precomputed arrays of digits, but they use larger
arrays that allow 4 bytes to be written at once, which is considerably
faster (and those precomputed arrays live in libc++.so not in the
header). Would we be better off keeping the precomputed arrays and
expanding them to do 4-byte writes?

Since we don't have a patch to do that, I think I'll commit yours. We
can always go back to precomputed arrays later if somebody does that
work.

My only comments are on the changelog:


Changelog:
 * include/std/charconv (__detail::__to_chars_8,
 __detail::__to_chars_16): Replace array of precomputed digits


When the list of changed functions is split across lines it should be
like this:

 * include/std/charconv (__detail::__to_chars_8)
 (__detail::__to_chars_16): Replace array of precomputed digits

i.e close the parentheses before the line break, and reopen on the
next line.


 with arithmetic operations to avoid CPU cache misses. Remove
 zero termination from array of digits to allow symbol merge with
 generic implementation of __detail::__to_chars. Replace final
 offsets with constants. Use __detail::__to_chars_len_2 instead
 of a generic __detail::__to_chars_len.
 * include/std/charconv (__detail::__to_chars): Remove


Don't repeat the asterisk and filename for later changes in the same
file, i.e.

 (__detail::__to_chars): Remove zero termination from array of digits.
 (__detail::__to_chars_2): Leading digit is always '1'.

There's no changelog entry for the changes to __to_chars_len_8 and
__to_chars_len_16.


Oh, there's no separate __to_chars_len_16 function anyway, and you did
mention it as "Use __detail::__to_chars_len_2 instead ..." - sorry!

I think we might as well inline __to_chars_len_8 into __to_chars_8,
there's not much benefit to having a separate function. I'll do that
as a follow up patch.


I've committed this patch to simplify the code a bit.

Tested x86_64-linux, committed to trunk.

commit 8ea5798914e9cf5b823d245d85db4d76e0631d0e
Author: Jonathan Wakely 
Date:   Fri Aug 30 17:27:48 2019 +0100

Minor simplifications for std::to_chars implementation

* include/std/charconv (__detail::__to_chars_2_len): Use std::log2p1.
(__detail::__to_chars_8_len): Remove.
(__detail::__to_chars_8): Inline length calculation here.
(__detail::__from_chars_binary): Use numeric_limits instead of
CHAR_BIT.
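
(For context: the length of a base-2 representation is just the bit width,
so __log2p1 gives it directly, e.g. __log2p1(45) == 6 for 0b101101, which
is what makes the old O(N) digit-counting loop unnecessary.)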

diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv
index 4e94c39656d..ceefa3b6778 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -35,8 +35,9 @@
 
 #include <type_traits>
 #include <limits>
-#include <cctype>
-#include <bits/charconv.h> // for __to_chars_len, __to_chars_10_impl
+#include <bit>			// for __log2p1
+#include <cctype>		// for isdigit
+#include <bits/charconv.h>	// for __to_chars_len, __to_chars_10_impl
 #include <bits/error_constants.h> // for std::errc
 
 // Define when floating point is supported: #define __cpp_lib_to_chars 201611L
@@ -96,43 +97,7 @@ namespace __detail
   template<typename _Tp>
 constexpr unsigned
 __to_chars_len_2(_Tp __value) noexcept
-{
-  static_assert(is_integral<_Tp>::value, "implementation bug");
-  static_assert(is_unsigned<_Tp>::value, "implementation bug");
-
-  constexpr size_t __nbits = __CHAR_BIT__ * sizeof(_Tp);
-
-  // N.B. __builtin_clzll is undefined if __value == 0, but std::

Re: [libgomp, GSoC'19] Work-stealing task scheduling

2019-09-02 Thread Jakub Jelinek
On Mon, Aug 26, 2019 at 02:58:46PM +0900, Ray Kim wrote:
> Fixed typos, lowered memory constraints where appropriate.

Your mailer sadly made the patch not applicable, I had to spend quite a long
time to just undo the undesirable line breaking.  Please fix your mailer not
to break lines automatically, or at least send patches as attachments on
which hopefully it will not do that on.

Before starting review, I've also fixed various formatting glitches.

Here is the fixed up patch (and attached interdiff with the changes I've
done):

--- libgomp/config/linux/bar.c.jj   2019-09-02 12:24:54.900952708 +0200
+++ libgomp/config/linux/bar.c  2019-09-02 12:25:37.446319594 +0200
@@ -199,13 +199,13 @@ gomp_team_barrier_wait_cancel (gomp_barr
 void
 gomp_team_barrier_cancel (struct gomp_team *team)
 {
-  gomp_mutex_lock (&team->task_lock);
+  gomp_mutex_lock (&team->barrier_lock);
   if (team->barrier.generation & BAR_CANCELLED)
 {
-  gomp_mutex_unlock (&team->task_lock);
+  gomp_mutex_unlock (&team->barrier_lock);
   return;
 }
   team->barrier.generation |= BAR_CANCELLED;
-  gomp_mutex_unlock (&team->task_lock);
+  gomp_mutex_unlock (&team->barrier_lock);
   futex_wake ((int *) &team->barrier.generation, INT_MAX);
 }
--- libgomp/task.c.jj   2019-01-01 12:38:37.977648900 +0100
+++ libgomp/task.c  2019-09-02 12:54:27.879483614 +0200
@@ -29,6 +29,7 @@
 #include "libgomp.h"
 #include 
 #include 
+#include 
 #include "gomp-constants.h"
 
 typedef struct gomp_task_depend_entry *hash_entry_type;
@@ -76,16 +77,21 @@ gomp_init_task (struct gomp_task *task,
   task->parent = parent_task;
   task->icv = *prev_icv;
   task->kind = GOMP_TASK_IMPLICIT;
-  task->taskwait = NULL;
+  task->state = GOMP_STATE_NORMAL;
   task->in_tied_task = false;
   task->final_task = false;
   task->copy_ctors_done = false;
   task->parent_depends_on = false;
-  priority_queue_init (&task->children_queue);
   task->taskgroup = NULL;
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
+  task->num_awaited = 0;
+  task->num_children = 0;
+  /* Currently we're initializing the depend lock for every task.
+ However, the mutex could instead be initialized on demand by the
+ dependers.  */
+  gomp_mutex_init (&task->depend_lock);
 }
 
 /* Clean up a task, after completing it.  */
@@ -100,61 +106,115 @@ gomp_end_task (void)
   thr->task = task->parent;
 }
 
-/* Clear the parent field of every task in LIST.  */
+/* If task is free to go, clean up the task.  */
 
-static inline void
-gomp_clear_parent_in_list (struct priority_list *list)
+void
+gomp_maybe_end_task (void)
 {
-  struct priority_node *p = list->tasks;
-  if (p)
-do
-  {
-   priority_node_to_task (PQ_CHILDREN, p)->parent = NULL;
-   p = p->next;
-  }
-while (p != list->tasks);
+  struct gomp_thread *thr = gomp_thread ();
+  struct gomp_task *task = thr->task;
+
+  if (__atomic_load_n (&task->num_children, MEMMODEL_ACQUIRE) == 0)
+gomp_finish_task (task);
+  thr->task = task->parent;
 }
 
-/* Splay tree version of gomp_clear_parent_in_list.
+/* Enqueue the task to its corresponding task queue.
 
-   Clear the parent field of every task in NODE within SP, and free
-   the node when done.  */
+   Currently, the 'corresponding task queue' is the queue dedicated to the
+   calling thread.  team and thread are passed as an optimization.  */
 
-static void
-gomp_clear_parent_in_tree (prio_splay_tree sp, prio_splay_tree_node node)
+void
+gomp_enqueue_task (struct gomp_task *task, struct gomp_team *team,
+  struct gomp_thread *thr, int priority)
 {
-  if (!node)
-return;
-  prio_splay_tree_node left = node->left, right = node->right;
-  gomp_clear_parent_in_list (&node->key.l);
+  int tid = thr->ts.team_id;
+  struct gomp_taskqueue *queue = &gomp_team_taskqueue (team)[tid];
+
+  __atomic_add_fetch (&team->task_queued_count, 1, MEMMODEL_ACQ_REL);
+  gomp_mutex_lock (&queue->queue_lock);
+  priority_queue_insert (&queue->priority_queue, task, priority,
+PRIORITY_INSERT_END, task->parent_depends_on);
+  gomp_mutex_unlock (&queue->queue_lock);
+  return;
+}
+
+/* Dequeue a task from the specific queue.  */
+
+inline static struct gomp_task *
+gomp_dequeue_task_from_queue (int qid, struct gomp_team *team)
+{
+  struct gomp_task *task = NULL;
+  struct gomp_taskqueue *queue = &gomp_team_taskqueue (team)[qid];
+  gomp_mutex_lock (&queue->queue_lock);
+
 #if _LIBGOMP_CHECKING_
-  memset (node, 0xaf, sizeof (*node));
+  priority_queue_verify (&queue->priority_queue, false);
 #endif
-  /* No need to remove the node from the tree.  We're nuking
- everything, so just free the nodes and our caller can clear the
- entire splay tree.  */
-  free (node);
-  gomp_clear_parent_in_tree (sp, left);
-  gomp_clear_parent_in_tree (sp, right);
+
+  if (priority_queue_empty_p (&queue->priority_queue, MEMMODEL_RELAXED))
+{
+  gomp_mutex_unlock (&queue->queue_

Re: [PATCH] Add .pd extension to c_exts.

2019-09-02 Thread Alexander Monakov
On Mon, 2 Sep 2019, Martin Liška wrote:

> > If that's the case, we should look into overriding 'tabstop' for all files 
> > in
> > the gcc tree, including .md files, not just .pd and C/C++ files, right?
> 
> Can be done but we don't have any 'au BufRead *.md' rule right now.

The solution I had in mind was to set expected formatting for all files in our
local vimrc like this:

--- a/contrib/vimrc
+++ b/contrib/vimrc
@@ -31,17 +31,17 @@ function! SetStyle()
   if stridx(l:fname, 'libsanitizer') != -1
 return
   endif
+  setlocal tabstop=8
+  setlocal softtabstop=2
+  setlocal shiftwidth=2
+  setlocal noexpandtab
+  setlocal textwidth=80
+  setlocal formatoptions-=ro formatoptions+=cqlt
   let l:ext = fnamemodify(l:fname, ":e")
   let l:c_exts = ['c', 'h', 'cpp', 'cc', 'C', 'H', 'def', 'java']
   if index(l:c_exts, l:ext) != -1
 setlocal cindent
-setlocal tabstop=8
-setlocal softtabstop=2
-setlocal shiftwidth=2
-setlocal noexpandtab
 setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0
-setlocal textwidth=80
-setlocal formatoptions-=ro formatoptions+=cqlt
   endif
 endfunction


Alexander

Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions

2019-09-02 Thread Richard Biener
On Fri, Aug 30, 2019 at 5:25 PM Ilya Leoshkevich  wrote:
>
> > Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich :
> >
> >> Am 30.08.2019 um 09:12 schrieb Richard Biener :
> >>
> >> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich  
> >> wrote:
> >>>
>  Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich :
> 
>  Bootstrap and regtest running on x86_64-redhat-linux and
>  s390x-redhat-linux.
> 
>  This patch series adds signaling FP comparison support (both scalar and
>  vector) to s390 backend.
> >>>
> >>> I'm running into a problem on ppc64 with this patch, and it would be
> >>> great if someone could help me figure out the best way to resolve it.
> >>>
> >>> vector36.C test is failing because gimplifier produces the following
> >>>
> >>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
> >>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >>>
> >>> from
> >>>
> >>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
> >>> { -1, -1, -1, -1 } ,
> >>> { 0, 0, 0, 0 } >
> >>>
> >>> Since the comparison tree code is now hidden behind a temporary, my code
> >>> does not have anything to pass to the backend.  The reason for creating
> >>> a temporary is that the comparison can trap, and so the following check
> >>> in gimplify_expr fails:
> >>>
> >>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
> >>>   goto out;
> >>>
> >>> gimple_test_f is is_gimple_condexpr, and it eventually calls
> >>> operation_could_trap_p (GT).
> >>>
> >>> My current solution is to simply state that backend does not support
> >>> SSA_NAME in vector comparisons, however, I don't like it, since it may
> >>> cause performance regressions due to having to fall back to scalar
> >>> comparisons.
> >>>
> >>> I was thinking about two other possible solutions:
> >>>
> >>> 1. Change the gimplifier to allow trapping vector comparisons.  That's
> >>>  a bit complicated, because tree_could_throw_p checks not only for
> >>>  floating point traps, but also e.g. for array index out of bounds
> >>>  traps.  So I would have to create a tree_could_throw_p version which
> >>>  disregards specific kinds of traps.
> >>>
> >>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
> >>>  its tree_code instead of SSA_NAME.  The potential problem I see with
> >>>  this is that there appears to be no guarantee that _5 will be inlined
> >>>  into _6 at a later point.  So if we say that we don't need to fall
> >>>  back to scalar comparisons based on availability of vector >
> >>>  instruction and inlining does not happen, then what's actually will
> >>>  be required is vector selection (vsel on S/390), which might not be
> >>>  available in general case.
> >>>
> >>> What would be a better way to proceed here?
> >>
> >> On GIMPLE there isn't a good reason to split out trapping comparisons
> >> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs
> >> where it is important because we'd have no way to represent EH info
> >> when not done.  It might be a bit awkward to preserve EH across RTL
> >> expansion though in case the [VEC_]COND_EXPR are not expanded
> >> as a single pattern, but I'm not sure.
> >
> > Ok, so I'm testing the following now - for the problematic test that
> > helped:
> >
> > diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> > index b0c9f9b671a..940aa394769 100644
> > --- a/gcc/gimple-expr.c
> > +++ b/gcc/gimple-expr.c
> > @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t)
> > || TREE_CODE (t) == BIT_FIELD_REF);
> > }
> >
> > -/*  Return true if T is a GIMPLE condition.  */
> > +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr. 
> >  */
> >
> > -bool
> > -is_gimple_condexpr (tree t)
> > +static bool
> > +is_gimple_condexpr_1 (tree t, bool allow_traps)
> > {
> >   return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> > - && !tree_could_throw_p (t)
> > + && (allow_traps || !tree_could_throw_p (t))
> >   && is_gimple_val (TREE_OPERAND (t, 0))
> >   && is_gimple_val (TREE_OPERAND (t, 1;
> > }
> >
> > +/*  Return true if T is a GIMPLE condition.  */
> > +
> > +bool
> > +is_gimple_condexpr (tree t)
> > +{
> > +  return is_gimple_condexpr_1 (t, false);
> > +}
> > +
> > +/* Like is_gimple_condexpr, but allow the T to trap.  */
> > +
> > +bool
> > +is_possibly_trapping_gimple_condexpr (tree t)
> > +{
> > +  return is_gimple_condexpr_1 (t, true);
> > +}
> > +
> > /* Return true if T is a gimple address.  */
> >
> > bool
> > diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> > index 1ad1432bd17..20546ca5b99 100644
> > --- a/gcc/gimple-expr.h
> > +++ b/gcc/gimple-expr.h
> > @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum 
> > tree_code *, tree *,
> >  tree *);
> > extern bool is_gimple_lvalue (tree);
> > extern bool is_gimple

Re: [PATCH] Add .pd extension to c_exts.

2019-09-02 Thread Martin Liška
On 9/2/19 11:56 AM, Alexander Monakov wrote:
> On Mon, 2 Sep 2019, Martin Liška wrote:
> 
>> Yep, I'm going to apply following patch that does it properly for the 
>> gcc-match
>> file type.
> 
> So just to make sure I understand correctly why you need this:
> 
> you use some other value of 'tabstop' in Vim, and need to reset it back to its
> default value of 8, as assumed by coding standards in GCC?

Yes, that was my motivation.

> 
> If that's the case, we should look into overriding 'tabstop' for all files in
> the gcc tree, including .md files, not just .pd and C/C++ files, right?

Can be done but we don't have any 'au BufRead *.md' rule right now.

Martin

> 
> Thanks.
> Alexander
> 



Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Richard Biener
On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu  wrote:
>
> > which is not the case with core_cost (and similar with skylake_cost):
> >
> >   2, 2, 4,/* cost of moving XMM,YMM,ZMM register */
> >   {6, 6, 6, 6, 12},/* cost of loading SSE registers
> >in 32,64,128,256 and 512-bit */
> >   {6, 6, 6, 6, 12},/* cost of storing SSE registers
> >in 32,64,128,256 and 512-bit */
> >   2, 2,/* SSE->integer and integer->SSE moves */
> >
> > We have the same cost of moving between integer registers (by default
> > set to 2), between SSE registers and between integer and SSE register
> > sets. I think that at least the cost of moves between regsets should
> > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > that would translate to the value of 6. The value should be low enough
> > to keep the cost below the value that forces move through the memory.
> > Changing core register allocation cost of SSE <-> integer to:
> >
> > --cut here--
> > Index: config/i386/x86-tune-costs.h
> > ===
> > --- config/i386/x86-tune-costs.h(revision 275281)
> > +++ config/i386/x86-tune-costs.h(working copy)
> > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> >in 32,64,128,256 and 512-bit */
> >{6, 6, 6, 6, 12},/* cost of storing SSE registers
> >in 32,64,128,256 and 512-bit */
> > -  2, 2,/* SSE->integer and
> > integer->SSE moves */
> > +  6, 6,/* SSE->integer and
> > integer->SSE moves */
> >/* End of register allocator costs.  */
> >},
> >
> > --cut here--
> >
> > still produces direct move in gcc.target/i386/minmax-6.c
> >
> > I think that in addition to attached patch, values between 2 and 6
> > should be considered in benchmarking. Unfortunately, without access to
> > regressed SPEC tests, I can't analyse these changes by myself.
> >
> > Uros.
>
> Applying a similar change to skylake_cost, on a Skylake workstation we got
> performance like:
> --------------------------------------------+---------------------
> version                                     | 548_exchange_r score
> --------------------------------------------+---------------------
> gcc10_20180822                              | 10
> apply remove_max8                           | 8.9
> also apply increase integer_tofrom_sse cost | 9.69
> --------------------------------------------+---------------------
> There is still a 3% regression, which is related to _gfortran_mminloc0_4_i4
> in libgfortran.so.5.0.0.
>
> I found suspicious code as below, does it affect?

This should be fixed after

2019-08-27  Richard Biener  

* config/i386/i386-features.h
(general_scalar_chain::~general_scalar_chain): Add.
(general_scalar_chain::insns_conv): New bitmap.
(general_scalar_chain::n_sse_to_integer): New.
(general_scalar_chain::n_integer_to_sse): Likewise.
(general_scalar_chain::make_vector_copies): Adjust signature.
* config/i386/i386-features.c
(general_scalar_chain::general_scalar_chain): Outline,
initialize new members.
(general_scalar_chain::~general_scalar_chain): New.
(general_scalar_chain::mark_dual_mode_def): Record insns
we need to insert conversions at and count them.
(general_scalar_chain::compute_convert_gain): Account
for conversion instructions at chain boundary.
(general_scalar_chain::make_vector_copies): Generate a single
copy for a def by a specific insn.
(general_scalar_chain::convert_registers): First populate
defs_map, then make copies at out-of chain insns.

where the only ???  is that we have

  const int sse_to_integer; /* cost of moving SSE register to integer.  */

but not integer_to_sse.  In the hard_register sub-struct of processor_cost
we have both:

  const int sse_to_integer; /* cost of moving SSE register to integer.  */
  const int integer_to_sse; /* cost of moving integer register to SSE. */

IMHO it is odd that we have mostly the same kind of costs twice.
And the compute_convert_gain function adds up apples and oranges.
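
For illustration, a fix along those lines would presumably look something
like this fragment (a sketch against the counters named in the ChangeLog
above, not committed code; hard_register is the RA cost sub-struct
mentioned earlier):

/* Sketch: price each chain-boundary conversion with the cost of its
   own direction instead of charging everything as SSE->integer.
   n_sse_to_integer and n_integer_to_sse are the counters added by
   the patch referenced in the ChangeLog above.  */
cost += n_sse_to_integer * ix86_cost->hard_register.sse_to_integer;
cost += n_integer_to_sse * ix86_cost->hard_register.integer_to_sse;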

> --
> modified   gcc/config/i386/i386-features.c
> @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
>if (dump_file)
>  fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
>
> -  /* ???  What about integer to SSE?  */
> +  /* ???  What about integer to SSE?  */???
>EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
>  cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> --
> --
> BR,
> Hongtao


[SPARC] Fix PR target/91323

2019-09-02 Thread Eric Botcazou
The SPARC back-end was aligned on the x86 back-end wrt LTGT so it needs to be 
changed too.  The patch also changes the wording of the description of the 
operator in doc/generic.texi, rtl.def and tree.def.

Tested on SPARC/Solaris, approved by Richard B. and applied on the mainline.
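
To see why LTGT wants the exception-signaling mode, consider this hedged
example (illustrative only, not part of the patch; whether GCC folds the
disjunction into a single LTGT comparison can depend on flags):

#include <fenv.h>
#include <stdio.h>

/* x < y || x > y is the LTGT operation; per IEEE 754, the ordered
   relational operators signal the invalid-operation exception on a
   quiet NaN operand, so this may set FE_INVALID -- hence CCFPEmode
   rather than CCFPmode on SPARC.  */
static int
less_or_greater (double x, double y)
{
  return x < y || x > y;
}

int
main (void)
{
  volatile double n = __builtin_nan ("");
  feclearexcept (FE_ALL_EXCEPT);
  volatile int sink = less_or_greater (n, 1.0);
  (void) sink;
  printf ("FE_INVALID %s\n",
          fetestexcept (FE_INVALID) ? "raised" : "not raised");
  return 0;
}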


2019-09-02  Eric Botcazou  

PR target/91323
* doc/generic.texi (LTGT_EXPR): Merge with other comparison operators.
* rtl.def (LTGT): Likewise.  Add note about floating-point exceptions.
* tree.def (LTGT_EXPR): Likewise.
* config/sparc/sparc.c (select_cc_mode): Return CCFPEmode for LTGT.

-- 
Eric Botcazou

Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 275270)
+++ config/sparc/sparc.c	(working copy)
@@ -3203,13 +3203,13 @@ select_cc_mode (enum rtx_code op, rtx x,
 	case UNGT:
 	case UNGE:
 	case UNEQ:
-	case LTGT:
 	  return CCFPmode;
 
 	case LT:
 	case LE:
 	case GT:
 	case GE:
+	case LTGT:
 	  return CCFPEmode;
 
 	default:
Index: doc/generic.texi
===
--- doc/generic.texi	(revision 275068)
+++ doc/generic.texi	(working copy)
@@ -1564,21 +1564,23 @@ allows the backend to choose between the
 @itemx LE_EXPR
 @itemx GT_EXPR
 @itemx GE_EXPR
+@itemx LTGT_EXPR
 @itemx EQ_EXPR
 @itemx NE_EXPR
-These nodes represent the less than, less than or equal to, greater
-than, greater than or equal to, equal, and not equal comparison
-operators.  The first and second operands will either be both of integral
-type, both of floating type or both of vector type.  The result type of
-these expressions will always be of integral, boolean or signed integral
-vector type.  These operations return the result type's zero value for
-false, the result type's one value for true, and a vector whose elements
-are zero (false) or minus one (true) for vectors.
+These nodes represent the less than, less than or equal to, greater than,
+greater than or equal to, less or greater than, equal, and not equal
+comparison operators.  The first and second operands will either be both
+of integral type, both of floating type or both of vector type, except for
+LTGT_EXPR where they will only be both of floating type.  The result type
+of these expressions will always be of integral, boolean or signed integral
+vector type.  These operations return the result type's zero value for false,
+the result type's one value for true, and a vector whose elements are zero
+(false) or minus one (true) for vectors.
 
 For floating point comparisons, if we honor IEEE NaNs and either operand
 is NaN, then @code{NE_EXPR} always returns true and the remaining operators
 always return false.  On some targets, comparisons against an IEEE NaN,
-other than equality and inequality, may generate a floating point exception.
+other than equality and inequality, may generate a floating-point exception.
 
 @item ORDERED_EXPR
 @itemx UNORDERED_EXPR
@@ -1596,15 +1598,13 @@ and the result type's one value for true
 @itemx UNGT_EXPR
 @itemx UNGE_EXPR
 @itemx UNEQ_EXPR
-@itemx LTGT_EXPR
 These nodes represent the unordered comparison operators.
 These operations take two floating point operands and determine whether
 the operands are unordered or are less than, less than or equal to,
 greater than, greater than or equal to, or equal respectively.  For
 example, @code{UNLT_EXPR} returns true if either operand is an IEEE
-NaN or the first operand is less than the second.  With the possible
-exception of @code{LTGT_EXPR}, all of these operations are guaranteed
-not to generate a floating point exception.  The result
+NaN or the first operand is less than the second.  All these operations
+are guaranteed not to generate a floating point exception.  The result
 type of these expressions will always be of integral or boolean type.
 These operations return the result type's zero value for false,
 and the result type's one value for true.
Index: rtl.def
===
--- rtl.def	(revision 275068)
+++ rtl.def	(working copy)
@@ -552,20 +552,25 @@ DEF_RTL_EXPR(POST_INC, "post_inc", "e",
 DEF_RTL_EXPR(PRE_MODIFY, "pre_modify", "ee", RTX_AUTOINC)
 DEF_RTL_EXPR(POST_MODIFY, "post_modify", "ee", RTX_AUTOINC)
 
-/* Comparison operations.  The ordered comparisons exist in two
-   flavors, signed and unsigned.  */
+/* Comparison operations.  The first 6 are allowed only for integral,
+floating-point and vector modes.  LTGT is only allowed for floating-point
+modes.  The last 4 are allowed only for integral and vector modes.
+For floating-point operations, if either operand is a NaN, then NE returns
+true and the remaining operations return false.  The operations other than
+EQ and NE may generate an exception on quiet NaNs.  */
 DEF_RTL_EXPR(NE, "ne", "ee", RTX_COMM_COMPARE)
 DEF_RTL_EXPR(EQ, "eq", "ee", RTX_COMM_COMPARE)
 DEF_RTL_EXPR(GE, "ge", "ee", RTX_COMPARE)
 DEF_RTL_EXPR(

Re: [PATCH] Add .pd extension to c_exts.

2019-09-02 Thread Alexander Monakov
On Mon, 2 Sep 2019, Martin Liška wrote:

> Yep, I'm going to apply following patch that does it properly for the 
> gcc-match
> file type.

So just to make sure I understand correctly why you need this:

you use some other value of 'tabstop' in Vim, and need to reset it back to its
default value of 8, as assumed by coding standards in GCC?

If that's the case, we should look into overriding 'tabstop' for all files in
the gcc tree, including .md files, not just .pd and C/C++ files, right?

Thanks.
Alexander

[PATCH][gcc] libgccjit: handle long literals in playback::context::new_string_literal

2019-09-02 Thread Andrea Corallo
Hi all,
yesterday I found an interesting bug in libgccjit.
It seems we have a hard limitation of 200 characters for string literals.
Attempting to create longer strings leads to an ICE during pass_expand
while performing a sanity check in get_constant_size.

Tracking down the issue, it seems the code we have was inspired by
c-family/c-common.c:c_common_nodes_and_builtins, where array_domain_type
is actually defined with a size of 200.
The comment that follows that point sounded premonitory :) :)

/* Make a type for arrays of characters.
   With luck nothing will ever really depend on the length of this
   array type.  */

At least in the current implementation the type is set by
fix_string_type, where the actual string length is taken into account.

I attach a patch updating the logic accordingly and a new testcase
for it.

make check-jit is passing clean.
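
For reference, a minimal reproducer sketch (hedged: it only uses the
public gcc_jit_context_new_string_literal entry point that the testcase
below exercises; any literal over 200 characters used to ICE):

#include "libgccjit.h"
#include <string.h>

/* Before the patch, any string literal longer than 200 characters
   tripped the get_constant_size sanity check during pass_expand.  */
static void
make_long_literal (gcc_jit_context *ctxt)
{
  char buf[256];
  memset (buf, 'a', sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';
  gcc_jit_context_new_string_literal (ctxt, buf);  /* > 200 chars */
}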

Best Regards
  Andrea

gcc/jit/ChangeLog
2019-??-??  Andrea Corallo  

* jit-playback.h
(gcc::jit::recording::context m_recording_ctxt): Remove
m_char_array_type_node field.
* jit-playback.c
(playback::context::context): Remove m_char_array_type_node from member
initializer list.
(playback::context::new_string_literal): Fix logic to handle string
length > 200.

gcc/testsuite/ChangeLog
2019-??-??  Andrea Corallo  

* jit.dg/all-non-failing-tests.h: Add test-long-string-literal.c.
* jit.dg/test-long-string-literal.c: New testcase.
diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 9eeb2a7..a26b8d3 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -88,7 +88,6 @@ playback::context::context (recording::context *ctxt)
   : log_user (ctxt->get_logger ()),
 m_recording_ctxt (ctxt),
 m_tempdir (NULL),
-m_char_array_type_node (NULL),
 m_const_char_ptr (NULL)
 {
   JIT_LOG_SCOPE (get_logger ());
@@ -670,9 +669,12 @@ playback::rvalue *
 playback::context::
 new_string_literal (const char *value)
 {
-  tree t_str = build_string (strlen (value), value);
-  gcc_assert (m_char_array_type_node);
-  TREE_TYPE (t_str) = m_char_array_type_node;
+  /* Compare with c-family/c-common.c: fix_string_type.  */
+  size_t len = strlen (value);
+  tree i_type = build_index_type (size_int (len));
+  tree a_type = build_array_type (char_type_node, i_type);
+  tree t_str = build_string (len, value);
+  TREE_TYPE (t_str) = a_type;
 
   /* Convert to (const char*), loosely based on
  c/c-typeck.c: array_to_pointer_conversion,
@@ -2703,10 +2705,6 @@ playback::context::
 replay ()
 {
   JIT_LOG_SCOPE (get_logger ());
-  /* Adapted from c-common.c:c_common_nodes_and_builtins.  */
-  tree array_domain_type = build_index_type (size_int (200));
-  m_char_array_type_node
-= build_array_type (char_type_node, array_domain_type);
 
   m_const_char_ptr
 = build_pointer_type (build_qualified_type (char_type_node,
diff --git a/gcc/jit/jit-playback.h b/gcc/jit/jit-playback.h
index d4b148e..801f610 100644
--- a/gcc/jit/jit-playback.h
+++ b/gcc/jit/jit-playback.h
@@ -322,7 +322,6 @@ private:
 
   auto_vec m_functions;
   auto_vec m_globals;
-  tree m_char_array_type_node;
   tree m_const_char_ptr;
 
   /* Source location handling.  */
diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index 0272e6f8..1b3d561 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -220,6 +220,13 @@
 #undef create_code
 #undef verify_code
 
+/* test-long-string-literal.c */
+#define create_code create_code_long_string_literal
+#define verify_code verify_code_long_string_literal
+#include "test-long-string-literal.c"
+#undef create_code
+#undef verify_code
+
 /* test-sum-of-squares.c */
 #define create_code create_code_sum_of_squares
 #define verify_code verify_code_sum_of_squares
diff --git a/gcc/testsuite/jit.dg/test-long-string-literal.c b/gcc/testsuite/jit.dg/test-long-string-literal.c
new file mode 100644
index 000..882567c
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-long-string-literal.c
@@ -0,0 +1,48 @@
+#include 
+#include 
+#include 
+
+#include "libgccjit.h"
+
+#include "harness.h"
+
+const char very_long_string[] =
+  "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc"
+  "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc"
+  "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc"
+  "abcabcabcabcabcabcabcabcabcabca";
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  /* Build the test_fn.  */
+  gcc_jit_function *f =
+gcc_jit_context_new_function (
+  ctxt, NULL,
+  GCC_JIT_FUNCTION_EXPORTED,
+  gcc_jit_context_get_type(ctxt,
+			   GCC_JIT_TYPE_CONST_CHAR_PTR),
+"test_long_string_literal",
+0, NULL, 0);
+  gcc_jit_block *blk =
+gcc_jit_function_new_block (f, "init_block");
+  gcc_jit_rvalue *res =
+gcc_jit_context_new_string_literal (ctxt, v

Re: GCC 9 backports

2019-09-02 Thread Martin Liška
Hi.

There are 2 more patches that I've just tested.

Martin
>From 367c03f190d78f1811715b4158ccff9c9aa08a1a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 2 Sep 2019 07:06:54 +
Subject: [PATCH 1/2] Backport r275291

gcc/ChangeLog:

2019-09-02  Martin Liska  

	PR gcov-profile/91601
	* gcov.c (path_contains_zero_cycle_arc): Rename to ...
	(path_contains_zero_or_negative_cycle_arc): ... this and handle
	also negative edges.
	(circuit): Handle also negative edges as they can happen
	in some situations.
---
 gcc/gcov.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index b06a6714c2e..7e51c2efb30 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -725,10 +725,10 @@ unblock (const block_info *u, block_vector_t &blocked,
 /* Return true when PATH contains a zero cycle arc count.  */
 
 static bool
-path_contains_zero_cycle_arc (arc_vector_t &path)
+path_contains_zero_or_negative_cycle_arc (arc_vector_t &path)
 {
   for (unsigned i = 0; i < path.size (); i++)
-if (path[i]->cs_count == 0)
+if (path[i]->cs_count <= 0)
   return true;
   return false;
 }
@@ -754,7 +754,7 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
 {
   block_info *w = arc->dst;
   if (w < start
-	  || arc->cs_count == 0
+	  || arc->cs_count <= 0
 	  || !linfo.has_block (w))
 	continue;
 
@@ -765,7 +765,7 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
 	  handle_cycle (path, count);
 	  loop_found = true;
 	}
-  else if (!path_contains_zero_cycle_arc (path)
+  else if (!path_contains_zero_or_negative_cycle_arc (path)
 	   &&  find (blocked.begin (), blocked.end (), w) == blocked.end ())
 	loop_found |= circuit (w, path, start, blocked, block_lists, linfo,
 			   count);
@@ -780,7 +780,7 @@ circuit (block_info *v, arc_vector_t &path, block_info *start,
   {
 	block_info *w = arc->dst;
 	if (w < start
-	|| arc->cs_count == 0
+	|| arc->cs_count <= 0
 	|| !linfo.has_block (w))
 	  continue;
 
-- 
2.23.0

>From f4e0d3156d4c1e651caf2e796df5a10d4619b6eb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 2 Sep 2019 07:07:11 +
Subject: [PATCH 2/2] Backport r275292

gcc/c-family/ChangeLog:

2019-09-02  Martin Liska  

	PR c++/91155
	* c-common.c (fname_as_string): Use cxx_printable_name for
	__PRETTY_FUNCTION__ same as was used before r265711.

gcc/testsuite/ChangeLog:

2019-09-02  Martin Liska  

	PR c++/91155
	* g++.dg/torture/pr91155.C: New test.
---
 gcc/cp/decl.c  | 20 +---
 gcc/testsuite/g++.dg/torture/pr91155.C | 18 ++
 2 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr91155.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 05ceda89d4c..e860f26e55d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4474,13 +4474,27 @@ cp_fname_init (const char* name, tree *type_p)
 static tree
 cp_make_fname_decl (location_t loc, tree id, int type_dep)
 {
-  const char *const name = (type_dep && in_template_function ()
-			? NULL : fname_as_string (type_dep));
+  const char * name = NULL;
+  bool release_name = false;
+  if (!(type_dep && in_template_function ()))
+{
+  if (current_function_decl == NULL_TREE)
+	name = "top level";
+  else if (type_dep == 1) /* __PRETTY_FUNCTION__ */
+	name = cxx_printable_name (current_function_decl, 2);
+  else if (type_dep == 0) /* __FUNCTION__ */
+	{
+	  name = fname_as_string (type_dep);
+	  release_name = true;
+	}
+  else
+	gcc_unreachable ();
+}
   tree type;
   tree init = cp_fname_init (name, &type);
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
-  if (name)
+  if (release_name)
 free (CONST_CAST (char *, name));
 
   /* As we're using pushdecl_with_scope, we must set the context.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr91155.C b/gcc/testsuite/g++.dg/torture/pr91155.C
new file mode 100644
index 000..04e4f7ab41b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr91155.C
@@ -0,0 +1,18 @@
+/* PR c++/91155.  */
+
+template< char C > struct dummy {};
+
+template< typename T > const char *test()
+{
+  __builtin_printf ("test: %s\n", __PRETTY_FUNCTION__);
+  return __PRETTY_FUNCTION__;
+}
+
+int main()
+{
+if (__builtin_strcmp ("const char* test() [with T = dummy<\'\\000\'>]", test< dummy< '\0' > > ()) != 0)
+{};//  __builtin_abort ();
+if (__builtin_strcmp ("const char* test() [with T = dummy<\'\\\'\'>]", test< dummy< '\'' > > ()) != 0)
+{};//  __builtin_abort ();
+return 0;
+}
-- 
2.23.0



Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Uros Bizjak
On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu  wrote:
>
> > which is not the case with core_cost (and similar with skylake_cost):
> >
> >   2, 2, 4,/* cost of moving XMM,YMM,ZMM register */
> >   {6, 6, 6, 6, 12},/* cost of loading SSE registers
> >in 32,64,128,256 and 512-bit */
> >   {6, 6, 6, 6, 12},/* cost of storing SSE registers
> >in 32,64,128,256 and 512-bit */
> >   2, 2,/* SSE->integer and integer->SSE moves */
> >
> > We have the same cost of moving between integer registers (by default
> > set to 2), between SSE registers and between integer and SSE register
> > sets. I think that at least the cost of moves between regsets should
> > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > that would translate to the value of 6. The value should be low enough
> > to keep the cost below the value that forces move through the memory.
> > Changing core register allocation cost of SSE <-> integer to:
> >
> > --cut here--
> > Index: config/i386/x86-tune-costs.h
> > ===
> > --- config/i386/x86-tune-costs.h(revision 275281)
> > +++ config/i386/x86-tune-costs.h(working copy)
> > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> >in 32,64,128,256 and 512-bit */
> >{6, 6, 6, 6, 12},/* cost of storing SSE registers
> >in 32,64,128,256 and 512-bit */
> > -  2, 2,/* SSE->integer and
> > integer->SSE moves */
> > +  6, 6,/* SSE->integer and
> > integer->SSE moves */
> >/* End of register allocator costs.  */
> >},
> >
> > --cut here--
> >
> > still produces direct move in gcc.target/i386/minmax-6.c
> >
> > I think that in addition to attached patch, values between 2 and 6
> > should be considered in benchmarking. Unfortunately, without access to
> > regressed SPEC tests, I can't analyse these changes by myself.
> >
> > Uros.
>
> Applying a similar change to skylake_cost, on a Skylake workstation we got
> performance like:
> --------------------------------------------+---------------------
> version                                     | 548_exchange_r score
> --------------------------------------------+---------------------
> gcc10_20180822                              | 10
> apply remove_max8                           | 8.9
> also apply increase integer_tofrom_sse cost | 9.69
> --------------------------------------------+---------------------
> There is still a 3% regression, which is related to _gfortran_mminloc0_4_i4
> in libgfortran.so.5.0.0.
>
> I found suspicious code as below, does it affect?

Hard to say without access to the test, but I'm glad that changing the
knob has a noticeable effect. I think that (as said by Alan) fine-tuning
of the register pressure calculation will be needed to push this forward.

Uros.

> --
> modified   gcc/config/i386/i386-features.c
> @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
>if (dump_file)
>  fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
>
> -  /* ???  What about integer to SSE?  */
> +  /* ???  What about integer to SSE?  */???
>EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
>  cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> --
> --
> BR,
> Hongtao


Re: [PATCH] Fix up go regressions caused by my recent switchconv changes (PR go/91617)

2019-09-02 Thread Richard Biener
On Mon, 2 Sep 2019, Jakub Jelinek wrote:

> On Sun, Sep 01, 2019 at 06:44:15PM +0200, Richard Biener wrote:
> > On September 1, 2019 6:34:25 PM GMT+02:00, Jakub Jelinek  
> > wrote:
> > >On Sat, Aug 31, 2019 at 08:25:49PM +0200, Richard Biener wrote:
> > >> So why not always return an unsigned type then by telling
> > >type_for_size? 
> > >
> > >So like this (if it passes bootstrap/regtest)?
> > 
> > Yes. 
> 
> Unfortunately that didn't work, because TYPE_MAX_VALUE/TYPE_MIN_VALUE
> are not present on POINTER_TYPEs.
> 
> Here is an updated version that passed bootstrap/regtest on both
> x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2019-09-02  Jakub Jelinek  
> 
>   PR go/91617
>   * fold-const.c (range_check_type): For enumeral and boolean
>   type, pass 1 to type_for_size langhook instead of
>   TYPE_UNSIGNED (etype).  Return unsigned_type_for result whenever
>   etype isn't TYPE_UNSIGNED INTEGER_TYPE.
>   (build_range_check): Don't call unsigned_type_for for pointer types.
>   * match.pd (X / C1 op C2): Don't call unsigned_type_for on
>   range_check_type result.
> 
> --- gcc/fold-const.c.jj   2019-08-27 22:52:24.207334541 +0200
> +++ gcc/fold-const.c  2019-09-01 22:46:17.091058145 +0200
> @@ -4938,10 +4938,9 @@ range_check_type (tree etype)
>/* First make sure that arithmetics in this type is valid, then make sure
>   that it wraps around.  */
>if (TREE_CODE (etype) == ENUMERAL_TYPE || TREE_CODE (etype) == 
> BOOLEAN_TYPE)
> -etype = lang_hooks.types.type_for_size (TYPE_PRECISION (etype),
> - TYPE_UNSIGNED (etype));
> +etype = lang_hooks.types.type_for_size (TYPE_PRECISION (etype), 1);
>  
> -  if (TREE_CODE (etype) == INTEGER_TYPE && !TYPE_OVERFLOW_WRAPS (etype))
> +  if (TREE_CODE (etype) == INTEGER_TYPE && !TYPE_UNSIGNED (etype))
>  {
>tree utype, minv, maxv;
>  
> @@ -4959,6 +4958,8 @@ range_check_type (tree etype)
>else
>   return NULL_TREE;
>  }
> +  else if (POINTER_TYPE_P (etype))
> +etype = unsigned_type_for (etype);
>return etype;
>  }
>  
> @@ -5049,9 +5050,6 @@ build_range_check (location_t loc, tree
>if (etype == NULL_TREE)
>  return NULL_TREE;
>  
> -  if (POINTER_TYPE_P (etype))
> -etype = unsigned_type_for (etype);
> -
>high = fold_convert_loc (loc, etype, high);
>low = fold_convert_loc (loc, etype, low);
>exp = fold_convert_loc (loc, etype, exp);
> --- gcc/match.pd.jj   2019-08-27 12:26:40.745863588 +0200
> +++ gcc/match.pd  2019-09-01 18:23:02.098729356 +0200
> @@ -1569,8 +1569,6 @@ (define_operator_list COND_TERNARY
>   tree etype = range_check_type (TREE_TYPE (@0));
>   if (etype)
> {
> - if (! TYPE_UNSIGNED (etype))
> -   etype = unsigned_type_for (etype);
>   hi = fold_convert (etype, hi);
>   lo = fold_convert (etype, lo);
>   hi = const_binop (MINUS_EXPR, etype, hi, lo);
> 
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts

2019-09-02 Thread Richard Sandiford
Christophe Lyon  writes:
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index c7a464c..721729d 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1167,7 +1167,7 @@ arm*-*-netbsdelf*)
>   tmake_file="${tmake_file} arm/t-arm"
>   target_cpu_cname="strongarm"
>   ;;
> -arm*-*-linux-*)  # ARM GNU/Linux with ELF
> +arm*-*-linux-* | arm*-*-uclinuxfdpiceabi)# ARM GNU/Linux 
> with ELF
>   tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h 
> glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
>   extra_options="${extra_options} linux-android.opt"
>   case $target in

Better to remove the "# ARM GNU/Linux with ELF" comment too, since it
doesn't cover the new case and was already misleading given the
bionic support.

> diff --git a/libgcc/config.host b/libgcc/config.host
> index 91abc84..facca2a 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -435,7 +435,7 @@ arm*-*-fuchsia*)
>  arm*-*-netbsdelf*)
>   tmake_file="$tmake_file arm/t-arm arm/t-netbsd t-slibgcc-gld-nover"
>   ;;
> -arm*-*-linux*)   # ARM GNU/Linux with ELF
> +arm*-*-linux* | arm*-*-uclinuxfdpiceabi) # ARM GNU/Linux 
> with ELF
>   tmake_file="${tmake_file} arm/t-arm t-fixedpoint-gnu-prefix t-crtfm"
>   tmake_file="${tmake_file} arm/t-elf arm/t-bpabi arm/t-linux-eabi 
> t-slibgcc-libgcc"
>   tm_file="$tm_file arm/bpabi-lib.h"

Same here.

OK with those changes, thanks.

Richard


Re: [PATCH] Fix up go regressions caused by my recent switchconv changes (PR go/91617)

2019-09-02 Thread Andrew Pinski
On Mon, Sep 2, 2019 at 1:14 AM Jakub Jelinek  wrote:
>
> On Sun, Sep 01, 2019 at 06:44:15PM +0200, Richard Biener wrote:
> > On September 1, 2019 6:34:25 PM GMT+02:00, Jakub Jelinek  
> > wrote:
> > >On Sat, Aug 31, 2019 at 08:25:49PM +0200, Richard Biener wrote:
> > >> So why not always return an unsigned type then by telling
> > >type_for_size?
> > >
> > >So like this (if it passes bootstrap/regtest)?
> >
> > Yes.
>
> Unfortunately that didn't work, because TYPE_MAX_VALUE/TYPE_MIN_VALUE
> are not present on POINTER_TYPEs.
>
> Here is an updated version that passed bootstrap/regtest on both
> x86_64-linux and i686-linux, ok for trunk?

Seems like this would fix PR91632 also.
Which has a C testcase included.

Thanks,
Andrew Pinski

>
> 2019-09-02  Jakub Jelinek  
>
> PR go/91617
> * fold-const.c (range_check_type): For enumeral and boolean
> type, pass 1 to type_for_size langhook instead of
> TYPE_UNSIGNED (etype).  Return unsigned_type_for result whenever
> etype isn't TYPE_UNSIGNED INTEGER_TYPE.
> (build_range_check): Don't call unsigned_type_for for pointer types.
> * match.pd (X / C1 op C2): Don't call unsigned_type_for on
> range_check_type result.
>
> --- gcc/fold-const.c.jj 2019-08-27 22:52:24.207334541 +0200
> +++ gcc/fold-const.c2019-09-01 22:46:17.091058145 +0200
> @@ -4938,10 +4938,9 @@ range_check_type (tree etype)
>/* First make sure that arithmetics in this type is valid, then make sure
>   that it wraps around.  */
>if (TREE_CODE (etype) == ENUMERAL_TYPE || TREE_CODE (etype) == 
> BOOLEAN_TYPE)
> -etype = lang_hooks.types.type_for_size (TYPE_PRECISION (etype),
> -   TYPE_UNSIGNED (etype));
> +etype = lang_hooks.types.type_for_size (TYPE_PRECISION (etype), 1);
>
> -  if (TREE_CODE (etype) == INTEGER_TYPE && !TYPE_OVERFLOW_WRAPS (etype))
> +  if (TREE_CODE (etype) == INTEGER_TYPE && !TYPE_UNSIGNED (etype))
>  {
>tree utype, minv, maxv;
>
> @@ -4959,6 +4958,8 @@ range_check_type (tree etype)
>else
> return NULL_TREE;
>  }
> +  else if (POINTER_TYPE_P (etype))
> +etype = unsigned_type_for (etype);
>return etype;
>  }
>
> @@ -5049,9 +5050,6 @@ build_range_check (location_t loc, tree
>if (etype == NULL_TREE)
>  return NULL_TREE;
>
> -  if (POINTER_TYPE_P (etype))
> -etype = unsigned_type_for (etype);
> -
>high = fold_convert_loc (loc, etype, high);
>low = fold_convert_loc (loc, etype, low);
>exp = fold_convert_loc (loc, etype, exp);
> --- gcc/match.pd.jj 2019-08-27 12:26:40.745863588 +0200
> +++ gcc/match.pd2019-09-01 18:23:02.098729356 +0200
> @@ -1569,8 +1569,6 @@ (define_operator_list COND_TERNARY
> tree etype = range_check_type (TREE_TYPE (@0));
> if (etype)
>   {
> -   if (! TYPE_UNSIGNED (etype))
> - etype = unsigned_type_for (etype);
> hi = fold_convert (etype, hi);
> lo = fold_convert (etype, lo);
> hi = const_binop (MINUS_EXPR, etype, hi, lo);
>
>
> Jakub


Re: [PATCH] Fix up go regressions caused by my recent switchconv changes (PR go/91617)

2019-09-02 Thread Jakub Jelinek
On Sun, Sep 01, 2019 at 06:44:15PM +0200, Richard Biener wrote:
> On September 1, 2019 6:34:25 PM GMT+02:00, Jakub Jelinek  
> wrote:
> >On Sat, Aug 31, 2019 at 08:25:49PM +0200, Richard Biener wrote:
> >> So why not always return an unsigned type then by telling
> >type_for_size? 
> >
> >So like this (if it passes bootstrap/regtest)?
> 
> Yes. 

Unfortunately that didn't work, because TYPE_MAX_VALUE/TYPE_MIN_VALUE
are not present on POINTER_TYPEs.

Here is an updated version that passed bootstrap/regtest on both
x86_64-linux and i686-linux, ok for trunk?
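
For readers following along, here is a hedged sketch of the transform
build_range_check emits, showing why pointer operands need an unsigned
counterpart type (illustrative C, not the compiler source):

#include <stdint.h>

/* low <= p && p <= high becomes one unsigned comparison; pointers have
   no unsigned variant, so range_check_type must return an unsigned
   integer type of the same width -- the unsigned_type_for call moved
   into it by the patch below.  */
static int
in_range (const char *p, const char *low, const char *high)
{
  uintptr_t up = (uintptr_t) p;
  uintptr_t ulow = (uintptr_t) low;
  uintptr_t uhigh = (uintptr_t) high;
  /* Wrapping subtraction folds both bounds into a single compare.  */
  return up - ulow <= uhigh - ulow;
}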

2019-09-02  Jakub Jelinek  

PR go/91617
* fold-const.c (range_check_type): For enumeral and boolean
type, pass 1 to type_for_size langhook instead of
TYPE_UNSIGNED (etype).  Return unsigned_type_for result whenever
etype isn't TYPE_UNSIGNED INTEGER_TYPE.
(build_range_check): Don't call unsigned_type_for for pointer types.
* match.pd (X / C1 op C2): Don't call unsigned_type_for on
range_check_type result.

--- gcc/fold-const.c.jj 2019-08-27 22:52:24.207334541 +0200
+++ gcc/fold-const.c2019-09-01 22:46:17.091058145 +0200
@@ -4938,10 +4938,9 @@ range_check_type (tree etype)
   /* First make sure that arithmetics in this type is valid, then make sure
  that it wraps around.  */
   if (TREE_CODE (etype) == ENUMERAL_TYPE || TREE_CODE (etype) == BOOLEAN_TYPE)
-etype = lang_hooks.types.type_for_size (TYPE_PRECISION (etype),
-   TYPE_UNSIGNED (etype));
+etype = lang_hooks.types.type_for_size (TYPE_PRECISION (etype), 1);
 
-  if (TREE_CODE (etype) == INTEGER_TYPE && !TYPE_OVERFLOW_WRAPS (etype))
+  if (TREE_CODE (etype) == INTEGER_TYPE && !TYPE_UNSIGNED (etype))
 {
   tree utype, minv, maxv;
 
@@ -4959,6 +4958,8 @@ range_check_type (tree etype)
   else
return NULL_TREE;
 }
+  else if (POINTER_TYPE_P (etype))
+etype = unsigned_type_for (etype);
   return etype;
 }
 
@@ -5049,9 +5050,6 @@ build_range_check (location_t loc, tree
   if (etype == NULL_TREE)
 return NULL_TREE;
 
-  if (POINTER_TYPE_P (etype))
-etype = unsigned_type_for (etype);
-
   high = fold_convert_loc (loc, etype, high);
   low = fold_convert_loc (loc, etype, low);
   exp = fold_convert_loc (loc, etype, exp);
--- gcc/match.pd.jj 2019-08-27 12:26:40.745863588 +0200
+++ gcc/match.pd2019-09-01 18:23:02.098729356 +0200
@@ -1569,8 +1569,6 @@ (define_operator_list COND_TERNARY
tree etype = range_check_type (TREE_TYPE (@0));
if (etype)
  {
-   if (! TYPE_UNSIGNED (etype))
- etype = unsigned_type_for (etype);
hi = fold_convert (etype, hi);
lo = fold_convert (etype, lo);
hi = const_binop (MINUS_EXPR, etype, hi, lo);


Jakub


Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Hongtao Liu
> which is not the case with core_cost (and similar with skylake_cost):
>
>   2, 2, 4,/* cost of moving XMM,YMM,ZMM register */
>   {6, 6, 6, 6, 12},/* cost of loading SSE registers
>in 32,64,128,256 and 512-bit */
>   {6, 6, 6, 6, 12},/* cost of storing SSE registers
>in 32,64,128,256 and 512-bit */
>   2, 2,/* SSE->integer and integer->SSE moves */
>
> We have the same cost of moving between integer registers (by default
> set to 2), between SSE registers and between integer and SSE register
> sets. I think that at least the cost of moves between regsets should
> be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> that would translate to the value of 6. The value should be low enough
> to keep the cost below the value that forces move through the memory.
> Changing core register allocation cost of SSE <-> integer to:
>
> --cut here--
> Index: config/i386/x86-tune-costs.h
> ===
> --- config/i386/x86-tune-costs.h(revision 275281)
> +++ config/i386/x86-tune-costs.h(working copy)
> @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
>in 32,64,128,256 and 512-bit */
>{6, 6, 6, 6, 12},/* cost of storing SSE registers
>in 32,64,128,256 and 512-bit */
> -  2, 2,/* SSE->integer and
> integer->SSE moves */
> +  6, 6,/* SSE->integer and
> integer->SSE moves */
>/* End of register allocator costs.  */
>},
>
> --cut here--
>
> still produces direct move in gcc.target/i386/minmax-6.c
>
> I think that in addition to attached patch, values between 2 and 6
> should be considered in benchmarking. Unfortunately, without access to
> regressed SPEC tests, I can't analyse these changes by myself.
>
> Uros.

Applying a similar change to skylake_cost, on a Skylake workstation we got
performance like:
--------------------------------------------+---------------------
version                                     | 548_exchange_r score
--------------------------------------------+---------------------
gcc10_20180822                              | 10
apply remove_max8                           | 8.9
also apply increase integer_tofrom_sse cost | 9.69
--------------------------------------------+---------------------
There is still a 3% regression, which is related to _gfortran_mminloc0_4_i4
in libgfortran.so.5.0.0.

I found suspicious code as below, does it affect?
--
modified   gcc/config/i386/i386-features.c
@@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
   if (dump_file)
 fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);

-  /* ???  What about integer to SSE?  */
+  /* ???  What about integer to SSE?  */???
   EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
 cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
--
-- 
BR,
Hongtao


[PATCH] [gcc-8-branch] Fix recent unique_ptr regressions

2019-09-02 Thread Jonathan Wakely

These test changes should have been committed with r275193.

* testsuite/20_util/unique_ptr/assign/48635_neg.cc: Replace dg-error
with dg-prune-output for enable_if failure.
* testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc: Add
dg-prune-output for enable_if failure.

Tested x86_64-linux (properly this time), committed to gcc-8-branch.

commit 5514ed57565e009de51831298e81718e3278a7cd
Author: Jonathan Wakely 
Date:   Mon Sep 2 08:51:49 2019 +0100

Fix recent unique_ptr regressions

These test changes should have been committed with r275193.

* testsuite/20_util/unique_ptr/assign/48635_neg.cc: Replace dg-error
with dg-prune-output for enable_if failure.
* testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc: Add
dg-prune-output for enable_if failure.

diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/assign/48635_neg.cc 
b/libstdc++-v3/testsuite/20_util/unique_ptr/assign/48635_neg.cc
index b22d0e123b4..23a5eb007a1 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/assign/48635_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/assign/48635_neg.cc
@@ -42,10 +42,9 @@ void f()
   std::unique_ptr ud(nullptr, d);
   ub = std::move(ud); // { dg-error "no match" }
   ub2 = ud; // { dg-error "no match" }
-// { dg-error "no type" "" { target *-*-* } 307 }
 
   std::unique_ptr uba(nullptr, b);
   std::unique_ptr uda(nullptr, d);
   uba = std::move(uda); // { dg-error "no match" }
-// { dg-error "no type" "" { target *-*-* } 566 }
 }
+// { dg-prune-output "no type" }
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc 
b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc
index c1b1c9efc64..7e820ba129a 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc
@@ -39,7 +39,7 @@ test07()
   std::unique_ptr cA3(p); // { dg-error "no matching function" }
   std::unique_ptr vA3(p); // { dg-error "no matching function" }
   std::unique_ptr cvA3(p); // { dg-error "no matching 
function" }
-  // { dg-error "no type" "" { target *-*-* } 473 }
+  // { dg-prune-output "no type" }
 }
 
 template


Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts

2019-09-02 Thread Christophe Lyon
On Fri, 30 Aug 2019 at 16:49, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Fri, 30 Aug 2019 at 11:00, Richard Sandiford
> >  wrote:
> >>
> >> Christophe Lyon  writes:
> >> > @@ -785,7 +785,7 @@ case ${target} in
> >> >esac
> >> >tmake_file="t-slibgcc"
> >> >case $target in
> >> > -*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | 
> >> > *-*-kopensolaris*-gnu)
> >> > +*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | 
> >> > *-*-kopensolaris*-gnu  | *-*-uclinuxfdpiceabi)
> >> >:;;
> >> >  *-*-gnu*)
> >> >native_system_header_dir=/include
> >>
> >> I don't think this is necessary, since this target will never match the
> >> following *-*-gnu*) stanza anyway.
> > OK (I thought it was clearer to add the fdpic config where we already
> > have linux that would not match)
>
> I think the idea is to match pure GNU systems only in the second stanza
> (i.e. GNU/Hurd).  So we need the first stanza to exclude hybrid-GNU
> systems like GNU/Linux, GNU/Solaris, GNU/FreeBSD, etc.
>
> Since uclinuxfdpiceabi isn't a GNU-based system, I don't think it
> needs to appear at all.
>
> >> > diff --git a/libtool.m4 b/libtool.m4
> >> > index 8966762..64e507a 100644
> >> > --- a/libtool.m4
> >> > +++ b/libtool.m4
> >> > @@ -3734,7 +3739,7 @@ m4_if([$1], [CXX], [
> >> >   ;;
> >> >   esac
> >> >   ;;
> >> > -  linux* | k*bsd*-gnu | kopensolaris*-gnu)
> >> > +  linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*)
> >> >   case $cc_basename in
> >> > KCC*)
> >> >   # KAI C++ Compiler
> >>
> >> Is this needed?  It seems to be in the !GCC branch of an if/else.
> > I must admit I didn't test this case. I thought it was needed because
> > this target does not match "linux*", in case someone tries to compile
> > with another compiler...
> >
> >
> >>
> >> If it is needed, the default:
> >>
> >> _LT_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no
> >>
> >> seems correct for non-FDPIC uclinux.
> >>
> > So, either use uclinuxfdpiceabi above, or do nothing and do not try to
> > support other compilers?
>
> Yeah.  I think the latter's better, since in this context we only
> need libtool.m4 to support building with GCC.  The decision might
> be different for upstream libtool, but do any commercial compilers
> support Arm FDPIC yet?
>
> >> > @@ -4032,7 +4037,7 @@ m4_if([$1], [CXX], [
> >> >_LT_TAGVAR(lt_prog_compiler_static, $1)='-non_shared'
> >> >;;
> >> >
> >> > -linux* | k*bsd*-gnu | kopensolaris*-gnu)
> >> > +linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*)
> >> >case $cc_basename in
> >> ># old Intel for x86_64 which still supported -KPIC.
> >> >ecc*)
> >>
> >> Same here.
> >>
> >> > @@ -5946,7 +5951,7 @@ if test "$_lt_caught_CXX_error" != yes; then
> >> >  _LT_TAGVAR(inherit_rpath, $1)=yes
> >> >  ;;
> >> >
> >> > -  linux* | k*bsd*-gnu | kopensolaris*-gnu)
> >> > +  linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
> >> >  case $cc_basename in
> >> >KCC*)
> >> >   # Kuck and Associates, Inc. (KAI) C++ Compiler
> >>
> >> Here too the code seems to be dealing specifically with non-GCC compilers.
> >>
> >> > @@ -6598,7 +6603,7 @@ interix[[3-9]]*)
> >> >_LT_TAGVAR(postdeps,$1)=
> >> >;;
> >> >
> >> > -linux*)
> >> > +linux* | uclinux*)
> >> >case `$CC -V 2>&1 | sed 5q` in
> >> >*Sun\ C*)
> >> >  # Sun C++ 5.9
> >>
> >> Here too.  (It only seems to do anything for Sun's C compiler.)
> >>
> >> The fewer hunks we have to maintain downstream the better :-)
> >>
> > Sure.
> >
> > I thought it safer/cleaner to prepare the cases for non-GCC compilers; I
> > guess it's better not to add that until proven useful?
>
> Yeah, I think so.  I guess it depends on your POV.  To me, it seems
> cleaner to add uclinux* and uclinuxfdpiceabi only where we know there's
> a specific need, since that's also how we decide which of uclinux* and
> uclinuxfdpiceabi to use.
>

OK, here is an updated version of this patch.

Christophe

> Thanks,
> Richard
commit 0dbd18d60be654fa2ff2ae85670cc096db5217a5
Author: Christophe Lyon 
Date:   Fri May 4 15:11:35 2018 +

[ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts

The new arm-uclinuxfdpiceabi target behaves pretty much like
arm-linux-gnueabi. In order to enable the same set of features, we
have to update several configure scripts that generally match targets
like *-*-linux*: in most places, we add *-uclinux* where there is
already *-linux*, or uclinux* when there is already linux*.

In gcc/config.gcc and libgcc/config.host we use *-*-uclinuxfdpiceabi
because there is already a different behaviour for *-*uclinux* target.

In libtool.m4, we use uclinuxfdpiceabi in cases where ELF shared
libraries support is required, as uclinux does not guarantee that.

2019-XX-XX  Christophe Lyon  

	config/
	* futex.m4: 

Re: [PATCH] Fix PR 91605

2019-09-02 Thread Richard Biener
On Sun, 1 Sep 2019, Bernd Edlinger wrote:

> Hi,
> 
> this fixes an oversight in r274986.
> We need to avoid using movmisalign on DECL_P which are not in memory,
> similar to the !mem_ref_refers_to_non_mem_p which unfortunately can't
> handle DECL_P.
> 

But

-  && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
+  && (DECL_P (to) ? MEM_P (DECL_RTL (to))
+ : !mem_ref_refers_to_non_mem_p (to))

and in mem_ref_refers_to_non_mem_p we do

  if (!DECL_RTL_SET_P (base))
return nortl;

  return (!MEM_P (DECL_RTL (base)));

so when !DECL_RTL_SET_P (t) we can go full speed ahead?  That said,
can we refactor addr_expr_of_non_mem_decl_p_1 to put

  if (TREE_CODE (addr) != ADDR_EXPR)
return false;

  tree base = TREE_OPERAND (addr, 0);

into the single caller and re-use it then also for the DECL_P case?
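
Something like this, perhaps (a minimal sketch of the suggested
refactoring; the helper name and the treatment of the !DECL_RTL_SET_P
case are placeholders, not a definitive implementation):

/* Shared helper: does BASE, a decl, certainly live outside memory?  */
static bool
non_mem_decl_p (tree base)
{
  if (!DECL_RTL_SET_P (base))
    return false;  /* or the 'full speed ahead' policy from above */
  return !MEM_P (DECL_RTL (base));
}

/* The ADDR_EXPR peeling moves into the single MEM_REF caller.  */
static bool
mem_ref_refers_to_non_mem_p (tree ref)
{
  tree addr = TREE_OPERAND (ref, 0);
  if (TREE_CODE (addr) != ADDR_EXPR)
    return false;
  return non_mem_decl_p (TREE_OPERAND (addr, 0));
}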

Thanks,
Richard.


> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH] Add .pd extension to c_exts.

2019-09-02 Thread Martin Liška
On 8/30/19 2:54 PM, Richard Biener wrote:
> On Fri, Aug 30, 2019 at 2:31 PM Alexander Monakov  wrote:
>>
>>
>>
>> On Fri, 30 Aug 2019, Richard Biener wrote:
>>
>>> On Fri, Aug 30, 2019 at 12:58 PM Martin Liška  wrote:

 Hi.

 I would like to add .pd to c_exts so that one
 can have correctly set tab width, etc.
>>>
>>> But then it will auto-indent with too much spaces, no?
>>
>> I think it's fine, the script does
>>
>>   setlocal cindent
>>   setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0
>>
>> so it's a bit smarter than indenting everything by 2 spaces :)
>>
>> So +1 from me for the patch, but note that now we have a contrib/vim-gcc-dev
>> "plugin" directory, it might be more natural to add gcc-match indent rules
>> there.
> 
> Yeah, that would be better than claiming .pd is C ...
> 
> Richard.
> 
>>
>> Alexander

Yep, I'm going to apply the following patch, which does it properly for the
gcc-match file type.

Martin
>From c72341c6a23ecc519ca2ecf104970b28c301b968 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 2 Sep 2019 09:44:18 +0200
Subject: [PATCH] Set tabstop=8 for gcc-match file types.

contrib/ChangeLog:

2019-09-02  Martin Liska  

	* vim-gcc-dev/syntax/gcc-match.vim: Set tabstop=8.
---
 contrib/vim-gcc-dev/syntax/gcc-match.vim | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/contrib/vim-gcc-dev/syntax/gcc-match.vim b/contrib/vim-gcc-dev/syntax/gcc-match.vim
index 356b07a15b2..e46140da044 100644
--- a/contrib/vim-gcc-dev/syntax/gcc-match.vim
+++ b/contrib/vim-gcc-dev/syntax/gcc-match.vim
@@ -68,4 +68,6 @@ hi def link pdComment Comment
hi def link pdTodo    Todo
 hi def link pdPreProc PreProc
 
+setlocal tabstop=8
+
 let b:current_syntax = "gcc-match"
-- 
2.23.0



Re: [v3] Update Solaris baselines for GCC 9.3

2019-09-02 Thread Jonathan Wakely

On 01/09/19 12:47 +0200, Rainer Orth wrote:

And now the Solaris libstdc++ baseline updates for the gcc-9 branch.

Tested on i386-pc-solaris2.1[01] and sparc-sun-solaris2.1[01].  Ok for
mainline?


OK for gcc-9-branch :-)

Thanks.




Re: [v3] Update Solaris baselines for GCC 10.0

2019-09-02 Thread Jonathan Wakely

On 01/09/19 12:45 +0200, Rainer Orth wrote:

Here's are the updates to the Solaris libstdc++ baselines on mainline.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.  Ok for mainline?


Yes, thanks.



Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-09-02 Thread Alan Modra
On Sun, Sep 01, 2019 at 09:48:49PM +0200, Uros Bizjak wrote:
> the first try to implement the idea of forcing a subclass (I must
> admit that the patch is a bit of a shot in the dark...).

Yes, keep in mind that rs6000_ira_change_pseudo_allocno_class is a
hack, and one that might only be useful with
TARGET_COMPUTE_PRESSURE_CLASSES (and possibly SCHED_PRESSURE_MODEL
too).

-- 
Alan Modra
Australia Development Lab, IBM