RE: [committed][PATCH][GCC][testsuite][Arm] fix testism, add required option after require.

2019-01-14 Thread Tamar Christina
Hi Christoph,

Thanks for the report. It seems we're really inconsistent with lane index 
endianness on Arm; a patch is going through final bootstrap and regtesting 
before I send it out. With the AArch64 patch this should take care of all 
big-endian issues.

I'm hoping to fix the endianness stuff for Arm in GCC 10 so they're more 
aligned with AArch64.

Thanks,
Tamar

-----Original Message-----
From: Christophe Lyon  
Sent: Monday, January 14, 2019 1:46 PM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; Ramana Radhakrishnan 
; Richard Earnshaw ; 
ni...@redhat.com; Kyrylo Tkachov 
Subject: Re: [committed][PATCH][GCC][testsuite][Arm] fix testism, add required 
option after require.

Hi Tamar,

On Fri, 11 Jan 2019 at 15:22, Tamar Christina  wrote:
>
> Hi All,
>
> The test declared the fp16 requirement, but didn't add the options 
> causing it to fail when the target doesn't have it on by default.
>
> Bootstrapped Regtested on arm-none-Linux-gnueabihf and no issues.
>
> committed under the gcc obvious rules.
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
> 2019-01-11  Tamar Christina  
>
> * gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Require 
> neon
> and add options.
>

Thanks for this patch.

However, the scan-assembler-times part of the test still fails on armeb:
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#180 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#270 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\], #0
1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\],
#180 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\],
#270 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\], #90
1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#180 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#270 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\], #0
1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\],
#180 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\],
#270 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\], #90
1

But you are probably already aware of that.

Christophe


> --


Re: ISO_Fortran_binding patch

2019-01-14 Thread Richard Biener
On January 15, 2019 12:07:53 AM GMT+01:00, Jakub Jelinek  
wrote:
>On Sat, Jan 12, 2019 at 06:35:20PM +, Paul Richard Thomas wrote:
>> Done as revision 267884.
>
>The other tests FAILs too:
>FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O0  (test for excess
>errors)
>UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O0  compilation
>failed to produce executable
>FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O1  (test for excess
>errors)
>UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O1  compilation
>failed to produce executable
>FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O2  (test for excess
>errors)
>UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O2  compilation
>failed to produce executable
>FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -fomit-frame-pointer
>-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for
>excess errors)
>UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O3
>-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
>-finline-functions  compilation failed to produce executable
>FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -g  (test for excess
>errors)
>UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -g  compilation
>failed to produce executable
>FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -Os  (test for excess
>errors)
>UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -Os  compilation
>failed to produce executable
>
>The problem is that:
>Excess errors:
>/home/jakub/src/gcc/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c:3:10:
>fatal error: ISO_Fortran_binding.h: No such file or directory
>compilation terminated.
>
>It either should
>#include "../../../libgfortran/ISO_Fortran_binding.h"
>instead or the Fortran *.exp files should arrange for
>-I.../libgfortran/
>to be added to all gfortran tests.  Because right now it FAILs if you
>don't
>have ISO_Fortran_binding.h header installed, or succeeds, but includes
>header from some other compiler version or even other compiler
>altogether.
>
>Where is that header installed BTW?
>Would be best if it got installed in directories like:
>$prefix/lib/gcc/$target/$version/include
>
>See e.g. libssp or libsanitizer, both have something like
>target_noncanonical = @target_noncanonical@
>libsubincludedir =
>$(libdir)/gcc/$(target_noncanonical)/$(gcc_version)/include
>nobase_libsubinclude_HEADERS = ssp/ssp.h ssp/string.h ssp/stdio.h
>ssp/unistd.h
>
>You probably want it to go directly in the include dir, so without the
>ssp/
>or whatever else prefixes.

It's there, but also in the multilib locations (which is dubious? Not sure if 
we ever search those include paths) 

Richard. 

>
>   Jakub



Re: [PATCH] match.pd (uintptr_t) ptr1 [=!]= (uintptr_t) ptr2 improvements (PR tree-optimization/88775)

2019-01-14 Thread Richard Biener
On January 15, 2019 12:17:49 AM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The following patch (except for the else if (!INTEGRAL_TYPE_P ) part) is
>(improved) code to fix up the ptr1 != ptr2 comparison handling. 
>Unfortunately it looks like we don't really want to lower pointer
>equality
>comparisons performed in integral type to normal pointer equality
>comparisons for GCC 9, as it is too risky, so the following patch keeps
>doing what we were doing for pointer comparisons, but for the
>comparisons
>of addresses in integral types
>1) fixes a bug, where for zero sized objects we'd happily optimize
>(uintptr_t) &e[0] != (uintptr_t) &f[0] even for int e[0] = {}, f[0] =
>{}
>2) improves the rest, mainly if the offset is different:
>- if one variable is automatic and the other one is global, we can fold
>  - if one or both pointers are in the middle of objects they point to,
> it is fine too
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Thanks, 
Richard. 
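
For readers following along, the cases in the quoted description can be sketched in plain C. This is only an illustration and is not taken from the patch or its new tests; the globals g and h and all function names are made up. It compares addresses as uintptr_t in three situations: both pointers in the middle of distinct objects, one automatic versus one global object, and the start of one object versus one past the end of another. The patch treats the first two as safe to fold and keeps punting on the last.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical globals used only for this illustration.  */
int g[4];
int h[4];

/* Both addresses point into the middle of their objects, so they can
   never be equal: distinct objects do not overlap.  */
int
differ_mid_object (void)
{
  return (uintptr_t) &g[1] != (uintptr_t) &h[2];
}

/* One automatic and one global object.  The patch assumes these are
   never adjacent, even when one address is one past the end.  */
int
differ_auto_vs_global (void)
{
  int local[4] = { 0 };
  return (uintptr_t) &local[0] != (uintptr_t) &g[4];
}

/* Start of one object versus one past the end of another.  The two
   objects may be adjacent in memory, so the result depends on layout
   and has to be computed rather than assumed.  */
int
maybe_equal_at_boundary (void)
{
  return (uintptr_t) &g[4] == (uintptr_t) &h[0];
}

/* Small self-check for the two cases whose result is known.  */
void
self_check (void)
{
  assert (differ_mid_object ());
  assert (differ_auto_vs_global ());
}
```

The third function is deliberately left out of the self-check: whether it returns 0 or 1 depends on how the linker lays out g and h, which is why that case should not be folded.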

>2019-01-14  Jakub Jelinek  
>
>   PR tree-optimization/88775
>   * match.pd (cmp (convert1?@2 addr@0) (convert2? addr@1)): Optimize
>   equal == 0 equality pointer comparisons some more if compared in
>   integral types and either one points to an automatic var and the
>   other to a global, or we can prove at least one points to the middle
>   or both point to start or both point to end.
>
>   * gcc.dg/tree-ssa/pr88775-1.c: New test.
>   * gcc.dg/tree-ssa/pr88775-2.c: New test.
>
>--- gcc/match.pd.jj2019-01-12 12:20:56.190999044 +0100
>+++ gcc/match.pd   2019-01-14 14:26:18.805750285 +0100
>@@ -3896,6 +3896,52 @@ (define_operator_list COND_TERNARY
>   || TREE_CODE (base1) == SSA_NAME
>   || TREE_CODE (base1) == STRING_CST))
>  equal = (base0 == base1);
>+   if (equal == 0)
>+   {
>+ if (!DECL_P (base0) || !DECL_P (base1))
>+   equal = 2;
>+ else if (cmp != EQ_EXPR && cmp != NE_EXPR)
>+   equal = 2;
>+ /* If this is a pointer comparison, ignore for now even
>+valid equalities where one pointer is the offset zero
>+of one object and the other to one past end of another one.  */
>+ else if (!INTEGRAL_TYPE_P (TREE_TYPE (@2)))
>+   ;
>+ /* Assume that automatic variables can't be adjacent to global
>+variables.  */
>+ else if (is_global_var (base0) != is_global_var (base1))
>+   ;
>+ else
>+   {
>+ tree sz0 = DECL_SIZE_UNIT (base0);
>+ tree sz1 = DECL_SIZE_UNIT (base1);
>+ /* If sizes are unknown, e.g. VLA or not representable,
>+punt.  */
>+ if (!tree_fits_poly_int64_p (sz0)
>+ || !tree_fits_poly_int64_p (sz1))
>+   equal = 2;
>+ else
>+   {
>+ poly_int64 size0 = tree_to_poly_int64 (sz0);
>+ poly_int64 size1 = tree_to_poly_int64 (sz1);
>+ /* If one offset is pointing (or could be) to the beginning
>+of one object and the other is pointing to one past the
>+last byte of the other object, punt.  */
>+ if (maybe_eq (off0, 0) && maybe_eq (off1, size1))
>+   equal = 2;
>+ else if (maybe_eq (off1, 0) && maybe_eq (off0, size0))
>+   equal = 2;
>+ /* If both offsets are the same, there are some cases
>+we know that are ok.  Either if we know they aren't
>+zero, or if we know both sizes are no zero.  */
>+ if (equal == 2
>+ && known_eq (off0, off1)
>+ && (known_ne (off0, 0)
>+ || (known_ne (size0, 0) && known_ne (size1, 0))))
>+   equal = 0;
>+   }
>+   }
>+   }
>  }
>  (if (equal == 1
> && (cmp == EQ_EXPR || cmp == NE_EXPR
>@@ -3918,16 +3964,12 @@ (define_operator_list COND_TERNARY
>   { constant_boolean_node (known_ge (off0, off1), type); })
>(if (cmp == GT_EXPR && (known_gt (off0, off1) || known_le (off0,
>off1)))
>   { constant_boolean_node (known_gt (off0, off1), type); }))
>-  (if (equal == 0
>- && DECL_P (base0) && DECL_P (base1)
>- /* If we compare this as integers require equal offset.  */
>- && (!INTEGRAL_TYPE_P (TREE_TYPE (@2))
>- || known_eq (off0, off1)))
>-   (switch
>-  (if (cmp == EQ_EXPR)
>-   { constant_boolean_node (false, type); })
>-  (if (cmp == NE_EXPR)
>-   { constant_boolean_node (true, type); })
>+  (if (equal == 0)
>+  (switch
>+   (if (cmp == EQ_EXPR)
>+{ constant_boolean_node (false, type); })
>+   (if (cmp == NE_EXPR)
>+{ constant_boolean_node (true, type); })
> 
> /* Simplify pointer equality compares using PTA.  */
> (for 

PR88788 - Infinite loop in malloc_candidate_p_1

2019-01-14 Thread Prathamesh Kulkarni
Hi Richard,
I tested your fix: it passes bootstrap+test on
x86_64-unknown-linux-gnu, and I cross-tested it on the following arm and
aarch64 sub-targets:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/267917-pr88788-3/report-build-info.html

Is it OK to commit?

Thanks,
Prathamesh
2019-01-15  Richard Biener  
Prathamesh Kulkarni  

PR ipa/88788
* ipa-pure-const.c (malloc_candidate_p_1): Add parameter visited and
return true if SSA_NAME is already marked in visited bitmap.
* (malloc_candidate_p): Pass visited to malloc_candidate_p_1.

testsuite/
* g++.dg/ipa/pr88788.C: New test.
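
The idea behind the visited bitmap can be sketched outside of GCC as follows. This is a simplified model, not the code in ipa-pure-const.c: struct node, candidate_p, walk_cycle and MAX_NODES are hypothetical names, and a plain bool array stands in for the bitmap. A node that has already been seen is treated as acceptable, so a cycle between PHI-like nodes terminates instead of recursing forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical stand-in for an SSA name defined by a PHI: an id, a
   flag saying whether the node is acceptable on its own, and up to
   two arguments that are themselves nodes.  */
struct node
{
  unsigned id;
  bool ok;
  struct node *args[2];
};

/* Return true if NODE and everything reachable from it is acceptable.
   VISITED holds one flag per node id; a node seen before is treated
   as fine so that cycles terminate.  */
bool
candidate_p (const struct node *node, bool visited[MAX_NODES])
{
  if (visited[node->id])
    return true;
  visited[node->id] = true;

  if (!node->ok)
    return false;

  for (size_t i = 0; i < 2; i++)
    if (node->args[i] && !candidate_p (node->args[i], visited))
      return false;
  return true;
}

/* Build a two-node cycle a -> b -> a and walk it starting from a.
   OK_B sets the "ok" flag of b.  */
bool
walk_cycle (bool ok_b)
{
  struct node a = { 0, true, { NULL, NULL } };
  struct node b = { 1, ok_b, { NULL, NULL } };
  bool visited[MAX_NODES] = { false };

  a.args[0] = &b;
  b.args[0] = &a;
  return candidate_p (&a, visited);
}
```

Without the visited check, walk_cycle would never return; with it, the walk stops the second time it reaches a and the result depends only on the nodes' own flags.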

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 37b58853fe1..8227eed29bc 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -878,9 +878,12 @@ check_retval_uses (tree retval, gimple *stmt)
 }
 
 static bool
-malloc_candidate_p_1 (function *fun, tree retval, gimple *ret_stmt, bool ipa)
+malloc_candidate_p_1 (function *fun, tree retval, gimple *ret_stmt, bool ipa,
+ bitmap visited)
 {
   cgraph_node *node = cgraph_node::get_create (fun->decl);
+  if (!bitmap_set_bit (visited, SSA_NAME_VERSION (retval)))
+return true;
 
   if (!check_retval_uses (retval, ret_stmt))
 DUMP_AND_RETURN("Return value has uses outside return stmt"
@@ -925,7 +928,7 @@ malloc_candidate_p_1 (function *fun, tree retval, gimple 
*ret_stmt, bool ipa)
gimple *arg_def = SSA_NAME_DEF_STMT (arg);
if (is_a (arg_def))
  {
-   if (!malloc_candidate_p_1 (fun, arg, phi, ipa))
+   if (!malloc_candidate_p_1 (fun, arg, phi, ipa, visited))
DUMP_AND_RETURN ("nested phi fail")
continue;
  }
@@ -971,6 +974,7 @@ malloc_candidate_p (function *fun, bool ipa)
   || !flag_delete_null_pointer_checks)
 return false;
 
+  auto_bitmap visited;
   FOR_EACH_EDGE (e, ei, exit_block->preds)
 {
   gimple_stmt_iterator gsi = gsi_last_bb (e->src);
@@ -987,7 +991,7 @@ malloc_candidate_p (function *fun, bool ipa)
  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
DUMP_AND_RETURN("Return value is not SSA_NAME or not a pointer type.")
 
-  if (!malloc_candidate_p_1 (fun, retval, ret_stmt, ipa))
+  if (!malloc_candidate_p_1 (fun, retval, ret_stmt, ipa, visited))
return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/ipa/pr88788.C 
b/gcc/testsuite/g++.dg/ipa/pr88788.C
new file mode 100644
index 000..94af174f82c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr88788.C
@@ -0,0 +1,1003 @@
+/* { dg-do compile } */
+/* { dg-options "-w -O2" } */
+
+  extern "C" {
+  typedef long unsigned int size_t;
+  typedef long int __ssize_t;
+  typedef __ssize_t ssize_t;
+  extern int strncmp (const char *__s1, const char *__s2, size_t 
__n)  throw () __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1, 
2)));
+  extern size_t strlen (const char *__s)  throw () 
__attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
+  extern void *malloc (size_t __size) throw () __attribute__ 
((__malloc__)) __attribute__ ((__warn_unused_result__));
+  }
+   typedef ssize_t Py_ssize_t;
+ extern "C" {
+  typedef struct _object 
+  PyObject;
+  typedef struct 
+  _Py_Identifier;
+  typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);
+  struct PyMethodDef {
+  const char *ml_name;
+  PyCFunction ml_meth;
+  int ml_flags;
+  const char *ml_doc;
+  };
+  typedef struct swig_type_info *(*swig_dycast_func)(void **);
+  typedef struct swig_type_info {
+const char *name;
+const char *str;
+swig_dycast_func dcast;
+struct swig_cast_info *cast;
+void *clientdata;
+int owndata;
+  }
+  swig_type_info;
+  static __attribute__ ((__unused__)) char * SWIG_PackVoidPtr(char 
*buff, void *ptr, const char *name, size_t bsz) ;
+  typedef struct swig_const_info {
+int type;
+char *name;
+void *pvalue;
+swig_type_info **ptype;
+  }
+  swig_const_info;
+  static __attribute__ ((__unused__)) PyObject * 
SWIG_Python_NewPointerObj(PyObject *self, void *ptr, swig_type_info *type, int 
flags) ;
+}
+   static swig_type_info *swig_types[276];
+   namespace boost {
+  namespace noncopyable_ {
+class noncopyable   {
+ protected:   };
+  }
+  typedef noncopyable_::noncopyable noncopyable;
+  }
+   namespace storage {
+

Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support

2019-01-14 Thread Christophe Lyon
On Mon, 14 Jan 2019 at 20:59, Jason Merrill  wrote:
>
> On 12/23/18 9:27 PM, Tom Honermann wrote:
> > Attached is a revised patch that addresses changes in P0482R6 as well as
> > feedback provided by Jason.  Changes from the prior patch include:
> > - Updated the value of the __cpp_char8_t feature test macro to 201811
> >per P0482R6.
> > - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
> >San Diego.
> > - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
> >by Jason.
> > - Removed unnecessary checks of 'flag_char8_t' within the C++ front
> >end as requested by Jason.
> > - Corrected the regression spotted by Jason regarding initialization of
> >signed char and unsigned char arrays with string literals.
> > - Made minor changes to the error message emitted for ill-formed
> >initialization of char arrays with UTF-8 string literals.  These
> >changes do not yet implement Jason's suggestion; I'll follow up with a
> >separate patch for that due to additional test impact.
> >
> > Tested on x86_64-linux.
>
> I just applied the compiler changes with small modifications, as
> follows; thank you very much for the patches.  Jonathan should check in
> the library portion before long.
>
> Jason

Hi,

The new testcase g++.dg/ext/utf-cvt-char8_t.C fails at least on arm and aarch64:

g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++14  (test for warnings, line 24)
g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++17  (test for warnings, line 24)

Christophe


Re: [WIP] Reimplementation of IPA-SRA

2019-01-14 Thread Martin Liška

On 1/2/19 2:20 PM, Martin Liška wrote:

On 12/30/18 12:41 AM, Martin Jambor wrote:

Any comments welcome,


Hi Martin.

I'll run a smoke test of OBS Factory with -flto enabled for the patch.
So far I've noticed that current trunk fails profiledbootstrap with the
following configuration:

$ ../configure --enable-languages=c,c++,d --disable-multilib 
--disable-libsanitizer --disable-werror
...
$ make profiledbootstrap
...
during RTL pass: expand
/home/mliska/Programming/gcc/libphobos/src/std/range/package.d: In function 
‘sanitize’:
/home/mliska/Programming/gcc/libphobos/src/std/range/package.d:10053:5: 
internal compiler error: in make_decl_rtl, at varasm.c:1333
10053 | return SortedRange!(Unqual!R, pred)(r);
   | ^
0xa97a21 make_decl_rtl(tree_node*)
../../gcc/varasm.c:1333
0x1086da5 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:9938
0x151427c expand_expr
../../gcc/expr.h:279
0x151427c expand_expr_addr_expr_1
../../gcc/expr.c:7945
0x1086417 expand_expr_addr_expr
../../gcc/expr.c:8066
0x1086417 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:11221
0x1085909 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:10303
0x1514648 expand_assignment(tree_node*, tree_node*, bool)
../../gcc/expr.c:5352
0x101a32e expand_gimple_stmt_1
../../gcc/cfgexpand.c:3746
0x101a32e expand_gimple_stmt
../../gcc/cfgexpand.c:3844
0x10185ad expand_gimple_basic_block
../../gcc/cfgexpand.c:5880
0x1015e31 execute
../../gcc/cfgexpand.c:6502

Martin



Hi.

Much easier to reproduce:

$ ~/Programming/gcc/configure --enable-languages=c,c++,fortran,d 
--prefix=/home/marxin/bin/gcc --disable-bootstrap --disable-multilib 
--without-isl
$ make
...
/bin/sh ../libtool --tag=D   --mode=compile /dev/shm/objdir/./gcc/gdc 
-B/dev/shm/objdir/./gcc/ -B/home/marxin/bin/gcc/x86_64-pc-linux-gnu/bin/ 
-B/home/marxin/bin/gcc/x86_64-pc-linux-gnu/lib/ -isystem 
/home/marxin/bin/gcc/x86_64-pc-linux-gnu/include -isystem 
/home/marxin/bin/gcc/x86_64-pc-linux-gnu/sys-include -fPIC -g -O2 -nostdinc 
-I /home/marxin/Programming/gcc/libphobos/src -I 
/home/marxin/Programming/gcc/libphobos/libdruntime -I ../libdruntime -I . -c -o 
std/uni.lo /home/marxin/Programming/gcc/libphobos/src/std/uni.d
libtool: compile:  /dev/shm/objdir/./gcc/gdc -B/dev/shm/objdir/./gcc/ 
-B/home/marxin/bin/gcc/x86_64-pc-linux-gnu/bin/ 
-B/home/marxin/bin/gcc/x86_64-pc-linux-gnu/lib/ -isystem 
/home/marxin/bin/gcc/x86_64-pc-linux-gnu/include -isystem 
/home/marxin/bin/gcc/x86_64-pc-linux-gnu/sys-include -fPIC -g -O2 -nostdinc -I 
/home/marxin/Programming/gcc/libphobos/src -I 
/home/marxin/Programming/gcc/libphobos/libdruntime -I ../libdruntime -I . -c 
/home/marxin/Programming/gcc/libphobos/src/std/uni.d -fversion=Shared -o 
std/.libs/uni.o
during RTL pass: expand
/home/marxin/Programming/gcc/libphobos/src/std/range/package.d: In function 
‘sanitize’:
/home/marxin/Programming/gcc/libphobos/src/std/range/package.d:10053:5: 
internal compiler error: in make_decl_rtl, at varasm.c:1337
10053 | return SortedRange!(Unqual!R, pred)(r);
  | ^
0x733287 make_decl_rtl(tree_node*)
/home/marxin/Programming/gcc/gcc/varasm.c:1333
0xa9dcda expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/marxin/Programming/gcc/gcc/expr.c:9953
0xaa7b5e expand_expr
/home/marxin/Programming/gcc/gcc/expr.h:279
0xaa7b5e expand_expr_addr_expr_1
/home/marxin/Programming/gcc/gcc/expr.c:7960
0xa9c3d4 expand_expr_addr_expr
/home/marxin/Programming/gcc/gcc/expr.c:8081
0xa9c3d4 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/marxin/Programming/gcc/gcc/expr.c:11236
0xa9d605 expand_expr
/home/marxin/Programming/gcc/gcc/expr.h:279
0xa9d605 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
/home/marxin/Programming/gcc/gcc/expr.c:10318
0xaa9ae5 expand_expr
/home/marxin/Programming/gcc/gcc/expr.h:279
0xaa9ae5 expand_assignment(tree_node*, tree_node*, bool)
/home/marxin/Programming/gcc/gcc/expr.c:5367
0x98b547 expand_gimple_stmt_1
/home/marxin/Programming/gcc/gcc/cfgexpand.c:3746
0x98b547 expand_gimple_stmt
/home/marxin/Programming/gcc/gcc/cfgexpand.c:3844
0x98da9f expand_gimple_basic_block
/home/marxin/Programming/gcc/gcc/cfgexpand.c:5880
0x992837 execute
/home/marxin/Programming/gcc/gcc/cfgexpand.c:6503

So this is an ICE while building the D language run-time library.

Martin


Re: [REVISED PATCH 1/9]: C++ P0482R5 char8_t: Documentation updates

2019-01-14 Thread Tom Honermann

On 1/4/19 7:40 PM, Martin Sebor wrote:

On 12/23/18 7:27 PM, Tom Honermann wrote:
Attached is a revised patch that addresses feedback provided by Jason 
and Sandra.  Changes from the prior patch include:

- Updates to the -fchar8_t option documentation as requested by Jason.
- Corrections for indentation, spacing, hyphenation, and wrapping as
   requested by Sandra.



Just a minor nit that backticks in code examples should be avoided
(per the TexInfo manual, they can cause trouble when copying code
from PDF readers):

+@smallexample
+char ca[] = u8"xx"; // error: char-array initialized from wide
+    //    string
+const char *cp = u8"xx";// error: invalid conversion from
+    //    `const char8_t*' to `const char*'


Thanks for catching that, Martin.  Patch relative to trunk (r267930) 
attached to correct this (Jason already committed the original change).


Tom.



Martin



Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 267930)
+++ gcc/doc/invoke.texi	(working copy)
@@ -2468,16 +2468,16 @@
 char ca[] = u8"xx"; // error: char-array initialized from wide
 //string
 const char *cp = u8"xx";// error: invalid conversion from
-//`const char8_t*' to `const char*'
+//'const char8_t*' to 'const char*'
 int f(const char*);
 auto v = f(u8"xx"); // error: invalid conversion from
-//`const char8_t*' to `const char*'
+//'const char8_t*' to 'const char*'
 std::string s@{u8"xx"@};  // error: no matching function for call to
-//`std::basic_string::basic_string()'
+//'std::basic_string::basic_string()'
 using namespace std::literals;
 s = u8"xx"s;// error: conversion from
-//`basic_string' to non-scalar
-//type `basic_string' requested
+//'basic_string' to non-scalar
+//type 'basic_string' requested
 @end smallexample
 
 @item -fcheck-new


Re: PATCH: Updated error messages for ill-formed cases of array initialization by string literal

2019-01-14 Thread Tom Honermann

On 1/4/19 7:25 PM, Martin Sebor wrote:

On 12/27/18 1:49 PM, Tom Honermann wrote:
As requested by Jason in the review of the P0482 (char8_t) core 
language changes, this patch includes updates to the error messages 
emitted for ill-formed cases of array initialization with a string 
literal.  With these changes, error messages that previously looked 
something like these:


- "char-array initialized from wide string"
- "wide character array initialized from non-wide string"
- "wide character array initialized from incompatible wide string"

now look like:

- "cannot initialize array of type 'char' from a string literal with 
type array of 'short unsigned int'"


The first word "type" doesn't quite work here.  The type of every
array is "array of T" where T is the type of the element, so for
instance, "array of char."  Saying "array of type X" makes it sound
like X is the type of the whole array, which is of course not
the case when X is char.  I think you want to use the same wording
as for the second type:

  "cannot initialize array of 'char' from a string literal with
  type array of 'short unsigned int'"

or perhaps even better

  "cannot initialize array of 'char' from a string literal with
  type 'char16_t[N]'"

(i.e., show the actual type of the string, including its bound).


Thank you for the feedback, Martin; sorry for the delayed response.  
I'll follow up with a revised patch within the next week or two.


Tom.



Martin





Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support

2019-01-14 Thread Tom Honermann

On 1/14/19 2:58 PM, Jason Merrill wrote:

On 12/23/18 9:27 PM, Tom Honermann wrote:
Attached is a revised patch that addresses changes in P0482R6 as well 
as feedback provided by Jason. Changes from the prior patch include:

- Updated the value of the __cpp_char8_t feature test macro to 201811
   per P0482R6.
- Enable char8_t support with -std=c++2a per adoption of P0482R6 in
   San Diego.
- Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
   by Jason.
- Removed unnecessary checks of 'flag_char8_t' within the C++ front
   end as requested by Jason.
- Corrected the regression spotted by Jason regarding initialization of
   signed char and unsigned char arrays with string literals.
- Made minor changes to the error message emitted for ill-formed
   initialization of char arrays with UTF-8 string literals. These
   changes do not yet implement Jason's suggestion; I'll follow up with a
   separate patch for that due to additional test impact.

Tested on x86_64-linux.


I just applied the compiler changes with small modifications, as 
follows; thank you very much for the patches.  Jonathan should check 
in the library portion before long.


Excellent, thank you, Jason!

Tom.



Jason





Re: ISO_Fortran_binding patch

2019-01-14 Thread Steve Kargl
On Tue, Jan 15, 2019 at 12:07:53AM +0100, Jakub Jelinek wrote:
> On Sat, Jan 12, 2019 at 06:35:20PM +, Paul Richard Thomas wrote:
> > Done as revision 267884.
> 
> Where is that header installed BTW?
> Would be best if it got installed in directories like:
> $prefix/lib/gcc/$target/$version/include
> 

I have it in 

${HOME}/work/x/lib/gcc/x86_64-unknown-freebsd13.0/9.0.0/include

where my $prefix is ${HOME}/work/x.  So, this seems to match
your "best" suggestion. 

-- 
Steve


[PATCH] avoid issuing -Warray-bounds during folding (PR 88800)

2019-01-14 Thread Martin Sebor

The gimple_fold_builtin_memory_op() function folds calls to memcpy
and similar to MEM_REF when the size of the copy is a small power
of 2, but it does so without considering whether the copy might
write (or read) past the end of one of the objects.  To detect
these kinds of errors (and help distinguish them from -Wrestrict)
the folder calls into the wrestrict pass and lets it diagnose them.
Unfortunately, that can lead to false positives for even some fairly
straightforward code that is ultimately found to be unreachable.
PR 88800 is a report of one such problem.

To avoid these false positives the attached patch adjusts
the function to avoid issuing -Warray-bounds for out-of-bounds
calls to memcpy et al.  Instead, the patch disables the folding
of such invalid calls (and only those).  Those that are not
eliminated during DCE or other subsequent passes are eventually
diagnosed by the wrestrict pass.

Since this change required removing the dependency of the detection
on the warning options (originally done as a micro-optimization to
avoid spending compile-time cycles on something that wasn't needed)
the patch also adds tests to verify that code generation is not
affected as a result of warnings being enabled or disabled.  With
the patch as is, the invalid memcpy calls end up emitted (currently
they are folded into equally invalid MEM_REFs).  At some point,
I'd like us to consider whether they should be replaced with traps
(possibly under the control of  as has been proposed a number of
times in the past.  If/when that's done, these tests will need to
be adjusted to look for traps instead.
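
As a rough model of the behavior described above (not the actual logic in gimple-fold.c), the decision can be thought of as a predicate: fold a copy into a single load/store only when its size is a small power of 2 and it stays within both objects; otherwise leave the call alone and say nothing at this point. The names may_fold_copy, is_pow2 and the max_word parameter are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if N is a power of two (1, 2, 4, ...).  */
static bool
is_pow2 (size_t n)
{
  return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical model of the folding decision.  LEN is the number of
   bytes copied, DST_AVAIL and SRC_AVAIL are the bytes available in
   the destination and source objects, and MAX_WORD is an assumed
   upper limit on sizes that are folded inline.  */
bool
may_fold_copy (size_t len, size_t dst_avail, size_t src_avail,
               size_t max_word)
{
  if (!is_pow2 (len) || len > max_word)
    return false;

  /* Out-of-bounds copy: do not fold and do not diagnose here.  A
     later pass can warn if the call survives dead code elimination.  */
  if (len > dst_avail || len > src_avail)
    return false;

  return true;
}
```

For example, may_fold_copy (4, 8, 8, 16) is true, while may_fold_copy (4, 2, 8, 16) is false because the destination is too small; in the model, the second call simply stays a call.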

Tested on x86_64-linux.

Martin
PR tree-optimization/88800 - Spurious -Werror=array-bounds for non-taken branch

gcc/ChangeLog:

	PR tree-optimization/88800
	* gimple-fold.c (gimple_fold_builtin_memory_op): Avoid checking
	NO_WARNING bit here.  Avoid folding out-of-bounds calls.
	* gimple-ssa-warn-restrict.c (maybe_diag_offset_bounds): Remove
	redundant argument.  Add new argument and issue diagnostics under
	its control.  Detect out-of-bounds access even with warnings
	disabled.
	(check_bounds_or_overlap): Change return type.  Add argument.
	(wrestrict_dom_walker::check_call): Adjust.
	* gimple-ssa-warn-restrict.h (check_bounds_or_overlap): Add argument.
	* tree-ssa-strlen.c (handle_builtin_strcpy): Adjust to change in
	check_bounds_or_overlap's return value.
	(handle_builtin_stxncpy): Same.
	(handle_builtin_strcat): Same.

gcc/testsuite/ChangeLog:

	PR tree-optimization/88800
	* c-c++-common/Wrestrict.c: Adjust.
	* gcc.dg/Warray-bounds-37.c: New test.
	* gcc.dg/builtin-memcpy-2.c: New test.
	* gcc.dg/builtin-memcpy.c: New test.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c	(revision 267925)
+++ gcc/gimple-fold.c	(working copy)
@@ -697,8 +697,6 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterato
   tree destvar, srcvar;
   location_t loc = gimple_location (stmt);
 
-  bool nowarn = gimple_no_warning_p (stmt);
-
   /* If the LEN parameter is a constant zero or in range where
  the only valid value is zero, return DEST.  */
   if (size_must_be_zero_p (len))
@@ -766,12 +764,16 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterato
 	  unsigned ilen = tree_to_uhwi (len);
 	  if (pow2p_hwi (ilen))
 	{
-	  /* Detect invalid bounds and overlapping copies and issue
-		 either -Warray-bounds or -Wrestrict.  */
-	  if (!nowarn
-		  && check_bounds_or_overlap (as_a (stmt),
-	  dest, src, len, len))
-	  	gimple_set_no_warning (stmt, true);
+	  /* Detect out-of-bounds accesses without issuing warnings.
+		 Avoid folding out-of-bounds copies but to avoid false
+		 positives for unreachable code defer warning until after
+		 DCE has worked its magic.
+		 -Wrestrict is still diagnosed.  */
+	  if (int warning = check_bounds_or_overlap (as_a (stmt),
+			 dest, src, len, len,
+			 false, false))
+		if (warning != OPT_Wrestrict)
+		  return false;
 
 	  scalar_int_mode mode;
 	  tree type = lang_hooks.types.type_for_size (ilen * 8, 1);
@@ -1038,10 +1040,16 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterato
 	}
 	}
 
-  /* Detect invalid bounds and overlapping copies and issue either
-	 -Warray-bounds or -Wrestrict.  */
-  if (!nowarn)
-	check_bounds_or_overlap (as_a (stmt), dest, src, len, len);
+  /* Same as above, detect out-of-bounds accesses without issuing
+	 warnings.  Avoid folding out-of-bounds copies but to avoid
+	 false positives for unreachable code defer warning until
+	 after DCE has worked its magic.
+	 -Wrestrict is still diagnosed.  */
+  if (int warning = check_bounds_or_overlap (as_a (stmt),
+		 dest, src, len, len,
+		 false, false))
+	if (warning != OPT_Wrestrict)
+	  return false;
 
   gimple *new_stmt;
   if (is_gimple_reg_type (TREE_TYPE (srcvar)))
Index: gcc/gimple-ssa-warn-restrict.c
===

Re: [PATCH] c-family: Update unaligned adress of packed member check

2019-01-14 Thread H.J. Lu
On Mon, Jan 14, 2019 at 10:00 AM H.J. Lu  wrote:
>
> On Mon, Jan 14, 2019 at 6:22 AM Jakub Jelinek  wrote:
> >
> > On Sun, Jan 13, 2019 at 06:54:05AM -0800, H.J. Lu wrote:
> > > > What always matters is whether we take address of a packed structure
> > > > field/non-static data member or whether we just read that field.
> > > > The former should be warned about, the latter not.
> > > >
> > >
> > > How about this patch?  It checks if address is taken with NOP.
> >
> > I'd like to first understand the convert_p argument to
> > warn_for_address_or_pointer_of_packed_member.
> >
> > To me it seems you want to emit two different warnings, perhaps one
> > > suppressed if the other one is emitted, but you actually from the start
> > decide which of the two you are going to check for.  That is just weird.
>
> convert_p  is only for C.
>
> > Consider -O2 -Waddress-of-packed-member -Wno-incompatible-pointer-types:
> >
> > struct __attribute__((packed)) S { char p; int a, b, c; };
> >
> > int *
> > foo (int x, struct S *p)
> > {
> > >   return x ? &p->a : &p->b;
> > }
> >
> > int *
> > bar (int x, struct S *p)
> > {
> > >   return (int *) (x ? &p->a : &p->b);
> > }
> >
> > short *
> > baz (int x, struct S *p)
> > {
> > >   return x ? &p->a : &p->b;
> > }
> >
> > short *
> > qux (int x, struct S *p)
> > {
> > >   return (short *) (x ? &p->a : &p->b);
> > }
> >
> > This warns in foo, bar and qux, but doesn't warn in baz, because we've
> > decided upfront that that case is convert_p = true.
> >
> > I would have expected that the convert_p argument isn't passed at all,
> > the function always does the diagnostics about taking address that is
> > done with !convert_p right now, and either do the pointer -> pointer
> > conversion warning somewhere else (wherever we detect a pointer to pointer
> > conversion, even in the middle of expression?), or do it wherever you do
> > currently, but again always if the orig_rhs and type pointer types are
> > different.
> >
>
> When convert_p is true, we need to treat pointer conversion
> as a special case.  I am testing this updated patch.
>

There are no regressions with this patch:

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00792.html

-- 
H.J.
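
The distinction discussed in this thread (reading a packed field is fine, forming a pointer to it is what deserves the warning) can be illustrated with a small sketch. It reuses the struct S from the example above; read_a and read_b_via_memcpy are hypothetical helper names, and the packed attribute is a GCC/Clang extension. This is only an illustration, not code from the patch.

```cpp
#include <cstddef>
#include <cstring>

// Same layout as struct S in the example above: the leading char pushes
// the int members to odd offsets because of the packed attribute.
struct __attribute__((packed)) S { char p; int a, b, c; };

// Reading a packed field by value is fine: the compiler knows the field
// may be misaligned and emits a suitable access.
int read_a(const S *s)
{
  return s->a;
}

// Hypothetical helper: copy the bytes of the field out instead of forming
// an int * to it.  std::memcpy places no alignment requirement on its
// operands, so no misaligned int access is ever performed.
int read_b_via_memcpy(const S *s)
{
  int tmp;
  const unsigned char *bytes = reinterpret_cast<const unsigned char *>(s);
  std::memcpy(&tmp, bytes + offsetof(S, b), sizeof tmp);
  return tmp;
}
```

Returning &s->a as an int * instead, as foo does above, hands the caller a pointer that may later be dereferenced with the alignment of a plain int, which is the case the warning is meant to catch.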


[PATCH] match.pd (uintptr_t) ptr1 [=!]= (uintptr_t) ptr2 improvements (PR tree-optimization/88775)

2019-01-14 Thread Jakub Jelinek
Hi!

The following patch (except for the else if (!INTEGRAL_TYPE_P ) part) is
(improved) code to fix up the ptr1 != ptr2 comparison handling. 
Unfortunately it looks like we don't really want to lower pointer equality
comparisons performed in integral type to normal pointer equality
comparisons for GCC 9, as it is too risky, so the following patch keeps
doing what we were doing for pointer comparisons, but for the comparisons
of addresses in integral types
1) fixes a bug, where for zero sized objects we'd happily optimize
(uintptr_t) &e[0] != (uintptr_t) &f[0] even for int e[0] = {}, f[0] = {}
2) improves the rest, mainly if the offset is different:
   - if one variable is automatic and the other one is global, we can fold
   - if one or both pointers are in the middle of objects they point to,
 it is fine too
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
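
As a rough illustration of the rules listed above, here is a simplified model of the decision for two addresses into distinct declared objects, compared as integers. It uses plain integers where GCC uses poly_int64, and every name in it (AddrOfDecl, known_different) is hypothetical; it is a sketch of the idea, not the match.pd code.

```cpp
// Simplified model of an address "&object + offset".
struct AddrOfDecl {
  long offset;     // byte offset of the address within the object
  long size;       // object size in bytes (0 for e.g. int e[0])
  bool is_global;  // false for automatic variables
};

// Returns true when the two addresses, compared as integers, may be
// assumed to differ.  A and B refer to two distinct declared objects.
bool known_different(const AddrOfDecl &a, const AddrOfDecl &b)
{
  // An automatic variable is assumed never to be adjacent to a global.
  if (a.is_global != b.is_global)
    return true;

  // Start of one object against one-past-the-end of the other: the
  // objects could be adjacent in memory, so the integers might be equal.
  bool maybe_adjacent = (a.offset == 0 && b.offset == b.size)
                        || (b.offset == 0 && a.offset == a.size);
  if (!maybe_adjacent)
    return true;

  // Mirrors the last check in the patch: equal offsets are still fine if
  // they are non-zero or both sizes are non-zero.  With plain integers
  // this clause never changes the answer; it is kept only to follow the
  // structure of the patch.
  return a.offset == b.offset
         && (a.offset != 0 || (a.size != 0 && b.size != 0));
}
```

With this model the zero-sized arrays e and f from point 1 are not folded, while the start of one 4-byte object against the start of another is.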

2019-01-14  Jakub Jelinek  

PR tree-optimization/88775
* match.pd (cmp (convert1?@2 addr@0) (convert2? addr@1)): Optimize
equal == 0 equality pointer comparisons some more if compared in
integral types and either one points to an automatic var and the
other to a global, or we can prove at least one points to the middle
or both point to start or both point to end.

* gcc.dg/tree-ssa/pr88775-1.c: New test.
* gcc.dg/tree-ssa/pr88775-2.c: New test.

--- gcc/match.pd.jj 2019-01-12 12:20:56.190999044 +0100
+++ gcc/match.pd 2019-01-14 14:26:18.805750285 +0100
@@ -3896,6 +3896,52 @@ (define_operator_list COND_TERNARY
|| TREE_CODE (base1) == SSA_NAME
|| TREE_CODE (base1) == STRING_CST))
  equal = (base0 == base1);
+   if (equal == 0)
+{
+  if (!DECL_P (base0) || !DECL_P (base1))
+equal = 2;
+  else if (cmp != EQ_EXPR && cmp != NE_EXPR)
+equal = 2;
+  /* If this is a pointer comparison, ignore for now even
+ valid equalities where one pointer is the offset zero
+ of one object and the other to one past end of another one.  */
+  else if (!INTEGRAL_TYPE_P (TREE_TYPE (@2)))
+;
+  /* Assume that automatic variables can't be adjacent to global
+ variables.  */
+  else if (is_global_var (base0) != is_global_var (base1))
+;
+  else
+{
+  tree sz0 = DECL_SIZE_UNIT (base0);
+  tree sz1 = DECL_SIZE_UNIT (base1);
+  /* If sizes are unknown, e.g. VLA or not representable,
+ punt.  */
+  if (!tree_fits_poly_int64_p (sz0)
+  || !tree_fits_poly_int64_p (sz1))
+equal = 2;
+  else
+{
+  poly_int64 size0 = tree_to_poly_int64 (sz0);
+  poly_int64 size1 = tree_to_poly_int64 (sz1);
+  /* If one offset is pointing (or could be) to the beginning
+ of one object and the other is pointing to one past the
+ last byte of the other object, punt.  */
+  if (maybe_eq (off0, 0) && maybe_eq (off1, size1))
+equal = 2;
+  else if (maybe_eq (off1, 0) && maybe_eq (off0, size0))
+equal = 2;
+  /* If both offsets are the same, there are some cases
+ we know that are ok.  Either if we know they aren't
+ zero, or if we know both sizes are no zero.  */
+  if (equal == 2
+  && known_eq (off0, off1)
+  && (known_ne (off0, 0)
+  || (known_ne (size0, 0) && known_ne (size1, 0))))
+equal = 0;
+}
+}
+}
  }
  (if (equal == 1
  && (cmp == EQ_EXPR || cmp == NE_EXPR
@@ -3918,16 +3964,12 @@ (define_operator_list COND_TERNARY
{ constant_boolean_node (known_ge (off0, off1), type); })
(if (cmp == GT_EXPR && (known_gt (off0, off1) || known_le (off0, off1)))
{ constant_boolean_node (known_gt (off0, off1), type); }))
-  (if (equal == 0
-  && DECL_P (base0) && DECL_P (base1)
-  /* If we compare this as integers require equal offset.  */
-  && (!INTEGRAL_TYPE_P (TREE_TYPE (@2))
-  || known_eq (off0, off1)))
-   (switch
-   (if (cmp == EQ_EXPR)
-{ constant_boolean_node (false, type); })
-   (if (cmp == NE_EXPR)
-{ constant_boolean_node (true, type); })
+  (if (equal == 0)
+   (switch
+(if (cmp == EQ_EXPR)
+ { constant_boolean_node (false, type); })
+(if (cmp == NE_EXPR)
+ { constant_boolean_node (true, type); })
 
 /* Simplify pointer equality compares using PTA.  */
 (for neeq (ne eq)
--- gcc/testsuite/gcc.dg/tree-ssa/pr88775-1.c.jj 2019-01-14 13:18:09.332006700 +0100
+++ 

Re: ISO_Fortran_binding patch

2019-01-14 Thread Jakub Jelinek
On Sat, Jan 12, 2019 at 06:35:20PM +, Paul Richard Thomas wrote:
> Done as revision 267884.

The other tests FAILs too:
FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O0  (test for excess errors)
UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O0  compilation failed to produce executable
FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O1  (test for excess errors)
UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O1  compilation failed to produce executable
FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O2  (test for excess errors)
UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O2  compilation failed to produce executable
FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -g  (test for excess errors)
UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -O3 -g  compilation failed to produce executable
FAIL: gfortran.dg/ISO_Fortran_binding_1.f90   -Os  (test for excess errors)
UNRESOLVED: gfortran.dg/ISO_Fortran_binding_1.f90   -Os  compilation failed to produce executable

The problem is that:
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/gfortran.dg/ISO_Fortran_binding_1.c:3:10: fatal error: ISO_Fortran_binding.h: No such file or directory
compilation terminated.

It either should
#include "../../../libgfortran/ISO_Fortran_binding.h"
instead or the Fortran *.exp files should arrange for -I.../libgfortran/
to be added to all gfortran tests.  Because right now it FAILs if you don't
have ISO_Fortran_binding.h header installed, or succeeds, but includes
header from some other compiler version or even other compiler altogether.

Where is that header installed BTW?
Would be best if it got installed in directories like:
$prefix/lib/gcc/$target/$version/include

See e.g. libssp or libsanitizer, both have something like
target_noncanonical = @target_noncanonical@
libsubincludedir = $(libdir)/gcc/$(target_noncanonical)/$(gcc_version)/include
nobase_libsubinclude_HEADERS = ssp/ssp.h ssp/string.h ssp/stdio.h ssp/unistd.h

You probably want it to go directly in the include dir, so without the ssp/
or whatever else prefixes.

Jakub


Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-14 Thread Richard Sandiford
Steve Ellcey  writes:
> On Fri, 2019-01-11 at 14:45 +, Richard Sandiford wrote:
>> 
>> > +
>> > +/* Return true for types that could be supported as SIMD return or
>> > +   argument types.  */
>> > +
>> > +static bool supported_simd_type (tree t)
>> > +{
>> > +  return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t));
>> 
>> We should also check that the size is 1, 2, 4 or 8 bytes.
>
> I fixed this, I also allow for POINTER_P types which allowed me
> to not do the POINTER_P check below which you asked about and
> which I now think was a mistake (more comments below).

Ah, yeah, agree that's the right thing to do.

>> > +  if (clonei->simdlen == 0)
>> > +{
>> > +  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
>> > +  clonei->simdlen = clonei->vecsize_int;
>> > +  else
>> > +  clonei->simdlen = clonei->vecsize_float;
>> > +  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
>> > +  return 1;
>> > +}
>> 
>> I should have noticed this last time, but base_type is the CDT in the
>> Intel ABI.  That isn't always right for the AArch64 ABI.
>> 
>> I think for now currently_supported_simd_type should take base_type
>> as a second parameter and check that the given type has the same
>> size.
>
> I have not changed this, I am not quite sure what you mean.  What is
> CDT?  Clone data type?  Are you saying I should use node->decl->type
> instead of base_type?

CDT is the Characteristic Data Type and is specific to the Intel ABI:

/* Given a SIMD clone in NODE, calculate the characteristic data
   type and return the corresponding type.  The characteristic data
   type is computed as described in the Intel Vector ABI.  */

static tree
simd_clone_compute_base_data_type (struct cgraph_node *node,
   struct cgraph_simd_clone *clone_info)

This has consequences that we didn't want for AArch64, such as
assigning different implicit simdlens for "double f(float)" and
"float g(double)".  The rules also don't extend naturally to
SVE-like architectures, where mixed data sizes are best handled
using unpacked vectors.

But at this stage I think it would be better to leave cases in which
the Intel ABI gives a different mapping from the AArch64 ABI to GCC 10.
For GCC 9 it seems better to handle only the cases that are the same
under both ABIs.

And using the CDT is still OK in the trivial case that the return type
and arguments are all supported vector element types and all have the
same size.  So I think for GCC 9 we should just handle that case.

We can do that by passing base_type as a second argument to
currently_supported_simd_type and checking that the first argument
has the same size.

> @@ -18420,6 +18422,140 @@ aarch64_estimated_poly_value (poly_int64 val)
>return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
>  }
>  
> +
> +/* Return true for types that could be supported as SIMD return or
> +   argument types.  */
> +
> +static bool supported_simd_type (tree t)
> +{
> +  HOST_WIDE_INT s;
> +  gcc_assert (tree_fits_shwi_p (TYPE_SIZE_UNIT (t)));
> +  s = tree_to_shwi (TYPE_SIZE_UNIT (t));
> +  return ((FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
> +   && (s == 1 || s == 2 || s == 4 || s == 8));

We should only assert after checking FLOAT_TYPE_P etc.  And there's
no need to assert explicitly, since tree_to_shwi already asserts
where necessary.  So I think this should be:

  if (SCALAR_FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
    {
      HOST_WIDE_INT s = tree_to_shwi (TYPE_SIZE_UNIT (t));
      return s == 1 || s == 2 || s == 4 || s == 8;
    }
  return false;
  
(SCALAR_FLOAT_TYPE_P so that the tests are consistent about not handling
complex types, sorry for not thinking about that before.)
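
To make the suggested restriction concrete, here is a small stand-alone model of the two checks. It does not use GCC's tree API: TypeModel and Kind are hypothetical stand-ins that record only a type's kind and its size in bytes, and the second function shows the "same size as base_type" test described earlier. Treat it as a sketch of the intent rather than the code that should go into aarch64.c.

```cpp
#include <cstddef>

// Hypothetical stand-in for a GCC tree type: just a kind and a byte size.
enum class Kind { Integer, Float, Pointer, Complex, Other };
struct TypeModel { Kind kind; std::size_t size; };

// Scalar integer, floating-point or pointer types of 1, 2, 4 or 8 bytes.
// Complex types are rejected, matching the SCALAR_FLOAT_TYPE_P remark.
bool supported_simd_type(TypeModel t)
{
  if (t.kind == Kind::Integer || t.kind == Kind::Float
      || t.kind == Kind::Pointer)
    return t.size == 1 || t.size == 2 || t.size == 4 || t.size == 8;
  return false;
}

// For now, additionally require the same size as the base type, so that
// only the cases where both ABIs agree are handled.
bool currently_supported_simd_type(TypeModel t, TypeModel base)
{
  return supported_simd_type(t) && t.size == base.size;
}
```

Under this model "float f(float)" passes, while "double f(float)" is rejected because the return type and the base type differ in size.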

> +  clonei->vecsize_mangle = 'n';
> +  clonei->mask_mode = VOIDmode;
> +  clonei->vecsize_int = 128;
> +  clonei->vecsize_float = 128;
> +
> +  if (clonei->simdlen == 0)
> +{
> +  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
> + clonei->simdlen = clonei->vecsize_int;
> +  else
> + clonei->simdlen = clonei->vecsize_float;
> +  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
> +  return 1;

The AArch64 vector ABI says that having no simdlen should imply both
64-bit and 128-bit implementations.  E.g.:

   #pragma omp declare simd
   int32_t foo(int32_t x);

should provide:

   int32x2_t _ZGVnN2v_foo(int32x2_t vx);
   int32x2_t _ZGVnM2v_foo(int32x2_t vx, uint32x2_t vmask);
   int32x4_t _ZGVnN4v_foo(int32x4_t vx);
   int32x4_t _ZGVnM4v_foo(int32x4_t vx, uint32x4_t vmask);

whereas here we're only providing the 128-bit versions.  I think we should:

- return 2
- set vecsize_int and vecsize_float to 64 when num==0
- set vecsize_int and vecsize_float to 128 when num==1
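
The arithmetic behind those three points can be sketched as follows. The helper names are hypothetical and the model ignores everything except the vector width and the element width; it only shows how num selects the 64-bit or 128-bit variant and how the implicit simdlen follows from it.

```cpp
// Hypothetical helper: vector width in bits for the num-th variant when
// no simdlen was given (variant 0 is 64-bit, variant 1 is 128-bit).
unsigned vecsize_bits(int num)
{
  return num == 0 ? 64 : 128;
}

// Implicit simdlen: how many elements of element_bits fit in that vector.
unsigned implicit_simdlen(int num, unsigned element_bits)
{
  return vecsize_bits(num) / element_bits;
}
```

For the int32_t example above this gives 2 lanes for num==0 and 4 lanes for num==1, which matches the N2/M2 and N4/M4 names listed.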

> +  /* Restrict ourselves to vectors that fit in a single register  */
> +
> +  gcc_assert (tree_fits_shwi_p (TYPE_SIZE (base_type)));
> +  vsize = clonei->simdlen * tree_to_shwi (TYPE_SIZE (base_type));
> +  

Re: C++ PATCH for c++/88825 - ICE with bogus function return type deduction

2019-01-14 Thread Jason Merrill

On 1/14/19 3:15 PM, Marek Polacek wrote:
> On Mon, Jan 14, 2019 at 03:06:33PM -0500, Jason Merrill wrote:
>> On 1/13/19 9:11 PM, Marek Polacek wrote:
>>> In this (invalid) testcase the return type deduction failed so FUNCTYPE was
>>> error_mark_node and can_do_nrvo_p crashed.  One way to fix this would be to
>>> check error_operand_p as below.
>>>
>>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>>
>>> 2019-01-13  Marek Polacek
>>>
>>> PR c++/88825 - ICE with bogus function return type deduction.
>>> * typeck.c (can_do_nrvo_p): Check error_operand_p.
>>
>> error_operand_p also checks TREE_TYPE of its operand, is that useful here
>> instead of only comparing functype to error_mark_node?
>
> Actually, it isn't.  So we can get away with a simple comparison, as in the
> below:
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.

Jason



Re: warnings about unused shared_ptr/unique_ptr comparisons

2019-01-14 Thread Jonathan Wakely

On 14/01/19 16:53 +0100, Ulrich Drepper wrote:
> This is a conservative implementation of a patch to make
> shared/unique_ptrs behave more like plain old pointers.  More about this
> in bug #88738
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88738
>
> The summary is
>
> - using clang, which enables a warning for unused results of all
> comparison operation, found a real bug
>
> - a library implementation is limited in scope and tedious to add
> everywhere. At this stage of gcc 9 it was the only acceptable solution,
> though
>
> - longer term there should be a warning for comparison operators.
> Possibly on by default with the possibility to disable it with an
> attribute (see the discussion in the bug).
>
>
> The patch proposed here only changes the code for C++17 and up to use
> the [[nodiscard]] attribute.  For gcc 10 we can either widen this or
> implement a better way with the help of the compiler.
>
> I ran the regression test suite and didn't see any additional failures.
>
> OK?


As it only makes changes for C++17 and up, this is OK for trunk now.
Thanks.
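
For readers who have not seen the attribute used this way, the idea is roughly the following. This is a sketch on a made-up pointer wrapper (handle is a hypothetical class, not libstdc++ code), assuming a C++17 compiler:

```cpp
// Hypothetical smart-pointer-like wrapper used only for illustration.
template <typename T>
class handle {
  T *ptr_;
public:
  explicit handle(T *p) : ptr_(p) {}
  T *get() const { return ptr_; }
};

// [[nodiscard]] (C++17) asks the compiler to warn when the result of the
// comparison is computed and then thrown away.
template <typename T>
[[nodiscard]] bool operator==(const handle<T> &a, const handle<T> &b)
{
  return a.get() == b.get();
}

template <typename T>
[[nodiscard]] bool operator!=(const handle<T> &a, const handle<T> &b)
{
  return !(a == b);
}
```

With these declarations a statement such as "a == b;" on its own, which is usually a typo for an assignment, is expected to draw a discarded-result warning, while "if (a == b)" is unaffected.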



Re: Fix random_sample_n and random_shuffle when RAND_MAX is small

2019-01-14 Thread Jonathan Wakely

On 12/12/18 22:31 +0100, Giovanni Bajo wrote:
> Hello,
>
> we hit a bug today while cross-compiling a C++ program with mingw32:
> if random_shuffle or random_sample_n are called with a sequence of
> elements whose length is higher than RAND_MAX, the functions don't
> behave as expected because they ignore elements beyond RAND_MAX. This
> does not happen often on Linux where glibc defines RAND_MAX to 2**31,
> but mingw32 (all released versions) relies on the very old msvcrt.lib,
> where RAND_MAX is just 2**15.
>
> I found mentions of this problem in 2011
> (http://mingw-users.1079350.n2.nabble.com/RAND-MAX-still-16bit-td6299546.html)
> and 2006
> (https://mingw-users.narkive.com/gAIO4G5V/rand-max-problem-why-is-it-only-16-bit).
>
> I'm attaching a proof-of-concept patch that fixes the problem by
> introducing an embedded xorshift generator, seeded with std::rand (so
> that the functions still depend on srand - it looks like this is not
> strictly required by the standard, but it sounds like a good thing to
> do for backward compatibility with existing programs). I was wondering
> if this approach is OK or something else is preferred.


I'd prefer not to introduce that change unconditionally. The existing
code works fine when std::distance(first, last) < RAND_MAX, and as we
have random access iterators we can check that cheaply.

We'd prefer a bug report in Bugzilla with a testcase that demonstrates
the bug. A portable regression test for our testsuite might not be
practical if it needs more than RAND_MAX elements, but one that runs
for mingw and verifies the fix there would be needed.

See https://gcc.gnu.org/contribute.html#patches for guidelines for
submitting patches (and the rest of the page for other requirements,
like copyright assignment or disclaimers).
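
To show the shape of what is being asked for, here is a sketch of a shuffle that keeps using std::rand when the range is shorter than RAND_MAX and only switches to a wider generator otherwise. It is not the libstdc++ implementation and not the attached patch: shuffle_sketch and xorshift64 are hypothetical names, the fallback generator is a plain xorshift64 chosen for brevity, and the modulo reduction has the usual small bias.

```cpp
#include <cstdint>
#include <cstdlib>
#include <utility>

// One xorshift64 step.  The state must be non-zero.
inline std::uint64_t xorshift64(std::uint64_t &state)
{
  state ^= state << 13;
  state ^= state >> 7;
  state ^= state << 17;
  return state;
}

// Fisher-Yates shuffle over a random access range.
template <typename RandomIt>
void shuffle_sketch(RandomIt first, RandomIt last)
{
  auto n = last - first;  // cheap for random access iterators
  if (n < 2)
    return;

  if (n < RAND_MAX)
    {
      // std::rand can reach every index, so keep the existing behaviour.
      for (auto i = n - 1; i > 0; --i)
        std::swap(first[i], first[std::rand() % (i + 1)]);
      return;
    }

  // Long range: seed a wider generator from std::rand so that srand still
  // influences the result.  The low bit is forced on to avoid a zero state.
  std::uint64_t state = ((std::uint64_t (std::rand ()) << 32)
                         ^ std::uint64_t (std::rand ())) | 1u;
  for (auto i = n - 1; i > 0; --i)
    std::swap(first[i], first[xorshift64 (state) % std::uint64_t (i + 1)]);
}
```

The point of the split is that targets with a large RAND_MAX see no behavioural change for ordinary container sizes, while the second branch is only reached where the current code is already wrong.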




Re: [PING] [PATCH v5][C][ADA] use function descriptors instead of trampolines in C

2019-01-14 Thread Jeff Law
On 1/13/19 2:18 PM, Uecker, Martin wrote:
> 
> Does this patch have a chance? This version seems risk-free and
> is a clear improvement from simply doing nothing for 
> '-fno-trampolines'. Also it is useful in situations where
> one cannot have an executable stack.
> 
> 
> I am currently thinking about working
> around this problem by calling nested functions with the
> following macro (x86_64 only):
I'm deferring to gcc-10.  We're well into stage4 and that's really where
my focus needs to be.  If another maintainer wants to push on these, I
won't object, but I won't be looking at this issue again until we're
into gcc-10 stage1 development.

Jeff


Re: C++ PATCH for c++/88825 - ICE with bogus function return type deduction

2019-01-14 Thread Marek Polacek
On Mon, Jan 14, 2019 at 03:06:33PM -0500, Jason Merrill wrote:
> On 1/13/19 9:11 PM, Marek Polacek wrote:
> > In this (invalid) testcase the return type deduction failed so FUNCTYPE was
> > error_mark_node and can_do_nrvo_p crashed.  One way to fix this would be to
> > check error_operand_p as below.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2019-01-13  Marek Polacek  
> > 
> > PR c++/88825 - ICE with bogus function return type deduction.
> > * typeck.c (can_do_nrvo_p): Check error_operand_p.
> 
> error_operand_p also checks TREE_TYPE of its operand, is that useful here
> instead of only comparing functype to error_mark_node?

Actually, it isn't.  So we can get away with a simple comparison, as in the
below:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-01-13  Marek Polacek  

PR c++/88825 - ICE with bogus function return type deduction.
* typeck.c (can_do_nrvo_p): Check error_mark_node.

* g++.dg/cpp1y/auto-fn55.C: New test.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index 43d2899a3c4..3d3049cc3a0 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -9342,6 +9342,8 @@ is_std_move_p (tree fn)
 static bool
 can_do_nrvo_p (tree retval, tree functype)
 {
+  if (functype == error_mark_node)
+return false;
   if (retval)
 STRIP_ANY_LOCATION_WRAPPER (retval);
   tree result = DECL_RESULT (current_function_decl);
diff --git gcc/testsuite/g++.dg/cpp1y/auto-fn55.C 
gcc/testsuite/g++.dg/cpp1y/auto-fn55.C
new file mode 100644
index 000..aea2740e1f5
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1y/auto-fn55.C
@@ -0,0 +1,8 @@
+// PR c++/88825
+// { dg-do compile { target c++14 } }
+
+auto f () -> auto *
+{
+  int t = 0;
+  return t; // { dg-error "unable to deduce" }
+}


Re: [C++ Patch] Fix locations of two "typedef is initialized" errors

2019-01-14 Thread Jason Merrill

On 1/13/19 6:33 PM, Paolo Carlini wrote:
> Hi,
>
> I think we need this patch too in order to have consistent locations for
> the set of error messages about invalid initializers - most of which I
> changed in patch 23 of this series - and also in order to have
> consistent locations for the two cases - in class, out of class - of
> ill-formed initialized typedefs. Note that when we'll consistently have
> precise locations stored in the initializers we'll have to revisit the
> already mentioned check in check_methods and the one changed here, in
> start_decl, which currently both don't have readily available the
> initializer itself. Also note that this patch relies on the patch I sent
> earlier today, that is relies on a more accurate location stored in the
> TYPE_DECL.


OK.

Jason



Re: [C++ Patch] Improve grokbitfield location

2019-01-14 Thread Jason Merrill

On 1/13/19 6:21 PM, Paolo Carlini wrote:
> Hi,
>
> today I realized that if we move further up the "famous" location_t loc
> declaration in grokdeclarator we can often pass a precise location when
> building TYPE_DECLs for typedef names too, thus, in particular,
> profitably use DECL_SOURCE_LOCATION in a grokbitfield error and also
> enabling further improvements. Tested x86_64-linux.


OK.

Jason



Re: [C++ PATCH] Add __cpp_guaranteed_copy_elision and __cpp_nontype_template_parameter_auto

2019-01-14 Thread Jason Merrill

On 1/12/19 8:36 AM, Jakub Jelinek wrote:
> Hi!
>
> So, from what I can understand, __cpp_guaranteed_copy_elision
> is a C++17 P0135R1 feature test macro for a feature we claim to support,
> and __cpp_nontype_template_parameter_auto is a new name for the
> __cpp_template_auto macro (which doesn't appear anymore in the SD-6 lists,
> but clang++ keeps it for backwards compatibility too).
>
> Tested on x86_64-linux, ok for trunk?
>
> 2019-01-12  Jakub Jelinek
>
> * c-cppbuiltin.c (c_cpp_builtin): Define __cpp_guaranteed_copy_elision
> and __cpp_nontype_template_parameter_auto.  Add a comment that
> __cpp_template_auto is deprecated.
>
> * g++.dg/cpp1z/feat-cxx1z.C: Add tests for
> __cpp_guaranteed_copy_elision and __cpp_nontype_template_parameter_auto
> feature test macros.
> * g++.dg/cpp2a/feat-cxx2a.C: Likewise.


OK.

Jason



Re: C++ PATCH for c++/88825 - ICE with bogus function return type deduction

2019-01-14 Thread Jason Merrill

On 1/13/19 9:11 PM, Marek Polacek wrote:
> In this (invalid) testcase the return type deduction failed so FUNCTYPE was
> error_mark_node and can_do_nrvo_p crashed.  One way to fix this would be to
> check error_operand_p as below.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-01-13  Marek Polacek
>
> PR c++/88825 - ICE with bogus function return type deduction.
> * typeck.c (can_do_nrvo_p): Check error_operand_p.


error_operand_p also checks TREE_TYPE of its operand, is that useful 
here instead of only comparing functype to error_mark_node?


Jason


Re: C++ PATCH for c++/88830 - ICE with abstract class

2019-01-14 Thread Jason Merrill

On 1/14/19 10:41 AM, Marek Polacek wrote:
> On Mon, Jan 14, 2019 at 12:10:14PM +0100, Jakub Jelinek wrote:
>> On Sun, Jan 13, 2019 at 09:07:00PM -0500, Marek Polacek wrote:
>>> diff --git gcc/cp/decl2.c gcc/cp/decl2.c
>>> index e4cf4e0a361..7b656712471 100644
>>> --- gcc/cp/decl2.c
>>> +++ gcc/cp/decl2.c
>>> @@ -2229,7 +2229,8 @@ maybe_emit_vtables (tree ctype)
>>>    never get generated.  */
>>> if (CLASSTYPE_PURE_VIRTUALS (ctype)
>>> && TYPE_HAS_NONTRIVIAL_DESTRUCTOR (ctype)
>>> -  && DECL_DEFAULTED_IN_CLASS_P(CLASSTYPE_DESTRUCTOR(ctype)))
>>> +  && !CLASSTYPE_LAZY_DESTRUCTOR (ctype)
>>> +  && DECL_DEFAULTED_IN_CLASS_P (CLASSTYPE_DESTRUCTOR (ctype)))
>>>   note_vague_linkage_fn (CLASSTYPE_DESTRUCTOR(ctype));
>>
>> Just a formatting nit.  s/CLASSTYPE_DESTRUCTOR/& / on the above line too
>> when you are at it.  Otherwise I came up with identical patch to yours
>> (should have noticed the PR is ASSIGNED :( ).
>
> :(  I missed the second formatting problem, fixed here:
>
> 2019-01-14  Marek Polacek
>
> PR c++/88830 - ICE with abstract class.
> * decl2.c (maybe_emit_vtables): Check CLASSTYPE_LAZY_DESTRUCTOR.
> Fix formatting.


OK.

Jason



Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support

2019-01-14 Thread Jason Merrill

On 12/23/18 9:27 PM, Tom Honermann wrote:
> Attached is a revised patch that addresses changes in P0482R6 as well as
> feedback provided by Jason.  Changes from the prior patch include:
>
> - Updated the value of the __cpp_char8_t feature test macro to 201811
>    per P0482R6.
> - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
>    San Diego.
> - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
>    by Jason.
> - Removed unnecessary checks of 'flag_char8_t' within the C++ front
>    end as requested by Jason.
> - Corrected the regression spotted by Jason regarding initialization of
>    signed char and unsigned char arrays with string literals.
> - Made minor changes to the error message emitted for ill-formed
>    initialization of char arrays with UTF-8 string literals.  These
>    changes do not yet implement Jason's suggestion; I'll follow up with a
>    separate patch for that due to additional test impact.
>
> Tested on x86_64-linux.


I just applied the compiler changes with small modifications, as 
follows; thank you very much for the patches.  Jonathan should check in 
the library portion before long.


Jason
commit 08872ecfcbe97cc6ccedf31b8d9a7edeb29bf290
Author: Jason Merrill 
Date:   Mon Jan 7 23:51:35 2019 -0500

Implement P0482R5, char8_t: A type for UTF-8 characters and strings

gcc/cp/
* cvt.c (type_promotes_to): Handle char8_t promotion.
* decl.c (grokdeclarator): Handle invalid type specifier
combinations involving char8_t.
* lex.c (init_reswords): Add char8_t as a reserved word.
* mangle.c (write_builtin_type): Add name mangling for char8_t (Du).
* parser.c (cp_keyword_starts_decl_specifier_p)
(cp_parser_simple_type_specifier): Recognize char8_t as a simple
type specifier.
(cp_parser_string_literal): Use char8_array_type_node for the type
of CPP_UTF8STRING.
(cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
headers.
* rtti.c (emit_support_tinfos): type_info support for char8_t.
* tree.c (char_type_p): Recognize char8_t as a character type.
* typeck.c (string_conv_p): Handle conversions of u8 string
literals of char8_t type.
(check_literal_operator_args): Handle UDLs with u8 string literals
of char8_t type.
* typeck2.c (ordinary_char_type_p): New.
(digest_init_r): Disallow initializing a char array with a u8 string
literal.
gcc/c-family/
* c-common.c (c_common_reswords): Add char8_t.
(fix_string_type): Use char8_t for the type of u8 string literals.
(c_common_get_alias_set): char8_t doesn't alias.
(c_common_nodes_and_builtins): Define char8_t as a builtin type in
C++.
(c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
(keyword_begins_type_specifier): Add RID_CHAR8.
* c-common.h (rid): Add RID_CHAR8.
(c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
Define char8_type_node and char8_array_type_node.
* c-cppbuiltin.c (cpp_atomic_builtins): Predefine
__GCC_ATOMIC_CHAR8_T_LOCK_FREE.
(c_cpp_builtins): Predefine __cpp_char8_t.
* c-lex.c (lex_string): Use char8_array_type_node as the type of
CPP_UTF8STRING.
(lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
* c-opts.c: If not otherwise specified, enable -fchar8_t when
targeting C++2a.
* c.opt: Add the -fchar8_t command line option.
libiberty/
* cp-demangle.c (cplus_demangle_builtin_types)
(cplus_demangle_type): Add name demangling for char8_t (Du).
* cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
new char8_t type.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5ed1d133420..1151708aaf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -206,7 +206,7 @@ in the following sections.
 @item C++ Language Options
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control @gol
--faligned-new=@var{n}  -fargs-in-order=@var{n}  -fcheck-new @gol
+-faligned-new=@var{n}  -fargs-in-order=@var{n}  -fchar8_t  -fcheck-new @gol
 -fconstexpr-depth=@var{n}  -fconstexpr-loop-limit=@var{n} @gol
 -fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
@@ -2426,6 +2426,60 @@ but few users will need to override the default of
 
 This flag is enabled by default for @option{-std=c++17}.
 
+@item -fchar8_t
+@itemx -fno-char8_t
+@opindex fchar8_t
+@opindex fno-char8_t
+Enable support for @code{char8_t} as adopted for C++2a.  This includes
+the addition of a new @code{char8_t} fundamental type, changes to the
+types of UTF-8 string 

Re: [PATCH] restore CFString handling in attribute format (PR 88638)

2019-01-14 Thread Martin Sebor

On 1/5/19 2:41 PM, Dominique d'Humières wrote:
> Hi Martin,
>
> The patch on top of r267591 fixes pr88638 without regression.
>
> Note
>
>   FAIL: c-c++-common/attributes-4.c  -std=gnu++14 (test for excess errors)
>   FAIL: c-c++-common/attributes-4.c  -std=gnu++17 (test for excess errors)
>   FAIL: c-c++-common/attributes-4.c  -std=gnu++98 (test for excess errors)
>
> Thanks for the fix.


I just committed it.  I don't see this test fail with my Darwin
cross-compiler.  The failures I do see in attr*.c tests are these:

FAIL: gcc.dg/attr-copy-6.c (test for excess errors)
FAIL: gcc.dg/attr-ms_struct-packed1.c (test for excess errors)
FAIL: gcc.dg/attr-weakref-1-darwin.c (test for excess errors)
FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
FAIL: c-c++-common/attr-aligned-1.c  -Wc++-compat  (test for excess errors)
XPASS: c-c++-common/attr-nonstring-3.c  -Wc++-compat  pr86688 (test for warnings, line 409)


The attr-copy-6.c failure is due to
  error: only weak aliases are supported in this configuration

The other FAILs are all because the tests are expected to run but
can't in this configuration.

The XPASS seems to be new and present in native builds as well so
it's something to look into.

Martin


Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2019-01-14 Thread Aaron Sawdey
The patch for this was committed to trunk as 267562 (see below). Is this also 
ok for backport to 8?

Thanks,
   Aaron

On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>>>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>>>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>>>> inline expansion from doing unaligned vector or using vector load/store
>>>> other than lvx/stvx. More description of the issue is here:
>>>>
>>>> https://patchwork.ozlabs.org/patch/814059/
>>>>
>>>> OK for trunk if bootstrap/regtest ok?
>>>
>>> Okay, but see below.
>>>
>> [snip]
>>>
>>> This is extraordinarily clumsy :-)  Maybe something like:
>>>
>>> static rtx
>>> gen_lvx_v4si_move (rtx dest, rtx src)
>>> {
>>>   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
>>>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>>>   if (MEM_P (dest))
>>> return gen_altivec_stvx_v4si_internal (dest, src);
>>>   else if (MEM_P (src))
>>> return gen_altivec_lvx_v4si_internal (dest, src);
>>>   else
>>> gcc_unreachable ();
>>> }
>>>
>>> (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
>>> the useless extra variable.
>>
>> I think this should be better:
> 
> The gcc_unreachable at the end catches the non-mem to non-mem case.
> 
>> static rtx
>> gen_lvx_v4si_move (rtx dest, rtx src)
>> {
>>   gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
> 
> But if you prefer this, how about
> 
> {
>   gcc_assert (MEM_P (dest) ^ MEM_P (src));
>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> 
>   if (MEM_P (dest))
> return gen_altivec_stvx_v4si_internal (dest, src);
>   else
> return gen_altivec_lvx_v4si_internal (dest, src);
> }
> 
> :-)
> 
> 
> Segher
> 

2019-01-03  Aaron Sawdey  

* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.


Index: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c   (revision 267299)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -2669,6 +2669,25 @@
   return true;
 }

+/* Generate loads and stores for a move of v4si mode using lvx/stvx.
+   This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
+   keep combine from changing what instruction gets used.
+
+   DEST is the destination for the data.
+   SRC is the source of the data for the move.  */
+
+static rtx
+gen_lvx_v4si_move (rtx dest, rtx src)
+{
+  gcc_assert (MEM_P (dest) ^ MEM_P (src));
+  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
+
+  if (MEM_P (dest))
+return gen_altivec_stvx_v4si_internal (dest, src);
+  else
+return gen_altivec_lvx_v4si_internal (dest, src);
+}
+
 /* Expand a block move operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.

@@ -2721,11 +2740,11 @@

   /* Altivec first, since it will be faster than a string move
 when it applies, and usually not significantly larger.  */
-  if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
+  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
{
  move_bytes = 16;
  mode = V4SImode;
- gen_func.mov = gen_movv4si;
+ gen_func.mov = gen_lvx_v4si_move;
}
   else if (bytes >= 8 && TARGET_POWERPC64
   && (align >= 64 || !STRICT_ALIGNMENT))



-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



[PATCH] c-family: Update unaligned adress of packed member check

2019-01-14 Thread H.J. Lu
On Mon, Jan 14, 2019 at 6:22 AM Jakub Jelinek  wrote:
>
> On Sun, Jan 13, 2019 at 06:54:05AM -0800, H.J. Lu wrote:
> > > What always matters is whether we take address of a packed structure
> > > field/non-static data member or whether we just read that field.
> > > The former should be warned about, the latter not.
> > >
> >
> > How about this patch?  It checks if address is taken with NOP.
>
> I'd like to first understand the convert_p argument to
> warn_for_address_or_pointer_of_packed_member.
>
> To me it seems you want to emit two different warnings, perhaps one
> suppressed if the other one is emitted, but you actually from the start
> decide which of the two you are going to check for.  That is just weird.

convert_p  is only for C.

> Consider -O2 -Waddress-of-packed-member -Wno-incompatible-pointer-types:
>
> struct __attribute__((packed)) S { char p; int a, b, c; };
>
> int *
> foo (int x, struct S *p)
> {
>   return x ? &p->a : &p->b;
> }
>
> int *
> bar (int x, struct S *p)
> {
>   return (int *) (x ? &p->a : &p->b);
> }
>
> short *
> baz (int x, struct S *p)
> {
>   return x ? &p->a : &p->b;
> }
>
> short *
> qux (int x, struct S *p)
> {
>   return (short *) (x ? &p->a : &p->b);
> }
>
> This warns in foo, bar and qux, but doesn't warn in baz, because we've
> decided upfront that that case is convert_p = true.
>
> I would have expected that the convert_p argument isn't passed at all,
> the function always does the diagnostics about taking address that is
> done with !convert_p right now, and either do the pointer -> pointer
> conversion warning somewhere else (wherever we detect a pointer to pointer
> conversion, even in the middle of expression?), or do it wherever you do
> currently, but again always if the orig_rhs and type pointer types are
> different.
>

When convert_p is true, we need to treat pointer conversion
as a special case.  I am testing this updated patch.
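To make the read-versus-address distinction from the discussion concrete, here is a small sketch built on the same struct S as in Jakub's example. The unpacked struct U and the helper functions are hypothetical additions for illustration, and the layout comments assume a typical target where int is 4 bytes with 4-byte alignment.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct in the example above (GCC/Clang attribute).
struct __attribute__((packed)) S { char p; int a, b, c; };

// Hypothetical unpacked counterpart, for comparison only.
struct U { char p; int a, b, c; };

// Reading a packed field by value is fine: the compiler knows the field
// may be unaligned and emits a suitable access.
inline int read_a (const S *s) { return s->a; }

// True when the member's offset is not a multiple of int's alignment,
// i.e. when a plain int * formed from its address would be misaligned
// for a suitably aligned object.
inline bool packed_a_is_misaligned ()
{
  return offsetof (S, a) % alignof (int) != 0;
}

// Taking the address is the case the warning is about, e.g.
//   int *q = &s->a;
// because later dereferences of q assume int alignment.
```

On such a target offsetof (S, a) is 1 while offsetof (U, a) is 4, which is why only the address-taking form needs a diagnostic.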

-- 
H.J.
From 7b7281896a731371ecd8c293582c0c4dcff0c92f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 12 Jan 2019 21:03:50 -0800
Subject: [PATCH] c-family: Update unaligned adress of packed member check

Properly check unaligned pointer conversion as well as properly strip
NOPS and don't warn address of packed member if address isn't taken
with NOPS.

gcc/c-family/

	PR c/51628
	PR c/88664
	* c-warn.c (check_address_of_packed_member): Renamed to ...
	(check_address_or_pointer_of_packed_member): This.  Add a boolean argument
	to also warn pointer conversion.
	(check_and_warn_address_of_packed_member): Renamed to ...
	(check_and_warn_address_or_pointer_of_packed_member): This.
	Add a boolean argument to also warn pointer conversion.
	(warn_for_address_or_pointer_of_packed_member): Don't check
	pointer conversion here.

gcc/testsuite/

	PR c/51628
	PR c/88664
	* c-c++-common/pr51628-33.c: New test.
	* c-c++-common/pr88664-1.c: Likewise.
	* c-c++-common/pr88664-2.c: Likewise.
	* gcc.dg/pr51628-34.c: Likewise.
---
 gcc/c-family/c-warn.c   | 140 ++--
 gcc/testsuite/c-c++-common/pr51628-33.c |  19 
 gcc/testsuite/c-c++-common/pr88664-1.c  |  20 
 gcc/testsuite/c-c++-common/pr88664-2.c  |  22 
 gcc/testsuite/gcc.dg/pr51628-34.c   |  25 +
 5 files changed, 166 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr51628-33.c
 create mode 100644 gcc/testsuite/c-c++-common/pr88664-1.c
 create mode 100644 gcc/testsuite/c-c++-common/pr88664-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr51628-34.c

diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index 79b2d8ad449..d3660ffd2c1 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -2713,12 +2713,16 @@ check_alignment_of_packed_member (tree type, tree field)
   return NULL_TREE;
 }
 
-/* Return struct or union type if the right hand value, RHS, takes the
-   unaligned address of packed member of struct or union when assigning
-   to TYPE.  Otherwise, return NULL_TREE.  */
+/* Return struct or union type if the right hand value, RHS:
+   1. For CONVERT_P == true, is a pointer value which isn't aligned to a
+  pointer type TYPE.
+   2. For CONVERT_P == false, is an address which takes the unaligned
+  address of packed member of struct or union when assigning to TYPE.
+   Otherwise, return NULL_TREE.  */
 
 static tree
-check_address_of_packed_member (tree type, tree rhs)
+check_address_or_pointer_of_packed_member (bool convert_p, tree type,
+	   tree rhs)
 {
   if (INDIRECT_REF_P (rhs))
 rhs = TREE_OPERAND (rhs, 0);
@@ -2726,6 +2730,35 @@ check_address_of_packed_member (tree type, tree rhs)
   if (TREE_CODE (rhs) == ADDR_EXPR)
 rhs = TREE_OPERAND (rhs, 0);
 
+  if (convert_p
+  && (TREE_CODE (rhs) == PARM_DECL
+	  || TREE_CODE (rhs) == VAR_DECL))
+{
+  tree rhstype = TREE_TYPE (rhs);
+  if ((POINTER_TYPE_P (rhstype)
+	   || TREE_CODE (rhstype) == ARRAY_TYPE)
+	  && TYPE_PACKED (TREE_TYPE (rhstype)))
+	{
+	  unsigned int type_align = TYPE_ALIGN_UNIT (TREE_TYPE (type));
+	  unsigned 

Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2019-01-14 Thread Andi Kleen
On Mon, Jan 14, 2019 at 04:15:20PM +0800, Bin.Cheng wrote:
> On Mon, Jan 14, 2019 at 4:07 PM Andi Kleen  wrote:
> >
> > Bin Cheng,
> >
> > I did some testing on this now. The attached patch automatically increases 
> > the iterations
> > for autofdo profiles.
> Hi Andi, thanks very much for tuning these.
> >
> > But even with even more iterations I still have stable failures in
> >
> > FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> > FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ 
> > \ta-zA-Z0-0]+foo[._]+cold
> I think these two are supposed to fail with current code base.


We should mark it as XFAIL then I guess.

Is it understood why it doesn't work?

> > FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call 
> > -> direct call.* a1 transformation on insn"
> I also got unstable pass/fail for indirect call optimization when
> tuning iterations, and haven't got an iteration number which passes
> all the time.  I guess we need to combine decreasing of sampling count
> here.

Okay I will look into that.

We could also try whether prime sample-after values help; this sometimes fixes
problems with sampling systematically missing some code.

> > FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 
> > times"
> This one should fail too.

Same.

-Andi


Re: Set inline-unit-growth to 40

2019-01-14 Thread Qing Zhao
Hi, Honza,

In addition to the code size problems, there are several run-time regressions
for SPEC (if I read the table correctly; if not, let me know):

SPEC/SPEC2006/INT/483.xalancbmk    146.131    4.89%
SPEC/SPEC2006/FP/436.cactusADM     130.967    8.07%
SPEC/SPEC2006/FP/435.gromacs       182.555   11.73%
SPEC/SPEC2017/INT/541.leela_r      452.333    4.17%
SPEC/SPEC2017/INT/520.omnetpp_r    395.582    4.98%

Do we have a plan to study and fix these run-time regressions?

thanks.

Qing

> On Jan 12, 2019, at 12:32 PM, Jan Hubicka  wrote:
> 
> Hello,
> this patch sets inline-unit-growth to 40.  The performance changes are
> - Firefox, LTO
>  
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=f7bd026e1a931b9a284d1c85c2577a72dd592820=try=74889968abcc688b8d161863566ed273c0401ee4=1=opt=1=1
>  After fixes to inlining priorities this makes difference without
>  profile feedback only.
> 
>  Code size growth is about 9.15% with LTO and 3.95% with LTO and profile
>  feedback.
> - Firefox noLTO
>  
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=c902b72340a3dca3114f58578c1c8f3e6a1cd89c=try=4974da6f92c144a9c09765b56a564a640069ddb9=1=1=1
>  With about 7% code size growth
> - SPEC
>  
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10_percentage_change=0.02=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> - C++ benchmarks
>  
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10_changes=on_percentage_change=0.02=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> 
> I am not entirely happy about the code-size/performance tradeoffs but it
> is concerned only for programs built with -O3 or having too many inline
> keywords.  I have looked into inlining decisions for Firefox, HHVM and
> Clang and inliner gets out of growth bounds way too early and some of
> more performance aware projects already set the limit up.
> 
> I will tune other metrics down to handle some of the code size problems.
> 
> Honza
> 
> Index: ChangeLog
> ===
> --- ChangeLog (revision 267882)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,7 @@
> +2019-01-05  Jan Hubicka  
> +
> + * params.def (inline-unit-growth): Set to 40.
> +
> 2019-01-12  Jakub Jelinek  
> 
>   * tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
> Index: params.def
> ===
> --- params.def(revision 267882)
> +++ params.def(working copy)
> @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
> DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>"inline-unit-growth",
>"How much can given compilation unit grow because of the inlining (in 
> percent).",
> -  20, 0, 0)
> +  40, 0, 0)
> DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>"ipcp-unit-growth",
>"How much can given compilation unit grow because of the 
> interprocedural constant propagation (in percent).",
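As a rough illustration of what a percentage growth limit means, the sketch below computes the size bound implied by a given inline-unit-growth value. The function name max_unit_size is hypothetical, and this is a simplification: it ignores any other parameters (such as large-unit-insns) that the real inliner may take into account.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the largest size a compilation unit may reach when
// it is allowed to grow by growth_percent percent over its initial size.
inline std::int64_t max_unit_size (std::int64_t initial_insns,
                                   int growth_percent)
{
  // Multiply before dividing so small units do not lose precision.
  return initial_insns * (100 + growth_percent) / 100;
}
```

Under this simplified model, a unit of 10000 instructions may grow to 12000 with the old value of 20 and to 14000 with the new value of 40.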



Re: Set inline-unit-growth to 40

2019-01-14 Thread Jan Hubicka
Hello,
> > Index: params.def
> > ===
> > --- params.def  (revision 267882)
> > +++ params.def  (working copy)
> > @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
> >  DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
> >  "inline-unit-growth",
> >  "How much can given compilation unit grow because of the inlining 
> > (in percent).",
> > -20, 0, 0)
> > +40, 0, 0)
> >  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
> >  "ipcp-unit-growth",
> >  "How much can given compilation unit grow because of the 
> > interprocedural constant propagation (in percent).",
> 
> This patch introduces a regression in libstdc++:
> FAIL: ext/pb_ds/regression/list_update_map_rand.cc execution test
> on a few arm targets.
> 
> For instance:
> arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a5
> --with-fpu vfpv3-d16-fp16

Adjusting inliner heuristics should not trigger correctness issues, so
this seems like a bug that was previously latent.  I guess it may
legally break correct code only if stack usage gets too large.

Do you have any idea what breaks in this testcase?

Honza
> 
> Using --with-mode thumb and the same other configure options makes the
> test pass.
> I'm seeing this with other configurations --with-mode arm and
> --with-fpu vfp* (as opposed to neon*)
> 
> The .log file has:
> 
> qemu: uncaught target signal 11 (Segmentation fault) - core dumped
> 
> The (incomplete?) qemu execution trace ends with:
> 
> IN:
> 0x40ada6b8:  e5910008  ldr  r0, [r1, #8]
> 0x40ada6bc:  e156  cmp  r6, r0
> 0x40ada6c0:  1a4f  bne  #0x40ada804
> 
> IN:
> 0x40ada6c4:  e5960004  ldr  r0, [r6, #4]
> 0x40ada6c8:  e582100c  str  r1, [r2, #0xc]
> 0x40ada6cc:  e3500c02  cmp  r0, #0x200
> 0x40ada6d0:  e5812008  str  r2, [r1, #8]
> 0x40ada6d4:  3a02  blo  #0x40ada6e4
> 
> IN:
> 0x40adb880:  ea3e  b#0x40adb580
> 
> IN: 
> _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_E13subscript_impENSt3tr117integral_constantIiLi0EEE
> 0x0001ffc4:  e3a06000  mov  r6, #0
> 0x0001ffc8:  ea88  b#0x1fdf0
> 
> IN: 
> _ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_E13subscript_impENSt3tr117integral_constantIiLi0EEE
> 0x0001fdf0:  ee180a10  vmov r0, s16
> 0x0001fdf4:  ebffcd12  bl   #0x13244
> 
> Christophe


Re: Set inline-unit-growth to 40

2019-01-14 Thread Christophe Lyon
Hi Honza,

On Sat, 12 Jan 2019 at 19:32, Jan Hubicka  wrote:
>
> Hello,
> this patch sets inline-unit-growth to 40.  The performance changes are
> - Firefox, LTO
>   
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=f7bd026e1a931b9a284d1c85c2577a72dd592820=try=74889968abcc688b8d161863566ed273c0401ee4=1=opt=1=1
>   After fixes to inlining priorities this makes difference without
>   profile feedback only.
>
>   Code size growth is about 9.15% with LTO and 3.95% with LTO and profile
>   feedback.
> - Firefox noLTO
>   
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=c902b72340a3dca3114f58578c1c8f3e6a1cd89c=try=4974da6f92c144a9c09765b56a564a640069ddb9=1=1=1
>   With about 7% code size growth
> - SPEC
>   
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10_percentage_change=0.02=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
> - C++ benchmarks
>   
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=10_changes=on_percentage_change=0.02=46e2bd1143b5c60af814916d7673879b34ceb3f6%2Cc0d79cfe9c4ec30823480f2f9b256600e8e3899f
>
> I am not entirely happy about the code-size/performance tradeoffs but it
> is concerned only for programs built with -O3 or having too many inline
> keywords.  I have looked into inlining decisions for Firefox, HHVM and
> Clang and inliner gets out of growth bounds way too early and some of
> more performance aware projects already set the limit up.
>
> I will tune other metrics down to handle some of the code size problems.
>
> Honza
>
> Index: ChangeLog
> ===
> --- ChangeLog   (revision 267882)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,7 @@
> +2019-01-05  Jan Hubicka  
> +
> +   * params.def (inline-unit-growth): Set to 40.
> +
>  2019-01-12  Jakub Jelinek  
>
> * tree-ssa-loop-ivopts.c (find_inv_vars): Fix a comment typo.
> Index: params.def
> ===
> --- params.def  (revision 267882)
> +++ params.def  (working copy)
> @@ -227,7 +227,7 @@ DEFPARAM(PARAM_LARGE_UNIT_INSNS,
>  DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>  "inline-unit-growth",
>  "How much can given compilation unit grow because of the inlining 
> (in percent).",
> -20, 0, 0)
> +40, 0, 0)
>  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>  "ipcp-unit-growth",
>  "How much can given compilation unit grow because of the 
> interprocedural constant propagation (in percent).",

This patch introduces a regression in libstdc++:
FAIL: ext/pb_ds/regression/list_update_map_rand.cc execution test
on a few arm targets.

For instance:
arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a5
--with-fpu vfpv3-d16-fp16

Using --with-mode thumb and the same other configure options makes the
test pass.
I'm seeing this with other configurations --with-mode arm and
--with-fpu vfp* (as opposed to neon*)

The .log file has:

qemu: uncaught target signal 11 (Segmentation fault) - core dumped

The (incomplete?) qemu execution trace ends with:

IN:
0x40ada6b8:  e5910008  ldr  r0, [r1, #8]
0x40ada6bc:  e156  cmp  r6, r0
0x40ada6c0:  1a4f  bne  #0x40ada804

IN:
0x40ada6c4:  e5960004  ldr  r0, [r6, #4]
0x40ada6c8:  e582100c  str  r1, [r2, #0xc]
0x40ada6cc:  e3500c02  cmp  r0, #0x200
0x40ada6d0:  e5812008  str  r2, [r1, #8]
0x40ada6d4:  3a02  blo  #0x40ada6e4

IN:
0x40adb880:  ea3e  b#0x40adb580

IN: 
_ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_E13subscript_impENSt3tr117integral_constantIiLi0EEE
0x0001ffc4:  e3a06000  mov  r6, #0
0x0001ffc8:  ea88  b#0x1fdf0

IN: 
_ZN10__gnu_pbds4test6detail30container_rand_regression_testINS_11list_updateINS0_10basic_typeES4_St8equal_toIS4_ENS0_26lu_move_to_front_policy_t_EN9__gnu_cxx22throw_allocator_randomIS4_E13subscript_impENSt3tr117integral_constantIiLi0EEE
0x0001fdf0:  ee180a10  vmov r0, s16
0x0001fdf4:  ebffcd12  bl   #0x13244

Christophe


Re: [PATCH] Fix location of tls_wrapper_fn (PR gcov-profile/88263).

2019-01-14 Thread Jason Merrill

On 1/14/19 10:43 AM, Martin Liška wrote:

Hi.

This is another fix for the PR where I updated location of
tls_wrapper.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


OK.

Jason



Re: warnings about unused shared_ptr/unique_ptr comparisons

2019-01-14 Thread Kyrill Tkachov

On 14/01/19 15:53, Ulrich Drepper wrote:

This is a conservative implementation of a patch to make
shared/unique_ptrs behave more like plain old pointers.  More about this
in bug #88738

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88738

The summary is

- using clang, which enables a warning for unused results of all
comparison operations, found a real bug

- a library implementation is limited in scope and tedious to add
everywhere. At this stage of gcc 9 it was the only acceptable solution,
though

- longer term there should be a warning for comparison operators.
Possibly on by default with the possibility to disable it with an
attribute (see the discussion in the bug).


The patch proposed here only changes the code for C++17 and up to use
the [[nodiscard]] attribute.  For gcc 10 we can either widen this or
implement a better way with the help of the compiler.

I ran the regression test suite and didn't see any additional failures.

OK?


Forwarding to the libstdc++ list for these patches.

Thanks,
Kyrill


libstdc++-v3/
2019-02-14  Ulrich Drepper  

PR libstdc++/88738
Warn about unused comparisons of shared_ptr/unique_ptr
* include/bits/c++config [_GLIBCXX_NODISCARD]: Define.
* include/bits/shared_ptr.h: Use it for operator ==, !=,
<, <=, >, >= for shared_ptr.
* include/bits/unique_ptr.h: Likewise for unique_ptr.





warnings about unused shared_ptr/unique_ptr comparisons

2019-01-14 Thread Ulrich Drepper
This is a conservative implementation of a patch to make
shared/unique_ptrs behave more like plain old pointers.  More about this
in bug #88738

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88738

The summary is

- using clang, which enables a warning for unused results of all
comparison operations, found a real bug

- a library implementation is limited in scope and tedious to add
everywhere. At this stage of gcc 9 it was the only acceptable solution,
though

- longer term there should be a warning for comparison operators.
Possibly on by default with the possibility to disable it with an
attribute (see the discussion in the bug).


The patch proposed here only changes the code for C++17 and up to use
the [[nodiscard]] attribute.  For gcc 10 we can either widen this or
implement a better way with the help of the compiler.

I ran the regression test suite and didn't see any additional failures.

OK?

libstdc++-v3/
2019-02-14  Ulrich Drepper  

PR libstdc++/88738
Warn about unused comparisons of shared_ptr/unique_ptr
* include/bits/c++config [_GLIBCXX_NODISCARD]: Define.
* include/bits/shared_ptr.h: Use it for operator ==, !=,
<, <=, >, >= for shared_ptr.
* include/bits/unique_ptr.h: Likewise for unique_ptr.
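The approach described above can be sketched outside the library as follows. The macro EXAMPLE_NODISCARD and the handle type are hypothetical stand-ins chosen for this illustration; they are not the names used in libstdc++.

```cpp
#include <cassert>

// Expand to the attribute only for C++17 and later, mirroring the idea of
// a configuration macro that is empty for older standards.
#if __cplusplus >= 201703L
# define EXAMPLE_NODISCARD [[nodiscard]]
#else
# define EXAMPLE_NODISCARD
#endif

// Hypothetical pointer-like wrapper standing in for a smart pointer.
struct handle { int *ptr; };

EXAMPLE_NODISCARD inline bool
operator== (const handle &a, const handle &b) noexcept
{ return a.ptr == b.ptr; }

inline bool same_target (const handle &a, const handle &b)
{
  // Writing just `a == b;` here would discard the result, which a C++17
  // compiler is encouraged to diagnose because of the attribute.
  return a == b;
}
```

Using the result, as same_target does, produces no diagnostic; only a comparison whose value is thrown away is flagged.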

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 9b2fabd7d76..97bb6db70b1 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -99,6 +99,14 @@
 # define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11")))
 #endif
 
+// Macro to warn about unused results.
+#if __cplusplus >= 201703L
+# define _GLIBCXX_NODISCARD [[__nodiscard__]]
+#else
+# define _GLIBCXX_NODISCARD
+#endif
+
+
 
 #if __cplusplus
 
diff --git a/libstdc++-v3/include/bits/shared_ptr.h 
b/libstdc++-v3/include/bits/shared_ptr.h
index 99009ab4f99..d504627d1a0 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -380,37 +380,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // 20.7.2.2.7 shared_ptr comparisons
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator==(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return __a.get() == __b.get(); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator==(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return !__a; }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator==(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return !__a; }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return __a.get() != __b.get(); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return (bool)__a; }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return (bool)__a; }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 {
   using _Tp_elt = typename shared_ptr<_Tp>::element_type;
@@ -420,7 +420,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 {
   using _Tp_elt = typename shared_ptr<_Tp>::element_type;
@@ -428,7 +428,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 {
   using _Tp_elt = typename shared_ptr<_Tp>::element_type;
@@ -436,47 +436,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return !(__b < __a); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return !(nullptr < __a); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return !(__a < nullptr); }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>(const shared_ptr<_Tp>& __a, const shared_ptr<_Up>& __b) noexcept
 { return (__b < __a); }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>(const shared_ptr<_Tp>& __a, nullptr_t) noexcept
 { return nullptr < __a; }
 
   template<typename _Tp>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>(nullptr_t, const shared_ptr<_Tp>& __a) noexcept
 { return __a < nullptr; }
 
   template<typename _Tp, typename _Up>
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>=(const shared_ptr<_Tp>& 

[PATCH] Fix location of tls_wrapper_fn (PR gcov-profile/88263).

2019-01-14 Thread Martin Liška
Hi.

This is another fix for the PR where I updated location of
tls_wrapper.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
From 07811d1057196abad898c9aeda08cd9113aedf70 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Jan 2019 14:57:01 +0100
Subject: [PATCH] Fix location of tls_wrapper_fn (PR gcov-profile/88263).

gcc/cp/ChangeLog:

2019-01-14  Martin Liska  

	PR gcov-profile/88263
	* decl2.c (get_tls_wrapper_fn): Use DECL_SOURCE_LOCATION
	as location of the TLS wrapper.

gcc/testsuite/ChangeLog:

2019-01-14  Martin Liska  

	PR gcov-profile/88263
	* g++.dg/gcov/pr88263-2.C: New test.
---
 gcc/cp/decl2.c|  4 +++-
 gcc/testsuite/g++.dg/gcov/pr88263-2.C | 25 +
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/pr88263-2.C

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index dbab95fbc96..9085e5cb154 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3433,7 +3433,9 @@ get_tls_wrapper_fn (tree var)
   tree type = non_reference (TREE_TYPE (var));
   type = build_reference_type (type);
   tree fntype = build_function_type (type, void_list_node);
-  fn = build_lang_decl (FUNCTION_DECL, sname, fntype);
+
+  fn = build_lang_decl_loc (DECL_SOURCE_LOCATION (var),
+FUNCTION_DECL, sname, fntype);
   SET_DECL_LANGUAGE (fn, lang_c);
   TREE_PUBLIC (fn) = TREE_PUBLIC (var);
   DECL_ARTIFICIAL (fn) = true;
diff --git a/gcc/testsuite/g++.dg/gcov/pr88263-2.C b/gcc/testsuite/g++.dg/gcov/pr88263-2.C
new file mode 100644
index 000..f0cf15f5d0a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/pr88263-2.C
@@ -0,0 +1,25 @@
+// PR gcov-profile/88263
+// { dg-options "-fprofile-arcs -ftest-coverage -std=c++11" }
+// { dg-do run { target native } }
+
+#include <sstream>
+
+namespace logging {
+class Logstream {
+	~Logstream();
+	static thread_local std::ostringstream os_;
+};
+}
+namespace logging {
+thread_local std::ostringstream Logstream::os_;
+Logstream::~Logstream() {
+	os_.clear();
+}
+}
+
+int main()
+{
+  return 0;
+}
+
+// { dg-final { run-gcov pr88263-2.C } }
-- 
2.20.1



Re: C++ PATCH for c++/88830 - ICE with abstract class

2019-01-14 Thread Marek Polacek
On Mon, Jan 14, 2019 at 12:10:14PM +0100, Jakub Jelinek wrote:
> On Sun, Jan 13, 2019 at 09:07:00PM -0500, Marek Polacek wrote:
> > diff --git gcc/cp/decl2.c gcc/cp/decl2.c
> > index e4cf4e0a361..7b656712471 100644
> > --- gcc/cp/decl2.c
> > +++ gcc/cp/decl2.c
> > @@ -2229,7 +2229,8 @@ maybe_emit_vtables (tree ctype)
> >   never get generated.  */
> >if (CLASSTYPE_PURE_VIRTUALS (ctype)
> >&& TYPE_HAS_NONTRIVIAL_DESTRUCTOR (ctype)
> > -  && DECL_DEFAULTED_IN_CLASS_P(CLASSTYPE_DESTRUCTOR(ctype)))
> > +  && !CLASSTYPE_LAZY_DESTRUCTOR (ctype)
> > +  && DECL_DEFAULTED_IN_CLASS_P (CLASSTYPE_DESTRUCTOR (ctype)))
> >  note_vague_linkage_fn (CLASSTYPE_DESTRUCTOR(ctype));
> 
> Just a formatting nit.  s/CLASSTYPE_DESTRUCTOR/& / on the above line too
> when you are at it.  Otherwise I came up with identical patch to yours
> (should have noticed the PR is ASSIGNED :( ).

:(  I missed the second formatting problem, fixed here:

2019-01-14  Marek Polacek  

PR c++/88830 - ICE with abstract class.
* decl2.c (maybe_emit_vtables): Check CLASSTYPE_LAZY_DESTRUCTOR.
Fix formatting.

* g++.dg/other/abstract7.C: New test.

diff --git gcc/cp/decl2.c gcc/cp/decl2.c
index e4cf4e0a361..902bb8cab4f 100644
--- gcc/cp/decl2.c
+++ gcc/cp/decl2.c
@@ -2229,8 +2229,9 @@ maybe_emit_vtables (tree ctype)
  never get generated.  */
   if (CLASSTYPE_PURE_VIRTUALS (ctype)
   && TYPE_HAS_NONTRIVIAL_DESTRUCTOR (ctype)
-  && DECL_DEFAULTED_IN_CLASS_P(CLASSTYPE_DESTRUCTOR(ctype)))
-note_vague_linkage_fn (CLASSTYPE_DESTRUCTOR(ctype));
+  && !CLASSTYPE_LAZY_DESTRUCTOR (ctype)
+  && DECL_DEFAULTED_IN_CLASS_P (CLASSTYPE_DESTRUCTOR (ctype)))
+note_vague_linkage_fn (CLASSTYPE_DESTRUCTOR (ctype));
 
   /* Since we're writing out the vtable here, also write the debug
  info.  */
diff --git gcc/testsuite/g++.dg/other/abstract7.C 
gcc/testsuite/g++.dg/other/abstract7.C
new file mode 100644
index 000..95781602c95
--- /dev/null
+++ gcc/testsuite/g++.dg/other/abstract7.C
@@ -0,0 +1,14 @@
+// PR c++/88830
+
+struct a {
+  ~a();
+};
+class b {
+  virtual void c(int &);
+};
+class C : b {
+  void c(int &);
+  virtual int d() = 0;
+  a e;
+};
+void C::c(int &) {}


Re: add tsv110 pipeline scheduling

2019-01-14 Thread Kyrill Tkachov

Hi Wuyuan,

On 14/01/19 14:02, wuyuan (E) wrote:

Hi  Kyrill:
  GCC 7.3.0 has not yet dropped the store1 and load1 types; I did not
expect the latest community GCC to have changed so much.
  I have now downloaded the latest GCC code and applied the patch to it, and
the compiler builds. Thank you very much for your work!



For the future, please test the patches against the branch you plan to apply 
them to.
In this case, since you're submitting a trunk patch it needs to be applied and 
tested on trunk.

This latest version builds on trunk and looks ok to me but you'll need approval 
from the aarch64 maintainers to commit.
I've cc'ed them for you.

Thanks,
Kyrill



Best Regards,

wuyuan


   * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
   * config/aarch64/aarch64.md : Add "tsv110.md"
   * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 70b0766..085c40f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -103,7 +103,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, cortexa72, 
0x41, 0xd0c, -1)
  
  /* HiSilicon ('H') cores. */

-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 
0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 
0xd01, -1)
  
  /* ARMv8.4-A Architecture Processors.  */
  
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md

index 513aec1..97e0703 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -356,6 +356,7 @@
  (include "thunderx.md")
  (include "../arm/xgene1.md")
  (include "thunderx2t99.md")
+(include "tsv110.md")
  
  ;; ---

  ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..e33c5cc
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, 

[PATCH]: Mention -Waddress-of-packed-member change in GCC 9

2019-01-14 Thread H.J. Lu
OK to install?

H.J.
---
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.34
diff -u -p -r1.34 changes.html
--- changes.html13 Jan 2019 17:40:11 -  1.34
+++ changes.html14 Jan 2019 14:33:58 -
@@ -79,6 +79,14 @@ a work-in-progress.
 __builtin_convertvector built-in for vector conversions
   has been added. 
   
+
+  New warnings:
+  
+  -Waddress-of-packed-member, enabled by default,
+  warns about an unaligned pointer value taken from the address of a
+  packed member of a struct or union.
+  
+  
 
 
 C++


Re: [PATCH] C-family: Only check the non-pointer data member

2019-01-14 Thread Jakub Jelinek
On Sun, Jan 13, 2019 at 06:54:05AM -0800, H.J. Lu wrote:
> > What always matters is whether we take address of a packed structure
> > field/non-static data member or whether we just read that field.
> > The former should be warned about, the latter not.
> >
> 
> How about this patch?  It checks if address is taken with NOP.

I'd like to first understand the convert_p argument to
warn_for_address_or_pointer_of_packed_member.

To me it seems you want to emit two different warnings, perhaps with one
suppressed if the other one is emitted, but you actually decide from the
start which of the two you are going to check for.  That is just weird.

Consider -O2 -Waddress-of-packed-member -Wno-incompatible-pointer-types:

struct __attribute__((packed)) S { char p; int a, b, c; };

int *
foo (int x, struct S *p)
{
  return x ? &p->a : &p->b;
}

int *
bar (int x, struct S *p)
{
  return (int *) (x ? &p->a : &p->b);
}

short *
baz (int x, struct S *p)
{
  return x ? &p->a : &p->b;
}

short *
qux (int x, struct S *p)
{
  return (short *) (x ? &p->a : &p->b);
}

This warns in foo, bar and qux, but doesn't warn in baz, because we've
decided upfront that that case is convert_p = true.

I would have expected that the convert_p argument isn't passed at all,
the function always does the diagnostics about taking address that is
done with !convert_p right now, and either do the pointer -> pointer
conversion warning somewhere else (wherever we detect a pointer to pointer
conversion, even in the middle of expression?), or do it wherever you do
currently, but again always if the orig_rhs and type pointer types are
different.

Jakub
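
To make the distinction above concrete, here is a minimal sketch (not taken
from the patch under discussion) using the same struct S.  It assumes a
4-byte int and the GCC packed attribute; read_a and copy_b are hypothetical
helper names.  Reading a packed field by value is fine, and copying its bytes
out never forms an int * to the unaligned member, which is the case the
warning is about.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the struct in the message above: with a 4-byte int,
   `a' sits at offset 1, so its address is not int-aligned.  */
struct __attribute__((packed)) S { char p; int a, b, c; };

/* Reading the field by value: the compiler knows the access may be
   unaligned and emits a suitable load.  */
static int
read_a (const struct S *s)
{
  return s->a;
}

/* Copying the bytes out through a char pointer avoids creating an
   int * that points at the packed member.  */
static int
copy_b (const struct S *s)
{
  int tmp;
  memcpy (&tmp, (const char *) s + offsetof (struct S, b), sizeof tmp);
  return tmp;
}
```

With these helpers, neither function hands out a pointer whose static type
promises more alignment than the packed member actually has.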


Re: PATCH: Add -Waddress-of-packed-member to GCC 9 porting guide

2019-01-14 Thread H.J. Lu
On Mon, Jan 14, 2019 at 5:53 AM Richard Biener
 wrote:
>
> On Mon, Jan 14, 2019 at 2:46 PM H.J. Lu  wrote:
> >
> > This patch adds -Waddress-of-packed-member to GCC 9 porting guide.
> >
> > OK to install?
>
> The docs fail to mention what to do when the unaligned pointer is _not_
> safe to use.  That is, how do I fix
>
> struct { char c; int i[4]; } s __attribute__((packed));
> int foo()
> {
>   int *p = s.i;
>   return bar (p);
> }
> int bar (int *q)
> {
>   return *q;
> }
>
> for the cases where eliding the pointer isn't easily possible?

You can't have both packed struct and aligned array at the same time.
The only thing I can say is "don't do it".

> Please also mention the new warning in changes.html
> (it seems to be enabled by default even?).

I will add a paragraph.

H.J.
> IIRC the frontends themselves build "bogus" pointer types
> to aligned data from a simple [1] because the FIELD_DECLs
> types are naturally aligned.
>
> Richard.
>
> > Thanks.
> >
> > H.J.
> > ---
> > Index: gcc-9/porting_to.html
> > ===
> > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/porting_to.html,v
> > retrieving revision 1.1
> > diff -u -r1.1 porting_to.html
> > --- gcc-9/porting_to.html   11 Jan 2019 18:21:45 -  1.1
> > +++ gcc-9/porting_to.html   14 Jan 2019 13:46:07 -
> > @@ -56,13 +56,36 @@
> >}
> >
> >
> > +C/C++ language issues
> > +
> > +-Waddress-of-packed-member
> > +is enabled by default
> > +
> > +
> > +  When address of packed member of struct or union is taken, it may result
> > +  in an unaligned pointer value.  A new warning
> > +  -Waddress-of-packed-member was added to check alignment at
> > +  pointer assignment.  It warns both unaligned address and unaligned
> > +  pointer.
> > +
> > +
> > +
> > +  If the pointer value is safe to use, you can suppress
> > +  -Waddress-of-packed-member warnings by using pragmas:
> > +
> > +  
> > +#pragma GCC diagnostic push
> > +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
> > +/* (code for which the warning is to be disabled)  */
> > +#pragma GCC diagnostic pop
> > +  
> > +
> >  
> >
> >  
> >
> >  

Re: [PATCH][RFC] Extend locations where to seach for Fortran pre-include.

2019-01-14 Thread Martin Liška
On 1/11/19 7:06 PM, Joseph Myers wrote:
> On Fri, 11 Jan 2019, Martin Liška wrote:
> 
>> +/* Same as add_prefix, but prepending target_sysroot_hdrs_suffix to prefix.  */
> 
> Actually, it should be prepending target_system_root, but followed by 
> target_sysroot_hdrs_suffix rather than target_sysroot_suffix.  That is, 
> this function should be following add_sysrooted_prefix more closely.
> 
>> +  if (target_sysroot_hdrs_suffix)
> 
> So this should be "if (target_system_root)" - it needs to be sysrooted 
> even if there is no sysroot headers suffix.
> 
>> +{
>> +  char *sysroot_no_trailing_dir_separator
>> += xstrdup (target_sysroot_hdrs_suffix);
>> +  size_t sysroot_len = strlen (target_sysroot_hdrs_suffix);
> 
> And again this would use target_system_root.
> 
>> +  if (sysroot_len > 0
>> +  && target_sysroot_hdrs_suffix[sysroot_len - 1] == DIR_SEPARATOR)
>> +sysroot_no_trailing_dir_separator[sysroot_len - 1] = '\0';
> 
> Likewise.
> 
>> +  if (target_sysroot_suffix)
>> +prefix = concat (sysroot_no_trailing_dir_separator,
>> + target_sysroot_suffix, prefix, NULL);
> 
> While this would use target_sysroot_hdrs_suffix.
> 

Thanks for review, fixed that in updated version of the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
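
For readers following the review comments above, the intended path
composition can be sketched outside of GCC as follows.  This is only an
illustration under stated assumptions: sysrooted_hdrs_path is a hypothetical
stand-alone helper, it uses malloc/snprintf instead of GCC's concat, and it
hardcodes '/' where the driver uses DIR_SEPARATOR.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not GCC's: returns a malloc'd string made of
   SYSROOT (with one trailing '/' removed), then HDRS_SUFFIX (may be
   NULL), then PREFIX.  Returns NULL if allocation fails.  */
static char *
sysrooted_hdrs_path (const char *sysroot, const char *hdrs_suffix,
		     const char *prefix)
{
  size_t root_len = strlen (sysroot);

  /* Drop a single trailing separator so the parts join cleanly.  */
  if (root_len > 0 && sysroot[root_len - 1] == '/')
    root_len--;

  if (!hdrs_suffix)
    hdrs_suffix = "";

  size_t len = root_len + strlen (hdrs_suffix) + strlen (prefix) + 1;
  char *path = malloc (len);
  if (path)
    snprintf (path, len, "%.*s%s%s", (int) root_len, sysroot,
	      hdrs_suffix, prefix);
  return path;
}
```

The point of the ordering is that the sysroot always comes first, and the
headers suffix is optional: a missing suffix must still yield a sysrooted
path.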
From 8f60e280c40d60b1590d0eb41ce130582c7733a9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 20 Nov 2018 15:09:16 +0100
Subject: [PATCH] Extend locations where to seach for Fortran pre-include.

gcc/ChangeLog:

2019-01-14  Martin Liska  

	* Makefile.in: Set TOOL_INCLUDE_DIR and NATIVE_SYSTEM_HEADER_DIR
	for GCC driver.
	* config/gnu-user.h (TARGET_F951_OPTIONS): Add 'finclude%s/' as
	a new argument.
	* gcc.c (add_sysrooted_hdrs_prefix): New function.
	(path_prefix_reset): Move up in the source file.
	(find_fortran_preinclude_file): Make complex search for the
	fortran header files.
---
 gcc/Makefile.in   |   4 +-
 gcc/config/gnu-user.h |   2 +-
 gcc/gcc.c | 103 ++
 3 files changed, 87 insertions(+), 22 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2fa9083d1b3..095156bd537 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2172,7 +2172,9 @@ DRIVER_DEFINES = \
   @TARGET_SYSTEM_ROOT_DEFINE@ \
   $(VALGRIND_DRIVER_DEFINES) \
   $(if $(SHLIB),$(if $(filter yes,@enable_shared@),-DENABLE_SHARED_LIBGCC)) \
-  -DCONFIGURE_SPECS="\"@CONFIGURE_SPECS@\""
+  -DCONFIGURE_SPECS="\"@CONFIGURE_SPECS@\"" \
+  -DTOOL_INCLUDE_DIR=\"$(gcc_tooldir)/include\" \
+  -DNATIVE_SYSTEM_HEADER_DIR=\"$(NATIVE_SYSTEM_HEADER_DIR)\"
 
 CFLAGS-gcc.o += $(DRIVER_DEFINES) -DBASEVER=$(BASEVER_s)
 gcc.o: $(BASEVER)
diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index ba146921655..055a4f0afec 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -151,4 +151,4 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #undef TARGET_F951_OPTIONS
 #define TARGET_F951_OPTIONS "%{!nostdinc:\
-  %:fortran-preinclude-file(-fpre-include= math-vector-fortran.h)}"
+  %:fortran-preinclude-file(-fpre-include= math-vector-fortran.h finclude%s/)}"
diff --git a/gcc/gcc.c b/gcc/gcc.c
index bcd04df1691..797ed36616f 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -2976,6 +2976,44 @@ add_sysrooted_prefix (struct path_prefix *pprefix, const char *prefix,
   add_prefix (pprefix, prefix, component, priority,
 	  require_machine_suffix, os_multilib);
 }
+
+/* Same as add_prefix, but prepending target_sysroot_hdrs_suffix to prefix.  */
+
+static void
+add_sysrooted_hdrs_prefix (struct path_prefix *pprefix, const char *prefix,
+			   const char *component,
+			   /* enum prefix_priority */ int priority,
+			   int require_machine_suffix, int os_multilib)
+{
+  if (!IS_ABSOLUTE_PATH (prefix))
+fatal_error (input_location, "system path %qs is not absolute", prefix);
+
+  if (target_system_root)
+{
+  char *sysroot_no_trailing_dir_separator = xstrdup (target_system_root);
+  size_t sysroot_len = strlen (target_system_root);
+
+  if (sysroot_len > 0
+	  && target_system_root[sysroot_len - 1] == DIR_SEPARATOR)
+	sysroot_no_trailing_dir_separator[sysroot_len - 1] = '\0';
+
+  if (target_sysroot_hdrs_suffix)
+	prefix = concat (sysroot_no_trailing_dir_separator,
+			 target_sysroot_hdrs_suffix, prefix, NULL);
+  else
+	prefix = concat (sysroot_no_trailing_dir_separator, prefix, NULL);
+
+  free (sysroot_no_trailing_dir_separator);
+
+  /* We have to override this because GCC's notion of sysroot
+	 moves along with GCC.  */
+  component = "GCC";
+}
+
+  add_prefix (pprefix, prefix, component, priority,
+	  require_machine_suffix, os_multilib);
+}
+
 
 /* Execute the command specified by the arguments on the current line of spec.
When using pipes, this includes several piped-together commands
@@ -9896,20 +9934,61 @@ debug_level_greater_than_spec_func (int 

Re: add tsv110 pipeline scheduling

2019-01-14 Thread wuyuan (E)
Hi  Kyrill:
 GCC 7.3.0 had not yet dropped store1 and load1; I did not expect the
latest community GCC to have changed so much.
 I have now downloaded the latest GCC code and applied the patch to it, and
it compiles. Thank you very much for your work!

Best Regards,

wuyuan


  * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
  * config/aarch64/aarch64.md: Add "tsv110.md".
  * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 70b0766..085c40f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -103,7 +103,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 513aec1..97e0703 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -356,6 +356,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..e33c5cc
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   (const_string 

[PATCH][GCC][AArch64] Fix big-endian neon-intrinsics ICEs

2019-01-14 Thread Tamar Christina
Hi All,


This patch fixes some ICEs when the fcmla_lane intrinsics are used on
big endian by correcting the lane indices and replacing the hardcoded byte
offset in subreg calls with subreg_lowpart_offset.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Cross compiled and regtested on aarch64_be-none-elf and no issues.

Ok for trunk?

Thanks,
Tamar
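
As background for the lane-index changes in this patch, the flip itself can
be sketched in a few lines of C.  This is a stand-alone illustration, not the
GCC macro: endian_lane_n is a hypothetical function, and it assumes the usual
mapping where a big-endian lane n becomes nunits - 1 - n.  The big_endian_p
argument stands in for the compile-time BYTES_BIG_ENDIAN so both cases can be
exercised.

```c
/* Hypothetical stand-in for the lane flip: on big-endian the lane
   numbering runs the other way, so lane N maps to NUNITS - 1 - N;
   on little-endian the index is unchanged.  */
static int
endian_lane_n (int big_endian_p, int nunits, int lane)
{
  return big_endian_p ? nunits - 1 - lane : lane;
}
```

For a complex lane operation the valid range is half the element count, so a
V4HF operand has two complex lanes and the flip must be computed over 2, not
4.  Passing the full element count would map lane 0 to 3, which is outside
the valid range.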

gcc/ChangeLog:

2019-01-14  Tamar Christina  

* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
correct max nunits for endian swap.
(aarch64_expand_fcmla_builtin): Correct subreg code.
* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane,
aarch64_fcmla_laneqv4hf, aarch64_fcmlaq_lane): Correct lane
endianness.

-- 
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 04063e5ed134d2e64487db23b8fa7794817b2739..c8f5a555f6724433dc6cea1cff3547c0c66c54a7 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1197,7 +1197,9 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 		= GET_MODE_NUNITS (vmode).to_constant ();
 		  aarch64_simd_lane_bounds (op[opc], 0, nunits / 2, exp);
 		  /* Keep to GCC-vector-extension lane indices in the RTL.  */
-		  op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
+		  int lane = INTVAL (op[opc]);
+		  op[opc] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane),
+	  SImode);
 		}
 	  /* Fall through - if the lane index isn't a constant then
 		 the next case will error.  */
@@ -1443,14 +1445,12 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   int nunits = GET_MODE_NUNITS (quadmode).to_constant ();
   aarch64_simd_lane_bounds (lane_idx, 0, nunits / 2, exp);
 
-  /* Keep to GCC-vector-extension lane indices in the RTL.  */
-  lane_idx = aarch64_endian_lane_rtx (quadmode, INTVAL (lane_idx));
-
   /* Generate the correct register and mode.  */
   int lane = INTVAL (lane_idx);
 
   if (lane < nunits / 4)
-op2 = simplify_gen_subreg (d->mode, op2, quadmode, 0);
+op2 = simplify_gen_subreg (d->mode, op2, quadmode,
+			   subreg_lowpart_offset (d->mode, quadmode));
   else
 {
   /* Select the upper 64 bits, either a V2SF or V4HF, this however
@@ -1460,15 +1460,24 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
 	 gen_highpart_mode generates code that isn't optimal.  */
   rtx temp1 = gen_reg_rtx (d->mode);
   rtx temp2 = gen_reg_rtx (DImode);
-  temp1 = simplify_gen_subreg (d->mode, op2, quadmode, 0);
+  temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
+   subreg_lowpart_offset (d->mode, quadmode));
   temp1 = simplify_gen_subreg (V2DImode, temp1, d->mode, 0);
-  emit_insn (gen_aarch64_get_lanev2di (temp2, temp1 , const1_rtx));
+  if (BYTES_BIG_ENDIAN)
+	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
+  else
+	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const1_rtx));
   op2 = simplify_gen_subreg (d->mode, temp2, GET_MODE (temp2), 0);
 
   /* And recalculate the index.  */
   lane -= nunits / 4;
 }
 
+  /* Keep to GCC-vector-extension lane indices in the RTL, only nunits / 4
+ (max nunits in range check) are valid.  Which means only 0-1, so we
+ only need to know the order in a V2mode.  */
+  lane_idx = aarch64_endian_lane_rtx (V2DImode, lane);
+
   if (!target)
 target = gen_reg_rtx (d->mode);
   else
@@ -1477,8 +1486,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   rtx pat = NULL_RTX;
 
   if (d->lane)
-pat = GEN_FCN (d->icode) (target, op0, op1, op2,
-			  gen_int_mode (lane, SImode));
+pat = GEN_FCN (d->icode) (target, op0, op1, op2, lane_idx);
   else
 pat = GEN_FCN (d->icode) (target, op0, op1, op2);
 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index be6c27d319a1ca6fee581d8f8856a4dff8f4a060..805d7a895fad4c7370260fd77ef9864805206b07 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -455,7 +455,10 @@
    (match_operand:SI 4 "const_int_operand" "n")]
    FCMLA)))]
   "TARGET_COMPLEX"
-  "fcmla\t%0., %2., %3., #"
+{
+  operands[4] = aarch64_endian_lane_rtx (mode, INTVAL (operands[4]));
+  return "fcmla\t%0., %2., %3., #";
+}
   [(set_attr "type" "neon_fcmla")]
 )
 
@@ -467,7 +470,10 @@
  (match_operand:SI 4 "const_int_operand" "n")]
  FCMLA)))]
   "TARGET_COMPLEX"
-  "fcmla\t%0.4h, %2.4h, %3.h[%4], #"
+{
+  operands[4] = aarch64_endian_lane_rtx (V4HFmode, INTVAL (operands[4]));
+  return "fcmla\t%0.4h, %2.4h, %3.h[%4], #";
+}
   [(set_attr "type" "neon_fcmla")]
 )
 
@@ -479,7 +485,12 @@
  (match_operand:SI 4 "const_int_operand" "n")]
  FCMLA)))]
   "TARGET_COMPLEX"
-  "fcmla\t%0., %2., %3., #"
+{
+  int nunits = GET_MODE_NUNITS (mode).to_constant ();
+  operands[4]
+= gen_int_mode (ENDIAN_LANE_N (nunits / 2, INTVAL 

Re: [PATCH, OpenACC] Properly handle wait clause with no arguments

2019-01-14 Thread Chung-Lin Tang

Hi Thomas,
this version of the wait-clause-with-no-args patch revises the following:

(1) The way the Fortran FE parts are implemented, which essentially is your
code. (I'll reflect that in the final ChangeLog)

(2) Instead of trying to encode ACC_ASYNC_NOVAL into num_waits, I've followed
your suggestion to just treat it as a normal async. This means the
gcc/omp-expand.c parts in the last patch are discarded.

(3) Things in oacc-parallel.c have been mostly adjusted to only handle the
wait(ACC_ASYNC_NOVAL) case inside goacc_wait().

Hope this is now okay for trunk when appropriate.

Thanks,
Chung-Lin
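
The runtime side of point (3) can be modelled roughly as below.  This is a
simplified sketch, not the libgomp code: classify_wait and the wait_action
enum are hypothetical, the constant values are assumptions made for the
example, and the real goacc_wait() performs the waits rather than returning a
classification.

```c
/* Stand-ins for the OpenACC async constants; the values are
   assumptions made for this sketch.  */
enum { ASYNC_NOVAL = -1, ASYNC_SYNC = -2 };

enum wait_action { WAIT_NOTHING, WAIT_ALL, WAIT_ALL_ASYNC, WAIT_LISTED };

/* A bare `wait' clause arrives as a single ASYNC_NOVAL argument and
   means `wait on everything': synchronously when the construct itself
   is synchronous, otherwise queued on the construct's async queue.
   Any other non-empty list means waiting on the listed queues.  */
static enum wait_action
classify_wait (int async, int num_waits, const int *qids)
{
  for (int i = 0; i < num_waits; i++)
    if (qids[i] == ASYNC_NOVAL)
      return async == ASYNC_SYNC ? WAIT_ALL : WAIT_ALL_ASYNC;

  return num_waits > 0 ? WAIT_LISTED : WAIT_NOTHING;
}
```

Encoding the bare clause as one sentinel argument keeps num_waits meaningful,
which is why the compiler side only has to build a single wait clause with
the NOVAL constant.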
Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 267913)
+++ gcc/c/c-parser.c(working copy)
@@ -13410,7 +13410,7 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 }
 
 /* OpenACC:
-   wait ( int-expr-list ) */
+   wait [( int-expr-list )] */
 
 static tree
 c_parser_oacc_clause_wait (c_parser *parser, tree list)
@@ -13419,7 +13419,15 @@ c_parser_oacc_clause_wait (c_parser *parser, tree
 
   if (c_parser_peek_token (parser)->type == CPP_OPEN_PAREN)
 list = c_parser_oacc_wait_list (parser, clause_loc, list);
+  else
+{
+  tree c = build_omp_clause (clause_loc, OMP_CLAUSE_WAIT);
 
+  OMP_CLAUSE_DECL (c) = build_int_cst (integer_type_node, GOMP_ASYNC_NOVAL);
+  OMP_CLAUSE_CHAIN (c) = list;
+  list = c;
+}
+
   return list;
 }
 
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c (revision 267913)
+++ gcc/cp/parser.c (working copy)
@@ -32815,7 +32815,7 @@ cp_parser_oacc_wait_list (cp_parser *parser, locat
 }
 
 /* OpenACC:
-   wait ( int-expr-list ) */
+   wait [( int-expr-list )] */
 
 static tree
 cp_parser_oacc_clause_wait (cp_parser *parser, tree list)
@@ -32822,10 +32822,16 @@ cp_parser_oacc_clause_wait (cp_parser *parser, tre
 {
   location_t location = cp_lexer_peek_token (parser->lexer)->location;
 
-  if (cp_lexer_peek_token (parser->lexer)->type != CPP_OPEN_PAREN)
-return list;
+  if (cp_lexer_peek_token (parser->lexer)->type == CPP_OPEN_PAREN)
+list = cp_parser_oacc_wait_list (parser, location, list);
+  else
+{
+  tree c = build_omp_clause (location, OMP_CLAUSE_WAIT);
 
-  list = cp_parser_oacc_wait_list (parser, location, list);
+  OMP_CLAUSE_DECL (c) = build_int_cst (integer_type_node, GOMP_ASYNC_NOVAL);
+  OMP_CLAUSE_CHAIN (c) = list;
+  list = c;
+}
 
   return list;
 }
Index: gcc/fortran/openmp.c
===
--- gcc/fortran/openmp.c(revision 267913)
+++ gcc/fortran/openmp.c(working copy)
@@ -1885,7 +1885,19 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const
  break;
}
  else if (m == MATCH_NO)
-   needs_space = true;
+   {
+ gfc_expr *expr
+   = gfc_get_constant_expr (BT_INTEGER,
+gfc_default_integer_kind,
+&gfc_current_locus);
+ mpz_set_si (expr->value.integer, GOMP_ASYNC_NOVAL);
+ gfc_expr_list **expr_list = &c->wait_list;
+ while (*expr_list)
+   expr_list = &(*expr_list)->next;
+ *expr_list = gfc_get_expr_list ();
+ (*expr_list)->expr = expr;
+ needs_space = true;
+   }
  continue;
}
  if ((mask & OMP_CLAUSE_WORKER)
Index: libgomp/oacc-parallel.c
===
--- libgomp/oacc-parallel.c (revision 267913)
+++ libgomp/oacc-parallel.c (working copy)
@@ -206,9 +206,7 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (voi
case GOMP_LAUNCH_WAIT:
  {
unsigned num_waits = GOMP_LAUNCH_OP (tag);
-
-   if (num_waits)
- goacc_wait (async, num_waits, &ap);
+   goacc_wait (async, num_waits, &ap);
break;
  }
 
@@ -514,13 +512,20 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum,
 static void
 goacc_wait (int async, int num_waits, va_list *ap)
 {
-  struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev = thr->dev;
-
   while (num_waits--)
 {
   int qid = va_arg (*ap, int);
-  
+
+  /* Waiting on ACC_ASYNC_NOVAL maps to 'wait all'.  */
+  if (qid == acc_async_noval)
+   {
+ if (async == acc_async_sync)
+   acc_wait_all ();
+ else
+   acc_wait_all_async (async);
+ break;
+   }
+
   if (acc_async_test (qid))
continue;
 
@@ -531,7 +536,7 @@ goacc_wait (int async, int num_waits, va_list *ap)
launching on, the queue itself will order work as
required, so there's no need to wait explicitly.  */
   else
-   

Re: [PATCH v3 00/10] AMD GCN Port v3

2019-01-14 Thread Andrew Stubbs

On 11/01/2019 23:19, Jeff Law wrote:

And I think the V3 patch is reasonable enough to go in now.  There's
some concerns that have been raised with the implementation, but I'm
comfortable with Andrew faulting in fixes if those concerns turn into
real issues.

Andrew, you're green-lighted for the trunk.


Excellent!

Thank you very much Jeff. :-)

I will now rebase, retest, change all the dates to 2019, and get it 
committed.


I shall follow up with the various documentation adjustments in the next 
week or so, I hope.


Many thanks

Andrew



Re: PATCH: Add -Waddress-of-packed-member to GCC 9 porting guide

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 2:46 PM H.J. Lu  wrote:
>
> This patch adds -Waddress-of-packed-member to GCC 9 porting guide.
>
> OK to install?

The docs fail to mention what to do when the unaligned pointer is _not_
safe to use.  That is, how do I fix

struct { char c; int i[4]; } s __attribute__((packed));
int foo()
{
  int *p = s.i;
  return bar (p);
}
int bar (int *q)
{
  return *q;
}

for the cases where eliding the pointer isn't easily possible?

Please also mention the new warning in changes.html
(it seems to be enabled by default even?).

IIRC the frontends themselves build "bogus" pointer types
to aligned data from a simple [1] because the FIELD_DECLs
types are naturally aligned.

Richard.
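
One common way to handle the case asked about above, sketched here as an
illustration rather than as the documented answer, is to copy the packed data
into a naturally aligned local and pass a pointer to that copy.  The names
packed_s, foo and bar mirror the example; placing the attribute on the type
and the use of memcpy are choices made for this sketch, and the callee then
sees a copy, so writes through the pointer do not reach the packed struct.

```c
#include <stddef.h>
#include <string.h>

/* Packed type modelled on the example above: `i' starts at offset 1.  */
struct packed_s { char c; int i[4]; } __attribute__((packed));

static int
bar (const int *q)
{
  return *q;
}

/* Copy the unaligned array into an aligned temporary, then hand out a
   pointer to the temporary instead of to the packed member.  */
static int
foo (const struct packed_s *s)
{
  int aligned[4];
  memcpy (aligned, (const char *) s + offsetof (struct packed_s, i),
	  sizeof aligned);
  return bar (aligned);
}
```

If the callee has to modify the data, the copy would need to be written back
with a second memcpy after the call.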

> Thanks.
>
> H.J.
> ---
> Index: gcc-9/porting_to.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/porting_to.html,v
> retrieving revision 1.1
> diff -u -r1.1 porting_to.html
> --- gcc-9/porting_to.html   11 Jan 2019 18:21:45 -  1.1
> +++ gcc-9/porting_to.html   14 Jan 2019 13:46:07 -
> @@ -56,13 +56,36 @@
>}
>
>
> +C/C++ language issues
> +
> +-Waddress-of-packed-member
> +is enabled by default
> +
> +
> +  When address of packed member of struct or union is taken, it may result
> +  in an unaligned pointer value.  A new warning
> +  -Waddress-of-packed-member was added to check alignment at
> +  pointer assignment.  It warns both unaligned address and unaligned
> +  pointer.
> +
> +
> +
> +  If the pointer value is safe to use, you can suppress
> +  -Waddress-of-packed-member warnings by using pragmas:
> +
> +  
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
> +/* (code for which the warning is to be disabled)  */
> +#pragma GCC diagnostic pop
> +  
> +
>  
>
>  
>
>  

PATCH: Add -Waddress-of-packed-member to GCC 9 porting guide

2019-01-14 Thread H.J. Lu
This patch adds -Waddress-of-packed-member to GCC 9 porting guide.

OK to install?

Thanks.

H.J.
---
Index: gcc-9/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/porting_to.html,v
retrieving revision 1.1
diff -u -r1.1 porting_to.html
--- gcc-9/porting_to.html   11 Jan 2019 18:21:45 -  1.1
+++ gcc-9/porting_to.html   14 Jan 2019 13:46:07 -
@@ -56,13 +56,36 @@
   }
   
 
+C/C++ language issues
+
+-Waddress-of-packed-member
+is enabled by default
+
+
+  When address of packed member of struct or union is taken, it may result
+  in an unaligned pointer value.  A new warning
+  -Waddress-of-packed-member was added to check alignment at
+  pointer assignment.  It warns both unaligned address and unaligned
+  pointer.
+
+
+
+  If the pointer value is safe to use, you can suppress
+  -Waddress-of-packed-member warnings by using pragmas:
+
+  
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Waddress-of-packed-member"
+/* (code for which the warning is to be disabled)  */
+#pragma GCC diagnostic pop
+  
+
 
 
 
  
 

Re: [committed][PATCH][GCC][testsuite][Arm] fix testism, add required option after require.

2019-01-14 Thread Christophe Lyon
Hi Tamar,

On Fri, 11 Jan 2019 at 15:22, Tamar Christina  wrote:
>
> Hi All,
>
> The test declared the fp16 requirement, but didn't add the options causing
> it to fail when the target doesn't have it on by default.
>
> Bootstrapped Regtested on arm-none-Linux-gnueabihf and no issues.
>
> committed under the gcc obvious rules.
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
> 2019-01-11  Tamar Christina  
>
> * gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Require neon
> and add options.
>

Thanks for this patch.

However, the scan-assembler-times part of the test still fails on armeb:
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#180 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#270 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\], #0
1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\],
#180 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\],
#270 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\td[0-9]+, d[0-9]+, d[0-9]+\\[1\\], #90
1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#180 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#270 3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90
3
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\], #0
1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\],
#180 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\],
#270 1
gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
scan-assembler-times vcmla.f16\\tq[0-9]+, q[0-9]+, d[0-9]+\\[1\\], #90
1

But you are probably already aware of that.

Christophe


> --


[PATCH] Improve match.pd dumping

2019-01-14 Thread Richard Biener


This distinguishes (match ...) from (simplify ...) where the former
doesn't really mean we apply some pattern but rather we have matched
some expression.

Committed as obvious.

Richard.

2019-01-14  Richard Biener  

* genmatch.c (dt_simplify::gen_1): Change dumping dependent on
whether we are in (simplify ...) or (match ...) context.

diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 5edd39af4cc..7b9b09c7d8b 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -3311,7 +3311,9 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
 }
 
   fprintf_indent (f, indent, "if (__builtin_expect (dump_file && (dump_flags & TDF_FOLDING), 0)) "
-  "fprintf (dump_file, \"Applying pattern ");
+  "fprintf (dump_file, \"%s ",
+  s->kind == simplify::SIMPLIFY
+  ? "Applying pattern" : "Matching expression");
   fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
   output_line_directive (f,
 result ? result->location : s->match->location, true,


Re: [PATCH, testsuite] Skip new charset tests on Darwin8-10.

2019-01-14 Thread Jonathan Wakely

On 12/01/19 16:46 +, Iain Sandoe wrote:

Hi,

These earlier Darwin versions have “FP_≈” inside a comment in 
architecture/{ppc,i386}/math.h, which is included by math.h which causes the 
tests to fail.

The intent of the tests (i.e. to ensure that the library itself does not emit 
non-ascii) is covered by other platforms, including later Darwin editions.  
AFAICT, this issue was fixed from Darwin11 onwards (although I have not tested 
every edition / looked for other possible non-ascii cases, in other headers).

Since there’s no expectation that the headers would ever be updated, and it 
doesn’t seem worth applying fixincludes for this, let’s skip the tests on 
versions with the issue.

Tested on powerpc-darwin9, x86_64-darwin10 and x86_64-darwin18.

OK for trunk?


OK, thanks.



Re: [PATCH] Improve RTL DSE with -fstack-protector* (PR rtl-optimization/88796)

2019-01-14 Thread Richard Biener
On Fri, 11 Jan 2019, Jakub Jelinek wrote:

> On Fri, Jan 11, 2019 at 01:53:21PM +0100, Richard Biener wrote:
> > >The canary slot in the stack frame is written in the prologue using
> > >MEM_VOLATILE_P store, so we never consider those to be DSEd and is only
> > >read in the epilogue, so it shouldn't alias any other stores.
> > >Similarly, __stack_chk_guard variable or say the TLS ssp slot or whatever
> > >else is used to hold the random pointer-sized value really shouldn't be
> > >changed in -fstack-protector* instrumented functions, as that would mean
> > >they remembered one value in the prologue and would fail comparison in the
> > >epilogue if it changed in between.  So, I believe we can safely ignore the
> > >whole stack_protect_test instruction in RTL DSE.
> > >
> > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Isn't it enough to have the decl marked DECL_NONALIASED?  Alias analysis
> > should not consider any address aliasing this (well, any with a mem_expr I
> > guess).
> 
> No.  RTL DSE gives up completely in all MEM_VOLATILE_P reads.
>   if ((MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
>       || (MEM_VOLATILE_P (mem)))
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, " adding wild read, volatile or barrier.\n");
>       add_wild_read (bb_info);
>       insn_info->cannot_delete = true;
>       return;
>     }
> so it doesn't make it into the alias oracle in any way, no idea why this has
> been added in that form, seems to be a big hammer to me, but it is like that
> (we obviously shouldn't try to replace_read those, but otherwise, I'd say
> that whether a volatile or non-volatile read kills some store or not doesn't
> really depend on whether it is volatile or not, but on the address;
> I guess stage4 isn't the right time to change that though, it is this way
> since r123530 when dse.c has been added).
> 
> Furthermore, the MEM_EXPR isn't always a DECL on which DECL_NONALIASED could
> be applied, e.g. on x86_64-linux it is a MEM_REF built for the TLS memory slot.
> Those were killing all the stores too.

Ah, OK.

Well, the patch is OK then I suppose.

Thanks,
Richard.


Re: C++ PATCH for c++/88830 - ICE with abstract class

2019-01-14 Thread Jakub Jelinek
On Sun, Jan 13, 2019 at 09:07:00PM -0500, Marek Polacek wrote:
> diff --git gcc/cp/decl2.c gcc/cp/decl2.c
> index e4cf4e0a361..7b656712471 100644
> --- gcc/cp/decl2.c
> +++ gcc/cp/decl2.c
> @@ -2229,7 +2229,8 @@ maybe_emit_vtables (tree ctype)
>   never get generated.  */
>if (CLASSTYPE_PURE_VIRTUALS (ctype)
>&& TYPE_HAS_NONTRIVIAL_DESTRUCTOR (ctype)
> -  && DECL_DEFAULTED_IN_CLASS_P(CLASSTYPE_DESTRUCTOR(ctype)))
> +  && !CLASSTYPE_LAZY_DESTRUCTOR (ctype)
> +  && DECL_DEFAULTED_IN_CLASS_P (CLASSTYPE_DESTRUCTOR (ctype)))
>  note_vague_linkage_fn (CLASSTYPE_DESTRUCTOR(ctype));

Just a formatting nit.  s/CLASSTYPE_DESTRUCTOR/& / on the above line too
when you are at it.  Otherwise I came up with identical patch to yours
(should have noticed the PR is ASSIGNED :( ).

Jakub


Re: [PATCH 10/10] libiberty: Correct an invalid assumption

2019-01-14 Thread Iain Buclaw
On Fri, 11 Jan 2019 at 01:20, Ben L  wrote:
>
> Hi all,
>
> First time emailing gcc-patches, so I'm sorry if I get any of this wrong or if
> there's obvious errors repeated in my patches. AFAICT I should be sending each
> change individually rather than as one bulk patch, so I'm sorry about the spam
> too.
>
> All of these changes were found by fuzzing libiberty's demanglers over the
> past week, and I have at least one more that it's currently crashing out on
> but I haven't had time to look into why yet.
>
> Obviously since this is my first time emailing I don't have write access to
> commit any of these, so if any are approved then I'd be grateful if you can
> commit them too.
>
> Thanks,
> Ben
>
> --
>
> As a counter example: 888 * 10 = -3344831479658869200, which is
> valid for 64 bit longs, and evidently divisible by 10.
>
> Also safely check that adding the digit won't cause an overflow too.
>
> No testcase provided since one of the previous testcases flagged this issue up.
>
>  * d-demangle.c: Include  if available.
>  (LONG_MAX): Define if necessary.
>  (dlang_number): Fix overflow.
>

Thanks, do you have a copyright assignment with the FSF?

Looks like the D demangling bits can just be committed as one patch,
just one nit though.

---
> @@ -206,15 +213,18 @@ dlang_number (const char *mangled, long *ret)
>
>   while (ISDIGIT (*mangled))
> {
> +  long digit = mangled[0] - '0';
> +  mangled++;
> +
> +  if (*ret > LONG_MAX / 10)
> +   return NULL;
> +
>   (*ret) *= 10;
>
> -  /* If an overflow occured when multiplying by ten, the result
> -will not be a multiple of ten.  */
> -  if ((*ret % 10) != 0)
> +  if (LONG_MAX - digit < *ret)
>return NULL;
>
> -  (*ret) += mangled[0] - '0';
> -  mangled++;
> +  (*ret) += digit;
>  }
>
>if (*mangled == '\0' || *ret < 0)
---

Rather than checking for overflow twice, I think it would be
sufficient to just do:
---
long digit = mangled[0] - '0';

if (*ret > ((LONG_MAX - digit) / 10))
  return NULL;

(*ret) *= 10;
(*ret) += digit;
mangled++;
---

-- 
Iain
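
The single check suggested above works because, for non-negative values,
ret * 10 + digit <= LONG_MAX holds exactly when
ret <= (LONG_MAX - digit) / 10 (integer division). A minimal self-contained
sketch of the loop under that assumption follows. parse_decimal is a
hypothetical stand-in written for illustration, not the dlang_number function
from d-demangle.c.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: parse the leading decimal digits of S into *RET.
// Returns a pointer just past the digits, or nullptr if S does not start
// with a digit or the value would not fit in a long.
inline const char *
parse_decimal (const char *s, long *ret)
{
  if (s == nullptr || *s < '0' || *s > '9')
    return nullptr;

  *ret = 0;
  while (*s >= '0' && *s <= '9')
    {
      long digit = *s - '0';

      // One test covers both the multiplication and the addition:
      // *ret * 10 + digit <= LONG_MAX  <=>  *ret <= (LONG_MAX - digit) / 10.
      if (*ret > (LONG_MAX - digit) / 10)
        return nullptr;

      *ret = *ret * 10 + digit;
      ++s;
    }
  return s;
}
```

Because the test runs before any arithmetic, the signed multiplication never
overflows, so the result does not depend on wrap-around behaviour.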


Re: [PATCH 3/3][GCC][AARCH64] Add support for pointer authentication B key

2019-01-14 Thread Kyrill Tkachov



On 08/01/19 11:38, Sam Tebbs wrote:


On 1/7/19 6:28 PM, James Greenhalgh wrote:
> On Fri, Dec 21, 2018 at 09:00:10AM -0600, Sam Tebbs wrote:
>> On 11/9/18 11:04 AM, Sam Tebbs wrote:
>
> 
>
>> Attached is an improved patch with "hint" removed from the test scans,
>> pauth_hint_num_a and pauth_hint_num_b merged into pauth_hint_num and the
>> "gcc_assert (cfun->machine->frame.laid_out)" removal reverted since it was
>> an unnecessary change.
>>
>> OK for trunk?
> While the AArch64 parts look OK to me and are buried behind an option so are
> relatively safe even though we're late in development, you'll need someone
> else to approve the libgcc changes. Especially as you change a generic
> routine with an undocumented (?) AArch64-specific change.
>
> Thanks,
> James

Thanks James, CC'ing Ian Lance Taylor.



Jeff, could you help with reviewing the libgcc changes please?
I believe the latest version was posted at:
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01569.html

Thanks,
Kyrill


The documentation relevant to the libgcc change is expected to be
published in the near future.

>
>> gcc/
>> 2018-12-21  Sam Tebbs
>>
>>   * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add
>>   AARCH64_PAUTH_BUILTIN_AUTIB1716 and AARCH64_PAUTH_BUILTIN_PACIB1716.
>>   * config/aarch64/aarch64-builtins.c (aarch64_init_pauth_hint_builtins):
>>   Add autib1716 and pacib1716 initialisation.
>>   * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin): Add checks
>>   for autib1716 and pacib1716.
>>   * config/aarch64/aarch64-protos.h (aarch64_key_type,
>>   aarch64_post_cfi_startproc): Define.
>>   * config/aarch64/aarch64-protos.h (aarch64_ra_sign_key): Define extern.
>>   * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): Add
>>   check for b-key.
>>   * config/aarch64/aarch64.c (aarch64_ra_sign_key,
>>   aarch64_post_cfi_startproc, aarch64_handle_pac_ret_b_key): Define.
>>   * config/aarch64/aarch64.h (TARGET_ASM_POST_CFI_STARTPROC): Define.
>>   * config/aarch64/aarch64.c (aarch64_pac_ret_subtypes): Add "b-key".
>>   * config/aarch64/aarch64.md (unspec): Add UNSPEC_AUTIA1716,
>>   UNSPEC_AUTIB1716, UNSPEC_AUTIASP, UNSPEC_AUTIBSP, UNSPEC_PACIA1716,
>>   UNSPEC_PACIB1716, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>   * config/aarch64/aarch64.md (do_return): Add check for b-key.
>>   * config/aarch64/aarch64.md (sp): Replace
>>   pauth_hint_num_a with pauth_hint_num.
>>   * config/aarch64/aarch64.md (1716): Replace
>>   pauth_hint_num_a with pauth_hint_num.
>>   * config/aarch64/aarch64.opt (msign-return-address=): Deprecate.
>>   * config/aarch64/iterators.md (PAUTH_LR_SP): Add UNSPEC_AUTIASP,
>>   UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>   * config/aarch64/iterators.md (PAUTH_17_16): Add UNSPEC_AUTIA1716,
>>   UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716.
>>   * config/aarch64/iterators.md (pauth_mnem_prefix): Add UNSPEC_AUTIA1716,
>>   UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716, UNSPEC_AUTIASP,
>>   UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>   * config/aarch64/iterators.md (pauth_hint_num_a): Replace
>>   UNSPEC_PACI1716 and UNSPEC_AUTI1716 with UNSPEC_PACIA1716 and
>>   UNSPEC_AUTIA1716 respectively.
>>   * config/aarch64/iterators.md (pauth_hint_num_a): Rename to pauth_hint_num
>>   and add UNSPEC_PACIBSP, UNSPEC_AUTIBSP, UNSPEC_PACIB1716, UNSPEC_AUTIB1716.
>>
>> gcc/testsuite
>> 2018-12-21  Sam Tebbs
>>
>>   * gcc.target/aarch64/return_address_sign_1.c (dg-final): Replace
>>   "autiasp" and "paciasp" with "hint\t29 // autisp" and
>>   "hint\t25 // pacisp" respectively.
>>   * gcc.target/aarch64/return_address_sign_2.c (dg-final): Replace
>>   "paciasp" with "hint\t25 // pacisp".
>>   * gcc.target/aarch64/return_address_sign_3.c (dg-final): Replace
>>   "paciasp" and "autiasp" with "pacisp" and "autisp" respectively.
>>   * gcc.target/aarch64/return_address_sign_b_1.c: New file.
>>   * gcc.target/aarch64/return_address_sign_b_2.c: New file.
>>   * gcc.target/aarch64/return_address_sign_b_3.c: New file.
>>   * gcc.target/aarch64/return_address_sign_b_exception.c: New file.
>>   * gcc.target/aarch64/return_address_sign_builtin.c: New file
>>
>> libgcc/
>> 2018-12-21  Sam Tebbs
>>
>>   * config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key): New
>>   function.
>>   * config/aarch64/aarch64-unwind.h (aarch64_post_extract_frame_addr,
>>   aarch64_post_frob_eh_handler_addr): Add check for b-key.
>>   * unwind-dw2-fde.c (get_cie_encoding): Add check for 'B' in augmentation
>>   string.
>>   * unwind-dw2.c (extract_cie_info): Add check for 'B' in augmentation
>>   string.
>>




Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 10:21 AM Jonathan Wakely  wrote:
>
> On Mon, 14 Jan 2019 at 09:17, Richard Biener  
> wrote:
> >
> > On Mon, Jan 14, 2019 at 9:42 AM Jakub Jelinek  wrote:
> > >
> > > On Mon, Jan 14, 2019 at 09:29:03AM +0100, Richard Biener wrote:
> > > > So why is this not just
> > > >
> > > >   return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
> > > >
> > > > or with the casts elided?  Does the C++ standard say pointers are
> > > > to be compared unsigned here?  Or do all targets GCC support
> > > > lay out the address space in a way that this is correct for pointers
> > > > into distinct objects?
> > >
> > > See PR78420 for details on why it is done that way.
> >
> > I see.  So the __builtin_is_constant_evaluated thing makes it
> > "correct" (but then eventually exposing the non-total order issue again).
>
> No, because comparing unrelated pointers isn't allowed in constexpr
> contexts, so it just gets rejected at compile time.

I see.

>
> > And if I read the PR correctly we'd really like to be able to write
> >
> >  if (__builtin_constant_p (, ))
> >return result;
> >
> > to make sure whatever undesired-in-the-IL things of 
> > do not leak there.
> >
> > Btw, wouldn't sth like
> >
> >   if (__builtin_is_constant_evaluated())
> > {
> >union U { __UINTPTR_TYPE__ u; _Tp *p } __ux, __uy;
> >__ux.p = __x;
> >__uy.p = __y;
> >return __ux.u < __uy.u;
> > }
> >
> > be more correct and consistent?  Well, or any other way of
> > evading that reinterpret-cast "issue"?
> >
> > Richard.
> >
> >
> > >
> > > Jakub


[PATCH, d] Committed merge with upstream dmd

2019-01-14 Thread Iain Buclaw
Hi,

This patch merges the D front-end implementation with dmd upstream cd2034cd7.

It includes one fix in the asm statement parser to stop parsing once the end
of the statement has been reached, and moves all inline asm tests to gdc.dg,
adjusting them where necessary to test the GCC style instead.

Bootstrapped and tested on x86_64-linux-gnu.

Committed to trunk as r267913.
-- 
Iain
---
gcc/testsuite/ChangeLog:

2019-01-14  Iain Buclaw  

* gdc.dg/asm1.d: New test.
* gdc.dg/asm2.d: New test.
* gdc.dg/asm3.d: New test.
* gdc.dg/asm4.d: New test.
* lib/gdc.exp (gdc_init): Set gcc_error_prefix and gcc_warning_prefix.
---
diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index feb65923273..a3b2db74af4 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-6d5b853d30908638d49210ebe600917296b8ab9b
+cd2034cd7b157dd8f3e94c684061bb1aa630b2b6
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/iasmgcc.c b/gcc/d/dmd/iasmgcc.c
index 3c0494d5717..cecbdefe41a 100644
--- a/gcc/d/dmd/iasmgcc.c
+++ b/gcc/d/dmd/iasmgcc.c
@@ -224,7 +224,7 @@ Lerror:
 static GccAsmStatement *parseGccAsm(Parser *p, GccAsmStatement *s)
 {
 s->insn = p->parseExpression();
-if (p->token.value == TOKsemicolon)
+if (p->token.value == TOKsemicolon || p->token.value == TOKeof)
 goto Ldone;
 
 // No semicolon followed after instruction template, treat as extended asm.
@@ -254,7 +254,7 @@ static GccAsmStatement *parseGccAsm(Parser *p, GccAsmStatement *s)
 assert(0);
 }
 
-if (p->token.value == TOKsemicolon)
+if (p->token.value == TOKsemicolon || p->token.value == TOKeof)
 goto Ldone;
 }
 Ldone:
@@ -288,6 +288,7 @@ Statement *gccAsmSemantic(GccAsmStatement *s, Scope *sc)
 *ptoklist = NULL;
 }
 p.token = *toklist;
+p.scanloc = s->loc;
 
 // Parse the gcc asm statement.
 s = parseGccAsm(, s);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d0611e3bc37..373f39d2a8b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,11 @@
+2019-01-14  Iain Buclaw  
+
+	* gdc.dg/asm1.d: New test.
+	* gdc.dg/asm2.d: New test.
+	* gdc.dg/asm3.d: New test.
+	* gdc.dg/asm4.d: New test.
+	* lib/gdc.exp (gdc_init): Set gcc_error_prefix and gcc_warning_prefix.
+
 2019-01-13  Jerry DeLisle  
 
 	PR libfortran/88776
diff --git a/gcc/testsuite/gdc.dg/asm1.d b/gcc/testsuite/gdc.dg/asm1.d
new file mode 100644
index 000..7b00e4d54ec
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/asm1.d
@@ -0,0 +1,82 @@
+// { dg-do compile }
+module asm1;
+
+void parse1()
+{
+asm
+{
+""h;// { dg-error "found 'h' when expecting ':'" }
+}
+}
+
+void parse2()
+{
+asm 
+{
+"" : : "g" 1 ? 2 : 3;
+"" : : "g" 1 ? 2 : : 3;
+// { dg-error "expression expected, not ':'" "" { target *-*-* } .-1 }
+// { dg-error "expected constant string constraint for operand" "" { target *-*-* } .-2 }
+}
+}
+
+void parse3()
+{
+asm { "" [; }
+// { dg-error "expression expected, not ';'" "" { target *-*-* } .-1 }
+// { dg-error "found 'EOF' when expecting ','" "" { target *-*-* } .-2 }
+// { dg-error "found 'EOF' when expecting ']'" "" { target *-*-* } .-3 }
+// { dg-error "found 'EOF' when expecting ';'" "" { target *-*-* } .-4 }
+}
+
+void semantic1()
+{
+{
+int one;
+L1:
+;
+}
+asm { "" : : : : L1, L2; }
+// { dg-error "goto skips declaration of variable asm1.semantic1.one" "" { target *-*-* } .-1 }
+// { dg-error "goto skips declaration of variable asm1.semantic1.two" "" { target *-*-* } .-2 }
+{
+int two;
+L2:
+;
+}
+}
+
+void semantic2a(X...)(X expr)
+{
+alias X[0] var1;
+asm { "%0" : "=m" var1; }   // { dg-error "double 'double' is a type, not an lvalue" }
+}
+
+void semantic2()
+{
+   semantic2a(3.6); // { dg-error "template instance asm1.semantic2a!double error instantiating" }
+}
+
+void semantic3()
+{
+asm 
+{
+unknown;// { dg-error "undefined identifier" }
+}
+}
+
+struct S4
+{
+template opDispatch(string Name, P...)
+{
+static void opDispatch(P) {}
+}
+}
+
+void semantic4()
+{
+asm
+{
+"%0" : : "m" S4.foo;// { dg-error "template instance opDispatch!\"foo\" has no value" }
+}
+}
diff --git a/gcc/testsuite/gdc.dg/asm2.d b/gcc/testsuite/gdc.dg/asm2.d
new file mode 100644
index 000..bce0e41a60f
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/asm2.d
@@ -0,0 +1,8 @@
+// { dg-do compile }
+module asm2;
+
+void test()
+{
+asm const shared { }// { dg-error "const/immutable/shared/inout attributes are not allowed on asm blocks" }
+}
+
diff --git a/gcc/testsuite/gdc.dg/asm3.d b/gcc/testsuite/gdc.dg/asm3.d
new file mode 100644
index 000..333d83ec99b
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/asm3.d
@@ -0,0 +1,24 @@

Re: add tsv110 pipeline scheduling

2019-01-14 Thread Kyrill Tkachov

Hi Wuyuan,


On 13/01/19 09:36, wuyuan (E) wrote:

Hi Kyrill:
Thank you very much for reviewing my patch. I have modified the code
according to your comments.
First, mul64 was renamed to widen_mul64, and load_4 and load_8 are used for
loading 4 and 8 bytes, following the latest version of GCC. In addition, I
changed the reservation durations (the *16 part above) to 8. I tested
performance with some test cases and the results improved (will these changes
improve performance?).
Now the tsv110 automaton size is 8641 states. I don't know whether the code
modification is complete. If anything else needs to be changed, please let me
know. Thank you.


Thanks, that's a much better size.
I wouldn't expect the changes to change performance much, but it is possible.
One comment inline.



  2019-01-11  wuyuan  

     * config/aarch64/aarch64-cores.def (tsv110): Change scheduling model.
     * config/aarch64/aarch64.md: Add "tsv110.md".
 * config/aarch64/tsv110.md: New file.

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 70b0766..085c40f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -103,7 +103,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, cortexa72, 0x41, 0xd0c, -1)
  
  /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
  
  /* ARMv8.4-A Architecture Processors.  */
  
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md

index 513aec1..97e0703 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -356,6 +356,7 @@
  (include "thunderx.md")
  (include "../arm/xgene1.md")
  (include "thunderx2t99.md")
+(include "tsv110.md")
  
  ;; ---

  ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..e33c5cc
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+   

Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Jonathan Wakely
On Mon, 14 Jan 2019 at 09:17, Richard Biener  wrote:
>
> On Mon, Jan 14, 2019 at 9:42 AM Jakub Jelinek  wrote:
> >
> > On Mon, Jan 14, 2019 at 09:29:03AM +0100, Richard Biener wrote:
> > > So why is this not just
> > >
> > >   return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
> > >
> > > or with the casts elided?  Does the C++ standard say pointers are
> > > to be compared unsigned here?  Or do all targets GCC support
> > > lay out the address space in a way that this is correct for pointers
> > > into distinct objects?
> >
> > See PR78420 for details on why it is done that way.
>
> I see.  So the __builtin_is_constant_evaluated thing makes it
> "correct" (but then eventually exposing the non-total order issue again).
>
> And if I read the PR correctly we'd really like to be able to write
>
>  if (__builtin_constant_p (, ))
>return result;
>
> to make sure whatever undesired-in-the-IL things of 
> do not leak there.
>
> Btw, wouldn't sth like
>
>   if (__builtin_is_constant_evaluated())
> {
>union U { __UINTPTR_TYPE__ u; _Tp *p } __ux, __uy;
>__ux.p = __x;
>__uy.p = __y;
>return __ux.u < __uy.u;
> }
>
> be more correct and consistent?  Well, or any other way of
> evading that reinterpret-cast "issue"?

Also no, because you can't change the active member of a union in
constexpr (and can't type pun using unions either).


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Jonathan Wakely
On Mon, 14 Jan 2019 at 09:17, Richard Biener  wrote:
>
> On Mon, Jan 14, 2019 at 9:42 AM Jakub Jelinek  wrote:
> >
> > On Mon, Jan 14, 2019 at 09:29:03AM +0100, Richard Biener wrote:
> > > So why is this not just
> > >
> > >   return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
> > >
> > > or with the casts elided?  Does the C++ standard say pointers are
> > > to be compared unsigned here?  Or do all targets GCC support
> > > lay out the address space in a way that this is correct for pointers
> > > into distinct objects?
> >
> > See PR78420 for details on why it is done that way.
>
> I see.  So the __builtin_is_constant_evaluated thing makes it
> "correct" (but then eventually exposing the non-total order issue again).

No, because comparing unrelated pointers isn't allowed in constexpr
contexts, so it just gets rejected at compile time.


> And if I read the PR correctly we'd really like to be able to write
>
>  if (__builtin_constant_p (, ))
>return result;
>
> to make sure whatever undesired-in-the-IL things of 
> do not leak there.
>
> Btw, wouldn't sth like
>
>   if (__builtin_is_constant_evaluated())
> {
>union U { __UINTPTR_TYPE__ u; _Tp *p } __ux, __uy;
>__ux.p = __x;
>__uy.p = __y;
>return __ux.u < __uy.u;
> }
>
> be more correct and consistent?  Well, or any other way of
> evading that reinterpret-cast "issue"?
>
> Richard.
>
>
> >
> > Jakub


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Jonathan Wakely
On Mon, 14 Jan 2019 at 08:29, Richard Biener  wrote:
>
> On Thu, Jan 10, 2019 at 10:02 AM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > In Marc's testcase, we generate terrible code for std::string assignment,
> > because the __builtin_constant_p is kept in the IL for way too long and the
> > optimizers (jump threading?) create way too many copies of the
> > memcpy/memmove calls that it is then hard to bring it back to sanity.
> > On the testcase in the PR, GCC 7 emits on x86_64 with -O2 99 bytes long
> > function, GCC 9 unpatched 259 bytes long, with this patch it emits
> > 139 bytes long, better but still not as good as before.  I guess we'll need
> > to improve GIMPLE optimizers too, but having twice as small IL for these
> > heavily used operators where e.g. _M_disjunct uses two of them and we wind
> > up with twice as many branches because of that is IMHO very useful.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 1) I'm not really sure about proper formatting in libstdc++, I thought you
> >don't use space before ( in function calls, but then why is there a space
> >in __builtin_constant_p?
> > 2) not really sure about that #if __cplusplus >= 201402L either, I think we
> >don't really want to use __builtin_is_constant_evaluated at least in
> >C++98 code, but even in C++11, if the operator isn't constexpr, is there
> >any point trying to help it do the right thing in constexpr contexts?
> >
> > 2019-01-09  Jakub Jelinek  
> >
> > PR tree-optimization/88775
> > * include/bits/stl_function.h (greater<_Tp*>::operator(),
> > less<_Tp*>::operator(), greater_equal<_Tp*>::operator(),
> > less_equal<_Tp*>::operator()): Use __builtin_is_constant_evaluated
> > instead of __builtin_constant_p if available.  Don't bother with
> > the pointer comparison in C++11 and earlier.
> >
> > --- libstdc++-v3/include/bits/stl_function.h.jj 2019-01-01 12:45:51.182541077 +0100
> > +++ libstdc++-v3/include/bits/stl_function.h    2019-01-09 23:15:34.824800676 +0100
> > @@ -413,8 +413,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >_GLIBCXX14_CONSTEXPR bool
> >operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
> >{
> > +#if __cplusplus >= 201402L
> > +#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
> > +   if (__builtin_is_constant_evaluated())
> > +#else
> > if (__builtin_constant_p (__x > __y))
> > +#endif
> >   return __x > __y;
> > +#endif
> > return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
>
> I wonder what the idea behind this is.  It smells like trying to avoid
> undefined behavior (relational compare of pointers to different objects?)

That's not undefined in C++, it just gives an unspecified result (so
it's not specified whether x < y or y < x, or possibly even x == y,
e.g. for segmented memory).

The std::greater, std::less etc. function objects are required to give
a total order across all pointers, different objects or not, i.e.
while < might give any unspecified result, std::less has tighter
restrictions.

For that to work in general, we need the casts, or GCC's optimizers
give the wrong result. But within constexpr those functions only need
to be valid for related objects (as in C) and so we just use < there,
and rely on the compiler to reject comparisons to different objects
(because that's not allowed in constexpr).

So we can't just use < everywhere, because the optimizers don't like
it, and we can't use the cast everywhere, because that's not allowed
in constexpr.
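
The two-path approach described above can be sketched as follows. This is a
minimal illustration, not the libstdc++ code: ptr_less is a hypothetical
helper, and the sketch assumes C++14 or later and a compiler that provides
__builtin_is_constant_evaluated (GCC 9 or later).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only.
template <typename T>
constexpr bool
ptr_less (T *x, T *y) noexcept
{
  // During constant evaluation use the built-in operator; the compiler
  // rejects comparisons of pointers to unrelated objects there.
  if (__builtin_is_constant_evaluated ())
    return x < y;

  // At run time compare the integer values, so the optimizers cannot
  // assume anything about the result for pointers to distinct objects.
  return reinterpret_cast<std::uintptr_t> (x)
         < reinterpret_cast<std::uintptr_t> (y);
}

// Comparing elements of one array is valid in a constant expression.
constexpr int arr[2] = { 0, 0 };
static_assert (ptr_less (&arr[0], &arr[1]), "same-array comparison");
```

At run time exactly one of ptr_less (&a, &b) and ptr_less (&b, &a) is true
for two distinct objects a and b, which is the ordering property that
std::less has to provide.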


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 9:42 AM Jakub Jelinek  wrote:
>
> On Mon, Jan 14, 2019 at 09:29:03AM +0100, Richard Biener wrote:
> > So why is this not just
> >
> >   return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
> >
> > or with the casts elided?  Does the C++ standard say pointers are
> > to be compared unsigned here?  Or do all targets GCC support
> > lay out the address space in a way that this is correct for pointers
> > into distinct objects?
>
> See PR78420 for details on why it is done that way.

I see.  So the __builtin_is_constant_evaluated thing makes it
"correct" (but then eventually exposing the non-total order issue again).

And if I read the PR correctly we'd really like to be able to write

 if (__builtin_constant_p (, ))
   return result;

to make sure whatever undesired-in-the-IL things of 
do not leak there.

Btw, wouldn't sth like

  if (__builtin_is_constant_evaluated())
{
   union U { __UINTPTR_TYPE__ u; _Tp *p } __ux, __uy;
   __ux.p = __x;
   __uy.p = __y;
   return __ux.u < __uy.u;
}

be more correct and consistent?  Well, or any other way of
evading that reinterpret-cast "issue"?

Richard.


>
> Jakub


Re: [PATCH 2/3] Fix autoprofiledbootstrap

2019-01-14 Thread Bin.Cheng
On Mon, Jan 14, 2019 at 4:20 PM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> autoprofiledbootstrap fails currently with
>
> In file included from ../../gcc/gcc/hash-table.h:236,
>  from ../../gcc/gcc/coretypes.h:440,
>  from ../../gcc/gcc/ipa-devirt.c:110:
> In static member function 'static void va_heap::release(vec vl_embed>*&) [with T = tree_node*]',
> inlined from 'void vec::release() [with T = tree_node*]' at 
> ../../gcc/gcc/vec.h:1679:20,
> inlined from 'auto_vec::~auto_vec() [with T = tree_node*; long 
> unsigned int N = 8]' at ../../gcc/gcc/vec.h:1436:5,
> inlined from 'vec possible_polymorphic_call_targets(tree, 
> long int, ipa_polymorphic_call_context, bool*, void**, bool)' at 
> ../../gcc/gcc/ipa-devirt.c:3099:22:
> ../../gcc/gcc/vec.h:311:10: error: attempt to free a non-heap object 
> 'bases_to_consider' [-Werror=free-nonheap-object]
>   311 |   ::free (v);
>   |   ~~~^~~
> ../../gcc/gcc/vec.h:311:10: error: attempt to free a non-heap object 
> 'bases_to_consider' [-Werror=free-nonheap-object]
> cc1plus: all warnings being treated as errors
>
> The problem is that auto_vec uses a variable to keep track if the vector
> is on the heap or auto. Normally this gets constant resolved, but only
> when the right functions are inlined. With autofdo for some reason
> the compiler decides to not inline these vec functions, even though
> they are marked as "inline"
A comment not closely related to this patch: we observed the same
inlining behavior when perf data is inadequate, and sometimes it has a
non-trivial impact on kernel compilation.  We have a patch that falls
back to the guessed profile count if the profiled count is of low
quality.  Will send it out in GCC 10.

Thanks,
bin
>
> Mark them as ALWAYS_INLINE instead.
>
> gcc/:
>
> 2019-01-14  Andi Kleen  
>
> * vec.h (using_auto_storage, release): Mark as ALWAYS_INLINE.
> ---
>  gcc/vec.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/vec.h b/gcc/vec.h
> index 407269c5ad3..1f5b78b1fac 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -1664,7 +1664,7 @@ vec::create (unsigned nelems 
> MEM_STAT_DECL)
>  /* Free the memory occupied by the embedded vector.  */
>
>  template
> -inline void
> +ALWAYS_INLINE void
>  vec::release (void)
>  {
>if (!m_vec)
> @@ -1940,7 +1940,7 @@ vec::reverse (void)
>  }
>
>  template
> -inline bool
> +ALWAYS_INLINE bool
>  vec::using_auto_storage () const
>  {
>return m_vec->m_vecpfx.m_using_auto_storage;
> --
> 2.19.1
>
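
For context, the pattern under discussion can be sketched like this: a vector
starts out in an inline buffer and records in a flag whether it has moved to
the heap, and release () must consult that flag before calling free. This is
a much simplified, hypothetical stand-in for auto_vec (small_vec and
SKETCH_ALWAYS_INLINE are made-up names, and it assumes N > 0 and a trivially
copyable T), not the code in vec.h.

```cpp
#include <cassert>
#include <cstdlib>

#if defined(__GNUC__)
# define SKETCH_ALWAYS_INLINE inline __attribute__ ((always_inline))
#else
# define SKETCH_ALWAYS_INLINE inline
#endif

// Hypothetical simplified vector with inline storage for N elements.
template <typename T, unsigned N>
class small_vec
{
public:
  small_vec () : m_data (m_buf), m_len (0), m_cap (N), m_auto (true) {}
  ~small_vec () { release (); }
  small_vec (const small_vec &) = delete;
  small_vec &operator= (const small_vec &) = delete;

  // True while the elements still live in the inline buffer.
  SKETCH_ALWAYS_INLINE bool using_auto_storage () const { return m_auto; }

  // Free heap storage, if any, and fall back to the inline buffer.
  SKETCH_ALWAYS_INLINE void release ()
  {
    if (!using_auto_storage ())
      std::free (m_data);
    m_data = m_buf;
    m_len = 0;
    m_cap = N;
    m_auto = true;
  }

  void push (const T &v)
  {
    if (m_len == m_cap)
      grow ();
    m_data[m_len++] = v;
  }

  unsigned length () const { return m_len; }

private:
  // Double the capacity by moving the elements to the heap.
  void grow ()
  {
    unsigned cap = m_cap * 2;
    T *p = static_cast<T *> (std::malloc (cap * sizeof (T)));
    if (!p)
      std::abort ();
    for (unsigned i = 0; i < m_len; ++i)
      p[i] = m_data[i];
    if (!m_auto)
      std::free (m_data);
    m_data = p;
    m_cap = cap;
    m_auto = false;
  }

  T m_buf[N];
  T *m_data;
  unsigned m_len, m_cap;
  bool m_auto;
};
```

Forcing the two small accessors inline keeps the flag test visible at the call
site, which is what lets the compiler see that the inline buffer is never
passed to free.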


Re: [PATCH 3/3] Increase iterations for autofdo tests

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 9:20 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Bin Cheng pointed out that the autofdo tests are unstable because they
> don't have enough iterations for the perf sampling to get enough data.
>
> Increase the iterations, but only for autofdo. This avoids any impact
> on targets that use a slow emulator, which will never run the host-only
> autofdo tests.

Can you instead use something like AFDO_ITER_FACTOR, #defined to 1 if not
already defined?
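
To illustrate the idea, here is a minimal sketch, assuming the harness
would pass -DAFDO_ITER_FACTOR=<n> only for autofdo compilations.  The
macro name comes from the suggestion above; the run_loop helper, the
counter, and demo are hypothetical and not part of any test in the patch.

```cpp
#include <cassert>

// Scale factor for iteration counts; defaults to 1 so that ordinary
// (non-autofdo) runs keep their original loop bounds.
#ifndef AFDO_ITER_FACTOR
#define AFDO_ITER_FACTOR 1
#endif

static int g;

// Hypothetical stand-in for a test loop; BASE is the original bound.
static int run_loop (int base)
{
  g = 0;
  for (int i = 0; i < base * AFDO_ITER_FACTOR; i++)
    g++;
  return g;
}

// Hypothetical usage: the iteration count scales with the factor.
static void demo ()
{
  assert (run_loop (1000) == 1000 * AFDO_ITER_FACTOR);
}
```

With this shape each test keeps its own base count in the source, and
only the multiplier would change between normal and autofdo runs.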

> gcc/testsuite/:
>
> 2019-01-14  Andi Kleen  
>
> * g++.dg/tree-prof/morefunc.C (ITER): Add.
> (test1): Use.
> (test2): Use.
> * gcc.dg/tree-prof/cold_partition_label.c (ITER): Add.
> (main): Use.
> * gcc.dg/tree-prof/crossmodule-indircall-1.c (ITER): Add.
> (main): Use
> * gcc.dg/tree-prof/indir-call-prof.c (ITER): Add.
> (main): Use.
> * gcc.dg/tree-prof/peel-1.c (ITER): Add.
> (t): Use.
> (main): Use.
> * gcc.dg/tree-prof/pr52027.c (ITER): Add.
> (main): Use.
> * gcc.dg/tree-prof/tracer-1.c (ITER): Add.
> (main): Use.
> * gcc.dg/tree-prof/unroll-1.c (ITER): Add.
> (t): Use.
> (main): Use.
> * gcc.dg/tree-prof/update-cunroll-2.c (ITER): Add.
> (main): Use.
> * lib/profopt.exp: Pass -DITER to autofdo compilations.
> ---
>  gcc/testsuite/g++.dg/tree-prof/morefunc.C  |  8 ++--
>  gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c  |  6 +-
>  .../gcc.dg/tree-prof/crossmodule-indircall-1.c | 10 +++---
>  gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c   |  6 +-
>  gcc/testsuite/gcc.dg/tree-prof/peel-1.c| 10 +++---
>  gcc/testsuite/gcc.dg/tree-prof/pr52027.c   |  6 +-
>  gcc/testsuite/gcc.dg/tree-prof/tracer-1.c  |  7 ++-
>  gcc/testsuite/gcc.dg/tree-prof/unroll-1.c  | 10 +++---
>  gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c  |  8 ++--
>  gcc/testsuite/lib/profopt.exp  |  4 ++--
>  10 files changed, 56 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C 
> b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> index a9bdc167f45..02b01c073e9 100644
> --- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> +++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> @@ -2,6 +2,10 @@
>  #include "reorder_class1.h"
>  #include "reorder_class2.h"
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int g;
>
>  #ifdef _PROFILE_USE
> @@ -19,7 +23,7 @@ static __attribute__((always_inline))
>  void test1 (A *tc)
>  {
>int i;
> -  for (i = 0; i < 1000; i++)
> +  for (i = 0; i < ITER; i++)
>   g += tc->foo();
> if (g<100) g++;
>  }
> @@ -28,7 +32,7 @@ static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
>int i;
> -  for (i = 0; i < 100; i++)
> +  for (i = 0; i < ITER; i++)
>   g += tc->foo();
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c 
> b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> index 450308d6407..099069da6a7 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> @@ -9,6 +9,10 @@ const char *sarr[SIZE];
>  const char *buf_hot;
>  const char *buf_cold;
>
> +#ifndef ITER
> +#define ITER 100
> +#endif
> +
>  __attribute__((noinline))
>  void
>  foo (int path)
> @@ -32,7 +36,7 @@ main (int argc, char *argv[])
>int i;
>buf_hot =  "hello";
>buf_cold = "world";
> -  for (i = 0; i < 100; i++)
> +  for (i = 0; i < ITER; i++)
>  foo (argc);
>return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> index 58109d54dc7..32d22c69c6c 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> @@ -2,6 +2,10 @@
>  /* { dg-additional-sources "crossmodule-indircall-1a.c" } */
>  /* { dg-options "-O3 -flto -DDOJOB=1" } */
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int a;
>  extern void (*p[2])(int n);
>  void abort (void);
> @@ -10,12 +14,12 @@ main()
>  { int i;
>
>/* This call shall be converted.  */
> -  for (i = 0;i<1000;i++)
> +  for (i = 0;i<ITER;i++)
>      p[0](1);
>/* This call shall not be converted.  */
> -  for (i = 0;i<1000;i++)
> +  for (i = 0;i<ITER;i++)
>      p[i%2](2);
> -  if (a != 1000)
> +  if (a != ITER)
>  abort ();
>
>return 0;
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c 
> b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> index 53063c3e7fa..8b9dfbb78c7 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> @@ -1,5 +1,9 @@
>  /* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile 
> -fdump-ipa-afdo" } */
>
> +#ifndef ITER
> +#define ITER 10
> +#endif
> +
>  static 

Re: [PATCH 1/3] Lower sampling rate for autofdo bootstrap

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 9:20 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> autofdo create_gcov uses a lot of memory for large sample files.
> Since gcc runs for quite a long time, the sample files generated during
> the bootstrap are fairly big.
>
> Currently I can't even run make autoprofiledbootstrap on my system at
> home because create_gcov needs more than 12GB and runs out of memory.
>
> This should probably be fixed in create_gcov, but for now
> lowering the sampling rate works well enough for me. The bootstrap
> run is long enough that it gets good enough data in any case.

OK.

Richard.

> gcc/:
> 2019-01-14  Andi Kleen  
>
> * Makefile.in: Lower autofdo sampling rate by 10x.
> * Makefile.tpl: Ditto.
> ---
>  Makefile.in  | 2 +-
>  Makefile.tpl | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile.in b/Makefile.in
> index aa41730528a..28539a45372 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -387,7 +387,7 @@ MAKEINFO = @MAKEINFO@
>  EXPECT = @EXPECT@
>  RUNTEST = @RUNTEST@
>
> -AUTO_PROFILE = gcc-auto-profile -c 100
> +AUTO_PROFILE = gcc-auto-profile -c 1000
>
>  # This just becomes part of the MAKEINFO definition passed down to
>  # sub-makes.  It lets flags be given on the command line while still
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 1ab65ac8ec4..126296fb49a 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -390,7 +390,7 @@ MAKEINFO = @MAKEINFO@
>  EXPECT = @EXPECT@
>  RUNTEST = @RUNTEST@
>
> -AUTO_PROFILE = gcc-auto-profile -c 100
> +AUTO_PROFILE = gcc-auto-profile -c 1000
>
>  # This just becomes part of the MAKEINFO definition passed down to
>  # sub-makes.  It lets flags be given on the command line while still
> --
> 2.19.1
>


Re: [PATCH 2/3] Fix autoprofiledbootstrap

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 9:20 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> autoprofiledbootstrap fails currently with
>
> In file included from ../../gcc/gcc/hash-table.h:236,
>  from ../../gcc/gcc/coretypes.h:440,
>  from ../../gcc/gcc/ipa-devirt.c:110:
> In static member function 'static void va_heap::release(vec vl_embed>*&) [with T = tree_node*]',
> inlined from 'void vec::release() [with T = tree_node*]' at 
> ../../gcc/gcc/vec.h:1679:20,
> inlined from 'auto_vec::~auto_vec() [with T = tree_node*; long 
> unsigned int N = 8]' at ../../gcc/gcc/vec.h:1436:5,
> inlined from 'vec possible_polymorphic_call_targets(tree, 
> long int, ipa_polymorphic_call_context, bool*, void**, bool)' at 
> ../../gcc/gcc/ipa-devirt.c:3099:22:
> ../../gcc/gcc/vec.h:311:10: error: attempt to free a non-heap object 
> 'bases_to_consider' [-Werror=free-nonheap-object]
>   311 |   ::free (v);
>   |   ~~~^~~
> ../../gcc/gcc/vec.h:311:10: error: attempt to free a non-heap object 
> 'bases_to_consider' [-Werror=free-nonheap-object]
> cc1plus: all warnings being treated as errors
>
> The problem is that auto_vec uses a variable to keep track of whether the
> vector is on the heap or auto. Normally this gets resolved to a constant,
> but only when the right functions are inlined. With autofdo the compiler
> for some reason decides not to inline these vec functions, even though
> they are marked as "inline".
>
> Mark them as ALWAYS_INLINE instead.

This might fix your case but I think it only papers over the issue.  Consider

 auto_vec<...> vec;
 not-inlined-foo (vec);

where the function can end up re-allocating the vector.  I think the more
appropriate fix is to add #pragma GCC diagnostic push/pop and
ignored "-Wfree-nonheap-object" around the inline function (and hope
for the best that it works in the inlined contexts...)
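
A rough sketch of what such a pragma guard could look like, assuming a
heavily simplified stand-in type.  tiny_vec, tiny_vec_release and demo are
hypothetical names and this is not the actual vec.h code; whether the
suppression also takes effect at inlined call sites is exactly the open
question mentioned above.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical simplified stand-in for a vector whose storage may be
// automatic (on the stack) or on the heap.  Not the real vec.h types.
struct tiny_vec
{
  int *m_data;
  bool m_using_auto_storage;
};

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
// Free heap storage only.  The warning is disabled for this definition.
inline void tiny_vec_release (tiny_vec &v)
{
  if (!v.m_using_auto_storage)
    std::free (v.m_data);
  v.m_data = nullptr;
}
#pragma GCC diagnostic pop

// Hypothetical usage covering both storage kinds.
static void demo ()
{
  int buf[4] = { 0, 0, 0, 0 };
  tiny_vec on_stack = { buf, true };
  tiny_vec_release (on_stack);   // must not call free on buf
  assert (on_stack.m_data == nullptr);

  tiny_vec on_heap
    = { static_cast<int *> (std::malloc (4 * sizeof (int))), false };
  tiny_vec_release (on_heap);    // frees the heap block
  assert (on_heap.m_data == nullptr);
}
```

The push/pop pair restores the previous diagnostic state after the
function, so the rest of the translation unit keeps the warning enabled.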

Richard.

> gcc/:
>
> 2019-01-14  Andi Kleen  
>
> * vec.h (using_auto_storage, release): Mark as ALWAYS_INLINE.
> ---
>  gcc/vec.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/vec.h b/gcc/vec.h
> index 407269c5ad3..1f5b78b1fac 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -1664,7 +1664,7 @@ vec::create (unsigned nelems 
> MEM_STAT_DECL)
>  /* Free the memory occupied by the embedded vector.  */
>
>  template
> -inline void
> +ALWAYS_INLINE void
>  vec::release (void)
>  {
>if (!m_vec)
> @@ -1940,7 +1940,7 @@ vec::reverse (void)
>  }
>
>  template
> -inline bool
> +ALWAYS_INLINE bool
>  vec::using_auto_storage () const
>  {
>return m_vec->m_vecpfx.m_using_auto_storage;
> --
> 2.19.1
>


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Jakub Jelinek
On Mon, Jan 14, 2019 at 09:29:03AM +0100, Richard Biener wrote:
> So why is this not just
> 
>   return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
> 
> or with the casts elided?  Does the C++ standard say pointers are
> to be compared unsigned here?  Or do all targets GCC support
> lay out the address space in a way that this is correct for pointers
> into distinct objects?

See PR78420 for details on why it is done that way.

Jakub


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Ville Voutilainen
On Mon, 14 Jan 2019 at 10:29, Richard Biener  wrote:
> >_GLIBCXX14_CONSTEXPR bool
> >operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
> >{
> > +#if __cplusplus >= 201402L
> > +#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
> > +   if (__builtin_is_constant_evaluated())
> > +#else
> > if (__builtin_constant_p (__x > __y))
> > +#endif
> >   return __x > __y;
> > +#endif
> > return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
>
> I wonder what the idea behind this is.  It smells like trying to avoid
> undefined behavior (relational compare of pointers to different objects?)
> but then executing that nevertheless when "constant"?
>
> I think this just doesn't work since the compiler, when evaluating
> __x > __y [for constant folding], is exploiting the fact that doing
> non-equality compares on pointers into different objects invokes
> undefined behavior.

When that happens, the function is ill-formed when constant-evaluated,
which is fine.
When the comparison is not UB, it should constant-evaluate without problems.

> So why is this not just
>   return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
> or with the casts elided?

Those casts are reinterpret_casts, so the function could never be
constant-evaluated.
The casts need to be there to avoid UB for the run-time cases.
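
A small sketch of the two paths, assuming C++14 or later and a compiler
that provides __builtin_is_constant_evaluated (GCC 9 has it).  The names
ptr_greater, arr and demo are hypothetical; this only mirrors the shape of
the patched operator() and is not the libstdc++ code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper with the same two-path structure as the patch.
template <typename T>
constexpr bool ptr_greater (T *x, T *y)
{
  // Constant evaluation: use the built-in comparison.  If the pointers
  // are unrelated, the call is simply not a constant expression.
  if (__builtin_is_constant_evaluated ())
    return x > y;
  // Run time: compare the integer values, which gives a total order
  // without relying on a relational compare of unrelated pointers.
  return reinterpret_cast<std::uintptr_t> (x)
	 > reinterpret_cast<std::uintptr_t> (y);
}

// Pointers into the same array: the comparison is well defined, so the
// call can be used in a constant expression.
constexpr int arr[2] = { 1, 2 };
static_assert (ptr_greater (&arr[1], &arr[0]), "folds at compile time");

// Hypothetical run-time usage going through the cast path.
static void demo ()
{
  int a[2] = { 0, 0 };
  assert (ptr_greater (&a[1], &a[0]));
  assert (!ptr_greater (&a[0], &a[1]));
}
```

The reinterpret_cast branch is never reached during constant evaluation,
which is why the function can still be declared constexpr.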


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-14 Thread Richard Biener
On Thu, Jan 10, 2019 at 10:02 AM Jakub Jelinek  wrote:
>
> Hi!
>
> In Marc's testcase, we generate terrible code for std::string assignment,
> because the __builtin_constant_p is kept in the IL for way too long and the
> optimizers (jump threading?) create way too many copies of the
> memcpy/memmove calls that it is then hard to bring it back to sanity.
> On the testcase in the PR, GCC 7 emits on x86_64 with -O2 99 bytes long
> function, GCC 9 unpatched 259 bytes long, with this patch it emits
> 139 bytes long, better but still not as good as before.  I guess we'll need
> to improve GIMPLE optimizers too, but having twice as small IL for these
> heavily used operators where e.g. _M_disjunct uses two of them and we wind
> up with twice as many branches because of that is IMHO very useful.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 1) I'm not really sure about proper formatting in libstdc++, I thought you
>don't use space before ( in function calls, but then why is there a space
>in __builtin_constant_p?
> 2) not really sure about that #if __cplusplus >= 201402L either, I think we
>don't really want to use __builtin_is_constant_evaluated at least in
>C++98 code, but even in C++11, if the operator isn't constexpr, is there
>any point trying to help it do the right thing in constexpr contexts?
>
> 2019-01-09  Jakub Jelinek  
>
> PR tree-optimization/88775
> * include/bits/stl_function.h (greater<_Tp*>::operator(),
> less<_Tp*>::operator(), greater_equal<_Tp*>::operator(),
> less_equal<_Tp*>::operator()): Use __builtin_is_constant_evaluated
> instead of __builtin_constant_p if available.  Don't bother with
> the pointer comparison in C++11 and earlier.
>
> --- libstdc++-v3/include/bits/stl_function.h.jj 2019-01-01 12:45:51.182541077 
> +0100
> +++ libstdc++-v3/include/bits/stl_function.h2019-01-09 23:15:34.824800676 
> +0100
> @@ -413,8 +413,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX14_CONSTEXPR bool
>operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
>{
> +#if __cplusplus >= 201402L
> +#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
> +   if (__builtin_is_constant_evaluated())
> +#else
> if (__builtin_constant_p (__x > __y))
> +#endif
>   return __x > __y;
> +#endif
> return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;

I wonder what the idea behind this is.  It smells like trying to avoid
undefined behavior (relational compare of pointers to different objects?)
but then executing that nevertheless when "constant"?

I think this just doesn't work since the compiler, when evaluating
__x > __y [for constant folding], is exploiting the fact that doing
non-equality compares on pointers into different objects invokes
undefined behavior.

So why is this not just

  return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;

or with the casts elided?  Does the C++ standard say pointers are
to be compared unsigned here?  Or do all targets GCC supports
lay out the address space in a way that this is correct for pointers
into distinct objects?

Richard.


>}
>  };
> @@ -426,8 +432,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX14_CONSTEXPR bool
>operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
>{
> +#if __cplusplus >= 201402L
> +#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
> +   if (__builtin_is_constant_evaluated())
> +#else
> if (__builtin_constant_p (__x < __y))
> +#endif
>   return __x < __y;
> +#endif
> return (__UINTPTR_TYPE__)__x < (__UINTPTR_TYPE__)__y;
>}
>  };
> @@ -439,8 +451,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX14_CONSTEXPR bool
>operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
>{
> +#if __cplusplus >= 201402L
> +#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
> +   if (__builtin_is_constant_evaluated())
> +#else
> if (__builtin_constant_p (__x >= __y))
> +#endif
>   return __x >= __y;
> +#endif
> return (__UINTPTR_TYPE__)__x >= (__UINTPTR_TYPE__)__y;
>}
>  };
> @@ -452,8 +470,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_GLIBCXX14_CONSTEXPR bool
>operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
>{
> +#if __cplusplus >= 201402L
> +#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
> +   if (__builtin_is_constant_evaluated())
> +#else
> if (__builtin_constant_p (__x <= __y))
> +#endif
>   return __x <= __y;
> +#endif
> return (__UINTPTR_TYPE__)__x <= (__UINTPTR_TYPE__)__y;
>}
>  };
>
> Jakub


[PATCH 1/3] Lower sampling rate for autofdo bootstrap

2019-01-14 Thread Andi Kleen
From: Andi Kleen 

autofdo create_gcov uses a lot of memory for large sample files.
Since gcc runs for quite a long time, the sample files generated during
the bootstrap are fairly big.

Currently I can't even run make autoprofiledbootstrap on my system at
home because create_gcov needs more than 12GB and runs out of memory.

This should probably be fixed in create_gcov, but for now
lowering the sampling rate works well enough for me. The bootstrap
run is long enough that it gets good enough data in any case.

gcc/:
2019-01-14  Andi Kleen  

* Makefile.in: Lower autofdo sampling rate by 10x.
* Makefile.tpl: Ditto.
---
 Makefile.in  | 2 +-
 Makefile.tpl | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index aa41730528a..28539a45372 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -387,7 +387,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 100
+AUTO_PROFILE = gcc-auto-profile -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
diff --git a/Makefile.tpl b/Makefile.tpl
index 1ab65ac8ec4..126296fb49a 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -390,7 +390,7 @@ MAKEINFO = @MAKEINFO@
 EXPECT = @EXPECT@
 RUNTEST = @RUNTEST@
 
-AUTO_PROFILE = gcc-auto-profile -c 100
+AUTO_PROFILE = gcc-auto-profile -c 1000
 
 # This just becomes part of the MAKEINFO definition passed down to
 # sub-makes.  It lets flags be given on the command line while still
-- 
2.19.1



[PATCH 3/3] Increase iterations for autofdo tests

2019-01-14 Thread Andi Kleen
From: Andi Kleen 

Bin Cheng pointed out that the autofdo tests are unstable because they
don't have enough iterations for the perf sampling to get enough data.

Increase the iterations, but only for autofdo. This avoids any impact
on targets that use a slow emulator, which will never run the host-only
autofdo tests.

gcc/testsuite/:

2019-01-14  Andi Kleen  

* g++.dg/tree-prof/morefunc.C (ITER): Add.
(test1): Use.
(test2): Use.
* gcc.dg/tree-prof/cold_partition_label.c (ITER): Add.
(main): Use.
* gcc.dg/tree-prof/crossmodule-indircall-1.c (ITER): Add.
(main): Use
* gcc.dg/tree-prof/indir-call-prof.c (ITER): Add.
(main): Use.
* gcc.dg/tree-prof/peel-1.c (ITER): Add.
(t): Use.
(main): Use.
* gcc.dg/tree-prof/pr52027.c (ITER): Add.
(main): Use.
* gcc.dg/tree-prof/tracer-1.c (ITER): Add.
(main): Use.
* gcc.dg/tree-prof/unroll-1.c (ITER): Add.
(t): Use.
(main): Use.
* gcc.dg/tree-prof/update-cunroll-2.c (ITER): Add.
(main): Use.
* lib/profopt.exp: Pass -DITER to autofdo compilations.
---
 gcc/testsuite/g++.dg/tree-prof/morefunc.C  |  8 ++--
 gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c  |  6 +-
 .../gcc.dg/tree-prof/crossmodule-indircall-1.c | 10 +++---
 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c   |  6 +-
 gcc/testsuite/gcc.dg/tree-prof/peel-1.c| 10 +++---
 gcc/testsuite/gcc.dg/tree-prof/pr52027.c   |  6 +-
 gcc/testsuite/gcc.dg/tree-prof/tracer-1.c  |  7 ++-
 gcc/testsuite/gcc.dg/tree-prof/unroll-1.c  | 10 +++---
 gcc/testsuite/gcc.dg/tree-prof/update-cunroll-2.c  |  8 ++--
 gcc/testsuite/lib/profopt.exp  |  4 ++--
 10 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C 
b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
index a9bdc167f45..02b01c073e9 100644
--- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
+++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
@@ -2,6 +2,10 @@
 #include "reorder_class1.h"
 #include "reorder_class2.h"
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int g;
 
 #ifdef _PROFILE_USE
@@ -19,7 +23,7 @@ static __attribute__((always_inline))
 void test1 (A *tc)
 {
   int i;
-  for (i = 0; i < 1000; i++)
+  for (i = 0; i < ITER; i++)
  g += tc->foo(); 
if (g<100) g++;
 }
@@ -28,7 +32,7 @@ static __attribute__((always_inline))
 void test2 (B *tc)
 {
   int i;
-  for (i = 0; i < 100; i++)
+  for (i = 0; i < ITER; i++)
  g += tc->foo();
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c 
b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 450308d6407..099069da6a7 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -9,6 +9,10 @@ const char *sarr[SIZE];
 const char *buf_hot;
 const char *buf_cold;
 
+#ifndef ITER
+#define ITER 100
+#endif
+
 __attribute__((noinline))
 void 
 foo (int path)
@@ -32,7 +36,7 @@ main (int argc, char *argv[])
   int i;
   buf_hot =  "hello";
   buf_cold = "world";
-  for (i = 0; i < 100; i++)
+  for (i = 0; i < ITER; i++)
 foo (argc);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c 
b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
index 58109d54dc7..32d22c69c6c 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
@@ -2,6 +2,10 @@
 /* { dg-additional-sources "crossmodule-indircall-1a.c" } */
 /* { dg-options "-O3 -flto -DDOJOB=1" } */
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int a;
 extern void (*p[2])(int n);
 void abort (void);
@@ -10,12 +14,12 @@ main()
 { int i;
 
   /* This call shall be converted.  */
-  for (i = 0;i<1000;i++)
+  for (i = 0;i

[PATCH 2/3] Fix autoprofiledbootstrap

2019-01-14 Thread Andi Kleen
From: Andi Kleen 

autoprofiledbootstrap fails currently with

In file included from ../../gcc/gcc/hash-table.h:236,
 from ../../gcc/gcc/coretypes.h:440,
 from ../../gcc/gcc/ipa-devirt.c:110:
In static member function 'static void va_heap::release(vec*&) [with T = tree_node*]',
inlined from 'void vec::release() [with T = tree_node*]' at 
../../gcc/gcc/vec.h:1679:20,
inlined from 'auto_vec::~auto_vec() [with T = tree_node*; long 
unsigned int N = 8]' at ../../gcc/gcc/vec.h:1436:5,
inlined from 'vec possible_polymorphic_call_targets(tree, 
long int, ipa_polymorphic_call_context, bool*, void**, bool)' at 
../../gcc/gcc/ipa-devirt.c:3099:22:
../../gcc/gcc/vec.h:311:10: error: attempt to free a non-heap object 
'bases_to_consider' [-Werror=free-nonheap-object]
  311 |   ::free (v);
  |   ~~~^~~
../../gcc/gcc/vec.h:311:10: error: attempt to free a non-heap object 
'bases_to_consider' [-Werror=free-nonheap-object]
cc1plus: all warnings being treated as errors

The problem is that auto_vec uses a variable to keep track of whether the
vector is on the heap or auto. Normally this gets resolved to a constant,
but only when the right functions are inlined. With autofdo the compiler
for some reason decides not to inline these vec functions, even though
they are marked as "inline".

Mark them as ALWAYS_INLINE instead.
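
For readers unfamiliar with the mechanism, here is a heavily simplified
model of a vector that tracks its storage kind with a flag.  small_vec,
move_to_heap and demo are hypothetical names for illustration only; this
is not GCC's auto_vec and it only handles trivially copyable element
types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model: N elements of inline storage plus a flag that
// records whether m_data points at that buffer or at a heap block.
template <typename T, std::size_t N>
class small_vec
{
  T m_auto[N];                 // inline ("auto") storage
  T *m_data;                   // points at m_auto or at a heap block
  bool m_using_auto_storage;   // the tracking variable

public:
  small_vec () : m_auto (), m_data (m_auto), m_using_auto_storage (true) {}
  ~small_vec () { release (); }
  small_vec (const small_vec &) = delete;
  small_vec &operator= (const small_vec &) = delete;

  bool using_auto_storage () const { return m_using_auto_storage; }

  // Switch to a heap block of COUNT elements (contents not preserved).
  void move_to_heap (std::size_t count)
  {
    release ();
    m_data = static_cast<T *> (std::malloc (count * sizeof (T)));
    m_using_auto_storage = false;
  }

  // Free only heap storage, then fall back to the inline buffer.
  void release ()
  {
    if (!m_using_auto_storage)
      std::free (m_data);
    m_data = m_auto;
    m_using_auto_storage = true;
  }
};

// Hypothetical usage: the flag follows the storage kind.
static void demo ()
{
  small_vec<int, 8> v;
  assert (v.using_auto_storage ());
  v.move_to_heap (16);
  assert (!v.using_auto_storage ());
}
```

When release and using_auto_storage are inlined into a caller that only
ever uses the inline buffer, the compiler can see that the flag is true
and drop the free call.  Without inlining it has to treat the flag as
unknown at that point.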

gcc/:

2019-01-14  Andi Kleen  

* vec.h (using_auto_storage, release): Mark as ALWAYS_INLINE.
---
 gcc/vec.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/vec.h b/gcc/vec.h
index 407269c5ad3..1f5b78b1fac 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1664,7 +1664,7 @@ vec::create (unsigned nelems 
MEM_STAT_DECL)
 /* Free the memory occupied by the embedded vector.  */
 
 template
-inline void
+ALWAYS_INLINE void
 vec::release (void)
 {
   if (!m_vec)
@@ -1940,7 +1940,7 @@ vec::reverse (void)
 }
 
 template
-inline bool
+ALWAYS_INLINE bool
 vec::using_auto_storage () const
 {
   return m_vec->m_vecpfx.m_using_auto_storage;
-- 
2.19.1



Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2019-01-14 Thread Bin.Cheng
On Mon, Jan 14, 2019 at 4:07 PM Andi Kleen  wrote:
>
> Bin Cheng,
>
> I did some testing on this now. The attached patch automatically increases 
> the iterations
> for autofdo profiles.
Hi Andi, thanks very much for tuning these.
>
> But even with even more iterations I still have stable failures in
>
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ 
> \ta-zA-Z0-0]+foo[._]+cold
I think these two are supposed to fail with current code base.
> FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call -> 
> direct call.* a1 transformation on insn"
I also got unstable pass/fail results for the indirect call optimization
when tuning iterations, and haven't found an iteration count that passes
all the time.  I guess we need to combine this with decreasing the
sampling count here.
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 
> times"
This one should fail too.

Thanks,
bin
>
> Did these really ever work for you?
>
> -Andi
>
>
> diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C 
> b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> index a9bdc167f45..02b01c073e9 100644
> --- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> +++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
> @@ -2,6 +2,10 @@
>  #include "reorder_class1.h"
>  #include "reorder_class2.h"
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int g;
>
>  #ifdef _PROFILE_USE
> @@ -19,7 +23,7 @@ static __attribute__((always_inline))
>  void test1 (A *tc)
>  {
>int i;
> -  for (i = 0; i < 1000; i++)
> +  for (i = 0; i < ITER; i++)
>   g += tc->foo();
> if (g<100) g++;
>  }
> @@ -28,7 +32,7 @@ static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
>int i;
> -  for (i = 0; i < 100; i++)
> +  for (i = 0; i < ITER; i++)
>   g += tc->foo();
>  }
>
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c 
> b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> index 450308d6407..099069da6a7 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
> @@ -9,6 +9,10 @@ const char *sarr[SIZE];
>  const char *buf_hot;
>  const char *buf_cold;
>
> +#ifndef ITER
> +#define ITER 100
> +#endif
> +
>  __attribute__((noinline))
>  void
>  foo (int path)
> @@ -32,7 +36,7 @@ main (int argc, char *argv[])
>int i;
>buf_hot =  "hello";
>buf_cold = "world";
> -  for (i = 0; i < 100; i++)
> +  for (i = 0; i < ITER; i++)
>  foo (argc);
>return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> index 58109d54dc7..32d22c69c6c 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
> @@ -2,6 +2,10 @@
>  /* { dg-additional-sources "crossmodule-indircall-1a.c" } */
>  /* { dg-options "-O3 -flto -DDOJOB=1" } */
>
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
>  int a;
>  extern void (*p[2])(int n);
>  void abort (void);
> @@ -10,12 +14,12 @@ main()
>  { int i;
>
>/* This call shall be converted.  */
> -  for (i = 0;i<1000;i++)
> +  for (i = 0;i<ITER;i++)
>      p[0](1);
>/* This call shall not be converted.  */
> -  for (i = 0;i<1000;i++)
> +  for (i = 0;i<ITER;i++)
>      p[i%2](2);
> -  if (a != 1000)
> +  if (a != ITER)
>  abort ();
>
>return 0;
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c 
> b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> index 53063c3e7fa..8b9dfbb78c7 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof.c
> @@ -1,5 +1,9 @@
>  /* { dg-options "-O2 -fdump-tree-optimized -fdump-ipa-profile 
> -fdump-ipa-afdo" } */
>
> +#ifndef ITER
> +#define ITER 10
> +#endif
> +
>  static int a1 (void)
>  {
>  return 10;
> @@ -28,7 +32,7 @@ main (void)
>int (*p) (void);
>int  i;
>
> -  for (i = 0; i < 1000; i ++)
> +  for (i = 0; i < ITER*100; i++)
>  {
> setp (&p, i);
> p ();
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c 
> b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
> index 7245b68c1ee..b6ed178e1ad 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/peel-1.c
> @@ -1,13 +1,17 @@
>  /* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops 
> -fpeel-loops" } */
>  void abort();
>
> -int a[1000];
> +#ifndef ITER
> +#define ITER 1000
> +#endif
> +
> +int a[ITER];
>  int
>  __attribute__ ((noinline))
>  t()
>  {
>int i;
> -  for (i=0;i<1000;i++)
> +  for (i=0;i<ITER;i++)
>      if (!a[i])
>return 1;
>abort ();
> @@ -16,7 +20,7 @@ int
>  main()
>  {
>int i;
> -  for (i=0;i<1000;i++)
> +  for (i=0;i<ITER;i++)
>      t();
>return 0;
>  }
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr52027.c 
> b/gcc/testsuite/gcc.dg/tree-prof/pr52027.c
> index c46a14b2e86..bf2a83a336d 

Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2019-01-14 Thread Andi Kleen
Bin Cheng,

I did some testing on this now. The attached patch automatically increases the
iterations for autofdo profiles.

But even with more iterations I still have stable failures in

FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ 
\ta-zA-Z0-0]+foo[._]+cold
FAIL: gcc.dg/tree-prof/indir-call-prof.c scan-ipa-dump afdo "Indirect call -> 
direct call.* a1 transformation on insn"
FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 times"

Did these really ever work for you? 

-Andi


diff --git a/gcc/testsuite/g++.dg/tree-prof/morefunc.C 
b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
index a9bdc167f45..02b01c073e9 100644
--- a/gcc/testsuite/g++.dg/tree-prof/morefunc.C
+++ b/gcc/testsuite/g++.dg/tree-prof/morefunc.C
@@ -2,6 +2,10 @@
 #include "reorder_class1.h"
 #include "reorder_class2.h"
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int g;
 
 #ifdef _PROFILE_USE
@@ -19,7 +23,7 @@ static __attribute__((always_inline))
 void test1 (A *tc)
 {
   int i;
-  for (i = 0; i < 1000; i++)
+  for (i = 0; i < ITER; i++)
  g += tc->foo(); 
if (g<100) g++;
 }
@@ -28,7 +32,7 @@ static __attribute__((always_inline))
 void test2 (B *tc)
 {
   int i;
-  for (i = 0; i < 100; i++)
+  for (i = 0; i < ITER; i++)
  g += tc->foo();
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c 
b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 450308d6407..099069da6a7 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -9,6 +9,10 @@ const char *sarr[SIZE];
 const char *buf_hot;
 const char *buf_cold;
 
+#ifndef ITER
+#define ITER 100
+#endif
+
 __attribute__((noinline))
 void 
 foo (int path)
@@ -32,7 +36,7 @@ main (int argc, char *argv[])
   int i;
   buf_hot =  "hello";
   buf_cold = "world";
-  for (i = 0; i < 100; i++)
+  for (i = 0; i < ITER; i++)
 foo (argc);
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c 
b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
index 58109d54dc7..32d22c69c6c 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indircall-1.c
@@ -2,6 +2,10 @@
 /* { dg-additional-sources "crossmodule-indircall-1a.c" } */
 /* { dg-options "-O3 -flto -DDOJOB=1" } */
 
+#ifndef ITER
+#define ITER 1000
+#endif
+
 int a;
 extern void (*p[2])(int n);
 void abort (void);
@@ -10,12 +14,12 @@ main()
 { int i;
 
   /* This call shall be converted.  */
-  for (i = 0;i<1000;i++)
+  for (i = 0;i