date:20170117

[PATCH][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-17 Thread Maxim Ostapenko


Hi,

As was figured out in the PR, LTO + ASan raises a false
initialization-order-fiasco alarm because, in the LTO case,
main_input_filename doesn't match the module name passed to
__asan_before_dynamic_init.
Following Jakub's suggestion, I used the TRANSLATION_UNIT_DECL of the
corresponding globals to overcome this issue (I needed to create a
source location for each TRANSLATION_UNIT_DECL).
However, when testing, I hit a nasty issue: for some reason
linemap_ordinary_map_lookup, called from lto_output_location for a given
TRANSLATION_UNIT_DECL, hit an assert:


[...]
  linemap_assert (line >= MAP_START_LOCATION (result));
  return result;
}

due to line == 2.

After some investigation I noticed that source locations are propagated
through a location cache that can be partially invalidated by a
lto_location_cache::revert_location_cache call. That was my case:
after the source location for TRANSLATION_UNIT_DECL was added to the
location cache, it was reverted by
lto_location_cache::revert_location_cache, called from unify_scc,
before it was accepted:


static void
lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
                vec<ld_plugin_symbol_resolution_t> resolutions)
{
[...]
          /* Try to unify the SCC with already existing ones.  */
          if (!flag_ltrans
              && unify_scc (data_in, from,
                            len, scc_entry_len, scc_hash))
            continue;

For now I can work around it by calling
location_cache.accept_location_cache for TRANSLATION_UNIT_DECL, but I
wonder if a more reliable fix is possible.
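
The accept/revert interaction described above can be sketched with a toy
model. This is only an illustration under stated assumptions:
ToyLocationCache and surviving_entries are hypothetical names, and the
class is not GCC's lto_location_cache; it only mimics the idea that
entries stay provisional until accepted and that a revert drops
everything queued since the last accept.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a cache with accept/revert semantics.
// It is NOT GCC's lto_location_cache.
class ToyLocationCache {
public:
  // Queue a location; it stays provisional until accept() is called.
  void add(const std::string &file) { entries_.push_back(file); }

  // Make everything queued so far permanent.
  void accept() { accepted_ = entries_.size(); }

  // Drop every entry queued since the last accept().
  void revert() {
    assert(accepted_ <= entries_.size());
    entries_.resize(accepted_);
  }

  std::size_t size() const { return entries_.size(); }

private:
  std::vector<std::string> entries_;
  std::size_t accepted_ = 0;
};

// Returns how many entries survive when a translation-unit location is
// queued, optionally accepted, and then followed by a revert.
std::size_t surviving_entries(bool accept_before_revert) {
  ToyLocationCache cache;
  cache.add("a.c");         // location queued for a TRANSLATION_UNIT_DECL
  if (accept_before_revert)
    cache.accept();         // the workaround: commit before unification
  cache.add("scc-member");  // location queued while reading a later SCC
  cache.revert();           // the later SCC was unified with an existing one
  return cache.size();
}
```

In this model, surviving_entries(false) loses the translation-unit entry
together with the reverted SCC, while surviving_entries(true) keeps it,
which is the effect the workaround relies on.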


The attached patch fixes the issue mentioned in the PR and passes regression
testing and LTO bootstrap on x86_64-unknown-linux-gnu.

Could you please take a look at it?

-Maxim
gcc/lto/ChangeLog:

2017-01-17  Maxim Ostapenko  

	* lto.c (lto_read_decls): Accept location cache for
	TRANSLATION_UNIT_DECL.

gcc/testsuite/ChangeLog:

2017-01-17  Maxim Ostapenko  

	* gcc.dg/cpp/mi1.c: Adjust testcase.
	* gcc.dg/pch/cpp-3.c: Likewise.

gcc/ChangeLog:

2017-01-17  Maxim Ostapenko  

	* asan.c (get_translation_unit_decl): New function.
	(asan_add_global): Extract module's file name from global's
	TRANSLATION_UNIT_DECL in LTO mode.
	* tree.c (build_translation_unit_decl): Add source location for newly
	built TRANSLATION_UNIT_DECL.

diff --git a/gcc/asan.c b/gcc/asan.c
index 7450044..9a59fe4 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2372,6 +2372,22 @@ asan_needs_odr_indicator_p (tree decl)
 	  && TREE_PUBLIC (decl));
 }
 
+/* For given DECL return its corresponding TRANSLATION_UNIT_DECL.  */
+
+static const_tree
+get_translation_unit_decl (tree decl)
+{
+  const_tree context = decl;
+  while (context && TREE_CODE (context) != TRANSLATION_UNIT_DECL)
+    {
+      if (TREE_CODE (context) == BLOCK)
+	context = BLOCK_SUPERCONTEXT (context);
+      else
+	context = get_containing_scope (context);
+    }
+  return context;
+}
+
 /* Append description of a single global DECL into vector V.
TYPE is __asan_global struct type as returned by asan_global_struct.  */
 
@@ -2391,7 +2407,14 @@ asan_add_global (tree decl, tree type, vec *v)
     pp_string (&asan_pp, "<unknown>");
   str_cst = asan_pp_string (&asan_pp);
 
-  pp_string (&module_name_pp, main_input_filename);
+  const char *filename = main_input_filename;
+  if (in_lto_p)
+    {
+      const_tree translation_unit_decl = get_translation_unit_decl (decl);
+      if (translation_unit_decl)
+	filename = DECL_SOURCE_FILE (translation_unit_decl);
+    }
+  pp_string (&module_name_pp, filename);
   module_name_cst = asan_pp_string (&module_name_pp);
 
   if (asan_needs_local_alias (decl))
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index d77d85d..c65e7cd 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1707,7 +1707,13 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
 	  && (TREE_CODE (first) == IDENTIFIER_NODE
 		  || TREE_CODE (first) == INTEGER_CST
 		  || TREE_CODE (first) == TRANSLATION_UNIT_DECL))
-	continue;
+	{
+	  /* For TRANSLATION_UNIT_DECL we need to accept location cache now
+	 to avoid possible reverting during following unify_scc call.  */
+	  if (TREE_CODE (first) == TRANSLATION_UNIT_DECL)
+		data_in->location_cache.accept_location_cache ();
+	  continue;
+	}
 
 	  /* Try to unify the SCC with already existing ones.  */
 	  if (!flag_ltrans
diff --git a/gcc/testsuite/gcc.dg/cpp/mi1.c b/gcc/testsuite/gcc.dg/cpp/mi1.c
index 0cfedad..9817431 100644
--- a/gcc/testsuite/gcc.dg/cpp/mi1.c
+++ b/gcc/testsuite/gcc.dg/cpp/mi1.c
@@ -13,7 +13,7 @@
 
 /* { dg-do compile }
{ dg-options "-H" }
-   { dg-message "mi1c\.h\n\[^\n\]*mi1cc\.h\n\[^\n\]*mi1nd\.h\n\[^\n\]*mi1ndp\.h\n\[^\n\]*mi1x\.h" "redundant include check" { target *-*-* } 0 } */
+   { dg-message "mi1c\.h\n\[^\n\]*mi1cc\.h\n\[^\n\]*mi1nd\.h\n\[^\n\]*mi1ndp\.h\n\[^\n\]*mi1x\.h\n\[^\n\]*mi1\.c" "redundant include check" { target *-*-* } 0 } */
 
 #include "mi1c.h"
 #include "mi1c.h"

Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-17 Thread Jeff Law


On 01/17/2017 08:16 PM, Martin Sebor wrote:

On 01/17/2017 12:38 AM, Jakub Jelinek wrote:

On Mon, Jan 16, 2017 at 05:06:40PM -0700, Martin Sebor wrote:

The test case submitted in bug 79095 - [7 regression] spurious
stringop-overflow warning shows that GCC optimizes some loops
into calls to memset with size arguments in excess of the object
size limit.  Since such calls will unavoidably lead to a buffer
overflow and memory corruption the attached patch detects them
and replaces them with a trap.  That both prevents the buffer
overflow and eliminates the warning.


I fear this is going to break various 32-bit database programs and similar
that mmap say 3GB of RAM and then work on that memory chunk as contiguous.
Some things don't work too well in that case (pointer differences), but it
is unlikely they would be using those, while your patch actively breaks it
even for loops that can be transformed into memset (memcpy of course isn't a
problem, because you need some virtual address space to copy it from).


I agree that breaking those applications would be bad.  It could
be dealt with by adding an option to let them disable the insertion
of the trap.  With the warning, programmers would get a heads up
that their (already dubious) code won't work otherwise.  I don't
think it's necessary or wise to have the default mode be the most
permissive (and most dangerous) and expect users to tweak options
to make it safe.  Rather, I would argue that it should be the other
way around.  Make the default safe and strict and let the advanced
users who know how to deal with the risks tweak those options.

I still come back to the assertion that changing this loop to a mem* is
fundamentally the wrong thing to do as it changes something that has
well defined semantics to something that is invalid.


Thus the transformation into a mem* call is invalid.
jeff

Re: [PATCH 2/2] [msp430] Remove mpy.o from libgcc

2017-01-17 Thread DJ Delorie


Committed.  Thanks!

Re: libgo patch committed: Update to Go1.8rc1

2017-01-17 Thread Ian Lance Taylor

On Mon, Jan 16, 2017 at 7:21 AM, Rainer Orth
 wrote:
>
> I'm getting further on Solaris now, but the build still fails:

I committed this patch to fix the remaining build problems on Solaris.
Bootstrapped and ran some of the Go tests on i386-sun-solaris11 and
x86_64-pc-linux-gnu.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 244484)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-223cba75b947afc1ee5a13a60c15c66f6ff355c1
+2b3d389f961b8461b3fdf42318a628f68b56f8b1
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/golang_org/x/net/lif/link.go
===
--- libgo/go/golang_org/x/net/lif/link.go   (revision 244456)
+++ libgo/go/golang_org/x/net/lif/link.go   (working copy)
@@ -84,7 +84,7 @@ func links(eps []endpoint, name string)
b := make([]byte, lifn.Count*sizeofLifreq)
lifc.Family = uint16(ep.af)
lifc.Len = lifn.Count * sizeofLifreq
-   littleEndian.PutUint64(lifc.Lifcu[:], uint64(uintptr(unsafe.Pointer(&b[0]))))
+   lifc.Lifcu = unsafe.Pointer(&b[0])
ioc = int64(sysSIOCGLIFCONF)
if err := ioctl(ep.s, uintptr(ioc), unsafe.Pointer(&lifc)); err != nil {
continue
Index: libgo/go/golang_org/x/net/lif/syscall.go
===
--- libgo/go/golang_org/x/net/lif/syscall.go   (revision 244456)
+++ libgo/go/golang_org/x/net/lif/syscall.go   (working copy)
@@ -11,23 +11,12 @@ import (
"unsafe"
 )
 
-//go:cgo_import_dynamic libc_ioctl ioctl "libc.so"
-
-//go:linkname procIoctl libc_ioctl
-
-var procIoctl uintptr
-
-func sysvicall6(trap, nargs, a1, a2, a3, a4, a5, a6 uintptr) (uintptr, uintptr, syscall.Errno)
-
-// TODO: replace with runtime.KeepAlive when available
-//go:noescape
-func keepAlive(p unsafe.Pointer)
+//extern __go_ioctl_ptr
+func libc_ioctl(int32, int32, unsafe.Pointer) int32
 
 func ioctl(s, ioc uintptr, arg unsafe.Pointer) error {
-   _, _, errno := sysvicall6(uintptr(unsafe.Pointer(&procIoctl)), 3, s, ioc, uintptr(arg), 0, 0, 0)
-   keepAlive(arg)
-   if errno != 0 {
-   return error(errno)
+   if libc_ioctl(int32(s), int32(ioc), arg) < 0 {
+   return syscall.GetErrno()
}
return nil
 }
Index: libgo/go/golang_org/x/net/lif/zsys_solaris.go
===
--- libgo/go/golang_org/x/net/lif/zsys_solaris.go   (revision 244456)
+++ libgo/go/golang_org/x/net/lif/zsys_solaris.go   (working copy)
@@ -3,6 +3,8 @@
 
 package lif
 
+import "unsafe"
+
 const (
sysAF_UNSPEC = 0x0
sysAF_INET   = 0x2
@@ -59,15 +61,11 @@ const (
 )
 
 const (
-   sizeofLifnum   = 0xc
sizeofLifreq   = 0x178
-   sizeofLifconf  = 0x18
-   sizeofLifIfinfoReq = 0x10
 )
 
 type sysLifnum struct {
Family uint16
-   Pad_cgo_0 [2]byte
Flags int32
Count int32
 }
@@ -81,16 +79,13 @@ type lifreq struct {
 
 type lifconf struct {
Family uint16
-   Pad_cgo_0 [2]byte
Flags int32
Len   int32
-   Pad_cgo_1 [4]byte
-   Lifcu [8]byte
+   Lifcu unsafe.Pointer
 }
 
 type lifIfinfoReq struct {
Maxhops  uint8
-   Pad_cgo_0[3]byte
Reachtime uint32
Reachretrans uint32
Maxmtu   uint32
Index: libgo/go/golang_org/x/net/lif/zsys_solaris_amd64.go
===
--- libgo/go/golang_org/x/net/lif/zsys_solaris_amd64.go (revision 244166)
+++ libgo/go/golang_org/x/net/lif/zsys_solaris_amd64.go (working copy)
@@ -1,103 +0,0 @@
-// Created by cgo -godefs - DO NOT EDIT
-// cgo -godefs defs_solaris.go
-
-package lif
-
-const (
-   sysAF_UNSPEC = 0x0
-   sysAF_INET   = 0x2
-   sysAF_INET6  = 0x1a
-
-   sysSOCK_DGRAM = 0x1
-)
-
-type sockaddrStorage struct {
-   Family uint16
-   X_ss_pad1  [6]int8
-   X_ss_align float64
-   X_ss_pad2  [240]int8
-}
-
-const (
-   sysLIFC_NOXMIT  = 0x1
-   sysLIFC_EXTERNAL_SOURCE = 0x2
-   sysLIFC_TEMPORARY   = 0x4
-   sysLIFC_ALLZONES= 0x8
-   sysLIFC_UNDER_IPMP  = 0x10
-   sysLIFC_ENABLED = 0x20
-
-   sysSIOCGLIFADDR= -0x3f87968f
-   sysSIOCGLIFDSTADDR = -0x3f87968d
-   sysSIOCGLIFFLAGS   = -0x3f87968b
-   sysSIOCGLIFMTU = -0x3f879686
-   sysSIOCGLIFNETMASK = -0x3f879683
-   sysSIOCGLIFMETRIC  = -0x3f879681
-   sysSIOCGLIFNUM = -0x3ff3967e
-   sysSIOCGLIFINDEX   = -0x3f87967b
-   sysSIOCGLIFSUBNET  = -0x3f879676
-   sysSIOCGLIFLNKINFO = -0x3f879674
-   sysSIOCGLIFCONF=

Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-17 Thread Martin Sebor


On 01/17/2017 12:38 AM, Jakub Jelinek wrote:

On Mon, Jan 16, 2017 at 05:06:40PM -0700, Martin Sebor wrote:

The test case submitted in bug 79095 - [7 regression] spurious
stringop-overflow warning shows that GCC optimizes some loops
into calls to memset with size arguments in excess of the object
size limit.  Since such calls will unavoidably lead to a buffer
overflow and memory corruption the attached patch detects them
and replaces them with a trap.  That both prevents the buffer
overflow and eliminates the warning.


I fear this is going to break various 32-bit database programs and similar
that mmap say 3GB of RAM and then work on that memory chunk as contiguous.
Some things don't work too well in that case (pointer differences), but it
is unlikely they would be using those, while your patch actively breaks it
even for loops that can be transformed into memset (memcpy of course isn't a
problem, because you need some virtual address space to copy it from).


I agree that breaking those applications would be bad.  It could
be dealt with by adding an option to let them disable the insertion
of the trap.  With the warning, programmers would get a heads up
that their (already dubious) code won't work otherwise.  I don't
think it's necessary or wise to have the default mode be the most
permissive (and most dangerous) and expect users to tweak options
to make it safe.  Rather, I would argue that it should be the other
way around.  Make the default safe and strict and let the advanced
users who know how to deal with the risks tweak those options.

Martin

Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-17 Thread Martin Sebor


On 01/17/2017 10:57 AM, Jeff Law wrote:

On 01/17/2017 09:12 AM, Martin Sebor wrote:

On 01/17/2017 08:26 AM, Jeff Law wrote:

On 01/16/2017 05:06 PM, Martin Sebor wrote:

The test case submitted in bug 79095 - [7 regression] spurious
stringop-overflow warning shows that GCC optimizes some loops
into calls to memset with size arguments in excess of the object
size limit.  Since such calls will unavoidably lead to a buffer
overflow and memory corruption the attached patch detects them
and replaces them with a trap.  That both prevents the buffer
overflow and eliminates the warning.

But doesn't the creation of the bogus memset signal an invalid
transformation in the loop optimizer?  ie, if we're going to convert a
loop into a memset, then we'd damn well better be sure the loop bounds
are reasonable.


I'm not sure that emitting the memset call is necessarily a bug in
the loop optimizer (which in all likelihood wasn't written with
the goal of preventing or detecting possible buffer overflows).
The loop with the excessive bound is in the source code and can
be reached given the right inputs (calling v.resize(v.size() - 1)
on an empty vector).  It's a lurking bug in the program that, if
triggered, will overflow the vector and crash the program (or worse)
with or without the optimization.
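
To make the excessive bound concrete, here is a small sketch of the
unsigned arithmetic behind v.resize(v.size() - 1) on an empty vector.
The helper names are hypothetical, and treating PTRDIFF_MAX as the
object size limit is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the size computed by "size() - 1".  std::size_t is
// unsigned, so for n == 0 the result wraps around to SIZE_MAX.
std::size_t shrink_by_one(std::size_t n) { return n - 1; }

// Assumption for this sketch: sizes above PTRDIFF_MAX count as exceeding
// the object size limit.
bool exceeds_object_size_limit(std::size_t n) {
  return n > static_cast<std::size_t>(PTRDIFF_MAX);
}

// Self-check of the cases discussed in the text.
void check_wraparound() {
  assert(shrink_by_one(0) == SIZE_MAX);                  // empty vector
  assert(exceeds_object_size_limit(shrink_by_one(0)));   // bound is excessive
  assert(!exceeds_object_size_limit(shrink_by_one(4)));  // guarded case is fine
}
```

Any loop or memset whose length comes from such a wrapped value covers
far more memory than any object can hold.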

Right, but that doesn't mean that the loop optimizer can turn it into a
memset.  If the bounds are such that we're going to invoke undefined
behaviour from memset, then the loop optimizer must leave the loop alone.



What else could the loop optimizer do in this instance?
I suppose it could just leave the loop alone and avoid emitting
the memset call.  That would avoid the warning but mask the
problem with the overflow.  In my mind, preventing the overflow
given that we have the opportunity is the right thing to do.
That is, after all, the goal of the warning.

The right warning in this case is WRT the loop iteration space
independent of mem*.


I agree that warning for the user code would be appropriate if
the loop with the excessive bound were unavoidable or at least
reachable.  But in the submitted test case it isn't because
the call to vector::resize() is guarded.  In fact, the out-of-bounds
memset is also emitted with this modified test case:

  void f (std::vector<int> &v)
  {
    size_t n = v.size ();

    if (n > 1 && n < 5)
      v.resize (n - 1);
  }

I believe the root cause of the out-of-bounds memset is the
lack of support for pointer ranges or at least some notion of
their relationships and constraints.  A std::vector is defined
by three pointers that satisfy the following relation:

  start <= finish <= end_of_storage

Queries about the capacity and size of a vector are done in terms
of expressions involving these three pointers:

  size_type capacity () {
    return end_of_storage - start;
  }

  size_type size () {
    return finish - start;
  }

Space remaining is hand-coded as

  end_of_storage - finish

The trouble is that GCC has no idea about the constraints on
the three pointers and so it treats them essentially as unrelated
integers with no ranges (except for their scale that depends on
the type the pointer points to).

Absent support for pointer ranges, GCC would be able to generate
much better code with just some help from annotations telling it
about at least some of the constraints.  In this case, for example,
adding the following optimistic invariant:

   size_type space_left = end_of_storage - finish;

   if (space_left > SIZE_MAX / 2 / sizeof (T))
 __builtin_unreachable ();

to vector::_M_default_append() lets GCC avoid the memset and emit
significantly more efficient code.  On x86_64 it reduces the number
of instructions for the test case by 40%.
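
A minimal, self-contained sketch of the annotation idea above, assuming
a GCC- or Clang-compatible compiler (__builtin_unreachable is a compiler
builtin, not standard C++).  ToyVec, space_left and demo_space_left are
hypothetical names for illustration and are not libstdc++ internals.
Whether the hint changes the generated code depends on the compiler and
the surrounding code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical three-pointer buffer with the invariant
// start <= finish <= end_of_storage.
template <typename T>
struct ToyVec {
  T *start;
  T *finish;
  T *end_of_storage;
};

// Remaining capacity, with an optimizer hint that the value cannot be
// absurdly large.
template <typename T>
std::size_t space_left(const ToyVec<T> &v) {
  assert(v.finish <= v.end_of_storage);
  std::size_t n = static_cast<std::size_t>(v.end_of_storage - v.finish);
  // Tell the compiler this branch can never be taken for a valid buffer.
  if (n > SIZE_MAX / 2 / sizeof(T))
    __builtin_unreachable();
  return n;
}

// Small usage example: 8 slots of storage, 3 of them in use.
std::size_t demo_space_left() {
  int storage[8] = {};
  ToyVec<int> v{storage, storage + 3, storage + 8};
  return space_left(v);
}
```

The hint only states an invariant; if it were ever false at run time the
behaviour would be undefined, so it has to be something the container
really guarantees.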

Martin

[PATCH], Add support for PowerPC ISA 3.0 vector byte reverse instructions

2017-01-17 Thread Michael Meissner

This patch adds support for adding built-in functions for the ISA 3.0 vector
byte reverse instructions (XXBR{Q,D,W,H}).

The vec_revb built-in function follows the specifications in the OpenPOWER ABI
for Linux Supplement Power Architecture 64-Bit ELF V2 ABI and reverses the
bytes in each vector element.  I added a GCC extension, so that vec_revb of a
vector unsigned char, vector signed char, or vector char argument would act the
same as a vector __int128_t or vector __uint128_t (i.e. reverse all of the
bytes).
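
As a rough, portable model of the semantics described above (not the
built-in itself, and not code from the patch), the following sketch
reverses the bytes within each element of a 16-byte vector.  Bytes16,
revb_model and iota16 are hypothetical names; the element size is passed
explicitly, so the char-vector extension corresponds to an element size
of 16 in this model.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Reverse the bytes inside each elem_size-byte element of a 16-byte vector.
// elem_size is expected to be 2, 4, 8, or 16.
Bytes16 revb_model(Bytes16 v, std::size_t elem_size) {
  assert(elem_size != 0 && v.size() % elem_size == 0);
  for (std::size_t i = 0; i < v.size(); i += elem_size)
    std::reverse(v.begin() + i, v.begin() + i + elem_size);
  return v;
}

// Helper producing the bytes 0, 1, ..., 15 as test input.
Bytes16 iota16() {
  Bytes16 v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = static_cast<std::uint8_t>(i);
  return v;
}
```

With 4-byte elements the input 0..15 becomes 3,2,1,0, 7,6,5,4, and so on;
with a single 16-byte element the whole vector is reversed.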

In the course of working on this patch, I tried to get the code in the rs6000.c
function altivec_expand_vec_perm_const to generate these instructions, but
there doesn't seem to be any caller of it to implement the permutes.  I will
open a bug on this altivec_expand_vec_perm_const, and if these patches are
checked in, I will try to add support for the instructions.

I have checked this on a little endian power8 system (64-bit only), a big
endian power8 system (64-bit only), and a big endian power7 system (both 32-bit
and 64-bit), and there were no regressions.  Can I check this into the trunk?

[gcc]
2017-01-17  Michael Meissner  

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
__builtin_vec_revb builtins.
* config/rs6000/rs6000-builtin.def (P9V_BUILTIN_XXBRQ_V16QI): Add
built-in functions to support generation of the ISA 3.0 XXBR
vector byte reverse instructions.
(P9V_BUILTIN_XXBRQ_V1TI): Likewise.
(P9V_BUILTIN_XXBRD_V2DI): Likewise.
(P9V_BUILTIN_XXBRD_V2DF): Likewise.
(P9V_BUILTIN_XXBRW_V4SI): Likewise.
(P9V_BUILTIN_XXBRW_V4SF): Likewise.
(P9V_BUILTIN_XXBRH_V8HI): Likewise.
(P9V_BUILTIN_VEC_REVB): Likewise.
* config/rs6000/vsx.md (p9_xxbrq_v1ti): New insns/expanders to
generate the ISA 3.0 XXBR vector byte reverse instructions.
(p9_xxbrq_v16qi): Likewise.
(p9_xxbrd_<mode>, VSX_D iterator): Likewise.
(p9_xxbrw_<mode>, VSX_W iterator): Likewise.
(p9_xxbrh_v8hi): Likewise.
* config/rs6000/altivec.h (vec_revb): Define if ISA 3.0.
* doc/extend.texi (RS/6000 Altivec Built-ins): Document the
vec_revb built-in functions.

[gcc/testsuite]
2017-01-13  Michael Meissner  

* gcc.target/powerpc/p9-xxbr-1.c: New test.
* gcc.target/powerpc/p9-xxbr-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c   (revision 244382)
+++ gcc/config/rs6000/rs6000-c.c   (working copy)
@@ -5016,6 +5016,31 @@ const struct altivec_builtin_types altiv
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
 RS6000_BTI_unsigned_V16QI, 0 },
 
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
+RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
+RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
+RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
+RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
+RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
+RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DF,
+RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
+RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
+RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SF,
+RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
+RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
+RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def   (revision 244382)
+++ gcc/config/rs6000/rs6000-builtin.def   (working copy)
@@ -1932,6 +1932,14 @@ BU_P9V_64BIT_VSX_1 (VSESDP,  "scalar_extr
 BU_P9V_VSX_1 (VSTDCNDP,"scalar_test_neg_dp",   CONST,  xststdcnegdp)
 BU_P9V_VSX_1 (VSTDCNSP,"scalar_test_neg_sp",   CONST,  xststdcnegsp)
 
+BU_P9V_VSX_1 (XXBRQ_V16QI, "xxbrq_v16qi",  CONST,  p9_xxbrq_v16qi)
+BU_P9V_VSX_1 (XXBRQ_V1TI,  "xxbrq_v1ti",   CONST,

[PATCH] testcase for builtin expansion of strncmp and strcmp

2017-01-17 Thread Aaron Sawdey

This patch adds the test gcc.dg/strncmp-2.c for builtin expansion of
strncmp and strcmp. It tests a couple more things, such as differences
that occur after the zero byte, and something that glibc does, which is
to call strncmp with SIZE_MAX for the length; that checks for overflow
issues.
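
The two properties mentioned above can be illustrated with a short
standalone sketch.  It is not taken from the test in the patch;
strncmp_sign and check_strncmp_properties are hypothetical helpers that
rely only on the standard strncmp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Normalize strncmp's result to -1, 0, or 1 so it can be compared exactly.
int strncmp_sign(const char *a, const char *b, std::size_t n) {
  assert(a != nullptr && b != nullptr);
  int r = std::strncmp(a, b, n);
  return (r > 0) - (r < 0);
}

// Self-check of the two cases discussed in the text.
void check_strncmp_properties() {
  // The buffers differ only after the terminating zero byte, so they
  // compare equal even though the length covers the differing byte.
  assert(strncmp_sign("abc\0X", "abc\0Y", 6) == 0);
  // A huge length must not confuse the comparison: it still stops at the
  // first difference or at the zero byte.
  assert(strncmp_sign("abc", "abd", SIZE_MAX) == -1);
}
```

An expansion that computes something like "address + length" without
care could overflow in the second case, which is the kind of issue the
SIZE_MAX call is meant to expose.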

I've included interested parties from targets that have a strncmp
builtin.

The test passes on x86_64 and on ppc64le with -mcpu=power6. It will not
pass on ppc64/ppc64le -mcpu=power[78] until I check in my patch that
segher ack'd yesterday and is currently regtesting. OK for trunk?

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Index: gcc/testsuite/gcc.dg/strncmp-2.c
===
--- gcc/testsuite/gcc.dg/strncmp-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/strncmp-2.c	(working copy)
@@ -0,0 +1,715 @@
+/* Test strcmp/strncmp builtin expansion for compilation and proper execution.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target ptr32plus } */
+
+#include 
+#include 
+#include 
+#include 
+
+#define MAX_SZ 200
+
+static void test_driver_strcmp (void (test_func)(const char *, const char *, int), int align)
+{
+  char buf1[2*MAX_SZ+10],buf2[2*MAX_SZ+10];
+  int sz, diff_pos, zero_pos;
+  int e;
+  for(sz = 1; sz < MAX_SZ; sz++)
+for(diff_pos = ((sz>10)?(sz-10):0); diff_pos < sz+10; diff_pos++)
+  for(zero_pos = ((sz>10)?(sz-10):0); zero_pos < sz+10; zero_pos++)
+	{
+	  memset(buf1, 'A', sizeof(buf1));
+	  memset(buf2, 'A', sizeof(buf2));
+	  buf2[diff_pos] = 'B';
+	  buf1[zero_pos] = 0;
+	  buf2[zero_pos] = 0;
+	  e = -1;
+	  if ( zero_pos == 0 || zero_pos <= diff_pos ) e = 0;
+	  (*test_func)(buf1,buf2,e);
+	  (*test_func)(buf2,buf1,-e);
+	  (*test_func)(buf2,buf2,0);
+	  /* differing length: */
+	  buf2[diff_pos] = 0;
+	  (*test_func)(buf1,buf2,-e);
+	  memset(buf2+diff_pos,'B',sizeof(buf2)-diff_pos);
+	  buf2[zero_pos] = 0;
+	  (*test_func)(buf1,buf2,e);
+	  (*test_func)(buf2,buf1,-e);
+	}
+}
+
+static void test_driver_strncmp (void (test_func)(const char *, const char *, int), size_t sz, int align)
+{
+  char buf1[MAX_SZ*2+10],buf2[MAX_SZ*2+10];
+  size_t test_sz = (sz<MAX_SZ)?sz:MAX_SZ;
+  size_t diff_pos, zero_pos;
+  int e;
+  for(diff_pos = ((test_sz>10)?(test_sz-10):0);
+      diff_pos < test_sz+10; diff_pos++)
+for(zero_pos = ((test_sz>10)?(test_sz-10):0);
+	zero_pos < test_sz+10; zero_pos++)
+  {
+	memset(buf1, 'A', 2*test_sz);
+	memset(buf2, 'A', 2*test_sz);
+	buf2[diff_pos] = 'B';
+	buf1[zero_pos] = 0;
+	buf2[zero_pos] = 0;
+	e = -1;
+	if ( diff_pos >= sz ) e = 0;
+	if ( zero_pos <= diff_pos ) e = 0;
+	(*test_func)(buf1,buf2,e);
+	(*test_func)(buf2,buf1,-e);
+	(*test_func)(buf2,buf2,0);
+	/* differing length: */
+	buf2[diff_pos] = 0;
+	(*test_func)(buf1,buf2,-e);
+	memset(buf2+diff_pos,'B',sizeof(buf2)-diff_pos);
+	buf2[zero_pos] = 0;
+	(*test_func)(buf1,buf2,e);
+	(*test_func)(buf2,buf1,-e);
+  }
+}
+
+#define RUN_TESTN(SZ, ALIGN) test_driver_strncmp (test_strncmp_ ## SZ ## _ ## ALIGN, SZ, ALIGN);
+#define RUN_TEST(ALIGN)  test_driver_strcmp (test_strcmp_ ## ALIGN, ALIGN);
+
+#define DEF_TESTN(SZ, ALIGN)	 \
+static void test_strncmp_ ## SZ ## _ ## ALIGN (const char *str1, const char *str2, int expect)	 \
+{ \
+  char three[8192] __attribute__ ((aligned (4096)));		 \
+  char four[8192] __attribute__ ((aligned (4096)));		 \
+  char *a, *b;			 \
+  int i,j,r;			 \
+  for (j = 0; j < 2; j++)	 \
+{ \
+  for (i = 0; i < 2; i++)	 \
+	{			 \
+	  a = three+i*ALIGN+j*(4096-2*i*ALIGN);			 \
+	  b = four+i*ALIGN+j*(4096-2*i*ALIGN);			 \
+	  strcpy(a,str1);	 \
+	  strcpy(b,str2);	 \
+	  r = strncmp(a,b,SZ);	 \
+	  if ( r < 0 && !(expect < 0) ) \
+	{ abort(); }	 \
+	  if ( r > 0 && !(expect > 0) ) \
+	{ abort(); }	 \
+	  if ( r == 0 && !(expect == 0) )			 \
+	{ abort(); }	 \
+	}			 \
+} \
+}
+#define DEF_TEST(ALIGN)		 \
+static void test_strcmp_ ## ALIGN (const char *str1, const char *str2, int expect)		 \
+{ \
+  char three[8192] __attribute__ ((aligned (4096)));		 \
+  char four[8192] __attribute__ ((aligned (4096)));		 \
+  char *a, *b;			 \
+  int i,j,r;			 \
+  for (j = 0; j < 2; j++)	 \
+{ \
+  for (i = 0; i < 2; i++)	 \
+	{			 \
+	  a = three+i*ALIGN+j*(4096-2*i*ALIGN);			 \
+	  b = four+i*ALIGN+j*(4096-2*i*ALIGN);			 \
+	  strcpy(a,str1);	 \
+	  strcpy(b,str2);	 \
+	  r = strcmp(a,b);	 \
+	  if ( r < 0 && !(expect < 0) ) \
+	{ abort(); }	 \
+	  if ( r > 0 && !(expect > 0) ) \
+	{ abort(); }	 \
+	  if ( r == 0 && !(expect == 0) )			 \
+	{ abort(); }

Re: [PATCH] PR target/79004, Fix char/short -> _Float128 on PowerPC -mcpu=power9

2017-01-17 Thread Michael Meissner

On Tue, Jan 17, 2017 at 07:00:36PM -0600, Segher Boessenkool wrote:
> On Tue, Jan 17, 2017 at 07:39:05PM -0500, Michael Meissner wrote:
> > It turns out the testcase I submitted for pr79004 failed, since I had the
> > wrong syntax for \m and \M (you need to use {\m...\M} not "\m...\M").
> 
> You can use quotes, but then it is "\\m...\\M", like with all backslashes
> in quotes.  {} saves you from that headache.  "" has all three kinds of
> substitution applied to it; {} gets none.  "man tcl", the most enlightening
> 204 lines (many empty) about Tcl you'll ever read :-)

Yeah, I figured that I could do it with multiple \'s, but it was simpler to use
{} like the other examples, rather than figure out how many \'s are needed
(BTDT).

I don't remember whether tcl used grep, egrep, or perl style regexps, so I
don't tend to use the more complicated variants.

> 
> > I also forgot to add -mfloat128.  I committed this patch as obvious:
> 
> Thanks!

I did test it before submitting it. :-)

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] PR target/79004, Fix char/short -> _Float128 on PowerPC -mcpu=power9

2017-01-17 Thread Segher Boessenkool

On Tue, Jan 17, 2017 at 07:39:05PM -0500, Michael Meissner wrote:
> It turns out the testcase I submitted for pr79004 failed, since I had the
> wrong syntax for \m and \M (you need to use {\m...\M} not "\m...\M").

You can use quotes, but then it is "\\m...\\M", like with all backslashes
in quotes.  {} saves you from that headache.  "" has all three kinds of
substitution applied to it; {} gets none.  "man tcl", the most enlightening
204 lines (many empty) about Tcl you'll ever read :-)
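
The same two pitfalls (backslashes inside quotes, and a plain pattern
matching inside a longer mnemonic) can be shown outside Tcl.  The sketch
below is only an analogy in C++: it uses std::regex with the ECMAScript
word boundary \b in place of Tcl's \m and \M, and a raw string literal in
place of Tcl's braces.  found and check_word_matching are hypothetical
helpers.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if pattern matches anywhere in text (unanchored search).
bool found(const std::string &text, const std::string &pattern) {
  return std::regex_search(text, std::regex(pattern));
}

// Self-check of the cases discussed in the text.
void check_word_matching() {
  // A plain pattern also matches inside the longer mnemonic.
  assert(found("xscvqpdpo 0,1", "xscvqpdp"));
  // With word boundaries it no longer does.  The raw string literal needs
  // no doubled backslashes, much like braces in Tcl.
  assert(!found("xscvqpdpo 0,1", R"(\bxscvqpdp\b)"));
  // In an ordinary string literal the backslashes must be doubled,
  // much like quotes in Tcl.
  assert(found("xscvqpdp 0,1", "\\bxscvqpdp\\b"));
}
```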

> I also forgot to add -mfloat128.  I committed this patch as obvious:

Thanks!

Segher

Re: [PATCH] PR target/79004, Fix char/short -> _Float128 on PowerPC -mcpu=power9

2017-01-17 Thread Michael Meissner

On Wed, Jan 11, 2017 at 04:39:19PM -0600, Segher Boessenkool wrote:
> On Mon, Jan 09, 2017 at 07:32:27PM -0500, Michael Meissner wrote:
> > This patch fixes PR target/79004 by eliminating the optimization of avoiding
> > direct move if we are converting an 8/16-bit integer value from memory to
> > IEEE 128-bit floating point.
> > 
> > I opened a new bug (PR target/79038) to address the underlying issue that
> > the IEEE 128-bit floating point integer conversions were written before
> > small integers were allowed in the traditional Altivec registers.  This
> > meant that we had to use UNSPEC and explicit temporaries to get the integers
> > into the appropriate registers.
> > 
> > I have tested this bug by doing a bootstrap build and make check on a little
> > endian power8 system and using an assembler that knows about ISA 3.0
> > instructions.  I added a new test to verify the results.  Can I check this
> > into the trunk?  This is not an issue on GCC 6.x.
> 
> Okay, thanks!  Two comments:
> 
> > +/* { dg-final { scan-assembler-not " bl __"} } */
> > +/* { dg-final { scan-assembler "xscvdpqp"  } } */
> > +/* { dg-final { scan-assembler "xscvqpdp"  } } */
> 
> This line always matches if ...
> 
> > +/* { dg-final { scan-assembler "xscvqpdpo" } } */
> 
> ... this one does.  I recommend \m \M .
> 
> > +/* { dg-final { scan-assembler "xscvqpsdz" } } */
> > +/* { dg-final { scan-assembler "xscvqpswz" } } */
> > +/* { dg-final { scan-assembler "xscvsdqp"  } } */
> > +/* { dg-final { scan-assembler "xscvudqp"  } } */
> > +/* { dg-final { scan-assembler "lxsd"  } } */
> > +/* { dg-final { scan-assembler "lxsiwax"   } } */
> > +/* { dg-final { scan-assembler "lxsiwzx"   } } */
> > +/* { dg-final { scan-assembler "lxssp" } } */
> > +/* { dg-final { scan-assembler "stxsd" } } */
> > +/* { dg-final { scan-assembler "stxsiwx"   } } */
> > +/* { dg-final { scan-assembler "stxssp"} } */
> 
> There are many more than 14 instructions generated; maybe you want
> scan-assembler-times?

It turns out the testcase I submitted for pr79004 failed, since I had the wrong
syntax for \m and \M (you need to use {\m...\M} not "\m...\M").  I also forgot
to add -mfloat128.  I committed this patch as obvious:

2017-01-17  Michael Meissner  

PR target/79004
* gcc.target/powerpc/pr79004.c: Add -mfloat128 to the test
options.  Fix up the syntax for using \m and \M.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/testsuite/gcc.target/powerpc/pr79004.c
===
--- gcc/testsuite/gcc.target/powerpc/pr79004.c  (revision 244555)
+++ gcc/testsuite/gcc.target/powerpc/pr79004.c  (working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
-/* { dg-options "-mcpu=power9 -O2" } */
+/* { dg-options "-mcpu=power9 -O2 -mfloat128" } */
 
 #include 
 
@@ -101,18 +101,18 @@
 void to_uns_int_store_n (TYPE a, unsigned int *p, long n) { p[n] = (unsigned int)a; }
 void to_uns_long_store_n (TYPE a, unsigned long *p, long n) { p[n] = (unsigned long)a; }
 
-/* { dg-final { scan-assembler-not "\mbl __"   } } */
-/* { dg-final { scan-assembler "\mxscvdpqp\M"  } } */
-/* { dg-final { scan-assembler "\mxscvqpdp\M"  } } */
-/* { dg-final { scan-assembler "\mxscvqpdpo\M" } } */
-/* { dg-final { scan-assembler "\mxscvqpsdz\M" } } */
-/* { dg-final { scan-assembler "\mxscvqpswz\M" } } */
-/* { dg-final { scan-assembler "\mxscvsdqp\M"  } } */
-/* { dg-final { scan-assembler "\mxscvudqp\M"  } } */
-/* { dg-final { scan-assembler "\mlxsd\M"  } } */
-/* { dg-final { scan-assembler "\mlxsiwax\M"   } } */
-/* { dg-final { scan-assembler "\mlxsiwzx\M"   } } */
-/* { dg-final { scan-assembler "\mlxssp\M" } } */
-/* { dg-final { scan-assembler "\mstxsd\M" } } */
-/* { dg-final { scan-assembler "\mstxsiwx\M"   } } */
-/* { dg-final { scan-assembler "\mstxssp\M"} } */
+/* { dg-final { scan-assembler-not {\mbl __}   } } */
+/* { dg-final { scan-assembler {\mxscvdpqp\M}  } } */
+/* { dg-final { scan-assembler {\mxscvqpdp\M}  } } */
+/* { dg-final { scan-assembler {\mxscvqpdpo\M} } } */
+/* { dg-final { scan-assembler {\mxscvqpsdz\M} } } */
+/* { dg-final { scan-assembler {\mxscvqpswz\M} } } */
+/* { dg-final { scan-assembler {\mxscvsdqp\M}  } } */
+/* { dg-final { scan-assembler {\mxscvudqp\M}  } } */
+/* { dg-final { scan-assembler {\mlxsd\M}  } } */
+/* { dg-final { scan-assembler {\mlxsiwax\M}   } } */
+/* { dg-final { scan-assembler {\mlxsiwzx\M}   } } */

Re: [PATCH 2/2] IPA ICF: make algorithm stable to survive -fcompare-debug

2017-01-17 Thread Dominik Vogt

On Tue, Jan 10, 2017 at 03:40:00PM +0100, Martin Liška wrote:
> On 01/10/2017 02:56 PM, Richard Biener wrote:
> >On Mon, Jan 9, 2017 at 4:05 PM, Martin Liška  wrote:
> >>Second part of the patch does sorting of final congruence classes, its
> >>groups and items included in the groups according to DECL_UID.
> >>
> >>Both patches can bootstrap together on ppc64le-redhat-linux and survive 
> >>regression tests.
> >>
> >>Ready to be installed?
> >
> >Minor nit:
> >
> >+  auto_vec  classes;
> >+  for (hash_table::iterator it = m_classes.begin ();
> >+   it != m_classes.end (); ++it)
> >+classes.safe_push (*it);
> >
> >use quick_push and reserve_exact m_classes.elements () elements for
> >the classes vector before.
> 
> Thanks for hint.
> 
> >
> >+
> >+  classes.qsort (sort_congruence_class_groups_by_decl_uid);
> >
> >Ok with that change.
> 
> Installed as r244273.

This patch has somehow broken the error line information in some
s390x test case:

.../build/gcc/xgcc -B.../gcc/ 
.../gcc/testsuite/gcc.target/s390/target-attribute/tattr-2.c -O3 -march=zEC12 
-mno-htm -S -m64 -o tattr-2.s

...
.../gcc/testsuite/gcc.target/s390/target-attribute/tattr-2.c:
In function ‘a0’:

.../gcc/testsuite/gcc.target/s390/target-attribute/tattr-2.c:23:3:

error: Builtin ‘__builtin_tend’ is not supported without -mhtm
(default with -march=zEC12 and higher).

But function a0 is actually in lines 37 to 43.  It looks like the
message has used the same line number as the previous message.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
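
A minimal sketch of the review suggestion quoted above: copy the hash table
entries into a vector allocated once at its exact final size, then sort by a
stable key. This is plain C, not GCC's vec or hash_table API; struct group,
compare_groups_by_uid and sorted_groups are hypothetical names, and the uid
field merely stands in for DECL_UID.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a congruence class group; only the
   UID used as the sort key is modelled.  */
struct group
{
  unsigned uid;
};

/* qsort comparator: order groups by ascending UID.  Comparing
   explicitly avoids the wrap-around of "a - b" on unsigned values.  */
static int
compare_groups_by_uid (const void *a, const void *b)
{
  const struct group *ga = *(const struct group *const *) a;
  const struct group *gb = *(const struct group *const *) b;
  if (ga->uid < gb->uid)
    return -1;
  return ga->uid > gb->uid;
}

/* Copy COUNT group pointers from UNORDERED into a buffer allocated
   once at its exact final size, then sort the copy.  Returns the
   sorted buffer (caller frees), or NULL on allocation failure.  */
static struct group **
sorted_groups (struct group *const *unordered, size_t count)
{
  assert (unordered != NULL || count == 0);
  struct group **out = malloc ((count ? count : 1) * sizeof *out);
  if (out == NULL)
    return NULL;
  for (size_t i = 0; i < count; i++)
    out[i] = unordered[i];
  qsort (out, count, sizeof *out, compare_groups_by_uid);
  return out;
}
```

Because the output order depends only on the key, it no longer depends on
hash table iteration order, which is the property -fcompare-debug needs.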

Re: [PATCH 9c] callgraph: handle __RTL functions

2017-01-17 Thread Jeff Law


On 01/17/2017 02:21 AM, Richard Biener wrote:
>> So I guess my question is how do you ensure that even though cgraph hasn't
>> looked at code that we're appropriately conservative with how the file is
>> processed?  Particularly if there's other code in the source file that is
>> expected to interact with the RTL native code?
>
> I think that as we're finalizing the function from the FE before the
> cgraph is built
> (and even throw away the RTL?) we have no other choice than treating a __RTL
> function as black box which means treat it as possibly calling all function in
> the TU and reading/writing/taking the address of all decls in the TU.  Consider
>
> static int i;
> static void foo () {}
> int __RTL main()
> {
>   ... call foo, access i ...
> }
>
> which probably will right now optimize i and foo away and thus fail to link?

That's what I think will currently happen.  I don't know the IPA bits 
very well, so I could have missed something.

> But I think we can sort out these "details" when we run into them...

I can live with that -- I strongly suspect we're going to find all kinds 
of things of a similar nature the more we poke at this stuff.


With that in mind, I'll go ahead and ack 9c for the trunk with the 
explicit understanding that we know there's stuff that's going to break 
the harder we push it and we'll incrementally work to improve it.


It certainly helps that I see this as strictly for developers, not 
users.  So I'm willing to be more lax on a lot of stuff.


jeff

Re: [PATCH, MIPS] Target flag and build option to disable indexed memory OPs.

2017-01-17 Thread Doug Gilmore

On 01/17/2017 05:41 AM, Moore, Catherine wrote:
> 
>
>> ...
>> Having thought further I agree we can safely ignore DSP indexed load
>> and micromips LWXS on
>> the basis that DSP code will not run on a MIPS64 processor anyway (at
>> least none that I
>> know of) so the issue cannot occur and similarly for microMIPS, there
>> are no 64-bit cores.
>>
>> Restricting to just LWXC1/SWXC1/LDXC1/SDXC1 is therefore fine but
>> we should reflect
>> that in option names then.
>>
>> --with-lxc1-sxc1 --without-lxc1-sxc1
>> -mlxc1-sxc1
>>
>> These names reflect the internal macro that controls availability of
>> these instructions.
>>
>> Macro name: __mips_no_lxc1_sxc1
>> Defined when !ISA_HAS_LXC1_SXC1 so would be present even when
>> targeting a core that
>> doesn't have the instructions anyway.
>>
>> Any refinements to this Catherine?
>>
> No.  This plan looks good.
> 
Sounds good, I'll update the patch accordingly.

BTW, if we did guard all of the indexed memory OPs with a flag
there would be ~150 tests to clean up when configuring with indexed
memory OPs disabled.  When I tested with indexed memory OPs disabled
with the original patch, there were no additional regressions.

Also I'll be updating the bug report with my current take on what went
wrong with r216501.

Thanks,

Doug
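
A small sketch of how user code might test the proposed macro, assuming it
ends up spelled __mips_no_lxc1_sxc1 and is defined exactly when the indexed
FP loads/stores are unavailable, as described in the quoted plan. The function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the compiler may use LWXC1/SWXC1/LDXC1/SDXC1,
   0 when __mips_no_lxc1_sxc1 says they are unavailable.  On
   non-MIPS hosts the macro is never defined, so this returns 1
   there, which says nothing about the host's instruction set.  */
static int
lxc1_sxc1_enabled (void)
{
#ifdef __mips_no_lxc1_sxc1
  return 0;
#else
  return 1;
#endif
}

/* Indexed double load written in plain C: base[index].  With the
   instructions enabled a MIPS compiler may select LDXC1 for this;
   with them disabled it has to form the address separately.  */
static double
load_indexed (const double *base, size_t index)
{
  assert (base != NULL);
  return base[index];
}
```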

Re: [PATCH][libstdc++-v3, fuchsia] Add support for fuchsia targets to libstdc++

2017-01-17 Thread Josh Conner via gcc-patches


On 1/17/17 3:27 PM, Jonathan Wakely wrote:

On 17/01/17 14:55 -0800, Josh Conner via libstdc++ wrote:

On 1/17/17 2:35 PM, Jonathan Wakely wrote:

On 17/01/17 13:15 -0800, Josh Conner via libstdc++ wrote:

This patch adds fuchsia support to libstdc++. OK for trunk?


Is fuchsia only supported as a cross-compiler target, not native?
For the moment. I have a patch that adds fuchsia support to 
libtool.m4 
(http://lists.gnu.org/archive/html/libtool-patches/2016-12/msg0.html), 
but I haven't gotten a response yet.


OK, thanks. I don't think your change can cause problems for any
existing targets, so it's OK for trunk even though we're right at the
end of stage 3.

Applied, thanks!

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread Wilco Dijkstra

kugan wrote:
> Wilco Dijkstra wrote:
> > +   /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
> > +   if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))

> Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is 
> done only for SImode. Am I missing something?

The diff doesn't show enough context, but this is inside an if that checks
for DImode shifts. Note the shift count is SImode. 

> Also, what was the motivation for this patch. Is that to improve the 
> maintainability of the arm back-end?

These particular patterns should never have existed. Optimized
expansions should be added to arm_emit_coreregs_64bit_shift.

You may have noticed a few patches have been proposed recently to 
improve the generated code of DImode operations (PR77308).
The key realization was that GCC will generate absolutely terrible code 
unless either all DImode operations are split before register allocation,
or we only use Neon instructions. There is no middle ground here: trying
to allocate DImode registers from only 5 available register pairs (if lucky)
just isn't going to work.

So the goal is to enable early splitting in all DImode patterns. Removing
no-split multi-instruction patterns helps; these are a bad idea anyway.

Wilco
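
To illustrate why a DImode left shift by 1 can be treated as an add (the
"prefer adddi3" comment in the patch), here is a sketch in portable C of the
operation split into 32-bit halves. It models the arithmetic only; struct pair
and shl1_as_add are hypothetical names, and this is not the code the ARM
back end emits.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, the way a DImode value
   occupies a register pair on a 32-bit core.  */
struct pair
{
  uint32_t lo;
  uint32_t hi;
};

/* Shift left by one, computed as x + x on the halves: add the low
   words, then add the high words plus the carry out of the low
   add (the ADDS/ADC idiom).  */
static struct pair
shl1_as_add (struct pair x)
{
  struct pair r;
  r.lo = x.lo + x.lo;
  uint32_t carry = r.lo < x.lo;   /* 1 when the low add wrapped */
  r.hi = x.hi + x.hi + carry;     /* top bit is discarded, as in a shift */
  return r;
}
```

Each half is an independent 32-bit operation, so the value never needs to be
allocated as a 64-bit register pair.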

Re: [PATCH][libstdc++-v3, fuchsia] Add support for fuchsia targets to libstdc++

2017-01-17 Thread Jonathan Wakely


On 17/01/17 14:55 -0800, Josh Conner via libstdc++ wrote:

On 1/17/17 2:35 PM, Jonathan Wakely wrote:

On 17/01/17 13:15 -0800, Josh Conner via libstdc++ wrote:

This patch adds fuchsia support to libstdc++. OK for trunk?


Is fuchsia only supported as a cross-compiler target, not native?
For the moment. I have a patch that adds fuchsia support to libtool.m4 (http://lists.gnu.org/archive/html/libtool-patches/2016-12/msg0.html), 
but I haven't gotten a response yet.


OK, thanks. I don't think your change can cause problems for any
existing targets, so it's OK for trunk even though we're right at the
end of stage 3.

Re: [PATCH][libstdc++-v3, fuchsia] Add support for fuchsia targets to libstdc++

2017-01-17 Thread Josh Conner via gcc-patches


On 1/17/17 2:35 PM, Jonathan Wakely wrote:

On 17/01/17 13:15 -0800, Josh Conner via libstdc++ wrote:

This patch adds fuchsia support to libstdc++. OK for trunk?


Is fuchsia only supported as a cross-compiler target, not native?
For the moment. I have a patch that adds fuchsia support to libtool.m4 
(http://lists.gnu.org/archive/html/libtool-patches/2016-12/msg0.html), 
but I haven't gotten a response yet.


- Josh

Re: [PATCH][libstdc++-v3, fuchsia] Add support for fuchsia targets to libstdc++

2017-01-17 Thread Jonathan Wakely


On 17/01/17 13:15 -0800, Josh Conner via libstdc++ wrote:

This patch adds fuchsia support to libstdc++. OK for trunk?


Is fuchsia only supported as a cross-compiler target, not native?

Re: patch to fix PR79058

2017-01-17 Thread Christophe Lyon

Hi Vladimir,

On 17 January 2017 at 17:14, Vladimir Makarov  wrote:
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79058
>
> The patch was successfully bootstrapped and tested on x86-64.
>
> Committed as rev. 244535.
>
>

The new testcase fails to compile on arm*-linux-gnueabihf configurations
when using -mthumb (either via runtestflags or by configuring gcc
--with-mode=thumb), with the error
"sorry, unimplemented: Thumb-1 hard-float VFP ABI".

A few other tests suffer from the same problem, though, and the guard is
tricky to get right :-(

Even those using something like dg-require-effective-target arm_arch_v4t_ok
are not protected, because this effective target relies on preprocessor
defines only. Adding a variable declaration in arm_arch_FUNC_ok (in
target-supports.exp, like check_effective_target_arm_thumb1) would make the
effective target test fail, but I never understood why it isn't desirable
to do so (I proposed a patch years ago :-)

Christophe

Re: [PATCH] -mstack-protector-guard and friends (PR78875)

2017-01-17 Thread David Edelsohn

On Tue, Jan 17, 2017 at 12:43 PM, Segher Boessenkool
 wrote:
> Currently, on PowerPC, code compiled with -fstack-protector will load
> the canary from -0x7010(13) (for -m64) or from -0x7008(2) (for -m32)
> if GCC was compiled against GNU libc 2.4 or newer or some other libc
> that supports -fstack-protector, and from the global variable
> __stack_chk_guard otherwise.
>
> This does not work well for Linux and other OS kernels and similar.
> For such non-standard applications, this patch creates a few new
> command-line options.  The relevant new use cases are:
>
> -mstack-protector-guard=global
> Use the __stack_chk_guard variable, no matter how this GCC was
> configured.
>
> -mstack-protector-guard=tls
> Use the canary from TLS.  This will error out if this GCC was built
> with a C library that does not support it.
>
> -mstack-protector-guard=tls -mstack-protector-guard-reg=<reg>
> -mstack-protector-guard-offset=<offset>
> Load the canary from offset <offset> from base register <reg>.
>
>
> Bootstrap and test running.  Is this okay for trunk?
>
>
> Segher
>
>
> 2017-01-17  Segher Boessenkool  
>
> PR target/78875
>
> * config/rs6000/rs6000-opts.h (stack_protector_guard): New enum.
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Handle
> the new options.
> * config/rs6000/rs6000.md (stack_protect_set): Handle the new more
> flexible settings.
> (stack_protect_test): Ditto.
> * config/rs6000/rs6000.opt (mstack-protector-guard=,
> mstack-protector-guard-reg=, mstack-protector-guard-offset=): New
> options.
> * doc/invoke.texi (Option Summary) [RS/6000 and PowerPC Options]:
> Add -mstack-protector-guard=, -mstack-protector-guard-reg=, and
> -mstack-protector-guard-offset=.
> (RS/6000 and PowerPC Options): Ditto.
>
> gcc/testsuite/
> * gcc.target/powerpc/ssp-1.c: New testcase.
> * gcc.target/powerpc/ssp-2.c: New testcase.

Okay.

Thanks, David
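
For readers unfamiliar with what the guard options select, here is a
hand-written sketch of the check a protected function performs. It is
conceptual only: the compiler emits this sequence itself, and the options
above only choose where the guard value is loaded from. demo_guard and
guarded_copy are hypothetical names; the real global is __stack_chk_guard.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the guard value.  The real global is
   __stack_chk_guard and is owned by the C library or kernel, so
   this sketch deliberately uses a different name.  */
static uintptr_t demo_guard = (uintptr_t) 0x5bd1e995u;

/* Copy SRC into a fixed local buffer the way an instrumented
   function would: load the guard on entry, do the work, compare on
   exit.  Returns 0 on success, -1 when SRC does not fit (the copy
   is refused rather than overflowed) or the guard value changed.  */
static int
guarded_copy (const char *src)
{
  uintptr_t canary = demo_guard;        /* prologue: load the canary */
  char buf[16];
  size_t len = strlen (src);
  if (len >= sizeof buf)
    return -1;
  memcpy (buf, src, len + 1);
  if (canary != demo_guard)             /* epilogue: verify the canary */
    return -1;                          /* real code calls __stack_chk_fail */
  return buf[len] == '\0' ? 0 : -1;
}
```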

Re: [PATCH] Add support for Fuchsia (OS)

2017-01-17 Thread Josh Conner via gcc-patches

Gerald -

Attached is my recommended patch for changes to the web docs describing
Fuchsia support. Please let me know if there's anything else I can do.

Thanks!

- Josh

On 12/11/16 7:24 AM, Gerald Pfeifer wrote:

On Thu, 8 Dec 2016, Josh Conner wrote:

This patch adds support to gcc for the Fuchsia OS
(https://fuchsia.googlesource.com/).

Once this is in, can you please suggest a news item for our
main page?
(You could cook a patch following https://gcc.gnu.org/about.html
or suggest wording or an HTML snippet, and I'll take it from there.)
Similarly, would be good to add this to gcc-7/changes.html.
Gerald

Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.1037
diff -r1.1037 index.html
48a49,53
> Fuchsia OS support
>  [2017-10-01]
>  <a href="https://fuchsia.googlesource.com/">Fuchsia OS</a>
>  support was added to GCC, contributed by Google.
> 
Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.39
diff -r1.39 changes.html
533a534,539
> Fuchsia
>
>  Support has been added for the
>  <a href="https://fuchsia.googlesource.com/">Fuchsia OS</a>.
>
>

[wwwdocs] Document significant Ada change

2017-01-17 Thread Eric Botcazou

Applied.

-- 
Eric Botcazou
Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.38
diff -r1.38 changes.html
57c57,61
< 
---
> Ada
> 
>   On mainstream native platforms, Ada programs no longer require the stack
>   to be made executable in order to run properly.
>

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread kugan


Hi Wilco,

On 18/01/17 06:23, Wilco Dijkstra wrote:

ChangeLog:
2017-01-17  Wilco Dijkstra  

* config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
(arm_ashldi3_1bit): Remove pattern.
(ashrdi3): Remove shift by 1 expansion.
(arm_ashrdi3_1bit): Remove pattern.
(lshrdi3): Remove shift by 1 expansion.
(arm_lshrdi3_1bit): Remove pattern.
* config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
cost of ashldi3 by 1.
* config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
(di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
   + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
  if (speed_p)
*cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
  return true;
}
Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is 
done only for SImode. Am I missing something?


Also, what was the motivation for this patch. Is that to improve the 
maintainability of the arm back-end?


Thanks,
Kugan

Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-17 Thread Andrew Waterman

On Tue, Jan 17, 2017 at 12:48 PM, Karsten Merker  wrote:
> On Mon, Jan 16, 2017 at 09:37:15PM -0800, Palmer Dabbelt wrote:
>> On Sat, 14 Jan 2017 02:05:27 PST (-0800), mer...@debian.org wrote:
>> > Palmer Dabbelt wrote:
>> >
>> >> diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
>> >> new file mode 100644
>> >> index 000..045f6cc
>> >> --- /dev/null
>> >> +++ b/gcc/config/riscv/linux.h
>> >> [...]
>> >>  +#define GLIBC_DYNAMIC_LINKER "/lib" XLEN_SPEC "/" ABI_SPEC "/ld.so.1"
>> >
>> > [with XLEN_SPEC being either 32 or 64 and ABI_SPEC being one of
>> >  ilp32, ilp32f, ilp32d, lp64, lp64f, lp64d]
> [...]
>> > I am not fully happy with the way the dynamic linker path (which
>> > gets embedded into every Linux executable built by gcc and
>> > therefore cannot be changed later) is defined here.  The dynamic
>> > linker path must be unique over all platforms for which a Linux
>> > port exists to make multiarch installations (i.e. having
>> > dynamically linked binaries for multiple architectures/ABIs in
>> > the same root filesystem) possible.  The path specifier as cited
>> > above contains nothing that makes the linker path inherently
>> > specific to RISC-V.  While there is AFAIK no other architecture
>> > that currently uses exactly this specific linker path model with
>> > the ABI specifier as a separate subdirectory (instead of encoding
>> > it into the filename), so that there technically isn't a naming
>> > conflict, I think that we should follow the convention of the
>> > other "modern" Linux architectures, which all include the
>> > architecture name in their linker path:
>> >
>> >   * arm64:/lib/ld-linux-aarch64.so.1
>> >   * armhf:/lib/ld-linux-armhf.so.3
>> >   * ia64: /lib/ld-linux-ia64.so.2
>> >   * mips n64: /lib64/ld-linux-mipsn8.so.1
>> >   * nios2:/lib/ld-linux-nios2.so.1
>> >   * x86_64:   /lib64/ld-linux-x86-64.so.2
>> >
>> > So the actual ld.so binary should be called something like
>> > "ld-linux-rv.so.1" instead of just "ld.so.1". With everything
>> > else staying the same, that would give us a dynamic linker path
>> > along the lines of "/lib64/lp64f/ld-linux-rv.so.1" for an RV64G
>> > system.
> [...]
>> Just to be clear, the paths you'd like would look exactly like
>>
>>   rv32-ilp32: /lib32/ilp32/ld-linux-rv.so.1
>>   rv64-lp64d: /lib64/lp64d/ld-linux-rv.so.1
>>
>> ?
>
> Yes, that is what I had in mind.
>
>> If so, that should be a pretty straight-forward change.  I'll
>> incorporate it into our v2 patchset.  I'd also be OK with something
>> like "/lib64/lp64d/ld-linux-rv64imafd-lp64d.so.1", if for some reason
>> that's better (it looks a bit more like the other architectures to
>> me).  I'm really OK with pretty much anything here, so feel free to
>> offer suggestions -- otherwise I'll just go with what's listed above.
>
> Including the ABI specifier twice, i.e. both as a subdirectory
> name (.../lp64d/...) and as part of the ld.so filename
> (ld-linux-rv64imafd-lp64d.so.1) doesn't seem useful to me. The
> ABI specifier must be a part of the dynamic linker path, but
> having it in there once should be enough :-).
>
> Whether one encodes the ABI specifier inside the ld.so filename
> or in the form of a subdirectory AFAICS doesn't make any
> technical difference and appears to me largely as a matter of
> taste.  My proposal above was just the minimal possible change
> against the existing code that would fulfill my aim.
>
> The other Linux platforms commonly don't use subdirectories and
> instead encode the ABI specifier as part of the ld.so filename
> (e.g. the "hf" part in /lib/ld-linux-armhf.so.3 specifies
> hardfloat EABI, and the "n8" part in
> /lib64/ld-linux-mipsn8.so.1 specifies a specific MIPS ABI variant),
> while RISC-V currently encodes the ABI variant as a subdirectory name.
>
> Stefan O'Rear wrote on the RISC-V sw-dev list that he would prefer to
> encode the ABI specifier as part of the ld.so filename and put
> everything in /lib instead of differentiating the directory by XLEN,
> which would keep things largely similar to the other Linux platforms.
> Based on your two examples above that would result in something like:
>
> rv32-ilp32: /lib/ld-linux-rv32ilp32.so.1
> rv64-lp64d: /lib/ld-linux-rv64lp64d.so.1
>
> I am happy with any of these variants as long as the resulting
> naming scheme encodes both platform and ABI and thereby makes
> sure that the dynamic linker path is free of conflicts in a
> multiarch installation.  Stefan's proposal is nearer to what
> other Linux platforms do, but I assume that Andrew Waterman,
> who has introduced the current RISC-V scheme with the ABI
> subdirectories, has had a reason to do things the way they are.
> Andrew, can you perhaps comment on this?

Thanks for taking the time to ponder this.  I agree that the important
point is that the ABI (hence XLEN) is encoded somewhere in the
filename--and that once is enough :-).

We went with the /libXX/YY/ approach because, on a
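
The three naming schemes discussed in this thread can be compared side by
side with a small formatter. This is only an illustration: the enum and
function names are hypothetical, and GCC itself builds the path from spec
strings (XLEN_SPEC, ABI_SPEC), not at run time.

```c
#include <assert.h>
#include <stdio.h>

/* Naming schemes discussed in the thread.  */
enum scheme
{
  SCHEME_SUBDIR_PLAIN,   /* /lib64/lp64d/ld.so.1           (posted patch) */
  SCHEME_SUBDIR_RV,      /* /lib64/lp64d/ld-linux-rv.so.1  (Karsten)      */
  SCHEME_FLAT            /* /lib/ld-linux-rv64lp64d.so.1   (Stefan)       */
};

/* Format the dynamic linker path for XLEN (32 or 64) and ABI (for
   example "ilp32" or "lp64d") into BUF.  Returns the snprintf
   result: the length the full path needs, excluding the NUL.  */
static int
dynamic_linker_path (char *buf, size_t size, enum scheme s,
                     int xlen, const char *abi)
{
  assert (buf != NULL && abi != NULL);
  assert (xlen == 32 || xlen == 64);
  switch (s)
    {
    case SCHEME_SUBDIR_PLAIN:
      return snprintf (buf, size, "/lib%d/%s/ld.so.1", xlen, abi);
    case SCHEME_SUBDIR_RV:
      return snprintf (buf, size, "/lib%d/%s/ld-linux-rv.so.1", xlen, abi);
    case SCHEME_FLAT:
      return snprintf (buf, size, "/lib/ld-linux-rv%d%s.so.1", xlen, abi);
    }
  return -1;
}
```

Only the first scheme produces a path with nothing RISC-V specific in it,
which is the multiarch concern raised above.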

[PATCH][libstdc++-v3, fuchsia] Add support for fuchsia targets to libstdc++

2017-01-17 Thread Josh Conner via gcc-patches


This patch adds fuchsia support to libstdc++. OK for trunk?

Thanks -

Josh

2017-01-17  Joshua Conner  

* crossconfig.m4: Add fuchsia OS.
* configure: Regenerate.

Index: configure
===
--- configure   (revision 244542)
+++ configure   (working copy)
@@ -53327,6 +53327,12 @@
 done
 
 ;;
+
+  *-fuchsia*)
+SECTION_FLAGS='-ffunction-sections -fdata-sections'
+
+;;
+
   *-hpux*)
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
 
Index: crossconfig.m4
===
--- crossconfig.m4  (revision 244542)
+++ crossconfig.m4  (working copy)
@@ -134,6 +134,12 @@
 fi
 AC_CHECK_FUNCS(__cxa_thread_atexit)
 ;;
+
+  *-fuchsia*)
+SECTION_FLAGS='-ffunction-sections -fdata-sections'
+AC_SUBST(SECTION_FLAGS)
+;;
+
   *-hpux*)
 SECTION_FLAGS='-ffunction-sections -fdata-sections'
 AC_SUBST(SECTION_FLAGS)

Re: [PATCH] Fix testcase for PR c/78304

2017-01-17 Thread David Malcolm

On Tue, 2017-01-17 at 10:45 +0100, Christophe Lyon wrote:
> On 16 January 2017 at 19:50, David Malcolm 
> wrote:
> > On Mon, 2017-01-16 at 13:31 +0100, Rainer Orth wrote:
> > > Hi Christophe,
> > > 
> > > > > Successfully bootstrapped on x86_64-pc-linux-gnu;
> > > > > adds 34 PASS results to gcc.sum.
> > > > > 
> > > > These 2 tests fail on arm:
> > > > 
> > > >   gcc.dg/format/pr78304.c (test for warnings, line 9)
> > > >   gcc.dg/format/pr78304.c   -DWIDE   (test for warnings, line
> > > > 9)
> > > 
> > > also on sparc-sun-solaris2.12 and i386-pc-solaris2.12, 32-bit
> > > only.
> > > 
> > >   Rainer
> > 
> > Sorry about the failures.
> > 
> > The tests I committed made assumptions about size_t and long
> > being invalid for use with "%u".
> > 
> > The tests only need some invalid type, so this patch converts
> > them to attempt a print "const char *" with "%u", which should be
> > invalid for every target (and hence generate the expected warning).
> > 
> > I reproduced the problem on i686-pc-linux-gnu, and the patch fixes
> > it there.
> > 
> > Committed to trunk as r244502.
> > 
> > Does this fix the test for you?
> > 
> Yes, they pass now.
> Thanks

Thanks; I've closed out the bug again.

> > Thanks; sorry again.
> > Dave
> > 
> > gcc/testsuite/ChangeLog:
> > PR c/78304
> > * gcc.dg/format/pr78304.c: Convert argument from integral
> > type
> > to a pointer.
> > * gcc.dg/format/pr78304-2.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/format/pr78304-2.c | 4 ++--
> >  gcc/testsuite/gcc.dg/format/pr78304.c   | 4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/format/pr78304-2.c
> > b/gcc/testsuite/gcc.dg/format/pr78304-2.c
> > index 5ee6d65..83648c4 100644
> > --- a/gcc/testsuite/gcc.dg/format/pr78304-2.c
> > +++ b/gcc/testsuite/gcc.dg/format/pr78304-2.c
> > @@ -5,7 +5,7 @@ extern int printf (const char *, ...);
> > 
> >  # define PRIu32"u"
> > 
> > -void test (long size)
> > +void test (const char *msg)
> >  {
> > -  printf ("size: %" PRIu32 "\n", size); /* { dg-warning "expects
> > argument of type" } */
> > +  printf ("size: %" PRIu32 "\n", msg); /* { dg-warning "expects
> > argument of type" } */
> >  }
> > diff --git a/gcc/testsuite/gcc.dg/format/pr78304.c
> > b/gcc/testsuite/gcc.dg/format/pr78304.c
> > index d0a96f6..f6ad807 100644
> > --- a/gcc/testsuite/gcc.dg/format/pr78304.c
> > +++ b/gcc/testsuite/gcc.dg/format/pr78304.c
> > @@ -4,7 +4,7 @@
> >  #include 
> >  #include 
> > 
> > -void test (size_t size)
> > +void test (const char *msg)
> >  {
> > -  printf ("size: %" PRIu32 "\n", size); /* { dg-warning "expects
> > argument of type" } */
> > +  printf ("size: %" PRIu32 "\n", msg); /* { dg-warning "expects
> > argument of type" } */
> >  }
> > --
> > 1.8.5.3
> >
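
As background to the fix above: on some 32-bit targets size_t or long has the
same width as unsigned int, which is presumably why the original tests did not
warn everywhere, while a pointer never matches "%u". A sketch of the matching
specifier for each type follows; the function names are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a uint32_t with the matching <inttypes.h> macro.  */
static int
format_u32 (char *buf, size_t size, uint32_t value)
{
  return snprintf (buf, size, "size: %" PRIu32, value);
}

/* Format a size_t with its own length modifier, %zu (C99).  */
static int
format_size (char *buf, size_t size, size_t value)
{
  return snprintf (buf, size, "size: %zu", value);
}

/* Format a string with %s.  Passing MSG to "%u" instead is the
   mismatch the adjusted tests provoke on purpose.  */
static int
format_msg (char *buf, size_t size, const char *msg)
{
  return snprintf (buf, size, "msg: %s", msg);
}
```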

[PATCH] Introduce opt_pass::skip virtual function

2017-01-17 Thread David Malcolm

On Tue, 2017-01-17 at 10:28 +0100, Richard Biener wrote:
> On Mon, Jan 16, 2017 at 10:42 PM, Jeff Law  wrote:
> > On 01/09/2017 07:38 PM, David Malcolm wrote:
> > >
> > > gcc/ChangeLog:
> > > * passes.c: Include "insn-addr.h".
> > > (should_skip_pass_p): Add logging.  Update logic for
> > > running
> > > "expand" to be compatible with both __GIMPLE and __RTL.
> > >  Guard
> > > property-provider override so it is only done for gimple
> > > passes.
> > > Don't skip dfinit.
> > > (skip_pass): New function.
> > > (execute_one_pass): Call skip_pass when skipping passes.
> > > ---
> > >  gcc/passes.c | 65
> > > +---
> > >  1 file changed, 58 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/gcc/passes.c b/gcc/passes.c
> > > index 31262ed..6954d1e 100644
> > > --- a/gcc/passes.c
> > > +++ b/gcc/passes.c
> > > @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not
> > > see
> > >  #include "cfgrtl.h"
> > >  #include "tree-ssa-live.h"  /* For remove_unused_locals.  */
> > >  #include "tree-cfgcleanup.h"
> > > +#include "insn-addr.h" /* for INSN_ADDRESSES_ALLOC.  */
> >
> > insn-addr?  Yuk.
> >
> >
> > >
> > >  using namespace gcc;
> > >
> > > @@ -2315,26 +2316,73 @@ should_skip_pass_p (opt_pass *pass)
> > >if (!cfun->pass_startwith)
> > >  return false;
> > >
> > > -  /* We can't skip the lowering phase yet -- ideally we'd
> > > - drive that phase fully via properties.  */
> > > -  if (!(cfun->curr_properties & PROP_ssa))
> > > -return false;
> > > + /* For __GIMPLE functions, we have to at least start when we
> > > leave
> > > + SSA.  */
> > > +  if (pass->properties_destroyed & PROP_ssa)
> > > +{
> > > +  if (!quiet_flag)
> > > +   fprintf (stderr, "starting anyway when leaving SSA:
> > > %s\n",
> > > pass->name);
> > > +  cfun->pass_startwith = NULL;
> > > +  return false;
> > > +}
> >
> > This seems to need a comment -- it's not obvious how destroying the
> > SSA
> > property maps to a pass that can not be skipped.
> > >
> > >
> > >
> > > -  /* And also run any property provider.  */
> > > -  if (pass->properties_provided != 0)
> > > +  /* Run any property provider.  */
> > > +  if (pass->type == GIMPLE_PASS
> > > +  && pass->properties_provided != 0)
> > >  return false;
> >
> > So comment needed here too.  I read this as "if a gimple pass
> > provides a
> > property then it should not be skipped.  Which means that an RTL
> > pass that
> > provides a property can?
> >
> >
> > >
> > > +  /* Don't skip df init; later RTL passes need it.  */
> > > +  if (strstr (pass->name, "dfinit") != NULL)
> > > +return false;
> >
> > Which seems like a failing in RTL passes saying they need DF init.
> >
> >
> >
> > > +/* Skip the given pass, for handling passes before "startwith"
> > > +   in __GIMPLE and__RTL-marked functions.
> > > +   In theory, this ought to be a no-op, but some of the RTL
> > > passes
> > > +   need additional processing here.  */
> > > +
> > > +static void
> > > +skip_pass (opt_pass *pass)
> >
> > ...
> > This all feels like a failing in how we handle state in the RTL
> > world. And I
> > suspect it's prone to error.  Imagine if I'm hacking on something
> > in the RTL
> > world and my code depends on something else being set up.   I
> > really ought
> > to have a way within my pass to indicate what I depend on. Having
> > it hidden
> > away in passes.c makes it easy to miss/forget.
> >
> >
> > > +{
> > > +  /* Pass "reload" sets the global "reload_completed", and many
> > > + things depend on this (e.g. instructions in .md files).  */
> > > +  if (strcmp (pass->name, "reload") == 0)
> > > +reload_completed = 1;
> >
> > Seems like this ought to be a property provided by LRA/reload.
> >
> >
> > > +
> > > +  /* The INSN_ADDRESSES vec is normally set up by
> > > + shorten_branches; set it up for the benefit of passes that
> > > + run after this.  */
> > > +  if (strcmp (pass->name, "shorten") == 0)
> > > +INSN_ADDRESSES_ALLOC (get_max_uid ());
> >
> > Similarly ought to be provided by shorten-branches
> >
> > > +
> > > +  /* Update the cfg hooks as appropriate.  */
> > > +  if (strcmp (pass->name, "into_cfglayout") == 0)
> > > +{
> > > +  cfg_layout_rtl_register_cfg_hooks ();
> > > +  cfun->curr_properties |= PROP_cfglayout;
> > > +}
> > > +  if (strcmp (pass->name, "outof_cfglayout") == 0)
> > > +{
> > > +  rtl_register_cfg_hooks ();
> > > +  cfun->curr_properties &= ~PROP_cfglayout;
> > > +}
> > > +}
> >
> > This feels somewhat different, but still a hack.
> >
> > I don't have strong suggestions on how to approach this, but what
> > we've got
> > here feels like a hack and one prone to bitrot.
>
> All the above needs a bit of cleanup in the way we use (or not use)
> PROP_xxx.
> For example right now you can't startwith a __GIMPLE with a pass
> inside the
>

Re: [PATCH] -mstack-protector-guard and friends (PR78875)

2017-01-17 Thread Segher Boessenkool

On Tue, Jan 17, 2017 at 05:43:54PM +, Segher Boessenkool wrote:
> Currently, on PowerPC, code compiled with -fstack-protector will load
> the canary from -0x7010(13) (for -m64) or from -0x7008(2) (for -m32)
> if GCC was compiled against GNU libc 2.4 or newer or some other libc
> that supports -fstack-protector, and from the global variable
> __stack_chk_guard otherwise.
> 
> This does not work well for Linux and other OS kernels and similar.
> For such non-standard applications, this patch creates a few new
> command-line options.  The relevant new use cases are:
> 
> -mstack-protector-guard=global
> Use the __stack_chk_guard variable, no matter how this GCC was
> configured.
> 
> -mstack-protector-guard=tls
> Use the canary from TLS.  This will error out if this GCC was built
> with a C library that does not support it.
> 
> -mstack-protector-guard=tls -mstack-protector-guard-reg=<reg>
> -mstack-protector-guard-offset=<offset>
> Load the canary from offset <offset> from base register <reg>.
> 
> 
> Bootstrap and test running.  Is this okay for trunk?

No problems found on powerpc64-linux {-m32,-m64}.


Segher

Backports to 6.x

2017-01-17 Thread Jakub Jelinek

Hi!

I've backported a couple of patches to gcc-6-branch after
bootstrapping/regtesting them on x86_64-linux and i686-linux.

Jakub
2017-01-17  Jakub Jelinek  

Backported from mainline
2016-12-21  Jakub Jelinek  

PR fortran/78866
* openmp.c (resolve_omp_clauses): Diagnose assumed size arrays in
OpenMP map, to and from clauses.
* trans-openmp.c: Include diagnostic-core.h, temporarily redefining
GCC_DIAG_STYLE to __gcc_tdiag__.
(gfc_omp_finish_clause): Diagnose implicitly mapped assumed size
arrays.

* gfortran.dg/gomp/map-1.f90: Add expected error.
* gfortran.dg/gomp/pr78866-1.f90: New test.
* gfortran.dg/gomp/pr78866-2.f90: New test.

--- gcc/fortran/openmp.c(revision 243859)
+++ gcc/fortran/openmp.c(revision 243860)
@@ -3530,6 +3530,11 @@ resolve_omp_clauses (gfc_code *code, gfc
else
  resolve_oacc_data_clauses (n->sym, n->where, name);
  }
+   else if (list != OMP_CLAUSE_DEPEND
+&& n->sym->as
+&& n->sym->as->type == AS_ASSUMED_SIZE)
+ gfc_error ("Assumed size array %qs in %s clause at %L",
+n->sym->name, name, &n->where);
  }
 
if (list != OMP_LIST_DEPEND)
--- gcc/fortran/trans-openmp.c  (revision 243859)
+++ gcc/fortran/trans-openmp.c  (revision 243860)
@@ -37,6 +37,11 @@ along with GCC; see the file COPYING3.
 #include "arith.h"
 #include "omp-low.h"
 #include "gomp-constants.h"
+#undef GCC_DIAG_STYLE
+#define GCC_DIAG_STYLE __gcc_tdiag__
+#include "diagnostic-core.h"
+#undef GCC_DIAG_STYLE
+#define GCC_DIAG_STYLE __gcc_gfc__
 
 int ompws_flags;
 
@@ -1028,6 +1033,21 @@ gfc_omp_finish_clause (tree c, gimple_se
 return;
 
   tree decl = OMP_CLAUSE_DECL (c);
+
+  /* Assumed-size arrays can't be mapped implicitly, they have to be
+ mapped explicitly using array sections.  */
+  if (TREE_CODE (decl) == PARM_DECL
+  && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
+  && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
+  && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
+   GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
+== NULL)
+{
+  error_at (OMP_CLAUSE_LOCATION (c),
+   "implicit mapping of assumed size array %qD", decl);
+  return;
+}
+
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
   if (POINTER_TYPE_P (TREE_TYPE (decl)))
 {
--- gcc/testsuite/gfortran.dg/gomp/pr78866-1.f90(nonexistent)
+++ gcc/testsuite/gfortran.dg/gomp/pr78866-1.f90(revision 243860)
@@ -0,0 +1,19 @@
+! PR fortran/78866
+! { dg-do compile }
+
+subroutine pr78866(x)
+  integer :: x(*)
+!$omp target map(x)! { dg-error "Assumed size array" }
+  x(1) = 1
+!$omp end target
+!$omp target data map(tofrom: x)   ! { dg-error "Assumed size array" }
+!$omp target update to(x)  ! { dg-error "Assumed size array" }
+!$omp target update from(x)! { dg-error "Assumed size array" }
+!$omp end target data
+!$omp target map(x(:23))   ! { dg-bogus "Assumed size array" }
+  x(1) = 1
+!$omp end target
+!$omp target map(x(:)) ! { dg-error "upper bound of assumed size array section" }
+  x(1) = 1 ! { dg-error "not a proper array section" "" { target *-*-* } .-1 }
+!$omp end target
+end
--- gcc/testsuite/gfortran.dg/gomp/pr78866-2.f90(nonexistent)
+++ gcc/testsuite/gfortran.dg/gomp/pr78866-2.f90(revision 243860)
@@ -0,0 +1,9 @@
+! PR fortran/78866
+! { dg-do compile }
+
+subroutine pr78866(x)
+  integer :: x(*)
+!$omp target   ! { dg-error "implicit mapping of assumed size array" }
+  x(1) = 1
+!$omp end target
+end
--- gcc/testsuite/gfortran.dg/gomp/map-1.f90(revision 243859)
+++ gcc/testsuite/gfortran.dg/gomp/map-1.f90(revision 243860)
@@ -70,7 +70,7 @@ subroutine test(aas)
   ! { dg-error "Rightmost upper bound of assumed size array section not specified" "" { target *-*-* } 68 }
   ! { dg-error "'aas' in MAP clause at \\\(1\\\) is not a proper array section" "" { target *-*-* } 68 }
 
-  !$omp target map(aas) ! { dg-error "The upper bound in the last dimension must appear" "" { xfail *-*-* } }
+  !$omp target map(aas) ! { dg-error "Assumed size array" }
   !$omp end target
 
   !$omp target map(aas(5:7))
2017-01-17  Jakub Jelinek  

Backported from mainline
2017-01-04  Jakub Jelinek  

PR c++/71182
* parser.c (cp_lexer_previous_token): Use vec_safe_address in the
assertion, as lexer->buffer may be NULL.

* g++.dg/cpp0x/pr71182.C: New test.

--- gcc/cp/parser.c (revision 244069)
+++ gcc/cp/parser.c (revision 244070)
@@ -766,7 +766,7 @@ cp_lexer_previous_token (cp_lexer *lexer
   /* Skip

Re: [C++ PATCH] PR 79091, ICE with unnamed enum mangle

2017-01-17 Thread Jason Merrill

On Tue, Jan 17, 2017 at 3:14 PM, Jason Merrill  wrote:
> On Tue, Jan 17, 2017 at 1:20 PM, Nathan Sidwell  wrote:
>> Jason,
>> in r241944:
>> 2016-11-07  Jason Merrill  
>>
>> Implement P0012R1, Make exception specifications part of the type
>> system.
>>
>> You increment processing_template_decl around the mangling of a template
>> function decl.  AFAICT, that's so that nothrow_spec_p doesn't explode at:
>>   gcc_assert (processing_template_decl
>>   || TREE_PURPOSE (spec) == error_mark_node);
>> when called from the mangler at:
>>   if (nothrow_spec_p (spec))
>> write_string ("Do");
>>   else if (TREE_PURPOSE (spec))
>> 
>>
>> the trouble is that's now causing no_linkage_check to bail out early with:
>>   if (processing_template_decl)
>> return NULL_TREE;
>>
>> thus triggering the assert:
>>  gcc_assert (no_linkage_check (type, /*relaxed_p=*/true));
>>   /* Just use the old mangling at namespace scope.  */
>>
>> It seems to me risky to have processing_template_decl incremented, as
>> no_linkage_check is called from a number of places in the mangler.
>
> Makes sense.
>
>> Thus the
>> attached patch, which adds a default arg to nothrow_spec_p to tell it to be
>> a little more lenient.
>>
>> In the old days, I'd've made nothrow_spec_p an asserting wrapper for a
>> non-asserting function, and called that non-asserting function from the
>> mangler.  But we can use default arg magic to avoid adjusting all the other
>> call sites.  I'm fine with doing it the wrapper way, if you'd prefer.
>>
>> ok?
>
> Hmm, what if write_exception_spec checks for a dependent
> noexcept-specifier first, and noexcept_spec_p second?  That seems like
> it would avoid needing any change to nothrow_spec_p.

(OK either way)

Jason

Re: [C++ PATCH] PR 79091, ICE with unnamed enum mangle

2017-01-17 Thread Jason Merrill

On Tue, Jan 17, 2017 at 1:20 PM, Nathan Sidwell  wrote:
> Jason,
> in r241944:
> 2016-11-07  Jason Merrill  
>
> Implement P0012R1, Make exception specifications part of the type
> system.
>
> You increment processing_template_decl around the mangling of a template
> function decl.  AFAICT, that's so that nothrow_spec_p doesn't explode at:
>   gcc_assert (processing_template_decl
>   || TREE_PURPOSE (spec) == error_mark_node);
> when called from the mangler at:
>   if (nothrow_spec_p (spec))
> write_string ("Do");
>   else if (TREE_PURPOSE (spec))
> 
>
> the trouble is that's now causing no_linkage_check to bail out early with:
>   if (processing_template_decl)
> return NULL_TREE;
>
> thus triggering the assert:
>  gcc_assert (no_linkage_check (type, /*relaxed_p=*/true));
>   /* Just use the old mangling at namespace scope.  */
>
> It seems to me risky to have processing_template_decl incremented, as
> no_linkage_check is called from a number of places in the mangler.

Makes sense.

> Thus the
> attached patch, which adds a default arg to nothrow_spec_p to tell it to be
> a little more lenient.
>
> In the old days, I'd've made nothrow_spec_p an asserting wrapper for a
> non-asserting function, and called that non-asserting function from the
> mangler.  But we can use default arg magic to avoid adjusting all the other
> call sites.  I'm fine with doing it the wrapper way, if you'd prefer.
>
> ok?

Hmm, what if write_exception_spec checks for a dependent
noexcept-specifier first, and noexcept_spec_p second?  That seems like
it would avoid needing any change to nothrow_spec_p.

Jason

[PATCH 3/5] improve usage of PROP_gimple_lomp_dev

2017-01-17 Thread Alexander Monakov

This patch implements propagation of PROP_gimple_lomp_dev during inlining to
allow using it to decide whether pass_omp_device_lower needs to run.

We need to clear this property in expand_omp_simd when the _simt_ clause is
present even if we are not doing any SIMT transforms, because we need to
cleanup the call to GOMP_USE_SIMT () guarding the entry to the cloned loop.

* omp-expand.c (expand_omp_simd): Clear PROP_gimple_lomp_dev regardless 
of safelen status.
* omp-offload.c (pass_omp_device_lower::gate): Use PROP_gimple_lomp_dev.
* passes.c (dump_properties): Handle PROP_gimple_lomp_dev.
* tree-inline.c (expand_call_inline): Propagate PROP_gimple_lomp_dev.
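
The mask arithmetic used for propagation in expand_call_inline can be
sketched in isolation.  This is only an illustration: the PROP_* constants
below are hypothetical stand-ins (GCC's real bit values differ), and
propagate_on_inline and device_lower_gate are names invented for this sketch.

```cpp
#include <cstdint>

// Hypothetical stand-ins for GCC's PROP_* bits; the real values differ.
constexpr std::uint32_t PROP_LVA      = 1u << 0;  // IFN_VA_ARG already lowered
constexpr std::uint32_t PROP_LOMP_DEV = 1u << 1;  // no device lowering pending
constexpr std::uint32_t PROP_OTHER    = 1u << 2;  // some unrelated property

// Return the caller's properties after inlining a callee whose properties
// are SRC.  Inside the mask the result is the AND of caller and callee bits;
// bits outside the mask are left untouched.
std::uint32_t propagate_on_inline(std::uint32_t dst, std::uint32_t src)
{
  const std::uint32_t prop_mask = PROP_LVA | PROP_LOMP_DEV;
  const std::uint32_t src_properties = src & prop_mask;
  if (src_properties != prop_mask)
    dst &= src_properties | ~prop_mask;
  return dst;
}

// Mirrors the shape of the new gate: the lowering pass is needed only
// while the property bit is clear.
bool device_lower_gate(std::uint32_t props)
{
  return !(props & PROP_LOMP_DEV);
}
```

With this formulation a callee that lacks either property clears the same
bit in the caller, so one statement covers both PROP_gimple_lva and
PROP_gimple_lomp_dev.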

---
 gcc/omp-expand.c  | 11 +++
 gcc/omp-offload.c |  9 ++---
 gcc/passes.c  |  2 ++
 gcc/tree-inline.c |  9 ++---
 4 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 6a29df6..1312735 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -4590,13 +4590,16 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
 }
   tree step = fd->loop.step;
 
-  bool is_simt = (safelen_int > 1
- && omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
- OMP_CLAUSE__SIMT_));
-  tree simt_lane = NULL_TREE, simt_maxlane = NULL_TREE;
+  bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
+ OMP_CLAUSE__SIMT_);
   if (is_simt)
 {
   cfun->curr_properties &= ~PROP_gimple_lomp_dev;
+  is_simt = safelen_int > 1;
+}
+  tree simt_lane = NULL_TREE, simt_maxlane = NULL_TREE;
+  if (is_simt)
+{
   simt_lane = create_tmp_var (unsigned_type_node);
   gimple *g = gimple_build_call_internal (IFN_GOMP_SIMT_LANE, 0);
   gimple_call_set_lhs (g, simt_lane);
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 8c2c6eb..acecb63 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1613,14 +1613,9 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *ARG_UNUSED (fun))
+  virtual bool gate (function *fun)
 {
-  /* FIXME: this should use PROP_gimple_lomp_dev.  */
-#ifdef ACCEL_COMPILER
-  return true;
-#else
-  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
-#endif
+  return !(fun->curr_properties & PROP_gimple_lomp_dev);
 }
   virtual unsigned int execute (function *)
 {
diff --git a/gcc/passes.c b/gcc/passes.c
index d11b712..db006f9 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2900,6 +2900,8 @@ dump_properties (FILE *dump, unsigned int props)
 fprintf (dump, "PROP_rtl\n");
   if (props & PROP_gimple_lomp)
 fprintf (dump, "PROP_gimple_lomp\n");
+  if (props & PROP_gimple_lomp_dev)
+fprintf (dump, "PROP_gimple_lomp_dev\n");
   if (props & PROP_gimple_lcx)
 fprintf (dump, "PROP_gimple_lcx\n");
   if (props & PROP_gimple_lvec)
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 0de0b89..9b49e0d 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4413,6 +4413,7 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id)
   bool purge_dead_abnormal_edges;
   gcall *call_stmt;
   unsigned int i;
+  unsigned int prop_mask, src_properties;
 
   /* The gimplifier uses input_location in too many places, such as
  internal_get_tmp_var ().  */
@@ -4617,11 +4618,13 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id)
   id->call_stmt = stmt;
 
   /* If the src function contains an IFN_VA_ARG, then so will the dst
- function after inlining.  */
-  if ((id->src_cfun->curr_properties & PROP_gimple_lva) == 0)
+ function after inlining.  Likewise for IFN_GOMP_USE_SIMT.  */
+  prop_mask = PROP_gimple_lva | PROP_gimple_lomp_dev;
+  src_properties = id->src_cfun->curr_properties & prop_mask;
+  if (src_properties != prop_mask)
 {
   struct function *dst_cfun = DECL_STRUCT_FUNCTION (id->dst_fn);
-  dst_cfun->curr_properties &= ~PROP_gimple_lva;
+  dst_cfun->curr_properties &= src_properties | ~prop_mask;
 }
 
   gcc_assert (!id->src_cfun->after_inlining);
-- 
1.8.3.1

[PATCH 4/5] nvptx: implement SIMT enter/exit insns

2017-01-17 Thread Alexander Monakov

This patch adds handling of new omp_simt_enter/omp_simt_exit named insns
in the NVPTX backend.

* config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare.
(nvptx_output_simt_exit): Declare.
* config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use
cfun->machine->unisimt_location.  Handle NULL unisimt_predicate.
(init_softstack_frame): Move initialization of crtl->is_leaf to...
(nvptx_declare_function_name): ...here.  Emit declaration of local
memory space buffer for omp_simt_enter insn.
(nvptx_output_unisimt_switch): New.
(nvptx_output_softstack_switch): New.
(nvptx_output_simt_enter): New.
(nvptx_output_simt_exit): New.
* config/nvptx/nvptx.h (struct machine_function): New fields
has_simtreg, unisimt_location, simt_stack_size, simt_stack_align.
* config/nvptx/nvptx.md (UNSPECV_SIMT_ENTER): New unspec.
(UNSPECV_SIMT_EXIT): Ditto.
(omp_simt_enter_insn): New insn.
(omp_simt_enter): New expansion.
(omp_simt_exit): New insn.
* config/nvptx/nvptx.opt (msoft-stack-reserve-local): New option.


---
 gcc/config/nvptx/nvptx-protos.h |   2 +
 gcc/config/nvptx/nvptx.c| 163 +++-
 gcc/config/nvptx/nvptx.h|   6 ++
 gcc/config/nvptx/nvptx.md   |  39 ++
 gcc/config/nvptx/nvptx.opt  |   4 +
 5 files changed, 196 insertions(+), 18 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 331ec0a..2f836c1 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -53,5 +53,7 @@ extern const char *nvptx_output_mov_insn (rtx, rtx);
 extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
 extern const char *nvptx_output_return (void);
 extern const char *nvptx_output_set_softstack (unsigned);
+extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
+extern const char *nvptx_output_simt_exit (rtx);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index b3f025f..f132845 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1047,11 +1047,6 @@ init_softstack_frame (FILE *file, unsigned alignment, HOST_WIDE_INT size)
   fprintf (file, "\t\tsub.u%d %s, %s, " HOST_WIDE_INT_PRINT_DEC ";\n",
   bits, reg_stack, reg_frame, size);
 
-  /* Usually 'crtl->is_leaf' is computed during register allocator
- initialization (which is not done on NVPTX) or for pressure-sensitive
- optimizations.  Initialize it here, except if already set.  */
-  if (!crtl->is_leaf)
-crtl->is_leaf = leaf_function_p ();
   if (!crtl->is_leaf)
 fprintf (file, "\t\tst.shared.u%d [%s], %s;\n",
 bits, reg_sspslot, reg_stack);
@@ -1079,24 +1074,29 @@ nvptx_init_axis_predicate (FILE *file, int regno, const char *name)
 static void
 nvptx_init_unisimt_predicate (FILE *file)
 {
+  cfun->machine->unisimt_location = gen_reg_rtx (Pmode);
+  int loc = REGNO (cfun->machine->unisimt_location);
   int bits = POINTER_SIZE;
-  int master = REGNO (cfun->machine->unisimt_master);
-  int pred = REGNO (cfun->machine->unisimt_predicate);
+  fprintf (file, "\t.reg.u%d %%r%d;\n", bits, loc);
   fprintf (file, "\t{\n");
   fprintf (file, "\t\t.reg.u32 %%ustmp0;\n");
   fprintf (file, "\t\t.reg.u%d %%ustmp1;\n", bits);
-  fprintf (file, "\t\t.reg.u%d %%ustmp2;\n", bits);
   fprintf (file, "\t\tmov.u32 %%ustmp0, %%tid.y;\n");
   fprintf (file, "\t\tmul%s.u32 %%ustmp1, %%ustmp0, 4;\n",
   bits == 64 ? ".wide" : ".lo");
-  fprintf (file, "\t\tmov.u%d %%ustmp2, __nvptx_uni;\n", bits);
-  fprintf (file, "\t\tadd.u%d %%ustmp2, %%ustmp2, %%ustmp1;\n", bits);
-  fprintf (file, "\t\tld.shared.u32 %%r%d, [%%ustmp2];\n", master);
-  fprintf (file, "\t\tmov.u32 %%ustmp0, %%tid.x;\n");
-  /* Compute 'master lane index' as 'tid.x & __nvptx_uni[tid.y]'.  */
-  fprintf (file, "\t\tand.b32 %%r%d, %%r%d, %%ustmp0;\n", master, master);
-  /* Compute predicate as 'tid.x == master'.  */
-  fprintf (file, "\t\tsetp.eq.u32 %%r%d, %%r%d, %%ustmp0;\n", pred, master);
+  fprintf (file, "\t\tmov.u%d %%r%d, __nvptx_uni;\n", bits, loc);
+  fprintf (file, "\t\tadd.u%d %%r%d, %%r%d, %%ustmp1;\n", bits, loc, loc);
+  if (cfun->machine->unisimt_predicate)
+{
+  int master = REGNO (cfun->machine->unisimt_master);
+  int pred = REGNO (cfun->machine->unisimt_predicate);
+  fprintf (file, "\t\tld.shared.u32 %%r%d, [%%r%d];\n", master, loc);
+  fprintf (file, "\t\tmov.u32 %%ustmp0, %%laneid;\n");
+  /* Compute 'master lane index' as 'laneid & __nvptx_uni[tid.y]'.  */
+  fprintf (file, "\t\tand.b32 %%r%d, %%r%d, %%ustmp0;\n", master, master);
+  /* Compute predicate as 'tid.x == master'.  */
+  fprintf (file, "\t\tsetp.eq.u32 %%r%d, %%r%d, %%ustmp0;\n", pred, master);
+}
   fprintf (file, "\t}\n");
   need_unisimt_decl = true;
 }
@@ -1220,6 +1220,12 @@ nvptx_declare_function_name (FILE

[PATCH 5/5] omp-low: implement SIMT privatization

2017-01-17 Thread Alexander Monakov

This patch adjusts privatization in OpenMP SIMD loops lowered for SIMT targets.
Addressable private variables become fields of new '.omp_simt' structure that
is allocated by a call to GOMP_SIMT_ENTER ().  This function is similar to
__builtin_alloca_with_align, except that it obtains per-SIMT-lane storage and
implicitly performs target-specific actions; on NVPTX that means a transition
to per-lane softstacks and inverting the uniform-simt mask.


* internal-fn.c (expand_GOMP_SIMT_ENTER): New.
(expand_GOMP_SIMT_EXIT): New.
* internal-fn.def (GOMP_SIMT_ENTER): New internal function.
(GOMP_SIMT_EXIT): Ditto.
* target-insns.def (omp_simt_enter): New insn.
(omp_simt_exit): Ditto.
* omp-low.c (struct omplow_simd_context): New fields simtrec,
simt_ilist.
(lower_rec_simd_input_clauses): Implement SIMT privatization.
(lower_rec_input_clauses): Likewise.
(lower_lastprivate_clauses): Handle SIMT privatization.

---
 gcc/internal-fn.c|  34 +
 gcc/internal-fn.def  |   2 +
 gcc/omp-low.c| 136 ---
 gcc/target-insns.def |   2 +
 4 files changed, 145 insertions(+), 29 deletions(-)

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index b1dbc98..bc94a3d 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -166,6 +166,40 @@ expand_GOMP_USE_SIMT (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Allocate per-lane storage and begin non-uniform execution region.  */
+
+static void
+expand_GOMP_SIMT_ENTER (internal_fn, gcall *stmt)
+{
+  rtx target;
+  tree lhs = gimple_call_lhs (stmt);
+  if (lhs)
+target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  else
+target = gen_reg_rtx (Pmode);
+  rtx size = expand_normal (gimple_call_arg (stmt, 0));
+  rtx align = expand_normal (gimple_call_arg (stmt, 1));
+  struct expand_operand ops[3];
+  create_output_operand (&ops[0], target, Pmode);
+  create_input_operand (&ops[1], size, Pmode);
+  create_input_operand (&ops[2], align, Pmode);
+  gcc_assert (targetm.have_omp_simt_enter ());
+  expand_insn (targetm.code_for_omp_simt_enter, 3, ops);
+}
+
+/* Deallocate per-lane storage and leave non-uniform execution region.  */
+
+static void
+expand_GOMP_SIMT_EXIT (internal_fn, gcall *stmt)
+{
+  gcc_checking_assert (!gimple_call_lhs (stmt));
+  rtx arg = expand_normal (gimple_call_arg (stmt, 0));
+  struct expand_operand ops[1];
+  create_input_operand (&ops[0], arg, Pmode);
+  gcc_assert (targetm.have_omp_simt_exit ());
+  expand_insn (targetm.code_for_omp_simt_exit, 1, ops);
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
without SIMT execution this should be expanded in omp_device_lower pass.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 9a03e17..c3dbb02 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -142,6 +142,8 @@ DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
 DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_SIMT_ENTER, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_SIMT_EXIT, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a5f8bf65..499afce 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3452,6 +3452,8 @@ omp_clause_aligned_alignment (tree clause)
 struct omplow_simd_context {
   tree idx;
   tree lane;
+  tree simtrec;
+  gimple_seq simt_ilist;
   int max_vf;
   bool is_simt;
 };
@@ -3488,18 +3490,48 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
   if (max_vf == 1)
 return false;
 
-  tree atype = build_array_type_nelts (TREE_TYPE (new_var), max_vf);
-  tree avar = create_tmp_var_raw (atype);
-  if (TREE_ADDRESSABLE (new_var))
-TREE_ADDRESSABLE (avar) = 1;
-  DECL_ATTRIBUTES (avar)
-= tree_cons (get_identifier ("omp simd array"), NULL,
-DECL_ATTRIBUTES (avar));
-  gimple_add_tmp_var (avar);
-  ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
-NULL_TREE, NULL_TREE);
-  lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
-NULL_TREE, NULL_TREE);
+  if (sctx->is_simt)
+{
+  if (is_gimple_reg (new_var))
+   {
+ ivar = lvar = new_var;
+ return true;
+   }
+  tree field = build_decl (DECL_SOURCE_LOCATION (new_var), FIELD_DECL,
+  DECL_NAME (new_var), TREE_TYPE (new_var));
+  SET_DECL_ALIGN (field, DECL_ALIGN (new_var));
+  DECL_USER_ALIGN (field) = DECL_USER_ALIGN (new_var);
+  TREE_THIS_VOLATILE (field) = TREE_THIS_VOLATILE (new_var);
+  tree rectype = TREE_TYPE (TREE_TYPE

[PATCH 2/5] ipa-inline: disallow inlining into SIMT regions

2017-01-17 Thread Alexander Monakov

This patch prevents inlining into SIMT code by introducing a new loop
property 'in_simtreg' and using ANNOTATE_EXPR (_, 'simtreg') to carry this
property between omp-low and the cfg pass (this is needed only for SIMD
reduction helper loops; for main bodies of SIMD loops omp-expand sets
loop->in_simtreg directly).

Technically the gimplify.c hunk is not needed since the frontends wouldn't
produce annot_expr_simtreg_kind.

Eventually inlining should be possible to do by remapping callee's variables
to use storage provided by IFN_GOMP_SIMT_ENTER ().

* cfgloop.h (struct loop): New field 'in_simtreg'.  Use it...
* ipa-inline.c (can_inline_edge_p): ...here to disallow inlining.
* omp-expand.c (expand_omp_simd): Set loop->in_simtreg.
* omp-low.c (lower_rec_input_clauses): Annotate condition of SIMT
reduction loops with annot_expr_simtreg_kind.
* tree-core.h (enum annot_expr_kind): New entry
'annot_expr_simtreg_kind'.
* tree-cfg.c (replace_loop_annotate_in_block): Handle
annot_expr_simtreg_kind.
(replace_loop_annotate): Ditto.
* tree-pretty-print.c (dump_generic_node): Ditto.
* gimplify.c (gimple_boolify): Ditto.
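
The check added to can_inline_edge_p amounts to walking the loop nest
outwards from the call site and refusing to inline if any enclosing loop
carries the new flag.  A minimal standalone sketch follows; the Loop type and
the function name are hypothetical and far simpler than GCC's 'struct loop'.

```cpp
// Hypothetical, stripped-down loop node; GCC's 'struct loop' has many
// more fields and is reached through loop_father / loop_outer ().
struct Loop {
  Loop *outer;      // enclosing loop, or nullptr for the outermost one
  bool in_simtreg;  // true for the SIMT variant of an OpenMP SIMD region
};

// Walk outwards from the innermost loop containing the call site and
// report whether any enclosing loop is a SIMT region.
bool call_site_in_simt_region(const Loop *innermost)
{
  for (const Loop *l = innermost; l; l = l->outer)
    if (l->in_simtreg)
      return true;
  return false;
}
```

Because the flag is checked on every enclosing loop, a call nested inside an
ordinary inner loop of a SIMT region is still rejected.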

---
 gcc/cfgloop.h   |  3 +++
 gcc/gimplify.c  |  1 +
 gcc/ipa-inline.c| 14 ++
 gcc/omp-expand.c|  1 +
 gcc/omp-low.c   |  9 -
 gcc/tree-cfg.c  |  4 
 gcc/tree-core.h |  1 +
 gcc/tree-pretty-print.c |  3 +++
 8 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 0448a61..25b441c 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -220,6 +220,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if the loop is part of an oacc kernels region.  */
   bool in_oacc_kernels_region;
 
+  /* True if the loop corresponds to a SIMT variant of OpenMP SIMD region.  */
+  bool in_simtreg;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
  by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
  builtins.  */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a300133..a09037b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -3682,6 +3682,7 @@ gimple_boolify (tree expr)
case annot_expr_ivdep_kind:
case annot_expr_no_vector_kind:
case annot_expr_vector_kind:
+   case annot_expr_simtreg_kind:
  TREE_OPERAND (expr, 0) = gimple_boolify (TREE_OPERAND (expr, 0));
  if (TREE_CODE (type) != BOOLEAN_TYPE)
TREE_TYPE (expr) = boolean_type_node;
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 5f2371c..dd4c0948 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -101,6 +101,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "gimple-ssa.h"
 #include "cgraph.h"
+#include "cfgloop.h"
 #include "lto-streamer.h"
 #include "trans-mem.h"
 #include "calls.h"
@@ -374,6 +375,19 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
   e->inline_failed = CIF_ATTRIBUTE_MISMATCH;
   inlinable = false;
 }
+  /* Don't inline into SIMT variants of OpenMP SIMD regions: inlining needs to
+ remap addressable variables to use storage provided by IFN_SIMT_ENTER.  */
+  else if (flag_openmp && gimple_has_body_p (caller->decl))
+{
+  struct loop *l;
+  for (l = gimple_bb (e->call_stmt)->loop_father; l; l = loop_outer (l))
+   if (l->in_simtreg)
+ {
+   e->inline_failed = CIF_UNSPECIFIED;
+   inlinable = false;
+   break;
+ }
+}
   /* Check if caller growth allows the inlining.  */
   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
   && !disregard_limits
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 1f1055c..6a29df6 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -4794,6 +4794,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
   loop->latch = cont_bb;
   add_loop (loop, l1_bb->loop_father);
   loop->safelen = safelen_int;
+  loop->in_simtreg = is_simt;
   if (simduid)
{
  loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 13d9b6b..a5f8bf65 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4510,7 +4510,14 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  gimple_seq_add_stmt (dlist, g);
 
  gimple_seq_add_stmt (dlist, gimple_build_label (header));
- g = gimple_build_cond (LT_EXPR, simt_lane, simt_vf, body, end);
+ t = create_tmp_var (boolean_type_node);
+ g = gimple_build_assign (t, LT_EXPR, simt_lane, simt_vf);
+ gimple_seq_add_stmt (dlist, g);
+ tree ann = build_int_cst (integer_type_node, annot_expr_simtreg_kind);
+ g = gimple_build_call_internal (IFN_ANNOTATE, 2, t, ann);
+ gimple_call_set_lhs (g, t);
+ gimple_seq_add_stmt (dlist, g);
+

[PATCH 1/5] omp-low: introduce omplow_simd_context

2017-01-17 Thread Alexander Monakov

In preparation to handle new SIMT privatization in lower_rec_simd_input_clauses
this patch factors out variables common to this and lower_rec_input_clauses to
a new structure.  No functional change intended.

* omp-low.c (omplow_simd_context): New struct.  Use it...
(lower_rec_simd_input_clauses): ...here and...
(lower_rec_input_clauses): ...here to hold common data.  Adjust all
references to idx, lane, max_vf, is_simt.

---
 gcc/omp-low.c | 79 ---
 1 file changed, 43 insertions(+), 36 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e69b2b2..13d9b6b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3445,20 +3445,28 @@ omp_clause_aligned_alignment (tree clause)
   return build_int_cst (integer_type_node, al);
 }
 
+
+/* This structure is part of the interface between lower_rec_simd_input_clauses
+   and lower_rec_input_clauses.  */
+
+struct omplow_simd_context {
+  tree idx;
+  tree lane;
+  int max_vf;
+  bool is_simt;
+};
+
 /* Helper function of lower_rec_input_clauses, used for #pragma omp simd
privatization.  */
 
 static bool
-lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
- tree &idx, tree &lane, tree &ivar, tree &lvar)
+lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
+ omplow_simd_context *sctx, tree &ivar, tree &lvar)
 {
+  int &max_vf = sctx->max_vf;
   if (max_vf == 0)
 {
-  if (omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
-  OMP_CLAUSE__SIMT_))
-   max_vf = omp_max_simt_vf ();
-  else
-   max_vf = omp_max_vf ();
+  max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
   if (max_vf > 1)
{
  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
@@ -3473,8 +3481,8 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
}
   if (max_vf > 1)
{
- idx = create_tmp_var (unsigned_type_node);
- lane = create_tmp_var (unsigned_type_node);
+ sctx->idx = create_tmp_var (unsigned_type_node);
+ sctx->lane = create_tmp_var (unsigned_type_node);
}
 }
   if (max_vf == 1)
@@ -3488,9 +3496,9 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
 = tree_cons (get_identifier ("omp simd array"), NULL,
 DECL_ATTRIBUTES (avar));
   gimple_add_tmp_var (avar);
-  ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, idx,
+  ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
 NULL_TREE, NULL_TREE);
-  lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, lane,
+  lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
 NULL_TREE, NULL_TREE);
   if (DECL_P (new_var))
 {
@@ -3534,14 +3542,13 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
   int pass;
   bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
  && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD);
-  bool maybe_simt = is_simd && omp_find_clause (clauses, OMP_CLAUSE__SIMT_);
-  int max_vf = 0;
-  tree lane = NULL_TREE, idx = NULL_TREE;
+  omplow_simd_context sctx = omplow_simd_context ();
   tree simt_lane = NULL_TREE;
   tree ivar = NULL_TREE, lvar = NULL_TREE;
   gimple_seq llist[3] = { };
 
   copyin_seq = NULL;
+  sctx.is_simt = is_simd && omp_find_clause (clauses, OMP_CLAUSE__SIMT_);
 
   /* Set max_vf=1 (which will later enforce safelen=1) in simd loops
  with data sharing clauses referencing variable sized vars.  That
@@ -3553,18 +3560,18 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
{
case OMP_CLAUSE_LINEAR:
  if (OMP_CLAUSE_LINEAR_ARRAY (c))
-   max_vf = 1;
+   sctx.max_vf = 1;
  /* FALLTHRU */
case OMP_CLAUSE_PRIVATE:
case OMP_CLAUSE_FIRSTPRIVATE:
case OMP_CLAUSE_LASTPRIVATE:
  if (is_variable_sized (OMP_CLAUSE_DECL (c)))
-   max_vf = 1;
+   sctx.max_vf = 1;
  break;
case OMP_CLAUSE_REDUCTION:
  if (TREE_CODE (OMP_CLAUSE_DECL (c)) == MEM_REF
  || is_variable_sized (OMP_CLAUSE_DECL (c)))
-   max_vf = 1;
+   sctx.max_vf = 1;
  break;
default:
  continue;
@@ -4119,8 +4126,8 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  tree y = lang_hooks.decls.omp_clause_dtor (c, new_var);
  if ((TREE_ADDRESSABLE (new_var) || nx || y
   || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
- && lower_rec_simd_input_clauses (new_var, ctx, max_vf,
-  idx, lane, ivar, lvar))
+ && lower_rec_simd_input_clauses (new_var, ctx, &sctx,
+  ivar, lvar))

[PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-01-17 Thread Alexander Monakov

Hello,

This patch series addresses a correctness issue in how OpenMP SIMD regions are
transformed for SIMT execution.  On NVPTX, OpenMP target code runs with
per-warp stacks outside of SIMD regions, and needs to transition to per-lane
stacks on SIMD region boundaries.  Originally the plan was to implement that
by outlining the SIMD loop into a separate function and switching stacks around
the function call.  I didn't like that approach because it would penalize even
the simplest SIMD loops and is not convenient to implement in GCC.

These patches implement an alternative approach I didn't see until recently.
Instead of outlining, collect variables that would need to be on per-lane
stacks (that is, addressable private variables) to one struct, and allocate
that struct with an alloca-like function.

After OpenMP lowering, inlining might break this by inlining functions with
address-taken locals into SIMD regions.  For now, such inlining is disallowed
(this penalizes only SIMT code), but eventually that can be handled by
collecting those locals into an allocated struct in a similar manner.
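
To make the idea concrete, here is a host-side C++ sketch of "collect the
addressable private variables into one struct and allocate it per lane".  It
is a simulation only: SimtRecord stands in for the compiler-generated record,
LaneStorage for the alloca-like allocation, and every name in it is
hypothetical.  Real lanes run concurrently on the device; the loop below just
visits them one after another.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record holding the addressable private variables of one
// SIMD region; in the patch series the compiler builds such a type itself.
struct SimtRecord {
  int counter;        // a private variable whose address is taken
  double scratch[2];  // another private object that must live in memory
};

// Toy stand-in for the alloca-like call: hands out one record per lane.
class LaneStorage {
public:
  explicit LaneStorage(std::size_t lanes) : records_(lanes) {}
  SimtRecord *enter(std::size_t lane) { return &records_[lane]; }
private:
  std::vector<SimtRecord> records_;
};

// Takes the address of a private variable, which is what forces the
// variable onto per-lane storage in the first place.
void bump(int *p) { *p += 1; }

// Simulate the region: each lane works only on its own record.
std::vector<int> run_region(std::size_t lanes)
{
  LaneStorage storage(lanes);
  std::vector<int> result;
  for (std::size_t lane = 0; lane < lanes; ++lane)
    {
      SimtRecord *rec = storage.enter(lane);
      rec->counter = static_cast<int>(lane) * 10;
      bump(&rec->counter);
      result.push_back(rec->counter);
    }
  return result;
}
```

The point of the sketch is only that pointers into the record stay valid and
distinct per lane without the loop body being outlined into a function.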

Alexander

[PATCH, i386]: Do not mix mask registers with other register sets

2017-01-17 Thread Uros Bizjak

Hello!

As the comment above inline_secondary_memory_needed in i386.c says:

--cut here--
   The function can't work reliably when one of the CLASSES is a class
   containing registers from multiple sets.  We avoid this by never combining
   different sets in a single alternative in the machine description.
   Ensure that this constraint holds to avoid unexpected surprises.
--cut here--

The patch enforces this constraint also for mask registers and fixes
an oversight in *movsi_internal.

2017-01-17  Uros Bizjak  

* config/i386/i386.h (MASK_CLASS_P): New define.
* config/i386/i386.c (inline_secondary_memory_needed): Ensure that
there are no registers from different register sets also when
mask registers are used.  Update function comment.
* config/i386/i386.md (*movsi_internal): Split (*k/*krm) alternative
to (*k/*r) and (*k/*km) alternatives.
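
The "MAYBE_x_CLASS_P != x_CLASS_P" tests can be read as "the class touches
register set x but is not contained in it".  A small sketch with bitsets
shows the idea; the register sets below are made up for illustration and do
not reflect the real i386 register layout, and subset_p, intersect_p and
mixes are names invented here.

```cpp
#include <cstdint>

using RegSet = std::uint32_t;  // one bit per hard register

// Hypothetical register sets, not the real i386 layout.
constexpr RegSet GENERAL = 0x000F;
constexpr RegSet SSE     = 0x00F0;
constexpr RegSet MASK    = 0x0F00;

// Analogue of reg_class_subset_p: every register of CLS is in SET.
constexpr bool subset_p(RegSet cls, RegSet set)
{
  return (cls & ~set) == 0;
}

// Analogue of reg_classes_intersect_p: CLS and SET share a register.
constexpr bool intersect_p(RegSet cls, RegSet set)
{
  return (cls & set) != 0;
}

// True when the two predicates disagree.  For a non-empty class that
// means CLS straddles SET and some other register set.  An empty class
// is a subset of everything and intersects nothing, so it is flagged
// too; this sketch leaves that corner case to the caller.
constexpr bool mixes(RegSet cls, RegSet set)
{
  return intersect_p(cls, set) != subset_p(cls, set);
}
```

A class drawn purely from one set, or purely from the others, passes; only a
class spanning the boundary trips the check, which is the situation the
machine description is required to avoid.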

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 244540)
+++ config/i386/i386.c  (working copy)
@@ -39868,19 +39868,19 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
   return false;
 }
 
-/* If we are copying between general and FP registers, we need a memory
-   location. The same is true for SSE and MMX registers.
+/* If we are copying between registers from different register sets
+   (e.g. FP and integer), we may need a memory location.
 
-   To optimize register_move_cost performance, allow inline variant.
-
-   The macro can't work reliably when one of the CLASSES is class containing
-   registers from multiple units (SSE, MMX, integer).  We avoid this by never
-   combining those units in single alternative in the machine description.
+   The function can't work reliably when one of the CLASSES is a class
+   containing registers from multiple sets.  We avoid this by never combining
+   different sets in a single alternative in the machine description.
Ensure that this constraint holds to avoid unexpected surprises.
 
-   When STRICT is false, we are being called from REGISTER_MOVE_COST, so do not
-   enforce these sanity checks.  */
+   When STRICT is false, we are being called from REGISTER_MOVE_COST,
+   so do not enforce these sanity checks.
 
+   To optimize register_move_cost performance, define inline variant.  */
+
 static inline bool
 inline_secondary_memory_needed (enum reg_class class1, enum reg_class class2,
machine_mode mode, int strict)
@@ -39887,12 +39887,15 @@ inline_secondary_memory_needed (enum reg_class cla
 {
   if (lra_in_progress && (class1 == NO_REGS || class2 == NO_REGS))
 return false;
+
   if (MAYBE_FLOAT_CLASS_P (class1) != FLOAT_CLASS_P (class1)
   || MAYBE_FLOAT_CLASS_P (class2) != FLOAT_CLASS_P (class2)
   || MAYBE_SSE_CLASS_P (class1) != SSE_CLASS_P (class1)
   || MAYBE_SSE_CLASS_P (class2) != SSE_CLASS_P (class2)
   || MAYBE_MMX_CLASS_P (class1) != MMX_CLASS_P (class1)
-  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2))
+  || MAYBE_MMX_CLASS_P (class2) != MMX_CLASS_P (class2)
+  || MAYBE_MASK_CLASS_P (class1) != MASK_CLASS_P (class1)
+  || MAYBE_MASK_CLASS_P (class2) != MASK_CLASS_P (class2))
 {
   gcc_assert (!strict || lra_in_progress);
   return true;
@@ -39902,7 +39905,7 @@ inline_secondary_memory_needed (enum reg_class cla
 return true;
 
   /* Between mask and general, we have moves no larger than word size.  */
-  if ((MAYBE_MASK_CLASS_P (class1) != MAYBE_MASK_CLASS_P (class2))
+  if ((MASK_CLASS_P (class1) != MASK_CLASS_P (class2))
   && (GET_MODE_SIZE (mode) > UNITS_PER_WORD))
   return true;
 
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 244540)
+++ config/i386/i386.h  (working copy)
@@ -1378,6 +1378,8 @@ enum reg_class
   reg_class_subset_p ((CLASS), ALL_SSE_REGS)
 #define MMX_CLASS_P(CLASS) \
   ((CLASS) == MMX_REGS)
+#define MASK_CLASS_P(CLASS) \
+  reg_class_subset_p ((CLASS), MASK_REGS)
 #define MAYBE_INTEGER_CLASS_P(CLASS) \
   reg_classes_intersect_p ((CLASS), GENERAL_REGS)
 #define MAYBE_FLOAT_CLASS_P(CLASS) \
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 244540)
+++ config/i386/i386.md (working copy)
@@ -2324,9 +2324,9 @@
 
 (define_insn "*movsi_internal"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-   "=r,m ,*y,*y,?rm,?*y,*v,*v,*v,m ,?r ,?r,?*Yi,*k  ,*rm")
+   "=r,m ,*y,*y,?rm,?*y,*v,*v,*v,m ,?r ,?r,?*Yi,*k,*k ,*rm")
(match_operand:SI 1 "general_operand"
-   "g ,re,C ,*y,*y ,rm ,C ,*v,m ,*v,*Yj,*v,r   ,*krm,*k"))]
+   "g ,re,C ,*y,*y ,rm ,C ,*v,m ,*v,*Yj,*v,r   ,*r,*km,*k"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch

[PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread Wilco Dijkstra

A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.
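
As a rough illustration of why an add can replace the 1-bit left shift,
the sketch below doubles a 64-bit value using only 32-bit operations, in
the spirit of an adds/adc pair.  The helper name shl1_via_add is
hypothetical and this is not the code GCC emits.

```c
#include <stdint.h>

/* Shift a 64-bit value left by one using two 32-bit additions,
   propagating the carry from the low word into the high word.  */
static uint64_t
shl1_via_add (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);

  uint32_t new_lo = lo + lo;           /* low word add (wraps mod 2^32) */
  uint32_t carry = new_lo < lo;        /* 1 if the low add overflowed */
  uint32_t new_hi = hi + hi + carry;   /* high word add with carry in */

  return ((uint64_t) new_hi << 32) | new_lo;
}
```

The result equals x << 1 for every input, which is why a dedicated
shift-by-one pattern adds nothing over adddi3.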

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

* config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
(arm_ashldi3_1bit): Remove pattern.
(ashrdi3): Remove shift by 1 expansion.
(arm_ashrdi3_1bit): Remove pattern.
(lshrdi3): Remove shift by 1 expansion.
(arm_lshrdi3_1bit): Remove pattern.
* config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
cost of ashldi3 by 1.
* config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
(di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
   + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
  if (speed_p)
*cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
  return true;
}
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-{
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI0 "s_register_operand" "=r,")
-(ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI0 "s_register_operand" "")
(ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-{
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-(ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
(ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-{
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-(lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "lshrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
(lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
diff --git

Re: [PATCH] [ARC] Clean up arc header file.

2017-01-17 Thread Mike Stump

On Jan 17, 2017, at 3:30 AM, Andrew Burgess  wrote:
> 
>> This patch revamps the arc's header file by means of using separate
>> headers for different tool targets. Each target header file holds the
>> specific compiler backend macros definitions. Thus, we have:
>> - elf.h is used for bare metal type of toolchains.
>> - linux.h is used by our Linux type of toolchains.
>> - big.h is used by big-endians toolchains.
>> 
>> This patch also cleans up arc specifics from config.gcc, consolidating
>> everything in one of the above new header files.
>> 
>> OK to apply?
> 
> I'm happy with this change, but I don't think it can be applied until
> GCC is back in to Stage 1, right?

Ports have more latitude to check things into gcc into stage 2+ and more 
latitude to check things into release branches.  The patch set strikes me as 
something not unreasonable to drop into trunk if you want.

Like all things, you have to use your good judgement.  You should have around 3 
months to spot and correct any deficiencies in the patch, which strikes me as a 
reasonable amount of time to spot any problems.

As the time grows short, you'll want to approve less and tighten down the 
criteria you use.  You should weigh things like, risk, how hard it is to review 
the work, how easy is it to miss something in a review, the type of failure it 
might introduce, the benefits the work brings, any perceived downsides, the 
likelihood of the test suite being able to spot problems with the work, do you 
have time to fix any deficiencies people might find, and so on.

That said, if you're not comfortable approving it, as reviewer, asking for it 
to wait till stage one isn't unreasonable.

[PATCH][libgcc, fuchsia]

2017-01-17 Thread Josh Conner via gcc-patches


The attached patch adds fuchsia support to libgcc.

OK for trunk?

Thanks -

Josh

2017-01-17  Joshua Conner  

* config/arm/unwind-arm.h (_Unwind_decode_typeinfo_ptr): Use
pc-relative indirect handling for fuchsia.
* config/t-slibgcc-fuchsia: New file.
* config.host (*-*-fuchsia*, aarch64*-*-fuchsia*, arm*-*-fuchsia*,
x86_64-*-fuchsia*): Add definitions.

Index: config/arm/unwind-arm.h
===
--- config/arm/unwind-arm.h (revision 244542)
+++ config/arm/unwind-arm.h (working copy)
@@ -49,7 +49,7 @@
return 0;
 
 #if (defined(linux) && !defined(__uClinux__)) || defined(__NetBSD__) \
-|| defined(__FreeBSD__)
+|| defined(__FreeBSD__) || defined(__fuchsia__)
   /* Pc-relative indirect.  */
 #define _GLIBCXX_OVERRIDE_TTYPE_ENCODING (DW_EH_PE_pcrel | DW_EH_PE_indirect)
   tmp += ptr;
Index: config/t-slibgcc-fuchsia
===
--- config/t-slibgcc-fuchsia(revision 0)
+++ config/t-slibgcc-fuchsia(working copy)
@@ -0,0 +1,22 @@
+# Copyright (C) 2017 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# Fuchsia-specific shared library overrides.
+
+SHLIB_LDFLAGS = -Wl,--soname=$(SHLIB_SONAME) \
+$(LDFLAGS)
Index: config.host
===
--- config.host (revision 244542)
+++ config.host (working copy)
@@ -228,6 +228,10 @@
   ;;
   esac
   ;;
+*-*-fuchsia*)
+  tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip t-slibgcc t-slibgcc-fuchsia"
+  extra_parts="crtbegin.o crtend.o"
+  ;;
 *-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-gnu* | *-*-kopensolaris*-gnu)
   tmake_file="$tmake_file t-crtstuff-pic t-libgcc-pic t-eh-dw2-dip t-slibgcc t-slibgcc-gld t-slibgcc-elf-ver t-linux"
   extra_parts="crtbegin.o crtbeginS.o crtbeginT.o crtend.o crtendS.o"
@@ -337,6 +341,10 @@
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
;;
+aarch64*-*-fuchsia*)
+   tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
+   ;;
 aarch64*-*-linux*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/linux-unwind.h
@@ -389,6 +397,12 @@
unwind_header=config/arm/unwind-arm.h
tmake_file="${tmake_file} t-softfp-sfdf t-softfp-excl arm/t-softfp t-softfp"
;;
+arm*-*-fuchsia*)
+   tmake_file="${tmake_file} arm/t-arm arm/t-elf arm/t-bpabi"
+   tmake_file="${tmake_file} arm/tsoftfp t-softfp"
+   tm_file="${tm_file} arm/bpabi-lib.h"
+   unwind_header=config/arm/unwind-arm.h
+   ;;
 arm*-*-netbsdelf*)
tmake_file="$tmake_file arm/t-arm arm/t-netbsd t-slibgcc-gld-nover"
;;
@@ -583,6 +597,9 @@
 x86_64-*-elf* | x86_64-*-rtems*)
tmake_file="$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic"
;;
+x86_64-*-fuchsia*)
+   tmake_file="$tmake_file t-libgcc-pic"
+   ;;
 i[34567]86-*-dragonfly*)
tmake_file="${tmake_file} i386/t-dragonfly i386/t-crtstuff"
md_unwind_header=i386/dragonfly-unwind.h

Re: [PATCH C++] Fix PR77489 -- mangling of discriminator >= 10

2017-01-17 Thread Jason Merrill

On Thu, Jan 12, 2017 at 2:36 AM, Markus Trippelsdorf
 wrote:
> On 2017.01.11 at 13:03 +0100, Jakub Jelinek wrote:
>> On Wed, Jan 11, 2017 at 12:48:29PM +0100, Markus Trippelsdorf wrote:
>> > @@ -1965,7 +1966,11 @@ write_discriminator (const int discriminator)
>> >if (discriminator > 0)
>> >  {
>> >write_char ('_');
>> > +  if (abi_version_at_least(11) && discriminator - 1 >= 10)
>> > +   write_char ('_');
>> >write_unsigned_number (discriminator - 1);
>> > +  if (abi_version_at_least(11) && discriminator - 1 >= 10)
>> > +   write_char ('_');
>>
>> Formatting nits, there should be space before (11).
>>
>> > +// { dg-final { scan-assembler "_ZZ3foovE8localVar__10_" } }
>> > +// { dg-final { scan-assembler "_ZZ3foovE8localVar__11_" } }
>>
>> Would be nice to also
>> // { dg-final { scan-assembler "_ZZ3foovE8localVar_9" } }
>>
>> Otherwise, I defer to Jason (primarily whether this doesn't need
>> ABI version 12).
>
> Thanks for review. I will fix these issues.
> Jason said on IRC that he is fine with ABI version 11.
>
> Ok for trunk?

This also needs the same (invoke.texi and warning) changes that Nathan
pointed out on your other mangling patch.

Jason
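
To make the encoding under review concrete, here is a small sketch of the
suffix the quoted hunk produces, assuming ABI version 11 or later.  The
function format_discriminator is a hypothetical stand-in, not the
mangler's write_discriminator.

```c
#include <stdio.h>

/* Write the discriminator suffix into BUF (of size LEN) and return the
   number of characters produced.  Values of discriminator - 1 below 10
   are written as _N; larger values are bracketed as __N_.  */
static int
format_discriminator (char *buf, size_t len, int discriminator)
{
  if (discriminator <= 0)
    {
      /* No suffix at all for the first entity.  */
      if (len > 0)
        buf[0] = '\0';
      return 0;
    }

  int n = discriminator - 1;
  if (n >= 10)
    return snprintf (buf, len, "__%d_", n);
  return snprintf (buf, len, "_%d", n);
}
```

With this rule a discriminator of 10 yields _9 and a discriminator of 11
yields __10_, matching the scan-assembler patterns quoted above.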

Re: [PATCH v2 C++] Fix PR70182 -- missing "on" in mangling of unresolved operators

2017-01-17 Thread Jason Merrill

On Thu, Jan 12, 2017 at 7:31 AM, Nathan Sidwell  wrote:
> Thanks, that does address my comments.  AFAICT your reading of the ABI doc
> is correct, but I'd like Jason to confirm that.

I agree.

> you're missing the testsuite/ChangeLog entry, don't forget.

I don't believe in modifying testsuite/ChangeLog for tests;
https://gcc.gnu.org/codingconventions.html#ChangeLogs links to some
discussion.

Jason

Re: [PATCH] Fix DW_AT_data_member_location/DW_AT_bit_offset handling (PR debug/78839)

2017-01-17 Thread Jason Merrill

On Wed, Jan 11, 2017 at 3:19 PM, Jakub Jelinek  wrote:
> +  else
>  #endif /* PCC_BITFIELD_TYPE_MATTERS */
> -
> -  tree_result = byte_position (decl);
> +tree_result = byte_position (decl);

Let's add a blank line after this assignment.  OK with that change.

Jason

[C++ PATCH] PR 79091, ICE with unnamed enum mangle

2017-01-17 Thread Nathan Sidwell


Jason,
in r241944:
2016-11-07  Jason Merrill  

Implement P0012R1, Make exception specifications part of the type
system.

You increment processing_template_decl around the mangling of a template 
function decl.  AFAICT, that's so that nothrow_spec_p doesn't explode at:

  gcc_assert (processing_template_decl
  || TREE_PURPOSE (spec) == error_mark_node);
when called from the mangler at:
  if (nothrow_spec_p (spec))
write_string ("Do");
  else if (TREE_PURPOSE (spec))


the trouble is that's now causing no_linkage_check to bail out early with:
  if (processing_template_decl)
return NULL_TREE;

thus triggering the assert:
 gcc_assert (no_linkage_check (type, /*relaxed_p=*/true));
  /* Just use the old mangling at namespace scope.  */

It seems to me risky to have processing_template_decl incremented, as 
no_linkage_check is called from a number of places in the mangler.  Thus 
the attached patch, which adds a default arg to nothrow_spec_p to tell 
it to be a little more lenient.


In the old days, I'd've made nothrow_spec_p an asserting wrapper for a 
non-asserting function, and called that non-asserting function from the 
mangler.  But we can use default arg magic to avoid adjusting all the 
other call sites.  I'm fine with doing it the wrapper way, if you'd prefer.
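
For comparison, the wrapper alternative mentioned above could look
roughly like the following sketch.  Everything in it is a hypothetical
model: struct demo_spec, demo_in_template and both function names are
invented for illustration and do not correspond to the real tree
accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an exception specification.  */
struct demo_spec
{
  bool is_nothrow;   /* models throw() or noexcept(true) */
  bool resolved;     /* false while the noexcept operand is unresolved */
};

/* Hypothetical stand-in for processing_template_decl.  */
static int demo_in_template;

/* Non-asserting core: what the mangler would call directly.  */
static bool
demo_nothrow_spec_unchecked (const struct demo_spec *spec)
{
  return spec != NULL && spec->is_nothrow;
}

/* Asserting wrapper: what every other caller would keep using.  */
static bool
demo_nothrow_spec_checked (const struct demo_spec *spec)
{
  bool result = demo_nothrow_spec_unchecked (spec);
  /* An unresolved spec is only acceptable inside a template.  */
  assert (result || spec == NULL || spec->resolved || demo_in_template);
  return result;
}
```

The default-argument version in the patch reaches the same effect with a
single function, at the cost of a flag at the one call site in the
mangler.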


ok?

--
Nathan Sidwell
2017-01-17  Nathan Sidwell  

	cp/
	PR c++/79091
	* cp-tree.h (nothrow_spec_p): add defaulted arg.
	* except.c (nothrow_spec_p): add unresolved_ok arg.
	* mangle.c (write_exception_spec): adjust nothrow_spec_p call.
	(write_encoding): don't increment processing_template_decl around
	encoding.

	testsuite/
	PR c++/79091
	* g++.dg/pr79091.C: New.

Index: cp/cp-tree.h
===
--- cp/cp-tree.h	(revision 244535)
+++ cp/cp-tree.h	(working copy)
@@ -5990,7 +5990,7 @@ extern void check_handlers			(tree);
 extern tree finish_noexcept_expr		(tree, tsubst_flags_t);
 extern bool expr_noexcept_p			(tree, tsubst_flags_t);
 extern void perform_deferred_noexcept_checks	(void);
-extern bool nothrow_spec_p			(const_tree);
+extern bool nothrow_spec_p			(const_tree, bool = false);
 extern bool type_noexcept_p			(const_tree);
 extern bool type_throw_all_p			(const_tree);
 extern tree build_noexcept_spec			(tree, int);
Index: cp/except.c
===
--- cp/except.c	(revision 244535)
+++ cp/except.c	(working copy)
@@ -1137,10 +1137,12 @@ expr_noexcept_p (tree expr, tsubst_flags
 return true;
 }
 
-/* Return true iff SPEC is throw() or noexcept(true).  */
+/* Return true iff SPEC is throw() or noexcept(true).  UNRESOLVED_OK
+   is true if it's ok to have an unresolved noexcept spec.  That
+   happens during mangling of a template instantation.  */
 
 bool
-nothrow_spec_p (const_tree spec)
+nothrow_spec_p (const_tree spec, bool unresolved_ok)
 {
   gcc_assert (!DEFERRED_NOEXCEPT_SPEC_P (spec));
   if (spec == NULL_TREE
@@ -1150,7 +1152,8 @@ nothrow_spec_p (const_tree spec)
   if (TREE_PURPOSE (spec) == NULL_TREE
   || spec == noexcept_true_spec)
 return true;
-  gcc_assert (processing_template_decl
+  gcc_assert (unresolved_ok
+	  || processing_template_decl
 	  || TREE_PURPOSE (spec) == error_mark_node);
   return false;
 }
Index: cp/mangle.c
===
--- cp/mangle.c	(revision 244535)
+++ cp/mangle.c	(working copy)
@@ -366,7 +366,7 @@ write_exception_spec (tree spec)
   return;
 }
 
-  if (nothrow_spec_p (spec))
+  if (nothrow_spec_p (spec, true))
 write_string ("Do");
   else if (TREE_PURPOSE (spec))
 {
@@ -829,7 +829,6 @@ write_encoding (const tree decl)
 
   if (tmpl)
 	{
-	  ++processing_template_decl;
 	  fn_type = get_mostly_instantiated_function_type (decl);
 	  /* FN_TYPE will not have parameter types for in-charge or
 	 VTT parameters.  Therefore, we pass NULL_TREE to
@@ -846,9 +845,6 @@ write_encoding (const tree decl)
   write_bare_function_type (fn_type,
 mangle_return_type_p (decl),
 d);
-
-  if (tmpl)
-	--processing_template_decl;
 }
 }
 
Index: testsuite/g++.dg/pr79091.C
===
--- testsuite/g++.dg/pr79091.C	(revision 0)
+++ testsuite/g++.dg/pr79091.C	(working copy)
@@ -0,0 +1,25 @@
+// PR 79091 ICE mangling an unnamed enum in a tempate instantiation.
+
+enum  {
+  One = 1
+};
+
+template
+class Matrix {};
+
+template
+Matrix *Bar ()
+{
+  return 0;
+}
+
+template 
+Matrix *Baz ()
+{
+  return 0;
+}
+
+bool Foo ()
+{
+  return Baz<1> () == Bar<1> ();
+}

Re: [PATCH][AArch64] Improve SHA1 scheduling

2017-01-17 Thread James Greenhalgh

On Tue, Dec 06, 2016 at 03:10:50PM +, Wilco Dijkstra wrote:
>     
> 
> ping

OK.

This has been on the list since before Stage 1 closed and should be low
risk outside of code using the SHA1H intrinsics.

Though, given where we are in the release cycle, please give
Richard/Marcus 24 hours to object before pushing it.

Thanks,
James

> From: Wilco Dijkstra
> Sent: 25 October 2016 18:08
> To: GCC Patches
> Cc: nd
> Subject: [PATCH][AArch64] Improve SHA1 scheduling
>     
> SHA1H instructions may be scheduled after a SHA1C instruction
> that uses the same input register.  However SHA1C updates its input,
> so if SHA1H is scheduled after it, it requires an extra move.
> Increase the priority of SHA1H to ensure it gets scheduled
> earlier, avoiding the move.
> 
> Is this something the generic scheduler could do automatically for
> instructions with RMW operands?
> 
> Passes bootstrap & regress. OK for commit?
> 
> ChangeLog:
> 2016-10-25  Wilco Dijkstra  
> 
>     * config/aarch64/aarch64.c (aarch64_sched_adjust_priority)
>     New function.
>     (TARGET_SCHED_ADJUST_PRIORITY): Define target hook.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 9b2f9cb19343828dc39e9950ebbefe941521942a..2b25bd1bdd6f4e7737f8e04c3b3684cdff6c4b80 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13668,6 +13668,26 @@ aarch64_sched_fusion_priority (rtx_insn *insn, int max_pri,
>    return;
>  }
>  
> +/* Implement the TARGET_SCHED_ADJUST_PRIORITY hook.
> +   Adjust priority of sha1h instructions so they are scheduled before
> +   other SHA1 instructions.  */
> +
> +static int
> +aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
> +{
> +  rtx x = PATTERN (insn);
> +
> +  if (GET_CODE (x) == SET)
> +    {
> +  x = SET_SRC (x);
> +
> +  if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_SHA1H)
> +   return priority + 10;
> +    }
> +
> +  return priority;
> +}
> +
>  /* Given OPERANDS of consecutive load/store, check if we can merge
>     them into ldp/stp.  LOAD is true if they are load instructions.
>     MODE is the mode of memory operands.  */
> @@ -14431,6 +14451,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>  
> +#undef TARGET_SCHED_ADJUST_PRIORITY
> +#define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
> +
>  #undef TARGET_SCHED_MACRO_FUSION_P
>  #define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p
>  
> 
>

Re: [PATCH][AArch64 - v4] Simplify eh_return implementation

2017-01-17 Thread James Greenhalgh

On Mon, Jan 16, 2017 at 03:00:48PM +, Wilco Dijkstra wrote:
> Here is the updated version:
> 
> This patch simplifies the handling of the EH return value.  We force the
> use of the frame pointer so the return location is always at FP + 8.  This
> means we can emit a simple volatile access in EH_RETURN_HANDLER_RTX without
> needing md patterns, splitters and frame offset calculations.  The new
> implementation also fixes various bugs in aarch64_final_eh_return_addr,
> which does not work with -fomit-frame-pointer, alloca or outgoing arguments.
> 
> Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport
> this to GCC6.x?

This is OK for trunk. I think it would be useful on GCC 6, but give it a few
days on trunk to wait for fallout before backporting.

Thanks,
James

> 
> ChangeLog:
> 
> 2017-01-16  Wilco Dijkstra  
> 
> PR77455
> gcc/
> * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
> * config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
> (EH_RETURN_HANDLER_RTX): New define.
> * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
> Force frame pointer in EH return functions.
> (aarch64_expand_epilogue): Add barrier for eh_return.
> (aarch64_final_eh_return_addr): Remove.
> (aarch64_eh_return_handler_rtx): New function.
> * config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr):
> Remove.
> (aarch64_eh_return_handler_rtx): New prototype.
> 
> testsuite/
> * gcc.target/aarch64/eh_return.c: New test.

[PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-01-17 Thread Wilco Dijkstra

After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This means the Thumb-2 iordi_notdi_di
patterns are no longer used - the split ORR and NOT instructions are merged
into ORN by Combine.  With -mfpu=neon the iordi_notdi_di patterns are used
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN).  After this there are no Thumb-2 specific DImode 
patterns.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html

ChangeLog:
2017-01-17  Wilco Dijkstra  

* config/arm/thumb2.md (iordi_notdi_di): Remove pattern.
(iordi_notzesidi_di): Likewise.
(iordi_notdi_zesidi): Likewise.
(iordi_notsesidi_di): Likewise.

--

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 2e7580f220eae1524fef69719b1796f50f5cf27c..91471d4650ecae4f4e87b549d84d11adf3014ad2 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1434,103 +1434,6 @@
(set_attr "type" "alu_sreg")]
 )
 
-; Constants for op 2 will never be given to these patterns.
-(define_insn_and_split "*iordi_notdi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r"))
-   (match_operand:DI 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2)))
-   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[4] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-operands[5] = gen_highpart (SImode, operands[2]);
-operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notzesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (zero_extend:DI
-(match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,?r")))]
-  "TARGET_THUMB2"
-  "#"
-  ; (not (zero_extend...)) means operand0 will always be 0x
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (const_int -1))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "4,8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notdi_zesidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
-   (zero_extend:DI
-(match_operand:SI 1 "s_register_operand" "r,r"))))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (not:SI (match_dup 4)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-operands[4] = gen_highpart (SImode, operands[2]);
-operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (sign_extend:DI
-(match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int 31)))
-  (match_dup 4)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[4] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
 (define_insn "*orsi_notsi_si"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
(ior:SI (not:SI (match_operand:SI 2 "s_register_operand" "r"))

Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-17 Thread Jeff Law


On 01/17/2017 09:12 AM, Martin Sebor wrote:
> On 01/17/2017 08:26 AM, Jeff Law wrote:
>> On 01/16/2017 05:06 PM, Martin Sebor wrote:
>>> The test case submitted in bug 79095 - [7 regression] spurious
>>> stringop-overflow warning shows that GCC optimizes some loops
>>> into calls to memset with size arguments in excess of the object
>>> size limit.  Since such calls will unavoidably lead to a buffer
>>> overflow and memory corruption the attached patch detects them
>>> and replaces them with a trap.  That both prevents the buffer
>>> overflow and eliminates the warning.
>>
>> But doesn't the creation of the bogus memset signal an invalid
>> transformation in the loop optimizer?  ie, if we're going to convert a
>> loop into a memset, then we'd damn well better be sure the loop bounds
>> are reasonable.
>
> I'm not sure that emitting the memset call is necessarily a bug in
> the loop optimizer (which in all likelihood wasn't written with
> the goal of preventing or detecting possible buffer overflows).
> The loop with the excessive bound is in the source code and can
> be reached given the right inputs (calling v.resize(v.size() - 1)
> on an empty vector).  It's a lurking bug in the program that, if
> triggered, will overflow the vector and crash the program (or worse)
> with or without the optimization.

Right, but that doesn't mean that the loop optimizer can turn it into a 
memset.  If the bounds are such that we're going to invoke undefined 
behaviour from memset, then the loop optimizer must leave the loop alone.

> What else could the loop optimizer do in this instance?
> I suppose it could just leave the loop alone and avoid emitting
> the memset call.  That would avoid the warning but mask the
> problem with the overflow.  In my mind, preventing the overflow
> given that we have the opportunity is the right thing to do.
> That is, after all, the goal of the warning.

The right warning in this case is WRT the loop iteration space 
independent of mem*.

> As I mentioned privately yesterday, I'm actually pleasantly
> surprised that it's helped identify this opportunity in GCC itself.
> My hope was to eventually go and find the places where GCC emits
> potentially out of bounds calls (based on user inputs) and fix them
> to emit better code on the assumption that they can't be valid or
> replace them with traps if they could happen in a running program.
> It didn't occur to me that the warning itself would help find them.
>
> Martin
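
The arithmetic behind the excessive bound can be sketched as follows.
This is only an illustration of the unsigned wraparound in the
v.resize(v.size() - 1) case; the helper names are hypothetical, and
treating PTRDIFF_MAX as the object size limit is an assumption made for
the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models size() - 1 computed in unsigned arithmetic: for a size of
   zero the result wraps around to SIZE_MAX.  */
static size_t
demo_shrink_by_one (size_t size)
{
  return size - 1;
}

/* True if N bytes could not belong to any single object, assuming the
   limit is PTRDIFF_MAX bytes.  */
static bool
demo_exceeds_object_size_limit (size_t n)
{
  return n > (size_t) PTRDIFF_MAX;
}
```

A loop bound derived from the wrapped value is what ends up as the size
argument of the memset call that triggers the warning.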

Re: [PATCH, rs6000] Add support for vbpermd instruction and vec_bperm API

2017-01-17 Thread Segher Boessenkool

Hi Bill,

On Tue, Jan 17, 2017 at 09:58:52AM -0600, Bill Schmidt wrote:
> Bootstrapped and tested on powerpc64-unknown-linux-gnu and on
> powerpc64le-unknown-linux-gnu with no regressions.  Is this ok for
> trunk?

Yes this is fine.  Just one trivial remark, fix it or not, your choice...

> +; One of the vector API interfaces requires returning vector unsigned char.
> +(define_insn "altivec_vbpermq2"
> +  [(set (match_operand:V16QI 0 "register_operand" "=v")
> + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
> +(match_operand:V16QI 2 "register_operand" "v")]
> +   UNSPEC_VBPERMQ))]
> +  "TARGET_P8_VECTOR"
> +  "vbpermq %0,%1,%2"
> +  [(set_attr "length" "4")
> +   (set_attr "type" "vecsimple")])

Length 4 is the default (so you can just leave it out).  This is less
clutter, and makes it clearer where the length is *not* the default.


Segher

[PATCH] -mstack-protector-guard and friends (PR78875)

2017-01-17 Thread Segher Boessenkool

Currently, on PowerPC, code compiled with -fstack-protector will load
the canary from -0x7010(13) (for -m64) or from -0x7008(2) (for -m32)
if GCC was compiled against GNU libc 2.4 or newer or some other libc
that supports -fstack-protector, and from the global variable
__stack_chk_guard otherwise.

This does not work well for Linux and other OS kernels and similar.
For such non-standard applications, this patch creates a few new
command-line options.  The relevant new use cases are:

-mstack-protector-guard=global
Use the __stack_chk_guard variable, no matter how this GCC was
configured.

-mstack-protector-guard=tls
Use the canary from TLS.  This will error out if this GCC was built
with a C library that does not support it.

-mstack-protector-guard=tls -mstack-protector-guard-reg=<reg>
-mstack-protector-guard-offset=<offset>
Load the canary from offset <offset> from base register <reg>.
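
As background for the "global" model, the following sketch shows in
plain C what a canary check does conceptually: copy the guard into the
frame on entry and compare it on exit.  It is a simplified model with
hypothetical names (demo_stack_chk_guard, struct demo_frame,
demo_copy_keeps_canary), not the code produced by stack_protect_set and
stack_protect_test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the global __stack_chk_guard variable.  */
static uintptr_t demo_stack_chk_guard = 0x5a5aa5a5u;

/* Models a stack frame: a local buffer followed by the canary slot.  */
struct demo_frame
{
  char buf[8];
  uintptr_t canary;
};

/* Copy up to LEN bytes from SRC into the frame, starting at the buffer,
   and report whether the canary slot still matches the guard.  */
static bool
demo_copy_keeps_canary (const char *src, size_t len)
{
  struct demo_frame frame;

  frame.canary = demo_stack_chk_guard;            /* entry: store guard */
  memcpy (&frame, src, len < sizeof frame ? len : sizeof frame);
  return frame.canary == demo_stack_chk_guard;    /* exit: compare */
}
```

A copy that stays within the 8-byte buffer leaves the canary intact; a
longer one overwrites it, which is the condition the real epilogue check
reports.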


Bootstrap and test running.  Is this okay for trunk?


Segher


2017-01-17  Segher Boessenkool  

PR target/78875

* config/rs6000/rs6000-opts.h (stack_protector_guard): New enum.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Handle
the new options.
* config/rs6000/rs6000.md (stack_protect_set): Handle the new more
flexible settings.
(stack_protect_test): Ditto.
* config/rs6000/rs6000.opt (mstack-protector-guard=,
mstack-protector-guard-reg=, mstack-protector-guard-offset=): New
options.
* doc/invoke.texi (Option Summary) [RS/6000 and PowerPC Options]:
Add -mstack-protector-guard=, -mstack-protector-guard-reg=, and
-mstack-protector-guard-offset=.
(RS/6000 and PowerPC Options): Ditto.

gcc/testsuite/
* gcc.target/powerpc/ssp-1.c: New testcase.
* gcc.target/powerpc/ssp-2.c: New testcase.

---
 gcc/config/rs6000/rs6000-opts.h  |  6 
 gcc/config/rs6000/rs6000.c   | 48 +++
 gcc/config/rs6000/rs6000.md  | 49 +++-
 gcc/config/rs6000/rs6000.opt | 28 ++
 gcc/doc/invoke.texi  | 19 +
 gcc/testsuite/gcc.target/powerpc/ssp-1.c |  6 
 gcc/testsuite/gcc.target/powerpc/ssp-2.c |  6 
 7 files changed, 142 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/ssp-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/ssp-2.c

diff --git a/gcc/config/rs6000/rs6000-opts.h b/gcc/config/rs6000/rs6000-opts.h
index d58b980..086217a 100644
--- a/gcc/config/rs6000/rs6000-opts.h
+++ b/gcc/config/rs6000/rs6000-opts.h
@@ -154,6 +154,12 @@ enum rs6000_vector {
   VECTOR_OTHER /* Some other vector unit */
 };
 
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard {
+  SSP_TLS, /* per-thread canary in TLS block */
+  SSP_GLOBAL   /* global canary */
+};
+
 /* No enumeration is defined to index the -mcpu= values (entries in
processor_target_table), with the type int being used instead, but
we need to distinguish the special "native" value.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d17a719..10632d6 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4935,6 +4935,54 @@ rs6000_option_override_internal (bool global_init_p)
atoi (rs6000_sched_insert_nops_str));
 }
 
+  /* Handle stack protector */
+  if (!global_options_set.x_rs6000_stack_protector_guard)
+#ifdef TARGET_THREAD_SSP_OFFSET
+rs6000_stack_protector_guard = SSP_TLS;
+#else
+rs6000_stack_protector_guard = SSP_GLOBAL;
+#endif
+
+#ifdef TARGET_THREAD_SSP_OFFSET
+  rs6000_stack_protector_guard_offset = TARGET_THREAD_SSP_OFFSET;
+  rs6000_stack_protector_guard_reg = TARGET_64BIT ? 13 : 2;
+#endif
+
+  if (global_options_set.x_rs6000_stack_protector_guard_offset_str)
+{
+  char *endp;
+  const char *str = rs6000_stack_protector_guard_offset_str;
+
+  errno = 0;
+  long offset = strtol (str, &endp, 0);
+  if (!*str || *endp || errno)
+   error ("%qs is not a valid number "
+  "in -mstack-protector-guard-offset=", str);
+
+  if (!IN_RANGE (offset, -0x8000, 0x7fff)
+ || (TARGET_64BIT && (offset & 3)))
+   error ("%qs is not a valid offset "
+  "in -mstack-protector-guard-offset=", str);
+
+  rs6000_stack_protector_guard_offset = offset;
+}
+
+  if (global_options_set.x_rs6000_stack_protector_guard_reg_str)
+{
+  const char *str = rs6000_stack_protector_guard_reg_str;
+  int reg = decode_reg_name (str);
+
+  if (!IN_RANGE (reg, 1, 31))
+   error ("%qs is not a valid base register "
+  "in -mstack-protector-guard-reg=", str);
+
+  rs6000_stack_protector_guard_reg = reg;
+}
+
+  if (rs6000_stack_protector_guard == SSP_TLS
+  && !IN_RANGE (rs6000_stack_protector_guard_reg,

[PATCH] RFC: PR78905 define _GLIBCXX_RELEASE macro

2017-01-17 Thread Jonathan Wakely


As I said in https://gcc.gnu.org/ml/libstdc++/2017-01/msg00109.html
the __GLIBCXX__ macro is useless, but is the closest thing we have to
a version macro for libstdc++. This matters when using libstdc++ with
Clang or Intel icc or other compilers, because you can't check the
__GNUC__ macro. I've seen several requests for a way to check the
libstdc++ version, or complaints that there is no way to do it.

This patch adds a new _GLIBCXX_RELEASE macro that contains the same
value as __GNUC__ i.e. the major release number.

- Yes, it only contains the major number. We could in theory have
 _GLIBCXX_MAJOR and _GLIBCXX_MINOR instead, but between
 _GLIBCXX_RELEASE and __GLIBCXX__ you can identify the release branch
 and a date within that branch.

- The name is "RELEASE" because we used to define _GLIBCXX_VERSION
 many years ago, but it was a string literal and this is an integer.
 To avoid problems for any old code checking for _GLIBCXX_VERSION I
 chose a different name.
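
For illustration, user code could consume the proposed macro roughly as follows. This is only a sketch: the function names `libstdcxx_release` and `library_is_at_least` and the sentinel values 0 and -1 are invented here, and it assumes that including any standard header is enough to get the libstdc++ configuration macros defined.

```cpp
#include <cassert>
#include <cstddef>  // assumed to pull in the libstdc++ config macros

// Hypothetical helper: report the libstdc++ major release, or a sentinel.
//   > 0 : value of _GLIBCXX_RELEASE (the macro proposed in this patch)
//     0 : libstdc++ that predates the macro (only __GLIBCXX__ is defined)
//    -1 : some other standard library
constexpr int libstdcxx_release ()
{
#if defined(_GLIBCXX_RELEASE)
  return _GLIBCXX_RELEASE;
#elif defined(__GLIBCXX__)
  return 0;
#else
  return -1;
#endif
}

// Example use: gate a feature on the library release rather than __GNUC__,
// which is not meaningful when libstdc++ is used with Clang or icc.
inline bool library_is_at_least (int major)
{
  assert (major > 0);
  return libstdcxx_release () >= major;
}
```

The point of the fallback branches is that code checking the new macro still has to cope with older libstdc++ releases, where only the date in __GLIBCXX__ is available.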

Thoughts?


PR libstdc++/78905
* include/Makefile.am (_GLIBCXX_RELEASE): Set value.
* include/Makefile.in: Regenerate.
* include/bits/c++config (_GLIBCXX_RELEASE): Add #define.
* testsuite/ext/profile/mutex_extensions_neg.cc: Use lineno of 0 in
dg-error.

commit 6518efcc098f852310ef15e7cdb563d19803051a
Author: Jonathan Wakely 
Date:   Tue Jan 17 15:45:55 2017 +

PR78905 define _GLIBCXX_RELEASE macro

PR libstdc++/78905
* include/Makefile.am (_GLIBCXX_RELEASE): Set value.
* include/Makefile.in: Regenerate.
* include/bits/c++config (_GLIBCXX_RELEASE): Add #define.
* testsuite/ext/profile/mutex_extensions_neg.cc: Use lineno of 0 in
dg-error.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index dfdceb3..3703bd1 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -1238,6 +1238,7 @@ ${host_builddir}/c++config.h: ${CONFIG_HEADER} \
  stamp-cxx11-abi \
  stamp-allocator-new
@date=`cat ${toplevel_srcdir}/gcc/DATESTAMP` ;\
+   release=`sed 's/^\([0-9]*\).*$$/\1/' ${toplevel_srcdir}/gcc/BASE-VER` ;\
ns_version=`cat stamp-namespace-version` ;\
visibility=`cat stamp-visibility` ;\
externtemplate=`cat stamp-extern-template` ;\
@@ -1249,6 +1250,7 @@ ${host_builddir}/c++config.h: ${CONFIG_HEADER} \
${CONFIG_HEADER} > /dev/null 2>&1 \
&& ldbl_compat='s,^#undef _GLIBCXX_LONG_DOUBLE_COMPAT$$,#define 
_GLIBCXX_LONG_DOUBLE_COMPAT 1,' ;\
sed -e "s,define __GLIBCXX__,define __GLIBCXX__ $$date," \
+   -e "s,define _GLIBCXX_RELEASE,define _GLIBCXX_RELEASE $$release," \
-e "s,define _GLIBCXX_INLINE_VERSION, define _GLIBCXX_INLINE_VERSION 
$$ns_version," \
-e "s,define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY, define 
_GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY $$visibility," \
-e "s,define _GLIBCXX_EXTERN_TEMPLATE$$, define 
_GLIBCXX_EXTERN_TEMPLATE $$externtemplate," \
diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index 0cc1865..691716b 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -30,9 +30,12 @@
 #ifndef _GLIBCXX_CXX_CONFIG_H
 #define _GLIBCXX_CXX_CONFIG_H 1
 
-// The current version of the C++ library in compressed ISO date format.
+// The datestamp of the C++ library in compressed ISO date format.
 #define __GLIBCXX__
 
+// The major release number for the GCC release the C++ library belongs to.
+#define _GLIBCXX_RELEASE
+
 // Macros for various attributes.
 //   _GLIBCXX_PURE
 //   _GLIBCXX_CONST
diff --git a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc 
b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
index 32a4e91..645aa24 100644
--- a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
+++ b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
@@ -25,7 +25,7 @@
 
 #include 
 
-// { dg-error "multiple inlined namespaces" "" { target *-*-* } 350 }
+// { dg-error "multiple inlined namespaces" "" { target *-*-* } 0 }
 
 // "template argument 1 is invalid"
 // { dg-prune-output "tuple:993" }

Re: [PATCH 9c] callgraph: handle __RTL functions

2017-01-17 Thread David Malcolm

On Tue, 2017-01-17 at 13:35 +0100, Jan Hubicka wrote:
> > On Mon, Jan 16, 2017 at 10:25 PM, Jeff Law  wrote:
> > > On 01/09/2017 07:38 PM, David Malcolm wrote:
> > > > 
> > > > The RTL backend code is full of singleton state, so we have to
> > > > handle
> > > > functions as soon as we parse them.  This requires various
> > > > special-casing
> > > > in the callgraph code.
> > > > 
> > > > gcc/ChangeLog:
> > > > * cgraph.h (symtab_node::native_rtl_p): New decl.
> > > > * cgraphunit.c (symtab_node::native_rtl_p): New
> > > > function.
> > > > (symtab_node::needed_p): Don't assert for early
> > > > assembly output
> > > > for __RTL functions.
> > > > (cgraph_node::finalize_function): Set "force_output"
> > > > for __RTL
> > > > functions.
> > > > (cgraph_node::analyze): Bail out early for __RTL
> > > > functions.
> > > > (analyze_functions): Update assertion to support __RTL
> > > > functions.
> > > > (cgraph_node::expand): Bail out early for __RTL
> > > > functions.
> > > > * gimple-expr.c: Include "tree-pass.h".
> > > > (gimple_has_body_p): Return false for __RTL functions.
> > > > ---
> > > >  gcc/cgraph.h  |  4 
> > > >  gcc/cgraphunit.c  | 41 ++-
> > > > --
> > > >  gcc/gimple-expr.c |  3 ++-
> > > >  3 files changed, 44 insertions(+), 4 deletions(-)
> > > > 
> > > 
> > > > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> > > > index 81a3ae9..ed699e1 100644
> > > > --- a/gcc/cgraphunit.c
> > > > +++ b/gcc/cgraphunit.c
> > > 
> > >  @@ -568,6 +591,12 @@ cgraph_node::add_new_function (tree fndecl,
> > > bool
> > > lowered)
> > > > 
> > > >  void
> > > >  cgraph_node::analyze (void)
> > > >  {
> > > > +  if (native_rtl_p ())
> > > > +{
> > > > +  analyzed = true;
> > > > +  return;
> > > > +}
> > > 
> > > So my concern here would be how this interacts with the rest of
> > > the cgraph
> > > machinery.  Essentially you're saying we've built all the
> > > properties for the
> > > given code.  But AFAICT that can't be true and cgraph isn't
> > > actually aware
> > > of any of the properties of the native RTL code (even such things
> > > as what
> > > functions the native RTL code might call).
> > > 
> > > So I guess my question is how do you ensure that even though
> > > cgraph hasn't
> > > looked at code that we're appropriately conservative with how the
> > > file is
> > > processed?  Particularly if there's other code in the source file
> > > that is
> > > expected to interact with the RTL native code?
> > 
> > I think that as we're finalizing the function from the FE before the
> > cgraph is built (and even throw away the RTL?) we have no other choice
> > than treating a __RTL function as a black box, which means treating it
> > as possibly calling all functions in the TU and
> > reading/writing/taking the address of all decls in the TU.  Consider
> 
> I guess the RTL frontend may be arranged to mark all such decls as used,
> or just require the user to do it, like we do with asm statements.
> 
> I wonder why we need to insert those definitions into cgraph in the
> first place...

They're added to the cgraph by this call:

  /* Add to cgraph.  */
  cgraph_node::finalize_function (fndecl, false);

within function_reader::create_function (in r244110, though that code
isn't called yet; it's called by the stuff in patch 9).

If I hack out that call, so that __RTL functions aren't in the cgraph,
then I see lots of failures in the kit, for example here in predict.c:

maybe_hot_frequency_p (struct function *fun, int freq)
{
  struct cgraph_node *node = cgraph_node::get (fun->decl);

  [...read through node, so it must be non-NULL]

Similarly, this line in varasm.c's assemble_start_function assumes that
the fndecl has a symtab node:

  align = symtab_node::get (decl)->definition_alignment ();

etc.

I don't know how many other places make the assumption that cfun's
fndecl has a node in the callgraph.

Given that I want to have __RTL functions called by non-__RTL functions
(and the patch kit handles this), it seemed saner to go down the route
of adding the decl to the callgraph.
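
To make the constraint concrete, here is a toy model (plain C++, not GCC code) of a symbol table whose consumers dereference the result of a lookup unconditionally, in the way the predict.c and varasm.c snippets above do. All names (`ToyNode`, `ToySymtab`, `finalize`, `start_function`) are invented for illustration. Registering the opaque function and letting `analyze` bail out early keeps such consumers working.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a callgraph node.
struct ToyNode
{
  bool native_rtl = false;   // body is opaque to the toy "middle end"
  bool analyzed = false;
  bool force_output = false;
  unsigned alignment = 4;

  void analyze ()
  {
    if (native_rtl)
      {
        analyzed = true;     // nothing to inspect, bail out early
        return;
      }
    // ... a real implementation would walk the function body here ...
    analyzed = true;
  }
};

// Toy stand-in for the symbol table.
struct ToySymtab
{
  std::map<std::string, ToyNode> nodes;

  // Returns nullptr when the decl was never registered.
  ToyNode *get (const std::string &decl)
  {
    auto it = nodes.find (decl);
    return it == nodes.end () ? nullptr : &it->second;
  }

  // Register a function; opaque functions are always emitted.
  void finalize (const std::string &decl, bool native_rtl)
  {
    ToyNode &n = nodes[decl];
    n.native_rtl = native_rtl;
    n.force_output = native_rtl;
  }
};

// A consumer that assumes the node exists, like the code quoted above.
unsigned start_function (ToySymtab &tab, const std::string &decl)
{
  ToyNode *node = tab.get (decl);
  assert (node != nullptr);
  return node->alignment;
}
```

In this model, skipping `finalize` for an opaque function would trip the assertion in `start_function`, which is the toy analogue of the failures described above.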

> Honza
> > 
> > static int i;
> > static void foo () {}
> > int __RTL main()
> > {
> >   ... call foo, access i ...
> > }
> > 
> > which probably will right now optimize i and foo away and thus fail
> > to link?

> > But I think we can sort out these "details" when we run into
> > them...
> > 
> > Richard.
> > 
> > > Jeff

Re: [PATCH] Fix wrong assumption in contains_type_p (PR ipa/71207).

2017-01-17 Thread Jan Hubicka

> 
> Ok, applied without the renaming as r244530. I guess you added that to cut 
> the recursion.
> 
> Would it be fine to install the patch to active branches after proper testing?
OK
Honza
> Thanks,
> Martin
> 
> >> bool consider_placement_new,
> >> bool consider_bases)
> >>  {
> >> @@ -463,18 +463,18 @@ contains_type_p (tree outer_type, HOST_WIDE_INT 
> >> offset,
> >>/* Check that type is within range.  */
> >>if (offset < 0)
> >>  return false;
> >> -  if (TYPE_SIZE (outer_type) && TYPE_SIZE (otr_type)
> >> -  && TREE_CODE (TYPE_SIZE (outer_type)) == INTEGER_CST
> >> -  && TREE_CODE (TYPE_SIZE (otr_type)) == INTEGER_CST
> >> -  && wi::ltu_p (wi::to_offset (TYPE_SIZE (outer_type)),
> >> -  (wi::to_offset (TYPE_SIZE (otr_type)) + offset)))
> >> -return false;
> >> +
> >> +  /* PR ipa/71207
> >> + As OUTER_TYPE can be a type which has a diamond virtual inheritance,
> >> + it's not necessary that INNER_TYPE will fit within OUTER_TYPE with
> >> + a given offset.  It can happen that INNER_TYPE also contains a base 
> >> object,
> >> + however it would point to the same instance in the OUTER_TYPE.  */
> >>  
> >>context.offset = offset;
> >>context.outer_type = TYPE_MAIN_VARIANT (outer_type);
> >>context.maybe_derived_type = false;
> >>context.dynamic = false;
> >> -  return context.restrict_to_inner_class (otr_type, 
> >> consider_placement_new,
> >> +  return context.restrict_to_inner_class (inner_type, 
> >> consider_placement_new,
> >>  consider_bases);
> >>  }
> >>  
> >> diff --git a/gcc/testsuite/g++.dg/ipa/pr71207.C 
> >> b/gcc/testsuite/g++.dg/ipa/pr71207.C
> >> new file mode 100644
> >> index 000..19a03998460
> >> --- /dev/null
> >> +++ b/gcc/testsuite/g++.dg/ipa/pr71207.C
> >> @@ -0,0 +1,42 @@
> >> +/* PR ipa/71207 */
> >> +/* { dg-do run } */
> >> +
> >> +class Class1
> >> +{
> >> +public:
> >> +  Class1() {};
> >> +  virtual ~Class1() {};
> >> +
> >> +protected:
> >> +  unsigned Field1;
> >> +};
> >> +
> >> +class Class2 : public virtual Class1
> >> +{
> >> +};
> >> +
> >> +class Class3 : public virtual Class1
> >> +{
> >> +public:
> >> +  virtual void Method1() = 0;
> >> +
> >> +  void Method2()
> >> +  {
> >> +Method1();
> >> +  }
> >> +};
> >> +
> >> +class Class4 : public Class2, public virtual Class3
> >> +{
> >> +public:
> >> +  Class4() {};
> >> +  virtual void Method1() {};
> >> +};
> >> +
> >> +int main()
> >> +{
> >> +  Class4 var1;
> >> +  var1.Method2();
> >> +
> >> +  return 0;
> >> +}
> >> -- 
> >> 2.11.0
> >>
> >

Re: [PATCH] Emit DW_AT_data_bit_offset instead of DW_AT_{data_member_location,bit_offset,byte_size} for -gdwarf-5 (PR debug/71669)

2017-01-17 Thread Jason Merrill

OK.

On Thu, Jan 12, 2017 at 3:27 PM, Jakub Jelinek  wrote:
> Hi!
>
> While DW_AT_data_bit_offset has been introduced already in DWARF4, GDB only
> gained support for it last November, so I think it is better to enable this
> only for -gdwarf-5 for now and we can reconsider it in a year or two.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-01-12  Jakub Jelinek  
>
> PR debug/71669
> * dwarf2out.c (add_data_member_location_attribute): For constant
> offset bitfield emit for -gdwarf-5 DW_AT_data_bit_offset attribute
> instead of DW_AT_data_member_location, DW_AT_bit_offset and
> DW_AT_byte_size attributes.
>
> --- gcc/dwarf2out.c.jj  2017-01-11 19:05:15.0 +0100
> +++ gcc/dwarf2out.c 2017-01-12 15:08:29.842773537 +0100
> @@ -18272,6 +18272,23 @@ add_data_member_location_attribute (dw_d
>
>if (! loc_descr)
>  {
> +  /* While DW_AT_data_bit_offset has been added already in DWARF4,
> +e.g. GDB only added support to it in November 2016.  For DWARF5
> +we need newer debug info consumers anyway.  We might change this
> +to dwarf_version >= 4 once most consumers catched up.  */
> +  if (dwarf_version >= 5
> + && TREE_CODE (decl) == FIELD_DECL
> + && DECL_BIT_FIELD_TYPE (decl))
> +   {
> + tree off = bit_position (decl);
> + if (tree_fits_uhwi_p (off) && get_AT (die, DW_AT_bit_size))
> +   {
> + remove_AT (die, DW_AT_byte_size);
> + remove_AT (die, DW_AT_bit_offset);
> + add_AT_unsigned (die, DW_AT_data_bit_offset, tree_to_uhwi 
> (off));
> + return;
> +   }
> +   }
>if (dwarf_version > 2)
> {
>   /* Don't need to output a location expression, just the constant. */
>
> Jakub

Re: [C++ PATCH] c++/61636 generic lambdas and this capture

2017-01-17 Thread Jason Merrill


On 01/13/2017 08:33 AM, Nathan Sidwell wrote:

* lambda.c (resolvable_dummy): New, broken out of ...


Maybe resolvable_dummy_lambda, since that's what it returns?

OK with that change.

Jason

RE: [PATCH] MIPS: Fix generation of DIV.G and MOD.G for Loongson targets.

2017-01-17 Thread Toma Tabacu

> Maciej Rozycki writes:
> >  This ought to be handled then, likely by adding Loongson-specific RTL
> > insns matching the `divmod4' and `udivmod4' expanders.  It
> > may be as simple as say (conceptually, untested):
> >
> > (define_insn "divmod4_loongson"
> >   [(set (match_operand:GPR 0 "register_operand" "=d")
> > (any_div:GPR (match_operand:GPR 1 "register_operand" "d")
> >  (match_operand:GPR 2 "register_operand" "d")))
> >(set (match_operand:GPR 3 "register_operand" "=d")
> > (any_mod:GPR (match_dup 1)
> >  (match_dup 2)))]
> >   "TARGET_LOONGSON_2EF"
> > {
> >   return mips_output_division
> >     ("div.g\t%0,%1,%2\;mod.g\t%3,%1,%2", operands);
> > }
> >   [(set_attr "type" "idiv")
> >(set_attr "mode" "")])
> >
> > although any final fix will have to take an instruction count adjustment
> > into account too, as `mips_idiv_insns' won't as it stands handle the new
> > case.

Thanks for the tip Maciej!
I will tackle that issue in a separate patch.

Matthew Fortune writes:
> 
> Sounds good. I'd prefer to get the testsuite clean first then improve the
> code quality as a later step since it is not a regression and we are
> a few days off stage 4.
> 
> In terms of the patch then the ISA_HAS_DIV3 macro is not currently used so
> I suggest that instead it is renamed to ISA_AVOID_DIV_HILO and then use
> that macro in the definition of ISA_HAS_DIV and ISA_HAS_DDIV to turn
> off the DIV/DDIV instructions.
> 
> The ISA_HAS_DIV3 should have been cleaned up when R6 was added as it is
> ambiguous and could refer to multiple variants of 3-reg operand DIV now
> rather than just Loongson's.
> 
> Thanks,
> Matthew

I believe the patch below fits the description.
I've also added a (too?) succinct explanation for the ISA_AVOID_DIV_HILO macro.
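
As a sanity check of how the predicates compose, the macro logic can be mirrored in a small standalone sketch. The `Target` struct and the function names are hypothetical stand-ins for the TARGET_* flags and ISA_* macros; only the boolean structure is taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the target flags the macros consult.
struct Target
{
  bool loongson_2ef;
  bool loongson_3a;
  bool mips16;
  bool is_64bit;
  bool mips5900;
  int isa_rev;
};

// Mirrors ISA_AVOID_DIV_HILO: Loongson, but not in MIPS16 mode.
bool isa_avoid_div_hilo (const Target &t)
{
  return (t.loongson_2ef || t.loongson_3a) && !t.mips16;
}

// Mirrors ISA_HAS_DIV after the patch.
bool isa_has_div (const Target &t)
{
  assert (t.isa_rev >= 0);  // the revision is never negative in this model
  return !isa_avoid_div_hilo (t) && t.isa_rev <= 5;
}

// Mirrors ISA_HAS_DDIV after the patch.
bool isa_has_ddiv (const Target &t)
{
  return t.is_64bit && !t.mips5900
         && !isa_avoid_div_hilo (t) && t.isa_rev <= 5;
}
```

In this model a Loongson target outside MIPS16 mode gets neither DIV nor DDIV, while the same target in MIPS16 mode still gets the regular DIV.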

Tested with mips-mti-elf.

Regards,
Toma

gcc/ChangeLog:

* config/mips/mips.h: Add macro to prevent generation of regular
(D)DIV(U) instructions for Loongson.

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index f91b43d..e21e7d8 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -967,19 +967,24 @@ struct mips_cpu_info {
 /* ISA supports instructions DMUL, DMULU, DMUH, DMUHU.  */
 #define ISA_HAS_R6DMUL (TARGET_64BIT && mips_isa_rev >= 6)
 
+/* For Loongson, it is preferable to use the Loongson-specific division and
+   modulo instructions instead of the regular (D)DIV(U) instruction, because
+   the former are faster and also have the effect of reducing code size.  */
+#define ISA_AVOID_DIV_HILO ((TARGET_LOONGSON_2EF   \
+ || TARGET_LOONGSON_3A)\
+&& !TARGET_MIPS16)
+
 /* ISA supports instructions DDIV and DDIVU. */
 #define ISA_HAS_DDIV   (TARGET_64BIT   \
 && !TARGET_MIPS5900\
+&& !ISA_AVOID_DIV_HILO \
 && mips_isa_rev <= 5)
 
 /* ISA supports instructions DIV and DIVU.
This is always true, but the macro is needed for ISA_HAS_DIV
in mips.md.  */
-#define ISA_HAS_DIV (mips_isa_rev <= 5)
-
-#define ISA_HAS_DIV3   ((TARGET_LOONGSON_2EF   \
- || TARGET_LOONGSON_3A)\
-&& !TARGET_MIPS16)
+#define ISA_HAS_DIV (!ISA_AVOID_DIV_HILO \
+&& mips_isa_rev <= 5)
 
 /* ISA supports instructions DIV, DIVU, MOD and MODU.  */
 #define ISA_HAS_R6DIV  (mips_isa_rev >= 6)

Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2017-01-17 Thread Jakub Jelinek

On Tue, Jan 17, 2017 at 05:16:44PM +0100, Martin Liška wrote:
> > If it did, we would ICE because ASAN_POISON_USE would survive this way until
> > expansion.  A quick fix for the ICE (if it can ever happen) would be easy,
> > in sanopt remove ASAN_POISON_USE calls which have argument that is not lhs
> > of ASAN_POISON (all other ASAN_POISON_USE calls will be handled by my
> > incremental patch).  Of course that would also mean in that case we'd report
> > a read rather than write.  But if it can't happen or is very unlikely to
> > happen, then it is a non-issue.
> 
> Thank you Jakub for working on that.
> 
> The patch is fine, I added DCE support and a test-case. Please see the
> attached patch.
> asan.exp regression tests look fine and I've been building the Linux
> kernel with KASAN enabled. I'll also do asan-bootstrap.
> 
> I would like to commit the patch soon. Should I squash both patches
> together, or would it be preferred to separate the basic optimization
> and support for stores?

Your choice, either is fine.  If the two patches pass bootstrap/regtest
(ideally also asan-bootstrap), they are ok for trunk.  Just one nit:

> --- a/gcc/tree-ssa-dce.c
> +++ b/gcc/tree-ssa-dce.c
> @@ -1384,6 +1384,10 @@ eliminate_unnecessary_stmts (void)
> case IFN_MUL_OVERFLOW:
>   maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
>   break;
> +   case IFN_ASAN_POISON:
> + if (!gimple_has_lhs (stmt))
> +   remove_dead_stmt (&gsi, bb);
> + break;
> default:
>   break;
> }

This doesn't seem to be the best spot for it.  At least when looking at
say:
int
foo (int x)
{
  int *ptr = 0;

  if (x < 127)
    return 5;

  {
    int a;
    ptr = &a;
    *ptr = 12345;
  }

  if (x == 34)
    return *ptr;
  return 7;
}
where the ASAN_POISON is initially used and only after evrp becomes dead,
then cddce1 calls eliminate_unnecessary_stmts and removes the lhs of the
ASAN_POISON only (and not the whole stmt, unlike how e.g. GOMP_SIMD_LANE is
handled), and only next dce pass tons of passes later removes the
ASAN_POISON call.
So IMHO you need one of these (untested) patches.  The former assumes that
the DCE pass is the only one that can drop the lhs of ASAN_POISON.  If that
is not the case, then perhaps the second patch is better, by removing the
stmt regardless if we've removed the lhs in the current dce pass or in
whatever earlier pass.  I think it shouldn't break IFN_*_OVERFLOW, because
maybe_optimize_arith_overflow starts with
tree lhs = gimple_call_lhs (stmt);
if (lhs == NULL_TREE || ...)
  return;
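
The difference between the two variants can be seen on a toy model (invented types, not GCC's gimple API). `ToyStmt`, `Fn` and `toy_dce` are made up for this sketch; the first variant removes a call only when the current pass dropped its lhs, the second removes it whenever the lhs is absent, no matter which pass dropped it.

```cpp
#include <cassert>
#include <vector>

enum class Fn { Other, GompSimdLane, AsanPoison };

// Toy statement: an internal call with an optional, possibly unused, lhs.
struct ToyStmt
{
  Fn fn;
  bool has_lhs;
  bool lhs_used;
};

// Calls that are pointless once their result is gone.
static bool useless_without_lhs (Fn fn)
{
  return fn == Fn::GompSimdLane || fn == Fn::AsanPoison;
}

// only_if_dropped_here == true  : first variant (removal tied to this pass)
// only_if_dropped_here == false : second variant (removal whenever no lhs)
std::vector<ToyStmt> toy_dce (std::vector<ToyStmt> stmts,
                              bool only_if_dropped_here)
{
  std::vector<ToyStmt> kept;
  for (ToyStmt s : stmts)
    {
      assert (!(s.lhs_used && !s.has_lhs));  // a use requires an lhs

      // Drop an unused lhs, as DCE does for dead SSA names.
      bool dropped_here = false;
      if (s.has_lhs && !s.lhs_used)
        {
          s.has_lhs = false;
          dropped_here = true;
        }

      bool candidate = !s.has_lhs && useless_without_lhs (s.fn);
      if (candidate && (dropped_here || !only_if_dropped_here))
        continue;  // statement removed

      kept.push_back (s);
    }
  return kept;
}
```

A poison call that arrives already without an lhs survives the first variant but not the second, which is the case the second patch is meant to cover.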

Jakub
--- gcc/tree-ssa-dce.c.jj   2017-01-01 12:45:38.380670110 +0100
+++ gcc/tree-ssa-dce.c  2017-01-17 17:35:43.650902141 +0100
@@ -1367,10 +1367,18 @@ eliminate_unnecessary_stmts (void)
  update_stmt (stmt);
  release_ssa_name (name);
 
- /* GOMP_SIMD_LANE without lhs is not needed.  */
- if (gimple_call_internal_p (stmt)
- && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
-   remove_dead_stmt (&gsi, bb);
+ /* GOMP_SIMD_LANE or ASAN_POISON without lhs is not
+needed.  */
+ if (gimple_call_internal_p (stmt))
+   switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_GOMP_SIMD_LANE:
+ case IFN_ASAN_POISON:
+   remove_dead_stmt (&gsi, bb);
+   break;
+ default:
+   break;
+ }
}
  else if (gimple_call_internal_p (stmt))
switch (gimple_call_internal_fn (stmt))
--- gcc/tree-ssa-dce.c.jj   2017-01-01 12:45:38.380670110 +0100
+++ gcc/tree-ssa-dce.c  2017-01-17 17:37:38.639427099 +0100
@@ -1366,13 +1366,8 @@ eliminate_unnecessary_stmts (void)
  maybe_clean_or_replace_eh_stmt (stmt, stmt);
  update_stmt (stmt);
  release_ssa_name (name);
-
- /* GOMP_SIMD_LANE without lhs is not needed.  */
- if (gimple_call_internal_p (stmt)
- && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
-   remove_dead_stmt (&gsi, bb);
}
- else if (gimple_call_internal_p (stmt))
+ if (gimple_call_internal_p (stmt))
switch (gimple_call_internal_fn (stmt))
  {
  case IFN_ADD_OVERFLOW:
@@ -1384,6 +1379,13 @@ eliminate_unnecessary_stmts (void)
  case IFN_MUL_OVERFLOW:
 maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
break;
+ /* GOMP_SIMD_LANE or ASAN_POISON without lhs is not
+needed.  */
+ case IFN_GOMP_SIMD_LANE:
+ case IFN_ASAN_POISON:

Re: libgo patch committed: Update to Go1.8rc1

2017-01-17 Thread Lynn A. Boger

In the past, the libgo version number has always matched the Go version,
not the gcc release.

The libgo for Go 1.7 and Go 1.8 are not the same.  If someone wanted to
build gccgo for Go 1.7 using an old commit, maybe for testing or
comparison purposes, the libgo version would not identify which one it
was, because libgo.so.10 could be either Go 1.7 or 1.8.  Maybe nobody
cares about doing that.  It's not a huge deal to me, but it would be a
point of confusion.

GCC 6.x was Go 1.6.

On 01/17/2017 10:09 AM, Jakub Jelinek wrote:

On Tue, Jan 17, 2017 at 10:03:25AM -0600, Lynn A. Boger wrote:

I think this is missing the update of the libgo version number.

Why?  GCC 6.x shipped with libgo.so.9, so I don't see anything wrong
on 7.x shipping libgo.so.10.

Jakub

Re: [PATCH] adding missing LTO to some warning options (PR 78606)

2017-01-17 Thread Martin Sebor


On 01/17/2017 05:04 AM, Kyrill Tkachov wrote:

Hi Martin,

On 10/01/17 22:16, Martin Sebor wrote:

The -Walloca-larger-than, -Wformat-length, and -Wformat-truncation
options do not mention LTO among the supported languages and so are
disabled when -flto is used, causing false negatives.

The attached patch adds the missing LTO to the three options. This
makes -Walloca-larger-than work with LTO but not the other two
options, implying that something else is preventing the gimple-ssa-
sprintf pass from running when -flto is enabled.  I haven't had
the cycles to look into what that might be yet.  Since the root
causes are independent I'd like to commit this patch first and
deal with the  -Wformat-{length,truncation} problem separately,
under a new bug (or give someone with a better understanding of
LTO the opportunity to do it).



I see the new test FAILing on arm and aarch64 targets.
FAIL: gcc.dg/pr78768.c execution test


Thanks.  The test doesn't need to run.  It just needs to link.
I changed it in r244537.

Martin

Re: [PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization

2017-01-17 Thread Jakub Jelinek

On Tue, Jan 17, 2017 at 05:22:34PM +0100, Bernd Schmidt wrote:
> On 01/16/2017 08:26 PM, Jeff Law wrote:
> > On 01/13/2017 11:19 AM, Thomas Preudhomme wrote:
> > > Ping? I'm not sure if an ok from Vladimir is enough or if I also need RM
> > > approval.
> > Vlad's approval is all you need.
> 
> Is that a general rule? I'm never too certain on that.

Unless the branch is frozen for release (at which point all commits need RM
approval) or closed, maintainers/reviewers can approve backports in the
areas they are maintainers or reviewers for.
Of course good judgement should be used on what should be backported and
what should not.

Jakub

Re: [PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization

2017-01-17 Thread Bernd Schmidt


On 01/16/2017 08:26 PM, Jeff Law wrote:

On 01/13/2017 11:19 AM, Thomas Preudhomme wrote:

Ping? I'm not sure if an ok from Vladimir is enough or if I also need RM
approval.

Vlad's approval is all you need.


Is that a general rule? I'm never too certain on that.


Bernd

Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2017-01-17 Thread Martin Liška

On 01/16/2017 03:20 PM, Jakub Jelinek wrote:
> On Mon, Jan 09, 2017 at 03:58:04PM +0100, Martin Liška wrote:
 Well, having the following sample:

 int
 main (int argc, char **argv)
 {
   int *ptr = 0;

   {
     int a;
     ptr = &a;
     *ptr = 12345;
   }

   *ptr = 12345;
   return *ptr;
 }

> 
>> I'm still not sure how to do that. The problem is that the transformation from:
>>
>>   ASAN_MARK (UNPOISON, &a, 4);
>>   a = 5;
>>   ASAN_MARK (POISON, &a, 4);
>>
>> to
>>
>>   a_8 = 5;
>>   a_9 = ASAN_POISON ();
>>
>> happens in tree-ssa.c, after SSA is created, in a situation where we
>> prove that 'a' does not need to live in memory. That said, the question
>> is how to identify that we need to transform into SSA in a different way:
>>
>>   a_10 = ASAN_POISON ();
>>   ASAN_POISON (a_10);
> 
> I meant something like this (completely untested, and without the testcase
> added to the testsuite).
> The incremental patch as is relies on the ASAN_POISON_USE call having the
> argument the result of ASAN_POISON, it would ICE if that is not the case
> (especially if -fsanitize-recover=address).  Dunno if some optimization
> might decide to create a PHI in between, say merge two unrelated vars for
> if (something)
>   {
> x_1 = ASAN_POISON ();
> ...
> ASAN_POISON_USE (x_1);
>   }
> else
>   {
> y_2 = ASAN_POISON ();
> ...
> ASAN_POISON_USE (y_2);
>   }
> to turn that into:
> if (something)
>   x_1 = ASAN_POISON ();
> else
>   y_2 = ASAN_POISON ();
> _3 = PHI <x_1, y_2>;
> ...
> ASAN_POISON_USE (_3);
> 
> If it did, we would ICE because ASAN_POISON_USE would survive this way until
> expansion.  A quick fix for the ICE (if it can ever happen) would be easy,
> in sanopt remove ASAN_POISON_USE calls which have argument that is not lhs
> of ASAN_POISON (all other ASAN_POISON_USE calls will be handled by my
> incremental patch).  Of course that would also mean in that case we'd report
> a read rather than write.  But if it can't happen or is very unlikely to
> happen, then it is a non-issue.

Thank you Jakub for working on that.

The patch is fine, I added DCE support and a test-case. Please see the
attached patch.
asan.exp regression tests look fine and I've been building the Linux
kernel with KASAN enabled. I'll also do asan-bootstrap.

I would like to commit the patch soon. Should I squash both patches
together, or would it be preferred to separate the basic optimization
and support for stores?

Thanks,
Martin

> Something missing from the patch is some change in DCE to remove ASAN_POISON
> calls without lhs earlier.  I think we can't make ASAN_POISON ECF_CONST, we
> don't want it to be merged for different variables.
> 
> --- gcc/internal-fn.def.jj2017-01-16 13:19:49.0 +0100
> +++ gcc/internal-fn.def   2017-01-16 14:25:37.427962196 +0100
> @@ -167,6 +167,7 @@ DEF_INTERNAL_FN (ABNORMAL_DISPATCHER, EC
>  DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, ".R...")
>  DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, ".R..")
>  DEF_INTERNAL_FN (ASAN_POISON, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
> +DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
>  DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/asan.c.jj 2017-01-16 13:19:49.0 +0100
> +++ gcc/asan.c2017-01-16 14:52:34.022044223 +0100
> @@ -3094,6 +3094,8 @@ create_asan_shadow_var (tree var_decl,
>  return *slot;
>  }
>  
> +/* Expand ASAN_POISON ifn.  */
> +
>  bool
>  asan_expand_poison_ifn (gimple_stmt_iterator *iter,
>   bool *need_commit_edge_insert,
> @@ -3107,8 +3109,8 @@ asan_expand_poison_ifn (gimple_stmt_iter
>return true;
>  }
>  
> -  tree shadow_var  = create_asan_shadow_var (SSA_NAME_VAR (poisoned_var),
> -  shadow_vars_mapping);
> +  tree shadow_var = create_asan_shadow_var (SSA_NAME_VAR (poisoned_var),
> + shadow_vars_mapping);
>  
>bool recover_p;
>if (flag_sanitize & SANITIZE_USER_ADDRESS)
> @@ -3122,16 +3124,16 @@ asan_expand_poison_ifn (gimple_stmt_iter
>ASAN_MARK_POISON),
> build_fold_addr_expr (shadow_var), size);
>  
> -  use_operand_p use_p;
> +  gimple *use;
>imm_use_iterator imm_iter;
> -  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
> +  FOR_EACH_IMM_USE_STMT (use, imm_iter, poisoned_var)
>  {
> -  gimple *use = USE_STMT (use_p);
>if (is_gimple_debug (use))
>   continue;
>  
>int nargs;
> -  tree fun = report_error_func (false, recover_p, tree_to_uhwi (size),
> +  bool store_p = gimple_call_internal_p (use, IFN_ASAN_POISON_USE);
> +  tree fun =

patch to fix PR79058

2017-01-17 Thread Vladimir Makarov


The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79058

The patch was successfully bootstrapped and tested on x86-64.

Committed as rev. 244535.


Index: ChangeLog
===
--- ChangeLog	(revision 244534)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2017-01-17  Vladimir Makarov  
+
+	PR target/79058
+	* ira-conflicts.c (ira_build_conflicts): Update total conflict
+	hard regs for inner regno.
+
 2017-01-17  Martin Liska  
 
 	PR ipa/71207
Index: ira-conflicts.c
===
--- ira-conflicts.c	(revision 244500)
+++ ira-conflicts.c	(working copy)
@@ -787,8 +787,12 @@ ira_build_conflicts (void)
 		   if (outer_regno < 0
 		   || !in_hard_reg_set_p (reg_class_contents[aclass],
 	  outer_mode, outer_regno))
-		 SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj),
-   inner_regno);
+		 {
+		   SET_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
+	 inner_regno);
+		   SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj),
+	 inner_regno);
+		 }
 		}
 	}
 
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 244534)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2017-01-17  Vladimir Makarov  
+
+	PR target/79058
+	* gcc.target/arm/pr79058.c: New.
+
 2017-01-17  Jakub Jelinek  
 
 	PR tree-optimization/71854
Index: testsuite/gcc.target/arm/pr79058.c
===
--- testsuite/gcc.target/arm/pr79058.c	(revision 0)
+++ testsuite/gcc.target/arm/pr79058.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "-mcpu=arm7tdmi" } } */
+/* { dg-options "-Os -mbig-endian -mcpu=arm7tdmi" } */
+
+enum { NILFS_SEGMENT_USAGE_ACTIVE, NILFS_SEGMENT_USAGE_DIRTY } a;
+
+void fn2 (long long);
+
+void fn1() {
+  int b = a & 1 << NILFS_SEGMENT_USAGE_DIRTY;
+  fn2 (b ? (long long) -1 : 0);
+}

Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-17 Thread Martin Sebor


On 01/17/2017 08:26 AM, Jeff Law wrote:

On 01/16/2017 05:06 PM, Martin Sebor wrote:

The test case submitted in bug 79095 - [7 regression] spurious
stringop-overflow warning shows that GCC optimizes some loops
into calls to memset with size arguments in excess of the object
size limit.  Since such calls will unavoidably lead to a buffer
overflow and memory corruption the attached patch detects them
and replaces them with a trap.  That both prevents the buffer
overflow and eliminates the warning.

But doesn't the creation of the bogus memset signal an invalid
transformation in the loop optimizer?  ie, if we're going to convert a
loop into a memset, then we'd damn well better be sure the loop bounds
are reasonable.


I'm not sure that emitting the memset call is necessarily a bug in
the loop optimizer (which in all likelihood wasn't written with
the goal of preventing or detecting possible buffer overflows).
The loop with the excessive bound is in the source code and can
be reached given the right inputs (calling v.resize(v.size() - 1)
on an empty vector).  It's a lurking bug in the program that, if
triggered, will overflow the vector and crash the program (or worse)
with or without the optimization.
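
To make the failure mode concrete, here is a minimal sketch (not taken from the bug report) of why the size argument ends up excessive. The helper name size_after_shrink is hypothetical and only models the unsigned arithmetic in v.size() - 1; the comparison against PTRDIFF_MAX assumes that this is the object size limit being referred to.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the value that v.resize(v.size() - 1) would receive
// for a container whose current size is `size`.  size_t is unsigned, so the
// subtraction wraps around instead of producing -1.
constexpr std::size_t size_after_shrink(std::size_t size) { return size - 1; }

// For an empty vector the requested size wraps to the largest size_t value ...
static_assert(size_after_shrink(0) == SIZE_MAX, "wraps around");
// ... which is larger than PTRDIFF_MAX (assumed here to be the size limit).
static_assert(size_after_shrink(0) > static_cast<std::size_t>(PTRDIFF_MAX),
              "exceeds the assumed object size limit");
// A non-empty vector behaves as expected.
static_assert(size_after_shrink(5) == 4, "ordinary case");
```

Any loop that then fills "size" elements, whether or not it is turned into memset, runs far past the end of the buffer.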

What else could the loop optimizer do in this instance?
I suppose it could just leave the loop alone and avoid emitting
the memset call.  That would avoid the warning but mask the
problem with the overflow.  In my mind, preventing the overflow
given that we have the opportunity is the right thing to do.
That is, after all, the goal of the warning.

As I mentioned privately yesterday, I'm actually pleasantly
surprised that it's helped identify this opportunity in GCC itself.
My hope was to eventually go and find the places where GCC emits
potentially out of bounds calls (based on user inputs) and fix them
to emit better code on the assumption that they can't be valid or
replace them with traps if they could happen in a running program.
It didn't occur to me that the warning itself would help find them.

Martin

Re: libgo patch committed: Update to Go1.8rc1

2017-01-17 Thread Jakub Jelinek

On Tue, Jan 17, 2017 at 10:03:25AM -0600, Lynn A. Boger wrote:
> I think this is missing the update of the libgo version number.

Why?  GCC 6.x shipped with libgo.so.9, so I don't see anything wrong
on 7.x shipping libgo.so.10.

Jakub

Re: libgo patch committed: Update to Go1.8rc1

2017-01-17 Thread Lynn A. Boger


I think this is missing the update of the libgo version number.

- Lynn

On 01/13/2017 06:05 PM, Ian Lance Taylor wrote:

I committed a patch to libgo to update the library to the first
release candidate of the upcoming Go 1.8 release.  This is a big
update, mostly a straight copy of the code in the master Go library.

I made the following changes to the Go frontend to correspond to
changes in the runtime library:

* Change map assignment to use mapassign and assign value directly.
* Change string iteration to use decoderune, faster for ASCII strings.
* Change makeslice to take int, and use makeslice64 for larger values.
* Add new noverflow field to hmap struct used for maps.

There are two known problems that I simply commented out of test code
until they can be fixed:

* Commented out test in go/types/sizes_test.go that doesn't compile.
* Commented out reflect.TestStructOf test for padding after zero-sized field.

As usual with these sorts of updates the patch is too large to send to
the mailing list.  I've appended the changes to the gccgo-specific
parts of the code.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  I would not
be terribly surprised if this breaks Solaris.  I'll try to check that
out shortly.

Ian

gotools/ChangeLog:

2017-01-13  Ian Lance Taylor  

Updates for Go 1.8rc1.
* Makefile.am (go_cmd_go_files): Add bug.go.
(s-zdefaultcc): Write defaultPkgConfig.

[PATCH, rs6000] Add support for vbpermd instruction and vec_bperm API

2017-01-17 Thread Bill Schmidt

Hi,

ISA 3.0 adds the vbpermd instruction, related to the vbpermq instruction
added in ISA 2.7.  This patch adds support for that instruction, and
also ensures that vec_bperm provides access to the three supported
interfaces mandated by the ELFv2 ABI:

vector unsigned char vec_bperm (vector unsigned char,
vector unsigned char);
vector unsigned long long vec_bperm (vector unsigned __int128,
 vector unsigned char);
vector unsigned long long vec_bperm (vector unsigned long long,
 vector unsigned char);

The first two forms correspond to vbpermq, and the third corresponds to
vbpermd.

Prior to this patch, vec_bperm was an alias for __builtin_vec_vbpermq,
which corresponds to the first two forms above, except that it returns
vector unsigned long long for the first case.  We need to keep
__builtin_vec_vbpermq as it is a published interface, but vec_bperm
needs to use the correct return value for the first form, and be 
broadened to include the third form.  Thus vec_bperm is now an alias
for __builtin_vec_vbperm_api, which is a new interface covering all
three forms.

The change in return value for the first form is not expected to
cause difficulties, as this is a rarely used interface and any
incompatibility can be solved with a cast.  The previous version was
a violation of the published API.  We may want to make note of this
in the release notes.

Bootstrapped and tested on powerpc64-unknown-linux-gnu and on
powerpc64le-unknown-linux-gnu with no regressions.  Is this ok for
trunk?

Thanks,
Bill


[gcc]

2017-01-17  Bill Schmidt  

* config/rs6000/altivec.h (vec_bperm): Change #define.
* config/rs6000/altivec.md (UNSPEC_VBPERMD): New enum constant.
(altivec_vbpermq2): New define_insn.
(altivec_vbpermd): Likewise.
* config/rs6000/rs6000-builtin.def (VBPERMQ2): New monomorphic
function interface.
(VBPERMD): Likewise.
(VBPERM): New polymorphic function interface.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins_table):
Add entries for P9V_BUILTIN_VEC_VBPERM.
* doc/extend.texi: Add interfaces for vec_bperm.

[gcc/testsuite]

2017-01-17  Bill Schmidt  

* gcc.target/powerpc/p8vector-builtin-8.c: Add new form for
vec_bperm.
* gcc.target/powerpc/p9-vbpermd.c: New file.


Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 244498)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -347,7 +347,7 @@
 #define vec_vaddudm __builtin_vec_vaddudm
 #define vec_vadduqm __builtin_vec_vadduqm
 #define vec_vbpermq __builtin_vec_vbpermq
-#define vec_bperm __builtin_vec_vbpermq
+#define vec_bperm __builtin_vec_vbperm_api
 #define vec_vclz __builtin_vec_vclz
 #define vec_cntlz __builtin_vec_vclz
 #define vec_vclzb __builtin_vec_vclzb
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 244498)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -150,6 +150,7 @@
UNSPEC_VSUBEUQM
UNSPEC_VSUBECUQ
UNSPEC_VBPERMQ
+   UNSPEC_VBPERMD
UNSPEC_BCDADD
UNSPEC_BCDSUB
UNSPEC_BCD_OVERFLOW
@@ -3632,6 +3633,27 @@
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 
+; One of the vector API interfaces requires returning vector unsigned char.
+(define_insn "altivec_vbpermq2"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
+  (match_operand:V16QI 2 "register_operand" "v")]
+ UNSPEC_VBPERMQ))]
+  "TARGET_P8_VECTOR"
+  "vbpermq %0,%1,%2"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
+(define_insn "altivec_vbpermd"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v")]
+UNSPEC_VBPERMD))]
+  "TARGET_P9_VECTOR"
+  "vbpermd %0,%1,%2"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
 ;; Decimal Integer operations
 (define_int_iterator UNSPEC_BCD_ADD_SUB [UNSPEC_BCDADD UNSPEC_BCDSUB])
 
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 244498)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1802,6 +1802,7 @@ BU_P8V_AV_2 (VMAXUD,  "vmaxud",   CONST,  umaxv2di3)
 BU_P8V_AV_2 (VMRGEW,   "vmrgew",   CONST,  p8_vmrgew)
 BU_P8V_AV_2 (VMRGOW,   "vmrgow",   CONST,  p8_vmrgow)
 BU_P8V_AV_2 (VBPERMQ,  "vbpermq",  CONST,  altivec_vbpermq)
+BU_P8V_AV_2 (VBPERMQ2,

[committed] Add testcase for PR tree-optimization/71854

2017-01-17 Thread Jakub Jelinek

Hi!

This PR has been fixed in r244218 aka the PR78997 fix.
I've tested the testcase on x86_64-linux with -m32/-m64 (without/with
the PR78997 fix) and committed to trunk as obvious.

2017-01-17  Jakub Jelinek  

PR tree-optimization/71854
* gcc.dg/vect/pr71854.c: New test.

--- gcc/testsuite/gcc.dg/vect/pr71854.c.jj  2017-01-17 16:46:29.366816678 +0100
+++ gcc/testsuite/gcc.dg/vect/pr71854.c 2017-01-17 16:47:16.442209553 +0100
@@ -0,0 +1,25 @@
+/* PR tree-optimization/71854 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -ftree-loop-if-convert" } */
+
+char a, f = 1;
+int b, c, e[8];
+short d;
+
+short
+foo (short x)
+{
+  return x >= 2 || x >> c ? x : x << c;
+}
+
+int
+main ()
+{
+  while (f)
+for (d = 0; d <= 7; d++)
+  {
+   f = 7 >> b ? a : a << b;
+   e[d] = foo (f);
+  }
+  return 0;
+}

Jakub

[PATCH] PR69699 document why __GLIBCXX__ macro is useless

2017-01-17 Thread Jonathan Wakely


The closest thing we have to a version macro in libstdc++ is
__GLIBCXX__ which holds the value of gcc/DATESTAMP from the source
tree. That's useless for version checking or feature testing because
snapshots have arbitrary values and there's no total order across
branches (a later date does not mean a "better" release with more
features implemented). This updates the docs to point out it isn't
very useful. I've also removed the list of release dates, linking to
the online release timeline instead, so we don't have to keep adding
to the list. This means the information is not included in the sources
and you need to be online to find a date, but since the dates are not
very useful anyway I don't think this is a problem.
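
As a small illustration of the ordering problem, the sketch below uses the two macro values that the documentation patch itself quotes (20160603 for 5.4.0 and 20160427 for 6.1.0). The function naive_has_6_1_features is hypothetical; it stands in for the kind of date comparison that user code might be tempted to write.

```cpp
// Values of __GLIBCXX__ quoted in the documentation patch.
constexpr long glibcxx_5_4_0 = 20160603;
constexpr long glibcxx_6_1_0 = 20160427;

// Hypothetical, deliberately naive feature test: "anything dated on or after
// the 6.1.0 release must have the 6.1.0 features".
constexpr bool naive_has_6_1_features(long datestamp) {
  return datestamp >= glibcxx_6_1_0;
}

// The 5.4.0 release is dated later than 6.1.0, so the naive test wrongly
// accepts it even though it comes from an older branch.
static_assert(naive_has_6_1_features(glibcxx_5_4_0),
              "5.4.0 is misclassified as having 6.1.0 features");
static_assert(naive_has_6_1_features(glibcxx_6_1_0), "6.1.0 itself passes");
```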

PR libstdc++/69699
* doc/xml/manual/abi.xml (abi.versioning.history): Explain why the
__GLIBCXX__ macro is not useful. Remove redundant date information
and link to the GCC release timeline.
(abi.versioning.active): Move partial sentence into the previous
paragraph.
* doc/html/*: Regenerate.

Committed to trunk.

commit 5087726efb1165d835c745fe6bd693d73aa0f389
Author: Jonathan Wakely 
Date:   Tue Jan 17 15:15:55 2017 +

PR69699 document why __GLIBCXX__ macro is useless

PR libstdc++/69699
* doc/xml/manual/abi.xml (abi.versioning.history): Explain why the
__GLIBCXX__ macro is not useful. Remove redundant date information
and link to the GCC release timeline.
(abi.versioning.active): Move partial sentence into the previous
paragraph.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml b/libstdc++-v3/doc/xml/manual/abi.xml
index 8e4a3fa..c818bd8 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -393,10 +393,32 @@ compatible.
 
 
 This macro is defined in the file "c++config" in the
-"libstdc++-v3/include/bits" directory. (Up to GCC 4.1.0, it was
-changed every night by an automated script. Since GCC 4.1.0, it is
-the same value as gcc/DATESTAMP.)
+"libstdc++-v3/include/bits" directory. Up to GCC 4.1.0, it was
+changed every night by an automated script. Since GCC 4.1.0 it is set
+during configuration to the same value as
+gcc/DATESTAMP, so for an official release its value
+is the same as the date of the release, which is given in the http://www.w3.org/1999/xlink;
+  xlink:href="https://gcc.gnu.org/develop.html#timeline;>GCC Release
+Timeline.
 
+
+
+This macro is not useful for determining whether a particular feature is
+supported by the version of libstdc++ you are using. The date of a release
+might be after a feature was added to the development trunk, but the
+release could be from an older branch. For example, in the 5.4.0 release
+the macro has the value 20160603 which is greater than the 20160427 value
+of the macro in the 6.1.0 release, but there are features supported in the
+6.1.0 release that are not supported in 5.4.0 release.
+You also can't test for the the exact values listed below to try and
+identify a release, because a snapshot taken from the gcc-5-branch on
+2016-04-27 would have the same value for the macro as the 6.1.0 release
+despite being a different version.
+Many GNU/Linux distributions build their GCC packages from snapshots, so
+the macro can have dates that doesn't correspond to official releases.
+
+
 
 It is versioned as follows:
 
@@ -427,41 +449,12 @@ compatible.
 GCC 4.0.1: 20050707
 GCC 4.0.2: 20050921
 GCC 4.0.3: 20060309
-GCC 4.1.0: 20060228
-GCC 4.1.1: 20060524
-GCC 4.1.2: 20070214
-GCC 4.2.0: 20070514
-GCC 4.2.1: 20070719
-GCC 4.2.2: 20071007
-GCC 4.2.3: 20080201
-GCC 4.2.4: 20080519
-GCC 4.3.0: 20080306
-GCC 4.3.1: 20080606
-GCC 4.3.2: 20080827
-GCC 4.3.3: 20090124
-GCC 4.3.4: 20090804
-GCC 4.3.5: 20100522
-GCC 4.3.6: 20110627
-GCC 4.4.0: 20090421
-GCC 4.4.1: 20090722
-GCC 4.4.2: 20091015
-GCC 4.4.3: 20100121
-GCC 4.4.4: 20100429
-GCC 4.4.5: 20101001
-GCC 4.4.6: 20110416
-GCC 4.4.7: 20120313
-GCC 4.5.0: 20100414
-GCC 4.5.1: 20100731
-GCC 4.5.2: 20101216
-GCC 4.5.3: 20110428
-GCC 4.5.4: 20120702
-GCC 4.6.0: 20110325
-GCC 4.6.1: 20110627
-GCC 4.6.2: 20111026
-GCC 4.6.3: 20120301
-GCC 4.7.0: 20120322
-GCC 4.7.1: 20120614
-GCC 4.7.2: 20120920
+
+  GCC 4.1.0 and later: the GCC release date, as shown in the
+  http://www.w3.org/1999/xlink;
+xlink:href="https://gcc.gnu.org/develop.html#timeline;>GCC
+  Release Timeline
+
 
 
 
@@ -619,7 +612,7 @@ compatible.
 
   When the GNU C++ library is being built with symbol versioning
   on, you should see the following at configure time for
-  libstdc++:
+  libstdc++ (showing

Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-17 Thread Aaron Sawdey

On Tue, 2017-01-17 at 08:30 -0600, Peter Bergner wrote:
> On 1/16/17 3:09 PM, Aaron Sawdey wrote:
> > Here is an updated version of this patch.
> > 
> > Tulio noted that glibc's strncmp test was failing. This turned out
> > to
> > be the use of signed HOST_WIDE_INT for handling strncmp length. The
> > glibc test calls strncmp with length 2^64-1, presumably to provoke
> > exactly this type of bug. Fixing the issue required changing
> > select_block_compare_mode() and expand_block_compare() as well.
> 
> If glibc's testsuite exposed a bug, then we should also add a similar
> test to our testsuite.  I scanned the patch and I'm not sure I see
> that exact test scenario.  Is it there and I'm just not seeing it?
> 
> Peter
> 

Nope, you didn't miss it, Peter. I will add such a test as a separate
patch, this one has dragged on for a long time. I have another more
comprehensive test case for strcmp/strncmp I want to add anyway.
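
For readers following along, here is a rough sketch of one way a signed length can go wrong with a 2^64-1 argument. It is not the rs6000 code: both predicate names are hypothetical, and the limit of 64 is an arbitrary number chosen for the example. It also assumes the usual two's-complement conversion from unsigned to signed.

```cpp
#include <cstdint>

// Hypothetical "is this length small enough to expand inline?" test written
// with a signed length, as a signed HOST_WIDE_INT would be.
constexpr bool small_enough_signed(std::int64_t len, std::int64_t limit) {
  return len <= limit;
}

// The same test with the length kept unsigned.
constexpr bool small_enough_unsigned(std::uint64_t len, std::uint64_t limit) {
  return len <= limit;
}

constexpr std::uint64_t huge_len = UINT64_MAX;  // 2^64-1, as in the glibc test

// Stored in a signed 64-bit variable the length reads as -1 ...
static_assert(static_cast<std::int64_t>(huge_len) == -1, "reads as -1");
// ... so the signed test treats the largest possible length as tiny,
static_assert(small_enough_signed(static_cast<std::int64_t>(huge_len), 64),
              "signed comparison is fooled");
// while the unsigned test rejects it.
static_assert(!small_enough_unsigned(huge_len, 64), "unsigned comparison is not");
```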

Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-17 Thread Jeff Law


On 01/16/2017 05:06 PM, Martin Sebor wrote:

The test case submitted in bug 79095 - [7 regression] spurious
stringop-overflow warning shows that GCC optimizes some loops
into calls to memset with size arguments in excess of the object
size limit.  Since such calls will unavoidably lead to a buffer
overflow and memory corruption the attached patch detects them
and replaces them with a trap.  That both prevents the buffer
overflow and eliminates the warning.

But doesn't the creation of the bogus memset signal an invalid 
transformation in the loop optimizer?  ie, if we're going to convert a 
loop into a memset, then we'd damn well better be sure the loop bounds 
are reasonable.


Jeff

[PATCH] PR79114 use decayed type in std::throw_with_nested assertion

2017-01-17 Thread Jonathan Wakely


I added a static assertion to enforce the CopyConstructible
requirement that the standard imposes for std::throw_with_nested.
Unfortunately the standard is defective, so we started rejecting
perfectly good code. This alters the check to what I think the
standard should say (and what I've proposed in a new issue against the
standard): we should check decay not remove_reference, because a
throw-expression decays arrays and functions to pointers.
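
The difference between the two traits can be checked at compile time. The sketch below is only an illustration: the alias names OldUp and NewUp are made up here and mirror the old and new definitions of _Up, while the two reference types are what _Tp is deduced as for the calls in the new test.

```cpp
#include <type_traits>

// _Tp as deduced for throw_with_nested("") and throw_with_nested(test01).
using StringLiteralRef = const char (&)[1];
using FunctionRef = void (&)();

// Illustrative aliases for the old and new definitions of _Up.
template <typename T>
using OldUp = typename std::remove_reference<T>::type;
template <typename T>
using NewUp = typename std::decay<T>::type;

// remove_reference leaves an array or function type, which can never be
// copy constructible, so the static assertion rejected these calls.
static_assert(std::is_same<OldUp<StringLiteralRef>, const char[1]>::value, "");
static_assert(!std::is_copy_constructible<OldUp<StringLiteralRef>>::value, "");
static_assert(std::is_same<OldUp<FunctionRef>, void()>::value, "");
static_assert(!std::is_copy_constructible<OldUp<FunctionRef>>::value, "");

// decay yields the pointer types that a throw-expression actually throws.
static_assert(std::is_same<NewUp<StringLiteralRef>, const char*>::value, "");
static_assert(std::is_copy_constructible<NewUp<StringLiteralRef>>::value, "");
static_assert(std::is_same<NewUp<FunctionRef>, void (*)()>::value, "");
static_assert(std::is_copy_constructible<NewUp<FunctionRef>>::value, "");
```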

PR libstdc++/79114
* libsupc++/nested_exception.h (throw_with_nested): Use decay instead
of remove_reference.
* testsuite/18_support/nested_exception/79114.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit f4757dfe3379d6295a44954da8eaf2a42aae624d
Author: Jonathan Wakely 
Date:   Tue Jan 17 14:19:56 2017 +

PR79114 use decayed type in std::throw_with_nested assertion

PR libstdc++/79114
* libsupc++/nested_exception.h (throw_with_nested): Use decay instead
of remove_reference.
* testsuite/18_support/nested_exception/79114.cc: New test.

diff --git a/libstdc++-v3/libsupc++/nested_exception.h b/libstdc++-v3/libsupc++/nested_exception.h
index 35b025a..43970b4 100644
--- a/libstdc++-v3/libsupc++/nested_exception.h
+++ b/libstdc++-v3/libsupc++/nested_exception.h
@@ -111,7 +111,7 @@ namespace std
 inline void
 throw_with_nested(_Tp&& __t)
 {
-  using _Up = typename remove_reference<_Tp>::type;
+  using _Up = typename decay<_Tp>::type;
   using _CopyConstructible
= __and_, is_move_constructible<_Up>>;
   static_assert(_CopyConstructible::value,
diff --git a/libstdc++-v3/testsuite/18_support/nested_exception/79114.cc b/libstdc++-v3/testsuite/18_support/nested_exception/79114.cc
new file mode 100644
index 000..8afc72b
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/nested_exception/79114.cc
@@ -0,0 +1,27 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile { target c++11 } }
+
+#include 
+
+void
+test01()
+{
+  std::throw_with_nested("");
+  std::throw_with_nested(test01);
+}

Re: [PR C/79116] ICE with CilkPlus array notation and _Cilk_for (C front-end)

2017-01-17 Thread Aldy Hernandez


On 01/17/2017 09:41 AM, Jakub Jelinek wrote:

On Tue, Jan 17, 2017 at 09:22:52AM -0500, Aldy Hernandez wrote:

This is the same as pr70565 but it fails in an entirely different manner in
the C front-end.

The problem here is that the parser builds an ARRAY_NOTATION_REF with a type
of ptrdiff for length and stride.  Later in cilkplus_extract_an_triplets we
convert length and stride to an integer_type_node.  This causes
create_array_refs() to use a stride of integer_type, while the start is
still a ptrdiff (verify_gimple ICE, boom).

The attached patch converts `start' to an integer_type to match the length
and stride.  We could either do this, or do a fold_convert if
!useless_type_conversion_p in create_array_refs.  I didn't want to look into
cilkplus too deeply as to why we have different types, because (a) I don't
care (b) we're probably going to deprecate Cilk Plus, no?


Conceptually, using integer_type_node for these things is completely wrong,
unless the Cilk+ specification says that all the array notation expressions
are converted to int.  Because forcing the int there means that it will
misbehave on very large arrays (over 2GB elements).
So much better would be to have the expressions converted to sizetype or
something similar that the middle-end works with (yes, it is unsigned, so
if it needs to be signed somewhere, we'd need corresponding signed type for
that).

The question is where all is the integer_type_node in the Cilk+ lowering
hardcoded and how hard would it be to fix it up.

If it is too hard, I guess I can live with this patch, but there should be a
PR that it needs to be fixed not to hardcode int type which is inappropriate
for sizes/lengths.


As discussed on IRC, we will probably deprecate CilkPlus for GCC7 and 
remove it for GCC8 unless someone is interested in maintaining it. 
So...committing as is.




And the more important question is if Intel is willing to maintain Cilk+ in
GCC, or if we should deprecate it (and, if the latter, if already in GCC7
deprecate, remove in GCC8, or deprecate in GCC8, remove in GCC9).
There are various Cilk+ related PRs around on which nothing has been done
for many months.


Aldy

Re: [PATCH] Fix wording of -Wmisleading-indentation (PR c++/71497)

2017-01-17 Thread Jeff Law


On 01/17/2017 08:52 AM, David Malcolm wrote:

Someone pointed out a grammar nit in the -Wmisleading-indentation
diagnostic messages, which this patch fixes.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk and for gcc 6?

gcc/c-family/ChangeLog:
PR c++/71497
* c-indentation.c (warn_for_misleading_indentation): Use the past
subjunctive in the note.

gcc/testsuite/ChangeLog:
PR c++/71497
* c-c++-common/Wmisleading-indentation-3.c: Update wording of
expected messages.
* c-c++-common/Wmisleading-indentation.c: Likewise.

OK.
jeff

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-01-17 Thread Wilco Dijkstra

Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of
a declaration is an integer. So the question is whether we should allow
largish offsets outside of the bounds of symbols (v1), no offsets (this
version), or small offsets (small negative and positive offsets just outside a
symbol are common).
The only thing we can't allow is any offset like we currently do...

In aarch64_classify_symbol symbols are allowed full-range offsets on
relocations.  This means the offset can use all of the +/-4GB offset, leaving
no offset available for the symbol itself.  This results in relocation overflow
and link-time errors for simple expressions like _char + 0xff00.

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within
a 3GB offset from its references.  For the tiny code model use a 64KB offset,
allowing most of the 1MB range for code/data between the symbol and its
references.  For symbols with a defined size, limit the offset to be within
the size of the symbol.
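
The rule described above can be sketched as a small stand-alone predicate. This is not the GCC code: CodeModel, max_offset and offset_allowed are names invented for the example, the limits 0x10000 and 0x40000000 are simply the 64KB and 1GB figures from the description, and bounds are treated as inclusive in the manner of IN_RANGE.

```cpp
// Illustrative code models; only the two discussed above are modelled.
enum class CodeModel { Tiny, Small };

// Offset cap per code model: 64KB for tiny, 1GB for small (assumed values).
constexpr long long max_offset(CodeModel m) {
  return m == CodeModel::Tiny ? 0x10000LL : 0x40000000LL;
}

// Inclusive range test.
constexpr bool in_range(long long v, long long lo, long long hi) {
  return lo <= v && v <= hi;
}

// True if symbol + offset may be used directly; false corresponds to
// forcing the address to memory.  has_size is false when the symbol's
// size is unknown, in which case only the per-model cap applies.
constexpr bool offset_allowed(CodeModel m, long long offset, bool has_size,
                              long long size) {
  return in_range(offset, -max_offset(m), max_offset(m))
         && (!has_size || in_range(offset, 0, size));
}

// Unknown size: only the cap matters.
static_assert(offset_allowed(CodeModel::Small, 0x1000, false, 0), "");
static_assert(!offset_allowed(CodeModel::Small, 0x7f000000, false, 0), "");
static_assert(!offset_allowed(CodeModel::Tiny, 0x20000, false, 0), "");
// Known size: the offset must also stay inside the symbol.
static_assert(offset_allowed(CodeModel::Small, 8, true, 16), "");
static_assert(!offset_allowed(CodeModel::Small, 64, true, 16), "");
static_assert(!offset_allowed(CodeModel::Small, -4, true, 16), "");
```

With this version of the patch a small negative offset on a sized symbol is rejected, which is exactly the open question raised at the top of this mail.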


ChangeLog:
2017-01-17  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.

testsuite/
* gcc.target/aarch64/symbol-range.c (foo): Set new limit.
* gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
{
case AARCH64_CMODEL_TINY:
@@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
 we have no way of knowing the address of symbol at compile time
 so we can't accurately say if the distance between the PC and
 symbol + offset is outside the addressible range of +/-1M in the
-TINY code model.  So we rely on images not being greater than
-1M and cap the offset at 1M and anything beyond 1M will have to
-be loaded using an alternative mechanism.  Furthermore if the
-symbol is a weak reference to something that isn't known to
-resolve to a symbol in this module, then force to memory.  */
+TINY code model.  So we limit the maximum offset to +/-64KB and
+assume the offset to the symbol is not larger than +/-(1M - 64KB).
+Furthermore force to memory if the symbol is a weak reference to
+something that doesn't resolve to a symbol in this module.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x1, 0x1))
return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
  return SYMBOL_TINY_ABSOLUTE;
 
case AARCH64_CMODEL_SMALL:
  /* Same reasoning as the tiny code model, but the offset cap here is
-4G.  */
+1G, allowing +/-3G for the offset to the symbol.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x4000, 0x4000))
return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
  return SYMBOL_SMALL_ABSOLUTE;
 
case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */

[PATCH] Fix wording of -Wmisleading-indentation (PR c++/71497)

2017-01-17 Thread David Malcolm

Someone pointed out a grammar nit in the -Wmisleading-indentation
diagnostic messages, which this patch fixes.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk and for gcc 6?

gcc/c-family/ChangeLog:
PR c++/71497
* c-indentation.c (warn_for_misleading_indentation): Use the past
subjunctive in the note.

gcc/testsuite/ChangeLog:
PR c++/71497
* c-c++-common/Wmisleading-indentation-3.c: Update wording of
expected messages.
* c-c++-common/Wmisleading-indentation.c: Likewise.
---
 gcc/c-family/c-indentation.c   |  2 +-
 .../c-c++-common/Wmisleading-indentation-3.c   |  6 +--
 .../c-c++-common/Wmisleading-indentation.c | 54 +++---
 3 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c
index 78ef166..329f470 100644
--- a/gcc/c-family/c-indentation.c
+++ b/gcc/c-family/c-indentation.c
@@ -608,7 +608,7 @@ warn_for_misleading_indentation (const token_indent_info _tinfo,
  guard_tinfo_to_string (guard_tinfo)))
inform (next_tinfo.location,
("...this statement, but the latter is misleadingly indented"
-" as if it is guarded by the %qs"),
+" as if it were guarded by the %qs"),
guard_tinfo_to_string (guard_tinfo));
 }
 }
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c
index 277a388..6482b00 100644
--- a/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation-3.c
@@ -17,7 +17,7 @@ fn_5 (double *a, double *b, double *sum, double *prod)
   int i = 0;
   for (i = 0; i < 10; i++) /* { dg-warning "3: this 'for' clause does not guard..." } */
 sum[i] = a[i] * b[i];
-prod[i] = a[i] * b[i]; /* { dg-message "5: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'for'" } */
+prod[i] = a[i] * b[i]; /* { dg-message "5: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'for'" } */
 /* { dg-begin-multiline-output "" }
for (i = 0; i < 10; i++)
^~~
@@ -38,7 +38,7 @@ int fn_6 (int a, int b, int c)
goto fail;
if ((err = foo (b)) != 0) /* { dg-message "2: this 'if' clause does not guard..." } */
goto fail;
-   goto fail; /* { dg-message "3: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'" } */
+   goto fail; /* { dg-message "3: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'" } */
if ((err = foo (c)) != 0)
goto fail;
/* ... */
@@ -64,7 +64,7 @@ void fn_14 (void)
   int i;
   FOR_EACH (i, 0, 10) /* { dg-message "in expansion of macro .FOR_EACH." } */
 foo (i);
-bar (i, i); /* { dg-message "5: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'for'" } */
+bar (i, i); /* { dg-message "5: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'for'" } */
 
 /* { dg-begin-multiline-output "" }
for ((VAR) = (START); (VAR) < (STOP); (VAR++))
diff --git a/gcc/testsuite/c-c++-common/Wmisleading-indentation.c b/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
index dcc66e7..5cdeba1 100644
--- a/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
+++ b/gcc/testsuite/c-c++-common/Wmisleading-indentation.c
@@ -14,7 +14,7 @@ fn_1 (int flag)
   int x = 4, y = 5;
   if (flag) /* { dg-warning "3: this 'if' clause does not guard..." } */
 x = 3;
-y = 2; /* { dg-message "5: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'" } */
+y = 2; /* { dg-message "5: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'" } */
   return x * y;
 }
 
@@ -22,7 +22,7 @@ int
 fn_2 (int flag, int x, int y)
 {
   if (flag) /* { dg-warning "3: this 'if' clause does not guard..." } */
-x++; y++; /* { dg-message "10: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'" } */
+x++; y++; /* { dg-message "10: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'" } */
 
   return x * y;
 }
@@ -35,7 +35,7 @@ fn_3 (int flag)
 x = 3;
   else /* { dg-warning "3: this 'else' clause does not guard..." } */
 x = 2;
-y = 2; /* { dg-message "5: ...this statement, but the latter is 
misleadingly indented as if it is guarded by the 'else'" } */
+y = 2; /* { dg-message "5: ...this statement, but the latter is 
misleadingly indented as if it were guarded by the 'else'" } */
   return x * y;
 }
 
@@ -45,7 +45,7 @@ fn_4 (double *a, double *b, double *c)
   int i = 0;
   while (i < 10) /* { dg-warning "3: this 'while'

Re: [PATCH] Reload global options when strict aliasing is dropped (PR ipa/79043).

2017-01-17 Thread Martin Liška

On 01/13/2017 02:01 PM, Richard Biener wrote:
> On Fri, Jan 13, 2017 at 2:00 PM, Martin Liška  wrote:
>> On 01/13/2017 01:16 PM, Richard Biener wrote:
>>> On Tue, Jan 10, 2017 at 4:28 PM, Martin Liška  wrote:
>>>> As mentioned in the PR, we currently do not properly reload global
>>>> optimization options when we drop the strict-aliasing flag on a function
>>>> that equals cfun.
>>>>
>>>> The patch bootstraps on ppc64le-redhat-linux and survives regression tests.
>>>>
>>>> Ready to be installed?
>>>
>>> Ok.
>>
>> Thanks, applied to trunk. May I install the patch to active branches if it
>> survives bootstrap and regression tests?
> 
> Yes, but please wait a few days to see if there's any fallout with odd
> target stuff done by set_cfun.

OK, it looks like there's no fallout. To install the patch on the GCC-5 branch
I also need the patch from r231095, which I tested together with this patch;
the combination works fine.

Honza, is it OK to apply them together?
Thanks,
Martin


> 
> Richard.
> 
>>
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> Martin
>>

Re: [PATCH] Fix wrong assumption in contains_type_p (PR ipa/71207).

2017-01-17 Thread Martin Liška

On 01/17/2017 11:43 AM, Jan Hubicka wrote:
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-01-12  Martin Liska  
>>
>>  PR ipa/71207
>>  * g++.dg/ipa/pr71207.C: New test.
>>
>> gcc/ChangeLog:
>>
>> 2017-01-12  Martin Liska  
>>
>>  PR ipa/71207
>>  * ipa-polymorphic-call.c (contains_type_p): Fix wrong
>>  assumption, add comment and renamed otr_type to inner_type.
>> ---
>>  gcc/ipa-polymorphic-call.c | 22 ++--
>>  gcc/testsuite/g++.dg/ipa/pr71207.C | 42 
>> ++
>>  2 files changed, 53 insertions(+), 11 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr71207.C
>>
>> diff --git a/gcc/ipa-polymorphic-call.c b/gcc/ipa-polymorphic-call.c
>> index da64ce4c6e0..c13fc858c86 100644
>> --- a/gcc/ipa-polymorphic-call.c
>> +++ b/gcc/ipa-polymorphic-call.c
>> @@ -446,15 +446,15 @@ no_useful_type_info:
>>  }
>>  }
>>  
>> -/* Return true if OUTER_TYPE contains OTR_TYPE at OFFSET.
>> -   CONSIDER_PLACEMENT_NEW makes function to accept cases where OTR_TYPE can
>> +/* Return true if OUTER_TYPE contains INNER_TYPE at OFFSET.
>> +   CONSIDER_PLACEMENT_NEW makes function to accept cases where INNER_TYPE 
>> can
>> be built within OUTER_TYPE by means of placement new.  CONSIDER_BASES 
>> makes
>> -   function to accept cases where OTR_TYPE appears as base of OUTER_TYPE or 
>> as
>> +   function to accept cases where INNER_TYPE appears as base of OUTER_TYPE 
>> or as
>> base of one of fields of OUTER_TYPE.  */
>>  
>>  static bool
>>  contains_type_p (tree outer_type, HOST_WIDE_INT offset,
>> - tree otr_type,
>> + tree inner_type,
> 
> I would actually keep otr_type (or change it consistently in all cases).
> otr comes from OBJ_TYPE_REF and is used throughout the code (not my invention;
> it comes from the original binfo walking routine).
> 
> OK with that change.  I believe I added the size check only to cut the
> recursion early, which is not a big deal.

Ok, applied without the renaming as r244530. I guess you added that to cut the
recursion.

Would it be fine to install the patch to active branches after proper testing?
Thanks,
Martin

>>   bool consider_placement_new,
>>   bool consider_bases)
>>  {
>> @@ -463,18 +463,18 @@ contains_type_p (tree outer_type, HOST_WIDE_INT offset,
>>/* Check that type is within range.  */
>>if (offset < 0)
>>  return false;
>> -  if (TYPE_SIZE (outer_type) && TYPE_SIZE (otr_type)
>> -  && TREE_CODE (TYPE_SIZE (outer_type)) == INTEGER_CST
>> -  && TREE_CODE (TYPE_SIZE (otr_type)) == INTEGER_CST
>> -  && wi::ltu_p (wi::to_offset (TYPE_SIZE (outer_type)),
>> -(wi::to_offset (TYPE_SIZE (otr_type)) + offset)))
>> -return false;
>> +
>> +  /* PR ipa/71207
>> + As OUTER_TYPE can be a type which has a diamond virtual inheritance,
>> + it's not necessary that INNER_TYPE will fit within OUTER_TYPE with
>> + a given offset.  It can happen that INNER_TYPE also contains a base 
>> object,
>> + however it would point to the same instance in the OUTER_TYPE.  */
>>  
>>context.offset = offset;
>>context.outer_type = TYPE_MAIN_VARIANT (outer_type);
>>context.maybe_derived_type = false;
>>context.dynamic = false;
>> -  return context.restrict_to_inner_class (otr_type, consider_placement_new,
>> +  return context.restrict_to_inner_class (inner_type, 
>> consider_placement_new,
>>consider_bases);
>>  }
>>  
>> diff --git a/gcc/testsuite/g++.dg/ipa/pr71207.C 
>> b/gcc/testsuite/g++.dg/ipa/pr71207.C
>> new file mode 100644
>> index 000..19a03998460
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/ipa/pr71207.C
>> @@ -0,0 +1,42 @@
>> +/* PR ipa/71207 */
>> +/* { dg-do run } */
>> +
>> +class Class1
>> +{
>> +public:
>> +  Class1() {};
>> +  virtual ~Class1() {};
>> +
>> +protected:
>> +  unsigned Field1;
>> +};
>> +
>> +class Class2 : public virtual Class1
>> +{
>> +};
>> +
>> +class Class3 : public virtual Class1
>> +{
>> +public:
>> +  virtual void Method1() = 0;
>> +
>> +  void Method2()
>> +  {
>> +Method1();
>> +  }
>> +};
>> +
>> +class Class4 : public Class2, public virtual Class3
>> +{
>> +public:
>> +  Class4() {};
>> +  virtual void Method1() {};
>> +};
>> +
>> +int main()
>> +{
>> +  Class4 var1;
>> +  var1.Method2();
>> +
>> +  return 0;
>> +}
>> -- 
>> 2.11.0
>>
>

Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook

2017-01-17 Thread Jiong Wang




On 17/01/17 13:57, Richard Earnshaw (lists) wrote:

On 16/01/17 14:29, Jiong Wang wrote:



I can see the reason for doing this is if you want to separate the
interpretation of the GCC CFA reg-note from the final DWARF CFA operation.  My
understanding is that all reg notes defined in gcc/reg-notes.def should have
general meaning, even CFA_WINDOW_SAVE.  For those which are architecture
specific we might need a mechanism to define them in the backend only.
General reg-notes in gcc/reg-notes.def do not always have a corresponding
standard DWARF CFA operation, for example CFA_WINDOW_SAVE; therefore, if we
want to achieve what you described, I think we also need to define a new
target hook which maps a GCC CFA reg-note onto an architecture DWARF CFA
operation.

Regards,
Jiong



Here is the patch.


Hmm, I really wasn't expecting any more than something like the
following in dwarf2cfi.c:

@@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn)
 handled_one = true;
 break;

+  case REG_CFA_TOGGLE_RA_MANGLE:
case REG_CFA_WINDOW_SAVE:
+   /* We overload both of these operations onto the same DWARF opcode.  */
 dwarf2out_frame_debug_cfa_window_save ();
 handled_one = true;
 break;

This keeps the two reg notes separate within the compiler, but emits the
same dwarf operation during final output.  This avoids the need for new
hooks or anything more complicated.
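
The fall-through arrangement described above can be sketched in isolation. This is a minimal C illustration, not GCC code: the enum names and the `opcode_for_note` helper are hypothetical, and the value 0x2d is assumed here to be the opcode shared by the two operations.

```c
/* Hypothetical stand-ins for two distinct compiler-internal notes.  */
enum note_kind
{
  NOTE_WINDOW_SAVE,
  NOTE_TOGGLE_RA_MANGLE,
  NOTE_OTHER
};

/* Assumed opcode value shared by both notes; 0 means "no opcode".  */
enum
{
  OPCODE_NONE = 0,
  OPCODE_SHARED = 0x2d
};

/* The two notes stay separate kinds inside the "compiler", but both
   fall through to the same opcode at output time.  */
static int
opcode_for_note (enum note_kind kind)
{
  switch (kind)
    {
    case NOTE_TOGGLE_RA_MANGLE:
    case NOTE_WINDOW_SAVE:
      return OPCODE_SHARED;
    default:
      return OPCODE_NONE;
    }
}
```

Earlier passes can still tell the two notes apart, since only the final mapping merges them.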


This was my initial thought, and the patch would be very small, as you've
demonstrated.  I later moved to this more complex patch because I think it's
better to treat all notes in reg-notes.def as having generic meaning and to
map them to a standard DWARF CFA where there is one, and otherwise to a target
private DWARF CFA through this new hook.  This gives other targets a chance to
map, for example, REG_CFA_TOGGLE_RA_MANGLE to their architecture DWARF number.

The introduction of a new hook looks to be very low risk at this stage; the
only painful thing is that the header files need to be reorganized, as we need
to use some DWARF types and reg-note types in targhooks.c.

Anyway, if the new hook patch is too heavy, I have attached the simplified
version, which simply defines the new REG_CFA_TOGGLE_RA_MANGLE and maps it to
the same code as REG_CFA_WINDOW_SAVE.


gcc/

2017-01-17  Jiong Wang  

* reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note.
* combine-stack-adj.c (no_unhandled_cfa): Handle
REG_CFA_TOGGLE_RA_MANGLE.
* dwarf2cfi.c
(dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE.
* config/aarch64/aarch64.c (aarch64_expand_prologue): Generate DWARF
info for return address signing.
(aarch64_expand_epilogue): Likewise.

diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index 20cd59ad08329e9f4f834bfc01d6f9ccc4485283..9ec14a3e44363f35f6419c38233ce5eebddd3458 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -208,6 +208,7 @@ no_unhandled_cfa (rtx_insn *insn)
   case REG_CFA_SET_VDRAP:
   case REG_CFA_WINDOW_SAVE:
   case REG_CFA_FLUSH_QUEUE:
+  case REG_CFA_TOGGLE_RA_MANGLE:
 	return false;
   }
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3bcad76b68b6ea7c9d75d150d79c45fb74d6bf0d..6451b08191cf1a44aed502930da8603111f6e8ca 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3553,7 +3553,11 @@ aarch64_expand_prologue (void)
 
   /* Sign return address for functions.  */
   if (aarch64_return_address_signing_enabled ())
-emit_insn (gen_pacisp ());
+{
+  insn = emit_insn (gen_pacisp ());
+  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 
   if (flag_stack_usage_info)
 current_function_static_stack_size = frame_size;
@@ -3707,7 +3711,11 @@ aarch64_expand_epilogue (bool for_sibcall)
 */
   if (aarch64_return_address_signing_enabled ()
   && (for_sibcall || !TARGET_ARMV8_3 || crtl->calls_eh_return))
-emit_insn (gen_autisp ());
+{
+  insn = emit_insn (gen_autisp ());
+  add_reg_note (insn, REG_CFA_TOGGLE_RA_MANGLE, const0_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
 
   /* Stack adjustment for exception handler.  */
   if (crtl->calls_eh_return)
diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 2748e2fa48e4794181496b26df9b51b7e51e7b84..2a527c9fecab091dccb417492e5dbb2ade244be2 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	handled_one = true;
 	break;
 
+  case REG_CFA_TOGGLE_RA_MANGLE:
   case REG_CFA_WINDOW_SAVE:
+	/* We overload both of these operations onto the same DWARF opcode.  */
 	dwarf2out_frame_debug_cfa_window_save ();
 	handled_one = true;
 	break;
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index ead4a9f58e8621288ee765e029c673640fdf38f4..175da119b6a534b04bd154f2c69dd087afd474ea 100644
--- a/gcc/reg-notes.def
+++

[PATCH][GCC][Aarch64] Add vectorize pattern for copysign.

2017-01-17 Thread Tamar Christina

Hi All,

This patch vectorizes the copysign builtin for AArch64
similar to how it is done for Arm.

AArch64 now generates:

...
.L4:
        ldr     q1, [x6, x3]
        add     w4, w4, 1
        ldr     q0, [x5, x3]
        cmp     w4, w7
        bif     v1.16b, v2.16b, v3.16b
        fmul    v0.2d, v0.2d, v1.2d
        str     q0, [x5, x3]

for the input:

 x * copysign(1.0, y)

On 481.wrf in Spec2006 on AArch64 this gives us a speedup of 9.1%.
Regtested on aarch64-none-linux-gnu with no regressions.
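
For readers unfamiliar with the trick, the copysign3 expander in the patch builds a per-element mask with only the sign bit set and then does a bitwise select. A scalar sketch of the same idea in C follows; it assumes IEEE 754 binary64 doubles, and the helper name `copysign_bitselect` is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a sign-bit select: take the sign bit from Y and all
   remaining bits from X.  Assumes double is IEEE 754 binary64.  */
static double
copysign_bitselect (double x, double y)
{
  uint64_t xb, yb, rb;
  double r;

  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);

  /* All ones shifted left by (element bits - 1): only the sign bit.  */
  const uint64_t mask = UINT64_MAX << 63;

  /* Where the mask is set use Y's bit, elsewhere use X's bit.  */
  rb = (yb & mask) | (xb & ~mask);

  memcpy (&r, &rb, sizeof r);
  return r;
}
```

With x fixed at 1.0 the result is +1.0 or -1.0, so multiplying by it flips the sign of the other factor as in the input expression above.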

Ok for trunk?

gcc/
2017-01-17  Tamar Christina  

* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Added CASE_CFN_COPYSIGN.
* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Changed int to HOST_WIDE_INT.
* config/aarch64/aarch64-protos.h
(aarch64_simd_gen_const_vector_dup): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Added copysign BINOP.
* config/aarch64/aarch64-simd.md: Added copysign3.

gcc/testsuite/
2017-01-17  Tamar Christina  

* gcc.target/arm/vect-copysignf.c: Move to...
* gcc.dg/vect/vect-copysignf.c: ... Here.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 69fb756f0fbdc016f35ce1d08f2aaf092a034704..faba7a1a38b6e494e9589637d51c639e3126969d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1447,6 +1447,16 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
 	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2di];
   else
 	return NULL_TREE;
+CASE_CFN_COPYSIGN:
+  if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv2sf];
+  else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv4sf];
+  else if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_BINOP_copysignv2df];
+  else
+	return NULL_TREE;
+
 default:
   return NULL_TREE;
 }
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..e75ba29f93e9e749791803ca3fa8d716ca261064 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -362,7 +362,7 @@ rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
-rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
+rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index d713d5d8b88837ec6f2dc51188fb252f8d5bc8bd..a67b7589e8badfbd0f13168557ef87e052eedcb1 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -151,6 +151,9 @@
   BUILTIN_VQN (TERNOP, raddhn2, 0)
   BUILTIN_VQN (TERNOP, rsubhn2, 0)
 
+  /* Implemented by copysign3.  */
+  BUILTIN_VHSDF (BINOP, copysign, 3)
+
   BUILTIN_VSQN_HSDI (UNOP, sqmovun, 0)
   /* Implemented by aarch64_qmovn.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a12e2268ef9b023112f8d05db0a86957fee83273..627ada98b3e4d4b02685d5b5ff71ae74d8e3356a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -338,6 +338,24 @@
   }
 )
 
+(define_expand "copysign3"
+  [(match_operand:VHSDF 0 "register_operand")
+   (match_operand:VHSDF 1 "register_operand")
+   (match_operand:VHSDF 2 "register_operand")]
+  "TARGET_FLOAT && TARGET_SIMD"
+{
+  rtx v_bitmask = gen_reg_rtx (mode);
+  int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
+
+  emit_move_insn (v_bitmask,
+		  aarch64_simd_gen_const_vector_dup (mode,
+		 HOST_WIDE_INT_M1 << bits));
+  emit_insn (gen_aarch64_simd_bsl (operands[0], v_bitmask,
+	 operands[2], operands[1]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mul3_elt"
  [(set (match_operand:VMUL 0 "register_operand" "=w")
 (mult:VMUL
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0cf7d12186af3e05ba8742af5a03425f61f51754..1a69605db5d2a4a0efb8c9f97a019de9dded40eb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11244,14 +11244,16 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
 
 /* Return a const_int vector of VAL.  */
 rtx
-aarch64_simd_gen_const_vector_dup (machine_mode mode, int val)
+aarch64_simd_gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
 {
   int nunits = GET_MODE_NUNITS (mode);
   rtvec v = rtvec_alloc (nunits);
   int i;

Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

2017-01-17 Thread Jeff Law


On 01/17/2017 02:15 AM, Richard Biener wrote:

On Mon, Jan 16, 2017 at 11:36 PM, Richard Biener
 wrote:

On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law  wrote:

On 01/16/2017 01:51 AM, Richard Biener wrote:

On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law  wrote:


At one time I know I had the max_size == size test in valid_ao_ref_for_dse.
But it got lost at some point.  This is what caused the Ada failure.

Technically it'd be OK for the potentially dead store to have a variable
size as long as the later stores covered the entire range of the potentially
dead store.  I doubt this happens enough to be worth checking.

The ppc64 big endian failures were more interesting.  We had this in the IL:

memmove (dst, src, 0)

The trimming code assumes that there's at least one live byte in the store,
which obviously isn't the case here.  The net result is we compute an
incorrect trim and the copy goes wild with incorrect addresses and lengths.

This is trivial to fix by validating that the store has a nonzero length.

I was a bit curious how often this happened in practice because such a call
is trivially dead.  ~80 during a bootstrap and a few dozen in the testsuite.

Given how trivial it is to detect and optimize, this patch includes removal
of such calls.  This hunk makes the check for zero size in
valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if we add
more builtin support without filtering zero size we'd regress again.
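
To make the "at least one live byte" assumption concrete, here is a small standalone sketch of head/tail trimming with an explicit zero-length guard. It is not the DSE code: `trim_store` and its byte-array liveness representation are hypothetical and only illustrate why the empty case has to be rejected before any trim is computed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: LIVE[i] says whether byte i of a LEN-byte store is
   still needed.  On success, *START and *TRIMMED_LEN describe the smallest
   contiguous range covering all live bytes.  Returns false when there is
   nothing to keep, including the LEN == 0 case.  */
static bool
trim_store (const bool *live, size_t len, size_t *start, size_t *trimmed_len)
{
  /* Guard: a zero-length store has no live byte, so no trim is valid.  */
  if (len == 0)
    return false;

  size_t head = 0;
  while (head < len && !live[head])
    head++;

  /* Every byte is dead; the whole store can go.  */
  if (head == len)
    return false;

  size_t tail = len;
  while (tail > head && !live[tail - 1])
    tail--;

  *start = head;
  *trimmed_len = tail - head;
  return true;
}
```

Without the first check, a caller that assumed success would use a start and length that were never computed for the empty store.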


Interesting - we do fold memset (..., 0) away so this means we either
have an unfolded memset stmt in the IL before DSE.

It's actually exposed by fre3, both in the original test and in the
reduced testcase.  In the reduced testcase we have this just prior to
FRE3:

;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
  _3 = MEM[(const struct vec *)_4].m_num;
  if (_3 != 0)
goto ; [36.64%]
  else
goto ; [63.36%]
;;succ:   4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
;;5 [63.4%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
  _6 = vec_av_set.m_vec;
  _7 = _6->m_num;
  _8 = _7 - _3;
  _6->m_num = _8;
  _9 = (long unsigned int) _8;
  _10 = _9 * 4;
  slot.2_11 = slot;
  dest.3_12 = dest;
  memmove (dest.3_12, slot.2_11, _10);
;;succ:   5 [100.0%]  (FALLTHRU,EXECUTABLE)


_3 has the value _6->m_num.  Thus _8 will have the value 0, which in
turn makes _10 have the value zero as seen in the .fre3 dump:

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
  _4->m_num = 0;
  slot.2_11 = slot;
  dest.3_12 = dest;
  memmove (dest.3_12, slot.2_11, 0);

In the full test it's similar.
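
At the source level, the shape of code that leads to the dumps above is roughly the following. This is a hand-written C approximation, not the reduced testcase; the struct, the function name and the assumption that both loads go through the same pointer are mine.

```c
#include <stddef.h>
#include <string.h>

struct vec
{
  unsigned m_num;
};

/* Approximation of the pattern in the dump: the count is loaded once,
   then subtracted from a second load of the same field.  Once a pass
   proves both loads see the same value, the difference is known to be
   zero and the memmove length folds to 0.  Returns the byte count that
   was passed to memmove.  */
static size_t
shrink_and_move (struct vec *v, int *dest, const int *slot)
{
  size_t bytes = 0;
  unsigned n = v->m_num;                /* corresponds to _3 */

  if (n != 0)
    {
      unsigned remaining = v->m_num - n;        /* _8 = _7 - _3 */
      v->m_num = remaining;
      bytes = (size_t) remaining * 4;           /* _10 = _9 * 4 */
      memmove (dest, slot, bytes);              /* length is always 0 */
    }
  return bytes;
}
```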

I don't know if you want to try and catch this in FRE though.


Ah, I think I've had patches for this in my tree for a long time...  We're
folding calls in a restricted way for some historical reason.


If we detect in DCE (where it makes reasonable sense) rather than DSE,
then we detect the dead mem* about 17 passes earlier and the dead
argument setup about 20 passes earlier.  In the testcase I looked at, I
didn't see additional secondary optimizations enabled, but I could
imagine cases where it might.  Seems like a gcc-8 thing though.


I'll give it a quick look tomorrow.


The comment before fold_stmt_inplace no longer applies (but it seems I
simply forgot to push this change...).  It's better to not keep unfolded
stmts around, so I'll commit this as last bit of stage3 if testing is fine.

Bootstrap / regtest on x86_64-unknown-linux-gnu in progress.

Richard.

2017-01-17  Richard Biener  

* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
Fold calls regularly.

* gcc.dg/tree-ssa/ssa-fre-57.c: New testcase.

Note you'll need the trivial update to the new ssa-dse testcase as it 
verifies removal of the dead memmove.


jeff

Re: [PR C/79116] ICE with CilkPlus array notation and _Cilk_for (C front-end)

2017-01-17 Thread Jakub Jelinek

On Tue, Jan 17, 2017 at 09:22:52AM -0500, Aldy Hernandez wrote:
> This is the same as pr70565 but it fails in an entirely different manner in
> the C front-end.
> 
> The problem here is that the parser builds an ARRAY_NOTATION_REF with a type
> of ptrdiff for length and stride.  Later in cilkplus_extract_an_triplets we
> convert length and stride to an integer_type_node.  This causes
> create_array_refs() to use a stride of integer_type, while the start is
> still a ptrdiff (verify_gimple ICE, boom).
> 
> The attached patch converts `start' to an integer_type to match the length
> and stride.  We could either do this, or do a fold_convert if
> !useless_type_conversion_p in create_array_refs.  I didn't want to look into
> cilkplus too deeply as to why we have different types, because (a) I don't
> care (b) we're probably going to deprecate Cilk Plus, no?

Conceptually, using integer_type_node for these things is completely wrong,
unless the Cilk+ specification says that all the array notation expressions
are converted to int.  Because forcing the int there means that it will
misbehave on very large arrays (over 2GB elements).
So much better would be to have the expressions converted to sizetype or
something similar that the middle-end works with (yes, it is unsigned, so
if it needs to be signed somewhere, we'd need corresponding signed type for
that).
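
To illustrate the concern in plain C: a count that is representable in a pointer-sized type need not fit in int. The helper below is only a sketch with a made-up name, and it assumes the common case of a 32-bit int and a 64-bit long long.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true when COUNT can be converted to int without changing its
   value.  Forcing larger counts into int would silently misbehave.  */
static bool
fits_in_int (long long count)
{
  return count >= INT_MIN && count <= INT_MAX;
}
```

A length of 1024 passes, while a length of 3000000000 elements does not, which is exactly the case that hardcoding int cannot represent.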

The question is where all the integer_type_node is hardcoded in the Cilk+
lowering and how hard it would be to fix it up.

If it is too hard, I guess I can live with this patch, but there should be a
PR that it needs to be fixed not to hardcode int type which is inappropriate
for sizes/lengths.

And the more important question is if Intel is willing to maintain Cilk+ in
GCC, or if we should deprecate it (and, if the latter, if already in GCC7
deprecate, remove in GCC8, or deprecate in GCC8, remove in GCC9).
There are various Cilk+ related PRs around on which nothing has been done
for many months.

> commit 494d38235e7a250f3f3b4d4c1950be9208917cce
> Author: Aldy Hernandez 
> Date:   Tue Jan 17 08:27:57 2017 -0500
> 
> PR c/79116
> * array-notation-common.c (cilkplus_extract_an_triplets): Convert
> start type to integer_type.
> 
> diff --git a/gcc/c-family/array-notation-common.c 
> b/gcc/c-family/array-notation-common.c
> index 061c203..3b95332 100644
> --- a/gcc/c-family/array-notation-common.c
> +++ b/gcc/c-family/array-notation-common.c
> @@ -628,7 +628,9 @@ cilkplus_extract_an_triplets (vec *list, 
> size_t size, size_t rank,
> tree ii_tree = array_exprs[ii][jj];
> (*node)[ii][jj].is_vector = true;
> (*node)[ii][jj].value = ARRAY_NOTATION_ARRAY (ii_tree);
> -   (*node)[ii][jj].start = ARRAY_NOTATION_START (ii_tree);
> +   (*node)[ii][jj].start
> + = fold_build1 (CONVERT_EXPR, integer_type_node,
> +ARRAY_NOTATION_START (ii_tree));
> (*node)[ii][jj].length
>   = fold_build1 (CONVERT_EXPR, integer_type_node,
>  ARRAY_NOTATION_LENGTH (ii_tree));
> diff --git a/gcc/testsuite/gcc.dg/cilk-plus/pr79116.c 
> b/gcc/testsuite/gcc.dg/cilk-plus/pr79116.c
> new file mode 100644
> index 000..9206aaf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/cilk-plus/pr79116.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fcilkplus" } */
> +
> +int array[1024];
> +void foo()
> +{
> +  _Cilk_for (int i = 0; i < 512; ++i)
> +array[:] = __sec_implicit_index(0);
> +}

Jakub

Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-17 Thread Peter Bergner


On 1/16/17 3:09 PM, Aaron Sawdey wrote:

Here is an updated version of this patch.

Tulio noted that glibc's strncmp test was failing. This turned out to
be the use of signed HOST_WIDE_INT for handling strncmp length. The
glibc test calls strncmp with length 2^64-1, presumably to provoke
exactly this type of bug. Fixing the issue required changing
select_block_compare_mode() and expand_block_compare() as well.
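
The class of bug described here can be shown with a tiny standalone example. It is not the rs6000 expander code; the two helpers are hypothetical and only show what happens when a length of 2^64-1 is clamped against a limit using signed versus unsigned arithmetic.

```c
#include <stdint.h>

/* Clamp LEN to LIMIT using unsigned arithmetic: 2^64-1 stays huge and
   is clamped down to LIMIT as intended.  */
static uint64_t
clamp_len_unsigned (uint64_t len, uint64_t limit)
{
  return len < limit ? len : limit;
}

/* The same clamp with signed arithmetic: the bit pattern of 2^64-1 is
   -1 when viewed as a signed 64-bit value, so it compares less than
   LIMIT and the "length" comes out negative.  */
static int64_t
clamp_len_signed (int64_t len, int64_t limit)
{
  return len < limit ? len : limit;
}
```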


If glibc's testsuite exposed a bug, then we should also add a similar
test to our testsuite.  I scanned the patch and I'm not sure I see
that exact test scenario.  Is it there and I'm just not seeing it?

Peter

[PR C/79116] ICE with CilkPlus array notation and _Cilk_for (C front-end)

2017-01-17 Thread Aldy Hernandez

This is the same as pr70565 but it fails in an entirely different manner 
in the C front-end.


The problem here is that the parser builds an ARRAY_NOTATION_REF with a 
type of ptrdiff for length and stride.  Later in 
cilkplus_extract_an_triplets we convert length and stride to an 
integer_type_node.  This causes create_array_refs() to use a stride of 
integer_type, while the start is still a ptrdiff (verify_gimple ICE, boom).


The attached patch converts `start' to an integer_type to match the 
length and stride.  We could either do this, or do a fold_convert if 
!useless_type_conversion_p in create_array_refs.  I didn't want to look 
into cilkplus too deeply as to why we have different types, because (a) 
I don't care (b) we're probably going to deprecate Cilk Plus, no?


OK?

commit 494d38235e7a250f3f3b4d4c1950be9208917cce
Author: Aldy Hernandez 
Date:   Tue Jan 17 08:27:57 2017 -0500

PR c/79116
* array-notation-common.c (cilkplus_extract_an_triplets): Convert
start type to integer_type.

diff --git a/gcc/c-family/array-notation-common.c 
b/gcc/c-family/array-notation-common.c
index 061c203..3b95332 100644
--- a/gcc/c-family/array-notation-common.c
+++ b/gcc/c-family/array-notation-common.c
@@ -628,7 +628,9 @@ cilkplus_extract_an_triplets (vec *list, 
size_t size, size_t rank,
  tree ii_tree = array_exprs[ii][jj];
  (*node)[ii][jj].is_vector = true;
  (*node)[ii][jj].value = ARRAY_NOTATION_ARRAY (ii_tree);
- (*node)[ii][jj].start = ARRAY_NOTATION_START (ii_tree);
+ (*node)[ii][jj].start
+   = fold_build1 (CONVERT_EXPR, integer_type_node,
+  ARRAY_NOTATION_START (ii_tree));
  (*node)[ii][jj].length
= fold_build1 (CONVERT_EXPR, integer_type_node,
   ARRAY_NOTATION_LENGTH (ii_tree));
diff --git a/gcc/testsuite/gcc.dg/cilk-plus/pr79116.c 
b/gcc/testsuite/gcc.dg/cilk-plus/pr79116.c
new file mode 100644
index 000..9206aaf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cilk-plus/pr79116.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fcilkplus" } */
+
+int array[1024];
+void foo()
+{
+  _Cilk_for (int i = 0; i < 512; ++i)
+array[:] = __sec_implicit_index(0);
+}

Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook

2017-01-17 Thread Richard Earnshaw (lists)

On 16/01/17 14:29, Jiong Wang wrote:
> On 13/01/17 18:02, Jiong Wang wrote:
>> On 13/01/17 16:09, Richard Earnshaw (lists) wrote:
>>> On 06/01/17 11:47, Jiong Wang wrote:

>>>> This patch is an update on DWARF generation for return address signing.
>>>>
>>>> According to the new proposal, we simply need to generate a
>>>> REG_CFA_WINDOW_SAVE annotation.
>>>>
>>>> gcc/
>>>>
>>>> 2017-01-06  Jiong Wang  
>>>>
>>>>  * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate dwarf
>>>>  annotation (REG_CFA_WINDOW_SAVE) for return address signing.
>>>>  (aarch64_expand_epilogue): Likewise.


>>> I don't think we should be overloading REG_CFA_WINDOW_SAVE internally in
>>> the compiler -- it's one thing to do it in the dwarf output tables, but
>>> quite another to be doing it elsewhere in the compiler.
>>>
>>> Instead we should create a new reg note kind and use that, but in the
>>> final dwarf output then emit the overloaded opcode.
>>
>> I can see the reason for doing this is if you want to separate the
>> interpretation of the GCC CFA reg-note from the final DWARF CFA operation.  My
>> understanding is that all reg notes defined in gcc/reg-notes.def should have
>> general meaning, even CFA_WINDOW_SAVE.  For those which are architecture
>> specific we might need a mechanism to define them in the backend only.
>> General reg-notes in gcc/reg-notes.def do not always have a corresponding
>> standard DWARF CFA operation, for example CFA_WINDOW_SAVE; therefore, if we
>> want to achieve what you described, I think we also need to define a new
>> target hook which maps a GCC CFA reg-note onto an architecture DWARF CFA
>> operation.
>>
>> Regards,
>> Jiong
>>
>>
> Here is the patch.
> 

Hmm, I really wasn't expecting any more than something like the
following in dwarf2cfi.c:

@@ -2098,7 +2098,9 @@ dwarf2out_frame_debug (rtx_insn *insn)
handled_one = true;
break;

+  case REG_CFA_TOGGLE_RA_MANGLE:
   case REG_CFA_WINDOW_SAVE:
+   /* We overload both of these operations onto the same DWARF opcode.  */
dwarf2out_frame_debug_cfa_window_save ();
handled_one = true;
break;

This keeps the two reg notes separate within the compiler, but emits the
same dwarf operation during final output.  This avoids the need for new
hooks or anything more complicated.

R.

> Introduced one new target hook, TARGET_DWARF_MAP_REGNOTE_TO_CFA.  The
> purpose is to allow GCC to map DWARF CFA reg notes in reg-notes.def, which
> look to me to have generic meaning, onto target private DWARF CFI if there
> is no standard DWARF CFI mapping.
> 
> One new GCC reg-note, REG_TOGGLE_RA_MANGLE, is introduced as well; currently
> it's only used by AArch64 to implement return address signing and is mapped
> to AArch64's target private DWARF CFI.
> 
> Do this approach and implementation look OK?
> 
> I can come up with separate patches to define this hook on Sparc for
> CFA_WINDOW_SAVE, and to remove the redundant inclusion of dwarf2.h, although
> there is an "ifdef" protector in the header file.
> 
> The default hook implementation "default_dwarf_map_regnote_to_cfa" in
> targhooks.c uses the types "enum reg_note" and "enum dwarf_call_frame_info",
> which are not included in coretypes.h, thus this patch has several changes in
> header files.  I have done an X86 bootstrap to make sure there is no build
> breakage.  I'd appreciate any better ideas for handling these type definitions.
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
> 2017-01-16  Jiong Wang  
> 
> * target.def (dwarf_map_regnote_to_cfa): New hook.
> * targhooks.c (default_dwarf_map_regnote_to_cfa): Default
> implementation
> for TARGET_DWARF_MAP_REGNOTE_TO_CFA.
> * targhooks.h (default_dwarf_map_regnote_to_cfa): New declaration.
> * rtl.h (enum reg_note): Move enum reg_note to...
> * coretypes.h: ... here.
> (dwarf2.h): New include file.
> * reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note.
> * combine-stack-adj.c (no_unhandled_cfa): Handle
> REG_CFA_TOGGLE_RA_MANGLE.
> * dwarf2cfi.c (dwarf2out_frame_debug_cfa_toggle_ra_mangle): New
> function.
> (dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE.
> * doc/tm.texi: Regenerate.
> * doc/tm.texi.in: Documents TARGET_DWARF_MAP_REGNOTE_TO_CFA.
> * config/aarch64/aarch64.c (aarch64_map_regnote_to_cfa): Implements
> TARGET_DWARF_MAP_REGNOTE_TO_CFA.
> (aarch64_expand_prologue): Generate DWARF info for return address
> signing.
> (aarch64_expand_epilogue): Likewise.
> (TARGET_DWARF_MAP_REGNOTE_TO_CFA): Define.
> 
> 
> 1.patch
> 
> 
> diff --git a/gcc/target.def b/gcc/target.def
> index 0443390..6aaa9e6 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -3995,6 +3995,14 @@ the CFI label attached to the insn, @var{pattern} is 
> the pattern of\n\
>  the insn and

Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-17 Thread Jakub Jelinek

On Tue, Jan 17, 2017 at 04:03:08PM +0300, Andrew Senkevich wrote:
> > I've played a bit w/ SDE. And looks like operands are not early clobber:
> > TID0: INS 0x004003ee AVX512VEX kmovd k0, eax
> > TID0:   k0 := _
> > ...
> > TID0: INS 0x004003f4 AVX512VEX kshiftlw k0, k0, 0x3
> > TID0:   k0 := _fff8
> >
> > You can see that same dest and source works just fine.
> 
> Hmm, I looked only at what ICC generates, and that was not the correct way.

I've just tried
int
main ()
{
  unsigned int a = 0x;
  asm volatile ("kmovw %1, %%k6; kshiftlw $1, %%k6, %%k6; kmovw %%k6, %0" : 
"=r" (a) : "r" (a) : "k6");
  __builtin_printf ("%x\n", a);
  return 0;
}
on KNL and got 0x.
Are you going to report to the SDM authors so that they fix it up?
E.g. using TEMP <- SRC1[0:...] before DEST[...] <- 0 and using TEMP
instead of SRC1[0:...] would fix it, or filling up TEMP first and only
at the end assigning DEST <- TEMP etc. would do.
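
The TEMP-first ordering suggested above can be modelled in a few lines of C. This is a software sketch of a 16-bit mask left shift, not the SDM pseudo-code; the function name is made up, and the rule that counts above 15 yield zero is my reading of the instruction description.

```c
#include <stdint.h>

/* Model of a 16-bit mask shift where the source is copied into a
   temporary before the destination is cleared, so using the same
   register as source and destination gives the expected result.  */
static uint16_t
kshiftlw_model (uint16_t src, unsigned count)
{
  uint16_t temp = src;  /* TEMP <- SRC1[15:0], read before DEST is touched */
  uint16_t dest = 0;    /* DEST <- 0 */

  if (count <= 15)
    dest = (uint16_t) ((unsigned) temp << count);

  return dest;
}
```

Shifting an all-ones 16-bit mask left by 3 in place gives 0xfff8, which matches the SDE trace quoted earlier in this thread.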

Jakub

Re: [IPA PATCH] Refactor decl localizing

2017-01-17 Thread Jan Hubicka

> This patch refactors the decl localizing that happens in
> function_and_variable_visibility.  It doesn't fix the bug I'm working on
> (that's next).
> 
> Both the FOR_EACH_FUNCTION and FOR_EACH_VARIABLE loops contain very similar,
> but not quite the same code for localizing a definition that it's determined
> need not be externally visible.  It looks to me that the
> not-quite-the-sameness is erroneous, and this patch refactors that code into
> a common subroutine. If the differences need to be maintained (slight
> differences in when unique_name is updated and whether resolution is set to
> LDPR_PREVAILING_DEF_IRONLY), I think a flag to the new function would be
> best, rather than keep the duplicated code.
> 
> booted & tested on x86_64-linux, ok?

OK,
the code has indeed grown into quite a mess over the years ;)

Thanks,
Honza
> 
> nathan
> -- 
> Nathan Sidwell

> 2017-01-06  Nathan Sidwell  
> 
>   * ipa-visibility.c (localize_node): New function, broken out of ...
>   (function_and_variable_visibility): Call it.
> 
> Index: ipa-visibility.c
> ===
> --- ipa-visibility.c  (revision 244159)
> +++ ipa-visibility.c  (working copy)
> @@ -529,6 +529,53 @@ optimize_weakref (symtab_node *node)
>gcc_assert (node->alias);
>  }
>  
> +/* NODE is an externally visible definition, which we've discovered is
> +   not needed externally.  Make it local to this compilation.  */
> +
> +static void
> +localize_node (bool whole_program, symtab_node *node)
> +{
> +  gcc_assert (whole_program || in_lto_p || !TREE_PUBLIC (node->decl));
> +
> +  if (node->same_comdat_group && TREE_PUBLIC (node->decl))
> +{
> +  for (symtab_node *next = node->same_comdat_group;
> +next != node; next = next->same_comdat_group)
> + {
> +   next->set_comdat_group (NULL);
> +   if (!next->alias)
> + next->set_section (NULL);
> +   if (!next->transparent_alias)
> + next->make_decl_local ();
> +   next->unique_name
> + |= ((next->resolution == LDPR_PREVAILING_DEF_IRONLY
> +  || next->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
> + && TREE_PUBLIC (next->decl)
> + && !flag_incremental_link);
> + }
> +
> +  /* Now everything's localized, the grouping has no meaning, and
> +  will cause crashes if we keep it around.  */
> +  node->dissolve_same_comdat_group_list ();
> +}
> +
> +  node->unique_name
> +|= ((node->resolution == LDPR_PREVAILING_DEF_IRONLY
> +  || node->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
> + && TREE_PUBLIC (node->decl)
> + && !flag_incremental_link);
> +
> +  if (TREE_PUBLIC (node->decl))
> +node->set_comdat_group (NULL);
> +  if (DECL_COMDAT (node->decl) && !node->alias)
> +node->set_section (NULL);
> +  if (!node->transparent_alias)
> +{
> +  node->resolution = LDPR_PREVAILING_DEF_IRONLY;
> +  node->make_decl_local ();
> +}
> +}
> +
>  /* Decide on visibility of all symbols.  */
>  
>  static unsigned int
> @@ -606,48 +653,7 @@ function_and_variable_visibility (bool w
>if (!node->externally_visible
> && node->definition && !node->weakref
> && !DECL_EXTERNAL (node->decl))
> - {
> -   gcc_assert (whole_program || in_lto_p
> -   || !TREE_PUBLIC (node->decl));
> -   node->unique_name
> - |= ((node->resolution == LDPR_PREVAILING_DEF_IRONLY
> -  || node->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
> - && TREE_PUBLIC (node->decl)
> - && !flag_incremental_link);
> -   node->resolution = LDPR_PREVAILING_DEF_IRONLY;
> -   if (node->same_comdat_group && TREE_PUBLIC (node->decl))
> - {
> -   symtab_node *next = node;
> -
> -   /* Set all members of comdat group local.  */
> -   for (next = node->same_comdat_group;
> -next != node;
> -next = next->same_comdat_group)
> - {
> -   next->set_comdat_group (NULL);
> -   if (!next->alias)
> - next->set_section (NULL);
> -   if (!next->transparent_alias)
> - next->make_decl_local ();
> -   next->unique_name
> - |= ((next->resolution == LDPR_PREVAILING_DEF_IRONLY
> -  || next->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP)
> - && TREE_PUBLIC (next->decl)
> - && !flag_incremental_link);
> - }
> -   /* cgraph_externally_visible_p has already checked all
> -  other nodes in the group and they will all be made
> -  local.  We need to dissolve the group at once so that
> -  the predicate does not segfault though. */
> -   node->dissolve_same_comdat_group_list ();
> - }
> -   if (TREE_PUBLIC (node->decl))
> - node->set_comdat_group (NULL);
> -

RE: [PATCH, MIPS] Target flag and build option to disable indexed memory OPs.

2017-01-17 Thread Moore, Catherine



> -Original Message-
> From: Matthew Fortune [mailto:matthew.fort...@imgtec.com]
> Sent: Tuesday, January 17, 2017 4:35 AM
> To: Moore, Catherine ; Doug
> Gilmore ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, MIPS] Target flag and build option to disable
> indexed memory OPs.
> 
> Moore, Catherine  writes:
> > > -Original Message-
> > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > > ow...@gcc.gnu.org] On Behalf Of Matthew Fortune
> > > Sent: Monday, January 16, 2017 11:25 AM
> > > To: Doug Gilmore ; gcc-
> > > patc...@gcc.gnu.org
> > > Cc: Moore, Catherine 
> > > Subject: RE: [PATCH, MIPS] Target flag and build option to disable
> > > indexed memory OPs.
> > >
> > > Doug Gilmore 
> > > > I recently bisected PR78176 to problems introduced with r21650.
> > > >
> > > > Given the short time until the release, we would like to provide a
> > > > target flag and build option to avoid the bug until we are able to
> > > > resolve the problem with the commit.  Note that as Matthew Fortune
> > > > has mentioned in the PR:
> > > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176#c5
> > > >
> > > > the problem could also be addressed by updates to the Linux kernel
> > > > since the problem is only exposed by running MIPS 32-bit binaries on
> > > > 64-bit kernels.
> > > >
> > > > Bootstrapped on X86_64, regression tested on X86_64 and MIPS.
> > > >
> > > > OK to commit?
> > >
> > > Given this is a generic reference to indexed load/store and the issue
> > > could affect any indexed operation then I think it needs to include
> > > all of the following as well:
> > >
> > > /* ISA has lwxs instruction (load w/scaled index address.  */
> > > #define ISA_HAS_LWXS    ((TARGET_SMARTMIPS || TARGET_MICROMIPS) \
> > >                          && !TARGET_MIPS16)
> > >
> > > /* ISA has lbx, lbux, lhx, lhx, lhux, lwx, lwux, or ldx instruction. */
> > > #define ISA_HAS_LBX     (TARGET_OCTEON2)
> > > #define ISA_HAS_LBUX    (ISA_HAS_DSP || TARGET_OCTEON2)
> > > #define ISA_HAS_LHX     (ISA_HAS_DSP || TARGET_OCTEON2)
> > > #define ISA_HAS_LHUX    (TARGET_OCTEON2)
> > > #define ISA_HAS_LWX     (ISA_HAS_DSP || TARGET_OCTEON2)
> > > #define ISA_HAS_LWUX    (TARGET_OCTEON2 && TARGET_64BIT)
> > > #define ISA_HAS_LDX     ((ISA_HAS_DSP || TARGET_OCTEON2) \
> > >                          && TARGET_64BIT)
> > >
> > > The DSP LBUX/LHX/LWX/LDX intrinsics will also need a new AVAIL
> > > predicate to disable them. The snag is that some DSP code will fail
> > > to compile if it uses the DSP load intrinsics directly.
> > >
> > > I see no way of avoiding that. Therefore, distributions that use
> > > --without-indexed-load-store will have to cope with some potential
> > > DSP fallout if they enable DSP at all.
> > >
> > > @Catherine: I'd like your input here if possible as I advocated this
> > > approach, comments on option names welcome too.  I quite like the
> > > verbose name.
> >
> > Okay, based on my reading of the comments in the bug report, you are
> > proposing this option as a workaround to a kernel deficiency.  I don't
> > see any agreement that this is actually a compiler bug.
> > Do we really need to include the DSP intrinsics as well?   Do you think
> > that many distributions actually enable DSP?
> >
> > The option name itself is acceptable to me.  I'd like to see
> > documentation that explains when this problem is exposed.  I'd like to
> > limit the fix to LWXS and I'd like to see the testcase from the bug
> > report added to the testsuite.
> > I also agree that the preprocessor macro is a good idea (even if we
> > decide to forgo the DSP portion of the patch).
> 
> Thanks for the comments.
> 
> Having thought further I agree we can safely ignore DSP indexed load
> and micromips LWXS on the basis that DSP code will not run on a MIPS64
> processor anyway (at least none that I know of) so the issue cannot
> occur and similarly for microMIPS, there are no 64-bit cores.
> 
> Restricting to just LWXC1/SWXC1/LDXC1/SDXC1 is therefore fine but we
> should reflect that in option names then.
> 
> --with-lxc1-sxc1 --without-lxc1-sxc1
> -mlxc1-sxc1
> 
> These names reflect the internal macro that controls availability of
> these instructions.
> 
> Macro name: __mips_no_lxc1_sxc1
> Defined when !ISA_HAS_LXC1_SXC1 so would be present even when targeting
> a core that doesn't have the instructions anyway.
> 
> Any refinements to this Catherine?
> 
No.  This plan looks good.
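
As a rough illustration of how user code might consume the proposed macro, here is a minimal sketch. It assumes the macro ends up with exactly the name discussed above (__mips_no_lxc1_sxc1) and relies on __mips__ being predefined for MIPS targets; USE_LXC1_SXC1 and lxc1_sxc1_usable are names invented for this example only.

```c
#include <assert.h>

/* __mips_no_lxc1_sxc1 is the macro name proposed in this thread; it is
   expected to be defined whenever the indexed FP loads/stores
   (LWXC1/SWXC1/LDXC1/SDXC1) are not available.  On non-MIPS targets
   neither macro is defined, so the instructions are treated as absent.  */
#if defined (__mips__) && !defined (__mips_no_lxc1_sxc1)
# define USE_LXC1_SXC1 1   /* hypothetical helper macro */
#else
# define USE_LXC1_SXC1 0
#endif

/* Hypothetical helper: lets code pick a hand-written path at run time
   (or, via USE_LXC1_SXC1, at preprocessing time).  */
int
lxc1_sxc1_usable (void)
{
  return USE_LXC1_SXC1;
}
```

Hand-written assembly that uses the indexed forms could then be guarded with #if USE_LXC1_SXC1 and fall back to base-plus-offset accesses otherwise.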

Re: [PATCH][PR76731] Fix intrinsics according to icc and docs

2017-01-17 Thread Uros Bizjak

On Tue, Jan 17, 2017 at 2:27 PM, Koval, Julia  wrote:
> I fixed the Changelog. Can you commit it for me if it is ok?

This is a fairly unreviewable patch, so let's trust the testsuite that
everything is OK.

I'll commit the patch later today.

Thanks,
Uros.

> Thanks,
> Julia
>
> gcc/
>   * config/i386/avx512fintrin.h
> (_mm512_i32gather_ps): Fixed arg to void const*.
> (_mm512_mask_i32gather_ps): Ditto.
> (_mm512_i32gather_pd): Ditto.
> (_mm512_mask_i32gather_pd): Ditto.
> (_mm512_i64gather_ps): Ditto.
> (_mm512_mask_i64gather_ps): Ditto.
> (_mm512_i64gather_pd): Ditto.
> (_mm512_mask_i64gather_pd): Ditto.
> (_mm512_i32gather_epi32): Ditto.
> (_mm512_mask_i32gather_epi32): Ditto.
> (_mm512_i32gather_epi64): Ditto.
> (_mm512_mask_i32gather_epi64): Ditto.
> (_mm512_i64gather_epi32): Ditto.
> (_mm512_mask_i64gather_epi32): Ditto.
> (_mm512_i64gather_epi64): Ditto.
> (_mm512_mask_i64gather_epi64): Ditto.
> (_mm512_i32scatter_ps): Fixed arg to void*.
> (_mm512_mask_i32scatter_ps): Ditto.
> (_mm512_i32scatter_pd): Ditto.
> (_mm512_mask_i32scatter_pd): Ditto.
> (_mm512_i64scatter_ps): Ditto.
> (_mm512_mask_i64scatter_ps): Ditto.
> (_mm512_i64scatter_pd): Ditto.
> (_mm512_mask_i64scatter_pd): Ditto.
> (_mm512_i32scatter_epi32): Ditto.
> (_mm512_mask_i32scatter_epi32): Ditto.
> (_mm512_i32scatter_epi64): Ditto.
> (_mm512_mask_i32scatter_epi64): Ditto.
> (_mm512_i64scatter_epi32): Ditto.
> (_mm512_mask_i64scatter_epi32): Ditto.
> (_mm512_i64scatter_epi64): Ditto.
> (_mm512_mask_i64scatter_epi64): Ditto.
>   * config/i386/avx512pfintrin.h
> (_mm512_mask_prefetch_i32gather_pd): Fixed arg to void const*.
> (_mm512_mask_prefetch_i32gather_ps): Ditto.
> (_mm512_mask_prefetch_i64gather_pd): Ditto.
> (_mm512_mask_prefetch_i64gather_ps): Ditto.
> (_mm512_prefetch_i32scatter_pd): Fixed arg to void*.
> (_mm512_prefetch_i32scatter_ps): Ditto.
> (_mm512_mask_prefetch_i32scatter_pd): Ditto.
> (_mm512_mask_prefetch_i32scatter_ps): Ditto.
> (_mm512_prefetch_i64scatter_pd): Ditto.
> (_mm512_prefetch_i64scatter_ps): Ditto.
> (_mm512_mask_prefetch_i64scatter_pd): Ditto.
> (_mm512_mask_prefetch_i64scatter_ps): Ditto.
>   * config/i386/avx512vlintrin.h
> (_mm256_mmask_i32gather_ps): Fixed arg to void const*.
> (_mm_mmask_i32gather_ps): Ditto.
> (_mm256_mmask_i32gather_pd): Ditto.
> (_mm_mmask_i32gather_pd): Ditto.
> (_mm256_mmask_i64gather_ps): Ditto.
> (_mm_mmask_i64gather_ps): Ditto.
> (_mm256_mmask_i64gather_pd): Ditto.
> (_mm_mmask_i64gather_pd): Ditto.
> (_mm256_mmask_i32gather_epi32): Ditto.
> (_mm_mmask_i32gather_epi32): Ditto.
> (_mm256_mmask_i32gather_epi64): Ditto.
> (_mm_mmask_i32gather_epi64): Ditto.
> (_mm256_mmask_i64gather_epi32): Ditto.
> (_mm_mmask_i64gather_epi32): Ditto.
> (_mm256_mmask_i64gather_epi64): Ditto.
> (_mm_mmask_i64gather_epi64): Ditto.
> (_mm256_i32scatter_ps): Fixed arg to void*.
> (_mm256_mask_i32scatter_ps): Ditto.
> (_mm_i32scatter_ps): Ditto.
> (_mm_mask_i32scatter_ps): Ditto.
> (_mm256_i32scatter_pd): Ditto.
> (_mm256_mask_i32scatter_pd): Ditto.
> (_mm_i32scatter_pd): Ditto.
> (_mm_mask_i32scatter_pd): Ditto.
> (_mm256_i64scatter_ps): Ditto.
> (_mm256_mask_i64scatter_ps): Ditto.
> (_mm_i64scatter_ps): Ditto.
> (_mm_mask_i64scatter_ps): Ditto.
> (_mm256_i64scatter_pd): Ditto.
> (_mm256_mask_i64scatter_pd): Ditto.
> (_mm_i64scatter_pd): Ditto.
> (_mm_mask_i64scatter_pd): Ditto.
> (_mm256_i32scatter_epi32): Ditto.
> (_mm256_mask_i32scatter_epi32): Ditto.
> (_mm_i32scatter_epi32): Ditto.
> (_mm_mask_i32scatter_epi32): Ditto.
> (_mm256_i32scatter_epi64): Ditto.
> (_mm256_mask_i32scatter_epi64): Ditto.
> (_mm_i32scatter_epi64): Ditto.
> (_mm_mask_i32scatter_epi64): Ditto.
> (_mm256_i64scatter_epi32): Ditto.
> (_mm256_mask_i64scatter_epi32): Ditto.
> (_mm_i64scatter_epi32): Ditto.
> (_mm_mask_i64scatter_epi32): Ditto.
> (_mm256_i64scatter_epi64): Ditto.
> (_mm256_mask_i64scatter_epi64): Ditto.
> (_mm_i64scatter_epi64): Ditto.
> (_mm_mask_i64scatter_epi64): Ditto.
>   * config/i386/i386-builtin-types.def: (V16SF_V16SF_PCFLOAT_V16SI_HI_INT,
> V8DF_V8DF_PCDOUBLE_V8SI_QI_INT, V8SF_V8SF_PCFLOAT_V8DI_QI_INT,
> V8DF_V8DF_PCDOUBLE_V8DI_QI_INT, V16SI_V16SI_PCINT_V16SI_HI_INT,
> V8DI_V8DI_PCINT64_V8SI_QI_INT, V8SI_V8SI_PCINT_V8DI_QI_INT,
> V8DI_V8DI_PCINT64_V8DI_QI_INT, V2DF_V2DF_PCDOUBLE_V4SI_QI_INT,
> V4DF_V4DF_PCDOUBLE_V4SI_QI_INT, V2DF_V2DF_PCDOUBLE_V2DI_QI_INT,
> V4DF_V4DF_PCDOUBLE_V4DI_QI_INT, V4SF_V4SF_PCFLOAT_V4SI_QI_INT,
> V8SF_V8SF_PCFLOAT_V8SI_QI_INT, V4SF_V4SF_PCFLOAT_V2DI_QI_INT,
> V4SF_V4SF_PCFLOAT_V4DI_QI_INT, V2DI_V2DI_PCINT64_V4SI_QI_INT,
> V4DI_V4DI_PCINT64_V4SI_QI_INT,

RE: [PATCH][PR76731] Fix intrinsics according to icc and docs

2017-01-17 Thread Koval, Julia

I fixed the Changelog. Can you commit it for me if it is ok?

Thanks,
Julia

gcc/
  * config/i386/avx512fintrin.h
(_mm512_i32gather_ps): Fixed arg to void const*.
(_mm512_mask_i32gather_ps): Ditto.
(_mm512_i32gather_pd): Ditto.
(_mm512_mask_i32gather_pd): Ditto.
(_mm512_i64gather_ps): Ditto.
(_mm512_mask_i64gather_ps): Ditto.
(_mm512_i64gather_pd): Ditto.
(_mm512_mask_i64gather_pd): Ditto.
(_mm512_i32gather_epi32): Ditto.
(_mm512_mask_i32gather_epi32): Ditto.
(_mm512_i32gather_epi64): Ditto.
(_mm512_mask_i32gather_epi64): Ditto.
(_mm512_i64gather_epi32): Ditto.
(_mm512_mask_i64gather_epi32): Ditto.
(_mm512_i64gather_epi64): Ditto.
(_mm512_mask_i64gather_epi64): Ditto.
(_mm512_i32scatter_ps): Fixed arg to void*.
(_mm512_mask_i32scatter_ps): Ditto.
(_mm512_i32scatter_pd): Ditto.
(_mm512_mask_i32scatter_pd): Ditto.
(_mm512_i64scatter_ps): Ditto.
(_mm512_mask_i64scatter_ps): Ditto.
(_mm512_i64scatter_pd): Ditto.
(_mm512_mask_i64scatter_pd): Ditto.
(_mm512_i32scatter_epi32): Ditto.
(_mm512_mask_i32scatter_epi32): Ditto.
(_mm512_i32scatter_epi64): Ditto.
(_mm512_mask_i32scatter_epi64): Ditto.
(_mm512_i64scatter_epi32): Ditto.
(_mm512_mask_i64scatter_epi32): Ditto.
(_mm512_i64scatter_epi64): Ditto.
(_mm512_mask_i64scatter_epi64): Ditto.
  * config/i386/avx512pfintrin.h
(_mm512_mask_prefetch_i32gather_pd): Fixed arg to void const*.
(_mm512_mask_prefetch_i32gather_ps): Ditto.
(_mm512_mask_prefetch_i64gather_pd): Ditto.
(_mm512_mask_prefetch_i64gather_ps): Ditto.
(_mm512_prefetch_i32scatter_pd): Fixed arg to void*.
(_mm512_prefetch_i32scatter_ps): Ditto.
(_mm512_mask_prefetch_i32scatter_pd): Ditto.
(_mm512_mask_prefetch_i32scatter_ps): Ditto.
(_mm512_prefetch_i64scatter_pd): Ditto.
(_mm512_prefetch_i64scatter_ps): Ditto.
(_mm512_mask_prefetch_i64scatter_pd): Ditto.
(_mm512_mask_prefetch_i64scatter_ps): Ditto.
  * config/i386/avx512vlintrin.h
(_mm256_mmask_i32gather_ps): Fixed arg to void const*.
(_mm_mmask_i32gather_ps): Ditto.
(_mm256_mmask_i32gather_pd): Ditto.
(_mm_mmask_i32gather_pd): Ditto.
(_mm256_mmask_i64gather_ps): Ditto.
(_mm_mmask_i64gather_ps): Ditto.
(_mm256_mmask_i64gather_pd): Ditto.
(_mm_mmask_i64gather_pd): Ditto.
(_mm256_mmask_i32gather_epi32): Ditto.
(_mm_mmask_i32gather_epi32): Ditto.
(_mm256_mmask_i32gather_epi64): Ditto.
(_mm_mmask_i32gather_epi64): Ditto.
(_mm256_mmask_i64gather_epi32): Ditto.
(_mm_mmask_i64gather_epi32): Ditto.
(_mm256_mmask_i64gather_epi64): Ditto.
(_mm_mmask_i64gather_epi64): Ditto.
(_mm256_i32scatter_ps): Fixed arg to void*.
(_mm256_mask_i32scatter_ps): Ditto.
(_mm_i32scatter_ps): Ditto.
(_mm_mask_i32scatter_ps): Ditto.
(_mm256_i32scatter_pd): Ditto.
(_mm256_mask_i32scatter_pd): Ditto.
(_mm_i32scatter_pd): Ditto.
(_mm_mask_i32scatter_pd): Ditto.
(_mm256_i64scatter_ps): Ditto.
(_mm256_mask_i64scatter_ps): Ditto.
(_mm_i64scatter_ps): Ditto.
(_mm_mask_i64scatter_ps): Ditto.
(_mm256_i64scatter_pd): Ditto.
(_mm256_mask_i64scatter_pd): Ditto.
(_mm_i64scatter_pd): Ditto.
(_mm_mask_i64scatter_pd): Ditto.
(_mm256_i32scatter_epi32): Ditto.
(_mm256_mask_i32scatter_epi32): Ditto.
(_mm_i32scatter_epi32): Ditto.
(_mm_mask_i32scatter_epi32): Ditto.
(_mm256_i32scatter_epi64): Ditto.
(_mm256_mask_i32scatter_epi64): Ditto.
(_mm_i32scatter_epi64): Ditto.
(_mm_mask_i32scatter_epi64): Ditto.
(_mm256_i64scatter_epi32): Ditto.
(_mm256_mask_i64scatter_epi32): Ditto.
(_mm_i64scatter_epi32): Ditto.
(_mm_mask_i64scatter_epi32): Ditto.
(_mm256_i64scatter_epi64): Ditto.
(_mm256_mask_i64scatter_epi64): Ditto.
(_mm_i64scatter_epi64): Ditto.
(_mm_mask_i64scatter_epi64): Ditto.
  * config/i386/i386-builtin-types.def: (V16SF_V16SF_PCFLOAT_V16SI_HI_INT,
V8DF_V8DF_PCDOUBLE_V8SI_QI_INT, V8SF_V8SF_PCFLOAT_V8DI_QI_INT,
V8DF_V8DF_PCDOUBLE_V8DI_QI_INT, V16SI_V16SI_PCINT_V16SI_HI_INT,
V8DI_V8DI_PCINT64_V8SI_QI_INT, V8SI_V8SI_PCINT_V8DI_QI_INT,
V8DI_V8DI_PCINT64_V8DI_QI_INT, V2DF_V2DF_PCDOUBLE_V4SI_QI_INT,
V4DF_V4DF_PCDOUBLE_V4SI_QI_INT, V2DF_V2DF_PCDOUBLE_V2DI_QI_INT,
V4DF_V4DF_PCDOUBLE_V4DI_QI_INT, V4SF_V4SF_PCFLOAT_V4SI_QI_INT,
V8SF_V8SF_PCFLOAT_V8SI_QI_INT, V4SF_V4SF_PCFLOAT_V2DI_QI_INT,
V4SF_V4SF_PCFLOAT_V4DI_QI_INT, V2DI_V2DI_PCINT64_V4SI_QI_INT,
V4DI_V4DI_PCINT64_V4SI_QI_INT, V2DI_V2DI_PCINT64_V2DI_QI_INT,
V4DI_V4DI_PCINT64_V4DI_QI_INT, V4SI_V4SI_PCINT_V4SI_QI_INT,
V8SI_V8SI_PCINT_V8SI_QI_INT, V4SI_V4SI_PCINT_V2DI_QI_INT,
V4SI_V4SI_PCINT_V4DI_QI_INT, VOID_PFLOAT_HI_V16SI_V16SF_INT,
VOID_PFLOAT_QI_V8SI_V8SF_INT, VOID_PFLOAT_QI_V4SI_V4SF_INT,
VOID_PDOUBLE_QI_V8SI_V8DF_INT, VOID_PDOUBLE_QI_V4SI_V4DF_INT,
VOID_PDOUBLE_QI_V4SI_V2DF_INT, VOID_PFLOAT_QI_V8DI_V8SF_INT,

Fix (some of) profile updating after jump threading

2017-01-17 Thread Jan Hubicka

Hi,
the testcase is about jump threading confusing the profile enough that many
edges are considered cold. The first problem occurs in the thread1 pass, where
remove_ctrl_stmt_and_useless_edges does not update the outgoing edge
probability after removing the other edges (so we end up with a single-succ
bb whose outgoing edge probability is less than REG_BR_PROB_BASE).
The second is that duplicate_thread_path uniformly scales the profiles of all
bbs in the duplicate and the original path.  This makes sense only when the
original path has no side entry edges, and moreover it is always wrong for the
last BB.

The job of updating here is quite easy: just keep track of the frequency/count
of the path being duplicated and subtract it from the offline copy.
This of course assumes that all branches along the path except for the last
one remain unoptimized and keep the same exit probabilities.

In the last BB it is necessary to redistribute the exit edge probabilities,
as we know its outcome. Fortunately there already is
update_bb_profile_for_threading, which does the job for the RTL threader.
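
The bookkeeping described above can be sketched on a toy model. This is not the patch and not GCC's data structures: struct blk, its fields and update_original_path are hypothetical, counts are plain doubles, and every block is assumed to have exactly two exit edges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified basic block: an execution count plus the
   probabilities of two exit edges, which sum to 1.0.  */
struct blk
{
  double count;
  double prob_threaded;   /* edge the threaded path leaves through */
  double prob_other;
};

/* PATH[0..N-1] is the original path; PATH_COUNT executions now run
   through the duplicate instead.  Subtract them from the original.  */
void
update_original_path (struct blk *path, size_t n, double path_count)
{
  if (n == 0)
    return;

  /* Blocks before the last one keep their exit probabilities; only
     the count shrinks.  */
  for (size_t i = 0; i + 1 < n; i++)
    {
      path[i].count -= path_count;
      if (path[i].count < 0)
        path[i].count = 0;
    }

  /* In the last block the outcome of the threaded executions is known:
     all of them left through the threaded edge, so remove them from
     that edge and renormalize.  */
  struct blk *last = &path[n - 1];
  double threaded = last->count * last->prob_threaded - path_count;
  if (threaded < 0)
    threaded = 0;
  last->count -= path_count;
  if (last->count < 0)
    last->count = 0;
  if (last->count > 0)
    {
      last->prob_threaded = threaded / last->count;
      if (last->prob_threaded > 1.0)
        last->prob_threaded = 1.0;
      last->prob_other = 1.0 - last->prob_threaded;
    }
}
```

For instance, with a last block of count 100 that takes the threaded edge with probability 0.75, threading away 50 executions leaves a count of 50 and splits the remaining exits evenly, since 25 of the 50 remaining executions still take that edge.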

Mainline gets the following mismatch counts:
q.c.103t.thread1   21
q.c.104t.vrp1   40
q.c.106t.dce2   19
..
q.c.112t.mergephi3   17
q.c.113t.phiopt1   17

With the patch I get:

q.c.103t.thread1   2
q.c.104t.vrp1   8
q.c.106t.dce2   6
...
q.c.112t.mergephi3   5
...
q.c.113t.phiopt1   5
...
q.c.118t.thread2   6
q.c.178t.thread3   7

q.c.182t.vrp2   17
q.c.183t.phicprop2   10

So VRP's jump threading is now the main source of inconsistencies.
The bug there seems to be that Theresa's code does not care to update
edge probabilities when profile info is absent.

For tramp3d the numbers are:
tramp3d-v4.ii.094t.cunrolli   17
...
tramp3d-v4.ii.101t.fre3   68
tramp3d-v4.ii.102t.mergephi2   66
tramp3d-v4.ii.103t.thread1   335
tramp3d-v4.ii.104t.vrp1   605
tramp3d-v4.ii.106t.dce2   265
tramp3d-v4.ii.107t.stdarg   265
tramp3d-v4.ii.108t.cdce   269
tramp3d-v4.ii.109t.cselim   269
tramp3d-v4.ii.110t.copyprop1   259
tramp3d-v4.ii.111t.ifcombine   287
tramp3d-v4.ii.112t.mergephi3   273
...
tramp3d-v4.ii.115t.ch2   275
tramp3d-v4.ii.116t.cplxlower1   275
tramp3d-v4.ii.117t.sra   275
tramp3d-v4.ii.118t.thread2   281
tramp3d-v4.ii.119t.dom2   302
tramp3d-v4.ii.120t.isolate-paths   319
tramp3d-v4.ii.121t.phicprop1   309
tramp3d-v4.ii.122t.dse2   309
tramp3d-v4.ii.123t.reassoc1   311
tramp3d-v4.ii.124t.dce3   310
tramp3d-v4.ii.125t.forwprop3   314
...
tramp3d-v4.ii.131t.lim2   316
...
tramp3d-v4.ii.134t.pre   565
tramp3d-v4.ii.135t.sink   299
tramp3d-v4.ii.139t.dce4   299
tramp3d-v4.ii.140t.fix_loops   299
tramp3d-v4.ii.141t.loop   281
tramp3d-v4.ii.142t.loopinit   281
tramp3d-v4.ii.143t.unswitch   429
tramp3d-v4.ii.144t.sccp   429
tramp3d-v4.ii.145t.lsplit   431
tramp3d-v4.ii.146t.cddce2   431
tramp3d-v4.ii.147t.ldist   443
tramp3d-v4.ii.148t.copyprop2   443
tramp3d-v4.ii.154t.ivcanon   445
tramp3d-v4.ii.157t.ch_vect   445
tramp3d-v4.ii.158t.ifcvt   467
tramp3d-v4.ii.159t.vect   1179

tramp3d-v4.ii.162t.cunroll   1090
...
tramp3d-v4.ii.167t.loopdone   1085
tramp3d-v4.ii.171t.veclower21   1103
tramp3d-v4.ii.173t.printf-return-value2   1103
tramp3d-v4.ii.174t.reassoc2   1103
tramp3d-v4.ii.175t.slsr   1103
tramp3d-v4.ii.176t.split-paths   1103
tramp3d-v4.ii.178t.thread3   1143
tramp3d-v4.ii.179t.dom3   1151
tramp3d-v4.ii.180t.strlen   1151
tramp3d-v4.ii.181t.thread4   1173
tramp3d-v4.ii.182t.vrp2   2263
tramp3d-v4.ii.183t.phicprop2   1078
tramp3d-v4.ii.184t.dse3   1078
tramp3d-v4.ii.185t.cddce3   1078
tramp3d-v4.ii.186t.forwprop4   1078
tramp3d-v4.ii.187t.phiopt3   1078
tramp3d-v4.ii.188t.fab1   1079
tramp3d-v4.ii.189t.widening_mul   1079
tramp3d-v4.ii.190t.store-merging   1079
tramp3d-v4.ii.191t.tailc   1079
tramp3d-v4.ii.192t.dce7   1079
tramp3d-v4.ii.193t.crited2   1079
tramp3d-v4.ii.195t.uncprop1   1079
tramp3d-v4.ii.196t.local-pure-const2   1079
tramp3d-v4.ii.224t.ehcleanup2   588
tramp3d-v4.ii.225t.resx   2493
tramp3d-v4.ii.226t.nrv   2493
tramp3d-v4.ii.227t.optimized   2418

On mainline and:

tramp3d-v4.ii.094t.cunrolli   17
..
tramp3d-v4.ii.101t.fre3   68
tramp3d-v4.ii.102t.mergephi2   66
tramp3d-v4.ii.103t.thread1   129
tramp3d-v4.ii.104t.vrp1   324
tramp3d-v4.ii.106t.dce2   196
tramp3d-v4.ii.107t.stdarg   196
tramp3d-v4.ii.108t.cdce   200
..
tramp3d-v4.ii.111t.ifcombine   228
...
tramp3d-v4.ii.115t.ch2   230
...
tramp3d-v4.ii.119t.dom2   255
tramp3d-v4.ii.120t.isolate-paths   272
tramp3d-v4.ii.121t.phicprop1   264
tramp3d-v4.ii.122t.dse2   264
tramp3d-v4.ii.123t.reassoc1   266
tramp3d-v4.ii.124t.dce3   265
tramp3d-v4.ii.125t.forwprop3   269
..
tramp3d-v4.ii.131t.lim2   271
..
tramp3d-v4.ii.134t.pre   515
tramp3d-v4.ii.135t.sink   272
tramp3d-v4.ii.141t.loop   254
tramp3d-v4.ii.142t.loopinit   254
tramp3d-v4.ii.143t.unswitch   402
tramp3d-v4.ii.144t.sccp   402
tramp3d-v4.ii.145t.lsplit   404
tramp3d-v4.ii.146t.cddce2   404
tramp3d-v4.ii.147t.ldist   416
tramp3d-v4.ii.148t.copyprop2   416
tramp3d-v4.ii.154t.ivcanon   418
tramp3d-v4.ii.157t.ch_vect   418
tramp3d-v4.ii.158t.ifcvt   440

Re: [PATCH][PR76731] Fix intrinsics according to icc and docs

2017-01-17 Thread Uros Bizjak

On Tue, Jan 17, 2017 at 1:57 PM, Koval, Julia  wrote:
> Hi,
> I added builtin changes to Jakub's patch from 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 It fixes the issue, when 
> gather/scatter intrinsics has wrong spec. Ok for trunk?

>   * config/i386/i386-builtin-types.def: Remove types
> V16SF_V16SF_PCFLOAT_V16SI_HI_INT, V8DF_V8DF_PCDOUBLE_V8SI_QI_INT,
> V8SF_V8SF_PCFLOAT_V8DI_QI_INT,  V8DF_V8DF_PCDOUBLE_V8DI_QI_INT,
> V16SI_V16SI_PCINT_V16SI_HI_INT,  V8DI_V8DI_PCINT64_V8SI_QI_INT,
> V8SI_V8SI_PCINT_V8DI_QI_INT, V8DI_V8DI_PCINT64_V8DI_QI_INT,
> V2DF_V2DF_PCDOUBLE_V4SI_QI_INT, V4DF_V4DF_PCDOUBLE_V4SI_QI_INT,
> V2DF_V2DF_PCDOUBLE_V2DI_QI_INT, V4DF_V4DF_PCDOUBLE_V4DI_QI_INT,
> V4SF_V4SF_PCFLOAT_V4SI_QI_INT, V8SF_V8SF_PCFLOAT_V8SI_QI_INT,
> V4SF_V4SF_PCFLOAT_V2DI_QI_INT, V4SF_V4SF_PCFLOAT_V4DI_QI_INT,
> V2DI_V2DI_PCINT64_V4SI_QI_INT, V4DI_V4DI_PCINT64_V4SI_QI_INT,
> V2DI_V2DI_PCINT64_V2DI_QI_INT, V4DI_V4DI_PCINT64_V4DI_QI_INT,
> V4SI_V4SI_PCINT_V4SI_QI_INT, V8SI_V8SI_PCINT_V8SI_QI_INT,
> V4SI_V4SI_PCINT_V2DI_QI_INT, V4SI_V4SI_PCINT_V4DI_QI_INT,
> VOID_PFLOAT_HI_V16SI_V16SF_INT, VOID_PFLOAT_QI_V8SI_V8SF_INT,
> VOID_PFLOAT_QI_V4SI_V4SF_INT, VOID_PDOUBLE_QI_V8SI_V8DF_INT,
> VOID_PDOUBLE_QI_V4SI_V4DF_INT, VOID_PDOUBLE_QI_V4SI_V2DF_INT,
> VOID_PFLOAT_QI_V8DI_V8SF_INT, VOID_PFLOAT_QI_V4DI_V4SF_INT,
> VOID_PFLOAT_QI_V2DI_V4SF_INT, VOID_PDOUBLE_QI_V8DI_V8DF_INT,
> VOID_PDOUBLE_QI_V4DI_V4DF_INT, VOID_PDOUBLE_QI_V2DI_V2DF_INT,
> VOID_PINT_HI_V16SI_V16SI_INT, VOID_PINT_QI_V8SI_V8SI_INT,
> VOID_PINT_QI_V4SI_V4SI_INT, VOID_PLONGLONG_QI_V8SI_V8DI_INT,
> VOID_PLONGLONG_QI_V4SI_V4DI_INT, VOID_PLONGLONG_QI_V4SI_V2DI_INT,
> VOID_PINT_QI_V8DI_V8SI_INT, VOID_PINT_QI_V4DI_V4SI_INT,
> VOID_PINT_QI_V2DI_V4SI_INT, VOID_PLONGLONG_QI_V8DI_V8DI_INT,
> VOID_QI_V8SI_PCINT64_INT_INT, VOID_PLONGLONG_QI_V4DI_V4DI_INT,
> VOID_PLONGLONG_QI_V2DI_V2DI_INT, VOID_HI_V16SI_PCINT_INT_INT,
> VOID_QI_V8DI_PCINT64_INT_INT, VOID_QI_V8DI_PCINT_INT_INT
> Add types V16SF_V16SF_PCVOID_V16SI_HI_INT,  V8DF_V8DF_PCVOID_V8SI_QI_INT,
> V8SF_V8SF_PCVOID_V8DI_QI_INT, V8DF_V8DF_PCVOID_V8DI_QI_INT,
> V16SI_V16SI_PCVOID_V16SI_HI_INT, V8DI_V8DI_PCVOID_V8SI_QI_INT,
> V8SI_V8SI_PCVOID_V8DI_QI_INT, V8DI_V8DI_PCVOID_V8DI_QI_INT,
> VOID_PVOID_HI_V16SI_V16SF_INT, VOID_PVOID_QI_V8SI_V8DF_INT,
> VOID_PVOID_QI_V8DI_V8SF_INT, VOID_PVOID_QI_V8DI_V8DF_INT,
> VOID_PVOID_HI_V16SI_V16SI_INT, VOID_PVOID_QI_V8SI_V8DI_INT,
> VOID_PVOID_QI_V8DI_V8SI_INT, VOID_PVOID_QI_V8DI_V8DI_INT,
> V2DF_V2DF_PCVOID_V4SI_QI_INT, V4DF_V4DF_PCVOID_V4SI_QI_INT,
> V2DF_V2DF_PCVOID_V2DI_QI_INT, V4DF_V4DF_PCVOID_V4DI_QI_INT
> V4SF_V4SF_PCVOID_V4SI_QI_INT, V8SF_V8SF_PCVOID_V8SI_QI_INT,
> V4SF_V4SF_PCVOID_V2DI_QI_INT, V4SF_V4SF_PCVOID_V4DI_QI_INT,
> V2DI_V2DI_PCVOID_V4SI_QI_INT, V4DI_V4DI_PCVOID_V4SI_QI_INT,
> V2DI_V2DI_PCVOID_V2DI_QI_INT, V4DI_V4DI_PCVOID_V4DI_QI_INT,
> V4SI_V4SI_PCVOID_V4SI_QI_INT, V8SI_V8SI_PCVOID_V8SI_QI_INT,
> V4SI_V4SI_PCVOID_V2DI_QI_INT, V4SI_V4SI_PCVOID_V4DI_QI_INT,
> VOID_PVOID_QI_V8SI_V8SF_INT, VOID_PVOID_QI_V4SI_V4SF_INT,
> VOID_PVOID_QI_V4SI_V4DF_INT, VOID_PVOID_QI_V4SI_V2DF_INT,
> VOID_PVOID_QI_V4DI_V4SF_INT, VOID_PVOID_QI_V2DI_V4SF_INT,
> VOID_PVOID_QI_V4DI_V4DF_INT, VOID_PVOID_QI_V2DI_V2DF_INT,
> VOID_PVOID_QI_V8SI_V8SI_INT, VOID_PVOID_QI_V4SI_V4SI_INT,
> VOID_PVOID_QI_V4SI_V4DI_INT, VOID_PVOID_QI_V4SI_V2DI_INT,
> VOID_PVOID_QI_V4DI_V4SI_INT, VOID_PVOID_QI_V2DI_V4SI_INT,
> VOID_PVOID_QI_V4DI_V4DI_INT, VOID_PVOID_QI_V2DI_V2DI_INT,
> VOID_QI_V8SI_PCVOID_INT_INT, VOID_HI_V16SI_PCVOID_INT_INT,
> VOID_QI_V8DI_PCVOID_INT_INT

Please write the above part as:

   * config/i386/i386-builtin-types.def  (V16SF_V16SF_PCFLOAT_V16SI_HI_INT,
 V8DF_V8DF_PCDOUBLE_V8SI_QI_INT, V8SF_V8SF_PCFLOAT_V8DI_QI_INT,
 ...
 VOID_QI_V8DI_PCINT_INT_INT): Remove.
 ( ... ): Add.

Uros.

Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-17 Thread Andrew Senkevich

2017-01-17 15:30 GMT+03:00 Kirill Yukhin :
> Hi Anrey,
> On 17 Jan 14:04, Andrew Senkevich wrote:
>> 2017-01-17 1:55 GMT+03:00 Jakub Jelinek :
>> > On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
>> >> here is one more part of intrinsics for k-mask registers shifts:
>> >
>> > The software developer manuals describe KSHIFT{L,R}* like:
>> > KSHIFTLW
>> > COUNT <- imm8[7:0]
>> > DEST[MAX_KL-1:0] <- 0
>> > IF COUNT <=15
>> > THEN DEST[15:0] <- SRC1[15:0] << COUNT;
>> > FI;
>> >
>> > What is the behavior when src1 == dest, like:
>> >   kshiftld $3, %k3, %k3
>> > ?  Is it just a bug in the SDM and will it actually do the expected thing
>> > (set %k3 to %k3 << 3 and clear just the upper bits), or do we need
>> > an early-clobber on the destination to make sure GCC never emits these
>> > insns with the same register as both input and output?
>>
>> Indeed, it should be different registers, how to do it?
> Are you sure?
>
> I've played a bit w/ SDE. And looks like operands are not early clobber:
> TID0: INS 0x004003ee AVX512VEX kmovd k0, eax
> TID0:   k0 := _
> ...
> TID0: INS 0x004003f4 AVX512VEX kshiftlw k0, k0, 0x3
> TID0:   k0 := _fff8
>
> You can see that same dest and source works just fine.

Hmm, I looked only at what ICC generates, and that was not the correct way.

Thanks Kirill!


--
WBR,
Andrew

Re: [PATCH][PR76731] Fix intrinsics according to icc and docs

2017-01-17 Thread Kirill Yukhin

Hi Julia,
On 17 Jan 12:57, Koval, Julia wrote:
> Hi,
> I added builtin changes to Jakub's patch from 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 It fixes the issue, when 
> gather/scatter intrinsics has wrong spec. Ok for trunk?
Patch is OK for main trunk


--
Thanks, K
>
> gcc/
>   * config/i386/avx512fintrin.h
> (_mm512_i32gather_ps): Fixed arg to void const*.
> (_mm512_mask_i32gather_ps): Ditto.
> (_mm512_i32gather_pd): Ditto.
> (_mm512_mask_i32gather_pd): Ditto.
> (_mm512_i64gather_ps): Ditto.
> (_mm512_mask_i64gather_ps): Ditto.
> (_mm512_i64gather_pd): Ditto.
> (_mm512_mask_i64gather_pd): Ditto.
> (_mm512_i32gather_epi32): Ditto.
> (_mm512_mask_i32gather_epi32): Ditto.
> (_mm512_i32gather_epi64): Ditto.
> (_mm512_mask_i32gather_epi64): Ditto.
> (_mm512_i64gather_epi32): Ditto.
> (_mm512_mask_i64gather_epi32): Ditto.
> (_mm512_i64gather_epi64): Ditto.
> (_mm512_mask_i64gather_epi64): Ditto.
> (_mm512_i32scatter_ps): Fixed arg to void*.
> (_mm512_mask_i32scatter_ps): Ditto.
> (_mm512_i32scatter_pd): Ditto.
> (_mm512_mask_i32scatter_pd): Ditto.
> (_mm512_i64scatter_ps): Ditto.
> (_mm512_mask_i64scatter_ps): Ditto.
> (_mm512_i64scatter_pd): Ditto.
> (_mm512_mask_i64scatter_pd): Ditto.
> (_mm512_i32scatter_epi32): Ditto.
> (_mm512_mask_i32scatter_epi32): Ditto.
> (_mm512_i32scatter_epi64): Ditto.
> (_mm512_mask_i32scatter_epi64): Ditto.
> (_mm512_i64scatter_epi32): Ditto.
> (_mm512_mask_i64scatter_epi32): Ditto.
> (_mm512_i64scatter_epi64): Ditto.
> (_mm512_mask_i64scatter_epi64): Ditto.
>   * config/i386/avx512pfintrin.h
> (_mm512_mask_prefetch_i32gather_pd): Fixed arg to void const*.
> (_mm512_mask_prefetch_i32gather_ps): Ditto.
> (_mm512_mask_prefetch_i64gather_pd): Ditto.
> (_mm512_mask_prefetch_i64gather_ps): Ditto.
> (_mm512_prefetch_i32scatter_pd): Fixed arg to void*.
> (_mm512_prefetch_i32scatter_ps): Ditto.
> (_mm512_mask_prefetch_i32scatter_pd): Ditto.
> (_mm512_mask_prefetch_i32scatter_ps): Ditto.
> (_mm512_prefetch_i64scatter_pd): Ditto.
> (_mm512_prefetch_i64scatter_ps): Ditto.
> (_mm512_mask_prefetch_i64scatter_pd): Ditto.
> (_mm512_mask_prefetch_i64scatter_ps): Ditto.
>   * config/i386/avx512vlintrin.h
> (_mm256_mmask_i32gather_ps): Fixed arg to void const*.
> (_mm_mmask_i32gather_ps): Ditto.
> (_mm256_mmask_i32gather_pd): Ditto.
> (_mm_mmask_i32gather_pd): Ditto.
> (_mm256_mmask_i64gather_ps): Ditto.
> (_mm_mmask_i64gather_ps): Ditto.
> (_mm256_mmask_i64gather_pd): Ditto.
> (_mm_mmask_i64gather_pd): Ditto.
> (_mm256_mmask_i32gather_epi32): Ditto.
> (_mm_mmask_i32gather_epi32): Ditto.
> (_mm256_mmask_i32gather_epi64): Ditto.
> (_mm_mmask_i32gather_epi64): Ditto.
> (_mm256_mmask_i64gather_epi32): Ditto.
> (_mm_mmask_i64gather_epi32): Ditto.
> (_mm256_mmask_i64gather_epi64): Ditto.
> (_mm_mmask_i64gather_epi64): Ditto.
> (_mm256_i32scatter_ps): Fixed arg to void*.
> (_mm256_mask_i32scatter_ps): Ditto.
> (_mm_i32scatter_ps): Ditto.
> (_mm_mask_i32scatter_ps): Ditto.
> (_mm256_i32scatter_pd): Ditto.
> (_mm256_mask_i32scatter_pd): Ditto.
> (_mm_i32scatter_pd): Ditto.
> (_mm_mask_i32scatter_pd): Ditto.
> (_mm256_i64scatter_ps): Ditto.
> (_mm256_mask_i64scatter_ps): Ditto.
> (_mm_i64scatter_ps): Ditto.
> (_mm_mask_i64scatter_ps): Ditto.
> (_mm256_i64scatter_pd): Ditto.
> (_mm256_mask_i64scatter_pd): Ditto.
> (_mm_i64scatter_pd): Ditto.
> (_mm_mask_i64scatter_pd): Ditto.
> (_mm256_i32scatter_epi32): Ditto.
> (_mm256_mask_i32scatter_epi32): Ditto.
> (_mm_i32scatter_epi32): Ditto.
> (_mm_mask_i32scatter_epi32): Ditto.
> (_mm256_i32scatter_epi64): Ditto.
> (_mm256_mask_i32scatter_epi64): Ditto.
> (_mm_i32scatter_epi64): Ditto.
> (_mm_mask_i32scatter_epi64): Ditto.
> (_mm256_i64scatter_epi32): Ditto.
> (_mm256_mask_i64scatter_epi32): Ditto.
> (_mm_i64scatter_epi32): Ditto.
> (_mm_mask_i64scatter_epi32): Ditto.
> (_mm256_i64scatter_epi64): Ditto.
> (_mm256_mask_i64scatter_epi64): Ditto.
> (_mm_i64scatter_epi64): Ditto.
> (_mm_mask_i64scatter_epi64): Ditto.
>   * config/i386/i386-builtin-types.def: Remove types
> V16SF_V16SF_PCFLOAT_V16SI_HI_INT, V8DF_V8DF_PCDOUBLE_V8SI_QI_INT,
> V8SF_V8SF_PCFLOAT_V8DI_QI_INT,  V8DF_V8DF_PCDOUBLE_V8DI_QI_INT,
> V16SI_V16SI_PCINT_V16SI_HI_INT,  V8DI_V8DI_PCINT64_V8SI_QI_INT,
> V8SI_V8SI_PCINT_V8DI_QI_INT, V8DI_V8DI_PCINT64_V8DI_QI_INT,
> V2DF_V2DF_PCDOUBLE_V4SI_QI_INT, V4DF_V4DF_PCDOUBLE_V4SI_QI_INT,
> V2DF_V2DF_PCDOUBLE_V2DI_QI_INT, V4DF_V4DF_PCDOUBLE_V4DI_QI_INT,
> V4SF_V4SF_PCFLOAT_V4SI_QI_INT, V8SF_V8SF_PCFLOAT_V8SI_QI_INT,
> V4SF_V4SF_PCFLOAT_V2DI_QI_INT, V4SF_V4SF_PCFLOAT_V4DI_QI_INT,
> V2DI_V2DI_PCINT64_V4SI_QI_INT, V4DI_V4DI_PCINT64_V4SI_QI_INT,
>

[PATCH][PR76731] Fix intrinsics according to icc and docs

2017-01-17 Thread Koval, Julia

Hi,
I added the builtin changes to Jakub's patch from PR 76731
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731). It fixes the issue where
the gather/scatter intrinsics have the wrong pointer argument types. OK for trunk?
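
To illustrate why an untyped base pointer is convenient for gathers, here is a minimal scalar sketch. It is not the intrinsic implementation: `gather_f32` and `struct sample` are hypothetical names made up for this example, and the model accepts any byte stride, whereas the real instructions restrict the scale to 1, 2, 4 or 8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: the floats we want are interleaved with
   other data, so the base address is not the start of a float array.  */
struct sample
{
  int id;
  float value;
};

/* Scalar model of a gather (an assumption for illustration, not GCC code):
   load N floats from BASE + IDX[i] * SCALE bytes into DST.
   Because BASE is `const void *`, callers can pass a pointer to any
   object type without a cast.  */
void
gather_f32 (float *dst, const void *base, const int *idx, size_t n, int scale)
{
  const unsigned char *bytes = (const unsigned char *) base;

  for (size_t i = 0; i < n; i++)
    /* memcpy avoids alignment and aliasing assumptions about BASE.  */
    memcpy (&dst[i], bytes + (ptrdiff_t) idx[i] * scale, sizeof (float));
}
```

A caller can pass `&recs[0].value` as the base and `sizeof (struct sample)` as the scale to pick the `value` field out of selected records.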

gcc/
  * config/i386/avx512fintrin.h
(_mm512_i32gather_ps): Fixed arg to void const*.
(_mm512_mask_i32gather_ps): Ditto.
(_mm512_i32gather_pd): Ditto.
(_mm512_mask_i32gather_pd): Ditto.
(_mm512_i64gather_ps): Ditto.
(_mm512_mask_i64gather_ps): Ditto.
(_mm512_i64gather_pd): Ditto.
(_mm512_mask_i64gather_pd): Ditto.
(_mm512_i32gather_epi32): Ditto.
(_mm512_mask_i32gather_epi32): Ditto.
(_mm512_i32gather_epi64): Ditto.
(_mm512_mask_i32gather_epi64): Ditto.
(_mm512_i64gather_epi32): Ditto.
(_mm512_mask_i64gather_epi32): Ditto.
(_mm512_i64gather_epi64): Ditto.
(_mm512_mask_i64gather_epi64): Ditto.
(_mm512_i32scatter_ps): Fixed arg to void*.
(_mm512_mask_i32scatter_ps): Ditto.
(_mm512_i32scatter_pd): Ditto.
(_mm512_mask_i32scatter_pd): Ditto.
(_mm512_i64scatter_ps): Ditto.
(_mm512_mask_i64scatter_ps): Ditto.
(_mm512_i64scatter_pd): Ditto.
(_mm512_mask_i64scatter_pd): Ditto.
(_mm512_i32scatter_epi32): Ditto.
(_mm512_mask_i32scatter_epi32): Ditto.
(_mm512_i32scatter_epi64): Ditto.
(_mm512_mask_i32scatter_epi64): Ditto.
(_mm512_i64scatter_epi32): Ditto.
(_mm512_mask_i64scatter_epi32): Ditto.
(_mm512_i64scatter_epi64): Ditto.
(_mm512_mask_i64scatter_epi64): Ditto.
  * config/i386/avx512pfintrin.h
(_mm512_mask_prefetch_i32gather_pd): Fixed arg to void const*.
(_mm512_mask_prefetch_i32gather_ps): Ditto.
(_mm512_mask_prefetch_i64gather_pd): Ditto.
(_mm512_mask_prefetch_i64gather_ps): Ditto.
(_mm512_prefetch_i32scatter_pd): Fixed arg to void*.
(_mm512_prefetch_i32scatter_ps): Ditto.
(_mm512_mask_prefetch_i32scatter_pd): Ditto.
(_mm512_mask_prefetch_i32scatter_ps): Ditto.
(_mm512_prefetch_i64scatter_pd): Ditto.
(_mm512_prefetch_i64scatter_ps): Ditto.
(_mm512_mask_prefetch_i64scatter_pd): Ditto.
(_mm512_mask_prefetch_i64scatter_ps): Ditto.
  * config/i386/avx512vlintrin.h
(_mm256_mmask_i32gather_ps): Fixed arg to void const*.
(_mm_mmask_i32gather_ps): Ditto.
(_mm256_mmask_i32gather_pd): Ditto.
(_mm_mmask_i32gather_pd): Ditto.
(_mm256_mmask_i64gather_ps): Ditto.
(_mm_mmask_i64gather_ps): Ditto.
(_mm256_mmask_i64gather_pd): Ditto.
(_mm_mmask_i64gather_pd): Ditto.
(_mm256_mmask_i32gather_epi32): Ditto.
(_mm_mmask_i32gather_epi32): Ditto.
(_mm256_mmask_i32gather_epi64): Ditto.
(_mm_mmask_i32gather_epi64): Ditto.
(_mm256_mmask_i64gather_epi32): Ditto.
(_mm_mmask_i64gather_epi32): Ditto.
(_mm256_mmask_i64gather_epi64): Ditto.
(_mm_mmask_i64gather_epi64): Ditto.
(_mm256_i32scatter_ps): Fixed arg to void*.
(_mm256_mask_i32scatter_ps): Ditto.
(_mm_i32scatter_ps): Ditto.
(_mm_mask_i32scatter_ps): Ditto.
(_mm256_i32scatter_pd): Ditto.
(_mm256_mask_i32scatter_pd): Ditto.
(_mm_i32scatter_pd): Ditto.
(_mm_mask_i32scatter_pd): Ditto.
(_mm256_i64scatter_ps): Ditto.
(_mm256_mask_i64scatter_ps): Ditto.
(_mm_i64scatter_ps): Ditto.
(_mm_mask_i64scatter_ps): Ditto.
(_mm256_i64scatter_pd): Ditto.
(_mm256_mask_i64scatter_pd): Ditto.
(_mm_i64scatter_pd): Ditto.
(_mm_mask_i64scatter_pd): Ditto.
(_mm256_i32scatter_epi32): Ditto.
(_mm256_mask_i32scatter_epi32): Ditto.
(_mm_i32scatter_epi32): Ditto.
(_mm_mask_i32scatter_epi32): Ditto.
(_mm256_i32scatter_epi64): Ditto.
(_mm256_mask_i32scatter_epi64): Ditto.
(_mm_i32scatter_epi64): Ditto.
(_mm_mask_i32scatter_epi64): Ditto.
(_mm256_i64scatter_epi32): Ditto.
(_mm256_mask_i64scatter_epi32): Ditto.
(_mm_i64scatter_epi32): Ditto.
(_mm_mask_i64scatter_epi32): Ditto.
(_mm256_i64scatter_epi64): Ditto.
(_mm256_mask_i64scatter_epi64): Ditto.
(_mm_i64scatter_epi64): Ditto.
(_mm_mask_i64scatter_epi64): Ditto.
  * config/i386/i386-builtin-types.def: Remove types
V16SF_V16SF_PCFLOAT_V16SI_HI_INT, V8DF_V8DF_PCDOUBLE_V8SI_QI_INT,
V8SF_V8SF_PCFLOAT_V8DI_QI_INT,  V8DF_V8DF_PCDOUBLE_V8DI_QI_INT,
V16SI_V16SI_PCINT_V16SI_HI_INT,  V8DI_V8DI_PCINT64_V8SI_QI_INT,
V8SI_V8SI_PCINT_V8DI_QI_INT, V8DI_V8DI_PCINT64_V8DI_QI_INT,
V2DF_V2DF_PCDOUBLE_V4SI_QI_INT, V4DF_V4DF_PCDOUBLE_V4SI_QI_INT,
V2DF_V2DF_PCDOUBLE_V2DI_QI_INT, V4DF_V4DF_PCDOUBLE_V4DI_QI_INT,
V4SF_V4SF_PCFLOAT_V4SI_QI_INT, V8SF_V8SF_PCFLOAT_V8SI_QI_INT,
V4SF_V4SF_PCFLOAT_V2DI_QI_INT, V4SF_V4SF_PCFLOAT_V4DI_QI_INT,
V2DI_V2DI_PCINT64_V4SI_QI_INT, V4DI_V4DI_PCINT64_V4SI_QI_INT,
V2DI_V2DI_PCINT64_V2DI_QI_INT, V4DI_V4DI_PCINT64_V4DI_QI_INT,
V4SI_V4SI_PCINT_V4SI_QI_INT, V8SI_V8SI_PCINT_V8SI_QI_INT,
V4SI_V4SI_PCINT_V2DI_QI_INT, V4SI_V4SI_PCINT_V4DI_QI_INT,
VOID_PFLOAT_HI_V16SI_V16SF_INT, VOID_PFLOAT_QI_V8SI_V8SF_INT,
VOID_PFLOAT_QI_V4SI_V4SF_INT, VOID_PDOUBLE_QI_V8SI_V8DF_INT,

RE: [PATCH] MIPS: Fix generation of DIV.G and MOD.G for Loongson targets.

2017-01-17 Thread Matthew Fortune

Maciej Rozycki  writes:
> On Mon, 16 Jan 2017, Toma Tabacu wrote:
> 
> > After searching through the archives, I have found an interesting bit
> > of information about DIV.G/MOD.G in the original submission thread:
> >
> > > > Ruan Beihong 23 July 2008:
> > > >
> > > > I've read the Loongson 2F manual carefully. The (d)div(u) is
> > > > internally split into one (d)div(u).g and one (d)mod(u).g. So what
> > > > I said before was wrong. The truth is that (d)div(u).g and
> > > > (d)mod(u).g are always faster than (d)div(u); at least the time
> > > > spent on mflo/mfhi is saved.
> > > >
> > > > James Ruan
> > >
> > > Richard Sandiford 24 July 2008:
> > >
> > > OK, great.  In that case, it should simply be a case of disabling
> > > the divmod-related insns for Loongson, in addition to your patch.
> > > (Probably stating the obvious there, sorry.)
> > >
> > > Richard
> >
> > Here's the link for part 1 of the submission thread (has the quotes
> > from above):
> > https://gcc.gnu.org/ml/gcc-patches/2008-07/msg01529.html
> > and here's part 2:
> > https://gcc.gnu.org/ml/gcc-patches/2008-11/msg00273.html
> 
>  Thanks for digging this out!
> 
> > If DIV.G/MOD.G are faster, according to Ruan Beihong, and also smaller
> > than DIV (or the same size [1]), as pointed out by Maciej, then I am
> > led to the same conclusion as Richard Sandiford: that only DIV.G/MOD.G
> > should be generated for Loongson.
> >
> > I think it would still be a good idea to add a test for separated
> > DIV.G/MOD.G, though.
> 
>  Possibly, though the combined tests need to stay then, to make sure
> generic DIV/DIVU is not ever produced.

I'm happy to just stick with the original tests, as they effectively test
both scenarios, just at different optimisation levels, i.e. the new divmod
expansion only kicks in at -O2, I believe.
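
For reference, the kind of source that exercises both scenarios is one where the quotient and the remainder of the same operands are both live. A minimal sketch follows; `struct qr` and `div_and_mod` are hypothetical names for this example and are not taken from the testsuite.

```c
/* Hypothetical result pair, only for illustration.  */
struct qr
{
  int quot;
  int rem;
};

/* Both a / b and a % b are needed, so the compiler may expand the pair
   either as one combined divmod operation or as two separate
   instructions, depending on the target and the optimisation level.  */
struct qr
div_and_mod (int a, int b)
{
  struct qr r;

  r.quot = a / b;
  r.rem = a % b;
  return r;
}
```

Compiling a function like this at different optimisation levels and inspecting the assembly is one way to see which form was chosen.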

> > What are your thoughts on this?
> > Have I misunderstood something in the context of the submission thread?
> >
> > Regards,
> > Toma
> >
> > [1] I've noticed that GCC generates the same TEQ instruction twice if
> > both DIV.G and MOD.G are needed, which makes the sequence just as big
> > as DIV + TEQ + MFHI + MFLO; this seems unnecessary to me.
> 
>  This ought to be handled then, likely by adding Loongson-specific RTL
> insns matching the `divmod<mode>4' and `udivmod<mode>4' expanders.  It
> may be as simple as say (conceptually, untested):
> 
> (define_insn "divmod<mode>4_loongson"
>   [(set (match_operand:GPR 0 "register_operand" "=d")
>         (any_div:GPR (match_operand:GPR 1 "register_operand" "d")
>                      (match_operand:GPR 2 "register_operand" "d")))
>    (set (match_operand:GPR 3 "register_operand" "=d")
>         (any_mod:GPR (match_dup 1)
>                      (match_dup 2)))]
>   "TARGET_LOONGSON_2EF"
> {
>   return mips_output_division
>     ("div.g\t%0,%1,%2\;mod.g\t%3,%1,%2", operands);
> }
>   [(set_attr "type" "idiv")
>    (set_attr "mode" "<MODE>")])
> 
> although any final fix will have to take an instruction count adjustment
> into account too, as `mips_idiv_insns' won't as it stands handle the new
> case.

Sounds good. I'd prefer to get the testsuite clean first and then improve the
code quality as a later step, since it is not a regression and we are
a few days off stage 4.

As for the patch, the ISA_HAS_DIV3 macro is not currently used, so
I suggest renaming it to ISA_AVOID_DIV_HILO and then using
that macro in the definitions of ISA_HAS_DIV and ISA_HAS_DDIV to turn
off the DIV/DDIV instructions.

ISA_HAS_DIV3 should have been cleaned up when R6 was added, as it is
ambiguous and could now refer to multiple variants of 3-reg operand DIV
rather than just Loongson's.
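
Roughly what I have in mind, as a standalone sketch rather than the actual mips.h change. The lower-case variables are hypothetical stand-ins for the real target flags and for whatever condition currently enables DIV/DDIV, so that the fragment compiles on its own.

```c
/* Hypothetical stand-ins, NOT the real back-end definitions.  */
int target_loongson_2ef;       /* stands in for TARGET_LOONGSON_2EF */
int target_64bit;              /* stands in for the 64-bit condition */
int isa_has_hilo_div = 1;      /* stands in for the existing DIV condition */

/* Renamed from ISA_HAS_DIV3: true when the HI/LO-based divide should be
   avoided in favour of the 3-register form.  The condition shown is an
   assumption for illustration.  */
#define ISA_AVOID_DIV_HILO (target_loongson_2ef)

/* DIV/DDIV are turned off whenever the HI/LO form is to be avoided.  */
#define ISA_HAS_DIV  (isa_has_hilo_div && !ISA_AVOID_DIV_HILO)
#define ISA_HAS_DDIV (isa_has_hilo_div && target_64bit && !ISA_AVOID_DIV_HILO)
```

With this shape the Loongson case falls out of the existing ISA_HAS_DIV/ISA_HAS_DDIV checks without touching each insn condition separately.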

Thanks,
Matthew

Re: [IPA PATCH] Refactor decl localizing

2017-01-17 Thread Nathan Sidwell


Ping?

On 01/06/2017 04:25 PM, Nathan Sidwell wrote:

This patch refactors the decl localizing that happens in
function_and_variable_visibility.  It doesn't fix the bug I'm working on
(that's next).

Both the FOR_EACH_FUNCTION and FOR_EACH_VARIABLE loops contain very
similar, but not quite identical, code for localizing a definition that
has been determined not to need external visibility.  It looks to me
that the not-quite-the-sameness is erroneous, and this patch refactors
that code into a common subroutine.  If the differences need to be
maintained (slight differences in when unique_name is updated and
whether resolution is set to LDPR_PREVAILING_DEF_IRONLY), I think a flag
to the new function would be best, rather than keeping the duplicated
code.

Bootstrapped and tested on x86_64-linux, OK?
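
The flag alternative could look something like the sketch below. This is a reduced model, not the patch: `struct symbol`, `localize_symbol` and `localize_all` are hypothetical names, and which loop passes `true` is chosen arbitrarily for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for a symbol table entry.  */
enum resolution { RES_UNKNOWN, RES_PREVAILING_DEF_IRONLY };

struct symbol
{
  bool is_public;
  enum resolution resolution;
};

/* Common helper shared by both loops.  SET_RESOLUTION is the flag that
   preserves the one difference between the two callers, if it turns out
   to be intentional.  */
void
localize_symbol (struct symbol *sym, bool set_resolution)
{
  sym->is_public = false;
  if (set_resolution)
    sym->resolution = RES_PREVAILING_DEF_IRONLY;
}

/* The two formerly duplicated loops now differ only in the flag.  */
void
localize_all (struct symbol *funcs, size_t n_funcs,
              struct symbol *vars, size_t n_vars)
{
  for (size_t i = 0; i < n_funcs; i++)
    localize_symbol (&funcs[i], true);
  for (size_t i = 0; i < n_vars; i++)
    localize_symbol (&vars[i], false);
}
```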

nathan



--
Nathan Sidwell

Re: [PATCH 9c] callgraph: handle __RTL functions

2017-01-17 Thread Jan Hubicka

> On Mon, Jan 16, 2017 at 10:25 PM, Jeff Law  wrote:
> > On 01/09/2017 07:38 PM, David Malcolm wrote:
> >>
> >> The RTL backend code is full of singleton state, so we have to handle
> >> functions as soon as we parse them.  This requires various special-casing
> >> in the callgraph code.
> >>
> >> gcc/ChangeLog:
> >> * cgraph.h (symtab_node::native_rtl_p): New decl.
> >> * cgraphunit.c (symtab_node::native_rtl_p): New function.
> >> (symtab_node::needed_p): Don't assert for early assembly output
> >> for __RTL functions.
> >> (cgraph_node::finalize_function): Set "force_output" for __RTL
> >> functions.
> >> (cgraph_node::analyze): Bail out early for __RTL functions.
> >> (analyze_functions): Update assertion to support __RTL functions.
> >> (cgraph_node::expand): Bail out early for __RTL functions.
> >> * gimple-expr.c: Include "tree-pass.h".
> >> (gimple_has_body_p): Return false for __RTL functions.
> >> ---
> >>  gcc/cgraph.h  |  4 
> >>  gcc/cgraphunit.c  | 41 ++---
> >>  gcc/gimple-expr.c |  3 ++-
> >>  3 files changed, 44 insertions(+), 4 deletions(-)
> >>
> >
> >> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> >> index 81a3ae9..ed699e1 100644
> >> --- a/gcc/cgraphunit.c
> >> +++ b/gcc/cgraphunit.c
> >
> >  @@ -568,6 +591,12 @@ cgraph_node::add_new_function (tree fndecl, bool
> > lowered)
> >>
> >>  void
> >>  cgraph_node::analyze (void)
> >>  {
> >> +  if (native_rtl_p ())
> >> +{
> >> +  analyzed = true;
> >> +  return;
> >> +}
> >
> > So my concern here would be how this interacts with the rest of the cgraph
> > machinery.  Essentially you're saying we've built all the properties for the
> > given code.  But AFAICT that can't be true and cgraph isn't actually aware
> > of any of the properties of the native RTL code (even such things as what
> > functions the native RTL code might call).
> >
> > So I guess my question is how do you ensure that even though cgraph hasn't
> > looked at code that we're appropriately conservative with how the file is
> > processed?  Particularly if there's other code in the source file that is
> > expected to interact with the RTL native code?
> 
> I think that as we're finalizing the function from the FE before the
> cgraph is built (and even throw away the RTL?) we have no other choice
> than treating a __RTL function as a black box, which means treating it
> as possibly calling all functions in the TU and reading/writing/taking
> the address of all decls in the TU.  Consider

I guess the RTL frontend may be arranged to mark all such decls as used, or
just require the user to do it, like we do with asm statements.
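
For the asm case, what the user does today is roughly the following: a minimal sketch using GCC's `used` attribute. The names `counter`, `helper` and `read_counter` are made up for this example, and the asm statement that would reference the statics is left out because its syntax is target-specific.

```c
/* Imagine these statics are referenced only from an asm statement, so
   the compiler cannot see the use.  The `used` attribute tells GCC to
   emit them anyway instead of optimizing them away.  */
static int counter __attribute__ ((used)) = 42;

static int __attribute__ ((used))
helper (void)
{
  return counter + 1;
}

/* Ordinary C reference to COUNTER, present only so that the value can
   be observed from outside this file.  HELPER has no C caller at all
   and survives purely because of the attribute.  */
int
read_counter (void)
{
  return counter;
}
```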

I wonder why we need to insert those definitions into cgraph in the first place...

Honza
> 
> static int i;
> static void foo () {}
> int __RTL main()
> {
>   ... call foo, access i ...
> }
> 
> which probably will right now optimize i and foo away and thus fail to link?
> 
> But I think we can sort out these "details" when we run into them...
> 
> Richard.
> 
> > Jeff
