Re: [PATCH] Better distinguish OpenACC and OpenMP sections in libgomp.texi

2019-01-10 Thread Jakub Jelinek
On Fri, Jan 11, 2019 at 01:03:48AM +, Julian Brown wrote:
> 2019-xx-xx  Thomas Schwinge  
> James Norris  
> 
> * libgomp.texi: Better distinguish OpenACC and OpenMP "Runtime
> Library Routines", and "Environment Variables".

Ok, thanks.

Jakub


[PATCH] PR fortran/35031 -- Check F2018:C1546

2019-01-10 Thread Steve Kargl
An entry-name obtains the elemental attribute from its containing
procedure.  F2018:C1546 prohibits an elemental procedure from having
the BIND attribute.  BIND(C) can appear on the entry-stmt line, so
gfortran needs to check for a conflict.  The attached patch does this check.
Tested on x86_64-*-freebsd.  Ok to commit?

2019-01-10  Steven G. Kargl  

PR fortran/35031
* decl.c (gfc_match_entry): Check for F2018:C1546.  Fix nearby
mis-indentation.
 
2019-01-10  Steven G. Kargl  

PR fortran/35031
* gfortran.dg/pr35031.f90: New test.

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 267825)
+++ gcc/fortran/decl.c	(working copy)
@@ -7431,9 +7431,11 @@ gfc_match_entry (void)
 	  gfc_error ("Missing required parentheses before BIND(C) at %C");
 	  return MATCH_ERROR;
 	}
-	if (!gfc_add_is_bind_c (&(entry->attr), entry->name,
-&(entry->declared_at), 1))
-	  return MATCH_ERROR;
+
+	  if (!gfc_add_is_bind_c (&(entry->attr), entry->name,
+  &(entry->declared_at), 1))
+	return MATCH_ERROR;
+	
 	}
 
   if (!gfc_current_ns->parent
@@ -7514,6 +7516,14 @@ gfc_match_entry (void)
   if (gfc_match_eos () != MATCH_YES)
 {
   gfc_syntax_error (ST_ENTRY);
+  return MATCH_ERROR;
+}
+
+  /* F2018:C1546 An elemental procedure shall not have the BIND attribute.  */
+  if (proc->attr.elemental && entry->attr.is_bind_c)
+{
+  gfc_error ("ENTRY statement at %L with BIND(C) prohibited in an "
+		 "elemental procedure", &entry->declared_at);
   return MATCH_ERROR;
 }
 
Index: gcc/testsuite/gfortran.dg/pr35031.f90
===
--- gcc/testsuite/gfortran.dg/pr35031.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr35031.f90	(working copy)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+elemental subroutine sub2(x)
+   integer, intent(in) :: x
+   entry sub2_c(x) bind(c)! { dg-error "prohibited in an elemental" }
+end subroutine sub2
+
+elemental function func2(x)
+   integer, intent(in) :: x
+   entry func2_c(x) bind(c)   ! { dg-error "prohibited in an elemental" }
+end function func2


[PATCH] Better distinguish OpenACC and OpenMP sections in libgomp.texi

2019-01-10 Thread Julian Brown
Hi,

This patch looks like it should have been attached to the following
email:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01173.html

but it looks like the wrong patch (and ChangeLog!) were attached
instead. For convenience, I'll copy Cesar's blurb (mildly corrected)
from the previous message:

"This patch updates the libgomp documentation to more clearly identify
OpenMP-specific sections. Specifically, the sections "Runtime Library
Routines" and "Environment Variables" are now prefixed by OpenMP, because
those sections are not applicable to OpenACC."

I've re-checked that the generated libgomp.pdf looks ok.

OK? (Documentation, so should be OK for stage 4, IIUC.)

Thanks,

Julian

ChangeLog

2019-xx-xx  Thomas Schwinge  
James Norris  

* libgomp.texi: Better distinguish OpenACC and OpenMP "Runtime
Library Routines", and "Environment Variables".
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 4991271..e2e384a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -95,10 +95,12 @@ changed to GNU Offloading and Multi Processing Runtime Library.
 @comment
 @menu
 * Enabling OpenMP::How to enable OpenMP for your applications.
-* Runtime Library Routines::   The OpenMP runtime application programming 
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+   The OpenMP runtime application programming
interface.
-* Environment Variables::  Influencing runtime behavior with environment 
-   variables.
+* OpenMP Environment Variables: Environment Variables.
+   Influencing OpenMP runtime behavior with
+   environment variables.
 * Enabling OpenACC::   How to enable OpenACC for your
applications.
 * OpenACC Runtime Library Routines:: The OpenACC runtime application
@@ -144,11 +146,11 @@ version 4.5.
 
 
 @c -
-@c Runtime Library Routines
+@c OpenMP Runtime Library Routines
 @c -
 
 @node Runtime Library Routines
-@chapter Runtime Library Routines
+@chapter OpenMP Runtime Library Routines
 
 The runtime routines described here are defined by Section 3 of the OpenMP
 specification in version 4.5.  The routines are structured in following
@@ -1327,11 +1329,11 @@ guaranteed not to change during the execution of the program.
 
 
 @c -
-@c Environment Variables
+@c OpenMP Environment Variables
 @c -
 
 @node Environment Variables
-@chapter Environment Variables
+@chapter OpenMP Environment Variables
 
 The environment variables which beginning with @env{OMP_} are defined by
 section 4 of the OpenMP specification in version 4.5, while those


Re: [Patch, fortran] Fix PR59345, repacking of a packed temporary array

2019-01-10 Thread Steve Kargl
On Thu, Jan 10, 2019 at 09:17:37PM +0100, Thomas Koenig wrote:
> 
> the attached patch fixes a rather bad missed optimization, where
> the generated temporary array for
> 
> SUBROUTINE S1(A)
>   REAL :: A(3)
>   CALL S2(-A)
> END SUBROUTINE
> 
> was packed and unpacked(!).
> 
> Regression-tested. OK for trunk?
> 

Yes.

-- 
Steve


[PATCH 10/10] libiberty: Correct an invalid assumption

2019-01-10 Thread Ben L
Hi all,

First time emailing gcc-patches, so I'm sorry if I get any of this wrong or if
there are obvious errors repeated in my patches. AFAICT I should be sending each
change individually rather than as one bulk patch, so I'm sorry about the spam
too.

All of these changes were found by fuzzing libiberty's demanglers over the
past week, and I have at least one more that it's currently crashing out on
but I haven't had time to look into why yet.

Obviously since this is my first time emailing I don't have write access to
commit any of these, so if any are approved then I'd be grateful if you can
commit them too.

Thanks,
Ben

--

As a counter example: 888 * 10 = -3344831479658869200, which is
valid for 64 bit longs, and evidently divisible by 10.

Also safely check that adding the digit won't cause an overflow too.

No testcase provided since one of the previous testcases flagged this issue up.

 * d-demangle.c: Include <limits.h> if available.
 (LONG_MAX): Define if necessary.
 (dlang_number): Fix overflow.
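
For illustration, the two checks can be sketched in isolation.  This is a
minimal standalone example rather than the libiberty code: parse_decimal is a
made-up name, and it only handles unsigned runs of ASCII digits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of libiberty): parse the leading decimal
   digits of S into *RET.  Returns a pointer past the digits, or NULL if
   there are no digits or the value would not fit in a long.  */
static const char *
parse_decimal (const char *s, long *ret)
{
  assert (s != NULL && ret != NULL);

  if (*s < '0' || *s > '9')
    return NULL;

  *ret = 0;
  while (*s >= '0' && *s <= '9')
    {
      long digit = *s - '0';

      /* Check before multiplying: *ret * 10 must not exceed LONG_MAX.  */
      if (*ret > LONG_MAX / 10)
        return NULL;
      *ret *= 10;

      /* Check before adding: *ret + digit must not exceed LONG_MAX.  */
      if (digit > LONG_MAX - *ret)
        return NULL;
      *ret += digit;

      s++;
    }

  return s;
}
```

Both comparisons happen before the arithmetic they guard, so no signed
overflow is ever executed and nothing depends on the value an overflowed
multiplication happens to produce.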

From 6dc14e124c4a48928046403faca37504229b13c4 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:57:08 +
Subject: [PATCH 10/10] libiberty: Correct an invalid assumption.

As a counter example: 888 * 10 = -3344831479658869200, which is
valid for 64 bit longs, and evidently divisible by 10.

Also safely check that adding the digit won't cause an overflow too.

No testcase provided since one of the previous testcases flagged this issue up.

* d-demangle.c: Include <limits.h> if available.
(LONG_MAX): Define if necessary.
(dlang_number): Fix overflow.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index becc402..4ffcdd1 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -42,6 +42,13 @@ If not, see .  */
 #include 
 #endif
 
+#ifdef HAVE_LIMITS_H
+#include <limits.h>
+#endif
+#ifndef LONG_MAX
+# define LONG_MAX  (long)(((unsigned long) ~0) >> 1)
+#endif
+
 #include 
 #include "libiberty.h"
 
@@ -206,15 +213,18 @@ dlang_number (const char *mangled, long *ret)
 
   while (ISDIGIT (*mangled))
 {
+  long digit = mangled[0] - '0';
+  mangled++;
+
+  if (*ret > LONG_MAX / 10)
+	return NULL;
+
   (*ret) *= 10;
 
-  /* If an overflow occured when multiplying by ten, the result
-	 will not be a multiple of ten.  */
-  if ((*ret % 10) != 0)
+  if (LONG_MAX - digit < *ret)
 	return NULL;
 
-  (*ret) += mangled[0] - '0';
-  mangled++;
+  (*ret) += digit;
 }
 
   if (*mangled == '\0' || *ret < 0)
-- 
2.20.1



[PATCH 09/10] libiberty: Correctly handle error result in dlang_parse_assocarray()

2019-01-10 Thread Ben L

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

 * d-demangle.c (dlang_parse_assocarray): Correctly handle error result.
 * testsuite/d-demangle-expected: Add testcase.
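
The pattern behind this fix can be sketched standalone.  This is an
illustration only, with made-up names (parse_element, parse_list) and a fixed
output buffer instead of libiberty's string type: the loop bails out as soon
as an element fails to parse, so a bogus element count cannot keep appending
separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element parser: accepts one lowercase letter, appends it to
   the NUL-terminated string in OUT (capacity CAP), and returns a pointer
   past it, or NULL on malformed input or lack of space.  */
static const char *
parse_element (char *out, size_t cap, const char *in)
{
  size_t len = strlen (out);

  if (*in < 'a' || *in > 'z' || len + 2 > cap)
    return NULL;

  out[len] = *in;
  out[len + 1] = '\0';
  return in + 1;
}

/* Hypothetical list parser: parse COUNT elements separated by ", " in the
   output.  Returns the remaining input, or NULL on error.  */
static const char *
parse_list (char *out, size_t cap, const char *in, long count)
{
  assert (out != NULL && in != NULL);

  while (count-- > 0)
    {
      in = parse_element (out, cap, in);
      /* Propagate the error instead of carrying on with the loop.  */
      if (in == NULL)
        return NULL;

      if (count != 0)
        {
          size_t len = strlen (out);

          if (len + 3 > cap)
            return NULL;
          memcpy (out + len, ", ", 3);
        }
    }

  return in;
}
```

With the check in place, an input that claims a million elements but supplies
one stops after the second parse attempt rather than looping a million times.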

From f3dd4107d4bd59b7f3370b17b25c9fd35d499ea3 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:46:30 +
Subject: [PATCH 09/10] libiberty: Correctly handle error result in
 dlang_parse_assocarray().

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

* d-demangle.c (dlang_parse_assocarray): Correctly handle error result.
* testsuite/d-demangle-expected: Add testcase.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index e98118e..becc402 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -1217,8 +1217,13 @@ dlang_parse_assocarray (string *decl, const char *mangled)
   while (elements--)
 {
   mangled = dlang_value (decl, mangled, NULL, '\0');
+  if (mangled == NULL)
+	return NULL;
+
   string_append (decl, ":");
   mangled = dlang_value (decl, mangled, NULL, '\0');
+  if (mangled == NULL)
+	return NULL;
 
   if (elements != 0)
 	string_append (decl, ", ");
diff --git a/libiberty/testsuite/d-demangle-expected b/libiberty/testsuite/d-demangle-expected
index 44a8d3b..490d4e1 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -1322,3 +1322,7 @@ _D7__T2fnVlS8S5888S6S5
 --format=dlang
 _D1_B6961*
 _D1_B6961*
+# Could crash
+--format=dlang
+_D5__T1fVHacA66_
+_D5__T1fVHacA66_
-- 
2.20.1



[PATCH 07/10] libiberty: Correctly handle error result in dlang_parse_structlit()

2019-01-10 Thread Ben L

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

 * d-demangle.c (dlang_parse_structlit): Correctly handle error result.
 * testsuite/d-demangle-expected: Add testcase.

From 4911e6f481472b732277cc9b2136b0846474bb4a Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:37:41 +
Subject: [PATCH 07/10] libiberty: Correctly handle error result in
 dlang_parse_structlit().

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

* d-demangle.c (dlang_parse_structlit): Correctly handle error result.
* testsuite/d-demangle-expected: Add testcase.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 303d2ee..5590417 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -1246,6 +1246,9 @@ dlang_parse_structlit (string *decl, const char *mangled, const char *name)
   while (args--)
 {
   mangled = dlang_value (decl, mangled, NULL, '\0');
+  if (mangled == NULL)
+	return NULL;
+
   if (args != 0)
 	string_append (decl, ", ");
 }
diff --git a/libiberty/testsuite/d-demangle-expected b/libiberty/testsuite/d-demangle-expected
index 19665f5..0a5f9da 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -1314,3 +1314,7 @@ _D8__T2fnVa8_
 --format=dlang
 _D5__T2fnVmA1A1A9D
 _D5__T2fnVmA1A1A9D
+# Could crash
+--format=dlang
+_D7__T2fnVlS8S5888S6S5
+_D7__T2fnVlS8S5888S6S5
-- 
2.20.1



[PATCH 08/10] libiberty: Correctly handle error result in dlang_parse_tuple()

2019-01-10 Thread Ben L

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

 * d-demangle.c (dlang_parse_tuple): Correctly handle error result.
 * testsuite/d-demangle-expected: Add testcase.

From 7491ea105fd8d1d7887884594d30486ecf2cac08 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:40:48 +
Subject: [PATCH 08/10] libiberty: Correctly handle error result in
 dlang_parse_tuple().

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

* d-demangle.c (dlang_parse_tuple): Correctly handle error result.
* testsuite/d-demangle-expected: Add testcase.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 5590417..e98118e 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -1503,6 +1503,9 @@ dlang_parse_tuple (string *decl, const char *mangled)
   while (elements--)
 {
   mangled = dlang_type (decl, mangled);
+  if (mangled == NULL)
+	return NULL;
+
   if (elements != 0)
 	string_append (decl, ", ");
 }
diff --git a/libiberty/testsuite/d-demangle-expected b/libiberty/testsuite/d-demangle-expected
index 0a5f9da..44a8d3b 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -1318,3 +1318,7 @@ _D5__T2fnVmA1A1A9D
 --format=dlang
 _D7__T2fnVlS8S5888S6S5
 _D7__T2fnVlS8S5888S6S5
+# Could crash
+--format=dlang
+_D1_B6961*
+_D1_B6961*
-- 
2.20.1



[PATCH 05/10] libiberty: Fix stack underflow in dlang_parse_integer()

2019-01-10 Thread Ben L

A char array of size 10 was created on the stack to hold the decimal
representation of a long, which on my platform is 64 bits and hence has a
maximum value of 9223372036854775807, far exceeding 10 characters.

Fix this by bumping the size of the array to 20 characters.

 * d-demangle.c (dlang_parse_integer): Fix stack underflow.
 * testsuite/d-demangle-expected: Add testcase.
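
As an aside, the buffer size can also be derived from the type instead of
being a literal.  The sketch below is only an illustration under that
assumption, not what the patch does: LONG_DIGITS_MAX and format_nonneg_long
are made-up names, and the digits are written from the end of the scratch
buffer backwards, as in dlang_parse_integer().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Upper bound on the decimal digits of a non-negative long: every three
   bits hold less than one decimal digit (2^3 < 10), so bits / 3 rounded
   up is always enough.  */
#define LONG_DIGITS_MAX ((sizeof (long) * CHAR_BIT + 2) / 3)

/* Hypothetical helper: write the decimal form of VAL (>= 0) into OUT,
   which has room for CAP bytes including the terminating NUL.  Returns
   the number of digits written, or 0 if OUT is too small.  */
static size_t
format_nonneg_long (long val, char *out, size_t cap)
{
  char tmp[LONG_DIGITS_MAX];
  size_t pos = sizeof (tmp);
  size_t len;

  assert (out != NULL && val >= 0);

  /* Fill the scratch buffer from the end towards the start.  */
  do
    {
      tmp[--pos] = (char) ('0' + val % 10);
      val /= 10;
    }
  while (val > 0);

  len = sizeof (tmp) - pos;
  if (len + 1 > cap)
    return 0;

  memcpy (out, &tmp[pos], len);
  out[len] = '\0';
  return len;
}
```

Because the bound follows sizeof (long), the index cannot run past the start
of the scratch buffer on either 32-bit or 64-bit longs.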

From 56a6202c87543dbf0a15d99e4dcb01507bf70f57 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:24:19 +
Subject: [PATCH 05/10] libiberty: Fix stack underflow in
 dlang_parse_integer().

A char array of size 10 was created on the stack to hold the decimal
representation of a long, which on my platform is 64 bits and hence has a
maximum value of 9223372036854775807, far exceeding 10 characters.

Fix this by bumping the size of the array to 20 characters.

* d-demangle.c (dlang_parse_integer): Fix stack underflow.
* testsuite/d-demangle-expected: Add testcase.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 8acbf04..114d9e0 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -939,8 +939,8 @@ dlang_parse_integer (string *decl, const char *mangled, char type)
   if (type == 'a' || type == 'u' || type == 'w')
 {
   /* Parse character value.  */
-  char value[10];
-  int pos = 10;
+  char value[20];
+  int pos = sizeof(value);
   int width = 0;
   long val;
 
@@ -991,7 +991,7 @@ dlang_parse_integer (string *decl, const char *mangled, char type)
 	  for (; width > 0; width--)
 	value[--pos] = '0';
 
-	  string_appendn (decl, &(value[pos]), 10 - pos);
+	  string_appendn (decl, &(value[pos]), sizeof(value) - pos);
 	}
   string_append (decl, "'");
 }
diff --git a/libiberty/testsuite/d-demangle-expected b/libiberty/testsuite/d-demangle-expected
index 547a2dd..9988238 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -1306,3 +1306,7 @@ rt.lifetime._d_newarrayOpT!(_d_newarrayiT)._d_newarrayOpT(const(TypeInfo), ulong
 --format=dlang
 _D4core8demangle16__T6mangleTFZPvZ6mangleFNaNbNfAxaAaZ11DotSplitter5emptyMxFNaNbNdNiNfZb
 core.demangle.mangle!(void*() function).mangle(const(char)[], char[]).DotSplitter.empty() const
+# Could crash
+--format=dlang
+_D8__T2fnVa8_
+_D8__T2fnVa8_
-- 
2.20.1



[PATCH 06/10] libiberty: Correctly handle error result in dlang_parse_arrayliteral()

2019-01-10 Thread Ben L

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

 * d-demangle.c (dlang_parse_arrayliteral): Correctly handle error result.
 * testsuite/d-demangle-expected: Add testcase.

From 8eca61f41b70891f4e2c456c4a12c06d3b4f3a3f Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:33:27 +
Subject: [PATCH 06/10] libiberty: Correctly handle error result in
 dlang_parse_arrayliteral().

The number of elements was being taken as valid and, for each one, a separator
was appended to the output, resulting in huge memory bloat before crashing
later on due to a signed integer overflow.

* d-demangle.c (dlang_parse_arrayliteral): Correctly handle error result.
* testsuite/d-demangle-expected: Add testcase.

diff --git a/libiberty/d-demangle.c b/libiberty/d-demangle.c
index 114d9e0..303d2ee 100644
--- a/libiberty/d-demangle.c
+++ b/libiberty/d-demangle.c
@@ -1191,6 +1191,9 @@ dlang_parse_arrayliteral (string *decl, const char *mangled)
   while (elements--)
 {
   mangled = dlang_value (decl, mangled, NULL, '\0');
+  if (mangled == NULL)
+	return NULL;
+
   if (elements != 0)
 	string_append (decl, ", ");
 }
diff --git a/libiberty/testsuite/d-demangle-expected b/libiberty/testsuite/d-demangle-expected
index 9988238..19665f5 100644
--- a/libiberty/testsuite/d-demangle-expected
+++ b/libiberty/testsuite/d-demangle-expected
@@ -1310,3 +1310,7 @@ core.demangle.mangle!(void*() function).mangle(const(char)[], char[]).DotSplitte
 --format=dlang
 _D8__T2fnVa8_
 _D8__T2fnVa8_
+# Could crash
+--format=dlang
+_D5__T2fnVmA1A1A9D
+_D5__T2fnVmA1A1A9D
-- 
2.20.1



[PATCH 04/10] libiberty: Fix crash in ada_demangle()

2019-01-10 Thread Ben L

The output buffer is pre-allocated to a maximum size under the assumption that
special names can only occur once; however, nothing was enforcing this for
stream attributes.

To fix this we treat stream attributes that appear before the end of the
mangled input as an error.

 * cplus-dem.c (ada_demangle): Only accept stream attributes if they're at
 the end of the input.
 * testsuite/demangle-expected: Add testcase.

From c8dd053c841e9b04583ad6c6bf4550d30aa47990 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:18:14 +
Subject: [PATCH 04/10] libiberty: Fix crash in ada_demangle().

The output buffer is pre-allocated to a maximum size under the assumption that
special names can only occur once; however, nothing was enforcing this for
stream attributes.

To fix this we treat stream attributes that appear before the end of the
mangled input as an error.

* cplus-dem.c (ada_demangle): Only accept stream attributes if they're at
the end of the input.
* testsuite/demangle-expected: Add testcase.

diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c
index afceed2..245cf11 100644
--- a/libiberty/cplus-dem.c
+++ b/libiberty/cplus-dem.c
@@ -254,6 +254,8 @@ ada_demangle (const char *mangled, int option ATTRIBUTE_UNUSED)
   p = mangled;
   while (1)
 {
+  int stream = 0;
+
   /* An entity names is expected.  */
   if (ISLOWER (*p))
 {
@@ -363,6 +365,7 @@ ada_demangle (const char *mangled, int option ATTRIBUTE_UNUSED)
   goto unknown;
 }
   p += 2;
+  stream = 1;
   strcpy (d, name);
   d += strlen (name);
 }
@@ -437,6 +440,10 @@ ada_demangle (const char *mangled, int option ATTRIBUTE_UNUSED)
   else
 goto unknown;
 }
+  else if (stream)
+{
+  goto unknown;
+}
   else
 {
   *d++ = '.';
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index f21ed00..8b830b6 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -81,6 +81,10 @@ _ZZaSFvOEES_
 
 _ZZeqFvOEES_z
 _ZZeqFvOEES_z
+# Could crash
+--format=gnat
+lSO__lSO
+
 #
 # demangler/80513 Test for bogus characters after __thunk_
 
-- 
2.20.1



[PATCH 03/10] libiberty: Fix a crash in d_print_comp_inner()

2019-01-10 Thread Ben L

'typed_name' is checked before the loop, but not checked after every
iteration. This can cause a crash if the input buffer is malformed since
'typed_name' can be assigned NULL.

To fix this, break out of the loop if we see it's NULL and handle that case
afterwards.

 * cp-demangle.c (d_print_comp_inner): Guard against a NULL 'typed_name'.
 * testsuite/demangle-expected: Add testcase.
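
The shape of the fix, reduced to a standalone sketch.  The struct and function
names here (node, skip_qualifiers, base_id) are invented for the example and
do not exist in cp-demangle.c: the NULL test is part of the loop condition,
and the error case is handled once after the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a demangle component.  */
struct node
{
  int is_qualifier;    /* Non-zero for a qualifier wrapper.  */
  int id;              /* Payload of a non-qualifier node.  */
  struct node *left;   /* Wrapped node; may be NULL for malformed input.  */
};

/* Skip leading qualifier nodes.  Returns the first non-qualifier node, or
   NULL if the chain ends before one is found.  */
static struct node *
skip_qualifiers (struct node *n)
{
  while (n != NULL && n->is_qualifier)
    n = n->left;

  return n;
}

/* Returns the id of the first non-qualifier node, or -1 if the chain is
   malformed.  */
static int
base_id (struct node *chain)
{
  struct node *n = skip_qualifiers (chain);

  /* Handle the NULL case once, after the loop.  */
  if (n == NULL)
    return -1;

  assert (!n->is_qualifier);
  return n->id;
}
```

Checking only before the loop, as the old code did, misses the case where the
pointer becomes NULL on a later iteration.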

From 3b36d9788fb9fe08ed9c83a57fb18bbfdc903543 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:13:26 +
Subject: [PATCH 03/10] libiberty: Fix a crash in d_print_comp_inner().

'typed_name' is checked before the loop, but not checked after every
iteration. This can cause a crash if the input buffer is malformed since
'typed_name' can be assigned NULL.

To fix this, break out of the loop if we see it's NULL and handle that case
afterwards.

* cp-demangle.c (d_print_comp_inner): Guard against a NULL 'typed_name'.
* testsuite/demangle-expected: Add testcase.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 02b5f9e..8ab0cd5 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -4757,12 +4757,8 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
 	typed_name = d_right (typed_name);
 	if (typed_name->type == DEMANGLE_COMPONENT_DEFAULT_ARG)
 	  typed_name = typed_name->u.s_unary_num.sub;
-	if (typed_name == NULL)
-	  {
-		d_print_error (dpi);
-		return;
-	  }
-	while (is_fnqual_component_type (typed_name->type))
+	while (typed_name != NULL
+		   && is_fnqual_component_type (typed_name->type))
 	  {
 		if (i >= sizeof adpm / sizeof adpm[0])
 		  {
@@ -4781,6 +4777,11 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
 
 		typed_name = d_left (typed_name);
 	  }
+	if (typed_name == NULL)
+	  {
+		d_print_error (dpi);
+		return;
+	  }
 	  }
 
 	/* If typed_name is a template, then it applies to the
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index eb5264d..f21ed00 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -77,6 +77,10 @@ _ZmmAtl
 _ZZaSFvOEES_
 _ZZaSFvOEES_
 _ZZaSFvOEES_
+# Could crash
+
+_ZZeqFvOEES_z
+_ZZeqFvOEES_z
 #
 # demangler/80513 Test for bogus characters after __thunk_
 
-- 
2.20.1



[PATCH 01/10] libiberty: Fix an out of bounds read in d_expression_1()

2019-01-10 Thread Ben L

Passing "_ZmmAtl" to cplus_demangle() causes it to read past the end of the
input buffer. This is because cplus_demangle_type() may advance the current
offset so when control returns to d_expression_1() the current char may now
be the last valid byte and hence we cannot peek at the next char.

Fixed this by checking that the current char is still valid before checking
that the next char is too.

 * cp-demangle.c (d_expression_1): Don't peek ahead unless the current
 char is valid.
 * testsuite/demangle-expected: Add testcase.
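
A minimal standalone sketch of why the extra check is needed.  The names
below (cursor, peek_char, peek_next_char, has_two_more_chars) are stand-ins
invented for this example, loosely modelled on d_peek_char and
d_peek_next_char, and are not the cp-demangle.c definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser state: N points at the current character of a
   NUL-terminated input.  */
struct cursor
{
  const char *n;
};

/* Always safe: reads the current character, which may be the NUL.  */
static char
peek_char (const struct cursor *c)
{
  return *c->n;
}

/* Only safe when the current character is not the terminating NUL,
   otherwise this reads one byte past the end of the input.  */
static char
peek_next_char (const struct cursor *c)
{
  return c->n[1];
}

/* Returns non-zero if at least two more characters are available.  */
static int
has_two_more_chars (const struct cursor *c)
{
  assert (c != NULL && c->n != NULL);

  /* Short-circuit evaluation keeps the second read in bounds.  */
  return peek_char (c) != '\0' && peek_next_char (c) != '\0';
}
```

If an earlier parsing step has already consumed input up to the terminator,
the first test fails and the look-ahead is never performed.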

From dadc7d7812e0c42c4a7c8c1f0525c4a11e0bd229 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 21:50:59 +
Subject: [PATCH 01/10] libiberty: Fix an out of bounds read in
 d_expression_1().

Passing "_ZmmAtl" to cplus_demangle() causes it to read past the end of the
input buffer. This is because cplus_demangle_type() may advance the current
offset so when control returns to d_expression_1() the current char may now
be the last valid byte and hence we cannot peek at the next char.

Fixed this by checking that the current char is still valid before checking
that the next char is too.

* cp-demangle.c (d_expression_1): Don't peek ahead unless the current
char is valid.
* testsuite/demangle-expected: Add testcase.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 4624cd5..8f6 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -3353,7 +3353,7 @@ d_expression_1 (struct d_info *di)
   d_advance (di, 2);
   if (peek == 't')
 	type = cplus_demangle_type (di);
-  if (!d_peek_next_char (di))
+  if (!d_peek_char (di) || !d_peek_next_char (di))
 	return NULL;
   return d_make_comp (di, DEMANGLE_COMPONENT_INITIALIZER_LIST,
 			  type, d_exprlist (di, 'E'));
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index 3723b7a..328d51a 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -68,6 +68,10 @@ _$_H1R
 
 _Q8ccQ4M2e.
 _Q8ccQ4M2e.
+# Could crash
+
+_ZmmAtl
+_ZmmAtl
 #
 # demangler/80513 Test for bogus characters after __thunk_
 
-- 
2.20.1



[PATCH 02/10] libiberty: Fix a crash in d_encoding()

2019-01-10 Thread Ben L

Passing "_ZZaSFvOEES_" to cplus_demangle() without the DMGL_PARAMS flag causes
a crash due to d_right (dc) returning NULL inside d_encoding().

Check for this case and handle it as an error rather than crashing when trying
to dereference the right side's type.

 * cp-demangle.c (d_encoding): Guard against NULL return values from
 d_right (dc).
 * testsuite/demangle-expected: Add testcase.

From 5102da933a72628e34b68402168e571b09c54581 Mon Sep 17 00:00:00 2001
From: bobsayshilol 
Date: Wed, 9 Jan 2019 22:05:16 +
Subject: [PATCH 02/10] libiberty: Fix a crash in d_encoding().

Passing "_ZZaSFvOEES_" to cplus_demangle() without the DMGL_PARAMS flag causes
a crash due to d_right (dc) returning NULL inside d_encoding().

Check for this case and handle it as an error rather than crashing when trying
to dereference the right side's type.

* cp-demangle.c (d_encoding): Guard against NULL return values from
d_right (dc).
* testsuite/demangle-expected: Add testcase.

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 8f6..02b5f9e 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -1330,8 +1330,14 @@ d_encoding (struct d_info *di, int top_level)
 	 really apply here; this happens when parsing a class
 	 which is local to a function.  */
 	  if (dc->type == DEMANGLE_COMPONENT_LOCAL_NAME)
-	while (is_fnqual_component_type (d_right (dc)->type))
-	  d_right (dc) = d_left (d_right (dc));
+	{
+	  while (d_right (dc) != NULL
+		 && is_fnqual_component_type (d_right (dc)->type))
+		d_right (dc) = d_left (d_right (dc));
+
+	  if (d_right (dc) == NULL)
+		dc = NULL;
+	}
 	}
   else
 	{
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index 328d51a..eb5264d 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -72,6 +72,11 @@ _Q8ccQ4M2e.
 
 _ZmmAtl
 _ZmmAtl
+# Could crash
+--no-params
+_ZZaSFvOEES_
+_ZZaSFvOEES_
+_ZZaSFvOEES_
 #
 # demangler/80513 Test for bogus characters after __thunk_
 
-- 
2.20.1



Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-10 Thread Steve Ellcey
OK, I fixed the issues in your last email.  I initially found one
regression while testing.  In lra_create_live_ranges_1 I had removed
the 'call_p = false' statement but did not replace it with anything.
This resulted in no regressions on aarch64 but caused a single
regression on x86 (gcc.target/i386/pr87759.c).  I replaced the
line with 'call_insn = NULL' and the regression went away so I
have clean bootstraps and no regressions on aarch64 and x86 now.

If this looks good to you can I go ahead and check it in?  I know
we are in Stage 3 now, but my recollection is that patches that were
initially submitted during Stage 1 could go ahead once approved.

Steve Ellcey
sell...@marvell.com



2019-01-10  Steve Ellcey  

* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
(aarch64_hard_regno_call_part_clobbered): Add insn argument.
(aarch64_return_call_with_max_clobbers): New function.
(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
argument.
* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
* cselib.c (cselib_process_insn): Add argument to
targetm.hard_regno_call_part_clobbered call.
* ira-conflicts.c (ira_build_conflicts): Ditto.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* lra-constraints.c (inherit_reload_reg): Ditto.
* lra-int.h (struct lra_reg): Add call_insn field, remove call_p field.
* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
argument.  Call targetm.return_call_with_max_clobbers.
Add argument to targetm.hard_regno_call_part_clobbered call.
(calls_have_same_clobbers_p): New function.
(process_bb_lives): Add call_insn and last_call_insn variables.
Pass call_insn to check_pseudos_live_through_calls.
Modify if stmt to check targetm.return_call_with_max_clobbers.
Update setting of flush variable.
(lra_create_live_ranges_1): Set call_insn to NULL instead of call_p
to false.
* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
* regcprop.c (copyprop_hardreg_forward_1): Add argument to
targetm.hard_regno_call_part_clobbered call.
* reginfo.c (choose_hard_reg_mode): Ditto.
* regrename.c (check_new_reg_p): Ditto.
* reload.c (find_equiv_reg): Ditto.
* reload1.c (emit_reload_insns): Ditto.
* sched-deps.c (deps_analyze_insn): Ditto.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
* target.def (hard_regno_call_part_clobbered): Add insn argument.
(return_call_with_max_clobbers): New target function.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
* hooks.c (hook_bool_uint_mode_false): Change to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c300af..7a1f838 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1655,14 +1655,53 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
the lower 64 bits of a 128-bit register.  Tell the compiler the callee
clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+	machine_mode mode)
+{
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno)
+	 && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  

Re: [v3 PATCH] Implement LWG 2221, No formatted output operator for nullptr

2019-01-10 Thread Jonathan Wakely

On 10/01/19 22:27 +0100, Rainer Orth wrote:

Hi Jonathan,


On 04/12/17 23:04 +, Jonathan Wakely wrote:

On 03/12/17 23:08 +0200, Ville Voutilainen wrote:

Tested on Linux-x64.

2017-11-14  Ville Voutilainen  

  Implement LWG 2221
  * include/std/ostream (operator<<(nullptr_t)): New.
  * testsuite/27_io/basic_ostream/inserters_other/char/lwg2221.cc: New.



diff --git a/libstdc++-v3/include/std/ostream
b/libstdc++-v3/include/std/ostream
index f7cab03..18011bc 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -245,6 +245,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 operator<<(const void* __p)
 { return _M_insert(__p); }

+#if __cplusplus > 201402L
+  __ostream_type&
+  operator<<(nullptr_t)
+  { return *this << "nullptr"; }
+#endif


As discussed on IRC, this requires a new symbol to be exported for the
std::ostream and std::wostream explicit instantiations, or the new
test will fail to link at -O0.

That should wait for stage 1.



This patch for a C++17 feature (posted over a year ago) should have
gone in during stage 1. I've taken care of the symbol exports that
were missing from the original patch.

Tested x86_64-linux, committed to trunk.


this patch broke Solaris bootstrap:

ld: fatal: libstdc++-symbols.ver-sun: 7117: symbol 'std::basic_ostream >::operator<<(decltype(nullptr))': symbol version conflict
ld: fatal: libstdc++-symbols.ver-sun: 7119: symbol 'std::basic_ostream >::operator<<(decltype(nullptr))': symbol version conflict

ld: fatal: libstdc++-symbols.ver-sun: 7117: symbol '_ZNSolsEDn': symbol version conflict
ld: fatal: libstdc++-symbols.ver-sun: 7119: symbol '_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn': symbol version conflict

Again, there were two matches for those two symbols:

 GLIBCXX_3.4
   ##_ZNSolsE*[^Dg] (glob)
   _ZNSolsEDn;
 GLIBCXX_3.4.26
   ##_ZNSolsEDn (glob)
   _ZNSolsEDn;

 GLIBCXX_3.4
   ##_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE*[^Dg] (glob)
   _ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn;
 GLIBCXX_3.4.26
   ##_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn (glob)
   _ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn;

ISTM that the patterns were backwards.  The following patch fixes this
and allowed i386-pc-solaris2.11 bootstrap to complete without
regressions relative to the last successful one.


I think what I should have done is change [^g] to [^gn]. That
preserves the original behaviour (don't match the ppc64 long double
symbols) but also excludes the new symbols, which end in 'n'.

Maybe the attached patch would be better though. It matches every
basic_ostream::operator<<(T) for any scalar T except 'g', and adds a
second pattern to match basic_ostream::operator<<(T*) for various T.
But neither of those matches the new operator<<(nullptr_t) overload.
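
The pattern behaviour discussed above can be checked with a small sketch.
It assumes that the linker's globbing behaves like POSIX fnmatch() for these
patterns, which may not hold for every linker.  glob_matches and
count_matches are hypothetical helper names.  The sketch spells bracket
negation as "!" because that is the form POSIX specifies for fnmatch(),
whereas the version script itself uses "^".

```c
#include <fnmatch.h>

/* Return 1 if the glob PATTERN matches the mangled name SYMBOL.  */
int
glob_matches (const char *pattern, const char *symbol)
{
  return fnmatch (pattern, symbol, 0) == 0;
}

/* Hypothetical helper: count how many of the N patterns match SYMBOL.
   A symbol matched by patterns in two different version nodes is the
   kind of conflict reported above.  */
int
count_matches (const char *const *patterns, int n, const char *symbol)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    count += glob_matches (patterns[i], symbol);
  return count;
}
```

With the old pattern, "*" can consume the "D" and the bracket then accepts
the trailing "n", so _ZNSolsEDn is matched.  A single-character bracket
directly after "lsE", or a "P*" suffix, cannot match the two-character
"Dn" suffix.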


FWIW I did run my symbol checker script, but it doesn't understand the
#if preprocessor conditions, so it reports lots of false positive
duplicates. I need to make it smarter for it to be useful here.


diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 788c2e0303c..d3431d2c78e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -495,7 +495,8 @@ GLIBCXX_3.4 {
 _ZNSo8_M_writeEPKc[ilx];
 _ZNSo3put*;
 _ZNSo[5-9][a-z]*;
-_ZNSolsE*[^Dg];
+_ZNSolsE[^g];
+_ZNSolsEP*;
 
 # std::basic_ostream
 _ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]Ev;
@@ -509,7 +510,8 @@ GLIBCXX_3.4 {
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE5writeEPKw*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE6sentry*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE8_M_writeEPKw[ilx];
-_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE*[^Dg];
+_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE[^g];
+_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEP*;
 
 # std::ostream operators and inserters
 _ZSt4end[ls]I[cw]St11char_traitsI[cw]EERSt13basic_ostream*;


Re: [PATCH] Don't use align > MAX_SUPPORTED_STACK_ALIGNMENT in assign_stack_temp_for_type (PR bootstrap/88450)

2019-01-10 Thread Jakub Jelinek
On Thu, Jan 10, 2019 at 03:11:18PM -0800, H.J. Lu wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, but that doesn't mean
> > much, because MAX_SUPPORTED_STACK_ALIGNMENT there is 1 << 28.  Guess more
> > useful would be to test it on mingw where BIGGEST_ALIGNMENT is often higher
> > than MAX_SUPPORTED_STACK_ALIGNMENT.
> 
> FWIW, MAX_SUPPORTED_STACK_ALIGNMENT is an arbitrarily large value for
> Linux/x86 since we track and align stack as needed.

Yes, I know.  Which is why I've said that the passed bootstrap/regtest on
{x86_64,i686}-linux isn't really meaningful, since the patch probably never
changed anything at all there, and it is probably not really useful on
powerpc* either, as MAX_SUPPORTED_STACK_ALIGNMENT is usually 128 there and
BIGGEST_ALIGNMENT is 128 as well.

Jakub


Re: [PATCH] Fix float*v2div2sf2* patterns (PR target/88785)

2019-01-10 Thread Uros Bizjak
On Thu, Jan 10, 2019 at 11:20 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs in dwarf2out.c, because a few sse.md patterns
> contain invalid RTL, in particular
> (const_vector:V2SF [(const_int 0) (const_int 0)])
> Elements of a V2SF const_vector should be (const_double:SF 0), not
> (const_int 0).  Unfortunately, we can't explicitly write const_double 0
> constants the way one can write (const_int 0), so this patch uses
> separate define_expands to add those CONST0_RTX args and match_operands
> with "const0_operand" "C" to match them.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-01-10  Jakub Jelinek  
>
> PR target/88785
> * config/i386/sse.md (floatv2div2sf2): Turn into
> define_expand.
> (*floatv2div2sf2): New define_insn.
> (floatv2div2sf2_mask): Turn into define_expand.
> (*floatv2div2sf2_mask): New define_insn.
> (*floatv2div2sf2_mask_1): Replace
> subrtxes (const_vector:V2SF [(const_int 0) (const_int 0)]) with
> match_operands with "const0_operand" "C".
>
> * g++.target/i386/pr88785.C: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-01-10 11:43:17.520326228 +0100
> +++ gcc/config/i386/sse.md  2019-01-10 12:57:52.946797987 +0100
> @@ -5222,11 +5222,19 @@ (define_insn "float (set_attr "prefix" "evex")
> (set_attr "mode" "")])
>
> -(define_insn "floatv2div2sf2"
> +(define_expand "floatv2div2sf2"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
> (vec_concat:V4SF
> (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" 
> "vm"))
> -   (const_vector:V2SF [(const_int 0) (const_int 0)])))]
> +   (match_dup 2)))]
> +  "TARGET_AVX512DQ && TARGET_AVX512VL"
> +  "operands[2] = CONST0_RTX (V2SFmode);")
> +
> +(define_insn "*floatv2div2sf2"
> +  [(set (match_operand:V4SF 0 "register_operand" "=v")
> +   (vec_concat:V4SF
> +   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" 
> "vm"))
> +   (match_operand:V2SF 2 "const0_operand" "C")))]
>"TARGET_AVX512DQ && TARGET_AVX512VL"
>"vcvtqq2ps{x}\t{%1, %0|%0, %1}"
>[(set_attr "type" "ssecvt")
> @@ -5260,16 +5268,29 @@ (define_expand "vec_pack_fl
>DONE;
>  })
>
> -(define_insn "floatv2div2sf2_mask"
> +(define_expand "floatv2div2sf2_mask"
> +  [(set (match_operand:V4SF 0 "register_operand" "=v")
> +(vec_concat:V4SF
> +(vec_merge:V2SF
> +   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" 
> "vm"))
> +(vec_select:V2SF
> +(match_operand:V4SF 2 "nonimm_or_0_operand" "0C")
> +(parallel [(const_int 0) (const_int 1)]))
> +(match_operand:QI 3 "register_operand" "Yk"))
> +   (match_dup 4)))]
> +  "TARGET_AVX512DQ && TARGET_AVX512VL"
> +  "operands[4] = CONST0_RTX (V2SFmode);")
> +
> +(define_insn "*floatv2div2sf2_mask"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
>  (vec_concat:V4SF
>  (vec_merge:V2SF
> -   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" 
> "vm"))
> +   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" 
> "vm"))
>  (vec_select:V2SF
>  (match_operand:V4SF 2 "nonimm_or_0_operand" "0C")
>  (parallel [(const_int 0) (const_int 1)]))
>  (match_operand:QI 3 "register_operand" "Yk"))
> -   (const_vector:V2SF [(const_int 0) (const_int 0)])))]
> +   (match_operand:V2SF 4 "const0_operand" "C")))]
>"TARGET_AVX512DQ && TARGET_AVX512VL"
>"vcvtqq2ps{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
>[(set_attr "type" "ssecvt")
> @@ -5282,9 +5303,9 @@ (define_insn "*floatv2di
> (vec_merge:V2SF
> (any_float:V2SF (match_operand:V2DI 1
>   "nonimmediate_operand" "vm"))
> -   (const_vector:V2SF [(const_int 0) (const_int 0)])
> +   (match_operand:V2SF 3 "const0_operand" "C")
> (match_operand:QI 2 "register_operand" "Yk"))
> -   (const_vector:V2SF [(const_int 0) (const_int 0)])))]
> +   (match_operand:V2SF 4 "const0_operand" "C")))]
>"TARGET_AVX512DQ && TARGET_AVX512VL"
>"vcvtqq2ps{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
>[(set_attr "type" "ssecvt")
> --- gcc/testsuite/g++.target/i386/pr88785.C.jj  2019-01-10 13:08:22.987439456 
> +0100
> +++ gcc/testsuite/g++.target/i386/pr88785.C 2019-01-10 13:08:17.396531359 
> +0100
> @@ -0,0 +1,197 @@
> +// PR target/88785
> +// { dg-do compile }
> +// { dg-options "-O2 -g -std=c++17 -mavx512vl -mavx512dq" }
> +
> +namespace a {
> +template  class b;
> +template  class d;
> +}
> +template  struct g { static constexpr int e = f; };
> +template  struct aa;
> +template  struct o;
> +template  struct o : aa::ac {};
> +template  struct j;
> +template  struct j : aa::ac {};
> +template  constexpr bool l = o::e;
> +template  struct r : g {};
> 

Re: [PATCH v2] x86-64: {,V}CVTSI2Sx are ambiguous without suffix

2019-01-10 Thread Uros Bizjak
On Thu, Jan 10, 2019 at 3:56 PM Jan Beulich  wrote:
>
> For 64-bit these should not be emitted without suffix in AT&T mode (as
> being ambiguous that way); the suffixes are benign for 32-bit. For
> consistency also omit the suffix in Intel mode for {,V}CVTSI2SxQ.
>
> The omission originally (prior to rev 260691) led to wrong code
> being generated for the 64-bit unsigned-to-float/double conversions (as
> gas guesses an L suffix instead of the required Q one when the operand
> is in memory). In all remaining cases (being changed here) the omission
> would "just" lead to warnings with future gas versions.
>
> As a result, arrange to check for the L suffixes in 32-bit test cases.
>
> In order for related test cases to actually test what they're supposed
> to test, add a few (seemingly unrelated) empty "asm volatile()".
> Presumably there are more where constant propagation voids the intended
> effect of the tests, but these are the ones that help make sure the
> assembler still correctly assembles the output after the changes here.
> ---
> v2: Don't drop (redundant) suffixes from *2SI conversions. Adjust
> changes to testsuite accordingly.
>
> gcc/
> 2019-01-10  Jan Beulich  
>
> * config/i386/i386.md (rex64suffix): Add L suffix for SI.
> * config/i386/sse.md (cvtusi232,
> sse2_cvtsi2sd): Add {l}.
> (sse2_cvtsi2sdq): Make q conditional upon AT&T
> syntax.
>
> gcc/testsuite/
> 2019-01-10  Jan Beulich  
>
> * gcc.target/i386/avx512f-vcvtsd2si-1.c,
> gcc.target/i386/avx512f-vcvtss2si-1.c,
> gcc.target/i386/avx512f-vcvttsd2si-1.c,
> gcc.target/i386/avx512f-vcvttss2si-1.c: Permit l suffix.
> * gcc.target/i386/avx512f-vcvtsi2ss-1.c,
> gcc.target/i386/avx512f-vcvtusi2sd-1.c,
> gcc.target/i386/avx512f-vcvtusi2ss-1.c: Expect l suffix.
> * gcc.target/i386/avx512f-vcvtusi2sd-2.c,
> gcc.target/i386/avx512f-vcvtusi2sd64-2.c,
> gcc.target/i386/avx512f-vcvtusi2ss-2.c,
> gcc.target/i386/avx512f-vcvtusi2ss64-2.c: Add asm volatile().
> gcc.target/i386/pr19398.c: Permit l or q suffix.

OK.

Thanks,
Uros.
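
The ambiguity in the quoted description can be illustrated with a small
sketch, assuming GCC-style inline asm in the default AT&T dialect.  With a
memory operand nothing but the suffix tells the assembler whether 4 or 8
bytes are meant.  The function names are hypothetical, and the non-x86-64
branches are plain C fallbacks added only so the sketch builds elsewhere.

```c
#include <string.h>

/* Convert the 64-bit integer stored at P to double, using an explicit
   'q' suffix on x86-64.  */
double
load_i64_as_double (const long long *p)
{
#if defined (__x86_64__) && defined (__GNUC__)
  double d;
  __asm__ ("cvtsi2sdq %1, %0" : "=x" (d) : "m" (*p));
  return d;
#else
  return (double) *p;
#endif
}

/* Convert only the first 4 bytes stored at P, which is what an 'l'
   suffix (or an assembler guessing 'l') would read.  */
double
load_i32_as_double (const long long *p)
{
  int low;
  memcpy (&low, p, sizeof low);
#if defined (__x86_64__) && defined (__GNUC__)
  double d;
  __asm__ ("cvtsi2sdl %1, %0" : "=x" (d) : "m" (low));
  return d;
#else
  return (double) low;
#endif
}
```

For a value that does not fit in 32 bits the two functions disagree, which
is the kind of wrong code the description attributes to a guessed L suffix.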

> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1162,7 +1162,7 @@
>[(QI "V64QI") (HI "V32HI") (SI "V16SI") (DI "V8DI") (SF "V16SF") (DF 
> "V8DF")])
>
>  ;; Instruction suffix for REX 64bit operators.
> -(define_mode_attr rex64suffix [(SI "") (DI "{q}")])
> +(define_mode_attr rex64suffix [(SI "{l}") (DI "{q}")])
>  (define_mode_attr rex64namesuffix [(SI "") (DI "q")])
>
>  ;; This mode iterator allows :P to be used for patterns that operate on
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4767,7 +4767,7 @@
>   (match_operand:VF_128 1 "register_operand" "v")
>   (const_int 1)))]
>"TARGET_AVX512F && "
> -  "vcvtusi2\t{%2, %1, %0|%0, %1, 
> %2}"
> +  "vcvtusi2{l}\t{%2, %1, %0|%0, 
> %1, %2}"
>[(set_attr "type" "sseicvt")
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
> @@ -5026,9 +5026,9 @@
>   (const_int 1)))]
>"TARGET_SSE2"
>"@
> -   cvtsi2sd\t{%2, %0|%0, %2}
> -   cvtsi2sd\t{%2, %0|%0, %2}
> -   vcvtsi2sd\t{%2, %1, %0|%0, %1, %2}"
> +   cvtsi2sd{l}\t{%2, %0|%0, %2}
> +   cvtsi2sd{l}\t{%2, %0|%0, %2}
> +   vcvtsi2sd{l}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "isa" "noavx,noavx,avx")
> (set_attr "type" "sseicvt")
> (set_attr "athlon_decode" "double,direct,*")
> @@ -5048,9 +5048,9 @@
>   (const_int 1)))]
>"TARGET_SSE2 && TARGET_64BIT"
>"@
> -   cvtsi2sdq\t{%2, %0|%0, %2}
> -   cvtsi2sdq\t{%2, %0|%0, %2}
> -   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
> +   cvtsi2sd{q}\t{%2, %0|%0, %2}
> +   cvtsi2sd{q}\t{%2, %0|%0, %2}
> +   vcvtsi2sd{q}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "isa" "noavx,noavx,avx")
> (set_attr "type" "sseicvt")
> (set_attr "athlon_decode" "double,direct,*")
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2si-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2si-1.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mavx512f" } */
> -/* { dg-final { scan-assembler-times "vcvtsd2si\[ 
> \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]+.{6}(?:\n|\[ \\t\]+#)" 1 } } */
> +/* { dg-final { scan-assembler-times "vcvtsd2sil?\[ 
> \\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]+.{6}(?:\n|\[ \\t\]+#)" 1 } } */
>  #include 
>
>  volatile __m128d x;
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vcvtsi2ss-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsi2ss-1.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vcvtsi2ss\[ 
> \\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 
> 1 } } */
> +/* { dg-final { scan-assembler-times "vcvtsi2ssl\[ 
> \\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 
> 1 } } */
>
>  #include 
>
> --- a/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2si-1.c
> +++ 

Re: [PATCH] Don't use align > MAX_SUPPORTED_STACK_ALIGNMENT in assign_stack_temp_for_type (PR bootstrap/88450)

2019-01-10 Thread H.J. Lu
On Thu, Jan 10, 2019 at 2:32 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Thu, Jan 10, 2019 at 04:36:35PM +0100, Eric Botcazou wrote:
> > > If there are other spots that need this, wondering about:
> > >   else
> > > copy = assign_temp (type, 1, 0);
> > > in calls.c, either it can be done by using the variable-sized object
> > > case in the then block, or could be done using assign_stack_local +
> > > this short realignment too.
> >
> > The latter I'd say.
>
> Will handle that tomorrow.
>
> But there is another thing: while assign_stack_local_1 lowers
> alignment_in_bits to MAX_SUPPORTED_STACK_ALIGNMENT if it is higher than that
> and records that in the MEM it creates, the caller,
> assign_stack_temp_for_type, will happily compute with higher alignments and
> will happily set MEM_ALIGN to the higher value on the MEMs it creates.
> I think we shouldn't lie here; something in the optimizers could try to take
> advantage of the higher MEM_ALIGN.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, but that doesn't mean
> much, because MAX_SUPPORTED_STACK_ALIGNMENT there is 1 << 28.  Guess more
> useful would be to test it on mingw where BIGGEST_ALIGNMENT is often higher
> than MAX_SUPPORTED_STACK_ALIGNMENT.

FWIW, MAX_SUPPORTED_STACK_ALIGNMENT is an arbitrarily large value for
Linux/x86 since we track and align stack as needed.

-- 
H.J.


[PATCH] Improve RTL DSE with -fstack-protector* (PR rtl-optimization/88796)

2019-01-10 Thread Jakub Jelinek
Hi!

As mentioned in the PR, RTL DSE doesn't do much with -fstack-protector*,
because the stack canary test in the epilogue of instrumented functions
is a MEM_VOLATILE_P read of the crtl->stack_protect_guard ssp canary
slot in the stack frame and either a MEM_VOLATILE_P read of the
__stack_chk_guard variable, or of some other corresponding location (e.g.
TLS memory on x86).

The canary slot in the stack frame is written in the prologue using a
MEM_VOLATILE_P store, so we never consider it for DSE, and it is only read
in the epilogue, so it shouldn't alias any other stores.
Similarly, the __stack_chk_guard variable, the TLS ssp slot, or whatever
else is used to hold the random pointer-sized value really shouldn't be
changed in -fstack-protector* instrumented functions, as that would mean
they remembered one value in the prologue and would fail the comparison in
the epilogue if it changed in between.  So, I believe we can safely ignore
the whole stack_protect_test instruction in RTL DSE.
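
The prologue/epilogue protocol this reasoning relies on can be modelled in
plain C.  This is only a rough sketch of the idea, not what the compiler
emits: example_guard, struct frame_model, model_prologue and
model_epilogue_ok are hypothetical names, and the guard value is an
arbitrary constant chosen for the illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for __stack_chk_guard or a TLS guard slot.  */
static volatile uintptr_t example_guard = 0x5ca1ab1eu;

/* Hypothetical model of a protected frame: a canary slot plus a
   local buffer.  */
struct frame_model
{
  volatile uintptr_t canary;
  char buf[16];
};

/* "Prologue": copy the guard value into the frame's canary slot.  */
void
model_prologue (struct frame_model *f)
{
  f->canary = example_guard;
}

/* "Epilogue": return 1 if the canary slot still matches the guard.  */
int
model_epilogue_ok (const struct frame_model *f)
{
  return f->canary == example_guard;
}
```

In this model ordinary stores to buf never touch the canary slot, and the
check only passes if neither the slot nor the guard changed between the two
calls, which is the property the paragraph above depends on.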

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-10  Jakub Jelinek  

PR rtl-optimization/88796
* emit-rtl.h (struct rtl_data): Add stack_protect_guard_decl field.
* cfgexpand.c (stack_protect_prologue): Initialize
crtl->stack_protect_guard_decl.
* function.c (stack_protect_epilogue): Use it instead of calling
targetm.stack_protect_guard again.
* dse.c (check_mem_read_rtx): Ignore MEM_VOLATILE_P reads from
MEMs with MEM_EXPR equal to crtl->stack_protect_guard or
crtl->stack_protect_guard_decl.
* config/i386/i386.c (ix86_stack_protect_guard): Set TREE_THIS_VOLATILE
on the returned MEM_EXPR.

* gcc.target/i386/pr88796.c: New test.

--- gcc/emit-rtl.h.jj   2019-01-10 11:43:14.390377646 +0100
+++ gcc/emit-rtl.h  2019-01-10 21:38:38.682055891 +0100
@@ -87,6 +87,10 @@ struct GTY(()) rtl_data {
  Used for detecting stack clobbers.  */
   tree stack_protect_guard;
 
+  /* The __stack_chk_guard variable or expression holding the stack
+ protector canary value.  */
+  tree stack_protect_guard_decl;
+
   /* List (chain of INSN_LIST) of labels heading the current handlers for
  nonlocal gotos.  */
   rtx_insn_list *x_nonlocal_goto_handler_labels;
--- gcc/cfgexpand.c.jj  2019-01-07 09:50:26.774650762 +0100
+++ gcc/cfgexpand.c 2019-01-10 21:40:08.714589919 +0100
@@ -6219,6 +6219,7 @@ stack_protect_prologue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
 
+  crtl->stack_protect_guard_decl = guard_decl;
   x = expand_normal (crtl->stack_protect_guard);
 
   if (targetm.have_stack_protect_combined_set () && guard_decl)
--- gcc/function.c.jj   2019-01-10 16:43:54.802481070 +0100
+++ gcc/function.c  2019-01-10 21:40:49.326928642 +0100
@@ -4902,7 +4902,7 @@ init_function_start (tree subr)
 void
 stack_protect_epilogue (void)
 {
-  tree guard_decl = targetm.stack_protect_guard ();
+  tree guard_decl = crtl->stack_protect_guard_decl;
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
   rtx_insn *seq = NULL;
--- gcc/dse.c.jj2019-01-10 11:43:12.345411240 +0100
+++ gcc/dse.c   2019-01-10 21:48:07.224799798 +0100
@@ -2072,8 +2072,29 @@ check_mem_read_rtx (rtx *loc, bb_info_t
   insn_info = bb_info->last_insn;
 
   if ((MEM_ALIAS_SET (mem) == ALIAS_SET_MEMORY_BARRIER)
-  || (MEM_VOLATILE_P (mem)))
+  || MEM_VOLATILE_P (mem))
 {
+  if (crtl->stack_protect_guard
+ && (MEM_EXPR (mem) == crtl->stack_protect_guard
+ || (crtl->stack_protect_guard_decl
+ && MEM_EXPR (mem) == crtl->stack_protect_guard_decl))
+ && MEM_VOLATILE_P (mem))
+   {
+ /* This is either the stack protector canary on the stack,
+which ought to be written by a MEM_VOLATILE_P store and
+thus shouldn't be deleted and is read at the very end of
+function, but shouldn't conflict with any other store.
+Or it is __stack_chk_guard variable or TLS or whatever else
+MEM holding the canary value, which really shouldn't be
+ever modified in -fstack-protector* protected functions,
+otherwise the prologue store wouldn't match the epilogue
+check.  */
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, " stack protector canary read ignored.\n");
+ insn_info->cannot_delete = true;
+ return;
+   }
+
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, " adding wild read, volatile or barrier.\n");
   add_wild_read (bb_info);
--- gcc/config/i386/i386.c.jj   2019-01-10 11:43:17.534325998 +0100
+++ gcc/config/i386/i386.c  2019-01-10 21:35:39.588972002 +0100
@@ -45093,6 +45093,7 @@ ix86_stack_protect_guard (void)
  t = build_int_cst (asptrtype, ix86_stack_protector_guard_offset);
  t = build2 (MEM_REF, asptrtype, t,
  build_int_cst 

[PATCH] Don't use align > MAX_SUPPORTED_STACK_ALIGNMENT in assign_stack_temp_for_type (PR bootstrap/88450)

2019-01-10 Thread Jakub Jelinek
Hi!

On Thu, Jan 10, 2019 at 04:36:35PM +0100, Eric Botcazou wrote:
> > If there are other spots that need this, wondering about:
> >   else
> > copy = assign_temp (type, 1, 0);
> > in calls.c, either it can be done by using the variable-sized object
> > case in the then block, or could be done using assign_stack_local +
> > this short realignment too.
> 
> The latter I'd say.

Will handle that tomorrow.

But there is another thing: while assign_stack_local_1 lowers
alignment_in_bits to MAX_SUPPORTED_STACK_ALIGNMENT if it is higher than that
and records that in the MEM it creates, the caller,
assign_stack_temp_for_type, will happily compute with higher alignments and
will happily set MEM_ALIGN to the higher value on the MEMs it creates.
I think we shouldn't lie here; something in the optimizers could try to take
advantage of the higher MEM_ALIGN.
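
The concern can be sketched with a tiny model, assuming an illustrative
limit of 128 bits.  The real MAX_SUPPORTED_STACK_ALIGNMENT is target
dependent, and every name below is hypothetical.

```c
/* Illustrative limit in bits; the real MAX_SUPPORTED_STACK_ALIGNMENT is
   target-dependent and this value is only an assumption.  */
#define EXAMPLE_MAX_STACK_ALIGN_BITS 128u

/* Alignment the stack slot can really be given: the request, clamped
   to the supported maximum.  */
unsigned int
example_slot_alignment (unsigned int requested_bits)
{
  return requested_bits < EXAMPLE_MAX_STACK_ALIGN_BITS
	 ? requested_bits : EXAMPLE_MAX_STACK_ALIGN_BITS;
}

/* Nonzero if ADVERTISED_BITS does not promise more alignment than the
   slot created for REQUESTED_BITS actually has.  */
int
example_alignment_is_honest (unsigned int requested_bits,
			     unsigned int advertised_bits)
{
  return advertised_bits <= example_slot_alignment (requested_bits);
}
```

Advertising the unclamped request for an over-aligned type fails the check
in this model, while advertising the clamped value passes it.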

Bootstrapped/regtested on x86_64-linux and i686-linux, but that doesn't mean
much, because MAX_SUPPORTED_STACK_ALIGNMENT there is 1 << 28.  Guess more
useful would be to test it on mingw where BIGGEST_ALIGNMENT is often higher
than MAX_SUPPORTED_STACK_ALIGNMENT.

Thoughts on this?

2019-01-10  Jakub Jelinek  

PR bootstrap/88450
* function.c (assign_stack_temp_for_type): Use alignment at most
MAX_SUPPORTED_STACK_ALIGNMENT.  Adjust assert correspondingly.

--- gcc/function.c.jj   2019-01-10 00:13:47.593688442 +0100
+++ gcc/function.c  2019-01-10 00:17:42.464890435 +0100
@@ -792,6 +792,7 @@ assign_stack_temp_for_type (machine_mode
   gcc_assert (known_size_p (size));
 
   align = get_stack_local_alignment (type, mode);
+  align = MIN (align, MAX_SUPPORTED_STACK_ALIGNMENT);
 
   /* Try to find an available, already-allocated temporary of the proper
  mode which meets the size and alignment requirements.  Choose the
@@ -872,8 +873,10 @@ assign_stack_temp_for_type (machine_mode
 
 So for requests which depended on the rounding of SIZE, we go ahead
 and round it now.  We also make sure ALIGNMENT is at least
-BIGGEST_ALIGNMENT.  */
-  gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
+minimum of BIGGEST_ALIGNMENT and MAX_SUPPORTED_STACK_ALIGNMENT.  */
+  gcc_assert (mode != BLKmode
+ || align == MIN (BIGGEST_ALIGNMENT,
+  MAX_SUPPORTED_STACK_ALIGNMENT));
   p->slot = assign_stack_local_1 (mode,
  (mode == BLKmode
   ? aligned_upper_bound (size,


Jakub


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-10 Thread Bernd Edlinger
On 1/10/19 10:23 PM, Richard Sandiford wrote:
> Segher Boessenkool  writes:
>> On Tue, Jan 08, 2019 at 12:03:06PM +, Richard Sandiford wrote:
>>> Bernd Edlinger  writes:
>>>> Meanwhile I found out, that the stack clobber has only been ignored up to
>>>> gcc-5 (at least with lra targets, not really sure about reload targets).
>>>> From gcc-6 on, with the exception of PR arm/77904 which was a regression
>>>> due to the underlying lra change, but fixed later, and back-ported to
>>>> gcc-6.3.0, this works for all targets I tried so far.
>>>>
>>>> To me, it starts to look like a rather unique and useful feature, that I
>>>> would like to keep working.
>>>
>>> Not sure what you mean by "unique".  But forcing a frame is a bit of
>>> a slippery concept.  Force it where?  For the asm only, or the whole
>>> function?  This depends on optimisation and hasn't been consistent
>>> across GCC versions, since it depends on the shrink-wrapping
>>> optimisation.  (There was a similar controversy a while ago about
>>> to what extent -fno-omit-frame-pointer should "force a frame".)
>>
>> It's not forcing a frame currently: it's just setting frame_pointer_needed.
>> Whatever happens from that is the target's business.
> 
> Do you mean the asm clobber or -fno-omit-frame-pointer?  If the option,
> then yeah, and that was exactly what was controversial :-)
> 

Yes, what I meant is the asm clobber sets frame_pointer_needed,
on the function where this asm is used, that sounded to me like
it would have an impact on the frame pointer.

What I also expected is that if an asm is accessing a local
via "m" then an SP+x reference will be eliminated to an FP+x
reference, which would allow the asm to push something on the
stack, and the memory references would remain valid,
as long as the stack is _restored_ again in the same asm.
I mean in case of register shortage.  I was not thinking about
noreturn at all.

But if -fno-omit-frame-pointer does the same, and that is not sufficient
to force a frame pointer, because it is target dependent, then I
wonder how ASAN is supposed to work on such a target.

But anyway I guess, your patch is fine.


Thanks
Bernd. 


[COMMITTED][PATCH][GCC][AArch64] Initialize the new SIMD buildins in right place.

2019-01-10 Thread Tamar Christina
Hi All,

This fixes an issue where the +nosimd option causes the builtins for fcmla_laneq
not to be defined at all.  This fixes the regression by initializing the
built-ins together with the rest of the SIMD ones.

Thanks,
Tamar

gcc/ChangeLog:

2019-01-10  Tamar Christina  

* config/aarch64/aarch64-builtins.c
(aarch64_init_builtins): Move aarch64_init_fcmla_laneq_builtins...
(aarch64_init_simd_builtins): ...Here.

-- 
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index df0e035e39a94b7978f7c30317779dbdda7c182e..04063e5ed134d2e64487db23b8fa7794817b2739 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -951,6 +951,9 @@ aarch64_init_simd_builtins (void)
  NULL, NULL_TREE);
   aarch64_builtin_decls[fcode] = fndecl;
 }
+
+   /* Initialize the remaining fcmla_laneq intrinsics.  */
+   aarch64_init_fcmla_laneq_builtins ();
 }
 
 static void
@@ -1078,10 +1081,7 @@ aarch64_init_builtins (void)
   aarch64_init_fp16_types ();
 
   if (TARGET_SIMD)
-{
-  aarch64_init_simd_builtins ();
-  aarch64_init_fcmla_laneq_builtins ();
-}
+aarch64_init_simd_builtins ();
 
   aarch64_init_crc32_builtins ();
   aarch64_init_builtin_rsqrt ();



[PATCH] Fix misplaced combine totals dumping (PR bootstrap/88714)

2019-01-10 Thread Jakub Jelinek
Hi!

r191883 seems to have introduced a pasto:
--- trunk/gcc/passes.c  2012/10/01 00:17:52 191882
+++ trunk/gcc/passes.c  2012/10/01 05:43:06 191883
@@ -231,27 +231,23 @@
   timevar_push (TV_DUMP);
   if (profile_arc_flag || flag_test_coverage || flag_branch_probabilities)
 {
-  dump_file = dump_begin (pass_profile.pass.static_pass_number, NULL);
+  dump_start (pass_profile.pass.static_pass_number, NULL);
   end_branch_prob ();
-  if (dump_file)
-   dump_end (pass_profile.pass.static_pass_number, dump_file);
+  dump_finish (pass_profile.pass.static_pass_number);
 }
 
   if (optimize > 0)
 {
-  dump_file = dump_begin (pass_combine.pass.static_pass_number, NULL);
-  if (dump_file)
-   {
- dump_combine_total_stats (dump_file);
-  dump_end (pass_combine.pass.static_pass_number, dump_file);
-   }
+  dump_start (pass_profile.pass.static_pass_number, NULL);
+  print_combine_total_stats ();
+  dump_finish (pass_combine.pass.static_pass_number);
 }

where dump_finish was used with correct pass_combine, but dump_start was
pastoed from the previous if and contained pass_profile instead.
Next r193821 noticed this, but instead of fixing the dump_start argument
changed dump_finish argument to match.

So, in the end, the combiner statistics were emitted in the profile_estimate
dump, which in PR88714 suggested there was a difference already in the
profile_estimate dump, when actually the IL changed only during pre and, of
course, everything after it, including the combiner.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-10  Jakub Jelinek  

PR bootstrap/88714
* passes.c (finish_optimization_passes): Call print_combine_total_stats
inside of pass_combine_1 dump rather than pass_profile_1.

--- gcc/passes.c.jj 2019-01-01 12:37:15.494002253 +0100
+++ gcc/passes.c 2019-01-10 16:30:43.295424173 +0100
@@ -361,9 +361,9 @@ finish_optimization_passes (void)
 
   if (optimize > 0)
 {
-  dumps->dump_start (pass_profile_1->static_pass_number, NULL);
+  dumps->dump_start (pass_combine_1->static_pass_number, NULL);
   print_combine_total_stats ();
-  dumps->dump_finish (pass_profile_1->static_pass_number);
+  dumps->dump_finish (pass_combine_1->static_pass_number);
 }
 
   /* Do whatever is necessary to finish printing the graphs.  */

Jakub


[PATCH] Fix float*v2div2sf2* patterns (PR target/88785)

2019-01-10 Thread Jakub Jelinek
Hi!

The following testcase ICEs in dwarf2out.c, because a few sse.md patterns
contain invalid RTL, in particular
(const_vector:V2SF [(const_int 0) (const_int 0)])
Elements of a V2SF const_vector should be (const_double:SF 0), not
(const_int 0).  Unfortunately, we can't write const_double 0 constants
explicitly the way one can write (const_int 0), so this patch uses a
separate define_expand to add those CONST0_RTX operands, and match_operand
with "const0_operand" "C" to match them.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-01-10  Jakub Jelinek  

PR target/88785
* config/i386/sse.md (floatv2div2sf2): Turn into
define_expand.
(*floatv2div2sf2): New define_insn.
(floatv2div2sf2_mask): Turn into define_expand.
(*floatv2div2sf2_mask): New define_insn.
(*floatv2div2sf2_mask_1): Replace
subrtxes (const_vector:V2SF [(const_int 0) (const_int 0)]) with
match_operands with "const0_operand" "C".

* g++.target/i386/pr88785.C: New test.

--- gcc/config/i386/sse.md.jj   2019-01-10 11:43:17.520326228 +0100
+++ gcc/config/i386/sse.md  2019-01-10 12:57:52.946797987 +0100
@@ -5222,11 +5222,19 @@ (define_insn "float")])
 
-(define_insn "floatv2div2sf2"
+(define_expand "floatv2div2sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(vec_concat:V4SF
(any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
-   (const_vector:V2SF [(const_int 0) (const_int 0)])))]
+   (match_dup 2)))]
+  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "operands[2] = CONST0_RTX (V2SFmode);")
+
+(define_insn "*floatv2div2sf2"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+   (vec_concat:V4SF
+   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
+   (match_operand:V2SF 2 "const0_operand" "C")))]
   "TARGET_AVX512DQ && TARGET_AVX512VL"
   "vcvtqq2ps{x}\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
@@ -5260,16 +5268,29 @@ (define_expand "vec_pack_fl
   DONE;
 })
 
-(define_insn "floatv2div2sf2_mask"
+(define_expand "floatv2div2sf2_mask"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
+(vec_concat:V4SF
+(vec_merge:V2SF
+   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
+(vec_select:V2SF
+(match_operand:V4SF 2 "nonimm_or_0_operand" "0C")
+(parallel [(const_int 0) (const_int 1)]))
+(match_operand:QI 3 "register_operand" "Yk"))
+   (match_dup 4)))]
+  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "operands[4] = CONST0_RTX (V2SFmode);")
+
+(define_insn "*floatv2div2sf2_mask"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 (vec_concat:V4SF
 (vec_merge:V2SF
-   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
+   (any_float:V2SF (match_operand:V2DI 1 "nonimmediate_operand" "vm"))
 (vec_select:V2SF
 (match_operand:V4SF 2 "nonimm_or_0_operand" "0C")
 (parallel [(const_int 0) (const_int 1)]))
 (match_operand:QI 3 "register_operand" "Yk"))
-   (const_vector:V2SF [(const_int 0) (const_int 0)])))]
+   (match_operand:V2SF 4 "const0_operand" "C")))]
   "TARGET_AVX512DQ && TARGET_AVX512VL"
   "vcvtqq2ps{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}"
   [(set_attr "type" "ssecvt")
@@ -5282,9 +5303,9 @@ (define_insn "*floatv2di
(vec_merge:V2SF
(any_float:V2SF (match_operand:V2DI 1
  "nonimmediate_operand" "vm"))
-   (const_vector:V2SF [(const_int 0) (const_int 0)])
+   (match_operand:V2SF 3 "const0_operand" "C")
(match_operand:QI 2 "register_operand" "Yk"))
-   (const_vector:V2SF [(const_int 0) (const_int 0)])))]
+   (match_operand:V2SF 4 "const0_operand" "C")))]
   "TARGET_AVX512DQ && TARGET_AVX512VL"
   "vcvtqq2ps{x}\t{%1, %0%{%2%}%{z%}|%0%{%2%}%{z%}, %1}"
   [(set_attr "type" "ssecvt")
--- gcc/testsuite/g++.target/i386/pr88785.C.jj  2019-01-10 13:08:22.987439456 +0100
+++ gcc/testsuite/g++.target/i386/pr88785.C 2019-01-10 13:08:17.396531359 +0100
@@ -0,0 +1,197 @@
+// PR target/88785
+// { dg-do compile }
+// { dg-options "-O2 -g -std=c++17 -mavx512vl -mavx512dq" }
+
+namespace a {
+template  class b;
+template  class d;
+}
+template  struct g { static constexpr int e = f; };
+template  struct aa;
+template  struct o;
+template  struct o : aa::ac {};
+template  struct j;
+template  struct j : aa::ac {};
+template  constexpr bool l = o::e;
+template  struct r : g {};
+template  struct r : g {};
+template  struct aa { typedef ad ac; };
+template  using ae = ad;
+template  using af = void;
+typedef float ag __attribute__((__vector_size__(16)));
+ag ah;
+ag ai(__attribute__((__vector_size__(2 * sizeof(long long long long z) {
+  ah = ag{};
+  __attribute__((__vector_size__(4 * sizeof(float float aj = ah;
+  return 

Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-10 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, Jan 10, 2019 at 09:23:27PM +, Richard Sandiford wrote:
>> > "noreturn"...  What would that mean, *exactly*?  It cannot execute any
>> > code the compiler can see, so such asm is better off as real asm anyway
>> > (not inline asm).
>> 
>> "Exactly" is a strong word, and this wasn't my proposal, but...
>> I think it would act like a noreturn call to an unknown function.
>> Output operands wouldn't make sense, and arguably clobbers wouldn't
>> either.
>
> "noreturn" asm can be done already now, just use
> asm volatile ("..." ...);
> __builtin_unreachable ();
>
> I think there is no need to add a new syntax for that.

ISTR the point was that the PowerPC ABI places requirements on functions
with noreturn calls and the attribute would help GCC do the right thing
in those circumstances.  So "noreturn" would imply a call that doesn't
return, rather than just an infinite loop.

Richard


Re: [C++ Patch] Fix three locations

2019-01-10 Thread Jason Merrill

On 1/9/19 10:46 AM, Paolo Carlini wrote:

Hi,

three additional fixes along the usual lines. In the grokdeclarator
changes I'm not touching the actual printing of the name, but another
option would be to use %qD and decl here too; then, for cases like
parse/crash43.C, where everything lives inside a namespace, we would
print 'N::i' instead of simply 'i'. Tested on x86_64-linux, plus I checked
the dllimport bit by hand with a cross-compiler.


Thanks, Paolo.




OK.

Jason


Re: [v3 PATCH] Implement LWG 2221, No formatted output operator for nullptr

2019-01-10 Thread Rainer Orth
Hi Jonathan,

> On 04/12/17 23:04 +, Jonathan Wakely wrote:
>>On 03/12/17 23:08 +0200, Ville Voutilainen wrote:
>>>Tested on Linux-x64.
>>>
>>>2017-11-14  Ville Voutilainen  
>>>
>>>   Implement LWG 2221
>>>   * include/std/ostream (operator<<(nullptr_t)): New.
>>>   * testsuite/27_io/basic_ostream/inserters_other/char/lwg2221.cc: New.
>>
>>>diff --git a/libstdc++-v3/include/std/ostream
>>> b/libstdc++-v3/include/std/ostream
>>>index f7cab03..18011bc 100644
>>>--- a/libstdc++-v3/include/std/ostream
>>>+++ b/libstdc++-v3/include/std/ostream
>>>@@ -245,6 +245,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>  operator<<(const void* __p)
>>>  { return _M_insert(__p); }
>>>
>>>+#if __cplusplus > 201402L
>>>+  __ostream_type&
>>>+  operator<<(nullptr_t)
>>>+  { return *this << "nullptr"; }
>>>+#endif
>>
>>As discussed on IRC, this requires a new symbol to be exported for the
>>std::ostream and std::wostream explicit instantiations, or the new
>>test will fail to link at -O0.
>>
>>That should wait for stage 1.
>>
>
> This patch for a C++17 feature (posted over a year ago) should have
> gone in during stage 1. I've taken care of the symbol exports that
> were missing from the original patch.
>
> Tested x86_64-linux, committed to trunk.

this patch broke Solaris bootstrap:

ld: fatal: libstdc++-symbols.ver-sun: 7117: symbol 'std::basic_ostream<char, std::char_traits<char> >::operator<<(decltype(nullptr))': symbol version conflict
ld: fatal: libstdc++-symbols.ver-sun: 7119: symbol 'std::basic_ostream<wchar_t, std::char_traits<wchar_t> >::operator<<(decltype(nullptr))': symbol version conflict

ld: fatal: libstdc++-symbols.ver-sun: 7117: symbol '_ZNSolsEDn': symbol version conflict
ld: fatal: libstdc++-symbols.ver-sun: 7119: symbol '_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn': symbol version conflict

Again, there were two matches for those two symbols:

  GLIBCXX_3.4
##_ZNSolsE*[^Dg] (glob)
_ZNSolsEDn;
  GLIBCXX_3.4.26
##_ZNSolsEDn (glob)
_ZNSolsEDn;

  GLIBCXX_3.4
##_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE*[^Dg] (glob)
_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn;
  GLIBCXX_3.4.26
##_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn (glob)
_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn;

ISTM that the patterns were backwards.  The following patch fixes this
and allowed i386-pc-solaris2.11 bootstrap to complete without
regressions relative to the last successful one.
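To make the difference between the two patterns concrete, here is a small
sketch using POSIX fnmatch().  It is only an approximation: the linker may use
its own matcher, and the helper name glob_matches is made up for illustration.
The bracket negation is spelled [!Dg] here because that is the portable
fnmatch() form of the [^Dg] used in the version script.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: true if NAME matches the shell-style glob PATTERN.
   This only approximates how a linker matches version-script globs.  */
static bool
glob_matches (const char *pattern, const char *name)
{
  return fnmatch (pattern, name, 0) == 0;
}

/* Old pattern: "any suffix whose LAST character is not D or g".
   For _ZNSolsEDn the '*' swallows "D" and the bracket matches 'n',
   so the nullptr_t inserter is (wrongly) claimed by GLIBCXX_3.4.  */
static bool
old_pattern_claims_nullptr_inserter (void)
{
  return glob_matches ("_ZNSolsE*[!Dg]", "_ZNSolsEDn");
}

/* New pattern: "any suffix whose FIRST character is not D or g".
   The bracket must now match 'D', which it refuses, so the symbol is
   left for the later version node.  */
static bool
new_pattern_claims_nullptr_inserter (void)
{
  return glob_matches ("_ZNSolsE[!Dg]*", "_ZNSolsEDn");
}
```

With the old pattern the first function returns true; with the corrected one
the second returns false, while an ordinary inserter such as _ZNSolsEi still
matches both patterns.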

Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2019-01-10  Rainer Orth  

* config/abi/pre/gnu.ver (GLIBCXX_3.4): Fix pattern.
(GLIBCXX_3.4.21): Likewise.

# HG changeset patch
# Parent  f8d5e17fd8042e330c31d714b9c619c1807d93cb
Fix libstdc++.so link on Solaris with C++17 LWG 2221 support

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -495,7 +495,7 @@ GLIBCXX_3.4 {
 _ZNSo8_M_writeEPKc[ilx];
 _ZNSo3put*;
 _ZNSo[5-9][a-z]*;
-_ZNSolsE*[^Dg];
+_ZNSolsE[^Dg]*;
 
 # std::basic_ostream
 _ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]Ev;
@@ -509,7 +509,7 @@ GLIBCXX_3.4 {
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE5writeEPKw*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE6sentry*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE8_M_writeEPKw[ilx];
-_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE*[^Dg];
+_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE[^Dg]*;
 
 # std::ostream operators and inserters
 _ZSt4end[ls]I[cw]St11char_traitsI[cw]EERSt13basic_ostream*;


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-10 Thread Jakub Jelinek
On Thu, Jan 10, 2019 at 09:23:27PM +, Richard Sandiford wrote:
> > "noreturn"...  What would that mean, *exactly*?  It cannot execute any
> > code the compiler can see, so such asm is better off as real asm anyway
> > (not inline asm).
> 
> "Exactly" is a strong word, and this wasn't my proposal, but...
> I think it would act like a noreturn call to an unknown function.
> Output operands wouldn't make sense, and arguably clobbers wouldn't
> either.

"noreturn" asm can be done already now, just use
asm volatile ("..." ...);
__builtin_unreachable ();

I think there is no need to add a new syntax for that.

Jakub
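For reference, the idiom above might look like this in a complete function.
This is only a sketch: the function names hang_forever and checked_div are
invented, and the asm bodies are simple self-branches chosen per target for
illustration, with a plain loop as fallback elsewhere.

```c
/* Hypothetical example of an asm that never returns.  The asm itself
   spins forever; __builtin_unreachable () then tells GCC that control
   cannot continue past it.  */
static void
hang_forever (void)
{
#if defined(__x86_64__) || defined(__i386__)
  __asm__ volatile ("1: jmp 1b");
#elif defined(__aarch64__) || defined(__arm__)
  __asm__ volatile ("1: b 1b");
#else
  /* Fallback for other targets: an ordinary infinite loop.  */
  for (;;)
    __asm__ volatile ("");
#endif
  __builtin_unreachable ();
}

/* Caller: the division is only reached when B is nonzero, because the
   other path never comes back.  */
static int
checked_div (int a, int b)
{
  if (b == 0)
    hang_forever ();
  return a / b;
}
```

Note that this only expresses "control does not continue here"; it does not
by itself tell the compiler that the asm behaves like a call.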


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-10 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Tue, Jan 08, 2019 at 12:03:06PM +, Richard Sandiford wrote:
>> Bernd Edlinger  writes:
>> > Meanwhile I found out, that the stack clobber has only been ignored up to
>> > gcc-5 (at least with lra targets, not really sure about reload targets).
>> > From gcc-6 on, with the exception of PR arm/77904 which was a regression due
>> > to the underlying lra change, but fixed later, and back-ported to gcc-6.3.0,
>> > this works for all targets I tried so far.
>> >
>> > To me, it starts to look like a rather unique and useful feature, that I would
>> > like to keep working.
>> 
>> Not sure what you mean by "unique".  But forcing a frame is a bit of
>> a slippery concept.  Force it where?  For the asm only, or the whole
>> function?  This depends on optimisation and hasn't been consistent
>> across GCC versions, since it depends on the shrink-wrapping
>> optimisation.  (There was a similar controversy a while ago about
>> to what extent -fno-omit-frame-pointer should "force a frame".)
>
> It's not forcing a frame currently: it's just setting frame_pointer_needed.
> Whatever happens from that is the target's business.

Do you mean the asm clobber or -fno-omit-frame-pointer?  If the option,
then yeah, and that was exactly what was controversial :-)

>> The effect on the redzone seems like something that should be specified
>> explicitly rather than as an (accidental?) side effect of listing the
>> sp in the clobber list.  Maybe this would be another use for the "asm
>> attributes" proposal.  "noreturn" was another attribute suggested on
>> IRC yesterday.
>
> Redzone is target-dependent.

Right.  Target-dependent asm attributes wouldn't be a problem though.
Most other things about an asm are target-dependent anyway.

> "noreturn"...  What would that mean, *exactly*?  It cannot execute any
> code the compiler can see, so such asm is better off as real asm anyway
> (not inline asm).

"Exactly" is a strong word, and this wasn't my proposal, but...
I think it would act like a noreturn call to an unknown function.
Output operands wouldn't make sense, and arguably clobbers wouldn't
either.

Thanks,
Richard

>> But either way, the general feeling seems to be that going straight to a
>> hard error is too harsh, since there's quite a bit of existing code that
>> has the clobber.  This patch implements the compromise discussed on IRC
>> yesterday of making it a -Wdeprecated warning instead.
>
> The patch looks fine to me.  Thanks!
>
>
> Segher


patch to fix PR87305

2019-01-10 Thread Vladimir Makarov

The following patch fixes

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87305

The patch was bootstrapped and tested on x86-64 and ppc64 (be).

Committed as rev. 267823.
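
The new check in lra-assigns.c can be modelled roughly as follows.  This is a
simplified sketch, not the LRA code: a register class is reduced to a 64-bit
mask, and the function name and parameters are hypothetical stand-ins for
hard_regno_nregs, reg_class_contents and WORDS_BIG_ENDIAN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the added test.  CLASS_MASK has bit N set when
   hard register N belongs to the pseudo's allocno class.  On big-endian
   targets a paradoxical subreg starts NREGS_DIFF registers below the
   assigned hard register, so that lower register must exist and must be
   in the class; otherwise the pseudo has to be spilled.  */
static bool
paradoxical_allocation_ok (int hard_regno, int biggest_nregs,
                           int pseudo_nregs, uint64_t class_mask,
                           bool words_big_endian)
{
  int nregs_diff = biggest_nregs - pseudo_nregs;
  int first_regno = hard_regno - nregs_diff;

  /* The patch only applies this test when WORDS_BIG_ENDIAN is set.  */
  if (!words_big_endian)
    return true;
  if (first_regno < 0 || first_regno >= 64)
    return false;
  return ((class_mask >> first_regno) & 1) != 0;
}
```

For example, with a class covering registers 4..11, a pseudo assigned to
register 4 whose biggest mode needs one more register than its own mode fails
the test on big-endian, since register 3 is outside the class.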

Index: ChangeLog
===
--- ChangeLog	(revision 267822)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2019-01-10  Vladimir Makarov  
+
+	PR rtl-optimization/87305
+	* lra-assigns.c
+	(setup_live_pseudos_and_spill_after_risky_transforms): Check
+	allocation for big endian pseudos used as paradoxical subregs and
+	spill them if it is wrong.
+	* lra-constraints.c (lra_constraints): Add a comment.
+
 2019-01-10  Richard Biener  
 
 	PR tree-optimization/88792
Index: lra-assigns.c
===
--- lra-assigns.c	(revision 267822)
+++ lra-assigns.c	(working copy)
@@ -1146,12 +1146,12 @@ static void
 setup_live_pseudos_and_spill_after_risky_transforms (bitmap
 		 spilled_pseudo_bitmap)
 {
-  int p, i, j, n, regno, hard_regno;
+  int p, i, j, n, regno, hard_regno, biggest_nregs, nregs_diff;
   unsigned int k, conflict_regno;
   poly_int64 offset;
   int val;
   HARD_REG_SET conflict_set;
-  machine_mode mode;
+  machine_mode mode, biggest_mode;
   lra_live_range_t r;
   bitmap_iterator bi;
   int max_regno = max_reg_num ();
@@ -1166,8 +1166,26 @@ setup_live_pseudos_and_spill_after_risky
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
 if ((pic_offset_table_rtx == NULL_RTX
 	 || i != (int) REGNO (pic_offset_table_rtx))
-	&& reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
-  sorted_pseudos[n++] = i;
+	&& (hard_regno = reg_renumber[i]) >= 0 && lra_reg_info[i].nrefs > 0)
+  {
+	biggest_mode = lra_reg_info[i].biggest_mode;
+	biggest_nregs = hard_regno_nregs (hard_regno, biggest_mode);
+	nregs_diff = (biggest_nregs
+		  - hard_regno_nregs (hard_regno, PSEUDO_REGNO_MODE (i)));
+	enum reg_class rclass = lra_get_allocno_class (i);
+
+	if (WORDS_BIG_ENDIAN
+	&& (hard_regno - nregs_diff < 0
+		|| !TEST_HARD_REG_BIT (reg_class_contents[rclass],
+   hard_regno - nregs_diff)))
+	  {
+	/* Hard registers of paradoxical sub-registers are out of
+	   range of pseudo register class.  Spill the pseudo.  */
+	reg_renumber[i] = -1;
+	continue;
+	  }
+	sorted_pseudos[n++] = i;
+  }
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
   if (pic_offset_table_rtx != NULL_RTX
   && (regno = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
@@ -1206,10 +1224,11 @@ setup_live_pseudos_and_spill_after_risky
 	|| hard_regno != reg_renumber[conflict_regno])
 	  {
 	int conflict_hard_regno = reg_renumber[conflict_regno];
-	machine_mode biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
-	int biggest_nregs = hard_regno_nregs (conflict_hard_regno,
-		  biggest_mode);
-	int nregs_diff
+	
+	biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
+	biggest_nregs = hard_regno_nregs (conflict_hard_regno,
+	  biggest_mode);
+	nregs_diff
 	  = (biggest_nregs
 		 - hard_regno_nregs (conflict_hard_regno,
  PSEUDO_REGNO_MODE (conflict_regno)));
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 267822)
+++ lra-constraints.c	(working copy)
@@ -4739,7 +4739,9 @@ lra_constraints (bool first_p)
   else
 /* On the first iteration we should check IRA assignment
correctness.  In rare cases, the assignments can be wrong as
-   early clobbers operands are ignored in IRA.  */
+   early clobbers operands are ignored in IRA or usages of
+   paradoxical sub-registers are not taken into account by
+   IRA.  */
 lra_risky_transformations_p = first_p;
   new_insn_uid_start = get_max_uid ();
   new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 267822)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2019-01-10  Vladimir Makarov  
+
+	PR rtl-optimization/87305
+	* gcc.target/aarch64/pr87305.c: New.
+
 2019-01-10  Richard Biener  
 
 	PR tree-optimization/88792
Index: testsuite/gcc.target/aarch64/pr87305.c
===
--- testsuite/gcc.target/aarch64/pr87305.c	(nonexistent)
+++ testsuite/gcc.target/aarch64/pr87305.c	(working copy)
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mbig-endian -w" } */
+
+int cc;
+
+void
+rc (__int128 *oi)
+{
+  __int128 qz = (__int128)2 << cc;
+
+  if (qz != 0)
+{
+  if (cc != 0)
+{
+  __int128 zp = 1;
+
+  for (;;)
+{
+  unsigned __int128 *ar = 
+  int y5;
+
+  if (oi != 0)
+{
+ y3:
+  zp = *oi + *ar;
+}
+
+  y5 = (cc + 1) 

[PATCH] Update sinhatanh test

2019-01-10 Thread Giuliano Belinassi
Previously, the tests 'sinhatanh-2.c' and 'sinhatanh-3.c' did not count
the number of functions found in the tree dump. This patch addresses this
issue.
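
The difference between "appears at least once" and "appears exactly N times"
can be sketched in plain C.  This is only an illustration: the real
directives match regular expressions against the dump file, whereas the
hypothetical helper below counts literal substrings in a string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of NEEDLE in
   HAYSTACK.  A presence-only check corresponds to "count >= 1"; a
   counting check corresponds to "count == expected".  */
static size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t count = 0;
  size_t len = strlen (needle);

  if (len == 0)
    return 0;
  for (const char *p = strstr (haystack, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;
  return count;
}
```

A dump containing two calls to atanh passes a presence-only check and a
"2 times" check, but a dump with three calls would pass only the former.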

2019-01-10  Giuliano Belinassi  

* gcc.dg/sinhatanh-2.c: Count the number of functions.
* gcc.dg/sinhatanh-3.c: Likewise.

Index: gcc/testsuite/gcc.dg/sinhatanh-2.c
===
--- gcc/testsuite/gcc.dg/sinhatanh-2.c	(revision 267815)
+++ gcc/testsuite/gcc.dg/sinhatanh-2.c	(working copy)
@@ -57,12 +57,12 @@
 }
 
 /* There should be calls to sinh, cosh and atanh */
-/* { dg-final { scan-tree-dump "cosh " "optimized" } } */
-/* { dg-final { scan-tree-dump "sinh " "optimized" } } */
-/* { dg-final { scan-tree-dump "atanh " "optimized" } } */
-/* { dg-final { scan-tree-dump "coshf " "optimized" } } */
-/* { dg-final { scan-tree-dump "sinhf " "optimized" } } */
-/* { dg-final { scan-tree-dump "atanhf " "optimized" } } */
-/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
-/* { dg-final { scan-tree-dump "sinhl " "optimized" } } */
-/* { dg-final { scan-tree-dump "atanhl " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "cosh " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "sinh " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "atanh " "2" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "coshf " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "sinhf " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "atanhf " "2" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "coshl " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "sinhl " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "atanhl " "2" "optimized" } } */
Index: gcc/testsuite/gcc.dg/sinhatanh-3.c
===
--- gcc/testsuite/gcc.dg/sinhatanh-3.c	(revision 267815)
+++ gcc/testsuite/gcc.dg/sinhatanh-3.c	(working copy)
@@ -51,12 +51,12 @@
 }
 
 /* There should be calls to sinh, cosh and atanh */
-/* { dg-final { scan-tree-dump "cosh " "optimized" } } */
-/* { dg-final { scan-tree-dump "sinh " "optimized" } } */
-/* { dg-final { scan-tree-dump "atanh " "optimized" } } */
-/* { dg-final { scan-tree-dump "coshf " "optimized" } } */
-/* { dg-final { scan-tree-dump "sinhf " "optimized" } } */
-/* { dg-final { scan-tree-dump "atanhf " "optimized" } } */
-/* { dg-final { scan-tree-dump "coshl " "optimized" } } */
-/* { dg-final { scan-tree-dump "sinhl " "optimized" } } */
-/* { dg-final { scan-tree-dump "atanhl " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "cosh " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "sinh " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "atanh " "2" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "coshf " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "sinhf " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "atanhf " "2" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "coshl " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "sinhl " "1" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "atanhl " "2" "optimized" } } */


[Patch, fortran] Fix PR59345, repacking of a packed temporary array

2019-01-10 Thread Thomas Koenig

Hello world,

the attached patch fixes a rather bad missed optimization, where
the generated temporary array for

SUBROUTINE S1(A)
 REAL :: A(3)
 CALL S2(-A)
END SUBROUTINE


was packed and unpacked(!).
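
As a rough model of why the pack step is redundant here, consider the C
sketch below.  It is not the libgfortran implementation; the function name
pack_if_needed and its parameters are invented for illustration.  A pack
helper only has work to do when the data is strided; a freshly generated
temporary for an expression such as -A is already contiguous.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pack helper.  Returns BASE unchanged when
   the data is already contiguous (stride 1); otherwise returns a newly
   allocated contiguous copy that the caller must free (NULL on
   allocation failure).  */
static float *
pack_if_needed (float *base, size_t n, ptrdiff_t stride)
{
  if (stride == 1)
    return base;		/* Already packed: nothing to do.  */

  float *tmp = malloc (n * sizeof *tmp);
  if (tmp == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    tmp[i] = base[(ptrdiff_t) i * stride];
  return tmp;
}
```

When the compiler already knows the argument is a contiguous temporary, it
can skip emitting the call altogether, which is what the patch does.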

Regression-tested. OK for trunk?

Regards

Thomas

2019-01-10  Thomas Koenig  

PR fortran/59345
* trans-array.c (gfc_conv_array_parameter):  Temporary
arrays generated for expressions do not need to be repacked.

2019-01-10  Thomas Koenig  

PR fortran/59345
* gfortran.dg/internal_pack_16.f90: New test.
Index: trans-array.c
===
--- trans-array.c	(revision 267737)
+++ trans-array.c	(working copy)
@@ -7866,6 +7866,12 @@ gfc_conv_array_parameter (gfc_se * se, gfc_expr *
 
   no_pack = contiguous && no_pack;
 
+  /* If we have an expression, an array temporary will be
+ generated which does not need to be packed / unpacked
+ if passed to an explicit-shape dummy array.  */
+
+  no_pack = no_pack || (g77 && expr->expr_type == EXPR_OP);
+
   /* Array constructors are always contiguous and do not need packing.  */
   array_constructor = g77 && !this_array_result && expr->expr_type == EXPR_ARRAY;
 
! { dg-do compile }
! { dg-additional-options "-fdump-tree-original" }
! PR 59345 - pack/unpack was not needed here.
SUBROUTINE S1(A)
 REAL :: A(3)
 CALL S2(-A)
END SUBROUTINE
! { dg-final { scan-tree-dump-not "_gfortran_internal_pack" "original" } }
! { dg-final { scan-tree-dump-not "_gfortran_internal_unpack" "original" } }


[PATCH 2/2][GCC][ARM] Implement hint intrinsics for ARM

2019-01-10 Thread Srinath Parvathaneni
Hi All,

This patch implements the ACLE hint intrinsics (nop, yield, wfe, wfi, sev
and sevl) for all ARM targets.

The intrinsics specification will be published on the Arm website [1].

[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf

Bootstrapped on arm-none-linux-gnueabihf, regression tested on
arm-none-eabi with no regressions, and ran the added tests for arm,
thumb-1 and thumb-2 modes.

Ok for trunk? If ok, could someone commit the patch on my behalf? I
don't have commit rights.

Thanks,
Srinath

gcc/ChangeLog:

2019-01-10  Srinath Parvathaneni  

* config/arm/arm-builtins.c (NOP_QUALIFIERS): New qualifier.
(arm_expand_builtin_args): New case.
* config/arm/arm.md (yield): New pattern name.
(wfe): Likewise.
(wfi): Likewise.
(sev): Likewise.
(sevl): Likewise.
* config/arm/arm_acle.h (__nop): New inline function.
(__yield): Likewise.
(__sev): Likewise.
(__sevl): Likewise.
(__wfi): Likewise.
(__wfe): Likewise.
* config/arm/arm_acle_builtins.def (VAR1):
(nop): New builtin definitions.
(yield): Likewise.
(sev): Likewise.
(sevl): Likewise.
(wfi): Likewise.
(wfe): Likewise.
* config/arm/unspecs.md (unspecv):
(VUNSPEC_YIELD): New volatile unspec.
(VUNSPEC_SEV): Likewise.
(VUNSPEC_SEVL): Likewise.
(VUNSPEC_WFI): Likewise.

gcc/testsuite/ChangeLog:

2019-01-10  Srinath Parvathaneni  

* gcc.target/arm/acle/nop.c: New test.
* gcc.target/arm/acle/sev-1.c: Likewise.
* gcc.target/arm/acle/sev-2.c: Likewise.
* gcc.target/arm/acle/sev-3.c: Likewise.
* gcc.target/arm/acle/sevl-1.c: Likewise.
* gcc.target/arm/acle/sevl-2.c: Likewise.
* gcc.target/arm/acle/sevl-3.c: Likewise.
* gcc.target/arm/acle/wfe-1.c: Likewise.
* gcc.target/arm/acle/wfe-2.c: Likewise.
* gcc.target/arm/acle/wfe-3.c: Likewise.
* gcc.target/arm/acle/wfi-1.c: Likewise.
* gcc.target/arm/acle/wfi-2.c: Likewise.
* gcc.target/arm/acle/wfi-3.c: Likewise.
* gcc.target/arm/acle/yield-1.c: Likewise.
* gcc.target/arm/acle/yield-2.c: Likewise.
* gcc.target/arm/acle/yield-3.c: Likewise.




diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 563ca51dcd0d63046d2bf577ca86d5f70a466bcf..2afa9649813c0f37a803db5add1139067d83a343 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -85,6 +85,12 @@ enum arm_type_qualifiers
   qualifier_const_void_pointer = 0x802
 };
 
+/* The qualifier allows generation of builtins with no operands.  */
+static enum arm_type_qualifiers
+arm_nop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_void };
+#define NOP_QUALIFIERS (arm_nop_qualifiers)
+
 /*  The qualifier_internal allows generation of a unary builtin from
 a pattern with a third pseudo-operand such as a match_scratch.
 T (T).  */
@@ -2343,6 +2349,10 @@ constant_arg:
   else
 switch (argc)
   {
+  case 0:
+pat = GEN_FCN (icode) ();
+break;
+
   case 1:
 	pat = GEN_FCN (icode) (op[0]);
 	break;
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f6196e9316898e3258e08d8f2ece8fe9640676ca..36b24cfdfa6c61d952a5c704f54d37f2b0fdd34e 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8906,6 +8906,76 @@
(set_attr "type" "mov_reg")]
 )
 
+(define_insn "yield"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_YIELD)]
+  ""
+{
+  if (TARGET_ARM)
+return ".inst\t0xe320f001\t//yield";
+  else if(TARGET_THUMB2)
+return ".inst\t0xf3af8001\t//yield";
+  else /* TARGET_THUMB1 */
+return ".inst\t0xbf10\t//yield";
+}
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "wfe"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_WFE)]
+  ""
+{
+  if (TARGET_ARM)
+return ".inst\t0xe320f002\t//wfe";
+  else if(TARGET_THUMB2)
+return ".inst\t0xf3af8002\t//wfe";
+  else /* TARGET_THUMB1 */
+return ".inst\t0xbf20\t//wfe";
+}
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "wfi"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_WFI)]
+  ""
+{
+  if (TARGET_ARM)
+return ".inst\t0xe320f003\t//wfi";
+  else if(TARGET_THUMB2)
+return ".inst\t0xf3af8003\t//wfi";
+  else /* TARGET_THUMB1 */
+return ".inst\t0xbf30\t//wfi";
+}
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "sev"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SEV)]
+  ""
+{
+  if (TARGET_ARM)
+return ".inst\t0xe320f004\t//sev";
+  else if(TARGET_THUMB2)
+return ".inst\t0xf3af8004\t//sev";
+  else /* TARGET_THUMB1 */
+return ".inst\t0xbf40\t//sev";
+}
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "sevl"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SEVL)]
+  ""
+{
+  if (TARGET_ARM)
+return ".inst\t0xe320f005\t//sevl";
+  else if(TARGET_THUMB2)
+return ".inst\t0xf3af8005\t//sevl";
+  else /* TARGET_THUMB1 */
+

[PATCH 1/2][GCC][AArch64] Implement hint intrinsics for AArch64

2019-01-10 Thread Srinath Parvathaneni
Hi All,

This patch implements the ACLE hint intrinsics (nop, yield, wfe, wfi,
sev and sevl) for AArch64.

The instructions are documented in the ArmARM [1] and the intrinsics
specification will be published on the Arm website [2].

[1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
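
As an illustration of where such a hint is typically used, here is a sketch
of a bounded spin-wait.  It deliberately uses inline asm rather than the
intrinsics, since their availability depends on this patch; the helper names
cpu_relax and spin_until_set are hypothetical, and on non-AArch64 targets the
hint degrades to a compiler barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: issue a YIELD hint on AArch64, otherwise just a
   compiler barrier.  With the proposed intrinsics this could presumably
   be written as a call to __yield () instead.  */
static inline void
cpu_relax (void)
{
#if defined(__aarch64__)
  __asm__ volatile ("yield" ::: "memory");
#else
  __asm__ volatile ("" ::: "memory");
#endif
}

/* Spin until *FLAG becomes true or MAX_SPINS iterations have passed.
   Returns the number of iterations spent waiting.  */
static unsigned
spin_until_set (atomic_bool *flag, unsigned max_spins)
{
  unsigned spins = 0;

  while (!atomic_load_explicit (flag, memory_order_acquire)
	 && spins < max_spins)
    {
      cpu_relax ();
      spins++;
    }
  return spins;
}
```

The hint does not change program semantics; it only tells the core that the
thread is busy-waiting.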

Bootstrapped on aarch64-none-linux-gnu and regression tested on
aarch64-none-elf with no regressions.

Ok for trunk? If ok, could someone commit the patch on my behalf? I
don't have commit rights.

Thanks,
Srinath

gcc/ChangeLog:

2019-01-10  Srinath Parvathaneni  

* config/aarch64/aarch64.md (yield): New pattern name.
(wfe): Likewise.
(wfi): Likewise.
(sev): Likewise.
(sevl): Likewise.
(UNSPECV_YIELD): New volatile unspec.
(UNSPECV_WFE): Likewise.
(UNSPECV_WFI): Likewise.
(UNSPECV_SEV): Likewise.
(UNSPECV_SEVL): Likewise.
* config/aarch64/aarch64-builtins.c (aarch64_builtins):
AARCH64_SYSHINTOP_BUILTIN_NOP: New builtin.
AARCH64_SYSHINTOP_BUILTIN_YIELD: Likewise.
AARCH64_SYSHINTOP_BUILTIN_WFE: Likewise.
AARCH64_SYSHINTOP_BUILTIN_WFI: Likewise.
AARCH64_SYSHINTOP_BUILTIN_SEV: Likewise.
AARCH64_SYSHINTOP_BUILTIN_SEVL: Likewise.
(aarch64_init_syshintop_builtins): New function.
(aarch64_init_builtins): New call statement.
(aarch64_expand_builtin): New case.
* config/aarch64/arm_acle.h (__nop): New inline function.
(__yield): Likewise.
(__sev): Likewise.
(__sevl): Likewise.
(__wfi): Likewise.
(__wfe): Likewise.

gcc/testsuite/ChangeLog:

2019-01-10  Srinath Parvathaneni  

* gcc.target/aarch64/acle/hint-1.c: New test.
* gcc.target/aarch64/acle/hint-2.c: Likewise.






diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 8cced94567008e28b1761ec8771589a3925f2904..d5424f98df1f5c8f206cbded097bdd2dfcd1ca8e 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -399,6 +399,13 @@ enum aarch64_builtins
   AARCH64_PAUTH_BUILTIN_AUTIA1716,
   AARCH64_PAUTH_BUILTIN_PACIA1716,
   AARCH64_PAUTH_BUILTIN_XPACLRI,
+  /* System Hint Operation Builtins for AArch64.  */
+  AARCH64_SYSHINTOP_BUILTIN_NOP,
+  AARCH64_SYSHINTOP_BUILTIN_YIELD,
+  AARCH64_SYSHINTOP_BUILTIN_WFE,
+  AARCH64_SYSHINTOP_BUILTIN_WFI,
+  AARCH64_SYSHINTOP_BUILTIN_SEV,
+  AARCH64_SYSHINTOP_BUILTIN_SEVL,
   AARCH64_BUILTIN_MAX
 };
 
@@ -977,6 +984,39 @@ aarch64_init_pauth_hint_builtins (void)
 			NULL_TREE);
 }
 
+/* System Hint Operation builtins for AArch64.  */
+void
+aarch64_init_syshintop_builtins (void)
+{
+  tree vtype_node
+= build_function_type_list (void_type_node, NULL);
+
+  aarch64_builtin_decls[AARCH64_SYSHINTOP_BUILTIN_NOP]
+= add_builtin_function ("__builtin_aarch64_nop", vtype_node,
+AARCH64_SYSHINTOP_BUILTIN_NOP, BUILT_IN_MD, NULL,
+NULL_TREE);
+  aarch64_builtin_decls[AARCH64_SYSHINTOP_BUILTIN_YIELD]
+= add_builtin_function ("__builtin_aarch64_yield", vtype_node,
+AARCH64_SYSHINTOP_BUILTIN_YIELD, BUILT_IN_MD, NULL,
+NULL_TREE);
+  aarch64_builtin_decls[AARCH64_SYSHINTOP_BUILTIN_WFE]
+= add_builtin_function ("__builtin_aarch64_wfe", vtype_node,
+AARCH64_SYSHINTOP_BUILTIN_WFE, BUILT_IN_MD, NULL,
+NULL_TREE);
+  aarch64_builtin_decls[AARCH64_SYSHINTOP_BUILTIN_WFI]
+= add_builtin_function ("__builtin_aarch64_wfi", vtype_node,
+AARCH64_SYSHINTOP_BUILTIN_WFI, BUILT_IN_MD, NULL,
+NULL_TREE);
+  aarch64_builtin_decls[AARCH64_SYSHINTOP_BUILTIN_SEV]
+= add_builtin_function ("__builtin_aarch64_sev", vtype_node,
+AARCH64_SYSHINTOP_BUILTIN_SEV, BUILT_IN_MD, NULL,
+NULL_TREE);
+  aarch64_builtin_decls[AARCH64_SYSHINTOP_BUILTIN_SEVL]
+= add_builtin_function ("__builtin_aarch64_sevl", vtype_node,
+AARCH64_SYSHINTOP_BUILTIN_SEVL, BUILT_IN_MD, NULL,
+NULL_TREE);
+}
+
 void
 aarch64_init_builtins (void)
 {
@@ -1014,6 +1054,7 @@ aarch64_init_builtins (void)
  register them.  */
   if (!TARGET_ILP32)
 aarch64_init_pauth_hint_builtins ();
+  aarch64_init_syshintop_builtins ();
 }
 
 tree
@@ -1395,6 +1436,29 @@ aarch64_expand_builtin (tree exp,
 	}
 
   return target;
+case AARCH64_SYSHINTOP_BUILTIN_NOP:
+  emit_insn (GEN_FCN (CODE_FOR_nop) ());
+  return gen_reg_rtx (VOIDmode);
+
+case AARCH64_SYSHINTOP_BUILTIN_YIELD:
+  emit_insn (GEN_FCN (CODE_FOR_yield) ());
+  return gen_reg_rtx (VOIDmode);
+
+case 

Re: [Committed][AArch64] Fix PR62178 testcase failures

2019-01-10 Thread Wilco Dijkstra
I've backported this to GCC8 too since it had the same failures:

The testcase for PR62178 has been failing for a while due to the pass
conditions being too tight, resulting in failures with -mcmodel=tiny:

ldr q2, [x0], 124
ld1r {v1.4s}, [x1], 4
cmp x0, x2
mla v0.4s, v2.4s, v1.4s
bne .L7

-mcmodel=small generates the slightly different:

ldr q1, [x0], 124
ldr s2, [x1, 4]!
cmp x0, x2
mla v0.4s, v1.4s, v2.s[0]
bne .L7

This is due to Combine merging a DUP instruction with either a load
or MLA - we can't force it to prefer one over the other.  However the
generated vector loop is fast either way since it generates MLA and
merges the DUP either with a load or MLA.  So relax the conditions
slightly and check we still generate MLA and there is no DUP or FMOV.

The testcase now passes - committed as obvious.

ChangeLog
2019-01-09  Wilco Dijkstra

testsuite/
* gcc.target/aarch64/pr62178.c: Relax scan-assembler checks.

--- gcc/testsuite/gcc.target/aarch64/pr62178.c  (revision 266178)
+++ gcc/testsuite/gcc.target/aarch64/pr62178.c  (working copy)
@@ -18,5 +18,5 @@
 
 /* { dg-final { scan-assembler "ldr\\tq\[0-9\]+, \\\[x\[0-9\]+\\\], \[0-9\]+" } } */
 /* { dg-final { scan-assembler "mla\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+" } } */
-/* { dg-final { scan-assembler-not { dup } } } */
-/* { dg-final { scan-assembler-not { fmov } } } */
+/* { dg-final { scan-assembler-not {dup} } } */
+/* { dg-final { scan-assembler-not {fmov} } } */


Re: [PATCH] PR fortran/86322 -- Enforce F2018:C877

2019-01-10 Thread Thomas Koenig

Hi Steve!


The attached patch fixes the PR.  gfortran was not enforcing
F2018:C877 and would ICE.  Tested on x86_64-*-freebsd.  Ok to
commit?


OK.

Thanks for the patch!

Regards

Thomas


[Committed, GCC, AArch64] Disable tests for ilp32.

2019-01-10 Thread Sudakshina Das
Hi

Currently Return Address Signing is only supported in lp64. Thus the
tests that I added recently (which enable return address signing via the
-mbranch-protection=standard option) should also be exempted from testing in
ilp32. This patch adds the needed dg-require-effective-target directive in the
tests.

*** gcc/testsuite/ChangeLog ***

2019-01-10  Sudakshina Das  

* gcc.target/aarch64/bti-1.c: Exempt for ilp32.
* gcc.target/aarch64/bti-2.c: Likewise.
* gcc.target/aarch64/bti-3.c: Likewise.

Only test directive change, hence only tested the above tests with:
RUNTESTFLAGS="--target_board \"unix{-mabi=ilp32}\" aarch64.exp="

Committed as obvious as r267818

Thanks
Sudi
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-1.c b/gcc/testsuite/gcc.target/aarch64/bti-1.c
index 975528cbf290af421f20d8c7edaef22a6bd6..5a556b08ed15679b25676a11fe9c7a64641ee671 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* -Os to create jump table.  */
 /* { dg-options "-Os" } */
+/* { dg-require-effective-target lp64 } */
 /* If configured with --enable-standard-branch-protection, don't use
command line option.  */
 /* { dg-additional-options "-mbranch-protection=standard" { target { ! default_branch_protection } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-2.c b/gcc/testsuite/gcc.target/aarch64/bti-2.c
index 85943c3d6415b010c858cb948221e33b0d30a310..6ad89284e1b74ec92ff4661e6a71c92230450d58 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target aarch64_bti_hw } */
 /* If configured with --enable-standard-branch-protection, don't use
command line option.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/bti-3.c b/gcc/testsuite/gcc.target/aarch64/bti-3.c
index 97cf5d37f42b9313da75481c2ceac884735ac995..9ff9f9d6be1d8708f34f50dc7303a1783c18f204 100644
--- a/gcc/testsuite/gcc.target/aarch64/bti-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/bti-3.c
@@ -1,6 +1,7 @@
 /* This is a copy of gcc/testsuite/gcc.c-torture/execute/pr56982.c to test the
setjmp case of the bti pass.  */
 /* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target aarch64_bti_hw } */
 /* { dg-options "--save-temps -mbranch-protection=standard" } */
 


Re: [PATCH, d] Add README for process contributing to dmd and phobos

2019-01-10 Thread Joseph Myers
On Thu, 10 Jan 2019, Iain Buclaw wrote:

> Hi,
> 
> Joseph made mention that there isn't a readme documenting where
> changes to d/dmd, libphobos/libdruntime, and libphobos/src should go.
> 
> I hope this clears things up.  OK for trunk?

This sort of patch is clearly covered by D maintainership.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Move PR84877 fix elsewhere (PR bootstrap/88450)

2019-01-10 Thread Richard Biener
On January 10, 2019 4:36:35 PM GMT+01:00, Eric Botcazou wrote:
>> Another problem is that in way too many cases we decide to choose
>> BIGGEST_ALIGNMENT for stack slots, even when not strictly needed.  E.g. any
>> BLKmode stack slot requests that BIGGEST_ALIGNMENT, even if TYPE_ALIGN is
>> much smaller and assign_stack_local_1 even asserts that for BLKmode the
>> alignment must be BIGGEST_ALIGNMENT.  E.g. on mingw with -mavx or -mavx512f
>> BIGGEST_ALIGNMENT is 256 or 512 bits, but MAX_SUPPORTED_STACK_ALIGNMENT is
>> 128 bits.  So the PR84877 change, even if it didn't cause wrong-code, causes
>> huge amounts of stack slots to be unnecessarily overaligned with dynamic
>> realignment.
>
>Yes, I think that we need to make sure that we dynamically realign only when
>this is necessary and...
>
>> The following patch reverts to the previous behavior and moves this dynamic
>> stack realignment to the caller that needs it, which doesn't do caching and
>> where we can do it solely for this overaligned DECL_ALIGN.
>
>...this modified implementation looks much safer than the original one.
>
>> If there are other spots that need this, wondering about:
>>   else
>> copy = assign_temp (type, 1, 0);
>> in calls.c, either it can be done by using the variable-sized object
>> case in the then block, or could be done using assign_stack_local +
>> this short realignment too.
>
>The latter I'd say.
>
>> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
>> bootstrapped on powerpc64-linux (regtest still pending there).
>> 
>> Ok for trunk?
>> 
>> 2019-01-09  Jakub Jelinek  
>> 
>>  PR middle-end/84877
>>  PR bootstrap/88450
>>  * function.c (assign_stack_local_1): Revert the 2018-11-21 changes.
>>  (assign_parm_setup_block): Do the argument slot realignment here
>>  instead.
>
>FWIW this looks OK to me.

Given that, 

OK.
Richard. 



Re: Substitute all "the the" with "the"

2019-01-10 Thread Joseph Myers
On Thu, 10 Jan 2019, Дилян Палаузов wrote:

> sed -i "s/the the/the/" `git grep -l "the the"`

That looks wrong; there are plenty of instances of "the theory", "the then 
branch" and similar that should not have such a substitution applied.
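
To illustrate the objection (a sketch only, not a proposed tool): a duplicated "the" should only count when both words are whole words. The hypothetical helper count_doubled_the below checks for a word boundary on each side of the match, so "the theory" and "the then branch" are left alone. It assumes lower-case text and a single space between the two words.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if C can be part of a word.  */
static int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count occurrences of "the the" in TEXT where both words are whole
   words, i.e. not preceded or followed by another word character.  */
static size_t
count_doubled_the (const char *text)
{
  static const char pat[] = "the the";
  const size_t len = sizeof pat - 1;
  size_t count = 0;

  assert (text != NULL);
  for (const char *p = strstr (text, pat); p != NULL;
       p = strstr (p + 1, pat))
    {
      int starts_word = (p == text) || !is_word_char (p[-1]);
      int ends_word = !is_word_char (p[len]);   /* p[len] may be '\0' */
      if (starts_word && ends_word)
        count++;
    }
  return count;
}
```

A plain substring match, which is what the sed expression does, would report a hit for every input that contains "the the" anywhere, including inside "the theory".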

See Sandra's patch submissions for "can not", and the discussion thereof, 
for examples of how to propose such changes, including various files and 
directories in the source tree that should be excluded because they e.g. 
come from upstream sources maintained through different processes, or in 
the case of testcases because such fixes are not particularly relevant to 
them and unnecessarily perturb the code.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH][GCC][AArch64] Fix command line options canonicalization. (PR target/88530)

2019-01-10 Thread Kyrill Tkachov

Hi Tamar,

On 17/12/18 19:18, Tamar Christina wrote:

Hi All,

The options don't seem to get canonicalized into the smallest possible set
before output to the assembler. This means that overlapping feature sets are
emitted with superfluous parts.

Normally this isn't an issue, but in the case of crypto we have retro-actively
split it into aes and sha2. We need to emit only +crypto to the assembler
so old assemblers continue to work.

Because of how -mcpu=native and -march=native work they end up enabling all
feature bits, so we need to get the smallest possible set, which would also
fix the problem with the older assemblers and the retro-active split.

Admittedly this should be done earlier in options processing, but the problem
with the way AArch64 currently processes options is that where the isa_bits are
determined we don't know which options are part of the default set yet.

Which is why we instead do it late in processing when we have all the
information.  This however requires us to make a duplicate of the extensions
list.

The option handling structures have been extended with a boolean to indicate
whether the option is synthetic, by which I mean whether the option flag itself
has a bit (a synthetic option has none).

e.g. +crypto isn't an actual bit, it just enables other bits, but other feature
flags like +rdma also enable multiple options but are themselves also a feature.

There are two ways to solve this.

1) Either have the options that are feature bits also turn themselves on, e.g.
   change rdma to turn on FP, SIMD and RDMA as dependency bits.
2) Make a distinction between these two different types of features and have
   the framework handle it correctly.

Even though it's more code I went for the second approach, as it's the one
that'll be less fragile and give the least surprises.

This is a stop-gap measure that has the lowest impact and is back-portable.

Effectively this patch changes the following:

The values before the => are from the old compiler and those after the => from the new code.

-march=armv8.2-a+crypto+sha2 => -march=armv8.2-a+crypto
-march=armv8.2-a+sha2+aes => -march=armv8.2-a+crypto

The remaining behaviors stay the same.
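
The mapping above can be sketched as a greedy pass over the extensions, largest first. This is only an illustration under assumptions, not the code in the patch: the bit values, the table contents and the name canonical_suffix are hypothetical, and a greedy choice is not guaranteed to be minimal for arbitrary feature tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature bits; GCC's real values differ.  */
enum { F_FP = 1u << 0, F_SIMD = 1u << 1, F_AES = 1u << 2, F_SHA2 = 1u << 3 };

struct ext
{
  const char *name;
  unsigned on;      /* every bit switched on by "+name" */
};

/* "crypto" is synthetic: it has no bit of its own, it only implies others.  */
static struct ext exts[] = {
  { "fp", F_FP },
  { "simd", F_SIMD | F_FP },
  { "aes", F_AES | F_SIMD | F_FP },
  { "sha2", F_SHA2 | F_SIMD | F_FP },
  { "crypto", F_AES | F_SHA2 | F_SIMD | F_FP },
};

static int
bit_count (unsigned x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)   /* clear the lowest set bit each round */
    n++;
  return n;
}

/* qsort comparator: extensions that enable more bits sort first.  */
static int
by_bits_desc (const void *a, const void *b)
{
  const struct ext *ea = a, *eb = b;
  return bit_count (eb->on) - bit_count (ea->on);
}

/* Write a short "+ext..." suffix describing ISA beyond DEFAULTS to OUT.  */
static void
canonical_suffix (unsigned isa, unsigned defaults, char *out, size_t size)
{
  unsigned covered = defaults;
  size_t n = sizeof exts / sizeof exts[0];

  assert (out != NULL && size > 0);
  out[0] = '\0';
  qsort (exts, n, sizeof exts[0], by_bits_desc);
  for (size_t i = 0; i < n; i++)
    {
      int usable = (isa & exts[i].on) == exts[i].on;  /* fully enabled?  */
      int adds = (exts[i].on & ~covered) != 0;        /* anything new?  */
      if (usable && adds
          && strlen (out) + strlen (exts[i].name) + 2 <= size)
        {
          strcat (out, "+");
          strcat (out, exts[i].name);
          covered |= exts[i].on;
        }
    }
}
```

With FP and SIMD as the architecture defaults, the flags produced by either +crypto+sha2 or +sha2+aes are fully covered by crypto, so only "+crypto" is emitted; with SHA2 alone, crypto is not fully enabled and "+sha2" is emitted instead.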


Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for trunk?



This will need rebasing over the Armv8.5-A patches as there are new entries in
config/aarch64/aarch64-option-extensions.def.
Since this has to be done anyway, I've also pointed out a few comment typos
inline.

Apart from that, the patch looks good to me (this is a subtle area of GCC).

Thanks,
Kyrill



Thanks,
Tamar

gcc/ChangeLog:

2018-12-17  Tamar Christina  

PR target/88530
* common/config/aarch64/aarch64-common.c
(struct aarch64_option_extension): Add is_synthetic.
(all_extensions): Use it.
(TARGET_OPTION_INIT_STRUCT): Define hook.
(struct gcc_targetm_common): Moved to end.
(all_extensions_by_on): New.
(opt_ext_cmp, typedef opt_ext): New.
(aarch64_option_init_struct): New.
(aarch64_contains_opt): New.
(aarch64_get_extension_string_for_isa_flags): Output smallest set.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Explicitly include AES and SHA2 in crypto.
(fp, simd, crc, lse, fp16, rcpc, rdma, dotprod, aes, sha2, sha3,
sm4, fp16fml, sve, profile): Set is_synthetic to false.
(crypto): Set is_synthetic to true.

gcc/testsuite/ChangeLog:

2018-12-17  Tamar Christina  

PR target/88530
* gcc.target/aarch64/options_set_1.c: New test.
* gcc.target/aarch64/options_set_2.c: New test.
* gcc.target/aarch64/options_set_3.c: New test.
* gcc.target/aarch64/options_set_4.c: New test.
* gcc.target/aarch64/options_set_5.c: New test.
* gcc.target/aarch64/options_set_6.c: New test.
* gcc.target/aarch64/options_set_7.c: New test.
* gcc.target/aarch64/options_set_8.c: New test.
* gcc.target/aarch64/options_set_9.c: New test.

--


diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index dd7d42673402c3cf16ebce009d263d62d574690a..d14237d229fd958c940aee32d3d6404b04cc137e 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -46,6 +46,8 @@
 #define TARGET_OPTION_DEFAULT_PARAMS aarch64_option_default_params
 #undef TARGET_OPTION_VALIDATE_PARAM
 #define TARGET_OPTION_VALIDATE_PARAM aarch64_option_validate_param
+#undef TARGET_OPTION_INIT_STRUCT
+#define TARGET_OPTION_INIT_STRUCT aarch64_option_init_struct
 
 /* Set default optimization options.  */

 static const struct default_options aarch_option_optimization_table[] =
@@ -164,8 +166,6 @@ aarch64_handle_option (struct gcc_options *opts,
 }
 }
 
-struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;

-
 /* An ISA extension in the co-processor and main instruction set space.  */
 struct aarch64_option_extension
 {
@@ 

Re: [PATCH][GCC][AArch64] Have empty HWCAPs string ignored during native feature detection

2019-01-10 Thread Kyrill Tkachov

Hi Tamar,

On 18/12/18 13:36, Tamar Christina wrote:

Hi All,

This patch makes the feature detection code for AArch64 GCC not add features
automatically when the feature had no hwcaps string to match against.

This means that -mcpu=native no longer adds feature flags such as +profile.
The behavior wasn't noticed before because at the time +profile was added a bug
was preventing any feature bits from being added by native detections.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-12-18  Tamar Christina  

PR target/88530
* config/aarch64/aarch64-option-extensions.def: Document it.
* config/aarch64/driver-aarch64.c (host_detect_local_cpu): Skip feature
if empty hwcaps.

gcc/testsuite/ChangeLog:

2018-12-18  Tamar Christina  

PR target/88530
* gcc.target/aarch64/options_set_10.c: New test.

--


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index cdf04e2d5fcccb8b9a32af8f83501ce23212bbab..323e642af2e87c2d463681c3a3efbaeff2ede018 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -43,7 +43,8 @@
the extension (for example, the 'crypto' extension depends on four
entries: aes, pmull, sha1, sha2 being present).  In that case this field
should contain a space (" ") separated list of the strings in 'Features'
-   that are required.  Their order is not important.  */
+   that are required.  Their order is not important.  An empty string means
+   do not detect this feature during auto detection.  */
 
 /* Enabling "fp" just enables "fp".

Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2",
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
index 98f9d7959506338bd6a8524500a168cc22ef5396..4f386dbf5fc29cc54ff85e062d0b9cd146fa00e8 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -253,6 +253,12 @@ host_detect_local_cpu (int argc, const char **argv)
  char *p = NULL;
  char *feat_string
= concat (aarch64_extensions[i].feat_string, NULL);
+
+ /* If the feature contains no HWCAPS string then ignore it for the
+auto detection.  */
+ if (strlen (feat_string) == 0)
+   continue;

I think this can avoid a strlen call by checking (*feat_string == '\0') though
I believe most compilers will optimise it that way anyway.
It might be more immediately readable your way.
I wouldn't let it hold off this patch.
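
For reference, the two spellings of the check are equivalent. A minimal sketch, assuming hypothetical helper names (hwcaps_string_is_empty, count_detectable) rather than anything in driver-aarch64.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if FEAT_STRING names no HWCAPS feature at all.
   Only the first character is inspected.  */
static int
hwcaps_string_is_empty (const char *feat_string)
{
  assert (feat_string != NULL);
  /* Same answer as the strlen form, checked here for illustration.  */
  assert ((*feat_string == '\0') == (strlen (feat_string) == 0));
  return *feat_string == '\0';
}

/* Count the entries that auto detection would consider, skipping
   those whose feature string is empty.  */
static size_t
count_detectable (const char *const *feat_strings, size_t n)
{
  size_t detectable = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (hwcaps_string_is_empty (feat_strings[i]))
        continue;
      detectable++;
    }
  return detectable;
}
```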

Looks good to me, but you'll need a maintainer to approve.

Thanks,
Kyrill


+
  bool enabled = true;
 
 	  /* This may be a multi-token feature string.  We need

diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_10.c b/gcc/testsuite/gcc.target/aarch64/options_set_10.c
new file mode 100644
index ..5ffe83c199165dd4129814674297056bdf27cd83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/options_set_10.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target "aarch64*-*-linux*" } } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main ()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not {\.arch .+\+profile.*} } } */
+
+ /* Check that an empty feature string is not detected during mcpu=native.  */



Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On 03/12/2018 16:39, Ard Biesheuvel wrote:
> On Mon, 3 Dec 2018 at 10:55, Ramana Radhakrishnan
>  wrote:
>>
>> For quite some time the kernel guys (more specifically Ard) have been
>> talking about using a system register (sp_el0) and an offset from that
>> for a canary based access. This patchset adds support for a new set of
>> command line options similar to how powerpc has done this.
>>
>> I don't intend to change the defaults in userland, we've discussed this
>> for user-land in the past and as far as glibc and userland is concerned
>> we stick to the options as currently existing. The system register
>> option is really for the kernel to use along with an offset as they
>> control their ABI and this is a decision for them to make.
>>
>> I did consider sticking this all under a mcmodel=kernel-small option but
>> thought that would be a bit too aggressive. There is very little error
>> checking I can do in terms of the system register being used and really
>> the assembler would barf quite quickly in case things go wrong. I've
>> managed to rebuild Ard's kernel tree with an additional patch that
>> I will send to him. I haven't managed to boot this kernel.
>>
>> There was an additional question asked about the performance
>> characteristics of this but it's a security feature and the kernel
>> doesn't have the luxury of a hidden symbol. Further since the kernel
>> uses sp_el0 for access everywhere and if they choose to use the same
>> register I don't think the performance characteristics would be too bad,
>> but that's a decision for the kernel folks to make when taking in the
>> feature into the kernel.
>>
>> I still need to add some tests and documentation in invoke.texi but
>> this is at the stage where it would be nice for some other folks
>> to look at this.
>>
>> The difference in code generated is as below.
>>
>> extern void bar (char *);
>> int foo (void)
>> {
>>    char a[100];
>>    bar (a);
>> }
>>
>> $GCC -O2  -fstack-protector-strong  vs
>> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg
>> -mstack-protector-guard-offset=1024 -fstack-protector-strong
>>
>>
>> --- tst.s   2018-12-03 09:46:21.174167443 +
>> +++ tst.s.1 2018-12-03 09:46:03.546257203 +
>> @@ -15,15 +15,14 @@
>>  mov x29, sp
>>  str x19, [sp, 16]
>>  .cfi_offset 19, -128
>> -   adrp x19, __stack_chk_guard
>> -   add x19, x19, :lo12:__stack_chk_guard
>> -   ldr x0, [x19]
>> -   str x0, [sp, 136]
>> -   mov x0,0
>> +   mrs x19, sp_el0
>>  add x0, sp, 32
>> +   ldr x1, [x19, 1024]
>> +   str x1, [sp, 136]
>> +   mov x1,0
>>  bl  bar
>>  ldr x0, [sp, 136]
>> -   ldr x1, [x19]
>> +   ldr x1, [x19, 1024]
>>  eor x1, x0, x1
>>  cbnz x1, .L5
>>
>>
>>
>>
>> I will be afk tomorrow and day after but this is to elicit some comments
>> and for Ard to try this out with his kernel patches.
>>
> 
> Thanks Ramana. I managed to build and run a complete kernel (including
> modules) on a bare metal system, and everything works as expected.
> 
> The only thing I'd like to confirm with you is the logic wrt the
> command line arguments, more specifically, if/when all 3 arguments
> have to appear, and whether they are permitted to appear if
> -fstack-protector is not set.

They are permitted to appear without -fstack-protector even though it 
doesn't make much sense ...

> 
> This is relevant given that we invoke the compiler in 3 different ways:
> - at the configure stage, we invoke the compiler with some/all of
> these options to decide whether the feature is supported, but the
> actual offset is not known, but also irrelevant
> - we invoke the compiler to build the header file that actually gives
> us the offset to pass to later invocations
> - finally, all kernel objects are built with all 3 arguments passed on
> the command line
> 
> It looks like your code permits -mstack-protector-guard-reg at any
> time, but only permits -mstack-protector-guard-offset if
> -mstack-protector-guard is set to sysreg (and thus set explicitly,
> since the default is global). Is that intentional? Can we expect this
> to remain like that?

It doesn't make sense to permit an offset if the stack protector guard 
is a global variable.


If the default changes to sysreg, which I doubt, then I would expect
-mstack-protector-guard-offset to be usable without
-mstack-protector-guard=sysreg. However, changing the default is not
something I'm sure we have the appetite for yet in user land. The
decision was made in 2015 that for user land the stack protector guard
would be a hidden symbol and I expect there to be quite a lot of
protracted discussion before changing this.
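
To make the rule discussed above concrete, here is a hypothetical sketch of the check (not the code in the patch): an offset is accepted only when the guard is sysreg, while the register option is accepted at any time. All type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum guard_kind { GUARD_GLOBAL, GUARD_SYSREG };   /* default: global */

struct guard_opts
{
  enum guard_kind guard;   /* -mstack-protector-guard= */
  const char *reg;         /* -mstack-protector-guard-reg=, or NULL */
  bool has_offset;         /* was -mstack-protector-guard-offset= given?  */
  long offset;
};

/* Return NULL if the combination is accepted, else a diagnostic string.  */
static const char *
check_guard_opts (const struct guard_opts *o)
{
  assert (o != NULL);
  /* An offset is meaningless when the guard is a global variable.  */
  if (o->has_offset && o->guard != GUARD_SYSREG)
    return "offset requires -mstack-protector-guard=sysreg";
  /* The register name is not validated here; that is left to the
     assembler.  */
  return NULL;
}
```

Under this rule, passing all three options on every invocation is always accepted, which matches the conservative way of probing for the feature.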


regards
Ramana


> 



Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On 10/01/2019 15:49, James Greenhalgh wrote:
> On Mon, Dec 03, 2018 at 03:55:36AM -0600, Ramana Radhakrishnan wrote:
>> For quite some time the kernel guys (more specifically Ard) have been
>> talking about using a system register (sp_el0) and an offset from that
>> for a canary based access. This patchset adds support for a new set of
>> command line options similar to how powerpc has done this.
>>
>> I don't intend to change the defaults in userland, we've discussed this
>> for user-land in the past and as far as glibc and userland is concerned
>> we stick to the options as currently existing. The system register
>> option is really for the kernel to use along with an offset as they
>> control their ABI and this is a decision for them to make.
>>
>> I did consider sticking this all under a mcmodel=kernel-small option but
>> thought that would be a bit too aggressive. There is very little error
>> checking I can do in terms of the system register being used and really
>> the assembler would barf quite quickly in case things go wrong. I've
>> managed to rebuild Ard's kernel tree with an additional patch that
>> I will send to him. I haven't managed to boot this kernel.
>>
>> There was an additional question asked about the performance
>> characteristics of this but it's a security feature and the kernel
>> doesn't have the luxury of a hidden symbol. Further since the kernel
>> uses sp_el0 for access everywhere and if they choose to use the same
>> register I don't think the performance characteristics would be too bad,
>> but that's a decision for the kernel folks to make when taking in the
>> feature into the kernel.
>>
>> I still need to add some tests and documentation in invoke.texi but
>> this is at the stage where it would be nice for some other folks
>> to look at this.
>>
>> The difference in code generated is as below.
>>
>> extern void bar (char *);
>> int foo (void)
>> {
>>    char a[100];
>>    bar (a);
>> }
>>
>> $GCC -O2  -fstack-protector-strong  vs
>> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg
>> -mstack-protector-guard-offset=1024 -fstack-protector-strong
>>
>>  
>> --- tst.s   2018-12-03 09:46:21.174167443 +
>> +++ tst.s.1  2018-12-03 09:46:03.546257203 +
>> @@ -15,15 +15,14 @@
>>  mov x29, sp
>>  str x19, [sp, 16]
>>  .cfi_offset 19, -128
>> -adrp x19, __stack_chk_guard
>> -add x19, x19, :lo12:__stack_chk_guard
>> -ldr x0, [x19]
>> -str x0, [sp, 136]
>> -mov x0,0
>> +mrs x19, sp_el0
>>  add x0, sp, 32
>> +ldr x1, [x19, 1024]
>> +str x1, [sp, 136]
>> +mov x1,0
>>  bl  bar
>>  ldr x0, [sp, 136]
>> -ldr x1, [x19]
>> +ldr x1, [x19, 1024]
>>  eor x1, x0, x1
>>  cbnz x1, .L5
>>
>>
>>
>>
>> I will be afk tomorrow and day after but this is to elicit some comments
>> and for Ard to try this out with his kernel patches.
>>
>> Thoughts ?
> 
> I didn't see an answer on list to Ard's questions about the command-line logic.

Ah I must have missed that - will take that up separately.

> Remember to also fix up the error message concerns Florian raised.
> 


> That said, if Jakub is happy with this in Stage 4, I am too.
> 
> My biggest concern is the -mstack-protector-guard-reg interface, which
> is unchecked user input and so opens up nasty ways to force the compiler
> towards out of bounds accesses (e.g.
> -mstack-protector-guard-reg="What memory is at %10")
> 

-mstack-protector-guard-reg is fine - it's a system register; if the
assembler doesn't recognize it, it will barf.

-mstack-protector-guard-offset= I assume is what you are 
concerned about. I don't have a good answer to that one and am going to 
chicken out and say this is the same interface as x86 and power and 
while I accept it's an access to any location, the user can still do 
that with a C program and any arbitrary inline asm :-/



regards
Ramana

> Thanks,
> James
> 
>>
>> regards
>> Ramana
>>
>> gcc/ChangeLog:
>>
>> 2018-11-23  Ramana Radhakrishnan  
>>
>>   * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
>>   * config/aarch64/aarch64.c (aarch64_override_options_internal):
>>   Handle and put in error checks for stack protector guard options.
>>   (aarch64_stack_protect_guard): New.
>>   (TARGET_STACK_PROTECT_GUARD): Define.
>>   * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
>>   (reg_stack_protect_address): New.
>>   (stack_protect_set): Adjust for SSP_GLOBAL.
>>   (stack_protect_test): Likewise.
>>   * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
>>   (-mstack-protector-guard): Likewise.
>>   (-mstack-protector-guard-offset): Likewise.
>>   * doc/invoke.texi: Document new AArch64 options.
> 



Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Will Deacon
On Thu, Jan 10, 2019 at 03:49:27PM +, James Greenhalgh wrote:
> On Mon, Dec 03, 2018 at 03:55:36AM -0600, Ramana Radhakrishnan wrote:
> > For quite some time the kernel guys (more specifically Ard) have been
> > talking about using a system register (sp_el0) and an offset from that 
> > for a canary based access. This patchset adds support for a new set of
> > command line options similar to how powerpc has done this.
> > 
> > I don't intend to change the defaults in userland, we've discussed this 
> > for user-land in the past and as far as glibc and userland is concerned 
> > we stick to the options as currently existing. The system register 
> > option is really for the kernel to use along with an offset as they 
> > control their ABI and this is a decision for them to make.
> > 
> > I did consider sticking this all under a mcmodel=kernel-small option but
> > thought that would be a bit too aggressive. There is very little error
> > checking I can do in terms of the system register being used and really
> > the assembler would barf quite quickly in case things go wrong. I've
> > managed to rebuild Ard's kernel tree with an additional patch that
> > I will send to him. I haven't managed to boot this kernel.
> > 
> > There was an additional question asked about the performance 
> > characteristics of this but it's a security feature and the kernel 
> > doesn't have the luxury of a hidden symbol. Further since the kernel 
> > uses sp_el0 for access everywhere and if they choose to use the same 
> > register I don't think the performance characteristics would be too bad, 
> > but that's a decision for the kernel folks to make when taking in the 
> > feature into the kernel.
> > 
> > I still need to add some tests and documentation in invoke.texi but
> > this is at the stage where it would be nice for some other folks
> > to look at this.
> > 
> > The difference in code generated is as below.
> > 
> > extern void bar (char *);
> > int foo (void)
> > {
> >    char a[100];
> >    bar (a);
> > }
> > 
> > $GCC -O2  -fstack-protector-strong  vs 
> > -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg 
> > -mstack-protector-guard-offset=1024 -fstack-protector-strong
> > 
> > 
> > --- tst.s   2018-12-03 09:46:21.174167443 +
> > +++ tst.s.1 2018-12-03 09:46:03.546257203 +
> > @@ -15,15 +15,14 @@
> > mov x29, sp
> > str x19, [sp, 16]
> > .cfi_offset 19, -128
> > -   adrp x19, __stack_chk_guard
> > -   add x19, x19, :lo12:__stack_chk_guard
> > -   ldr x0, [x19]
> > -   str x0, [sp, 136]
> > -   mov x0,0
> > +   mrs x19, sp_el0
> > add x0, sp, 32
> > +   ldr x1, [x19, 1024]
> > +   str x1, [sp, 136]
> > +   mov x1,0
> > bl  bar
> > ldr x0, [sp, 136]
> > -   ldr x1, [x19]
> > +   ldr x1, [x19, 1024]
> > eor x1, x0, x1
> > cbnz x1, .L5
> > 
> > 
> > 
> > 
> > I will be afk tomorrow and day after but this is to elicit some comments 
> > and for Ard to try this out with his kernel patches.
> > 
> > Thoughts ?
> 
> I didn't see an answer on list to Ard's questions about the command-line logic.

FWIW: the kernel-side is now merged upstream in 5.0-rc1:

http://git.kernel.org/linus/0a1213fa7432

where we ended up checking for the presence of all three options to be
on the safe side.

Will


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread James Greenhalgh
On Mon, Dec 03, 2018 at 03:55:36AM -0600, Ramana Radhakrishnan wrote:
> For quite some time the kernel guys (more specifically Ard) have been
> talking about using a system register (sp_el0) and an offset from that 
> for a canary based access. This patchset adds support for a new set of
> command line options similar to how powerpc has done this.
> 
> I don't intend to change the defaults in userland, we've discussed this 
> for user-land in the past and as far as glibc and userland is concerned 
> we stick to the options as currently existing. The system register 
> option is really for the kernel to use along with an offset as they 
> control their ABI and this is a decision for them to make.
> 
> I did consider sticking this all under a mcmodel=kernel-small option but
> thought that would be a bit too aggressive. There is very little error
> checking I can do in terms of the system register being used and really
> the assembler would barf quite quickly in case things go wrong. I've
> managed to rebuild Ard's kernel tree with an additional patch that
> I will send to him. I haven't managed to boot this kernel.
> 
> There was an additional question asked about the performance 
> characteristics of this but it's a security feature and the kernel 
> doesn't have the luxury of a hidden symbol. Further since the kernel 
> uses sp_el0 for access everywhere and if they choose to use the same 
> register I don't think the performance characteristics would be too bad, 
> but that's a decision for the kernel folks to make when taking in the 
> feature into the kernel.
> 
> I still need to add some tests and documentation in invoke.texi but
> this is at the stage where it would be nice for some other folks
> to look at this.
> 
> The difference in code generated is as below.
> 
> extern void bar (char *);
> int foo (void)
> {
>    char a[100];
>    bar (a);
> }
> 
> $GCC -O2  -fstack-protector-strong  vs 
> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg 
> -mstack-protector-guard-offset=1024 -fstack-protector-strong
> 
>   
> --- tst.s 2018-12-03 09:46:21.174167443 +
> +++ tst.s.1   2018-12-03 09:46:03.546257203 +
> @@ -15,15 +15,14 @@
>   mov x29, sp
>   str x19, [sp, 16]
>   .cfi_offset 19, -128
> - adrp x19, __stack_chk_guard
> - add x19, x19, :lo12:__stack_chk_guard
> - ldr x0, [x19]
> - str x0, [sp, 136]
> - mov x0,0
> + mrs x19, sp_el0
>   add x0, sp, 32
> + ldr x1, [x19, 1024]
> + str x1, [sp, 136]
> + mov x1,0
>   bl  bar
>   ldr x0, [sp, 136]
> - ldr x1, [x19]
> + ldr x1, [x19, 1024]
>   eor x1, x0, x1
>   cbnz x1, .L5
> 
> 
> 
> 
> I will be afk tomorrow and day after but this is to elicit some comments 
> and for Ard to try this out with his kernel patches.
> 
> Thoughts ?

I didn't see an answer on list to Ard's questions about the command-line logic.
Remember to also fix up the error message concerns Florian raised.

That said, if Jakub is happy with this in Stage 4, I am too.

My biggest concern is the -mstack-protector-guard-reg interface, which
takes unchecked user input and so opens up nasty ways to force the compiler
towards out-of-bounds accesses (e.g.
-mstack-protector-guard-reg="What memory is at %10").
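
To make the concern concrete, here is a minimal C++ sketch of the kind of
check that would reject the string above. It is not what the patch does:
plausible_sysreg_name is a hypothetical helper, and the accepted character
set (ASCII letters, digits and underscore) is only an assumption about what
a system register name looks like.

```cpp
// Hypothetical helper, not part of GCC.  Assumption: a system register
// name consists only of ASCII letters, digits and underscores.
constexpr bool
sysreg_char_ok (char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
	 || (c >= '0' && c <= '9') || c == '_';
}

// True if every character up to the terminating NUL is acceptable.
constexpr bool
sysreg_tail_ok (const char *p)
{
  return *p == '\0' ? true : (sysreg_char_ok (*p) && sysreg_tail_ok (p + 1));
}

// Reject the empty string and anything with spaces, '%', quotes, etc.
constexpr bool
plausible_sysreg_name (const char *name)
{
  return *name != '\0' && sysreg_tail_ok (name);
}

static_assert (plausible_sysreg_name ("sp_el0"), "ordinary name accepted");
static_assert (!plausible_sysreg_name ("What memory is at %10"),
	       "free-form text rejected");
```

A check like this only filters obviously malformed input; whether the named
register actually exists would still be left to the assembler.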

Thanks,
James

> 
> regards
> Ramana
> 
> gcc/ChangeLog:
> 
> 2018-11-23  Ramana Radhakrishnan  
> 
>  * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
>  * config/aarch64/aarch64.c (aarch64_override_options_internal): Handle
>  and put in error checks for stack protector guard options.
>  (aarch64_stack_protect_guard): New.
>  (TARGET_STACK_PROTECT_GUARD): Define.
>  * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
>  (reg_stack_protect_address): New.
>  (stack_protect_set): Adjust for SSP_GLOBAL.
>  (stack_protect_test): Likewise.
>  * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
>  (-mstack-protector-guard): Likewise.
>  (-mstack-protector-guard-offset): Likewise.
>  * doc/invoke.texi: Document new AArch64 options.


Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2019-01-10 Thread Christophe Lyon
On Wed, 9 Jan 2019 at 15:42, Sudakshina Das  wrote:
>
> Hi
>
> On 20/12/18 16:40, Sudakshina Das wrote:
> > Hi James
> >
> > On 19/12/18 3:40 PM, James Greenhalgh wrote:
> >> On Fri, Dec 14, 2018 at 10:09:03AM -0600, Sudakshina Das wrote:
> >>
> >> 
> >>
> >>> I have updated the patch according to our discussions offline.
> >>> The md pattern is now split into 4 patterns and i have added a new
> >>> test for the setjmp case along with some comments where missing.
> >>
> >> This is OK for trunk.
> >>
> >
> > Thanks for the approvals. With this my series is ready to go in trunk. I
> > will wait for Sam's options patch to go in trunk before I commit mine.
> >
>
> Series is committed with a rebase without Sam Tebbs's 3rd patch for
> B-Key addition as r267765 to r267770.
>
> Thanks
> Sudi
>

Hi Sudi,

I think the new bti-1.c test lacks
/* { dg-require-effective-target lp64 } */
as I see it failing when using -mabi=ilp32:
cc1: sorry, unimplemented: return address signing is only supported
for -mabi=lp64

Christophe

> > Thanks
> > Sudi
> >
> >> Thanks,
> >> James
> >>
> >>> *** gcc/ChangeLog ***
> >>>
> >>> 2018-xx-xx  Sudakshina Das  
> >>> Ramana Radhakrishnan  
> >>>
> >>> * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
> >>> * gcc/config/aarch64/aarch64.h: Update comment for
> >>> TRAMPOLINE_SIZE.
> >>> * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
> >>> Update if bti is enabled.
> >>> * config/aarch64/aarch64-bti-insert.c: New file.
> >>> * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
> >>> bti pass.
> >>> * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
> >>> Declare the new bti pass.
> >>> * config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
> >>> UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
> >>> (bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
> >>> * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
> >>>
> >>> *** gcc/testsuite/ChangeLog ***
> >>>
> >>> 2018-xx-xx  Sudakshina Das  
> >>>
> >>> * gcc.target/aarch64/bti-1.c: New test.
> >>> * gcc.target/aarch64/bti-2.c: New test.
> >>> * gcc.target/aarch64/bti-3.c: New test.
> >>> * lib/target-supports.exp
> >>> (check_effective_target_aarch64_bti_hw): Add new check for
> >>> BTI hw.
> >>>
> >>> Thanks
> >>> Sudi
>


Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex mutliplication and addition

2019-01-10 Thread Tamar Christina
Hi Christophe,

It was introduced in a small refactoring after which I only retested the 
testcases I added, which don't trigger the issue.

In any case it's a trivial fix and I'll submit a patch in a bit.

Tamar


From: Christophe Lyon 
Sent: Thursday, January 10, 2019 3:35:18 PM
To: Tamar Christina
Cc: Kyrill Tkachov; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; Richard 
Earnshaw; ni...@redhat.com
Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
mutliplication and addition

Hi Tamar,


On Thu, 10 Jan 2019 at 04:44, Tamar Christina  wrote:
>
> Hi Kyrill,
>
> Committed with the addition of a few trivial defines and iterators that 
> were missing due to the patch being split.
>
> Thanks,
> Tamar
>
> -Original Message-
> From: Kyrill Tkachov 
> Sent: Friday, December 21, 2018 11:40 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan ; 
> Richard Earnshaw ; ni...@redhat.com
> Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> mutliplication and addition
>
> Hi Tamar,
>
> On 11/12/18 15:46, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds NEON intrinsics and tests for the Armv8.3-a complex
> > multiplication and add instructions with a rotate along the Argand plane.
> >
> > The instructions are documented in the ArmARM[1] and the intrinsics
> > specification will be published on the Arm website [2].
> >
> > The lane versions of these instructions are special in that they always 
> > select a pair: using index 0 means selecting lanes 0 and 1.  Because of
> > this the range check for the intrinsics requires special handling.
> >
> > On Arm, in order to implement some of the lane intrinsics we're using
> > the structure of the register file.  The lane variant of these
> > instructions always select a D register, but the data itself can be
> > stored in Q registers.  This means that for single precision complex
> > numbers you are only allowed to select D[0] but using the register file 
> > layout you can get the range 0-1 for lane indices by selecting between 
> > Dn[0] and Dn+1[0].
> >
> > Same reasoning applies for half float complex numbers, except there
> > your D register indexes can be 0 or 1, so you have a total range of 4 
> > elements (for a V8HF).
> >
> >
> > [1]
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > [2] https://developer.arm.com/docs/101028/latest
> >
> > Bootstrapped Regtested on arm-none-gnueabihf and no issues.
> >
> > Ok for trunk?
> >
>
> Ok.
> Thanks,
> Kyrill
>
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-12-11  Tamar Christina  
> >
> > * config/arm/arm-builtins.c
> > (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
> > (MAC_LANE_PAIR_QUALIFIERS): New.
> > (arm_expand_builtin_args): Use it.
> > (arm_expand_builtin_1): Likewise.
> > * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
> > * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
> > * config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
> > * config/arm/arm_neon.h:
> > (vcadd_rot90_f16): New.
> > (vcaddq_rot90_f16): New.
> > (vcadd_rot270_f16): New.
> > (vcaddq_rot270_f16): New.
> > (vcmla_f16): New.
> > (vcmlaq_f16): New.
> > (vcmla_lane_f16): New.
> > (vcmla_laneq_f16): New.
> > (vcmlaq_lane_f16): New.
> > (vcmlaq_laneq_f16): New.
> > (vcmla_rot90_f16): New.
> > (vcmlaq_rot90_f16): New.
> > (vcmla_rot90_lane_f16): New.
> > (vcmla_rot90_laneq_f16): New.
> > (vcmlaq_rot90_lane_f16): New.
> > (vcmlaq_rot90_laneq_f16): New.
> > (vcmla_rot180_f16): New.
> > (vcmlaq_rot180_f16): New.
> > (vcmla_rot180_lane_f16): New.
> > (vcmla_rot180_laneq_f16): New.
> > (vcmlaq_rot180_lane_f16): New.
> > (vcmlaq_rot180_laneq_f16): New.
> > (vcmla_rot270_f16): New.
> > (vcmlaq_rot270_f16): New.
> > (vcmla_rot270_lane_f16): New.
> > (vcmla_rot270_laneq_f16): New.
> > (vcmlaq_rot270_lane_f16): New.
> > (vcmlaq_rot270_laneq_f16): New.
> > (vcadd_rot90_f32): New.
> > (vcaddq_rot90_f32): New.
> > (vcadd_rot270_f32): New.
> > (vcaddq_rot270_f32): New.
> > (vcmla_f32): New.
> > (vcmlaq_f32): New.
> > (vcmla_lane_f32): New.
> > (vcmla_laneq_f32): New.
> > (vcmlaq_lane_f32): New.
> > (vcmlaq_laneq_f32): New.
> > (vcmla_rot90_f32): New.
> > (vcmlaq_rot90_f32): New.
> > (vcmla_rot90_lane_f32): New.
> > (vcmla_rot90_laneq_f32): New.
> > (vcmlaq_rot90_lane_f32): New.
> > (vcmlaq_rot90_laneq_f32): New.
> > (vcmla_rot180_f32): New.
> > 

Re: [PATCH] Define new filesystem::__file_clock type

2019-01-10 Thread Jonathan Wakely

On 06/01/19 21:45 +, Jonathan Wakely wrote:

On 05/01/19 20:03 +, Jonathan Wakely wrote:

In C++17 the clock used for filesystem::file_time_type is unspecified,
allowing it to be chrono::system_clock. The C++2a draft requires it to
be a distinct type, with additional member functions to convert to/from
other clocks (either the system clock or UTC). In order to avoid an ABI
change later, this patch defines a new distinct type now, which will be
used for std::chrono::file_clock later.

* include/bits/fs_fwd.h (__file_clock): Define new clock.
(file_time_type): Redefine in terms of __file_clock.
* src/filesystem/ops-common.h (file_time): Add FIXME comment about
overflow.
* src/filesystem/std-ops.cc (is_set(perm_options, perm_options)): Give
internal linkage.
(internal_file_lock): New helper type for accessing __file_clock.
(do_copy_file): Use internal_file_lock to convert system time to
file_time_type.
(last_write_time(const path&, error_code&)): Likewise.
(last_write_time(const path&, file_time_type, error_code&)): Likewise.

Tested powerpc64-linux, committed to trunk.


There's a new failure on 32-bit x86:

/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/27_io/filesystem/operations/last_write_time.cc:148:
 void test02(): Assertion 'approx_equal(last_write_time(f.path), time)' failed.
FAIL: 27_io/filesystem/operations/last_write_time.cc execution test


The problem here is 32-bit time_t. I've defined the file_clock epoch
as a date after 2038, so unrepresentable in a 32-bit time_t.

The test stores a value of file_time_type::zero() which is the epoch,
and so that value can't be converted to time_t for the utimensat call.

Fixed by skipping the parts of the test using the file clock's epoch.
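
As a rough illustration of why the conversion fails (a sketch only, not the
code in the test or the library): the check below asks whether an epoch
offset, counted in seconds from the POSIX epoch, fits in a given integer
type. The name representable_as and the concrete offset are assumptions made
for the example; any value above 2^31 - 1 shows the same effect.

```cpp
#include <cstdint>
#include <limits>

// Assumed for illustration: the file clock epoch expressed as seconds
// after 1970-01-01.  Any value past January 2038 behaves the same way.
constexpr std::int64_t assumed_epoch_offset = 6437664000LL;

// True if SECS can be stored in TimeT without overflow.
template<typename TimeT>
constexpr bool
representable_as (std::int64_t secs)
{
  return secs >= std::numeric_limits<TimeT>::min ()
	 && secs <= std::numeric_limits<TimeT>::max ();
}

// A 32-bit time_t cannot hold the epoch, a 64-bit one can.
static_assert (!representable_as<std::int32_t> (assumed_epoch_offset),
	       "epoch is out of range for 32-bit time_t");
static_assert (representable_as<std::int64_t> (assumed_epoch_offset),
	       "epoch fits in 64-bit time_t");
```

The test takes the simpler route of comparing the maximum values of the two
types and returning early when time_t is the narrower one.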

Tested x86_64-linux (-m32 and -m64). Committed to trunk.

commit 68908239cf8c8987ce4693f6769709f8f5b9fbc3
Author: Jonathan Wakely 
Date:   Thu Jan 10 15:38:31 2019 +

Fix filesystem::last_write_time failure with 32-bit time_t

* testsuite/27_io/filesystem/operations/last_write_time.cc: Fix
test failures on targets with 32-bit time_t.

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/operations/last_write_time.cc b/libstdc++-v3/testsuite/27_io/filesystem/operations/last_write_time.cc
index 7a693a1ddcb..3f31375f51b 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/operations/last_write_time.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/operations/last_write_time.cc
@@ -22,6 +22,7 @@
 // 15.25 Permissions [fs.op.last_write_time]
 
 #include 
+#include 
 #include 
 #include 
 
@@ -141,14 +142,27 @@ test02()
   VERIFY( !ec );
   VERIFY( approx_equal(last_write_time(f.path), time) );
 
+  if (std::numeric_limits::max()
+  < std::numeric_limits::max())
+return; // file clock's epoch is out of range for 32-bit time_t
+
   ec = bad_ec;
+  // The file clock's epoch:
   time = time_type();
   last_write_time(f.path, time, ec);
   VERIFY( !ec );
   VERIFY( approx_equal(last_write_time(f.path), time) );
 
   ec = bad_ec;
-  time -= std::chrono::milliseconds(1000 * 60 * 10 + 15);
+  // A time after the epoch
+  time += std::chrono::milliseconds(1000 * 60 * 10 + 15);
+  last_write_time(f.path, time, ec);
+  VERIFY( !ec );
+  VERIFY( approx_equal(last_write_time(f.path), time) );
+
+  ec = bad_ec;
+  // A time before than the epoch
+  time -= std::chrono::milliseconds(1000 * 60 * 20 + 15);
   last_write_time(f.path, time, ec);
   VERIFY( !ec );
   VERIFY( approx_equal(last_write_time(f.path), time) );


Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex mutliplication and addition

2019-01-10 Thread Christophe Lyon
Hi Tamar,


On Thu, 10 Jan 2019 at 04:44, Tamar Christina  wrote:
>
> Hi Kyrill,
>
> Committed with the addition of a few trivial defines and iterators that 
> were missing due to the patch being split.
>
> Thanks,
> Tamar
>
> -Original Message-
> From: Kyrill Tkachov 
> Sent: Friday, December 21, 2018 11:40 AM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan ; 
> Richard Earnshaw ; ni...@redhat.com
> Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> mutliplication and addition
>
> Hi Tamar,
>
> On 11/12/18 15:46, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds NEON intrinsics and tests for the Armv8.3-a complex
> > multiplication and add instructions with a rotate along the Argand plane.
> >
> > The instructions are documented in the ArmARM[1] and the intrinsics
> > specification will be published on the Arm website [2].
> >
> > The lane versions of these instructions are special in that they always 
> > select a pair: using index 0 means selecting lanes 0 and 1.  Because of
> > this the range check for the intrinsics requires special handling.
> >
> > On Arm, in order to implement some of the lane intrinsics we're using
> > the structure of the register file.  The lane variant of these
> > instructions always select a D register, but the data itself can be
> > stored in Q registers.  This means that for single precision complex
> > numbers you are only allowed to select D[0] but using the register file 
> > layout you can get the range 0-1 for lane indices by selecting between 
> > Dn[0] and Dn+1[0].
> >
> > Same reasoning applies for half float complex numbers, except there
> > your D register indexes can be 0 or 1, so you have a total range of 4 
> > elements (for a V8HF).
> >
> >
> > [1]
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > [2] https://developer.arm.com/docs/101028/latest
> >
> > Bootstrapped Regtested on arm-none-gnueabihf and no issues.
> >
> > Ok for trunk?
> >
>
> Ok.
> Thanks,
> Kyrill
>
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-12-11  Tamar Christina  
> >
> > * config/arm/arm-builtins.c
> > (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
> > (MAC_LANE_PAIR_QUALIFIERS): New.
> > (arm_expand_builtin_args): Use it.
> > (arm_expand_builtin_1): Likewise.
> > * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
> > * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
> > * config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
> > * config/arm/arm_neon.h:
> > (vcadd_rot90_f16): New.
> > (vcaddq_rot90_f16): New.
> > (vcadd_rot270_f16): New.
> > (vcaddq_rot270_f16): New.
> > (vcmla_f16): New.
> > (vcmlaq_f16): New.
> > (vcmla_lane_f16): New.
> > (vcmla_laneq_f16): New.
> > (vcmlaq_lane_f16): New.
> > (vcmlaq_laneq_f16): New.
> > (vcmla_rot90_f16): New.
> > (vcmlaq_rot90_f16): New.
> > (vcmla_rot90_lane_f16): New.
> > (vcmla_rot90_laneq_f16): New.
> > (vcmlaq_rot90_lane_f16): New.
> > (vcmlaq_rot90_laneq_f16): New.
> > (vcmla_rot180_f16): New.
> > (vcmlaq_rot180_f16): New.
> > (vcmla_rot180_lane_f16): New.
> > (vcmla_rot180_laneq_f16): New.
> > (vcmlaq_rot180_lane_f16): New.
> > (vcmlaq_rot180_laneq_f16): New.
> > (vcmla_rot270_f16): New.
> > (vcmlaq_rot270_f16): New.
> > (vcmla_rot270_lane_f16): New.
> > (vcmla_rot270_laneq_f16): New.
> > (vcmlaq_rot270_lane_f16): New.
> > (vcmlaq_rot270_laneq_f16): New.
> > (vcadd_rot90_f32): New.
> > (vcaddq_rot90_f32): New.
> > (vcadd_rot270_f32): New.
> > (vcaddq_rot270_f32): New.
> > (vcmla_f32): New.
> > (vcmlaq_f32): New.
> > (vcmla_lane_f32): New.
> > (vcmla_laneq_f32): New.
> > (vcmlaq_lane_f32): New.
> > (vcmlaq_laneq_f32): New.
> > (vcmla_rot90_f32): New.
> > (vcmlaq_rot90_f32): New.
> > (vcmla_rot90_lane_f32): New.
> > (vcmla_rot90_laneq_f32): New.
> > (vcmlaq_rot90_lane_f32): New.
> > (vcmlaq_rot90_laneq_f32): New.
> > (vcmla_rot180_f32): New.
> > (vcmlaq_rot180_f32): New.
> > (vcmla_rot180_lane_f32): New.
> > (vcmla_rot180_laneq_f32): New.
> > (vcmlaq_rot180_lane_f32): New.
> > (vcmlaq_rot180_laneq_f32): New.
> > (vcmla_rot270_f32): New.
> > (vcmlaq_rot270_f32): New.
> > (vcmla_rot270_lane_f32): New.
> > (vcmla_rot270_laneq_f32): New.
> > (vcmlaq_rot270_lane_f32): New.
> > (vcmlaq_rot270_laneq_f32): New.
> > * config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, 
> > vcmla90,
> > vcmla180, 

Re: [PATCH] x86: Update VFIXUPIMM* Intrinsics to align with the latest Intel SDM

2019-01-10 Thread Matthias Kretz
On Thursday, 10 January 2019 14:27:40 CET Matthias Kretz wrote:
> On Thursday, 10 January 2019 11:39:56 CET Jakub Jelinek wrote:
> > On Thu, Jan 10, 2019 at 10:46:14AM +0100, Dr. Matthias Kretz wrote:
> > > _mm_fixupimm_ps(_mm_getexp_ps(x), x, _mm_set1_epi32(0x00550433), 0x00);
> > 
> > I guess you could use
> > _mm_mask_fixupimm_ps(_mm_getexp_ps(x), -1, x, _mm_set1_epi32(0x00550433),
> > 0x00); because that one does allow you to specify the dest operand.
> 
> Thanks. Actually _mm_getexp_ps produces the right answer already by itself.
> I only meant to demonstrate the fixupimm usage (e.g. if you calculate a
> trig function and then fix up for Annex F requirements).
> 
> BTW, your idea does not work because GCC recognizes the full writemask and
> simply produces the same as `_mm_fixupimm_ps(x, _mm_set1_epi32(0x00550433),
> 0x00)`. See here: https://godbolt.org/z/-5Ql0f
> 
> > But I agree it is just weird, the non-masked intrinsics don't take into
> > account the 0b cases anymore.
> 
> To be precise, they still do, but they produce garbage (e.g.
> https://godbolt.org/z/f6u-GI).

I opened PR88794 to track the issue.

-- 
──
 Dr. Matthias Kretz                            https://kretzfamily.de
 GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
 SIMD easy and portable https://github.com/VcDevel/Vc
──

[PATCH v2] x86-64: {,V}CVTSI2Sx are ambiguous without suffix

2019-01-10 Thread Jan Beulich
For 64-bit these should not be emitted without suffix in AT&T mode (as
being ambiguous that way); the suffixes are benign for 32-bit. For
consistency also omit the suffix in Intel mode for {,V}CVTSI2SxQ.

The omission originally (prior to rev 260691) led to wrong code
being generated for the 64-bit unsigned-to-float/double conversions (as
gas guesses an L suffix instead of the required Q one when the operand
is in memory). In all remaining cases (being changed here) the omission
would "just" lead to warnings with future gas versions.

As a result, arrange to check for the L suffixes in 32-bit test cases.

In order for related test cases to actually test what they're supposed
to test, add a few (seemingly unrelated) empty "asm volatile()".
Presumably there are more where constant propagation voids the intended
effect of the tests, but these are the ones that help make sure the assembler
still correctly assembles the output after the changes here.
---
v2: Don't drop (redundant) suffixes from *2SI conversions. Adjust
changes to testsuite accordingly.

gcc/
2019-01-10  Jan Beulich  

* config/i386/i386.md (rex64suffix): Add L suffix for SI.
* config/i386/sse.md (cvtusi232,
sse2_cvtsi2sd): Add {l}.
(sse2_cvtsi2sdq): Make q conditional upon AT&T
syntax.

gcc/testsuite/
2019-01-10  Jan Beulich  

* gcc.target/i386/avx512f-vcvtsd2si-1.c,
gcc.target/i386/avx512f-vcvtss2si-1.c,
gcc.target/i386/avx512f-vcvttsd2si-1.c,
gcc.target/i386/avx512f-vcvttss2si-1.c: Permit l suffix.
* gcc.target/i386/avx512f-vcvtsi2ss-1.c,
gcc.target/i386/avx512f-vcvtusi2sd-1.c,
gcc.target/i386/avx512f-vcvtusi2ss-1.c: Expect l suffix.
* gcc.target/i386/avx512f-vcvtusi2sd-2.c,
gcc.target/i386/avx512f-vcvtusi2sd64-2.c,
gcc.target/i386/avx512f-vcvtusi2ss-2.c,
gcc.target/i386/avx512f-vcvtusi2ss64-2.c: Add asm volatile().
* gcc.target/i386/pr19398.c: Permit l or q suffix.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1162,7 +1162,7 @@
   [(QI "V64QI") (HI "V32HI") (SI "V16SI") (DI "V8DI") (SF "V16SF") (DF 
"V8DF")])
 
 ;; Instruction suffix for REX 64bit operators.
-(define_mode_attr rex64suffix [(SI "") (DI "{q}")])
+(define_mode_attr rex64suffix [(SI "{l}") (DI "{q}")])
 (define_mode_attr rex64namesuffix [(SI "") (DI "q")])
 
 ;; This mode iterator allows :P to be used for patterns that operate on
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4767,7 +4767,7 @@
  (match_operand:VF_128 1 "register_operand" "v")
  (const_int 1)))]
   "TARGET_AVX512F && "
-  "vcvtusi2\t{%2, %1, %0|%0, %1, 
%2}"
+  "vcvtusi2{l}\t{%2, %1, %0|%0, %1, 
%2}"
   [(set_attr "type" "sseicvt")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -5026,9 +5026,9 @@
  (const_int 1)))]
   "TARGET_SSE2"
   "@
-   cvtsi2sd\t{%2, %0|%0, %2}
-   cvtsi2sd\t{%2, %0|%0, %2}
-   vcvtsi2sd\t{%2, %1, %0|%0, %1, %2}"
+   cvtsi2sd{l}\t{%2, %0|%0, %2}
+   cvtsi2sd{l}\t{%2, %0|%0, %2}
+   vcvtsi2sd{l}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,direct,*")
@@ -5048,9 +5048,9 @@
  (const_int 1)))]
   "TARGET_SSE2 && TARGET_64BIT"
   "@
-   cvtsi2sdq\t{%2, %0|%0, %2}
-   cvtsi2sdq\t{%2, %0|%0, %2}
-   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
+   cvtsi2sd{q}\t{%2, %0|%0, %2}
+   cvtsi2sd{q}\t{%2, %0|%0, %2}
+   vcvtsi2sd{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,direct,*")
--- a/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2si-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsd2si-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
-/* { dg-final { scan-assembler-times "vcvtsd2si\[ 
\\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]+.{6}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtsd2sil?\[ 
\\t\]+\[^\n\]*\{rn-sae\}\[^\n\]*%xmm\[0-9\]+.{6}(?:\n|\[ \\t\]+#)" 1 } } */
 #include 
 
 volatile __m128d x;
--- a/gcc/testsuite/gcc.target/i386/avx512f-vcvtsi2ss-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtsi2ss-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vcvtsi2ss\[ 
\\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 
} } */
+/* { dg-final { scan-assembler-times "vcvtsi2ssl\[ 
\\t\]+\[^%\n\]*%e\[^\{\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 
} } */
 
 #include 
 
--- a/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2si-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vcvtss2si-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
-/* { dg-final { scan-assembler-times "vcvtss2si\[ 
\\t\]+\[^\n\]*\{rn-sae\}\[^\{\n\]*%xmm\[0-9\]+.{6}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtss2sil?\[ 

[SVE ACLE] Two infrastructure tweaks

2019-01-10 Thread Richard Sandiford
Hi,

The first patch replaces the autogenerated function enum with an
explicit one, so that we can have more than one .def entry with
the same function base name (e.g. predicated AND vs. integer AND).

The second patch makes us get the expected modes from the insn data,
rather than force callers to supply the modes explicitly.

Committed to aarch64/sve-acle-branch.

Richard


[SVE ACLE] Don't autogenerate the function enum

We need to be able to specify two .def entries with the same function
base name.  This patch does that by specifying the function enum
explicitly and using a separate enum for the .def file entries.

There is no change in behaviour.
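
For readers who don't know the file, the idea can be sketched in a few lines
of self-contained C++. This is only an illustration: the entries in
DEMO_SVE_FUNCTIONS are made up and stand in for the .def file, and the
DEMO_* macro names are hypothetical. Only GROUP_ID, the two enums and the
general layout of function_group follow the patch.

```cpp
// Explicit list of base names, as in the patch (shortened here).
enum function { FUNC_svadd, FUNC_svand };

// Stand-in for the .def file.  The entries are hypothetical; note that
// "svand" appears twice with different shapes.
#define DEMO_SVE_FUNCTIONS(DEF) \
  DEF (svadd, binary, all_data, mxz) \
  DEF (svand, binary, all_integer, mxz) \
  DEF (svand, binary_pred, b, z)

// One identifier per entry, built from all four fields so that two
// entries with the same base name still get distinct enumerators.
#define GROUP_ID(NAME, SHAPE, TYPES, PREDS) \
  GROUP_##NAME##_##SHAPE##_##TYPES##_##PREDS

enum group_id {
#define DEMO_ENUM_ENTRY(NAME, SHAPE, TYPES, PREDS) \
  GROUP_ID (NAME, SHAPE, TYPES, PREDS),
  DEMO_SVE_FUNCTIONS (DEMO_ENUM_ENTRY)
#undef DEMO_ENUM_ENTRY
  NUM_GROUPS
};

struct function_group {
  group_id id;		/* Unique per entry.  */
  const char *name;	/* Base name as a string.  */
  function func;	/* Base name as an enum; may repeat.  */
};

constexpr function_group function_groups[] = {
#define DEMO_TABLE_ENTRY(NAME, SHAPE, TYPES, PREDS) \
  { GROUP_ID (NAME, SHAPE, TYPES, PREDS), #NAME, FUNC_##NAME },
  DEMO_SVE_FUNCTIONS (DEMO_TABLE_ENTRY)
#undef DEMO_TABLE_ENTRY
};

// Both "svand" entries share a function enum but have different ids.
static_assert (NUM_GROUPS == 3, "one group_id per entry");
static_assert (function_groups[1].func == function_groups[2].func,
	       "same base name");
static_assert (function_groups[1].id != function_groups[2].id,
	       "distinct groups");
```

With the old autogenerated enum, the second "svand" line would have produced
a duplicate FUNC_svand enumerator, which is what the explicit enum avoids.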


diff --git a/gcc/config/aarch64/aarch64-sve-builtins.c b/gcc/config/aarch64/aarch64-sve-builtins.c
index 6b4018c0e45..a439fb9358e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.c
+++ b/gcc/config/aarch64/aarch64-sve-builtins.c
@@ -147,9 +147,28 @@ typedef enum type_suffix type_suffix_pair[2];
 
 /* Enumerates the function base names, such as "svadd".  */
 enum function {
-#define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PRED) FUNC_ ## NAME,
+  FUNC_svabd,
+  FUNC_svadd,
+  FUNC_svasrd,
+  FUNC_svdup,
+  FUNC_svindex,
+  FUNC_svmax,
+  FUNC_svmin,
+  FUNC_svmul,
+  FUNC_svptrue,
+  FUNC_svqadd,
+  FUNC_svqsub,
+  FUNC_svsub,
+  FUNC_svsubr
+};
+
+/* Enumerates the function groups defined in aarch64-sve-builtins.def.  */
+#define GROUP_ID(NAME, SHAPE, TYPES, PREDS) \
+  GROUP_##NAME##_##SHAPE##_##TYPES##_##PREDS
+enum group_id {
+#define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
+  GROUP_ID (NAME, SHAPE, TYPES, PREDS),
 #include "aarch64-sve-builtins.def"
-  NUM_FUNCS
 };
 
 /* Static information about each single-predicate or single-vector
@@ -187,6 +206,9 @@ struct type_suffix_info {
 
 /* Static information about a set of functions.  */
 struct function_group {
+  /* The unique identifier of the group.  */
+  group_id id;
+
   /* The base name, as a string and an enum.  */
   const char *name;
   function func;
@@ -209,8 +231,7 @@ struct function_group {
 /* Describes a fully-resolved function (i.e. one that has a unique full
name).  */
 struct GTY(()) function_instance {
-  function_instance () {}
-  function_instance (function, function_mode, const type_suffix_pair &,
+  function_instance (group_id, function_mode, const type_suffix_pair &,
 		 predication);
 
   bool operator== (const function_instance &) const;
@@ -221,7 +242,7 @@ struct GTY(()) function_instance {
   tree vector_type (unsigned int) const;
 
   /* The explicit "enum"s are required for gengtype.  */
-  enum function func;
+  enum group_id group;
   enum function_mode mode;
   type_suffix_pair types;
   enum predication pred;
@@ -391,6 +412,7 @@ private:
 
   /* The function being called.  */
   const function_instance _fi;
+  const function_group _group;
 
   /* The call we're folding.  */
   gcall *m_call;
@@ -448,6 +470,7 @@ private:
   /* The function being called.  */
   const registered_function _rfn;
   const function_instance _fi;
+  const function_group _group;
 
   /* The function call expression.  */
   tree m_exp;
@@ -552,7 +575,8 @@ static const predication preds_mxznone[] = { PRED_m, PRED_x, PRED_z,
 /* A list of all SVE ACLE functions.  */
 static const function_group function_groups[] = {
 #define DEF_SVE_FUNCTION(NAME, SHAPE, TYPES, PREDS) \
-  { #NAME, FUNC_##NAME, SHAPE_##SHAPE, types_##TYPES, preds_##PREDS },
+  { GROUP_ID (NAME, SHAPE, TYPES, PREDS), #NAME, FUNC_##NAME, SHAPE_##SHAPE, \
+types_##TYPES, preds_##PREDS },
 #include "aarch64-sve-builtins.def"
 };
 
@@ -610,11 +634,11 @@ report_out_of_range (location_t location, tree decl, unsigned int argno,
 }
 
 inline
-function_instance::function_instance (function func_in,
+function_instance::function_instance (group_id group_in,
   function_mode mode_in,
   const type_suffix_pair _in,
   predication pred_in)
-  : func (func_in), mode (mode_in), pred (pred_in)
+  : group (group_in), mode (mode_in), pred (pred_in)
 {
   memcpy (types, types_in, sizeof (types));
 }
@@ -622,7 +646,7 @@ function_instance::function_instance (function func_in,
 inline bool
 function_instance::operator== (const function_instance ) const
 {
-  return (func == other.func
+  return (group == other.group
 	  && mode == other.mode
 	  && pred == other.pred
 	  && types[0] == other.types[0]
@@ -639,7 +663,7 @@ hashval_t
 function_instance::hash () const
 {
   inchash::hash h;
-  h.add_int (func);
+  h.add_int (group);
   h.add_int (mode);
   h.add_int (types[0]);
   h.add_int (types[1]);
@@ -748,7 +772,7 @@ arm_sve_h_builder::build_all (function_signature signature,
   for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
 for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
   {
-	function_instance instance (group.func, mode, group.types[ti],
+	function_instance instance (group.id, mode, group.types[ti],
 group.preds[pi]);
 	(this->*signature) (instance, 

[C++ Patch] Fix three additional locations

2019-01-10 Thread Paolo Carlini

Hi again,

this one is also a matter of consistency with, say, the precise location 
that we use for the error message at the beginning of check_methods. 
Indeed, the sequence of error messages of g++.dg/inherit/pure1.C reflects 
that. Tested x86_64-linux.


Thanks, Paolo.

PS: minor issues anyway, but I'm almost done with these low hanging 
fruits which I'm proposing to fix for 9 too


/

/cp
2019-01-10  Paolo Carlini  

* decl.c (cp_finish_decl): Improve error location.
* decl2.c (grokfield): Likewise, improve two locations.

/testsuite
2019-01-10  Paolo Carlini  

* g++.dg/cpp0x/pr62101.C: Test locations too.
* g++.dg/inherit/pure1.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 267807)
+++ cp/decl.c   (working copy)
@@ -7292,7 +7293,9 @@ cp_finish_decl (tree decl, tree init, bool init_co
synthesize_method (decl);
}
  else
-   error ("function %q#D is initialized like a variable", decl);
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "function %q#D is initialized like a variable",
+ decl);
}
  /* else no initialization required.  */
}
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 267807)
+++ cp/decl2.c  (working copy)
@@ -925,11 +925,13 @@ grokfield (const cp_declarator *declarator,
{
  gcc_assert (TREE_CODE (TREE_TYPE (value)) == FUNCTION_TYPE);
  if (friendp)
-   error ("initializer specified for friend function %qD",
-  value);
+   error_at (DECL_SOURCE_LOCATION (value),
+ "initializer specified for friend function %qD",
+ value);
  else
-   error ("initializer specified for static member function %qD",
-  value);
+   error_at (DECL_SOURCE_LOCATION (value),
+ "initializer specified for static member "
+ "function %qD", value);
}
}
   else if (TREE_CODE (value) == FIELD_DECL)
Index: testsuite/g++.dg/cpp0x/pr62101.C
===
--- testsuite/g++.dg/cpp0x/pr62101.C(revision 267807)
+++ testsuite/g++.dg/cpp0x/pr62101.C(working copy)
@@ -3,7 +3,7 @@
 
 struct X
 {
-  friend void g(X, int) = 0; // { dg-error "initializer specified for friend 
function" }
+  friend void g(X, int) = 0; // { dg-error "15:initializer specified for 
friend function" }
   friend void g(X, int) = default; // { dg-error "cannot be defaulted" }
   // { dg-prune-output "note" }
   friend void f(X, int) = delete;
Index: testsuite/g++.dg/inherit/pure1.C
===
--- testsuite/g++.dg/inherit/pure1.C(revision 267807)
+++ testsuite/g++.dg/inherit/pure1.C(working copy)
@@ -2,13 +2,13 @@
 // Origin: Volker Reichelt  
 // { dg-do compile }
 
-void foo0() = 0;   // { dg-error "like a variable" }
+void foo0() = 0;   // { dg-error "6:function .void foo0\\(\\). 
is initialized like a variable" }
 virtual void foo1() = 0;   // { dg-error "1:'virtual' outside class" }
-// { dg-error "like a variable" "" { target *-*-* } .-1 }
+// { dg-error "14:function .void foo1\\(\\). is initialized like a variable" 
"" { target *-*-* } .-1 }
 struct A
 {
-  void foo2() = 0; // { dg-error "non-virtual" }
-  static void foo3() = 0;  // { dg-error "static member" }
+  void foo2() = 0; // { dg-error "8:initializer specified for 
non-virtual method" }
+  static void foo3() = 0;  // { dg-error "15:initializer specified for 
static member function" }
   virtual static void foo4() = 0;  // { dg-error "both 'virtual' and 'static'" 
}
   virtual void foo5() = 0; // { dg-error "base class" }
 };
@@ -15,5 +15,6 @@ struct A
 
 struct B : A
 {
-  static void foo5() = 0;  // { dg-error "static member|declared" }
+  static void foo5() = 0;  // { dg-error "15:initializer specified for 
static member function" }
+// { dg-error "declared" "" { target *-*-* } .-1 }  
 };


[PATCH] Define __cpp_lib_erase_if feature test macro

2019-01-10 Thread Jonathan Wakely

The C++2a draft specifies the value 201811L for this, but as an
extension we return the number of elements erased. This is expected to
be standardised, so the macro has the value 201900L until a proper value
is specified in the draft.
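
A minimal sketch of how user code might test the macro, assuming only what is
stated above: a value of 201900L or greater means erase_if returns the number
of erased elements.  The helper name erase_if_counted is hypothetical and is
not part of libstdc++.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a libstdc++ API): erase matching elements and
// report how many were removed, whichever erase_if the library provides.
template<typename T, typename Pred>
std::size_t
erase_if_counted(std::vector<T>& v, Pred pred)
{
#if defined(__cpp_lib_erase_if) && __cpp_lib_erase_if >= 201900L
  // The library's erase_if already returns the number of erased elements.
  return std::erase_if(v, pred);
#else
  // No erase_if, or one that returns void: erase-remove and count by hand.
  auto first = std::remove_if(v.begin(), v.end(), pred);
  std::size_t n = static_cast<std::size_t>(v.end() - first);
  v.erase(first, v.end());
  return n;
#endif
}
```

Testing for ">= 201900L" rather than for mere definedness keeps the fallback
path in use with a library that only advertises the draft value 201811L.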

* include/bits/erase_if.h: Define __cpp_lib_erase_if.
* include/std/deque: Likewise.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/string: Likewise.
* include/std/vector: Likewise.
* include/std/version: Likewise.
* testsuite/21_strings/basic_string/erasure.cc: Test macro.
* testsuite/23_containers/deque/erasure.cc: Likewise.
* testsuite/23_containers/forward_list/erasure.cc: Likewise.
* testsuite/23_containers/list/erasure.cc: Likewise.
* testsuite/23_containers/map/erasure.cc: Likewise.
* testsuite/23_containers/set/erasure.cc: Likewise.
* testsuite/23_containers/unordered_map/erasure.cc: Likewise.
* testsuite/23_containers/unordered_set/erasure.cc: Likewise.
* testsuite/23_containers/vector/erasure.cc: Likewise.

Tested x86_64-linux, committed to trunk.

commit 7d6b3fe12f1c70b8396627c60fbff30422705087
Author: Jonathan Wakely 
Date:   Thu Jan 10 13:26:17 2019 +0000

Define __cpp_lib_erase_if feature test macro

The C++2a draft specifies the value 201811L for this, but as an
extension we return the number of elements erased. This is expected to
be standardised, so the macro has the value 201900L until a proper value
is specified in the draft.

* include/bits/erase_if.h: Define __cpp_lib_erase_if.
* include/std/deque: Likewise.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/string: Likewise.
* include/std/vector: Likewise.
* include/std/version: Likewise.
* testsuite/21_strings/basic_string/erasure.cc: Test macro.
* testsuite/23_containers/deque/erasure.cc: Likewise.
* testsuite/23_containers/forward_list/erasure.cc: Likewise.
* testsuite/23_containers/list/erasure.cc: Likewise.
* testsuite/23_containers/map/erasure.cc: Likewise.
* testsuite/23_containers/set/erasure.cc: Likewise.
* testsuite/23_containers/unordered_map/erasure.cc: Likewise.
* testsuite/23_containers/unordered_set/erasure.cc: Likewise.
* testsuite/23_containers/vector/erasure.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/erase_if.h b/libstdc++-v3/include/bits/erase_if.h
index 9e865c6abb5..d84f5ffc8ed 100644
--- a/libstdc++-v3/include/bits/erase_if.h
+++ b/libstdc++-v3/include/bits/erase_if.h
@@ -38,6 +38,8 @@ namespace std
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+#define __cpp_lib_erase_if 201900L
+
   namespace __detail
   {
 template
diff --git a/libstdc++-v3/include/std/deque b/libstdc++-v3/include/std/deque
index 7932b1cdea5..ed4927e13b7 100644
--- a/libstdc++-v3/include/std/deque
+++ b/libstdc++-v3/include/std/deque
@@ -94,6 +94,9 @@ _GLIBCXX_END_NAMESPACE_VERSION
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#define __cpp_lib_erase_if 201900L
+
   template
 inline typename deque<_Tp, _Alloc>::size_type
 erase_if(deque<_Tp, _Alloc>& __cont, _Predicate __pred)
diff --git a/libstdc++-v3/include/std/forward_list b/libstdc++-v3/include/std/forward_list
index 93c95904fc3..3d3b6d4f5f6 100644
--- a/libstdc++-v3/include/std/forward_list
+++ b/libstdc++-v3/include/std/forward_list
@@ -65,6 +65,9 @@ _GLIBCXX_END_NAMESPACE_VERSION
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#define __cpp_lib_erase_if 201900L
+
   template
 inline typename forward_list<_Tp, _Alloc>::size_type 
 erase_if(forward_list<_Tp, _Alloc>& __cont, _Predicate __pred)
diff --git a/libstdc++-v3/include/std/list b/libstdc++-v3/include/std/list
index 5ea9a9619ba..7b02e8685d4 100644
--- a/libstdc++-v3/include/std/list
+++ b/libstdc++-v3/include/std/list
@@ -89,6 +89,9 @@ _GLIBCXX_END_NAMESPACE_VERSION
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#define __cpp_lib_erase_if 201900L
+
   template
 inline typename list<_Tp, _Alloc>::size_type
 erase_if(list<_Tp, _Alloc>& __cont, _Predicate __pred)
diff --git a/libstdc++-v3/include/std/string b/libstdc++-v3/include/std/string
index dc718b87357..caa54c24100 100644
--- a/libstdc++-v3/include/std/string
+++ b/libstdc++-v3/include/std/string
@@ -79,6 +79,9 @@ _GLIBCXX_END_NAMESPACE_VERSION
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#define __cpp_lib_erase_if 201900L
+
   template
 inline typename basic_string<_CharT, _Traits, _Alloc>::size_type
diff --git a/libstdc++-v3/include/std/vector b/libstdc++-v3/include/std/vector
index 059451801cb..2c90765b058 100644
--- a/libstdc++-v3/include/std/vector

[PATCH] Fix PR88792

2019-01-10 Thread Richard Biener


The following avoids a value-number as leader during PRE PHI translation
since that exposes us to bogus flow-sensitive info.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

From d4849ecb2e82e49df4490d92f33c24a851f6e195 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Thu, 10 Jan 2019 13:34:01 +0100
Subject: [PATCH] fix-pr88792

2019-01-10  Richard Biener  

PR tree-optimization/88792
* tree-ssa-pre.c (get_representative_for): Do not return a
value-number here.

* gcc.dg/torture/pr88792.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/torture/pr88792.c b/gcc/testsuite/gcc.dg/torture/pr88792.c
new file mode 100644
index 000..e7f8fc0a624
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr88792.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+int one = 1;
+
+char
+__attribute__((noipa))
+foo(char v)
+{
+  int modec;
+
+  if (one)
+{
+  modec = ((v < 'A' || v > 'Z') ? v : v + ('a' - 'A'));
+  if (modec != 't' && modec != 'c' && modec != 'g')
+   modec = 0;
+}
+  else
+modec = 'g';
+
+  return modec;
+}
+
+int main(int argc, char **argv)
+{
+  char c = 't';
+  int r = foo (c);
+
+  if (r != c)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index a37eff6c7e3..3f38371cb21 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -1262,7 +1262,7 @@ get_representative_for (const pre_expr e, basic_block b = NULL)
   switch (e->kind)
 {
 case NAME:
-  return VN_INFO (PRE_EXPR_NAME (e))->valnum;
+  return PRE_EXPR_NAME (e);
 case CONSTANT:
   return PRE_EXPR_CONSTANT (e);
 case NARY:


[PATCH] Fix PR88775

2019-01-10 Thread Richard Biener


I am testing the following patch teaching VRP predicate analysis about

  __x.5_4 = (long unsigned int) "hello";
  __y.6_5 = (long unsigned int) _3;
  if (__x.5_4 != __y.6_5)

so that we know something about the relation of the converted entities.
This apparently (I didn't back out the other changes) helps PR88775
after Jakub's changes to libstdc++ (before his changes a related
VN patch helped, which meanwhile turns out to miscompile
20_util/function_objects/comparisons_pointer.cc...).

I now see DOM performing all required optimization thanks to it
using the EVRP machinery.
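
To illustrate why the patch below only looks through conversions under
certain conditions, here is a small self-contained check.  It is a sketch,
not GCC code; the function names are made up for illustration.  It shows
that a widening conversion from an unsigned type keeps ordering, that
equality survives a sign-changing widening conversion, and that ordering
does not.

```cpp
#include <cstdint>

// Exhaustively check that (uint32_t)x < (uint32_t)y agrees with x < y
// for all 8-bit unsigned x, y (zero-extension preserves order).
bool widening_unsigned_keeps_order()
{
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      {
        std::uint8_t x = static_cast<std::uint8_t>(a);
        std::uint8_t y = static_cast<std::uint8_t>(b);
        bool wide = static_cast<std::uint32_t>(x)
                    < static_cast<std::uint32_t>(y);
        if (wide != (x < y))
          return false;
      }
  return true;
}

// Exhaustively check that (uint32_t)x == (uint32_t)y agrees with x == y
// for all 8-bit signed x, y (sign-extension is injective).
bool sign_change_keeps_equality()
{
  for (int a = -128; a < 128; ++a)
    for (int b = -128; b < 128; ++b)
      {
        std::int8_t x = static_cast<std::int8_t>(a);
        std::int8_t y = static_cast<std::int8_t>(b);
        bool wide = static_cast<std::uint32_t>(x)
                    == static_cast<std::uint32_t>(y);
        if (wide != (x == y))
          return false;
      }
  return true;
}

// Counterexample: -1 < 1 as signed values, but after converting both to
// uint32_t the first becomes 0xffffffff, which is not less than 1.
bool sign_change_breaks_order()
{
  std::int8_t x = -1, y = 1;
  bool narrow = x < y;
  bool wide = static_cast<std::uint32_t>(x) < static_cast<std::uint32_t>(y);
  return narrow && !wide;
}
```

This matches the shape of the conditions in the patch: EQ_EXPR and NE_EXPR
are always accepted for a non-narrowing conversion, while ordering
comparisons additionally need matching signedness or an unsigned, strictly
narrower source type.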

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK for trunk?

Any idea how we can have a reliable testcase for the std::string
optimization?

Thanks,
Richard.

From 78a345845651565daac295f8dfbfc64cf5e8ccf3 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Thu, 10 Jan 2019 14:34:22 +0100
Subject: [PATCH] fix-pr88775-2

2019-01-10  Richard Biener  

PR tree-optimization/88775
* tree-vrp.c (register_edge_assert_for_2): Register asserts
from (T) a CMP (T) b.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 8d18e19d6e4..1efb907ae5e 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3247,6 +3247,42 @@ register_edge_assert_for_2 (tree name, edge e,
}
}
 }
+
+  /* For things like (T)a CMP (T)b register asserts for a CMP b if possible.  */
+  if (TREE_CODE_CLASS (comp_code) == tcc_comparison
+      && TREE_CODE (val) == SSA_NAME
+      && (INTEGRAL_TYPE_P (TREE_TYPE (val))
+          || POINTER_TYPE_P (TREE_TYPE (val))))
+    {
+      gassign *def1 = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (name));
+      gassign *def2 = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (val));
+      if (def1 && def2
+          && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def1))
+          && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def2))
+          && types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (def1)),
+                                 TREE_TYPE (gimple_assign_rhs1 (def2)))
+          && (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def1)))
+              <= TYPE_PRECISION (TREE_TYPE (val)))
+          && (comp_code == EQ_EXPR
+              || comp_code == NE_EXPR
+              || (TYPE_SIGN (TREE_TYPE (val))
+                  == TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (def1))))
+              || (TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def1)))
+                  && (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def1)))
+                      < TYPE_PRECISION (TREE_TYPE (val))))))
+        {
+          tree op0 = gimple_assign_rhs1 (def1);
+          tree op1 = gimple_assign_rhs1 (def2);
+          enum tree_code alt_comp_code = comp_code;
+          if (TREE_CODE (op0) != SSA_NAME)
+            {
+              alt_comp_code = swap_tree_comparison (alt_comp_code);
+              std::swap (op0, op1);
+            }
+          if (TREE_CODE (op0) == SSA_NAME)
+            add_assert_info (asserts, op0, op0, alt_comp_code, op1);
+        }
+    }
 }
 
 /* OP is an operand of a truth value expression which is known to have


[PATCH] Fix part of PR87314, folding of != "foo"

2019-01-10 Thread Richard Biener


This fixes $subject and also "foo" != "bar" folding which was
somehow missing.  It fixes only parts of the PR since the PR
is about PTA tracking string constants.

It might help PR88775, but unless I can confirm that, this is
just queued for GCC 10.

You might notice I'm treating string merging possibilities
conservatively (defer to runtime).
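
The conservative string-merging test used in the patch below can be sketched
outside the compiler as follows.  This is an illustration only: the function
name and the plain (pointer, length, offset) parameters are assumptions
standing in for TREE_STRING_POINTER, TREE_STRING_LENGTH and the constant
offsets, with the length counting the terminating NUL and a negative offset
meaning "not a known constant".

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the match.pd condition: return true only if the
// two string constants, viewed from the given offsets, differ within their
// common remaining length.  In that case no tail-merging of the literals
// can make the two addresses equal, so "==" may be folded to false.
bool literals_provably_distinct(const char* s0, long len0, long off0,
                                const char* s1, long len1, long off1)
{
  // Unknown or out-of-range offsets: stay conservative.
  if (off0 < 0 || off1 < 0 || off0 >= len0 || off1 >= len1)
    return false;
  std::size_t n = static_cast<std::size_t>(std::min(len0 - off0,
                                                    len1 - off1));
  return std::strncmp(s0 + off0, s1 + off1, n) != 0;
}
```

With this check "bye" versus "hello" is provably distinct, while "bye" versus
"hellobye" + 5 is not, which corresponds to the h() case in the new testcase
whose result depends on string merging.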

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

From 8f14bac370b8334a42f985027394e9f3fdf9e2f1 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Thu, 10 Jan 2019 10:24:20 +0100
Subject: [PATCH] fix-pr87314-1

2019-01-10  Richard Biener  

PR middle-end/87314
* match.pd (cmp (convert1?@2 addr@0) (convert2? addr@1)):
Handle STRING_CST vs DECL or STRING_CST.

* gcc.dg/pr87314-1.c: New testcase.

diff --git a/gcc/match.pd b/gcc/match.pd
index 60b12f94f9e..95fa4e4a4dd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3896,6 +3896,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
|| TREE_CODE (base1) == SSA_NAME
|| TREE_CODE (base1) == STRING_CST))
  equal = (base0 == base1);
+   HOST_WIDE_INT ioff0 = -1, ioff1 = -1;
+   off0.is_constant (&ioff0);
+   off1.is_constant (&ioff1);
  }
  (if (equal == 1
  && (cmp == EQ_EXPR || cmp == NE_EXPR
@@ -3919,10 +3922,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (cmp == GT_EXPR && (known_gt (off0, off1) || known_le (off0, off1)))
{ constant_boolean_node (known_gt (off0, off1), type); }))
   (if (equal == 0
-  && DECL_P (base0) && DECL_P (base1)
-  /* If we compare this as integers require equal offset.  */
-  && (!INTEGRAL_TYPE_P (TREE_TYPE (@2))
-  || known_eq (off0, off1)))
+  && ((DECL_P (base0) && DECL_P (base1)
+   /* If we compare this as integers require equal offset.  */
+   && (!INTEGRAL_TYPE_P (TREE_TYPE (@2))
+   || known_eq (off0, off1)))
+  || (DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
+  || (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
+  || (TREE_CODE (base0) == STRING_CST
+  && TREE_CODE (base1) == STRING_CST
+  && ioff0 >= 0 && ioff1 >= 0
+  && ioff0 < TREE_STRING_LENGTH (base0)
+  && ioff1 < TREE_STRING_LENGTH (base1)
+  /* This is a too conservative test that the STRING_CSTs
+ will not end up being string-merged.  */
+  && strncmp (TREE_STRING_POINTER (base0) + ioff0,
+  TREE_STRING_POINTER (base1) + ioff1,
+  MIN (TREE_STRING_LENGTH (base0) - ioff0,
+   TREE_STRING_LENGTH (base1) - ioff1)) != 0)))
(switch
(if (cmp == EQ_EXPR)
 { constant_boolean_node (false, type); })
diff --git a/gcc/testsuite/gcc.dg/pr87314-1.c b/gcc/testsuite/gcc.dg/pr87314-1.c
new file mode 100644
index 000..4dc85c8eee6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87314-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-original" } */
+
+int f(){ int a; return &a==(void *)"hello"; }
+int g(){ return "bye"=="hello"; }
+int h() { return "bye"=="hellobye"+5; }
+
+/* { dg-final { scan-tree-dump-times "hello" 1 "original" } } */
+/* The test in h() should be retained because the result depends on
+   string merging.  */
+/* { dg-final { scan-assembler "hello" } } */


[RFC] [Patch] [Debug] Add new NOTE to be used for debugging.

2019-01-10 Thread Matthew Malcomson
At the moment NOTE_INSN_FUNCTION_BEG is used for three different purposes.
The first is as a marker just before the first insn coming from a
"source code statement" of the function.
Bug 88432 is due to the fact that the note does not accurately point to
this logical position in a function -- in that case the stack protect
prologue is directly after NOTE_INSN_FUNCTION_BEG.

The second is (I believe) to make assumptions about what values are in the
parameter passing registers (in alias.c and calls.c).
(I'm not sure about this second use, if I am correctly reading this code then
it seems like a bug -- e.g. asan_emit_stack_protect inserts insns in the stream
that break the assumption that seems to be made.)

The third is as a marker to determine where to put extra code later in
sjlj_emit_function_enter from except.c, where to insert profiling code for a
function in final.c, and where to insert variable expansion code in
pass_expand::execute from cfgexpand.c.

These three uses seem to be at odds with each other -- insns that change the
values stored in the parameter passing registers can come from automatically
inserted code like stack protection, and some requirements on where instructions
should get inserted have moved the position of this NOTE (e.g. see bugzilla bug
81186).

This patch splits the current note into two different notes, one to retain uses
2 and 3 above, and one for use in generating debug information.

Uses 2 and 3 are still attached to NOTE_INSN_FUNCTION_BEG, while the
debugging use is now implemented with NOTE_INSN_DEBUG_FUNCTION_BEG.

These two notes are put into the function's insn chain in different
places during the expand pass, and can hence satisfy their respective
uses.

Bootstrapped and regtested on aarch64.
TODO -- Manual tests done on resulting debug information -- yet to be automated.

gcc/ChangeLog:

2019-01-10  Matthew Malcomson  

PR debug/88432
* cfgexpand.c (pass_expand::execute): Insert
NOTE_INSN_DEBUG_FUNCTION_BEG.
* function.c (thread_prologue_and_epilogue_insns): Account
for NOTE_INSN_DEBUG_FUNCTION_BEG.
* cfgrtl.c (duplicate_insn_chain): Account for new NOTE.
* doc/rtl.texi: Document new NOTE.
* dwarf2out.c (dwarf2out_source_line): Change comment to
reference new NOTE.
* final.c (asm_show_source): As above.
(final_scan_insn_1): Split action on NOTE_INSN_FUNCTION_BEG into
two, and move debugging info action to trigger on
NOTE_INSN_DEBUG_FUNCTION_BEG.
* insn-notes.def (INSN_NOTE): Add new NOTE.



### Attachment also inlined for ease of reply###


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 60c1cfb4556e1a659db19f6719adccc1dab0fe46..491f441d01de226ba5aff2af8c71680b78648a12 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6476,6 +6476,12 @@ pass_expand::execute (function *fun)
   if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ())
 stack_protect_prologue ();
 
+  /* Insert a NOTE that marks the end of "generated code" and the start of code
+ that comes from the user.  This is the point which dwarf2out.c will treat
+ as the beginning of the users code in this function.  e.g. GDB will stop
+ just after this note when breaking on entry to the function.  */
+  emit_note (NOTE_INSN_DEBUG_FUNCTION_BEG);
+
   expand_phi_nodes ();
 
   /* Release any stale SSA redirection data.  */
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 172bdf585d036e27bcf53dba89c1ffc1b6cb84c7..d0cbca84aa3f14002a568a65e70016c3e15d6b9c 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -4215,6 +4215,7 @@ duplicate_insn_chain (rtx_insn *from, rtx_insn *to)
case NOTE_INSN_DELETED_DEBUG_LABEL:
  /* No problem to strip these.  */
case NOTE_INSN_FUNCTION_BEG:
+   case NOTE_INSN_DEBUG_FUNCTION_BEG:
  /* There is always just single entry to function.  */
case NOTE_INSN_BASIC_BLOCK:
   /* We should only switch text sections once.  */
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 583291018538722a19a9baf8c46c87cbdfe34216..a50d08483de0db84378e48c9334b48ff12548190 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -3954,6 +3954,18 @@ identifies which region is associated with these notes.
 Appears at the start of the function body, after the function
 prologue.
 
+@findex NOTE_INSN_DEBUG_FUNCTION_BEG
+@item NOTE_INSN_DEBUG_FUNCTION_BEG
+This NOTE is inserted at the start of a function during RTL expansion.
+It is inserted at the point in a function where code coming directly from the
+"users source" starts. i.e. the first insn that appears after this note should
+be generated from the user code (and not from automatic code generation to
+support compiler features).
+This NOTE is used to ensure the first debug line number after the start of any
+function is the line number of the first source code statement in that
+function. 

Re: [PATCH] x86: Update VFIXUPIMM* Intrinsics to align with the latest Intel SDM

2019-01-10 Thread Matthias Kretz
On Donnerstag, 10. Januar 2019 11:39:56 CET Jakub Jelinek wrote:
> On Thu, Jan 10, 2019 at 10:46:14AM +0100, Dr. Matthias Kretz wrote:
> > _mm_fixupimm_ps(_mm_getexp_ps(x), x, _mm_set1_epi32(0x00550433), 0x00);
> 
> I guess you could use
> _mm_mask_fixupimm_ps(_mm_getexp_ps(x), -1, x, _mm_set1_epi32(0x00550433),
> 0x00); because that one does allow you to specify the dest operand.

Thanks. Actually _mm_getexp_ps produces the right answer already by itself. I 
only meant to demonstrate the fixupimm usage (e.g. if you calculate a trig 
function and then fix up for Annex F requirements).

BTW, your idea does not work because GCC recognizes the full writemask and 
simply produces the same as `_mm_fixupimm_ps(x, _mm_set1_epi32(0x00550433), 
0x00)`. See here: https://godbolt.org/z/-5Ql0f

> But I agree it is just weird, the non-masked intrinsics don't take into
> account the 0b cases anymore.

To be precise, they still do, but they produce garbage (e.g.
https://godbolt.org/z/f6u-GI).

-- 
──
 Dr. Matthias Kretz                            https://kretzfamily.de
 GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
 SIMD easy and portable https://github.com/VcDevel/Vc
──

[PATCH] Check AI_NUMERICSERV is defined before using it

2019-01-10 Thread Jonathan Wakely

The AI_NUMERICSERV constant is missing from old Darwin systems, so only
use it if it's supported.
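
The same guard can be shown with the plain POSIX API.  The following is a
minimal sketch, assuming a system that provides <netdb.h>; the function name
numeric_lookup_flags is made up for illustration and is not part of the
patch.

```cpp
#include <netdb.h>
#include <sys/socket.h>

// Hypothetical helper: build getaddrinfo() hint flags for a purely numeric
// lookup, adding AI_NUMERICSERV only where the system headers define it.
int numeric_lookup_flags()
{
  int flags = AI_NUMERICHOST;
#ifdef AI_NUMERICSERV
  flags |= AI_NUMERICSERV;
#endif
  return flags;
}
```

On a system without AI_NUMERICSERV the service string is then simply not
forced to be numeric, which mirrors what the updated lookup.cc test does.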

* include/experimental/internet [AI_NUMERICSERV]
(resolver_base::numeric_service): Define conditionally.
* testsuite/experimental/net/internet/resolver/base.cc: Test it
conditionally.
* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Likewise.

Tested x86_64-linux, committed to trunk.

commit 263a9a4d85e1c2cbfc7fad5585d98dd133c874b1
Author: Jonathan Wakely 
Date:   Thu Jan 10 13:18:17 2019 +0000

Check AI_NUMERICSERV is defined before using it

The AI_NUMERICSERV constant is missing from old Darwin systems, so only
use it if it's supported.

* include/experimental/internet [AI_NUMERICSERV]
(resolver_base::numeric_service): Define conditionally.
* testsuite/experimental/net/internet/resolver/base.cc: Test it
conditionally.
* testsuite/experimental/net/internet/resolver/ops/lookup.cc:
Likewise.

diff --git a/libstdc++-v3/include/experimental/internet b/libstdc++-v3/include/experimental/internet
index cd0eee29585..07c62e697cd 100644
--- a/libstdc++-v3/include/experimental/internet
+++ b/libstdc++-v3/include/experimental/internet
@@ -1629,7 +1629,9 @@ namespace ip
   __flags_passive  = AI_PASSIVE,
   __flags_canonical_name   = AI_CANONNAME,
   __flags_numeric_host = AI_NUMERICHOST,
+#ifdef AI_NUMERICSERV
   __flags_numeric_service  = AI_NUMERICSERV,
+#endif
   __flags_v4_mapped= AI_V4MAPPED,
   __flags_all_matching = AI_ALL,
   __flags_address_configured   = AI_ADDRCONFIG
@@ -1637,7 +1639,9 @@ namespace ip
 static constexpr flags passive = __flags_passive;
 static constexpr flags canonical_name  = __flags_canonical_name;
 static constexpr flags numeric_host= __flags_numeric_host;
+#ifdef AI_NUMERICSERV
 static constexpr flags numeric_service = __flags_numeric_service;
+#endif
 static constexpr flags v4_mapped   = __flags_v4_mapped;
 static constexpr flags all_matching= __flags_all_matching;
 static constexpr flags address_configured  = __flags_address_configured;
diff --git a/libstdc++-v3/testsuite/experimental/net/internet/resolver/base.cc b/libstdc++-v3/testsuite/experimental/net/internet/resolver/base.cc
index 657e2f56b43..746557af656 100644
--- a/libstdc++-v3/testsuite/experimental/net/internet/resolver/base.cc
+++ b/libstdc++-v3/testsuite/experimental/net/internet/resolver/base.cc
@@ -49,7 +49,9 @@ test01()
   (void) resolver::passive;
   (void) resolver::canonical_name;
   (void) resolver::numeric_host;
+#ifdef AI_NUMERICSERV
   (void) resolver::numeric_service;
+#endif
   (void) resolver::v4_mapped;
   (void) resolver::all_matching;
   (void) resolver::address_configured;
diff --git a/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc b/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc
index d926385f1a8..39fb7fd7708 100644
--- a/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc
+++ b/libstdc++-v3/testsuite/experimental/net/internet/resolver/ops/lookup.cc
@@ -49,7 +49,10 @@ test02()
   std::error_code ec;
   io_context ctx;
   ip::tcp::resolver resolv(ctx);
-  auto flags = ip::resolver_base::numeric_host | ip::tcp::resolver::numeric_service;
+  auto flags = ip::resolver_base::numeric_host;
+#ifdef AI_NUMERICSERV
+  flags |= ip::tcp::resolver::numeric_service;
+#endif
   auto addrs = resolv.resolve("127.0.0.1", "42", flags, ec);
   VERIFY( !ec );
   VERIFY( addrs.size() > 0 );


Re: [PATCH] [RFC] PR target/52813 and target/11807

2019-01-10 Thread Segher Boessenkool
Hi!

On Tue, Jan 08, 2019 at 12:03:06PM +0000, Richard Sandiford wrote:
> Bernd Edlinger  writes:
> > Meanwhile I found out, that the stack clobber has only been ignored up to
> > gcc-5 (at least with lra targets, not really sure about reload targets).
> > From gcc-6 on, with the exception of PR arm/77904 which was a regression due
> > to the underlying lra change, but fixed later, and back-ported to gcc-6.3.0,
> > this works for all targets I tried so far.
> >
> > To me, it starts to look like a rather unique and useful feature, that I
> > would like to keep working.
> 
> Not sure what you mean by "unique".  But forcing a frame is a bit of
> a slippery concept.  Force it where?  For the asm only, or the whole
> function?  This depends on optimisation and hasn't been consistent
> across GCC versions, since it depends on the shrink-wrapping
> optimisation.  (There was a similar controversy a while ago about
> to what extent -fno-omit-frame-pointer should "force a frame".)

It's not forcing a frame currently: it's just setting frame_pointer_needed.
Whatever happens from that is the target's business.

> The effect on the redzone seems like something that should be specified
> explicitly rather than as an (accidental?) side effect of listing the
> sp in the clobber list.  Maybe this would be another use for the "asm
> attributes" proposal.  "noreturn" was another attribute suggested on
> IRC yesterday.

Redzone is target-dependent.

"noreturn"...  What would that mean, *exactly*?  It cannot execute any
code the compiler can see, so such asm is better off as real asm anyway
(not inline asm).

> But either way, the general feeling seems to be that going straight to a
> hard error is too harsh, since there's quite a bit of existing code that
> has the clobber.  This patch implements the compromise discussed on IRC
> yesterday of making it a -Wdeprecated warning instead.

The patch looks fine to me.  Thanks!


Segher


Re: [v3 PATCH] Implement LWG 2221, No formatted output operator for nullptr

2019-01-10 Thread Jonathan Wakely

On 04/12/17 23:04 +0000, Jonathan Wakely wrote:

On 03/12/17 23:08 +0200, Ville Voutilainen wrote:

Tested on Linux-x64.

2017-11-14  Ville Voutilainen  

  Implement LWG 2221
  * include/std/ostream (operator<<(nullptr_t)): New.
  * testsuite/27_io/basic_ostream/inserters_other/char/lwg2221.cc: New.



diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index f7cab03..18011bc 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -245,6 +245,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 operator<<(const void* __p)
 { return _M_insert(__p); }

+#if __cplusplus > 201402L
+  __ostream_type&
+  operator<<(nullptr_t)
+  { return *this << "nullptr"; }
+#endif


As discussed on IRC, this requires a new symbol to be exported for the
std::ostream and std::wostream explicit instantiations, or the new
test will fail to link at -O0.

That should wait for stage 1.



This patch for a C++17 feature (posted over a year ago) should have
gone in during stage 1. I've taken care of the symbol exports that
were missing from the original patch.
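
For reference, a minimal sketch of what the new inserter makes possible.  The
helper name format_nullptr is hypothetical.  The exact text printed is up to
the implementation (the patch below prints "nullptr"), so the sketch does not
depend on it.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return whatever the library prints for nullptr.
std::string format_nullptr()
{
  std::ostringstream os;
#if __cplusplus >= 201703L
  // C++17 with LWG 2221 implemented: operator<<(nullptr_t) is selected.
  os << nullptr;
#else
  // Before C++17 "os << nullptr" is ambiguous, so pick an overload by hand.
  os << static_cast<const void*>(nullptr);
#endif
  return os.str();
}
```

Because the inserter is a member of basic_ostream, which has explicit
instantiations in the library, a call that is not inlined needs the exported
symbol mentioned above.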

Tested x86_64-linux, committed to trunk.

commit 81a14252318562c32e5a0c466a39568e741ab6e6
Author: Jonathan Wakely 
Date:   Thu Jan 10 11:44:19 2019 +0000

Implement LWG 2221: formatted output operator for nullptr

2019-01-10  Ville Voutilainen  
Jonathan Wakely  

Implement LWG 2221
* config/abi/pre/gnu.ver (GLIBCXX_3.4): Tighten patterns.
(GLIBCXX_3.4.26): Add new exports.
* include/Makefile.am: Add ostream-inst.cc. Move string-inst.cc to
correct list of sources.
* include/Makefile.in: Regenerate.
* include/std/ostream (operator<<(nullptr_t)): New member function.
* src/c++17/ostream-inst.cc: New file.
* testsuite/27_io/basic_ostream/inserters_other/char/lwg2221.cc: New
test.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 92af1aec7b4..788c2e0303c 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -495,7 +495,7 @@ GLIBCXX_3.4 {
 _ZNSo8_M_writeEPKc[ilx];
 _ZNSo3put*;
 _ZNSo[5-9][a-z]*;
-_ZNSolsE*[^g];
+_ZNSolsE*[^Dg];
 
 # std::basic_ostream
 _ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]Ev;
@@ -509,7 +509,7 @@ GLIBCXX_3.4 {
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE5writeEPKw*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE6sentry*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE8_M_writeEPKw[ilx];
-_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE*[^g];
+_ZNSt13basic_ostreamIwSt11char_traitsIwEElsE*[^Dg];
 
 # std::ostream operators and inserters
 _ZSt4end[ls]I[cw]St11char_traitsI[cw]EERSt13basic_ostream*;
@@ -2223,6 +2223,10 @@ GLIBCXX_3.4.26 {
 _ZNSt10filesystem7__cxx1128recursive_directory_iteratoraSEOS1_;
 _ZNSt10filesystem7__cxx1128recursive_directory_iteratorppEv;
 
+# basic_ostream::operator<<(nullptr_t)
+_ZNSolsEDn;
+_ZNSt13basic_ostreamIwSt11char_traitsIwEElsEDn;
+
 } GLIBCXX_3.4.25;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 50c99b25d58..2541d978886 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -245,6 +245,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   operator<<(const void* __p)
   { return _M_insert(__p); }
 
+#if __cplusplus >= 201703L
+  __ostream_type&
+  operator<<(nullptr_t)
+  { return *this << "nullptr"; }
+#endif
+
   /**
*  @brief  Extracting from another streambuf.
*  @param  __sb  A pointer to a streambuf
diff --git a/libstdc++-v3/src/c++17/Makefile.am b/libstdc++-v3/src/c++17/Makefile.am
index 4200f7f8259..1a00770081e 100644
--- a/libstdc++-v3/src/c++17/Makefile.am
+++ b/libstdc++-v3/src/c++17/Makefile.am
@@ -41,6 +41,8 @@ endif
 if ENABLE_EXTERN_TEMPLATE
 # XTEMPLATE_FLAGS = -fno-implicit-templates
 inst_sources = \
+	ostream-inst.cc \
+	string-inst.cc \
 	$(extra_string_inst_sources)
 else
 # XTEMPLATE_FLAGS =
@@ -52,7 +54,6 @@ sources = \
 	fs_ops.cc \
 	fs_path.cc \
 	memory_resource.cc \
-	string-inst.cc \
 	$(extra_fs_sources)
 
 vpath % $(top_srcdir)/src/c++17
diff --git a/libstdc++-v3/src/c++17/ostream-inst.cc b/libstdc++-v3/src/c++17/ostream-inst.cc
new file mode 100644
index 000..b2fbafcef91
--- /dev/null
+++ b/libstdc++-v3/src/c++17/ostream-inst.cc
@@ -0,0 +1,42 @@
+// std::ostream instantiations for C++17 -*- C++ -*-
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library 

Re: [PATCH v2] Fix PR64242

2019-01-10 Thread Wilco Dijkstra
Hi Jakub,

Any other comments? I'd like to finish this rather than leaving it in its
current half-done state.

Wilco
  

Hi,

Jakub Jelinek wrote:
On Fri, Dec 07, 2018 at 04:19:22PM +0000, Wilco Dijkstra wrote:

>> The test case doesn't need an aligned object to fail, so why did you add it?
>
> It needed it on i686, because otherwise it happened to see the value it
> wanted in the caller's stack frame.

Right, so I fixed that by increasing the size of the frame in broken_setjmp to
be larger than the frame in main, so it's now extremely unlikely to accidentally
read from a random stack location and end up with a valid stack pointer.

> >> +  /* Compute expected next alloca offset - some targets don't align properly
> >> + and allocate too much.  */
> >> +  p = q + (q - p);
> >
> > This is UB, pointer difference is only defined within the same object.
> > So, you can only do such subtraction in some integral type rather than as
> > pointer subtraction. 
> 
> __builtin_setjmp is already undefined behaviour, and the stack corruption is
> even more undefined - trying to avoid harmless theoretical undefined behaviour
> wouldn't be helpful.

> No, __builtin_setjmp is a GNU extension, not undefined behavior.  

Well the evidence is that it's undocumented, unspecified and causes undefined
behaviour...

> And 
> something that is UB and might be harmless today might be harmful tomorrow,
> gcc optimizes heavily on the assumption that UB doesn't happen in the
> program, so might optimize that subtraction to 0 or 42 or whatever else.
>
>> > And I'm not sure you have a guarantee that every zero sized alloca is at
>> > the same offset from the previous one.
>> 
>> The above pointer adjustment handles the case where alloca overallocates.
>> It passes on x86-64 which always adds 8 unnecessary bytes.
>
> What guarantee is there that it overallocates each time the same though?

How could it not be? It could only vary if it was reading an uninitialized
register or adding a random extra amount as a form of ASLR. But there is no
point in trying to support future unknown features/bugs since it will give
false negatives today.
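
For what it's worth, the integral-type variant Jakub asked for could look
roughly like the sketch below.  The name predict_next_address is made up for
illustration, and the sketch assumes a flat address space where converting
pointers to uintptr_t preserves byte distances (implementation-defined, but
true on the targets discussed here).

```cpp
#include <cstdint>

// Hypothetical helper: given two consecutive allocation addresses p and q,
// predict the next one as q + (q - p), doing all arithmetic on integers so
// that no pointer subtraction across distinct objects takes place.
// Unsigned wraparound makes this well defined for downward-growing stacks.
std::uintptr_t predict_next_address(const void* p, const void* q)
{
  std::uintptr_t ip = reinterpret_cast<std::uintptr_t>(p);
  std::uintptr_t iq = reinterpret_cast<std::uintptr_t>(q);
  return iq + (iq - ip);
}
```

The result would then be compared against the address returned by the next
alloca call, converted to uintptr_t in the same way.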

Wilco
    

Re: [PATCH] Fix PR84521

2019-01-10 Thread Wilco Dijkstra


ping


From: Wilco Dijkstra
Sent: 14 December 2018 13:16
To: GCC Patches
Cc: nd
Subject: [PATCH] Fix PR84521
  

This fixes and simplifies the setjmp and non-local goto implementation.
Currently the virtual frame pointer is saved when using __builtin_setjmp or
a non-local goto.  Depending on whether a frame pointer is used, this may
either save SP or FP with an immediate offset.  However the goto or longjmp
always updates the hard frame pointer.

A receiver veneer in the original function then assigns the hard frame pointer
to the virtual frame pointer, which should, if it works correctly, again assign
SP or FP.  However the special elimination code in eliminate_regs_in_insn
doesn't do this correctly unless the frame pointer is used, and even if it
worked by writing SP, the frame pointer would still be corrupted.

A much simpler implementation is to always save and restore the hard frame
pointer.  This avoids 2 redundant instructions which add/subtract the virtual
frame offset.  A large amount of code can be removed as a result, including all
implementations of TARGET_BUILTIN_SETJMP_FRAME_VALUE (all of which already use
the hard frame pointer).  The expansion of nonlocal_goto on PA can be simplified
to just restore the hard frame pointer.
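
As a rough illustration of the construct being discussed (this is not the new
pr84521.c test, and the function names are made up for the example), here is a
minimal __builtin_setjmp/__builtin_longjmp round trip.  The jump must come from
a different function than the one calling __builtin_setjmp, hence the noinline
helper; the buffer is five words, as the GCC manual requires.

```c
#include <assert.h>

/* Five-word buffer, as required by __builtin_setjmp.  */
static void *jump_buf[5];

/* Hypothetical helper: jumps back to the receiver in run_once.
   __builtin_longjmp must not be used in the function that called
   __builtin_setjmp, so keep this out of line.  */
__attribute__ ((noinline)) static void
raise_jump (void)
{
  __builtin_longjmp (jump_buf, 1);   /* second argument must be 1 */
}

/* Hypothetical driver: returns n + 1 after one non-local jump.  */
__attribute__ ((noinline)) int
run_once (int n)
{
  volatile int marker = n;           /* volatile: must survive the jump */

  if (__builtin_setjmp (jump_buf) == 0)
    {
      marker = n + 1;
      raise_jump ();
      return -1;                     /* not reached */
    }

  /* We arrive here through the receiver; the frame must be usable.  */
  assert (marker == n + 1);
  return marker;
}
```

If the frame pointer restored by the jump were wrong, the read of marker after
the receiver is exactly the kind of access that would go astray.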

This fixes the most obvious issues, however there are still issues on targets
which define HARD_FRAME_POINTER_IS_FRAME_POINTER (arm, mips, xtensa).
Each function could have a different hard frame pointer, so a non-local goto
may restore the wrong frame pointer (TARGET_BUILTIN_SETJMP_FRAME_VALUE could
be useful for this).

The i386 TARGET_BUILTIN_SETJMP_FRAME_VALUE was incorrect: if stack_realign_fp
is true, it would save the hard frame pointer value but restore the virtual
frame pointer which according to ix86_initial_elimination_offset can have a
non-zero offset from the hard frame pointer.

The ia64 implementation of nonlocal_goto seems incorrect since the helper
function moves the frame pointer value into the static chain register
(so this patch does nothing to make it better or worse).

AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm.

ChangeLog:
2018-12-13  Wilco Dijkstra  

gcc/
    PR middle-end/84521
    * builtins.c (expand_builtin_setjmp_setup): Save hard_frame_pointer_rtx.
    (expand_builtin_setjmp_receiver): Do not emit sfp = fp move since we
    restore fp.
    * function.c (expand_function_start): Save hard_frame_pointer_rtx for
    non-local goto.
    * lra-eliminations.c (eliminate_regs_in_insn): Remove sfp = fp
    elimination code.
    (remove_reg_equal_offset_note): Remove unused function.
    * reload1.c (eliminate_regs_in_insn): Remove sfp = fp elimination code.
    * config/arc/arc.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
    (arc_builtin_setjmp_frame_value): Remove function.
    * config/avr/avr.c  (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
    (avr_builtin_setjmp_frame_value): Remove function.
    * config/i386/i386.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
    (ix86_builtin_setjmp_frame_value): Remove function.
    * config/pa/pa.md (nonlocal_goto): Remove FP adjustment.
    * config/sparc/sparc.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
    (sparc_builtin_setjmp_frame_value): Remove function.
    * config/vax/vax.c (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Remove.
    (vax_builtin_setjmp_frame_value): Remove function.

testsuite/
    PR middle-end/84521
    * gcc.c-torture/execute/pr84521.c: New test.

---
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2ef9c9afcc69fcb775dc6a6fff550025bdc76337..55b78adbc3df8c970083e6d9b548a8ca7dc52600 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -982,7 +982,7 @@ expand_builtin_setjmp_setup (rtx buf_addr, rtx receiver_label)
 
   mem = gen_rtx_MEM (Pmode, buf_addr);
   set_mem_alias_set (mem, setjmp_alias_set);
-  emit_move_insn (mem, targetm.builtin_setjmp_frame_value ());
+  emit_move_insn (mem, hard_frame_pointer_rtx);
 
   mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, buf_addr,
    GET_MODE_SIZE (Pmode))),
@@ -1024,31 +1024,6 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
   if (chain && REG_P (chain))
 emit_clobber (chain);
 
-  /* Now put in the code to restore the frame pointer, and argument
- pointer, if needed.  */
-  if (! targetm.have_nonlocal_goto ())
-    {
-  /* First adjust our frame pointer to its actual value.  It was
-    previously set to the start of the virtual area corresponding to
-    the stacked variables when we branched here and now needs to be
-    adjusted to the actual hardware fp value.
-
-    Assignments to virtual registers are converted by
-    instantiate_virtual_regs into the corresponding assignment
-    to the underlying register (fp in this case) that makes
-    the original assignment true.
-    So the following insn will actually be decrementing fp by
-    

Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On Thu, Jan 10, 2019 at 11:05 AM Jakub Jelinek  wrote:
>
> On Thu, Jan 10, 2019 at 10:53:32AM +, Ramana Radhakrishnan wrote:
> > > 2018-11-23  Ramana Radhakrishnan  
> > >
> > >  * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
> > >  * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > > Handle
> > >  and put in error checks for stack protector guard options.
> > >  (aarch64_stack_protect_guard): New.
> > >  (TARGET_STACK_PROTECT_GUARD): Define.
> > >  * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
> > >  (reg_stack_protect_address): New.
> > >  (stack_protect_set): Adjust for SSP_GLOBAL.
> > >  (stack_protect_test): Likewise.
> > >  * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
> > >  (-mstack-protector-guard): Likewise.
> > >  (-mstack-protector-guard-offset): Likewise.
> > >  * doc/invoke.texi: Document new AArch64 options.
> >
> > Any further thoughts or is it just Jakub's comments that I need to
> > address on this patch ? It looks like the kernel folks have queued
> > this for the next kernel release and given this is helping the kernel
> > with a security feature, can we move this forward ?
>
> From RM POV this is ok in stage4 if you commit it RSN.
> Both x86 and powerpc have -mstack-protector-guard{,-reg,-offset}= options,
> x86 even has -mstack-protector-guard-symbol=.  So it would be nice if the
> aarch64 options are compatible with those other arches.
>

Thanks Jakub. I haven't added the -mstack-protector-guard-symbol as
there is no requirement to do so now and I don't want to add an option
that isn't being used. IIRC, the other options seem to be in sync with
x86 and powerpc.

> Please make sure you don't regress non-glibc SSP support (don't repeat
> PR85644/PR86832).
>

That should be ok as I'm not changing any defaults. I would expect
that non-glibc based libraries that support SSP must be mimicking
glibc support for this using the global symbol as there is nothing
special in the backend for this today. I guess there is freebsd as a
non-glibc target or musl that I can look at but I don't expect that to
be an issue.

I'll wait until tomorrow to respin just to see if I can get any
further feedback.

regards
Ramana



> Jakub


[Patch]Bug 84762 - GCC for PowerPC32 violates the SysV ABI spec for small struct returns

2019-01-10 Thread Lokesh Janghel
Hi Segher,

Please find attached the patch for the issue in the subject.
Please let me know your thoughts and comments on it.

>Do you have a copyright assignment with the FSF?
We don't have a copyright assignment with FSF.

>-  if (!global_options_set.x_aix_struct_return)
>+  if (!global_options_set.x_aix_struct_return
>+&& !rs6000_current_svr4_struct_return)
The value of aix_struct_return decides whether the struct is returned in
registers or in memory. After that, the given option determines which
alignment is used within the register.
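
For context, the kind of function affected looks like the sketch below.  The
struct and function names are only for illustration; the 8-byte bound reflects
my reading of the PR and is an assumption, not something the patch text states.
The C source behaves the same everywhere; only the register/memory choice and
the alignment inside the register depend on the ABI option.

```c
#include <assert.h>

/* Illustrative small aggregate: three bytes, well under the 8-byte
   bound assumed above for "small struct" returns.  */
struct small
{
  char a;
  char b;
  char c;
};

_Static_assert (sizeof (struct small) <= 8, "expected a small struct");

/* Returned by value; whether this travels in a register or in memory
   is decided by the target ABI, not by the source.  */
struct small
make_small (char a, char b, char c)
{
  struct small s = { a, b, c };
  assert (sizeof s == 3);
  return s;
}
```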

ChangeLogs
/gcc/ChangeLog
2019-01-10  Lokesh Janghel  

PR target/84762
* config/rs6000/rs6000.c (rs6000_return_in_msb): Return in svr4
for small struct value.
(rs6000_option_override_internal): Add the condition for aix or
svr4 (LSB/MSB aligned).
* config/rs6000/rs6000.opt: Extend the -msvr4-struct-return option
for LSB aligned value and MSB aligned value.

/gcc/testsuite/ChangeLog
2019-01-10  Lokesh Janghel  

PR target/84762
* gcc.target/pr84762-1.c: New testcase.
* gcc.target/pr84762-2.c: New testcase.
* gcc.target/pr84762-3.c: New testcase.


-- 
Thanks
Lokesh Janghel


84762.patch
Description: Binary data


[PATCH, d] Add README for process contributing to dmd and phobos

2019-01-10 Thread Iain Buclaw
Hi,

Joseph made mention that there isn't a readme documenting where
changes to d/dmd, libphobos/libdruntime, and libphobos/src should go.

I hope this clears things up.  OK for trunk?

-- 
Iain
---
gcc/d/ChangeLog:

2019-01-10  Iain Buclaw  

* README.gcc: New file.

libphobos/ChangeLog:

2019-01-10  Iain Buclaw  

* README.gcc: New file.

---
diff --git a/gcc/d/README.gcc b/gcc/d/README.gcc
new file mode 100644
index 000..757545cc705
--- /dev/null
+++ b/gcc/d/README.gcc
@@ -0,0 +1,11 @@
+The files in the dmd subdirectory are part of the front-end for the
+Digital Mars D compiler, hosted at https://github.com/dlang/dmd/.
+
+They cover the lexical analysis, parsing, and semantic analysis of the
+D Programming Language defined in the documents at https://dlang.org/.
+
+To report a problem or look up known issues with the dmd front-end,
+please visit the issue tracker at https://issues.dlang.org/.
+
+All changes to dmd should go through the upstream repository first,
+then merged back to GCC.
diff --git a/libphobos/README.gcc b/libphobos/README.gcc
new file mode 100644
index 000..53593783995
--- /dev/null
+++ b/libphobos/README.gcc
@@ -0,0 +1,26 @@
+The files in this directory where noted are part of the DRuntime
+and Phobos library.
+
+DRuntime is the low-level runtime library backing the D programming
+language, hosted at https://github.com/dlang/druntime/.
+
+Phobos is the standard library for the D Programming Language, hosted
+at https://github.com/dlang/phobos/.
+
+The following sources and directories are part of DRuntime:
+  libdruntime/core/
+  libdruntime/gc/
+  libdruntime/gcstub/
+  libdruntime/object.d
+  libdruntime/rt/
+
+The following sources and directories are part of Phobos:
+  src/etc/
+  src/index.d
+  src/std/
+
+To report a bug or look up known issues with the runtime or standard
+library please visit the issue tracker at https://issues.dlang.org/.
+
+All changes to either of these libraries should go through the
+upstream repository first, then merged back to GCC.


Re: [RS6000] Implement -mno-pltseq

2019-01-10 Thread Alan Modra
On Thu, Jan 10, 2019 at 05:09:06AM -0600, Segher Boessenkool wrote:
> On Mon, Jan 07, 2019 at 12:54:02PM +1030, Alan Modra wrote:
> > Since the last patch untangled inline PLT and TLS marker support there
> > now isn't a way of requesting the older long call sequences on a
> > compiler built with inline PLT support.  This patch adds support for
> > a new -mno-pltseq option.
> 
> Is this a useful option to have?

I figure the most use it will get is by gcc maintainers and others
investigating bug reports.  You get a bug report and can't reproduce
the failure even though compiler flags and gcc version are well
specified.  -mno-pltseq gives you a way of quickly trying out
something that might be different in their auto-host.h to yours.

> It needs documentation (in the manual), then.

OK, will do.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] ARM: fix -masm-syntax-unified (PR88648)

2019-01-10 Thread Kyrill Tkachov

Hi Stefan,

On 08/01/19 09:33, Kyrill Tkachov wrote:

Hi Stefan,

On 01/01/19 23:34, Stefan Agner wrote:
> This allows unified asm syntax to be used when compiling for the
> ARM instruction set. This matches the documentation and seems to be what
> the initial patch intended when the flag was added.
> ---
>  gcc/config/arm/arm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3419b6bd0f8..67b2b199f3f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3095,7 +3095,8 @@ arm_option_override_internal (struct gcc_options *opts,
>
>/* Thumb2 inline assembly code should always use unified syntax.
>   This will apply to ARM and Thumb1 eventually.  */
> -  opts->x_inline_asm_unified = TARGET_THUMB2_P (opts->x_target_flags);
> +  if (TARGET_THUMB2_P (opts->x_target_flags))
> +opts->x_inline_asm_unified = true;

This looks right to me and is the logic we had in GCC 5.
How has this patch been tested?
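
To spell out the behavioural difference in the hunk above, here is a
standalone sketch.  The struct and function names are invented for the
example and only model the two assignments; they are not the real
gcc_options layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of state involved.  */
struct model_opts
{
  bool thumb2;              /* models TARGET_THUMB2_P (...) */
  bool inline_asm_unified;  /* models -masm-syntax-unified  */
};

/* Old logic: unconditionally overwrite the user's setting.  */
void
override_old (struct model_opts *o)
{
  assert (o);
  o->inline_asm_unified = o->thumb2;
}

/* New logic: only force unified syntax for Thumb-2, otherwise keep
   whatever the user asked for.  */
void
override_new (struct model_opts *o)
{
  assert (o);
  if (o->thumb2)
    o->inline_asm_unified = true;
}
```

With thumb2 false and the flag set by the user, the old form clears the flag
while the new form leaves it alone, which is the case the PR is about.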

Can you please provide a ChangeLog entry for this patch[1].



I've bootstrapped and tested this, together with your testsuite patch, on
arm-none-linux-gnueabihf and committed both with r267804 with the following
ChangeLog entries:

2019-01-10  Stefan Agner  

PR target/88648
* config/arm/arm.c (arm_option_override_internal): Force
opts->x_inline_asm_unified to true only if TARGET_THUMB2_P.

2019-01-10  Stefan Agner  

PR target/88648
* gcc.target/arm/pr88648-asm-syntax-unified.c: Add test to
check if -masm-syntax-unified gets applied properly.

Thank you for the patch. If you plan to contribute more patches in the future I
suggest you sort out the copyright assignment paperwork.

I believe this fix needs to be backported to the branches.
I'll do so after a few days of testing on trunk.

Thanks again,
Kyrill


Thanks,
Kyrill

[1] https://gcc.gnu.org/contribute.html

>
>  #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
>SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
> --
> 2.20.1
>





Re: [RS6000] Implement -mno-pltseq

2019-01-10 Thread Segher Boessenkool
On Mon, Jan 07, 2019 at 12:54:02PM +1030, Alan Modra wrote:
> Since the last patch untangled inline PLT and TLS marker support there
> now isn't a way of requesting the older long call sequences on a
> compiler built with inline PLT support.  This patch adds support for
> a new -mno-pltseq option.

Is this a useful option to have?

It needs documentation (in the manual), then.


Segher


Re: [RS6000] Emit inline PLT when -mno-tls-markers

2019-01-10 Thread Segher Boessenkool
Hi Alan,

This patch is okay (for trunk, and backports if needed), thanks!

I'll review your other patches next week.


Segher


On Mon, Jan 07, 2019 at 12:28:44PM +1030, Alan Modra wrote:
> I restricted output of inline PLT sequences to when TLS marker relocs
> were also available, which is obviously true when just considering
> assembler support.  However, there is a -mno-tls-markers option to
> disable emitting the marker relocs.  Currently that option also
> disables inline PLT sequences, which is a bug (*).  This patch fixes
> that problem.
> 
> *) To be honest, it was a deliberate bug.  I didn't want to have to
> deal with inline PLT __tls_get_addr sequences lacking the marker
> relocs in the linker, but it turns out the existing linker support for
> old-style __tls_get_addr calls works reasonably well.
> 
> Bootstrapped and regression tested powerpc64le-linux and
> powerpc64-linux, with and without -mno-tls-markers.  OK to apply?
> 
>   * config/rs6000/rs6000.c (rs6000_indirect_call_template_1),
>   (rs6000_pltseq_template): Guard output of TLS markers with
>   TARGET_TLS_MARKERS.
>   (rs6000_longcall_ref, rs6000_call_aix, rs6000_call_sysv),
>   (rs6000_sibcall_sysv): Ignore TARGET_TLS_MARKERS when deciding
>   to use inline PLT sequences.
>   * config/rs6000/rs6000.md (pltseq_tocsave_),
>   (pltseq_plt16_ha_, pltseq_plt16_lo_),
>   (pltseq_mtctr_): Don't test TARGET_TLS_MARKERS in predicate.


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Jakub Jelinek
On Thu, Jan 10, 2019 at 10:53:32AM +, Ramana Radhakrishnan wrote:
> > 2018-11-23  Ramana Radhakrishnan  
> >
> >  * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
> >  * config/aarch64/aarch64.c (aarch64_override_options_internal):
> > Handle
> >  and put in error checks for stack protector guard options.
> >  (aarch64_stack_protect_guard): New.
> >  (TARGET_STACK_PROTECT_GUARD): Define.
> >  * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
> >  (reg_stack_protect_address): New.
> >  (stack_protect_set): Adjust for SSP_GLOBAL.
> >  (stack_protect_test): Likewise.
> >  * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
> >  (-mstack-protector-guard): Likewise.
> >  (-mstack-protector-guard-offset): Likewise.
> >  * doc/invoke.texi: Document new AArch64 options.
> 
> Any further thoughts or is it just Jakub's comments that I need to
> address on this patch ? It looks like the kernel folks have queued
> this for the next kernel release and given this is helping the kernel
> with a security feature, can we move this forward ?

From RM POV this is ok in stage4 if you commit it RSN.
Both x86 and powerpc have -mstack-protector-guard{,-reg,-offset}= options,
x86 even has -mstack-protector-guard-symbol=.  So it would be nice if the
aarch64 options are compatible with those other arches.

Please make sure you don't regress non-glibc SSP support (don't repeat
PR85644/PR86832).

Jakub


[PATCH] Include name of test in filesystem-test.XXXXXX filenames

2019-01-10 Thread Jonathan Wakely

Also fix some tests that were not cleaning up after themselves, as
identified by the change to nonexistent_path.

* testsuite/util/testsuite_fs.h (nonexistent_path): Include name
of the source file containing the caller.
* testsuite/27_io/filesystem/iterators/directory_iterator.cc: Remove
directories created by test.
* testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
* testsuite/experimental/filesystem/iterators/directory_iterator.cc:
Likewise.
* testsuite/experimental/filesystem/iterators/
recursive_directory_iterator.cc: Likewise.

Tested x86_64-linux, committed to trunk, and will backport to
gcc-8-branch.
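
For anyone who wants the idea without reading the C++, the name derivation
amounts to roughly the following C sketch.  test_name_from_path is a made-up
name, and unlike the patch it strips the directory before the ".cc"
extension, which gives the same result for ordinary test paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the basename of FILE, minus a trailing
   ".cc", into OUT (at most OUT_SIZE - 1 characters plus a NUL).  */
void
test_name_from_path (const char *file, char *out, size_t out_size)
{
  const char *base = file;
  size_t len;

  assert (out_size > 0);

  /* Skip past the last '/' or '\\', if any.  */
  for (const char *p = file; *p != '\0'; ++p)
    if (*p == '/' || *p == '\\')
      base = p + 1;

  /* Drop a ".cc" extension, but never reduce the name to nothing.  */
  len = strlen (base);
  if (len > 3 && strcmp (base + len - 3, ".cc") == 0)
    len -= 3;

  /* Truncate to fit the output buffer.  */
  if (len >= out_size)
    len = out_size - 1;
  memcpy (out, base, len);
  out[len] = '\0';
}
```

A leftover file created by directory_iterator.cc would then carry
"directory_iterator" in its name, which is what makes the culprit easy to spot.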


commit d5b08bba526157d4cec585afade777af5f235aa7
Author: Jonathan Wakely 
Date:   Thu Jan 10 10:20:49 2019 +

Include name of test in filesystem-test.XXXXXX filenames

Also fix some tests that were not cleaning up after themselves, as
identified by the change to nonexistent_path.

* testsuite/util/testsuite_fs.h (nonexistent_path): Include name
of the source file containing the caller.
* testsuite/27_io/filesystem/iterators/directory_iterator.cc: Remove
directories created by test.
* testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
* testsuite/experimental/filesystem/iterators/directory_iterator.cc:
Likewise.
* testsuite/experimental/filesystem/iterators/
recursive_directory_iterator.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/directory_iterator.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/directory_iterator.cc
index ddb424b4be0..5288bd297bd 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/directory_iterator.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/directory_iterator.cc
@@ -138,6 +138,9 @@ test05()
   static_assert( noexcept(begin(it)), "begin is noexcept" );
   VERIFY( end(it) == endit );
   static_assert( noexcept(end(it)), "end is noexcept" );
+
+  std::error_code ec;
+  remove_all(p, ec);
 }
 
 int
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
index bf67bfd215b..47b3266d3eb 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
@@ -179,6 +179,9 @@ test05()
   static_assert( noexcept(begin(it)), "begin is noexcept" );
   VERIFY( end(it) == endit );
   static_assert( noexcept(end(it)), "end is noexcept" );
+
+  std::error_code ec;
+  remove_all(p, ec);
 }
 
 int
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/directory_iterator.cc b/libstdc++-v3/testsuite/experimental/filesystem/iterators/directory_iterator.cc
index 758291afcd6..cc3cd879865 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/iterators/directory_iterator.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/directory_iterator.cc
@@ -129,6 +129,9 @@ test05()
   static_assert( noexcept(begin(it)), "begin is noexcept" );
   VERIFY( end(it) == endit );
   static_assert( noexcept(end(it)), "end is noexcept" );
+
+  std::error_code ec;
+  remove_all(p, ec);
 }
 
 int
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
index ad37ba33f8e..6217aca8b9a 100644
--- a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
@@ -176,6 +176,9 @@ test05()
   static_assert( noexcept(begin(it)), "begin is noexcept" );
   VERIFY( end(it) == endit );
   static_assert( noexcept(end(it)), "end is noexcept" );
+
+  std::error_code ec;
+  remove_all(p, ec);
 }
 
 int
diff --git a/libstdc++-v3/testsuite/util/testsuite_fs.h b/libstdc++-v3/testsuite/util/testsuite_fs.h
index 64215690083..48f503b3c27 100644
--- a/libstdc++-v3/testsuite/util/testsuite_fs.h
+++ b/libstdc++-v3/testsuite/util/testsuite_fs.h
@@ -86,8 +86,18 @@ namespace __gnu_test
   // This is NOT supposed to be a secure way to get a unique name!
   // We just need a path that doesn't exist for testing purposes.
   test_fs::path
-  nonexistent_path()
+  nonexistent_path(std::string file = __builtin_FILE())
   {
+// Include the caller's filename to help identify tests that fail to
+// clean up the files they create.
+// Remove .cc extension:
+if (file.length() > 3 && file.compare(file.length() - 3, 3, ".cc") == 0)
+  file.resize(file.length() - 3);
+// And directory:
+auto pos = file.find_last_of("/\\");
+if (pos != file.npos)
+ 

Re: [RFC][AArch64] Add support for system register based stack protector canary access

2019-01-10 Thread Ramana Radhakrishnan
On Mon, Dec 3, 2018 at 9:55 AM Ramana Radhakrishnan
 wrote:
>
> For quite some time the kernel guys (more specifically Ard) have been
> talking about using a system register (sp_el0) and an offset from that
> for a canary based access. This patchset adds support for a new set of
> command line options similar to how powerpc has done this.
>
> I don't intend to change the defaults in userland, we've discussed this
> for user-land in the past and as far as glibc and userland is concerned
> we stick to the options as currently existing. The system register
> option is really for the kernel to use along with an offset as they
> control their ABI and this is a decision for them to make.
>
> I did consider sticking this all under a mcmodel=kernel-small option but
> thought that would be a bit too aggressive. There is very little error
> checking I can do in terms of the system register being used and really
> the assembler would barf quite quickly in case things go wrong. I've
> managed to rebuild Ard's kernel tree with an additional patch that
> I will send to him. I haven't managed to boot this kernel.
>
> There was an additional question asked about the performance
> characteristics of this but it's a security feature and the kernel
> doesn't have the luxury of a hidden symbol. Further since the kernel
> uses sp_el0 for access everywhere and if they choose to use the same
> register I don't think the performance characteristics would be too bad,
> but that's a decision for the kernel folks to make when taking in the
> feature into the kernel.
>
> I still need to add some tests and documentation in invoke.texi but
> this is at the stage where it would be nice for some other folks
> to look at this.
>
> The difference in code generated is as below.
>
> extern void bar (char *);
> int foo (void)
> {
>char a[100];
>bar (a);
> }
>
> $GCC -O2  -fstack-protector-strong  vs
> -mstack-protector-guard-reg=sp_el0 -mstack-protector-guard=sysreg
> -mstack-protector-guard-offset=1024 -fstack-protector-strong
>
>
> --- tst.s   2018-12-03 09:46:21.174167443 +
> +++ tst.s.1 2018-12-03 09:46:03.546257203 +
> @@ -15,15 +15,14 @@
> mov x29, sp
> str x19, [sp, 16]
> .cfi_offset 19, -128
> -   adrp x19, __stack_chk_guard
> -   add x19, x19, :lo12:__stack_chk_guard
> -   ldr x0, [x19]
> -   str x0, [sp, 136]
> -   mov x0,0
> +   mrs x19, sp_el0
> add x0, sp, 32
> +   ldr x1, [x19, 1024]
> +   str x1, [sp, 136]
> +   mov x1,0
> bl  bar
> ldr x0, [sp, 136]
> -   ldr x1, [x19]
> +   ldr x1, [x19, 1024]
> eor x1, x0, x1
> cbnz x1, .L5
>
>
>
>
> I will be afk tomorrow and day after but this is to elicit some comments
> and for Ard to try this out with his kernel patches.
>
> Thoughts ?
>
> regards
> Ramana
>
> gcc/ChangeLog:
>
> 2018-11-23  Ramana Radhakrishnan  
>
>  * config/aarch64/aarch64-opts.h (enum stack_protector_guard): New
>  * config/aarch64/aarch64.c (aarch64_override_options_internal):
> Handle
>  and put in error checks for stack protector guard options.
>  (aarch64_stack_protect_guard): New.
>  (TARGET_STACK_PROTECT_GUARD): Define.
>  * config/aarch64/aarch64.md (UNSPEC_SSP_SYSREG): New.
>  (reg_stack_protect_address): New.
>  (stack_protect_set): Adjust for SSP_GLOBAL.
>  (stack_protect_test): Likewise.
>  * config/aarch64/aarch64.opt (-mstack-protector-guard-reg): New.
>  (-mstack-protector-guard): Likewise.
>  (-mstack-protector-guard-offset): Likewise.
>  * doc/invoke.texi: Document new AArch64 options.

Any further thoughts or is it just Jakub's comments that I need to
address on this patch ? It looks like the kernel folks have queued
this for the next kernel release and given this is helping the kernel
with a security feature, can we move this forward ?

Ramana
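
For readers skimming the assembly diff quoted above, here is a plain C model
of the two ways the canary is fetched.  Everything in it is hypothetical
(names, the byte array standing in for whatever the system register points
at); user-space C cannot read sp_el0, so this only mirrors the address
arithmetic, not the real instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the two guard sources.  */
enum guard_kind { GUARD_GLOBAL, GUARD_SYSREG };

/* Stands in for __stack_chk_guard in the default scheme.  */
static uintptr_t model_global_guard = 0x1234;

/* GUARD_GLOBAL: load from the global symbol (adrp/add/ldr).
   GUARD_SYSREG: load from base + offset, where BASE models the value
   read with mrs (ldr x1, [x19, 1024] in the diff).  */
uintptr_t
load_canary (enum guard_kind kind, const unsigned char *sysreg_base,
             size_t offset)
{
  uintptr_t canary;

  if (kind == GUARD_GLOBAL)
    return model_global_guard;

  assert (sysreg_base != NULL);
  memcpy (&canary, sysreg_base + offset, sizeof canary);
  return canary;
}

/* Epilogue check: eor of saved and current value must be zero.  */
int
canary_intact (uintptr_t saved, uintptr_t current)
{
  return (saved ^ current) == 0;
}
```

The point of the sysreg form is that the base is per-context state the kernel
controls, with the offset chosen by the kernel to match its own layout.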


Re: [PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-10 Thread Jonathan Wakely

On 10/01/19 10:02 +0100, Jakub Jelinek wrote:

Hi!

In Marc's testcase, we generate terrible code for std::string assignment,
because the __builtin_constant_p is kept in the IL for way too long and the
optimizers (jump threading?) create way too many copies of the
memcpy/memmove calls that it is then hard to bring it back to sanity.
On the testcase in the PR, GCC 7 emits on x86_64 with -O2 99 bytes long
function, GCC 9 unpatched 259 bytes long, with this patch it emits
139 bytes long, better but still not as good as before.  I guess we'll need
to improve GIMPLE optimizers too, but having twice as small IL for these
heavily used operators where e.g. _M_disjunct uses two of them and we wind
up with twice as many branches because of that is IMHO very useful.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

1) I'm not really sure about proper formatting in libstdc++, I thought you
  don't use space before ( in function calls, but then why is there a space
  in __builtin_constant_p?


I guess I probably copied it from something you gave me :-)

The space shouldn't be there.


2) not really sure about that #if __cplusplus >= 201402L either, I think we
  don't really want to use __builtin_is_constant_evaluated at least in
  C++98 code, but even in C++11, if the operator isn't constexpr, is there
  any point trying to help it do the right thing in constexpr contexts?


I think there's no point, so only doing it for C++14 and later looks
OK to me.

OK for trunk, thanks.
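
For reference, the pattern under discussion looks roughly like this when
written in C.  This is a sketch of the __builtin_constant_p form that the
patch moves away from for C++14 and later, not the libstdc++ source; ptr_less
is a made-up name, and C has no counterpart to
__builtin_is_constant_evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper giving a total order on pointers.  If the
   compiler can fold the comparison, use it directly; otherwise
   compare the addresses as integers.  */
static inline bool
ptr_less (const int *x, const int *y)
{
  if (__builtin_constant_p (x < y))
    return x < y;
  return (uintptr_t) x < (uintptr_t) y;
}
```

Each use carries both paths until the builtin is resolved, which is why a
helper using two such comparisons ends up with twice as many branches.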



Re: [PATCH] Fix dllimport attribute handling (PR c/88568)

2019-01-10 Thread Richard Biener
On Thu, 10 Jan 2019, Jakub Jelinek wrote:

> Hi!
> 
> handle_dll_attribute sets DECL_EXTERNAL on node for "dllimport" on
> VAR_DECLs, it wants to handle those as if those vars are actually declared
> extern.  The problem is that it doesn't clear TREE_STATIC on them, which
> is what is normally the case on VAR_DECLs that are DECL_EXTERNAL and so
> the C FE that checks for incomplete structs and uses TREE_STATIC incorrectly
> diagnoses that.
> Joseph said in the PR that DECL_EXTERNAL + TREE_STATIC combination is for
> gnu_inline extern inlines, where it indeed makes sense, there is an external
> definition and a static alternative for that, but for the VAR_DECLs nothing
> like that makes sense.
> 
> I have no idea under what maintainership this falls into, because it is
> in middle-end code, but handle_dll_attribute seems to be very Windows
> specific.
> 
> JonY said in the PR that testing went well.
> 
> Ok for trunk?

OK.

Richard.

> 2019-01-10  Jakub Jelinek  
> 
>   PR c/88568
>   * attribs.c (handle_dll_attribute): Clear TREE_STATIC after setting
>   DECL_EXTERNAL.
> 
>   * gcc.dg/pr88568.c: New test.
> 
> --- gcc/attribs.c.jj  2019-01-05 12:06:12.055124090 +0100
> +++ gcc/attribs.c 2019-01-07 12:57:09.739782281 +0100
> @@ -1691,6 +1691,8 @@ handle_dll_attribute (tree * pnode, tree
>a function global scope, unless declared static.  */
> if (current_function_decl != NULL_TREE && !TREE_STATIC (node))
>   TREE_PUBLIC (node) = 1;
> +   /* Clear TREE_STATIC because DECL_EXTERNAL is set.  */
> +   TREE_STATIC (node) = 0;
>   }
>  
>if (*no_add_attrs == false)
> --- gcc/testsuite/gcc.dg/pr88568.c.jj 2019-01-07 13:00:43.113279882 +0100
> +++ gcc/testsuite/gcc.dg/pr88568.c 2019-01-07 13:00:16.494718463 +0100
> @@ -0,0 +1,4 @@
> +/* PR c/88568 */
> +/* { dg-do compile } */
> +/* { dg-require-dll "" } */
> +__attribute__((dllimport)) struct S var; /* { dg-bogus "storage size of .var. isn.t known" } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] x86: Update VFIXUPIMM* Intrinsics to align with the latest Intel SDM

2019-01-10 Thread Jakub Jelinek
On Thu, Jan 10, 2019 at 10:46:14AM +0100, Dr. Matthias Kretz wrote:
> I strongly believe this API change needs to be reverted (unless I completely 
> misunderstand how vfixupimm works):
> 
> 1. This change breaks API with previous GCC releases. I.e. source code that 
> compiled with GCC 8 will not compile with GCC 9 anymore. If you really want a 
> "simplified" fixup intrinsic, add a new one.
> 
> 1'. It'll be really frustrating to support code that compiles on more than a 
> single compiler & version.
> 
> 2. Some existing documentation (e.g.
> https://software.intel.com/sites/landingpage/IntrinsicsGuide) still documents
> the old interface.
> 
> 3. Why was this change introduced to the SDM in the first place? E.g. a logb 
> implementation looked like this:
> 
> ```
> _mm_fixupimm_ps(_mm_getexp_ps(x), x, _mm_set1_epi32(0x00550433), 0x00);

I guess you could use
_mm_mask_fixupimm_ps(_mm_getexp_ps(x), -1, x, _mm_set1_epi32(0x00550433), 0x00);
because that one does allow you to specify the dest operand.

But I agree it is just weird, the non-masked intrinsics don't take into
account the 0b cases anymore.

Jakub


Re: [PATCH][GCC][DOC] Remove obsolete arm and aarch64 CPU names from invoke.texi

2019-01-10 Thread Sam Tebbs
On 12/27/18 12:54 AM, Gerald Pfeifer wrote:

> On Fri, 23 Nov 2018, Sam Tebbs wrote:
>> The mtune= documentation in doc/invoke.texi contains some obsolete CPU names
>> that have been removed from the Arm and AArch64 backends. This patch removes
>> them.
> I believe this should also be covered in the GCC 9 release notes
> at https://gcc.gnu.org/gcc-9/changes.html ?

Hi Gerald,

Sorry for the late reply. My email filters seem to have stumbled a bit 
so I didn't pick this up until now. Would you suggest adding something 
along the lines of "Removed obsolete Arm CPU names from the option 
documentation" (perhaps with a full list as in my original email)?

Sam



Re: [PATCH][AArch64] Use Q-reg loads/stores in movmem expansion

2019-01-10 Thread Kyrill Tkachov

Hi James,

On 09/01/19 17:50, James Greenhalgh wrote:

On Fri, Dec 21, 2018 at 06:30:49AM -0600, Kyrill Tkachov wrote:

Hi all,

Our movmem expansion currently emits TImode loads and stores when copying
128-bit chunks. This generates X-register LDP/STP sequences as these are the
most preferred registers for that mode.

For the purpose of copying memory, however, we want to prefer Q-registers.
This uses one fewer register, which helps with register pressure.
It also allows merging of 256-bit and larger copies into Q-reg LDP/STP, further
helping code size.

The implementation of that is easy: we just use a 128-bit vector mode (V4SImode
in this patch) rather than a TImode.

With this patch the testcase:
#define N 8
int src[N], dst[N];

void
foo (void)
{
__builtin_memcpy (dst, src, N * sizeof (int));
}

generates:
foo:
  adrp x1, src
  add x1, x1, :lo12:src
  adrp x0, dst
  add x0, x0, :lo12:dst
  ldp q1, q0, [x1]
  stp q1, q0, [x0]
  ret

instead of:
foo:
  adrp x1, src
  add x1, x1, :lo12:src
  adrp x0, dst
  add x0, x0, :lo12:dst
  ldp x2, x3, [x1]
  stp x2, x3, [x0]
  ldp x2, x3, [x1, 16]
  stp x2, x3, [x0, 16]
  ret

Bootstrapped and tested on aarch64-none-linux-gnu.
I hope this is a small enough change for GCC 9.
One could argue that it is finishing up the work done this cycle to support
Q-register LDP/STPs.

I've seen this give about 1.8% on 541.leela_r on Cortex-A57, with other changes
in SPEC2017 in the noise, but there is a reduction in code size everywhere (due
to more LDP/STP-Q pairs being formed).
Ok for trunk?

I'm surprised by the logic. If we want to use 256-bit copies, shouldn't we
be explicit about that in the movmem code, rather than using 128-bit copies
that get merged?


To emit the Q-reg pairs here we'd need to:
1) Adjust the copy_limit to 256 bits after checking 
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-bit moves. 
There are a couple of ways to do it:
  i) Emit OImode moves. For that we'd need to add patterns/alternatives that use 
LDP/STP-Q for OImode moves (currently we only use OImode as a container mode 
for LD1/ST1 operations).

  ii) Emit explicit load/store pairs of TImode values. For that we'd need to 
generate two MEMs and two registers, which would complicate 
aarch64_copy_one_block_and_progress_pointers a bit more. Furthermore we'd need 
to add {load,store}_pairtiti patterns in aarch64-simd.md that actually
  handle TImode pairs. These patterns wouldn't be very useful in other contexts 
as, for the compiler to form them, register allocation would have had to choose Q 
regs for the individual TImode operations (for the peepholing to match them), 
which is unlikely. So these patterns would largely exist only for this bit of 
code.

  iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs ldp/stps. 
This wouldn't need any adjustments to MD patterns, but would make 
aarch64_copy_one_block_and_progress_pointers more complex as it would now have 
two paths, where one handles two adjacent memory addresses in one call.
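
The two-path structure mentioned in iii) can be pictured with an ordinary C++ byte copy. This is only an analogy under assumed names (copy_blocks, CopyStats): the real aarch64_copy_one_block_and_progress_pointers emits RTL rather than calling memcpy, and its interface is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical bookkeeping for the sketch: how often each path was taken.
struct CopyStats {
  std::size_t pair_moves;   // 32-byte steps (stand-in for a Q-reg LDP/STP)
  std::size_t single_moves; // 16-byte steps (stand-in for one 128-bit move)
  std::size_t tail_bytes;   // leftover bytes copied in one final step
};

inline CopyStats copy_blocks(unsigned char *dst, const unsigned char *src,
                             std::size_t n)
{
  CopyStats stats = {0, 0, 0};

  // Path 1: two adjacent 128-bit blocks handled in one step.
  while (n >= 32) {
    std::memcpy(dst, src, 32);
    dst += 32; src += 32; n -= 32;
    ++stats.pair_moves;
  }

  // Path 2: a single 128-bit block (runs at most once after path 1).
  while (n >= 16) {
    std::memcpy(dst, src, 16);
    dst += 16; src += 16; n -= 16;
    ++stats.single_moves;
  }

  // Anything smaller than 16 bytes is copied as a tail.
  if (n > 0)
    std::memcpy(dst, src, n);
  stats.tail_bytes = n;
  return stats;
}
```

For a 56-byte copy this takes the pair path once, the single path once, and leaves an 8-byte tail, which illustrates the extra control flow that option iii) would add.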

  Why do TImode loads require two X registers? Shouldn't we
just fix TImode loads to use Q registers if that is preferable?

We do have both alternatives in our movti pattern. The Q-reg alternatives come 
later in the list, so I guess they're slightly less preferable.

I believe the most common use of TImode is for wide integer arithmetic (our 
add-with-overflow patterns)
which are usually done on X registers. So preferring X registers 
makes sense in that scenario.
We only care about using Q-regs for TImode when moving memory.



I'm not opposed to the principle of using LDP-Q in our movmem, but is this
the best way to make that happen?


It looked like the minimal patch to achieve this. If you'd like to see explicit 
pair creation straight away during expand,
I think approach iii) above is the least bad option.

Thanks,
Kyrill


Thanks,
James


2018-12-21  Kyrylo Tkachov  

  * config/aarch64/aarch64.c (aarch64_expand_movmem): Use V4SImode for
  128-bit moves.

2018-12-21  Kyrylo Tkachov  

  * gcc.target/aarch64/movmem-q-reg_1.c: New test.




[PATCH] Fix dllimport attribute handling (PR c/88568)

2019-01-10 Thread Jakub Jelinek
Hi!

handle_dll_attribute sets DECL_EXTERNAL on node for "dllimport" on
VAR_DECLs, it wants to handle those as if those vars are actually declared
extern.  The problem is that it doesn't clear TREE_STATIC on them, which
is what is normally the case on VAR_DECLs that are DECL_EXTERNAL and so
the C FE that checks for incomplete structs and uses TREE_STATIC incorrectly
diagnoses that.
Joseph said in the PR that DECL_EXTERNAL + TREE_STATIC combination is for
gnu_inline extern inlines, where it indeed makes sense, there is an external
definition and a static alternative for that, but for the VAR_DECLs nothing
like that makes sense.
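
The flag interaction described above can be modelled in a few lines of C++. This is a simplified sketch, not GCC code: VarDeclFlags, apply_dllimport and diagnoses_incomplete_type are hypothetical stand-ins for the tree flags and for the C front end's check, based only on the description in this mail.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a VAR_DECL's flags; GCC's real tree
// nodes and accessor macros are far richer than this.
struct VarDeclFlags {
  bool decl_external; // models DECL_EXTERNAL
  bool tree_static;   // models TREE_STATIC
  bool complete_type; // whether the variable's type is complete
};

// Models the patched behaviour: mark the variable as external and clear
// the static flag at the same time.
inline void apply_dllimport(VarDeclFlags &decl)
{
  decl.decl_external = true;
  decl.tree_static = false;
}

// Models the front-end check as described in the mail: the "storage size
// isn't known" style diagnostic is keyed on the static flag.
inline bool diagnoses_incomplete_type(const VarDeclFlags &decl)
{
  return decl.tree_static && !decl.complete_type;
}
```

In this model a variable of incomplete type that still has the static flag set is diagnosed, while the same variable after apply_dllimport is not, which is the effect the new test checks with dg-bogus.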

I have no idea whose maintainership this falls under, because it is
in middle-end code, but handle_dll_attribute seems to be very Windows
specific.

JonY said in the PR that testing went well.

Ok for trunk?

2019-01-10  Jakub Jelinek  

PR c/88568
* attribs.c (handle_dll_attribute): Clear TREE_STATIC after setting
DECL_EXTERNAL.

* gcc.dg/pr88568.c: New test.

--- gcc/attribs.c.jj 2019-01-05 12:06:12.055124090 +0100
+++ gcc/attribs.c   2019-01-07 12:57:09.739782281 +0100
@@ -1691,6 +1691,8 @@ handle_dll_attribute (tree * pnode, tree
 a function global scope, unless declared static.  */
  if (current_function_decl != NULL_TREE && !TREE_STATIC (node))
TREE_PUBLIC (node) = 1;
+ /* Clear TREE_STATIC because DECL_EXTERNAL is set.  */
+ TREE_STATIC (node) = 0;
}
 
   if (*no_add_attrs == false)
--- gcc/testsuite/gcc.dg/pr88568.c.jj   2019-01-07 13:00:43.113279882 +0100
+++ gcc/testsuite/gcc.dg/pr88568.c  2019-01-07 13:00:16.494718463 +0100
@@ -0,0 +1,4 @@
+/* PR c/88568 */
+/* { dg-do compile } */
+/* { dg-require-dll "" } */
+__attribute__((dllimport)) struct S var;   /* { dg-bogus "storage size of .var. isn.t known" } */

Jakub


[PATCH] Remove bare 'throw' expression that breaks -fno-exceptions build

2019-01-10 Thread Jonathan Wakely

This is debugging code that wasn't meant to be left in, and prevents
building the filesystem TS library with -fno-exceptions. It was already
removed from trunk months ago; this removes it from the branch too.

* src/filesystem/std-path.cc (path::remove_filename()): Remove debug
check that prevents building with -fno-exceptions.

Thanks to Uri Simchoni for reporting it.

Tested x86_64-linux, committed to gcc-8-branch.
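
The patch simply deletes the check. For comparison, the sketch below shows how a check of that kind could be written so that it still compiles with -fno-exceptions, using the standard __cpp_exceptions feature-test macro. The helper names are hypothetical and operate on a plain std::string; this is not libstdc++ code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: the condition the removed debug check tested,
// expressed as a predicate on a plain string.
inline bool empty_or_ends_with_slash(const std::string &pathname)
{
  return pathname.empty() || pathname.back() == '/';
}

// Hypothetical failure hook.  __cpp_exceptions is the feature-test macro
// that compilers define only when exception support is enabled.
[[noreturn]] inline void invariant_failed()
{
#ifdef __cpp_exceptions
  throw 1;       // only compiled when exceptions are available
#else
  std::abort();  // fallback for -fno-exceptions builds
#endif
}

// Runs the debug check without a bare 'throw' in the calling code.
inline void check_after_remove_filename(const std::string &pathname)
{
  if (!empty_or_ends_with_slash(pathname))
    invariant_failed();
}
```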

commit 9e22a0bb23787e74a67730a8f99deb75aa5e6c94
Author: Jonathan Wakely 
Date:   Thu Jan 10 09:43:38 2019 +

Remove bare 'throw' expression that breaks -fno-exceptions build

This is debugging code that wasn't meant to be left in, and prevents
building the filesystem TS library with -fno-exceptions. It was already
removed from trunk months ago; this removes it from the branch too.

* src/filesystem/std-path.cc (path::remove_filename()): Remove debug
check that prevents building with -fno-exceptions.

diff --git a/libstdc++-v3/src/filesystem/std-path.cc b/libstdc++-v3/src/filesystem/std-path.cc
index d5016e06138..aa24e1fb288 100644
--- a/libstdc++-v3/src/filesystem/std-path.cc
+++ b/libstdc++-v3/src/filesystem/std-path.cc
@@ -63,8 +63,6 @@ path::remove_filename()
 }
   else if (_M_type == _Type::_Filename)
 clear();
-  if (!empty() && _M_pathname.back() != '/')
-throw 1;
   return *this;
 }
 


Re: [PATCH] x86: Update VFIXUPIMM* Intrinsics to align with the latest Intel SDM

2019-01-10 Thread Dr. Matthias Kretz
I strongly believe this API change needs to be reverted (unless I completely 
misunderstand how vfixupimm works):

1. This change breaks API with previous GCC releases. I.e. source code that 
compiled with GCC 8 will not compile with GCC 9 anymore. If you really want a 
"simplified" fixup intrinsic, add a new one.

1'. It'll be really frustrating to support code that compiles on more than a 
single compiler & version.

2. Some existing documentation (e.g. 
https://software.intel.com/sites/landingpage/IntrinsicsGuide) still documents the old interface.

3. Why was this change introduced to the SDM in the first place? E.g. a logb 
implementation looked like this:

```
_mm_fixupimm_ps(_mm_getexp_ps(x), x, _mm_set1_epi32(0x00550433), 0x00);
```

i.e. if x is
- QNaN or SNaN return QNaN
- 0 return -inf
- +1 return _mm_getexp_ps(x)
- +/-inf return +inf
- else return _mm_getexp_ps(x)
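
The mapping in the list above can be read off the 32-bit table constant one nibble at a time. The sketch below assumes the class order and token meanings given in the SDM's VFIXUPIMMPS description (class 0 is the lowest nibble; token 0 means "keep the destination element"); the enum and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Input classes in table order, lowest nibble first.  The order is an
// assumption taken from the SDM's VFIXUPIMMPS description.
enum InputClass {
  ClassQNaN = 0, ClassSNaN, ClassZero, ClassPosOne,
  ClassNegInf, ClassPosInf, ClassNegative, ClassPositive
};

// Response tokens that occur in the example table; meanings assumed per SDM.
constexpr unsigned kKeepDest = 0;       // leave the destination element as is
constexpr unsigned kQNaNIndefinite = 3; // write QNaN
constexpr unsigned kNegInf = 4;         // write -inf
constexpr unsigned kPosInf = 5;         // write +inf

// Returns the 4-bit response token selected for `cls` from `table`.
constexpr unsigned token_for(std::uint32_t table, InputClass cls)
{
  return (table >> (4 * static_cast<unsigned>(cls))) & 0xFu;
}

static_assert(token_for(0x00550433u, ClassZero) == kNegInf, "0 -> -inf");
static_assert(token_for(0x00550433u, ClassPosOne) == kKeepDest,
              "+1 -> destination, i.e. _mm_getexp_ps(x)");
```

Three of the eight classes select token 0 in this table, so under these assumptions the result depends on what the destination operand held before the instruction ran.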

How is the new _mm_fixupimm_ps supposed to be useful if I can't pass 
_mm_getexp_ps(x) anymore? What does `_mm_fixupimm_ps(x, 
_mm_set1_epi32(0x00550433), 0x00)` even mean? It returns whatever garbage the 
dst register holds?

Cheers,
  Matthias

On Tuesday, 30 October 2018 10:12:23 CET Wei Xiao wrote:
> Hi,
> 
> The attached patch updates VFIXUPIMM* Intrinsics to align with the
> latest Intel® 64 and IA-32 Architectures Software Developer’s Manual
> (SDM).
> Tested with GCC regression test on x86, no regression.
> 
> Is it ok?
> 
> Thanks
> Wei
> 
> gcc/
> 2018-10-30 Wei Xiao 
> 
> *config/i386/avx512fintrin.h: Update VFIXUPIMM* intrinsics.
> (_mm512_fixupimm_round_pd): Update parameters and builtin.
> (_mm512_maskz_fixupimm_round_pd): Ditto.
> (_mm512_fixupimm_round_ps): Ditto.
> (_mm512_maskz_fixupimm_round_ps): Ditto.
> (_mm_fixupimm_round_sd): Ditto.
> (_mm_maskz_fixupimm_round_sd): Ditto.
> (_mm_fixupimm_round_ss): Ditto.
> (_mm_maskz_fixupimm_round_ss): Ditto.
> (_mm512_fixupimm_pd): Ditto.
> (_mm512_maskz_fixupimm_pd): Ditto.
> (_mm512_fixupimm_ps): Ditto.
> (_mm512_maskz_fixupimm_ps): Ditto.
> (_mm_fixupimm_sd): Ditto.
> (_mm_maskz_fixupimm_sd): Ditto.
> (_mm_fixupimm_ss): Ditto.
> (_mm_maskz_fixupimm_ss): Ditto.
> (_mm512_mask_fixupimm_round_pd): Update builtin.
> (_mm512_mask_fixupimm_round_ps): Ditto.
> (_mm_mask_fixupimm_round_sd): Ditto.
> (_mm_mask_fixupimm_round_ss): Ditto.
> (_mm512_mask_fixupimm_pd): Ditto.
> (_mm512_mask_fixupimm_ps): Ditto.
> (_mm_mask_fixupimm_sd): Ditto.
> (_mm_mask_fixupimm_ss): Ditto.
> *config/i386/avx512vlintrin.h:
> (_mm256_fixupimm_pd): Update parameters and builtin.
> (_mm256_maskz_fixupimm_pd): Ditto.
> (_mm256_fixupimm_ps): Ditto.
> (_mm256_maskz_fixupimm_ps): Ditto.
> (_mm_fixupimm_pd): Ditto.
> (_mm_maskz_fixupimm_pd): Ditto.
> (_mm_fixupimm_ps): Ditto.
> (_mm_maskz_fixupimm_ps): Ditto.
> (_mm256_mask_fixupimm_pd): Update builtin.
> (_mm256_mask_fixupimm_ps): Ditto.
> (_mm_mask_fixupimm_pd): Ditto.
> (_mm_mask_fixupimm_ps): Ditto.
> *config/i386/i386-builtin-types.def: Add new builtin types.
> *config/i386/i386-builtin.def: Update builtin definitions.
> *config/i386/i386.c: Handle new builtin types.
> *config/i386/sse.md: Update VFIXUPIMM* patterns.
> (_fixupimm_maskz): Update.
> (_fixupimm): Update.
> (_fixupimm_mask): Update.
> (avx512f_sfixupimm_maskz): Update.
> (avx512f_sfixupimm): Update.
> (avx512f_sfixupimm_mask): Update.
> *config/i386/subst.md:
> (round_saeonly_sd_mask_operand4): Add new subst_attr.
> (round_saeonly_sd_mask_op4): Ditto.
> (round_saeonly_expand_operand5): Ditto.
> (round_saeonly_expand): Update.
> 
> gcc/testsuite
> 2018-10-30 Wei Xiao 
> 
> *gcc.target/i386/avx-1.c: Update tests for VFIXUPIMM* intrinsics.
> *gcc.target/i386/avx512f-vfixupimmpd-1.c: Ditto.
> *gcc.target/i386/avx512f-vfixupimmpd-2.c: Ditto.
> *gcc.target/i386/avx512f-vfixupimmps-1.c: Ditto.
> *gcc.target/i386/avx512f-vfixupimmsd-1.c: Ditto.
> *gcc.target/i386/avx512f-vfixupimmsd-2.c: Ditto.
> *gcc.target/i386/avx512f-vfixupimmss-1.c: Ditto.
> *gcc.target/i386/avx512f-vfixupimmss-2.c: Ditto.
> *gcc.target/i386/avx512vl-vfixupimmpd-1.c: Ditto.
> *gcc.target/i386/avx512vl-vfixupimmps-1.c: Ditto.
> *gcc.target/i386/sse-13.c: Ditto.
> *gcc.target/i386/sse-14.c: Ditto.
> *gcc.target/i386/sse-22.c: Ditto.
> *gcc.target/i386/sse-23.c: Ditto.
> *gcc.target/i386/testimm-10.c: Ditto.
> *gcc.target/i386/testround-1.c: Ditto.


-- 
───
 Dr. Matthias Kretz                    phone:  +49 6159 713084
 Senior Software 

Re: [Patch 4/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2019-01-10 Thread Richard Sandiford
Steve Ellcey  writes:
> On Wed, 2019-01-09 at 10:00 +, Richard Sandiford wrote:
>
> Thanks for the quick turnaround on the comments Richard.  Here is a new
> version where I tried to address all the issues you raised.  One thing
> I noticed is that I think your calls_have_same_clobbers_p function only
> works if, when return_call_with_max_clobbers is called with two calls
> that clobber the same set of registers, it always returns the first
> call.
>
> I don't think my original function had that guarantee but I changed it 
> so that it would and documented that requirement in target.def.  I
> couldn't see a better way to implement the calls_have_same_clobbers_p
> function other than doing that.

Yeah, I think that's a good guarantee to have.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1c300af..d88be6c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1655,14 +1655,56 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
> the lower 64 bits of a 128-bit register.  Tell the compiler the callee
> clobbers the top 64 bits when restoring the bottom 64 bits.  */
>  
>  static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
> + machine_mode mode)
>  {
> -  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
> +  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
> +  return FP_REGNUM_P (regno)
> +  && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
> +}
> +
> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> +  gcc_assert (CALL_P (call_1) && CALL_P (call_2));
> +
> +  if (aarch64_simd_call_p (call_1) == aarch64_simd_call_p (call_2))
> +return call_1;
> +
> +  if (aarch64_simd_call_p (call_2))
> +return call_1;
> +  else
> +return call_2;

Think this is simpler as:

  gcc_assert (CALL_P (call_1) && CALL_P (call_2));

  if (!aarch64_simd_call_p (call_1) || aarch64_simd_call_p (call_2))
return call_1;
  else
return call_2;
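
A quick way to see that the two forms agree is to reduce each call to a single boolean ("is it a SIMD call?") and compare all four combinations. The function names below are hypothetical and the return value 1 or 2 stands for call_1 or call_2; this is a model of the logic only, not the aarch64.c code.

```cpp
#include <cassert>

// Logic of the version in the patch: equal kinds pick call_1; otherwise
// call_1 is picked only when call_2 is the SIMD call.
constexpr int pick_original(bool simd_1, bool simd_2)
{
  return (simd_1 == simd_2) ? 1 : (simd_2 ? 1 : 2);
}

// Logic of the suggested simplification.
constexpr int pick_simplified(bool simd_1, bool simd_2)
{
  return (!simd_1 || simd_2) ? 1 : 2;
}

static_assert(pick_original(false, false) == pick_simplified(false, false), "");
static_assert(pick_original(false, true) == pick_simplified(false, true), "");
static_assert(pick_original(true, false) == pick_simplified(true, false), "");
static_assert(pick_original(true, true) == pick_simplified(true, true), "");
```

Both versions return call_2 only when call_1 is a SIMD call and call_2 is not, and both return call_1 when the two calls are of the same kind.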

> diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
> index a00ec38..61149e1 100644
> --- a/gcc/lra-lives.c
> +++ b/gcc/lra-lives.c
> @@ -579,22 +579,32 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
> PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
>  static inline void
>  check_pseudos_live_through_calls (int regno,
> -   HARD_REG_SET last_call_used_reg_set)
> +   HARD_REG_SET last_call_used_reg_set,
> +   rtx_insn *call_insn)

Should document the new parameter.

> @@ -906,17 +933,22 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  
> bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
>   && ! hard_reg_set_equal_p (last_call_used_reg_set,
> -this_call_used_reg_set));
> +this_call_used_reg_set)
> + && ! calls_have_same_clobbers_p (call_insn,
> +  last_call_insn));

This should be || with the current test, not &&.  We need to check
that last_call_insn is nonnull first.

> EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
>   {
> IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
>   this_call_used_reg_set);
> +
> if (flush)
> - check_pseudos_live_through_calls
> -   (j, last_call_used_reg_set);
> + check_pseudos_live_through_calls (j,
> +   last_call_used_reg_set,
> +   curr_insn);
>   }

Should be last_call_insn rather than curr_insn.  I.e. when we flush,
we apply the properties of the previous call to pseudos live after
the new call.

Looks good otherwise.

Thanks,
Richard


[PATCH] Use __builtin_is_constant_evaluated in std::less etc. (PR tree-optimization/88775)

2019-01-10 Thread Jakub Jelinek
Hi!

In Marc's testcase, we generate terrible code for std::string assignment,
because the __builtin_constant_p is kept in the IL for way too long and the
optimizers (jump threading?) create so many copies of the
memcpy/memmove calls that it is then hard to bring it back to sanity.
On the testcase in the PR, GCC 7 emits on x86_64 with -O2 a 99-byte
function, unpatched GCC 9 a 259-byte one, and with this patch GCC 9 emits a
139-byte one, better but still not as good as before.  I guess we'll need
to improve GIMPLE optimizers too, but having half as much IL for these
heavily used operators, where e.g. _M_disjunct uses two of them and we wind
up with twice as many branches because of that, is IMHO very useful.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

1) I'm not really sure about proper formatting in libstdc++, I thought you
   don't use space before ( in function calls, but then why is there a space
   in __builtin_constant_p?
2) not really sure about that #if __cplusplus >= 201402L either, I think we
   don't really want to use __builtin_is_constant_evaluated at least in
   C++98 code, but even in C++11, if the operator isn't constexpr, is there
   any point trying to help it do the right thing in constexpr contexts?
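
The idea behind the patch can be sketched outside libstdc++ as follows. ptr_less is a made-up free function, not the library's std::less specialization, and the sketch assumes C++14 or later and a compiler that provides __builtin_is_constant_evaluated (the patch itself tests for it via _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a pointer comparison functor's operator().
template <typename T>
constexpr bool ptr_less(T *x, T *y) noexcept
{
  // During constant evaluation, compare the pointers directly so that the
  // call stays usable in constant expressions.
  if (__builtin_is_constant_evaluated())
    return x < y;
  // At run time, compare the integer values so that unrelated pointers
  // still get a consistent order.
  return reinterpret_cast<std::uintptr_t>(x)
         < reinterpret_cast<std::uintptr_t>(y);
}

constexpr int sample[4] = {0, 1, 2, 3};

// Evaluated at compile time, so the first branch is taken.
static_assert(ptr_less(&sample[0], &sample[2]), "constant-evaluated path");
```

Unlike __builtin_constant_p (__x < __y), the new builtin takes no operand, so its result does not depend on whether the optimizers can prove the comparison constant.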

2019-01-09  Jakub Jelinek  

PR tree-optimization/88775
* include/bits/stl_function.h (greater<_Tp*>::operator(),
less<_Tp*>::operator(), greater_equal<_Tp*>::operator(),
less_equal<_Tp*>::operator()): Use __builtin_is_constant_evaluated
instead of __builtin_constant_p if available.  Don't bother with
the pointer comparison in C++11 and earlier.

--- libstdc++-v3/include/bits/stl_function.h.jj 2019-01-01 12:45:51.182541077 +0100
+++ libstdc++-v3/include/bits/stl_function.h 2019-01-09 23:15:34.824800676 +0100
@@ -413,8 +413,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX14_CONSTEXPR bool
   operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
   {
+#if __cplusplus >= 201402L
+#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+   if (__builtin_is_constant_evaluated())
+#else
if (__builtin_constant_p (__x > __y))
+#endif
  return __x > __y;
+#endif
return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
   }
 };
@@ -426,8 +432,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX14_CONSTEXPR bool
   operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
   {
+#if __cplusplus >= 201402L
+#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+   if (__builtin_is_constant_evaluated())
+#else
if (__builtin_constant_p (__x < __y))
+#endif
  return __x < __y;
+#endif
return (__UINTPTR_TYPE__)__x < (__UINTPTR_TYPE__)__y;
   }
 };
@@ -439,8 +451,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX14_CONSTEXPR bool
   operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
   {
+#if __cplusplus >= 201402L
+#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+   if (__builtin_is_constant_evaluated())
+#else
if (__builtin_constant_p (__x >= __y))
+#endif
  return __x >= __y;
+#endif
return (__UINTPTR_TYPE__)__x >= (__UINTPTR_TYPE__)__y;
   }
 };
@@ -452,8 +470,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _GLIBCXX14_CONSTEXPR bool
   operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
   {
+#if __cplusplus >= 201402L
+#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+   if (__builtin_is_constant_evaluated())
+#else
if (__builtin_constant_p (__x <= __y))
+#endif
  return __x <= __y;
+#endif
return (__UINTPTR_TYPE__)__x <= (__UINTPTR_TYPE__)__y;
   }
 };

Jakub