Re: [PATCH, i386]: Fix PR79804, ICE in print_reg

2017-04-20 Thread Markus Trippelsdorf
On 2017.04.20 at 22:29 +0200, Uros Bizjak wrote:
> 
> PR target/79804
> * gcc.target/i386/pr79804.c: New test.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Committed to mainline SVN.
> --- testsuite/gcc.target/i386/pr79804.c   (nonexistent)
> +++ testsuite/gcc.target/i386/pr79804.c   (working copy)
> @@ -0,0 +1,10 @@
> +/* PR target/79804 */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +void foo (void)
> +{
> +  register int r20 asm ("20");
> +
> +  asm volatile ("# %0" : "=r"(r20));  /* { dg-error "invalid use of 
> register" } */
> +}

The test fails without optimizations:

gcc/testsuite/gcc.target/i386/pr79804.c: In function ‘foo’:
gcc/testsuite/gcc.target/i386/pr79804.c:10:1: error: frame cannot be used in 
asm here
 }
 ^
gcc/testsuite/gcc.target/i386/pr79804.c:9:3: error: invalid 'asm': invalid use 
of register 'frame'
   asm volatile ("# %0" : "=r"(r20));  /* { dg-error "invalid use of register" 
} */
   ^~~


-- 
Markus


[PATCH] warn for reading past the end by library functions (PR 54924, 79234)

2017-04-20 Thread Martin Sebor

PR libstdc++/54924 - Warn for std::string constructor with wrong
size asks for a warning when constructing a std::string from
a character array and a number of elements that's in excess of
the number of elements.  E.g.,

  std::string s ("abc", 7);

PR middle-end/79234 - warn on past the end reads by library functions
is a more general enhancement that suggests warning for calls to any
standard library functions that read past the end of a provided array.
For example:

  char a[8];
  memcpy (a, "abcdef", sizeof a);

The attached patch extends the -Wstringop-overflow warning to also
detect and warn for reading past the end in memcmp, memchr, memcpy,
and mempcpy.  The patch doesn't handle memmove because there's
a separate bug for -Wstringop-overflow not handling the function.
A patch for it was submitted in January and deferred to GCC 8:

  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01994.html

Although the patch handles the std::string case fine the warning
for it is suppressed by -Wsystem-headers.  There's also a separate
bug for that (bug 79214) and a patch for it was submitted back in
January and deferred to GCC 8:

  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01994.html

Martin

PR libstdc++/54924 - Warn for std::string constructor with wrong size
PR middle-end/79234 - warn on past the end reads by library functions

gcc/ChangeLog:

	PR middle-end/79234
	* builtins.c (check_sizes): Adjust to handle reading past the end.
	Avoid printing excessive upper bound of ranges.
	(expand_builtin_memchr): New function.
	(compute_dest_size): Rename...
	(compute_objsize): ...to this.
	(expand_builtin_memcpy): Adjust.
	(expand_builtin_mempcpy): Adjust.
	(expand_builtin_strcat): Adjust.
	(expand_builtin_strcpy): Adjust.
	(check_strncat_sizes): Adjust.
	(expand_builtin_strncat): Adjust.
	(expand_builtin_strncpy): Adjust and simplify.
	(expand_builtin_memset): Adjust.
	(expand_builtin_bzero): Adjust.
	(expand_builtin_memcmp): Adjust.
	(expand_builtin): Handle memcmp.
	(maybe_emit_chk_warning): Check strncat just once.

gcc/testsuite/ChangeLog:

	PR middle-end/79234
	* gcc.dg/builtin-stringop-chk-8.c: New test.
	* gcc.dg/builtin-stringop-chk-1.c: Adjust.
	* gcc.dg/builtin-stringop-chk-4.c: Same.
	* gcc.dg/builtin-strncat-chk-1.c: Same.
	* g++.dg/ext/strncpy-chk1.C: Same.
	* g++.dg/torture/Wsizeof-pointer-memaccess1.C: Same.
	* gcc.dg/out-of-bounds-1.c: Same.
	* gcc.dg/pr78138.c: Same.
	* gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Same.
	* gfortran.dg/mvbits_7.f90: Same.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index f3bee5b..892f576 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -118,6 +118,7 @@ static rtx expand_builtin_va_copy (tree);
 static rtx expand_builtin_strcmp (tree, rtx);
 static rtx expand_builtin_strncmp (tree, rtx, machine_mode);
 static rtx builtin_memcpy_read_str (void *, HOST_WIDE_INT, machine_mode);
+static rtx expand_builtin_memchr (tree, rtx);
 static rtx expand_builtin_memcpy (tree, rtx);
 static rtx expand_builtin_memcpy_with_bounds (tree, rtx);
 static rtx expand_builtin_memcpy_args (tree, tree, tree, rtx, tree);
@@ -3044,10 +3045,10 @@ expand_builtin_memcpy_args (tree dest, tree src, tree len, rtx target, tree exp)
MAXLEN is the user-supplied bound on the length of the source sequence
(such as in strncat(d, s, N).  It specifies the upper limit on the number
of bytes to write.
-   STR is the source string (such as in strcpy(d, s)) when the epxression
+   SRC is the source string (such as in strcpy(d, s)) when the epxression
EXP is a string function call (as opposed to a memory call like memcpy).
-   As an exception, STR can also be an integer denoting the precomputed
-   length of the source string.
+   As an exception, SRC can also be an integer denoting the precomputed
+   size of the source string or object (for functions like memcpy).
OBJSIZE is the size of the destination object specified by the last
argument to the _chk builtins, typically resulting from the expansion
of __builtin_object_size (such as in __builtin___strcpy_chk(d, s,
@@ -3060,7 +3061,7 @@ expand_builtin_memcpy_args (tree dest, tree src, tree len, rtx target, tree exp)
the function returns true, otherwise false..  */
 
 static bool
-check_sizes (int opt, tree exp, tree size, tree maxlen, tree str, tree objsize)
+check_sizes (int opt, tree exp, tree size, tree maxlen, tree src, tree objsize)
 {
   /* The size of the largest object is half the address space, or
  SSIZE_MAX.  (This is way too permissive.)  */
@@ -3068,25 +3069,40 @@ check_sizes (int opt, tree exp, tree size, tree maxlen, tree str, tree objsize)
 
   tree slen = NULL_TREE;
 
+  tree range[2] = { NULL_TREE, NULL_TREE };
+
   /* Set to true when the exact number of bytes written by a string
  function like strcpy is not known and the only thing that is
  known is that it must be at least one (for the terminating nul).  */
   bool at_least_one = false;
-  if (str)
+  if (src)
 {
-  /* STR is 

Re: [patch, wwwdocs]: Update GCC 7 PowerPC changes

2017-04-20 Thread Michael Meissner
I reworded some of the PowerPC changes based on IRC comments from Segher.  Are
these changes ok to go in?

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.77
diff -p -c -r1.77 changes.html
*** htdocs/gcc-7/changes.html   17 Apr 2017 22:12:35 -  1.77
--- htdocs/gcc-7/changes.html   20 Apr 2017 22:23:24 -
*** const int* get_address (unsigned idx)
*** 1005,1010 
--- 1005,1021 
GCC now diagnoses inline assembly that clobbers register r2.
  This has always been invalid code, and is no longer quietly
  tolerated.
+   The PowerPC port's support for ISA 3.0 (-mcpu=power9)
+ has been enhanced to generate more of the new instructions by default, and
+ to provide more built-in functions to generate code for other new
+ instructions.
+   The configuration option --enable=gnu-indirect-function
+ is now enabled by default on PowerPC GNU/Linux builds.
+   The PowerPC port will now allow 64-bit and 32-bit integer types to be
+ allocated to the VSX vector registers (ISA 2.06 and above).  In addition,
+ on ISA 3.0, 16-bit and 8-bit integer types can be allocated in the vector
+ registers.  Previously, only 64-bit integer types were allowed in the
+ traditional floating point registers.
  
  
  


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[wwwdocs] powerpc: Another update for gcc-7/changes.html

2017-04-20 Thread Segher Boessenkool
Hi!

Here's another update.  Does this look okay?  (I'll fix the conflict
with Mike's patch, of course).

Thanks,


Segher


Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.77
diff -U 3 -r1.77 changes.html
--- htdocs/gcc-7/changes.html   17 Apr 2017 22:12:35 -  1.77
+++ htdocs/gcc-7/changes.html   20 Apr 2017 22:04:58 -
@@ -1005,6 +1005,11 @@
   GCC now diagnoses inline assembly that clobbers register r2.
 This has always been invalid code, and is no longer quietly
 tolerated.
+  There are new options -mstack-protector-guard=global,
+-mstack-protector-guard=tls,
+-mstack-protector-guard-reg=, and
+-mstack-protector-guard-offset=, to change how the stack
+protector gets the value to use as canary.
 
 
 


[patch, wwwdocs]: Update GCC 7 PowerPC changes

2017-04-20 Thread Michael Meissner
I updated the GCC 7 changes document to include some of the PowerPC
changes we did for GCC 7.  Can I check this in?

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.77
diff -p -c -r1.77 changes.html
*** htdocs/gcc-7/changes.html   17 Apr 2017 22:12:35 -  1.77
--- htdocs/gcc-7/changes.html   20 Apr 2017 22:05:02 -
*** const int* get_address (unsigned idx)
*** 1005,1010 
--- 1005,1020 
GCC now diagnoses inline assembly that clobbers register r2.
  This has always been invalid code, and is no longer quietly
  tolerated.
+   The PowerPC port's support for ISA 3.0 (-mcpu=power9)
+ has been enchanced to include native code generation and new built-in
+ functions.
+   The configuration option --enable=gnu-indirect-function
+ is now enabled by default on PowerPC Linux builds.
+   The PowerPC port will now allow 64-bit and 32-bit integer types to be
+ allocated to the VSX vector registers (ISA 2.06 and above).  In addition,
+ on ISA 3.0, 16-bit and 8-bit integer types can go in the vector registers.
+ Previously, only 64-bit integer types were allowed in the traditional
+ floating point registers.
  
  
  

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [gomp4] add support for allocatable scalars in OpenACC declare constructs

2017-04-20 Thread Cesar Philippidis
On 04/20/2017 01:08 AM, Thomas Schwinge wrote:

> On Wed, 19 Apr 2017 11:11:39 -0700, Cesar Philippidis 
>  wrote:

>> Included in this patch is a bug fix for non-declared allocatable
>> scalars. [...]
> 
> Please, bug fixes as work items/patches/commits separate from new
> features.  (As long as that makes sense.)

I tried, but these were too closely related.

>> +  if (code->op == EXEC_OACC_UPDATE)
>> +for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> +  {
>> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>> +&& OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER)
>> +  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ALWAYS_POINTER);
>> +  }
> 
>> --- a/gcc/testsuite/gfortran.dg/goacc/declare-allocatable-1.f90
>> +++ b/gcc/testsuite/gfortran.dg/goacc/declare-allocatable-1.f90
>> @@ -6,20 +6,20 @@
>>  
>>  program allocate
>>implicit none
>> -  integer, allocatable :: a(:)
>> +  integer, allocatable :: a(:), b
>>integer, parameter :: n = 100
>>integer i
>> -  !$acc declare create(a)
>> +  !$acc declare create(a,b)
>>  
>> -  allocate (a(n))
>> +  allocate (a(n), b)
>>  
>> -  !$acc parallel loop copyout(a)
>> +  !$acc parallel loop copyout(a, b)
>>do i = 1, n
>> - a(i) = i
>> + a(i) = b
>>end do
> 
> That "a(i) = b" assignment is reading uninitialized data ("copyout(b)").
> Did you mean to specify "copyin(b)" or (implicit?) "firstprivate(b)", and
> set "b" to a defined value before the region?

This is only a compiler test and all of the interesting stuff happens
during gimplication. In fact, this test exposed an ICE while testing
allocatable scalars early on.

>> -  deallocate (a)
>> +  deallocate (a, b)
>>  end program allocate
>>  
>> -! { dg-final { scan-tree-dump-times "pragma acc enter data 
>> map.declare_allocate" 1 "original" } }
>> -! { dg-final { scan-tree-dump-times "pragma acc exit data 
>> map.declare_deallocate" 1 "original" } }
>> +! { dg-final { scan-tree-dump-times "pragma acc enter data 
>> map.declare_allocate" 2 "original" } }
>> +! { dg-final { scan-tree-dump-times "pragma acc exit data 
>> map.declare_deallocate" 2 "original" } }
> 
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90
>> @@ -0,0 +1,30 @@
> 
> Missing "{ dg-do run }" to exercise torture testing -- or is that
> intentionally not done here?

Corrected in the attached gomp4-allocate-test.diff.

>> +program main
>> +  implicit none
>> +  integer, parameter :: n = 100
>> +  integer, allocatable :: a, c
>> +  integer :: i, b(n)
>> +
>> +  allocate (a)
>> +
>> +  a = 50
>> +
>> +  !$acc parallel loop
>> +  do i = 1, n;
>> + b(i) = a
>> +  end do
>> +
>> +  do i = 1, n
>> + if (b(i) /= a) call abort
>> +  end do
>> +
>> +  allocate (c)
>> +
>> +  print *, loc (c)
>> +  !$acc parallel copyout(c) num_gangs(1)
>> +  c = a
>> +  !$acc end parallel
>> +
>> +  if (c /= a) call abort
>> +
>> +  deallocate (a, c)
>> +end program main
> 
> 
> Additionally, I'm seeing one regression:
> 
> [-PASS:-]{+FAIL: gfortran.dg/goacc/pr77371-1.f90   -O  (internal compiler 
> error)+}
> {+FAIL:+} gfortran.dg/goacc/pr77371-1.f90   -O  (test for excess errors)
> 
> [...]/gcc/testsuite/gfortran.dg/goacc/pr77371-1.f90:5:0: internal 
> compiler error: in force_constant_size, at gimplify.c:664
> 0x87c57b force_constant_size
> [...]/gcc/gimplify.c:664
> 0x87e687 gimple_add_tmp_var(tree_node*)
> [...]/gcc/gimplify.c:702
> 0x867f3d create_tmp_var(tree_node*, char const*)
> [...]/gcc/gimple-expr.c:476
> 0x9a1b95 lower_omp_target
> [...]/gcc/omp-low.c:16875
> [...]

That's because lower_omp_target is now allocating local storage for
pointers, and that ICE is triggered by creating a temporary variable for
a VLA. I don't think VLAs are supported on accelerators because of the
lack of alloca, so I just fell back to the original behavior of not
allocating local storage for that variable. See gomp4-pr77371-1-ice.diff
for more details. Maybe the gimplifier should emit a warning when it
encounters such a variable? Another thing we can do is teach GCC how to
upload firstprivate variables into const memory, so that user cannot
manipulate them. But that doesn't help improve the correctness of the
program.

That lower_omp_target patch is still under test. I'll apply
gomp4-allocate-test.diff to gomp-4_0-branch shortly.

Cesar

2017-04-20  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_target): Don't privatize firstprivate VLAs.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5c41edc..cc209ba 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -16867,7 +16867,11 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		  x = convert_from_firstprivate_int (x, TREE_TYPE (new_var),
 		 is_reference (var),
 		 );
-		else if (is_reference (new_var))
+		/* Accelerators may not have alloca, so it's not
+		   

Re: [PATCH] Add test-case (PR tree-optimization/66278).

2017-04-20 Thread Mike Stump
On Apr 20, 2017, at 6:04 AM, Bin.Cheng  wrote:
> 
> On Thu, Apr 20, 2017 at 9:35 AM, Martin Liška  wrote:
>> Hello.
>> 
>> The patch adds a new test-case for the mentioned PR. Tested on 
>> x86_64-linux-gnu
>> and ppc64le-linux-gnu.
>> 
>> Ready for trunk or should I postpone it for next stage1?
> Though can't approve, I think it's ok since we are in stage 1 now,
> also adding test is unlikely to affect RCs.

Right, since we are in stage1 now, there is little reason to postpone any work 
the next stage 1 after the current stage 1.  :-)



[PATCH] Implement a warning for bogus sizeof(pointer) / sizeof(pointer[0])

2017-04-20 Thread Bernd Edlinger
Hi!


This implements a new -Wall enabled warning for a rather common, but
completely wrong way to compute an array size by dividing the
sizeof(pointer) / sizeof(pointer[0]) or sizeof(*pointer).

It is often hard to find this kind of error by simple code inspection
in real code, because using sizeof in this way is a quite common idiom
to get the array size of an array variable.  And furthermore this
expression may be used in macros, which makes it even more important to
have this warning.

There is a similar warning -Wsizeof-pointer-memaccess which helped in
implementing the infrastructure for the new warning in the C FE.

However I noticed that the -Wsizeof-pointer-memaccess warning was 
missing in C, when the sizeof is used inside parentheses, which is
different from C++, so I fixed that too.

Of course, I added some test cases for that as well.

To illustrate the usefulness of this warning, it revealed quite a few
places where bogus sizeof divisions were used in our testsuite.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
gcc:
2017-04-20  Bernd Edlinger  

* doc/invoke.texi: Document the -Wsizeof-pointer-div warning.

gcc/c-family:
2017-04-20  Bernd Edlinger  

* c.opt (Wsizeof-pointer-div): New warning option.

gcc/c:
2017-04-20  Bernd Edlinger  

* c-parser.c (c_parser_binary_expression): Implement the
-Wsizeof_pointer_div warning.
(c_parser_postfix_expression): Allow SIZEOF_EXPR as expr.original_code
from a parenthesized expression.
(c_parser_expr_list): Use c_last_sizeof_loc.
* c-tree.h (c_last_sizeof_loc): New external.
* c-typeck.c (c_last_sizeof_loc): New variable.
(c_expr_sizeof_expr, c_expr_sizeof_type): Assign c_last_sizeof_loc.


gcc/cp:
2017-04-20  Bernd Edlinger  

* cp-gimplify.c (cp_fold): Implement the -Wsizeof_pointer_div warning.

gcc/testsuite:
2017-04-20  Bernd Edlinger  

* c-c++-common/Wsizeof-pointer-div.c: New test.
* gcc.dg/Wsizeof-pointer-memaccess1.c: Add test cases with parens.
* gcc.dg/torture/Wsizeof-pointer-memaccess1.c: Likewise.
* gcc.target/i386/sse-init-v4hi-1.c: Fix test case.
* gcc.target/i386/sse-init-v4sf-1.c: Likewise.
* gcc.target/i386/sse-set-ps-1.c: Likewise.
* gcc.target/i386/sse2-init-v16qi-1.c: Likewise.
* gcc.target/i386/sse2-init-v2di-1.c: Likewise.
* gcc.target/i386/sse2-init-v4si-1.c: Likewise.
* gcc.target/i386/sse2-init-v8hi-1.c: Likewise.
* gcc.target/i386/sse2-set-epi32-1.c: Likewise.
* gcc.target/i386/sse2-set-epi64x-1.c: Likewise.
* gcc.target/i386/sse4_1-init-v16qi-1.c: Likewise.
* gcc.target/i386/sse4_1-init-v2di-1.c: Likewise.
* gcc.target/i386/sse4_1-init-v4sf-1.c: Likewise.
* gcc.target/i386/sse4_1-init-v4si-1.c: Likewise.
* gcc.target/i386/sse4_1-set-epi32-1.c: Likewise.
* gcc.target/i386/sse4_1-set-epi64x-1.c: Likewise.
* gcc.target/i386/sse4_1-set-ps-1.c: Likewise.
* libgomp.c/pr39591-2.c: Likewise.
* libgomp.c/pr39591-3.c: Likewise.

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 246482)
+++ gcc/c/c-parser.c	(working copy)
@@ -6652,6 +6652,8 @@ c_parser_binary_expression (c_parser *parser, stru
 enum tree_code op;
 /* The source location of this operation.  */
 location_t loc;
+/* The sizeof argument if expr.original_code == SIZEOF_EXPR.  */
+tree sizeof_arg;
   } stack[NUM_PRECS];
   int sp;
   /* Location of the binary operator.  */
@@ -6668,6 +6670,22 @@ c_parser_binary_expression (c_parser *parser, stru
 	c_inhibit_evaluation_warnings -= (stack[sp - 1].expr.value	  \
 	  == truthvalue_true_node);	  \
 	break;  \
+  case TRUNC_DIV_EXPR: 		  \
+	if (stack[sp - 1].expr.original_code == SIZEOF_EXPR		  \
+	&& stack[sp].expr.original_code == SIZEOF_EXPR)		  \
+	  {  \
+	tree type0 = stack[sp - 1].sizeof_arg;			  \
+	tree type1 = stack[sp].sizeof_arg;  \
+	if (!TYPE_P (type0))	  \
+	  type0 = TREE_TYPE (type0);  \
+	if (!TYPE_P (type1))	  \
+	  type1 = TREE_TYPE (type1);  \
+	if (POINTER_TYPE_P (type0)	  \
+		&& comptypes (TREE_TYPE (type0), type1))		  \
+	  warning_at (stack[sp].loc, OPT_Wsizeof_pointer_div,	  \
+			  "dividing the pointer size by the element size");   \
+	  }  \
+	break;  \
   default:  \
 	break;  \
   }	  \
@@ -6701,6 +6719,7 @@ c_parser_binary_expression (c_parser *parser, stru
   stack[0].loc = c_parser_peek_token (parser)->location;
   stack[0].expr = c_parser_cast_expression 

[PATCH, i386]: Fix PR79804, ICE in print_reg

2017-04-20 Thread Uros Bizjak
2017-04-20  Uros Bizjak  

PR target/79804
* config/i386/i386.c (print_reg): Remove assert for disalowed
regno values, call output_operand_lossage instead.

testsuite/ChangeLog:

2017-04-20  Uros Bizjak  

PR target/79804
* gcc.target/i386/pr79804.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 246382)
+++ config/i386/i386.c  (working copy)
@@ -17637,13 +17637,17 @@ print_reg (rtx x, int code, FILE *file)
 
   regno = REGNO (x);
 
-  gcc_assert (regno != ARG_POINTER_REGNUM
- && regno != FRAME_POINTER_REGNUM
- && regno != FPSR_REG
- && regno != FPCR_REG);
-
-  if (regno == FLAGS_REG)
+  if (regno == ARG_POINTER_REGNUM
+  || regno == FRAME_POINTER_REGNUM
+  || regno == FPSR_REG
+  || regno == FPCR_REG)
 {
+  output_operand_lossage
+   ("invalid use of register '%s'", reg_names[regno]);
+  return;
+}
+  else if (regno == FLAGS_REG)
+{
   output_operand_lossage ("invalid use of asm flag output");
   return;
 }
Index: testsuite/gcc.target/i386/pr79804.c
===
--- testsuite/gcc.target/i386/pr79804.c (nonexistent)
+++ testsuite/gcc.target/i386/pr79804.c (working copy)
@@ -0,0 +1,10 @@
+/* PR target/79804 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void foo (void)
+{
+  register int r20 asm ("20");
+
+  asm volatile ("# %0" : "=r"(r20));  /* { dg-error "invalid use of register" 
} */
+}


RE: [PATCH] Fix fixincludes for canadian cross builds - next try

2017-04-20 Thread Matthew Fortune
Bernd Edlinger  writes:
> This is my new attempt to clean up the different cross compiler
> configurations.  It turned out to be a very complicated matter,
> so I thought it would be better to postpone it to the stage1.
> 
> In a canadian cross compiler setup we have a different header dir path
> for use in the build and later on the target, which is written to
> install-tools/mkheaders.conf, so I propose to export SYSTEM_HEADER_DIR
> and BUILD_SYSTEM_HEADER_DIR from configure.ac to be used in Makefile.in.
> 
> I also removed unnecessary handling of --with-headers, because
> the headers are copied to sys-include and thus it is not necessary to
> use the original path here.
> 
> If --with-sysroot or --with-build-sysroot is used the SYSTEM_HEADER_DIR
> or BUILD_SYSTEM_HEADER_DIR contain $${sysroot_headers_suffix},
> which is normally an empty string, but on mips it may be something
> like "mips-r2" which gets appended to the sysroot for use of fixincludes
> but "target_header_dir" which is used in configure to find things like
> the GLIBC version it is not used.  I assume that that either does
> not create problems and is silently ignored, or that people have a
> work around, my patch should not change that, however I have not been
> able to setup a sysroot for mips*-img-linux* or mips*-mti-linux* which
> seem to be the only targets where this might make a difference.

I'll try to test this out for you with MIPS. I have changes I want to make
to further improve the cross builds for the mti and img vendor builds so
if there is a bit of fallout I could deal with it then.

Matthew


[PATCH] Fix fixincludes for canadian cross builds - next try

2017-04-20 Thread Bernd Edlinger
Hi!

This is my new attempt to clean up the different cross compiler
configurations.  It turned out to be a very complicated matter,
so I thought it would be better to postpone it to the stage1.

In a canadian cross compiler setup we have a different header dir path
for use in the build and later on the target, which is written to
install-tools/mkheaders.conf, so I propose to export SYSTEM_HEADER_DIR
and BUILD_SYSTEM_HEADER_DIR from configure.ac to be used in Makefile.in.

I also removed unnecessary handling of --with-headers, because
the headers are copied to sys-include and thus it is not necessary to
use the original path here.

If --with-sysroot or --with-build-sysroot is used the SYSTEM_HEADER_DIR
or BUILD_SYSTEM_HEADER_DIR contain $${sysroot_headers_suffix},
which is normally an empty string, but on mips it may be something
like "mips-r2" which gets appended to the sysroot for use of fixincludes
but "target_header_dir" which is used in configure to find things like
the GLIBC version it is not used.  I assume that that either does
not create problems and is silently ignored, or that people have a
work around, my patch should not change that, however I have not been
able to setup a sysroot for mips*-img-linux* or mips*-mti-linux* which
seem to be the only targets where this might make a difference.

I have tested all different combinations of --with-sysroot /
--with-build-sysroot.  Even a native build with --with-sysroot works.
Except go of course: cross-builds are a no-go area for the go language
in general ;)

As before I would appreciate your kind help with testing the many
different build setups.


So far I have tested native x86_64-pc-linux-gnu and arm-linux-gnueabihf
cross build configurations.  And everything looks sane.

Is it OK for trunk?


Thanks
Bernd.
2017-04-18  Bernd Edlinger  

	* configure.ac (SYSTEM_HEADER_DIR, BUILD_SYSTEM_HEADER_DIR,
	target_header_dir): Set correctly.
	* configure: Regenerated.
	* Makefile.in (BUILD_SYSTEM_HEADER_DIR): New make variabe.
	(LIMITS_H_TEST, if_multiarch, stmp-fixinc): Use BUILD_SYSTEM_HEADER_DIR
	instead of SYSTEM_HEADER_DIR.

Index: gcc/configure.ac
===
--- gcc/configure.ac	(revision 246979)
+++ gcc/configure.ac	(working copy)
@@ -1998,41 +1998,29 @@ done
 CROSS=		AC_SUBST(CROSS)
 ALL=all.internalAC_SUBST(ALL)
 SYSTEM_HEADER_DIR='$(NATIVE_SYSTEM_HEADER_DIR)'	AC_SUBST(SYSTEM_HEADER_DIR)
+BUILD_SYSTEM_HEADER_DIR=$SYSTEM_HEADER_DIR	AC_SUBST(BUILD_SYSTEM_HEADER_DIR)
 
-if test "x$with_build_sysroot" != x; then
-  build_system_header_dir=$with_build_sysroot'$${sysroot_headers_suffix}$(NATIVE_SYSTEM_HEADER_DIR)'
-else
-  # This value is used, even on a native system, because 
-  # CROSS_SYSTEM_HEADER_DIR is just 
-  # $(TARGET_SYSTEM_ROOT)$(NATIVE_SYSTEM_HEADER_DIR).
-  build_system_header_dir='$(CROSS_SYSTEM_HEADER_DIR)'
-fi
+if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x ||
+   test x$build != x$host || test "x$with_build_sysroot" != x; then
+  if test "x$with_build_sysroot" != x; then
+BUILD_SYSTEM_HEADER_DIR=$with_build_sysroot'$${sysroot_headers_suffix}$(NATIVE_SYSTEM_HEADER_DIR)'
+  else
+BUILD_SYSTEM_HEADER_DIR='$(CROSS_SYSTEM_HEADER_DIR)'
+  fi
 
-if test x$host != x$target
-then
-	CROSS="-DCROSS_DIRECTORY_STRUCTURE"
-	ALL=all.cross
-	SYSTEM_HEADER_DIR=$build_system_header_dir
-	case $target in
-		*-*-mingw*)
-			if test "x$with_headers" = x; then
-with_headers=yes
-			fi
-			;;
-		*)
-			;;
-	esac
-elif test "x$TARGET_SYSTEM_ROOT" != x; then
-SYSTEM_HEADER_DIR=$build_system_header_dir 
-fi
+  if test x$host != x$target
+  then
+CROSS="-DCROSS_DIRECTORY_STRUCTURE"
+ALL=all.cross
+SYSTEM_HEADER_DIR=$BUILD_SYSTEM_HEADER_DIR
+  elif test "x$TARGET_SYSTEM_ROOT" != x; then
+SYSTEM_HEADER_DIR='$(CROSS_SYSTEM_HEADER_DIR)'
+  fi
 
-if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
-  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
-target_header_dir=$with_headers
+  if test "x$with_build_sysroot" != "x"; then
+target_header_dir="${with_build_sysroot}${native_system_header_dir}"
   elif test "x$with_sysroot" = x; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
-  elif test "x$with_build_sysroot" != "x"; then
-target_header_dir="${with_build_sysroot}${native_system_header_dir}"
   elif test "x$with_sysroot" = xyes; then
 target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-root${native_system_header_dir}"
   else
Index: gcc/configure
===
--- gcc/configure	(revision 246979)
+++ gcc/configure	(working copy)
@@ -719,6 +719,7 @@ BUILD_CFLAGS
 CXX_FOR_BUILD
 CC_FOR_BUILD
 inhibit_libc
+BUILD_SYSTEM_HEADER_DIR
 SYSTEM_HEADER_DIR
 ALL
 CROSS
@@ -12214,41 +12215,29 @@ done
 CROSS=
 ALL=all.internal
 SYSTEM_HEADER_DIR='$(NATIVE_SYSTEM_HEADER_DIR)'

[PATCH, i386]: Fix PR78090, GCC allows integer register for inter unit conversion under -mtune-ctrl=^inter_unit_conversions

2017-04-20 Thread Uros Bizjak
Hello!

Attached patch makes TARGET_INTER_UNIT_CONVERSIONS setting more
robust, by taking the same approach as handling of
TARGET_INTER_UNIT_{TO,FROM}_VEC.

The patch also removes now obsolete test that checked
TARGET_INTER_UNIT_CONVERSIONS with tune_for_speed attribute, which is
not case anymore.

2017-04-20  Uros Bizjak  

PR target/78090
* config/i386/constraints.md (Yc): New register constraint.
* config/i386/i386.md (*float2_mixed):
Use Yc constraint for alternative 2 of operand 0.  Remove
preferred_for_speed attribute.

testsuite/ChangeLog:

2017-04-20  Uros Bizjak  

PR target/78090
* gcc.target/i386/conversion-2.c: Remove obsolete test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/constraints.md
===
--- config/i386/constraints.md  (revision 247026)
+++ config/i386/constraints.md  (working copy)
@@ -99,6 +99,7 @@
 
 ;; We use the Y prefix to denote any number of conditional register sets:
 ;;  z  First SSE register.
+;;  c  SSE inter-unit conversions enabled
 ;;  i  SSE2 inter-unit moves to SSE register enabled
 ;;  j  SSE2 inter-unit moves from SSE register enabled
 ;;  m  MMX inter-unit moves to MMX register enabled
@@ -117,6 +118,10 @@
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
 
+(define_register_constraint "Yc"
+ "TARGET_SSE && TARGET_INTER_UNIT_CONVERSIONS ? ALL_SSE_REGS : NO_REGS"
+ "@internal Any SSE register, when SSE and inter-unit conversions are 
enabled.")
+
 (define_register_constraint "Yi"
  "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC ? ALL_SSE_REGS : NO_REGS"
  "@internal Any SSE register, when SSE2 and inter-unit moves to vector 
registers are enabled.")
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 247026)
+++ config/i386/i386.md (working copy)
@@ -5207,7 +5207,7 @@
 })
 
 (define_insn "*float2_mixed"
-  [(set (match_operand:MODEF 0 "register_operand" "=f,v,v")
+  [(set (match_operand:MODEF 0 "register_operand" "=f,Yc,v")
(float:MODEF
  (match_operand:SWI48 1 "nonimmediate_operand" "m,r,m")))]
   "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH"
@@ -5236,10 +5236,6 @@
&& X87_ENABLE_FLOAT (mode,
 mode)")
]
-   (symbol_ref "true")))
-   (set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
-  (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS")]
(symbol_ref "true")))])
 
 (define_insn "*float2_i387"
Index: testsuite/gcc.target/i386/conversion-2.c
===
--- testsuite/gcc.target/i386/conversion-2.c(revision 247026)
+++ testsuite/gcc.target/i386/conversion-2.c(nonexistent)
@@ -1,36 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fno-toplevel-reorder -mtune=bdver2" } */
-/* { dg-additional-options "-mregparm=1 -msse -mfpmath=sse" { target ia32 } } 
*/
-
-void __attribute__ ((hot))
-f1 (int x)
-{
-  register float f asm ("%xmm0") = x;
-  asm volatile ("" :: "x" (f));
-}
-
-void __attribute__ ((cold))
-f2 (int x)
-{
-  register float f asm ("%xmm1") = x;
-  asm volatile ("" :: "x" (f));
-}
-
-void __attribute__ ((hot))
-f3 (int x)
-{
-  register float f asm ("%xmm2") = x;
-  asm volatile ("" :: "x" (f));
-}
-
-void __attribute__ ((cold))
-f4 (int x)
-{
-  register float f asm ("%xmm3") = x;
-  asm volatile ("" :: "x" (f));
-}
-
-/* { dg-final { scan-assembler "sp\\\), %xmm0" } } */
-/* { dg-final { scan-assembler "(ax|di), %xmm1" } } */
-/* { dg-final { scan-assembler "sp\\\), %xmm2" } } */
-/* { dg-final { scan-assembler "(ax|di), %xmm3" } } */


Re: [PATCH 4/7] [D] libiberty: Remove wrongly spec'd mangle rule for encoded integers.

2017-04-20 Thread Iain Buclaw
On 15 April 2017 at 17:24, Iain Buclaw  wrote:
> This updates the implementation to reflect a part of the D ABI spec
> that has been removed.  There should never be a bare integer value
> encoded into a template argument list.  Integers are always prefixed
> by `i' if they are positive or `N' if they are negative.
>
> Have verified this indeed is the case using the code below, and
> updated all coverage tests to match the compiler.
>
> module demangle;
>
> template test(alias T)
> {
> void test() { auto t = T; }
> }
>
> pragma(msg, (test!(cast(byte)123)).mangleof);
> pragma(msg, (test!(cast(int)123)).mangleof);
> pragma(msg, (test!(cast(short)123)).mangleof);
> pragma(msg, (test!(cast(ubyte)123)).mangleof);
> pragma(msg, (test!(cast(uint)123)).mangleof);
> pragma(msg, (test!(cast(ushort)123)).mangleof);
> pragma(msg, (test!(cast(long)123)).mangleof);
> pragma(msg, (test!(cast(ulong)123)).mangleof);
> pragma(msg, (test!(true)).mangleof);
> pragma(msg, (test!(false)).mangleof);
> pragma(msg, (test!('\x0a')).mangleof);
> pragma(msg, (test!(' ')).mangleof);
> pragma(msg, (test!('A')).mangleof);
> pragma(msg, (test!('~')).mangleof);
> pragma(msg, (test!('\u03e8')).mangleof);
> pragma(msg, (test!('\U000186a0')).mangleof);
>
> ---

Hmm, it seems like D compilers until 3 years ago infact used to mangle
in this way.  Better keep it around then for a little while longer for
compatibility.  I'll follow-up with an amendment which doesn't remove
the cases, instead adding an explanatory comment on why it's present,
but still fixing up the coverage tests anyway, as they should try to
reflect as close to actual mangling as possible.

--
Iain.


[PING] [PATCH, ARM] correctly encode the CC reg data flow

2017-04-20 Thread Bernd Edlinger
Ping...

for this patch:
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html

On 01/18/17 16:36, Bernd Edlinger wrote:
> On 01/13/17 19:28, Bernd Edlinger wrote:
>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
 On 18/12/16 12:58, Bernd Edlinger wrote:
> Hi,
>
> this is related to PR77308, the follow-up patch will depend on this
> one.
>
> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
> before reload, a mis-compilation in libgcc function __gnu_satfractdasq
> was discovered, see [1] for more details.
>
> The reason seems to be that when the *arm_cmpdi_insn is directly
> followed by a *arm_cmpdi_unsigned instruction, both are split
> up into this:
>
>[(set (reg:CC CC_REGNUM)
>  (compare:CC (match_dup 0) (match_dup 1)))
> (parallel [(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 3) (match_dup 4)))
>(set (match_dup 2)
> (minus:SI (match_dup 5)
>  (ltu:SI (reg:CC_C CC_REGNUM) (const_int
> 0])]
>
>[(set (reg:CC CC_REGNUM)
>  (compare:CC (match_dup 2) (match_dup 3)))
> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 0) (match_dup 1]
>
> The problem is that the reg:CC from the *subsi3_carryin_compare
> is not mentioning that the reg:CC is also dependent on the reg:CC
> from before.  Therefore the *arm_cmpsi_insn appears to be
> redundant and thus got removed, because the data values are identical.
>
> I think that applies to a number of similar pattern where data
> flow is happening through the CC reg.
>
> So this is a kind of correctness issue, and should be fixed
> independently from the optimization issue PR77308.
>
> Therefore I think the patterns need to specify the true
> value that will be in the CC reg, in order for cse to
> know what the instructions are really doing.
>
>
> Bootstrapped and reg-tested on arm-linux-gnueabihf.
> Is it OK for trunk?
>

 I agree you've found a valid problem here, but I have some issues with
 the patch itself.


 (define_insn_and_split "subdi3_compare1"
   [(set (reg:CC_NCV CC_REGNUM)
 (compare:CC_NCV
   (match_operand:DI 1 "register_operand" "r")
   (match_operand:DI 2 "register_operand" "r")))
(set (match_operand:DI 0 "register_operand" "=")
 (minus:DI (match_dup 1) (match_dup 2)))]
   "TARGET_32BIT"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 1) (match_dup 2)))
   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
(parallel [(set (reg:CC_C CC_REGNUM)
(compare:CC_C
  (zero_extend:DI (match_dup 4))
  (plus:DI (zero_extend:DI (match_dup 5))
   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
   (set (match_dup 3)
(minus:SI (minus:SI (match_dup 4) (match_dup 5))
  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0])]


 This pattern is now no-longer self consistent in that before the split
 the overall result for the condition register is in mode CC_NCV, but
 afterwards it is just CC_C.

 I think CC_NCV is correct mode (the N, C and V bits all correctly
 reflect the result of the 64-bit comparison), but that then implies
 that
 the cc mode of subsi3_carryin_compare is incorrect as well and
 should in
 fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
 agree
 that CC_NCV is the correct mode for this operation

 I'm not sure if there are other consequences that will fall out from
 fixing this (it's possible that we might need a change to
 select_cc_mode
 as well).

>>>
>>> Yes, this is still a bit awkward...
>>>
>>> The N and V bit will be the correct result for the subdi3_compare1
>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>> only gets the C bit correct, the expression for N and V is a different
>>> one.
>>>
>>> It probably works, because the subsi3_carryin_compare instruction sets
>>> more CC bits than the pattern does explicitly specify the value.
>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>> hard to write down the correct rtl expression for it.
>>>
>>> In theory the pattern should describe everything correctly,
>>> maybe, like:
>>>
>>> set (reg:CC_C CC_REGNUM)
>>> (compare:CC_C
>>>   (zero_extend:DI (match_dup 4))
>>>   (plus:DI (zero_extend:DI (match_dup 5))
>>>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
>>> set (reg:CC_NV 

Re: [PATCH] PR c++/80473 allow suppressing notes about over-aligned new

2017-04-20 Thread Jonathan Wakely

On 20/04/17 17:43 +0100, Jonathan Wakely wrote:

On 20/04/17 17:33 +0200, Marek Polacek wrote:

On Thu, Apr 20, 2017 at 05:22:00PM +0200, Jakub Jelinek wrote:

On Thu, Apr 20, 2017 at 04:09:20PM +0100, Jonathan Wakely wrote:

--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3128,11 +3128,14 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
 {
   warning (OPT_Waligned_new_, "% of type %qT with extended "
   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
-  inform (input_location, "uses %qD, which does not have an alignment "
- "parameter", alloc_fn);
-  if (!aligned_new_threshold)
-   inform (input_location, "use %<-faligned-new%> to enable C++17 "
-   "over-aligned new support");
+  if (diagnostic_report_warnings_p (global_dc, input_location))
+   {
+ inform (input_location, "uses %qD, which does not have an alignment "
+ "parameter", alloc_fn);
+ if (!aligned_new_threshold)
+   inform (input_location, "use %<-faligned-new%> to enable C++17 "
+   "over-aligned new support");
+   }


This looks weird.  I'd expect instead:
 if (warning (OPT_Waligned_new_, "% of type %qT with extended "
   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type)))
{
  inform (input_location, "uses %qD, which does not have an alignment "
  "parameter", alloc_fn);
  if (!aligned_new_threshold)
inform (input_location, "use %<-faligned-new%> to enable C++17 "
"over-aligned new support");
}
That is a standard idiom used if some inform or later warning/error depends
on whether earlier warning/error has been diagnosed.


Yes.


If that works, this is ok for trunk and 7.1 (we don't have a rc1 yet, it is
ok now).


One more thing, the test passes even without the patch.  Did you mean to add
// { dg-options "-Wall -w" }
?


Good catch. My original testcase used  but I changed it
to not depend on the library, and didn't fix the dg-options.


Here's what I'm going to commit. Thanks for the reviews/help :)



commit 8a5d42bf7baba903807617e6dd800582514dec0b
Author: Jonathan Wakely 
Date:   Thu Apr 20 15:24:09 2017 +0100

PR c++/80473 allow suppressing notes about over-aligned new

gcc/cp:

	PR c++/80473
	* init.c (build_new_1): Suppress notes about over-aligned new when
	the warning is suppressed.

gcc/testsuite:

	PR c++/80473
	* g++.dg/diagnostic/pr80473.C: New test.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index bfa9020..e9c39ff 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3126,13 +3126,15 @@ build_new_1 (vec **placement, tree type, tree nelts,
 	  || CP_DECL_CONTEXT (alloc_fn) == global_namespace)
   && !aligned_allocation_fn_p (alloc_fn))
 {
-  warning (OPT_Waligned_new_, "% of type %qT with extended "
-	   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
-  inform (input_location, "uses %qD, which does not have an alignment "
-	  "parameter", alloc_fn);
-  if (!aligned_new_threshold)
-	inform (input_location, "use %<-faligned-new%> to enable C++17 "
-"over-aligned new support");
+  if (warning (OPT_Waligned_new_, "% of type %qT with extended "
+		   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type)))
+	{
+	  inform (input_location, "uses %qD, which does not have an alignment "
+		  "parameter", alloc_fn);
+	  if (!aligned_new_threshold)
+	inform (input_location, "use %<-faligned-new%> to enable C++17 "
+"over-aligned new support");
+	}
 }
 
   /* If we found a simple case of PLACEMENT_EXPR above, then copy it
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr80473.C b/gcc/testsuite/g++.dg/diagnostic/pr80473.C
new file mode 100644
index 000..8721213
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr80473.C
@@ -0,0 +1,16 @@
+// { dg-options "-Wall -w" }
+// { dg-do compile { target c++11 } }
+// { dg-bogus "over-aligned new" "PR c++/80473" { target *-*-* } 0 }
+
+template T&& declval();
+
+template
+struct is_constructible { enum { value = 0 }; };
+
+template
+struct is_constructible
+{ enum { value = 1 }; };
+
+struct alignas(64) A { int i; };
+
+constexpr bool b = is_constructible::value;


Re: [CHKP] Fix for PR79988

2017-04-20 Thread Ilya Enkovich
Hi,

Please put comment to code explaining why you don't use
gimple_call_builtin_p to avoid similar issues in the future.

Also please follow Rainer's comments.

OK with these fixes.

Thanks,
Ilya


2017-04-20 15:27 GMT+03:00 Rainer Orth :
> Hi Alexander,
>
> just a couple of nits:
>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/i386/mpx/PR79988.c: New test.
>
> We usually don't use capital PR in testcase names.  Please use pr79988.c
> instead, matching the other files there.
>
> Also, both ChangeLog entries should include a PR reference like so:
>
> PR middle-end/79988
> * gcc.target/i386/mpx/pr79988.c: New test.
>
> so the commit messages are automatically forwarded to bugzilla.
>
>> gcc/ChangeLog:
>>
>> * tree-chkp.c (chkp_gimple_call_builtin_p):
>> Remove gimple_call_builtin_p call to avoid the call
>> of gimple_builtin_call_types_compatible_p. this will
>> strip the checks for address spaces, which can be skipped
>> without loosing the functionality
>
> ChangeLog entries only describe *what* changed, not *why*.
>
> Thanks.
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [CHKP] Fix for PR79990

2017-04-20 Thread Ilya Enkovich
2017-04-20 12:27 GMT+03:00 Alexander Ivchenko :
> Thanks for correcting the usage of get_base_address. I fixed that.
> Plus addressed the comment about the avoiding the usage of
> chkp_find_bounds.
>
>
> gcc/testsuite/ChangeLog:
>
> 2017-04-20  Alexander Ivchenko  
>
> * gcc.target/i386/mpx/hard-reg-2-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-2-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-2-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-1-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-1-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-1-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-2-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-2-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-2-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-3-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-4-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-4-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-4-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-5-1-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-5-1-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-5-1-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-5-2-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-5-2-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-5-2-ubv.c: New test.
> * gcc.target/i386/mpx/hard-reg-6-lbv.c: New test.
> * gcc.target/i386/mpx/hard-reg-6-nov.c: New test.
> * gcc.target/i386/mpx/hard-reg-6-ubv.c: New test.
>
> gcc/ChangeLog:
>
> 2017-04-20  Alexander Ivchenko  
>
> * tree-chkp.c (chkp_get_hard_register_var_fake_base_address):
> New function to provide a base address for
> chkp_get_hard_register_fake_addr_expr.
> (chkp_get_hard_register_fake_addr_expr): New function to build
> fake address expression for an expr that resides on a hard
> register.
> (chkp_build_addr_expr): Add checks for hard reg cases.
> (chkp_parse_array_and_component_ref): Create/find bounds if the
> var resides on a hard reg.
>
>
> I already had a testcases for struct with a pointer - "hard-reg-4-*".
> Here is the instrumentation of the foo function:
>
> __attribute__((chkp instrumented))
> foo.chkp (int i)
> {
>   __bounds_type __bound_tmp.1;
>   register struct S2 b __asm__ (*xmm0);
>   int k;
>   int * _1;
>   long unsigned int _2;
>   long unsigned int _3;
>   int * _4;
>   int * _5;
>   int _11;
>   int * _18;
>
>[0.00%]:
>   __bound_tmp.1_17 = __chkp_zero_bounds;
>   __bound_tmp.1_14 = __builtin_ia32_bndmk (, 4);
>   __bound_tmp.1_13 = __builtin_ia32_bndmk (-2147483648B, 16);
>
>[0.00%]:
>
>[0.00%]:
>   k = 5;
>   b.f3 = 
>   __builtin_ia32_bndstx (, __bound_tmp.1_14, -2147483648B);
>   _1 = b.f3;
>   __bound_tmp.1_15 = __builtin_ia32_bndldx (-2147483648B, _1);
>   _2 = (long unsigned int) i_9(D);
>   _3 = _2 * 4;
>   _4 = _1 + _3;
>   b.f3 = _4;
>   __builtin_ia32_bndstx (_4, __bound_tmp.1_15, -2147483648B);
>   _5 = b.f3;
>   __bound_tmp.1_16 = __builtin_ia32_bndldx (-2147483648B, _5);
>   __builtin_ia32_bndcl (_5, __bound_tmp.1_16);
>   _18 = _5 + 3;
>   __builtin_ia32_bndcu (_18, __bound_tmp.1_16);
>   _11 = *_5;
>   k ={v} {CLOBBER};
>   return _11;
>
> }
>
> Which is the most suspicious one, because we have ldx and stx. I'm not
> sure whether this is OK.

It is not OK because single entry in BT may be used by multiple hardreg
variables. There is a code in chkp_find_bounds_1 to use zero bounds in
case we load bounds from register var. I believe here we should do the
same. In chkp_find_bounds_1 at ARRAY_REF/COMPONENT_REF
case check if addr is register var and use zero bounds.

Thanks,
Ilya


>
> Here is the chkp dump for foo function of newly added hard-reg-6* case:
>
> __attribute__((chkp instrumented))
> foo.chkp (int i, int * kp1, __bounds_type __chkp_bounds_of_kp1)
> {
>   __bounds_type __bound_tmp.1;
>   int D.2873;
>   register struct S2 b __asm__ (*xmm0);
>   int k2;
>   int * _1;
>   long unsigned int _2;
>   long unsigned int _3;
>   int * _4;
>   int * _5;
>   int _13;
>   int * _31;
>
>[0.00%]:
>   __bound_tmp.1_22 = __builtin_ia32_bndmk (, 4);
>   __bound_tmp.1_17 = __chkp_zero_bounds;
>   __bound_tmp.1_15 = __builtin_ia32_bndmk (-2147483648B, 16);
>
>[0.00%]:
>
>[0.00%]:
>   k2 = 5;
>   __bound_tmp.1_16 = __builtin_ia32_bndmk (-2147483648B, 16);
>   __bound_tmp.1_18 = __builtin_ia32_bndint (__bound_tmp.1_16, 
> __bound_tmp.1_15);
>   __builtin_ia32_bndcl (-2147483648B, __bound_tmp.1_18);
>   __builtin_ia32_bndcu (-2147483641B, __bound_tmp.1_18);
>   b.f[0] = kp1_8(D);
>   __builtin_ia32_bndstx (kp1_8(D), __chkp_bounds_of_kp1_19(D), -2147483648B);
>   __bound_tmp.1_20 = 

Re: [PATCH 2/5] omp-low: implement SIMT privatization, part 1

2017-04-20 Thread Jakub Jelinek
On Thu, Apr 20, 2017 at 07:37:13PM +0300, Alexander Monakov wrote:
> On Thu, 20 Apr 2017, Jakub Jelinek wrote:
> > > This wasn't caught in testing, as apparently all testcases that have 
> > > target
> > > simd loops with linear/lastprivate clauses also have the corresponding 
> > > variables
> > > mentioned in target map clause, which makes them addressable (is that 
> > > necessary?),
> > 
> > Yes, in order to map something you need to map its address (+ size) on the
> > host to its address on the device, so it needs to be addressable.
> > Compared to that, firstprivate on target should not make it addressable.
> 
> But ideally if nothing else is taking the address of a mapped variable inside
> of a target region, then it'd be more efficient to create a non-addressable
> instance and just copy its value from/to the addressable one on target
> region entry/exit.

Perhaps, but you'd need to do only if it is map on the target construct
because that is the only case where you can actually add copy in/out code
on the host as well as target.  And you'd need to think about nowait
implications etc., or what happens if it in addition to target construct
is mentioned in map clause on some other construct etc., i.e. take into
account all the clauses of the variable in the scope of the variable, not
just a single one.  That is GCC8 material for sure.  If it is say map(to:),
it could as well be just promotion into firstprivate, or for map(alloc:) to
private etc.

> > Would be nice to also test explicit linear, perhaps in the same testcase,
> > just add ch and c and use say linear(c:2).
> 
> Unfortunately that uncovers a separate wrong-code issue, explicit linear is
> not specifically handled in simt regions, but it should be, since we change
> the loop iteration step.

Then please commit what you have now and deal with the rest incrementally.

Jakub


Re: [PATCH] PR c++/80473 allow suppressing notes about over-aligned new

2017-04-20 Thread Jonathan Wakely

On 20/04/17 17:33 +0200, Marek Polacek wrote:

On Thu, Apr 20, 2017 at 05:22:00PM +0200, Jakub Jelinek wrote:

On Thu, Apr 20, 2017 at 04:09:20PM +0100, Jonathan Wakely wrote:
> --- a/gcc/cp/init.c
> +++ b/gcc/cp/init.c
> @@ -3128,11 +3128,14 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
>  {
>warning (OPT_Waligned_new_, "% of type %qT with extended "
>   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
> -  inform (input_location, "uses %qD, which does not have an alignment "
> -"parameter", alloc_fn);
> -  if (!aligned_new_threshold)
> -  inform (input_location, "use %<-faligned-new%> to enable C++17 "
> -  "over-aligned new support");
> +  if (diagnostic_report_warnings_p (global_dc, input_location))
> +  {
> +inform (input_location, "uses %qD, which does not have an alignment "
> +"parameter", alloc_fn);
> +if (!aligned_new_threshold)
> +  inform (input_location, "use %<-faligned-new%> to enable C++17 "
> +  "over-aligned new support");
> +  }

This looks weird.  I'd expect instead:
  if (warning (OPT_Waligned_new_, "% of type %qT with extended "
   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type)))
{
  inform (input_location, "uses %qD, which does not have an alignment "
  "parameter", alloc_fn);
  if (!aligned_new_threshold)
inform (input_location, "use %<-faligned-new%> to enable C++17 "
"over-aligned new support");
}
That is a standard idiom used if some inform or later warning/error depends
on whether earlier warning/error has been diagnosed.


Yes.


If that works, this is ok for trunk and 7.1 (we don't have a rc1 yet, it is
ok now).


One more thing, the test passes even without the patch.  Did you mean to add
// { dg-options "-Wall -w" }
?


Good catch. My original testcase used  but I changed it
to not depend on the library, and didn't fix the dg-options.




Re: [PATCH 2/5] omp-low: implement SIMT privatization, part 1

2017-04-20 Thread Alexander Monakov
On Thu, 20 Apr 2017, Jakub Jelinek wrote:
> > This wasn't caught in testing, as apparently all testcases that have target
> > simd loops with linear/lastprivate clauses also have the corresponding 
> > variables
> > mentioned in target map clause, which makes them addressable (is that 
> > necessary?),
> 
> Yes, in order to map something you need to map its address (+ size) on the
> host to its address on the device, so it needs to be addressable.
> Compared to that, firstprivate on target should not make it addressable.

But ideally if nothing else is taking the address of a mapped variable inside
of a target region, then it'd be more efficient to create a non-addressable
instance and just copy its value from/to the addressable one on target
region entry/exit.

> Would be nice to also test explicit linear, perhaps in the same testcase,
> just add ch and c and use say linear(c:2).

Unfortunately that uncovers a separate wrong-code issue, explicit linear is
not specifically handled in simt regions, but it should be, since we change
the loop iteration step.

Thanks.
Alexander


Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 31 October 2016 18:29
To: GCC Patches
Cc: nd
Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage
    
This patch cleans up all code related to the frame pointer.  On AArch64 we
emit a frame chain even in cases where the frame pointer is not required.
So make this explicit by introducing a boolean emit_frame_chain in
aarch64_frame record.

When the frame pointer is enabled but not strictly required (eg. no use of
alloca), we emit a frame chain in non-leaf functions, but continue to use the
stack pointer to access locals.  This results in smaller code and unwind info.

Also simplify the complex logic in aarch64_override_options_after_change_1
and compute whether the frame chain is required in aarch64_layout_frame
instead.  As a result aarch64_frame_pointer_required is now redundant and
aarch64_can_eliminate can be greatly simplified.

Finally convert all callee save/restore functions to use gen_frame_mem.

Bootstrap OK. Any comments?

ChangeLog:
2016-10-31  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.h (aarch64_frame):
 Add emit_frame_chain boolean.
    * config/aarch64/aarch64.c (aarch64_frame_pointer_required)
    Remove.
    (aarch64_layout_frame): Initialise emit_frame_chain.
    (aarch64_pushwb_single_reg): Use gen_frame_mem.
    (aarch64_pop_regs): Likewise.
    (aarch64_gen_load_pair): Likewise.
    (aarch64_save_callee_saves): Likewise.
    (aarch64_restore_callee_saves): Likewise.
    (aarch64_expand_prologue): Use emit_frame_chain.
    (aarch64_can_eliminate): Simplify. When FP needed or outgoing
    arguments are large, eliminate to FP, otherwise SP.
    (aarch64_override_options_after_change_1): Simplify.
    (TARGET_FRAME_POINTER_REQUIRED): Remove define.

--

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
fa81e4b853daf08842955288861ec7e7acca..6e32dc9f6f171dde0c182fdd7857230251f71712
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -583,6 +583,9 @@ struct GTY (()) aarch64_frame
   /* The size of the stack adjustment after saving callee-saves.  */
   HOST_WIDE_INT final_adjust;
 
+  /* Store FP,LR and setup a frame pointer.  */
+  bool emit_frame_chain;
+
   unsigned wb_candidate1;
   unsigned wb_candidate2;
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
f07d771ea343803e054e03f59c8c1efb698bf474..6c06ac18d16f8afa7ee1cc5e8530e285a60e2b0f
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2728,24 +2728,6 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
-static bool
-aarch64_frame_pointer_required (void)
-{
-  /* In aarch64_override_options_after_change
- flag_omit_leaf_frame_pointer turns off the frame pointer by
- default.  Turn it back on now if we've not got a leaf
- function.  */
-  if (flag_omit_leaf_frame_pointer
-  && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)))
-    return true;
-
-  /* Force a frame pointer for EH returns so the return address is at FP+8.  */
-  if (crtl->calls_eh_return)
-    return true;
-
-  return false;
-}
-
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -2758,6 +2740,18 @@ aarch64_layout_frame (void)
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
 
+  /* Force a frame chain for EH returns so the return address is at FP+8.  */
+  cfun->machine->frame.emit_frame_chain
+    = frame_pointer_needed || crtl->calls_eh_return;
+
+  /* Emit a frame chain if the frame pointer is enabled.
+ If -momit-leaf-frame-pointer is used, do not use a frame chain
+ in leaf functions which do not use LR.  */
+  if (flag_omit_frame_pointer == 2
+  && !(flag_omit_leaf_frame_pointer && crtl->is_leaf
+  && !df_regs_ever_live_p (LR_REGNUM)))
+    cfun->machine->frame.emit_frame_chain = true;
+
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
 
@@ -2789,7 +2783,7 @@ aarch64_layout_frame (void)
 && !call_used_regs[regno])
   cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
 
-  if (frame_pointer_needed)
+  if (cfun->machine->frame.emit_frame_chain)
 {
   /* FP and LR are placed in the linkage record.  */
   cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
@@ -2937,7 +2931,7 @@ aarch64_pushwb_single_reg (machine_mode mode, unsigned 
regno,
   reg = gen_rtx_REG (mode, regno);
   mem = gen_rtx_PRE_MODIFY (Pmode, base_rtx,
 plus_constant (Pmode, base_rtx, -adjustment));
-  mem = gen_rtx_MEM (mode, mem);
+  mem = gen_frame_mem (mode, mem);
 
   insn = emit_move_insn (mem, reg);
   RTX_FRAME_RELATED_P (insn) = 1;
@@ -3011,7 +3005,7 @@ aarch64_pop_regs (unsigned regno1, unsigned regno2, 
HOST_WIDE_INT adjustment,
 {
   rtx mem = 

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 29 November 2016 11:05
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8
    
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only to one of the duplicates.

Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

    * config/arm/vfp.md (movdi_vfp): Merge changes from movdi_vfp_cortexa8.
    * (movdi_vfp_cortexa8): Remove pattern.
--

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
2051f1018f1cbff9c5bf044e71304d78e615458e..a917aa625a7b15f6c9e2b549ab22e5219bb9b99c
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -304,9 +304,9 @@
 ;; DImode moves
 
 (define_insn "*movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,r,w,w, 
Uv")
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
"=r,r,r,r,q,q,m,w,!r,w,w, Uv")
    (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune != TARGET_CPU_cortexa8
+  "TARGET_32BIT && TARGET_HARD_FLOAT
    && (   register_operand (operands[0], DImode)
    || register_operand (operands[1], DImode))
    && !(TARGET_NEON && CONST_INT_P (operands[1])
@@ -339,71 +339,25 @@
 }
   "
   [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1,4,5,6") (const_int 8)
+   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
   (eq_attr "alternative" "2") (const_int 12)
   (eq_attr "alternative" "3") (const_int 16)
+ (eq_attr "alternative" "4,5,6")
+  (symbol_ref "arm_count_output_move_double_insns 
(operands) * 4")
   (eq_attr "alternative" "9")
    (if_then_else
  (match_test "TARGET_VFP_SINGLE")
  (const_int 8)
  (const_int 4))]
   (const_int 4)))
+   (set_attr "predicable"    "yes")
    (set_attr "arm_pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
    (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
    (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
+   (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4"))
    (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
 )
 
-(define_insn "*movdi_vfp_cortexa8"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
"=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-   (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
-    && (   register_operand (operands[0], DImode)
-    || register_operand (operands[1], DImode))
-    && !(TARGET_NEON && CONST_INT_P (operands[1])
-    && neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
-  "*
-  switch (which_alternative)
-    {
-    case 0: 
-    case 1:
-    case 2:
-    case 3:
-  return \"#\";
-    case 4:
-    case 5:
-    case 6:
-  return output_move_double (operands, true, NULL);
-    case 7:
-  return \"vmov%?\\t%P0, %Q1, %R1\\t%@ int\";
-    case 8:
-  return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
-    case 9:
-  return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
-    case 10: case 11:
-  return output_move_vfp (operands);
-    default:
-  gcc_unreachable ();
-    }
-  "
-  [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
-   (eq_attr "alternative" "2") (const_int 12)
-   (eq_attr "alternative" "3") (const_int 16)
-   (eq_attr "alternative" "4,5,6") 
-  (symbol_ref 
-   "arm_count_output_move_double_insns (operands) \
- * 4")]
-  (const_int 4)))
-   (set_attr "predicable"    "yes")
-   (set_attr "arm_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
-   (set (attr "ce_count") 
-   (symbol_ref "get_attr_length (insn) / 4"))
-   (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
- )
-
 ;; HFmode moves
 
 (define_insn "*movhf_vfp_fp16"


Re: [PATCH][ARM] Fix ldrd offsets

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
    
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without -255..4091.  This reduces the number of addressing instructions
when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.

ChangeLog:
2015-11-03  Wilco Dijkstra  

    gcc/
    * config/arm/arm.c (arm_legitimate_index_p): Add comment.
    (thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
 {
   HOST_WIDE_INT val = INTVAL (index);
 
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
   if (TARGET_LDRD)
 return val > -256 && val < 256;
   else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
index, int strict_p)
   if (code == CONST_INT)
 {
   HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-    However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+    Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
 }
   else
 return 0;

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
    
Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed,
increasing codesize.  Given 4 works well for Thumb-2, use the same limit for ARM
for consistency. 

ChangeLog:
2016-11-04  Wilco Dijkstra  

    * config/arm/arm.c (arm_option_params_internal): Improve setting of
    max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }
 
-  if (optimize_size)
-    {
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
 
-  /* For THUMB2, we limit the conditional sequence to one IT block.  */
-  if (TARGET_THUMB2)
-    max_insns_skipped = arm_restrict_it ? 1 : 4;
-    }
-  else
-    /* When -mrestrict-it is in use tone down the if-conversion.  */
-    max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+    max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
 }
 
 /* True if -mflip-thumb should next add an attribute for the default



Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
    
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the question is whether we should allow
largish offsets outside of the bounds of symbols (v1), no offsets (this 
version), or
small offsets (small negative and positive offsets just outside a symbol are 
common).
The only thing we can't allow is any offset like we currently do...

In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations.
This means the offset can use all of the +/-4GB offset, leaving no offset 
available
for the symbol itself.  This results in relocation overflow and link-time errors
for simple expressions like _char + 0xff00.

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within 
a
3GB offset from its references.  For the tiny code model use a 64KB offset, 
allowing
most of the 1MB range for code/data between the symbol and its references.
For symbols with a defined size, limit the offset to be within the size of the 
symbol.


ChangeLog:
2017-01-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Apply reasonable limit to symbol offsets.

    testsuite/
    * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
    * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
 return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
 {
 case AARCH64_CMODEL_TINY:
@@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
  we have no way of knowing the address of symbol at compile time
  so we can't accurately say if the distance between the PC and
  symbol + offset is outside the addressible range of +/-1M in the
-    TINY code model.  So we rely on images not being greater than
-    1M and cap the offset at 1M and anything beyond 1M will have to
-    be loaded using an alternative mechanism.  Furthermore if the
-    symbol is a weak reference to something that isn't known to
-    resolve to a symbol in this module, then force to memory.  */
+    TINY code model.  So we limit the maximum offset to +/-64KB and
+    assume the offset to the symbol is not larger than +/-(1M - 64KB).
+    Furthermore force to memory if the symbol is a weak reference to
+    something that doesn't resolve to a symbol in this module.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x1, 0x1))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_TINY_ABSOLUTE;
 
 case AARCH64_CMODEL_SMALL:
   /* Same reasoning as the tiny code model, but the offset cap here is
-    4G.  */
+    1G, allowing +/-3G for the offset to the symbol.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x4000, 0x4000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_SMALL_ABSOLUTE;
 
 case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
--- 

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 17 January 2017 18:00
To: GCC Patches
Cc: nd; Kyrylo Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns
    
After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This means the Thumb-2 iordi_notdi_di
patterns are no longer used - the split ORR and NOT instructions are merged
into ORN by Combine.  With -mfpu=neon the iordi_notdi_di patterns are used
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN).  After this there are no Thumb-2 specific DImode 
patterns.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/thumb2.md (iordi_notdi_di): Remove pattern.
    (iordi_notzesidi_di): Likewise.
    (iordi_notdi_zesidi): Likewise.
    (iordi_notsesidi_di): Likewise.

--

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
2e7580f220eae1524fef69719b1796f50f5cf27c..91471d4650ecae4f4e87b549d84d11adf3014ad2
 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1434,103 +1434,6 @@
    (set_attr "type" "alu_sreg")]
 )
 
-; Constants for op 2 will never be given to these patterns.
-(define_insn_and_split "*iordi_notdi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r"))
-   (match_operand:DI 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2)))
-   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[5] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notzesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (zero_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,?r")))]
-  "TARGET_THUMB2"
-  "#"
-  ; (not (zero_extend...)) means operand0 will always be 0x
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (const_int -1))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "4,8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notdi_zesidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
-   (zero_extend:DI
-    (match_operand:SI 1 "s_register_operand" "r,r"]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (not:SI (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[4] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (sign_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int 31)))
-  (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
 (define_insn "*orsi_notsi_si"
   [(set (match_operand:SI 0 

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]

Re: [PATCH][AArch64] Enable AES fusion with -mcpu=generic

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 16 March 2017 17:22
To: GCC Patches; Evandro Menezes; andrew.pin...@cavium.com; 
jim.wil...@linaro.org
Cc: nd
Subject: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
    
Many supported cores implement fusion of AES instructions.  When fusion
happens it can give a significant performance gain.  If not, scheduling
fusion candidates next to each other has almost no effect on performance.
Due to the high benefit/low cost it makes sense to enable AES fusion with
-mcpu=generic so that cores that support it always benefit.  Any objections?

Bootstrapped on AArch64, no regressions.

ChangeLog:
2017-03-16  Wilco Dijkstra  

    * gcc/config/aarch64/aarch64.c (generic_tunings): Add AES fusion.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
728ce7029f1e2b5161d9f317d10e564dd5a5f472..c8cf7169a5d387de336920b50c83761dc0c96f3a
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -536,7 +536,7 @@ static const struct tune_params generic_tunings =
   _approx_modes,
   4, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
   8,   /* function_align.  */
   8,   /* jump_align.  */
   4,   /* loop_align.  */


Re: [PATCH][AArch64] Model Cortex-A53 load forwarding

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 05 April 2017 13:29
To: GCC Patches
Cc: nd; James Greenhalgh
Subject: [PATCH][AArch64] Model Cortex-A53 load forwarding
    
Code scheduling for Cortex-A53 isn't as good as it could be.  It turns out
code runs faster overall if we place loads and stores with a dependency
closer together.  To achieve this effect, this patch adds a bypass between
cortex_a53_load1 and cortex_a53_load*/cortex_a53_store* if the result of an
earlier load is used in an address calculation.  This significantly improved
benchmark scores in a proprietary benchmark suite.

Passes AArch64 bootstrap and regress. OK for stage 1?

ChangeLog:
2017-04-05  Wilco Dijkstra  

    * config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
    New function.
    (arm_early_store_addr_dep_ptr): Likewise.
    * config/arm/aarch-common-protos.h
    (arm_early_load_addr_dep_ptr): Add prototype.
    (arm_early_store_addr_dep_ptr): Likewise.
    * config/arm/cortex-a53.md: Add new bypasses.
---

diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index 
8e9fb7a895b0a4aaf1585eb3368443899b061c9b..5298172e6b6930a110388a40a7533ff208a87095
 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -30,7 +30,9 @@ extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
 extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
 extern int arm_early_load_addr_dep (rtx, rtx);
+extern int arm_early_load_addr_dep_ptr (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
+extern int arm_early_store_addr_dep_ptr (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
 extern int arm_mac_accumulator_is_result (rtx, rtx);
 extern int arm_no_early_alu_shift_dep (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 
dd37be0291a633f606d95ec8acacc598435828b3..74b80b272550028919c4274387944867ffed43d1
 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -241,6 +241,24 @@ arm_early_load_addr_dep (rtx producer, rtx consumer)
   return reg_overlap_mentioned_p (value, addr);
 }
 
+/* Return nonzero if the CONSUMER instruction (a load) does need
+   a Pmode PRODUCER's value to calculate the address.  */
+
+int
+arm_early_load_addr_dep_ptr (rtx producer, rtx consumer)
+{
+  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
+  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
+
+  if (!value || !addr || !MEM_P (SET_SRC (value)))
+    return 0;
+
+  value = SET_DEST (value);
+  addr = SET_SRC (addr);
+
+  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
+}
+
 /* Return nonzero if the CONSUMER instruction (an ALU op) does not
    have an early register shift value or amount dependency on the
    result of PRODUCER.  */
@@ -336,6 +354,24 @@ arm_early_store_addr_dep (rtx producer, rtx consumer)
   return !arm_no_early_store_addr_dep (producer, consumer);
 }
 
+/* Return nonzero if the CONSUMER instruction (a store) does need
+   a Pmode PRODUCER's value to calculate the address.  */
+
+int
+arm_early_store_addr_dep_ptr (rtx producer, rtx consumer)
+{
+  rtx value = arm_find_sub_rtx_with_code (PATTERN (producer), SET, false);
+  rtx addr = arm_find_sub_rtx_with_code (PATTERN (consumer), SET, false);
+
+  if (!value || !addr || !MEM_P (SET_SRC (value)))
+    return 0;
+
+  value = SET_DEST (value);
+  addr = SET_DEST (addr);
+
+  return GET_MODE (value) == Pmode && reg_overlap_mentioned_p (value, addr);
+}
+
 /* Return non-zero iff the consumer (a multiply-accumulate or a
    multiple-subtract instruction) has an accumulator dependency on the
    result of the producer and no other dependency on that result.  It
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 
b367ad403a4a641da34521c17669027b87092737..f8225f33c7a06485147b30fe2633309ac252d0c7
 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -246,6 +246,16 @@
  "cortex_a53_store*"
  "arm_no_early_store_addr_dep")
 
+;; Model a bypass for load to load/store address.
+
+(define_bypass 3 "cortex_a53_load1"
+    "cortex_a53_load*"
+    "arm_early_load_addr_dep_ptr")
+
+(define_bypass 3 "cortex_a53_load1"
+    "cortex_a53_store*"
+    "arm_early_store_addr_dep_ptr")
+
 ;; Model a GP->FP register move as similar to stores.
 
 (define_bypass 0 "cortex_a53_alu*,cortex_a53_shift*"



Re: [PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 05 April 2017 13:38
To: GCC Patches
Cc: nd; James Greenhalgh; andrew.pin...@cavium.com; Evandro Menezes; 
jim.wil...@linaro.org
Subject: [PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic
    
Many supported cores use the AUTOPREFETCHER_WEAK setting which tries
to order loads and stores to improve streaming performance.  Since significant
gains were reported in http://patchwork.ozlabs.org/patch/534469/ it seems
like a good idea to enable this setting too for -mcpu=generic.  Since the
weak model only keeps the order if it doesn't make the schedule worse, it
should not impact performance adversely on cores that don't show a gain.
Any objections?

ChangeLog:
2017-04-05  Wilco Dijkstra  

    * gcc/config/aarch64/aarch64.c (generic_tunings): Update prefetch model.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
8b729b1b1f87316e940d7fc657f235a935ffa93e..b249ce2b310707c7ded2827d505ce2ddfcfbf976
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,7 +547,7 @@ static const struct tune_params generic_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   0,   /* cache_line_size.  */
-  tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE)    /* tune_flags.  */
 };

Re: [PATCH][AArch64] Set jump alignment to 4 for Cortex cores

2017-04-20 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 12 April 2017 13:50
To: GCC Patches
Cc: nd; James Greenhalgh
Subject: [PATCH][AArch64] Set jump alignment to 4 for Cortex cores
    
Set jump alignment to 4 for Cortex cores as it reduces codesize by 0.4% on 
average
with no obvious performance difference.  See original discussion of the 
overheads
of various alignments:  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html

Bootstrap OK, OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  

    * config/aarch64/aarch64.c (cortexa35_tunings): Set jump alignment to 4.
    (cortexa53_tunings): Likewise.
    (cortexa57_tunings): Likewise.
    (cortexa72_tunings): Likewise.
    (cortexa73_tunings): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
a6004e6e283ba7157e65b678cf668f8a47e21abb..a8b3a29dd2e242a35f37b8c6a6fb30699ace5e01
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -564,7 +564,7 @@ static const struct tune_params cortexa35_tunings =
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
   16,  /* function_align.  */
-  8,   /* jump_align.  */
+  4,   /* jump_align.  */
   8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
@@ -590,7 +590,7 @@ static const struct tune_params cortexa53_tunings =
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
   16,  /* function_align.  */
-  8,   /* jump_align.  */
+  4,   /* jump_align.  */
   8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
@@ -616,7 +616,7 @@ static const struct tune_params cortexa57_tunings =
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
-  8,   /* jump_align.  */
+  4,   /* jump_align.  */
   8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
@@ -642,7 +642,7 @@ static const struct tune_params cortexa72_tunings =
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
-  8,   /* jump_align.  */
+  4,   /* jump_align.  */
   8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
@@ -668,7 +668,7 @@ static const struct tune_params cortexa73_tunings =
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
   16,  /* function_align.  */
-  8,   /* jump_align.  */
+  4,   /* jump_align.  */
   8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */


Re: [PATCH][AArch64] Update alignment for -mcpu=generic

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 12 April 2017 13:58
To: GCC Patches
Cc: nd; James Greenhalgh; jim.wil...@linaro.org; Evandro Menezes; 
andrew.pin...@cavium.com
Subject: [PATCH][AArch64] Update alignment for -mcpu=generic
    
With -mcpu=generic the loop alignment is currently 4.  All but one of the
supported cores use 8 or higher.  Since using 8 provides performance gains
on several cores, it is best to use that by default.  As discussed in [1],
the jump alignment has no effect on performance, yet has a relatively high
codesize cost [2], so setting it to 4 is best.  This gives a 0.2% overall 
codesize improvement as well as performance gains in several benchmarks.
Any objections?

Bootstrap OK on AArch64, OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  

    * config/aarch64/aarch64.c (generic_tunings): Set jump alignment to 4.
    Set loop alignment to 8.

[1] https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00574.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02075.html

---
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
c8cf7169a5d387de336920b50c83761dc0c96f3a..8b729b1b1f87316e940d7fc657f235a935ffa93e
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -538,8 +538,8 @@ static const struct tune_params generic_tunings =
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
   8,   /* function_align.  */
-  8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  4,   /* jump_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
    

Re: [PATCH][ARM] Update max_cond_insns settings

2017-04-20 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 12 April 2017 14:02
To: GCC Patches
Cc: nd; Kyrylo Tkachov
Subject: [PATCH][ARM] Update max_cond_insns settings
    
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  

    * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
    (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   _default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,

Re: [PATCH][AArch64] Improve address cost for -mcpu=generic

2017-04-20 Thread Wilco Dijkstra
ping

From: Wilco Dijkstra
Sent: 12 April 2017 14:08
To: GCC Patches
Cc: nd; James Greenhalgh; Evandro Menezes; jim.wil...@linaro.org; 
andrew.pin...@cavium.com
Subject: [PATCH][AArch64] Improve address cost for -mcpu=generic
    
All cores which add a cpu_addrcost_table use a non-zero value for
HI and TI mode shifts (a non-zero value for general indexing also 
applies to all shifts).  Given this, it makes no sense to use a
different setting in generic_addrcost_table.  So change it so that all
supported cores, including -mcpu=generic, now generate the same:

int f(short *p, short *q, long x) { return p[x] + q[x]; }

    lsl x2, x2, 1
    ldrsh   w3, [x0, x2]
    ldrsh   w0, [x1, x2]
    add w0, w3, w0
    ret

Bootstrapped for AArch64. Any comments? OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  

    * gcc/config/aarch64/aarch64.c (generic_addrcost_table):
    Change HI/TI mode setting.

---
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
419b756efcb40e48880cd4529efc4f9f59938325..728ce7029f1e2b5161d9f317d10e564dd5a5f472
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -193,10 +193,10 @@ static const struct aarch64_flag_desc 
aarch64_tuning_flags[] =
 static const struct cpu_addrcost_table generic_addrcost_table =
 {
 {
-  0, /* hi  */
+  1, /* hi  */
   0, /* si  */
   0, /* di  */
-  0, /* ti  */
+  1, /* ti  */
 },
   0, /* pre_modify  */
   0, /* post_modify  */



Re: [PATCH] PR c++/80473 allow suppressing notes about over-aligned new

2017-04-20 Thread Marek Polacek
On Thu, Apr 20, 2017 at 05:22:00PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 20, 2017 at 04:09:20PM +0100, Jonathan Wakely wrote:
> > --- a/gcc/cp/init.c
> > +++ b/gcc/cp/init.c
> > @@ -3128,11 +3128,14 @@ build_new_1 (vec **placement, tree 
> > type, tree nelts,
> >  {
> >warning (OPT_Waligned_new_, "% of type %qT with extended "
> >"alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
> > -  inform (input_location, "uses %qD, which does not have an alignment "
> > - "parameter", alloc_fn);
> > -  if (!aligned_new_threshold)
> > -   inform (input_location, "use %<-faligned-new%> to enable C++17 "
> > -   "over-aligned new support");
> > +  if (diagnostic_report_warnings_p (global_dc, input_location))
> > +   {
> > + inform (input_location, "uses %qD, which does not have an alignment "
> > + "parameter", alloc_fn);
> > + if (!aligned_new_threshold)
> > +   inform (input_location, "use %<-faligned-new%> to enable C++17 "
> > +   "over-aligned new support");
> > +   }
> 
> This looks weird.  I'd expect instead:
>   if (warning (OPT_Waligned_new_, "% of type %qT with extended "
>  "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type)))
>   {
> inform (input_location, "uses %qD, which does not have an alignment "
> "parameter", alloc_fn);
> if (!aligned_new_threshold)
>   inform (input_location, "use %<-faligned-new%> to enable C++17 "
>   "over-aligned new support");
>   }
> That is a standard idiom used if some inform or later warning/error depends
> on whether earlier warning/error has been diagnosed.
 
Yes.

> If that works, this is ok for trunk and 7.1 (we don't have a rc1 yet, it is
> ok now).

One more thing, the test passes even without the patch.  Did you mean to add
// { dg-options "-Wall -w" }
?

Marek


Re: [PATCH] PR c++/80473 allow suppressing notes about over-aligned new

2017-04-20 Thread Jonathan Wakely

On 20/04/17 17:22 +0200, Jakub Jelinek wrote:

On Thu, Apr 20, 2017 at 04:09:20PM +0100, Jonathan Wakely wrote:

--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3128,11 +3128,14 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
 {
   warning (OPT_Waligned_new_, "% of type %qT with extended "
   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
-  inform (input_location, "uses %qD, which does not have an alignment "
- "parameter", alloc_fn);
-  if (!aligned_new_threshold)
-   inform (input_location, "use %<-faligned-new%> to enable C++17 "
-   "over-aligned new support");
+  if (diagnostic_report_warnings_p (global_dc, input_location))
+   {
+ inform (input_location, "uses %qD, which does not have an alignment "
+ "parameter", alloc_fn);
+ if (!aligned_new_threshold)
+   inform (input_location, "use %<-faligned-new%> to enable C++17 "
+   "over-aligned new support");
+   }


This looks weird.  I'd expect instead:
 if (warning (OPT_Waligned_new_, "% of type %qT with extended "
   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type)))
{
  inform (input_location, "uses %qD, which does not have an alignment "
  "parameter", alloc_fn);
  if (!aligned_new_threshold)
inform (input_location, "use %<-faligned-new%> to enable C++17 "
"over-aligned new support");
}
That is a standard idiom used if some inform or later warning/error depends
on whether earlier warning/error has been diagnosed.


Aha, thanks.


If that works, this is ok for trunk and 7.1 (we don't have a rc1 yet, it is
ok now).


OK, I'll test it now.




Re: [PATCH] PR c++/80473 allow suppressing notes about over-aligned new

2017-04-20 Thread Jakub Jelinek
On Thu, Apr 20, 2017 at 04:09:20PM +0100, Jonathan Wakely wrote:
> --- a/gcc/cp/init.c
> +++ b/gcc/cp/init.c
> @@ -3128,11 +3128,14 @@ build_new_1 (vec **placement, tree type, 
> tree nelts,
>  {
>warning (OPT_Waligned_new_, "% of type %qT with extended "
>  "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
> -  inform (input_location, "uses %qD, which does not have an alignment "
> -   "parameter", alloc_fn);
> -  if (!aligned_new_threshold)
> - inform (input_location, "use %<-faligned-new%> to enable C++17 "
> - "over-aligned new support");
> +  if (diagnostic_report_warnings_p (global_dc, input_location))
> + {
> +   inform (input_location, "uses %qD, which does not have an alignment "
> +   "parameter", alloc_fn);
> +   if (!aligned_new_threshold)
> + inform (input_location, "use %<-faligned-new%> to enable C++17 "
> + "over-aligned new support");
> + }

This looks weird.  I'd expect instead:
  if (warning (OPT_Waligned_new_, "% of type %qT with extended "
   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type)))
{
  inform (input_location, "uses %qD, which does not have an alignment "
  "parameter", alloc_fn);
  if (!aligned_new_threshold)
inform (input_location, "use %<-faligned-new%> to enable C++17 "
"over-aligned new support");
}
That is a standard idiom used if some inform or later warning/error depends
on whether earlier warning/error has been diagnosed.

If that works, this is ok for trunk and 7.1 (we don't have a rc1 yet, it is
ok now).

Jakub


Re: [PATCH 2/5] omp-low: implement SIMT privatization, part 1

2017-04-20 Thread Jakub Jelinek
On Fri, Mar 31, 2017 at 06:38:09PM +0300, Alexander Monakov wrote:
> I've noticed while re-reading that this patch incorrectly handled SIMT case
> in lower_lastprivate_clauses.  The code was changed to look for variables
> with "omp simt private" attribute, and was left under
> 'simduid && DECL_HAS_VALUE_EXPR_P (new_var)' condition.  This effectively
> constrained processing to privatized (i.e. addressable) variables, as
> non-addressable variables now have neither the value-expr nor the attribute.

Sorry for the review delay.

> This wasn't caught in testing, as apparently all testcases that have target
> simd loops with linear/lastprivate clauses also have the corresponding 
> variables
> mentioned in target map clause, which makes them addressable (is that 
> necessary?),

Yes, in order to map something you need to map its address (+ size) on the
host to its address on the device, so it needs to be addressable.
Compared to that, firstprivate on target should not make it addressable.

> and I didn't think to check something like that manually.
> 
> The following patch fixes it by adjusting the if's in 
> lower_lastprivate_clauses;
> alternatively it may be possible to keep that code as-is, and instead set up
> value-expr and "omp simt private" attributes for all formally private 
> variables,
> not just addressable ones, but to me that sounds less preferable.  OK for 
> trunk?
> 
> I think it's possible to improve test coverage for NVPTX by running all OpenMP
> testcases via nvptx-none-run (it'll need some changes, but shouldn't be hard).
> 
> gcc/
>   * omp-low.c (lower_lastprivate_clauses): Correct handling of linear and
>   lastprivate clauses in SIMT case.
> 
> libgomp/
>   * testsuite/libgomp.c/target-36.c: New testcase.

Ok for trunk/gcc-7-branch, thanks.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-36.c
> @@ -0,0 +1,18 @@
> +int
> +main ()
> +{
> +  int ah, bh, n = 1024;
> +#pragma omp target map(from: ah, bh)
> +  {
> +int a, b;
> +#pragma omp simd lastprivate(b)
> +for (a = 0; a < n; a++)
> +  {
> + b = a + n + 1;
> + asm volatile ("" : "+r"(b));
> +  }
> +ah = a, bh = b;
> +  }
> +  if (ah != n || bh != 2 * n)
> +__builtin_abort ();
> +}

Would be nice to also test explicit linear, perhaps in the same testcase,
just add ch and c and use say linear(c:2).

Jakub


Re: [PATCH, libstdc++] PR libstdc++/68397 std::tr1::expint fails ... long double arguments.

2017-04-20 Thread Jonathan Wakely

On 20/04/17 10:58 -0400, Ed Smith-Rowland wrote:

Here is a patch for

 PR libstdc++/68397 std::tr1::expint fails in __expint_En_cont_frac 
for some long double arguments due to low __max_iter value.


The proposed resolution of increasing the max_iter to 1000 is a simple 
and obvious fix.


Built and tested on x86_64-linux.

OK?


OK for trunk, thanks.



[PATCH] PR c++/80473 allow suppressing notes about over-aligned new

2017-04-20 Thread Jonathan Wakely

This suppresses the notes about over-aligned new when the
corresponding warning is suppressed, either by -w or because it's in a
system header (as in PR 80390, which I originally tried to fix in the
library).

OK for trunk? And gcc-7-branch after 7.1 is released?

Is this the right place for the new test?


gcc/cp:

PR c++/80473
* init.c (build_new_1): Suppress notes about over-aligned new when
the warning is suppressed.

gcc/testsuite:

PR c++/80473
* init.c (build_new_1): Suppress notes in system headers.
PR c++/80473
* g++.dg/diagnostic/pr80473.C: New test.
commit 66697a6551dd13715c2156ac6b2d5969010c12db
Author: Jonathan Wakely 
Date:   Thu Apr 20 15:24:09 2017 +0100

PR c++/80473 allow suppressing notes about over-aligned new

gcc/cp:

	PR c++/80473
	* init.c (build_new_1): Suppress notes about over-aligned new when
	the warning is suppressed.

gcc/testsuite:

	PR c++/80473
	* g++.dg/diagnostic/pr80473.C: New test.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index bfa9020..2e72451 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3128,11 +3128,14 @@ build_new_1 (vec **placement, tree type, tree nelts,
 {
   warning (OPT_Waligned_new_, "% of type %qT with extended "
 	   "alignment %d", elt_type, TYPE_ALIGN_UNIT (elt_type));
-  inform (input_location, "uses %qD, which does not have an alignment "
-	  "parameter", alloc_fn);
-  if (!aligned_new_threshold)
-	inform (input_location, "use %<-faligned-new%> to enable C++17 "
-"over-aligned new support");
+  if (diagnostic_report_warnings_p (global_dc, input_location))
+	{
+	  inform (input_location, "uses %qD, which does not have an alignment "
+		  "parameter", alloc_fn);
+	  if (!aligned_new_threshold)
+	inform (input_location, "use %<-faligned-new%> to enable C++17 "
+"over-aligned new support");
+	}
 }
 
   /* If we found a simple case of PLACEMENT_EXPR above, then copy it
diff --git a/gcc/testsuite/g++.dg/diagnostic/pr80473.C b/gcc/testsuite/g++.dg/diagnostic/pr80473.C
new file mode 100644
index 000..6a739f5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/pr80473.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+// { dg-bogus "over-aligned new" "PR c++/80473" { target *-*-* } 0 }
+template T&& declval();
+
+template
+struct is_constructible { enum { value = 0 }; };
+
+template
+struct is_constructible
+{ enum { value = 1 }; };
+
+struct alignas(64) A { int i; };
+
+constexpr bool b = is_constructible::value;


[PATCH, libstdc++] PR libstdc++/68397 std::tr1::expint fails ... long double arguments.

2017-04-20 Thread Ed Smith-Rowland

Here is a patch for

  PR libstdc++/68397 std::tr1::expint fails in __expint_En_cont_frac 
for some long double arguments due to low __max_iter value.


The proposed resolution of increasing the max_iter to 1000 is a simple 
and obvious fix.


Built and tested on x86_64-linux.

OK?

Ed


2017-04-20  Edward Smith-Rowland  <3dw...@verizon.net>

PR libstdc++/68397 std::tr1::expint fails ... long double arguments.
* include/tr1/exp_integral.tcc: Increase iteration limits.
* testsuite/tr1/5_numerical_facilities/special_functions/15_expint/
pr68397.cc: New test.
* testsuite/special_functions/14_expint/pr68397.cc: New test.
Index: include/tr1/exp_integral.tcc
===
--- include/tr1/exp_integral.tcc(revision 246797)
+++ include/tr1/exp_integral.tcc(working copy)
@@ -86,7 +86,7 @@
   _Tp __term = _Tp(1);
   _Tp __esum = _Tp(0);
   _Tp __osum = _Tp(0);
-  const unsigned int __max_iter = 100;
+  const unsigned int __max_iter = 1000;
   for (unsigned int __i = 1; __i < __max_iter; ++__i)
 {
   __term *= - __x / __i;
@@ -156,7 +156,7 @@
 _Tp
 __expint_En_series(unsigned int __n, _Tp __x)
 {
-  const unsigned int __max_iter = 100;
+  const unsigned int __max_iter = 1000;
   const _Tp __eps = std::numeric_limits<_Tp>::epsilon();
   const int __nm1 = __n - 1;
   _Tp __ans = (__nm1 != 0
@@ -202,7 +202,7 @@
 _Tp
 __expint_En_cont_frac(unsigned int __n, _Tp __x)
 {
-  const unsigned int __max_iter = 100;
+  const unsigned int __max_iter = 1000;
   const _Tp __eps = std::numeric_limits<_Tp>::epsilon();
   const _Tp __fp_min = std::numeric_limits<_Tp>::min();
   const int __nm1 = __n - 1;
Index: 
testsuite/tr1/5_numerical_facilities/special_functions/15_expint/pr68397.cc
===
--- testsuite/tr1/5_numerical_facilities/special_functions/15_expint/pr68397.cc 
(nonexistent)
+++ testsuite/tr1/5_numerical_facilities/special_functions/15_expint/pr68397.cc 
(working copy)
@@ -0,0 +1,46 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// PR libstdc++/68397 -  std::tr1::expint fails in __expint_En_cont_frac
+// for some long double arguments due to low __max_iter value
+
+#include 
+#include 
+
+void
+test01()
+{
+  // Answers from Wolfram Alpha.
+  long double ans_ok = -0.10001943365331651406888645149537315243646135979573L;
+  long double ans_bomb = 
-0.1027809650077516264612749163100483995270163783L;
+
+  long double Ei_ok = std::tr1::expint(-1.51L);
+  long double diff_ok = std::abs(Ei_ok - ans_ok);
+  VERIFY(diff_ok < 1.0e-15L);
+
+  long double Ei_bomb = std::tr1::expint(-1.450001L);
+  long double diff_bomb = std::abs(Ei_bomb - ans_bomb);
+  VERIFY(diff_bomb < 1.0e-15L);
+}
+
+int
+main()
+{
+  test01();
+  return 0;
+}
+
Index: testsuite/special_functions/14_expint/pr68397.cc
===
--- testsuite/special_functions/14_expint/pr68397.cc(nonexistent)
+++ testsuite/special_functions/14_expint/pr68397.cc(working copy)
@@ -0,0 +1,47 @@
+// { dg-do run { target c++11 } }
+// { dg-options "-D__STDCPP_WANT_MATH_SPEC_FUNCS__" }
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// PR libstdc++/68397 -  std::tr1::expint fails in __expint_En_cont_frac
+// for some long double arguments due to low __max_iter value
+
+#include 
+#include 
+
+int

[PATCH] Fix PR80453

2017-04-20 Thread Richard Biener

The following fixes PR80453.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk 
sofar.

Richard.

2017-04-20  Richard Biener  

PR tree-optimization/80453
* tree-ssa-sccvn.h (struct vn_phi_s): Add cclhs and ccrhs members.
* tree-ssa-sccvn.c (cond_stmts_equal_p): Use recorded lhs and rhs
from the conditions.
(vn_phi_eq): Pass them down.
(vn_phi_lookup): Record them.
(vn_phi_insert): Likewise.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 246964)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2916,14 +2916,11 @@ vn_phi_compute_hash (vn_phi_t vp1)
the other.  */
 
 static bool
-cond_stmts_equal_p (gcond *cond1, gcond *cond2, bool *inverted_p)
+cond_stmts_equal_p (gcond *cond1, tree lhs1, tree rhs1,
+   gcond *cond2, tree lhs2, tree rhs2, bool *inverted_p)
 {
   enum tree_code code1 = gimple_cond_code (cond1);
   enum tree_code code2 = gimple_cond_code (cond2);
-  tree lhs1 = gimple_cond_lhs (cond1);
-  tree lhs2 = gimple_cond_lhs (cond2);
-  tree rhs1 = gimple_cond_rhs (cond1);
-  tree rhs2 = gimple_cond_rhs (cond2);
 
   *inverted_p = false;
   if (code1 == code2)
@@ -2941,10 +2938,6 @@ cond_stmts_equal_p (gcond *cond1, gcond
   else
 return false;
 
-  lhs1 = vn_valueize (lhs1);
-  rhs1 = vn_valueize (rhs1);
-  lhs2 = vn_valueize (lhs2);
-  rhs2 = vn_valueize (rhs2);
   return ((expressions_equal_p (lhs1, lhs2)
   && expressions_equal_p (rhs1, rhs2))
  || (commutative_tree_code (code1)
@@ -3002,7 +2995,10 @@ vn_phi_eq (const_vn_phi_t const vp1, con
  return false;
bool inverted_p;
if (! cond_stmts_equal_p (as_a  (last1),
- as_a  (last2), _p))
+ vp1->cclhs, vp1->ccrhs,
+ as_a  (last2),
+ vp2->cclhs, vp2->ccrhs,
+ _p))
  return false;
 
/* Get at true/false controlled edges into the PHI.  */
@@ -3081,6 +3077,16 @@ vn_phi_lookup (gimple *phi)
   vp1.type = TREE_TYPE (gimple_phi_result (phi));
   vp1.phiargs = shared_lookup_phiargs;
   vp1.block = gimple_bb (phi);
+  /* Extract values of the controlling condition.  */
+  vp1.cclhs = NULL_TREE;
+  vp1.ccrhs = NULL_TREE;
+  basic_block idom1 = get_immediate_dominator (CDI_DOMINATORS, vp1.block);
+  if (EDGE_COUNT (idom1->succs) == 2)
+if (gcond *last1 = dyn_cast  (last_stmt (idom1)))
+  {
+   vp1.cclhs = vn_valueize (gimple_cond_lhs (last1));
+   vp1.ccrhs = vn_valueize (gimple_cond_rhs (last1));
+  }
   vp1.hashcode = vn_phi_compute_hash ();
   slot = current_info->phis->find_slot_with_hash (, vp1.hashcode,
  NO_INSERT);
@@ -3117,6 +3123,16 @@ vn_phi_insert (gimple *phi, tree result)
   vp1->type = TREE_TYPE (gimple_phi_result (phi));
   vp1->phiargs = args;
   vp1->block = gimple_bb (phi);
+  /* Extract values of the controlling condition.  */
+  vp1->cclhs = NULL_TREE;
+  vp1->ccrhs = NULL_TREE;
+  basic_block idom1 = get_immediate_dominator (CDI_DOMINATORS, vp1->block);
+  if (EDGE_COUNT (idom1->succs) == 2)
+if (gcond *last1 = dyn_cast  (last_stmt (idom1)))
+  {
+   vp1->cclhs = vn_valueize (gimple_cond_lhs (last1));
+   vp1->ccrhs = vn_valueize (gimple_cond_rhs (last1));
+  }
   vp1->result = result;
   vp1->hashcode = vn_phi_compute_hash (vp1);
 
Index: gcc/tree-ssa-sccvn.h
===
--- gcc/tree-ssa-sccvn.h(revision 246964)
+++ gcc/tree-ssa-sccvn.h(working copy)
@@ -67,6 +67,9 @@ typedef struct vn_phi_s
   hashval_t hashcode;
   vec phiargs;
   basic_block block;
+  /* Controlling condition lhs/rhs.  */
+  tree cclhs;
+  tree ccrhs;
   tree type;
   tree result;
 } *vn_phi_t;


RE: [PATCH] MIPS: Prevent buffer overrun in uninitialised variable fix

2017-04-20 Thread Matthew Fortune
Jakub Jelinek  writes:
> On Thu, Apr 20, 2017 at 12:49:49PM +, Matthew Fortune wrote:
> > Hi Jeff,
> >
> > I missed a load of test failures while on vacation and just noticed
> > that the fix you did for a potentially uninitialized variable warning
> > is overwriting the stack and breaking MSA in MIPS.
> >
> > I guess you may have intended to set the length appropriately and only
> > zero the elements that would not be set in the loop:
> >
> > memset (_perm[nelt], 0, MAX_VECT_LEN - nelt);
> >
> > but I switched it to just zero initialise the whole array for
> > simplicity in the patch below.
> >
> > I thought I'd check with you before committing. Obviously I'd like to
> > apply this to GCC 7 branch as well.
> 
> As it is just 16, it is not a big deal either way, memsetting everything
> might be even faster.  If it would be bigger than that, the MAX_VECT_LEN
> - nelt case would be better.
> 
> > gcc/
> > * config/mips/mips.c (mips_expand_vec_perm_const): Re-fix
> > uninitialized variable warning to avoid buffer overrun.
> 
> Ok, thanks, even for GCC 7 branch.

Thanks Jakub. Committed on trunk and gcc-7-branch.

Matthew


Re: [PATCH] Fix PR51513, switch statement with default case containing __builtin_unreachable leads to wild branch

2017-04-20 Thread Peter Bergner
On 4/20/17 2:37 AM, Richard Biener wrote:
> Ok, so I think we should ensure that we remove the regular cases
> with unreachable destination, like in
> 
> switch (i)
> {
> case 0:
>   __builtin_unreachable ();
> default:;
> }
> 
> and handle default with __builtin_unreachable () from switch
> expansion by omitting the range check for the jump table
> indexing (and choose a non-default case label for gaps).

Ok, I'll modify optimize_unreachable() to remove the unreachable
case statements, and leave the unreachable default case statement
for switch expansion so we can eliminate the range check.  Thanks.

Peter



[PATCH] PR79862 check macro for type aliases that depend on

2017-04-20 Thread Jonathan Wakely

The  atomic types shouldn't be declared if the underlying
types aren't available.

PR libstdc++/79862
* include/std/atomic [!_GLIBCXX_USE_C99_STDINT_TR1] (atomic_int8_t)
(atomic_uint8_t, atomic_int16_t, atomic_uint16_t, atomic_int32_t,
(atomic_uint32_t, atomic_int64_t, atomic_uint64_t)
(atomic_int_least8_t, atomic_uint_least8_t, atomic_int_least16_t)
(atomic_uint_least16_t, atomic_int_least32_t, atomic_uint_least32_t)
(atomic_int_least64_t, atomic_uint_least64_t, atomic_int_fast8_t)
(atomic_uint_fast8_t, atomic_int_fast16_t, atomic_uint_fast16_t)
(atomic_int_fast32_t, atomic_uint_fast32_t, atomic_int_fast64_t)
(atomic_uint_fast64_t, atomic_intmax_t, atomic_uintmax_t): Don't
define.

Tested x86_64-linux, committed to trunk.


commit 506f73f5ac492bc2e0745be303ffe946fe16758e
Author: Jonathan Wakely 
Date:   Tue Mar 7 11:55:38 2017 +

PR79862 check macro for type aliases that depend on 

PR libstdc++/79862
* include/std/atomic [!_GLIBCXX_USE_C99_STDINT_TR1] (atomic_int8_t)
(atomic_uint8_t, atomic_int16_t, atomic_uint16_t, atomic_int32_t,
(atomic_uint32_t, atomic_int64_t, atomic_uint64_t)
(atomic_int_least8_t, atomic_uint_least8_t, atomic_int_least16_t)
(atomic_uint_least16_t, atomic_int_least32_t, atomic_uint_least32_t)
(atomic_int_least64_t, atomic_uint_least64_t, atomic_int_fast8_t)
(atomic_uint_fast8_t, atomic_int_fast16_t, atomic_uint_fast16_t)
(atomic_int_fast32_t, atomic_uint_fast32_t, atomic_int_fast64_t)
(atomic_uint_fast64_t, atomic_intmax_t, atomic_uintmax_t): Don't
define.

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 5b252a4..4b583c17 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -909,7 +909,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// atomic_char32_t
   typedef atomic atomic_char32_t;
 
-
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2441. Exact-width atomic typedefs should be provided
 
@@ -986,6 +986,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// atomic_uint_fast64_t
   typedef atomicatomic_uint_fast64_t;
+#endif
 
 
   /// atomic_intptr_t
@@ -997,15 +998,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// atomic_size_t
   typedef atomic   atomic_size_t;
 
+  /// atomic_ptrdiff_t
+  typedef atomicatomic_ptrdiff_t;
+
+#ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /// atomic_intmax_t
   typedef atomic atomic_intmax_t;
 
   /// atomic_uintmax_t
   typedef atomicatomic_uintmax_t;
-
-  /// atomic_ptrdiff_t
-  typedef atomicatomic_ptrdiff_t;
-
+#endif
 
   // Function definitions, atomic_flag operations.
   inline bool


Re: [PATCH] Add test-case (PR tree-optimization/66278).

2017-04-20 Thread Bin.Cheng
On Thu, Apr 20, 2017 at 9:35 AM, Martin Liška  wrote:
> Hello.
>
> The patch adds a new test-case for the mentioned PR. Tested on 
> x86_64-linux-gnu
> and ppc64le-linux-gnu.
>
> Ready for trunk or should I postpone it for next stage1?
Though can't approve, I think it's ok since we are in stage 1 now,
also adding test is unlikely to affect RCs.
Thanks,
bin
> Martin


Re: [PR 80293] Don't totally-sRA char arrays

2017-04-20 Thread Martin Jambor
Hi,

On Mon, Apr 17, 2017 at 09:35:18PM +0200, Martin Jambor wrote:
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  
> > wrote:

...

> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of 
> > non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.
> 

To gather some data for this, I have looked at SRA statistics gathered
for SPEC 2006 benchmarks.  Complete tables are below, let me comment
on the diffs, though.  The most interesting column is probably the
third one (the second one with numbers).

This is the difference between my patch that special-cases char arrays
and disallowing any non-const-pool array total scalarization:

 | Benchmark - pass  | scalarized aggregates | Totally scalarized agregates 
| # scalar replacements | Modified exprs | Deleted stmts |
 
|---+---+--+---++---|
-| 403.gcc - esra|   138 |   12 
|   458 |   1274 |20 |
-| 403.gcc - sra |42 |   11 
|   141 |457 |13 |
+| 403.gcc - esra|   132 |4 
|   370 |   1274 |13 |
+| 403.gcc - sra |43 |8 
|   140 |459 |12 |
-| 447.dealII - esra | 12899 | 7826 
| 16347 |  29046 |  1840 |
-| 447.dealII - sra  |  1083 |  567 
|  2298 |   5479 |   392 |
+| 447.dealII - esra | 12897 | 7790 
| 16329 |  29046 |  1838 |
+| 447.dealII - sra  |  1079 |  553 
|  2262 |   5479 |   388 |
-| 450.soplex - sra  |42 |2 
|87 |263 | 2 |
+| 450.soplex - sra  |42 |0 
|81 |263 | 2 |
-| 453.povray - esra |   265 |   10 
|   812 |   3422 |12 |
-| 453.povray - sra  |76 |6 
|   217 |786 |57 |
+| 453.povray - esra |   264 |8 
|   806 |   3422 |11 |
+| 453.povray - sra  |76 |5 
|   215 |786 |57 |
-| 465.tonto - esra  |  4029 |5 
|  8269 |  21351 | 5 |
+| 465.tonto - esra  |  4024 |0 
|  8233 |  21351 | 0 |

For reference, this is the difference between (2 week old) trunk and
my proposed patch:

 | Benchmark - pass  | scalarized aggregates | Totally scalarized agregates 
| # scalar replacements | Modified exprs | Deleted stmts |
 
|---+---+--+---++---|
-| 401.bzip2 - sra   | 3 |2 
|22 | 62 | 0 |
+| 401.bzip2 - sra   | 3 |0 
|22 | 62 | 0 |
-| 403.gcc - esra|   138 |   13 
|   458 |   1274 |20 |
-| 403.gcc - sra |42 |   12 
|   141 |457 |13 |
+| 403.gcc - esra|   138 |   12 
|   458 |   1274 |20 |
+| 403.gcc - sra |42 |   

Re: [PATCH] MIPS: Prevent buffer overrun in uninitialised variable fix

2017-04-20 Thread Jakub Jelinek
On Thu, Apr 20, 2017 at 12:49:49PM +, Matthew Fortune wrote:
> Hi Jeff,
> 
> I missed a load of test failures while on vacation and just noticed
> that the fix you did for a potentially uninitialized variable warning
> is overwriting the stack and breaking MSA in MIPS.
> 
> I guess you may have intended to set the length appropriately and
> only zero the elements that would not be set in the loop:
> 
> memset (_perm[nelt], 0, MAX_VECT_LEN - nelt);
> 
> but I switched it to just zero initialise the whole array for
> simplicity in the patch below.
> 
> I thought I'd check with you before committing. Obviously I'd like to
> apply this to GCC 7 branch as well.

As it is just 16, it is not a big deal either way, memsetting everything
might be even faster.  If it would be bigger than that, the MAX_VECT_LEN -
nelt case would be better.

> gcc/
>   * config/mips/mips.c (mips_expand_vec_perm_const): Re-fix
>   uninitialized variable warning to avoid buffer overrun.

Ok, thanks, even for GCC 7 branch.

Jakub


[PATCH] MIPS: Prevent buffer overrun in uninitialised variable fix

2017-04-20 Thread Matthew Fortune
Hi Jeff,

I missed a load of test failures while on vacation and just noticed
that the fix you did for a potentially uninitialized variable warning
is overwriting the stack and breaking MSA in MIPS.

I guess you may have intended to set the length appropriately and
only zero the elements that would not be set in the loop:

memset (_perm[nelt], 0, MAX_VECT_LEN - nelt);

but I switched it to just zero initialise the whole array for
simplicity in the patch below.

I thought I'd check with you before committing. Obviously I'd like to
apply this to GCC 7 branch as well.

Thanks,
Matthew

gcc/
* config/mips/mips.c (mips_expand_vec_perm_const): Re-fix
uninitialized variable warning to avoid buffer overrun.
---
 gcc/ChangeLog  | 5 +
 gcc/config/mips/mips.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c908048..80d3436 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-04-20  Matthew Fortune  
+
+   * config/mips/mips.c (mips_expand_vec_perm_const): Re-fix
+   uninitialized variable warning to avoid buffer overrun.
+
 2017-04-20  Alexander Monakov  
 
PR other/71250
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index b35fba7..6bfd86a 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -21358,7 +21358,7 @@ mips_expand_vec_perm_const (rtx operands[4])
 
   /* This is overly conservative, but ensures we don't get an
  uninitialized warning on ORIG_PERM.  */
-  memset (_perm[nelt], 0, MAX_VECT_LEN);
+  memset (orig_perm, 0, MAX_VECT_LEN);
   for (i = which = 0; i < nelt; ++i)
 {
   rtx e = XVECEXP (sel, 0, i);
-- 
2.2.1



Re: [PATCH] omp-low: fix lastprivate/linear lowering for SIMT

2017-04-20 Thread Alexander Monakov
Ping - as this patch addresses a wrong-code issue in new functionality, I'd like
to ask if it may be applied to gcc-7 branch too.

On Fri, 7 Apr 2017, Alexander Monakov wrote:

> Ping.
> 
> > I've noticed while re-reading that this patch incorrectly handled SIMT case
> > in lower_lastprivate_clauses.  The code was changed to look for variables
> > with "omp simt private" attribute, and was left under
> > 'simduid && DECL_HAS_VALUE_EXPR_P (new_var)' condition.  This effectively
> > constrained processing to privatized (i.e. addressable) variables, as
> > non-addressable variables now have neither the value-expr nor the attribute.
> > 
> > This wasn't caught in testing, as apparently all testcases that have target
> > simd loops with linear/lastprivate clauses also have the corresponding 
> > variables
> > mentioned in target map clause, which makes them addressable (is that 
> > necessary?),
> > and I didn't think to check something like that manually.
> > 
> > The following patch fixes it by adjusting the if's in 
> > lower_lastprivate_clauses;
> > alternatively it may be possible to keep that code as-is, and instead set up
> > value-expr and "omp simt private" attributes for all formally private 
> > variables,
> > not just addressable ones, but to me that sounds less preferable.  OK for 
> > trunk?
> > 
> > I think it's possible to improve test coverage for NVPTX by running all 
> > OpenMP
> > testcases via nvptx-none-run (it'll need some changes, but shouldn't be 
> > hard).
> > 
> > gcc/
> > * omp-low.c (lower_lastprivate_clauses): Correct handling of linear and
> > lastprivate clauses in SIMT case.
> > 
> > libgomp/
> > * testsuite/libgomp.c/target-36.c: New testcase.
> > 
> > Thanks.
> > Alexander
> > 
> > diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> > index 253dc85..02b681c 100644
> > --- a/gcc/omp-low.c
> > +++ b/gcc/omp-low.c
> > @@ -4768,11 +4768,10 @@ lower_lastprivate_clauses (tree clauses, tree 
> > predicate, gimple_seq *stmt_list,
> > TREE_NO_WARNING (new_var) = 1;
> > }
> >  
> > - if (simduid && DECL_HAS_VALUE_EXPR_P (new_var))
> > + if (!maybe_simt && simduid && DECL_HAS_VALUE_EXPR_P (new_var))
> > {
> >   tree val = DECL_VALUE_EXPR (new_var);
> > - if (!maybe_simt
> > - && TREE_CODE (val) == ARRAY_REF
> > + if (TREE_CODE (val) == ARRAY_REF
> >   && VAR_P (TREE_OPERAND (val, 0))
> >   && lookup_attribute ("omp simd array",
> >DECL_ATTRIBUTES (TREE_OPERAND (val,
> > @@ -4792,26 +4791,26 @@ lower_lastprivate_clauses (tree clauses, tree 
> > predicate, gimple_seq *stmt_list,
> > TREE_OPERAND (val, 0), lastlane,
> > NULL_TREE, NULL_TREE);
> > }
> > - else if (maybe_simt
> > -  && VAR_P (val)
> > -  && lookup_attribute ("omp simt private",
> > -   DECL_ATTRIBUTES (val)))
> > +   }
> > + else if (maybe_simt)
> > +   {
> > + tree val = (DECL_HAS_VALUE_EXPR_P (new_var)
> > + ? DECL_VALUE_EXPR (new_var)
> > + : new_var);
> > + if (simtlast == NULL)
> > {
> > - if (simtlast == NULL)
> > -   {
> > - simtlast = create_tmp_var (unsigned_type_node);
> > - gcall *g = gimple_build_call_internal
> > -   (IFN_GOMP_SIMT_LAST_LANE, 1, simtcond);
> > - gimple_call_set_lhs (g, simtlast);
> > - gimple_seq_add_stmt (stmt_list, g);
> > -   }
> > - x = build_call_expr_internal_loc
> > -   (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_IDX,
> > -TREE_TYPE (val), 2, val, simtlast);
> > - new_var = unshare_expr (new_var);
> > - gimplify_assign (new_var, x, stmt_list);
> > - new_var = unshare_expr (new_var);
> > + simtlast = create_tmp_var (unsigned_type_node);
> > + gcall *g = gimple_build_call_internal
> > +   (IFN_GOMP_SIMT_LAST_LANE, 1, simtcond);
> > + gimple_call_set_lhs (g, simtlast);
> > + gimple_seq_add_stmt (stmt_list, g);
> > }
> > + x = build_call_expr_internal_loc
> > +   (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_IDX,
> > +TREE_TYPE (val), 2, val, simtlast);
> > + new_var = unshare_expr (new_var);
> > + gimplify_assign (new_var, x, stmt_list);
> > + new_var = unshare_expr (new_var);
> > }
> >  
> >   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE
> > diff --git a/libgomp/testsuite/libgomp.c/target-36.c 
> > b/libgomp/testsuite/libgomp.c/target-36.c
> > new file mode 100644
> > index 000..6925a2a
> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-36.c
> > @@ -0,0 +1,18 @@
> > +int
> > +main ()
> > +{
> > 

Re: [CHKP] Fix for PR79988

2017-04-20 Thread Rainer Orth
Hi Alexander,

just a couple of nits:

> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/mpx/PR79988.c: New test.

We usually don't use capital PR in testcase names.  Please use pr79988.c
instead, matching the other files there.

Also, both ChangeLog entries should include a PR reference like so:

PR middle-end/79988
* gcc.target/i386/mpx/pr79988.c: New test.

so the commit messages are automatically forwarded to bugzilla.

> gcc/ChangeLog:
> 
> * tree-chkp.c (chkp_gimple_call_builtin_p):
> Remove gimple_call_builtin_p call to avoid the call
> of gimple_builtin_call_types_compatible_p. this will
> strip the checks for address spaces, which can be skipped
> without loosing the functionality

ChangeLog entries only describe *what* changed, not *why*.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[CHKP] Fix for PR79988

2017-04-20 Thread Alexander Ivchenko
Hi Ilya,

The patch below is basically what Richard proposed in Bugzilla. This
approach should produce the correct code for any address spaces,
because it will just strip the address space part of the pointer in
all bnd* instructions; Since we will do that consistently, all checks
should be consistent and correct. What do you think?

gcc/testsuite/ChangeLog:

* gcc.target/i386/mpx/PR79988.c: New test.

gcc/ChangeLog:

* tree-chkp.c (chkp_gimple_call_builtin_p):
Remove gimple_call_builtin_p call to avoid the call
of gimple_builtin_call_types_compatible_p. this will
strip the checks for address spaces, which can be skipped
without loosing the functionality


diff --git a/gcc/testsuite/gcc.target/i386/mpx/PR79988.c
b/gcc/testsuite/gcc.target/i386/mpx/PR79988.c
new file mode 100644
index 000..a6e43eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/PR79988.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+void foo(unsigned char * __seg_gs *pointer_gs) {
+pointer_gs[5] = 0;
+}
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index b1ff218..1f7184d 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -433,7 +433,9 @@ chkp_gimple_call_builtin_p (gimple *call,
 enum built_in_function code)
 {
   tree fndecl;
-  if (gimple_call_builtin_p (call, BUILT_IN_MD)
+  if (is_gimple_call (call)
+  && (fndecl = gimple_call_fndecl (call)) != NULL
+  && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD
   && (fndecl = targetm.builtin_chkp_function (code))
   && (DECL_FUNCTION_CODE (gimple_call_fndecl (call))
   == DECL_FUNCTION_CODE (fndecl)))


[PATCH][STAGE1] Add prefix to test verification in guality.h

2017-04-20 Thread Martin Liška
Hello.

Currently tests that include guality.h header file and us the fancy macros
for verification produce output like:

FAIL: o is -1, not 268566560
FAIL: w is -1, not 268566624
FAIL: w is -1, not 268566624
FAIL: o is -1, not 268566576
FAIL: w is -1, not 268566624
FAIL: w is -1, not 268566624
FAIL: o is -1, not 268566592

which is quite unpleasant to find a name of test which is responsible for that.
Thus, I'm suggesting to add prefix for the messages coming from the header file
to:

FAIL: guality/guality.h: w is -1, not 6299888
FAIL: guality/guality.h: w is -1, not 6299888
FAIL: guality/guality.h: w is -1, not 6299888
FAIL: guality/guality.h: w is -1, not 6299888
FAIL: guality/guality.h: ret is -1, not 6299872
FAIL: guality/guality.h: e is -1, not 6299888
FAIL: guality/guality.h: e is -1, not 6299888
FAIL: guality/guality.h: e is -1, not 6299888

Ready to be installed?
Thanks,
Martin
>From 4895528a00883afb843f1e493388489472ed7145 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 20 Apr 2017 13:27:38 +0200
Subject: [PATCH] Add prefix to test verification in guality.h

gcc/testsuite/ChangeLog:

2017-04-20  Martin Liska  

	* gcc.dg/guality/guality.h: Add prefix to test verification.
---
 gcc/testsuite/gcc.dg/guality/guality.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/guality.h b/gcc/testsuite/gcc.dg/guality/guality.h
index d5867d8ba96..94722510ff0 100644
--- a/gcc/testsuite/gcc.dg/guality/guality.h
+++ b/gcc/testsuite/gcc.dg/guality/guality.h
@@ -55,6 +55,8 @@ along with GCC; see the file COPYING3.  If not see
so that __FILE__ and __LINE__ will be usable to identify them.
 */
 
+#define GUALITY_TEST "guality/guality.h"
+
 /* This is the type we use to pass values to guality_check.  */
 
 typedef intmax_t gualchk_t;
@@ -274,7 +276,7 @@ continue\n\
 
   i = guality_count[INCORRECT];
 
-  fprintf (stderr, "%s: %i PASS, %i FAIL, %i UNRESOLVED\n",
+  fprintf (stderr, "%s: " GUALITY_TEST  ": %i PASS, %i FAIL, %i UNRESOLVED\n",
 	   i ? "FAIL" : "PASS",
 	   guality_count[PASS], guality_count[INCORRECT],
 	   guality_count[INCOMPLETE]);
@@ -361,13 +363,13 @@ continue\n\
 switch (result)
   {
   case PASS:
-	fprintf (stderr, "PASS: %s is %lli\n", name, value);
+	fprintf (stderr, "PASS: " GUALITY_TEST ": %s is %lli\n", name, value);
 	break;
   case INCORRECT:
-	fprintf (stderr, "FAIL: %s is %lli, not %lli\n", name, xvalue, value);
+	fprintf (stderr, "FAIL: " GUALITY_TEST ": %s is %lli, not %lli\n", name, xvalue, value);
 	break;
   case INCOMPLETE:
-	fprintf (stderr, "%s: %s is %s, expected %lli\n",
+	fprintf (stderr, "%s: " GUALITY_TEST ": %s is %s, expected %lli\n",
 		 unknown_ok ? "UNRESOLVED" : "FAIL", name,
 		 unavailable < 0 ? "not computable" : "optimized away", value);
 	result = unknown_ok ? INCOMPLETE : INCORRECT;
-- 
2.12.2



Re: [PR 80293] Don't totally-sRA char arrays

2017-04-20 Thread Richard Biener
On Mon, 17 Apr 2017, Martin Jambor wrote:

> Hi,
> 
> On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote:
> > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  
> > wrote:
> > >Hi,
> > >
> > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> > >> On Wed, 12 Apr 2017, Martin Jambor wrote:
> > >> 
> > >> > Hi,
> > >> > 
> > >> > the patch below is an attempt to deal with PR 80293 as
> > >non-invasively
> > >> > as possible.  Basically, it switches off total SRA scalarization of
> > >> > any local aggregates which contains an array of elements that have
> > >one
> > >> > byte (or less).
> > >> > 
> > >> > The logic behind this is that accessing such arrays element-wise
> > >> > usually results in poor code and that such char arrays are often
> > >used
> > >> > for non-statically-typed content anyway, and we do not want to copy
> > >> > that byte per byte.
> > >> > 
> > >> > Alan, do you think this could impact your constant pool
> > >scalarization
> > >> > too severely?
> > >> 
> > >> Hmm, isn't one of the issues that we have
> > >> 
> > >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
> > >>   {
> > >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> > >> <= max_scalarization_size)
> > >>   {
> > >> create_total_scalarization_access (var);
> > >> 
> > >> which limits the size of scalarizable vars but not the number
> > >> of accesses we create for total scalarization?
> > >
> > >Well, yes, but at least I understood from your comment #4 in the bug
> > >that you did not want to limit the number of replacements for gcc 7?
> > >
> > >> 
> > >> Is scalarizable_type_p only used in contexts where we have no hint
> > >> of the actual accesses?
> > >
> > >There are should_scalarize_away_bitmap and
> > >cannot_scalarize_away_bitmap hints.
> > >
> > >Total scalarization is a hack process where we chop up small
> > >aggregates according to their types - as opposed to actual accesses,
> > >meaning it is an alien process to the rest of SRA - knowing that they
> > >will go completely away afterwards (that is ensured by
> > >cannot_scalarize_away_bitmap) and that this will facilitate copy
> > >propagation (this is indicated by should_scalarize_away_bitmap, which
> > >has a bit set if there is a non-scalar assignment _from_ (a part of)
> > >the aggregate).
> > 
> > OK, but for the copy x = y where x would go away it still depends on uses 
> > of x what pieces we actually want?  Or is total scalarization only done for 
> > x  = y; z = x;?
> > Thus no further accesses to x?
> 
> Total scalarization adds artificial accesses only to y, but
> (in both cases of total and "natural" scalarization) for all aggregate
> assignments between SRA candidates, SRA attempts to recreate all
> accesses covering RHS to LHS.  Transitively.  So the artificial
> accesses created for y will then get created for x and z even if they
> would not be candidates for total scalarization.  So e.g. if z cannot
> go away because it is passed to a function, it will be loaded
> piece-wise from y.

So what I was trying to figure out is whether there is anything that
fores us to totally scalarize using "natural" elements rather than
using larger accesses?  For

  x = y;
  z = x;

the best pieces to chop x to depends on the write accesses to y
and the read accesses from z (we eventually want to easily match them
up for CSE).

I see we first create total scalarization accesses and only then
doing propagate_all_subaccesses.

> > 
> > >> That is, for the constant pool case we
> > >> usually have
> > >> 
> > >>   x = .LC0;
> > >>   .. = x[2];
> > >> 
> > >> so we have a "hint" that accesses on x are those we'd want to
> > >> optimize to accesses to .LC0.
> > >
> > >You don't need total scalarization for this, just the existence of
> > >read from x[2] would be sufficient but thanks for pointing me to the
> > >right direction.  For constant pool decl scalarization, it is not
> > >important to introduce artificial accesses for x but for .LC0.
> > >Therefore, I have adapted the patch to allow char arrays for const
> > >decls only and verified that it scalarizes a const-pool array of chars
> > >on Aarch64.  The (otherwise yet untested) result is below.
> > >
> > >What do you think?
> > 
> > Why special case char arrays?  I'd simply disallow total scalarization of 
> > non-const arrays completely.
> 
> Well, currently we will get element-wise copy propagation for things
> like "int rgb[3];" (possibly embeded in a small struct).  If I remove
> it, someone will complain.  Maybe the correct limit is SI mode size or
> BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though.
> I just wanted to be conservative at this point in the release cycle.

Sure but this total scalarization was added to be able to defer
ctor "lowering" at gimplification time to SRA.  So it would be ok
(IMHO) to limit it to constant initializers.

But agreed 

[PATCH, GCC/ARM, Stage 1] PR71607: Fix ICE when loading constant

2017-04-20 Thread Prakhar Bahuguna
[ARM] PR71607: Fix ICE when loading constant

gcc/ChangeLog:

2017-04-18  Andre Vieira  
Prakhar Bahuguna  

PR target/71607
* config/arm/arm.md (use_literal_pool): Removes.
(64-bit immediate split): No longer takes cost into consideration
if 'arm_disable_literal_pool' is enabled.
* config/arm/arm.c (arm_tls_referenced_p): Add diagnostic if TLS is
used when arm_disable_literal_pool is enabled.
(arm_max_const_double_inline_cost): Remove use of
arm_disable_literal_pool.
(arm_reorg): Add return if arm_disable_literal_pool is enabled.
* config/arm/vfp.md (no_literal_pool_df_immediate): New.
(no_literal_pool_sf_immediate): New.

testsuite/ChangeLog:

2017-04-18  Andre Vieira  
Thomas Preud'homme  
Prakhar Bahuguna  

PR target/71607
* gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
* gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
* gcc.target/arm/thumb2-slow-flash-data-2.c: New.
* gcc.target/arm/thumb2-slow-flash-data-3.c: New.
* gcc.target/arm/thumb2-slow-flash-data-4.c: New.
* gcc.target/arm/thumb2-slow-flash-data-5.c: New.
* gcc.target/arm/tls-disable-literal-pool.c: New.

Okay for stage1?

--

Prakhar Bahuguna
>From 985100bbf8f168ab9a88ca29869453844eb6b58e Mon Sep 17 00:00:00 2001
From: Prakhar Bahuguna 
Date: Tue, 18 Apr 2017 14:16:46 +0100
Subject: [PATCH] [ARM] PR71607: Fix ICE when loading constant

---
 gcc/config/arm/arm.c   | 20 +---
 gcc/config/arm/arm.md  |  9 ++
 gcc/config/arm/vfp.md  | 37 ++
 ...low-flash-data.c => thumb2-slow-flash-data-1.c} |  0
 .../gcc.target/arm/thumb2-slow-flash-data-2.c  | 28 
 .../gcc.target/arm/thumb2-slow-flash-data-3.c  | 25 +++
 .../gcc.target/arm/thumb2-slow-flash-data-4.c  | 26 +++
 .../gcc.target/arm/thumb2-slow-flash-data-5.c  | 14 
 .../gcc.target/arm/tls-disable-literal-pool.c  | 15 +
 9 files changed, 163 insertions(+), 11 deletions(-)
 rename gcc/testsuite/gcc.target/arm/{thumb2-slow-flash-data.c => 
thumb2-slow-flash-data-1.c} (100%)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-4.c
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb2-slow-flash-data-5.c
 create mode 100644 gcc/testsuite/gcc.target/arm/tls-disable-literal-pool.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a2d80cfd645..01d8c52d8c5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8633,7 +8633,16 @@ arm_tls_referenced_p (rtx x)
 {
   const_rtx x = *iter;
   if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x) != 0)
-   return true;
+   {
+ /* ARM currently does not provide relocations to encode TLS variables
+into AArch32 instructions, only data, so there is no way to
+currently implement these if a literal pool is disabled.  */
+ if (arm_disable_literal_pool)
+   sorry ("accessing thread-local storage is not currently supported "
+  "with -mpure-code or -mslow-flash-data");
+
+ return true;
+   }
 
   /* Don't recurse into UNSPEC_TLS looking for TLS symbols; these are
 TLS offsets, not real symbol references.  */
@@ -16391,10 +16400,6 @@ push_minipool_fix (rtx_insn *insn, HOST_WIDE_INT 
address, rtx *loc,
 int
 arm_max_const_double_inline_cost ()
 {
-  /* Let the value get synthesized to avoid the use of literal pools.  */
-  if (arm_disable_literal_pool)
-return 99;
-
   return ((optimize_size || arm_ld_sched) ? 3 : 4);
 }
 
@@ -17341,6 +17346,11 @@ arm_reorg (void)
   if (!optimize)
 split_all_insns_noflow ();
 
+  /* Make sure we do not attempt to create a literal pool even though it should
+ no longer be necessary to create any.  */
+  if (arm_disable_literal_pool)
+return ;
+
   minipool_fix_head = minipool_fix_tail = NULL;
 
   /* The first insn must always be a note, or the code below won't
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 21cfe3a4c31..f9365cde504 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -233,10 +233,6 @@
   (match_test "arm_restrict_it"))
  (const_string "no")
 
- (and (eq_attr "use_literal_pool" "yes")
-  (match_test "arm_disable_literal_pool"))
- (const_string "no")
-
  (eq_attr "arch_enabled" "no")
  (const_string "no")]
 (const_string "yes")))
@@ -5878,8 +5874,9 

Re: [PATCH, GCC, stage4] Fix type for .init_array.* and .fini_array.* sections

2017-04-20 Thread Thomas Preudhomme

Hi Kyrill,

On 19/04/17 10:56, Kyrill Tkachov wrote:

Hi Thomas,

On 11/04/17 18:35, Thomas Preudhomme wrote:

Hi,

Several tests started failing for ARM targets (eg. gcc.dg/initpri1.c)
after change 6f9dbcd42f2cf034a9a21f46842c08d2e88449db in binutils. This
is because the non-default priority init_array and fini_array sections
are not created with NOTYPE flag as is the case for default priority
init_array and fini_array sections (see default_section_type_flags in
varasm.c for instance). This patch fixes the issue.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-04-11  Thomas Preud'homme  

* config/arm/arm.c (arm_elf_asm_cdtor): Create non-default
priority .init_array and .fini_array section with SECTION_NOTYPE
flag.

Testing: with this patch test gcc.dg/initpri1.c succeeds but fails
without.

Is this ok for stage4?



This is ok if a bootstrap and test run on arm-none-linux-gnueabihf shows no
problems.


No issue found, hence committed. Thanks.

Best regards,

Thomas


Re: [CHKP] Fix for PR79990

2017-04-20 Thread Alexander Ivchenko
Thanks for correcting the usage of get_base_address. I fixed that.
Plus addressed the comment about the avoiding the usage of
chkp_find_bounds.


gcc/testsuite/ChangeLog:

2017-04-20  Alexander Ivchenko  

* gcc.target/i386/mpx/hard-reg-2-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-2-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-2-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-3-1-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-3-1-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-3-1-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-3-2-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-3-2-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-3-2-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-3-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-3-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-3-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-4-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-4-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-4-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-5-1-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-5-1-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-5-1-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-5-2-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-5-2-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-5-2-ubv.c: New test.
* gcc.target/i386/mpx/hard-reg-6-lbv.c: New test.
* gcc.target/i386/mpx/hard-reg-6-nov.c: New test.
* gcc.target/i386/mpx/hard-reg-6-ubv.c: New test.

gcc/ChangeLog:

2017-04-20  Alexander Ivchenko  

* tree-chkp.c (chkp_get_hard_register_var_fake_base_address):
New function to provide a base address for
chkp_get_hard_register_fake_addr_expr.
(chkp_get_hard_register_fake_addr_expr): New function to build
fake address expression for an expr that resides on a hard
register.
(chkp_build_addr_expr): Add checks for hard reg cases.
(chkp_parse_array_and_component_ref): Create/find bounds if the
var resides on a hard reg.


I already had a testcases for struct with a pointer - "hard-reg-4-*".
Here is the instrumentation of the foo function:

__attribute__((chkp instrumented))
foo.chkp (int i)
{
  __bounds_type __bound_tmp.1;
  register struct S2 b __asm__ (*xmm0);
  int k;
  int * _1;
  long unsigned int _2;
  long unsigned int _3;
  int * _4;
  int * _5;
  int _11;
  int * _18;

   [0.00%]:
  __bound_tmp.1_17 = __chkp_zero_bounds;
  __bound_tmp.1_14 = __builtin_ia32_bndmk (, 4);
  __bound_tmp.1_13 = __builtin_ia32_bndmk (-2147483648B, 16);

   [0.00%]:

   [0.00%]:
  k = 5;
  b.f3 = 
  __builtin_ia32_bndstx (, __bound_tmp.1_14, -2147483648B);
  _1 = b.f3;
  __bound_tmp.1_15 = __builtin_ia32_bndldx (-2147483648B, _1);
  _2 = (long unsigned int) i_9(D);
  _3 = _2 * 4;
  _4 = _1 + _3;
  b.f3 = _4;
  __builtin_ia32_bndstx (_4, __bound_tmp.1_15, -2147483648B);
  _5 = b.f3;
  __bound_tmp.1_16 = __builtin_ia32_bndldx (-2147483648B, _5);
  __builtin_ia32_bndcl (_5, __bound_tmp.1_16);
  _18 = _5 + 3;
  __builtin_ia32_bndcu (_18, __bound_tmp.1_16);
  _11 = *_5;
  k ={v} {CLOBBER};
  return _11;

}

Which is the most suspicious one, because we have ldx and stx. I'm not
sure whether this is OK.

Here is the chkp dump for foo function of newly added hard-reg-6* case:

__attribute__((chkp instrumented))
foo.chkp (int i, int * kp1, __bounds_type __chkp_bounds_of_kp1)
{
  __bounds_type __bound_tmp.1;
  int D.2873;
  register struct S2 b __asm__ (*xmm0);
  int k2;
  int * _1;
  long unsigned int _2;
  long unsigned int _3;
  int * _4;
  int * _5;
  int _13;
  int * _31;

   [0.00%]:
  __bound_tmp.1_22 = __builtin_ia32_bndmk (, 4);
  __bound_tmp.1_17 = __chkp_zero_bounds;
  __bound_tmp.1_15 = __builtin_ia32_bndmk (-2147483648B, 16);

   [0.00%]:

   [0.00%]:
  k2 = 5;
  __bound_tmp.1_16 = __builtin_ia32_bndmk (-2147483648B, 16);
  __bound_tmp.1_18 = __builtin_ia32_bndint (__bound_tmp.1_16, __bound_tmp.1_15);
  __builtin_ia32_bndcl (-2147483648B, __bound_tmp.1_18);
  __builtin_ia32_bndcu (-2147483641B, __bound_tmp.1_18);
  b.f[0] = kp1_8(D);
  __builtin_ia32_bndstx (kp1_8(D), __chkp_bounds_of_kp1_19(D), -2147483648B);
  __bound_tmp.1_20 = __builtin_ia32_bndmk (-2147483648B, 16);
  __bound_tmp.1_21 = __builtin_ia32_bndint (__bound_tmp.1_20, __bound_tmp.1_15);
  __builtin_ia32_bndcl (-2147483640B, __bound_tmp.1_21);
  __builtin_ia32_bndcu (-2147483633B, __bound_tmp.1_21);
  b.f[1] = 
  __builtin_ia32_bndstx (, __bound_tmp.1_22, -2147483640B);
  __bound_tmp.1_23 = __builtin_ia32_bndmk (-2147483648B, 16);
  __bound_tmp.1_24 = __builtin_ia32_bndint (__bound_tmp.1_23, __bound_tmp.1_15);
  __builtin_ia32_bndcl (-2147483648B, __bound_tmp.1_24);
  __builtin_ia32_bndcu (-2147483641B, __bound_tmp.1_24);
  _1 = b.f[0];
  __bound_tmp.1_27 = __builtin_ia32_bndldx (-2147483648B, _1);
  _2 = (long unsigned int) 

[PING*2][PATCH][PR79542][Ada] Fix ICE in dwarf2out.c with nested func. inlining

2017-04-20 Thread Pierre-Marie de Rodat

Hello,

I would like to ping for the patch I posted there: 
. Thank you!


--
Pierre-Marie de Rodat


[PATCH] Add test-case (PR tree-optimization/66278).

2017-04-20 Thread Martin Liška
Hello.

The patch adds a new test-case for the mentioned PR. Tested on x86_64-linux-gnu
and ppc64le-linux-gnu.

Ready for trunk or should I postpone it for next stage1?
Martin
>From 84335b09fd7626224148be8883edbaa42dcc496e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 20 Apr 2017 09:54:02 +0200
Subject: [PATCH] Add test-case (PR tree-optimization/66278).

gcc/testsuite/ChangeLog:

2017-04-20  Martin Liska  

	PR tree-optimization/66278
	* gcc.dg/vect/pr66278.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr66278.c | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr66278.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr66278.c b/gcc/testsuite/gcc.dg/vect/pr66278.c
new file mode 100644
index 000..3e0e2ec5e2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr66278.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+#define N 101
+
+typedef unsigned int __uint32_t;
+
+int main(int argc, char **argv)
+{
+  __uint32_t array[N][N][N];
+
+  const unsigned int next = argc == 3 ? 0 : 1;
+
+  for (unsigned i = next; i < N;  i++)
+array[3][3][i] = array[3][3][i] - 10;
+
+  return array[3][3][argc];
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
+/* { dg-require-effective-target vect_int } */
-- 
2.12.2



Re: [PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-20 Thread Richard Biener
On Thu, 20 Apr 2017, Jakub Jelinek wrote:

> On Wed, Apr 19, 2017 at 08:22:23AM +0200, Jakub Jelinek wrote:
> > Here in (so far untested) patch form:
> > 
> > 2017-04-19  Jakub Jelinek  
> > 
> > PR middle-end/80423
> > * tree.h (build_array_type): Add typeless_storage default argument.
> > * tree.c (type_cache_hasher::equal): Also compare
> > TYPE_TYPELESS_STORAGE flag for ARRAY_TYPEs.
> > (build_array_type): Add typeless_storage argument, set
> > TYPE_TYPELESS_STORAGE to it, if shared also hash it, and pass to
> > recursive call.
> > (build_nonshared_array_type): Adjust build_array_type_1 caller.
> > (build_array_type): Likewise.  Add typeless_storage argument.
> > c-family/
> > * c-common.c (complete_array_type): Preserve TYPE_TYPELESS_STORAGE.
> > cp/
> > * tree.c (build_cplus_array_type): Call build_array_type
> > with the intended TYPE_TYPELESS_STORAGE flag value, instead
> > of calling build_array_type and modifying later TYPE_TYPELESS_STORAGE
> > on the shared type.
> > testsuite/
> > * g++.dg/other/pr80423.C: New test.
> 
> Just for completeness, it bootstrapped/regtested successfully on x86_64-linux
> and i686-linux.

Ok.

Richard.


Re: [gomp4] add support for allocatable scalars in OpenACC declare constructs

2017-04-20 Thread Thomas Schwinge
Hi!

On Wed, 19 Apr 2017 11:11:39 -0700, Cesar Philippidis  
wrote:
> I've applied this patch to gomp-4_0-branch to add support for fortran
> allocatable scalars inside OpenACC declare constructs.

Thanks!


> Included in this patch is a bug fix for non-declared allocatable
> scalars. [...]

Please, bug fixes as work items/patches/commits separate from new
features.  (As long as that makes sense.)


I have not reviewed your changes in detail; just a few comments here:

> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -3336,6 +3336,18 @@ gfc_trans_oacc_executable_directive (gfc_code *code)
>gfc_start_block ();
>oacc_clauses = gfc_trans_omp_clauses (, code->ext.omp_clauses,
>   code->loc);
> +
> +  /* Promote GOMP_MAP_FIRSTPRIVATE_POINTER to GOMP_MAP_ALWAYS_POINTER for
> + variables inside OpenACC update directives.  */

Well, that's quite obvious from the following few lines of code ;-) --
would be more helpful in such cases to describe in source code comments
*why* something is being done.

> +  if (code->op == EXEC_OACC_UPDATE)
> +for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +  {
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER)
> +   OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_ALWAYS_POINTER);
> +  }

> --- a/gcc/testsuite/gfortran.dg/goacc/declare-allocatable-1.f90
> +++ b/gcc/testsuite/gfortran.dg/goacc/declare-allocatable-1.f90
> @@ -6,20 +6,20 @@
>  
>  program allocate
>implicit none
> -  integer, allocatable :: a(:)
> +  integer, allocatable :: a(:), b
>integer, parameter :: n = 100
>integer i
> -  !$acc declare create(a)
> +  !$acc declare create(a,b)
>  
> -  allocate (a(n))
> +  allocate (a(n), b)
>  
> -  !$acc parallel loop copyout(a)
> +  !$acc parallel loop copyout(a, b)
>do i = 1, n
> - a(i) = i
> + a(i) = b
>end do

That "a(i) = b" assignment is reading uninitialized data ("copyout(b)").
Did you mean to specify "copyin(b)" or (implicit?) "firstprivate(b)", and
set "b" to a defined value before the region?

>  
> -  deallocate (a)
> +  deallocate (a, b)
>  end program allocate
>  
> -! { dg-final { scan-tree-dump-times "pragma acc enter data 
> map.declare_allocate" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "pragma acc exit data 
> map.declare_deallocate" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "pragma acc enter data 
> map.declare_allocate" 2 "original" } }
> +! { dg-final { scan-tree-dump-times "pragma acc exit data 
> map.declare_deallocate" 2 "original" } }

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90
> @@ -0,0 +1,30 @@

Missing "{ dg-do run }" to exercise torture testing -- or is that
intentionally not done here?

> +program main
> +  implicit none
> +  integer, parameter :: n = 100
> +  integer, allocatable :: a, c
> +  integer :: i, b(n)
> +
> +  allocate (a)
> +
> +  a = 50
> +
> +  !$acc parallel loop
> +  do i = 1, n;
> + b(i) = a
> +  end do
> +
> +  do i = 1, n
> + if (b(i) /= a) call abort
> +  end do
> +
> +  allocate (c)
> +
> +  print *, loc (c)
> +  !$acc parallel copyout(c) num_gangs(1)
> +  c = a
> +  !$acc end parallel
> +
> +  if (c /= a) call abort
> +
> +  deallocate (a, c)
> +end program main


Additionally, I'm seeing one regression:

[-PASS:-]{+FAIL: gfortran.dg/goacc/pr77371-1.f90   -O  (internal compiler 
error)+}
{+FAIL:+} gfortran.dg/goacc/pr77371-1.f90   -O  (test for excess errors)

[...]/gcc/testsuite/gfortran.dg/goacc/pr77371-1.f90:5:0: internal compiler 
error: in force_constant_size, at gimplify.c:664
0x87c57b force_constant_size
[...]/gcc/gimplify.c:664
0x87e687 gimple_add_tmp_var(tree_node*)
[...]/gcc/gimplify.c:702
0x867f3d create_tmp_var(tree_node*, char const*)
[...]/gcc/gimple-expr.c:476
0x9a1b95 lower_omp_target
[...]/gcc/omp-low.c:16875
[...]


Grüße
 Thomas


Re: [PATCH] Fix PR80457

2017-04-20 Thread Richard Biener
On Wed, Apr 19, 2017 at 12:38 AM, Bill Schmidt
 wrote:
> Hi,
>
> While investigating a performance issue, I happened to notice that vectorized
> COND_EXPRs were not contributing to the vectorizer cost model.  This patch
> addresses that.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this ok for
> trunk, or should it wait for GCC 8?

Ok for GCC 8, this is not a regression.

Thanks,
Richard.

> Thanks,
> Bill
>
>
> [gcc]
>
> 2017-04-18  Bill Schmidt  
>
> PR tree-optimization/80457
> * tree-vect-stmts.c (vectorizable_condition): Update the cost
> model when vectorizing a COND_EXPR.
>
> [gcc/testsuite]
>
> 2017-04-18  Bill Schmidt  
>
> PR tree-optimization/80457
> * gcc.target/powerpc/pr78604.c: Verify that vectorized COND_EXPRs
> call vect_model_simple_cost.
>
>
> Index: gcc/testsuite/gcc.target/powerpc/pr78604.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr78604.c  (revision 246948)
> +++ gcc/testsuite/gcc.target/powerpc/pr78604.c  (working copy)
> @@ -2,7 +2,7 @@
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>  /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */
> -/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize" } */
> +/* { dg-options "-mcpu=power8 -O2 -ftree-vectorize -fdump-tree-vect-details" 
> } */
>
>  #ifndef SIZE
>  #define SIZE 1024
> @@ -110,3 +110,4 @@ uns_gte (UNS_TYPE val1, UNS_TYPE val2)
>  /* { dg-final { scan-assembler-times {\mvcmpgtsd\M} 4 } } */
>  /* { dg-final { scan-assembler-times {\mvcmpgtud\M} 4 } } */
>  /* { dg-final { scan-assembler-not   {\mvcmpequd\M} } } */
> +/* { dg-final { scan-tree-dump-times "vect_model_simple_cost" 8 "vect" } } */
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   (revision 246948)
> +++ gcc/tree-vect-stmts.c   (working copy)
> @@ -7746,7 +7746,7 @@ vectorizable_condition (gimple *stmt, gimple_stmt_
>tree vec_compare;
>tree new_temp;
>loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> -  enum vect_def_type dt, dts[4];
> +  enum vect_def_type dt[2], dts[4];
>int ncopies;
>enum tree_code code, cond_code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>stmt_vec_info prev_stmt_info = NULL;
> @@ -7813,10 +7813,10 @@ vectorizable_condition (gimple *stmt, gimple_stmt_
>  return false;
>
>gimple *def_stmt;
> -  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, _stmt, ,
> +  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, _stmt, [0],
>))
>  return false;
> -  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, _stmt, ,
> +  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, _stmt, [1],
>))
>  return false;
>
> @@ -7900,8 +7900,12 @@ vectorizable_condition (gimple *stmt, gimple_stmt_
> return false;
> }
> }
> -  return expand_vec_cond_expr_p (vectype, comp_vectype,
> -cond_code);
> +  if (expand_vec_cond_expr_p (vectype, comp_vectype, cond_code))
> +   {
> + vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
> + return true;
> +   }
> +  return false;
>  }
>
>/* Transform.  */
>


Re: [PATCH] Fix PR51513, switch statement with default case containing __builtin_unreachable leads to wild branch

2017-04-20 Thread Richard Biener
On Thu, 13 Apr 2017, Peter Bergner wrote:

> Bah, fixing up my return address.
> 
> 
> On 4/13/17 3:14 AM, Richard Biener wrote:
> > To recap the situation (from what I can deciper out of the ppc asm
> > and the expand RTL) we seem to expand to
> > 
> >   if (cond > 2)
> > __builtin_unreachable (); // jumps to the jump table data(?)
> >   goto *tbl[cond];
> > 
> > now I do not remember the reason why we keep __builtin_unreachable ()
> > at the RTL level -- on GIMPLE we keep it to be able to extract
> > range information from the controlling condition.
> 
> As Jakub said, we do remove the blocks.  But we don't remove the branch
> to those blocks.  That is true of whether the __builtin_unreachable() is
> in the default case statement or in a normal case statement.  That is
> what leads to the wild branch issue.
> 
> 
> > Your patch comments suggest that handling of gaps is similarly
> > improved but the testcase doesn't contain any gaps.
> 
> I didn't change the way the jump table gaps are handled.  They have
> always been redirected to the "default label"... although the patch
> doesn't use default_label anymore, it uses gap_label, as resetting
> default_label to fallback_label when default_label is NULL was
> what was stopping us from removing the range check.
> 
> What is changed, is that now we also handle case labels that point to
> blocks with __builtin_unreachable() and those are also redirected to
> the new gap_label.
> 
> I did not include a test case with an unreachable case statement,
> because unlike the unreachable default case test case, there is no
> compare/branch I can verify has been removed.  However, looking
> closer, it seems the unpatched compiler leaves the unreachable
> block as well as it's label in the jump table, so I should be able
> to create an executable test case that uses a switch condition
> that targets the unreachable case and see if we SEGV/SIGILL or not.
> I cannot do that for the unreachable default case, since the range
> check we remove is what saves us from accessing the jump table
> with an index that is outside of the table.
> 
> 
> 
> > So I wonder why we don't simply apply the "transform" at the GIMPLE
> > level, for example in the kitchen-sink pass_fold_builtins.  That is
> > remove all __builtin_unreachable () labels and if the default label
> > is amongst them make the first one the new default (or maybe the
> > last one dependent on which would shrink jump table size most).
> 
> Well if the default case is unreachable and we compute a "new"
> default, then we won't be able to eliminate the range check
> (ie, compare/branch to default).  So isn't it performance wise
> better to eliminate the range check, rather than choosing a
> new default?  If you think we shouldn't care about that, then
> I'd vote for choosing the first/last label with the higher
> edge probability rather than trying to shrink the table.

True.

> > VRP2 removes the if because we put the range computed by VRP1
> > on the SSA name used in the test (we have special code handling
> > unreachable paths there).  For your testcase with the switch
> > as cond_2 is not used after the switch we do not compute any
> > range for it, otherwise we'd likely eliminate the default
> > case as well.  For the small testcase above fold-all-builtins
> > handles the removal as well if VRP is not run, so extending
> > that to handle switch stmts sounds reasonable (see
> > optimize_unreachable).  There's even a TODO in this function.
> 
> If we updated optimize_unreachable() to remove unreachable
> default cases, would we still be able to disambiguate between
> a switch statement written with no default case and one where
> the default case was explicitly shown to be unreachable?

No.

> Maybe the default_label would be NULL for the unreachable
> case and non-NULL in the other case?  If so, we'd still need
> my change that doesn't set default_label to fallback_label
> and instead uses the new var gap_label.

That would be possible though we now rely on the default label
to be present IIRC.

Ok, so I think we should ensure that we remove the regular cases
with unreachable destination, like in

switch (i)
{
case 0:
  __builtin_unreachable ();
default:;
}

and handle default with __builtin_unreachable () from switch
expansion by omitting the range check for the jump table
indexing (and choose a non-default case label for gaps).

Richard.


Re: [PATCH] Fix TYPE_TYPELESS_STORAGE handling (PR middle-end/80423)

2017-04-20 Thread Jakub Jelinek
On Wed, Apr 19, 2017 at 08:22:23AM +0200, Jakub Jelinek wrote:
> Here in (so far untested) patch form:
> 
> 2017-04-19  Jakub Jelinek  
> 
>   PR middle-end/80423
>   * tree.h (build_array_type): Add typeless_storage default argument.
>   * tree.c (type_cache_hasher::equal): Also compare
>   TYPE_TYPELESS_STORAGE flag for ARRAY_TYPEs.
>   (build_array_type): Add typeless_storage argument, set
>   TYPE_TYPELESS_STORAGE to it, if shared also hash it, and pass to
>   recursive call.
>   (build_nonshared_array_type): Adjust build_array_type_1 caller.
>   (build_array_type): Likewise.  Add typeless_storage argument.
> c-family/
>   * c-common.c (complete_array_type): Preserve TYPE_TYPELESS_STORAGE.
> cp/
>   * tree.c (build_cplus_array_type): Call build_array_type
>   with the intended TYPE_TYPELESS_STORAGE flag value, instead
>   of calling build_array_type and modifying later TYPE_TYPELESS_STORAGE
>   on the shared type.
> testsuite/
>   * g++.dg/other/pr80423.C: New test.

Just for completeness, it bootstrapped/regtested successfully on x86_64-linux
and i686-linux.

Jakub