date:20171017

[PATCH] enhance -Warray-bounds to handle strings and excessive indices

2017-10-17 Thread Martin Sebor


While testing my latest -Wrestrict changes I noticed a number of
opportunities to improve the -Warray-bounds warning.  Attached
is a patch that implements a solution for the following subset
of these:

PR tree-optimization/82596 - missing -Warray-bounds on an out-of-bounds
  index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on an excessively
  large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds
  inner indices

The patch also adds more detail to the -Warray-bounds diagnostics
to make it easier to see the cause of the problem.
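
For illustration only, the three kinds of accesses named in the PRs above can be modeled with a small standalone sketch.  The helper subscript_in_bounds and the three *_case functions are hypothetical names made up for this example, not part of GCC or of the patch, and the code only checks indices rather than performing the invalid accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): nonzero when IDX is a valid
   subscript into an array of NELTS elements.  */
static int
subscript_in_bounds (intmax_t idx, size_t nelts)
{
  return idx >= 0 && (uintmax_t) idx < nelts;
}

/* String literal case: "abc" has type char[4], so the valid subscripts
   are 0 through 3 and "abc"[4] is out of bounds.  */
static int
string_literal_case (void)
{
  return subscript_in_bounds (4, sizeof "abc");
}

/* Inner index case: for int a[3][5], the inner subscript in a[1][7] is
   out of bounds even though the address still lies within the object.  */
static int
inner_index_case (void)
{
  int a[3][5];
  return subscript_in_bounds (7, sizeof a[0] / sizeof a[0][0]);
}

/* Excessive index case: an index far larger than any array could hold.  */
static int
excessive_index_case (void)
{
  int a[8];
  return subscript_in_bounds (PTRDIFF_MAX, sizeof a / sizeof a[0]);
}
```

Each of the three *_case functions returns 0, meaning the subscript in question is one a bounds checker would be expected to flag.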

Richard, since the changes touch tree-vrp.c, I'm looking in particular
for your comments.

Thanks
Martin
PR tree-optimization/82596 - missing -Warray-bounds on an out-of-bounds index into string literal
PR tree-optimization/82588 - missing -Warray-bounds on an excessively large index
PR tree-optimization/82583 - missing -Warray-bounds on out-of-bounds inner indices

gcc/ChangeLog:
	PR tree-optimization/82596
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* tree-vrp.c (check_array_ref): Handle flexible array members,
	string literals, and inner indices.
	(search_for_addr_array): Add detail to diagnostics.

gcc/testsuite/ChangeLog:

	PR tree-optimization/82596
	PR tree-optimization/82588
	PR tree-optimization/82583	
	* c-c++-common/Warray-bounds.c: New test.
	* gcc.dg/Warray-bounds-11.c: Adjust.
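
As a rough model of the permissive bound that the comment in the tree-vrp.c hunk below describes for arrays of unknown size, the following sketch divides the largest assumed object size by the element size.  It is an assumption-laden illustration: permissive_bound_p1 and largest_permitted_subscript are hypothetical names, and PTRDIFF_MAX stands in for SSIZE_MAX because the latter is not part of ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one past the largest subscript accepted for an
   array of unknown size whose elements are ELT_SIZE bytes each, assuming
   no object is larger than PTRDIFF_MAX bytes (standing in for SSIZE_MAX).  */
static ptrdiff_t
permissive_bound_p1 (size_t elt_size)
{
  assert (elt_size > 0);
  return (ptrdiff_t) ((size_t) PTRDIFF_MAX / elt_size);
}

/* The largest subscript that such a permissive check would still accept.  */
static ptrdiff_t
largest_permitted_subscript (size_t elt_size)
{
  return permissive_bound_p1 (elt_size) - 1;
}
```

With this model a char array of unknown size accepts subscripts up to PTRDIFF_MAX - 1, and wider element types accept proportionally fewer.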

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 2c86b8e..88cce15 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "tree-cfg.h"
+#include "tree-dfa.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
 #include "tree-ssa-loop.h"
@@ -64,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "builtins.h"
 
 #define VR_INITIALIZER { VR_UNDEFINED, NULL_TREE, NULL_TREE, NULL }
 
@@ -6675,26 +6677,51 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
   low_sub = up_sub = TREE_OPERAND (ref, 1);
   up_bound = array_ref_up_bound (ref);
 
-  /* Can not check flexible arrays.  */
   if (!up_bound
-  || TREE_CODE (up_bound) != INTEGER_CST)
-return;
+  || (warn_array_bounds < 2
+	  && array_at_struct_end_p (ref)))
+{
+  /* Accesses to trailing arrays via pointers may access storage
+	 beyond the types array bounds.  For such arrays, or for flexible
+	 array members as well as for other arrays of an unknown size,
+	 replace the upper bound with a more permissive one that assumes
+	 the size of the largest object is SSIZE_MAX.  */
+  tree eltype = TREE_TYPE (ref);
+  tree eltsize = TYPE_SIZE_UNIT (eltype);
+  tree maxbound = TYPE_MAX_VALUE (ssizetype);
+  up_bound_p1 = fold_build2 (TRUNC_DIV_EXPR, ssizetype, maxbound, eltsize);
+
+  tree arg = TREE_OPERAND (ref, 0);
+  tree_code code = TREE_CODE (arg);
+  if (code == COMPONENT_REF)
+	{
+	  HOST_WIDE_INT off;
+	  if (tree base = get_addr_base_and_unit_offset (ref, &off))
+	up_bound_p1 = fold_build2 (MINUS_EXPR, ssizetype, up_bound_p1,
+   TYPE_SIZE_UNIT (TREE_TYPE (base)));
+	  else
+	return;
+	}
+  else if (code == STRING_CST)
+	up_bound_p1 = build_int_cst (ssizetype, TREE_STRING_LENGTH (arg));
 
-  /* Accesses to trailing arrays via pointers may access storage
- beyond the types array bounds.  */
-  if (warn_array_bounds < 2
-  && array_at_struct_end_p (ref))
-return;
+  up_bound = int_const_binop (MINUS_EXPR, up_bound_p1,
+  build_int_cst (ssizetype, 1));
+}
+  else
+up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
+   build_int_cst (TREE_TYPE (up_bound), 1));
 
   low_bound = array_ref_low_bound (ref);
-  up_bound_p1 = int_const_binop (PLUS_EXPR, up_bound,
- build_int_cst (TREE_TYPE (up_bound), 1));
+
+  tree artype = TREE_TYPE (TREE_OPERAND (ref, 0));
 
   /* Empty array.  */
   if (tree_int_cst_equal (low_bound, up_bound_p1))
 {
   warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array bounds of %qT",
+		  low_bound, artype);
   TREE_NO_WARNING (ref) = 1;
 }
 
@@ -6718,7 +6745,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
   && tree_int_cst_le (low_sub, low_bound))
 {
   warning_at (location, OPT_Warray_bounds,
-		  "array subscript is outside array bounds");
+		  "array subscript [%E, %E] is outside array bounds of %qT",
+		  low_sub, up_sub, artype);
   TREE_NO_WARNING (ref) = 1;
 }
 }
@@ -6734,7 +6762,8 @@ check_array_ref (location_t location, tree ref, bool ignore_off_by_one)
 	  fprintf (dump_file, "\n");
 	}
   warning_at (location, OPT_Warray_bounds,
-		  "array subscript is above array bounds");
+		  "array subscript %E is above array

Re: [PATCH] C/C++: more stdlib header hints (PR c/81404)

2017-10-17 Thread Martin Sebor


On 10/17/2017 11:33 AM, David Malcolm wrote:

This patch depends on:

* "[PATCH] c-family: add name_hint/deferred_diagnostic (v2)"
  * https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01021.html
  (waiting review)

* [PATCH 3/3] C: hints for missing stdlib includes for macros and types
  * https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00125.html
  (approved, pending the prereq above)

It extends the C frontend's "knowledge" of the C stdlib within
get_c_name_hint to cover some more macros and functions, covering
a case reported in PR c/81404 ("INT_MAX"), so that rather than printing:

  t.c:5:12: error: 'INT_MAX' undeclared here (not in a function); did you mean '__INT_MAX__'?
   int test = INT_MAX;
              ^~~~~~~
              __INT_MAX__

we instead print:

  t.c:5:12: error: 'INT_MAX' undeclared here (not in a function)
   int test = INT_MAX;
              ^~~~~~~
  t.c:5:12: note: 'INT_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'?
  t.c:1:1:
  +#include <limits.h>

  t.c:5:12:
   int test = INT_MAX;
              ^~~~~~~

It also generalizes some of the code for this (and for the "std::"
namespace hints in the C++ frontend), moving it to a new
c-family/known-headers.cc and .h, and introducing a class known_headers.
This currently just works by scanning a hardcoded array of known
name/header associations, but perhaps in the future could be turned
into some kind of symbol database so that the compiler could record API
uses and use that to offer suggestions e.g.


I think this feature will be especially helpful for the not
so common symbols or those that are often expected to be
declared in the wrong header.  As you say, the data structure
can be improved/enhanced and made more general.  Until that
happens, it would be great to complete the set of the standard
symbols.  Since the full list is going to be quite long, it
might be worthwhile to sort it so it can be searched using
a binary search.
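
To sketch what a sorted, binary-searchable table might look like, here is a minimal example using the standard bsearch function.  The struct name_header type, the table contents, and header_for_name are all hypothetical and are not taken from the known-headers code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry associating a symbol with its header.  */
struct name_header
{
  const char *name;
  const char *header;
};

/* Must stay sorted by name in strcmp order for bsearch to work
   (uppercase letters sort before lowercase ones in ASCII).  */
static const struct name_header known[] = {
  { "EOF", "<stdio.h>" },
  { "INT_MAX", "<limits.h>" },
  { "NULL", "<stddef.h>" },
  { "malloc", "<stdlib.h>" },
  { "memcpy", "<string.h>" },
  { "printf", "<stdio.h>" },
  { "strlen", "<string.h>" },
};

/* bsearch callback: compare the key string against an entry's name.  */
static int
compare_name (const void *key, const void *elt)
{
  return strcmp ((const char *) key,
		 ((const struct name_header *) elt)->name);
}

/* Return the header for NAME, or NULL if NAME is not in the table.  */
static const char *
header_for_name (const char *name)
{
  assert (name != NULL);
  const struct name_header *hit
    = bsearch (name, known, sizeof known / sizeof known[0],
	       sizeof known[0], compare_name);
  return hit ? hit->header : NULL;
}
```

The lookup cost grows logarithmically with the table size, at the price of having to keep the entries in sorted order whenever one is added.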

Btw., while on the subject of enhancements, I think it would
also be helpful to make the preprocessor aware of the set of
predefined macros and have it issue warnings in cases where
they are either used in nonsensical or error-prone ways.  One
example that I've had trouble with is making mistakes testing
the C++ __cplusplus macro for equality that can never be true,
as in the contrived:

  #if __cplusplus == 2014
  // C++ 14 code
  #else
  // other code
  #endif

(The real problems usually involve several non-trivial tests
not all of which get tested.)
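
To make the pitfall concrete, here is a small sketch written so that it compiles as C as well as C++.  The function names are made up for the example.  The C++ standards define __cplusplus with values such as 201402L (for C++14), which is why an equality test against 2014 can never succeed.

```c
/* Returns 1 when compiled as C++14 or later, 0 otherwise (including
   when compiled as C, where __cplusplus is not defined).  */
static int
have_cxx14 (void)
{
#if defined (__cplusplus) && __cplusplus >= 201402L
  return 1;
#else
  return 0;
#endif
}

/* The mistaken form: __cplusplus never has the value 2014, so this
   returns 0 under every compiler and every language dialect.  */
static int
mistaken_cxx14_test (void)
{
#if defined (__cplusplus) && __cplusplus == 2014
  return 1;
#else
  return 0;
#endif
}
```

A warning of the kind suggested above would presumably fire on the second function's condition and stay silent on the first.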

Martin

Re: [patch, fortran] Fix PR 82567

2017-10-17 Thread Steve Kargl

On Tue, Oct 17, 2017 at 06:14:16PM -0700, Jerry DeLisle wrote:
> On 10/17/2017 03:36 PM, Thomas Koenig wrote:
> > Hello world,
> > 
> > this patch fixes a regression with long compile times,
> > which came about due to our handling of array constructors
> > at compile time.  This, together with a simplification in
> > front end optimization, led to long compile times and large
> > code.
> > 
> > Regression-tested. OK for trunk and the other affected branches?
> > 
> 
> Well I know 42 is the answer to the ultimate question of the universe so this
> must be OK.  I just don't know what the question is.
> 
> OK and thanks,
> 
> Jerry
> 
> +#define CONSTR_LEN_MAX 42

Actually, I was wondering about the choice myself.  With
most common hardware having fairly robust L1 and L2 cache
sizes, a double precision array constructor with 42 
elements only occupies 336 bytes.  Seems small.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

Re: [PATCH 20/22] Enable building libobjc with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 03:19 PM, Tsimbalist, Igor V wrote:
> Enable building libobjc with Intel CET options.
> 
> libobjc/
>   * Makefile.in: Regenerate.
>   * aclocal.m4: Likewise.
>   * configure: Likewise.
>   * configure.ac: Set CET_FLAGS. Update XCFLAGS.
> 


Same comments as the libcilkrts changes.

Jeff

Re: [PATCH 17/22] Enable building libquadmath with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:34 PM, Tsimbalist, Igor V wrote:
> Enable building libquadmath with Intel CET options.
> 
> libquadmath/
>   * Makefile.am: Update AM_CFLAGS.
>   * Makefile.in: Regenerate.
>   * acinclude.m4: Add enable.m4 and cet.m4.
>   * configure: Regenerate.
>   * configure.ac: Set CET_FLAGS. Update XCFLAGS.
> 

Same comments as the libcilkrts changes.

Jeff

Re: [PATCH 18/22] Enable building libmpx with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:36 PM, Tsimbalist, Igor V wrote:
> Enable building libmpx with Intel CET options.
> 
> libmpx/
>   * Makefile.in: Regenerate.
>   * acinclude.m4: Add enable.m4 and cet.m4.
>   * configure: Regenerate.
>   * configure.ac: Set CET_FLAGS. Update XCFLAGS.
>   * mpxrt/Makefile.am: Update libmpx_la_CFLAGS.
>   * mpxrt/Makefile.in: Regenerate.
>   * mpxwrap/Makefile.am: Add AM_CFLAGS. Update
>   libmpxwrappers_la_CFLAGS.
>   * mpxwrap/Makefile.in: Regenerate.
> 


Same comments as the libcilkrts changes.

Jeff

Re: [PATCH 16/22] Enable building libssp with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:31 PM, Tsimbalist, Igor V wrote:
> Enable building libssp with Intel CET options.
> 
> libssp/
>   * Makefile.am: Update AM_CFLAGS.
>   * Makefile.in: Regenerate.
>   * configure: Likewise.
>   * aclocal.m4: Likewise.
>   * configure.ac: Set CET_FLAGS. Update XCFLAGS.
> 

Same comments as with libcilkrts.
Jeff

Re: [PATCH 15/22] Enable building libvtv with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:29 PM, Tsimbalist, Igor V wrote:
> Enable building libvtv with Intel CET options.
> 
> libvtv/
>   * acinclude.m4: Add enable.m4 and cet.m4.
>   * libvtv/configure: Regenerate.
>   * libvtv/configure.ac: Set CET_FLAGS. Update XCFLAGS.
Same comments as with libcilkrts.
Jeff

Re: [PATCH 19/22] Enable building libgfortran with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 03:17 PM, Tsimbalist, Igor V wrote:
> Enable building libgfortran with Intel CET options.
> 
> libgfortran/
>   * acinclude.m4: Add enable.m4, cet.m4.
>   * configure: Regenerate.
>   * configure.ac: Set CET_FLAGS. Update AM_FCFLAGS, AM_CFLAGS,
>   CFLAGS.
> 

Same comments as the libcilkrts changes.

Jeff

Re: [PATCH 14/22] Enable building libsanitizer with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:27 PM, Tsimbalist, Igor V wrote:
> Enable building libsanitizer with Intel CET options.
> 
> libsanitizer/
>   * acinclude.m4: Add enable.m4 and cet.m4.
>   * Makefile.in: Regenerate.
>   * asan/Makefile.am: Update AM_CXXFLAGS.
>   * asan/Makefile.in: Regenerate.
>   * configure: Likewise.
>   * configure.ac: Set CET_FLAGS. Update EXTRA_CFLAGS,
>   EXTRA_CXXFLAGS.
>   * interception/Makefile.am: Update AM_CXXFLAGS.
>   * interception/Makefile.in: Regenerate.
>   * libbacktrace/Makefile.am: Update AM_CFLAGS, AM_CXXFLAGS.
>   * libbacktrace/Makefile.in: Regenerate.
>   * lsan/Makefile.am: Update AM_CXXFLAGS.
>   * lsan/Makefile.in: Regenerate.
>   * sanitizer_common/Makefile.am: Update AM_CXXFLAGS.
>   * sanitizer_common/Makefile.in: Regenerate.
>   * tsan/Makefile.am: Update AM_CXXFLAGS.
>   * tsan/Makefile.in: Regenerate.
>   * ubsan/Makefile.am: Update AM_CXXFLAGS.
>   * ubsan/Makefile.in: Regenerate.
> 

Same comments as with libcilkrts.
Jeff

Re: [PATCH 11/22] Enable building libatomic with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:18 PM, Tsimbalist, Igor V wrote:
> Enable building libatomic with CET options.
> 
> libatomic/
>   * configure.ac: Set CET_FLAGS, update XCFLAGS.
>   * acinclude.m4: Add cet.m4 and enable.m4.
>   * configure: Regenerate.
>   * Makefile.in: Likewise.
>   * testsuite/Makefile.in: Likewise.
> 
Same comments as with libcilkrts.
Jeff

Re: [PATCH 12/22] Enable building libgomp with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:20 PM, Tsimbalist, Igor V wrote:
> Enable building libgomp with CET options.
> 
> libgomp/
>   * configure.ac: Set CET_FLAGS, update XCFLAGS and FCFLAGS.
>   * acinclude.m4: Add cet.m4.
>   * configure: Regenerate.
>   * Makefile.in: Likewise.
>   * testsuite/Makefile.in: Likewise
> 

Same comments as with libcilkrts.
Jeff

Re: [PATCH 10/22] Enable building libcilkrts with Intel CET

2017-10-17 Thread Jeff Law

On 10/12/2017 02:13 PM, Tsimbalist, Igor V wrote:
> Enable building libcilkrts with CET options.
> 
> libcilkrts/
>   * Makefile.am: Add AM_CXXFLAGS and XCXXFLAGS.
>   * configure.ac: Set CET_FLAGS, update XCFLAGS, XCXXFLAGS.
>   * Makefile.in: Regenerate.
>   * aclocal.m4: Likewise.
>   * configure: Likewise.
> 
So like the other patches in this space, the inclusion of cet.h seems
wrong.  I don't see why this should be needed here.

It's OK with that bit removed and once any prereqs are OK'd.

jeff

Re: [patch, fortran] Fix PR 82567

2017-10-17 Thread Jerry DeLisle

On 10/17/2017 03:36 PM, Thomas Koenig wrote:
> Hello world,
> 
> this patch fixes a regression with long compile times,
> which came about due to our handling of array constructors
> at compile time.  This, together with a simplification in
> front end optimization, led to long compile times and large
> code.
> 
> Regression-tested. OK for trunk and the other affected branches?
> 

Well I know 42 is the answer to the ultimate question of the universe so this
must be OK.  I just don't know what the question is.

OK and thanks,

Jerry

+#define CONSTR_LEN_MAX 42
+

Re: [patch, fortran] Fix PR 79795

2017-10-17 Thread Jerry DeLisle

On 10/15/2017 11:09 AM, Thomas Koenig wrote:
> Hello world,
> 
> the attached patch fixes a regression by turning an ICE-on-invalid into
> an error message (and making sure that it fits).
> 
> Regression-tested on trunk.
> 
> OK for all affected branches (8/7/6)?
> 

Yes, OK, thanks.

Jerry

Re: [Patch, fortran] PR82550 - program using submodules fails to link

2017-10-17 Thread Jerry DeLisle

On 10/17/2017 11:33 AM, Paul Richard Thomas wrote:
> The attached patch has a comment that explains what is going on.
> 
> Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 7-branch?
> 

Yes, looks OK for both. Thanks.

Jerry

[patch, fortran] Fix PR 82567

2017-10-17 Thread Thomas Koenig


Hello world,

this patch fixes a regression with long compile times,
which came about due to our handling of array constructors
at compile time.  This, together with a simplification in
front end optimization, led to long compile times and large
code.

Regression-tested. OK for trunk and the other affected branches?

Regards

Thomas

2017-10-17  Thomas Koenig

PR fortran/82567
* frontend-passes.c (combine_array_constructor): If an array
constructor is all constants and has more elements than a small
constant, don't convert a*[b,c] to [a*b,a*c] to reduce compilation
times.
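
The check described in this ChangeLog entry can be modeled outside the compiler roughly as follows.  This is a simplified sketch, not the frontend-passes.c code: struct elem, skip_combining and skip_for_n_constants are hypothetical stand-ins, and only the CONSTR_LEN_MAX value of 42 comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CONSTR_LEN_MAX 42

/* Hypothetical stand-in for a constructor element; it only records
   whether the element is a constant.  */
struct elem
{
  bool is_constant;
};

/* Return true when a*[b,c,...] should be left alone rather than being
   rewritten to [a*b,a*c,...]: every element is a constant and there
   are more than CONSTR_LEN_MAX of them.  */
static bool
skip_combining (const struct elem *elems, size_t n)
{
  size_t n_elem = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!elems[i].is_constant)
	return false;
      n_elem++;
    }
  return n_elem > CONSTR_LEN_MAX;
}

/* Usage example: a constructor of N constants, such as one produced by
   expanding [(i,i=1,N)] at compile time.  */
static bool
skip_for_n_constants (size_t n)
{
  struct elem elems[1000];
  assert (n <= 1000);
  for (size_t i = 0; i < n; i++)
    elems[i].is_constant = true;
  return skip_combining (elems, n);
}
```

With this model, a constructor of 42 constants is still combined, while one of 43 or more (for instance the 1000 elements in the new test) is left as it is.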

2017-10-17  Thomas Koenig

PR fortran/82567
* gfortran.dg/array_constructor_51.f90: New test.
! { dg-do compile }
! { dg-additional-options "-ffrontend-optimize -fdump-tree-original" }
! PR 82567 - long compile times caused by large constant constructors
! multiplied by variables

  SUBROUTINE sub()
  IMPLICIT NONE
  
  INTEGER, PARAMETER :: n = 1000
  REAL, ALLOCATABLE :: x(:)
  REAL :: xc, h
  INTEGER :: i
 
  ALLOCATE( x(n) )
  xc = 100.
  h = xc/n
  x = h*[(i,i=1,n)]
  
end
! { dg-final { scan-tree-dump-times "__var" 0 "original" } }
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 253768)
+++ frontend-passes.c	(Arbeitskopie)
@@ -1635,6 +1635,8 @@ combine_array_constructor (gfc_expr *e)
   gfc_constructor *c, *new_c;
   gfc_constructor_base oldbase, newbase;
   bool scalar_first;
+  int n_elem;
+  bool all_const;
 
   /* Array constructors have rank one.  */
   if (e->rank != 1)
@@ -1674,12 +1676,38 @@ combine_array_constructor (gfc_expr *e)
   if (op2->ts.type == BT_CHARACTER)
 return false;
 
-  scalar = create_var (gfc_copy_expr (op2), "constr");
+  /* This might be an expanded constructor with very many constant values. If
+ we perform the operation here, we might end up with a long compile time,
+ so an arbitrary length bound is in order here.  If the constructor
+ contains something which is not a constant, it did not come from an
+ expansion, so leave it alone.  */
 
+#define CONSTR_LEN_MAX 42
+
   oldbase = op1->value.constructor;
+
+  n_elem = 0;
+  all_const = true;
+  for (c = gfc_constructor_first (oldbase); c; c = gfc_constructor_next(c))
+{
+  if (c->expr->expr_type != EXPR_CONSTANT)
+	{
+	  all_const = false;
+	  break;
+	}
+  n_elem += 1;
+}
+
+  if (all_const && n_elem > CONSTR_LEN_MAX)
+return false;
+
+#undef CONSTR_LEN_MAX
+
   newbase = NULL;
   e->expr_type = EXPR_ARRAY;
 
+  scalar = create_var (gfc_copy_expr (op2), "constr");
+
   for (c = gfc_constructor_first (oldbase); c;
c = gfc_constructor_next (c))
 {

Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-17 Thread Richard Sandiford

Andrew MacLeod  writes:
> On 10/17/2017 08:18 AM, Richard Sandiford wrote:
>> Aldy Hernandez  writes:
>>> Hi folks!
>>>
>>> Calling print_hex() on a widest_int with the most significant bit turned
>>> on can lead to a leading zero being printed (0x0). This produces
>>> confusing dumps to say the least, especially when you incorrectly assume
>>> an integer is NOT signed :).
>> That's the intended behaviour though.  wide_int-based types only use as
>> many HWIs as they need to store their current value, with any other bits
>> in the value being a sign extension of the top bit.  So if the most
>> significant HWI in a widest_int is zero, that HWI is there to say that
>> the previous HWI should be zero- rather than sign-extended.
>>
>> So:
>>
>> 0x0ffffffff  -> (1 << 32) - 1 to infinite precision
>>                 (i.e. a positive value)
>> 0xffffffff   -> -1
>>
>> Thanks,
>> Richard
>
> I for one find this very confusing.  If I have a 128 bit value, I don't 
> expect to see 132 bits.  And there are enough 0's that it's not obvious when 
> I look.

But Aldy was talking about widest_int, which is wider than 128 bits.
It's an approximation of infinite precision.

wide_int is the type to use if you want an N-bit number (for some N).

> I don't think a leading 0 should be printed if "precision" bits have 
> already been printed.

Does 0 get printed in that case though?  Aldy's patch skips an upper
HWI if the upper HWI is zero, but we never have more HWIs than the
number that's being represented.
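
As a rough model of the representation described in this thread, the sketch below stores a value as 64-bit blocks, least significant first, and drops any top block that is merely the sign extension of the block below it.  It is a hypothetical illustration that uses int64_t in place of HOST_WIDE_INT; canonical_len is an invented name and this is not GCC's wide-int code.

```c
#include <assert.h>
#include <stdint.h>

/* Return the number of blocks needed to represent the value held in
   BLOCKS[0..LEN-1] (least significant block first), where all bits
   above the top block are taken to be a sign extension of it.  */
static int
canonical_len (const int64_t *blocks, int len)
{
  assert (len >= 1);
  while (len > 1)
    {
      int64_t top = blocks[len - 1];
      int64_t sign_ext = blocks[len - 2] < 0 ? -1 : 0;
      /* The top block carries information unless it equals the sign
	 extension of the block below it.  */
      if (top != sign_ext)
	break;
      len--;
    }
  return len;
}
```

In this model the value (1 << 64) - 1 needs two blocks, all-ones followed by zero, because dropping the zero block would turn it into -1; the value -1 itself needs only one.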

Thanks,
Richard

Re: [PATCH 4/9] [SFN] introduce statement frontier notes, still disabled

2017-10-17 Thread Alexandre Oliva

On Oct 13, 2017, Richard Biener  wrote:

> If the [SFN] is self-contained you can install that part once the approval
> for the FE parts is in.

It is, so I'll do that.

> You can of course wait a bit for more reviews
> (stopped short on LVU because of that all-targets touching patch ... ;))

:-)

I could minimize the amount of visible changes in target code by using
'&' rather than '*' to make the parameter changeable.  This enables
final_start_function to consume the initial debug insns, so that
e.g. parm bindings are integrated in the initial view.

The per-target changes could also be avoided entirely, by having both
final_start_function and final skip initial debug insns, but I think
that is wasteful, cpu-wise, and error prone in the long term,
maintenance-wise.

Just throwing some options out, in case touching the code of so many
targets turns out to be a blocker...

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[testsuite] UnXFAIL gcc.dg/attr-alloc_size-11.c on Visium

2017-10-17 Thread Eric Botcazou

Tested on visium-elf, applied on the mainline and 7 branch.


2017-10-17  Eric Botcazou  

* gcc.dg/attr-alloc_size-11.c: UnXFAIL for visium-*-*.

-- 
Eric Botcazou

Index: gcc.dg/attr-alloc_size-11.c
===
--- gcc.dg/attr-alloc_size-11.c	(revision 253830)
+++ gcc.dg/attr-alloc_size-11.c	(working copy)
@@ -47,8 +47,8 @@ typedef __SIZE_TYPE__ size_t;
 
 /* The following tests fail because of missing range information.  The xfail
exclusions are PR79356.  */
-TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { aarch64*-*-* arm*-*-* alpha*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } } */
-TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { aarch64*-*-* arm*-*-* alpha*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* } } } } */
+TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { aarch64*-*-* arm*-*-* alpha*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390*-*-* visium-*-* } } } } */
+TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { aarch64*-*-* arm*-*-* alpha*-*-* ia64-*-* mips*-*-* powerpc*-*-* sparc*-*-* s390x-*-* visium-*-* } } } } */
 TEST (int, INT_MIN + 2, ALLOC_MAX);/* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -3, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */

[Visium] Fix build breakage

2017-10-17 Thread Eric Botcazou

The compare-elim.c change broke the build because the pass now sends all kinds 
of junk RTXes to the select_cc_mode target hook, which was written in exact 
keeping with arithmetic patterns of the MD file.  We now need to handle all 
possible RTXes on the RHS of an assignment, even calls.

Tested on visium-elf, applied on the mainline.


2017-10-17  Eric Botcazou  

* config/visium/visium.c (visium_select_cc_mode): Return CCmode for
any RTX present on the RHS of a SET.
* compare-elim.c (try_eliminate_compare): Restore comment.

-- 
Eric Botcazou

Index: config/visium/visium.c
===
--- config/visium/visium.c	(revision 253767)
+++ config/visium/visium.c	(working copy)
@@ -2938,12 +2938,6 @@ visium_select_cc_mode (enum rtx_code cod
   /* This is a btst, the result is in C instead of Z.  */
   return CCCmode;
 
-case CONST_INT:
-  /* This is a degenerate case, typically an uninitialized variable.  */
-  gcc_assert (op0 == constm1_rtx);
-
-  /* ... fall through ... */
-
 case REG:
 case AND:
 case IOR:
@@ -2960,6 +2954,17 @@ visium_select_cc_mode (enum rtx_code cod
 	 when applied to a comparison with zero.  */
   return CCmode;
 
+/* ??? Cater to the junk RTXes sent by try_merge_compare.  */
+case ASM_OPERANDS:
+case CALL:
+case CONST_INT:
+case LO_SUM:
+case HIGH:
+case MEM:
+case UNSPEC:
+case ZERO_EXTEND:
+  return CCmode;
+
 default:
   gcc_unreachable ();
 }
Index: compare-elim.c
===
--- compare-elim.c	(revision 253767)
+++ compare-elim.c	(working copy)
@@ -729,6 +729,7 @@ try_eliminate_compare (struct comparison
   if (try_merge_compare (cmp))
 return true;
 
+  /* We must have found an interesting "clobber" preceding the compare.  */
   if (cmp->prev_clobber == NULL)
 return false;

RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Michael Collison

Are we in agreement that I should revert the patch?

-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com]
Sent: Tuesday, October 17, 2017 1:10 PM
To: Michael Collison; Eric Botcazou
Cc: Jeff Law; GCC Patches; Segher Boessenkool; Kyrill Tkachov; nd
Subject: RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison wrote:
>Richard and Eric,
>
>I see you have objected and indicated the additional cost. Have you 
>quantified how much more expensive the pass is?

DF has known quadratic behavior in memory for certain problems. Not sure
offhand if DU and UD chains fall into this category.

Richard. 

>-----Original Message-----
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Tuesday, October 17, 2017 4:45 AM
>To: Eric Botcazou
>Cc: Jeff Law; GCC Patches; Michael Collison; Segher Boessenkool;
>Kyrill Tkachov; nd
>Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
>
>On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou wrote:
>>> This looks good.  OK for the trunk.
>>
>> FWIW I disagree.  The patch completely shuns the existing
>> implementation of the pass, which is based on a forward scan within
>> basic blocks to identify the various interesting instructions and
>> record them, and uses full-blown def-use and use-def chains instead,
>> which are much more costly to compute.  It's not clear to me why the
>> existing implementation couldn't have been extended.
>>
>> The result is that, for targets for which the pass was initially
>> written, i.e. targets for which most (all) arithmetic instructions
>> clobber the flags, the pass will be slower for absolutely no benefits,
>> as the existing implementation would already have caught all the
>> interesting cases.
>>
>> So it's again a case of a generic change made for a specific target
>> without consideration for other, admittedly less mainstream,
>> targets...
>
>I agree with Eric here.
>
>Richard.
>
>> --
>> Eric Botcazou

Re: Unbreak Ada bootstrap (was Re: [PATCH PR/82546] tree node size)

2017-10-17 Thread Eric Botcazou

> This change broke Ada bootstrap, because the FE doesn't have any tree_size
> langhook, but has one language specific tcc_type tree -
> UNCONSTRAINED_ARRAY_TYPE.

There should be a requirement to test all languages for this kind of change.

> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2017-10-17  Jakub Jelinek  
> 
>   * langhooks.h (struct lang_hooks): Document that tree_size langhook
>   may be also called on tcc_type nodes.
>   * langhooks.c (lhd_tree_size): Likewise.
> 
>   * gcc-interface/misc.c (gnat_tree_size): New function.
>   (LANG_HOOKS_TREE_SIZE): Redefine.

OK, thanks.
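
For readers unfamiliar with the arrangement, the interaction between a default hook that expects never to be called and a front-end override can be sketched in miniature as follows.  Everything here is a hypothetical model: the enum, the node structs, struct hooks and the function names are invented for the example and are not GCC's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tree codes: common codes come first, language-specific
   codes start at NUM_COMMON_CODES.  */
enum code
{
  CODE_COMMON,
  NUM_COMMON_CODES,
  CODE_LANG_TYPE = NUM_COMMON_CODES
};

/* Hypothetical node layouts of different sizes.  */
struct common_node { int code; };
struct lang_type_node { int code; void *extra[4]; };

/* Hypothetical hook table with a single entry.  */
struct hooks
{
  size_t (*tree_size) (enum code);
};

/* Default hook: a front end without language-specific codes never
   expects this to be called.  */
static size_t
default_tree_size (enum code c)
{
  (void) c;
  abort ();
}

/* Front-end override: knows the size of its own language-specific code
   and dies on anything it does not recognize.  */
static size_t
lang_tree_size (enum code c)
{
  assert (c >= NUM_COMMON_CODES);
  switch (c)
    {
    case CODE_LANG_TYPE:
      return sizeof (struct lang_type_node);
    default:
      abort ();
    }
}

static const struct hooks default_hooks = { default_tree_size };
static const struct hooks lang_hooks_demo = { lang_tree_size };

/* Generic code handles the common codes itself and punts to the hook
   for anything language-specific.  */
static size_t
node_size (const struct hooks *h, enum code c)
{
  if (c < NUM_COMMON_CODES)
    return sizeof (struct common_node);
  return h->tree_size (c);
}
```

In this model, calling node_size with default_hooks and CODE_LANG_TYPE would abort, which mirrors the kind of failure reported for the Ada bootstrap; with lang_hooks_demo the same call returns the size of the language-specific node.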

-- 
Eric Botcazou

RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Richard Biener

On October 17, 2017 9:08:31 PM GMT+02:00, Michael Collison wrote:
>Richard and Eric,
>
>I see you have objected and indicated the additional cost. Have you
>quantified how much more expensive the pass is?

DF has known quadratic behavior in memory for certain problems. Not sure
offhand if DU and UD chains fall into this category.

Richard. 

>-----Original Message-----
>From: Richard Biener [mailto:richard.guent...@gmail.com]
>Sent: Tuesday, October 17, 2017 4:45 AM
>To: Eric Botcazou
>Cc: Jeff Law; GCC Patches; Michael Collison; Segher Boessenkool;
>Kyrill Tkachov; nd
>Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops
>
>On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou wrote:
>>> This looks good.  OK for the trunk.
>>
>> FWIW I disagree.  The patch completely shuns the existing
>> implementation of the pass, which is based on a forward scan within
>> basic blocks to identify the various interesting instructions and
>> record them, and uses full-blown def-use and use-def chains instead,
>> which are much more costly to compute.  It's not clear to me why the
>> existing implementation couldn't have been extended.
>>
>> The result is that, for targets for which the pass was initially
>> written, i.e. targets for which most (all) arithmetic instructions
>> clobber the flags, the pass will be slower for absolutely no benefits,
>> as the existing implementation would already have caught all the
>> interesting cases.
>>
>> So it's again a case of a generic change made for a specific target
>> without consideration for other, admittedly less mainstream,
>> targets...
>
>I agree with Eric here.
>
>Richard.
>
>> --
>> Eric Botcazou

Re: Unbreak Ada bootstrap (was Re: [PATCH PR/82546] tree node size)

2017-10-17 Thread Richard Biener

On October 17, 2017 9:29:46 PM GMT+02:00, Jakub Jelinek wrote:
>Hi!
>
>On Fri, Oct 13, 2017 at 02:29:40PM -0400, Nathan Sidwell wrote:
>> [Although I filed this as a middle-end bug, it's really a core infra
>> bug, not sure who the best reviewer is]
>
>> 2017-10-13  Nathan Sidwell  
>> 
>>  PR middle-end/82546
>>  gcc/
>>  * tree.c (tree_code_size): Reformat.  Punt to lang hook for unknown
>>  TYPE nodes.
>
>This change broke Ada bootstrap, because the FE doesn't have any tree_size
>langhook, but has one language specific tcc_type tree -
>UNCONSTRAINED_ARRAY_TYPE.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>trunk?

OK. 

Richard. 

>2017-10-17  Jakub Jelinek  
>
>   * langhooks.h (struct lang_hooks): Document that tree_size langhook
>   may be also called on tcc_type nodes.
>   * langhooks.c (lhd_tree_size): Likewise.
>
>   * gcc-interface/misc.c (gnat_tree_size): New function.
>   (LANG_HOOKS_TREE_SIZE): Redefine.
>
>--- gcc/langhooks.h.jj	2017-09-12 17:20:17.0 +0200
>+++ gcc/langhooks.h	2017-10-17 19:49:29.277324006 +0200
>@@ -307,10 +307,10 @@ struct lang_hooks
>   /* Remove any parts of the tree that are used only by the FE. */
>   void (*free_lang_data) (tree);
> 
>-  /* Determines the size of any language-specific tcc_constant or
>- tcc_exceptional nodes.  Since it is called from make_node, the
>- only information available is the tree code.  Expected to die
>- on unrecognized codes.  */
>+  /* Determines the size of any language-specific tcc_constant,
>+ tcc_exceptional or tcc_type nodes.  Since it is called from
>+ make_node, the only information available is the tree code.
>+ Expected to die on unrecognized codes.  */
>   size_t (*tree_size) (enum tree_code);
> 
>   /* Return the language mask used for converting argv into a sequence
>--- gcc/langhooks.c.jj	2017-05-21 15:46:13.0 +0200
>+++ gcc/langhooks.c	2017-10-17 19:47:13.973960166 +0200
>@@ -266,8 +266,8 @@ lhd_gimplify_expr (tree *expr_p ATTRIBUT
> }
> 
> /* lang_hooks.tree_size: Determine the size of a tree with code C,
>-   which is a language-specific tree code in category tcc_constant or
>-   tcc_exceptional.  The default expects never to be called.  */
>+   which is a language-specific tree code in category tcc_constant,
>+   tcc_exceptional or tcc_type.  The default expects never to be called.  */
> size_t
> lhd_tree_size (enum tree_code c ATTRIBUTE_UNUSED)
> {
>--- gcc/ada/gcc-interface/misc.c.jj	2017-08-31 23:47:18.0 +0200
>+++ gcc/ada/gcc-interface/misc.c	2017-10-17 19:48:39.715923329 +0200
>@@ -343,6 +343,23 @@ internal_error_function (diagnostic_cont
>   Compiler_Abort (sp, sp_loc, true);
> }
> 
>+/* lang_hooks.tree_size: Determine the size of a tree with code C,
>+   which is a language-specific tree code in category tcc_constant,
>+   tcc_exceptional or tcc_type.  The default expects never to be called.  */
>+
>+static size_t
>+gnat_tree_size (enum tree_code code)
>+{
>+  gcc_checking_assert (code >= NUM_TREE_CODES);
>+  switch (code)
>+{
>+case UNCONSTRAINED_ARRAY_TYPE:
>+  return sizeof (tree_type_non_common);
>+default:
>+  gcc_unreachable ();
>+}
>+}
>+
>/* Perform all the initialization steps that are language-specific.  */
> 
> static bool
>@@ -1387,6 +1404,8 @@ get_lang_specific (tree node)
> #define LANG_HOOKS_NAME   "GNU Ada"
> #undef  LANG_HOOKS_IDENTIFIER_SIZE
> #define LANG_HOOKS_IDENTIFIER_SIZEsizeof (struct tree_identifier)
>+#undef  LANG_HOOKS_TREE_SIZE
>+#define LANG_HOOKS_TREE_SIZE  gnat_tree_size
> #undef  LANG_HOOKS_INIT
> #define LANG_HOOKS_INIT   gnat_init
> #undef  LANG_HOOKS_OPTION_LANG_MASK
>
>
>   Jakub

Re: [PATCH] C/C++: more stdlib header hints (PR c/81404)

2017-10-17 Thread Joseph Myers

On Tue, 17 Oct 2017, David Malcolm wrote:

> It also generalizes some of the code for this (and for the "std::"
> namespace hints in the C++ frontend), moving it to a new
> c-family/known-headers.cc and .h, and introducing a class known_headers.
> This currently just works by scanning a hardcoded array of known
> name/header associations, but perhaps in the future could be turned
> into some kind of symbol database so that the compiler could record API
> uses and use that to offer suggestions e.g.
> 
> foo.cc: error: 'myapi::foo' was not declared in this scope
> foo.cc: note: 'myapi::foo' was declared in header 'myapi/private.h'
> (included via 'myapi/public.h') when compiling 'bar.cc'; did you forget to
> '#include "myapi/public.h"'?
> 
> or somesuch.
> 
> In any case, moving this to a class gives an easier way to locate the
> hardcoded knowledge about the stdlib.
> 
> The patch also adds similar code to the C++ frontend covering
> unqualified names in the standard library, so that rather than just

I'd tend to expect hardcoded standard library knowledge, where it relates 
to symbols present for both C and C++, to be in a common c-family file 
(e.g. listing both C and C++ headers for each symbol, with the possibility 
of some symbols only having a header listed for one of C and C++; most C 
symbols would have  and  listed, but some might be different, 
e.g. wchar_t being a keyword in C++ or clog being completely different in 
the two libraries).  That reduces the chance of a symbol being 
gratuitously listed for one language only when such hints make sense for 
it in both languages.
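
As a rough illustration of the shared-table idea above (not the layout used in the patch), a single c-family table could carry one header per language for each symbol. The struct, enum, and function names here are hypothetical; only the symbol/header pairings are taken from the C and C++ standards.

```cpp
#include <cstring>

// Hypothetical table layout, not the one in the patch: each symbol lists the
// header to suggest for C and for C++; nullptr means "no hint for that language".
enum class Lang { C, Cxx };

struct header_hint {
  const char *symbol;
  const char *c_header;    // header to suggest when compiling C
  const char *cxx_header;  // header to suggest when compiling C++
};

static const header_hint hints[] = {
  { "printf",  "<stdio.h>",   "<cstdio>"   },  // same symbol in both languages
  { "wchar_t", "<stddef.h>",  nullptr      },  // keyword in C++, so no hint
  { "clog",    "<complex.h>", "<iostream>" },  // unrelated entities in C and C++
};

// Return the header to suggest for SYMBOL in LANG, or nullptr if none is known.
const char *header_for(const char *symbol, Lang lang) {
  for (const header_hint &h : hints)
    if (std::strcmp(h.symbol, symbol) == 0)
      return lang == Lang::C ? h.c_header : h.cxx_header;
  return nullptr;
}
```

Keeping both columns in one row makes it harder to add a symbol for one language and forget the other.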

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Eric Botcazou

> I see you have objected and indicated the additional cost. Have you
> quantified how much more expensive the pass is?

No, but use-def chains are known to be slow because DF is slow, see e.g. the 
comment located a few lines below the call to try_merge_compare:

  /* ??? This is one point at which one could argue that DF_REF_CHAIN would
 be useful, but it is thought to be too heavy-weight a solution here.  */

Note that the patch breaks e.g. the Visium port, because the pass now sends 
all kinds of junk RTXes to the select_cc_mode target hook, which was written to 
be in exact keeping with the arithmetic patterns of the MD file.  I'm going to 
fix the breakage of course, but this shows that the patch burns a large amount 
of cycles for targets like Visium for no benefit.

Index: config/visium/visium.c
===
--- config/visium/visium.c  (revision 253767)
+++ config/visium/visium.c  (working copy)
@@ -2938,12 +2938,6 @@ visium_select_cc_mode (enum rtx_code cod
   /* This is a btst, the result is in C instead of Z.  */
   return CCCmode;
 
-case CONST_INT:
-  /* This is a degenerate case, typically an uninitialized variable.  */
-  gcc_assert (op0 == constm1_rtx);
-
-  /* ... fall through ... */
-
 case REG:
 case AND:
 case IOR:
@@ -2953,6 +2947,7 @@ visium_select_cc_mode (enum rtx_code cod
 case LSHIFTRT:
 case TRUNCATE:
 case SIGN_EXTEND:
+case ZERO_EXTEND:
   /* Pretend that the flags are set as for a COMPARE with zero.
 That's mostly true, except for the 2 right shift insns that
 will set the C flag.  But the C flag is relevant only for
@@ -2960,6 +2955,16 @@ visium_select_cc_mode (enum rtx_code cod
 when applied to a comparison with zero.  */
   return CCmode;
 
+/* ??? Cater to the junk RTXes sent by try_merge_compare.  */
+case ASM_OPERANDS:
+case CALL:
+case CONST_INT:
+case LO_SUM:
+case HIGH:
+case MEM:
+case UNSPEC:
+  return CCmode;
+
 default:
   gcc_unreachable ();


-- 
Eric Botcazou

Re: [PATCH] c-family: add name_hint/deferred_diagnostic (v2)

2017-10-17 Thread Joseph Myers

The C front-end parts of this patch are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

[v3 PATCH] Deduction guides for associative containers, debug mode deduction guide fixes.

2017-10-17 Thread Ville Voutilainen

Tested on Linux-PPC64. The debug mode fixes have been tested manually
and individually on Linux-x64.
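
For readers unfamiliar with the feature, here is a small sketch of what deduction guides for associative containers allow in C++17. It only uses the standard iterator-range and initializer-list forms; the function name is made up for the example and is not part of the patch or its tests.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Requires C++17 class template argument deduction.
std::size_t deduction_demo() {
  std::vector<std::pair<int, std::string>> src{{1, "one"}, {2, "two"}};

  // Iterator-range guide: key and mapped types come from the iterator's
  // value_type, so this is std::map<int, std::string>.
  std::map m(src.begin(), src.end());
  static_assert(std::is_same_v<decltype(m), std::map<int, std::string>>);

  // Initializer-list guide: deduces std::set<int>.
  std::set s{3, 1, 2, 3};
  static_assert(std::is_same_v<decltype(s), std::set<int>>);

  return m.size() + s.size();  // 2 map entries + 3 unique set elements
}
```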

2017-10-17  Ville Voutilainen  

Deduction guides for associative containers, debug mode deduction
guide fixes.
* include/bits/stl_algobase.h (__iter_key_t)
(__iter_val_t, __iter_to_alloc_t): New.
* include/bits/stl_map.h: Add deduction guides.
* include/bits/stl_multimap.h: Likewise.
* include/bits/stl_multiset.h: Likewise.
* include/bits/stl_set.h: Likewise.
* include/bits/unordered_map.h: Likewise.
* include/bits/unordered_set.h: Likewise.
* include/debug/deque: Likewise.
* include/debug/forward_list: Likewise.
* include/debug/list: Likewise.
* include/debug/map.h: Likewise.
* include/debug/multimap.h: Likewise.
* include/debug/multiset.h: Likewise.
* include/debug/set.h: Likewise.
* include/debug/unordered_map: Likewise.
* include/debug/unordered_set: Likewise.
* include/debug/vector: Likewise.
* testsuite/23_containers/map/cons/deduction.cc: New.
* testsuite/23_containers/multimap/cons/deduction.cc: Likewise.
* testsuite/23_containers/multiset/cons/deduction.cc: Likewise.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_map/cons/deduction.cc: Likewise.
* testsuite/23_containers/unordered_multimap/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/cons/deduction.cc:
Likewise.
* testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise.


deduction_guidos.diff.gz
Description: GNU Zip compressed data

[committed] Simplify format_warning_at_substring API

2017-10-17 Thread David Malcolm

The format_warning_at_substring API has a rather clunky way of indicating
the location of the pertinent param (if any): a source_range * is passed
in, which can be NULL.  Doing so requires extracting a range from the
location_t and passing around a pointer to it, or NULL, as needed.

This patch simplifies things by eliminating the source_range * in
favor of a location_t, with UNKNOWN_LOCATION used to signify that
no param location is available.
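
The shape of the change can be sketched with toy types. The names below echo GCC's, but the definitions are stand-ins written for this example (in particular, the toy sentinel value is an assumption of the sketch), so treat it as an illustration of the calling convention rather than the real API.

```cpp
#include <string>

// Toy stand-ins; these definitions are assumptions, not GCC's.
typedef unsigned int toy_location_t;
const toy_location_t TOY_UNKNOWN_LOCATION = 0;

struct toy_source_range { toy_location_t start, finish; };

// Before: callers build a range and pass a pointer to it, or pass null.
std::string describe_old(const toy_source_range *param_range) {
  if (!param_range) return "no param location";
  return "param at " + std::to_string(param_range->start);
}

// After: callers pass the location by value, with a sentinel for "none".
std::string describe_new(toy_location_t param_loc) {
  if (param_loc == TOY_UNKNOWN_LOCATION) return "no param location";
  return "param at " + std::to_string(param_loc);
}
```

With the by-value form, a caller no longer needs a local range object plus a separate pointer that is conditionally set to its address.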

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r253827.

gcc/c-family/ChangeLog:
* c-format.c (format_warning_at_char): Pass UNKNOWN_LOCATION
rather than NULL to format_warning_va.
(check_format_types): Likewise when calling format_type_warning.
Remove code to extract source_ranges and source_range * in favor
of just a location_t.
(format_type_warning): Convert source_range * param to a
location_t.

gcc/ChangeLog:
* gimple-ssa-sprintf.c (fmtwarn): Update for changed signature of
format_warning_at_substring.
(maybe_warn): Convert source_range * param to a location_t.  Pass
UNKNOWN_LOCATION rather than NULL to fmtwarn.
(format_directive): Remove code to extract source_ranges and
source_range * in favor of just a location_t.
(parse_directive): Pass UNKNOWN_LOCATION rather than NULL to
fmtwarn.
* substring-locations.c (format_warning_va): Convert
source_range * param to a location_t.
(format_warning_at_substring): Likewise.
* substring-locations.h (format_warning_va): Likewise.
(format_warning_at_substring): Likewise.
---
 gcc/c-family/c-format.c   | 45 ---
 gcc/gimple-ssa-sprintf.c  | 60 +--
 gcc/substring-locations.c | 17 +-
 gcc/substring-locations.h |  4 ++--
 4 files changed, 55 insertions(+), 71 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 0dba979..164d035 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -97,7 +97,8 @@ format_warning_at_char (location_t fmt_string_loc, tree 
format_string_cst,
 
   substring_loc fmt_loc (fmt_string_loc, string_type, char_idx, char_idx,
 char_idx);
-  bool warned = format_warning_va (fmt_loc, NULL, NULL, opt, gmsgid, &ap);
+  bool warned = format_warning_va (fmt_loc, UNKNOWN_LOCATION, NULL, opt,
+  gmsgid, &ap);
   va_end (ap);
 
   return warned;
@@ -1039,7 +1040,7 @@ static void check_format_types (const substring_loc &fmt_loc,
char conversion_char,
vec<location_t> *arglocs);
 static void format_type_warning (const substring_loc &fmt_loc,
-source_range *param_range,
+location_t param_loc,
 format_wanted_type *, tree,
 tree,
 const format_kind_info *fki,
@@ -3073,8 +3074,9 @@ check_format_types (const substring_loc &fmt_loc,
   cur_param = types->param;
   if (!cur_param)
 {
- format_type_warning (fmt_loc, NULL, types, wanted_type, NULL, fki,
-  offset_to_type_start, conversion_char);
+ format_type_warning (fmt_loc, UNKNOWN_LOCATION, types, wanted_type,
+  NULL, fki, offset_to_type_start,
+  conversion_char);
   continue;
 }
 
@@ -3084,23 +3086,15 @@ check_format_types (const substring_loc &fmt_loc,
   orig_cur_type = cur_type;
   char_type_flag = 0;
 
-  source_range param_range;
-  source_range *param_range_ptr;
+  location_t param_loc = UNKNOWN_LOCATION;
   if (EXPR_HAS_LOCATION (cur_param))
-   {
- param_range = EXPR_LOCATION_RANGE (cur_param);
- param_range_ptr = &param_range;
-   }
+   param_loc = EXPR_LOCATION (cur_param);
   else if (arglocs)
{
  /* arg_num is 1-based.  */
  gcc_assert (types->arg_num > 0);
- location_t param_loc = (*arglocs)[types->arg_num - 1];
- param_range = get_range_from_loc (line_table, param_loc);
- param_range_ptr = &param_range;
+ param_loc = (*arglocs)[types->arg_num - 1];
}
-  else
-   param_range_ptr = NULL;
 
   STRIP_NOPS (cur_param);
 
@@ -3166,7 +3160,7 @@ check_format_types (const substring_loc &fmt_loc,
}
  else
{
- format_type_warning (fmt_loc, param_range_ptr,
+ format_type_warning (fmt_loc, param_loc,
   types, wanted_type, orig_cur_type, fki,
   offset_to_type_start, conversion_char);
  break;
@@ -3236,7 +3230,7 @@ check_format_types (const substring_loc &fmt_loc,
  && TYPE_PRECISION (cur_type) == TYPE_PRECISION

Re: [PATCH] Update -ffunction/data-sections documentation

2017-10-17 Thread Sandra Loosemore


On 10/15/2017 11:59 PM, Sebastian Huber wrote:

gcc/
* invoke.texi (ffunction-sections and fdata-sections): Update.
---
  gcc/doc/invoke.texi | 32 
  1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e7dfb33c31..7bc051a1fc5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9712,18 +9712,26 @@ file if the target supports arbitrary sections.  The 
name of the
  function or the name of the data item determines the section's name
  in the output file.

-Use these options on systems where the linker can perform optimizations
-to improve locality of reference in the instruction space.  Most systems
-using the ELF object format and SPARC processors running Solaris 2 have
-linkers with such optimizations.  AIX may have these optimizations in
-the future.
-
-Only use these options when there are significant benefits from doing
-so.  When you specify these options, the assembler and linker
-create larger object and executable files and are also slower.
-You cannot use @command{gprof} on all systems if you
-specify this option, and you may have problems with debugging if
-you specify both this option and @option{-g}.
+Use these options on systems where the linker can perform optimizations to
+improve locality of reference in the instruction space.  Most systems using the
+ELF object format have linkers with such optimizations.  On AIX, the linker
+rearranges sections (CSECTs) based on the call graph.  The performance impact
+varies.
+
+Together with a linker garbage collection (linker @option{--gc-sections}
+option) these options may lead to smaller statically linked executables (after


statically-linked


+stripping).
+
+On ELF/DWARF systems these options do not degenerate the quality of the debug
+information.  There could be issues with other object files/debug info formats.
+
+Only use these options when there are significant benefits from doing so.  When
+you specify these options, the assembler and linker create larger object and
+executable files and are also slower.  These options affect code generation.
+They prevent optimzations by the compiler and assembler using relative


optimizations


+locations inside a translation unit since the locations are unknown until
+link-time.


link time


+An examples for such an optimization is a call to short call
+relaxation.


I'd rewrite the last sentence as

An example of such an optimization is relaxing calls to short call 
instructions.


I think the patch is OK with those nits fixed.

-Sandra

[RFC PATCH] Add -fsanitize=noreturn support

2017-10-17 Thread Jakub Jelinek

Hi!

While we have a warning for falling through out of a noreturn function
or for a return statement in such a function, the actual UB occurs only
if we actually return from such a function at run time.

This patch attempts to instrument it.  Will need to submit the libsanitizer
part upstream first though.
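
To make the distinction concrete, here is a minimal sketch of a function that is declared noreturn but has a path that falls off the end. The function names are invented for this example. GCC normally warns about such a function at compile time, but nothing goes wrong at run time unless the returning path is actually taken, which is the event the proposed sanitizer is meant to report.

```cpp
#include <stdexcept>

// Declared noreturn, but only the throwing path really never returns.
[[noreturn]] void fatal_if(bool fatal) {
  if (fatal)
    throw std::runtime_error("fatal");
  // Falling off the end here returns from a noreturn function, which is
  // undefined behavior if this path is ever executed.
}

// Helper that exercises only the well-defined (throwing) path.
bool throws_when_fatal() {
  try {
    fatal_if(true);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

Calling fatal_if(false) is the case that would need run-time instrumentation; the sketch deliberately never does so.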

2017-10-17  Jakub Jelinek  

* flag-types.h (enum sanitize_code): Add SANITIZE_NORETURN.  Or
SANITIZE_NORETURN into SANITIZE_UNDEFINED.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_NORETURN): New builtin.
* common.opt (flag_sanitize_recover): Don't set SANITIZE_NORETURN.
* opts.c (finish_options): Don't clear aggressive loop opts
just for SANITIZE_RETURN or SANITIZE_NORETURN.
(sanitizer_opts): Add noreturn.
(parse_sanitizer_options, common_handle_option): Handle
SANITIZE_NORETURN like other non-recoverable sanitizers.
* ubsan.c (instrument_noreturn): New function.
(pass_ubsan::execute): Call it.
(pass_ubsan::gate): Enable even for SANITIZE_NORETURN.
* doc/invoke.texi: Document -fsanitize=noreturn.

* c-c++-common/ubsan/noreturn-1.c: New test.
* c-c++-common/ubsan/noreturn-2.c: New test.

* ubsan/ubsan_handlers.h (NoreturnData): New type.
(noreturn): New UNRECOVERABLE handler.
* ubsan/ubsan_handlers.cc (handleNoreturn): New function.
(__ubsan::__ubsan_handle_noreturn): Likewise.
* ubsan/ubsan_checks.inc (InvalidNoreturn): New UBSAN_CHECK.

--- gcc/flag-types.h.jj 2017-10-17 11:08:00.0 +0200
+++ gcc/flag-types.h2017-10-17 13:47:25.905757381 +0200
@@ -247,6 +247,7 @@ enum sanitize_code {
   SANITIZE_BOUNDS_STRICT = 1UL << 23,
   SANITIZE_POINTER_OVERFLOW = 1UL << 24,
   SANITIZE_BUILTIN = 1UL << 25,
+  SANITIZE_NORETURN = 1UL << 26,
   SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
   SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
   | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
@@ -255,7 +256,8 @@ enum sanitize_code {
   | SANITIZE_NONNULL_ATTRIBUTE
   | SANITIZE_RETURNS_NONNULL_ATTRIBUTE
   | SANITIZE_OBJECT_SIZE | SANITIZE_VPTR
-  | SANITIZE_POINTER_OVERFLOW | SANITIZE_BUILTIN,
+  | SANITIZE_POINTER_OVERFLOW | SANITIZE_BUILTIN
+  | SANITIZE_NORETURN,
   SANITIZE_UNDEFINED_NONDEFAULT = SANITIZE_FLOAT_DIVIDE | SANITIZE_FLOAT_CAST
  | SANITIZE_BOUNDS_STRICT
 };
--- gcc/sanitizer.def.jj2017-10-17 11:03:15.0 +0200
+++ gcc/sanitizer.def   2017-10-17 14:51:16.157455132 +0200
@@ -532,6 +532,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN
  "__ubsan_handle_invalid_builtin_abort",
  BT_FN_VOID_PTR,
  ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_NORETURN,
+ "__ubsan_handle_noreturn",
+ BT_FN_VOID_PTR_PTR,
+ ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DYNAMIC_TYPE_CACHE_MISS,
  "__ubsan_handle_dynamic_type_cache_miss",
  BT_FN_VOID_PTR_PTR_PTR,
--- gcc/common.opt.jj   2017-10-12 20:51:36.0 +0200
+++ gcc/common.opt  2017-10-17 15:16:34.945668469 +0200
@@ -231,7 +231,7 @@ unsigned int flag_sanitize
 
 ; What sanitizers should recover from errors
 Variable
-unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | 
SANITIZE_UNDEFINED_NONDEFAULT | SANITIZE_KERNEL_ADDRESS) & 
~(SANITIZE_UNREACHABLE | SANITIZE_RETURN)
+unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | 
SANITIZE_UNDEFINED_NONDEFAULT | SANITIZE_KERNEL_ADDRESS) & 
~(SANITIZE_UNREACHABLE | SANITIZE_RETURN | SANITIZE_NORETURN)
 
 ; What the coverage sanitizers should instrument
 Variable
--- gcc/opts.c.jj   2017-10-17 11:08:42.0 +0200
+++ gcc/opts.c  2017-10-17 14:49:34.716638724 +0200
@@ -985,7 +985,8 @@ finish_options (struct gcc_options *opts
 opts->x_flag_delete_null_pointer_checks = 0;
 
   /* Aggressive compiler optimizations may cause false negatives.  */
-  if (opts->x_flag_sanitize & ~(SANITIZE_LEAK | SANITIZE_UNREACHABLE))
+  if (opts->x_flag_sanitize & ~(SANITIZE_LEAK | SANITIZE_UNREACHABLE
+   | SANITIZE_RETURN | SANITIZE_NORETURN))
 opts->x_flag_aggressive_loop_optimizations = 0;
 
   /* Enable -fsanitize-address-use-after-scope if address sanitizer is
@@ -1522,6 +1523,7 @@ const struct sanitizer_opts_s sanitizer_
   SANITIZER_OPT (vptr, SANITIZE_VPTR, true),
   SANITIZER_OPT (pointer-overflow, SANITIZE_POINTER_OVERFLOW, true),
   SANITIZER_OPT (builtin, SANITIZE_BUILTIN, true),
+  SANITIZER_OPT (noreturn, SANITIZE_NORETURN, false),
   SANITIZER_OPT (all, ~0U, true),
 #undef SANITIZER_OPT
   { NULL, 0U, 0UL, false }
@@ -1646,7 +1648,8 @@

[PATCH] Add -fsanitize=builtin support

2017-10-17 Thread Jakub Jelinek

Hi!

On Mon, Oct 16, 2017 at 08:52:50PM +0200, Jakub Jelinek wrote:
> The following patch is an attempt at libsanitizer merge from upstream.
> Sadly libubsan has several ABI incompatible changes, dunno if we should
> fight the mess and re-add backward compatibility back, or as the patch
> does just bump soname, upstream clearly doesn't care about ABI compatibility
> at all.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, it would be nice to
> get it tested on other targets (e.g. darwin, that breaks almost all the time
> when doing merges, but no access to that).
> 
> Included is just the non-libsanitizer/ part plus GCC owned file changes
> in libsanitizer (except regenerated ones), attached is bzip2ed full patch.
> 
> Thoughts on this?

On top of that patch, which apparently added code to libubsan to detect
UB in __builtin_c[tl]z* arguments, this patch adds the corresponding
instrumentation on the GCC side.
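
As background, the result of __builtin_clz and __builtin_ctz is undefined when the argument is zero, which is what the new check reports at run time. The wrapper below is a hand-written sketch of the same guard; the name safe_clz and the choice of returning the full bit width for zero are assumptions of this example, not something the patch provides.

```cpp
#include <limits>

// __builtin_clz is a GCC/Clang builtin whose result is undefined for 0.
// This wrapper picks a convention for 0 (the full bit width); that
// convention is an assumption of this sketch.
int safe_clz(unsigned int x) {
  if (x == 0)
    return std::numeric_limits<unsigned int>::digits;
  return __builtin_clz(x);
}
```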

Bootstrapped/regtested on x86_64-linux and i686-linux.

2017-10-17  Jakub Jelinek  

* flag-types.h (enum sanitize_code): Add SANITIZE_BUILTIN.  Or
SANITIZE_BUILTIN into SANITIZE_UNDEFINED.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_INVALID_BUILTIN,
BUILT_IN_UBSAN_HANDLE_INVALID_BUILTIN_ABORT): New builtins.
* opts.c (sanitizer_opts): Add builtin.
* ubsan.c (instrument_builtin): New function.
(pass_ubsan::execute): Call it.
(pass_ubsan::gate): Enable even for SANITIZE_BUILTIN.
* doc/invoke.texi: Document -fsanitize=builtin.

* c-c++-common/ubsan/builtin-1.c: New test.

--- gcc/flag-types.h.jj 2017-09-20 10:46:07.0 +0200
+++ gcc/flag-types.h2017-10-17 11:08:00.011686944 +0200
@@ -246,6 +246,7 @@ enum sanitize_code {
   SANITIZE_VPTR = 1UL << 22,
   SANITIZE_BOUNDS_STRICT = 1UL << 23,
   SANITIZE_POINTER_OVERFLOW = 1UL << 24,
+  SANITIZE_BUILTIN = 1UL << 25,
   SANITIZE_SHIFT = SANITIZE_SHIFT_BASE | SANITIZE_SHIFT_EXPONENT,
   SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE
   | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN
@@ -254,7 +255,7 @@ enum sanitize_code {
   | SANITIZE_NONNULL_ATTRIBUTE
   | SANITIZE_RETURNS_NONNULL_ATTRIBUTE
   | SANITIZE_OBJECT_SIZE | SANITIZE_VPTR
-  | SANITIZE_POINTER_OVERFLOW,
+  | SANITIZE_POINTER_OVERFLOW | SANITIZE_BUILTIN,
   SANITIZE_UNDEFINED_NONDEFAULT = SANITIZE_FLOAT_DIVIDE | SANITIZE_FLOAT_CAST
  | SANITIZE_BOUNDS_STRICT
 };
--- gcc/sanitizer.def.jj2017-10-17 10:10:27.0 +0200
+++ gcc/sanitizer.def   2017-10-17 11:03:15.502236152 +0200
@@ -524,6 +524,14 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN
  "__ubsan_handle_nonnull_return_v1_abort",
  BT_FN_VOID_PTR_PTR,
  ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_INVALID_BUILTIN,
+ "__ubsan_handle_invalid_builtin",
+ BT_FN_VOID_PTR,
+ ATTR_COLD_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_INVALID_BUILTIN_ABORT,
+ "__ubsan_handle_invalid_builtin_abort",
+ BT_FN_VOID_PTR,
+ ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DYNAMIC_TYPE_CACHE_MISS,
  "__ubsan_handle_dynamic_type_cache_miss",
  BT_FN_VOID_PTR_PTR_PTR,
--- gcc/opts.c.jj   2017-10-11 22:37:53.0 +0200
+++ gcc/opts.c  2017-10-17 11:08:42.055162459 +0200
@@ -1521,6 +1521,7 @@ const struct sanitizer_opts_s sanitizer_
   SANITIZER_OPT (object-size, SANITIZE_OBJECT_SIZE, true),
   SANITIZER_OPT (vptr, SANITIZE_VPTR, true),
   SANITIZER_OPT (pointer-overflow, SANITIZE_POINTER_OVERFLOW, true),
+  SANITIZER_OPT (builtin, SANITIZE_BUILTIN, true),
   SANITIZER_OPT (all, ~0U, true),
 #undef SANITIZER_OPT
   { NULL, 0U, 0UL, false }
--- gcc/ubsan.c.jj  2017-10-17 10:10:27.0 +0200
+++ gcc/ubsan.c 2017-10-17 13:31:54.051193236 +0200
@@ -2221,6 +2221,72 @@ instrument_object_size (gimple_stmt_iter
   gsi_insert_before (gsi, g, GSI_SAME_STMT);
 }
 
+/* Instrument values passed to builtin functions.  */
+
+static void
+instrument_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  location_t loc = gimple_location (stmt);
+  tree arg;
+  enum built_in_function fcode
+= DECL_FUNCTION_CODE (gimple_call_fndecl (stmt));
+  int kind = 0;
+  switch (fcode)
+{
+CASE_INT_FN (BUILT_IN_CLZ):
+  kind = 1;
+  gcc_fallthrough ();
+CASE_INT_FN (BUILT_IN_CTZ):
+  arg = gimple_call_arg (stmt, 0);
+  if (!integer_nonzerop (arg))
+   {
+ gimple *g;
+ if (!is_gimple_val (arg))
+   {
+ g = gimple_build_assign (make_ssa_name (TREE_TYPE (arg)), arg);
+ gimple_set_location (g, loc);

Unbreak Ada bootstrap (was Re: [PATCH PR/82546] tree node size)

2017-10-17 Thread Jakub Jelinek

Hi!

On Fri, Oct 13, 2017 at 02:29:40PM -0400, Nathan Sidwell wrote:
> [Although I filed this as a middle-end bug, it's really a core infra bug,
> not sure who the best reviewer is]

> 2017-10-13  Nathan Sidwell  
> 
>   PR middle-end/82546
>   gcc/
>   * tree.c (tree_code_size): Reformat.  Punt to lang hook for unknown
>   TYPE nodes.

This change broke the Ada bootstrap, because the Ada FE doesn't define a
tree_size langhook, but does have one language-specific tcc_type tree code:
UNCONSTRAINED_ARRAY_TYPE.
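
The failure mode can be modelled in a few lines. Everything in this sketch (the enum, the sizes, the hook variable) is a toy stand-in invented for illustration; it only mirrors the pattern of generic code punting unknown codes to a per-language hook whose default expects never to be called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy model; the codes, sizes, and hook variable are illustrative assumptions.
enum toy_code { TOY_INTEGER_TYPE, TOY_NUM_CODES, TOY_LANG_ARRAY_TYPE };

// Default hook: a front end without language-specific codes never gets here.
std::size_t default_tree_size(toy_code) { std::abort(); }

// Front-end hook: knows the size of its own codes, dies on anything else.
std::size_t lang_tree_size(toy_code code) {
  assert(code >= TOY_NUM_CODES);
  switch (code) {
    case TOY_LANG_ARRAY_TYPE: return 64;
    default: std::abort();
  }
}

std::size_t (*tree_size_hook)(toy_code) = default_tree_size;

// Generic code sizes the common codes itself and punts the rest to the hook.
std::size_t toy_code_size(toy_code code) {
  if (code < TOY_NUM_CODES) return 32;
  return tree_size_hook(code);
}
```

A front end that has a language-specific code but leaves the default hook installed would abort on the first such node, which is analogous to the bootstrap failure described above.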

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-10-17  Jakub Jelinek  

* langhooks.h (struct lang_hooks): Document that tree_size langhook
may be also called on tcc_type nodes.
* langhooks.c (lhd_tree_size): Likewise.

* gcc-interface/misc.c (gnat_tree_size): New function.
(LANG_HOOKS_TREE_SIZE): Redefine.

--- gcc/langhooks.h.jj  2017-09-12 17:20:17.0 +0200
+++ gcc/langhooks.h 2017-10-17 19:49:29.277324006 +0200
@@ -307,10 +307,10 @@ struct lang_hooks
   /* Remove any parts of the tree that are used only by the FE. */
   void (*free_lang_data) (tree);
 
-  /* Determines the size of any language-specific tcc_constant or
- tcc_exceptional nodes.  Since it is called from make_node, the
- only information available is the tree code.  Expected to die
- on unrecognized codes.  */
+  /* Determines the size of any language-specific tcc_constant,
+ tcc_exceptional or tcc_type nodes.  Since it is called from
+ make_node, the only information available is the tree code.
+ Expected to die on unrecognized codes.  */
   size_t (*tree_size) (enum tree_code);
 
   /* Return the language mask used for converting argv into a sequence
--- gcc/langhooks.c.jj  2017-05-21 15:46:13.0 +0200
+++ gcc/langhooks.c 2017-10-17 19:47:13.973960166 +0200
@@ -266,8 +266,8 @@ lhd_gimplify_expr (tree *expr_p ATTRIBUT
 }
 
 /* lang_hooks.tree_size: Determine the size of a tree with code C,
-   which is a language-specific tree code in category tcc_constant or
-   tcc_exceptional.  The default expects never to be called.  */
+   which is a language-specific tree code in category tcc_constant,
+   tcc_exceptional or tcc_type.  The default expects never to be called.  */
 size_t
 lhd_tree_size (enum tree_code c ATTRIBUTE_UNUSED)
 {
--- gcc/ada/gcc-interface/misc.c.jj 2017-08-31 23:47:18.0 +0200
+++ gcc/ada/gcc-interface/misc.c2017-10-17 19:48:39.715923329 +0200
@@ -343,6 +343,23 @@ internal_error_function (diagnostic_cont
   Compiler_Abort (sp, sp_loc, true);
 }
 
+/* lang_hooks.tree_size: Determine the size of a tree with code C,
+   which is a language-specific tree code in category tcc_constant,
+   tcc_exceptional or tcc_type.  The default expects never to be called.  */
+
+static size_t
+gnat_tree_size (enum tree_code code)
+{
+  gcc_checking_assert (code >= NUM_TREE_CODES);
+  switch (code)
+{
+case UNCONSTRAINED_ARRAY_TYPE:
+  return sizeof (tree_type_non_common);
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Perform all the initialization steps that are language-specific.  */
 
 static bool
@@ -1387,6 +1404,8 @@ get_lang_specific (tree node)
 #define LANG_HOOKS_NAME"GNU Ada"
 #undef  LANG_HOOKS_IDENTIFIER_SIZE
 #define LANG_HOOKS_IDENTIFIER_SIZE sizeof (struct tree_identifier)
+#undef  LANG_HOOKS_TREE_SIZE
+#define LANG_HOOKS_TREE_SIZE   gnat_tree_size
 #undef  LANG_HOOKS_INIT
 #define LANG_HOOKS_INITgnat_init
 #undef  LANG_HOOKS_OPTION_LANG_MASK


Jakub

RE: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Michael Collison

Richard and Eric,

I see you have objected and indicated the additional cost. Have you quantified 
how much more expensive the pass is?

-----Original Message-----
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Tuesday, October 17, 2017 4:45 AM
To: Eric Botcazou 
Cc: Jeff Law ; GCC Patches ; Michael 
Collison ; Segher Boessenkool 
; Kyrill Tkachov ; nd 

Subject: Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou  wrote:
>> This looks good.  OK for the trunk.
>
> FWIW I disagree.  The patch completely shuns the existing 
> implementation of the pass, which is based on a forward scan within 
> basic blocks to identify the various interesting instructions and 
> record them, and uses full-blown def-use and use-def chains instead, 
> which are much more costly to compute.  It's not clear to me why the existing 
> implementation couldn't have been extended.
>
> The result is that, for targets for which the pass was initially written, i.e.
> targets for which most (all) arithmetic instructions clobber the 
> flags, the pass will be slower for absolutely no benefits, as the 
> existing implementation would already have caught all the interesting cases.
>
> So it's again a case of a generic change made for a specific target 
> without consideration for other, admittedly less mainstream, targets...

I agree with Eric here.

Richard.

> --
> Eric Botcazou

[Patch, fortran] PR82550 - program using submodules fails to link

2017-10-17 Thread Paul Richard Thomas

The attached patch has a comment that explains what is going on.

Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 7-branch?

Paul

2017-10-17  Paul Thomas  

PR fortran/82550
* expr.c (gfc_check_pointer_assign): A use associated procedure
target in a submodule must have the 'use_assoc' attribute set
so that the name mangling is done correctly.

2017-10-17  Paul Thomas  

PR fortran/82550
* gfortran.dg/submodule_30.f08 : New test.


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
Index: gcc/fortran/expr.c
===
*** gcc/fortran/expr.c  (revision 253748)
--- gcc/fortran/expr.c  (working copy)
*** gfc_check_pointer_assign (gfc_expr *lval
*** 3632,3637 
--- 3632,3645 
  name = s2->name;
}
  
+   /* Make the procedure use associated so that the middle end does
+the right thing with name mangling. This undoes the reset in
+parse.c(set_syms_host_assoc) and is necessary to allow the
+attributes of module procedure interfaces to be changed.  */
+   if (s2 && s2->attr.flavor == FL_PROCEDURE
+ && s2->module && s2->attr.used_in_submodule)
+   s2->attr.use_assoc = 1;
+ 
if (s2 && s2->attr.proc_pointer && s2->ts.interface)
s2 = s2->ts.interface;
  
Index: gcc/testsuite/gfortran.dg/submodule_30.f08
===
*** gcc/testsuite/gfortran.dg/submodule_30.f08  (nonexistent)
--- gcc/testsuite/gfortran.dg/submodule_30.f08  (working copy)
***
*** 0 
--- 1,42 
+ ! { dg-do run }
+ !
+ ! Test the fix for PR82550 in which the reference to 'p' in 'foo'
+ ! was not being correctly handled.
+ !
+ ! Contributed by Reinhold Bader  
+ !
+ module m_subm_18_pos
+   implicit none
+   integer :: i = 0
+   interface
+ module subroutine foo(fun_ptr)
+   procedure(p), pointer, intent(out) :: fun_ptr
+ end subroutine
+   end interface
+ contains
+   subroutine p()
+ i = 1
+   end subroutine p
+ end module m_subm_18_pos
+ submodule (m_subm_18_pos) subm_18_pos
+ implicit none
+ contains
+ module subroutine foo(fun_ptr)
+   procedure(p), pointer, intent(out) :: fun_ptr
+   fun_ptr => p
+ end subroutine
+ end submodule
+ program p_18_pos
+   use m_subm_18_pos
+   implicit none
+   procedure(), pointer :: x
+   call foo(x)
+   call x()
+   if (i == 1) then
+  write(*,*) 'OK'
+   else
+  write(*,*) 'FAIL'
+  call abort
+   end if
+ end program p_18_pos
+

[PATCH, rs6000] 1/2 Add x86 SSE2 <emmintrin.h> intrinsics to GCC PPC64LE target

2017-10-17 Thread Steven Munroe

This is the fourth major contribution of X86 intrinsic equivalent
headers for PPC64LE.

X86 SSE2 technology adds double float (__m128d) support, fills in a
number of 128-bit vector integer (__m128i) operations, and adds some MMX
conversions to and from 128-bit vector (XMM) operations.

In general the SSE2 (__m128) intrinsics are a good match to the
PowerISA VSX 128-bit vector double facilities. This allows direct
mapping of the __m128d type to PowerPC __vector double type and allows
natural handling of parameter passing, return values, and SIMD double
operations.

However, while both ISAs support double and float scalars in vector
registers, X86_64 and PowerPC64LE use different formats (and bits
within the vector register) for floating point scalars. This requires
extra PowerISA operations to exactly match the X86 SSE scalar (intrinsic
functions ending in *_sd) semantics. The intent is to provide a
functionally correct implementation at some reduction in performance.
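
As a reminder of what the *_sd semantics require, the sketch below models a two-lane double vector in plain C++ and contrasts a packed add with a scalar add. The type and function names are invented for this example and are not the intrinsics themselves; the modelled behavior is that the scalar form operates on the low element and copies the high element from the first operand.

```cpp
// Toy two-lane vector; lane[0] stands for the low (scalar) element.
struct toy_m128d { double lane[2]; };

// Packed add (*_pd style): both lanes are added.
toy_m128d toy_add_pd(toy_m128d a, toy_m128d b) {
  return { { a.lane[0] + b.lane[0], a.lane[1] + b.lane[1] } };
}

// Scalar add (*_sd style): only the low lane is added; the high lane is
// copied unchanged from the first operand.
toy_m128d toy_add_sd(toy_m128d a, toy_m128d b) {
  return { { a.lane[0] + b.lane[0], a.lane[1] } };
}
```

Preserving the untouched high lane is the kind of detail that can cost extra operations on a target whose scalar format differs.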

Some inline assembler is required. There are several cases where we need
to generate Data Cache Block instructions. There are no existing builtins
for flush and touch for store transient.  Also, some of the conversions
between double and 32-bit float or int required assembler to get the correct
semantics at reasonable cost. Perhaps these can be revisited when the team
completes the builtins for vec_double* and vec_float*.

Part 2 adds the associated 131 DG test cases.

./gcc/ChangeLog:

2017-10-17  Steven Munroe  

* config.gcc (powerpc*-*-*): Add emmintrin.h.
* config/rs6000/emmintrin.h: New file.
* config/rs6000/x86intrin.h [__ALTIVEC__]: Include emmintrin.h.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 253786)
+++ gcc/config.gcc  (working copy)
@@ -459,7 +459,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
-   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h"
+   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
si2vmx.h"
extra_headers="${extra_headers} paired.h"
Index: gcc/config/rs6000/x86intrin.h
===
--- gcc/config/rs6000/x86intrin.h   (revision 253786)
+++ gcc/config/rs6000/x86intrin.h   (working copy)
@@ -39,6 +39,8 @@
 #include 
 
 #include 
+
+#include 
 #endif /* __ALTIVEC__ */
 
 #include 
Index: gcc/config/rs6000/emmintrin.h
===
--- gcc/config/rs6000/emmintrin.h   (revision 0)
+++ gcc/config/rs6000/emmintrin.h   (revision 0)
@@ -0,0 +1,2413 @@
+/* Copyright (C) 2003-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 SSE2 (__m128i, __m128d) intrinsics,
+   the PowerPC VMX/VSX ISA is a good match for vector double SIMD
+   operations.  However scalar double operations in vector (XMM)
+   registers require the POWER8 VSX ISA (2.07) level. Also there are
+   important differences for data format and placement of double
+   scalars in the vector register.
+
+   For PowerISA Scalar double in FPRs

Re: [PATCH, rs6000] Add Power 8 support to vec_revb

2017-10-17 Thread Segher Boessenkool

Hi Carl,

On Tue, Oct 17, 2017 at 09:56:43AM -0700, Carl Love wrote:
> gcc/ChangeLog:
> 
> 2017-10-17  Carl Love  
> 
>   * config/rs6000/rs6000-c.c (P8V_BUILTIN_VEC_REVB):
>   Add power 8 definitions for the builtin instances.
>   (P9V_BUILTIN_VEC_REVB): Remove the power 9 instance
>   definitions.
>   * config/rs6000/altivec.h (vec_revb): Change the
>   #define from power 9 to power 8.
>   * config/rs6000/r6000-protos.h (swap_selector_for_mode): Add extern
>   declaration.
>   * config/rs6000/rs6000.c (swap_selector_for_mode): Add
>   endian option to function.
>   * config/rs6000/rs6000-builtin.def (BU_P8V_VSX_1,
>   BU_P8V_OVERLOAD_1): Add power 8 macro expansions.
>   (BU_P9V_OVERLOAD_1): Remove power 9 overload expansion.
>   * config/rs6000/vsx.md (revb_): Add define_expand
>   to generate power 8 instructions for the vec_revb builtin.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-10-17  Carl Love  
> 
>   * gcc.target/powerpc/builtins-revb-runnable.c: New
>   runnable test file for the vec_revb builtin.

You make all changelog lines really short...  They are wrapped at 79 chars
or so normally.

> @@ -1853,6 +1853,13 @@ BU_P6_64BIT_2 (CMPB, "cmpb",   CONST,  cmpbdi3)
>  /* 1 argument VSX instructions added in ISA 2.07.  */
>  BU_P8V_VSX_1 (XSCVSPDPN,  "xscvspdpn",   CONST,  vsx_xscvspdpn)
>  BU_P8V_VSX_1 (XSCVDPSPN,  "xscvdpspn",   CONST,  vsx_xscvdpspn)
> +BU_P8V_VSX_1 (REVB_V1TI,  "revb_v1ti",   CONST,   revb_v1ti)
> +BU_P8V_VSX_1 (REVB_V2DI,  "revb_v2di",   CONST,   revb_v2di)
> +BU_P8V_VSX_1 (REVB_V4SI,  "revb_v4si",   CONST,   revb_v4si)
> +BU_P8V_VSX_1 (REVB_V8HI,  "revb_v8hi",   CONST,   revb_v8hi)
> +BU_P8V_VSX_1 (REVB_V16QI, "revb_v16qi",  CONST,   revb_v16qi)
> +BU_P8V_VSX_1 (REVB_V2DF,  "revb_v2df",   CONST,   revb_v2df)
> +BU_P8V_VSX_1 (REVB_V4SF,  "revb_v4sf",   CONST,   revb_v4sf)

The other entries have a tab instead of spaces between the last two
columns; please fix.

> +/* Return a constant vector for use as a big endian or little-endian
> +   permute control vector to reverse the order of elements of the
> +   given vector mode.  */
> +rtx
> +swap_selector_for_mode (machine_mode mode, bool endian)

What does a bool "endian" mean?  True if it is the one true endianness?
:-)

You probably should have a swap_endianness parameter, instead?  So that
most callers use "false".  Or, better, make a separate function that
swaps endianness, so that most callers can just use what they did before,
and no magic boolean parameters.
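
A minimal sketch of that shape, assuming a simplified byte-selector model
rather than the real rtx-based code: one function covers the common case
and a second, separately named function covers the swapped numbering, so
no caller passes a bare true/false.  The names revb_selector and
revb_selector_swapped, and the "15 - index" renumbering, are assumptions
made for this illustration only.

```c
#include <stddef.h>

/* Fill SEL with a byte selector that reverses the bytes inside each
   ELT_SIZE-byte element of a 16-byte vector (model only).  */
static void
revb_selector (size_t elt_size, unsigned char sel[16])
{
  for (size_t i = 0; i < 16; i++)
    {
      size_t base = i - i % elt_size;
      sel[i] = (unsigned char) (base + (elt_size - 1 - i % elt_size));
    }
}

/* Separate entry point for the opposite byte numbering, instead of a
   boolean parameter.  Renumbering as 15 - index is an assumption made
   for this sketch, not taken from the patch.  */
static void
revb_selector_swapped (size_t elt_size, unsigned char sel[16])
{
  revb_selector (elt_size, sel);
  for (size_t i = 0; i < 16; i++)
    sel[i] = (unsigned char) (15 - sel[i]);
}
```

Existing callers would keep calling the first function unchanged; only
the new code path would need the second one.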

> +  if ( endian == VECTOR_ELT_ORDER_BIG)

No space after paren open.

> +;; Iterator for xxperm types supported by VSX
> +(define_mode_iterator XXBR_L [V16QI
> +   V8HI
> +   V4SI
> +   V2DI
> +   V4SF
> +   V2DF
> +   V1TI])

That's all normal vector types; is there no existing iterator you can use?

> +;; Attribute for xxbr instructions
> +(define_mode_attr VSX_XXBR [(V16QI "q_v16qi")
> + (V8HI  "h_v8hi")
> + (V4SI  "w_v4si")
> + (V2DI  "d_v2di")
> + (V4SF  "w_v4sf")
> + (V2DF  "d_v2df")
> + (V1TI  "q_v1ti")])

That's _?  Well, except that needs entries for V4SF and V2DF
as well then.

> +
> +  DONE;
> +}
> +
> +;;  [(set_attr "type" "vecperm")])
> +)

I think you forgot to finish this part?

Rest looks good.


Segher

[PATCH] C/C++: more stdlib header hints (PR c/81404)

2017-10-17 Thread David Malcolm

This patch depends on:

* "[PATCH] c-family: add name_hint/deferred_diagnostic (v2)"
  * https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01021.html
  (waiting review)

* [PATCH 3/3] C: hints for missing stdlib includes for macros and types
  * https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00125.html
  (approved, pending the prereq above)

It extends the C frontend's "knowledge" of the C stdlib within
get_c_name_hint to cover some more macros and functions, covering
a case reported in PR c/81404 ("INT_MAX"), so that rather than printing:

  t.c:5:12: error: 'INT_MAX' undeclared here (not in a function); did you mean 
'__INT_MAX__'?
   int test = INT_MAX;
  ^~~
  __INT_MAX__

we instead print:

  t.c:5:12: error: 'INT_MAX' undeclared here (not in a function)
   int test = INT_MAX;
  ^~~
  t.c:5:12: note: 'INT_MAX' is defined in header ''; did you forget 
to '#include '?
  t.c:1:1:
  +#include 
 
  t.c:5:12:
   int test = INT_MAX;
  ^~~

It also generalizes some of the code for this (and for the "std::"
namespace hints in the C++ frontend), moving it to a new
c-family/known-headers.cc and .h, and introducing a class known_headers.
This currently just works by scanning a hardcoded array of known
name/header associations, but perhaps in the future could be turned
into some kind of symbol database so that the compiler could record API
uses and use that to offer suggestions e.g.

foo.cc: error: 'myapi::foo' was not declared in this scope
foo.cc: note: 'myapi::foo' was declared in header 'myapi/private.h'
(included via 'myapi/public.h') when compiling 'bar.cc'; did you forget to
'#include "myapi/public.h"'?

or somesuch.

In any case, moving this to a class gives an easier way to locate the
hardcoded knowledge about the stdlib.
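
As a rough sketch of the "scan a hardcoded array of name/header
associations" approach, written in plain C rather than as the C++ class
in the patch: the struct, the table contents and the header_for_name
function below are illustrative assumptions, not the patch's actual code
or its actual list of names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/header association.  */
struct known_name
{
  const char *name;
  const char *header;
};

/* Illustrative table; the real list lives in the patch.  */
static const struct known_name known_names[] = {
  { "INT_MAX", "<limits.h>" },
  { "NULL", "<stddef.h>" },
  { "printf", "<stdio.h>" },
};

/* Return the header that declares NAME, or NULL if NAME is not in the
   table.  A linear scan is fine for a table this small.  */
static const char *
header_for_name (const char *name)
{
  for (size_t i = 0; i < sizeof known_names / sizeof known_names[0]; i++)
    if (strcmp (name, known_names[i].name) == 0)
      return known_names[i].header;
  return NULL;
}
```

A frontend would call something like header_for_name ("INT_MAX") when
name lookup fails and, on a non-NULL result, emit the "did you forget to
#include" note instead of a spelling suggestion.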

The patch also adds similar code to the C++ frontend covering
unqualified names in the standard library, so that rather than just
e.g.:

 t.cc:19:13: error: 'NULL' was not declared in this scope
  void *ptr = NULL;
  ^~~~

we can emit:

 t.cc:19:13: error: 'NULL' was not declared in this scope
  void *ptr = NULL;
  ^~~~
 t.cc:19:13: note: 'NULL' is defined in header ''; did you forget
 to '#include '?
 t.cc:1:1:
 +#include 

 t.cc:19:13:
   void *ptr = NULL;
   ^~~~

(Also XFAIL for PR c++/80567 added for the C++ testcase; this is a
separate pre-existing bug exposed by the testcase for PR 81404).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk once the prereqs are in place?

gcc/ChangeLog:
* Makefile.in (C_COMMON_OBJS): Add c-family/known-headers.o.

gcc/c-family/ChangeLog:
* known-headers.cc: New file, with suggest_missing_header taken
from c/c-decl.c.
* known-headers.h: Likewise.

gcc/c/ChangeLog:
PR c/81404
* c-decl.c: Include "c-family/known-headers.h".
(get_c_name_hint): Reimplement in terms of the API in
known-headers.h.  Add some knowledge about ,
, and .
(class suggest_missing_header): Move to known-headers.h.

gcc/cp/ChangeLog:
* name-lookup.c: Include "c-family/known-headers.h".
(get_std_name_hint): Reimplement in terms of the API in
known-headers.h.
(get_stdlib_name_hint): New function.
(lookup_name_fuzzy): Call get_stdlib_name_hint.

gcc/testsuite/ChangeLog:
PR c/81404
PR c++/80567
* g++.dg/spellcheck-stdlib.C: New test case.
* gcc.dg/spellcheck-stdlib.c (test_INT_MAX): New.
---
 gcc/Makefile.in  |  2 +-
 gcc/c-family/known-headers.cc| 68 
 gcc/c-family/known-headers.h | 63 ++
 gcc/c/c-decl.c   | 84 ++
 gcc/cp/name-lookup.c | 89 +++-
 gcc/testsuite/g++.dg/spellcheck-stdlib.C | 84 ++
 gcc/testsuite/gcc.dg/spellcheck-stdlib.c |  9 
 7 files changed, 341 insertions(+), 58 deletions(-)
 create mode 100644 gcc/c-family/known-headers.cc
 create mode 100644 gcc/c-family/known-headers.h
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-stdlib.C

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2809619..9855919 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1190,7 +1190,7 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/c-semantics.o c-family/c-ada-spec.o \
   c-family/c-cilkplus.o \
   c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o \
-  c-family/c-attribs.o c-family/c-warn.o
+  c-family/c-attribs.o c-family/c-warn.o c-family/known-headers.o
 
 # Language-independent object files.
 # We put the *-match.o and insn-*.o files first so that a parallel make
diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
new file mode 100644
index 000..008788e
--- /dev/null
+++

Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-17 Thread Jeff Law

On 10/13/2017 07:02 AM, Martin Liška wrote:
> On 10/12/2017 11:54 PM, Jeff Law wrote:
>> On 10/11/2017 12:13 AM, Martin Liška wrote:
>>> 2017-10-10  Martin Liska  
>>>
>>> PR tree-optimization/82493
>>> * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>>> (test_range_functions): New function.
>>> (sbitmap_c_tests): Likewise.
>>> * selftest-run-tests.c (selftest::run_tests): Run new tests.
>>> * selftest.h (sbitmap_c_tests): New function.
>> I went ahead and committed this along with a patch to fix the off-by-one
>> error in live_bytes_read.  Bootstrapped and regression tested on x86.
>>
>> Actual patch attached for archival purposes.
>>
>> Jeff
>>
> Hello.
> 
> I wrote a patch that adds various gcc_checking_asserts and I hit following:
> 
> ./xgcc -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90 -c 
> -O2
> during GIMPLE pass: dse
> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90:7:0:
> 
>   program testat
>  
> internal compiler error: in bitmap_check_index, at sbitmap.h:105
> 0x1c014c1 bitmap_check_index
>   ../../gcc/sbitmap.h:105
> 0x1c01fa7 bitmap_bit_in_range_p(simple_bitmap_def const*, unsigned int, 
> unsigned int)
>   ../../gcc/sbitmap.c:335
> 0x1179002 live_bytes_read
>   ../../gcc/tree-ssa-dse.c:497
> 0x117935a dse_classify_store
>   ../../gcc/tree-ssa-dse.c:595
> 0x1179947 dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)
>   ../../gcc/tree-ssa-dse.c:786
> 0x1179b6e dse_dom_walker::before_dom_children(basic_block_def*)
>   ../../gcc/tree-ssa-dse.c:853
> 0x1a6f659 dom_walker::walk(basic_block_def*)
>   ../../gcc/domwalk.c:308
> 0x1179cb9 execute
>   ../../gcc/tree-ssa-dse.c:907
> 
> Where we call:
> Breakpoint 1, bitmap_bit_in_range_p (bmap=0x29d6cd0, start=0, end=515) at 
> ../../gcc/sbitmap.c:335
> 335 bitmap_check_index (bmap, end);
> (gdb) p *bmap
> $1 = {n_bits = 256, size = 4, elms = {255}}
> 
> Is it a valid call or should caller check indices?
> 
> Martin
> 
> 
> 0002-Add-gcc_checking_assert-for-sbitmap.c.patch
> 
> 
> From ba3d597be70b8329abafe92da868ab5250610840 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 13 Oct 2017 13:39:08 +0200
> Subject: [PATCH 2/2] Add gcc_checking_assert for sbitmap.c.
> 
> ---
>  gcc/sbitmap.c | 39 +++
>  gcc/sbitmap.h | 25 +
>  2 files changed, 64 insertions(+)
So the only change that concerned me was the bitmap_subset_p test.  In
theory they don't need to be the same size for that test.  However, I
think we should go ahead with your patch as-is and deal with that
possibility if and when we need the capability to do a subset test with
different sized bitmaps.
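
For the archives, a simplified model of what a size-checked subset test
looks like.  This is only a sketch: bits_model and subset_p_model are
hypothetical stand-ins, not the sbitmap API, and the assert plays the
role of the gcc_checking_assert added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an sbitmap: N_WORDS 64-bit words.  */
struct bits_model
{
  size_t n_words;
  const uint64_t *words;
};

/* Return true if every bit set in A is also set in B.  Both bitmaps
   are required to have the same size, mirroring the checked version
   discussed above.  */
static bool
subset_p_model (struct bits_model a, struct bits_model b)
{
  assert (a.n_words == b.n_words);
  for (size_t i = 0; i < a.n_words; i++)
    if ((a.words[i] & ~b.words[i]) != 0)
      return false;
  return true;
}
```

Supporting different sizes later would mean replacing the assert with a
rule for the words that exist in only one of the two bitmaps.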

jeff

Re: [committed] Fix another tree-ssa-dse.c thinko

2017-10-17 Thread Jeff Law

On 10/16/2017 01:58 PM, Martin Liška wrote:
> On 10/16/2017 07:56 PM, Jeff Law wrote:
>> With this patch we get a clean bootstrap & regression test with Martin's
>> latest sbitmap checking patches on x86.
> 
> Thanks Jeff for testing. May I consider this as green for installation
> of my patch?
I hadn't actually looked at the patch, just installed it for testing
purposes.  I'll take a looksie now :-)

jeff

Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-17 Thread Jan Hubicka

> On Tue, 17 Oct 2017, Jan Hubicka wrote:
> 
> > Hi,
> > gather/scatter loads tend to be expensive (at least for x86), while we
> > currently account for them as vector loads/stores, which are cheap.  This
> > patch adds a vectorizer cost entry for these so this can be modelled more
> > realistically.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> Ok.  gather and load is somewhat redundant, likewise
> scatter and store.  So you might want to change it to just
> vector_gather and vector_scatter.  Even vector_ is redundant...

Hehe, coming from outside the vectorizer world, I did not know what
scatter/gather is, and thus I wanted to keep load/store and vec in so it will
be easier to google for those who will need to fill in the numbers in the
future :)
> 
> Best available implementations manage to hide the vector build
> cost and just expose the latency of the load(s).  I wonder what
> Zen does here ;)

According to Agner's tables, gathers range from 12 ops (vgatherdpd)
to 66 ops (vpgatherdd).  I assume that the CPU needs to do the following:

1) transfer the offsets sse->ALU unit for address generation (3 cycles
   each, 2 ops)
2) do the address calculation (2 ops, probably 4 ops because it does not map
   naturally to AGU)
3) do the load (7 cycles each, 2 ops)
4) merge results (1 op)

so I get 7 ops, not sure what the remaining 5 do.

Agner does not account for time, but according to
http://users.atw.hu/instlatx64/AuthenticAMD0800F11_K17_Zen_InstLatX64.txt the
gather time ranges from 14 cycles (vgatherpd) to 20 cycles.  Here I guess it is
3+1+7+1=12 so it seems to work.
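
To make the arithmetic explicit, here is the same breakdown as a small
table that can be summed.  The op counts and the 3 and 7 cycle figures
are the guesses stated above; the 1-cycle entries for address
calculation and merging are only implied by the 3+1+7+1 sum.  The struct
and function names are made up for this sketch.

```c
#include <stddef.h>

/* One step of the assumed gather sequence.  */
struct gather_step
{
  const char *what;
  int ops;
  int cycles;
};

/* Estimates from the breakdown above (guesses, not measurements).  */
static const struct gather_step gather_steps[] = {
  { "transfer offsets SSE->ALU", 2, 3 },
  { "address calculation",       2, 1 },
  { "load",                      2, 7 },
  { "merge results",             1, 1 },
};

/* Sum of the per-step op counts.  */
static int
total_ops (void)
{
  int sum = 0;
  for (size_t i = 0; i < sizeof gather_steps / sizeof gather_steps[0]; i++)
    sum += gather_steps[i].ops;
  return sum;
}

/* Sum of the per-step latencies, assuming the steps are serialized.  */
static int
total_cycles (void)
{
  int sum = 0;
  for (size_t i = 0; i < sizeof gather_steps / sizeof gather_steps[0]; i++)
    sum += gather_steps[i].cycles;
  return sum;
}
```

This gives 7 ops and 12 cycles, against the 12 ops and 14 cycles quoted
for the cheapest gather.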

If you implement gather by hand, you save the SSE->address calculation path and
thus you can get faster.
> 
> Note the most major source of impreciseness in the cost model
> is from vec_perm because we lack the information of the
> permutation mask which means we can't distinguish between
> cross-lane and intra-lane permutes.

Besides that, we lack information about what operation we do (addition
or division?), which may be useful to pass down, especially because we do
have the relevant information handy in the x86_cost tables.  So I am thinking
of adding an extra parameter to the hook telling the operation.
What info do we need to pass for permutations?

Honza
> 
> Richard.
> 
> > Honza
> > 
> > 2017-10-17  Jan Hubicka  
> > 
> > * target.h (enum vect_cost_for_stmt): Add vec_gather_load and
> > vec_scatter_store
> > * tree-vect-stmts.c (record_stmt_cost): Make difference between normal
> > and scatter/gather ops.
> > 
> > * aarch64/aarch64.c (aarch64_builtin_vectorization_cost): Add
> > vec_gather_load and vec_scatter_store.
> > * arm/arm.c (arm_builtin_vectorization_cost): Likewise.
> > * powerpcspe/powerpcspe.c (rs6000_builtin_vectorization_cost): Likewise.
> > * rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Likewise.
> > * s390/s390.c (s390_builtin_vectorization_cost): Likewise.
> > * spu/spu.c (spu_builtin_vectorization_cost): Likewise.
> > 
> > Index: config/aarch64/aarch64.c
> > ===
> > --- config/aarch64/aarch64.c(revision 253789)
> > +++ config/aarch64/aarch64.c(working copy)
> > @@ -8547,9 +8547,10 @@ aarch64_builtin_vectorization_cost (enum
> > return fp ? costs->vec_fp_stmt_cost : costs->vec_int_stmt_cost;
> >  
> >case vector_load:
> > +  case vector_gather_load:
> > return costs->vec_align_load_cost;
> >  
> > -  case vector_store:
> > +  case vector_scatter_store:
> > return costs->vec_store_cost;
> >  
> >case vec_to_scalar:
> > Index: config/arm/arm.c
> > ===
> > --- config/arm/arm.c(revision 253789)
> > +++ config/arm/arm.c(working copy)
> > @@ -11241,9 +11241,11 @@ arm_builtin_vectorization_cost (enum vec
> >  return current_tune->vec_costs->vec_stmt_cost;
> >  
> >case vector_load:
> > +  case vector_gather_load:
> >  return current_tune->vec_costs->vec_align_load_cost;
> >  
> >case vector_store:
> > +  case vector_scatter_store:
> >  return current_tune->vec_costs->vec_store_cost;
> >  
> >case vec_to_scalar:
> > Index: config/powerpcspe/powerpcspe.c
> > ===
> > --- config/powerpcspe/powerpcspe.c  (revision 253789)
> > +++ config/powerpcspe/powerpcspe.c  (working copy)
> > @@ -5834,6 +5834,8 @@ rs6000_builtin_vectorization_cost (enum
> >case vector_stmt:
> >case vector_load:
> >case vector_store:
> > +  case vector_gather_load:
> > +  case vector_scatter_store:
> >case vec_to_scalar:
> >case scalar_to_vec:
> >case cond_branch_not_taken:
> > Index: config/rs6000/rs6000.c
> > ===
> > --- config/rs6000/rs6000.c  (revision

Re: [PATCH, rs6000] Add Power 8 support to vec_revb

2017-10-17 Thread Carl Love

GCC maintainers:

I have addressed the issues with the vec_revb patch mentioned by Segher.
I have retested the updated patch on:

 powerpc64-unknown-linux-gnu (Power 8 BE),
 powerpc64le-unknown-linux-gnu (Power 8 LE),   
 powerpc64le-unknown-linux-gnu (Power 9 LE)

without regressions.  

Please let me know if the following patch is acceptable.  Thanks.

   Carl Love

---

gcc/ChangeLog:

2017-10-17  Carl Love  

* config/rs6000/rs6000-c.c (P8V_BUILTIN_VEC_REVB):
Add power 8 definitions for the builtin instances.
(P9V_BUILTIN_VEC_REVB): Remove the power 9 instance
definitions.
* config/rs6000/altivec.h (vec_revb): Change the
#define from power 9 to power 8.
* config/rs6000/rs6000-protos.h (swap_selector_for_mode): Add extern
declaration.
* config/rs6000/rs6000.c (swap_selector_for_mode): Add
endian option to function.
* config/rs6000/rs6000-builtin.def (BU_P8V_VSX_1,
BU_P8V_OVERLOAD_1): Add power 8 macro expansions.
(BU_P9V_OVERLOAD_1): Remove power 9 overload expansion.
* config/rs6000/vsx.md (revb_): Add define_expand
to generate power 8 instructions for the vec_revb builtin.

gcc/testsuite/ChangeLog:

2017-10-17  Carl Love  

* gcc.target/powerpc/builtins-revb-runnable.c: New
runnable test file for the vec_revb builtin.
---
 gcc/config/rs6000/altivec.h|   3 +-
 gcc/config/rs6000/rs6000-builtin.def   |  10 +-
 gcc/config/rs6000/rs6000-c.c   |  44 +--
 gcc/config/rs6000/rs6000-protos.h  |   2 +
 gcc/config/rs6000/rs6000.c | 102 --
 gcc/config/rs6000/vsx.md   |  45 +++
 .../gcc.target/powerpc/builtins-revb-runnable.c| 352 +
 7 files changed, 500 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-revb-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index c8e508c..a05e23a 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -415,6 +415,7 @@
 #define vec_vsubuqm __builtin_vec_vsubuqm
 #define vec_vupkhsw __builtin_vec_vupkhsw
 #define vec_vupklsw __builtin_vec_vupklsw
+#define vec_revb __builtin_vec_revb
 #endif
 
 #ifdef __POWER9_VECTOR__
@@ -476,8 +477,6 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
-
-#define vec_revb __builtin_vec_revb
 #endif
 
 /* Predicates.
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 850164a..7ca2974 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1853,6 +1853,13 @@ BU_P6_64BIT_2 (CMPB, "cmpb", CONST,  cmpbdi3)
 /* 1 argument VSX instructions added in ISA 2.07.  */
 BU_P8V_VSX_1 (XSCVSPDPN,  "xscvspdpn", CONST,  vsx_xscvspdpn)
 BU_P8V_VSX_1 (XSCVDPSPN,  "xscvdpspn", CONST,  vsx_xscvdpspn)
+BU_P8V_VSX_1 (REVB_V1TI,  "revb_v1ti", CONST,   revb_v1ti)
+BU_P8V_VSX_1 (REVB_V2DI,  "revb_v2di", CONST,   revb_v2di)
+BU_P8V_VSX_1 (REVB_V4SI,  "revb_v4si", CONST,   revb_v4si)
+BU_P8V_VSX_1 (REVB_V8HI,  "revb_v8hi", CONST,   revb_v8hi)
+BU_P8V_VSX_1 (REVB_V16QI, "revb_v16qi",CONST,   revb_v16qi)
+BU_P8V_VSX_1 (REVB_V2DF,  "revb_v2df", CONST,   revb_v2df)
+BU_P8V_VSX_1 (REVB_V4SF,  "revb_v4sf", CONST,   revb_v4sf)
 
 /* 1 argument altivec instructions added in ISA 2.07.  */
 BU_P8V_AV_1 (ABS_V2DI,   "abs_v2di",   CONST,  absv2di2)
@@ -1962,6 +1969,7 @@ BU_P8V_OVERLOAD_1 (VPOPCNTUH, "vpopcntuh")
 BU_P8V_OVERLOAD_1 (VPOPCNTUW,  "vpopcntuw")
 BU_P8V_OVERLOAD_1 (VPOPCNTUD,  "vpopcntud")
 BU_P8V_OVERLOAD_1 (VGBBD,  "vgbbd")
+BU_P8V_OVERLOAD_1 (REVB,   "revb")
 
 /* ISA 2.07 vector overloaded 2 argument functions.  */
 BU_P8V_OVERLOAD_2 (EQV,"eqv")
@@ -2073,8 +2081,6 @@ BU_P9V_OVERLOAD_1 (VSTDCNQP,  "scalar_test_neg_qp")
 BU_P9V_OVERLOAD_1 (VSTDCNDP,   "scalar_test_neg_dp")
 BU_P9V_OVERLOAD_1 (VSTDCNSP,   "scalar_test_neg_sp")
 
-BU_P9V_OVERLOAD_1 (REVB,   "revb")
-
 BU_P9V_OVERLOAD_1 (VEXTRACT_FP_FROM_SHORTH, "vextract_fp_from_shorth")
 BU_P9V_OVERLOAD_1 (VEXTRACT_FP_FROM_SHORTL, "vextract_fp_from_shortl")
 
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 897306c..0706319 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -5532,36 +5532,38 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
 RS6000_BTI_unsigned_V16QI, 0 },
 
-  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
-RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
-  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
-

Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-17 Thread Andrew MacLeod


On 10/17/2017 08:18 AM, Richard Sandiford wrote:

Aldy Hernandez  writes:

Hi folks!

Calling print_hex() on a widest_int with the most significant bit turned
on can lead to a leading zero being printed (0x0). This produces
confusing dumps to say the least, especially when you incorrectly assume
an integer is NOT signed :).

That's the intended behaviour though.  wide_int-based types only use as
many HWIs as they need to store their current value, with any other bits
in the value being a sign extension of the top bit.  So if the most
significant HWI in a widest_int is zero, that HWI is there to say that
the previous HWI should be zero- rather than sign-extended.

So:

0x0  -> (1 << 32) - 1 to infinite precision
   (i.e. a positive value)
0x   -> -1

Thanks,
Richard


I for one find this very confusing.  If I have a 128 bit value, I don't
expect to see 132 bits.  And there are enough 0s that it's not obvious
when I look.


I don't think a leading 0 should be printed if "precision" bits have 
already been printed.
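
For reference, the representation Richard describes can be modelled
roughly as below.  This is a simplified sketch that uses 32-bit blocks
to match the (1 << 32) - 1 example; the real code uses HOST_WIDE_INTs,
and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: a value is LEN 32-bit blocks, least significant
   first.  All bits above the top block are copies of that block's top
   bit, so the sign is decided by the most significant stored block.  */
static bool
model_is_negative (const uint32_t *blocks, size_t len)
{
  return (blocks[len - 1] >> 31) != 0;
}
```

In this model a single all-ones block is -1, while an all-ones block
followed by a zero block is the positive value (1 << 32) - 1.  The extra
zero block is what shows up as the leading 0 in the dump.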



Andrew

Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-17 Thread Mike Stump

On Oct 17, 2017, at 5:18 AM, Richard Sandiford wrote:
> 
> Aldy Hernandez  writes:
>> This produces confusing dumps to say the least

> That's the intended behaviour though.

>   0x0  -> (1 << 32) - 1 to infinite precision
>  (i.e. a positive value)
>   0x   -> -1

Another potential way around this would be to print the leading - when 
applicable and negate the number.  I don't have any strong opinions about this, 
but thought I would mention it.  This would then allow the trimming of the 
leading 0 without confusion.
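
A minimal sketch of that idea for a plain 64-bit value, just to show the
output format; it is not wide_int code and format_signed_hex is a
hypothetical name.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print V into BUF as hex, with a leading '-' and the magnitude for
   negative values, so no sign-indicating leading zero is needed.  */
static void
format_signed_hex (int64_t v, char *buf, size_t n)
{
  if (v < 0)
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow.  */
    snprintf (buf, n, "-0x%" PRIx64, (uint64_t) 0 - (uint64_t) v);
  else
    snprintf (buf, n, "0x%" PRIx64, (uint64_t) v);
}
```

With this format -1 prints as "-0x1" and (1 << 32) - 1 prints as
"0xffffffff", so the two can no longer be mistaken for each other.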

[PATCH, i386]: Do not emit x87 FP reg-stack compensation pops from output_fp_compare

2017-10-17 Thread Uros Bizjak

Hello!

Currently, x87 FP stack compensation pops for FTST and FCOMIP
instructions are emitted from the output_fp_compare function as
assembly code. The attached patch moves detection of these two
instructions to reg-stack.c and handles compensation pops during
reg-stack processing. This change further allows for a massive cleanup
in output_fp_compare.
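
Part of the cleanup is a small string idiom used in the rewritten
function: the popping suffix starts with 'p', and adding
!stack_top_dies to the pointer skips that character when no pop is
wanted.  A stand-alone sketch of the idiom, with a hypothetical helper
name and without the operand templates:

```c
#include <stdbool.h>
#include <string.h>

/* Build the mnemonic for an eflags-setting x87 compare into BUF.  The
   suffix pointer is advanced by one (skipping the 'p') when the stack
   top does not die, so the popping form is only used when needed.  */
static const char *
compare_mnemonic (bool unordered_p, bool stack_top_dies, char buf[16])
{
  const char *pop_suffix = "p";

  strcpy (buf, unordered_p ? "fucomi" : "fcomi");
  strcat (buf, pop_suffix + !stack_top_dies);
  return buf;
}
```

This replaces a table of precomputed templates with one base mnemonic
plus an optional suffix.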

2017-10-17  Uros Bizjak  

* reg-stack.c (compare_for_stack_reg): Add bool argument.
Detect FTST instruction and handle its register pops.  Only pop
second operand if can_pop_second_op is true.
(subst_stack_regs_pat) : Detect FCOMI instruction to
set can_pop_second_op to false in the compare_for_stack_reg call.

* config/i386/i386.md (*cmpi): Only call
output_fp_compare for stack register operands.
* config/i386/i386.c (output_fp_compare): Do not output SSE compare
instructions here.  Do not emit stack register pops here.  Assert
that FCOMPP pops next to top stack register.  Rewrite function.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 253812)
+++ config/i386/i386.c  (working copy)
@@ -18879,120 +18879,65 @@ output_387_ffreep (rtx *operands ATTRIBUTE_UNUSED,
should be used.  UNORDERED_P is true when fucom should be used.  */
 
 const char *
-output_fp_compare (rtx_insn *insn, rtx *operands, bool eflags_p, bool unordered_p)
+output_fp_compare (rtx_insn *insn, rtx *operands,
+  bool eflags_p, bool unordered_p)
 {
-  int stack_top_dies;
-  rtx cmp_op0, cmp_op1;
-  int is_sse = SSE_REG_P (operands[0]) || SSE_REG_P (operands[1]);
+  rtx *xops = eflags_p ? &operands[0] : &operands[1];
+  bool stack_top_dies;
 
+  static char buf[40];
+  const char *p, *r;
+ 
+  gcc_assert (STACK_TOP_P (xops[0]));
+
+  stack_top_dies = find_regno_note (insn, REG_DEAD, FIRST_STACK_REG);
+
   if (eflags_p)
 {
-  cmp_op0 = operands[0];
-  cmp_op1 = operands[1];
+  p = unordered_p ? "fucomi" : "fcomi";
+  strcpy (buf, p);
+
+  r = "p\t{%y1, %0|%0, %y1}";
+  strcat (buf, r + !stack_top_dies);
+
+  return buf;
 }
-  else
+
+  if (STACK_REG_P (xops[1])
+  && stack_top_dies
+  && find_regno_note (insn, REG_DEAD, FIRST_STACK_REG + 1))
 {
-  cmp_op0 = operands[1];
-  cmp_op1 = operands[2];
+  gcc_assert (REGNO (xops[1]) == FIRST_STACK_REG + 1);
+
+  /* If both the top of the 387 stack die, and the other operand
+is also a stack register that dies, then this must be a
+`fcompp' float compare.  */
+  p = unordered_p ? "fucompp" : "fcompp";
+  strcpy (buf, p);
 }
-
-  if (is_sse)
+  else if (const0_operand (xops[1], VOIDmode))
 {
-  if (GET_MODE (operands[0]) == SFmode)
-   if (unordered_p)
- return "%vucomiss\t{%1, %0|%0, %1}";
-   else
- return "%vcomiss\t{%1, %0|%0, %1}";
-  else
-   if (unordered_p)
- return "%vucomisd\t{%1, %0|%0, %1}";
-   else
- return "%vcomisd\t{%1, %0|%0, %1}";
+  gcc_assert (!unordered_p);
+  strcpy (buf, "ftst");
 }
-
-  gcc_assert (STACK_TOP_P (cmp_op0));
-
-  stack_top_dies = find_regno_note (insn, REG_DEAD, FIRST_STACK_REG) != 0;
-
-  if (cmp_op1 == CONST0_RTX (GET_MODE (cmp_op1)))
+  else
 {
-  if (stack_top_dies)
+  if (GET_MODE_CLASS (GET_MODE (xops[1])) == MODE_INT)
{
- output_asm_insn ("ftst\n\tfnstsw\t%0", operands);
- return output_387_ffreep (operands, 1);
+ gcc_assert (!unordered_p);
+ p = "ficom";
}
   else
-   return "ftst\n\tfnstsw\t%0";
-}
+   p = unordered_p ? "fucom" : "fcom";
 
-  if (STACK_REG_P (cmp_op1)
-  && stack_top_dies
-  && find_regno_note (insn, REG_DEAD, REGNO (cmp_op1))
-  && REGNO (cmp_op1) != FIRST_STACK_REG)
-{
-  /* If both the top of the 387 stack dies, and the other operand
-is also a stack register that dies, then this must be a
-`fcompp' float compare */
+  strcpy (buf, p);
 
-  if (eflags_p)
-   {
- /* There is no double popping fcomi variant.  Fortunately,
-eflags is immune from the fstp's cc clobbering.  */
- if (unordered_p)
-   output_asm_insn ("fucomip\t{%y1, %0|%0, %y1}", operands);
- else
-   output_asm_insn ("fcomip\t{%y1, %0|%0, %y1}", operands);
- return output_387_ffreep (operands, 0);
-   }
-  else
-   {
- if (unordered_p)
-   return "fucompp\n\tfnstsw\t%0";
- else
-   return "fcompp\n\tfnstsw\t%0";
-   }
+  r = "p%Z2\t%y2";
+  strcat (buf, r + !stack_top_dies);
 }
-  else
-{
-  /* Encoded here as eflags_p | intmode | unordered_p | stack_top_dies.  */
 
-  static const char * const alt[16] =
-  {
-   "fcom%Z2\t%y2\n\tfnstsw\t%0",
-

[PATCH] Canonicalize constant multiplies in division

2017-10-17 Thread Wilco Dijkstra

This patch implements some of the optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

Canonicalize x / (C1 * y) into (x * C2) / y.

This moves constant multiplies out of the RHS of a division in order
to allow further simplifications (such as (C1 * x) / (C2 * y) ->
(C3 * x) / y) and to enable more reciprocal CSEs.
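
As a source-level illustration of the canonicalization, with C1 = 5 so
that C2 = 1/5 (function names are made up for this sketch): the two
forms are not bit-identical in general, because 1/5 is rounded, which is
why this is only done under flag_reciprocal_math.

```c
/* Before: the constant multiply sits in the divisor.  */
static float
div_before (float x, float y)
{
  return x / (5.0f * y);
}

/* After: the constant is folded into a multiply on the dividend.  */
static float
div_after (float x, float y)
{
  return (x * (1.0f / 5.0f)) / y;
}

/* Relative comparison; the two forms may differ by a rounding step.  */
static int
nearly_equal (float a, float b)
{
  float diff = a > b ? a - b : b - a;
  float mag = a < 0 ? -a : a;
  return diff <= 1e-6f * mag;
}
```

Once every division by a multiple of y has the form (x * C) / y, the
divisions share the same divisor and can reuse one reciprocal.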

OK for commit?

ChangeLog
2017-10-17  Wilco Dijkstra    
Jackson Woodruff  

gcc/
PR 71026/tree-optimization
* match.pd: Canonicalize constant multiplies in division.

gcc/testsuite/
PR 71026/tree-optimization
* gcc.dg/cse_recip.c: New test.
--

diff --git a/gcc/match.pd b/gcc/match.pd
index ade851f78fb9ac6ce03b752f63e03f3b5a19cda9..532fabf51ce8a45d54147a3ae0b3917e22b1a4d0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -342,10 +342,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (negate @0)))
 
 (if (flag_reciprocal_math)
- /* Convert (A/B)/C to A/(B*C)  */
+ /* Convert (A/B)/C to A/(B*C). */
  (simplify
   (rdiv (rdiv:s @0 @1) @2)
-   (rdiv @0 (mult @1 @2)))
+  (rdiv @0 (mult @1 @2)))
+
+ /* Canonicalize x / (C1 * y) to (x * C2) / y.  */
+ (if (optimize)
+  (simplify
+   (rdiv @0 (mult @1 REAL_CST@2))
+   (if (!real_zerop (@1))
+(with
+ { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @2); }
+ (if (tem)
+  (rdiv (mult @0 { tem; } ) @1))
 
  /* Convert A/(B/C) to (A/B)*C  */
  (simplify
@@ -628,15 +638,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (tem)
  (rdiv { tem; } @1)
 
-/* Convert C1/(X*C2) into (C1/C2)/X  */
-(simplify
- (rdiv REAL_CST@0 (mult @1 REAL_CST@2))
-  (if (flag_reciprocal_math)
-   (with
-{ tree tem = const_binop (RDIV_EXPR, type, @0, @2); }
-(if (tem)
- (rdiv { tem; } @1)
-
 /* Simplify ~X & X as zero.  */
 (simplify
  (bit_and:c (convert? @0) (convert? (bit_not @0)))
diff --git a/gcc/testsuite/gcc.dg/cse_recip.c b/gcc/testsuite/gcc.dg/cse_recip.c
new file mode 100644
index 
..20ed529c33ebecc911fb540a8b2b597bba0023e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cse_recip.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+void
+cse_recip (float x, float y, float *a)
+{
+  a[0] = y / (5 * x);
+  a[1] = y / (3 * x);
+  a[2] = y / x;
+}
+
+/* { dg-final { scan-tree-dump-times " / " 1 "optimized" } } */

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Mike Stump

On Oct 17, 2017, at 8:34 AM, Tom de Vries  wrote:
> 
>>> OK, if full testing is ok?
>> I believe this was fully intentional and the presence/absence of
>> explicit dg-do run can then be used to decide if it should loop through
>> options or not.
> 
> I don't see an explicit mention of ignoring dg-do-what-default in the 
> original submission

So, my guidance would be for fortran to behave generally speaking, like other 
languages and for this type of code to be predictable across languages and 
library components.  That's as far as I can go, so I'll have to punt the decision 
to domain experts to decide which direction they _want_ to go in.  If they 
don't know (or don't care or can't agree and want someone to break a tie), my 
inclination is to approve it.

[PATCH] Canonicalize negates in division

2017-10-17 Thread Wilco Dijkstra

This patch implements some of the optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

Canonicalize x / (- y) into (-x) / y.

This moves negates out of the RHS of a division in order to
allow further simplifications and potentially more reciprocal CSEs.
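
As a rough illustration of why the two forms are interchangeable, here is a
small C sketch.  The helper names are made up for this example and are not
part of the patch.  It relies on negation being exact and on IEEE division
under the default rounding mode being sign-symmetric; NaN operands are
left out because they never compare equal.

```c
#include <stdbool.h>

/* Hypothetical helper: the form the pattern matches, x / (-y).  */
float div_negated_divisor(float x, float y)
{
    return x / -y;
}

/* Hypothetical helper: the canonical form the pattern produces, (-x) / y.  */
float div_negated_dividend(float x, float y)
{
    return -x / y;
}

/* True when both forms give the same value for the given operands.  */
bool same_quotient(float x, float y)
{
    return div_negated_divisor(x, y) == div_negated_dividend(x, y);
}
```

With the negate moved to the dividend, both divisions in an expression such
as (-x / y) * (x / -y) have the same operands, which is what the new test
counts on.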

OK for commit?

ChangeLog
2017-10-17  Wilco Dijkstra    
Jackson Woodruff  

gcc/
PR 71026/tree-optimization
* match.pd: Canonicalize negate in division.

gcc/testsuite/
PR 71026/tree-optimization
* gcc.dg/div_neg.c: New test.
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 
cb48f079b4a310272e49cc319a1b3b0ff2023ba4..ade851f78fb9ac6ce03b752f63e03f3b5a19cda9
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -352,6 +352,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (rdiv @0 (rdiv:s @1 @2))
(mult (rdiv @0 @1) @2)))
 
+/* Simplify x / (- y) to -x / y.  */
+(simplify
+ (rdiv @0 (negate @1))
+ (rdiv (negate @0) @1))
+
 (if (flag_unsafe_math_optimizations)
   /* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
   (for op (lt le gt ge)
diff --git a/gcc/testsuite/gcc.dg/div_neg.c b/gcc/testsuite/gcc.dg/div_neg.c
new file mode 100644
index 
..da499cda2fba6c943ec99c55cae2ea389f9e1cca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/div_neg.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+div_neg (float x, float y)
+{
+  return (-x / y) * (x / -y);
+}
+
+/* { dg-final { scan-tree-dump-times " / " 1 "optimized" } } */

[PATCH] Simplify floating point comparisons

2017-10-17 Thread Wilco Dijkstra

This patch implements some of the optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

Simplify (C / x > 0.0) into x > 0.0.

If C is negative the comparison is reversed. 

Simplify (x * C1) > C2 into x > (C2 / C1).

Again, if C1 is negative the comparison is reversed.
Both transformations are only done with -funsafe-math-optimizations,
and only when the constant is non-zero and not a NaN.
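
A minimal C sketch of the two rewrites, written out by hand.  The function
names are hypothetical and the constants are picked so that the arithmetic
is exact; in general the rewritten comparison can round differently, which
is one reason the transformation is restricted to
-funsafe-math-optimizations.

```c
#include <stdbool.h>

/* (x * C1) > C2 with C1 = -5, C2 = 100.  */
bool mul_cmp_original(float x)
{
    return x * -5.0f > 100.0f;
}

/* x < (C2 / C1): the constant is folded to -20 and, because C1 is
   negative, the comparison is reversed.  */
bool mul_cmp_simplified(float x)
{
    return x < -20.0f;
}

/* (C / x) >= 0 with C = 5.  */
bool inv_cmp_original(float x)
{
    return 5.0f / x >= 0.0f;
}

/* x >= 0: C is positive, so the operator is kept.  */
bool inv_cmp_simplified(float x)
{
    return x >= 0.0f;
}
```

The two inv_cmp forms differ for x = -0.0: 5 / -0.0 is -inf, which is not
>= 0, while -0.0 >= 0 is true.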

OK for commit?

ChangeLog
2017-10-17  Wilco Dijkstra    
Jackson Woodruff  

gcc/
PR 71026/tree-optimization
* match.pd: Simplify floating point comparisons.

gcc/testsuite/
PR 71026/tree-optimization
* gcc.dg/associate_comparison_1.c: New test.
--
diff --git a/gcc/match.pd b/gcc/match.pd
index 
e58a65af59b44a6b82ed8705f62966c5e6f251ac..cb48f079b4a310272e49cc319a1b3b0ff2023ba4
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -352,6 +352,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (rdiv @0 (rdiv:s @1 @2))
(mult (rdiv @0 @1) @2)))
 
+(if (flag_unsafe_math_optimizations)
+  /* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
+  (for op (lt le gt ge)
+   neg_op (gt ge lt le)
+(simplify
+  (op (rdiv REAL_CST@0 @1) real_zerop@2)
+  (switch
+   (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
+   (op @1 @2))
+   /* For C < 0, use the inverted operator.  */
+   (if (real_less (TREE_REAL_CST_PTR (@0), &dconst0))
+   (neg_op @1 @2))
+
 /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
  (simplify
@@ -3546,6 +3559,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(rdiv @2 @1))
(rdiv (op @0 @2) @1)))
 
+ (for cmp (lt le gt ge)
+  neg_cmp (gt ge lt le)
+  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
+  (simplify
+   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
+   (with
+{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
+(if (tem)
+ (switch
+  (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
+   (cmp @0 { tem; }))
+  (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
+   (neg_cmp @0 { tem; })))
+
  /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
  (for root (SQRT CBRT)
   (simplify
diff --git a/gcc/testsuite/gcc.dg/associate_comparison_1.c 
b/gcc/testsuite/gcc.dg/associate_comparison_1.c
new file mode 100644
index 
..d051f052e13812c91cbd2d559bf2af8fae128ee1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/associate_comparison_1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized" } */
+
+int
+cmp_mul_1 (float x)
+{
+  return x * 3 <= 100;
+}
+
+int
+cmp_mul_2 (float x)
+{
+  return x * -5 > 100;
+}
+
+int
+div_cmp_1 (float x, float y)
+{
+  return x / 3 <= y;
+}
+
+int
+div_cmp_2 (float x, float y)
+{
+  return x / 3 <= 1;
+}
+
+int
+inv_cmp (float x)
+{
+  return 5 / x >= 0;
+}
+
+/* { dg-final { scan-tree-dump-not " / " "optimized" } } */

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Tom de Vries


On 10/17/2017 05:34 PM, Tom de Vries wrote:

On 10/17/2017 04:46 PM, Jakub Jelinek wrote:



the presence/absence of
explicit dg-do run can then be used to decide if it should loop through
options or not.




I'd be in favor of specifying this clearly, f.i. as:
...
'! { dg-no-torture-options }'
...

Thanks,
- Tom

Re: [PATCH, testsuite] Add dg-require-stack-size

2017-10-17 Thread Mike Stump

On Oct 16, 2017, at 3:16 AM, Tom de Vries  wrote:
> 
> I noticed gcc.dg/tree-ssa/ldist-27.c failing for nvptx due to a too large 
> stack size.

> OK for trunk?

Hum.  There is an existing mechanism (find-grep STACK_SIZE) in the tree to 
handle the same issue.  Did you consider using it?

I think I like the trim-the-test-case-down solution better, and STACK_SIZE can 
drive that.  Maybe a tree-ssa person could weigh in on the test case.

But, if the optimization pass likes large things, certainly the patch is Ok.

Re: [PATCH] Do not put gimple stmt on an abnormal edge (PR sanitizer/82545).

2017-10-17 Thread Jakub Jelinek

On Mon, Oct 16, 2017 at 10:15:04PM +0200, Martin Liška wrote:
> Hi.
> 
> As discussed with Jakub on IRC, we should not put ASAN reporting function
> on critical edges. That can potentially lead to a missed use-after-scope,
> but I guess it's very rare.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2017-10-16  Martin Liska  
> 
>   PR sanitizer/82545
>   * asan.c (asan_expand_poison_ifn): Do not put gimple stmt
>   on an abnormal edge.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-10-16  Martin Liska  
> 
>   PR sanitizer/82545
>   * gcc.dg/asan/pr82545.c: New test.

Ok, with a nit:

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/asan/pr82545.c
> @@ -0,0 +1,15 @@
> +/* PR sanitizer/82545.  */
> +/* { dg-do compile } */
> +
> +extern void c(int);
> +extern void d(void);
> +
> +void a(void) {
> +  {
> +int b;
> +
> +__builtin_setjmp(0);

Please call __builtin_setjmp with a valid argument in a global var,
like:
void *buf[5];
...
__builtin_setjmp(buf);

> +c(b);
> +  }
> +  d();
> +}
> 


Jakub

[C++ PATCH 82560] missing dtor call

2017-10-17 Thread Nathan Sidwell

In a 'new T(whatever)' expression, we'll never call T::~T.  We used to 
generate such a cleanup (but then throw it away in optimization).  But 
now dtors can be deleted, so that approach could fail.  My patch for 
78469 fixed that.  But caused this problem.  The only cleanup we should 
not be generating is for the object being newed.  Like the decltype 
handling in build_new_method_call, we shouldn't be passing no_cleanup 
down either.


Applying to trunk.

nathan
--
Nathan Sidwell
2017-10-17  Nathan Sidwell  

	PR c++/82560
	* call.c (build_over_call): Don't pass tf_no_cleanup to nested
	calls.

	PR c++/82560
	* g++.dg/cpp0x/pr82560.C: New.

Index: cp/call.c
===
--- cp/call.c	(revision 253819)
+++ cp/call.c	(working copy)
@@ -7717,8 +7717,11 @@ build_over_call (struct z_candidate *can
 }
 
   /* N3276 magic doesn't apply to nested calls.  */
-  int decltype_flag = (complain & tf_decltype);
+  tsubst_flags_t decltype_flag = (complain & tf_decltype);
   complain &= ~tf_decltype;
+  /* No-Cleanup doesn't apply to nested calls either.  */
+  tsubst_flags_t no_cleanup_complain = complain;
+  complain &= ~tf_no_cleanup;
 
   /* Find maximum size of vector to hold converted arguments.  */
   parmlen = list_length (parm);
@@ -7916,7 +7919,7 @@ build_over_call (struct z_candidate *can
   if (flags & LOOKUP_NO_CONVERSION)
 	conv->user_conv_p = true;
 
-  tsubst_flags_t arg_complain = complain & (~tf_no_cleanup);
+  tsubst_flags_t arg_complain = complain;
   if (!conversion_warning)
 	arg_complain &= ~tf_warning;
 
@@ -8164,7 +8167,8 @@ build_over_call (struct z_candidate *can
   else if (default_ctor_p (fn))
 	{
 	  if (is_dummy_object (argarray[0]))
-	return force_target_expr (DECL_CONTEXT (fn), void_node, complain);
+	return force_target_expr (DECL_CONTEXT (fn), void_node,
+  no_cleanup_complain);
 	  else
 	return cp_build_indirect_ref (argarray[0], RO_NULL, complain);
 	}
@@ -9062,7 +9066,6 @@ build_new_method_call_1 (tree instance,
  static member function.  */
   instance = mark_type_use (instance);
 
-
   /* Figure out whether to skip the first argument for the error
  message we will display to users if an error occurs.  We don't
  want to display any compiler-generated arguments.  The "this"
Index: testsuite/g++.dg/cpp0x/pr82560.C
===
--- testsuite/g++.dg/cpp0x/pr82560.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/pr82560.C	(working copy)
@@ -0,0 +1,28 @@
+// { dg-do run { target c++11 } }
+// PR82560, failed to destruct default arg inside new
+
+static int liveness = 0;
+
+struct Foo {
+
+  Foo (int) {
+liveness++;
+  }
+
+  ~Foo() {
+liveness--;
+  }
+
+};
+
+struct Bar {
+  Bar (Foo = 0) { }
+  ~Bar() { }
+};
+
+int main()
+{
+  delete new Bar();
+
+  return liveness != 0;;
+}

[PATCH, middle-end/82577] Fix DECL_ASSEMBLER_NAME ICE

2017-10-17 Thread Nathan Sidwell

This fixes a new ICE I caused by breaking out HAS_DECL_ASSEMBLER_NAME_P 
from DECL_ASSEMBLER_NAME_SET_P.  alias.c needs to check it.  As it's 
doing explicit HAS and SET checking, it might as well use the RAW 
accessor too.


Committing as obvious.

nathan
--
Nathan Sidwell
2017-10-17  Nathan Sidwell  

	gcc/
	PR middle-end/82577
	* alias.c (compare_base_decls): Check HAS_DECL_ASSEMBLER_NAME_P,
	use DECL_ASSEMBLER_NAME_RAW.

	gcc/testsuite/
	PR middle-end/82577
	* g++.dg/opt/pr82577.C: New.

Index: alias.c
===
--- alias.c	(revision 253818)
+++ alias.c	(working copy)
@@ -2047,13 +2047,15 @@ compare_base_decls (tree base1, tree bas
 return 1;
 
   /* If we have two register decls with register specification we
- cannot decide unless their assembler name is the same.  */
+ cannot decide unless their assembler names are the same.  */
   if (DECL_REGISTER (base1)
   && DECL_REGISTER (base2)
+  && HAS_DECL_ASSEMBLER_NAME_P (base1)
+  && HAS_DECL_ASSEMBLER_NAME_P (base2)
   && DECL_ASSEMBLER_NAME_SET_P (base1)
   && DECL_ASSEMBLER_NAME_SET_P (base2))
 {
-  if (DECL_ASSEMBLER_NAME (base1) == DECL_ASSEMBLER_NAME (base2))
+  if (DECL_ASSEMBLER_NAME_RAW (base1) == DECL_ASSEMBLER_NAME_RAW (base2))
 	return 1;
   return -1;
 }
Index: testsuite/g++.dg/opt/pr82577.C
===
--- testsuite/g++.dg/opt/pr82577.C	(revision 0)
+++ testsuite/g++.dg/opt/pr82577.C	(working copy)
@@ -0,0 +1,17 @@
+// { dg-additional-options "-O2" }
+// PR c++/82577 ICE when optimizing
+
+class a {
+public:
+  int *b();
+};
+struct c {
+  int d;
+  a e;
+} f;
+void fn1(register c *g) {
+  register int *h;
+  do
+(h) = g->e.b() + (g)->d;
+  while ();
+}

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Tom de Vries


On 10/17/2017 04:46 PM, Jakub Jelinek wrote:

On Tue, Oct 17, 2017 at 04:42:58PM +0200, Tom de Vries wrote:

I found the culprit, in gfortran-dg-runtest:
...
 # look if this is dg-do-run test, in which case

 # we cycle through the option list, otherwise we don't

 if [expr [search_for $test "dg-do run"]] {
 set option_list $torture_with_loops
} else {
 set option_list [list { -O } ]
 }
...
This doesn't take dg-do-what-default into account. [ Note that this search
also triggers on '! bla bla dg-do run bla bla' ]

Attached patch fixes this.

Tested on x86_64 with test-case
libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90.

Verified that adding '! dg-do compile' switches back to running with
'-O'.

OK, if full testing is ok?


I believe this was fully intentional and the presence/absence of
explicit dg-do run can then be used to decide if it should loop through
options or not.


I don't see an explicit mention of ignoring dg-do-what-default in the 
original submission ( 
https://gcc.gnu.org/ml/fortran/2004-07/msg00166.html ). It was written 
in a dg-do-what-default == compile context, and I think the 
dg-do-what-default == run scenario was overlooked. Note also that the 
comment says it's trying to determine if it's a 'dg-do-run test', not if 
it contains a 'dg-do run' string.


Thomas stumbled upon the same problem here ( 
https://gcc.gnu.org/ml/fortran/2013-08/msg00042.html ). AFAIU, the reply 
from Mikael Morin here ( 
https://gcc.gnu.org/ml/fortran/2013-08/msg00044.html ) implies that he 
would consider it a bug if dg-do-what-default would be 'run' for 
gfortran.dg.


Perhaps this behaviour has been intentionally exploited in the libgomp 
testsuite. But it is confusing to me, and evidently to others as well.


Thanks,
- Tom

[PATCH PR82574]Check that datref must be executed exactly once per iteration against outermost loop in nest

2017-10-17 Thread Bin Cheng

Hi,
The patch fixes the ICE reported in PR82574.  In order to distribute a builtin
partition, we need to check that the data reference is executed exactly once
per iteration.  When distributing a loop nest, this has to be checked against
each loop in the nest.  One optimization is that, for a perfect nest, we only
need to check against the outermost loop.
Bootstrapped and tested on x86_64.  Is it OK?
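
To make the "exactly once per iteration" condition concrete, here is a
small C sketch modelled on the new test.  The function names and the
explicit flag parameter are assumptions made for this example.  The first
nest stores to every element on every iteration, so it is the kind of loop
that could be turned into a single memset.  In the second nest the store
is skipped when the flag is zero, so replacing it with a memset would
write memory the original loop never touches.

```c
#include <stdbool.h>
#include <string.h>

#define DIM 200

static unsigned char grid[DIM][DIM];

/* Reset the array so that each scenario starts from zeros.  */
void clear_grid(void)
{
    memset(grid, 0, sizeof grid);
}

/* The store is executed exactly once per iteration of both loops.  */
void store_unconditionally(void)
{
    for (int a = 0; a < DIM; a++)
        for (int b = 0; b < DIM; b++)
            grid[a][b] = 1;
}

/* The store is guarded, so it is not executed on every iteration.  */
void store_conditionally(int c)
{
    for (int a = 0; a < DIM; a++)
        for (int b = 0; b < DIM; b++)
            if (c)
                grid[a][b] = 1;
}

/* True when every element of the array equals VALUE.  */
bool grid_is_all(unsigned char value)
{
    for (int a = 0; a < DIM; a++)
        for (int b = 0; b < DIM; b++)
            if (grid[a][b] != value)
                return false;
    return true;
}
```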

Thanks,
bin
2017-10-17  Bin Cheng  

PR tree-optimization/82574
* tree-loop-distribution.c (find_single_drs): New parameter.  Check
that data reference must be executed exactly once per iteration
against the outermost loop in nest.
(classify_partition): Update call to above function.

gcc/testsuite
2017-10-17  Bin Cheng  

PR tree-optimization/82574
* gcc.dg/tree-ssa/pr82574.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82574.c b/gcc/testsuite/gcc.dg/tree-ssa/pr82574.c
new file mode 100644
index 000..8fc4596
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82574.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+unsigned char a, b, c, d[200][200];
+
+void abort (void);
+
+int main ()
+{
+  for (; a < 200; a++)
+for (b = 0; b < 200; b++)
+  if (c)
+   d[a][b] = 1;
+
+  if ((c && d[0][0] != 1) || (!c && d[0][0] != 0))
+abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 5e835be..d029f98 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1283,12 +1283,12 @@ build_rdg_partition_for_vertex (struct graph *rdg, int 
v)
   return partition;
 }
 
-/* Given PARTITION of RDG, record single load/store data references for
-   builtin partition in SRC_DR/DST_DR, return false if there is no such
+/* Given PARTITION of LOOP and RDG, record single load/store data references
+   for builtin partition in SRC_DR/DST_DR, return false if there is no such
data references.  */
 
 static bool
-find_single_drs (struct graph *rdg, partition *partition,
+find_single_drs (struct loop *loop, struct graph *rdg, partition *partition,
 data_reference_p *dst_dr, data_reference_p *src_dr)
 {
   unsigned i;
@@ -1344,10 +1344,12 @@ find_single_drs (struct graph *rdg, partition 
*partition,
   && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (single_st), 1)))
 return false;
 
-  /* Data reference must be executed exactly once per iteration.  */
+  /* Data reference must be executed exactly once per iteration of each
+ loop in the loop nest.  We only need to check dominant information
+ against the outermost one in a perfect loop nest because a bb can't
+ dominate outermost loop's latch without dominating inner loop's.  */
   basic_block bb_st = gimple_bb (DR_STMT (single_st));
-  struct loop *inner = bb_st->loop_father;
-  if (!dominated_by_p (CDI_DOMINATORS, inner->latch, bb_st))
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_st))
 return false;
 
   if (single_ld)
@@ -1365,14 +1367,16 @@ find_single_drs (struct graph *rdg, partition 
*partition,
 
   /* Load and store must be in the same loop nest.  */
   basic_block bb_ld = gimple_bb (DR_STMT (single_ld));
-  if (inner != bb_ld->loop_father)
+  if (bb_st->loop_father != bb_ld->loop_father)
return false;
 
-  /* Data reference must be executed exactly once per iteration.  */
-  if (!dominated_by_p (CDI_DOMINATORS, inner->latch, bb_ld))
+  /* Data reference must be executed exactly once per iteration.
+Same as single_st, we only need to check against the outermost
+loop.  */
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb_ld))
return false;
 
-  edge e = single_exit (inner);
+  edge e = single_exit (bb_st->loop_father);
   bool dom_ld = dominated_by_p (CDI_DOMINATORS, e->src, bb_ld);
   bool dom_st = dominated_by_p (CDI_DOMINATORS, e->src, bb_st);
   if (dom_ld != dom_st)
@@ -1611,7 +1615,7 @@ classify_partition (loop_p loop, struct graph *rdg, 
partition *partition,
 return;
 
   /* Find single load/store data references for builtin partition.  */
-  if (!find_single_drs (rdg, partition, &single_st, &single_ld))
+  if (!find_single_drs (loop, rdg, partition, &single_st, &single_ld))
 return;
 
   /* Classify the builtin kind.  */

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Jakub Jelinek

On Tue, Oct 17, 2017 at 04:42:58PM +0200, Tom de Vries wrote:
> I found the culprit, in gfortran-dg-runtest:
> ...
> # look if this is dg-do-run test, in which case
> 
> # we cycle through the option list, otherwise we don't
> 
> if [expr [search_for $test "dg-do run"]] {
> set option_list $torture_with_loops
>   } else {
> set option_list [list { -O } ]
> }
> ...
> This doesn't take dg-do-what-default into account. [ Note that this search
> also triggers on '! bla bla dg-do run bla bla' ]
> 
> Attached patch fixes this.
> 
> Tested on x86_64 with test-case
> libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90.
> 
> Verified that adding '! dg-do compile' switches back to running with
> '-O'.
> 
> OK, if full testing is ok?

I believe this was fully intentional and the presence/absence of
explicit dg-do run can then be used to decide if it should loop through
options or not.

Jakub

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Tom de Vries


On 10/17/2017 02:51 PM, Tom de Vries wrote:

On 10/17/2017 01:19 PM, Thomas Schwinge wrote:

Hi!

On Mon, 16 Oct 2017 10:49:45 +0200, Tom de 
Vries  wrote:

this patch enables some openacc test-cases for non-nvidia devices.

Committed.

Thanks!


--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -1,4 +1,4 @@
-! { dg-do run  { target openacc_nvidia_accel_selected } }
+! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
[...]

To restore the torture testing that we like to do for Fortran test cases,
I committed the following in trunk r253808, as obvious:


It's not obvious to me.

Given that lib/libgomp.exp contains:
...
set dg-do-what-default run
...
I'd expect adding '! dg-do run' to have no effect.



I found the culprit, in gfortran-dg-runtest:
...
# look if this is dg-do-run test, in which case 

# we cycle through the option list, otherwise we don't 


if [expr [search_for $test "dg-do run"]] {
set option_list $torture_with_loops
} else {
set option_list [list { -O } ]
}
...
This doesn't take dg-do-what-default into account. [ Note that this 
search also triggers on '! bla bla dg-do run bla bla' ]


Attached patch fixes this.
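
For readers who do not speak Tcl, the decision the patch is meant to make
can be sketched in C.  This is only an illustration: the function name is
made up, and plain substring matching stands in for the testsuite's
search_for helper.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when the test should cycle through
   the torture option list, false when it should run with '-O' only.
   TEST_TEXT is the contents of the test file; DG_DO_WHAT_DEFAULT is the
   default action, e.g. "run" or "compile".  */
bool use_torture_options(const char *test_text,
                         const char *dg_do_what_default)
{
    /* The test sets dg-do itself: torture only if it is set to run.  */
    if (strstr(test_text, "{ dg-do ") != NULL)
        return strstr(test_text, "{ dg-do run") != NULL;

    /* No dg-do directive: fall back to the default action.  */
    return strcmp(dg_do_what_default, "run") == 0;
}
```

Matching on "{ dg-do " rather than on "dg-do run" also keeps a stray
'! bla bla dg-do run bla bla' comment from selecting the torture options.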

Tested on x86_64 with test-case 
libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90.


Verified that adding '! dg-do compile' switches back to running 
with '-O'.


OK, if full testing is ok?

Thanks,
- Tom

diff --git a/gcc/testsuite/lib/gfortran-dg.exp b/gcc/testsuite/lib/gfortran-dg.exp
index 27b2a69..8c3 100644
--- a/gcc/testsuite/lib/gfortran-dg.exp
+++ b/gcc/testsuite/lib/gfortran-dg.exp
@@ -125,6 +125,7 @@ proc gfortran-dg-prune { system text } {
 proc gfortran-dg-runtest { testcases flags default-extra-flags } {
 global runtests
 global DG_TORTURE_OPTIONS torture_with_loops
+global dg-do-what-default
 
 torture-init
 set-torture-options $DG_TORTURE_OPTIONS
@@ -136,9 +137,22 @@ proc gfortran-dg-runtest { testcases flags default-extra-flags } {
 	continue
 }
 
+	set dorun 0
+	if [expr [search_for $test "\{ dg-do "]] {
+	# We set dg-do
+	if [expr [search_for $test "\{ dg-do run"]] {
+		# We set dg-do to run
+		set dorun 1
+	}
+	} else {
+	# We don't set dg-do, use the default
+	if { ${dg-do-what-default} == "run" } {
+		set dorun 1
+	}
+	}
 	# look if this is dg-do-run test, in which case
 	# we cycle through the option list, otherwise we don't
-	if [expr [search_for $test "dg-do run"]] {
+	if { $dorun } {
 	set option_list $torture_with_loops
 	} else {
 	set option_list [list { -O } ]

Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions

2017-10-17 Thread Bin.Cheng

On Mon, Oct 16, 2017 at 5:27 PM, Bin.Cheng  wrote:
> On Mon, Oct 16, 2017 at 5:00 PM, Bin.Cheng  wrote:
>> On Mon, Oct 16, 2017 at 2:56 PM, Bin.Cheng  wrote:
>>> On Thu, Oct 12, 2017 at 2:43 PM, Richard Biener
>>>  wrote:
 On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
> Hi,
> This patch merges adjacent memset builtin partitions if possible.  It is
> a useful special case optimization transforming below code:
>
> #define M (256)
> #define N (512)
>
> struct st
> {
>   int a[M][N];
>   int c[M];
>   int b[M][N];
> };
>
> void
> foo (struct st *p)
> {
>   for (unsigned i = 0; i < M; ++i)
> {
>   p->c[i] = 0;
>   for (unsigned j = N; j > 0; --j)
> {
>   p->a[i][j - 1] = 0;
>   p->b[i][j - 1] = 0;
> }
> }
>
> into a single memset function call, rather than three calls initializing
> the structure field by field.
>
> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?

 +  /* Insertion sort is good enough for the small sub-array.  */
 +  for (k = i + 1; k < j; ++k)
 +   {
 + part1 = (*partitions)[k];
 + for (l = k; l > i; --l)
 +   {
 + part2 = (*partitions)[l - 1];
 + if (part1->builtin->dst_base_offset
 +   >= part2->builtin->dst_base_offset)
 +   break;
 +
 + (*partitions)[l] = part2;
 +   }
 + (*partitions)[l] = part1;
 +   }

 so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
 to write a qsort helper for this but I can't wrap my head around the above
 in 5 minutes so ... please!
>>> Hmm, I thought twice about this and now I believe stable sorting (thus
>>> insertion sort)
>>> is required here.  Please see below for explanation.
>>>

 You don't seem to check the sizes of the memsets given that they
 obviously cannot overlap(!?)
>>> Yes, given it's quite special case transformation, I did add code
>>> checking overlap cases.

 Also why handle memset and not memcpy/memmove or combinations of
 them (for sorting)?

   for ()
{
   p->a[i] = 0;
   p->c[i] = q->c[i];
   p->b[i] = 0;
}

 with a and b adjacent.  I suppose p->c could be computed by a
 non-builtin partition as well.
>>> Yes, the two memset builtin partitions can be merged in this case, but...
 So don't we want to see if dependences allow sorting all builtin
 partitions next to each other
 as much as possible?  (maybe we do that already?)
>>> The answer for this, above partition merging and use of qsort is no.
>>> I think all the three are the same question here.  For now we only do
>>> topological sort for partitions.  To maximize parallelism (either by merging
>>> normal parallel partitions or merging builtin partitions) requires fine-tuned
>>> sorting between partitions that don't depend on each other.
>>> In order to sort all memset/memcpy/memmove, we need to check dependence
>>> between all data references between different partitions.  For example, I
>>> created new test ldist-36.c illustrating sorting memcpy along with memset
>>> would generate wrong code because dependence is broken.  It's the same
>>> for qsort.  In extreme case, if the same array is set twice with different 
>>> rhs
>>> value, the order between the two sets needs to be preserved.  Unfortunately,
>>> qsort is unstable and could reorder different sets.  This would break output
>>> dependence.
>>> At the point of this function, dependence graph is destroyed, we can't do
>>> much in addition to special case handling for memset.  Full solution would
>>> require a customized topological sorting process.
>>>
>>> So, this updated patch keeps insertion sort with additional comment 
>>> explaining
>>> why.  Also two test cases added showing when memset partitions should be
>>> merged (we can't for now) and when memset partitions should not be merged.
>> Hmm, I can factor out the sorting loop into a function, that might
>> make it easier
>> to read.
> Okay, I will use std::stable_sort in this case, that's exactly what we
> want for this case.
Hi Richard,
This is the 3rd version of patch, it uses std::stable_sort which gives
stable sort and code simplicity.
Bootstrap and test.  Is it OK?
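
For reference, the stability property argued for above can be sketched in
plain C with an insertion sort like the one in the earlier version of the
patch.  The struct and function names are assumptions for this example;
only the dst_base_offset key comes from the patch.  Elements with equal
offsets keep their original relative order, which is what preserves the
order of two sets to the same location.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memset builtin partition.  */
struct memset_part
{
    long dst_base_offset;  /* sort key */
    int id;                /* original position, used to observe stability */
};

/* Stable insertion sort by dst_base_offset.  An element is only moved
   past neighbours with a strictly larger key, so equal keys are never
   reordered.  */
void sort_parts_stable(struct memset_part *parts, size_t count)
{
    for (size_t k = 1; k < count; ++k)
    {
        struct memset_part current = parts[k];
        size_t l = k;

        while (l > 0
               && current.dst_base_offset < parts[l - 1].dst_base_offset)
        {
            parts[l] = parts[l - 1];
            --l;
        }
        parts[l] = current;
    }
}
```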

Thanks,
bin
2017-10-17  Bin Cheng  

* tree-loop-distribution.c (INCLUDE_ALGORITHM): New header file.
(tree-ssa-loop-ivopts.h): New header file.
(struct builtin_info): New fields.
(classify_builtin_1): Compute and record base and offset parts for
memset builtin partition by calling strip_offset.
(offset_cmp, fuse_memset_builtins): New

Re: [PATCH PR/82546] tree node size

2017-10-17 Thread Richard Biener

On Tue, 17 Oct 2017, Nathan Sidwell wrote:

> On 10/17/2017 05:26 AM, Richard Biener wrote:
> 
> > Sorry for not looking at the patch before replying.  The patch looks ok
> > but shouldn't LANG_TYPE be also handled by the FE?  LANG_TYPE itself
> > is an odd beast if I may say that - it's only used by the C++ and Ada FEs
> > and the Ada FE does only
> 
> I agree.  I think LANG_TYPE may be from when there were no FE-specific nodes.
> It should probably be killed and resurrected as appropriate FE nodes.
> 
> Olivier, as an ADA person, is that something that could be done?
> 
> > Thus the patch is ok.
> 
> Thanks,
> 
> As a heads up, we currently have:
> 
> struct type_common;
> struct type_with_lang_specific : type_common;
> struct type_non_common : type_with_lang_specific;
> 
> And many (most?, all?) FE type nodes derive from type_non_common (even if, as
> I discovered, they don't know it).  It seems the hierarchy would be better as:
> 
> struct type_common;
> struct type_non_common : type_common; // FE type derive here
> struct type_with_lang_specific : type_non_common;
> 
> After all, why would a FE-specific type need a lang-specific pointer?  I don't
> think there are types that just have the lang-pointer and don't have
> non_common.
> 
> That's the direction I'm heading in with this clean up.

Not sure.  For decls the lang_specific pointer is even in decl_common!

It's probably that there are basically no types w/o a FE using the
lang-specific pointer but there are some types using
type_common fields only (but need lang-specifics).

So maybe do like with decls and remove type_with_lang_specific, instead
folding it into type_common?

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH PR/82546] tree node size

2017-10-17 Thread Nathan Sidwell


On 10/17/2017 05:26 AM, Richard Biener wrote:


Sorry for not looking at the patch before replying.  The patch looks ok
but shouldn't LANG_TYPE be also handled by the FE?  LANG_TYPE itself
is an odd beast if I may say that - it's only used by the C++ and Ada FEs
and the Ada FE does only


I agree.  I think LANG_TYPE may be from when there were no FE-specific 
nodes.  It should probably be killed and resurrected as appropriate FE 
nodes.


Olivier, as an ADA person, is that something that could be done?


Thus the patch is ok.


Thanks,

As a heads up, we currently have:

struct type_common;
struct type_with_lang_specific : type_common;
struct type_non_common : type_with_lang_specific;

And many (most?, all?) FE type nodes derive from type_non_common (even 
if, as I discovered, they don't know it).  It seems the hierarchy would 
be better as:


struct type_common;
struct type_non_common : type_common; // FE type derive here
struct type_with_lang_specific : type_non_common;

After all, why would a FE-specific type need a lang-specific pointer?  I 
don't think there are types that just have the lang-pointer and don't 
have non_common.


That's the direction I'm heading in with this clean up.

nathan

--
Nathan Sidwell

[PATCH][GRAPHITE] Fix ISL memory management issue

2017-10-17 Thread Richard Biener


The isl_union_map operations always take the existing map and return
a new one but scop_get_reads_and_writes tries to operate on its
parameters in-place.  This fails once a re-allocation happens leading
to "interesting" issues (like random segfaults with 
-fdump-tree-graphite-details on larger testcases).
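
The same class of bug can be shown without isl.  In the C sketch below,
which uses made-up names and a plain int array in place of an
isl_union_map, append_copy consumes its input and returns a new object,
like the isl operations do.  The caller therefore has to store the
returned pointer back through a pointer to its own variable (in the
patch, through a C++ reference); updating only a by-value copy of the
pointer would leave the caller holding freed memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an operation that takes ownership of OLD and
   returns a newly allocated result with VALUE appended.  Returns NULL on
   allocation failure.  */
int *append_copy(int *old, size_t len, int value)
{
    int *fresh = malloc((len + 1) * sizeof *fresh);
    if (fresh != NULL)
    {
        for (size_t i = 0; i < len; ++i)
            fresh[i] = old[i];
        fresh[len] = value;
    }
    free(old);
    return fresh;
}

/* Output parameters are passed by address so that the caller sees the
   new object rather than the one that was just freed.  */
void collect(int **out, size_t *len, int value)
{
    *out = append_copy(*out, *len, value);
    if (*out != NULL)
        ++*len;
}
```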

Fixed as follows.

Committed as obvious.

Richard.

2017-10-17  Richard Biener  

* graphite-dependences.c (scop_get_reads_and_writes): Change
output parameters to references.

Index: gcc/graphite-dependences.c
===
--- gcc/graphite-dependences.c  (revision 253811)
+++ gcc/graphite-dependences.c  (working copy)
@@ -67,9 +67,9 @@ add_pdr_constraints (poly_dr_p pdr, poly
reads are returned in READS and writes in MUST_WRITES and MAY_WRITES.  */
 
 static void
-scop_get_reads_and_writes (scop_p scop, isl_union_map *reads,
-  isl_union_map *must_writes,
-  isl_union_map *may_writes)
+scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
+  isl_union_map *&must_writes,
+  isl_union_map *&may_writes)
 {
   int i, j;
   poly_bb_p pbb;

[PATCH][GRAPHITE] Remove dead code

2017-10-17 Thread Richard Biener


The following removes copy_internal_parameters and the parameter rename
map.  It became dead when I forgot to copy the member to the
false if-region part ... and in a previous mail we discussed that we'd rather
wait for a testcase showing the need to handle "parameters" defined in
the region.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-10-17  Richard Biener  

* graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_ast_expr_id):
Simplify with removal of the parameter rename map.
(set_rename): Likewise.
(should_copy_to_new_region): Likewise.
(graphite_copy_stmts_from_block): Likewise.
(copy_bb_and_scalar_dependences): Remove initialization of
unused copied_bb_map.
(copy_def): Remove.
(copy_internal_parameters): Likewise.
(graphite_regenerate_ast_isl): Do not call copy_internal_parameters.
* graphite-scop-detection.c (scop_detection::stmt_simple_for_scop_p):
Use INTEGRAL_TYPE_P.
(parameter_index_in_region_1): Rename to ...
(assign_parameter_index_in_region): ... this.  Assert we have
a parameter we handle.
(scan_tree_for_params): Adjust.
* sese.h (parameter_rename_map_t): Remove.
(struct sese_info_t): Remove unused parameter_rename_map and
copied_bb_map members.
* sese.c (new_sese_info): Adjust.
(free_sese_info): Likewise.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253811)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -264,11 +264,9 @@ gcc_expression_from_isl_ast_expr_id (tre
  "Could not map isl_id to tree expression");
   isl_ast_expr_free (expr_id);
   tree t = res->second;
-  tree *val = region->parameter_rename_map->get(t);
-
-  if (!val)
-   val = &t;
-  return fold_convert (type, *val);
+  if (useless_type_conversion_p (type, TREE_TYPE (t)))
+return t;
+  return fold_convert (type, t);
 }
 
 /* Converts an isl_ast_expr_int expression E to a widest_int.
@@ -953,13 +951,6 @@ set_rename (tree old_name, tree expr)
   r.safe_push (expr);
   region->rename_map->put (old_name, r);
 }
-
-  tree t;
-  int i;
-  /* For a parameter of a scop we don't want to rename it.  */
-  FOR_EACH_VEC_ELT (region->params, i, t)
-if (old_name == t)
-  region->parameter_rename_map->put(old_name, expr);
 }
 
 /* Return an iterator to the instructions comes last in the execution order.
@@ -1138,14 +1129,6 @@ should_copy_to_new_region (gimple *stmt,
   && scev_analyzable_p (lhs, region->region))
 return false;
 
-  /* Do not copy parameters that have been generated in the header of the
- scop.  */
-  if (is_gimple_assign (stmt)
-  && (lhs = gimple_assign_lhs (stmt))
-  && TREE_CODE (lhs) == SSA_NAME
-  && region->parameter_rename_map->get(lhs))
-return false;
-
   return true;
 }
 
@@ -1214,7 +1197,7 @@ graphite_copy_stmts_from_block (basic_bl
   if (codegen_error_p ())
return false;
 
-  /* For each SSA_NAME in the parameter_rename_map rename their usage.  */
+  /* For each SCEV analyzable SSA_NAME, rename their usage.  */
   ssa_op_iter iter;
   use_operand_p use_p;
   if (!is_gimple_debug (copy))
@@ -1223,26 +1206,16 @@ graphite_copy_stmts_from_block (basic_bl
tree old_name = USE_FROM_PTR (use_p);
 
if (TREE_CODE (old_name) != SSA_NAME
-   || SSA_NAME_IS_DEFAULT_DEF (old_name))
- continue;
-
-   tree *new_expr = region->parameter_rename_map->get (old_name);
-   tree new_name;
-   if (!new_expr
-   && scev_analyzable_p (old_name, region->region))
- {
-   gimple_seq stmts = NULL;
-   new_name = get_rename_from_scev (old_name, &stmts,
-bb->loop_father, iv_map);
-   if (! codegen_error_p ())
- gsi_insert_earliest (stmts);
-   new_expr = &new_name;
- }
-
-   if (!new_expr)
+   || SSA_NAME_IS_DEFAULT_DEF (old_name)
+   || ! scev_analyzable_p (old_name, region->region))
  continue;
 
-   replace_exp (use_p, *new_expr);
+   gimple_seq stmts = NULL;
+   tree new_name = get_rename_from_scev (old_name, &stmts,
+ bb->loop_father, iv_map);
+   if (! codegen_error_p ())
+ gsi_insert_earliest (stmts);
+   replace_exp (use_p, new_name);
  }
 
   update_stmt (copy);
@@ -1288,17 +1261,6 @@ copy_bb_and_scalar_dependences (basic_bl
   gsi_insert_after (&gsi_tgt, ass, GSI_NEW_STMT);
 }
 
-  vec  *copied_bbs = region->copied_bb_map->get (bb);
-  if (copied_bbs)
-copied_bbs->safe_push (new_bb);
-  else
-{
-  vec bbs;
-  bbs.create (2);
-

Re: [PATCH 2/2] S/390: Do not end groups after fallthru edge

2017-10-17 Thread Robin Dapp

> Can't we just set s390_sched_state to s390_last_sched_state in
> s390_sched_init?

Good idea, this simplifies the code quite a bit.

> Preserving the sched state across basic blocks for your case works
> only if the BBs are traversed with the fall through edges coming
> first. Is that the case? We probably should have a description for
> s390_last_sched_state stating this.

According to the documentation the basic blocks should be traversed in
"instruction stream" order, i.e. a fallthru edge will be considered
first as far as I understand it.
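
To make the intended behaviour concrete, here is a minimal standalone sketch of the check and of the per-block init.  It is not the GCC code: the `toy_*` types, the percent-based probability and the `TOY_UNLIKELY` threshold are assumptions made up for illustration (the patch itself uses `find_fallthru_edge` and `profile_probability::unlikely ()`).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's edge and basic_block.  */
struct toy_edge
{
  bool fallthru;      /* Edge is the fall-through edge into the block.  */
  int probability;    /* Percent, 0..100 (illustrative scale only).  */
};

struct toy_block
{
  const struct toy_edge *preds;   /* Incoming edges.  */
  size_t n_preds;
};

/* Illustrative threshold standing in for "unlikely".  */
#define TOY_UNLIKELY 20

/* Return true if BB has a fall-through predecessor and every other
   incoming edge is below the unlikely threshold.  */
static bool
toy_fallthru_entry_likely (const struct toy_block *bb)
{
  if (!bb)
    return false;

  bool have_fallthru = false;
  for (size_t i = 0; i < bb->n_preds; i++)
    if (bb->preds[i].fallthru)
      have_fallthru = true;
  if (!have_fallthru)
    return false;

  for (size_t i = 0; i < bb->n_preds; i++)
    if (!bb->preds[i].fallthru
	&& bb->preds[i].probability >= TOY_UNLIKELY)
      return false;

  return true;
}

/* Group state carried over from the previously scheduled block.  */
static int toy_sched_state;

/* Per-block init: start a new group unless we most likely fell
   through from the block scheduled just before.  */
static void
toy_sched_init (const struct toy_block *bb)
{
  if (!toy_fallthru_entry_likely (bb))
    toy_sched_state = 0;
}
```

The carried-over state is only meaningful if the block scheduled immediately before is the source of the fall-through edge, which is why the traversal order matters.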

Regards
 Robin

--

gcc/ChangeLog:

2017-10-17  Robin Dapp  

* config/s390/s390.c (s390_bb_fallthru_entry_likely): New
function.
(s390_sched_init): Do not reset s390_sched_state if we entered
the current basic block via a fallthru edge and all others are
very unlikely.
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index c1a144e..dac81bc 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -83,6 +83,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "symbol-summary.h"
 #include "ipa-prop.h"
 #include "ipa-fnsummary.h"
+#include "sched-int.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -14346,6 +14347,28 @@ s390_z10_prevent_earlyload_conflicts (rtx_insn **ready, int *nready_p)
   ready[0] = tmp;
 }
 
+/* Returns TRUE if BB is entered via a fallthru edge and all other
+   incoming edges are less than unlikely.  */
+static bool
+s390_bb_fallthru_entry_likely (basic_block bb)
+{
+  edge e, fallthru_edge;
+  edge_iterator ei;
+
+  if (!bb)
+return false;
+
+  fallthru_edge = find_fallthru_edge (bb->preds);
+  if (!fallthru_edge)
+return false;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+if (e != fallthru_edge
+	&& e->probability >= profile_probability::unlikely ())
+  return false;
+
+  return true;
+}
 
 /* The s390_sched_state variable tracks the state of the current or
the last instruction group.
@@ -14354,7 +14377,7 @@ s390_z10_prevent_earlyload_conflicts (rtx_insn **ready, int *nready_p)
3 the last group is complete - normal insns
4 the last group was a cracked/expanded insn */
 
-static int s390_sched_state;
+static int s390_sched_state = 0;
 
 #define S390_SCHED_STATE_NORMAL  3
 #define S390_SCHED_STATE_CRACKED 4
@@ -14764,7 +14787,17 @@ s390_sched_init (FILE *file ATTRIBUTE_UNUSED,
 {
   last_scheduled_insn = NULL;
   memset (last_scheduled_unit_distance, 0, MAX_SCHED_UNITS * sizeof (int));
-  s390_sched_state = 0;
+
+  /* If the next basic block is most likely entered via a fallthru edge
+ we keep the last sched state.  Otherwise we start a new group.
+ The scheduler traverses basic blocks in "instruction stream" ordering
+ so if we see a fallthru edge here, s390_sched_state will be of its
+ source block.  */
+  rtx_insn *insn = current_sched_info->prev_head
+? NEXT_INSN (current_sched_info->prev_head) : NULL;
+  basic_block bb = insn ? BLOCK_FOR_INSN (insn) : NULL;
+  if (!s390_bb_fallthru_entry_likely (bb))
+s390_sched_state = 0;
 }
 
 /* This target hook implementation for TARGET_LOOP_UNROLL_ADJUST calculates

Re: [RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-17 Thread Richard Biener

On Tue, 17 Oct 2017, Jan Hubicka wrote:

> Hi,
> gather/scatter loads tend to be expensive (at least for x86), while we
> currently account for them as vector loads/stores, which are cheap.  This
> patch adds vectorizer cost entries for these so they can be modelled more
> realistically.
> 
> Bootstrapped/regtested x86_64-linux, OK?

Ok.  gather and load is somewhat redundant, likewise
scatter and store.  So you might want to change it to just
vector_gather and vector_scatter.  Even vector_ is redundant...

Best available implementations manage to hide the vector build
cost and just expose the latency of the load(s).  I wonder what
Zen does here ;)

Note that the largest source of imprecision in the cost model is
vec_perm: we lack the permutation mask there, which means we can't
distinguish between cross-lane and intra-lane permutes.

Richard.

> Honza
> 
> 2017-10-17  Jan Hubicka  
> 
>   * target.h (enum vect_cost_for_stmt): Add vector_gather_load and
>   vector_scatter_store.
>   * tree-vect-stmts.c (record_stmt_cost): Distinguish between normal
>   and scatter/gather ops.
> 
>   * aarch64/aarch64.c (aarch64_builtin_vectorization_cost): Add
>   vector_gather_load and vector_scatter_store.
>   * arm/arm.c (arm_builtin_vectorization_cost): Likewise.
>   * powerpcspe/powerpcspe.c (rs6000_builtin_vectorization_cost): Likewise.
>   * rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Likewise.
>   * s390/s390.c (s390_builtin_vectorization_cost): Likewise.
>   * spu/spu.c (spu_builtin_vectorization_cost): Likewise.
> 
> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 253789)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -8547,9 +8547,10 @@ aarch64_builtin_vectorization_cost (enum
>   return fp ? costs->vec_fp_stmt_cost : costs->vec_int_stmt_cost;
>  
>case vector_load:
> +  case vector_gather_load:
>   return costs->vec_align_load_cost;
>  
> -  case vector_store:
> +  case vector_scatter_store:
>   return costs->vec_store_cost;
>  
>case vec_to_scalar:
> Index: config/arm/arm.c
> ===
> --- config/arm/arm.c  (revision 253789)
> +++ config/arm/arm.c  (working copy)
> @@ -11241,9 +11241,11 @@ arm_builtin_vectorization_cost (enum vec
>  return current_tune->vec_costs->vec_stmt_cost;
>  
>case vector_load:
> +  case vector_gather_load:
>  return current_tune->vec_costs->vec_align_load_cost;
>  
>case vector_store:
> +  case vector_scatter_store:
>  return current_tune->vec_costs->vec_store_cost;
>  
>case vec_to_scalar:
> Index: config/powerpcspe/powerpcspe.c
> ===
> --- config/powerpcspe/powerpcspe.c(revision 253789)
> +++ config/powerpcspe/powerpcspe.c(working copy)
> @@ -5834,6 +5834,8 @@ rs6000_builtin_vectorization_cost (enum
>case vector_stmt:
>case vector_load:
>case vector_store:
> +  case vector_gather_load:
> +  case vector_scatter_store:
>case vec_to_scalar:
>case scalar_to_vec:
>case cond_branch_not_taken:
> Index: config/rs6000/rs6000.c
> ===
> --- config/rs6000/rs6000.c(revision 253789)
> +++ config/rs6000/rs6000.c(working copy)
> @@ -5398,6 +5398,8 @@ rs6000_builtin_vectorization_cost (enum
>case vector_stmt:
>case vector_load:
>case vector_store:
> +  case vector_gather_load:
> +  case vector_scatter_store:
>case vec_to_scalar:
>case scalar_to_vec:
>case cond_branch_not_taken:
> Index: config/s390/s390.c
> ===
> --- config/s390/s390.c(revision 253789)
> +++ config/s390/s390.c(working copy)
> @@ -3717,6 +3717,8 @@ s390_builtin_vectorization_cost (enum ve
>case vector_stmt:
>case vector_load:
>case vector_store:
> +  case vector_gather_load:
> +  case vector_scatter_store:
>case vec_to_scalar:
>case scalar_to_vec:
>case cond_branch_not_taken:
> Index: config/spu/spu.c
> ===
> --- config/spu/spu.c  (revision 253789)
> +++ config/spu/spu.c  (working copy)
> @@ -6625,6 +6625,8 @@ spu_builtin_vectorization_cost (enum vec
>case vector_stmt:
>case vector_load:
>case vector_store:
> +  case vector_gather_load:
> +  case vector_scatter_store:
>case vec_to_scalar:
>case scalar_to_vec:
>case cond_branch_not_taken:
> Index: target.h
> ===
> --- target.h  (revision 253789)
> +++ target.h  (working copy)
> @@ -171,9

Re: [patch] Relax IVOPTs restriction on auto-increment

2017-10-17 Thread Richard Biener

On Tue, Oct 17, 2017 at 9:45 AM, Eric Botcazou  wrote:
> Hi,
>
> add_autoinc_candidates begins with this test:
>
>   /* If we insert the increment in any position other than the standard
>  ones, we must ensure that it is incremented once per iteration.
>  It must not be in an inner nested loop, or one side of an if
>  statement.  */
>   if (use_bb->loop_father != data->current_loop
>   || !dominated_by_p (CDI_DOMINATORS, data->current_loop->latch, use_bb)
>   || stmt_could_throw_p (use->stmt)
>   || !cst_and_fits_in_hwi (step))
> return;
>
> This means that, for languages supporting non-call exceptions like Ada, no
> auto-inc candidates will be added for a simple iteration over an array in most
> cases since stmt_could_throw_p returns true.  I don't think that's necessary,
> or else it would also be necessary to test gimple_could_trap_p, which would
> have the same effect for all languages.  Note that add_autoinc_candidates is
> only invoked for USE_ADDRESS use types:
>
>   /* At last, add auto-incremental candidates.  Make such variables
>  important since other iv uses with same base object may be based
>  on it.  */
>   if (use != NULL && use->type == USE_ADDRESS)
> add_autoinc_candidates (data, iv->base, iv->step, true, use);
>
> So the attached patch replaces it with stmt_can_throw_internal (which is the
> predicate used to compute the flow of control) as e.g. in the vectorizer.
>
> Bootstrapped/regtested on PowerPC64/Linux, OK for the mainline?

Ok.

Richard.

>
> 2017-10-17  Eric Botcazou  
>
> * tree-ssa-loop-ivopts.c (add_autoinc_candidates): Bail out only if
> the use statement can throw internally.
>
> --
> Eric Botcazou

[RFA] Zen tuning part 9: Add support for scatter/gather in vectorizer costmodel

2017-10-17 Thread Jan Hubicka

Hi,
gather/scatter loads tend to be expensive (at least for x86), while we
currently account for them as vector loads/stores, which are cheap.  This
patch adds vectorizer cost entries for these so they can be modelled more
realistically.
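
As a rough illustration of the idea (a standalone sketch, not the GCC code): the statement kind is reclassified when the access is known to be a gather or scatter, and the target hook can then return a separate cost.  The `toy_*` names, the `is_gather_scatter` flag and the cost numbers are made up for the example; the store half mirrors the load half by assumption.

```c
#include <stdbool.h>

/* Illustrative subset of statement kinds.  The gather/scatter entries
   mirror the names used in the patch; everything else is hypothetical.  */
enum toy_cost_kind
{
  toy_vector_load,
  toy_unaligned_load,
  toy_vector_gather_load,
  toy_vector_store,
  toy_unaligned_store,
  toy_vector_scatter_store
};

/* Reclassify a load or store as gather/scatter when the statement is
   known to be one.  IS_GATHER_SCATTER stands in for the per-statement
   query the vectorizer would make.  */
static enum toy_cost_kind
toy_classify (enum toy_cost_kind kind, bool is_gather_scatter)
{
  if (!is_gather_scatter)
    return kind;
  if (kind == toy_vector_load || kind == toy_unaligned_load)
    return toy_vector_gather_load;
  if (kind == toy_vector_store || kind == toy_unaligned_store)
    return toy_vector_scatter_store;
  return kind;
}

/* Made-up per-kind costs; a real target would take them from its
   tuning tables.  */
static int
toy_cost (enum toy_cost_kind kind)
{
  switch (kind)
    {
    case toy_vector_load:
    case toy_vector_store:
      return 1;
    case toy_unaligned_load:
    case toy_unaligned_store:
      return 2;
    case toy_vector_gather_load:
    case toy_vector_scatter_store:
      return 8;
    }
  return 1;
}
```

Targets that do not care about the distinction can simply list the new kinds next to the existing load/store cases, which is what most of the backend hunks below do.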

Bootstrapped/regtested x86_64-linux, OK?

Honza

2017-10-17  Jan Hubicka  

* target.h (enum vect_cost_for_stmt): Add vector_gather_load and
vector_scatter_store.
* tree-vect-stmts.c (record_stmt_cost): Distinguish between normal
and scatter/gather ops.

* aarch64/aarch64.c (aarch64_builtin_vectorization_cost): Add
vector_gather_load and vector_scatter_store.
* arm/arm.c (arm_builtin_vectorization_cost): Likewise.
* powerpcspe/powerpcspe.c (rs6000_builtin_vectorization_cost): Likewise.
* rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Likewise.
* s390/s390.c (s390_builtin_vectorization_cost): Likewise.
* spu/spu.c (spu_builtin_vectorization_cost): Likewise.

Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c(revision 253789)
+++ config/aarch64/aarch64.c(working copy)
@@ -8547,9 +8547,10 @@ aarch64_builtin_vectorization_cost (enum
return fp ? costs->vec_fp_stmt_cost : costs->vec_int_stmt_cost;
 
   case vector_load:
+  case vector_gather_load:
return costs->vec_align_load_cost;
 
-  case vector_store:
+  case vector_scatter_store:
return costs->vec_store_cost;
 
   case vec_to_scalar:
Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 253789)
+++ config/arm/arm.c(working copy)
@@ -11241,9 +11241,11 @@ arm_builtin_vectorization_cost (enum vec
 return current_tune->vec_costs->vec_stmt_cost;
 
   case vector_load:
+  case vector_gather_load:
 return current_tune->vec_costs->vec_align_load_cost;
 
   case vector_store:
+  case vector_scatter_store:
 return current_tune->vec_costs->vec_store_cost;
 
   case vec_to_scalar:
Index: config/powerpcspe/powerpcspe.c
===
--- config/powerpcspe/powerpcspe.c  (revision 253789)
+++ config/powerpcspe/powerpcspe.c  (working copy)
@@ -5834,6 +5834,8 @@ rs6000_builtin_vectorization_cost (enum
   case vector_stmt:
   case vector_load:
   case vector_store:
+  case vector_gather_load:
+  case vector_scatter_store:
   case vec_to_scalar:
   case scalar_to_vec:
   case cond_branch_not_taken:
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c  (revision 253789)
+++ config/rs6000/rs6000.c  (working copy)
@@ -5398,6 +5398,8 @@ rs6000_builtin_vectorization_cost (enum
   case vector_stmt:
   case vector_load:
   case vector_store:
+  case vector_gather_load:
+  case vector_scatter_store:
   case vec_to_scalar:
   case scalar_to_vec:
   case cond_branch_not_taken:
Index: config/s390/s390.c
===
--- config/s390/s390.c  (revision 253789)
+++ config/s390/s390.c  (working copy)
@@ -3717,6 +3717,8 @@ s390_builtin_vectorization_cost (enum ve
   case vector_stmt:
   case vector_load:
   case vector_store:
+  case vector_gather_load:
+  case vector_scatter_store:
   case vec_to_scalar:
   case scalar_to_vec:
   case cond_branch_not_taken:
Index: config/spu/spu.c
===
--- config/spu/spu.c(revision 253789)
+++ config/spu/spu.c(working copy)
@@ -6625,6 +6625,8 @@ spu_builtin_vectorization_cost (enum vec
   case vector_stmt:
   case vector_load:
   case vector_store:
+  case vector_gather_load:
+  case vector_scatter_store:
   case vec_to_scalar:
   case scalar_to_vec:
   case cond_branch_not_taken:
Index: target.h
===
--- target.h(revision 253789)
+++ target.h(working copy)
@@ -171,9 +171,11 @@ enum vect_cost_for_stmt
   scalar_store,
   vector_stmt,
   vector_load,
+  vector_gather_load,
   unaligned_load,
   unaligned_store,
   vector_store,
+  vector_scatter_store,
   vec_to_scalar,
   scalar_to_vec,
   cond_branch_not_taken,
Index: tree-vect-stmts.c
===
--- tree-vect-stmts.c   (revision 253789)
+++ tree-vect-stmts.c   (working copy)
@@ -95,6 +95,12 @@ record_stmt_cost (stmt_vector_for_cost *
  enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
  int misalign, enum vect_cost_model_location where)
 {
+  if ((kind == vector_load || kind == unaligned_load)
+  && STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+kind = vector_gather_load;
+  if ((kind

RE: [patch][i386, AVX] GFNI enabling [4/4]

2017-10-17 Thread Koval, Julia

Fixed changelog.

gcc/
* config/i386/gfniintrin.h (_mm_gf2p8mul_epi8, _mm256_gf2p8mul_epi8,
_mm_mask_gf2p8mul_epi8, _mm_maskz_gf2p8mul_epi8,
_mm256_mask_gf2p8mul_epi8, _mm256_maskz_gf2p8mul_epi8,
_mm512_mask_gf2p8mul_epi8, _mm512_maskz_gf2p8mul_epi8,
_mm512_gf2p8mul_epi8): New intrinsics.
* config/i386/i386-builtin-types.def
(V64QI_FTYPE_V64QI_V64QI): New type.
* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8mulb_v64qi,
__builtin_ia32_vgf2p8mulb_v64qi_mask, __builtin_ia32_vgf2p8mulb_v32qi,
__builtin_ia32_vgf2p8mulb_v32qi_mask, __builtin_ia32_vgf2p8mulb_v16qi,
__builtin_ia32_vgf2p8mulb_v16qi_mask): New builtins.
* config/i386/sse.md (vgf2p8mulb_*): New pattern.
* config/i386/i386.c (ix86_expand_args_builtin): Handle new type.

gcc/testsuite/
* gcc.target/i386/avx512f-gf2p8mulb-2.c: New runtime tests.
* gcc.target/i386/avx512vl-gf2p8mulb-2.c: Ditto.
* gcc.target/i386/gfni-1.c: Add tests for GF2P8MUL.
* gcc.target/i386/gfni-2.c: Ditto.
* gcc.target/i386/gfni-3.c: Ditto.
* gcc.target/i386/gfni-4.c: Ditto.

> -Original Message-
> From: Koval, Julia
> Sent: Tuesday, October 17, 2017 3:21 PM
> To: GCC Patches 
> Cc: Kirill Yukhin 
> Subject: [patch][i386, AVX] GFNI enabling [4/4]
> 
> Hi,
> This is the fourth patch of GFNI ISASET enabling. It enables the GF2P8MULB
> instruction, described here:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> gcc/
>   * config/i386/gfniintrin.h (_mm_gf2p8mul_epi8,
> _mm256_gf2p8mul_epi8,
>   _mm_mask_gf2p8mul_epi8, _mm_maskz_gf2p8mul_epi8,
> _mm256_mask_gf2p8mul_epi8,
>   _mm256_maskz_gf2p8mul_epi8, _mm512_mask_gf2p8mul_epi8,
> _mm512_maskz_gf2p8mul_epi8,
>   _mm512_gf2p8mul_epi8): New intrinsics.
>   * config/i386/i386-builtin-types.def (V64QI_FTYPE_V64QI_V64QI): New
> type.
>   * config/i386/i386-builtin.def (__builtin_ia32_vgf2p8mulb_v64qi,
> __builtin_ia32_vgf2p8mulb_v64qi_mask
>   __builtin_ia32_vgf2p8mulb_v32qi,
> __builtin_ia32_vgf2p8mulb_v32qi_mask,
>   __builtin_ia32_vgf2p8mulb_v16qi,
> __builtin_ia32_vgf2p8mulb_v16qi_mask): New builtins.
>   * config/i386/sse.md (vgf2p8mulb_*): New pattern.
>   * config/i386/i386.c (ix86_expand_args_builtin): Handle new type.
> 
> gcc/testsuite/
>   * gcc.target/i386/avx512f-gf2p8mulb-2.c: New runtime tests.
>   * gcc.target/i386/avx512vl-gf2p8mulb-2.c: Ditto.
>   * gcc.target/i386/gfni-1.c: Add tests for GF2P8MUL.
>   * gcc.target/i386/gfni-2.c: Ditto.
>   * gcc.target/i386/gfni-3.c: Ditto.
>   * gcc.target/i386/gfni-4.c: Ditto.
> 
> Ok for trunk?
> 
> Thanks,
> Julia

RE: [patch][i386, AVX] GFNI enabling [3/4]

2017-10-17 Thread Koval, Julia

Thanks for your comments, fixed everything.

gcc/
* config/i386/gfniintrin.h (_mm_gf2p8affine_epi64_epi8,
_mm256_gf2p8affine_epi64_epi8, _mm_mask_gf2p8affine_epi64_epi8,
_mm_maskz_gf2p8affine_epi64_epi8, _mm256_mask_gf2p8affine_epi64_epi8,
_mm256_maskz_gf2p8affine_epi64_epi8,
_mm512_mask_gf2p8affine_epi64_epi8, _mm512_gf2p8affine_epi64_epi8
_mm512_maskz_gf2p8affine_epi64_epi8): New intrinsics.
* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineqb_v64qi,
__builtin_ia32_vgf2p8affineqb_v32qi,
__builtin_ia32_vgf2p8affineqb_v16qi): New builtins.
* config/i386/sse.md (vgf2p8affineqb_): New pattern.

gcc/testsuite/
* gcc.target/i386/avx-1.c: Handle new intrinsics.
* gcc.target/i386/avx512f-gf2p8affineqb-2.c: New runtime tests.
* gcc.target/i386/avx512vl-gf2p8affineqb-2.c: Ditto.
* gcc.target/i386/gfni-1.c: Add tests for GF2P8AFFINE.
* gcc.target/i386/gfni-2.c: Ditto.
* gcc.target/i386/gfni-3.c: Ditto.
* gcc.target/i386/gfni-4.c: Ditto.
* gcc.target/i386/sse-13.c: Handle new tests.
* gcc.target/i386/sse-23.c: Handle new tests.

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, October 17, 2017 3:15 PM
> To: Koval, Julia 
> Cc: GCC Patches ; Kirill Yukhin
> 
> Subject: Re: [patch][i386, AVX] GFNI enabling [3/4]
> 
> On Tue, Oct 17, 2017 at 01:09:50PM +, Koval, Julia wrote:
> > Hi, this is the third patch of GFNI ISASET enabling. It enables the
> > GF2P8AFFINE instruction, described here:
> > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> >
> > gcc/
> > * config/i386/gfniintrin.h (_mm_gf2p8affine_epi64_epi8,
> _mm256_gf2p8affine_epi64_epi8,
> 
> Too long line, even ChangeLog entries should be wrapped to 80 columns.
> 
> > (_mm_mask_gf2p8affine_epi64_epi8,
> _mm_maskz_gf2p8affine_epi64_epi8,
> > _mm256_mask_gf2p8affine_epi64_epi8,
> _mm256_maskz_gf2p8affine_epi64_epi8,
> > _mm512_mask_gf2p8affine_epi64_epi8,
> _mm512_maskz_gf2p8affine_epi64_epi8,
> 
> The above two are also too long (off by 1 char).
> 
> > _mm512_gf2p8affine_epi64_epi8): New intrinsics.
> > * config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineqb_v64qi,
> > __builtin_ia32_vgf2p8affineqb_v32qi,
> __builtin_ia32_vgf2p8affineqb_v16qi): New builtins.
> 
> And this one too.  Please wrap them.
> 
> > * config/i386/sse.md (vgf2p8affineqb_*): New pattern.
> 
> Use vgf2p8affineqb_ instead of the wild-card?
> 
> I'll defer actual review to Kirill.
> 
>   Jakub

Re: [PATCH][Middle-end]Fix PR80295 [7/8 Regression] ICE in __builtin_update_setjmp_buf expander

2017-10-17 Thread Richard Biener

On Mon, 16 Oct 2017, Qing Zhao wrote:

> Resending this patch for middle-end review.
> 
> The patch was originally sent for aarch64 review:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00404.html 
> 
> The implementation of __builtin_update_setjmp_buf is not correct.  It takes
> a pointer as an operand and treats the mode of the pointer as Pmode, which
> is wrong: the pointer needs a conversion from ptr_mode to Pmode.
> 
> Bootstrapped and tested on both aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu, no regressions.
> 
> Wilco helped me a lot in fixing this bug.
> 
> Okay for trunk?

Ok.

Richard.

> the patch is:
> 
> gcc/ChangeLog
> 
> 2017-10-16  Qing Zhao 
>   Wilco Dijkstra 
> 
> * builtins.c (expand_builtin_update_setjmp_buf): Add a
> conversion to Pmode from the buf_addr.
> 
> gcc/testsuite/ChangeLog
> 
> 2017-10-16  Qing Zhao 
>   Wilco Dijkstra 
> 
> PR middle-end/80295
> * gcc.target/aarch64/pr80295.c: New test.
> 
> ---
>  gcc/builtins.c | 1 +
>  gcc/testsuite/gcc.target/aarch64/pr80295.c | 8 
>  2 files changed, 9 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr80295.c
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index c8a5ea6..01fb08b 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -1199,6 +1199,7 @@ void
>  expand_builtin_update_setjmp_buf (rtx buf_addr)
>  {
>machine_mode sa_mode = STACK_SAVEAREA_MODE (SAVE_NONLOCAL);
> +  buf_addr = convert_memory_address (Pmode, buf_addr);
>rtx stack_save
>  = gen_rtx_MEM (sa_mode,
>  memory_address
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr80295.c b/gcc/testsuite/gcc.target/aarch64/pr80295.c
> new file mode 100644
> index 000..b3866d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr80295.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mabi=ilp32" } */
> +
> +void f (void *b) 
> +{ 
> +  __builtin_update_setjmp_buf (b); 
> +}
> +
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[patch][i386, AVX] GFNI enabling [4/4]

2017-10-17 Thread Koval, Julia

Hi,
This is the fourth patch of GFNI ISASET enabling. It enables the GF2P8MULB
instruction, described here:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
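
For readers who want a reference point for the new intrinsics, below is a scalar sketch of what a single byte lane computes, as I read the instruction description: a carry-less multiply in GF(2^8) reduced by x^8 + x^4 + x^3 + x + 1.  The function name `gf2p8mul_ref` is made up for the example and is not part of the patch.

```c
#include <stdint.h>

/* Scalar model of one byte lane of a GF(2^8) multiply with the
   reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
   Hypothetical helper, for illustration only.  */
static uint8_t
gf2p8mul_ref (uint8_t a, uint8_t b)
{
  uint8_t product = 0;

  for (int bit = 0; bit < 8; bit++)
    {
      /* Add (XOR) the current multiple of A if this bit of B is set.  */
      if (b & 1)
	product ^= a;

      /* Multiply A by x, reducing when the degree reaches 8.  */
      uint8_t carry = a & 0x80;
      a = (uint8_t) (a << 1);
      if (carry)
	a ^= 0x1B;

      b >>= 1;
    }

  return product;
}
```

A runtime test could compare each byte of the vector result against such a scalar model; whether the new tests do exactly that is not shown in this message.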

gcc/
* config/i386/gfniintrin.h (_mm_gf2p8mul_epi8, _mm256_gf2p8mul_epi8,
_mm_mask_gf2p8mul_epi8, _mm_maskz_gf2p8mul_epi8, 
_mm256_mask_gf2p8mul_epi8,
_mm256_maskz_gf2p8mul_epi8, _mm512_mask_gf2p8mul_epi8, 
_mm512_maskz_gf2p8mul_epi8,
_mm512_gf2p8mul_epi8): New intrinsics.
* config/i386/i386-builtin-types.def (V64QI_FTYPE_V64QI_V64QI): New 
type.
* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8mulb_v64qi, 
__builtin_ia32_vgf2p8mulb_v64qi_mask
__builtin_ia32_vgf2p8mulb_v32qi, __builtin_ia32_vgf2p8mulb_v32qi_mask,
__builtin_ia32_vgf2p8mulb_v16qi, __builtin_ia32_vgf2p8mulb_v16qi_mask): 
New builtins.
* config/i386/sse.md (vgf2p8mulb_*): New pattern.
* config/i386/i386.c (ix86_expand_args_builtin): Handle new type.

gcc/testsuite/
* gcc.target/i386/avx512f-gf2p8mulb-2.c: New runtime tests.
* gcc.target/i386/avx512vl-gf2p8mulb-2.c: Ditto.
* gcc.target/i386/gfni-1.c: Add tests for GF2P8MUL.
* gcc.target/i386/gfni-2.c: Ditto. 
* gcc.target/i386/gfni-3.c: Ditto. 
* gcc.target/i386/gfni-4.c: Ditto.

Ok for trunk?

Thanks,
Julia



0004-GF2P8MULB-instruction.patch

Re: [patch][arm] gcc-7-branch: Fix bootstrap on FreeBSD

2017-10-17 Thread Richard Earnshaw (lists)

On 17/10/17 14:00, Kyrill Tkachov wrote:
> 
> On 17/10/17 13:42, Andreas Tobler wrote:
>> Hi Kyrill,
>>
>> On 17.10.17 12:02, Kyrill Tkachov wrote:
>>
>> > On 16/10/17 20:00, Andreas Tobler wrote:
>> >> Hi all,
>> >>
>> >> I struggled over a bootstrap issue while building gcc-7 for
>> >> armv7-*-freebsd*
>> >>
>> >> I got a 'permission denied' while creating the arm-tables.opt file.
>> >>
>> >> The source tree is located on a nfs server.
>> >>
>> >> The below patch fixed it for me.
>> >>
>> >> Ok to apply?
>> >>
>> >> TIA,
>> >> Andreas
>> >>
>> >> 2017-10-16  Andreas Tobler 
>> >>
>> >>  * config/arm/t-arm (MD_INCLUDES): Create arm-tables.opt via
>> >>  intermediate arm-tables.new like the other awk generated
>> files.
>> >>
>> >> Index: config/arm/t-arm
>> >> ===
>> >> --- config/arm/t-arm    (revision 253792)
>> >> +++ config/arm/t-arm    (working copy)
>> >> @@ -75,8 +75,8 @@
>> >>    $(srcdir)/config/arm/arm-tables.opt:
>> $(srcdir)/config/arm/parsecpu.awk \
>> >>  $(srcdir)/config/arm/arm-cpus.in
>> >>   $(AWK) -f $(srcdir)/config/arm/parsecpu.awk -v cmd=opt \
>> >> -   $(srcdir)/config/arm/arm-cpus.in > \
>> >> -   $(srcdir)/config/arm/arm-tables.opt
>> >> +   $(srcdir)/config/arm/arm-cpus.in > arm-tables.new
>> >> +   mv arm-tables.new $(srcdir)/config/arm/arm-tables.opt
>> >>
>> >
>> > This looks ok to me as it makes the rule consistent with the other
>> > awk-generated files.
>> >
>> > Out of interest, this looks like a small subset of Richard's patch [1]
>> > at r249971.
>>
>> Hehe, now as you say, yes. But I wasn't aware about it. I just tried to
>> fix my bootstrap issue and compared the snippet with main. And tried if
>> it helps to use an intermediate file.
>>
>> > Have you tried that patch on the branch?
>>
>> No, is this patch going to appear on the gcc-7 branch?
>> If it is, then I'll not apply my patchlet above.
>>
> 
> AFAIK that patch was part of a series to further improve the
> architecture features selection mechanism for GCC 8
> and wasn't thus considered for the GCC 7 branch, but it looks like a
> fairly standalone improvement, so unless Richard
> has any objections to it, I think it would be good to take that patch
> for the branch (assuming it passes validation there).
> 
> Kyrill
> 
>> > [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00223.html
>>
>> Thanks,
>> Andreas
>>
> 

I've no objections to back-porting a suitable subset of the patch.

I'd point out, however, that this file is committed as part of the
source base, so unless you're changing arm-cpus.in (or parsecpu.awk)
there really shouldn't be a need to regenerate that file.  Using
"contrib/gcc_update --touch" should ensure that the date-stamps are all
correct.

R.

[PATCH][GRAPHITE] Fix PR82563

2017-10-17 Thread Richard Biener


PR82563 shows the ugly side of an earlier fix: we now split the
entry edge of SCOPs during the analysis phase to get a GBB for the entry
edge PHI copies.  That invalidates loop-closed SSA in some cases, like
the one in the PR.  So the following patch gets rid of that "fake" GBB by
explicitly emitting SESE entry edge copies (sources are all parameters).

This seems to remove quite a few "spurious" optimized loop nests from SPEC.
I do see spurious schedule differences detected while the AST generator
still generates the same code, like for gcc.dg/graphite/interchange-1.c.
One of the cases "fixed" with this patch is

[scheduler] original schedule:
domain: "[P_2806, P_364] -> { S_294[] : -2147483648 <= P_2806 <= 
2147483647 and -9223372036854775808 <= P_364 <= 9223372036854775807; 
S_651[] : -2147483648 <= P_2806 <= 2147483647 and -9223372036854775808 <= 
P_364 <= 9223372036854775807; S_210[i33] : -2147483648 <= P_2806 <= 
2147483647 and -9223372036854775808 <= P_364 <= 9223372036854775807 and 0 
<= i33 <= 2147483645 and 4294967296*floor((-1 + P_2806)/4294967296) < 
P_2806 - i33; S_211[i34] : -2147483648 <= P_2806 <= 2147483647 and 
-9223372036854775808 <= P_364 <= 9223372036854775807 and 0 <= i34 <= 
2147483645 and 4294967296*floor((-1 + P_2806)/4294967296) < P_2806 - i34; 
S_687[] : -2147483648 <= P_2806 <= 2147483647 and -9223372036854775808 <= 
P_364 <= 9223372036854775807 }"
child:
  sequence:
  - filter: "[P_2806, P_364] -> { S_687[] }"
  - filter: "[P_2806, P_364] -> { S_210[i33] }"
child:
  schedule: "[P_2806, P_364] -> L_33[{ S_210[i33] -> [(i33)] }]"
  - filter: "[P_2806, P_364] -> { S_294[] }"
  - filter: "[P_2806, P_364] -> { S_651[] }"
  - filter: "[P_2806, P_364] -> { S_211[i34] }"
child:
  schedule: "[P_2806, P_364] -> L_34[{ S_211[i34] -> [(i34)] }]"

[scheduler] isl transformed schedule:
domain: "[P_2806, P_364] -> { S_294[] : -2147483648 <= P_2806 <= 
2147483647 and -9223372036854775808 <= P_364 <= 9223372036854775807; 
S_651[] : -2147483648 <= P_2806 <= 2147483647 and -9223372036854775808 <= 
P_364 <= 9223372036854775807; S_210[i33] : -2147483648 <= P_2806 <= 
2147483647 and -9223372036854775808 <= P_364 <= 9223372036854775807 and 0 
<= i33 <= 2147483645 and 4294967296*floor((-1 + P_2806)/4294967296) < 
P_2806 - i33; S_211[i34] : -2147483648 <= P_2806 <= 2147483647 and 
-9223372036854775808 <= P_364 <= 9223372036854775807 and 0 <= i34 <= 
2147483645 and 4294967296*floor((-1 + P_2806)/4294967296) < P_2806 - i34; 
S_687[] : -2147483648 <= P_2806 <= 2147483647 and -9223372036854775808 <= 
P_364 <= 9223372036854775807 }"
child:
  sequence:
  - filter: "[P_2806, P_364] -> { S_210[i33]; S_687[] }"
child:
  schedule: "[P_2806, P_364] -> [{ S_210[i33] -> [(i33)]; S_687[] -> 
[(0)] }]"
  permutable: 1
  child:
sequence:
- filter: "[P_2806, P_364] -> { S_687[] }"
- filter: "[P_2806, P_364] -> { S_210[i33] }"
  - filter: "[P_2806, P_364] -> { S_294[] }"
  - filter: "[P_2806, P_364] -> { S_651[] }"
  - filter: "[P_2806, P_364] -> { S_211[i34] }"
child:
  schedule: "[P_2806, P_364] -> [{ S_211[i34] -> [(i34)] }]"
  permutable: 1
  coincident: [ 1 ]

[scheduler] original ast:
{
  S_687();
  for (int c0 = 0; c0 < P_2806; c0 += 1)
S_210(c0);
  S_294();
  S_651();
  for (int c0 = 0; c0 < P_2806; c0 += 1)
S_211(c0);
}

[scheduler] AST generated by isl:
{
  S_687();
  for (int c0 = 0; c0 < P_2806; c0 += 1)
S_210(c0);
  S_294();
  S_651();
  for (int c0 = 0; c0 < P_2806; c0 += 1)
S_211(c0);
}

where with the patch S_687 () is no longer there (and S_210 depending
on it via RAW).

Overall the number of "optimized" loop nests in SPEC CPU 2006 drops
from 348 to 279.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-10-17  Richard Biener  

PR tree-optimization/82563
* graphite-isl-ast-to-gimple.c (generate_entry_out_of_ssa_copies):
New function.
(graphite_regenerate_ast_isl): Call it.
* graphite-scop-detection.c (build_scops): Remove entry edge split.

* gcc.dg/graphite/pr82563.c: New testcase.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253807)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -1501,6 +1501,35 @@ copy_internal_parameters (sese_info_p re
 }
 }
 
+/* Generate out-of-SSA copies for the entry edge FALSE_ENTRY/TRUE_ENTRY
+   in REGION.  */
+
+static void
+generate_entry_out_of_ssa_copies (edge false_entry,
+ edge true_entry,
+ sese_info_p region)
+{
+  gimple_stmt_iterator gsi_tgt = gsi_start_bb (true_entry->dest);
+  for (gphi_iterator psi = gsi_start_phis (false_entry->dest);
+   !gsi_end_p (psi); gsi_next (&psi))
+{
+  gphi *phi = psi.phi ();
+  tree res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+

Re: [patch][i386, AVX] GFNI enabling [3/4]

2017-10-17 Thread Jakub Jelinek

On Tue, Oct 17, 2017 at 01:09:50PM +, Koval, Julia wrote:
> Hi, this is the third patch of GFNI ISASET enabling. It enables the
> GF2P8AFFINE instruction, described here:
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> gcc/
>   * config/i386/gfniintrin.h (_mm_gf2p8affine_epi64_epi8, 
> _mm256_gf2p8affine_epi64_epi8,

Too long line, even ChangeLog entries should be wrapped to 80 columns.

>   (_mm_mask_gf2p8affine_epi64_epi8, _mm_maskz_gf2p8affine_epi64_epi8,
>   _mm256_mask_gf2p8affine_epi64_epi8, _mm256_maskz_gf2p8affine_epi64_epi8,
>   _mm512_mask_gf2p8affine_epi64_epi8, _mm512_maskz_gf2p8affine_epi64_epi8,

The above two are also too long (off by 1 char).

>   _mm512_gf2p8affine_epi64_epi8): New intrinsics.
>   * config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineqb_v64qi,
>   __builtin_ia32_vgf2p8affineqb_v32qi, 
> __builtin_ia32_vgf2p8affineqb_v16qi): New builtins.

And this one too.  Please wrap them.

>   * config/i386/sse.md (vgf2p8affineqb_*): New pattern.

Use vgf2p8affineqb_ instead of the wild-card?

I'll defer actual review to Kirill.

Jakub

[patch][i386, AVX] GFNI enabling [3/4]

2017-10-17 Thread Koval, Julia

Hi, this is the third patch of GFNI ISASET enabling. It enables GF2P8AFFINE 
instruction, described here: 
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

gcc/
* config/i386/gfniintrin.h (_mm_gf2p8affine_epi64_epi8, 
_mm256_gf2p8affine_epi64_epi8,
(_mm_mask_gf2p8affine_epi64_epi8, _mm_maskz_gf2p8affine_epi64_epi8,
_mm256_mask_gf2p8affine_epi64_epi8, _mm256_maskz_gf2p8affine_epi64_epi8,
_mm512_mask_gf2p8affine_epi64_epi8, _mm512_maskz_gf2p8affine_epi64_epi8,
_mm512_gf2p8affine_epi64_epi8): New intrinsics.
* config/i386/i386-builtin.def (__builtin_ia32_vgf2p8affineqb_v64qi,
__builtin_ia32_vgf2p8affineqb_v32qi, 
__builtin_ia32_vgf2p8affineqb_v16qi): New builtins.
* config/i386/sse.md (vgf2p8affineqb_*): New pattern.

gcc/testsuite/
* gcc.target/i386/avx-1.c: Handle new intrinsics.
* gcc.target/i386/avx512f-gf2p8affineqb-2.c: New runtime tests.
* gcc.target/i386/avx512vl-gf2p8affineqb-2.c: Ditto.
* gcc.target/i386/gfni-1.c: Add tests for GF2P8AFFINE.
* gcc.target/i386/gfni-2.c: Ditto. 
* gcc.target/i386/gfni-3.c: Ditto. 
* gcc.target/i386/gfni-4.c: Ditto.
* gcc.target/i386/sse-13.c: Handle new tests.
* gcc.target/i386/sse-23.c: Handle new tests.

Ok for trunk?

Thanks,
Julia


0003-GF2P8AFFINE-instruction.patch

Re: [patch][arm] gcc-7-branch: Fix bootstrap on FreeBSD

2017-10-17 Thread Kyrill Tkachov

On 17/10/17 13:42, Andreas Tobler wrote:

Hi Kyrill,

On 17.10.17 12:02, Kyrill Tkachov wrote:

> On 16/10/17 20:00, Andreas Tobler wrote:
>> Hi all,
>>
>> I struggled over a bootstrap issue while building gcc-7 for
>> armv7-*-freebsd*
>>
>> I got a 'permission denied' while creating the arm-tables.opt file.
>>
>> The source tree is located on a nfs server.
>>
>> The below patch fixed it for me.
>>
>> Ok to apply?
>>
>> TIA,
>> Andreas
>>
>> 2017-10-16  Andreas Tobler 
>>
>>  * config/arm/t-arm (MD_INCLUDES): Create arm-tables.opt via
>>  intermediate arm-tables.new like the other awk generated files.
>>
>> Index: config/arm/t-arm
>> ===
>> --- config/arm/t-arm(revision 253792)
>> +++ config/arm/t-arm(working copy)
>> @@ -75,8 +75,8 @@
>>   $(srcdir)/config/arm/arm-tables.opt: $(srcdir)/config/arm/parsecpu.awk \
>>  $(srcdir)/config/arm/arm-cpus.in
>>   $(AWK) -f $(srcdir)/config/arm/parsecpu.awk -v cmd=opt \
>> -   $(srcdir)/config/arm/arm-cpus.in > \
>> -   $(srcdir)/config/arm/arm-tables.opt
>> +   $(srcdir)/config/arm/arm-cpus.in > arm-tables.new
>> +   mv arm-tables.new $(srcdir)/config/arm/arm-tables.opt
>>
>
> This looks ok to me as it makes the rule consistent with the other
> awk-generated files.
>
> Out of interest, this looks like a small subset of Richard's patch [1]
> at r249971.

Hehe, now as you say, yes. But I wasn't aware about it. I just tried to
fix my bootstrap issue and compared the snippet with main. And tried if
it helps to use an intermediate file.

> Have you tried that patch on the branch?

No, is this patch going to appear on the gcc-7 branch?
If it is, then I'll not apply my patchlet above.

AFAIK that patch was part of a series to further improve the
architecture features selection mechanism for GCC 8 and wasn't thus
considered for the GCC 7 branch, but it looks like a fairly standalone
improvement, so unless Richard has any objections to it, I think it
would be good to take that patch for the branch (assuming it passes
validation there).

Kyrill

> [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00223.html

Thanks,
Andreas

Re: [PATCH GCC]Introduce qsort_range interface for GCC vector

2017-10-17 Thread Richard Biener

On Mon, Oct 16, 2017 at 4:53 PM, Bin Cheng  wrote:
> Hi,
> I was asked by Richi to replace insertion sort with qsort_range in loop
> nest distribution patch.  Although I believe a stable sort (thus insertion
> sort) is needed in that case, I also added a qsort_range interface in vec.h.
> The new interface might be useful in other places.
> Bootstrap and test on x86_64 and AArch64 with other patches.  Is it OK?

I think you want

  gcc_checking_assert (e >= s && e < length ());
  if (e - s + 1 > 1)
    ::qsort (...);

so elide qsort for 1 element and do the validity verification with a
checking assert.

Ok with that change.
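
As a rough illustration of that shape (a minimal sketch only: it uses
std::vector and plain assert as stand-ins for GCC's vec and
gcc_checking_assert, and the free function qsort_range and the
comparator cmp_int are hypothetical names, not the code from the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Sort the inclusive index range [s, e] of V with the C library qsort.
// Hypothetical free-function stand-in for a vec<T>::qsort_range member;
// T must be trivially copyable because qsort moves raw bytes.
template <typename T>
void qsort_range (std::vector<T> &v, std::size_t s, std::size_t e,
                  int (*cmp) (const void *, const void *))
{
  // Stand-in for gcc_checking_assert: validate the range.
  assert (e >= s && e < v.size ());
  // A single element is already sorted, so skip the qsort call.
  if (e - s + 1 > 1)
    std::qsort (v.data () + s, e - s + 1, sizeof (T), cmp);
}

// Comparison callback for int elements (assumed element type).
int cmp_int (const void *a, const void *b)
{
  int x = *static_cast<const int *> (a);
  int y = *static_cast<const int *> (b);
  return (x > y) - (x < y);
}
```

Sorting indices 1 through 4 of {9, 4, 3, 2, 1, 0} this way leaves the
first and last elements in place.  Note that qsort is not a stable
sort, which is the concern raised in the quoted message.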

Richard.

> Thanks,
> bin
> 2017-10-13  Bin Cheng  
>
> * vec.h (struct GTY((user)) vec::qsort_range): New
> member function.
> (struct vec): New member function.

[patch][x86] GFNI enabling [2/4]

2017-10-17 Thread Koval, Julia

Hi, this is the second patch of enabling GFNI ISASET. It adds GF2P8AFFINEINV 
instruction.
The instruction is described here:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

gcc/
* config.gcc: Add gfniintrin.h.
* config/i386/gfniintrin.h: New.
* config/i386/i386-builtin-types.def 
(__builtin_ia32_vgf2p8affineinvqb_v64qi,
__builtin_ia32_vgf2p8affineinvqb_v64qi_mask, 
__builtin_ia32_vgf2p8affineinvqb_v32qi,
__builtin_ia32_vgf2p8affineinvqb_v32qi_mask, 
__builtin_ia32_vgf2p8affineinvqb_v16qi,
__builtin_ia32_vgf2p8affineinvqb_v16qi_mask): New builtins.
* config/i386/i386-builtin.def (V64QI_FTYPE_V64QI_V64QI_INT_V64QI_UDI,
V32QI_FTYPE_V32QI_V32QI_INT_V32QI_USI, 
V16QI_FTYPE_V16QI_V16QI_INT_V16QI_UHI,
V64QI_FTYPE_V64QI_V64QI_INT): New types.
* config/i386/i386.c (ix86_expand_args_builtin): Handle new types.
* config/i386/immintrin.h: Include gfniintrin.h.
* config/i386/sse.md (vgf2p8affineinvqb_*): New pattern.

gcc/testsuite/
* gcc.target/i386/avx-1.c: Handle new intrinsics.
* gcc.target/i386/avx512-check.h: Check GFNI bit.
* gcc.target/i386/avx512f-gf2p8affineinvqb-2.c: Runtime test.
* gcc.target/i386/avx512vl-gf2p8affineinvqb-2.c: Runtime test.
* gcc.target/i386/gfni-1.c: New.
* gcc.target/i386/gfni-2.c: New.
* gcc.target/i386/gfni-3.c: New.
* gcc.target/i386/gfni-4.c: New.
* gcc.target/i386/i386.exp (check_effective_target_gfni): New.
* gcc.target/i386/sse-13.c: Handle new intrinsics.
* gcc.target/i386/sse-23.c: Handle new intrinsics.

Ok for trunk?

Thanks,
Julia


0002-GF2P8AFFINEINVQB-instruction.patch

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Tom de Vries


On 10/17/2017 01:19 PM, Thomas Schwinge wrote:

> Hi!
>
> On Mon, 16 Oct 2017 10:49:45 +0200, Tom de Vries  wrote:
>> this patch enables some openacc test-cases for non-nvidia devices.
>>
>> Committed.
>
> Thanks!
>
>> --- a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
>> @@ -1,4 +1,4 @@
>> -! { dg-do run  { target openacc_nvidia_accel_selected } }
>> +! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
>> [...]
>
> To restore the torture testing that we like to do for Fortran test cases,
> I committed the following in trunk r253808, as obvious:


It's not obvious to me.

Given that lib/libgomp.exp contains:
...
set dg-do-what-default run
...
I'd expect adding '! dg-do run' to have no effect.

Thanks,
- Tom

Re: [patch][arm] gcc-7-branch: Fix bootstrap on FreeBSD

2017-10-17 Thread Andreas Tobler


Hi Kyrill,

On 17.10.17 12:02, Kyrill Tkachov wrote:


> On 16/10/17 20:00, Andreas Tobler wrote:
>> Hi all,
>>
>> I struggled over a bootstrap issue while building gcc-7 for
>> armv7-*-freebsd*
>>
>> I got a 'permission denied' while creating the arm-tables.opt file.
>>
>> The source tree is located on a nfs server.
>>
>> The below patch fixed it for me.
>>
>> Ok to apply?
>>
>> TIA,
>> Andreas
>>
>> 2017-10-16  Andreas Tobler  
>>
>>  * config/arm/t-arm (MD_INCLUDES): Create arm-tables.opt via
>>  intermediate arm-tables.new like the other awk generated files.
>>
>> Index: config/arm/t-arm
>> ===
>> --- config/arm/t-arm(revision 253792)
>> +++ config/arm/t-arm(working copy)
>> @@ -75,8 +75,8 @@
>>    $(srcdir)/config/arm/arm-tables.opt: $(srcdir)/config/arm/parsecpu.awk \
>>  $(srcdir)/config/arm/arm-cpus.in
>>   $(AWK) -f $(srcdir)/config/arm/parsecpu.awk -v cmd=opt \
>> -   $(srcdir)/config/arm/arm-cpus.in > \
>> -   $(srcdir)/config/arm/arm-tables.opt
>> +   $(srcdir)/config/arm/arm-cpus.in > arm-tables.new
>> +   mv arm-tables.new $(srcdir)/config/arm/arm-tables.opt
>
> This looks ok to me as it makes the rule consistent with the other
> awk-generated files.
>
> Out of interest, this looks like a small subset of Richard's patch [1]
> at r249971.

Hehe, now as you say, yes. But I wasn't aware about it. I just tried to
fix my bootstrap issue and compared the snippet with main. And tried if
it helps to use an intermediate file.

> Have you tried that patch on the branch?

No, is this patch going to appear on the gcc-7 branch?
If it is, then I'll not apply my patchlet above.

> [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00223.html


Thanks,
Andreas

Re: [patch] avoid printing leading 0 in widest_int hex dumps

2017-10-17 Thread Richard Sandiford

Aldy Hernandez  writes:
> Hi folks!
>
> Calling print_hex() on a widest_int with the most significant bit turned 
> on can lead to a leading zero being printed (0x0). This produces 
> confusing dumps to say the least, especially when you incorrectly assume 
> an integer is NOT signed :).

That's the intended behaviour though.  wide_int-based types only use as
many HWIs as they need to store their current value, with any other bits
in the value being a sign extension of the top bit.  So if the most
significant HWI in a widest_int is zero, that HWI is there to say that
the previous HWI should be zero- rather than sign-extended.

So:

   0x0ffffffff  -> (1 << 32) - 1 to infinite precision
                   (i.e. a positive value)
   0xffffffff   -> -1
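
A toy model of that encoding may help (a sketch under assumptions, not
wide_int's actual code: blocks are 32 bits wide here to match the
example, whereas a HWI is normally 64 bits, and blocks_t, canonicalize,
is_negative and to_hex are names made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Blocks are stored least-significant first.  All bits above the top
// block are an implicit sign extension of that block's top bit.
using blocks_t = std::vector<std::uint32_t>;

// Drop top blocks that add nothing beyond sign-extending the block below.
blocks_t canonicalize (blocks_t v)
{
  while (v.size () > 1)
    {
      std::uint32_t top = v.back ();
      bool below_negative = (v[v.size () - 2] >> 31) != 0;
      if ((top == 0 && !below_negative)
          || (top == 0xffffffffu && below_negative))
        v.pop_back ();
      else
        break;
    }
  return v;
}

// The sign of the whole value is the top bit of the top block.
bool is_negative (const blocks_t &v)
{
  assert (!v.empty ());
  return (v.back () >> 31) != 0;
}

// Print the stored blocks, most significant first, without padding the
// top block.
std::string to_hex (const blocks_t &v)
{
  std::string s = "0x";
  char buf[16];
  for (std::size_t i = v.size (); i-- > 0;)
    {
      std::snprintf (buf, sizeof buf, i + 1 == v.size () ? "%lx" : "%08lx",
                     static_cast<unsigned long> (v[i]));
      s += buf;
    }
  return s;
}
```

In this model (1 << 32) - 1 needs a second, zero block to stay
positive, so it prints with a leading 0, while -1 fits in a single
all-ones block.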

Thanks,
Richard

Re: Check that there are no missing probabilities

2017-10-17 Thread Jan Hubicka

> 
> graphite does
> 
>   if (changed)
>     {
>       cleanup_tree_cfg ();
>       profile_status_for_fn (cfun) = PROFILE_ABSENT;
>       release_recorded_exits (cfun);
>       tree_estimate_probability (false);
> 
> so it runs into CFG cleanup running before it properly resets counts.
> 
> I wonder if we shouldn't simply get rid of the explicit checking calls in
> cfg cleanup...  or if the profile checking should happen somewhere
> else.
> 
> I'd also appreciate a better way for doing the above.  Shouldn't we
> end up with a proper initialization on all edges as we just split
> existing ones and use create_empty_if_region_on_edge and
> create_empty_loop_on_edge?
> 
> Ah, those use make_edge as well.
> 
> The tree_estimate_probability call above should ideally be
> replaced with something like "propagate-SESE-entry-probability".

Well, re-running tree_estimate_probability is a hack and it won't
really get you a very sane profile update.  I guess
create_empty_if_region_on_edge and create_empty_loop_on_edge should care
about the profile; I will try to take a look.

We have frequency propagation across SEME regions as part of
find_sub_basic_blocks.  It does not handle loops sanely though (which
could be added), but still someone needs to care about correct
probabilities at least.

Honza
> 
> Richard.
> 
> > Jakub

Re: [PATCH] Improve alloca alignment

2017-10-17 Thread Wilco Dijkstra

Wilco Dijkstra wrote:
>
> Yes STACK_BOUNDARY applies to virtual_stack_dynamic_rtx and all other
> virtual frame registers. It appears its main purpose is to enable alignment
> optimizations since PREFERRED_STACK_BOUNDARY is used to align
> local and outgoing argument area etc. So if you don't want the alignment
> optimizations it is feasible to set STACK_BOUNDARY to a lower value
> without changing the stack layout.
>
> There is also STACK_DYNAMIC_OFFSET which computes the total offset
> from the stack. It's not obvious whether the default version should align
> (since outgoing arguments are already aligned there is no easy way to record
> the extra padding), but we could assert if the offset isn't aligned.

Also there is something odd in the sparc backend:

/* Given the stack bias, the stack pointer isn't actually aligned.  */
#define INIT_EXPANDERS                                                   \
  do {                                                                   \
    if (crtl->emit.regno_pointer_align && SPARC_STACK_BIAS)              \
      {                                                                  \
        REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = BITS_PER_UNIT;      \
        REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = BITS_PER_UNIT; \
      }                                                                  \
  } while (0)

That lowers the alignment for the stack and frame pointer. So assuming that
works and blocks alignment optimizations, why isn't this done for the dynamic
offset as well?

Wilco

Re: [patch] Enhance support for -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than

2017-10-17 Thread Richard Biener

On Mon, Oct 16, 2017 at 10:35 AM, Eric Botcazou  wrote:
> Hi,
>
> a big limitation of -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than is
> that you need -O2 (or more precisely -ftree-vrp) in order to be able to say
> something sensible for dynamically-sized objects/VLAs/calls to alloca.  That
> can be problematic, for example if the coding guidelines prevent you from
> using anything beyond -O1 for production builds.
>
> Now in Ada it is very easy and common to use integer types with custom bounds
> (the compiler automatically generates the appropriate run-time bound checks)
> so it is very easy to be able to say something sensible about dynamically-
> sized objects and VLAs (Ada doesn't have alloca) even at -O0 or -O1.
>
> That's why the attached patch introduces a way for front-ends to communicate
> an upper bound for the size of dynamically-sized objects/VLAs/calls to alloca
> to the -Wstack-usage/-Wvla-larger-than/-Walloca-larger-than machineries, based
> on a 3rd form of the BUILT_IN_ALLOCA builtin which takes a 3rd parameter in
> addition to the 2 parameters of BUILT_IN_ALLOCA_WITH_ALIGN.  This 3rd form is
> only used when the front-end can put an upper bound via max_int_size_in_bytes,
> which invokes lang_hooks.types.max_size, for the time being, but its usage
> could easily be extended.
>
> Macros and helper function are provided to manipulate the variants as a single
> builtin, so that code not directly tied to their support is little modified.
> The -Wstack-usage and -Wvla-larger-than/-Walloca-larger-than machineries are
> enhanced to take into account the upper bound, if it exists.
>
> Bootstrapped/regtested on x86_64-suse-linux, OK for the mainline?

Looks ok.  I wonder if you want to explicitly document that max_size < size
doesn't have any effect on actual code generation and is not checked for.
Also it seems that __builtin_alloca_with_align (20, 8, 16) will still account 20
as the size and not 16 compared to 20 arriving in a variable which is when 16
will be used.  So at least for accounting always use MIN (size, max_size)?
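
The accounting rule suggested above could look roughly like this (a
sketch under assumptions: accounted_size is a hypothetical helper, not
a function in the patch, and a negative value stands for "not known",
mirroring the -1 that the ChangeLog says is passed when there is no
third argument):

```cpp
#include <algorithm>
#include <cstdint>

// Size to charge to the stack-usage accounting for one dynamic
// allocation.  SIZE is the requested size when it is a compile-time
// constant, or negative when it arrives in a variable.  MAX_SIZE is the
// front-end's upper bound, or negative when none was supplied.
// Returns a negative value when nothing is known about the allocation.
std::int64_t accounted_size (std::int64_t size, std::int64_t max_size)
{
  if (size < 0)
    return max_size;            // Variable size: only the bound can help.
  if (max_size < 0)
    return size;                // Constant size, no bound supplied.
  return std::min (size, max_size);  // Both known: charge the smaller.
}
```

With this rule a constant request of 20 with a bound of 16 is charged
as 16, the same as a variable request with that bound.  Only the
accounting is affected; the allocation itself still uses the requested
size.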

Richard.

>
> 2017-10-16  Eric Botcazou  
>
> * asan.c (handle_builtin_alloca): Deal with all alloca variants.
> (get_mem_refs_of_builtin_call): Likewise.
> * builtins.c (expand_builtin_apply): Adjust call to
> allocate_dynamic_stack_space.
> (expand_builtin_alloca): For __builtin_alloca_with_align_and_max, pass
> the third argument to allocate_dynamic_stack_space, otherwise -1.
> (expand_builtin): Deal with all alloca variants.
> (is_inexpensive_builtin): Likewise.
> * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX): New.
> * calls.c (special_function_p): Deal with all alloca variants.
> (initialize_argument_information): Adjust call to
> allocate_dynamic_stack_space.
> (expand_call): Likewise.
> * cfgexpand.c (expand_stack_vars): Likewise.
> (expand_call_stmt): Deal with all alloca variants.
> * doc/extend.texi (Built-ins): Add __builtin_alloca_with_align_and_max
> * explow.c (allocate_dynamic_stack_space): Add MAX_SIZE parameter and
> use it for the stack usage computation.
> * explow.h (allocate_dynamic_stack_space): Adjust prototype.
> * function.c (gimplify_parameters): Turn BUILT_IN_ALLOCA_WITH_ALIGN
> into BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX and pass maximum size.
> * gimple-ssa-warn-alloca.c (alloca_call_type): Simplify control flow.
> Take into account 3rd argument of __builtin_alloca_with_align_and_max.
> (in_loop_p): Remove first argument and useless check.
> (pass_walloca::execute): Remove useless test and adjust call to above.
> * gimple.c (gimple_build_call_from_tree): Deal with all alloc variants
> * gimplify.c (gimplify_vla_decl): Turn BUILT_IN_ALLOCA_WITH_ALIGN into
> BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX and pass maximum size.
> (gimplify_call_expr): Deal with all alloca variants.
> * hsa-gen.c (gen_hsa_alloca): Likewise.
> (gen_hsa_insns_for_call): Likewise.
> * ipa-pure-const.c (special_builtin_state): Likewise.
> * tree-chkp.c (chkp_build_returned_bound): Likewise.
> * tree-object-size.c (alloc_object_size): Likewise.
> * tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Likewise.
> (call_may_clobber_ref_p_1): Likewise.
> * tree-ssa-ccp.c (evaluate_stmt): Likewise.
> (ccp_fold_stmt): Likewise.
> (optimize_stack_restore): Likewise.
> * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Likewise.
> (mark_all_reaching_defs_necessary_1): Likewise.
> (propagate_necessity): Likewise.
> (eliminate_unnecessary_stmts): Likewise.
> * tree.c (build_common_builtin_nodes): Build
> BUILT_IN_ALLOCA_WITH_ALIGN_AND_MAX.
> * tree.h (ALLOCA_FUNCTION_CODE_P):

Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-17 Thread Richard Biener

On Sat, Oct 14, 2017 at 10:39 AM, Eric Botcazou  wrote:
>> This looks good.  OK for the trunk.
>
> FWIW I disagree.  The patch completely shuns the existing implementation of
> the pass, which is based on a forward scan within basic blocks to identify the
> various interesting instructions and record them, and uses full-blown def-use
> and use-def chains instead, which are much more costly to compute.  It's not
> clear to me why the existing implementation couldn't have been extended.
>
> The result is that, for targets for which the pass was initially written, i.e.
> targets for which most (all) arithmetic instructions clobber the flags, the
> pass will be slower for absolutely no benefits, as the existing implementation
> would already have caught all the interesting cases.
>
> So it's again a case of a generic change made for a specific target without
> consideration for other, admittedly less mainstream, targets...

I agree with Eric here.

Richard.

> --
> Eric Botcazou

Re: Check that there are no missing probabilities

2017-10-17 Thread Richard Biener

On Fri, Oct 13, 2017 at 9:27 PM, Jakub Jelinek  wrote:
> On Fri, Oct 13, 2017 at 09:06:55PM +0200, Jan Hubicka wrote:
>> For EH we should set it to profile_probability::zero () because we know it
>> is an unlikely path.   I will take a look.
>
> With the
>
> --- gcc/cfghooks.c.jj   2017-10-13 18:27:12.0 +0200
> +++ gcc/cfghooks.c  2017-10-13 19:15:11.444650533 +0200
> @@ -162,6 +162,7 @@ verify_flow_info (void)
>   err = 1;
> }
>   if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
> + && (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_FAKE)) == 0
>   && !e->probability.initialized_p ())
> {
>   error ("Uninitialized probability of edge %i->%i", 
> e->src->index,
>
> hack x86_64-linux and i686-linux bootstrapped fine, but I see still many
> graphite related regressions:
>
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 41->17
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 44->41
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 36->21
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 29->36
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 32->29
> during GIMPLE pass: graphite
> dump file: id-16.c.150t.graphite
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: internal 
> compiler error: verify_flow_info failed
> 0xafac1a verify_flow_info()
> ../../gcc/cfghooks.c:268
> 0xf2a624 checking_verify_flow_info
> ../../gcc/cfghooks.h:198
> 0xf2a624 cleanup_tree_cfg_noloop
> ../../gcc/tree-cfgcleanup.c:901
> 0xf2a624 cleanup_tree_cfg()
> ../../gcc/tree-cfgcleanup.c:952
> 0x162df85 graphite_transform_loops()
> ../../gcc/graphite.c:422
> 0x162f0c0 graphite_transforms
> ../../gcc/graphite.c:447
> 0x162f0c0 execute
> ../../gcc/graphite.c:524
>
> So probably graphite needs to be tweaked for this too.

graphite does

  if (changed)
    {
      cleanup_tree_cfg ();
      profile_status_for_fn (cfun) = PROFILE_ABSENT;
      release_recorded_exits (cfun);
      tree_estimate_probability (false);

so it runs into CFG cleanup running before it properly resets counts.

I wonder if we shouldn't simply get rid of the explicit checking calls in
cfg cleanup...  or if the profile checking should happen somewhere
else.

I'd also appreciate a better way for doing the above.  Shouldn't we
end up with a proper initialization on all edges as we just split
existing ones and use create_empty_if_region_on_edge and
create_empty_loop_on_edge?

Ah, those use make_edge as well.

The tree_estimate_probability call above should ideally be
replaced with something like "propagate-SESE-entry-probability".

Richard.

> Jakub

Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-17 Thread Richard Biener

On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor  wrote:
> Hi,
>
> I'd like to request comments to the patch below which aims to fix PR
> 80689, which is an instance of a store-to-load forwarding stall on
> x86_64 CPUs in the Image Magick benchmark, which is responsible for a
> slow down of up to 9% compared to gcc 6, depending on options and HW
> used.  (Actually, I have just seen 24% in one specific combination but
> for various reasons can no longer verify it today.)
>
> The revision causing the regression is 237074, which increased the
> size of the mode for copying aggregates "by pieces" to 128 bits,
> incurring big stalls when the values being copied are also still being
> stored in a smaller data type or if the copied values are loaded in
> smaller types shortly afterwards.  Such situations happen in Image
> Magick even across calls, which means that any non-IPA flow-sensitive
> approach would not detect them.  Therefore, the patch simply changes
> the way we copy small BLKmode data that are simple combinations of
> records and arrays (meaning no unions, bit-fields but also character
> arrays are disallowed) and simply copies them one field and/or element
> at a time.
>
> "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> (the structure in the benchmark has 32 bytes) but is subject to change
> after more benchmarking and is actually zero - meaning element copying
> never happens - on other architectures.  I believe that any
> architecture with a store buffer can benefit but it's probably better
> to leave it to their maintainers to find a different default value.  I
> am not sure this is how such HW-dependent decisions should be done, which
> is the primary reason why I am sending this RFC first.
>
> I have decided to implement this change at the expansion level because
> at that point the type information is still readily available and at
> the same time we can also handle various implicit copies, for example
> those passing a parameter.  I found I could re-use some bits and
> pieces of tree-SRA and so I did, creating tree-sra.h header file in
> the process.
>
> I am fully aware that in the final patch the new parameter, or indeed
> any new parameters, need to be documented.  I have skipped that
> intentionally now and will write the documentation if feedback here is
> generally good.
>
> I have bootstrapped and tested this patch on x86_64-linux, with
> different values of the parameter and only found problems with
> unreasonably high values leading to OOM.  I have done the same with a
> previous version of the patch which was equivalent to the limit being
> 64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
> into failures of tests which assumed that structure padding was copied
> in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
> for example gcc.dg/vmx/varargs-4.c).
>
> The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> 8% at -O2 and -Ofast compilation levels respectively on one particular
> new AMD CPU and by 6% and 3% on one particular old Intel machine.
>
> Thanks in advance for any comments,

I wonder if you can at the place you choose to hook this in elide any
copying of padding between fields.

I'd rather have hooked such "high level" optimization in expand_assignment
where you can be reasonably sure you're seeing an actual source-level construct.

35 bytes seems to be much - what is the code-size impact?

IIRC the reason this may be slow isn't loading in smaller types than stored
before by the copy - the store buffers can handle this reasonably well.  It's
solely when previous smaller stores are

  a1) not mergeable in the store buffer
  a2) not merged because earlier stores are already committed

and

  b) loaded afterwards as a type that would access multiple store buffers

a) would be sure to happen in case b) involves accessing padding.  Is the
Image Magick case one that involves padding in the structure?
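
To make the two copy strategies concrete, here is a small sketch (the
struct and both function names are made up for illustration and are
not taken from the patch or from Image Magick; the padding after tag
assumes a typical 64-bit ABI, and an optimizing compiler is free to
turn either function into the other form):

```cpp
#include <cstring>

// Hypothetical record: on typical 64-bit ABIs there are 4 bytes of
// padding between tag and value.
struct sample
{
  int tag;
  double value;
};

// Block copy: moves the whole object representation, padding included,
// possibly with wider accesses than the fields were stored with.
void copy_block (sample *dst, const sample *src)
{
  std::memcpy (dst, src, sizeof *dst);
}

// Element-wise copy: one move per field, each with the field's own
// width; padding bytes are never explicitly read or written.
void copy_elementwise (sample *dst, const sample *src)
{
  dst->tag = src->tag;
  dst->value = src->value;
}
```

Both functions leave the destination fields equal to the source
fields; they differ only in the widths of the loads and stores that a
straightforward expansion would use.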

Richard.

> Martin
>
>
> 2017-10-12  Martin Jambor  
>
> PR target/80689
> * tree-sra.h: New file.
> * ipa-prop.h: Moved declaration of build_ref_for_offset to
> tree-sra.h.
> * expr.c: Include params.h and tree-sra.h.
> (emit_move_elementwise): New function.
> (store_expr_with_bounds): Optionally use it.
> * ipa-cp.c: Include tree-sra.h.
> * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> * config/i386/i386.c (ix86_option_override_internal): Set
> PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> * tree-sra.c: Include tree-sra.h.
> (scalarizable_type_p): Renamed to
> simple_mix_of_records_and_arrays_p, made public, renamed the
> second parameter to allow_char_arrays.
> (extract_min_max_idx_from_array): New function.
> (completely_scalarize): Moved bits of the function to
> extract_min_max_idx_from_array.
>
>

Re: [openacc, testsuite, committed] Enable libgomp.oacc-/declare-.{c,f90} for non-nvidia devices

2017-10-17 Thread Thomas Schwinge

Hi!

On Mon, 16 Oct 2017 10:49:45 +0200, Tom de Vries  wrote:
> this patch enables some openacc test-cases for non-nvidia devices.
> 
> Committed.

Thanks!

> --- a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
> @@ -1,4 +1,4 @@
> -! { dg-do run  { target openacc_nvidia_accel_selected } }
> +! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }

> [...]

To restore the torture testing that we like to do for Fortran test cases,
I committed the following in trunk r253808, as obvious:

commit 7bc57773107f6db13fd807dd2213298e330c9097
Author: tschwinge 
Date:   Tue Oct 17 11:17:00 2017 +

Restore "dg-do run" directives for libgomp.oacc-fortran/declare-*.f90

libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: Restore "dg-do
run" directive.
* testsuite/libgomp.oacc-fortran/declare-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-3.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-4.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253808 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog| 9 +
 libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 | 1 +
 libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90 | 2 ++
 libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90 | 1 +
 libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90 | 2 ++
 libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 | 2 ++
 6 files changed, 17 insertions(+)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index a5af03b..35a2374 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,12 @@
+2017-10-17  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-fortran/declare-1.f90: Restore "dg-do
+   run" directive.
+   * testsuite/libgomp.oacc-fortran/declare-2.f90: Likewise.
+   * testsuite/libgomp.oacc-fortran/declare-3.f90: Likewise.
+   * testsuite/libgomp.oacc-fortran/declare-4.f90: Likewise.
+   * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.
+
 2017-10-16  Tom de Vries  
 
* testsuite/libgomp.oacc-c-c++-common/declare-1.c: Don't require
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 
libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
index ca8831e..b502df4 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -1,3 +1,4 @@
+! { dg-do run }
 ! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
 
 ! Tests to exercise the declare directive along with
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90 
libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90
index aeea10a..0e759dd 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-2.f90
@@ -1,3 +1,5 @@
+! { dg-do run }
+
 module globalvars
   implicit none
   integer a
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90 
libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90
index 88b9aff..16164cd 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-3.f90
@@ -1,3 +1,4 @@
+! { dg-do run }
 ! { dg-skip-if "" { *-*-* } { "-DACC_MEM_SHARED=1" } }
 
 module globalvars
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90 
libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90
index 252c4ff..6c4e7c5 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-4.f90
@@ -1,3 +1,5 @@
+! { dg-do run }
+
 module vars
   implicit none
   real b
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90 
libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
index e91f26b..4f5c8f0 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-5.f90
@@ -1,3 +1,5 @@
+! { dg-do run }
+
 module vars
   implicit none
   real b


Grüße
 Thomas

Re: [patch][arm] gcc-7-branch: Fix bootstrap on FreeBSD

2017-10-17 Thread Kyrill Tkachov


Hi Andreas,

On 16/10/17 20:00, Andreas Tobler wrote:

Hi all,

I struggled over a bootstrap issue while building gcc-7 for 
armv7-*-freebsd*


I got a 'permission denied' while creating the arm-tables.opt file.

The source tree is located on a nfs server.

The below patch fixed it for me.

Ok to apply?

TIA,
Andreas

2017-10-16  Andreas Tobler  

* config/arm/t-arm (MD_INCLUDES): Create arm-tables.opt via
intermediate arm-tables.new like the other awk generated files.

Index: config/arm/t-arm
===
--- config/arm/t-arm(revision 253792)
+++ config/arm/t-arm(working copy)
@@ -75,8 +75,8 @@
  $(srcdir)/config/arm/arm-tables.opt: $(srcdir)/config/arm/parsecpu.awk \
$(srcdir)/config/arm/arm-cpus.in
 $(AWK) -f $(srcdir)/config/arm/parsecpu.awk -v cmd=opt \
-   $(srcdir)/config/arm/arm-cpus.in > \
-   $(srcdir)/config/arm/arm-tables.opt
+   $(srcdir)/config/arm/arm-cpus.in > arm-tables.new
+   mv arm-tables.new $(srcdir)/config/arm/arm-tables.opt



This looks ok to me as it makes the rule consistent with the other 
awk-generated files.


Out of interest, this looks like a small subset of Richard's patch [1] 
at r249971.

Have you tried that patch on the branch?

[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00223.html

Thanks,
Kyrill

  $(srcdir)/config/arm/arm-cpu.h: $(srcdir)/config/arm/parsecpu.awk \
$(srcdir)/config/arm/arm-cpus.in

Re: [PATCH][GRAPHITE] Consistently use region analysis

2017-10-17 Thread Richard Biener

On Sat, 14 Oct 2017, Sebastian Pop wrote:

> On Fri, Oct 13, 2017 at 8:02 AM, Richard Biener  wrote:
> 
> >
> > Now that SCEV instantiation handles regions properly (see hunk below
> > for a minor fix) we can use it consistently from GRAPHITE and thus
> > simplify scalar_evolution_in_region greatly.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > A lot of the parameter renaming stuff looks dead but a more "complete"
> > patch causes some more SPEC miscompares and also a bootstrap issue
> > (warning only, but an uninitialized use of 'int tem = 0;' ...).
> >
> > This is probably all latent issues coming up more easily now.
> >
> > Note that formerly we'd support invariant "parameters" defined in
> > the region by copying those out but now SCEV instantiation should
> > lead to chrec_dont_know for stuff we cannot gobble up (anything not
> > affine).  This probably only worked for the outermost scop in the
> > region and it means we need some other way to handle those.
> 
> 
> How important is it to move defs out the region?

I have no idea...

> Can we postpone handling those cases until we have an interesting case?

That would be my preference as well.

> The
> > original issue is probably that "parameters" cannot occur in
> > dependences and thus an array index cannot "depend" on the computation
> > of a parameter (and array indexes coming from "data" cannot be handled
> > anyway?).
> 
> 
> Correct.  Parameters can occur in array indexes as long as they cancel out.
> For example, the following dependence can be computed:
> 
> A[p] vs. A[p+3]
> 
> and the following dependence cannot be computed
> 
> A[p] vs. A[0]
> 
> as the value of the parameter p is not known at compilation time.
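
The distinction above can be sketched in C. This is only a toy model, not
GRAPHITE code: the struct affine_index and the function constant_distance
are hypothetical names, and each index is assumed to have the form
coef_p * p + offset with p unknown at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array index coef_p * p + offset, where p is
   a parameter whose value is not known at compile time.  */
struct affine_index
{
  long coef_p;
  long offset;
};

/* If the parameter terms of A and B cancel, store the constant distance
   B - A in *DIST and return true.  Otherwise the distance depends on p
   and cannot be computed, so return false and leave *DIST alone.  */
bool
constant_distance (struct affine_index a, struct affine_index b, long *dist)
{
  assert (dist != NULL);
  if (a.coef_p != b.coef_p)
    return false;
  *dist = b.offset - a.offset;
  return true;
}
```

With A[p] modelled as { 1, 0 } and A[p+3] as { 1, 3 } the function reports
a distance of 3; with A[0] modelled as { 0, 0 } it reports failure, since
the p term does not cancel.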
> 
> We don't seem to have any functional testcase for those
> > parameters that are not parameters.
> >
> >
> Ok.  Let's wait for a testcase that needs this functionality.

Good.  I'll install this and hope to get some spare cycles to look
at the latent wrong-code issues that have popped up on SPEC 2k6.

Richard.

> 
> > Richard.
> >
> > 2017-10-13  Richard Biener  
> >
> > * graphite-scop-detection.c
> > (scop_detection::stmt_has_simple_data_refs_p): Always use
> > the full nest as region.
> > (try_generate_gimple_bb): Likewise.
> > (build_scops): First split edges, then compute RPO order.
> > * sese.c (scalar_evolution_in_region): Simplify now that
> > SCEV can handle instantiation in regions.
> > * tree-scalar-evolution.c (instantiate_scev_name): Also instantiate
> > in the non-loop part of a function if requested.
> >
> 
> Looks good.
> Thanks.
> 
> 
> >
> > Index: gcc/graphite-scop-detection.c
> > ===
> > --- gcc/graphite-scop-detection.c   (revision 253721)
> > +++ gcc/graphite-scop-detection.c   (working copy)
> > @@ -1005,15 +1005,10 @@ scop_detection::graphite_can_represent_e
> >  bool
> >  scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
> >  {
> > -  edge nest;
> > +  edge nest = scop.entry;;
> >loop_p loop = loop_containing_stmt (stmt);
> >if (!loop_in_sese_p (loop, scop))
> > -{
> > -  nest = scop.entry;
> > -  loop = NULL;
> > -}
> > -  else
> > -nest = loop_preheader_edge (outermost_loop_in_sese (scop, gimple_bb
> > (stmt)));
> > +loop = NULL;
> >
> >auto_vec drs;
> >    if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs))
> > @@ -1381,15 +1350,10 @@ try_generate_gimple_bb (scop_p scop, bas
> >vec reads = vNULL;
> >
> >sese_l region = scop->scop_info->region;
> > -  edge nest;
> > +  edge nest = region.entry;
> >loop_p loop = bb->loop_father;
> >if (!loop_in_sese_p (loop, region))
> > -{
> > -  nest = region.entry;
> > -  loop = NULL;
> > -}
> > -  else
> > -nest = loop_preheader_edge (outermost_loop_in_sese (region, bb));
> > +loop = NULL;
> >
> >    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> > gsi_next (&gsi))
> > @@ -1696,6 +1660,13 @@ build_scops (vec *scops)
> >/* Now create scops from the lightweight SESEs.  */
> >vec scops_l = sb.get_scops ();
> >
> > +  /* For our out-of-SSA we need a block on s->entry, similar to how
> > + we include the LCSSA block in the region.  */
> > +  int i;
> > +  sese_l *s;
> > +  FOR_EACH_VEC_ELT (scops_l, i, s)
> > +s->entry = single_pred_edge (split_edge (s->entry));
> > +
> >/* Domwalk needs a bb to RPO mapping.  Compute it once here.  */
> >int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >int postorder_num = pre_and_rev_post_order_compute (NULL, postorder,
> > true);
> > @@ -1704,14 +1675,8 @@ build_scops (vec *scops)
> >  bb_to_rpo[postorder[i]] = i;
> >free (postorder);
> >
> > -  int i;
> > -  sese_l *s;
> >FOR_EACH_VEC_ELT (scops_l, i, s)
> >  {
> > -  /* For our

Re: [PATCH PR/82546] tree node size

2017-10-17 Thread Richard Biener

On Mon, 16 Oct 2017, Nathan Sidwell wrote:

> On 10/16/2017 02:49 AM, Richard Biener wrote:
> > On October 13, 2017 8:29:40 PM GMT+02:00, Nathan Sidwell 
> > wrote:
> 
> > > I intend to continue cleaning this up of course.  It's not clear to me
> > > whether we should cache these node sizes in an array, and the way it
> > > goes about checking nodes with nested switches is understandable, but
> > > possibly not the fastest solution. However let's at least get the
> > > sizing
> > > right first.
> > 
> > We were conservative exactly to avoid the langhook here. I think there's
> > a similar 'bug' on the decl side.
> 
> The other code types (decls, exprs, etc) call the langhook.  tcc_type seems
> the exception (now?).

Sorry for not looking at the patch before replying.  The patch looks ok
but shouldn't LANG_TYPE be also handled by the FE?  LANG_TYPE itself
is an odd beast if I may say that - it's only used by the C++ and Ada FEs
and the Ada FE does only

/* Make a dummy type corresponding to GNAT_TYPE.  */

tree
make_dummy_type (Entity_Id gnat_type)
{
...
  /* Create a debug type so that debug info consumers only see an 
unspecified
 type.  */
  if (Needs_Debug_Info (gnat_type))
{
  debug_type = make_node (LANG_TYPE);
  SET_TYPE_DEBUG_TYPE (gnu_type, debug_type);

  TYPE_NAME (debug_type) = TYPE_NAME (gnu_type);
  TYPE_ARTIFICIAL (debug_type) = TYPE_ARTIFICIAL (gnu_type);
}

Thus the patch is ok.

Thanks,
Richard.

[patch] Relax IVOPTs restriction on auto-increment

2017-10-17 Thread Eric Botcazou

Hi,

add_autoinc_candidates begins with this test:

  /* If we insert the increment in any position other than the standard
 ones, we must ensure that it is incremented once per iteration.
 It must not be in an inner nested loop, or one side of an if
 statement.  */
  if (use_bb->loop_father != data->current_loop
  || !dominated_by_p (CDI_DOMINATORS, data->current_loop->latch, use_bb)
  || stmt_could_throw_p (use->stmt)
  || !cst_and_fits_in_hwi (step))
return;

This means that, for languages supporting non-call exceptions like Ada, no 
auto-inc candidates will be added for a simple iteration over an array in most 
cases since stmt_could_throw_p returns true.  I don't think that's necessary, 
or else it would also be necessary to test gimple_could_trap_p, which would 
have the same effect for all languages.  Note that add_autoinc_candidates is 
only invoked for USE_ADDRESS use types:

  /* At last, add auto-incremental candidates.  Make such variables
 important since other iv uses with same base object may be based
 on it.  */
  if (use != NULL && use->type == USE_ADDRESS)
add_autoinc_candidates (data, iv->base, iv->step, true, use);

So the attached patch replaces it with stmt_can_throw_internal (which is the 
predicate used to compute the flow of control) as e.g. in the vectorizer.
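
For illustration only, here is a C sketch of the kind of simple array
iteration in question.  The function name sum_array is made up, and whether
IVOPTs actually selects an auto-increment candidate for it depends on the
target and the cost model.  As far as I understand, the load counts as
possibly throwing only when non-call exceptions are enabled (as in Ada, or
with -fnon-call-exceptions in C).

```c
#include <assert.h>
#include <stddef.h>

/* Sum N ints starting at A.  The load through the advancing pointer is
   a USE_ADDRESS-style use with a constant step of sizeof (int), which a
   target with post-increment addressing could fold into one
   load-and-advance instruction.  */
long
sum_array (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += *a++;
  return sum;
}
```

The load sits in the loop body that dominates the latch and runs once per
iteration, so the remaining tests quoted above would not reject it.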

Bootstrapped/regtested on PowerPC64/Linux, OK for the mainline?


2017-10-17  Eric Botcazou  

* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Bail out only if
the use statement can throw internally.

-- 
Eric Botcazou

Index: tree-ssa-loop-ivopts.c
===
--- tree-ssa-loop-ivopts.c	(revision 253767)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -3140,7 +3140,7 @@ add_autoinc_candidates (struct ivopts_da
  statement.  */
   if (use_bb->loop_father != data->current_loop
   || !dominated_by_p (CDI_DOMINATORS, data->current_loop->latch, use_bb)
-  || stmt_could_throw_p (use->stmt)
+  || stmt_can_throw_internal (use->stmt)
   || !cst_and_fits_in_hwi (step))
 return;

[patch] avoid printing leading 0 in widest_int hex dumps

2017-10-17 Thread Aldy Hernandez


Hi folks!

Calling print_hex() on a widest_int with the most significant bit turned 
on can lead to a leading zero being printed (0x0). This produces 
confusing dumps to say the least, especially when you incorrectly assume 
an integer is NOT signed :).
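
A minimal standalone sketch of the idea, not the GCC code itself:
print_limbs_hex is a hypothetical stand-in for print_hex, it works on a
plain array of 64-bit limbs (least significant first), and it ignores
negative values, which the real function handles separately.  It assumes a
zero top limb only appears above a limb whose high bit is set.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write LEN limbs to BUF as "0x..." and return the number of characters
   written.  A zero most significant limb is skipped so that the output
   does not start with "0x0", unless it is the only limb.  */
int
print_limbs_hex (const uint64_t *limbs, int len, char *buf)
{
  assert (len > 0);
  char *p = buf;
  int i = len - 1;

  p += sprintf (p, "0x");
  /* Avoid printing a leading zero limb.  */
  if (limbs[i] != 0 || i == 0)
    p += sprintf (p, "%" PRIx64, limbs[i]);
  /* The remaining limbs are zero-padded to their full width.  */
  while (--i >= 0)
    p += sprintf (p, "%016" PRIx64, limbs[i]);
  return (int) (p - buf);
}
```

For the two-limb value { UINT64_MAX, 0 } this writes 0x followed by sixteen
f digits rather than a string beginning with 0x0.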


OK for trunk?
gcc/

	* wide-int-print.cc (print_hex): Avoid printing a leading zero.
	* wide-int.cc (test_printing): Add test for hex dumping of
	a widest_int conversion.

diff --git a/gcc/wide-int-print.cc b/gcc/wide-int-print.cc
index 36d8ad863f5..36a681d4b73 100644
--- a/gcc/wide-int-print.cc
+++ b/gcc/wide-int-print.cc
@@ -123,7 +123,13 @@ print_hex (const wide_int_ref &wi, char *buf)
 
 	}
   else
-	buf += sprintf (buf, "0x" HOST_WIDE_INT_PRINT_HEX_PURE, wi.elt (--i));
+	{
+	  buf += sprintf (buf, "0x");
+	  --i;
+	  /* Avoid printing a leading zero.  */
+	  if (wi.elt (i))
+	buf += sprintf (buf, HOST_WIDE_INT_PRINT_HEX_PURE, wi.elt (i));
+	}
 
   while (--i >= 0)
 	buf += sprintf (buf, HOST_WIDE_INT_PRINT_PADDED_HEX, wi.elt (i));
diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 71e24ec22af..123a70fa9b8 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -2220,6 +2220,14 @@ test_printing ()
   VALUE_TYPE a = from_int (42);
   assert_deceq ("42", a, SIGNED);
   assert_hexeq ("0x2a", a);
+
+  /* Test that converting to widest_int still produces a sane hex
+ representation.  */
+  VALUE_TYPE big = from_int (0x);
+  widest_int huge = widest_int::from (big, UNSIGNED);
+  char buf[WIDE_INT_PRINT_BUFFER_SIZE];
+  print_hex (huge, buf);
+  ASSERT_TRUE (buf[0] == '0' && buf[1] == 'x' && buf[2] == 'f');
 }
 
 /* Verify that various operations work correctly for VALUE_TYPE,

Re: [PATCH] Improve FAIL message for dump-*-times functions.

2017-10-17 Thread Martin Liška


On 10/11/2017 06:56 PM, Segher Boessenkool wrote:

Hi!

On Wed, Oct 11, 2017 at 10:14:29AM +0200, Martin Liška wrote:

This patch helps to find why an expected number of scan patterns does not match:

FAIL: gcc.dg/unroll-3.c scan-tree-dump-times cunrolli "loop with 3 iterations 
completely unrolled" 222 (found 1 times)
FAIL: c-c++-common/attr-simd-2.c  -Wc++-compat   scan-assembler-times 
_ZGVbN4_simd_attr: 111 (found 1 times)


Cool, looks fine to me (but I can't approve it).  Some suggestions:


diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index bab23e8e165..e90e61c29ae 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -247,10 +247,11 @@ proc scan-assembler-times { args } {
  set text [read $fd]
  close $fd
  
-if { [llength [regexp -inline -all -- $pattern $text]] == [lindex $args 1]} {

+set pattern_count [llength [regexp -inline -all -- $pattern $text]]
+if {$pattern_count == [lindex $args 1]} {
pass "$testcase scan-assembler-times $pp_pattern [lindex $args 1]"
  } else {
-   fail "$testcase scan-assembler-times $pp_pattern [lindex $args 1]"
+   fail "$testcase scan-assembler-times $pp_pattern [lindex $args 1] (found 
$pattern_count times)"
  }
  }


pattern_count is not such a great name (it's the result count, instead).

You could factor out the [lindex $args 1] to a variable.

You do both of these in scandump.exp already, so why not here :-)


Thanks for the review. I fixed that and I'm going to install the patch.

Martin




Segher



diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
index 2e6eebfaf33..4a64ac6e05d 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -86,6 +86,7 @@ proc scan-dump-times { args } {
  }
  
  set testcase [testname-for-summary]

+set times [lindex $args 2]
  set suf [dump-suffix [lindex $args 3]]
  set printable_pattern [make_pattern_printable [lindex $args 1]]
  set testname "$testcase scan-[lindex $args 0]-dump-times $suf 
\"$printable_pattern\" [lindex $args 2]"
@@ -101,10 +102,11 @@ proc scan-dump-times { args } {
  set text [read $fd]
  close $fd
  
-if { [llength [regexp -inline -all -- [lindex $args 1] $text]] == [lindex $args 2]} {

+set result_count [llength [regexp -inline -all -- [lindex $args 1] $text]]
+if {$result_count == $times} {
  pass "$testname"
  } else {
-fail "$testname"
+fail "$testname (found $result_count times)"
  }
  }

Re: Missing REDUCE[SD,SS] intrinsics

2017-10-17 Thread Kirill Yukhin

Hello Olga, Sebastian,
On 16 Oct 11:20, Peryt, Sebastian wrote:
> Hi,
> 
> This patch written by Olga Makhotina adds missing intrinsics for 
> REDUCE[SD,SS].
> 
> 16.10.2017 Olga Makhotina 
> 
> gcc/
>   * config/i386/avx512dqintrin.h (_mm_mask_reduce_sd,
>   _mm_maskz_reduce_sd, _mm_mask_reduce_ss, 
>   _mm_maskz_reduce_ss): New intrinsics.
>   * config/i386/i386-builtin.def (__builtin_ia32_reducesd_mask,
>   __builtin_ia32_reducess_mask): New builtin.
>   (__builtin_ia32_reducesd, __builtin_ia32_reducess): Remove.
>   * config/i386/sse.md (reduces): Renamed to ...
>   (reduces): ... this.
>   (vreduce\t{%3, %2, %1, %0|%0, %1, %2, %3}): 
> Changed to ...
>   (vreduce\t{%3, %2, %1, %0|
>   %0, %1, %2, %3}): ... this.
> 
> gcc/testsuite/
>   * gcc.target/i386/avx512dq-vreducesd-1.c (_mm_mask_reduce_sd,
>   _mm_maskz_reduce_sd): Test new intrinsics.
>   * gcc.target/i386/avx512dq-vreducesd-2.c: New.
>   * gcc.target/i386/avx512dq-vreducess-1.c (_mm_mask_reduce_ss,
>   _mm_maskz_reduce_ss): Test new intrinsics.
>   * gcc.target/i386/avx512dq-vreducess-2.c: New.
>   * gcc.target/i386/avx-1.c (__builtin_ia32_reducesd,
>   __builtin_ia32_reducess): Remove builtin.
>   (__builtin_ia32_reducesd_mask,
>   __builtin_ia32_reducess_mask): Test new builtin.
>   * gcc.target/i386/sse-13.c: Ditto.
>   * gcc.target/i386/sse-23.c: Ditto.
> 
> Is it ok for trunk?
The patch is OK for trunk. I've checked it in with a few minor changes to the
ChangeLog entries.

> Thanks,
> Sebastian
> 

--
Thanks, K
