Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-06 Thread Segher Boessenkool
Whoops, left off a bit:

> 2016-07-06  Segher Boessenkool  
> 
>   PR target/70098
>   PR target/71763
>   * config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
>   *ctr_internal5, *ctr_internal6): 

* config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
*ctr_internal5, *ctr_internal6): Add *wi to the output
constraint.


Segher


[PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-06 Thread Segher Boessenkool
Similar to PR70098, which is about integers in floating point registers,
we can have the completely analogous problem with vector registers as well
now that we allow integers in vector registers.  So, this patch solves it
in the same way.  This only works for targets with direct move.

To recap: register allocation can decide to put an integer mode value in
a floating point or vector register.  If that register is used in a bd*z
instruction, which is a jump instruction, reload can not do an output
reload on it (it does not do output reloads on any jump insns), so the
float or vector register will remain, and we have to allow it here or
recog will ICE.  Later on we will split this to valid instructions,
including a move from that fp/vec register to an int register; it is this
move that will still fail (PR70098) if we do not have direct move enabled.

Bootstrapped and tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra);
testing on powerpc64le-linux in progress.  Also tested the new testcase
separately.  And bootstrapped/tested on powerpc64le-linux.

This will need to go to the 6 branch together with the int-in-vector
patches (and the previous 70098 patch if that isn't there yet).

Committing to trunk.


Segher


2016-07-06  Segher Boessenkool  

PR target/70098
PR target/71763
* config/rs6000/rs6000.md (*ctr_internal1, *ctr_internal2,
*ctr_internal5, *ctr_internal6): 

gcc/testsuite/
PR target/70098
PR target/71763
* gcc.target/powerpc/pr71763.c: New file.

---
 gcc/config/rs6000/rs6000.md|  8 
 gcc/testsuite/gcc.target/powerpc/pr71763.c | 27 +++
 2 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr71763.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a7615b1..fcb70e5 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -12151,7 +12151,7 @@ (define_insn "*ctr_internal1"
  (const_int 1))
  (label_ref (match_operand 0 "" ""))
  (pc)))
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
@@ -12175,7 +12175,7 @@ (define_insn "*ctr_internal2"
  (const_int 1))
  (pc)
  (label_ref (match_operand 0 "" ""
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
@@ -12201,7 +12201,7 @@ (define_insn "*ctr_internal5"
  (const_int 1))
  (label_ref (match_operand 0 "" ""))
  (pc)))
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
@@ -12225,7 +12225,7 @@ (define_insn "*ctr_internal6"
  (const_int 1))
  (pc)
  (label_ref (match_operand 0 "" ""
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*c*l")
+   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
(plus:P (match_dup 1)
 (const_int -1)))
(clobber (match_scratch:CC 3 "=X,,,"))
diff --git a/gcc/testsuite/gcc.target/powerpc/pr71763.c b/gcc/testsuite/gcc.target/powerpc/pr71763.c
new file mode 100644
index 000..7910a90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr71763.c
@@ -0,0 +1,27 @@
+// PR target/71763
+// { dg-do compile }
+// { dg-options "-O1 -mvsx" }
+// { dg-xfail-if "PR70098" { lp64 && powerpc64_no_dm } }
+// { dg-prune-output ".*internal compiler error.*" }
+
+int a, b;
+float c;
+
+void fn2(void);
+
+void fn1(void)
+{
+long d;
+
+for (d = 3; d; d--) {
+for (a = 0; a <= 1; a++) {
+b &= 1;
+if (b) {
+for (;;) {
+fn2();
+c = d;
+}
+}
+}
+}
+}
-- 
1.9.3



Re: [Driver] Add support for -fuse-ld=lld

2016-07-06 Thread Trevor Saunders
On Mon, Jul 04, 2016 at 09:36:52PM +0200, Markus Trippelsdorf wrote:
> On 2016.07.04 at 10:08 -0700, H.J. Lu wrote:
> > On Sun, Jul 3, 2016 at 9:38 PM, Davide Italiano wrote:
> > > On Thu, Jun 23, 2016 at 9:11 PM, Davide Italiano wrote:
> > >> + HJ who wrote the code for the option originally.
> > >>
> > >> On Thu, Jun 23, 2016 at 9:01 PM, Davide Italiano wrote:
> > >>> LLVM currently ships with a new ELF linker http://lld.llvm.org/.
> > >>> I experiment a lot with gcc and lld so it would be nice if
> > >>> -fuse-ld=lld is supported (considering the linker is now mature enough
> > >>> to link large C/C++ applications).
> > >>>
> > >>> Also, IMHO, -fuse-ld should be a generic facility which accept other
> > >>> linkers (as long as they follow the convention ld.), and should
> > >>> also support absolute path, e.g. -fuse-ld=/usr/local/bin/ld.mylinker.
> > >>> Probably outside of the scope of this patch, but I thought worth
> > >>> mentioning.
> > >
> > > Hi, can anybody take a look?
> > 
> > lld isn't compatible with GCC:
> > 
> > https://llvm.org/bugs/show_bug.cgi?id=28414
> 
> Besides the technical issues, this also raises the question if it is
> right to support lld at all. Because this project was obviously started
> to replace the GNU linkers (ld.bfd and gold) in the long run.

Technically it seems like it would be useful to support
-fuse-ld= so you can, say, easily test gcc
with a ld.bfd you just built.

> So I see no reason why it should be supported in GCC.
> 
> (And who needs a buggy new ELF linker anyway?)

I'm not particularly thrilled by a new linker, but presumably the bugs
will get fixed, and it does link libxul.so and presumably other things
but I don't have data, and that is certainly useful.

Trev



Re: [PATCH 0/9] remove some manual memory management

2016-07-06 Thread Trevor Saunders
On Thu, Jun 30, 2016 at 12:33:24PM +0200, Bernd Schmidt wrote:
> On 06/29/2016 02:26 PM, tbsaunde+...@tbsaunde.org wrote:
> > patches individually bootstrapped and regtested on x86_64-linux-gnu, ok?
> 
> I think these all look sensible. ChangeLogs ought to have slightly more
> information than "Adjust" in some cases, especially when you're changing
> function arguments.

I'm still a little surprised people actually read ChangeLogs, but anyway

I doubt people want to be spammed with a bunch more email just for some
changes to ChangeLogs so I'll try and fix that up and then commit this.
Generally I've tried to just comment in the ChangeLog about the
meaningful change in the commit and then the rest generally is just
adjustments for that change, but not always and I guess that isn't
always the best textual description of a diff.

Thanks!

Trev

> 
> 
> Bernd
> 


Re: [PATCH 0/6] remove some usage of rtx_{insn,expr}_list

2016-07-06 Thread Trevor Saunders
On Fri, Jul 01, 2016 at 03:34:51PM +0200, Bernd Schmidt wrote:
> On 06/21/2016 04:47 PM, Trevor Saunders wrote:
> > On Mon, Jun 20, 2016 at 06:52:35PM +0200, Bernd Schmidt wrote:
> 
> > > So that's a second more in real time - was the machine very busy at the
> > > time you ran these tests so that these aren't meaningful, or is there a
> > > need to investigate this?
> > 
> > Well, it was on my laptop which was running a web browser and stuff.  I
> > wasn't aware of it being busy, but it also wasn't an extra stable
> > machine.  I also noticed a bit of variance within the same configuration
> > so I'm not terribly concerned, but it is odd.
> 
> I ran some tests myself, and while they were somewhat inconclusive, they
> also didn't give me reason to think there's something to worry about. So ok
> for the first four patches.

Ok, thanks! committed now.

Trev



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-06 Thread Michael Meissner
On Wed, Jul 06, 2016 at 05:01:38PM -0500, Peter Bergner wrote:
> On 7/6/16 2:19 PM, Michael Meissner wrote:
> >On Tue, Jul 05, 2016 at 09:26:50PM -0500, Peter Bergner wrote:
> >>-  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
> >>+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR
> >>+   | OPTION_MASK_P9_DFORM_VECTOR);
> >> }
> >
> >Note, this should be
> >
> >+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR_SCALAR
> >+| OPTION_MASK_P9_DFORM_VECTOR);
> 
> I think you mean the following, since we have to disable -mpower9-vector
> too:
> 
> -  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
> +  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR
> + | OPTION_MASK_P9_DFORM_SCALAR
> + | OPTION_MASK_P9_DFORM_VECTOR);
> 
> I had thought about adding the dform scalar flag, but it was already
> correctly disabled and I wasn't sure whether we could have the p9
> dform scalar without the vector part.  Probably not, so consider
> the patch above as the latest.

Yes, you can have P9 dform scalar without P9 dform vector.

> 
> >However, we probably need to add all of the other options that depend on VSX.
> 
> Yes, there is a cascade effect on the disabling of options when you
> explicitly disable something.  It'd be nice if this was somehow
> automated, where we have some table showing the dependencies of
> the options and the compiler just follows the table disabling things
> that depend on something that has been disabled.  Could be hard to
> make that dependency list though, given how many we have.

Yep.  I'm thinking we need masks in rs6000-cpus.def of the options to turn off
if -mno-vsx (and even worse -mno-altivec).
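
As a rough illustration of the table-driven idea discussed above, here is a
minimal sketch in C++.  Everything in it is an assumption made for the sake
of the example: the MASK_* names and bit values are hypothetical stand-ins
for the real OPTION_MASK_* flags, the dependency entries are illustrative
rather than the actual rs6000 rules, and disable_with_dependents is an
invented helper, not an existing GCC function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, standing in for the real OPTION_MASK_* values.
constexpr std::uint64_t MASK_ALTIVEC         = 1u << 0;
constexpr std::uint64_t MASK_VSX             = 1u << 1;
constexpr std::uint64_t MASK_P8_VECTOR       = 1u << 2;
constexpr std::uint64_t MASK_P9_VECTOR       = 1u << 3;
constexpr std::uint64_t MASK_P9_DFORM_SCALAR = 1u << 4;
constexpr std::uint64_t MASK_P9_DFORM_VECTOR = 1u << 5;

struct OptionDependency
{
  std::uint64_t option;  // the dependent option
  std::uint64_t needs;   // options that must all be enabled for it
};

// Illustrative dependency table (an assumption, not GCC's actual rules).
constexpr OptionDependency option_deps[] = {
  { MASK_VSX,             MASK_ALTIVEC },
  { MASK_P8_VECTOR,       MASK_VSX },
  { MASK_P9_VECTOR,       MASK_P8_VECTOR },
  { MASK_P9_DFORM_SCALAR, MASK_P9_VECTOR },
  { MASK_P9_DFORM_VECTOR, MASK_P9_VECTOR },
};

// Clear DISABLED from FLAGS, then keep clearing any option whose
// prerequisites are no longer all enabled until nothing changes.
std::uint64_t
disable_with_dependents (std::uint64_t flags, std::uint64_t disabled)
{
  flags &= ~disabled;
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (const OptionDependency &dep : option_deps)
        if ((flags & dep.option) != 0 && (flags & dep.needs) != dep.needs)
          {
            flags &= ~dep.option;
            changed = true;
          }
    }
  return flags;
}
```

With a table like this, explicitly disabling one option clears everything
that transitively depends on it, so each individual "&= ~(...)" site would
not have to list every dependent mask by hand.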

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [v3 PATCH] Initial implementation of std::any.

2016-07-06 Thread Jonathan Wakely

On 07/07/16 00:57 +0300, Ville Voutilainen wrote:

Tested on Linux-x64.

2016-07-07  Ville Voutilainen  

   Implement std::any. This is not yet a C++17-conforming implementation,
   but just a copy of std::experimental::any with the proper namespace
   and C++17 flagging. Proper conformance to C++17 will follow.


Strike this part from the ChangeLog (leave it in the git or svn commit
message if you like, but not the ChangeLog file).

The Doxygen @file comment for include/std/any says:

+ *  This is a TS C++ Library header.

s/TS/Standard/

OK for trunk with those tweaks.



Re: [PATCH] c/71552 - Confusing error for incorrect struct initialization

2016-07-06 Thread Martin Sebor

On 07/06/2016 03:55 PM, Jeff Law wrote:

On 06/30/2016 04:38 PM, Martin Sebor wrote:

On 06/20/2016 08:52 AM, Joseph Myers wrote:

On Sat, 18 Jun 2016, Martin Sebor wrote:


The attached patch slightly changes the order in which initializers
are checked for type compatibility to issue the same error for static
initializers of incompatible types as for automatic objects, rather
than rejecting the former for their lack of constness first.


OK, presuming the patch has passed the usual testing.


Thanks.  I committed it in r237829.  The reporter wants to know
if the patch can also be backported to 5 and/or 6.  Should I go
ahead?

My inclination would be no -- it's not a regression or incorrect code
generation.


Thanks.  I let the reporter know.

Martin


Re: [PATCH] c++/60760 - arithmetic on null pointers should not be allowed in constant expressions

2016-07-06 Thread Martin Sebor

On 06/23/2016 03:36 PM, Jason Merrill wrote:

On 06/20/2016 10:17 PM, Martin Sebor wrote:

+  && tree_int_cst_equal (lhs, null_pointer_node)
+  && !tree_int_cst_equal (rhs, integer_zero_node))


Not integer_zerop?


+"invalid conversion involving a null pointer");

...

+"invalid conversion from %qT to %qT",


The conversion isn't invalid, it just isn't a constant expression.


(Sorry for the delay following up on this review.  I got busy
with something else.)

I've adjusted the text of the diagnostics, though the first one
is also issued for conversions that are invalid even outside
constexpr, such as those that cast away constness, or those that
cast to incomplete type.  Without -fpermissive those are already
diagnosed by this point but I'm not sure how much trouble to go
to here to avoid diagnosing them again, or at all with
-fpermissive.


For the null pointer to pointer conversion, does this properly allow
conversion to void* or to base*?


It didn't handle either but does now.  Thank you for calling it
out.  Surprisingly, a regression run including libstdc++ didn't
catch it.  I've added tests to exercise it.




+if (integer_zerop (op))

...

+ else if (!integer_zerop (op))


The second test seems redundant.


I have removed it.

Martin
PR c++/60760 - arithmetic on null pointers should not be allowed in constant
  expressions
PR c++/71091 - constexpr reference bound to a null pointer dereference
   accepted

gcc/cp/ChangeLog:
2016-07-06  Martin Sebor  

PR c++/60760
PR c++/71091
* constexpr.c (cxx_eval_binary_expression): Reject invalid expressions
involving null pointers.
(cxx_eval_component_reference): Reject null pointer dereferences.
(cxx_eval_indirect_ref): Reject indirecting through null pointers.
(cxx_eval_constant_expression): Reject invalid expressions involving
null pointers.

gcc/testsuite/ChangeLog:
2016-07-06  Martin Sebor  

PR c++/60760
PR c++/71091
	* g++.dg/cpp0x/constexpr-cast.C: New test.
* g++.dg/cpp0x/constexpr-nullptr-2.C: New test.
* g++.dg/cpp1y/constexpr-sfinae.C: Correct.
* g++.dg/ubsan/pr63956.C: Correct.
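
To make the intended behaviour concrete, here is a small sketch of the kinds
of expressions involved.  It is only an illustration based on the discussion
above, not one of the new tests; the type and variable names are invented
for the example, and the rejected cases are left commented out so that the
snippet compiles.

```cpp
#include <cassert>

struct Base { int b; };
struct Derived : Base { int d; };

constexpr int *null_int = nullptr;
constexpr Derived *null_derived = nullptr;

// Adding zero to a null pointer remains a constant expression.
constexpr int *still_null = null_int + 0;

// Converting a null pointer to void* or to a base class pointer is
// also still a constant expression.
constexpr void *as_void = null_derived;
constexpr Base *as_base = null_derived;

// With the checks described above, these are expected to be rejected
// in a constant expression:
//   constexpr int *bad_sum = null_int + 1;     // arithmetic on null
//   constexpr int bad_read = null_derived->d;  // null dereference

static_assert (still_null == nullptr, "null + 0 is null");
static_assert (as_base == nullptr, "null converts to null base pointer");
```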

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ba40435..83954d8 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1811,6 +1811,13 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t,
 		   || null_member_pointer_value_p (rhs)))
 	r = constant_boolean_node (!is_code_eq, type);
 }
+  if (code == POINTER_PLUS_EXPR && !*non_constant_p
+  && integer_zerop (lhs) && !integer_zerop (rhs))
+{
+  if (!ctx->quiet)
+error ("arithmetic involving a null pointer in %qE", lhs);
+  return t;
+}
 
   if (r == NULL_TREE)
 r = fold_binary_loc (loc, code, type, lhs, rhs);
@@ -2151,6 +2158,11 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, tree t,
   tree whole = cxx_eval_constant_expression (ctx, orig_whole,
 	 lval,
 	 non_constant_p, overflow_p);
+  if (TREE_CODE (whole) == INDIRECT_REF
+  && integer_zerop (TREE_OPERAND (whole, 0))
+  && !ctx->quiet)
+error ("dereferencing a null pointer in %qE", orig_whole);
+
   if (TREE_CODE (whole) == PTRMEM_CST)
 whole = cplus_expand_constant (whole);
   if (whole == orig_whole)
@@ -2911,6 +2923,14 @@ cxx_eval_indirect_ref (const constexpr_ctx *ctx, tree t,
   if (*non_constant_p)
 	return t;
 
+  if (integer_zerop (op0))
+	{
+	  if (!ctx->quiet)
+	error ("dereferencing a null pointer");
+	  *non_constant_p = true;
+	  return t;
+	}
+
   r = cxx_fold_indirect_ref (EXPR_LOCATION (t), TREE_TYPE (t), op0,
  &empty_base);
   if (r == NULL_TREE)
@@ -3559,10 +3579,22 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 	  if (!flag_permissive || ctx->quiet)
 	*overflow_p = true;
 	}
+
+  if (TREE_CODE (t) == INTEGER_CST
+  && TREE_CODE (TREE_TYPE (t)) == POINTER_TYPE
+  && !integer_zerop (t))
+{
+  if (!ctx->quiet)
+error ("value %qE of type %qT is not a constant expression",
+		   t, TREE_TYPE (t));
+	  *non_constant_p = true;
+}
+
   return t;
 }
 
-  switch (TREE_CODE (t))
+  tree_code tcode = TREE_CODE (t);
+  switch (tcode)
 {
 case RESULT_DECL:
   if (lval)
@@ -3973,7 +4005,6 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 case NOP_EXPR:
 case UNARY_PLUS_EXPR:
   {
-	enum tree_code tcode = TREE_CODE (t);
 	tree oldop = TREE_OPERAND (t, 0);
 
 	tree op = cxx_eval_constant_expression (ctx, oldop,
@@ -3999,15 +4030,48 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 		return t;
 	  }
 	  }
-	if (POINTER_TYPE_P (type)
-	&& TREE_CODE (op) == INTEGER_CST
-	&& !integer_zerop (op))
-	  {
-	if (!ctx->quiet)
-	  error_at (EXPR_LOC_OR_LOC (t, input_location),
-			

Re: Improve insert/emplace robustness to self insertion

2016-07-06 Thread Jonathan Wakely

On 06/07/16 21:46 +0200, François Dumont wrote:

On 05/07/2016 12:47, Jonathan Wakely wrote:

On 04/07/16 15:55 +0100, Jonathan Wakely wrote:

I'm getting nervous about the smart insertion trick to avoid making a
copy, I have a devious testcase in mind which will break with that
change. I'll share the testcase later today.


Here's a testcase which passes with libstdc++ but fails with libc++
because libc++ doesn't make a copy when inserting a T lvalue into
std::vector:

#include <cassert>
#include <memory>
#include <vector>

struct T
{
T(int v = 0) : value(v) { }
T(const T& t);
T& operator=(const T& t);
void make_child() { child = std::make_unique<T>(value + 10); }
std::unique_ptr<T> child;
int value;
};

T::T(const T& t) : value(t.value)
{
if (t.child)
  child.reset(new T(*t.child));
}

T& T::operator=(const T& t)
{
value = t.value;
if (t.child)
{
  if (child)
*child = *t.child;
  else
child.reset(new T(*t.child));
}
else
  child.reset();
return *this;
}

int main()
{
std::vector<T> v;
v.reserve(3);
v.push_back(T(1));
v.back().make_child();
v.push_back(T(2));
v.back().make_child();

assert(v[1].child->value == 12);
assert(v[1].child->child == nullptr);

v.insert(v.begin(), *v[1].child);

assert(v[0].value == 12);
assert(v[0].child == nullptr);
}

The problem is that the object being inserted (*v[1].child) is not an
element of the vector, so the optimization assumes it is unchanged by
shuffling the existing elements. That assumption is wrong.

As far as I can see, this program is perfectly valid. It's slightly
contrived to prove a point, but it's not entirely unrealistic code.



Don't you plan to add it to the testsuite ?


Yes, I'm just very busy with other things, I've only been doing
anything on std::vector so you're not waiting too long for responses
:-)

On my side I rebased part of my patch to reorganize the code a little. 
I reintroduced _M_realloc_insert, which isolates the code of 
_M_insert_aux used when we need to reallocate memory. So _M_insert_aux 
is used only when insertion can be done in place. It is a nice 
replacement for _M_emplace_back_aux, which has been removed. Most 
vector modifiers start by checking whether we need to reallocate. 


Great, I was thinking of doing that kind of refactoring.

I'll review it ASAP.


Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-06 Thread Peter Bergner

On 7/6/16 2:19 PM, Michael Meissner wrote:

On Tue, Jul 05, 2016 at 09:26:50PM -0500, Peter Bergner wrote:

-  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR
+   | OPTION_MASK_P9_DFORM_VECTOR);
 }


Note, this should be

+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR_SCALAR
+   | OPTION_MASK_P9_DFORM_VECTOR);


I think you mean the following, since we have to disable -mpower9-vector
too:

-  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR
+   | OPTION_MASK_P9_DFORM_SCALAR
+   | OPTION_MASK_P9_DFORM_VECTOR);

I had thought about adding the dform scalar flag, but it was already
correctly disabled and I wasn't sure whether we could have the p9
dform scalar without the vector part.  Probably not, so consider
the patch above as the latest.



However, we probably need to add all of the other options that depend on VSX.


Yes, there is a cascade effect on the disabling of options when you
explicitly disable something.  It'd be nice if this was somehow
automated, where we have some table showing the dependencies of
the options and the compiler just follows the table disabling things
that depend on something that has been disabled.  Could be hard to
make that dependency list though, given how many we have.

Peter





[v3 PATCH] Initial implementation of std::any.

2016-07-06 Thread Ville Voutilainen
Tested on Linux-x64.

2016-07-07  Ville Voutilainen  

Implement std::any. This is not yet a C++17-conforming implementation,
but just a copy of std::experimental::any with the proper namespace
and C++17 flagging. Proper conformance to C++17 will follow.
* include/Makefile.am: Add any and c++17_warning.h to exported headers.
* include/Makefile.in: Likewise.
* include/std/any: New.
* testsuite/20_util/any/assign/1.cc: Likewise.
* testsuite/20_util/any/assign/2.cc: Likewise.
* testsuite/20_util/any/assign/self.cc: Likewise.
* testsuite/20_util/any/cons/1.cc: Likewise.
* testsuite/20_util/any/cons/2.cc: Likewise.
* testsuite/20_util/any/cons/aligned.cc: Likewise.
* testsuite/20_util/any/cons/nontrivial.cc: Likewise.
* testsuite/20_util/any/misc/any_cast.cc: Likewise.
* testsuite/20_util/any/misc/any_cast_neg.cc: Likewise.
* testsuite/20_util/any/misc/any_cast_no_rtti.cc: Likewise.
* testsuite/20_util/any/misc/swap.cc: Likewise.
* testsuite/20_util/any/modifiers/1.cc: Likewise.
* testsuite/20_util/any/observers/type.cc: Likewise.
* testsuite/20_util/any/typedefs.cc: Likewise.
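
For readers unfamiliar with the facility, here is a short usage sketch.  It
is written against the std::any interface as standardized in C++17 (so it
needs -std=c++17 or later) and is not taken from the patch; since the
initial implementation is described as a copy of std::experimental::any,
details may differ there.  The describe function is a hypothetical helper
made up for this example.

```cpp
#include <any>
#include <cassert>
#include <string>

// Report what a std::any currently holds, using the pointer form of
// any_cast, which returns a null pointer on a type mismatch instead
// of throwing std::bad_any_cast.
std::string
describe (const std::any &value)
{
  if (const int *i = std::any_cast<int> (&value))
    return "int:" + std::to_string (*i);
  if (const std::string *s = std::any_cast<std::string> (&value))
    return "string:" + *s;
  return "other";
}
```

The pointer overload is used here so that probing for several candidate
types does not need exception handling.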


std_any.diff.gz
Description: GNU Zip compressed data


Re: [PATCH] c/71552 - Confusing error for incorrect struct initialization

2016-07-06 Thread Jeff Law

On 06/30/2016 04:38 PM, Martin Sebor wrote:

On 06/20/2016 08:52 AM, Joseph Myers wrote:

On Sat, 18 Jun 2016, Martin Sebor wrote:


The attached patch slightly changes the order in which initializers
are checked for type compatibility to issue the same error for static
initializers of incompatible types as for automatic objects, rather
than rejecting the former for their lack of constness first.


OK, presuming the patch has passed the usual testing.


Thanks.  I committed it in r237829.  The reporter wants to know
if the patch can also be backported to 5 and/or 6.  Should I go
ahead?
My inclination would be no -- it's not a regression or incorrect code 
generation.


jeff


Re: [PATCH] Prevent LTO wrappers to process a recursive execution

2016-07-06 Thread Jeff Law

On 06/23/2016 02:23 AM, Martin Liška wrote:

On 06/23/2016 06:57 AM, Jeff Law wrote:

Is this still something you want to pursue?  It looks pretty reasonable and one 
could make an argument that it's a good idea in and of itself.

jeff


Yeah, I would like to install the patch :) Can I take your reply as signal that 
it's accepted?

Yes.  It looked reasonable to me.

jeff



Re: [PATCH] remove unused CTOR_LISTS_DEFINED_EXTERNALLY macro

2016-07-06 Thread Jeff Law

On 06/27/2016 06:19 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Hi,

The last target to use this was i386-interix, so since that is gone we
don't need this anymore.

bootstrapped and regtested on x86-linux-gnu, ok?

Trev

libgcc/ChangeLog:

2016-06-27  Trevor Saunders  

* libgcc2.c (SYMBOL__MAIN): Remove checks for
CTOR_LISTS_DEFINED_EXTERNALLY.

OK.
jeff



Re: Improve insert/emplace robustness to self insertion

2016-07-06 Thread François Dumont

On 05/07/2016 12:47, Jonathan Wakely wrote:

On 04/07/16 15:55 +0100, Jonathan Wakely wrote:

I'm getting nervous about the smart insertion trick to avoid making a
copy, I have a devious testcase in mind which will break with that
change. I'll share the testcase later today.


Here's a testcase which passes with libstdc++ but fails with libc++
because libc++ doesn't make a copy when inserting a T lvalue into
std::vector:

#include <cassert>
#include <memory>
#include <vector>

struct T
{
 T(int v = 0) : value(v) { }
 T(const T& t);
 T& operator=(const T& t);
 void make_child() { child = std::make_unique<T>(value + 10); }
 std::unique_ptr<T> child;
 int value;
};

T::T(const T& t) : value(t.value)
{
 if (t.child)
   child.reset(new T(*t.child));
}

T& T::operator=(const T& t)
{
 value = t.value;
 if (t.child)
 {
   if (child)
 *child = *t.child;
   else
 child.reset(new T(*t.child));
 }
 else
   child.reset();
 return *this;
}

int main()
{
 std::vector<T> v;
 v.reserve(3);
 v.push_back(T(1));
 v.back().make_child();
 v.push_back(T(2));
 v.back().make_child();

 assert(v[1].child->value == 12);
 assert(v[1].child->child == nullptr);

 v.insert(v.begin(), *v[1].child);

 assert(v[0].value == 12);
 assert(v[0].child == nullptr);
}

The problem is that the object being inserted (*v[1].child) is not an
element of the vector, so the optimization assumes it is unchanged by
shuffling the existing elements. That assumption is wrong.

As far as I can see, this program is perfectly valid. It's slightly
contrived to prove a point, but it's not entirely unrealistic code.



Don't you plan to add it to the testsuite ?

On my side I rebased part of my patch to reorganize the code a little. I 
reintroduced _M_realloc_insert, which isolates the code of _M_insert_aux 
used when we need to reallocate memory. So _M_insert_aux is used only 
when insertion can be done in place. It is a nice replacement for 
_M_emplace_back_aux, which has been removed. Most vector modifiers 
start by checking whether we need to reallocate. With this 
reorganization we don't check it several times. Moreover, as soon as we 
reallocate we know that we don't need to make any temporary copy, so 
insert_vs_emplace.cc test04 has been adapted and there is now no 
situation where emplace and insert are not equivalent.
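
The split between the two paths can be sketched with a toy container.  This
is a simplified illustration only, assuming a hypothetical IntVec type that
stores plain ints; it is not the libstdc++ code, and the member names are
invented.  It shows why the in-place path needs a temporary copy of the
argument while the reallocating path does not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Toy stand-in for std::vector<int>; all names here are hypothetical.
struct IntVec
{
  std::unique_ptr<int[]> data;
  std::size_t size = 0;
  std::size_t cap = 0;

  // In-place path: there is spare capacity, so existing elements are
  // shifted up by one.
  void insert_in_place (std::size_t pos, const int &x)
  {
    int tmp = x;  // x may refer to data[i], which the shift overwrites
    for (std::size_t i = size; i > pos; --i)
      data[i] = data[i - 1];
    data[pos] = tmp;
    ++size;
  }

  // Reallocating path: the old buffer is left untouched until the new
  // one is complete, so x can be read directly without a temporary.
  void realloc_insert (std::size_t pos, const int &x)
  {
    std::size_t new_cap = cap ? 2 * cap : 1;
    std::unique_ptr<int[]> fresh (new int[new_cap]);
    fresh[pos] = x;
    for (std::size_t i = 0; i < pos; ++i)
      fresh[i] = data[i];
    for (std::size_t i = pos; i < size; ++i)
      fresh[i + 1] = data[i];
    data = std::move (fresh);
    cap = new_cap;
    ++size;
  }

  // Decide once whether reallocation is needed, then dispatch.
  void insert (std::size_t pos, const int &x)
  {
    if (size < cap)
      insert_in_place (pos, x);
    else
      realloc_insert (pos, x);
  }
};
```

Calling insert with a reference to one of the container's own elements
works on both paths: the in-place path reads the value before shifting,
and the reallocating path reads it before the old storage is released.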


* include/bits/stl_vector.h (push_back(const value_type&)): Forward
to _M_realloc_insert.
(insert(const_iterator, value_type&&)): Forward to _M_insert_rval.
(_M_realloc_insert): Declare new function.
(_M_emplace_back_aux): Remove definition.
* include/bits/vector.tcc (emplace_back(_Args...)):
Use _M_realloc_insert.
(insert(const_iterator, const value_type&)): Likewise.
(_M_insert_rval, _M_emplace_aux): Likewise.
(_M_emplace_back_aux): Remove declaration.
(_M_realloc_insert): Define.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc:
Adjust expected results for emplacing an lvalue with reallocation.

Tested under Linux x86_64.

Ok to commit ?

François
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 8e8aa7c..85abf4a 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -946,11 +946,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	++this->_M_impl._M_finish;
 	  }
 	else
-#if __cplusplus >= 201103L
-	  _M_emplace_back_aux(__x);
-#else
-	  _M_insert_aux(end(), __x);
-#endif
+	  _M_realloc_insert(end(), __x);
   }
 
 #if __cplusplus >= 201103L
@@ -1436,6 +1432,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   // Called by insert(p,x)
   void
   _M_insert_aux(iterator __position, const value_type& __x);
+
+  void
+  _M_realloc_insert(iterator __position, const value_type& __x);
 #else
   // A value_type object constructed with _Alloc_traits::construct()
   // and destroyed with _Alloc_traits::destroy().
@@ -1469,16 +1468,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	void
 	_M_insert_aux(iterator __position, _Arg&& __arg);
 
+  template<typename... _Args>
+	void
+	_M_realloc_insert(iterator __position, _Args&&... __args);
+
   // Either move-construct at the end, or forward to _M_insert_aux.
   iterator
   _M_insert_rval(const_iterator __position, value_type&& __v);
 
-  // Called by push_back(x) and emplace_back(args) when they need to
-  // reallocate.
-  template<typename... _Args>
-	void
-	_M_emplace_back_aux(_Args&&... __args);
-
   // Try to emplace at the end, otherwise forward to _M_insert_aux.
   template<typename... _Args>
 	iterator
diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 6e9be7f..b291e95 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -98,7 +98,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	++this->_M_impl._M_finish;
 	  }
 	else
-	  _M_emplace_back_aux(std::forward<_Args>(__args)...);
+	  _M_realloc_insert(end(), std::forward<_Args>(__args)...);
   }
 #endif
 
@@ -112,29 +112,32 @@ 

Re: [PATCH 2/2] gcc/genrecog: Don't warn for missing mode on special predicates

2016-07-06 Thread Andrew Burgess
* Richard Sandiford  [2016-07-04 09:47:20 +0100]:

> Andrew Burgess  writes:
> > +/* Return true if OPERAND is a MATCH_OPERAND using a special predicate
> > +   function.  */
> > +
> > +static bool
> > +special_predicate_operand_p (rtx operand)
> > +{
> > +  if (GET_CODE (operand) == MATCH_OPERAND)
> > +{
> > +  const char *pred_name = predicate_name (operand);
> > +  if (pred_name[0] != 0)
> > +   {
> > + const struct pred_data *pred;
> > +
> > + pred = lookup_predicate (pred_name);
> > + return pred->special;
> 
> Thanks for removing the duplicated error check for unknown predicates.
> I think that error gets reported later though, so we should check for
> null here:
> 
>   return pred && pred->special;
> 
> OK with that change, thanks.

Richard,

Thanks for the continued reviews.  I don't have GCC write access, so I
wonder if you would be willing to commit this patch for me please.

There's an updated version below that includes the latest change you
suggested.

Many thanks,
Andrew

---

[PATCH] gcc/genrecog: Don't warn for missing mode on special predicates

In md.texi it says:

  Predicates written with @code{define_special_predicate} do not get any
  automatic mode checks, and are treated as having special mode handling
  by @command{genrecog}.

In genrecog, when validating a SET pattern, there is already a special
case for 'address_operand' which is a special predicate, however,
other special predicates fall through to the code which checks for
incorrect use of VOIDmode.

This commit adds a new function for detecting special predicates, and
then generalises the check in validate_pattern so that mode checking
is skipped for all special predicates.
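
The null check requested in review matters because the lookup can fail for
an unknown predicate name.  The following is a minimal, self-contained
sketch of that pattern in C++; PredData, find_predicate and
is_special_predicate are hypothetical stand-ins invented for the example
and do not reproduce genrecog's actual data structures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for genrecog's predicate record.
struct PredData
{
  bool special;
};

using PredTable = std::map<std::string, PredData>;

// Return the record for NAME, or a null pointer if NAME is unknown.
const PredData *
find_predicate (const PredTable &table, const std::string &name)
{
  PredTable::const_iterator it = table.find (name);
  return it == table.end () ? nullptr : &it->second;
}

// True only for a known predicate that is marked special.  An empty
// or unknown name yields false rather than dereferencing null.
bool
is_special_predicate (const PredTable &table, const std::string &name)
{
  if (name.empty ())
    return false;
  const PredData *pred = find_predicate (table, name);
  return pred != nullptr && pred->special;
}
```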

gcc/ChangeLog:

* genrecog.c (special_predicate_operand_p): New function.
(predicate_name): Move function.
(validate_pattern): Don't warn about missing mode for all
define_special_predicate predicates.
---
 gcc/ChangeLog  |  7 +++
 gcc/genrecog.c | 50 +++---
 2 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index a9f5a4a..056798c 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -463,6 +463,38 @@ constraints_supported_in_insn_p (rtx insn)
   || GET_CODE (insn) == DEFINE_PEEPHOLE2);
 }
 
+/* Return the name of the predicate matched by MATCH_RTX.  */
+
+static const char *
+predicate_name (rtx match_rtx)
+{
+  if (GET_CODE (match_rtx) == MATCH_SCRATCH)
+return "scratch_operand";
+  else
+return XSTR (match_rtx, 1);
+}
+
+/* Return true if OPERAND is a MATCH_OPERAND using a special predicate
+   function.  */
+
+static bool
+special_predicate_operand_p (rtx operand)
+{
+  if (GET_CODE (operand) == MATCH_OPERAND)
+{
+  const char *pred_name = predicate_name (operand);
+  if (pred_name[0] != 0)
+   {
+ const struct pred_data *pred;
+
+ pred = lookup_predicate (pred_name);
+ return pred != NULL && pred->special;
+   }
+}
+
+  return false;
+}
+
 /* Check for various errors in PATTERN, which is part of INFO.
SET is nonnull for a destination, and is the complete set pattern.
SET_CODE is '=' for normal sets, and '+' within a context that
@@ -651,10 +683,9 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
dmode = GET_MODE (dest);
smode = GET_MODE (src);
 
-   /* The mode of an ADDRESS_OPERAND is the mode of the memory
-  reference, not the mode of the address.  */
-   if (GET_CODE (src) == MATCH_OPERAND
-   && ! strcmp (XSTR (src, 1), "address_operand"))
+   /* Mode checking is not performed for special predicates.  */
+   if (special_predicate_operand_p (src)
+   || special_predicate_operand_p (dest))
  ;
 
 /* The operands of a SET must have the same mode unless one
@@ -3788,17 +3819,6 @@ operator < (const pattern_pos , const pattern_pos )
   return diff < 0;
 }
 
-/* Return the name of the predicate matched by MATCH_RTX.  */
-
-static const char *
-predicate_name (rtx match_rtx)
-{
-  if (GET_CODE (match_rtx) == MATCH_SCRATCH)
-return "scratch_operand";
-  else
-return XSTR (match_rtx, 1);
-}
-
 /* Add new decisions to S that check whether the rtx at position POS
matches PATTERN.  Return the state that is reached in that case.
TOP_PATTERN is the overall pattern, as passed to match_pattern_1.  */
-- 
2.5.1



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-06 Thread Michael Meissner
On Tue, Jul 05, 2016 at 09:26:50PM -0500, Peter Bergner wrote:
> The following patch fixes a bug where we do not disable POWER9 vector dform
> addressing when we compile for POWER9 but without VSX support.  This
> manifested itself with us trying to use dform addressing with altivec
> loads/stores which is illegal, leading to an ICE.
> 
> This has bootstrapped and regtested with no regressions.  Ok for trunk?
> 
> This also affects the FSF 6 branch, ok there too, assuming bootstrap and
> regtesting complete cleanly?
> 
> Peter
> 
> gcc/
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Disable
>   -mpower9-dform-vector when disabling -mpower9-vector.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/pr71733.c: New test.
> 
> 
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 237945)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -4303,7 +4303,8 @@ rs6000_option_override_internal (bool gl
>  {
>if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
>   error ("-mpower9-vector requires -mpower8-vector");
> -  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
> +  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR
> + | OPTION_MASK_P9_DFORM_VECTOR);
>  }

Note, this should be

+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR_SCALAR
+   | OPTION_MASK_P9_DFORM_VECTOR);

However, we probably need to add all of the other options that depend on VSX.
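
To illustrate the point about dependent options, here is a minimal sketch of clearing every flag that depends on a base feature in one place.  The DEMO_* bit values and the demo_normalize_flags helper are hypothetical stand-ins, not the real OPTION_MASK_* definitions or anything in rs6000.c.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real OPTION_MASK_* values are defined by GCC
   and are not reproduced here.  */
#define DEMO_MASK_VSX             (UINT64_C (1) << 0)
#define DEMO_MASK_P8_VECTOR       (UINT64_C (1) << 1)
#define DEMO_MASK_P9_VECTOR       (UINT64_C (1) << 2)
#define DEMO_MASK_P9_DFORM_VECTOR (UINT64_C (1) << 3)

/* Every option that only makes sense when VSX is enabled (assumed list).  */
#define DEMO_VSX_DEPENDENTS (DEMO_MASK_P8_VECTOR \
                             | DEMO_MASK_P9_VECTOR \
                             | DEMO_MASK_P9_DFORM_VECTOR)

/* If the base feature is off, drop all of its dependents together so that
   no dependent flag can be left set by accident.  */
uint64_t
demo_normalize_flags (uint64_t flags)
{
  if (!(flags & DEMO_MASK_VSX))
    flags &= ~DEMO_VSX_DEPENDENTS;
  return flags;
}
```

Keeping the dependents in a single mask means a newly added dependent option only has to be listed once.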

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[PATCH] simplify-rtx.c: start adding selftests (v2)

2016-07-06 Thread David Malcolm
On Sun, 2016-07-03 at 18:12 +0100, Richard Sandiford wrote:
> David Malcolm  writes:
> > This patch starts adding selftests to simplify-rtx.c, to ensure
> > that
> > RTL expressions are simplified as we expect.
> >
> > It adds a new ASSERT_RTX_EQ macro that checks for pointer equality
> > of two rtx values.  If they're non-equal, it aborts, printing both
> > expressions.
>
> This might be a bit confusing when more tests are added, since
> pointer
> equality is only useful in certain specific cases (e.g. when you know
> you're dealing with CONST_INTs or pseudo registers).  How about
> making
> ASSERT_RTX_EQ check for rtx_equal_p equality and have something like
> ASSERT_RTX_PTR_EQ for cases where pointer equality really is needed?

> Also, how about using LAST_VIRTUAL_REGISTER + 1 as the base for
> register numbers?  DImode might not be valid for register 0 on
> all targets.

Thanks.  Here's an updated version which adds both ASSERT_RTX_EQ
and ASSERT_RTX_PTR_EQ.  The simplify-rtx.c tests can use the
stricter pointer equality test, so I updated them to use
ASSERT_RTX_PTR_EQ.

I added a selftest::make_test_reg to allocate pseudo regs, starting
at LAST_VIRTUAL_REGISTER + 1.
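
The difference between the two kinds of assertion can be sketched outside of GCC.  The demo_expr type and both helpers below are hypothetical and only mimic the distinction between pointer equality and rtx_equal_p-style structural equality; they are not the selftest API.

```c
#include <stdbool.h>
#include <stddef.h>

/* A toy expression node standing in for an rtx (hypothetical type).  */
struct demo_expr
{
  int code;
  long value;
  const struct demo_expr *op0;
  const struct demo_expr *op1;
};

/* Pointer equality: true only when both arguments are the same object.  */
bool
demo_ptr_eq (const struct demo_expr *a, const struct demo_expr *b)
{
  return a == b;
}

/* Structural equality: true when both trees have the same shape and
   contents, even if they are distinct objects.  */
bool
demo_struct_eq (const struct demo_expr *a, const struct demo_expr *b)
{
  if (a == b)
    return true;
  if (a == NULL || b == NULL)
    return false;
  return (a->code == b->code
          && a->value == b->value
          && demo_struct_eq (a->op0, b->op0)
          && demo_struct_eq (a->op1, b->op1));
}
```

Two separately built but identical trees pass the structural check and fail the pointer check, which is why the pointer form is only appropriate where sharing is guaranteed.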

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk if it passes config-list.mk testing?

gcc/ChangeLog:
* Makefile.in (OBJS): Add selftest-rtl.o.
* selftest-rtl.c: New file.
* selftest-run-tests.c (selftest::run_tests): Add call to
simplify_rtx_c_tests.
* selftest.c (selftest::begin_fail): New function.
(selftest::fail): Reimplement in terms of begin_fail.
(selftest::fail_formatted): Likewise.
* selftest.h (selftest::begin_fail): New declaration.
(selftest::assert_rtx_eq): New declaration.
(selftest::assert_rtx_ptr_eq): New declaration.
(selftest::simplify_rtx_c_tests): New declaration.
(ASSERT_RTX_EQ): New macro.
(ASSERT_RTX_PTR_EQ): New macro.
* simplify-rtx.c: Include selftest.h.
(selftest::make_test_reg): New function.
(selftest::test_sign_bits): New function.
(selftest::test_unary): New function.
(selftest::test_binary): New function.
(selftest::test_ternary): New function.
(selftest::run_tests_for_mode): New function.
(selftest::simplify_rtx_c_tests): New function.
---
 gcc/Makefile.in  |   1 +
 gcc/selftest-rtl.c   |  85 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.c   |  17 --
 gcc/selftest.h   |  39 ++
 gcc/simplify-rtx.c   | 138 +++
 6 files changed, 277 insertions(+), 4 deletions(-)
 create mode 100644 gcc/selftest-rtl.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ca7b1f6..de085b0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
sel-sched-ir.o \
sel-sched-dump.o \
sel-sched.o \
+   selftest-rtl.o \
selftest-run-tests.o \
sese.o \
shrink-wrap.o \
diff --git a/gcc/selftest-rtl.c b/gcc/selftest-rtl.c
new file mode 100644
index 000..20f4c21
--- /dev/null
+++ b/gcc/selftest-rtl.c
@@ -0,0 +1,85 @@
+/* Selftest support for RTL.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "selftest.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+
+#if CHECKING_P
+
+/* Helper function for selftest::assert_rtx_eq and selftest::assert_rtx_ptr_eq.
+   Print VAL_EXPECTED and VAL_ACTUAL to stderr.  */
+
+static void
+print_non_equal_rtx (rtx val_expected, rtx val_actual)
+{
+  fprintf (stderr, "  expected=%p:\n", (void *)val_expected);
+  print_rtl (stderr, val_expected);
+  fprintf (stderr, "\n  actual=%p:\n", (void *)val_actual);
+  print_rtl (stderr, val_actual);
+  fprintf (stderr, "\n");
+}
+
+/* Implementation detail of ASSERT_RTX_EQ.  If val_expected and val_actual
+   fail rtx_equal_p, print the location, "FAIL: ", and print the
+   mismatching RTL expressions to stderr, then abort.  */
+
+void
+selftest::assert_rtx_eq (const location ,
+const char *desc_expected, const char *desc_actual,
+rtx val_expected, rtx val_actual)
+{
+  if 

Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-06 Thread Peter Bergner

On 7/6/16 12:53 PM, David Edelsohn wrote:
> On Tue, Jul 5, 2016 at 10:26 PM, Peter Bergner  wrote:
>> The following patch fixes a bug where we do not disable POWER9 vector dform
>> addressing when we compile for POWER9 but without VSX support.  This manifested
>> itself with us trying to use dform addressing with altivec loads/stores
>> which is illegal, leading to an ICE.
>
> Peter,
>
> DFORM definitely should be disabled without VSX, but the patch seems
> incomplete.  If VSX and DFORM are enabled, and GCC chooses an Altivec
> instruction alternative in a pattern, what is to prevent the
> generation of a DFORM address?


That's a good question.  I'm currently attempting to find out why we
seem to think reg+offset is ok.  With -mcpu=power8 -mno-vsx, we use
reg+reg addressing right from expand.  With -mcpu=power9 -mno-vsx,
we use reg+offset right from expand.  I'll see where the disconnect
is happening.

Peter





Re: [v3 PATCH] Add a new header for diagnosing the use of C++17 facilities in pre-C++17 modes.

2016-07-06 Thread Jonathan Wakely

On 06/07/16 20:11 +0300, Ville Voutilainen wrote:
>     Add a new header for diagnosing the use of C++17 facilities
>     in pre-C++17 modes.
>     * include/bits/c++17_warning.h: New.
>
> Urgh, the test in the header is completely wrong. Fixed patch attached.


OK for trunk.



Re: [PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-06 Thread David Edelsohn
On Tue, Jul 5, 2016 at 10:26 PM, Peter Bergner  wrote:
> The following patch fixes a bug where we do not disable POWER9 vector dform
> addressing when we compile for POWER9 but without VSX support.  This
> manifested itself with us trying to use dform addressing with altivec
> loads/stores which is illegal, leading to an ICE.

Peter,

DFORM definitely should be disabled without VSX, but the patch seems
incomplete.  If VSX and DFORM are enabled, and GCC chooses an Altivec
instruction alternative in a pattern, what is to prevent the
generation of a DFORM address?

Thanks, David


Go patch committed: Implement escape analysis tag phase

2016-07-06 Thread Ian Lance Taylor
This patch by Chris Manghane implements the tag phase in escape
analysis.  Escape analysis is still not enabled by default.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 237453)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-1f2f2c77c7ec92efa254e07162a8fc0d22a550e7
+c8fdad389ce6f439a02fb654d231053b47ff4e02
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 237453)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -2194,15 +2194,103 @@ Gogo::propagate_escape(Escape_context* c
 }
 }
 
+class Escape_analysis_tag
+{
+ public:
+  Escape_analysis_tag(Escape_context* context)
+: context_(context)
+  { }
+
+  // Add notes to the function's type about the escape information of its
+  // input parameters.
+  void
+  tag(Named_object* fn);
+
+ private:
+  Escape_context* context_;
+};
+
+void
+Escape_analysis_tag::tag(Named_object* fn)
+{
+  // External functions are assumed unsafe
+  // unless //go:noescape is given before the declaration.
+  if (fn->package() != NULL || !fn->is_function())
+{
+  // TODO(cmang): Implement //go:noescape directive for external functions;
+  // mark input parameters as not escaping.
+  return;
+}
+
+  Function_type* fntype = fn->func_value()->type();
+  Bindings* bindings = fn->func_value()->block()->bindings();
+
+  if (fntype->is_method()
+  && !fntype->receiver()->name().empty()
+  && !Gogo::is_sink_name(fntype->receiver()->name()))
+{
+  Named_object* rcvr_no = bindings->lookup(fntype->receiver()->name());
+  go_assert(rcvr_no != NULL);
+  Node* rcvr_node = Node::make_node(rcvr_no);
+  switch ((rcvr_node->encoding() & ESCAPE_MASK))
+   {
+   case Node::ESCAPE_NONE: // not touched by flood
+   case Node::ESCAPE_RETURN:
+ if (fntype->receiver()->type()->has_pointer())
+   // Don't bother tagging for scalars.
+   fntype->add_receiver_note(rcvr_node->encoding());
+ break;
+
+   case Node::ESCAPE_HEAP: // flooded, moved to heap.
+   case Node::ESCAPE_SCOPE: // flooded, value leaves scope.
+ break;
+
+   default:
+ break;
+   }
+}
+
+  int i = 0;
+  if (fntype->parameters() != NULL)
+{
+  const Typed_identifier_list* til = fntype->parameters();
+  for (Typed_identifier_list::const_iterator p = til->begin();
+  p != til->end();
+  ++p, ++i)
+   {
+ if (p->name().empty() || Gogo::is_sink_name(p->name()))
+   continue;
+
+ Named_object* param_no = bindings->lookup(p->name());
+ go_assert(param_no != NULL);
+ Node* param_node = Node::make_node(param_no);
+ switch ((param_node->encoding() & ESCAPE_MASK))
+   {
+   case Node::ESCAPE_NONE: // not touched by flood
+   case Node::ESCAPE_RETURN:
+ if (p->type()->has_pointer())
+   // Don't bother tagging for scalars.
+   fntype->add_parameter_note(i, param_node->encoding());
+ break;
+
+   case Node::ESCAPE_HEAP: // flooded, moved to heap.
+   case Node::ESCAPE_SCOPE: // flooded, value leaves scope.
+ break;
+
+   default:
+ break;
+   }
+   }
+}
+  fntype->set_is_tagged();
+}
 
 // Tag each top-level function with escape information that will be used to
 // retain analysis results across imports.
 
 void
-Gogo::tag_function(Escape_context*, Named_object*)
+Gogo::tag_function(Escape_context* context, Named_object* fn)
 {
-  // TODO(cmang): Create escape information notes for each input and output
-  // parameter in a given function.
-  // Escape_analysis_tag eat(context, fn);
-  // this->traverse();
+  Escape_analysis_tag eat(context);
+  eat.tag(fn);
 }
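
The decision made in the two switch statements above can be summarized in a small standalone sketch.  The DEMO_* constants and demo_should_tag are hypothetical; the numeric values are made up for illustration and are not the gofrontend encodings.

```c
#include <stdbool.h>

/* Hypothetical escape levels and mask; the values are illustrative only.  */
enum demo_escape
{
  DEMO_ESCAPE_NONE = 1,   /* not touched by flood */
  DEMO_ESCAPE_RETURN = 2, /* flows to a result */
  DEMO_ESCAPE_SCOPE = 3,  /* value leaves scope */
  DEMO_ESCAPE_HEAP = 4    /* moved to heap */
};
#define DEMO_ESCAPE_MASK 0x7u

/* Return true if a note should be recorded for a receiver or parameter
   with the given ENCODING.  Only non-escaping or return-only values are
   tagged, and scalars (no pointers) are skipped.  */
bool
demo_should_tag (unsigned encoding, bool has_pointer)
{
  switch (encoding & DEMO_ESCAPE_MASK)
    {
    case DEMO_ESCAPE_NONE:
    case DEMO_ESCAPE_RETURN:
      return has_pointer;

    default:
      return false;
    }
}
```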


Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks

2016-07-06 Thread Bernd Schmidt

On 07/05/2016 12:41 PM, Richard Biener wrote:

On Fri, Jul 1, 2016 at 3:10 PM, Manish Goregaokar  wrote:

Added a test:


Ok if this passed bootstrap/regtest.



+  return flag_delete_null_pointer_checks
+&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p)
+|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p));
 case PLUS_EXPR:


But please fix the wrapping - multi-line expressions like this should be
enclosed in parentheses to make the editor deal with them correctly.
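
As a sketch of the requested layout, the same shape of expression with the outer parentheses added might look like this.  The function and its boolean parameters are hypothetical stand-ins for the flag and the two tree_expr_nonzero_warnv_p calls in the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the quoted return statement.  The whole
   multi-line expression is wrapped in parentheses so that an editor
   indents the continuation lines under the opening parenthesis.  */
bool
demo_nonzero_p (bool delete_null_checks, bool op0_nonzero, bool op1_nonzero)
{
  return (delete_null_checks
          && (op0_nonzero
              || op1_nonzero));
}
```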



Bernd



Re: [v3 PATCH] Add a new header for diagnosing the use of C++17 facilities in pre-C++17 modes.

2016-07-06 Thread Ville Voutilainen
On 6 July 2016 at 17:44, Ville Voutilainen  wrote:
> 2016-07-06  Ville Voutilainen  
>
> Add a new header for diagnosing the use of C++17 facilities
> in pre-C++17 modes.
> * include/bits/c++17_warning.h: New.

Urgh, the test in the header is completely wrong. Fixed patch attached.
diff --git a/libstdc++-v3/include/bits/c++17_warning.h b/libstdc++-v3/include/bits/c++17_warning.h
new file mode 100644
index 000..66ac196
--- /dev/null
+++ b/libstdc++-v3/include/bits/c++17_warning.h
@@ -0,0 +1,37 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/c++17_warning.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{iosfwd}
+ */
+
+#ifndef _CXX17_WARNING_H
+#define _CXX17_WARNING_H 1
+
+#if __cplusplus <= 201402L
+#error This file requires compiler and library support \
+for the ISO C++ 2017 standard. This support must be enabled \
+with the -std=c++17 or -std=gnu++17 compiler options.
+#endif
+
+#endif


Re: [PATCH, ARM 6/7, ping1] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline

2016-07-06 Thread Thomas Preudhomme
On Friday 20 May 2016 14:22:48 Kyrill Tkachov wrote:
> Hi Thomas,
> 

> 
> Hmm, I'm not a fan of this change. arm_print_operand_punct_valid_p is an
> implementation of a target hook that is used to validate user-provided
> inline asm as well and is therefore the right place to reject such invalid
> constructs.
> 
> This is just working around the fact that the output template for the
> [u]divsi3 patterns has a '%?' in it that is illegal in Thumb1 and will not
> be used for ARMv8-M Baseline anyway. I'd prefer it if you add a second
> alternative to those patterns and emit the sdiv/udiv mnemonic without the
> '%?' and enable that for the v8mb arch attribute (and mark the existing
> alternative as requiring the "32" arch attribute).

Fixed.

> 
> s/TARGET_HAVE_MOVT/TARGET_HAVE_CBZ/

Likewise.

> > +  [(set (attr "far_jump")
> > +   (if_then_else
> > +   (eq_attr "length" "8")
> > +   (const_string "yes")
> > +   (const_string "no")))
> > +   (set (attr "length")
> > +   (if_then_else
> > +   (and (ge (minus (match_dup 2) (pc)) (const_int 2))
> > +(le (minus (match_dup 2) (pc)) (const_int 128))
> > +(not (match_test "which_alternative")))
> 
> This pattern only has one alternative so "which_alternative"
> will always be 0, so the (not (match_test "which_alternative"))
> test inside the 'and' is redundant and can be removed.

Ditto.

Please find updated ChangeLog entries:

*** gcc/ChangeLog ***

2016-05-23  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HAVE_CBZ): Define.
(TARGET_IDIV): Set for all Thumb targets provided they have hardware
divide feature.
* config/arm/arm.md (divsi3): New unpredicable alternative for ARMv8-M
Baseline.  Make initial alternative TARGET_32BIT only.
(udivsi3): Likewise.
* config/arm/thumb1.md (thumb1_cbz): New insn.
* doc/sourcebuild.texi (arm_thumb1_cbz_ok): Document new effective
target.


*** gcc/testsuite/ChangeLog ***

2016-05-23  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_thumb1_cbz_ok):
Add new arm_thumb1_cbz_ok effective target.
* gcc.target/arm/cbz.c: New test.


Updated patch in attachment.

Best regards,

Thomas

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 5f8cfb07a841f7da39cd9e1c2c675dddb807a64f..4b8b7b6b96bdb697a3856ce4f56656b572c6bd22 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -271,9 +271,12 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* Nonzero if this chip provides the MOVW and MOVT instructions.  */
 #define TARGET_HAVE_MOVT	(arm_arch_thumb2 || arm_arch8)
 
+/* Nonzero if this chip provides the CBZ and CBNZ instructions.  */
+#define TARGET_HAVE_CBZ		(arm_arch_thumb2 || arm_arch8)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV	((TARGET_ARM && arm_arch_arm_hwdiv)	\
-			 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
+			 || (TARGET_THUMB && arm_arch_thumb_hwdiv))
 
 /* Nonzero if disallow volatile memory access in IT block.  */
 #define TARGET_NO_VOLATILE_CE		(arm_arch_no_volatile_ce)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 3f97c2ad68661e6a0f52ce4fb89f52ab73943a6e..b94c3626d4e82907eb9f81b9d56fc2ea006f9e08 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4333,23 +4333,29 @@
 
 ;; Division instructions
 (define_insn "divsi3"
-  [(set (match_operand:SI	  0 "s_register_operand" "=r")
-	(div:SI (match_operand:SI 1 "s_register_operand"  "r")
-		(match_operand:SI 2 "s_register_operand"  "r")))]
+  [(set (match_operand:SI	  0 "s_register_operand" "=r,r")
+	(div:SI (match_operand:SI 1 "s_register_operand"  "r,r")
+		(match_operand:SI 2 "s_register_operand"  "r,r")))]
   "TARGET_IDIV"
-  "sdiv%?\t%0, %1, %2"
-  [(set_attr "predicable" "yes")
+  "@
+   sdiv%?\t%0, %1, %2
+   sdiv\t%0, %1, %2"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
(set_attr "type" "sdiv")]
 )
 
 (define_insn "udivsi3"
-  [(set (match_operand:SI	   0 "s_register_operand" "=r")
-	(udiv:SI (match_operand:SI 1 "s_register_operand"  "r")
-		 (match_operand:SI 2 "s_register_operand"  "r")))]
+  [(set (match_operand:SI	   0 "s_register_operand" "=r,r")
+	(udiv:SI (match_operand:SI 1 "s_register_operand"  "r,r")
+		 (match_operand:SI 2 "s_register_operand"  "r,r")))]
   "TARGET_IDIV"
-  "udiv%?\t%0, %1, %2"
-  [(set_attr "predicable" "yes")
+  "@
+   udiv%?\t%0, %1, %2
+   udiv\t%0, %1, %2"
+  [(set_attr "arch" "32,v8mb")
+   (set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
(set_attr "type" "udiv")]
 )
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 47e569d0c259cd17d86a03061e5b47b3dab4579f..0a8c364cc01aad34849f57879f9f18f9b92851c0 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -973,6 +973,91 @@
   DONE;
 })
 
+;; A 

Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Jan Hubicka
> > Yeah, I think the other comment should be adjusted accordingly.  I
> > didn't remember we have that one extra bit either ... ;)  (given wide-ints
> > have unsigned variants of ops I wonder if it is really necessary, but who
> > knows - the wide-int rep w/o a sign is really sth odd and I blame RTL 
> > for it).

Given that actual instructions do not have signed/unsigned types on them, and
RTL is an abstract model of target machine language, sign-less operations do
make sense to me. At the GIMPLE level we could probably wrap this further and
make the sign part of the type, and also have signed/unsigned types so one
can write normal C expressions rather than looking up the wi:: API to do
operations where sign matters.
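
A small sketch of why the sign only matters for some operations: on a two's complement machine, addition produces the same bits for both interpretations, while comparison does not.  The demo_* helpers are hypothetical and have nothing to do with the wi:: interface itself.

```c
#include <stdint.h>

/* Addition is the same bit-level operation whether the operands are
   viewed as signed or unsigned (wrap-around arithmetic).  */
uint32_t
demo_add_bits (uint32_t a, uint32_t b)
{
  return a + b;
}

/* Comparison depends on the interpretation of the bits.  The conversion
   to int32_t of an out-of-range value is implementation-defined in C;
   GCC defines it as reduction modulo 2^32, which is assumed here.  */
int
demo_lt_signed (uint32_t a, uint32_t b)
{
  return (int32_t) a < (int32_t) b;
}

int
demo_lt_unsigned (uint32_t a, uint32_t b)
{
  return a < b;
}
```

With a = 0xFFFFFFFF and b = 1, the signed comparison sees -1 < 1 while the unsigned one sees 4294967295 < 1, so the operation has to carry the sign when the value does not.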

The consequences of the lack of sign were definitely dragging me away from
wide ints for a while. After finally giving it some thought this weekend, it
makes sense, but it is not obvious.
> 
> Hmm, yeah.  We definitely need the extra bit for widest_int, but I'm
> not sure why we need it for wide_int.

Yep, the extra bit in wide int seems useless, at least from a high-level POV :)

Honza
> 
> Thanks,
> Richard


Re: [PATCH, libgcc/ARM 1a/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-07-06 Thread Ramana Radhakrishnan
On Fri, Jun 17, 2016 at 6:21 PM, Thomas Preudhomme
 wrote:
> On Wednesday 01 June 2016 10:00:52 Ramana Radhakrishnan wrote:
>> Please fix up the macros, post back and redo the test. Otherwise this
>> is ok from a quick read.
>
> What about the updated patch in attachment? As for the original patch, I've
> checked that code generation does not change for a number of combinations of
> ISAs (ARM/Thumb), optimization levels (Os/O2), and architectures (armv4,
> armv4t, armv5, armv5t, armv5te, armv6, armv6j, armv6k, armv6s-m, armv6kz,
> armv6t2, armv6z, armv6zk, armv7, armv7-a, armv7e-m, armv7-m, armv7-r, armv7ve,
> armv8-a, armv8-a+crc, iwmmxt and iwmmxt2).
>
> Note, I renumbered this patch 1a to not make the numbering of other patches
> look strange. The CLZ part is now in patch 1b/7.
>
> ChangeLog entries are now as follow:
>
>
> *** gcc/ChangeLog ***
>
> 2016-05-23  Thomas Preud'homme  
>
> * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
> decide whether to prevent some libgcc routines being included for some
> multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
> link between this condition and the one in
> libgcc/config/arm/lib1func.S.
>
>
> *** gcc/testsuite/ChangeLog ***
>
> 2015-11-10  Thomas Preud'homme  
>
> * lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
> __ARM_ARCH_ISA_ARM to test for Cortex-M devices.
>
>
> *** libgcc/ChangeLog ***
>
> 2016-06-01  Thomas Preud'homme  
>
> * config/arm/bpabi-v6m.S: Clarify what architectures is the
> implementation suitable for.
> * config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
> for all Thumb-1 only targets.
> (NOT_ISA_TARGET_32BIT): Define for Thumb-1 only targets.
> (THUMB_LDIV0): Test for NOT_ISA_TARGET_32BIT rather than
> __ARM_ARCH_6M__.
> (EQUIV): Likewise.
> (ARM_FUNC_ALIAS): Likewise.
> (umodsi3): Add check to __ARM_ARCH_ISA_THUMB != 1 to guard the idiv
> version.
> (modsi3): Likewise.
> (clzsi2): Test for NOT_ISA_TARGET_32BIT rather than __ARM_ARCH_6M__.
> (clzdi2): Likewise.
> (ctzsi2): Likewise.
> (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
> __ARM_ARCH_6M__ in guard for checking whether it is defined.
> (final includes): Test for NOT_ISA_TARGET_32BIT rather than
> __ARM_ARCH_6M__ and add comment to indicate the connection between
> this condition and the one in gcc/config/arm/elf.h.
> * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
> __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
> * config/arm/t-softfp: Likewise.
>
>
> Best regards,
>
> Thomas

OK - thanks for your patience and sorry about the delay.

Ramana


Re: [PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-07-06 Thread Ramana Radhakrishnan
On Mon, Jun 27, 2016 at 5:51 PM, Thomas Preudhomme
 wrote:
> Hi Ramana,
>
> On Wednesday 01 June 2016 10:00:52 Ramana Radhakrishnan wrote:
>>
>> From here down to 
>>
>> > -#if ((__ARM_ARCH__ > 5) && !defined(__ARM_ARCH_6M__)) \
>> > -|| defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
>> > -|| defined(__ARM_ARCH_5TEJ__)
>> > -#define HAVE_ARM_CLZ 1
>> > -#endif
>> > -
>> >
>> >  #ifdef L_clzsi2
>> >
>> > -#if defined(__ARM_ARCH_6M__)
>> > +#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
>> >
>> >  FUNC_START clzsi2
>> >
>> > mov r1, #28
>> > mov r3, #1
>> >
>> > @@ -1544,7 +1538,7 @@ FUNC_START clzsi2
>> >
>> > FUNC_END clzsi2
>> >
>> >  #else
>> >  ARM_FUNC_START clzsi2
>> >
>> > -# if defined(HAVE_ARM_CLZ)
>> > +# if defined(__ARM_FEATURE_CLZ)
>> >
>> > clz r0, r0
>> > RET
>> >
>> >  # else
>> >
>> > @@ -1568,15 +1562,15 @@ ARM_FUNC_START clzsi2
>> >
>> >  .align 2
>> >  1:
>> >  .byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
>> >
>> > -# endif /* !HAVE_ARM_CLZ */
>> > +# endif /* !__ARM_FEATURE_CLZ */
>> >
>> > FUNC_END clzsi2
>> >
>> >  #endif
>> >  #endif /* L_clzsi2 */
>> >
>> >  #ifdef L_clzdi2
>> >
>> > -#if !defined(HAVE_ARM_CLZ)
>> > +#if !defined(__ARM_FEATURE_CLZ)
>>
>> here should be its own little patchlet and can go in separately.
>
> The patch in attachment changes the CLZ availability check in libgcc to test
> ISA supported and architecture version rather than encode a specific list of
> architectures. __ARM_FEATURE_CLZ is not used because its value depends on what
> mode the user is targeting but only the architecture support matters in this
> case. Indeed, the code using CLZ is written in assembler and uses mnemonics
> available both in ARM and Thumb mode so only CLZ availability in one of the
> mode matters.
>
> This change was split out from [PATCH, GCC, ARM 1/7] Fix Thumb-1 only ==
> ARMv6-M & Thumb-2 only == ARMv7-M assumptions.
>
> ChangeLog entry is as follows:
>
> *** libgcc/ChangeLog ***
>
> 2016-06-16  Thomas Preud'homme  
>
> * config/arm/lib1funcs.S (HAVE_ARM_CLZ): Define for ARMv6* or later
> and ARMv5t* rather than for a fixed list of architectures.
>
> Looking for code generation changes across a number of combinations of ISAs
> (ARM/Thumb), optimization levels (Os/O2), and architectures (armv4, armv4t,
> armv5, armv5t, armv5te, armv6, armv6j, armv6k, armv6s-m, armv6kz, armv6t2,
> armv6z, armv6zk, armv7, armv7-a, armv7e-m, armv7-m, armv7-r, armv7ve, armv8-a,
> armv8-a+crc, iwmmxt and iwmmxt2) shows that only ARMv5T is impacted (uses CLZ
> now). This is expected because currently HAVE_ARM_CLZ is not defined for this
> architecture while the ARMv7-A/ARMv7-R Architecture Reference Manual [1]
> states that all ARMv5T* architectures have CLZ. ARMv5E should also be impacted
> (not using CLZ anymore) but testing it is difficult since current binutils
> does not support ARMv5E.
>
> [1] Document ARM DDI0406C in http://infocenter.arm.com
>
> Best regards,
>
> Thomas



OK.

Ramana
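
For readers following the CLZ discussion, here is a rough C sketch of the kind of software fallback that is used when no CLZ instruction is available.  It reuses the 16-entry table from the quoted hunk (4, 3, 2, 2, 1, 1, 1, 1, 0, ...); the narrowing steps and the demo_soft_clz32 name are assumptions made for illustration, not a transcription of lib1funcs.S.

```c
#include <stdint.h>

/* Leading-zero count of each 4-bit value, as in the quoted .byte table.  */
static const uint8_t demo_clz_nibble[16] =
  { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

/* Count leading zeros of a 32-bit value without a CLZ instruction.
   Narrow the value down to its most significant non-zero nibble, tracking
   how many bits sit above that nibble, then finish with a table lookup.
   Returns 32 for an input of zero.  */
unsigned
demo_soft_clz32 (uint32_t x)
{
  unsigned n = 28;

  if (x >= (UINT32_C (1) << 16))
    {
      x >>= 16;
      n -= 16;
    }
  if (x >= (UINT32_C (1) << 8))
    {
      x >>= 8;
      n -= 8;
    }
  if (x >= (UINT32_C (1) << 4))
    {
      x >>= 4;
      n -= 4;
    }
  return n + demo_clz_nibble[x];
}
```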


Re: [PATCH][ARM] Add support for some ARMv8-A cores to driver-arm.c

2016-07-06 Thread Ramana Radhakrishnan
On Wed, Jun 22, 2016 at 10:38 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This patch adds entries to the arm_cpu_table in driver-arm.c to enable it to
> perform native CPU detection
> on some aarch32 ARMv8-A systems. The cores added are Cortex-A32, Cortex-A35,
> Cortex-A53, Cortex-A57,
> Cortex-A72, Cortex-A73.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
>
> Ok for trunk?
>

OK.

Ramana
> Thanks,
> Kyrill
>
> 2016-06-22  Kyrylo Tkachov  
>
> * config/arm/driver-arm.c (arm_cpu_table): Add entries for cortex-a32,
> cortex-a35, cortex-a53, cortex-a57, cortex-a72, cortex-a73.


Re: [AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-06 Thread Christophe Lyon
On 6 July 2016 at 17:44, Kyrill Tkachov  wrote:
> Hi all,
>
>
> On 06/07/16 16:29, James Greenhalgh wrote:
>>
>> On Wed, Jul 06, 2016 at 02:11:51PM +0100, Jiong Wang wrote:
>>>
>>> The current vmaxnm/vminnm float intrinsics are implemented using
>>> __builtin_aarch64_smax/min  which are mapping to backend patterns
>>> using smin/smax rtl operators.  However as documented in rtl.def
>>>
>>>    "Further, if both operands are zeros, or if either operand is NaN,
>>>    then it is unspecified which of the two operands is returned as the
>>>    result."
>>>
>>> There is no guarantee that a number will always be returned through
>>> the smin/smax operators, and further tests show gcc will optimize something
>>> like smin (1.0f, NaN) to NaN, so currently the vmaxnm and vminnm intrinsics
>>> will eventually fail the newly added testcases included in this patch.
>>>
>>> This patch:
>>>
>>>    * Migrate vminnm/vmaxnm float intrinsics to "3" pattern
>>>      which guarantee fminnm/fmaxnm semantics.
>>>
>>>    * Add new testcases for vminnm and vmaxnm intrinsics which were
>>>      missing previously.  They are marked as XFAIL on arm*-*-* as ARM
>>>      hasn't implemented these intrinsics.
>>>
>>> OK for trunk?
>>
>> The AArch64 parts are OK. I can't remember whether the ARM port prefers
>> to have missing intrinsics XFAIL'd or if there is another way to disable
>> the tests that are not supported there. Kyrill/Christophe would you mind
>> commenting on whether this patch is correct for the intrinsics testsuite?
>>
>> Thanks,
>> James
>>
>>>
>>> 2016-07-06  Jiong Wang  
>>>
>>> gcc/
>>>* config/aarch64/aarch64-simd-builtins.def (smax): Remove float
>>> variants.
>>>(smin): Likewise.
>>>(fmax): New entry.
>>>(fmin): Likewise.
>>>* config/aarch64/arm_neon.h (vmaxnm_f32): Use
>>> __builtin_aarch64_fmaxv2sf.
>>>(vmaxnmq_f32): Likewise.
>>>(vmaxnmq_f64): Likewise.
>>>(vminnm_f32): Likewise.
>>>(vminnmq_f32): Likewise.
>>>(vminnmq_f64): Likewise.
>
>
> These intrinsics are supposed to be available for arm as well *except* for
> vminnmq_f64, vmaxnmq_f64.
>
I missed that point.
So, I agree with Kyrill:
- skip the ones that aren't supposed to be available for arm
- xfail the ones that aren't implemented yet.

Christophe


> For the intrinsics that should be implemented but aren't, can you please
> file a bug report?
> I see your patch doesn't xfail the test on arm, just skips it (so it appears
> unsupported).
>
> My preferred course of action is to guard the vminnmq_f64, vmaxnmq_f64 parts
> of the test with #ifdef __aarch64__ and xfail the whole test for arm, with
> something like this:
>
> { dg-xfail-if "Intrinsics not yet implemented on arm  PR>" { arm*-*-* } }
>
> I do believe an XFAIL is appropriate here as the intrinsics should exist on
> arm, but don't currently due to a missed-implementation bug.
>
> Thanks,
> Kyrill
>
>
>>> gcc/testsuite/
>>>* gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: Support
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: Define
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vhsub.c: Define
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vmax.c: Define
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vmin.c: Define
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define
>>> HAS_INTEGER_VARIANT.
>>>* gcc.target/aarch64/advsimd-intrinsics/vmaxnm.c: New.
>>>* gcc.target/aarch64/advsimd-intrinsics/vminnm.c: New.
>
>


Re: [AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-06 Thread Kyrill Tkachov

Hi all,

On 06/07/16 16:29, James Greenhalgh wrote:

On Wed, Jul 06, 2016 at 02:11:51PM +0100, Jiong Wang wrote:

The current vmaxnm/vminnm float intrinsics are implemented using
__builtin_aarch64_smax/min  which are mapping to backend patterns
using smin/smax rtl operators.  However as documented in rtl.def

   "Further, if both operands are zeros, or if either operand is NaN, then
   it is unspecified which of the two operands is returned as the result."

There is no guarantee that a number will always be returned by the
smin/smax operators, and further tests show gcc will optimize something
like smin (1.0f, NaN) to NaN, so currently the vmaxnm and vminnm intrinsics
will fail the newly added testcases included in this patch.

This patch:

   * Migrate vminnm/vmaxnm float intrinsics to "3" pattern
 which guarantees fminnm/fmaxnm semantics.

   * Add new testcases for vminnm and vmaxnm intrinsics which were missing
 previously.  They are marked as XFAIL on arm*-*-* as ARM hasn't
 implemented these intrinsics.
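
To illustrate the NaN issue described above, here is a minimal C sketch.  It
is not the GCC implementation: cmp_min is a hypothetical comparison-based
minimum standing in for an operator whose NaN result is unspecified, and
min_num is a hypothetical helper that mimics the rule of returning the
numeric operand when exactly one operand is a NaN.

```c
#include <assert.h>
#include <math.h>   /* NAN macro only; no libm functions are called */

/* Hypothetical comparison-based minimum.  With a NaN operand the
   comparison is false, so the result depends on operand order.  */
float cmp_min (float a, float b)
{
  return a < b ? a : b;
}

/* Hypothetical helper mimicking the "number wins" rule: if exactly one
   operand is NaN, return the other one.  */
float min_num (float a, float b)
{
  if (a != a)   /* a is NaN */
    return b;
  if (b != b)   /* b is NaN */
    return a;
  return a < b ? a : b;
}

/* Small self-check showing the difference between the two.  */
void demo_nan_min (void)
{
  float r = cmp_min (1.0f, NAN);
  assert (r != r);                       /* NaN leaks through */
  assert (cmp_min (NAN, 1.0f) == 1.0f);  /* result is order-dependent */
  assert (min_num (1.0f, NAN) == 1.0f);  /* the number is returned */
  assert (min_num (NAN, 1.0f) == 1.0f);
}
```

Calling demo_nan_min () from a test driver exercises both helpers; signed
zeros are deliberately not covered by this sketch.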

OK for trunk?

The AArch64 parts are OK. I can't remember whether the ARM port prefers
to have missing intrinsics XFAIL'd or if there is another way to disable
the tests that are not supported there. Kyrill/Christophe would you mind
commenting on whether this patch is correct for the intrinsics testsuite?

Thanks,
James
  

2016-07-06  Jiong Wang  

gcc/
   * config/aarch64/aarch64-simd-builtins.def (smax): Remove float variants.
   (smin): Likewise.
   (fmax): New entry.
   (fmin): Likewise.
   * config/aarch64/arm_neon.h (vmaxnm_f32): Use __builtin_aarch64_fmaxv2sf.
   (vmaxnmq_f32): Likewise.
   (vmaxnmq_f64): Likewise.
   (vminnm_f32): Likewise.
   (vminnmq_f32): Likewise.
   (vminnmq_f64): Likewise.


These intrinsics are supposed to be available for arm as well *except* for
vminnmq_f64, vmaxnmq_f64.

For the intrinsics that should be implemented but aren't, can you please file
a bug report?
I see your patch doesn't xfail the test on arm, just skips it (so it appears
unsupported).

My preferred course of action is to guard the vminnmq_f64, vmaxnmq_f64 parts
of the test with #ifdef __aarch64__ and xfail the whole test for arm, with
something like this:

{ dg-xfail-if "Intrinsics not yet implemented on arm " { arm*-*-* } }

I do believe an XFAIL is appropriate here as the intrinsics should exist on
arm, but don't currently due to a missed-implementation bug.
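
A rough skeleton of what this suggestion could look like in a test file.  The
directive text and the check_* helpers are placeholders made up for
illustration (a real test would call the vminnm/vmaxnm intrinsics from
arm_neon.h), and the PR reference is deliberately left out.

```c
/* { dg-do run } */
/* { dg-xfail-if "placeholder: intrinsics not yet implemented on arm" { arm*-*-* } } */

#include <assert.h>

/* Placeholder for the checks that apply to both arm and aarch64
   (e.g. the f32 variants).  Returns the number of checks run.  */
int check_f32_variants (void)
{
  return 2;   /* stand-in for two f32 checks */
}

#ifdef __aarch64__
/* Placeholder for the checks that only exist on aarch64
   (the f64 q-register variants).  */
int check_f64_variants (void)
{
  return 1;   /* stand-in for one f64 check */
}
#endif

/* Runs every check available on the current target.  */
int run_all_checks (void)
{
  int n = check_f32_variants ();
#ifdef __aarch64__
  n += check_f64_variants ();
#endif
  return n;
}
```

The point of the layout is that the arm run still compiles and executes the
shared part, so the XFAIL turns into an XPASS once the intrinsics appear.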

Thanks,
Kyrill


gcc/testsuite/
   * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: Support 
HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: Define HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: Define HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vmax.c: Define HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vmin.c: Define HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define HAS_INTEGER_VARIANT.
   * gcc.target/aarch64/advsimd-intrinsics/vmaxnm.c: New.
   * gcc.target/aarch64/advsimd-intrinsics/vminnm.c: New.




Re: [AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-06 Thread Christophe Lyon
On 6 July 2016 at 17:29, James Greenhalgh  wrote:
> On Wed, Jul 06, 2016 at 02:11:51PM +0100, Jiong Wang wrote:
>> The current vmaxnm/vminnm float intrinsics are implemented using
>> __builtin_aarch64_smax/min, which map to backend patterns
>> using the smin/smax rtl operators.  However, as documented in rtl.def:
>>
>>   "Further, if both operands are zeros, or if either operand is NaN, then
>>   it is unspecified which of the two operands is returned as the result."
>>
>> There is no guarantee that a number will always be returned by the
>> smin/smax operators, and further tests show gcc will optimize something
>> like smin (1.0f, NaN) to NaN, so currently the vmaxnm and vminnm intrinsics
>> will fail the newly added testcases included in this patch.
>>
>> This patch:
>>
>>   * Migrate vminnm/vmaxnm float intrinsics to "3" pattern
>> which guarantees fminnm/fmaxnm semantics.
>>
>>   * Add new testcases for vminnm and vmaxnm intrinsics which were missing
>> previously.  They are marked as XFAIL on arm*-*-* as ARM hasn't
>> implemented these intrinsics.
>>
>> OK for trunk?
>
> The AArch64 parts are OK. I can't remember whether the ARM port prefers
> to have missing intrinsics XFAIL'd or if there is another way to disable
> the tests that are not supported there. Kyrill/Christophe would you mind
> commenting on whether this patch is correct for the intrinsics testsuite?
>

Are they really XFAIL? The patch has dg-skip-if "arm*-*-*".

FWIW, there are currently 2 tests with such a dg-skip-if directive.

Other tests which depend on the target have:
dg-require-effective-target arm_crypto_ok
dg-require-effective-target arm_neon_fp16_hw { target { arm*-*-* } }
dg-require-effective-target arm_v8_1a_neon_hw

So I think the dg-skip-if directive this patch contains is OK.

Christophe

> Thanks,
> James
>
>> 2016-07-06  Jiong Wang  
>>
>> gcc/
>>   * config/aarch64/aarch64-simd-builtins.def (smax): Remove float variants.
>>   (smin): Likewise.
>>   (fmax): New entry.
>>   (fmin): Likewise.
>>   * config/aarch64/arm_neon.h (vmaxnm_f32): Use __builtin_aarch64_fmaxv2sf.
>>   (vmaxnmq_f32): Likewise.
>>   (vmaxnmq_f64): Likewise.
>>   (vminnm_f32): Likewise.
>>   (vminnmq_f32): Likewise.
>>   (vminnmq_f64): Likewise.
>>
>> gcc/testsuite/
>>   * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: Support 
>> HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: Define 
>> HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define 
>> HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: Define 
>> HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vmax.c: Define HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vmin.c: Define HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define 
>> HAS_INTEGER_VARIANT.
>>   * gcc.target/aarch64/advsimd-intrinsics/vmaxnm.c: New.
>>   * gcc.target/aarch64/advsimd-intrinsics/vminnm.c: New.
>


Re: [AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-06 Thread James Greenhalgh
On Wed, Jul 06, 2016 at 02:11:51PM +0100, Jiong Wang wrote:
> The current vmaxnm/vminnm float intrinsics are implemented using
> __builtin_aarch64_smax/min, which map to backend patterns
> using the smin/smax rtl operators.  However, as documented in rtl.def:
> 
>   "Further, if both operands are zeros, or if either operand is NaN, then
>   it is unspecified which of the two operands is returned as the result."
> 
> There is no guarantee that a number will always be returned by the
> smin/smax operators, and further tests show gcc will optimize something
> like smin (1.0f, NaN) to NaN, so currently the vmaxnm and vminnm intrinsics
> will fail the newly added testcases included in this patch.
> 
> This patch:
> 
>   * Migrate vminnm/vmaxnm float intrinsics to "3" pattern
> which guarantees fminnm/fmaxnm semantics.
> 
>   * Add new testcases for vminnm and vmaxnm intrinsics which were missing
> previously.  They are marked as XFAIL on arm*-*-* as ARM hasn't
> implemented these intrinsics.
> 
> OK for trunk?

The AArch64 parts are OK. I can't remember whether the ARM port prefers
to have missing intrinsics XFAIL'd or if there is another way to disable
the tests that are not supported there. Kyrill/Christophe would you mind
commenting on whether this patch is correct for the intrinsics testsuite?

Thanks,
James
 
> 2016-07-06  Jiong Wang  
> 
> gcc/
>   * config/aarch64/aarch64-simd-builtins.def (smax): Remove float variants.
>   (smin): Likewise.
>   (fmax): New entry.
>   (fmin): Likewise.
>   * config/aarch64/arm_neon.h (vmaxnm_f32): Use __builtin_aarch64_fmaxv2sf.
>   (vmaxnmq_f32): Likewise.
>   (vmaxnmq_f64): Likewise.
>   (vminnm_f32): Likewise.
>   (vminnmq_f32): Likewise.
>   (vminnmq_f64): Likewise.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: Support 
> HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: Define 
> HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: Define HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vmax.c: Define HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vmin.c: Define HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define HAS_INTEGER_VARIANT.
>   * gcc.target/aarch64/advsimd-intrinsics/vmaxnm.c: New.
>   * gcc.target/aarch64/advsimd-intrinsics/vminnm.c: New.



[v3 PATCH] Add a new header for diagnosing the use of C++17 facilities in pre-C++17 modes.

2016-07-06 Thread Ville Voutilainen
2016-07-06  Ville Voutilainen  

Add a new header for diagnosing the use of C++17 facilities
in pre-C++17 modes.
* include/bits/c++17_warning.h: New.
diff --git a/libstdc++-v3/include/bits/c++17_warning.h b/libstdc++-v3/include/bits/c++17_warning.h
new file mode 100644
index 0000000..e7cca46
--- /dev/null
+++ b/libstdc++-v3/include/bits/c++17_warning.h
@@ -0,0 +1,37 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/c++17_warning.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{iosfwd}
+ */
+
+#ifndef _CXX17_WARNING_H
+#define _CXX17_WARNING_H 1
+
+#if __cplusplus <= 201402L
+#error This file requires compiler and library support \
+for the ISO C++ 2017 standard. This support must be enabled \
+with the -std=c++17 or -std=gnu++17 compiler options.
+#endif
+
+#endif


[PATCH v2] x86: allow to suppress default clobbers added to asm()s

2016-07-06 Thread Jan Beulich
While it always seemed wrong to me that there's no way to avoid the
default "flags" and "fpsr" clobbers, the regression the fix for
PR/60663 introduced (see PR/63637) makes it even more desirable to have
such a mechanism: This way, at least asm()s with a single output and no
explicit clobbers could again have been made subject to CSE even with
that bug unfixed.
---
There wasn't much feedback on v1
(https://gcc.gnu.org/ml/gcc-patches/2014-10/msg03251.html)
and the feedback I did get from Jeff I didn't really mean to address
in this version:

> I really don't like having an option that's globally applied for this
> feature. Though I am OK with having a mechanism to avoid
> implicit clobbers on specific ASMs.

I don't really understand what's wrong with a command line option
that allows stating this globally for a source file or even an entire project.

> Why use negative numbers for the hard register numbers? I
> wouldn't be at all surprised if lots of random code assumes
> register numbers are always positive.

I'm lacking an idea (or suggestion) of a better alternative. Using
positive numbers resulted in far more problems, as such registers
then got accepted elsewhere as valid too.
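
To make the concern about negative numbers concrete, here is a small sketch of
how a caller has to tell sentinel codes apart from real register numbers.  The
codes -1 to -4 are the ones visible in the cfgexpand.c hunk of this patch; the
enum and function names are hypothetical, and no particular values are assumed
for the new "inverse" clobbers beyond their being negative.

```c
#include <assert.h>

/* Hypothetical classification of the value returned for a clobber name.  */
enum clobber_kind
{
  CLOBBER_EMPTY,      /* -1: empty string */
  CLOBBER_UNKNOWN,    /* -2: unknown register name */
  CLOBBER_CC,         /* -3: "cc", which is not a register */
  CLOBBER_MEMORY,     /* -4: handled by pushing a MEM clobber */
  CLOBBER_HARD_REG,   /* >= 0: ordinary hard register number */
  CLOBBER_OTHER_NEG   /* any other negative value */
};

enum clobber_kind classify_clobber (int code)
{
  switch (code)
    {
    case -1: return CLOBBER_EMPTY;
    case -2: return CLOBBER_UNKNOWN;
    case -3: return CLOBBER_CC;
    case -4: return CLOBBER_MEMORY;
    default:
      return code >= 0 ? CLOBBER_HARD_REG : CLOBBER_OTHER_NEG;
    }
}

/* Only non-negative codes may be used as an index into per-register
   tables or bit sets.  */
int may_index_reg_table (int code)
{
  return classify_clobber (code) == CLOBBER_HARD_REG;
}
```

Any code that indexes by register number needs a guard of this kind, which is
what the "reg >= 0" check in the cfgexpand.c hunk does.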

> I don't like adding new registers with special names like !foo.
> Instead I think that listing "!cc" or something similar in the asm
> itself if it doesn't clobber the cc register would be better. 

I didn't really understand what was meant here, i.e. what the
proposed alternative was supposed to look like in an actual
asm().

gcc/
2016-07-06  Jan Beulich  

* cfgexpand.c (expand_asm_stmt): Cope with negative register
numbers.
* config/i386/i386.c (ix86_target_string): Add
-mno-default-asm-clobbers.
(ix86_valid_target_attribute_inner_p): Handle
-m{,no-}default-asm-clobbers.
(ix86_md_asm_adjust): Handle "inverse" clobbers.
* config/i386/i386.h (NOFLAGS_REGNUM, NOFPSR_REGNUM): Define.
(ADDITIONAL_REGISTER_NAMES): Add "!flags" and "!fpsr".
(OVERLAPPING_REGISTER_NAMES): Define.
* config/i386/i386.opt: Add mdefault-asm-clobbers and
mno-default-asm-clobbers.
* varasm.c (decode_reg_name_and_count): Permit negative
register numbers from ADDITIONAL_REGISTER_NAMES.

gcc/testsuite/
2016-07-06  Jan Beulich  

* gcc.target/i386/20060218-1.c: Adjust expected error.
* gcc.target/i386/invclbr[123].c: New.

--- 2016-06-30/gcc/cfgexpand.c
+++ 2016-06-30/gcc/cfgexpand.c
@@ -2884,26 +2884,18 @@ expand_asm_stmt (gasm *stmt)
  int nregs, j;
 
  j = decode_reg_name_and_count (regname, &nregs);
- if (j < 0)
+ if (j == -2)
{
- if (j == -2)
-   {
- /* ??? Diagnose during gimplification?  */
- error ("unknown register name %qs in %<asm%>", regname);
-   }
- else if (j == -4)
-   {
- rtx x = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
- clobber_rvec.safe_push (x);
-   }
- else
-   {
- /* Otherwise we should have -1 == empty string
-or -3 == cc, which is not a register.  */
- gcc_assert (j == -1 || j == -3);
-   }
+ /* ??? Diagnose during gimplification?  */
+ error ("unknown register name %qs in %<asm%>", regname);
}
- else
+ else if (j == -4)
+   {
+ rtx x = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
+ clobber_rvec.safe_push (x);
+   }
+ else if (j != -1 /* empty string */
+  && j != -3 /* cc, which is not a register */)
for (int reg = j; reg < j + nregs; reg++)
  {
/* Clobbering the PIC register is an error.  */
@@ -2915,7 +2907,8 @@ expand_asm_stmt (gasm *stmt)
return;
  }
 
-   SET_HARD_REG_BIT (clobbered_regs, reg);
+   if (reg >= 0)
+ SET_HARD_REG_BIT (clobbered_regs, reg);
rtx x = gen_rtx_REG (reg_raw_mode[reg], reg);
clobber_rvec.safe_push (x);
  }
--- 2016-06-30/gcc/config/i386/i386.c
+++ 2016-06-30/gcc/config/i386/i386.c
@@ -4207,6 +4207,7 @@ ix86_target_string (HOST_WIDE_INT isa, i
 { "-minline-stringops-dynamically",
MASK_INLINE_STRINGOPS_DYNAMICALLY },
 { "-mms-bitfields",MASK_MS_BITFIELD_LAYOUT },
 { "-mno-align-stringops",  MASK_NO_ALIGN_STRINGOPS },
+{ "-mno-default-asm-clobbers", MASK_NO_DEFAULT_ASM_CLOBBERS },
 { "-mno-fancy-math-387",   MASK_NO_FANCY_MATH_387 },
 { "-mno-push-args",MASK_NO_PUSH_ARGS },
 { "-mno-red-zone", MASK_NO_RED_ZONE },
@@ -6488,6 +6489,10 @@ ix86_valid_target_attribute_inner_p (tre
 

[PATCH] Fix assembler arguments for -m16

2016-07-06 Thread Roger Pau Monne
At the moment the -m16 option only passes the "--32" parameter to the
assembler on glibc OSes, while on other OSes the assembler is called without
any specific flag. This is wrong and causes the assembler to fail. Fix it
by adding support for the -m16 option to x86-64.h.
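
As a rough illustration of the effect of the spec change, the sketch below
maps a single option name to the assembler flag that results before and after
the patch.  The helper names are made up for illustration and this is not how
the GCC driver processes spec strings; it only mirrors the two ASM_SPEC values
shown in the diff.

```c
#include <assert.h>
#include <string.h>

/* Mirrors "%{m32:--32} %{m64:--64} %{mx32:--x32}": -m16 yields no flag.  */
const char *asm_flag_old (const char *opt)
{
  if (strcmp (opt, "m32") == 0)
    return "--32";
  if (strcmp (opt, "m64") == 0)
    return "--64";
  if (strcmp (opt, "mx32") == 0)
    return "--x32";
  return "";
}

/* Mirrors "%{m16|m32:--32} %{m64:--64} %{mx32:--x32}": -m16 now behaves
   like -m32 as far as the assembler flag is concerned.  */
const char *asm_flag_new (const char *opt)
{
  if (strcmp (opt, "m16") == 0)
    return "--32";
  return asm_flag_old (opt);
}
```

With the old mapping, "m16" produces an empty string, which corresponds to
the assembler being invoked without any mode flag.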

2016-07-06  Roger Pau Monné  

* config/i386/x86-64.h (ASM_SPEC): Append --32 to the assembler options
when -m16 is used, even on non-glibc OSes.

---
Cc: h...@gcc.gnu.org
Cc: ger...@freebsd.org
---
This should be backported to all stable branches up to 4.9 (when -m16 was
introduced).

Please keep me on Cc since I'm not subscribed to the list, thanks.
---
 gcc/config/i386/x86-64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-64.h b/gcc/config/i386/x86-64.h
index b0bf835..9e6c6eb 100644
--- a/gcc/config/i386/x86-64.h
+++ b/gcc/config/i386/x86-64.h
@@ -49,7 +49,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define WCHAR_TYPE_SIZE 32
 
 #undef ASM_SPEC
-#define ASM_SPEC "%{m32:--32} %{m64:--64} %{mx32:--x32}"
+#define ASM_SPEC "%{m16|m32:--32} %{m64:--64} %{mx32:--x32}"
 
 #undef ASM_OUTPUT_ALIGNED_BSS
 #define ASM_OUTPUT_ALIGNED_BSS(FILE, DECL, NAME, SIZE, ALIGN) \
-- 
2.7.4 (Apple Git-66)



Re: [patch] Fix type merging deficiency during WPA

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 12:44 PM, Richard Biener
 wrote:
> On Wed, Jul 6, 2016 at 12:21 PM, Richard Biener
>  wrote:
>> On Wed, Jul 6, 2016 at 11:33 AM, Eric Botcazou  wrote:
>>>> I see.  I think the solution is to perform cgraph/varpool merging
>>>> before attempting to read in the global decl stream.  IIRC Micha had
>>>> (old) patches for this.
>>>
>>> How can you merge varpool nodes if you haven't merged types?
>
> You merge them just in the way the linker instructs you via the
> resolution table.
>
>>>> But I wonder why we don't tree-merge 'n' here (from my C example) and
>>>> thus figure that the type domain of x is equal?  Or is it that 'n' and
>>>> 'x' are in the same SCC (they reference each other in some way)?  In this
>>>> case the bug would be that we fail to treat them equal optimistically.
>>>> That said, I don't see how TYPE_CANONICAL computation is relevant - what
>>>> is relevant is the failure to merge the two types.
>>>> In debugging this I'd start to see if the hashes are not equal or if
>>>> they are equal at which node we consider them to differ.
>>>
>>> We just have 2 different DECLs with different DECL_UIDs, the definition
>>> from a compilation unit and a reference from another compilation unit, so
>>> the hashes naturally differ too.  What's supposed to have them reconciled
>>> at this point?
>>
>> I am talking about tree/SCC merging which happily merges global decls as
>> well.  It uses custom hashing (see lto-streamer-out.c:hash_tree) which
>> doesn't hash DECL_UID (obviously).  This merging process should be
>> optimistic for all nodes in the same SCC as well.
>>
>> That said, I expect the types to be tree merged and wonder why they are not.
>>
>> If they were merged they'd obviously share TYPE_CANONICAL because there
>> would be only one type to compute TYPE_CANONICAL for.
>
> It probably boils down to one unit referring to the DECL with DECL_EXTERNAL
> set and one to the DECL with TREE_STATIC set?  This would mean that
> hashing/comparing should be set up to "merge" those but the merging is
> ultimately rejected by some toplevel logic (so we get users merged).  But as
> said, early symtab merging would fix this as well.

So sth like

Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c  (revision 238039)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -996,7 +996,7 @@ hash_tree (struct streamer_tree_cache_d
   else
 hstate.add_flag (TREE_NO_WARNING (t));
   hstate.add_flag (TREE_NOTHROW (t));
-  hstate.add_flag (TREE_STATIC (t));
+  //hstate.add_flag (TREE_STATIC (t));
   hstate.add_flag (TREE_PROTECTED (t));
   hstate.add_flag (TREE_DEPRECATED (t));
   if (code != TREE_BINFO)
@@ -1050,7 +1050,7 @@ hash_tree (struct streamer_tree_cache_d
   hstate.add_flag (DECL_ARTIFICIAL (t));
   hstate.add_flag (DECL_USER_ALIGN (t));
   hstate.add_flag (DECL_PRESERVE_P (t));
-  hstate.add_flag (DECL_EXTERNAL (t));
+  //hstate.add_flag (DECL_EXTERNAL (t));
   hstate.add_flag (DECL_GIMPLE_REG_P (t));
   hstate.commit_flag ();
   hstate.add_int (DECL_ALIGN (t));
Index: gcc/lto/lto.c
===================================================================
--- gcc/lto/lto.c   (revision 238039)
+++ gcc/lto/lto.c   (working copy)
@@ -1263,7 +1263,8 @@ compare_tree_sccs_1 (tree t1, tree t2, t
 tree t1_ = (E1), t2_ = (E2); \
 if (t1_ != t2_ \
&& (!t1_ || !t2_ \
-   || !TREE_VISITED (t2_) \
+   || (!TREE_VISITED (t2_) \
+   && !same_decls (t1_, t2_)) \
|| (!TREE_ASM_WRITTEN (t2_) \
&& !compare_tree_sccs_1 (t1_, t2_, map)))) \
   return false; \

plus the magic new function same_decls which would check if the trees are
var/function decls with the same assembler name. The function can't use
the cgraph as that is not yet read in unfortunately - I still think we should
do that so we can see the prevailing nodes early.
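
For illustration only, the check described above could be sketched as follows.
The struct is a toy stand-in, not GCC's tree representation: toy_decl, its
fields and the toy_kind enum are all hypothetical, and the real function would
have to work on trees without access to the cgraph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_VAR_DECL, TOY_FUNCTION_DECL, TOY_OTHER };

/* Toy stand-in for a declaration as seen from one compilation unit.  */
struct toy_decl
{
  enum toy_kind kind;
  const char *asm_name;   /* stand-in for the assembler name */
  bool is_external;       /* stand-in for DECL_EXTERNAL */
  bool is_static;         /* stand-in for TREE_STATIC */
};

/* True if both are var decls or both are function decls and they share
   the same assembler name.  The linkage flags are deliberately ignored,
   so a reference and a definition compare equal.  */
bool same_decls (const struct toy_decl *a, const struct toy_decl *b)
{
  if (a == NULL || b == NULL)
    return false;
  if (a->kind != b->kind)
    return false;
  if (a->kind != TOY_VAR_DECL && a->kind != TOY_FUNCTION_DECL)
    return false;
  if (a->asm_name == NULL || b->asm_name == NULL)
    return false;
  return strcmp (a->asm_name, b->asm_name) == 0;
}
```

Ignoring the two flags here matches the hash_tree change in the patch above,
which stops hashing TREE_STATIC and DECL_EXTERNAL.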

Richard.

>
> Richard.
>
>> Richard.
>>
>>> The TYPE_CANONICAL computation is relevant because, with GCC 6, the
>>> criterion for compatibility of pointer types is the alias set, which is
>>> based on the TYPE_CANONICAL of the pointed-to type, so we fail to merge
>>> pointer types because warn_type_compatibility_p returns non-zero if
>>> TYPE_CANONICAL differs.
>>>
>>> --
>>> Eric Botcazou


Re: [PATCH][ARM][Testsuite] Fix prototype in vst1Q_laneu64-1.c

2016-07-06 Thread Christophe Lyon
On 6 July 2016 at 15:04, Wilco Dijkstra  wrote:
> Fix prototype in vst1Q_laneu64-1.c to unsigned char* so it passes.
>
> Committed as trivial fix.
>
> ChangeLog
> 2016-07-06  Wilco Dijkstra  
>
> gcc/testsuite/
> * gcc.target/arm/vst1Q_laneu64-1.c (foo): Use unsigned char*.

Thanks for catching this. I intended to do it as part of my
"neon-testgen.ml removal" patch, but I obviously made a mistake when I
committed in svn.

> ---
>
> diff --git a/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c b/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
> index 5f4c927b6e0af9e9e3885e5e1fa25ec21fd53c78..de4e92a0b4aa5121864e2a78f20561168bb21dfa 100644
> --- a/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
> +++ b/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
> @@ -11,7 +11,7 @@
>  unsigned char dummy_store[1000];
>
>  void
> -foo (char* addr)
> +foo (unsigned char* addr)
>  {
>uint8x16_t vdata = vld1q_u8 (addr);
>vst1q_lane_u64 ((uint64_t*) &dummy_store, vreinterpretq_u64_u8 (vdata), 0);
>


[Ada] Extra precision in the evaluation of non-static expressions

2016-07-06 Thread Arnaud Charlet
This patch prevents the use of arbitrary-precision computations when an
expression consists of known numeric literals, one of which is not static
by the rules of the language but happens to be known to the compiler through
data-flow. The patch introduces a conversion to a machine number for the
evaluation of the non-static subexpression.
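
The underlying effect is not specific to Ada.  As a loose analogy in C (not
the GNAT implementation; all function names are invented for this sketch),
rounding a constant to the target floating point type before using it can
give a different value than keeping it at higher precision:

```c
#include <assert.h>

/* Evaluate with the constant kept at higher (double) precision,
   analogous to evaluating with extra precision.  */
double scale_extra_precision (double k, float v)
{
  return k * (double) v;
}

/* Round the constant to float first, analogous to converting the
   literal to a machine number before the rest of the computation.  */
float scale_machine (double k, float v)
{
  float km = (float) k;
  return km * v;
}

/* Nonzero if rounding K to float changes its value.  */
int rounding_changes_value (double k)
{
  return (double) (float) k != k;
}
```

For a constant such as 83.3325 / 4.0, which has no finite binary
representation, rounding_changes_value returns nonzero, so the two scale
functions start from different values of the constant.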

Executing:
   gnatmake -q maintest
   maintest

must yield:

 3541631232 3541631232 3541631232 3541631232
 3541631107 3541631107 3541631107 3541631107

---
with Text_IO;

procedure maintest
is

   type A_TYPE is mod 2 ** 32;
   for A_TYPE'SIZE use 32;

   A_VALUE : A_TYPE := 17000;

   FLOAT_MAX_CONST : constant := ((2.0**21 - 1.0) / 2.0**21) * 2.0**84;
   type FLOAT_TYPE is digits 6 range - 2.0** 84 .. FLOAT_MAX_CONST;
   for FLOAT_TYPE'SIZE use 32;

   type B_TYPE is mod 2**32;
   for B_TYPE'SIZE use 32;

   A_CONST : aliased constant FLOAT_TYPE := 83.3325 / 4.0;

   package mod_io is new text_io.modular_io (B_TYPE);
begin
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (A_VALUE)));
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (A_VALUE)));
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (A_VALUE)));
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (A_VALUE)));
   Text_IO.New_Line;

   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (17000)));
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (17000)));
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (17000)));
   mod_io.Put(B_TYPE (A_CONST * FLOAT_TYPE (17000)));
end maintest;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-07-06  Ed Schonberg  

* sem_eval.adb (Check_Non_Static_Context): If the expression
is a real literal of a floating point type that is part of a
larger expression and is not a static expression, transform it
into a machine number now so that the rest of the computation,
even if other components are static, is not evaluated with
extra precision.

Index: sem_eval.adb
===================================================================
--- sem_eval.adb	(revision 238040)
+++ sem_eval.adb	(working copy)
@@ -445,11 +445,24 @@
   --  that an infinity will result.
 
   if not Is_Static_Expression (N) then
- if Is_Floating_Point_Type (T)
-   and then Is_Out_Of_Range (N, Base_Type (T), Assume_Valid => True)
- then
-Error_Msg_N
-  ("??float value out of range, infinity will be generated", N);
+ if Is_Floating_Point_Type (T) then
+if Is_Out_Of_Range (N, Base_Type (T), Assume_Valid => True) then
+   Error_Msg_N
+ ("??float value out of range, infinity will be generated", N);
+
+--  The literal may be the result of constant-folding of a non-
+--  static subexpression of a larger expression (e.g. a conversion
+--  of a non-static variable whose value happens to be known). At
+--  this point we must reduce the value of the subexpression to a
+--  machine number (RM 4.9 (38/2)).
+
+elsif Nkind (N) = N_Real_Literal
+  and then Nkind (Parent (N)) in N_Subexpr
+then
+   Rewrite (N, New_Copy (N));
+   Set_Realval
+ (N, Machine (Base_Type (T), Realval (N), Round_Even, N));
+end if;
  end if;
 
  return;


[Ada] Spurious error on withed Ghost unit

2016-07-06 Thread Arnaud Charlet
This patch modifies the front end to prevent the generation of ALI and object
files, bypass the back end, and suppress any errors related to code
generation when the compilation unit is an ignored Ghost unit.


-- Source --


--  ignore.adc

pragma Assertion_Policy (Ghost => Ignore);

--  ghost_gen.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Ghost_Gen is
   procedure Ghost_Proc (Formal : T) is
   begin
  if Formal = Ghost_Obj then
 null;
  end if;
  Put_Line ("ERROR: Ghost_Gen");
   end Ghost_Proc;
end Ghost_Gen;

--  ghost_gen.ads

generic
   type T is private;
   Ghost_Obj : T;

package Ghost_Gen with Ghost is
   procedure Ghost_Proc (Formal : T);
end Ghost_Gen;

--  ghost_pack_spec.ads

package Ghost_Pack_Spec with Ghost is
   Ghost_Obj : constant Integer := 1;
end Ghost_Pack_Spec;

--  ghost_pack_spec_and_body.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Ghost_Pack_Spec_And_Body is
   procedure Ghost_Proc (Formal : Integer) is
   begin
  if Ghost_Obj = Formal then
 null;
  end if;
  Put_Line ("ERROR: Ghost_Pack_Spec_And_Body");
   end Ghost_Proc;
end Ghost_Pack_Spec_And_Body;

--  ghost_pack_spec_and_body.ads

package Ghost_Pack_Spec_And_Body with Ghost is
   Ghost_Obj : constant Integer := 1;

   procedure Ghost_Proc (Formal : Integer);
end Ghost_Pack_Spec_And_Body;

--  ghost_subp_body.adb

with Ada.Text_IO; use Ada.Text_IO;

procedure Ghost_Subp_Body (Formal : Integer) with Ghost is
   Ghost_Obj : Integer := 1;
begin
   if Ghost_Obj = Formal then
  null;
   end if;
   Put_Line ("ERROR: Ghost_Subp_Body");
end Ghost_Subp_Body;

--  ghost_subp_spec_and_body.adb

with Ada.Text_IO; use Ada.Text_IO;

procedure Ghost_Subp_Spec_And_Body (Formal : Integer) is
begin
   Put_Line ("ERROR: Ghost_Subp_Spec_And_Body");
end Ghost_Subp_Spec_And_Body;

--  ghost_subp_spec_and_body.ads

procedure Ghost_Subp_Spec_And_Body (Formal : Integer) with Ghost;

--  living_with_1.adb

with Ghost_Gen;

procedure Living_With_1 is
   package Ghost_Inst is new Ghost_Gen (Integer, 1);
begin
   Ghost_Inst.Ghost_Proc (2);
end Living_With_1;

--  living_with_2.adb

with Ghost_Pack_Spec; use Ghost_Pack_Spec;

procedure Living_With_2 is
   Local_Ghost_Obj : constant Integer := Ghost_Obj with Ghost;
begin
   null;
end Living_With_2;

--  living_with_3.adb

with Ghost_Pack_Spec_And_Body; use Ghost_Pack_Spec_And_Body;

procedure Living_With_3 is
   Local_Ghost_Obj : constant Integer := Ghost_Obj with Ghost;
begin
   Ghost_Proc (Local_Ghost_Obj);
end Living_With_3;

--  living_with_4.adb

with Ghost_Subp_Body;

procedure Living_With_4 is
   Local_Ghost_Obj : constant Integer := 1 with Ghost;
begin
   Ghost_Subp_Body (Local_Ghost_Obj);
end Living_With_4;

--  living_with_5.adb

with Ghost_Subp_Spec_And_Body;

procedure Living_With_5 is
   Local_Ghost_Obj : constant Integer := 1 with Ghost;
begin
   Ghost_Subp_Spec_And_Body (Local_Ghost_Obj);
end Living_With_5;


-- Compilation and output --


$ echo "checked Ghost code, withs from living untis"
$ gcc -c -gnatec=check.adc ghost_gen.adb
$ gcc -c -gnatec=check.adc ghost_pack_spec.ads
$ gcc -c -gnatec=check.adc ghost_pack_spec_and_body.adb
$ gcc -c -gnatec=check.adc ghost_subp_body.adb
$ gcc -c -gnatec=check.adc ghost_subp_spec_and_body.adb
$ gcc -c -gnatec=check.adc living_with_1.adb
$ gcc -c -gnatec=check.adc living_with_2.adb
$ gcc -c -gnatec=check.adc living_with_3.adb
$ gcc -c -gnatec=check.adc living_with_4.adb
$ gcc -c -gnatec=check.adc living_with_5.adb
$ ls *.o *.ali > objects_and_ali.txt
$ grep -n "ghost" objects_and_ali.txt | wc -l
$ rm -rf *.o *.ali
$ echo "ignored Ghost code, withs from living units"
$ gcc -c -gnatec=ignore.adc ghost_gen.adb
$ gcc -c -gnatec=ignore.adc ghost_pack_spec.ads
$ gcc -c -gnatec=ignore.adc ghost_pack_spec_and_body.adb
$ gcc -c -gnatec=ignore.adc ghost_subp_body.adb
$ gcc -c -gnatec=ignore.adc ghost_subp_spec_and_body.adb
$ gcc -c -gnatec=ignore.adc living_with_1.adb
$ gcc -c -gnatec=ignore.adc living_with_2.adb
$ gcc -c -gnatec=ignore.adc living_with_3.adb
$ gcc -c -gnatec=ignore.adc living_with_4.adb
$ gcc -c -gnatec=ignore.adc living_with_5.adb
$ ls *.o *.ali > objects_and_ali.txt
$ grep "ghost" objects_and_ali.txt | wc -l
$ rm -rf *.o *.ali
$ gnatmake -f -q -gnatec=ignore.adc living_with_1.adb
$ gnatmake -f -q -gnatec=ignore.adc living_with_2.adb
$ gnatmake -f -q -gnatec=ignore.adc living_with_3.adb
$ gnatmake -f -q -gnatec=ignore.adc living_with_4.adb
$ gnatmake -f -q -gnatec=ignore.adc living_with_5.adb
$ ./living_with_1
$ ./living_with_2
$ ./living_with_3
$ ./living_with_4
$ ./living_with_5
checked Ghost code, withs from living untis
10
ignored Ghost code, withs from living units
0

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-07-06  Hristian Kirtchev  

* gnat1drv.adb: Code clean up. Do not emit any
code generation errors when the unit is ignored Ghost.


Re: [PATCH PR71518] Adjust misalign for outer loops also.

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 2:58 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is a simple patch which adds the missing misalign adjustment for outer loops.
>
> Bootstrapping and regression testing did not show any new failures.
>
> Is it OK for trunk?

Ok.

Richard.

> ChangeLog:
> 2016-07-06  Yuri Rumyantsev  
>
> PR tree-optimization/71518
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Adjust
> misalign also for outer loops with negative step.
> gcc/testsuite/ChangeLog:
> * gcc.dg/pr71518.c: New test.


Re: -fopt-info handling

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 2:25 PM, Ulrich Drepper  wrote:
> On Tue, Jul 5, 2016 at 6:06 AM, Richard Biener
>  wrote:
>> I don't think all-all is a useful default.  "optimized" may be though.
>
> I relied on old documentation installed on one of my systems.
> Apparently the default changed to optimized-optall.  So, no change to
> the documentation needed if the general opinion is that this is a sane
> default and the following patch actually installs a default behavior.

Ok.

Thanks,
Richard.


Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-07-06 Thread Richard Biener
On Tue, 5 Jul 2016, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Sun, 3 Jul 2016, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Wed, 15 Jun 2016, Richard Sandiford wrote:
> >> >
> >> >> Richard Biener  writes:
> >> >> > With the proposed cost change for vector construction we will end up
> >> >> > vectorizing the testcase in PR68961 again (on x86_64 and likely
> >> >> > on ppc64le as well after that target gets adjustments).  Currently
> >> >> > we can't optimize that away again noticing the direct overlap of
> >> >> > argument and return registers.  The obstacle is
> >> >> >
> >> >> > (insn 7 4 8 2 (set (reg:V2DF 93)
> >> >> > (vec_concat:V2DF (reg/v:DF 91 [ a ])
> >> >> > (reg/v:DF 92 [ aa ]))) 
> >> >> > ...
> >> >> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> >> >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> >> >> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> >> >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> >> >> >
> >> >> > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> >> >> >
> >> >> > First of all simplify_subreg doesn't handle the subregs of a
> >> >> > vec_concat (easy fix below).
> >> >> >
> >> >> > Then combine doesn't like to simplify the multi-use (it tries some
> >> >> > parallel it seems).  So I went to forwprop which eventually manages
> >> >> > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> >> >> > because it is not a constant.  Thus I allow arbitrary simplification
> >> >> > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> >> >> > to be a magic flag to tell it to restrict to the case where all
> >> >> > uses can be simplified or so, nor to restrict simplifications to a 
> >> >> > REG.
> >> >> > But I don't see any undesirable simplifications of (subreg 
> >> >> > ([vec_]concat)).
> >> >> 
> >> >> Adding that as a special case to propagate_rtx feels like a hack though :-)
> >> >> I think:
> >> >> 
> >> >> > Index: gcc/fwprop.c
> >> >> > ===
> >> >> > *** gcc/fwprop.c  (revision 237286)
> >> >> > --- gcc/fwprop.c  (working copy)
> >> >> > *** propagate_rtx (rtx x, machine_mode mode,
> >> >> > *** 664,670 
> >> >> > || (GET_CODE (new_rtx) == SUBREG
> >> >> > && REG_P (SUBREG_REG (new_rtx))
> >> >> > && (GET_MODE_SIZE (mode)
> >> >> > !   <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))
> >> >> >   flags |= PR_CAN_APPEAR;
> >> >> > if (!varying_mem_p (new_rtx))
> >> >> >   flags |= PR_HANDLE_MEM;
> >> >> > --- 664,673 
> >> >> > || (GET_CODE (new_rtx) == SUBREG
> >> >> > && REG_P (SUBREG_REG (new_rtx))
> >> >> > && (GET_MODE_SIZE (mode)
> >> >> > !   <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)
> >> >> > !   || ((GET_CODE (new_rtx) == VEC_CONCAT
> >> >> > !|| GET_CODE (new_rtx) == CONCAT)
> >> >> > !   && GET_CODE (x) == SUBREG))
> >> >> >   flags |= PR_CAN_APPEAR;
> >> >> > if (!varying_mem_p (new_rtx))
> >> >> >   flags |= PR_HANDLE_MEM;
> >> >> 
> >> >> ...this if statement should fundamentally only test new_rtx.
> >> >> E.g. we'd want the same thing for any SUBREG inside X.
> >> >> 
> >> >> How about changing:
> >> >> 
> >> >>   /* The replacement we made so far is valid, if all of the recursive
> >> >>  replacements were valid, or we could simplify everything to
> >> >>  a constant.  */
> >> >>   return valid_ops || can_appear || CONSTANT_P (tem);
> >> >> 
> >> >> so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
> >> >> I suppose that's likely to increase register pressure though,
> >> >> if only some uses of new_rtx simplify.  (There again, requiring all
> >> >> uses to be replaceable could make hot code the hostage of cold code.)
> >> >
> >> > Yes, my fear was about register pressure increase for the case not all
> >> > uses can be replaced (fwprop doesn't seem to have code to verify or
> >> > require that).
> >> >
> >> > I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG
> >> > case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the
> >> > new_rtx == [VEC_]CONCAT case for example.
> >> 
> >> I don't think that helps though.  There might be other uses of a
> >> VEC_CONCAT that aren't SUBREGs, in which case we'd have the same
> >> problem of keeping both values live at once.
> >> 
> >> How about restricting the REG_P (tem) && !HARD_REGISTER_P (tem)
> >> to cases where new_rtx has more words than tem?
> >
> > So would you really make a simple mode-size check here?
> 
> I thought a mode check would be worth trying since (for better or worse)
> words are special for subregs.  But...
> 
> > I wonder which cases are there other than the subreg of [vec_concat]
> > that would end up with this case.  That is,
> >
> >   if (REG_P (tem) && 

[AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-06 Thread Jiong Wang

The current vmaxnm/vminnm float intrinsics are implemented using
__builtin_aarch64_smax/min, which map to backend patterns
using the smin/smax rtl operators.  However, as documented in rtl.def

  "Further, if both operands are zeros, or if either operand is NaN, then
  it is unspecified which of the two operands is returned as the result."

There is no guarantee that a number will always be returned by the
smin/smax operators, and further tests show that gcc will optimize something
like smin (1.0f, NaN) to NaN, so currently the vmaxnm and vminnm intrinsics
eventually fail the newly added testcases included in this patch.
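
To make the difference concrete, here is a small scalar sketch in C.  It is
not the AArch64 backend code; naive_min and minnum are hypothetical names
used only for illustration.  naive_min stands for a comparison-based minimum,
whose result with a NaN operand depends on operand order, while minnum follows
the minNum-style rule that the intrinsics are expected to honour: if exactly
one operand is NaN, return the other one.

```c
#include <math.h>   /* for the NAN macro only; no libm function is called */

/* Comparison-based minimum: any comparison with NaN is false, so the
   result with a NaN operand depends on which side the NaN is on.  */
float naive_min(float a, float b)
{
    return a < b ? a : b;
}

/* minNum-style minimum (illustrative): a single NaN operand is ignored
   and the numeric operand is returned.  */
float minnum(float a, float b)
{
    if (a != a)          /* a is NaN */
        return b;
    if (b != b)          /* b is NaN */
        return a;
    return a < b ? a : b;
}
```

With these definitions naive_min (1.0f, NAN) yields NaN but
naive_min (NAN, 1.0f) yields 1.0f, whereas minnum returns 1.0f in both cases.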

This patch:

  * Migrate vminnm/vmaxnm float intrinsics to "3" pattern
which guarantees fminnm/fmaxnm semantics.

  * Add new testcases for vminnm and vmaxnm intrinsics which were missing
previously.  They are marked as XFAIL on arm*-*-* as ARM hasn't
implemented these intrinsics.

OK for trunk?

2016-07-06  Jiong Wang  

gcc/
  * config/aarch64/aarch64-simd-builtins.def (smax): Remove float variants.
  (smin): Likewise.
  (fmax): New entry.
  (fmin): Likewise.
  * config/aarch64/arm_neon.h (vmaxnm_f32): Use __builtin_aarch64_fmaxv2sf.
  (vmaxnmq_f32): Likewise.
  (vmaxnmq_f64): Likewise.
  (vminnm_f32): Likewise.
  (vminnmq_f32): Likewise.
  (vminnmq_f64): Likewise.

gcc/testsuite/
  * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: Support 
HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: Define HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: Define HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vmax.c: Define HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vmin.c: Define HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: Define HAS_INTEGER_VARIANT.
  * gcc.target/aarch64/advsimd-intrinsics/vmaxnm.c: New.
  * gcc.target/aarch64/advsimd-intrinsics/vminnm.c: New.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3e4740c460a335d8a4d5ce8b19fc311aa14a47d4..f1ad325f464f89c981cbdee8a8f6afafa938639a 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -244,13 +244,17 @@
   /* Implemented by 3.
  smax variants map to fmaxnm,
  smax_nan variants map to fmax.  */
-  BUILTIN_VDQIF (BINOP, smax, 3)
-  BUILTIN_VDQIF (BINOP, smin, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smax, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
   BUILTIN_VDQF (BINOP, smax_nan, 3)
   BUILTIN_VDQF (BINOP, smin_nan, 3)
 
+  /* Implemented by 3.  */
+  BUILTIN_VDQF (BINOP, fmax, 3)
+  BUILTIN_VDQF (BINOP, fmin, 3)
+
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 475e200a683436af5026edafa568f16126f4340a..300e7951f47a30a5b125899b240913023b94de0b 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17352,19 +17352,19 @@ vpminnms_f32 (float32x2_t a)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmaxnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_smaxv2sf (__a, __b);
+  return __builtin_aarch64_fmaxv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmaxnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_smaxv4sf (__a, __b);
+  return __builtin_aarch64_fmaxv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vmaxnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_smaxv2df (__a, __b);
+  return __builtin_aarch64_fmaxv2df (__a, __b);
 }
 
 /* vmaxv  */
@@ -17582,19 +17582,19 @@ vminq_u32 (uint32x4_t __a, uint32x4_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vminnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_sminv2sf (__a, __b);
+  return __builtin_aarch64_fminv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vminnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_sminv4sf (__a, __b);
+  return __builtin_aarch64_fminv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vminnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_sminv2df (__a, __b);
+  return __builtin_aarch64_fminv2df (__a, __b);
 }
 
 /* vminv  */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc
index 1eb9271b7f52aff96694f45a987c5368f2c9f95d..58082b2c95b2d6801ce5507070f8f828927adbb9 

[PATCH][ARM][Testsuite] Fix prototype in vst1Q_laneu64-1.c

2016-07-06 Thread Wilco Dijkstra
Fix prototype in vst1Q_laneu64-1.c to unsigned char* so it passes.

Committed as trivial fix.

ChangeLog
2016-07-06  Wilco Dijkstra  

gcc/testsuite/
* gcc.target/arm/vst1Q_laneu64-1.c (foo): Use unsigned char*.
---

diff --git a/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c b/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
index 5f4c927b6e0af9e9e3885e5e1fa25ec21fd53c78..de4e92a0b4aa5121864e2a78f20561168bb21dfa 100644
--- a/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
+++ b/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
@@ -11,7 +11,7 @@
 unsigned char dummy_store[1000];
 
 void
-foo (char* addr)
+foo (unsigned char* addr)
 {
   uint8x16_t vdata = vld1q_u8 (addr);
   vst1q_lane_u64 ((uint64_t*) &dummy_store, vreinterpretq_u64_u8 (vdata), 0);
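
As background on why the prototype matters: in C, char, signed char and
unsigned char are three distinct types, so a char* argument is not compatible
with a pointer to an unsigned 8-bit type such as the one vld1q_u8 expects.
The sketch below is only an illustration using C11 _Generic; the macro and
function names are hypothetical and not part of the testcase.

```c
/* Classify a pointer by its exact pointee type (C11 _Generic).
   All three character types select a different branch.  */
enum char_ptr_kind { PLAIN_CHAR_PTR, SIGNED_CHAR_PTR, UNSIGNED_CHAR_PTR };

#define CHAR_PTR_KIND(p) _Generic((p),            \
    char *:          PLAIN_CHAR_PTR,              \
    signed char *:   SIGNED_CHAR_PTR,             \
    unsigned char *: UNSIGNED_CHAR_PTR)

/* Helper with the old prototype's argument type.  */
int kind_of_char_ptr(void)
{
    char c = 0;
    return CHAR_PTR_KIND(&c);
}

/* Helper with the fixed prototype's argument type.  */
int kind_of_unsigned_char_ptr(void)
{
    unsigned char u = 0;
    return CHAR_PTR_KIND(&u);
}
```

The two helpers return different enumerators, which is the same type
distinction the compiler diagnoses when a char* is passed where an
unsigned char* is expected.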



[PATCH PR71518] Adjust misalign for outer loops also.

2016-07-06 Thread Yuri Rumyantsev
Hi All,

Here is a simple patch which adds the missing misalign adjustment for outer loops.

Bootstrapping and regression testing did not show any new failures.

Is it OK for trunk?

ChangeLog:
2016-07-06  Yuri Rumyantsev  

PR tree-optimization/71518
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Adjust
misalign also for outer loops with negative step.
gcc/testsuite/ChangeLog:
* gcc.dg/pr71518.c: New test.


patch
Description: Binary data


Re: [v3 PATCH] Implement LWG 2451

2016-07-06 Thread Jonathan Wakely

On 06/07/16 00:44 +0300, Ville Voutilainen wrote:

   Implement LWG 2451, optional should 'forward' T's
   implicit conversions.
   * include/experimental/optional (__is_optional_impl, __is_optional):
   New.
   (optional()): Make constexpr and default.
   (optional(_Up&&), optional(const optional<_Up>&),
   optional(optional<_Up>&& __t): New.
   (operator=(_Up&&)): Constrain.
   (operator=(const optional<_Up>&), operator=(optional<_Up>&&)): New.
   * testsuite/experimental/optional/cons/value.cc:
   Add tests for the functionality added by LWG 2451.
   * testsuite/experimental/optional/cons/value_neg.cc: New.


As I mentioned on IRC, experimental::optional is C++14-only so you can
use remove_cv_t and enable_if_t to get rid of some of the typename
X::type clutter.

OK for trunk and gcc-6 with or without that change, your choice.




Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-06 Thread Kyrill Tkachov


On 06/07/16 13:31, Rainer Orth wrote:

Hi Kyrill,


On 05/07/16 12:24, Rainer Orth wrote:

Marc Glisse  writes:


On Tue, 5 Jul 2016, Kyrill Tkachov wrote:


As for testing I've bootstrapped and tested the patch on aarch64 and
x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
always true to exercise the paths that synthesize the shift by
additions. Marc, could you test this on the sparc targets you care about?

I don't have access to Sparc, you want Rainer here (added in Cc:).

As is, the patch doesn't even build on current mainline:

/vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 
'mult_variant' has not been declared
   target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
  ^

enum mult_variant is only declared in expmed.c.

Ah, sorry I forgot to mention that this is patch 2/2.
It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which
is already approved
but I was planning to commit it together with this one.
Can you please try applying
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
as well as this?

sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
that the patch fixes PR tree-optimization/70923 (you should mention that
in the ChangeLog or close it as a duplicate), with the same caveat as
about Marc's latest patch for that:


Thanks! Much appreciated.


+FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1

The message appears twice, not once, on sparc, so the testcase should be
updated to accommodate that.


Ok.


Besides, the new testcase FAILs:

+FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 2
+FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2

The dump contains

gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls 
or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls 
or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.


I see. I suppose SPARC doesn't have vector shifts operating on 64-bit data?
I believe the testcase should be updated to use just "int" arrays rather than "long 
long".
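
For readers following the thread, a scalar sketch of what "synthesizing" a
multiplication by a constant means may help.  This is not the vectorizer
code; the function names and the choice of the constant 9 are assumptions
made purely for illustration.  The first function replaces the multiply with
a shift and an add; the other two show the fallback discussed earlier in the
thread, where the shift itself is synthesized by additions for targets
without a suitable vector shift.

```c
/* x * 9 rewritten as one shift and one add: (x << 3) + x.  */
unsigned mul9_shift_add(unsigned x)
{
    return (x << 3) + x;
}

/* x << n using additions only: doubling x, n times.  */
unsigned shl_by_adds(unsigned x, unsigned n)
{
    while (n-- > 0)
        x += x;
    return x;
}

/* x * 9 with the shift itself expressed through additions.  */
unsigned mul9_adds_only(unsigned x)
{
    return shl_by_adds(x, 3) + x;
}
```

Both variants compute the same value as x * 9u for every unsigned x, since
unsigned arithmetic wraps consistently in all three forms.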

I'll respin the testcases
Thanks again,
Kyrill


Rainer





[Ada] Missing abort deferral on controlled aggregate component assignment

2016-07-06 Thread Arnaud Charlet
This patch adds an abort defer / undefer pair around the initialization
statements of a controlled aggregate component as dictated by RM 9.8(11).


-- Source --


--  aggregates.ads

with Ada.Finalization; use Ada.Finalization;

package Aggregates is
   type Ctrl is new Controlled with null record;

   Ctrl_Obj : constant Ctrl := (Controlled with null record);

   type Arr is array (1 .. 3) of Ctrl;

   Arr_Obj_1 : constant Arr := (others => Ctrl_Obj);
   Arr_Obj_2 : constant Arr := (others => (Controlled with null record));

   type Rec is record
  Comp : Ctrl;
   end record;

   Rec_Obj_1 : constant Rec := (Comp => Ctrl_Obj);
   Rec_Obj_2 : constant Rec := (Comp => (Controlled with null record));
end Aggregates;


-- Compilation and output --


$ gcc -c -gnatDG aggregates.ads
$ line=$(grep -n "arr_obj_1 : constant" aggregates.ads.dg | cut -f1 -d:)
$ tail -n +$line aggregates.ads.dg | head -n 20 | grep "abort_" | sed "s/^ *//"
system__soft_links__abort_defer.all;
system__standard_library__abort_undefer_direct;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-07-06  Hristian Kirtchev  

* exp_aggr.adb Remove with and use clauses for Exp_Ch11 and Inline.
(Initialize_Array_Component): Protect the initialization
statements in an abort defer / undefer block when the associated
component is controlled.
(Initialize_Record_Component): Protect the initialization statements
in an abort defer / undefer block when the associated component is
controlled.
(Process_Transient_Component_Completion): Use Build_Abort_Undefer_Block
to create an abort defer / undefer block.
* exp_ch3.adb Remove with and use clauses for Exp_ch11 and Inline.
(Default_Initialize_Object): Use Build_Abort_Undefer_Block to
create an abort defer / undefer block.
* exp_ch5.adb (Expand_N_Assignment_Statement): Mark an abort
defer / undefer block as such.
* exp_ch9.adb (Find_Enclosing_Context): Do not consider an abort
defer / undefer block as a suitable context for an activation
chain or a master.
* exp_util.adb Add with and use clauses for Exp_Ch11.
(Build_Abort_Undefer_Block): New routine.
* exp_util.ads (Build_Abort_Undefer_Block): New routine.
* sinfo.adb (Is_Abort_Block): New routine.
(Set_Is_Abort_Block): New routine.
* sinfo.ads New attribute Is_Abort_Block along with occurrences
in nodes.
(Is_Abort_Block): New routine along with pragma Inline.
(Set_Is_Abort_Block): New routine along with pragma Inline.

Index: exp_ch9.adb
===
--- exp_ch9.adb (revision 321913)
+++ exp_ch9.adb (working copy)
@@ -6251,8 +6251,11 @@
   Defining_Identifier => D_T2,
   Type_Definition => Def1);
 
-  Insert_After_And_Analyze (N, Decl1);
+  --  Declare the new types before the original one since the latter will
+  --  refer to them through the Equivalent_Type slot.
 
+  Insert_Before_And_Analyze (N, Decl1);
+
   --  Associate the access to subprogram with its original access to
   --  protected subprogram type. Needed by the backend to know that this
   --  type corresponds with an access to protected subprogram type.
@@ -6286,7 +6289,7 @@
   Component_List =>
 Make_Component_List (Loc, Component_Items => Comps)));
 
-  Insert_After_And_Analyze (Decl1, Decl2);
+  Insert_Before_And_Analyze (N, Decl2);
   Set_Equivalent_Type (T, E_T);
end Expand_Access_Protected_Subprogram_Type;
 
@@ -9310,6 +9313,9 @@
 
   pragma Assert (Present (Pdef));
 
+  Insert_After (Current_Node, Rec_Decl);
+  Current_Node := Rec_Decl;
+
   --  Add private field components
 
   if Present (Private_Declarations (Pdef)) then
@@ -9570,9 +9576,6 @@
  Append_To (Cdecls, Object_Comp);
   end if;
 
-  Insert_After (Current_Node, Rec_Decl);
-  Current_Node := Rec_Decl;
-
   --  Analyze the record declaration immediately after construction,
   --  because the initialization procedure is needed for single object
   --  declarations before the next entity is analyzed (the freeze call
Index: exp_util.adb
===
--- exp_util.adb(revision 321913)
+++ exp_util.adb(working copy)
@@ -7912,11 +7912,11 @@
 
   Scope_Suppress.Suppress := (others => True);
 
-  --  If this is an elementary or a small not-by-reference record type, and
+  --  If this is an elementary or a small not by-reference record type, and
   --  we need to capture the value, just make a constant; this is cheap and
   --  objects of both kinds of types can be bit aligned, so it might not be
   --  possible to generate a reference 

[Ada] Spurious error on container instantiation with predicated array type

2016-07-06 Thread Arnaud Charlet
This patch fixes a spurious type error in a predicate function created within
an operation in an instantiation of a container package, when the element
type is an unconstrained array with a predicate.

The following must compile quietly 

   gcc -c gpr2-project-registry-attribute.adb

---
with Ada.Containers.Indefinite_Ordered_Maps; use Ada;
with Ada.Strings.Less_Case_Insensitive;

package body GPR2.Project.Registry.Attribute is

   function Less_Case_Insensitive
 (Left, Right : Qualified_Name) return Boolean is
 (Ada.Strings.Less_Case_Insensitive (String (Left), String (Right)));

   package Attribute_Definitions is new Containers.Indefinite_Ordered_Maps
 (Qualified_Name, Def, Less_Case_Insensitive);

end GPR2.Project.Registry.Attribute;

package GPR2.Project.Registry.Attribute is

   pragma Elaborate_Body;

   type Index_Kind is (No, Yes, Optional);

   type Qualified_Name is new Name_Type;

   type Def is record
  Index : Index_Kind;
   end record;

end GPR2.Project.Registry.Attribute;
package GPR2.Project.Registry is
end GPR2.Project.Registry;
package GPR2.Project is

end GPR2.Project;
package GPR2 is

   type Project_Kind is
 (K_Configuration, K_Abstract,
  K_Standard, K_Library, K_Aggregate, K_Aggregate_Library);

   --
   --  Name / Value
   --

   subtype Name_Type is String
 with Dynamic_Predicate => Name_Type'Length > 0;

   subtype Value_Type is String;

end GPR2;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-07-06  Ed Schonberg  

* sem_ch3.adb (Analyze_Subtype_Declaration): For generated
subtypes, such as actual subtypes of unconstrained formals,
inherit predicate functions, if any, from the parent type rather
than creating redundant new ones.

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 238040)
+++ sem_ch3.adb (working copy)
@@ -4802,6 +4802,24 @@
   then
  Set_Has_Predicates (Id);
  Set_Has_Delayed_Freeze (Id);
+
+ --  Generated subtypes inherit the predicate function from the parent
+ --  (no aspects to examine on the generated declaration).
+
+ if not Comes_From_Source (N) then
+Set_Ekind (Id, Ekind (T));
+
+if Present (Predicate_Function (T)) then
+   Set_Predicate_Function (Id, Predicate_Function (T));
+
+elsif Present (Ancestor_Subtype (T))
+  and then Has_Predicates (Ancestor_Subtype (T))
+  and then Present (Predicate_Function (Ancestor_Subtype (T)))
+then
+   Set_Predicate_Function (Id,
+ Predicate_Function (Ancestor_Subtype (T)));
+end if;
+ end if;
   end if;
 
   --  Subtype of Boolean cannot have a constraint in SPARK


[Ada] Warning on fixed-point actual types with user-defined operators

2016-07-06 Thread Arnaud Charlet
This patch adds a warning when a formal fixed point type is instantiated with
a type that has user-defined arithmetic operations, but the generic has no
corresponding formal functions. This is worth a warning because of the special
semantics of fixed-point operators, in particular multiplying operators.

Compiling procfix.adb must yield:

   procfix.adb:23:28: warning:
   instance does not use primitive operation "*" at line 5

The warning disappears if a formal subprogram declaration is added to the
generic:

   with function "*" (X, Y :Fix) return Fix is <>;

---
with Text_IO; use Text_IO;
procedure Procfix is
  package P is
 type T is delta 0.1 range -10.0 .. 10.0;
 function "*" (X, Y: T) return T;
  end P;
  use P;

  package body P is
 function "*" (X, Y: T) return T is
 begin
return  (X + Y) / 2.0;
 end;
  end P;

  generic
 type Fix is delta <>;
  package Try is
 X : Fix := 3.0;
 Y : Fix := X * X;
  end;

  package Inst is new Try (T);
  use Inst;

begin
  Put_Line (T'Image (Inst.Y));
end;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-07-06  Ed Schonberg  

* sem_ch12.adb (Check_Fixed_Point_Actual): Add a warning when
a formal fixed point type is instantiated with a type that has
user-defined arithmetic operations, but the generic has no
corresponding formal functions. This is worth a warning because
of the special semantics of fixed-point operators.

Index: sem_ch12.adb
===
--- sem_ch12.adb(revision 238040)
+++ sem_ch12.adb(working copy)
@@ -1105,6 +1105,12 @@
   --  In Ada 2005, indicates partial parameterization of a formal
   --  package. As usual an other association must be last in the list.
 
+  procedure Check_Fixed_Point_Actual (Actual : Node_Id);
+  --  Warn if an actual fixed-point type has user-defined arithmetic
+  --  operations, but there is no corresponding formal in the generic,
+  --  in which case the predefined operations will be used. This merits
+  --  a warning because of the special semantics of fixed point ops.
+
   procedure Check_Overloaded_Formal_Subprogram (Formal : Entity_Id);
   --  Apply RM 12.3(9): if a formal subprogram is overloaded, the instance
   --  cannot have a named association for it. AI05-0025 extends this rule
@@ -1187,6 +1193,52 @@
   end Check_Overloaded_Formal_Subprogram;
 
   ---
+  --  Check_Fixed_Point_Actual --
+  ---
+
+  procedure Check_Fixed_Point_Actual (Actual : Node_Id) is
+ Typ: constant Entity_Id := Entity (Actual);
+ Prims  : constant Elist_Id  := Collect_Primitive_Operations (Typ);
+ Elem   : Elmt_Id;
+ Formal : Node_Id;
+
+  begin
+ --  Locate primitive operations of the type that are arithmetic
+ --  operations.
+
+ Elem := First_Elmt (Prims);
+ while Present (Elem) loop
+if Nkind (Node (Elem)) = N_Defining_Operator_Symbol then
+
+   --  Check whether the generic unit has a formal subprogram of
+   --  the same name. This does not check types but is good enough
+   --  to justify a warning.
+
+   Formal := First_Non_Pragma (Formals);
+   while Present (Formal) loop
+  if Nkind (Formal) = N_Formal_Concrete_Subprogram_Declaration
+and then Chars (Defining_Entity (Formal)) =
+   Chars (Node (Elem))
+  then
+ exit;
+  end if;
+
+  Next (Formal);
+   end loop;
+
+   if No (Formal) then
+  Error_Msg_Sloc := Sloc (Node (Elem));
+  Error_Msg_NE
+("?instance does not use primitive operation

Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-06 Thread Rainer Orth
Hi Kyrill,

> On 05/07/16 12:24, Rainer Orth wrote:
>> Marc Glisse  writes:
>>
>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>
>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>> always true to exercise the paths that synthesize the shift by
>>>> additions. Marc, could you test this on the sparc targets you care about?
>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>> As is, the patch doesn't even build on current mainline:
>>
>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 
>> 'mult_variant' has not been declared
>>   target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>  ^
>>
>> enum mult_variant is only declared in expmed.c.
>
> Ah, sorry I forgot to mention that this is patch 2/2.
> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html which
> is already approved
> but I was planning to commit it together with this one.
> Can you please try applying
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
> as well as this?

sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
that the patch fixes PR tree-optimization/70923 (you should mention that
in the ChangeLog or close it as a duplicate), with the same caveat as
about Marc's latest patch for that:

+FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects  scan-tree-dump-times 
vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1

The message appears twice, not once, on sparc, so the testcase should be
updated to accommodate that.

Besides, the new testcase FAILs:

+FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorized 1 loops" 2
+FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2

The dump contains

gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls 
or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls 
or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: -fopt-info handling

2016-07-06 Thread Ulrich Drepper
On Tue, Jul 5, 2016 at 6:06 AM, Richard Biener
 wrote:
> I don't think all-all is a useful default.  "optimized" may be though.

I relied on old documentation installed on one of my systems.
Apparently the default changed to optimized-optall.  So, no change to
the documentation needed if the general opinion is that this is a sane
default and the following patch actually installs a default behavior.


d-gcc-opt-info2
Description: Binary data


Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue

2016-07-06 Thread Bernd Schmidt
There's one thing I don't quite understand and which seems to have 
changed since v1:


On 07/04/2016 02:19 PM, Dominik Vogt wrote:

@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)

   /* If there were any, allocate space.  */
   if (large_size > 0)
-   large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-  large_align, true);
+   {
+ large_allocsize = GEN_INT (large_size);
+ get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
+   }
 }

   for (si = 0; si < n; ++si)
@@ -1186,6 +1190,19 @@ expand_stack_vars (bool (*pred) (size_t), struct 
stack_vars_data *data)
  /* Large alignment is only processed in the last pass.  */
  if (pred)
continue;
+
+ if (large_allocsize && ! large_allocation_done)
+   {
+ /* Allocate space the virtual stack vars area in the
+prologue.  */
+ HOST_WIDE_INT loffset;
+
+ loffset = alloc_stack_frame_space
+   (INTVAL (large_allocsize),
+PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
+ large_base = get_dynamic_stack_base (loffset, large_align);
+ large_allocation_done = true;
+   }
  gcc_assert (large_base != NULL);



Why is this code split between the two places here? v1 seems to have 
done it all in the first piece of code where we now only set 
large_allocsize.



Bernd



Re: [PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m

2016-07-06 Thread Andre Vieira (lists)
On 01/07/16 14:40, Ramana Radhakrishnan wrote:
> 
> 
> On 13/10/15 18:01, Andre Vieira wrote:
>> This patch ports the aeabi_idiv routine from Linaro Cortex-Strings 
>> (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed 
>> by ARM under Free BSD license.
>>
>> The new aeabi_idiv routine is used to replace the one in 
>> libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 
>> wrapper. The new routine is under LGPLv3 license.
> 
> This is not under LGPLv3.  It is under GPLv3 with the runtime library 
> exception license; there's a difference.  Assuming your licensing expectation 
> is ok, read on for more of a review.
> 
>>
>> The main advantage of this version is that it can improve the performance of 
>> the aeabi_idiv function for Thumb1. This solution will also increase the 
>> code size. So it will only be used if __OPTIMIZE_SIZE__ is not defined.
>>
>> Make check passed for armv6-m.
>>
>> libgcc/ChangeLog:
>> 2015-08-10  Hale Wang  
>> Andre Vieira  
>>
>>   * config/arm/lib1funcs.S: Add new wrapper.
>>
>> 0001-integer-division.patch
>>
>>
>> From 832a3d6af6f06399f70b5a4ac3727d55960c93b7 Mon Sep 17 00:00:00 2001
>> From: Andre Simoes Dias Vieira 
>> Date: Fri, 21 Aug 2015 14:23:28 +0100
>> Subject: [PATCH] new wrapper idivmod
>>
>> ---
>>  libgcc/config/arm/lib1funcs.S | 250 
>> --
>>  1 file changed, 217 insertions(+), 33 deletions(-)
>>
>> diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
>> index 
>> 252efcbd5385cc58a5ce1e48c6816d36a6f4c797..c9e544114590da8cde88382bea0f67206e593816
>>  100644
>> --- a/libgcc/config/arm/lib1funcs.S
>> +++ b/libgcc/config/arm/lib1funcs.S
>> @@ -306,34 +306,12 @@ LSYM(Lend_fde):
>>  #ifdef __ARM_EABI__
>>  .macro THUMB_LDIV0 name signed
>>  #if defined(__ARM_ARCH_6M__)
>> -.ifc \signed, unsigned
>> -cmp r0, #0
>> -beq 1f
>> -mov r0, #0
>> -mvn r0, r0  @ 0x
>> -1:
>> -.else
>> -cmp r0, #0
>> -beq 2f
>> -blt 3f
>> +
>> +push{r0, lr}
>>  mov r0, #0
>> -mvn r0, r0
>> -lsr r0, r0, #1  @ 0x7fff
>> -b   2f
>> -3:  mov r0, #0x80
>> -lsl r0, r0, #24 @ 0x8000
>> -2:
>> -.endif
>> -push{r0, r1, r2}
>> -ldr r0, 4f
>> -adr r1, 4f
>> -add r0, r1
>> -str r0, [sp, #8]
>> -@ We know we are not on armv4t, so pop pc is safe.
>> -pop {r0, r1, pc}
>> -.align  2
>> -4:
>> -.word   __aeabi_idiv0 - 4b
>> +bl  SYM(__aeabi_idiv0)
>> +pop {r1, pc}
>> +
> 
> I'd still retain the comment about pop pc here because there's often a 
> misconception of merging armv4t and armv6m code.
> 
>>  #elif defined(__thumb2__)
>>  .syntax unified
>>  .ifc \signed, unsigned
>> @@ -945,7 +923,170 @@ LSYM(Lover7):
>>  add dividend, work
>>.endif
>>  LSYM(Lgot_result):
>> -.endm   
>> +.endm
>> +
>> +#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__)
>> +/* If performance is preferred, the following functions are provided.  */
>> +
> 
> Comment above #if please and also check elsewhere in patch.
> 
>> +/* Branch to div(n), and jump to label if curbit is lo than divisior.  */
>> +.macro BranchToDiv n, label
>> +lsr curbit, dividend, \n
>> +cmp curbit, divisor
>> +blo \label
>> +.endm
>> +
>> +/* Body of div(n).  Shift the divisor in n bits and compare the divisor
>> +   and dividend.  Update the dividend as the substruction result.  */
>> +.macro DoDiv n
>> +lsr curbit, dividend, \n
>> +cmp curbit, divisor
>> +bcc 1f
>> +lsl curbit, divisor, \n
>> +sub dividend, dividend, curbit
>> +
>> +1:  adc result, result
>> +.endm
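
For readers following the DoDiv macro above: it is one step of a restoring
shift-and-subtract division.  A minimal C sketch of the same idea, assuming
a plain 32-bit unsigned division that runs one step per bit (the function
name udiv_shift_subtract is hypothetical and not part of the patch; the
patch itself unrolls and skips steps via BranchToDiv):

```c
#include <stdint.h>

/* Hypothetical illustration, not code from the patch.  Each iteration
   mirrors one DoDiv step: compare (dividend >> n) with the divisor; if it
   is not lower, subtract (divisor << n) and shift a 1 into the result,
   otherwise shift in a 0 (the role of "adc result, result").
   The caller must handle divisor == 0 separately.  */
static uint32_t
udiv_shift_subtract (uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
  uint32_t result = 0;

  for (int n = 31; n >= 0; n--)
    {
      uint32_t bit = 0;

      if ((dividend >> n) >= divisor)
        {
          /* Cannot overflow: divisor << n <= dividend here.  */
          dividend -= divisor << n;
          bit = 1;
        }
      result = (result << 1) | bit;
    }

  *remainder = dividend;
  return result;
}
```

Calling it with 100 and 7 yields quotient 14 and remainder 2.  The assembly
version avoids the full 32-step loop by first finding the highest useful
shift, which this sketch deliberately leaves out.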
>> +
>> +/* The body of division with positive divisor.  Unless the divisor is very
>> +   big, shift it up in multiples of four bits, since this is the amount of
>> +   unwinding in the main division loop.  Continue shifting until the divisor
>> +   is larger than the dividend.  */
>> +.macro THUMB1_Div_Positive
>> +mov result, #0
>> +BranchToDiv #1, LSYM(Lthumb1_div1)
>> +BranchToDiv #4, LSYM(Lthumb1_div4)
>> +BranchToDiv #8, LSYM(Lthumb1_div8)
>> +BranchToDiv #12, LSYM(Lthumb1_div12)
>> +BranchToDiv #16, LSYM(Lthumb1_div16)
>> +LSYM(Lthumb1_div_large_positive):
>> +mov result, #0xff
>> +lsl divisor, divisor, #8
>> +rev result, result
>> +lsr curbit, dividend, #16
>> +cmp curbit, divisor
>> +blo 1f
>> +asr result, #8
>> +lsl divisor, divisor, #8
>> +beq LSYM(Ldivbyzero_waypoint)
>> +
>> +1:  lsr curbit, dividend, #12
>> +cmp curbit, divisor
>> +blo LSYM(Lthumb1_div12)
>> +b   LSYM(Lthumb1_div16)
>> +LSYM(Lthumb1_div_loop):
>> +lsr divisor, divisor, 

Re: [patch] Fix type merging deficiency during WPA

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 12:21 PM, Richard Biener
 wrote:
> On Wed, Jul 6, 2016 at 11:33 AM, Eric Botcazou  wrote:
>>> I see.  I think the solution is to perform cgraph/varpool merging
>>> before attempting to read in
>>> the global decl stream.  IIRC Micha had (old) patches for this.
>>
>> How can you merge varpool nodes if you haven't merged types?

You merge them just in the way the linker instructs you via the
resolution table.

>>> But I wonder why we don't tree-merge 'n' here (from my C example) and
>>> thus figure
>>> that the type domain of x is equal?  Or is it that 'n' and 'x' are in
>>> the same SCC (they
>>> reference each other in some way)?  In this case the bug would be that we
>>> fail to treat them equal optimistically.  That said, I don't see how
>>> TYPE_CANONICAL computation is relevant - what is relevant is the failure to
>>> merge the two types.
>>> In debugging this I'd start to see if the hashes are not equal or if
>>> they are equal
>>> at which node we consider them to differ.
>>
>> We just have 2 different DECLs with different DECL_UIDs, the definition from a
>> compilation unit and a reference from another compilation unit, so the hashes
>> naturally differ too.  What's supposed to have them reconciled at this point?
>
> I am talking about tree/SCC merging which happily merges global decls as well.
> It uses custom hashing (see lto-streamer-out.c:hash_tree) which doesn't hash
> DECL_UID (obviously).  This merging process should be optimistic for all nodes
> in the same SCC as well.
>
> That said, I expect the types to be tree merged and wonder why they are not.
>
> If they were merged they'd obviously share TYPE_CANONICAL because there
> would be only one type to compute TYPE_CANONICAL for.

It probably boils down to one unit referring to the DECL with DECL_EXTERNAL set
and one to the DECL with TREE_STATIC set?  This would mean that hashing/comparing
should be set up to "merge" those, but the merging is ultimately rejected by some
toplevel logic (so we get users merged).  But as said, early symtab merging would
fix this as well.

Richard.

> Richard.
>
>> The TYPE_CANONICAL computation is relevant because, with GCC 6, the criterion
>> for compatibility of pointer types is the alias set, which is based on the
>> TYPE_CANONICAL of the pointed-to type, so we fail to merge pointer types
>> because warn_type_compatibility_p returns non-zero if TYPE_CANONICAL differs.
>>
>> --
>> Eric Botcazou


Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 11:50 AM, Yuri Rumyantsev  wrote:
> Richard,
>
> I pointed out in the commentary that REF is defined inside loop and
> this check is related to this statement. Should I clarify it?
>
> +  /* We consider REF defined in LOOP as independent if at least 2 loop
> + iterations are not dependent.  */

Yes, I fail to see why x[0] should not be disambiguated against y[i] in

#pragma GCC loop ivdep
  for (i...)
{
  y[i] = ...;
  for (j...)
... = x[0];
}

REF is always inside the loop nest with LOOP being the outermost loop.
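
A compilable version of the loop nest above, as a sketch only: the function
name, the element types and the use of "#pragma GCC ivdep" (the spelling GCC
documents for C) are my assumptions, not taken from the patch.  Whether the
load of x[0] actually gets hoisted out of the nest is up to the compiler.

```c
/* Hypothetical illustration of the case under discussion: the store to
   y[i] is in the outer loop, the invariant load of x[0] is in the inner
   loop.  With ivdep on the outer loop, x[0] is a candidate for being
   disambiguated against y[i] and moved out of the whole nest.  */
static int
ivdep_nest (int *y, const int *x, int n, int m)
{
  int sum = 0;

#pragma GCC ivdep
  for (int i = 0; i < n; i++)
    {
      y[i] = i;
      for (int j = 0; j < m; j++)
        sum += x[0];
    }

  return sum;
}
```

With distinct arrays, x[0] == 3, n == 4 and m == 2 the function returns 24
and leaves y holding 0, 1, 2, 3.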

Richard.

>
> 2016-07-06 12:38 GMT+03:00 Richard Biener :
>> On Tue, Jul 5, 2016 at 4:56 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is a simple fix to cure regressions introduced by my fix for
>>> 70729. Patch also contains minor changes in test found by Jakub.
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>
>> +  && bitmap_bit_p (_accesses.refs_in_loop[loop->num], ref->id))
>>
>> So safelen does not apply to refs in nested loops?  The added comment only
>> explains the safelen check change but this also requires explanation.
>>
>> Richard.
>>
>>> ChangeLog:
>>> 2016-07-05  Yuri Rumyantsev  
>>>
>>> PR tree-optimization/71734
>>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF defined in
>>> LOOP as independent if at least two loop iterations are not dependent.
>>> gcc/testsuite/ChangeLog:
>>> * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.


Re: [patch] Fix type merging deficiency during WPA

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 11:33 AM, Eric Botcazou  wrote:
>> I see.  I think the solution is to perform cgraph/varpool merging
>> before attempting to read in
>> the global decl stream.  IIRC Micha had (old) patches for this.
>
> How can you merge varpool nodes if you haven't merged types?
>
>> But I wonder why we don't tree-merge 'n' here (from my C example) and
>> thus figure
>> that the type domain of x is equal?  Or is it that 'n' and 'x' are in
>> the same SCC (they
>> reference each other in some way)?  In this case the bug would be that we
>> fail to treat them equal optimistically.  That said, I don't see how
>> TYPE_CANONICAL computation is relevant - what is relevant is the failure to
>> merge the two types.
>> In debugging this I'd start to see if the hashes are not equal or if
>> they are equal
>> at which node we consider them to differ.
>
> We just have 2 different DECLs with different DECL_UIDs, the definition from a
> compilation unit and a reference from another compilation unit, so the hashes
> naturally differ too.  What's supposed to have them reconciled at this point?

I am talking about tree/SCC merging which happily merges global decls as well.
It uses custom hashing (see lto-streamer-out.c:hash_tree) which doesn't hash
DECL_UID (obviously).  This merging process should be optimistic for all nodes
in the same SCC as well.

That said, I expect the types to be tree merged and wonder why they are not.

If they were merged they'd obviously share TYPE_CANONICAL because there
would be only one type to compute TYPE_CANONICAL for.

Richard.

> The TYPE_CANONICAL computation is relevant because, with GCC 6, the criterion
> for compatibility of pointer types is the alias set, which is based on the
> TYPE_CANONICAL of the pointed-to type, so we fail to merge pointer types
> because warn_type_compatibility_p returns non-zero if TYPE_CANONICAL differs.
>
> --
> Eric Botcazou


Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Jan Hubicka
> On Wed, 6 Jul 2016, Jan Hubicka wrote:
> 
> > > Jan Hubicka  writes:
> > > 
> > > > Bootstrapped/regtested x86_64-linux, OK?
> > > >
> > > > * gcc.dg/tree-ssa/scev-14.c: new testcase.
> > > 
> > > FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump-not ivopts "Overflowness 
> > > wrto loop niter:\tOverflow"
> > 
> > Aha, this is the wrong version of the testcase. The scan-tree-dump-not doesn't match
> > until we resolve decreasing unsigned iv variables as non-overlapping.  I will
> > test dropping that pattern for now. Sorry for that.
> 
> But did you test the patch?  You should have noticed this during your
> own testing...

Yeah, I managed to keep an unrelated change to ivopts in the tree that made the
testcase pass. Sorry for that.

Honza
> 
> Richard.


Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Richard Biener
On Wed, 6 Jul 2016, Jan Hubicka wrote:

> > Jan Hubicka  writes:
> > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > >
> > >   * gcc.dg/tree-ssa/scev-14.c: new testcase.
> > 
> > FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump-not ivopts "Overflowness 
> > wrto loop niter:\tOverflow"
> 
> Aha, this is the wrong version of the testcase. The scan-tree-dump-not doesn't match
> until we resolve decreasing unsigned iv variables as non-overlapping.  I will
> test dropping that pattern for now. Sorry for that.

But did you test the patch?  You should have noticed this during your
own testing...

Richard.


Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Jan Hubicka
> Jan Hubicka  writes:
> 
> > Bootstrapped/regtested x86_64-linux, OK?
> >
> > * gcc.dg/tree-ssa/scev-14.c: new testcase.
> 
> FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump-not ivopts "Overflowness wrto 
> loop niter:\tOverflow"

Aha, this is the wrong version of the testcase. The scan-tree-dump-not doesn't match
until we resolve decreasing unsigned iv variables as non-overlapping.  I will
test dropping that pattern for now. Sorry for that.

Honza


Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-06 Thread Yuri Rumyantsev
Richard,

I pointed out in the commentary that REF is defined inside loop and
this check is related to this statement. Should I clarify it?

+  /* We consider REF defined in LOOP as independent if at least 2 loop
+ iterations are not dependent.  */


2016-07-06 12:38 GMT+03:00 Richard Biener :
> On Tue, Jul 5, 2016 at 4:56 PM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is a simple fix to cure regressions introduced by my fix for
>> 70729. Patch also contains minor changes in test found by Jakub.
>>
>> Bootstrapping and regression testing did not show any new failures.
>>
>> Is it OK for trunk?
>
> +  && bitmap_bit_p (_accesses.refs_in_loop[loop->num], ref->id))
>
> So safelen does not apply to refs in nested loops?  The added comment only
> explains the safelen check change but this also requires explanation.
>
> Richard.
>
>> ChangeLog:
>> 2016-07-05  Yuri Rumyantsev  
>>
>> PR tree-optimization/71734
>> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF defined in
>> LOOP as independent if at least two loop iterations are not dependent.
>> gcc/testsuite/ChangeLog:
>> * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.


Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-06 Thread Richard Biener
On Tue, Jul 5, 2016 at 4:56 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is a simple fix to cure regressions introduced by my fix for
> 70729. Patch also contains minor changes in test found by Jakub.
>
> Bootstrapping and regression testing did not show any new failures.
>
> Is it OK for trunk?

+  && bitmap_bit_p (_accesses.refs_in_loop[loop->num], ref->id))

So safelen does not apply to refs in nested loops?  The added comment only
explains the safelen check change but this also requires explanation.

Richard.

> ChangeLog:
> 2016-07-05  Yuri Rumyantsev  
>
> PR tree-optimization/71734
> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF defined in
> LOOP as independent if at least two loop iterations are not dependent.
> gcc/testsuite/ChangeLog:
> * g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.


Re: [patch] Fix type merging deficiency during WPA

2016-07-06 Thread Eric Botcazou
> I see.  I think the solution is to perform cgraph/varpool merging
> before attempting to read in
> the global decl stream.  IIRC Micha had (old) patches for this.

How can you merge varpool nodes if you haven't merged types?

> But I wonder why we don't tree-merge 'n' here (from my C example) and
> thus figure
> that the type domain of x is equal?  Or is it that 'n' and 'x' are in
> the same SCC (they
> reference each other in some way)?  In this case the bug would be that we
> fail to treat them equal optimistically.  That said, I don't see how
> TYPE_CANONICAL computation is relevant - what is relevant is the failure to
> merge the two types.
> In debugging this I'd start to see if the hashes are not equal or if
> they are equal
> at which node we consider them to differ.

We just have 2 different DECLs with different DECL_UIDs, the definition from a 
compilation unit and a reference from another compilation unit, so the hashes 
naturally differ too.  What's supposed to have them reconciled at this point?

The TYPE_CANONICAL computation is relevant because, with GCC 6, the criterion 
for compatibility of pointer types is the alias set, which is based on the 
TYPE_CANONICAL of the pointed-to type, so we fail to merge pointer types 
because warn_type_compatibility_p returns non-zero if TYPE_CANONICAL differs.

-- 
Eric Botcazou


Re: [patch] Fix type merging deficiency during WPA

2016-07-06 Thread Richard Biener
On Wed, Jul 6, 2016 at 9:56 AM, Eric Botcazou  wrote:
>> So this is sth like (invalid C)
>>
>> t.h
>> ---
>> int n;
>> struct X { int x[n]; };
>>
>> t1.c
>> --
>> #include "t.h"
>> struct X x;
>> t2.c
>> --
>> #include "t.h"
>> struct X x;
>>
>> ?
>
> Essentially yes, but with a single definition for 'n' and references to it.
>
>> It's not obvious from the fix (which I think is in the wrong place)
>> which operand_equal/hash
>> call during WPA this is supposed to fix.  So can you please provide a
>> little more context here?
>
> It's called during the canonical type computation invoked from lto_read_decls:
>
>   /* Compute the canonical type of all types.
>  ???  Should be able to assert that !TYPE_CANONICAL.  */
>   if (TYPE_P (t) && !TYPE_CANONICAL (t))
> {
>   gimple_register_canonical_type (t);
>   if (odr_type_p (t))
> register_odr_type (t);
> }
>
> In particular for VLAs:
>
>   /* For array types hash the domain bounds and the string flag.  */
>   if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
> {
>   hstate.add_int (TYPE_STRING_FLAG (type));
>   /* OMP lowering can introduce error_mark_node in place of
>  random local decls in types.  */
>   if (TYPE_MIN_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
> inchash::add_expr (TYPE_MIN_VALUE (TYPE_DOMAIN (type)), hstate);
>   if (TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
> inchash::add_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
> }

I see.  I think the solution is to perform cgraph/varpool merging
before attempting to read in
the global decl stream.  IIRC Micha had (old) patches for this.

But I wonder why we don't tree-merge 'n' here (from my C example) and
thus figure
that the type domain of x is equal?  Or is it that 'n' and 'x' are in
the same SCC (they
reference each other in some way)?  In this case the bug would be that we fail to
treat them equal optimistically.  That said, I don't see how TYPE_CANONICAL
computation is relevant - what is relevant is the failure to merge the
two types.
In debugging this I'd start to see if the hashes are not equal or if
they are equal
at which node we consider them to differ.

Richard.

> --
> Eric Botcazou


Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Andreas Schwab
Jan Hubicka  writes:

> Bootstrapped/regtested x86_64-linux, OK?
>
>   * gcc.dg/tree-ssa/scev-14.c: new testcase.

FAIL: gcc.dg/tree-ssa/scev-14.c scan-tree-dump-not ivopts "Overflowness wrto 
loop niter:\tOverflow"

$ grep Overflowness scev-14.c.155t.ivopts 
  Overflowness wrto loop niter: No-overflow
  Overflowness wrto loop niter: No-overflow
  Overflowness wrto loop niter: Overflow
  Overflowness wrto loop niter: Overflow
  Overflowness wrto loop niter: No-overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow
Overflowness wrto loop niter:   Overflow

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, 5 Jul 2016, Jan Hubicka wrote:
>
>> > On Tue, 5 Jul 2016, Richard Biener wrote:
>> > 
>> > >given widest_int has only precision of TImode on x86_64?
>> > 
>> > Is that the case? The comments say:
>> > 
>> >  It is really finite precision math where the precision is 4 times the
>> >  size of the largest integer that the target port can represent.
>> > 
>> > And the target has
>> > 
>> > /* Keep the OI and XI modes from confusing the compiler into thinking
>> >that these modes could actually be used for computation.  They are
>> >only holders for vectors during data movement.  */
>> > #define MAX_BITSIZE_MODE_ANY_INT (128)
>> > 
>> > I would thus expect widest_int to have at least 512 bits on x86_64
>> > (possibly more depending on the exact definition of largest
>> > integer).
>> 
>> I think that comment is just confusing. (I got trapped by it, too)

FWIW, I think the comment dates from Kenny's original patch, where
widest_int really was suitable as an approximation of "infinite"
precision.  That got changed during review because of space concerns.

>> /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
>>early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
>>can accomodate at least 1 more bit so that unsigned numbers of that
>>mode can be represented as a signed value.  Note that it is still
>>possible to create fixed_wide_ints that have precisions greater than
>>MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
>>double-width multiplication result, for example.  */
>> 
>> #define WIDE_INT_MAX_ELTS \
>>   ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT)
>> 
>> #define WIDE_INT_MAX_PRECISION (WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)
>> 
>> typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION) widest_int;
>> 
>> My reading is that the type will end up being 128+64 bits, but there is only
>> one extra bit guaranteed in general, which is taken by the sign.
>
> Yeah, I think the other comment should be adjusted accordingly.  I
> didn't remember we have that one extra bit either ... ;)  (given wide-ints
> have unsigned variants of ops I wonder if it is really necessary, but who
> knows - the wide-int rep w/o a sign is really something odd and I blame RTL for it).

Hmm, yeah.  We definitely need the extra bit for widest_int, but I'm
not sure why we need it for wide_int.

Thanks,
Richard


Re: [PATCH][AArch64] Improve Cortex-A53 integer scheduler

2016-07-06 Thread Richard Earnshaw (lists)
On 05/07/16 16:00, Wilco Dijkstra wrote:
> This patch improves the accuracy of the Cortex-A53 integer scheduler,
> resulting in performance gains across a wide range of benchmarks.
> 
> OK for commit?
> 

OK.

R.

> ChangeLog:
> 2016-07-05  Wilco Dijkstra  
> 
>   * config/arm/cortex-a53.md: Use final_presence_set for in-order.
>   (cortex_a53_shift): Add mov_shift.
>   (cortex_a53_shift_reg): Add new reservation for register shifts.
>   (cortex_a53_alu): Remove bfm.
>   (cortex_a53_alu_shift): Add bfm, remove mov_shift.
>   (cortex_a53_alu_extr): Add new reservation for EXTR.
>   (bypasses): Improve bypass modelling.
> 
> ---
> diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
> index 
> fc60bc26c7caf7e94064d7f292b877b12f333fca..70c0f4daabe0ccb8e32808f1af51f5460e087a18
>  100644
> --- a/gcc/config/arm/cortex-a53.md
> +++ b/gcc/config/arm/cortex-a53.md
> @@ -30,6 +30,7 @@
>  
>  (define_cpu_unit "cortex_a53_slot0" "cortex_a53")
>  (define_cpu_unit "cortex_a53_slot1" "cortex_a53")
> +(final_presence_set "cortex_a53_slot1" "cortex_a53_slot0")
>  
>  (define_reservation "cortex_a53_slot_any"
>   "cortex_a53_slot0\
> @@ -71,41 +72,43 @@
>  
>  (define_insn_reservation "cortex_a53_shift" 2
>(and (eq_attr "tune" "cortexa53")
> -   (eq_attr "type" "adr,shift_imm,shift_reg,mov_imm,mvn_imm"))
> +   (eq_attr "type" "adr,shift_imm,mov_imm,mvn_imm,mov_shift"))
>"cortex_a53_slot_any")
>  
> -(define_insn_reservation "cortex_a53_alu_rotate_imm" 2
> +(define_insn_reservation "cortex_a53_shift_reg" 2
>(and (eq_attr "tune" "cortexa53")
> -   (eq_attr "type" "rotate_imm"))
> -  "(cortex_a53_slot1)
> -   | (cortex_a53_single_issue)")
> +   (eq_attr "type" "shift_reg,mov_shift_reg"))
> +  "cortex_a53_slot_any+cortex_a53_hazard")
>  
>  (define_insn_reservation "cortex_a53_alu" 3
>(and (eq_attr "tune" "cortexa53")
> (eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,
>   alu_sreg,alus_sreg,logic_reg,logics_reg,
>   adc_imm,adcs_imm,adc_reg,adcs_reg,
> - bfm,csel,clz,rbit,rev,alu_dsp_reg,
> - mov_reg,mvn_reg,
> - mrs,multiple,no_insn"))
> + csel,clz,rbit,rev,alu_dsp_reg,
> + mov_reg,mvn_reg,mrs,multiple,no_insn"))
>"cortex_a53_slot_any")
>  
>  (define_insn_reservation "cortex_a53_alu_shift" 3
>(and (eq_attr "tune" "cortexa53")
> (eq_attr "type" "alu_shift_imm,alus_shift_imm,
>   crc,logic_shift_imm,logics_shift_imm,
> - alu_ext,alus_ext,
> - extend,mov_shift,mvn_shift"))
> + alu_ext,alus_ext,bfm,extend,mvn_shift"))
>"cortex_a53_slot_any")
>  
>  (define_insn_reservation "cortex_a53_alu_shift_reg" 3
>(and (eq_attr "tune" "cortexa53")
> (eq_attr "type" "alu_shift_reg,alus_shift_reg,
>   logic_shift_reg,logics_shift_reg,
> - mov_shift_reg,mvn_shift_reg"))
> + mvn_shift_reg"))
>"cortex_a53_slot_any+cortex_a53_hazard")
>  
> -(define_insn_reservation "cortex_a53_mul" 3
> +(define_insn_reservation "cortex_a53_alu_extr" 3
> +  (and (eq_attr "tune" "cortexa53")
> +   (eq_attr "type" "rotate_imm"))
> +  "cortex_a53_slot1|cortex_a53_single_issue")
> +
> +(define_insn_reservation "cortex_a53_mul" 4
>(and (eq_attr "tune" "cortexa53")
> (ior (eq_attr "mul32" "yes")
>   (eq_attr "mul64" "yes")))
> @@ -189,49 +192,43 @@
>  (define_insn_reservation "cortex_a53_branch" 0
>(and (eq_attr "tune" "cortexa53")
> (eq_attr "type" "branch,call"))
> -  "cortex_a53_slot_any,cortex_a53_branch")
> +  "cortex_a53_slot_any+cortex_a53_branch")
>  
>  
>  ;; General-purpose register bypasses
>  
>  
> -;; Model bypasses for unshifted operands to ALU instructions.
> +;; Model bypasses for ALU to ALU instructions.
>  
> -(define_bypass 1 "cortex_a53_shift"
> -  "cortex_a53_shift")
> +(define_bypass 0 "cortex_a53_shift*"
> +  "cortex_a53_alu")
>  
> -(define_bypass 1 "cortex_a53_alu,
> -   cortex_a53_alu_shift*,
> -   cortex_a53_alu_rotate_imm,
> -   cortex_a53_shift"
> +(define_bypass 1 "cortex_a53_shift*"
> +  "cortex_a53_shift*,cortex_a53_alu_*")
> +
> +(define_bypass 1 "cortex_a53_alu*"
>"cortex_a53_alu")
>  
> -(define_bypass 2 "cortex_a53_alu,
> -   cortex_a53_alu_shift*"
> +(define_bypass 1 "cortex_a53_alu*"
>"cortex_a53_alu_shift*"
>"aarch_forward_to_shift_is_not_shifted_reg")
>  
> -;; In our model, we allow any general-purpose register operation to
> -;; bypass to the accumulator operand of an integer 

Re: [BUILDROBOT] Selftest failed for i686-wrs-vxworks

2016-07-06 Thread Jan-Benedict Glaw
On Thu, 2016-06-30 16:09:23 -0400, David Malcolm  wrote:
> On Thu, 2016-06-30 at 08:38 -0400, Nathan Sidwell wrote:
> > > I haven't given it any additional manual testing so far. It's
> > > pre-installation though. Maybe I'd just set WIND_BASE to some
> > > arbitrary value, just to make xgcc pass its initial start-up
> > > test so that it can continue with self-testing? Or shall we set
> > > some value in gcc/Makefile.in for running the self-test?
> > 
> > As I recall, WIND_BASE is expected to point at a vxworks install
> > to pick up header files.  It is an error not to have it set
> > (silently skipping it leads to user confusion).
> > 
> > If that's irrelevant for this testing environment, then setting it
> > to something (probably just "", but safer might be
> > "/These.are.not.the.dirs.you.are.looking.for") should be fine.
> 
> Sorry about the breakage.

You just uncovered it :)

> The error message appears to affect a few other targets within
> gcc/Makefile.in.

[...]
> 
> So there are at least 2 ways of fixing this:
> 
> (a) add "-nostdinc" when running the selftests i.e. to the invocations
> of GCC_FOR_TARGET in the "s-selftest" and "selftest-gdb" clauses of
> gcc/Makefile.in.
> I've verified that this fixes the issue for --target=i686-wrs-vxworks.
> 
> (b) set WIND_BASE to a dummy value in contrib/config-list.mk (if not
> already set) so that the vxworks targets are able to at least build all
> of "gcc" without needing a vxworks install.

I'd probably just do (b) and go for it. Easy, non-invasive fix.

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
 Signature of:Don't believe in miracles: Rely on them!
 the second  :




Re: [patch] Fix type merging deficiency during WPA

2016-07-06 Thread Eric Botcazou
> So this is sth like (invalid C)
> 
> t.h
> ---
> int n;
> struct X { int x[n]; };
> 
> t1.c
> --
> #include "t.h"
> struct X x;
> t2.c
> --
> #include "t.h"
> struct X x;
> 
> ?

Essentially yes, but with a single definition for 'n' and references to it.

> It's not obvious from the fix (which I think is in the wrong place)
> which operand_equal/hash
> call during WPA this is supposed to fix.  So can you please provide a
> little more context here?

It's called during the canonical type computation invoked from lto_read_decls:

  /* Compute the canonical type of all types.
 ???  Should be able to assert that !TYPE_CANONICAL.  */
  if (TYPE_P (t) && !TYPE_CANONICAL (t))
{
  gimple_register_canonical_type (t);
  if (odr_type_p (t))
register_odr_type (t);
}

In particular for VLAs:

  /* For array types hash the domain bounds and the string flag.  */
  if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
{
  hstate.add_int (TYPE_STRING_FLAG (type));
  /* OMP lowering can introduce error_mark_node in place of
 random local decls in types.  */
  if (TYPE_MIN_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
inchash::add_expr (TYPE_MIN_VALUE (TYPE_DOMAIN (type)), hstate);
  if (TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != error_mark_node)
inchash::add_expr (TYPE_MAX_VALUE (TYPE_DOMAIN (type)), hstate);
}

-- 
Eric Botcazou


Re: Use iv_can_overflow_p in ivopts

2016-07-06 Thread Richard Biener
On Tue, 5 Jul 2016, Jan Hubicka wrote:

> Hi,
> this patch makes ivopts use iv_can_overflow_p on its candidates. This helps
> to determine if a candidate wraps in case it does not directly originate from an
> IV variable (i.e. it is a derived IV or an artificial one). For those we can not
> use type information because we do not know if they are going to be computed
> each iteration. We can still use the iv_can_overflow_p analysis.
> 
> I also wrote code that propagates the overflow flag from original IVs to derived
> ones and it does improve some real world benchmarks. This patch alone seems
> quite benchmark neutral but I would like to proceed in smaller steps.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
>   * tree-scalar-evolution.c (iv_can_overflow_p): Export.
>   * tree-scalar-evolution.h (iv_can_overflow_p): Declare.
>   * tree-ssa-loop-ivopts.c (alloc_iv): Use it.
> 
> Index: tree-scalar-evolution.c
> ===
> --- tree-scalar-evolution.c   (revision 238012)
> +++ tree-scalar-evolution.c   (working copy)
> @@ -3317,7 +3317,7 @@ scev_reset (void)
> use this test even for derived IVs not computed every iteration or
> hypotetical IVs to be inserted into code.  */
>  
> -static bool
> +bool
>  iv_can_overflow_p (struct loop *loop, tree type, tree base, tree step)
>  {
>widest_int nit;
> Index: tree-scalar-evolution.h
> ===
> --- tree-scalar-evolution.h   (revision 238005)
> +++ tree-scalar-evolution.h   (working copy)
> @@ -38,6 +38,7 @@ extern unsigned int scev_const_prop (voi
>  extern bool expression_expensive_p (tree);
>  extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv 
> *,
>  bool);
> +extern bool iv_can_overflow_p (struct loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (struct loop *, tree);
>  
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
> Index: tree-ssa-loop-ivopts.c
> ===
> --- tree-ssa-loop-ivopts.c(revision 238005)
> +++ tree-ssa-loop-ivopts.c(working copy)
> @@ -1181,6 +1182,9 @@ alloc_iv (struct ivopts_data *data, tree
>iv->biv_p = false;
>iv->nonlin_use = NULL;
>iv->ssa_name = NULL_TREE;
> +  if (!no_overflow && !iv_can_overflow_p (data->current_loop, TREE_TYPE 
> (base),
> +   base, step))

please put the && to the next line.

Ok with that change.
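
As an aside, the question the quoted description asks ("does this candidate
wrap?") can be sketched in plain C.  This is only an illustration under my
own assumptions (a 32-bit unsigned IV, a known iteration count, a
non-negative step); it is not how iv_can_overflow_p is implemented, and the
name iv_wraps_u32 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration: an IV {base, +, step} evaluated NITER times
   in a 32-bit unsigned type wraps if its final value does not fit in 32
   bits.  The arithmetic is done in 64 bits, which cannot itself overflow:
   (2^32 - 1) * (2^32 - 1) + (2^32 - 1) < 2^64.  */
static bool
iv_wraps_u32 (uint32_t base, uint32_t step, uint32_t niter)
{
  uint64_t last = (uint64_t) base + (uint64_t) step * niter;

  return last > UINT32_MAX;
}
```

For example, base 0 with step 1 over 100 iterations does not wrap, while
base 0xfffffff0 with step 1 over 0x20 iterations does.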

Richard.

> +no_overflow = true;
>iv->no_overflow = no_overflow;
>iv->have_address_use = false;
>  
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Determine more IVs to be non-overflowing

2016-07-06 Thread Richard Biener
On Tue, 5 Jul 2016, Jan Hubicka wrote:

> > On Tue, 5 Jul 2016, Richard Biener wrote:
> > 
> > >given widest_int has only precision of TImode on x86_64?
> > 
> > Is that the case? The comments say:
> > 
> >  It is really finite precision math where the precision is 4 times the
> >  size of the largest integer that the target port can represent.
> > 
> > And the target has
> > 
> > /* Keep the OI and XI modes from confusing the compiler into thinking
> >that these modes could actually be used for computation.  They are
> >only holders for vectors during data movement.  */
> > #define MAX_BITSIZE_MODE_ANY_INT (128)
> > 
> > I would thus expect widest_int to have at least 512 bits on x86_64
> > (possibly more depending on the exact definition of largest
> > integer).
> 
> I think that comment is just confusing. (I got trapped by it, too)
> 
> /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
>early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
>can accomodate at least 1 more bit so that unsigned numbers of that
>mode can be represented as a signed value.  Note that it is still
>possible to create fixed_wide_ints that have precisions greater than
>MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
>double-width multiplication result, for example.  */
> 
> #define WIDE_INT_MAX_ELTS \
>   ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT)
> 
> #define WIDE_INT_MAX_PRECISION (WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)
> 
> typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION) widest_int;
> 
> My reading is that the type will end up being 128+64 bits, but there is only
> one extra bit guaranteed in general, which is taken by the sign.

Yeah, I think the other comment should be adjusted accordingly.  I
didn't remember we have that one extra bit either ... ;)  (given wide-ints
have unsigned variants of ops I wonder if it is really necessary, but who
knows - the wide-int rep w/o a sign is really something odd and I blame RTL for it).
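
To make the "128+64 bits" reading concrete, here is the arithmetic of the
two quoted macros as a small C sketch.  The helper names are hypothetical,
and the inputs (128 for MAX_BITSIZE_MODE_ANY_INT, 64 for
HOST_BITS_PER_WIDE_INT) are the x86_64 values assumed in this thread.

```c
/* Hypothetical helpers mirroring the quoted macro definitions.  */

/* (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT)
   / HOST_BITS_PER_WIDE_INT  */
static unsigned
wide_int_max_elts (unsigned max_bitsize_mode_any_int,
                   unsigned host_bits_per_wide_int)
{
  return (max_bitsize_mode_any_int + host_bits_per_wide_int)
         / host_bits_per_wide_int;
}

/* WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT  */
static unsigned
wide_int_max_precision (unsigned max_bitsize_mode_any_int,
                        unsigned host_bits_per_wide_int)
{
  return wide_int_max_elts (max_bitsize_mode_any_int, host_bits_per_wide_int)
         * host_bits_per_wide_int;
}
```

With 128 and 64 this gives 3 elements and a precision of 192 bits, i.e.
128+64.  With a hypothetical maximum of 100 bits it gives 2 elements and
128 bits, which shows that the amount of headroom varies even though at
least one extra bit is always there.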

Richard.

> Honza
> > 
> > -- 
> > Marc Glisse
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[Committed] S/390: Fix vecinit expansion.

2016-07-06 Thread Andreas Krebbel
The fallback routine in the S/390 vecinit expander did not check
whether each of the initializer elements is a proper general_operand.
Since revision r236582 the expander is invoked also with e.g. symbol
refs with an odd addend resulting in invalid insns.

Fixed by forcing the element into a register in such cases.

gcc/ChangeLog:

2016-07-06  Andreas Krebbel  

* config/s390/s390.c (s390_expand_vec_init): Force initializer
element to register if it doesn't match general_operand.
---
 gcc/ChangeLog  |  5 +
 gcc/config/s390/s390.c | 16 +++-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f309904..b248acd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-07-06  Andreas Krebbel  
+
+   * config/s390/s390.c (s390_expand_vec_init): Force initializer
+   element to register if it doesn't match general_operand.
+
 2016-07-05  Michael Meissner  
Bill Schmidt  
 
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ee0187c..9d2b2c0 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6443,11 +6443,17 @@ s390_expand_vec_init (rtx target, rtx vals)
   /* Unfortunately the vec_init expander is not allowed to fail.  So
  we have to implement the fallback ourselves.  */
   for (i = 0; i < n_elts; i++)
-emit_insn (gen_rtx_SET (target,
-   gen_rtx_UNSPEC (mode,
-   gen_rtvec (3, XVECEXP (vals, 0, i),
-  GEN_INT (i), target),
-   UNSPEC_VEC_SET)));
+{
+  rtx elem = XVECEXP (vals, 0, i);
+  if (!general_operand (elem, GET_MODE (elem)))
+   elem = force_reg (inner_mode, elem);
+
+  emit_insn (gen_rtx_SET (target,
+ gen_rtx_UNSPEC (mode,
+ gen_rtvec (3, elem,
+GEN_INT (i), target),
+ UNSPEC_VEC_SET)));
+}
 }
 
 /* Structure to hold the initial parameters for a compare_and_swap operation
-- 
1.9.1