Re: Speedup configure and build with system.h

2016-01-22 Thread Jakub Jelinek
On Fri, Jan 22, 2016 at 12:09:43PM -0800, H.J. Lu wrote:
> > * system.h (string, algorithm): Include only conditionally.
> > (new): Include always under C++.
> > * bb-reorder.c (toplevel): Define USES_ALGORITHM.
> > * final.c (toplevel): Ditto.
> > * ipa-chkp.c (toplevel): Define USES_STRING.
> > * genconditions.c (write_header): Make gencondmd.c define
> > USES_STRING.
> > * mem-stats.h (mem_usage::print_dash_line): Don't use std::string.
> >
> 
> This may have caused:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69434

Guess we need:

2016-01-22  Jakub Jelinek  

PR bootstrap/69434
* genrecog.c: Define INCLUDE_ALGORITHM before including system.h,
remove <algorithm> include.

--- gcc/genrecog.c.jj   2016-01-04 18:50:33.207491883 +0100
+++ gcc/genrecog.c  2016-01-22 21:21:42.852362294 +0100
@@ -105,6 +105,7 @@
5. Write out C++ code for each function.  */
 
 #include "bconfig.h"
+#define INCLUDE_ALGORITHM
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -112,7 +113,6 @@
 #include "errors.h"
 #include "read-md.h"
 #include "gensupport.h"
-#include <algorithm>
 
 #undef GENERATOR_FILE
 enum true_rtx_doe {


Jakub


[PATCH][PR tree-optimization/69347] Speedup DOM slightly

2016-01-22 Thread Jeff Law
So as noted in BZ69347 we have regressed a bit in the amount of time 
spent in DOM for the included testcase.  My previous patch helped a little, 
but there's still some low-hanging fruit.


One of the things added during this cycle was the ability to do a bit 
of secondary equivalence discovery: when we discover an equivalence 
due to an edge traversal, we can, in limited circumstances, propagate 
the known value backwards and discover additional equivalences.


That code turns out to be surprisingly expensive.  After a fair amount 
of poking with perf & gprof, a few things stick out.


While dominance testing is relatively cheap, when done often enough it 
gets expensive.


We can do a bit better.  Essentially we're walking over a set of 
immediate uses and seeing which of those uses dominate a given block.


Instead we can take the given block and compute its dominators into a 
bitmap.  We then check if the immediate use blocks are in the bitmap.


That turns out to be considerably faster; I believe that's because the 
immediate uses are typically clustered, so there's typically a single 
element in the bitmap.  Testing is two memory loads and a memory bit test.


Contrast that to 9 memory loads for dominated_by_p if I'm counting 
correctly.


That cuts the amount of time spent in DOM in half for the 69347 testcase.
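The transformation can be sketched as a standalone model (this is only an illustration, not GCC's sparse-bitmap or dominance API: the dominator tree is a hypothetical parent array, and the "bitmap" is a plain 64-bit word limited to 64 blocks):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dominator tree: idom[bb] is the immediate dominator of
   block bb, and the entry block has idom == -1.  */

/* Old scheme: walk up the dominator tree on every query, costing
   several dependent loads each time.  A block dominates itself.  */
static int dominated_by_p (const int *idom, int bb, int dom)
{
  for (; bb >= 0; bb = idom[bb])
    if (bb == dom)
      return 1;
  return 0;
}

/* New scheme: walk the tree once, recording every dominator of BB in a
   bitmap.  Each subsequent query is then just a shift and a mask.  */
static uint64_t dominator_set (const int *idom, int bb)
{
  uint64_t domby = 0;
  for (; bb >= 0; bb = idom[bb])
    domby |= (uint64_t) 1 << bb;
  return domby;
}

static int in_dominator_set (uint64_t domby, int bb)
{
  return (domby >> bb) & 1;
}
```

The one-time walk pays for itself as soon as the same destination block is tested against many immediate uses, which is exactly the pattern in the back-propagation loop.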

The second change is in cprop_into_successor_phis and was found as I was 
wandering the perf report.  Essentially SSA_NAME_VALUE will always be 
null, an SSA_NAME, or a constant (min_invariant).  Thus the test that 
it's an SSA_NAME || is_gimple_min_invariant is totally useless.


Bootstrapped and regression tested on x86.  Also verified that for my 
bucket of .i files, there was no change in the resulting code.


Given the major issues are resolved, I'm moving this to a P4.  There's 
still a compile-time regression that's not accounted for, but it's 
relatively small and I suspect it's related to the computed goto used 
for a dispatch table -- which is certainly supported, but not the 
typical code pushed through GCC.  The big remaining regression is 
intentional, as we've upped the parameter for when to avoid gcse.


Installed on the trunk.

Jeff

commit c3b7df35046db818c4c4b7d808b5a3471a0ee9b9
Author: Jeff Law 
Date:   Fri Jan 22 13:17:54 2016 -0700

PR middle-end/69347
* tree-ssa-dom.c (back_propagate_equivalences): Factored out of
record_temporary_equivalences.  Rewritten to avoid unnecessary calls
into dominated_by_p.
(cprop_into_successor_phis): Avoid unnecessary tests.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 85cde94..73df84f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2016-01-21  Jeff Law  
+
+   PR middle-end/69347
+   * tree-ssa-dom.c (back_propagate_equivalences): Factored out of
+   record_temporary_equivalences.  Rewritten to avoid unnecessary calls
+   into dominated_by_p.
+   (cprop_into_successor_phis): Avoid unnecessary tests.
+
 2016-01-22  Richard Henderson  
 
PR target/69416
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 84c9a6a..b690d92 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -819,6 +819,74 @@ dom_valueize (tree t)
   return t;
 }
 
+/* We have just found an equivalence for LHS on an edge E.
+   Look backwards to other uses of LHS and see if we can derive
+   additional equivalences that are valid on edge E.  */
+static void
+back_propagate_equivalences (tree lhs, edge e,
+                             class const_and_copies *const_and_copies)
+{
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  bitmap domby = NULL;
+  basic_block dest = e->dest;
+
+  /* Iterate over the uses of LHS to see if any dominate E->dest.
+     If so, they may create useful equivalences too.
+
+     ???  If the code gets re-organized to a worklist to catch more
+     indirect opportunities and it is made to handle PHIs then this
+     should only consider use_stmts in basic-blocks we have already visited.  */
+  FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
+    {
+      gimple *use_stmt = USE_STMT (use_p);
+
+      /* Often the use is in DEST, which we trivially know we can't use.
+	 This is cheaper than the dominator set tests below.  */
+      if (dest == gimple_bb (use_stmt))
+	continue;
+
+      /* Filter out statements that can never produce a useful
+	 equivalence.  */
+      tree lhs2 = gimple_get_lhs (use_stmt);
+      if (!lhs2 || TREE_CODE (lhs2) != SSA_NAME)
+	continue;
+
+      /* Profiling has shown the domination tests here can be fairly
+	 expensive.  We get significant improvements by building the
+	 set of blocks that dominate BB.  We can then just test
+	 for set membership below.
+
+	 We also initialize the set lazily since often the only uses
+	 are going to be in the same block as DEST.  */
+      if (!domby)
+	{
+

C++ PATCH for c++/69392 (ICE with 'this' init-capture)

2016-01-22 Thread Jason Merrill
We're already handling capture of 'this' specially, but we need to do 
that in the case of init-capture as well.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 3d101d4f78d0940b54838695df35874d18f827d0
Author: Jason Merrill 
Date:   Thu Jan 21 16:45:42 2016 -0500

	PR c++/69392
	* lambda.c (lambda_capture_field_type): Handle 'this' specially
	for init-capture, too.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 3b0ea18..93b192c 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -207,15 +207,8 @@ tree
 lambda_capture_field_type (tree expr, bool explicit_init_p)
 {
   tree type;
-  if (explicit_init_p)
-    {
-      type = make_auto ();
-      type = do_auto_deduction (type, expr, type);
-    }
-  else
-    type = non_reference (unlowered_expr_type (expr));
-  if (type_dependent_expression_p (expr)
-      && !is_this_parameter (tree_strip_nop_conversions (expr)))
+  bool is_this = is_this_parameter (tree_strip_nop_conversions (expr));
+  if (!is_this && type_dependent_expression_p (expr))
     {
       type = cxx_make_type (DECLTYPE_TYPE);
       DECLTYPE_TYPE_EXPR (type) = expr;
@@ -223,6 +216,13 @@ lambda_capture_field_type (tree expr, bool explicit_init_p)
   DECLTYPE_FOR_INIT_CAPTURE (type) = explicit_init_p;
   SET_TYPE_STRUCTURAL_EQUALITY (type);
 }
+  else if (!is_this && explicit_init_p)
+    {
+      type = make_auto ();
+      type = do_auto_deduction (type, expr, type);
+    }
+  else
+    type = non_reference (unlowered_expr_type (expr));
   return type;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-init14.C b/gcc/testsuite/g++.dg/cpp1y/lambda-init14.C
new file mode 100644
index 000..f7fffc5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-init14.C
@@ -0,0 +1,19 @@
+// PR c++/69392
+// { dg-do compile { target c++14 } }
+
+template <typename T>
+class Foo {
+  public:
+void foo(void) {}
+auto getCallableFoo(void) {
+  return
+[ptr = this]() { ptr->foo(); };
+}
+};
+
+int main()
+{
+  Foo<int> f;
+  auto callable = f.getCallableFoo();
+  callable();
+}


Re: C++ PATCH for c++/69379 (ICE with PTRMEM_CST wrapped in NOP_EXPR)

2016-01-22 Thread Jason Merrill

On 01/22/2016 12:20 PM, Marek Polacek wrote:

On Thu, Jan 21, 2016 at 01:49:14PM -0500, Jason Merrill wrote:

On 01/21/2016 01:25 PM, Marek Polacek wrote:

The problem in this PR is that we have a PTRMEM_CST wrapped in NOP_EXPR
and fold_convert can't digest that.


Why didn't we fold away the NOP_EXPR before calling fold_convert?  I guess
we shouldn't call fold_convert on an un-folded operand.


So we start with fargs[j] = maybe_constant_value (argarray[j]); in
build_over_call, where argarray[j] is
(const struct
{
   void Dict:: (struct Dict *, int) * __pfn;
   long int __delta;
} &) _EXPR >>
so we go to the
3607 case NOP_EXPR:
case.  Here cxx_eval_constant_expression evaluates the inner ptrmem_cst,
then there's
3619 if (TREE_CODE (op) == PTRMEM_CST
3620 && !TYPE_PTRMEM_P (type))
3621   op = cplus_expand_constant (op);
but that doesn't trigger, because type is TYPE_PTRMEM_P.
Then we fold () the whole expression but that doesn't change the expression
(and I don't think it should do anything with C++-specific PTRMEM_CST) so we're
stuck with NOP_EXPR around PTRMEM_CST.

So maybe cxx_eval_constant_expression should handle PTRMEM_CSTs wrapped in
NOP_EXPR specially, but I don't know how.


If we have a NOP_EXPR to the same type, we should strip it here.

Jason




Re: Speedup configure and build with system.h

2016-01-22 Thread H.J. Lu
On Thu, Jan 21, 2016 at 8:57 AM, Michael Matz  wrote:
> Hi,
>
> this has bothered me for some time.  The gcc configure with stage1 feels
> like taking forever because some of the decl availability tests (checking
> for C function) include system.h, and that, since a while, unconditionally
> includes <string> and <algorithm> under C++, and we meanwhile use the C++
> compiler for configure tests (which makes sense).  Now, the difference for
> a debuggable (but not even checking-enabled) cc1plus for a file containing
> just main():
>
> % cat blaeh.cc
> #include 
> #include 
> #include 
> #include 
> int main() {}
> % cc1plus -quiet -ftime-report blaeh.cc
>  TOTAL :   0.12 0.01 0.14
>
> (This is btw. three times as expensive as with the 4.8 headers; i.e.
> precompiling with g++-4.8 then compiling with the same cc1plus as above
> takes 0.04 seconds.  The STL headers bloat quite a lot over time.)
>
> Well, not quite blazing fast but then adding <string>:
>
> % cc1plus -quiet -ftime-report blaeh-string.cc
>  TOTAL :   0.60 0.05 0.66
>
> Meeh.  And adding  on top:
>
> % cc1plus -quiet -ftime-report blaeh-string-alg.cc
>  TOTAL :   1.13 0.09 1.23
>
> So, more than a second for checking if some C-only decl is available, just
> because system.h unconditionally includes mostly useless STL headers.
>
> So, how useless exactly?  A whopping single file of cc1 proper needs
> <string>, _two_ files need <algorithm>, and a single target has an unlucky
> interface in its prototypes and also needs <string>.  (One additional
> header lazily uses std::string for no particular reason).  So we pay about
> 5 minutes build time per stage (there are ~400 libbackend.a files) for
> more or less nothing.
>
> So, let's include those headers only conditionally; I'm pretty sure it's
> not unreasonable for a source file, if it needs a particular STL facility
> to #define USES_abcheader (like one normally would have to #include
> <abcheader>) before the "system.h" include.
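The opt-in scheme being proposed can be sketched in one file (a simplified illustration, not GCC's actual system.h; the INCLUDE_ALGORITHM/INCLUDE_STRING spellings follow the genrecog fix elsewhere in this digest, and max3 is a made-up consumer):

```cpp
#include <cassert>

// A source file announces which heavy headers it needs *before*
// including the umbrella header; everyone else skips them.
#define INCLUDE_ALGORITHM   // this file opts in to <algorithm>

// --- what the umbrella header's C++ section would look like ---
#ifdef __cplusplus
# ifdef INCLUDE_ALGORITHM
#  include <algorithm>
# endif
# ifdef INCLUDE_STRING
#  include <string>
# endif
# include <new>             // cheap and always wanted, so unconditional
#endif
// --------------------------------------------------------------

int max3 (int a, int b, int c)
{
  // std::max is only available because this file opted in above.
  return std::max (a, std::max (b, c));
}
```

Files that never define the macro simply don't pay the parse cost of the STL headers, which is the whole point of the patch.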
>
> See the patch.  I've grepped for target or language dependencies on other
> STL types, and either they were already including the right header, or
> were covered with the new system.h (i.e. I've built all targets quickly
> for which grepping for 'std::' returned anything).  The genconditions.c
> change is for the benefit of aarch64 as well, and its single function
> aarch64_get_extension_string_for_isa_flags returning a std::string.
>
> What do people think?  Should I pass it through a proper bootstrap and put
> it to trunk?  It's a (developer time) regression, right? ;-)
>
>
> Ciao,
> Michael.
> * system.h (string, algorithm): Include only conditionally.
> (new): Include always under C++.
> * bb-reorder.c (toplevel): Define USES_ALGORITHM.
> * final.c (toplevel): Ditto.
> * ipa-chkp.c (toplevel): Define USES_STRING.
> * genconditions.c (write_header): Make gencondmd.c define
> USES_STRING.
> * mem-stats.h (mem_usage::print_dash_line): Don't use std::string.
>

This may have caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69434

H.J.


[gomp4] fix atomic tests

2016-01-22 Thread Nathan Sidwell
These two tests presumed a particular ordering of atomic operation execution, 
which is kind of anathema to why you'd want atomic ops and parallelizing loops. 
I've removed the more obviously incorrect assumptions, but I have a suspicion 
the fortran one at least still contains undefined behaviour.


nathan
2016-01-22  Nathan Sidwell  

	* testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c: Don't
	assume atomic op ordering.
	* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90: Likewise.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c	(revision 232738)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c	(working copy)
@@ -783,31 +783,6 @@ main(int argc, char **argv)
   fgot = 1.0;
   fexp = 0.0;
 
-#pragma acc data copy (fgot, fdata[0:N])
-  {
-#pragma acc parallel loop
-for (i = 0; i < N; i++)
-  {
-float expr = 32.0;
-
-#pragma acc atomic capture
-fdata[i] = fgot = expr - fgot;
-  }
-  }
-
-  for (i = 0; i < N; i++)
-if (i % 2 == 0)
-  {
-	if (fdata[i] != 31.0)
-	  abort ();
-  }
-else
-  {
-	if (fdata[i] != 1.0)
-	  abort ();
-  }
-
-
   /* BINOP = / */
   fexp = 1.0;
   fgot = 8192.0*8192.0*64.0;
Index: libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
===
--- libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90	(revision 232738)
+++ libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90	(working copy)
@@ -257,7 +257,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= iexp - 1) call abort
   if (igot /= iexp) call abort
 
   igot = N
@@ -272,7 +271,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= iexp) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -288,7 +286,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= ibset (iexp, N - 1)) call abort
   if (igot /= iexp) call abort
 
   igot = 0
@@ -304,7 +301,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= ieor (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -320,7 +316,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= ior (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   igot = 1
@@ -335,7 +330,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= iexp - 1) call abort
   if (igot /= iexp) call abort
 
   igot = N
@@ -350,7 +344,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= iexp) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -366,7 +359,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= ibset (iexp, N - 1)) call abort
   if (igot /= iexp) call abort
 
   igot = 0
@@ -382,7 +374,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= ieor (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -398,7 +389,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (itmp /= ior (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   fgot = 1234.0
@@ -525,7 +515,6 @@ program main
 end do
   !$acc end parallel loop
 
-  if (ftmp /= fexp) call abort
   if (fgot /= fexp) call abort
 
   fgot = 1.0


Re: [PATCH] c++/58109 - alignas() fails to compile with constant expression

2016-01-22 Thread Jason Merrill

On 01/20/2016 06:04 PM, Martin Sebor wrote:

Right.  The problem is this code in is_late_template_attribute:


  /* If the first attribute argument is an identifier, only consider
 second and following arguments.  Attributes like mode, format,
 cleanup and several target specific attributes aren't late
 just because they have an IDENTIFIER_NODE as first
argument.  */
  if (arg == args && identifier_p (t))
continue;


It shouldn't skip an initial identifier if !attribute_takes_identifier_p.


That seems backwards. I expected attribute_takes_identifier_p()
to return true for attribute aligned since the attribute does
take one.


There are some attributes (mode, format, cleanup) that have magic 
handling of identifiers; aligned treats its argument as an expression 
whether or not that expression takes the form of an identifier.



In any case, I changed the patch as you suggest and retested it
on x86_64.  I saw the email about stage 3 having ended but I'm
not sure it applies to changes that are still in progress.


I wouldn't think so; certainly not for something this simple.  The patch 
is OK.


Jason



Re: [PATCH] Fix the remaining PR c++/24666 blockers (arrays decay to pointers too early)

2016-01-22 Thread Jason Merrill

On 01/22/2016 11:17 AM, Patrick Palka wrote:

On Thu, 21 Jan 2016, Patrick Palka wrote:

On Thu, 21 Jan 2016, Jason Merrill wrote:


On 01/19/2016 10:30 PM, Patrick Palka wrote:

 * g++.dg/template/unify17.C: XFAIL.


Hmm, I'm not comfortable with deliberately regressing this testcase.


  template 
-void bar (void (T[5])); // { dg-error "array of 'void'" }
+void bar (void (T[5])); // { dg-error "array of 'void'" "" { xfail
*-*-* } }


Can we work it so that T[5] also is un-decayed in the DECL_ARGUMENTS
of bar, but decayed in the TYPE_ARG_TYPES?


I think so, I'll try it.


Well, I tried it and the result is really ugly and it only "somewhat"
works.  (Maybe I'm just missing something obvious though.)  The ugliness
comes from the fact that decaying an array parameter type of a function
type embedded deep within some tree structure requires rebuilding all
the tree structures in between to avoid issues due to tree sharing.


Yes, that does complicate things.


This approach only "somewhat" works because it currently looks through
function, pointer, reference and array types.


Right, you would need to handle template arguments as well.


And I just noticed that
this approach does not work at all for USING_DECLs because no PARM_DECL
is ever retained anyway in that case.


I don't understand what you mean about USING_DECLs.


I think a better, complete fix for this issue would be to, one way or
another, be able to get at the PARM_DECLs that correspond to a given
FUNCTION_TYPE.  Say, if, the TREE_CHAIN of a FUNCTION_TYPE optionally
pointed to its PARM_DECLs, or something.  What do you think?


Hmm.  So void(int[5]) and void(int*) would be distinct types, but they 
would share TYPE_CANONICAL, as though one is a typedef of the other? 
Interesting, but I'm not sure how that would interact with template 
argument canonicalization.  Well, that can probably be made to work by 
treating dependent template arguments as distinct more frequently.


Another thought: What if we keep a list of arrays we need to substitute 
into for a particular function template?



In the meantime, at this stage, I am personally most comfortable with
the previous patch (the one that XFAILs unify17.C).


I don't think that's a good tradeoff, sorry.  For the moment, let's 
revert your earlier patch.


Jason



[patch] libstdc++/69116 Constrain std::valarray functions and operators

2016-01-22 Thread Jonathan Wakely

The example in the PR is a sneaky little problem. When <valarray> is
included the following overload is declared:

template
_Expr
operator<<(const _Tp& __t, const valarray<_Tp>& __v);

This is a candidate function for any "a << b" expression with
namespace std as an associated namespace. In order to do overload
resolution valarray gets instantiated to see if there is
a conversion from decltype(b). When decltype(a) is an abstract type
valarray results in an error outside the immediate
context, and overload resolution stops with an error.

This could happen for any of the overloaded operators and functions
that work with valarray, so my solution is to adjust the __fun<> class
template so that the result type of valarray operations is not defined
for types that cannot be used in valarray.  When the result_type is
missing, the valarray operators give a SFINAE deduction failure rather
than a hard error.

Currently the check is !__is_abstract(_Tp) but it could be tweaked to
also check other conditions that cause a problem.

The new test uses -std=gnu++98 because if it uses a later standard
then it fails due to similar unconstrained operators in , and
std::complex also fails if is_abstract, so I'll have to fix that
next.

Tested powerpc64-linux, committed to trunk.

This is a regression, caused by the front end starting to diagnose the
invalid library instantiations more eagerly. The fix seems simple and
safe, so I plan to backport it to the branches too.


commit 64f205467ab5822f4b75f9ae14933c52d5062f66
Author: Jonathan Wakely 
Date:   Fri Jan 22 17:09:06 2016 +

Constrain std::valarray functions and operators

	PR libstdc++/69116
	* include/bits/valarray_before.h (__fun, __fun_with_valarray): Only
	define result_type for types which can be safely used with valarrays.
	* testsuite/26_numerics/valarray/69116.cc: New.

diff --git a/libstdc++-v3/include/bits/valarray_before.h b/libstdc++-v3/include/bits/valarray_before.h
index 3325bf8..86136f4 100644
--- a/libstdc++-v3/include/bits/valarray_before.h
+++ b/libstdc++-v3/include/bits/valarray_before.h
@@ -331,14 +331,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return pow(__x, __y); }
   };
 
+  template
+struct __fun_with_valarray
+{
+  typedef _Tp result_type;
+};
+
+  template
+struct __fun_with_valarray<_Tp, false>
+{
+  // No result type defined for invalid value types.
+};
 
   // We need these bits in order to recover the return type of
   // some functions/operators now that we're no longer using
   // function templates.
   template
-struct __fun
+struct __fun : __fun_with_valarray<_Tp>
 {
-  typedef _Tp result_type;
 };
 
   // several specializations for relational operators.
diff --git a/libstdc++-v3/testsuite/26_numerics/valarray/69116.cc b/libstdc++-v3/testsuite/26_numerics/valarray/69116.cc
new file mode 100644
index 000..ef98267
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/valarray/69116.cc
@@ -0,0 +1,53 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile }
+// { dg-options "-std=gnu++98" }
+
+// libstdc++/69116
+
+#include 
+#include 
+
+template<typename T>
+  void foo(const T&) { }
+
+struct X : std::exception // makes namespace std an associated namespace
+{
+  virtual void pure() = 0;
+
+  typedef void(*func_type)(const X&);
+
+  void operator+(func_type) const;
+  void operator-(func_type) const;
+  void operator*(func_type) const;
+  void operator/(func_type) const;
+  void operator%(func_type) const;
+  void operator<<(func_type) const;
+  void operator>>(func_type) const;
+};
+
+void foo(X& x)
+{
+  x + foo;
+  x - foo;
+  x * foo;
+  x / foo;
+  x % foo;
+  x << foo;
+  x >> foo;
+}


[gomp4] Fix handling of subarrays with update directive

2016-01-22 Thread James Norris

Hi,

The attached patch fixes a defect reported against GCC 5.2
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69414); the issue
is also present on the gomp4 branch. The patch also adds
additional testing.

Committed to gomp4 after bootstrap and regtesting.
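The defect and the fix come down to how a host address inside a mapped region is translated to a device address. A standalone sketch (simplified struct, not libgomp's actual splay-tree node; field and function names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* A mapping of host range [host_start, host_end) to a device block
   located at tgt_start + tgt_offset.  */
typedef struct
{
  uintptr_t host_start, host_end;  /* mapped host address range */
  uintptr_t tgt_start;             /* device block base */
  uintptr_t tgt_offset;            /* offset of this mapping in the block */
} mapping;

/* Old computation: H's offset into the region was dropped, so updating
   a subarray always touched the *start* of the device copy.  */
static uintptr_t dev_addr_old (const mapping *n, uintptr_t h)
{
  (void) h;                        /* bug: H ignored entirely */
  return n->tgt_start + n->tgt_offset;
}

/* Fixed computation, mirroring the patch: add H's offset within the
   mapped host range.  */
static uintptr_t dev_addr_fixed (const mapping *n, uintptr_t h)
{
  return n->tgt_start + n->tgt_offset + (h - n->host_start);
}
```

With the fix, an update of `a[k:len]` starts `k` elements into the device copy instead of at its beginning.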

Thanks,
Jim

Index: ChangeLog.gomp
===
--- ChangeLog.gomp	(revision 232740)
+++ ChangeLog.gomp	(revision 232741)
@@ -1,3 +1,11 @@
+2016-01-22  James Norris  
+
+	* oacc-mem.c (delete_copyout, update_dev_host): Fix device address.
+	* testsuite/libgomp.oacc-c-c++-common/update-1.c: Additional tests.
+	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/update-1.f90: New file.
+	* testsuite/libgomp.oacc-fortran/update-1-2.f90: Likewise.
+
 2016-01-22  Nathan Sidwell  
 
 	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Specify vector.
Index: oacc-mem.c
===
--- oacc-mem.c	(revision 232740)
+++ oacc-mem.c	(revision 232741)
@@ -509,7 +509,7 @@
   gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
 }
 
-  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
 
   host_size = n->host_end - n->host_start;
 
@@ -562,7 +562,7 @@
   gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
 }
 
-  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
 
   gomp_mutex_unlock (_dev->lock);
 
Index: testsuite/libgomp.oacc-fortran/update-1-2.f90
===
--- testsuite/libgomp.oacc-fortran/update-1-2.f90	(revision 0)
+++ testsuite/libgomp.oacc-fortran/update-1-2.f90	(revision 232741)
@@ -0,0 +1,239 @@
+! Copy of update-1.f90 with self exchanged with host for !$acc update
+
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+program update
+  use openacc
+  implicit none 
+  integer, parameter :: N = 8
+  real :: a(N), b(N)
+  integer i
+
+  do i = 1, N
+a(i) = 3.0
+b(i) = 0.0
+  end do
+
+  !$acc enter data copyin (a, b)
+
+  !$acc parallel present (a, b)
+do i = 1, N
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, N
+if (a(i) .ne. 3.0) call abort
+if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .neqv. .TRUE.) call abort
+  if (acc_is_present (b) .neqv. .TRUE.) call abort
+
+  do i = 1, N
+a(i) = 5.0
+b(i) = 1.0
+  end do
+
+  !$acc update device (a, b)
+
+  !$acc parallel present (a, b)
+do i = 1, N
+  b(i) = a(i)
+end do 
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, N
+if (a(i) .ne. 5.0) call abort
+if (b(i) .ne. 5.0) call abort
+ end do
+
+  if (acc_is_present (a) .neqv. .TRUE.) call abort
+  if (acc_is_present (b) .neqv. .TRUE.) call abort
+
+  !$acc parallel present (a, b)
+  do i = 1, N
+b(i) = a(i)
+  end do
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, N
+if (a(i) .ne. 5.0) call abort
+if (b(i) .ne. 5.0) call abort
+  end do
+
+  do i = 1, N
+a(i) = 6.0
+b(i) = 0.0
+  end do
+
+  !$acc update device (a, b)
+
+  do i = 1, N
+a(i) = 9.0
+  end do
+
+  !$acc parallel present (a, b)
+do i = 1, N
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, N
+if (a(i) .ne. 6.0) call abort
+if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .neqv. .TRUE.) call abort
+  if (acc_is_present (b) .neqv. .TRUE.) call abort
+
+  do i = 1, N
+a(i) = 7.0
+b(i) = 2.0
+  end do
+
+  !$acc update device (a, b)
+
+  do i = 1, N
+a(i) = 9.0
+  end do
+
+  !$acc parallel present (a, b)
+do i = 1, N
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, N
+if (a(i) .ne. 7.0) call abort
+if (b(i) .ne. 7.0) call abort
+  end do
+
+  do i = 1, N
+a(i) = 9.0
+  end do
+
+  !$acc update device (a)
+
+  !$acc parallel present (a, b)
+do i = 1, N
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, N
+if (a(i) .ne. 9.0) call abort
+if (b(i) .ne. 9.0) call abort
+  end do
+
+  if (acc_is_present (a) .neqv. .TRUE.) call abort
+  if (acc_is_present (b) .neqv. .TRUE.) call abort
+
+  do i = 1, N
+a(i) = 5.0
+  end do
+
+  !$acc update device (a)
+
+  do i = 1, N
+a(i) = 6.0
+  end do
+
+  !$acc update device (a(1:rshift (N, 1)))
+
+  !$acc parallel present (a, b)
+do i = 1, N
+  b(i) = a(i)
+end do
+  !$acc end parallel
+
+  !$acc update self (a, b)
+
+  do i = 1, rshift (N, 1)
+if (a(i) .ne. 6.0) call abort
+if (b(i) .ne. 6.0) call abort
+  end do
+
+  do i = rshift (N, 1) + 1, N
+if (a(i) .ne. 5.0) call abort
+

[PATCH, COMMITTED] Fix a ChangeLog entry

2016-01-22 Thread Bernd Edlinger
This fixes my name in gcc/ChangeLog.

Index: ChangeLog
===
--- ChangeLog   (revision 232742)
+++ ChangeLog   (working copy)
@@ -144,7 +144,7 @@
* lra-coalesce.c (lra_coalesce): Invalidate value for the result
pseudo instead of inheritance ones.
 
-2016-01-21  Bernd Enlinger  
+2016-01-21  Bernd Edlinger  
Nick Clifton  
 
PR target/69129

committed as r232743.


[gomp4] fix some tests

2016-01-22 Thread Nathan Sidwell
I discovered these tests were relying on implicitly using vector partitioning, 
rather than specifying it explicitly.


Fixed thusly.

nathan
Index: gcc/testsuite/c-c++-common/goacc/reduction-1.c
===
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c	(revision 232738)
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c	(working copy)
@@ -12,55 +12,55 @@ main(void)
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
 result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
 result *= array[i];
 
   /* 'max' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (max:result)
+#pragma acc loop vector reduction (max:result)
   for (i = 0; i < n; i++)
 result = result > array[i] ? result : array[i];
 
   /* 'min' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (min:result)
+#pragma acc loop vector reduction (min:result)
   for (i = 0; i < n; i++)
 result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&:result)
+#pragma acc loop vector reduction (&:result)
   for (i = 0; i < n; i++)
 result &= array[i];
 
   /* '|' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (|:result)
+#pragma acc loop vector reduction (|:result)
   for (i = 0; i < n; i++)
 result |= array[i];
 
   /* '^' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (^:result)
+#pragma acc loop vector reduction (^:result)
   for (i = 0; i < n; i++)
 result ^= array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
 lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
 lresult = lresult || (result > array[i]);
 
Index: gcc/testsuite/c-c++-common/goacc/reduction-2.c
===
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c	(revision 232738)
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c	(working copy)
@@ -12,37 +12,37 @@ main(void)
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
 result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
 result *= array[i];
 
   /* 'max' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (max:result)
+#pragma acc loop vector reduction (max:result)
   for (i = 0; i < n; i++)
 result = result > array[i] ? result : array[i];
 
   /* 'min' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (min:result)
+#pragma acc loop vector reduction (min:result)
   for (i = 0; i < n; i++)
 result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc loop vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
 lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc loop vector reduction (||:lresult)
   for (i = 0; i < n; i++)
 lresult = lresult || (result > array[i]);
 
Index: gcc/testsuite/c-c++-common/goacc/reduction-3.c
===
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c	(revision 232738)
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c	(working copy)
@@ -12,37 +12,37 @@ main(void)
 
   /* '+' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc loop vector reduction (+:result)
   for (i = 0; i < n; i++)
 result += array[i];
 
   /* '*' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc loop vector reduction (*:result)
   for (i = 0; i < n; i++)
 result *= array[i];
 
   /* 'max' reductions.  */
 #pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (max:result)
+#pragma acc loop vector reduction (max:result)
   for (i = 0; i < n; i++)
 result = result > array[i] ? result : array[i];
 
 

Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Jan Hubicka
> > Why do you say so? There are C->Ada calls as there are Ada->C calls in
> > plenty of existing software.
> 
> But what percentage of the C->Ada ones are performance critical?  Note that, 
> unlike the Ada->C or Ada/C++ ones, these have never been inlined and I can 

I think we were inlining them with LTO until I installed the patch.  Most of
the time DECL_STRUCT_FUNCTION == NULL for WPA and thus the original check
testing the flags was disabled.  We did not update the EH codegen during
inlining, so probably we just did not produce non-call EH for these.

Honza


Re: [PATCH] Improve TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS callback

2016-01-22 Thread Vladimir Makarov

On 01/22/2016 10:34 AM, Wilco Dijkstra wrote:

Improve TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS target hook. It turns out there
is another case where the register allocator uses the union of register classes
without checking that the cost of the resulting register class is lower than
both (see https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01765.html ).
It is hard to say what the best behavior is in this 
situation.  There are situations when choosing the union class, if its cost 
is less than memory, is the right solution.  Sometimes it is not, as in 
this case.  Probably we could make the best choice in RA by taking modes 
into consideration.  The problem is also that many target hooks sometimes 
don't consider all classes, a practice dating back to reload pass times.


However,  the patch itself is ok for me.

Thanks, Wilco.




Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-22 Thread Bernd Schmidt

On 01/22/2016 08:03 PM, Ian Lance Taylor wrote:

Updated patch.

I've verified that I'm changing the only relevant place in
tree-ssa-loop-ivopts.c that creates a POINTER_PLUS_EXPR, so I do think
that this is the only change needed to fix the problem for ivopts.


I don't think so. One of the problems with ivopts is that it likes to 
cast everything to unsigned int, so looking for POINTER_PLUS_EXPR 
wouldn't find all affected spots. At least this used to happen, I didn't 
check recently. Also, a lot of the generated expressions are built by 
tree-affine.c rather than in ivopts directly.



Bernd




Re: [gomp4] fix some tests

2016-01-22 Thread Nathan Sidwell
These two libgomp  tests were likewise presuming vector partitioning,  without 
specifying it.


nathan
2016-01-22  Nathan Sidwell  

	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Specify vector.
	* testsuite/libgomp.oacc-c-c++-common/routine-2.c: Likewise.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c	(revision 232738)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c	(working copy)
@@ -25,7 +25,7 @@ main()
 
 #pragma acc parallel copy (a[0:n]) vector_length (32)
   {
-#pragma acc loop
+#pragma acc loop vector
 for (i = 0; i < n; i++)
   a[i] = fact (i);
   }
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c	(revision 232738)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c	(working copy)
@@ -27,7 +27,7 @@ main()
 
 #pragma acc parallel copy (a[0:n]) vector_length (32)
   {
-#pragma acc loop
+#pragma acc loop vector
 for (i = 0; i < n; i++)
   a[i] = fact (i);
   }


Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute

2016-01-22 Thread Eric Botcazou
> Ok, rebased onto a more recent build, and bootstrapping with Ada posed no
> problems. Sorry for the noise.

Great, no problem, and thanks for double checking.

-- 
Eric Botcazou


[PATCH] PR c++/69399: Add HAVE_WORKING_CXX_BUILTIN_CONSTANT_P

2016-01-22 Thread H.J. Lu
Without the fix for PR 65656, g++ miscompiles __builtin_constant_p in
wi::lrshift in wide-int.h.  Add a check with PR 65656 testcase to verify
that C++ __builtin_constant_p works properly.

Tested on x86-64 with working GCC:

gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
stage1-gcc/auto-host.h:#define HAVE_WORKING_CXX_BUILTIN_CONSTANT_P 1

and broken GCC:

gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
prev-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */
stage1-gcc/auto-host.h:/* #undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P */

Ok for trunk?

Thanks.

H.J.
---
gcc/

PR c++/69399
* configure.ac: Check if C++ __builtin_constant_p works
properly.
(HAVE_WORKING_CXX_BUILTIN_CONSTANT_P): AC_DEFINE.
* system.h (STATIC_CONSTANT_P): Use __builtin_constant_p only
if HAVE_WORKING_CXX_BUILTIN_CONSTANT_P is defined.
* config.in: Regenerated.
* configure: Likewise.

gcc/testsuite/

PR c++/69399
* gcc.dg/torture/pr69399.c: New test.
---
 gcc/config.in  | 10 -
 gcc/configure  | 41 --
 gcc/configure.ac   | 27 ++
 gcc/system.h   |  2 +-
 gcc/testsuite/gcc.dg/torture/pr69399.c | 21 +
 5 files changed, 97 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr69399.c

diff --git a/gcc/config.in b/gcc/config.in
index 1796e1d..11ebf5c 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1846,6 +1846,13 @@
 #endif
 
 
+/* Define this macro if C++ __builtin_constant_p with constexpr does not crash
+   with a variable. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_WORKING_CXX_BUILTIN_CONSTANT_P
+#endif
+
+
 /* Define to 1 if `fork' works. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_WORKING_FORK
@@ -1865,7 +1872,8 @@
 #endif
 
 
-/* Define if your assembler supports .dwsect 0xB */
+/* Define if your assembler supports AIX debug frame section label reference.
+   */
 #ifndef USED_FOR_TARGET
 #undef HAVE_XCOFF_DWARF_EXTRAS
 #endif
diff --git a/gcc/configure b/gcc/configure
index ff646e8..2798231 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -6534,6 +6534,43 @@ fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 fi
 
+# Check if C++ __builtin_constant_p works properly.  Without the fix
+# for PR 65656, g++ miscompiles __builtin_constant_p in wi::lrshift in
+# wide-int.h.  Add a check with PR 65656 testcase to verify that C++
+# __builtin_constant_p works properly.
+if test "$GCC" = yes; then
+  saved_CFLAGS="$CFLAGS"
+  saved_CXXFLAGS="$CXXFLAGS"
+  CFLAGS="$CFLAGS -O -x c++ -std=c++11"
+  CXXFLAGS="$CXXFLAGS -O -x c++ -std=c++11"
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX 
__builtin_constant_p works with constexpr" >&5
+$as_echo_n "checking whether $CXX __builtin_constant_p works with constexpr... 
" >&6; }
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+foo (int argc)
+{
+  constexpr bool x = __builtin_constant_p(argc);
+  return x ? 1 : 0;
+}
+
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+$as_echo "#define HAVE_WORKING_CXX_BUILTIN_CONSTANT_P 1" >>confdefs.h
+
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+  CFLAGS="$saved_CFLAGS"
+  CXXFLAGS="$saved_CXXFLAGS"
+fi
+
 # Check whether compiler is affected by placement new aliasing bug (PR 29286).
 # If the host compiler is affected by the bug, and we build with optimization
 # enabled (which happens e.g. when cross-compiling), the pool allocator may
@@ -18419,7 +18456,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18422 "configure"
+#line 18459 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18525,7 +18562,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18528 "configure"
+#line 18565 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 4dc7c10..9791a96 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -416,6 +416,33 @@ struct X { typedef long long t; };
 ]], [[X::t x;]])],[],[AC_MSG_ERROR([error verifying int64_t uses long 
long])])
 fi
 
+# Check if C++ __builtin_constant_p works properly.  Without the fix
+# for PR 65656, g++ miscompiles __builtin_constant_p in wi::lrshift in
+# wide-int.h.  Add a check with PR 65656 testcase to verify that C++
+# __builtin_constant_p works properly.
+if test "$GCC" = yes; then
+  saved_CFLAGS="$CFLAGS"
+  saved_CXXFLAGS="$CXXFLAGS"
+  CFLAGS="$CFLAGS 

Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-22 Thread Ian Lance Taylor
Updated patch.

I've verified that I'm changing the only relevant place in
tree-ssa-loop-ivopts.c that creates a POINTER_PLUS_EXPR, so I do think
that this is the only change needed to fix the problem for ivopts.
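To make the hazard concrete, here is a hypothetical sketch (mine, not from the
patch) of the kind of loop where keeping the original pointer live matters for
a garbage collector:

```c
#include <stddef.h>

/* Hypothetical illustration: without -fkeep-gc-roots-live, induction
   variable optimizations may rewrite this loop so that only a derived
   pointer (possibly pointing outside the memory block) is live,
   leaving no pointer the collector can recognize as rooting the
   block.  With the option, p itself stays a recognizable root.  */
void
clear_block (char *pstart, size_t n)
{
  for (char *p = pstart; p != pstart + n; p++)
    *p = 0;
}
```

The function name and shape are illustrative only; the real testcase is the
ivopt_5.c file in the patch below.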

OK for mainline?

Ian

gcc/ChangeLog:

2016-01-22  Ian Lance Taylor  

* common.opt (fkeep-gc-roots-live): New option.
* tree-ssa-loop-ivopts.c (add_autoinc_candidates): If
-fkeep-gc-roots-live, skip pointers.
* doc/invoke.texi (Optimize Options): Document
-fkeep-gc-roots-live.

gcc/testsuite/ChangeLog:

2016-01-22  Ian Lance Taylor  

* gcc.dg/tree-ssa/ivopt_5.c: New test.
Index: common.opt
===
--- common.opt  (revision 232580)
+++ common.opt  (working copy)
@@ -1380,6 +1380,10 @@
 Enable hoisting adjacent loads to encourage generating conditional move
 instructions.
 
+fkeep-gc-roots-live
+Common Report Var(flag_keep_gc_roots_live) Optimization
+Always keep a pointer to a live memory block
+
 floop-parallelize-all
 Common Report Var(flag_loop_parallelize_all) Optimization
 Mark all loops as parallel.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 232580)
+++ doc/invoke.texi (working copy)
@@ -359,7 +359,7 @@
 -fno-ira-share-spill-slots @gol
 -fisolate-erroneous-paths-dereference -fisolate-erroneous-paths-attribute @gol
 -fivopts -fkeep-inline-functions -fkeep-static-functions @gol
--fkeep-static-consts -flive-range-shrinkage @gol
+-fkeep-static-consts -fkeep-gc-roots-live -flive-range-shrinkage @gol
 -floop-block -floop-interchange -floop-strip-mine @gol
 -floop-unroll-and-jam -floop-nest-optimize @gol
 -floop-parallelize-all -flra-remat -flto -flto-compression-level @gol
@@ -6621,6 +6621,17 @@
 If you use @option{-Wunsafe-loop-optimizations}, the compiler warns you
 if it finds this kind of loop.
 
+@item -fkeep-gc-roots-live
+@opindex fkeep-gc-roots-live
+This option tells the compiler that a garbage collector will be used,
+and that therefore the compiled code must retain a live pointer into
+all memory blocks.  The compiler is permitted to construct a pointer
+that is outside the bounds of a memory block, but it must ensure that
+given a pointer into memory, some pointer into that memory remains
+live in the compiled code whenever it is live in the source code.
+This option is disabled by default for most languages, enabled by
+default for languages that use garbage collection.
+
 @item -fcrossjumping
 @opindex fcrossjumping
 Perform cross-jumping transformation.
Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c
===
--- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (working copy)
@@ -0,0 +1,23 @@
+/* { dg-options "-O2 -fdump-tree-ivopts -fkeep-gc-roots-live" } */
+
+/* No ivopts here when using -fkeep-gc-roots-live.   */
+
+void foo (char *pstart, int n)
+{
+  char *p;
+  char *pend = pstart + n;
+
+  for (p = pstart; p < pend; p++)
+*p = 1;
+}
+
+void foo1 (char *pstart, int n)
+{
+  char *p;
+  char *pend = pstart + n;
+
+  for (p = pstart; p != pend; p++)
+*p = 1;
+}
+
+/* { dg-final { scan-tree-dump-times "ivtmp.\[0-9_\]* = PHI <" 0 "ivopts"} } */
Index: tree-ssa-loop-ivopts.c
===
--- tree-ssa-loop-ivopts.c  (revision 232580)
+++ tree-ssa-loop-ivopts.c  (working copy)
@@ -2956,6 +2956,16 @@
   || !cst_and_fits_in_hwi (step))
 return;
 
+  /* -fkeep-gc-roots-live means that we have to keep a real pointer
+ live, but the ivopts code may replace a real pointer with one
+ pointing before or after the memory block that is then adjusted
+ into the memory block during the loop.  FIXME: It would likely be
+ better to actually force the pointer live and still use ivopts;
+ for example, it would be enough to write the pointer into memory
+ and keep it there until after the loop.  */
+  if (flag_keep_gc_roots_live && POINTER_TYPE_P (TREE_TYPE (base)))
+return;
+
   cstepi = int_cst_value (step);
 
   mem_mode = TYPE_MODE (TREE_TYPE (*use->op_p));


[PATCH][ARM][1/4] PR target/65932: Add testcase

2016-01-22 Thread Kyrill Tkachov

Hi all,

This patch adds an execute testcase that is fixed by the arm.h hunk of Jim 
Wilson's
patch that he posted at:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html

I propose this testsuite change instead of the testsuite changes in that patch 
since
the gcc.target/arm/wmul-[123].c regressions are dealt with in the other patches 
in this series.

Bootstrapped and tested on arm.

Ok for trunk?
Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  

PR target/65932
PR target/67714
* gcc.c-torture/execute/pr67714.c: New test.
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67714.c b/gcc/testsuite/gcc.c-torture/execute/pr67714.c
new file mode 100644
index ..386c0dfad8dd786938127d5e45987e7dc113b564
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr67714.c
@@ -0,0 +1,26 @@
+unsigned int b;
+int c;
+
+signed char
+fn1 ()
+{
+  signed char d;
+  for (int i = 0; i < 1; i++)
+d = -15;
+  return d;
+}
+
+int
+main (void)
+{
+  for (c = 0; c < 1; c++)
+b = 0;
+  char e = fn1 ();
+  signed char f = e ^ b;
+  volatile int g = (int) f;
+
+  if (g != 0xfff1)
+__builtin_abort ();
+
+  return 0;
+}


[PATCH][ARM][2/4] Fix operand costing logic for SMUL[TB][TB]

2016-01-22 Thread Kyrill Tkachov

Hi all,

As part of investigating the codegen effects of a fix for PR 65932 I found we 
assign
too high a cost for the sign-extending multiply instruction SMULBB.
This is because we add the cost of a multiply-extend but then also recurse into 
the
SIGN_EXTEND sub-expressions rather than the registers (or subregs) being 
sign-extended.
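For context, here is a hedged sketch (mine, not part of the patch) of the kind
of source whose RTX takes the form being costed here; on ARM a multiply of two
sign-extended halfwords can match a single SMULBB instruction:

```c
/* A multiply of two sign-extended 16-bit values; its RTL is
   (mult:SI (sign_extend:SI (reg:HI)) (sign_extend:SI (reg:HI))),
   which on ARM can be matched by the SMULBB instruction.  */
int
mul16 (short a, short b)
{
  return a * b;
}
```

The point of the fix is that when costing this pattern we should recurse into
the registers being sign-extended, not into the SIGN_EXTEND expressions again.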

This patch is a simple fix.  The fix is right by itself, and in combination with 
patch 3 it fixes the gcc.target/arm/wmul-2.c testcase.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs, MULT case): Properly extract
the operands of the SIGN_EXTENDs from a SMUL[TB][TB] rtx.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 17f00b5a1de21de35366b82040a7ad46d65f899e..ee3ebe4561ea3d9791fabdfcec8b16af63bd4d20 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10321,8 +10321,10 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  /* SMUL[TB][TB].  */
 	  if (speed_p)
 		*cost += extra_cost->mult[0].extend;
-	  *cost += rtx_cost (XEXP (x, 0), mode, SIGN_EXTEND, 0, speed_p);
-	  *cost += rtx_cost (XEXP (x, 1), mode, SIGN_EXTEND, 1, speed_p);
+	  *cost += rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ SIGN_EXTEND, 0, speed_p);
+	  *cost += rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+ SIGN_EXTEND, 1, speed_p);
 	  return true;
 	}
 	  if (speed_p)


[PATCH][cse][3/4] Don't overwrite original rtx when folding source of set

2016-01-22 Thread Kyrill Tkachov

Hi all,

This patch is a consequence of the thread I started at 
https://gcc.gnu.org/ml/gcc/2016-01/msg00100.html
The problem is that the fold_rtx call in cse_insn may overwrite its argument if 
the insn argument is non-NULL.
This leads to CSE not considering the original form of the RTX when doing its 
cost analysis later on.
This led to it picking a normal SImode multiply expression over the original 
multiply-sign-extend expression, which in my case is cheaper (as reflected in 
the fixed rtx costs from patch 2).

The simple fix is to pass NULL to fold_rtx so that it will return the candidate 
folded expression
into src_folded but still retain the original src for analysis.

With this change and the costs fix in patch [2/4], the
gcc.target/arm/wmul-[12].c tests now generate their expected
sign-extend+multiply (+accumulate) sequences.

Apart from that this patch has no impact on codegen for SPEC2006 on arm.
For aarch64 the impact is minimal and inconsequential. I've seen sequences that
select between 1 and -1 being turned from a CSINC (of zero) into a CSNEG. Both 
are valid and of equal value.

On x86_64 the impact was also minimal. Most benchmarks were not changed at all.
Some showed a negligible reduction in code size and slight register-allocation 
perturbations.
But nothing significant.
Hence, I claim that this patch is low impact.

Bootstrapped and tested on arm, aarch64, x86_64.
Ok for trunk?

Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  

* cse.c (cse_insn): Pass NULL to fold_rtx when initially
folding the source of a SET.
diff --git a/gcc/cse.c b/gcc/cse.c
index 58b8fc0313dcbfb2036054564746a7832ae52140..2665d9a2733cad7286b41a88753acfcf79be83f1 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4636,7 +4636,7 @@ cse_insn (rtx_insn *insn)
 
   /* Simplify and foldable subexpressions in SRC.  Then get the fully-
 	 simplified result, which may not necessarily be valid.  */
-  src_folded = fold_rtx (src, insn);
+  src_folded = fold_rtx (src, NULL);
 
 #if 0
   /* ??? This caused bad code to be generated for the m68k port with -O2.


[PATCH][ARM][0/4] Fixing PR target/65932

2016-01-22 Thread Kyrill Tkachov

Hi all,

PR target/65932 is a wrong-code bug affecting arm and has manifested itself
when compiling the Linux kernel, so it's something that we really
ought to fix. The problem stems from the fact that PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
PROMOTE_MODE also marks the promotion as unsigned, whereas the
TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
being wrongly zero-extended instead of sign-extended.  This also occurs
in PR target/67714.

Jim Wilson tried a few approaches and from the discussion
on the PR and on the ML 
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
the preferred approach is to make PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE
match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an ABI
change so we don't want to do that.  Changing PROMOTE_MODE to not zero-extend
fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
series.

It has been posted at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
and this series is based on top of the arm.h hunk of that patch.

There have been some concerns about the codegen quality fallout from Jim's
patch, which is what the remaining patches in this series address:

* 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
wmul-1.c and wmul-2.c are cases where we no longer generate
sign-extend+multiply (and accumulate) instructions but instead generate
the normal full-width multiplies (the operands are sign-extended from
preceeding sign-extending loads).  This is a regression for some targets
on which the sign-extending form is faster. Patches 2 and 3 address this.
gcc.target/arm/wmul-3.c is a test where we actually end up generating
better code and so the testcase just needs to be adjusted.
Patch 4 deals with that.

* Sign-extending rather than zero-extending short values means we make more
use of the ldrsb and ldrsh arm instructions rather than the zero-extending
ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited addressing
modes (only REG + REG), which could hurt code size. However, the change also
means that we can now merge sequences of load-zero-extend followed by a 
sign-extend
into a single load-sign-extend.
So we'd turn a (ldrh ; sxth) into an (ldrsh).

I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1) 
and looked
at the generated assembly for -Os and -O2. I did see both effects. That is,
ldrh instructions with an immediate being turned into two instructions:
ldrh    r4, [r4, #12]
->
movs    r5, #12
ldrsh   r4, [r4, r5]

But I also observed the beneficial effect. Various register allocation
perturbations also featured in the changes.
Overall, for every codebase I've looked at, at both -Os and -O2 the new code
was slightly smaller. Probably not enough to call this an outright win, but
definitely not a regression overall.

As for ARM and Thumb2 states, this series doesn't have a major impact.
SPEC2006 benchmarks are slightly reduced in size (but nothing to write home
about) and there are no code quality regressions. And even the regressions
caught by the testsuite in the wmul-[12].c tests don't actually manifest
in practice in SPEC, but they are fixed anyway.

The series contains one target-independent change to CSE in patch 3 that
I'll explain there.

The series has been bootstrapped and tested on arm, aarch64 and x86_64.
Is this ok for trunk together with Jim's arm.h hunk from
g ?

P.S. This also fixes PR middle-end/67295, which is a testsuite regression on 
arm.
Refer to the bugzilla entry there for the analysis of how this issue affects
that PR.

Thanks,
Kyrill



Re: [PATCH][ARM] (cleanup) remove warn_builtin_macro_redefined overwrite

2016-01-22 Thread Kyrill Tkachov


On 21/01/16 10:22, Christian Bruel wrote:

A tiny patch to clean up the cpp_opts->warn_builtin_macro_redefined setting in 
arm_pragma_target_parse. It removes a hack obsoleted by the use of 
NODE_CONDITIONAL for PR target/69180.

Regtested for arm-linux-gnueabi with -mfpu=vfp, -mfpu=neon, -mfpu=neon-fp-armv8.





Ok.

Thanks,
Kyrill


Re: [PATCH] Fix up reduction-1{1,2} testcases (PR middle-end/68221)

2016-01-22 Thread Jakub Jelinek
On Thu, Jan 21, 2016 at 07:17:59AM +0100, Thomas Schwinge wrote:
> Ping.

I'd prefer to keep the code as is; it is closer to what could result from
the user trying to do a similar thing, and thus has a better chance of
continuing to be supported.

Jakub


Re: [PATCH] Fix use of declare'd vars by routine procedures.

2016-01-22 Thread Jakub Jelinek
On Sun, Jan 17, 2016 at 05:01:24PM -0600, James Norris wrote:
> The attached patch addresses the failure of declare-4 in the libgomp 
> testsuite.
> The primary failure was the use of a variable with a link clause for an OpenACC
> routine function. The patch changes the test to use a create clause. The patch
> also adds checking of those globals used within an OpenACC routine function.
> Additional gcc testing is also included in the patch.
> 
> Regtested and bootstrapped on x86_64.

The testcase change is obviously fine, please install it separately.

As for the extra error, I think it would be better to diagnose this during
gimplification, that way you do it for all FEs, plus you have the omp
context in there, etc.  The error wording is weird; the diagnostic
should make it clear what is being used invalidly, in what context, and why.

Jakub


Re: TR29124 C++ Special Maths - Make pull functions into global namespace.

2016-01-22 Thread Jonathan Wakely

On 21/01/16 19:07 -0500, Ed Smith-Rowland wrote:

On 01/21/2016 07:29 AM, Jonathan Wakely wrote:

On 20/01/16 20:30 -0500, Ed Smith-Rowland wrote:
Now that libstdc++ installs a proper math.h we can piggyback on 
that to put in the last bit of TR29124.


This patch adds the math special functions to 
c_compatibility/math.h in the global namespace.

I remove the XFAILs from the compile_2.cc tests.

This converts 21 XFAILs into 21 PASSes.

Tested on x86_64-linux.

I understand if this is too late.
I'll put it up on trunk and backport after stage 1 reopens.

Meanwhile I'll commit this to the tr29124 branch.


I'm inclined to say the change is OK for trunk now. We've added math.h
and we've added the special functions, it would be good if the two new
things work together.

However ...


Index: include/c_compatibility/math.h
===
--- include/c_compatibility/math.h(revision 232610)
+++ include/c_compatibility/math.h(working copy)
@@ -75,70 +75,71 @@
#endif

#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
-using std::assoc_laguerref


This doesn't look like the right patch, because these lines aren't in
the version on trunk.


I must have given you a negative patch relative to the tr29124 
branch...  Sigh.

I'll make a new one.
Those lines have to be *added*.
Sorry.


No problem. It looks like it was a partial patch, adding semi-colons
to the using declarations, so you just need to diff against the
previous revision instead.



Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Eric Botcazou
> I only updated
> -  /* Don't inline if the callee can throw non-call exceptions but the
> - caller cannot.
> - FIXME: this is obviously wrong for LTO where STRUCT_FUNCTION is
> missing. - Move the flag into cgraph node or mirror it in the inline
> summary.  */ -  else if (callee_fun &&
> callee_fun->can_throw_non_call_exceptions -  && !(caller_fun &&
> caller_fun->can_throw_non_call_exceptions)) -{
> -  e->inline_failed = CIF_NON_CALL_EXCEPTIONS;
> -  inlinable = false;
> -}
> to actually work with LTO where callee_fun/caller_fun is not always
> available (but sometimes, like when ICF requested the body or when we
> merged profiles, it is).

No, that's not true.  Let's consider an Ada caller and a C callee.  With the 
old code (mine as you remarked): caller_fun->can_throw_non_call_exceptions is 
true and callee_fun->can_throw_non_call_exceptions is false, so the above test 
is false and we can inline.  With the new code (yours): check_match is true 
and opt_for_fn (callee->decl, flag_non_call_exceptions) is false, so we cannot 
inline.

-- 
Eric Botcazou


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Eric Botcazou
> I am fine with it being relaxed and permitting inlining !non_call_exceptions
> to non_call_exceptions functions..  It would be also cool to have a
> testcases.

Thanks, patch installed with check_match changed to check_maybe_up, I'll work 
towards adding an Ada+C LTO test but this will require fiddling with DejaGNU.

-- 
Eric Botcazou


Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

2016-01-22 Thread Jakub Jelinek
On Fri, Jan 22, 2016 at 08:40:26AM +0100, Thomas Schwinge wrote:
> On Thu, 21 Jan 2016 22:54:26 +0100, I wrote:
> > On Mon, 18 Jan 2016 18:26:49 +0100, Tom de Vries  
> > wrote:
> > > [...] [OpenACC] kernels region [...]
> > > that parloops does not manage to parallelize:
> 
> > Telling from real-world code that we've been having a look at, when the
> > above situation happens, we're -- in the vast majority of all cases -- in
> > a situation where we generally want to avoid offloading (unless
> > explicitly requested), "to avoid data copy penalty" as well as typically
> > much slower single-threaded execution on the GPU.  Obviously, that will
> > have to be revisited as parloops (or any other mechanism in GCC) is able
> > to better understand/use the parallelism in OpenACC kernels constructs.
> > 
> > So, building upon Tom's patch, I have implemented an "avoid offloading"
> > flag given the presence of one un-parallelized OpenACC kernels construct.
> > This is currently only enabled for OpenACC kernels constructs, in
> > combination with nvptx offloading, but I think the general scheme will be
> > useful also for other constructs as well as other (non-shared memory)
> > offloading targets.
> 
> > Committed to gomp-4_0-branch in r232709:
> > 
> > commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> > Author: tschwinge 
> > Date:   Thu Jan 21 21:52:50 2016 +
> > 
> > Un-parallelized OpenACC kernels constructs with nvptx offloading: 
> > "avoid offloading"
> 
> Thought I'd check before porting it over -- will such a patch also be
> accepted for trunk?

I think it is a bad idea to go against what the user wrote.  Warning that
some code might not be efficient?  Perhaps (if properly guarded with some
warning option one can turn off, either on a per-source file or using
pragmas even more fine grained).  But by default not offloading?  That is
just wrong.

Jakub


Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

2016-01-22 Thread Thomas Schwinge
Hi Jakub!

On Fri, 22 Jan 2016 09:36:25 +0100, Jakub Jelinek  wrote:
> On Fri, Jan 22, 2016 at 08:40:26AM +0100, Thomas Schwinge wrote:
> > On Thu, 21 Jan 2016 22:54:26 +0100, I wrote:
> > > On Mon, 18 Jan 2016 18:26:49 +0100, Tom de Vries  
> > > wrote:
> > > > [...] [OpenACC] kernels region [...]
> > > > that parloops does not manage to parallelize:
> > 
> > > Telling from real-world code that we've been having a look at, when the
> > > above situation happens, we're -- in the vast majority of all cases -- in
> > > a situation where we generally want to avoid offloading (unless
> > > explicitly requested), "to avoid data copy penalty" as well as typically
> > > much slower single-threaded execution on the GPU.  Obviously, that will
> > > have to be revisited as parloops (or any other mechanism in GCC) is able
> > > to better understand/use the parallelism in OpenACC kernels constructs.
> > > 
> > > So, building upon Tom's patch, I have implemented an "avoid offloading"
> > > flag given the presence of one un-parallelized OpenACC kernels construct.
> > > This is currently only enabled for OpenACC kernels constructs, in
> > > combination with nvptx offloading, but I think the general scheme will be
> > > useful also for other constructs as well as other (non-shared memory)
> > > offloading targets.
> > 
> > > Committed to gomp-4_0-branch in r232709:
> > > 
> > > commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> > > Author: tschwinge 
> > > Date:   Thu Jan 21 21:52:50 2016 +
> > > 
> > > Un-parallelized OpenACC kernels constructs with nvptx offloading: 
> > > "avoid offloading"
> > 
> > Thought I'd check before porting it over -- will such a patch also be
> > accepted for trunk?
> 
> I think it is a bad idea to go against what the user wrote.  Warning that
> some code might not be efficient?  Perhaps (if properly guarded with some
> warning option one can turn off, either on a per-source file or using
> pragmas even more fine grained).  But by default not offloading?  That is
> just wrong.

Well, let's argue the opposite way round: a user annotated the source
code with directives to help the compiler identify
parallelization/offloading opportunities.  These directives are just
descriptive hints however; (obeying program semantics, of course) the
compiler is free to ignore them, or just pay attention to some of them.
Suppose the compiler didn't find any parallelization opportunities, but
it knows that compared to host-fallback execution, offloading will be
slower for single-threaded code (data copy penalty, slower GPU clock
speed), so it only makes sense to not offload the code in such cases.

This is, quite possibly, semantically different from OpenMP directives,
where with OpenMP typically the compiler always exactly does what the
user prescribes with directives.  (But even there, you can automatically
apply SIMD parallelism, for example.  You just have to make sure that it
doesn't interfere with the program semantics, basically that the user
"won't notice".)

Does that clarify?


Regards
 Thomas


Re: [PATCH 1/3, libgomp] Resolve libgomp plugin deadlock on exit, libgomp proper parts

2016-01-22 Thread Jakub Jelinek
On Tue, Jan 19, 2016 at 03:02:48PM +0900, Chung-Lin Tang wrote:
> Ping x 2.

I like the general idea of moving the gomp_fatal stuff up, out of the
plugins, but we now have another plugin, so it can't apply as is.
And in the intelmic plugin, adding return true; after abort (); looks wrong.
So, can you please update this series and repost?

Jakub


Re: Suspicious code in fold-const.c

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 12:06 AM, Andrew MacLeod  wrote:
> I was trying the ttype prototype for static type checking on fold-const.c to
> see how long it would take me to convert such a large file, and it choked on
> this snippet of code in fold_unary_loc, around lines 7690-7711:
>
> suspect code tagged with (*)
>
>  if ((CONVERT_EXPR_CODE_P (code)
>|| code == NON_LVALUE_EXPR)
>   && TREE_CODE (tem) == COND_EXPR
>   && TREE_CODE (TREE_OPERAND (tem, 1)) == code
>   && TREE_CODE (TREE_OPERAND (tem, 2)) == code
>   (*)  && ! VOID_TYPE_P (TREE_OPERAND (tem, 1))
>   (*)  && ! VOID_TYPE_P (TREE_OPERAND (tem, 2))
>   && (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (tem, 1), 0))
>   == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (tem, 2), 0)))
>   && (! (INTEGRAL_TYPE_P (TREE_TYPE (tem))
>  && (INTEGRAL_TYPE_P
>  (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (tem, 1),
> 0
>  && TYPE_PRECISION (TREE_TYPE (tem)) <= BITS_PER_WORD)
>   || flag_syntax_only))
> tem = build1_loc (loc, code, type,
>   build3 (COND_EXPR,
>   TREE_TYPE (TREE_OPERAND
>  (TREE_OPERAND (tem, 1),
> 0)),
>   TREE_OPERAND (tem, 0),
>   TREE_OPERAND (TREE_OPERAND (tem, 1),
> 0),
>   TREE_OPERAND (TREE_OPERAND (tem, 2),
> 0)));
>
> and with:
> #define VOID_TYPE_P(NODE)  (TREE_CODE (NODE) == VOID_TYPE)
>
> I don't think this is what was intended. it would expand into:
>
>   && TREE_CODE (TREE_OPERAND (tem, 1)) == code
>   && TREE_CODE (TREE_OPERAND (tem, 2)) == code
>&& ! (TREE_CODE (TREE_OPERAND (tem, 1)) == VOID_TYPE)
>&& ! (TREE_CODE (TREE_OPERAND (tem, 2)) == VOID_TYPE)
>
> the latter two would be obviously true if the first 2 were true.
>
> My guess is this is probably suppose to be
> && ! VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (tem, 1)))
>  && ! VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (tem, 2)))
>
> but I'm not sure.   Any guesses whats intended here?

Not sure, it might be to detect some of the x ? : throw () constructs
but not sure how those would survive the previous == code check.
Maybe a ? (void) ... : (void) ... is supposed to be detected.

The type check below would catch that as well
(huh?  a flag_syntax_only check in fold-const.c!?)

I'd say change to ! VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (tem, 1))
to match what the VOID_TYPE_P check does above this block.

Richard.

> Andrew
>
>


Re: [patch] Document restriction of scalar_storage_order

2016-01-22 Thread Eric Botcazou
> This patch is OK.

Thanks.

> I wish somebody could fix the existing documentation for this attribute
> to use the present tense instead of the future to describe GCC's current
> behavior, though  :-S

I think it's a common idiom: if you do this, then the compiler will do that.
I can see it in KDE's documentation ("hovering over a folder with it for a 
short time will open that folder"), Firefox's ("If you've opened more tabs 
than will fit on the tab strip"), etc.  But I have fixed it anyway.

-- 
Eric Botcazou


Re: [PATCH][ARM] Fix PR target/69403: Bug in thumb2_ior_scc_strict_it pattern

2016-01-22 Thread Kyrill Tkachov

Hi Han,

On 21/01/16 22:57, Han Shen wrote:

Hi Kyrill, the patched gcc generates correct asm for me for the test
case.  (I'll then build the whole system to see if other it-block
related bugs are gone too.)

One short question, the newly generated RTL for
 x = x |(a)
will be
 orr %0, %1, #1; it ; mov%D2\\t%0, %1 (b)

The cond in (a) should be the reverse of cond in(b), right?


yes, the first C-like expression is just some pseudocode.
I guess it would be better written as:
x = x | .

Thanks for trying it out.
I bootstrapped and tested this patch on trunk and GCC 5.
I'll be doing the same on the 4.9 branch.

Kyrill


Thanks for your quick fix.

Han

On Thu, Jan 21, 2016 at 9:51 AM, Kyrill Tkachov
 wrote:

Hi all,

In this wrong-code PR the pattern for performing
x = x |  for -mrestrict-it is buggy and ends up writing 1 to the
result register rather than orring it.

The offending pattern is *thumb2_ior_scc_strict_it.
My proposed solution is to rewrite it as a splitter, remove the
alternative for the case where operands[1] and 0 are the same reg
that emits the bogus:
it ; mov%0, #1; it ; orr %0, %1

to emit the RTL equivalent to:
orr %0, %1, #1; it ; mov%D2\\t%0, %1
while marking operand 0 as an earlyclobber operand so that it doesn't
get assigned the same register as operand 1.

This way we avoid the wrong-code, make the sequence better (by eliminating
the move of #1 into a register
and relaxing the constraints from 'l' to 'r' since only the register move
has to be conditional).
and still stay within the rules for arm_restrict_it.

Bootstrapped and tested on arm-none-linux-gnueabihf configured
--with-arch=armv8-a and --with-mode=thumb.

Ok for trunk, GCC 5 and 4.9?

Han, can you please try this out to see if it solves the problem on your end
as well?

Thanks,
Kyrill

2016-01-21  Kyrylo Tkachov  

 PR target/69403
 * config/arm/thumb2.md (*thumb2_ior_scc_strict_it): Convert to
 define_insn_and_split.  Ensure operands[1] and operands[0] do not
 get assigned the same register.

2016-01-21  Kyrylo Tkachov  

 PR target/69403
 * gcc.c-torture/execute/pr69403.c: New test.







PR 69400: Invalid 128-bit modulus result

2016-01-22 Thread Richard Sandiford
As described in the PR, wi::divmod_internal was sign- rather than
zero-extending a modulus result in cases where the result has fewer
HWIs than the precision and the upper bit of the upper HWI was set.

This patch tries to make things more robust by getting wi_pack
to handle the canonicalisation step itself.

Tested on x86_64-linux-gnu.  I added tests to the wide-int
plugin since that seemed more direct.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/69400
* wide-int.cc (wi_pack): Take the precision as argument and
perform canonicalization here rather than in the callers.
Use the main loop to handle all full-width HWIs.  Add a
zero HWI if in_len isn't a full result.
(wi::divmod_internal): Update accordingly.
(wi::mul_internal): Likewise.  Simplify.

gcc/testsuite/
PR tree-optimization/69400
* gcc.dg/plugin/wide-int_plugin.c (test_wide_int_mod_trunc): New
function.
(plugin_init): Call it.

diff --git a/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c
index 17604c8..eea56be 100644
--- a/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c
@@ -36,11 +36,44 @@ test_wide_int_round_sdiv (void)
 abort ();
 }
 
+static void
+test_wide_int_mod_trunc (void)
+{
+  for (unsigned int i = 1; i < MAX_BITSIZE_MODE_ANY_INT; ++i)
+{
+  if (wi::smod_trunc (wi::lshift (1, i + 1) - 3,
+ wi::lshift (1, i) - 1)
+ != wi::lshift (1, i) - 2)
+   abort ();
+  for (unsigned int base = 32; base <= MAX_BITSIZE_MODE_ANY_INT; base *= 2)
+   for (int bias = -1; bias <= 1; ++bias)
+ {
+   unsigned int precision = base + bias;
+   if (i + 1 < precision && precision <= MAX_BITSIZE_MODE_ANY_INT)
+ {
+   wide_int one = wi::uhwi (1, precision);
+   wide_int a = wi::lshift (one, i + 1) - 3;
+   wide_int b = wi::lshift (one, i) - 1;
+   wide_int c = wi::lshift (one, i) - 2;
+   if (wi::umod_trunc (a, b) != c)
+ abort ();
+   if (wi::smod_trunc (a, b) != c)
+ abort ();
+   if (wi::smod_trunc (-a, b) != -c)
+ abort ();
+   if (wi::smod_trunc (a, -b) != c)
+ abort ();
+ }
+ }
+}
+}
+
 int
 plugin_init (struct plugin_name_args *plugin_info,
 struct plugin_gcc_version *version)
 {
   test_double_int_round_udiv ();
   test_wide_int_round_sdiv ();
+  test_wide_int_mod_trunc ();
   return 0;
 }
diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
index 35eee2c..80ebf5b 100644
--- a/gcc/wide-int.cc
+++ b/gcc/wide-int.cc
@@ -1215,29 +1215,31 @@ wi_unpack (unsigned HOST_HALF_WIDE_INT *result, const 
HOST_WIDE_INT *input,
 }
 
 /* The inverse of wi_unpack.  IN_LEN is the the number of input
-   blocks.  The number of output blocks will be half this amount.  */
-static void
-wi_pack (unsigned HOST_WIDE_INT *result,
+   blocks and PRECISION is the precision of the result.  Return the
+   number of blocks in the canonicalized result.  */
+static unsigned int
+wi_pack (HOST_WIDE_INT *result,
 const unsigned HOST_HALF_WIDE_INT *input,
-unsigned int in_len)
+unsigned int in_len, unsigned int precision)
 {
   unsigned int i = 0;
   unsigned int j = 0;
+  unsigned int blocks_needed = BLOCKS_NEEDED (precision);
 
-  while (i + 2 < in_len)
+  while (i + 1 < in_len)
 {
-  result[j++] = (unsigned HOST_WIDE_INT)input[i]
-   | ((unsigned HOST_WIDE_INT)input[i + 1]
-  << HOST_BITS_PER_HALF_WIDE_INT);
+  result[j++] = ((unsigned HOST_WIDE_INT) input[i]
+| ((unsigned HOST_WIDE_INT) input[i + 1]
+   << HOST_BITS_PER_HALF_WIDE_INT));
   i += 2;
 }
 
   /* Handle the case where in_len is odd.   For this we zero extend.  */
   if (in_len & 1)
-result[j++] = (unsigned HOST_WIDE_INT)input[i];
-  else
-result[j++] = (unsigned HOST_WIDE_INT)input[i]
-  | ((unsigned HOST_WIDE_INT)input[i + 1] << HOST_BITS_PER_HALF_WIDE_INT);
+result[j++] = (unsigned HOST_WIDE_INT) input[i];
+  else if (j < blocks_needed)
+result[j++] = 0;
+  return canonize (result, j, precision);
 }
 
 /* Multiply Op1 by Op2.  If HIGH is set, only the upper half of the
@@ -1460,19 +1462,8 @@ wi::mul_internal (HOST_WIDE_INT *val, const 
HOST_WIDE_INT *op1val,
  *overflow = true;
 }
 
-  if (high)
-{
-  /* compute [prec] <- ([prec] * [prec]) >> [prec] */
-  wi_pack ((unsigned HOST_WIDE_INT *) val,
-  &r[half_blocks_needed], half_blocks_needed);
-  return canonize (val, blocks_needed, prec);
-}
-  else
-{
-  /* compute [prec] <- ([prec] * [prec]) && ((1 << [prec]) - 1) */
-  wi_pack ((unsigned HOST_WIDE_INT *) val, r, half_blocks_needed);
-  return canonize (val, blocks_needed, 

Re: [hsa merge 07/10] IPA-HSA pass

2016-01-22 Thread Jakub Jelinek
On Wed, Jan 20, 2016 at 09:53:30PM +0300, Ilya Verbin wrote:
> If you're OK with this, I'll install this patch:
> 
> 
> libgomp/
>   * target.c (gomp_get_target_fn_addr): Allow host fallback if target
>   function wasn't mapped to the device with non-shared memory.

Ok, thanks.

> diff --git a/libgomp/target.c b/libgomp/target.c
> index f1f5849..96fe3d5 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1436,12 +1436,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
> *devicep,
>   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
>   gomp_mutex_unlock (&devicep->lock);
>if (tgt_fn == NULL)
> - {
> -   if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> - return NULL;
> -   else
> - gomp_fatal ("Target function wasn't mapped");
> - }
> + return NULL;
>  
>return (void *) tgt_fn->tgt_offset;
>  }
> 
>   -- Ilya

Jakub


[PATCH][ARM][4/4] Adjust gcc.target/arm/wmul-[123].c tests

2016-01-22 Thread Kyrill Tkachov

Hi all,

In this final patch I adjust the troublesome gcc.target/arm/wmul-[123].c tests
to make them more helpful.
gcc.target/arm/wmul-[12].c may now generate either sign-extending multiplies
(+accumulate) or normal 32-bit multiplies since the arguments to the multiplies
are already sign-extended by preceding loads.
So for these tests the patch adds an -mtune option where we know the 
sign-extending
form to be beneficial. This is, of course, reflected in the rtx costs that 
guide the
RTL optimisers (after the fixes in patches 2 and 3).

For wmul-3.c we now generate objectively better code.
For the loop we previously generated:
.L2:
ldrh    r1, [lr, #2]!
ldrh    ip, [r0, #2]!
smulbb  ip, r1, ip
sub     r4, r4, ip
smulbb  r1, r1, r1
sub     r2, r2, r1
cmp     lr, r5
bne     .L2

and now we generate:
.L2:
ldrsh   r1, [ip, #2]!
ldrsh   r4, [r0, #2]!
mls     lr, r1, r4, lr
mls     r2, r1, r1, r2
cmp     ip, r5
bne     .L2

AFAICT the new sequence is better than the old one even for -mtune=cortex-a9 
since it
contains two fewer instructions.

So this test is no longer a good source of getting smulbb instructions.
The proposed change in this patch is to greatly simplify it by writing a simple 
enough
one-liner that we can always expect to be compiled into a single smulbb 
instruction.

Tested on arm-none-eabi.
Ok for trunk?

Thanks,
Kyrill

2016-01-22  Kyrylo Tkachov  

* gcc.target/arm/wmul-3.c: Simplify test to generate just
a single smulbb instruction.
* gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
* gcc.target/arm/wmul-2.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c
index d509fe645ea98877753773e7bcf9b6787897..c340f960fa444642fe18ae3bcac93d78fe9dc851 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-1.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c
index 2ea55f9fbe12f74f38754cb72be791fd6e9495f4..bd2435c9113a82d2e102b545b3141cbda9ba326d 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-2.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 void vec_mpy(int y[], const short x[], short scaler)
 {
diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c
index 144b553082e6158701639f05929987de01e7125a..87eba740142a80a1dc1979b4e79d9272a839e7b2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-3.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-3.c
@@ -1,19 +1,11 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O" } */
 
-int mac(const short *a, const short *b, int sqr, int *sum)
+int
+foo (int a, int b)
 {
-  int i;
-  int dotp = *sum;
-
-  for (i = 0; i < 150; i++) {
-dotp -= b[i] * a[i];
-sqr -= b[i] * b[i];
-  }
-
-  *sum = dotp;
-  return sqr;
+  return (short) a * (short) b;
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 2 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 } } */


Re: PR 69400: Invalid 128-bit modulus result

2016-01-22 Thread Jakub Jelinek
On Fri, Jan 22, 2016 at 09:43:52AM +, Richard Sandiford wrote:
> gcc/
>   PR tree-optimization/69400
>   * wide-int.cc (wi_pack): Take the precision as argument and
>   perform canonicalization here rather than in the callers.
>   Use the main loop to handle all full-width HWIs.  Add a
>   zero HWI if in_len isn't a full result.
>   (wi::divmod_internal): Update accordingly.
>   (wi::mul_internal): Likewise.  Simplify.
> 
> gcc/testsuite/
>   PR tree-optimization/69400
>   * gcc.dg/plugin/wide-int_plugin.c (test_wide_int_mod_trunc): New
>   function.
>   (plugin_init): Call it.

I'd prefer to see also the testcase from the PR in the testsuite in addition
to the unit test.  Just make it /* { dg-do run { target int128 } } */
and put into gcc.dg/torture/

> --- a/gcc/wide-int.cc
> +++ b/gcc/wide-int.cc
> @@ -1215,29 +1215,31 @@ wi_unpack (unsigned HOST_HALF_WIDE_INT *result, const 
> HOST_WIDE_INT *input,
>  }
>  
>  /* The inverse of wi_unpack.  IN_LEN is the the number of input

I know you haven't touched this line and it is preexisting, but when
touching this, please also fix the "the the".

Ok with those changes.

Jakub


Re: gomp_target_fini

2016-01-22 Thread Jakub Jelinek
On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote:
> Thomas, I've mentioned this issue before - there is sometimes just too much
> irrelevant stuff to wade through in your patch submissions, and it
> discourages review. The discussion of the actual problem begins more than
> halfway through your multi-page mail. Please try to be more concise.
> 
> On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
> >Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> >atexit handler, gomp_target_fini, which, with the device lock held, will
> >call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> >clean up.
> >
> >Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> >context is now in an inconsistent state
> 
> >Thus, any cuMemFreeHost invocations that are run during clean-up will now
> >also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> >GOMP_PLUGIN_fatal, which again will trigger the same or another
> >(GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> >trying to lock the device again, which is still locked.
> 
> > libgomp/
> > * error.c (gomp_vfatal): Call _exit instead of exit.
> 
> It seems unfortunate to disable the atexit handlers for everything for what
> seems purely an nvptx problem.
> 
> What exactly happens if you don't register the cleanups with atexit in the
> first place? Or maybe you can query for CUDA_ERROR_LAUNCH_FAILED in the
> cleanup functions?

I agree, _exit is just wrong, there could be important atexit hooks from the
application.  You can set some flag that the libgomp or nvptx plugin atexit
hooks should not do anything, or should do things differently.  But
bypassing all atexit handlers is risky.

Jakub


Re: [PATCH][ARM] Fix PR target/69403: Bug in thumb2_ior_scc_strict_it pattern

2016-01-22 Thread Ramana Radhakrishnan
On Fri, Jan 22, 2016 at 9:32 AM, Kyrill Tkachov
 wrote:
> Hi Han,
>
> On 21/01/16 22:57, Han Shen wrote:
>>
>> Hi Kyrill, the patched gcc generates correct asm for me for the test
>> case.  (I'll then build the whole system to see if other it-block
>> related bugs are gone too.)
>>
>> One short question, the newly generated RTL for
>>  x = x |(a)
>> will be
>>  orr %0, %1, #1; it ; mov%D2\\t%0, %1 (b)
>>
>> The cond in (a) should be the reverse of cond in(b), right?
>
>
> yes, the first C-like expression is just some pseudocode.
> I guess it would be better written as:
> x = x | .
>
> Thanks for trying it out.
> I bootstrapped and tested this patch on trunk and GCC 5.
> I'll be doing the same on the 4.9 branch.


Ok everywhere. While you are here could you also audit the other
patterns that we changed for restrict_it to check that this thinko
isn't present anywhere else, please ?

Ramana



>
> Kyrill
>
>
>> Thanks for your quick fix.
>>
>> Han
>>
>> On Thu, Jan 21, 2016 at 9:51 AM, Kyrill Tkachov
>>  wrote:
>>>
>>> Hi all,
>>>
>>> In this wrong-code PR the pattern for performing
>>> x = x |  for -mrestrict-it is buggy and ends up writing 1 to the
>>> result register rather than orring it.
>>>
>>> The offending pattern is *thumb2_ior_scc_strict_it.
>>> My proposed solution is to rewrite it as a splitter, remove the
>>> alternative for the case where operands[1] and 0 are the same reg
>>> that emits the bogus:
>>> it ; mov%0, #1; it ; orr %0, %1
>>>
>>> to emit the RTL equivalent to:
>>> orr %0, %1, #1; it ; mov%D2\\t%0, %1
>>> while marking operand 0 as an earlyclobber operand so that it doesn't
>>> get assigned the same register as operand 1.
>>>
>>> This way we avoid the wrong-code, make the sequence better (by
>>> eliminating
>>> the move of #1 into a register
>>> and relaxing the constraints from 'l' to 'r' since only the register move
>>> has to be conditional).
>>> and still stay within the rules for arm_restrict_it.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf configured
>>> --with-arch=armv8-a and --with-mode=thumb.
>>>
>>> Ok for trunk, GCC 5 and 4.9?
>>>
>>> Han, can you please try this out to see if it solves the problem on your
>>> end
>>> as well?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-01-21  Kyrylo Tkachov  
>>>
>>>  PR target/69403
>>>  * config/arm/thumb2.md (*thumb2_ior_scc_strict_it): Convert to
>>>  define_insn_and_split.  Ensure operands[1] and operands[0] do not
>>>  get assigned the same register.
>>>
>>> 2016-01-21  Kyrylo Tkachov  
>>>
>>>  PR target/69403
>>>  * gcc.c-torture/execute/pr69403.c: New test.
>>
>>
>>
>


Re: [PATCH][Testsuite] Fix PR66877

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 11:51 AM, Alan Lawrence  wrote:
> This is a scan-tree-dump failure in vect-over-widen-3-big-array.c, that occurs
> only on ARM - the only platform to have vect_widen_shift.
>
> Tested on arm-none-eabi (armv8-crypto-neon-fp, plus a non-neon variant), also
> aarch64 (token platform without vect_widen_shift).

Ok.

Thanks,
Richard.

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Only look for 1
> vect_recog_over_widening_pattern in dump if we have vect_widen_shift.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 
> b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> index 1ca3128..69773a5 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
> @@ -58,6 +58,7 @@ int main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: 
> detected" 2 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: 
> detected" 2 "vect" { target { ! vect_widen_shift } } } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: 
> detected" 1 "vect" { target vect_widen_shift } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>
> --
> 1.9.1
>


[PATCH][Testsuite] Fix PR66877

2016-01-22 Thread Alan Lawrence
This is a scan-tree-dump failure in vect-over-widen-3-big-array.c, that occurs
only on ARM - the only platform to have vect_widen_shift.

Tested on arm-none-eabi (armv8-crypto-neon-fp, plus a non-neon variant), also
aarch64 (token platform without vect_widen_shift).

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-over-widen-3-big-array.c: Only look for 1
vect_recog_over_widening_pattern in dump if we have vect_widen_shift.
---
 gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c 
b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
index 1ca3128..69773a5 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-3-big-array.c
@@ -58,6 +58,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: 
detected" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: 
detected" 2 "vect" { target { ! vect_widen_shift } } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: 
detected" 1 "vect" { target vect_widen_shift } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 
-- 
1.9.1



Re: Fix PR 67665: ICE when passing two empty files directly to cc1 with -g

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 2:17 AM, Andrew Pinski  wrote:
> On Wed, Jan 13, 2016 at 4:36 AM, Richard Biener
>  wrote:
>> On Wed, Jan 13, 2016 at 9:27 AM, Andrew Pinski  wrote:
>>> Hi,
>>>   The support -combine was removed a while back but cc1 still accepts
>>> more than one file if directly invoked.  The support for multiple
>>> files has bit-rotten inside the C front-end now too.  This patch now
>>> errors out when invoked with more than one file instead of crashing
>>> later.
>>>
>>> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>
>> Ok, but can you please simplify the following code then?  The
>>
>>   i = 0;
>>   for (;;)
>> {
>> ...
>>   if (++i >= num_in_fnames)
>> break;
>>
>> and the code following the break should be no longer needed, no?
>
> Yes.  Let me resubmit the patch.  Also will this still be accepted
> even though we are in stage 4?

I suppose it's a regression, so yes.

Richard.

> Thanks,
> Andrew Pinski
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> c-family/ChangeLog:
>>> * c-opts.c (c_common_post_options): Move the error message about "two
>>> or more source files" such that it is unconditional.


Re: C++ PATCH for c++/69379 (ICE with PTRMEM_CST wrapped in NOP_EXPR)

2016-01-22 Thread Marek Polacek
On Fri, Jan 22, 2016 at 03:38:26PM -0500, Jason Merrill wrote:
> If we have a NOP_EXPR to the same type, we should strip it here.

This helps for the unreduced testcases in the PR, but not for the reduced one,
because for the reduced one, the types are not the same.  One type is
struct 
{
  void Dict:: (struct Dict *, T) * __pfn;
  long int __delta;
}
and the second one
struct 
{
  void Dict:: (struct Dict *) * __pfn;
  long int __delta;
}

The NOP_EXPR in this case originated in build_reinterpret_cast_1:
7070   else if ((TYPE_PTRFN_P (type) && TYPE_PTRFN_P (intype))
7071|| (TYPE_PTRMEMFUNC_P (type) && TYPE_PTRMEMFUNC_P (intype)))
7072 return build_nop (type, expr);

So maybe the following patch is desirable, but it's not a complete fix :(.
Thanks,

2016-01-22  Marek Polacek  

PR c++/69379
* constexpr.c (cxx_eval_constant_expression) [NOP_EXPR]: When
converting PTRMEM_CST to the same type, strip nops.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 6b0e5a8..6fe5cbe 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -3619,6 +3619,11 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
if (TREE_CODE (op) == PTRMEM_CST
&& !TYPE_PTRMEM_P (type))
  op = cplus_expand_constant (op);
+   if (TREE_CODE (op) == PTRMEM_CST
+   && tcode == NOP_EXPR
+   && same_type_ignoring_top_level_qualifiers_p (type,
+ TREE_TYPE (op)))
+ STRIP_NOPS (t);
if (POINTER_TYPE_P (type)
&& TREE_CODE (op) == INTEGER_CST
&& !integer_zerop (op))

Marek


Re: [PATCH] Fix the remaining PR c++/24666 blockers (arrays decay to pointers too early)

2016-01-22 Thread Patrick Palka

On Fri, 22 Jan 2016, Jason Merrill wrote:


On 01/22/2016 11:17 AM, Patrick Palka wrote:

On Thu, 21 Jan 2016, Patrick Palka wrote:

On Thu, 21 Jan 2016, Jason Merrill wrote:


On 01/19/2016 10:30 PM, Patrick Palka wrote:

 * g++.dg/template/unify17.C: XFAIL.


Hmm, I'm not comfortable with deliberately regressing this testcase.


  template 
-void bar (void (T[5])); // { dg-error "array of 'void'" }
+void bar (void (T[5])); // { dg-error "array of 'void'" "" { xfail
*-*-* } }


Can we work it so that T[5] also is un-decayed in the DECL_ARGUMENTS
of bar, but decayed in the TYPE_ARG_TYPES?


I think so, I'll try it.


Well, I tried it and the result is really ugly and it only "somewhat"
works.  (Maybe I'm just missing something obvious though.)  The ugliness
comes from the fact that decaying an array parameter type of a function
type embedded deep within some tree structure requires rebuilding all
the tree structures in between to avoid issues due to tree sharing.


Yes, that does complicate things.


This approach only "somewhat" works because it currently looks through
function, pointer, reference and array types.


Right, you would need to handle template arguments as well.


And I just noticed that
this approach does not work at all for USING_DECLs because no PARM_DECL
is ever retained anyway in that case.


I don't understand what you mean about USING_DECLs.


I just meant that we fail and would continue to fail to diagnose an
"array of void" error in the following test case:

   template 
   using X = void (T[5]);

   void foo (X);




I think a better, complete fix for this issue would be to, one way or
another, be able to get at the PARM_DECLs that correspond to a given
FUNCTION_TYPE.  Say, if, the TREE_CHAIN of a FUNCTION_TYPE optionally
pointed to its PARM_DECLs, or something.  What do you think?


Hmm.  So void(int[5]) and void(int*) would be distinct types, but they would 
share TYPE_CANONICAL, as though one is a typedef of the other? Interesting, 
but I'm not sure how that would interact with template argument 
canonicalization.  Well, that can probably be made to work by treating 
dependent template arguments as distinct more frequently.


Another thought: What if we keep a list of arrays we need to substitute into 
for a particular function template?


That approach definitely seems easier to reason about.  And it could
properly handle "using" templates as well as variable templates -- any
TEMPLATE_DECL, I think.




In the meantime, at this stage, I am personally most comfortable with
the previous patch (the one that XFAILs unify17.C).


I don't think that's a good tradeoff, sorry.  For the moment, let's revert 
your earlier patch.


Okay, will do.



Jason




Re: [PATCH] Fix -minline-stringops-dynamically (PR target/69432)

2016-01-22 Thread Jan Hubicka
> Hi!
> 
> With this option we generate a conditional library call.  When expanding
> such a call, do_pending_stack_adjust () is performed from expand_call
> and if there are any needed sp adjustments, they are emitted.  The
> problem is that this happens only in conditionally executed code, so on some
> path the args size level will be correct, on others it might be wrong.
> 
> Fixed by doing the adjustment before the first conditional jump if we know
> we'll emit a call conditionally.  Bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?
> 
> BTW, when looking at this, I've noticed something strange,
> expand_set_or_movmem_prologue_epilogue_by_misaligned_moves has bool
> dynamic_check argument, but the caller has int dynamic_check and in the
> caller as I understand the meaning is dynamic_check == -1 means no
> conditional library call, otherwise there is a conditional library call with
> some specific max size.  When calling
> expand_set_or_movmem_prologue_epilogue_by_misaligned_moves, the
> dynamic_check value is passed to the bool argument though, so that means
> dynamic_check != 0 instead of dynamic_check != -1.  Honza, can you please
> have a look at that?

Yep, that indeed looks odd.  I will take a look.

Path is OK.

Thanks,
Honza
> 
> 2016-01-22  Jakub Jelinek  
> 
>   PR target/69432
>   * config/i386/i386.c: Include dojump.h.
>   (expand_small_movmem_or_setmem,
>   expand_set_or_movmem_prologue_epilogue_by_misaligned_moves): Spelling
>   fixes.
>   (ix86_expand_set_or_movmem): Call do_pending_stack_adjust () early
>   if dynamic_check != -1.
> 
>   * g++.dg/opt/pr69432.C: New test.
> 
> --- gcc/config/i386/i386.c.jj 2016-01-19 09:20:34.0 +0100
> +++ gcc/config/i386/i386.c2016-01-22 20:39:32.289457234 +0100
> @@ -75,6 +75,7 @@ along with GCC; see the file COPYING3.
>  #include "dbgcnt.h"
>  #include "case-cfn-macros.h"
>  #include "regrename.h"
> +#include "dojump.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -25700,7 +25701,7 @@ expand_small_movmem_or_setmem (rtx destm
> if (DYNAMIC_CHECK)
>Round COUNT down to multiple of SIZE
> << optional caller supplied zero size guard is here >>
> -   << optional caller suppplied dynamic check is here >>
> +   << optional caller supplied dynamic check is here >>
> << caller supplied main copy loop is here >>
>   }
> done_label:
> @@ -25875,8 +25876,8 @@ expand_set_or_movmem_prologue_epilogue_b
>else
>   *min_size = 0;
>  
> -  /* Our loops always round down the bock size, but for dispatch to 
> library
> -  we need precise value.  */
> +  /* Our loops always round down the block size, but for dispatch to
> + library we need precise value.  */
>if (dynamic_check)
>   *count = expand_simple_binop (GET_MODE (*count), AND, *count,
> GEN_INT (-size), *count, 1, OPTAB_DIRECT);
> @@ -26469,6 +26470,13 @@ ix86_expand_set_or_movmem (rtx dst, rtx
>size_needed = GET_MODE_SIZE (move_mode) * unroll_factor;
>epilogue_size_needed = size_needed;
>  
> +  /* If we are going to call any library calls conditionally, make sure any
> + pending stack adjustment happen before the first conditional branch,
> + otherwise they will be emitted before the library call only and won't
> + happen from the other branches.  */
> +  if (dynamic_check != -1)
> +do_pending_stack_adjust ();
> +
>desired_align = decide_alignment (align, alg, expected_size, move_mode);
>if (!TARGET_ALIGN_STRINGOPS || noalign)
>  align = desired_align;
> --- gcc/testsuite/g++.dg/opt/pr69432.C.jj 2016-01-22 20:51:19.463428357 
> +0100
> +++ gcc/testsuite/g++.dg/opt/pr69432.C2016-01-22 20:51:04.0 
> +0100
> @@ -0,0 +1,62 @@
> +// PR target/69432
> +// { dg-do compile }
> +// { dg-options "-O3" }
> +// { dg-additional-options "-minline-stringops-dynamically" { target 
> i?86-*-* x86_64-*-* } }
> +
> +template <typename S, typename T, typename U>
> +void
> +f1 (S x, T y, U z)
> +{
> +  for (; y; --y, ++x)
> +*x = z;
> +}
> +
> +template <typename S, typename T, typename U>
> +void f2 (S x, T y, U z)
> +{
> +  f1 (x, y, z);
> +}
> +
> +struct A {};
> +struct B { static char f3 (A, unsigned); };
> +
> +template <typename S, typename U>
> +void f4 (S, U);
> +
> +struct C
> +{
> +  template <typename S, typename T, typename U>
> +  static S f5 (S x, T y, U z) { f2 (x, y, z); }
> +};
> +
> +template <typename S, typename T, typename U>
> +void f6 (S x, T y, U z) { C::f5 (x, y, z); }
> +
> +template <typename S, typename T, typename U, typename V>
> +void f7 (S x, T y, U z, V) { f6 (x, y, z); }
> +
> +struct E
> +{
> +  struct D : A { char e; D (A); };
> +  A f;
> +  E (int x) : g(f) { f8 (x); }
> +  ~E ();
> +  D g;
> +  void f9 (int x) { x ? B::f3 (g, x) : char (); }
> +  void f8 (int x) { f9 (x); }
> +};
> +
> +struct F : E
> +{
> +  F (int x) : E(x) { f10 (x); f4 (this, 0); }
> +  char h;
> +  void f10 (int x) { f7 (, x, h, 0); }
> +};
> +
> +long a;
> +
> +void
> +test ()
> +{
> +  F b(a);
> +}
> 
>   Jakub


[gomp4] gang partitioning

2016-01-22 Thread Nathan Sidwell
I've committed this patch to gomp4 branch.  It changes the auto partitioning 
logic to allocate the outermost loop to the outermost available partitioning. 
For instance, gang partitioning will be used for the outermost loop of a 
parallel region.  Innermost loops remain partitioned at the innermost 
available level.


This means that if we run out of available partitions, we've parallelized the 
outer loop and the innermost loops, rather than just parallelizing the inner loops.


nathan
2016-01-22  Nathan Sidwell  

	gcc/
	* omp-low.c (struct oacc_loop): Add 'inner' field.
	(new_oacc_loop_raw): Initialize it to zero.
	(oacc_loop_fixed_partitions): Initialize it.
	(oacc_loop_auto_partitions): Partition outermost loop to outermost
	available partitioning.

	gcc/testsuite/
	* c-c++-common/goacc/loop-auto-1.c: Adjust expected warnings.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust
	expected partitioning.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(revision 232749)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(working copy)
@@ -102,9 +102,11 @@ int vector_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+for (int jx = 0; jx < 1; jx++)
 #pragma acc loop auto
-for (int ix = 0; ix < size; ix++)
-  ary[ix] = place ();
+  for (int ix = 0; ix < size; ix++)
+	ary[ix] = place ();
   }
 
   return check (ary, size, 0, 0, 1);
@@ -117,7 +119,7 @@ int vector_2 (int *ary, int size)
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
 #pragma acc loop worker
-for (int jx = 0; jx <  size  / 64; jx++)
+for (int jx = 0; jx < size  / 64; jx++)
 #pragma acc loop auto
   for (int ix = 0; ix < 64; ix++)
 	ary[ix + jx * 64] = place ();
@@ -132,30 +134,16 @@ int worker_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+for (int kx = 0; kx < 1; kx++)
 #pragma acc loop auto
-for (int jx = 0; jx <  size  / 64; jx++)
+  for (int jx = 0; jx <  size  / 64; jx++)
 #pragma acc loop vector
-  for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
-  }
-
-  return check (ary, size, 0, 1, 1);
-}
-
-int worker_2 (int *ary, int size)
-{
-  clear (ary, size);
-  
-#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
-  {
-#pragma acc loop auto
-for (int jx = 0; jx <  size  / 64; jx++)
-#pragma acc loop auto
-  for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
+	for (int ix = 0; ix < 64; ix++)
+	  ary[ix + jx * 64] = place ();
   }
 
-  return check (ary, size, 0, 1, 1);
+  return check (ary, size, 0,  1, 1);
 }
 
 int gang_1 (int *ary, int size)
@@ -192,6 +180,22 @@ int gang_2 (int *ary, int size)
   return check (ary, size, 1, 1, 1);
 }
 
+int gang_3 (int *ary, int size)
+{
+  clear (ary, size);
+  
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop auto
+for (int jx = 0; jx <  size  / 64; jx++)
+#pragma acc loop auto
+  for (int ix = 0; ix < 64; ix++)
+	ary[ix + jx * 64] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
 #define N (32*32*32)
 int main ()
 {
@@ -213,13 +217,13 @@ int main ()
 
   if (worker_1 (ary,  N))
 return 1;
-  if (worker_2 (ary,  N))
-return 1;
   
   if (gang_1 (ary,  N))
 return 1;
   if (gang_2 (ary,  N))
 return 1;
+  if (gang_3 (ary,  N))
+return 1;
 
   return 0;
 }
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 232749)
+++ gcc/omp-low.c	(working copy)
@@ -249,8 +249,9 @@ struct oacc_loop
   tree routine;  /* Pseudo-loop enclosing a routine.  */
 
   unsigned mask;   /* Partitioning mask.  */
-  unsigned flags;   /* Partitioning flags.  */
-  tree chunk_size;   /* Chunk size.  */
+  unsigned inner;  /* Partitioning of inner loops.  */
+  unsigned flags;  /* Partitioning flags.  */
+  tree chunk_size; /* Chunk size.  */
   gcall *head_end; /* Final marker of head sequence.  */
 };
 
@@ -19434,7 +19435,7 @@ new_oacc_loop_raw (oacc_loop *parent, lo
   memset (loop->tails, 0, sizeof (loop->tails));
   loop->routine = NULL_TREE;
 
-  loop->mask = loop->flags = 0;
+  loop->mask = loop->flags = loop->inner = 0;
   loop->chunk_size = 0;
   loop->head_end = NULL;
 
@@ -19941,8 +19942,11 @@ oacc_loop_fixed_partitions (oacc_loop *l
   mask_all |= this_mask;
   
   if (loop->child)
-mask_all |= oacc_loop_fixed_partitions (loop->child,
-	outer_mask | this_mask);
+{
+  loop->inner = oacc_loop_fixed_partitions (loop->child,
+		outer_mask | this_mask); 
+

Re: Speedup configure and build with system.h

2016-01-22 Thread Jakub Jelinek
On Fri, Jan 22, 2016 at 09:23:48PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 22, 2016 at 12:09:43PM -0800, H.J. Lu wrote:
> > > * system.h (string, algorithm): Include only conditionally.
> > > (new): Include always under C++.
> > > * bb-reorder.c (toplevel): Define USES_ALGORITHM.
> > > * final.c (toplevel): Ditto.
> > > * ipa-chkp.c (toplevel): Define USES_STRING.
> > > * genconditions.c (write_header): Make gencondmd.c define
> > > USES_STRING.
> > > * mem-stats.h (mem_usage::print_dash_line): Don't use std::string.
> > >
> > 
> > This may have caused:
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69434
> 
> Guess we need:
> 
> 2016-01-22  Jakub Jelinek  
> 
>   PR bootstrap/69434
>   * genrecog.c: Define INCLUDE_ALGORITHM before including system.h,
>   remove <algorithm> include.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

> --- gcc/genrecog.c.jj 2016-01-04 18:50:33.207491883 +0100
> +++ gcc/genrecog.c2016-01-22 21:21:42.852362294 +0100
> @@ -105,6 +105,7 @@
> 5. Write out C++ code for each function.  */
>  
>  #include "bconfig.h"
> +#define INCLUDE_ALGORITHM
>  #include "system.h"
>  #include "coretypes.h"
>  #include "tm.h"
> @@ -112,7 +113,6 @@
>  #include "errors.h"
>  #include "read-md.h"
>  #include "gensupport.h"
> -#include <algorithm>
>  
>  #undef GENERATOR_FILE
>  enum true_rtx_doe {

Jakub


[PATCH] Avoid unnecessary creation of VEC_COND_EXPR in the vectorizer

2016-01-22 Thread Jakub Jelinek
Hi!

I've noticed we create a VEC_COND_EXPR tree just to grab the arguments from
it to construct a ternary gimple assign.

The following patch fixes that by creating the ternary gimple assign
directly.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-01-22  Jakub Jelinek  

* tree-vect-stmts.c (vectorizable_condition): Build a VEC_COND_EXPR
directly instead of building a temporary tree.

--- gcc/tree-vect-stmts.c.jj2016-01-20 15:39:08.0 +0100
+++ gcc/tree-vect-stmts.c   2016-01-22 10:25:59.74625 +0100
@@ -7478,7 +7478,7 @@ vectorizable_condition (gimple *stmt, gi
   tree comp_vectype = NULL_TREE;
   tree vec_cond_lhs = NULL_TREE, vec_cond_rhs = NULL_TREE;
   tree vec_then_clause = NULL_TREE, vec_else_clause = NULL_TREE;
-  tree vec_compare, vec_cond_expr;
+  tree vec_compare;
   tree new_temp;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   enum vect_def_type dt, dts[4];
@@ -7691,12 +7691,10 @@ vectorizable_condition (gimple *stmt, gi
  vec_compare = build2 (TREE_CODE (cond_expr), vec_cmp_type,
vec_cond_lhs, vec_cond_rhs);
}
-  vec_cond_expr = build3 (VEC_COND_EXPR, vectype,
-vec_compare, vec_then_clause, vec_else_clause);
-
-  new_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
-  new_temp = make_ssa_name (vec_dest, new_stmt);
-  gimple_assign_set_lhs (new_stmt, new_temp);
+  new_temp = make_ssa_name (vec_dest);
+  new_stmt = gimple_build_assign (new_temp, VEC_COND_EXPR,
+ vec_compare, vec_then_clause,
+ vec_else_clause);
   vect_finish_stmt_generation (stmt, new_stmt, gsi);
   if (slp_node)
 SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);

Jakub


[PATCH] Fix -minline-stringops-dynamically (PR target/69432)

2016-01-22 Thread Jakub Jelinek
Hi!

With this option we generate a conditional library call.  When expanding
such a call, do_pending_stack_adjust () is performed from expand_call
and if there are any needed sp adjustments, they are emitted.  The
problem is that this happens only in conditionally executed code, so on some
path the args size level will be correct, on others it might be wrong.

Fixed by doing the adjustment before the first conditional jump if we know
we'll emit a call conditionally.  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

BTW, when looking at this, I've noticed something strange,
expand_set_or_movmem_prologue_epilogue_by_misaligned_moves has bool
dynamic_check argument, but the caller has int dynamic_check and in the
caller as I understand the meaning is dynamic_check == -1 means no
conditional library call, otherwise there is a conditional library call with
some specific max size.  When calling
expand_set_or_movmem_prologue_epilogue_by_misaligned_moves, the
dynamic_check value is passed to the bool argument though, so that means
dynamic_check != 0 instead of dynamic_check != -1.  Honza, can you please
have a look at that?

2016-01-22  Jakub Jelinek  

PR target/69432
* config/i386/i386.c: Include dojump.h.
(expand_small_movmem_or_setmem,
expand_set_or_movmem_prologue_epilogue_by_misaligned_moves): Spelling
fixes.
(ix86_expand_set_or_movmem): Call do_pending_stack_adjust () early
if dynamic_check != -1.

* g++.dg/opt/pr69432.C: New test.

--- gcc/config/i386/i386.c.jj   2016-01-19 09:20:34.0 +0100
+++ gcc/config/i386/i386.c  2016-01-22 20:39:32.289457234 +0100
@@ -75,6 +75,7 @@ along with GCC; see the file COPYING3.
 #include "dbgcnt.h"
 #include "case-cfn-macros.h"
 #include "regrename.h"
+#include "dojump.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -25700,7 +25701,7 @@ expand_small_movmem_or_setmem (rtx destm
if (DYNAMIC_CHECK)
 Round COUNT down to multiple of SIZE
<< optional caller supplied zero size guard is here >>
-   << optional caller suppplied dynamic check is here >>
+   << optional caller supplied dynamic check is here >>
<< caller supplied main copy loop is here >>
  }
done_label:
@@ -25875,8 +25876,8 @@ expand_set_or_movmem_prologue_epilogue_b
   else
*min_size = 0;
 
-  /* Our loops always round down the bock size, but for dispatch to library
-we need precise value.  */
+  /* Our loops always round down the block size, but for dispatch to
+ library we need precise value.  */
   if (dynamic_check)
*count = expand_simple_binop (GET_MODE (*count), AND, *count,
  GEN_INT (-size), *count, 1, OPTAB_DIRECT);
@@ -26469,6 +26470,13 @@ ix86_expand_set_or_movmem (rtx dst, rtx
   size_needed = GET_MODE_SIZE (move_mode) * unroll_factor;
   epilogue_size_needed = size_needed;
 
+  /* If we are going to call any library calls conditionally, make sure any
+ pending stack adjustment happen before the first conditional branch,
+ otherwise they will be emitted before the library call only and won't
+ happen from the other branches.  */
+  if (dynamic_check != -1)
+do_pending_stack_adjust ();
+
   desired_align = decide_alignment (align, alg, expected_size, move_mode);
   if (!TARGET_ALIGN_STRINGOPS || noalign)
 align = desired_align;
--- gcc/testsuite/g++.dg/opt/pr69432.C.jj   2016-01-22 20:51:19.463428357 
+0100
+++ gcc/testsuite/g++.dg/opt/pr69432.C  2016-01-22 20:51:04.0 +0100
@@ -0,0 +1,62 @@
+// PR target/69432
+// { dg-do compile }
+// { dg-options "-O3" }
+// { dg-additional-options "-minline-stringops-dynamically" { target i?86-*-* 
x86_64-*-* } }
+
+template <typename S, typename T, typename U>
+void
+f1 (S x, T y, U z)
+{
+  for (; y; --y, ++x)
+*x = z;
+}
+
+template <typename S, typename T, typename U>
+void f2 (S x, T y, U z)
+{
+  f1 (x, y, z);
+}
+
+struct A {};
+struct B { static char f3 (A, unsigned); };
+
+template <typename S, typename U>
+void f4 (S, U);
+
+struct C
+{
+  template <typename S, typename T, typename U>
+  static S f5 (S x, T y, U z) { f2 (x, y, z); }
+};
+
+template <typename S, typename T, typename U>
+void f6 (S x, T y, U z) { C::f5 (x, y, z); }
+
+template <typename S, typename T, typename U, typename V>
+void f7 (S x, T y, U z, V) { f6 (x, y, z); }
+
+struct E
+{
+  struct D : A { char e; D (A); };
+  A f;
+  E (int x) : g(f) { f8 (x); }
+  ~E ();
+  D g;
+  void f9 (int x) { x ? B::f3 (g, x) : char (); }
+  void f8 (int x) { f9 (x); }
+};
+
+struct F : E
+{
+  F (int x) : E(x) { f10 (x); f4 (this, 0); }
+  char h;
+  void f10 (int x) { f7 (, x, h, 0); }
+};
+
+long a;
+
+void
+test ()
+{
+  F b(a);
+}

Jakub


Re: TR29124 C++ Special Maths - Make pull functions into global namespace.

2016-01-22 Thread Ed Smith-Rowland

On 01/22/2016 05:39 AM, Jonathan Wakely wrote:

On 21/01/16 19:07 -0500, Ed Smith-Rowland wrote:

On 01/21/2016 07:29 AM, Jonathan Wakely wrote:

On 20/01/16 20:30 -0500, Ed Smith-Rowland wrote:
Now that libstdc++ installs a proper math.h we can piggyback on 
that to put in the last bit of TR29124.


This patch adds the math special functions to 
c_compatibility/math.h in the global namespace.

I remove the XFAILs from the compile_2.cc tests.

This converts 21 XFAILs into 21 PASSes.

Tested on x86_64-linux.

I understand if this is too late.
I'll put it up on trunk and backport after stage 1 reopens.

Meanwhile I'll commit this to the tr29124 branch.


I'm inclined to say the change is OK for trunk now. We've added math.h
and we've added the special functions, it would be good if the two new
things work together.

However ...


Index: include/c_compatibility/math.h
===
--- include/c_compatibility/math.h(revision 232610)
+++ include/c_compatibility/math.h(working copy)
@@ -75,70 +75,71 @@
#endif

#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
-using std::assoc_laguerref


This doesn't look like the right patch, because these lines aren't in
the version on trunk.


I must have given you a negative patch relative to the tr29124 
branch...  Sigh.

I'll make a new one.
Those lines have to be *added*.
Sorry.


No problem. It looks like it was a partial patch, adding semi-colons
to the using declarations, so you just need to diff against the
previous revision instead.



Here are the correct patches.

Retested on x86-64-linux.

Committed as revision 232755.

Ed

2016-01-22  Edward Smith-Rowland  <3dw...@verizon.net>

TR29124 C++ Special Math -  pulls funcs into global namespace.
* include/c_compatibility/math.h: Import the TR29124 functions
into the global namespace.
* testsuite/special_functions/01_assoc_laguerre/compile_2.cc: Remove
xfail and make compile-only.
* testsuite/special_functions/02_assoc_legendre/compile_2.cc: Ditto.
* testsuite/special_functions/03_beta/compile_2.cc: Ditto.
* testsuite/special_functions/04_comp_ellint_1/compile_2.cc: Ditto.
* testsuite/special_functions/05_comp_ellint_2/compile_2.cc: Ditto.
* testsuite/special_functions/06_comp_ellint_3/compile_2.cc: Ditto.
* testsuite/special_functions/07_cyl_bessel_i/compile_2.cc: Ditto.
* testsuite/special_functions/08_cyl_bessel_j/compile_2.cc: Ditto.
* testsuite/special_functions/09_cyl_bessel_k/compile_2.cc: Ditto.
* testsuite/special_functions/10_cyl_neumann/compile_2.cc: Ditto.
* testsuite/special_functions/11_ellint_1/compile_2.cc: Ditto.
* testsuite/special_functions/12_ellint_2/compile_2.cc: Ditto.
* testsuite/special_functions/13_ellint_3/compile_2.cc: Ditto.
* testsuite/special_functions/14_expint/compile_2.cc: Ditto.
* testsuite/special_functions/15_hermite/compile_2.cc: Ditto.
* testsuite/special_functions/16_laguerre/compile_2.cc: Ditto.
* testsuite/special_functions/17_legendre/compile_2.cc: Ditto.
* testsuite/special_functions/18_riemann_zeta/compile_2.cc: Ditto.
* testsuite/special_functions/19_sph_bessel/compile_2.cc: Ditto.
* testsuite/special_functions/20_sph_legendre/compile_2.cc: Ditto.
* testsuite/special_functions/21_sph_neumann/compile_2.cc: Ditto.

Index: include/c_compatibility/math.h
===
--- include/c_compatibility/math.h  (revision 232742)
+++ include/c_compatibility/math.h  (working copy)
@@ -111,5 +111,73 @@
 using std::trunc;
 #endif // C++11 && _GLIBCXX_USE_C99_MATH_TR1
 
-#endif
-#endif
+#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
+using std::assoc_laguerref;
+using std::assoc_laguerrel;
+using std::assoc_laguerre;
+using std::assoc_legendref;
+using std::assoc_legendrel;
+using std::assoc_legendre;
+using std::betaf;
+using std::betal;
+using std::beta;
+using std::comp_ellint_1f;
+using std::comp_ellint_1l;
+using std::comp_ellint_1;
+using std::comp_ellint_2f;
+using std::comp_ellint_2l;
+using std::comp_ellint_2;
+using std::comp_ellint_3f;
+using std::comp_ellint_3l;
+using std::comp_ellint_3;
+using std::cyl_bessel_if;
+using std::cyl_bessel_il;
+using std::cyl_bessel_i;
+using std::cyl_bessel_jf;
+using std::cyl_bessel_jl;
+using std::cyl_bessel_j;
+using std::cyl_bessel_kf;
+using std::cyl_bessel_kl;
+using std::cyl_bessel_k;
+using std::cyl_neumannf;
+using std::cyl_neumannl;
+using std::cyl_neumann;
+using std::ellint_1f;
+using std::ellint_1l;
+using std::ellint_1;
+using std::ellint_2f;
+using std::ellint_2l;
+using std::ellint_2;
+using std::ellint_3f;
+using std::ellint_3l;
+using std::ellint_3;
+using std::expintf;
+using std::expintl;
+using std::expint;
+using std::hermitef;
+using std::hermitel;
+using std::hermite;
+using std::laguerref;
+using std::laguerrel;

Re: [patch] Document restriction of scalar_storage_order

2016-01-22 Thread Eric Botcazou
> Isn't this kind of implied by the already documented restriction of
> taking the address of a union field? Still, you can add this or any kind
> of similar language if you like.

It's stronger I think since you can do type punning without taking an address.

-- 
Eric Botcazou


Re: Remove redundant unshare_expr from ipa-prop

2016-01-22 Thread Jakub Jelinek
On Fri, Jan 22, 2016 at 05:36:14PM +1100, Kugan wrote:
> There is a redundant unshare_expr in ipa-prop. Attached patch removes
> it. Bootstrapped and regression tested on x86_64-pc-linux-gnu with no
> new regressions.
> 
> Is this OK for trunk?
> 
> Thanks,
> Kugan
> 
> gcc/ChangeLog:
> 
> 2016-01-22  Kugan Vivekanandarajah  
> 
>   * ipa-prop.c (ipa_set_jf_constant): Remove redundant unshare_expr.

Ok, thanks.

> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 06a9aa2..d62c704 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -402,9 +402,6 @@ static void
>  ipa_set_jf_constant (struct ipa_jump_func *jfunc, tree constant,
>struct cgraph_edge *cs)
>  {
> -  constant = unshare_expr (constant);
> -  if (constant && EXPR_P (constant))
> -SET_EXPR_LOCATION (constant, UNKNOWN_LOCATION);
>jfunc->type = IPA_JF_CONST;
>jfunc->value.constant.value = unshare_expr_without_location (constant);

Jakub


Re: [PATCH][cse][3/4] Don't overwrite original rtx when folding source of set

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 10:52 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> This patch is a consequence of the thread I started at
> https://gcc.gnu.org/ml/gcc/2016-01/msg00100.html
> The problem is that the fold_rtx call in cse_insn may overwrite its argument
> if the insn argument is non-NULL.
> This leads to CSE not considering the original form of the RTX when doing
> its cost analysis later on.
> This led to it picking a normal SImode multiply expression over the original
> multiply-sign-extend expression
> which in my case is cheaper (as reflected in the fixed rtx costs from patch
> 2)
>
> The simple fix is to pass NULL to fold_rtx so that it will return the
> candidate folded expression
> into src_folded but still retain the original src for analysis.
>
> With this change the gcc.target/arm/wmul-[12].c and the costs fix in patch
> [2/4]
> the tests now generate their expected
> sign-extend+multiply (+accumulate) sequences.
>
> Apart from that this patch has no impact codegen on SPEC2006 for arm.
> For aarch64 the impact is minimal and inconsequential. I've seen sequences
> that
> select between 1 and -1 being turned from a CSINC (of zero) into a CSNEG. Both
> are valid
> and of equal value.
>
> On x86_64 the impact was also minimal. Most benchmarks were not changed at
> all.
> Some showed a negligible reduction in codesize and slight
> register-allocation perturbations.
> But nothing significant.
> Hence, I claim that this patch is low impact.
>
> Bootstrapped and tested on arm, aarch64, x86_64.
> Ok for trunk?

Ok if the rest of the series is approved (otherwise ok for stage1).

Thanks,
Richard.

> Thanks,
> Kyrill
>
> 2016-01-22  Kyrylo Tkachov  
>
> * cse.c (cse_insn): Pass NULL to fold_rtx when initially
> folding the source of a SET.


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 12:06 PM, Eric Botcazou  wrote:
>> I am fine with it being relaxed and permitting inlining !non_call_exceptions
>> to non_call_exceptions functions..  It would be also cool to have a
>> testcases.
>
> Thanks, patch installed with check_match changed to check_maybe_up, I'll work
> towards adding an Ada+C LTO test but this will require fiddling with DejaGNU.

I suppose that some C++ mixed -f[no-]non-call-exceptions testcases
might be good enough.

Richard.

> --
> Eric Botcazou


RE: [PATCH] [ARC] Add basic support for double load and store instructions

2016-01-22 Thread Claudiu Zissulescu
Thank you for the feedback. I hope this new patch solves the outstanding issues. 
Please find it attached.

//Claudiu


0001-ARC-Add-basic-support-for-double-load-and-store-inst.patch
Description: 0001-ARC-Add-basic-support-for-double-load-and-store-inst.patch


Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel
 wrote:
>
>
> On 01/19/2016 04:18 PM, Richard Biener wrote:
>>
>> maybe just if (currently_expanding_to_rtl)?
>>
>> But yes, this looks like a safe variant of the fix.
>>
>> Richard.
>>
> thanks, currently_expanding_to_rtl works perfectly. So the final version.
> I added a test for each target.

Ok.

Thanks,
Richard.

> bootstrapped / tested for :
> unix/-m32/-march=i586
> unix
>
> arm-qemu/
> arm-qemu//-mfpu=neon
> arm-qemu//-mfpu=neon-fp-armv8
>
> aarch64-qemu
>
>
>
>
>
>
>


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Jan Hubicka
> > I only updated
> > -  /* Don't inline if the callee can throw non-call exceptions but the
> > - caller cannot.
> > - FIXME: this is obviously wrong for LTO where STRUCT_FUNCTION is
> > missing. - Move the flag into cgraph node or mirror it in the inline
> > summary.  */ -  else if (callee_fun &&
> > callee_fun->can_throw_non_call_exceptions -&& !(caller_fun &&
> > caller_fun->can_throw_non_call_exceptions)) -{
> > -  e->inline_failed = CIF_NON_CALL_EXCEPTIONS;
> > -  inlinable = false;
> > -}
> > to actually work with LTO where callee_fun/caller_fun is not always
> > available (but sometimes, like when ICF requested the body or when we
> > merged profiles, it is).
> 
> No, that's not true.  Let's consider an Ada caller and a C callee.  With the 
> old code (mine as you remarked): caller_fun->can_throw_non_call_exceptions is 
> true and callee_fun->can_throw_non_call_exceptions is false, so the above 
> test 
> is false and we can inline.  With the new code (yours): check_match is true 
> and opt_for_fn (callee->decl, flag_non_call_exceptions) is false, so we 
> cannot 
> inline.

Hmm, I see now.  I wonder if we can also inline can_throw_non_call_exceptions
to !can_throw_non_call_exceptions provided that we set the flag in
ipa-inline-transform.  That way we can inline Ada to C and the observation
about no EH regions should still hold.

Honza
> 
> -- 
> Eric Botcazou


Re: Speedup configure and build with system.h

2016-01-22 Thread Oleg Endo
On Thu, 2016-01-21 at 18:10 +0100, Richard Biener wrote:
> On Thu, Jan 21, 2016 at 5:57 PM, Michael Matz  wrote:
> > Hi,
> > 
> > this has bothered me for some time.  The gcc configure with stage1
> > feels
> > like taking forever because some of the decl availability tests
> > (checking
> > for C function) include system.h, and that, since a while,
> > unconditionally
> > includes <string> and <algorithm> under C++, and we meanwhile use
> > the C++
> > compiler for configure tests (which makes sense).  Now, the
> > difference for
> > a debuggable (but not even checking-enabled) cc1plus for a file
> > containing
> > just main():
> > 
> > % cat blaeh.cc
> > #include 
> > #include 
> > #include 
> > #include 
> > int main() {}
> > % cc1plus -quiet -ftime-report blaeh.cc
> >  TOTAL :   0.12 0.01 0.14
> > 
> > (This is btw. three times as expensive as with 4.8 headers (i.e.
> > precompile with g++-4.8 then compile with the same cc1plus as
> > above,
> > taking 0.04 seconds; the STL headers bloat quite much over time)
> > 
> > Well, not quite blazing fast but then adding <string>:
> > 
> > % cc1plus -quiet -ftime-report blaeh-string.cc
> >  TOTAL :   0.60 0.05 0.66
> > 
> > Meeh.  And adding <algorithm> on top:
> > 
> > % cc1plus -quiet -ftime-report blaeh-string-alg.cc
> >  TOTAL :   1.13 0.09 1.23
> > 
> > So, more than a second for checking if some C-only decl is
> > available, just
> > because system.h unconditionally includes mostly useless STL
> > headers.
> > 
> > So, how useless exactly?  A whopping single file of cc1 proper
> > needs <string>, _two_ files need <algorithm>, and a single target has an
> > unlucky
> > interface in its prototypes and also needs <string>.  (One
> > additional
> > header lazily uses std::string for no particular reason).  So we
> > pay about
> > 5 minutes build time per stage (there are ~400 libbackend.a files)
> > for
> > more or less nothing.
> > 
> > So, let's include those headers only conditionally; I'm pretty sure
> > it's
> > not unreasonable for a source file, if it needs a particular STL
> > facility
> > to #define USES_abcheader (like one normally would have to #include
> > ) before the "system.h" include.
> > 
> > See the patch.  I've grepped for target or language dependencies on
> > other
> > STL types, and either they were already including the right header,
> > or
> > were covered with the new system.h (i.e. I've built all targets
> > quickly
> > for which grepping for 'std::' returned anything).  The
> > genconditions.c
> > change is for the benefit of aarch64 as well, and it single
> > function
> > aarch64_get_extension_string_for_isa_flags returning a std::string.
> > 
> > What do people think?  Should I pass it through a proper bootstrap
> > and put
> > it to trunk?  It's a (developer time) regression, right? ;-)
> 
> Ok.
> 
> Thanks,
> Richard.
> 
> I'm inclined to say #define INCLUDE_ALGORITHM is a better name,
> but just bike-shedding...  and please convert the (bogus) ISL way of
> achieving a similar thing.
> 
> I'm also inclined to say that we should remove <string> usage.  Not
> sure about algorithm, but I'd say it's the same.
> 

<string> and <algorithm> have been put into system.h because there have
been problems with malloc poisoning and C++ stdlib implementation other
than libstdc++, which sometimes pull other headers which then cause
trouble.  The fix for this set of errors was to include some of the
stdlib headers in system.h before anything else.

Cheers,
Oleg


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Richard Biener
On Fri, Jan 22, 2016 at 1:00 PM, Jan Hubicka  wrote:
>> > I only updated
>> > -  /* Don't inline if the callee can throw non-call exceptions but the
>> > - caller cannot.
>> > - FIXME: this is obviously wrong for LTO where STRUCT_FUNCTION is
>> > missing. - Move the flag into cgraph node or mirror it in the inline
>> > summary.  */ -  else if (callee_fun &&
>> > callee_fun->can_throw_non_call_exceptions -&& !(caller_fun &&
>> > caller_fun->can_throw_non_call_exceptions)) -{
>> > -  e->inline_failed = CIF_NON_CALL_EXCEPTIONS;
>> > -  inlinable = false;
>> > -}
>> > to actually work with LTO where callee_fun/caller_fun is not always
>> > available (but sometimes, like when ICF requested the body or when we
>> > merged profiles, it is).
>>
>> No, that's not true.  Let's consider an Ada caller and a C callee.  With the
>> old code (mine as you remarked): caller_fun->can_throw_non_call_exceptions is
>> true and callee_fun->can_throw_non_call_exceptions is false, so the above 
>> test
>> is false and we can inline.  With the new code (yours): check_match is true
>> and opt_for_fn (callee->decl, flag_non_call_exceptions) is false, so we 
>> cannot
>> inline.
>
> Hmm, I see now.  I wonder if we can also inline can_throw_non_call_exceptions
> to !can_throw_non_call_exceptions provided that we set the flag in
> ipa-inline-transform.  That way we can inline Ada to C and the observation
> about no EH regions should still hold.

That might work (same for -fexceptions).  You might want to wrap the function
in a ERT_MUST_NOT_THROW though in that case.

Richard.

> Honza
>>
>> --
>> Eric Botcazou


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Eric Botcazou
> Why do you say so? There are C->Ada calls as there are Ada->C calls in
> plenty of existing software.

But what percentage of the C->Ada ones are performance critical?  Note that, 
unlike the Ada->C or Ada/C++ ones, these have never been inlined and I can 
imagine the kind of trouble this would introduce.  If the software contains a 
mix of C and Ada and some C->Ada calls are performance critical, then they'd 
better be rewritten in C, they will be optimized at any optimization level 
instead of just with LTO.

-- 
Eric Botcazou


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Eric Botcazou
> Hmm, I see now.  I wonder if we can also inline
> can_throw_non_call_exceptions to !can_throw_non_call_exceptions provided
> that we set the flag in ipa-inline-transform.  That way we can inline Ada to
> C and the observation about no EH regions should still hold.

I'd say you're the only one caring about inlining Ada into other languages ;-)

-- 
Eric Botcazou


Re: [patch] Restore cross-language inlining into Ada

2016-01-22 Thread Arnaud Charlet
> > Hmm, I see now.  I wonder if we can also inline
> > can_throw_non_call_exceptions to !can_throw_non_call_exceptions
> > provided
> > that we set the flag in ipa-inline-transform.  That way we can inline Ada to
> > C and the observation about no EH regions should still hold.
> 
> I'd say you're the only one caring about inlining Ada into other languages
> ;-)

Why do you say so? There are C->Ada calls as there are Ada->C calls in
plenty of existing software.

Arno


Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-22 Thread Ian Lance Taylor
On Fri, Jan 22, 2016 at 11:25 AM, Bernd Schmidt  wrote:
> On 01/22/2016 08:03 PM, Ian Lance Taylor wrote:
>>
>> Updated patch.
>>
>> I've verified that I'm changing the only relevant place in
>> tree-ssa-loop-ivopts.c that creates a POINTER_PLUS_EXPR, so I do think
>> that this is the only changed to fix the problem for ivopts.
>
>
> I don't think so. One of the problems with ivopts is that it likes to cast
> everything to unsigned int, so looking for POINTER_PLUS_EXPR wouldn't find
> all affected spots. At least this used to happen, I didn't check recently.
> Also, a lot of the generated expressions are built by tree-affine.c rather
> than in ivopts directly.

Thanks for the tip.  I moved the check to add_candidate_1 instead.
This is before the point where it converts to an integer type.  This
approach is better anyhow, as it permits a pointer loop to use an
integer induction variable for the offset.  My tests still pass, as
does bootstrap/testsuite on x86_64-pc-linux-gnu.

Does this look OK for mainline?

Ian


gcc/ChangeLog:

2016-01-22  Ian Lance Taylor  

* common.opt (fkeep-gc-roots-live): New option.
* tree-ssa-loop-ivopts.c (add_candidate_1): If
-fkeep-gc-roots-live, skip pointers.
(add_iv_candidate_for_biv): Handle add_candidate_1 returning
NULL.
* doc/invoke.texi (Optimize Options): Document
-fkeep-gc-roots-live.

gcc/testsuite/ChangeLog:

2016-01-22  Ian Lance Taylor  

* gcc.dg/tree-ssa/ivopt_5.c: New test.


Re: [PATCH] Avoid unnecessary creation of VEC_COND_EXPR in the vectorizer

2016-01-22 Thread Richard Biener
On January 22, 2016 11:09:06 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>I've noticed we create a VEC_COND_EXPR tree just to grab the arguments
>from
>it to construct a ternary gimple assign.
>
>The following patch fixes that by creating the ternary gimple assign
>directly.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK.

Thanks,
Richard.

>2016-01-22  Jakub Jelinek  
>
>   * tree-vect-stmts.c (vectorizable_condition): Build a VEC_COND_EXPR
>   directly instead of building a temporary tree.
>
>--- gcc/tree-vect-stmts.c.jj   2016-01-20 15:39:08.0 +0100
>+++ gcc/tree-vect-stmts.c  2016-01-22 10:25:59.74625 +0100
>@@ -7478,7 +7478,7 @@ vectorizable_condition (gimple *stmt, gi
>   tree comp_vectype = NULL_TREE;
>   tree vec_cond_lhs = NULL_TREE, vec_cond_rhs = NULL_TREE;
>   tree vec_then_clause = NULL_TREE, vec_else_clause = NULL_TREE;
>-  tree vec_compare, vec_cond_expr;
>+  tree vec_compare;
>   tree new_temp;
>   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>   enum vect_def_type dt, dts[4];
>@@ -7691,12 +7691,10 @@ vectorizable_condition (gimple *stmt, gi
> vec_compare = build2 (TREE_CODE (cond_expr), vec_cmp_type,
>   vec_cond_lhs, vec_cond_rhs);
>   }
>-  vec_cond_expr = build3 (VEC_COND_EXPR, vectype,
>-   vec_compare, vec_then_clause, vec_else_clause);
>-
>-  new_stmt = gimple_build_assign (vec_dest, vec_cond_expr);
>-  new_temp = make_ssa_name (vec_dest, new_stmt);
>-  gimple_assign_set_lhs (new_stmt, new_temp);
>+  new_temp = make_ssa_name (vec_dest);
>+  new_stmt = gimple_build_assign (new_temp, VEC_COND_EXPR,
>+vec_compare, vec_then_clause,
>+vec_else_clause);
>   vect_finish_stmt_generation (stmt, new_stmt, gsi);
>   if (slp_node)
> SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
>
>   Jakub




Re: Speedup configure and build with system.h

2016-01-22 Thread Richard Biener
On January 22, 2016 11:15:38 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Fri, Jan 22, 2016 at 09:23:48PM +0100, Jakub Jelinek wrote:
>> On Fri, Jan 22, 2016 at 12:09:43PM -0800, H.J. Lu wrote:
>> > > * system.h (string, algorithm): Include only
>conditionally.
>> > > (new): Include always under C++.
>> > > * bb-reorder.c (toplevel): Define USES_ALGORITHM.
>> > > * final.c (toplevel): Ditto.
>> > > * ipa-chkp.c (toplevel): Define USES_STRING.
>> > > * genconditions.c (write_header): Make gencondmd.c define
>> > > USES_STRING.
>> > > * mem-stats.h (mem_usage::print_dash_line): Don't use
>std::string.
>> > >
>> > 
>> > This may have caused:
>> > 
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69434
>> 
>> Guess we need:
>> 
>> 2016-01-22  Jakub Jelinek  
>> 
>>  PR bootstrap/69434
>>  * genrecog.c: Define INCLUDE_ALGORITHM before including system.h,
>>  remove <algorithm> include.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

>> --- gcc/genrecog.c.jj2016-01-04 18:50:33.207491883 +0100
>> +++ gcc/genrecog.c   2016-01-22 21:21:42.852362294 +0100
>> @@ -105,6 +105,7 @@
>> 5. Write out C++ code for each function.  */
>>  
>>  #include "bconfig.h"
>> +#define INCLUDE_ALGORITHM
>>  #include "system.h"
>>  #include "coretypes.h"
>>  #include "tm.h"
>> @@ -112,7 +113,6 @@
>>  #include "errors.h"
>>  #include "read-md.h"
>>  #include "gensupport.h"
>> -#include <algorithm>
>>  
>>  #undef GENERATOR_FILE
>>  enum true_rtx_doe {
>
>   Jakub




Re: [PATCH] libitm: Fix HTM fastpath.

2016-01-22 Thread Torvald Riegel
On Fri, 2016-01-08 at 12:07 +0100, Torvald Riegel wrote:
> This patch fixes a thinko in the HTM fastpath implementation.  In a
> nutshell, we also need to monitor the HTM fastpath control (ie,
> htm_fastpath) variable from within a HW transaction on the HTM fastpath,
> so that such a HW transaciton will only execute if the HTM hastpath is
> still enabled.
> 
> We move htm_fastpath into the serial lock so that a HW transaction only
> needs one cacheline of HTM capacity to monitor both htm_fastpath and
> check that no non-HW-transaction is currently running.
> 
> Tested on x86_64-linux.
> 
> 2016-01-08  Torvald Riegel  
> 
>   * beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
>   boundary.
>   (htm_fastpath): Remove.
>   (gtm_thread::begin_transaction): Fix HTM fastpath.
>   (_ITM_commitTransaction): Adapt.
>   (_ITM_commitTransactionEH): Adapt.
>   * libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member
>   and accessors.
>   * libitm/config/posix/rwlock.h (gtm_rwlock): Likewise.
>   * libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt.
>   * libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath.
>   * libitm/libitm_i.h (htm_fastpath): Remove declaration.
>   * libitm/method-serial.cc (htm_mg): Adapt.
>   (gtm_thread::serialirr_mode): Adapt.
>   * libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt.

I have committed the attached patch (a minor rebase compared to the
prior one) after offline approval by Jakub Jelinek.  He tested on PPC as
well, and the patch fixed the problem we saw there.
commit 51a5f38f0228ed7e4772bf1d0439f93ab4ffcf23
Author: torvald 
Date:   Fri Jan 22 16:13:06 2016 +

libitm: Fix HTM fastpath.

	* beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
	boundary.
	(htm_fastpath): Remove.
	(gtm_thread::begin_transaction): Fix HTM fastpath.
	(_ITM_commitTransaction): Adapt.
	(_ITM_commitTransactionEH): Adapt.
	* libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member
	and accessors.
	* libitm/config/posix/rwlock.h (gtm_rwlock): Likewise.
	* libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt.
	* libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath.
	* libitm/libitm_i.h (htm_fastpath): Remove declaration.
	* libitm/method-serial.cc (htm_mg): Adapt.
	(gtm_thread::serialirr_mode): Adapt.
	* libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232735 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 00d28f4..1a258ad 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -32,7 +32,11 @@ using namespace GTM;
 extern __thread gtm_thread_tls _gtm_thr_tls;
 #endif
 
-gtm_rwlock GTM::gtm_thread::serial_lock;
+// Put this at the start of a cacheline so that serial_lock's writers and
+// htm_fastpath fields are on the same cacheline, so that HW transactions
+// only have to pay one cacheline capacity to monitor both.
+gtm_rwlock GTM::gtm_thread::serial_lock
+  __attribute__((aligned(HW_CACHELINE_SIZE)));
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
 
@@ -51,9 +55,6 @@ static pthread_mutex_t global_tid_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_key_t thr_release_key;
 static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
 
-// See gtm_thread::begin_transaction.
-uint32_t GTM::htm_fastpath = 0;
-
 /* Allocate a transaction structure.  */
 void *
 GTM::gtm_thread::operator new (size_t s)
@@ -173,9 +174,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
   // lock's writer flag and thus abort if another thread is or becomes a
   // serial transaction.  Therefore, if the fastpath is enabled, then a
   // transaction is not executing as a HW transaction iff the serial lock is
-  // write-locked.  This allows us to use htm_fastpath and the serial lock's
-  // writer flag to reliable determine whether the current thread runs a HW
-  // transaction, and thus we do not need to maintain this information in
+  // write-locked.  Also, HW transactions monitor the fastpath control
+  // variable, so that they will only execute if dispatch_htm is still the
+  // current method group.  This allows us to use htm_fastpath and the serial
+  // lock's writers flag to reliable determine whether the current thread runs
+  // a HW transaction, and thus we do not need to maintain this information in
   // per-thread state.
   // If an uninstrumented code path is not available, we can still run
   // instrumented code from a HW transaction because the HTM fastpath kicks
@@ -187,9 +190,14 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
   // for any internal changes (e.g., they never 

[PATCH] Fix aarch64 bootstrap (pr69416)

2016-01-22 Thread Richard Henderson
The bare CONST_INT inside the CCmode IF_THEN_ELSE is causing combine to make 
incorrect simplifications.  At this stage it feels safer to wrap the CONST_INT 
inside of an UNSPEC than make more generic changes to combine.


But we should definitely investigate combine's CCmode issues for gcc7.


Ok?


r~
* config/aarch64/aarch64.md (UNSPEC_NZCV): New.
(ccmp, fccmp, fccmpe): Use it.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2f543aa..71fc514 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -129,6 +129,7 @@
 UNSPEC_RSQRT
 UNSPEC_RSQRTE
 UNSPEC_RSQRTS
+UNSPEC_NZCV
 ])
 
 (define_c_enum "unspecv" [
@@ -280,7 +281,7 @@
  (compare:CC
(match_operand:GPI 2 "register_operand" "r,r,r")
(match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
- (match_operand 5 "immediate_operand")))]
+ (unspec:CC [(match_operand 5 "immediate_operand")] UNSPEC_NZCV)))]
   ""
   "@
ccmp\\t%2, %3, %k5, %m4
@@ -298,7 +299,7 @@
  (compare:CCFP
(match_operand:GPF 2 "register_operand" "w")
(match_operand:GPF 3 "register_operand" "w"))
- (match_operand 5 "immediate_operand")))]
+ (unspec:CCFP [(match_operand 5 "immediate_operand")] UNSPEC_NZCV)))]
   "TARGET_FLOAT"
   "fccmp\\t%2, %3, %k5, %m4"
   [(set_attr "type" "fcmp")]
@@ -313,7 +314,7 @@
   (compare:CCFPE
(match_operand:GPF 2 "register_operand" "w")
(match_operand:GPF 3 "register_operand" "w"))
- (match_operand 5 "immediate_operand")))]
+ (unspec:CCFPE [(match_operand 5 "immediate_operand")] UNSPEC_NZCV)))]
   "TARGET_FLOAT"
   "fccmpe\\t%2, %3, %k5, %m4"
   [(set_attr "type" "fcmp")]


Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)

2016-01-22 Thread Alan Lawrence

On 20/01/16 21:10, Christophe Lyon wrote:

On 19 January 2016 at 15:51, Alan Lawrence  wrote:

On 19/01/16 11:15, Christophe Lyon wrote:


For neon_vdupn, I chose to implement neon_vdup_nv4hf and
neon_vdup_nv8hf instead of updating the VX iterator because I thought
it was not desirable to impact neon_vrev32.



Well, the same instruction will suffice for vrev32'ing vectors of HF just
as
well as vectors of HI, so I think I'd argue that's harmless enough. To
gain the
benefit, we'd need to update arm_evpc_neon_vrev with a few new cases,
though.


Since this is more intrusive, I'd rather leave that part for later. OK?



Sure.


+#ifdef __ARM_BIG_ENDIAN
+  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
+ right value for vectors with 8 lanes.  */
+#define __arm_lane(__vec, __idx) (__idx ^ 3)
+#else
+#define __arm_lane(__vec, __idx) __idx
+#endif
+



Looks right, but sounds... my concern here is that I'm hoping at some
point we
will move the *other* vget/set_lane intrinsics to use GCC vector
extensions
too. At which time (unlike __aarch64_lane which can be used everywhere)
this
will be the wrong formula. Can we name (and/or comment) it to avoid
misleading
anyone? The key characteristic seems to be that it is for vectors of
16-bit
elements only.


I'm not to follow, here. Looking at the patterns for
neon_vget_lane_*internal in neon.md,
I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts".

Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
that would be similar to the aarch64 ones (by computing the number of
lanes of the input vector), but the "q" one would use half the total
number of lanes instead?



That works for me! Sthg like:

#define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
#define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
(NUM_LANES(__vec)/2 - __idx)
//or similarly
#define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))

Alternatively I'd been thinking

#define __arm_lane_32xN(__idx) __idx ^ 1
#define __arm_lane_16xN(__idx) __idx ^ 3
#define __arm_lane_8xN(__idx) __idx ^ 7

Bear in mind PR64893 that we had on AArch64 :-(



Here is a new version, based on the comments above.
I've also removed the addition of arm_fp_ok effective target since I
added that in my other testsuite patch.

OK now?


Yes. I realize my worry about PR64893 doesn't apply here since we pass constant 
lane numbers / vector bounds into __builtin_arm_lane_check. Thanks!


--Alan



Thanks,

Christophe


Cheers, Alan




Re: [PATCH] Fix aarch64 bootstrap (pr69416)

2016-01-22 Thread Richard Earnshaw (lists)
On 22/01/16 17:07, Richard Henderson wrote:
> The bare CONST_INT inside the CCmode IF_THEN_ELSE is causing combine to
> make incorrect simplifications.  At this stage it feels safer to wrap
> the CONST_INT inside of an UNSPEC than make more generic changes to
> combine.
> 
> But we should definitely investigate combine's CCmode issues for gcc7.
> 

Agreed.

> 
> Ok?
> 
OK.

R.

> 
> r~
> 
> d-69416
> 
> 
>   * config/aarch64/aarch64.md (UNSPEC_NZCV): New.
>   (ccmp, fccmp, fccmpe): Use it.
>   
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2f543aa..71fc514 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -129,6 +129,7 @@
>  UNSPEC_RSQRT
>  UNSPEC_RSQRTE
>  UNSPEC_RSQRTS
> +UNSPEC_NZCV
>  ])
>  
>  (define_c_enum "unspecv" [
> @@ -280,7 +281,7 @@
> (compare:CC
>   (match_operand:GPI 2 "register_operand" "r,r,r")
>   (match_operand:GPI 3 "aarch64_ccmp_operand" "r,Uss,Usn"))
> -   (match_operand 5 "immediate_operand")))]
> +   (unspec:CC [(match_operand 5 "immediate_operand")] UNSPEC_NZCV)))]
>""
>"@
> ccmp\\t%2, %3, %k5, %m4
> @@ -298,7 +299,7 @@
> (compare:CCFP
>   (match_operand:GPF 2 "register_operand" "w")
>   (match_operand:GPF 3 "register_operand" "w"))
> -   (match_operand 5 "immediate_operand")))]
> +   (unspec:CCFP [(match_operand 5 "immediate_operand")] UNSPEC_NZCV)))]
>"TARGET_FLOAT"
>"fccmp\\t%2, %3, %k5, %m4"
>[(set_attr "type" "fcmp")]
> @@ -313,7 +314,7 @@
>  (compare:CCFPE
>   (match_operand:GPF 2 "register_operand" "w")
>   (match_operand:GPF 3 "register_operand" "w"))
> -   (match_operand 5 "immediate_operand")))]
> +   (unspec:CCFPE [(match_operand 5 "immediate_operand")] UNSPEC_NZCV)))]
>"TARGET_FLOAT"
>"fccmpe\\t%2, %3, %k5, %m4"
>[(set_attr "type" "fcmp")]
> 



Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute

2016-01-22 Thread Alan Lawrence

On 21/01/16 17:23, Alan Lawrence wrote:
> On 18/01/16 17:10, Eric Botcazou wrote:
>>
>> Could you post the list of files that differ?  How do they differ exactly?
>
> Hmmm. Well, I definitely had this failing to bootstrap once. I repeated that, 
> to
> try to identify exactly what the differences were... and it succeeded even 
> with
> my pre-AAPCS64-update host compiler. So, this is probably a false alarm; I'm
> bootstrapping again, after a rebase, to make sure...
>
> --Alan

Ok, rebased onto a more recent build, and bootstrapping with Ada posed no
problems. Sorry for the noise.

However, I had to drop the assert that TYPE_FIELDS was non-null because of some
C++ testcases.

Is this version OK for trunk?

--Alan

gcc/ChangeLog:

* gcc/config/aarch64/aarch64.c (aarch64_function_arg_alignment):
Rewrite, looking one level down for records and arrays.
---
 gcc/config/aarch64/aarch64.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9142ac0..b084f83 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1925,22 +1925,23 @@ aarch64_vfp_is_call_candidate (cumulative_args_t 
pcum_v, machine_mode mode,
 static unsigned int
 aarch64_function_arg_alignment (machine_mode mode, const_tree type)
 {
-  unsigned int alignment;
+  if (!type)
+return GET_MODE_ALIGNMENT (mode);
+  if (integer_zerop (TYPE_SIZE (type)))
+return 0;
 
-  if (type)
-{
-  if (!integer_zerop (TYPE_SIZE (type)))
-   {
- if (TYPE_MODE (type) == mode)
-   alignment = TYPE_ALIGN (type);
- else
-   alignment = GET_MODE_ALIGNMENT (mode);
-   }
-  else
-   alignment = 0;
-}
-  else
-alignment = GET_MODE_ALIGNMENT (mode);
+  gcc_assert (TYPE_MODE (type) == mode);
+
+  if (!AGGREGATE_TYPE_P (type))
+return TYPE_ALIGN (TYPE_MAIN_VARIANT (type));
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+return TYPE_ALIGN (TREE_TYPE (type));
+
+  unsigned int alignment = 0;
+
+  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+alignment = std::max (alignment, DECL_ALIGN (field));
 
   return alignment;
 }
-- 
1.9.1



Re: C++ PATCH for c++/69379 (ICE with PTRMEM_CST wrapped in NOP_EXPR)

2016-01-22 Thread Marek Polacek
On Thu, Jan 21, 2016 at 01:49:14PM -0500, Jason Merrill wrote:
> On 01/21/2016 01:25 PM, Marek Polacek wrote:
> >The problem in this PR is that we have a PTRMEM_CST wrapped in NOP_EXPR
> >and fold_convert can't digest that.
> 
> Why didn't we fold away the NOP_EXPR before calling fold_convert?  I guess
> we shouldn't call fold_convert on an un-folded operand.

So we start with fargs[j] = maybe_constant_value (argarray[j]); in
build_over_call, where argarray[j] is
(const struct 
{
  void Dict:: (struct Dict *, int) * __pfn;
  long int __delta;
} &) _EXPR >>
so we go to the
3607 case NOP_EXPR:
case.  Here cxx_eval_constant_expression evaluates the inner ptrmem_cst,
then there's
3619 if (TREE_CODE (op) == PTRMEM_CST
3620 && !TYPE_PTRMEM_P (type))
3621   op = cplus_expand_constant (op);
but that doesn't trigger, because type is TYPE_PTRMEM_P.
Then we fold () the whole expression but that doesn't change the expression
(and I don't think it should do anything with C++-specific PTRMEM_CST) so we're
stuck with NOP_EXPR around PTRMEM_CST.

So maybe cxx_eval_constant_expression should handle PTRMEM_CSTs wrapped in
NOP_EXPR specially, but I don't know how.

Marek


Re: Suspicious code in fold-const.c

2016-01-22 Thread Bernd Schmidt

On 01/22/2016 02:37 PM, Andrew MacLeod wrote:


 /* If the initializer is non-void, then it's a normal expression
that will be assigned to the slot.  */
(*)  if (!VOID_TYPE_P (t))
  (*) return RECURSE (t);




I suspect this should also be
if (!VOID_TYPE_P (TREE_TYPE (t)))


The terminology in the documentation is somewhat unfortunate:

/* For TARGET_EXPR, operand 0 is the target of an initialization,
   operand 1 is the initializer for the target, which may be void
 if simply expanding it initializes the target.
   operand 2 is the cleanup for this node, if any.
   operand 3 is the saved initializer after this node has been
   expanded once; this is so we can re-expand the tree later.  */
DEFTREECODE (TARGET_EXPR, "target_expr", tcc_expression, 4)

I suspect that should read "which may have void type". Code in cp/tree.c 
also looks at the type of the initializer to see if it is void, so I 
think you are right with your suspicion.


So, I think your proposed change is OK (modulo formatting), but it may 
cause problems since it'll enable code that was never tested. Maybe best 
to do it for gcc-7. Ideally you'd also make a change cleaning up the 
wording in tree.def.



Bernd


Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

2016-01-22 Thread Jakub Jelinek
On Fri, Jan 22, 2016 at 02:18:38PM +0100, Bernd Schmidt wrote:
> On 01/22/2016 09:36 AM, Jakub Jelinek wrote:
> >
> >I think it is a bad idea to go against what the user wrote.  Warning that
> >some code might not be efficient?  Perhaps (if properly guarded with some
> >warning option one can turn off, either on a per-source file or using
> >pragmas even more fine grained).  But by default not offloading?  That is
> >just wrong.
> 
> I'm leaning more towards Thomas' side of the argument. The kernels construct
> is a hint, a "do your best" request to the compiler. If the compiler sees
> that it can't parallelize a loop inside a kernels region, it's probably best
> not to offload it.

What about #pragma oacc parallel?  That would never do that?

Jakub


Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

2016-01-22 Thread Bernd Schmidt

On 01/22/2016 02:25 PM, Jakub Jelinek wrote:


What about #pragma oacc parallel?  That would never do that?


It shouldn't, no (IMO).


Bernd



Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-01-22 Thread Wilco Dijkstra
On 12/16/2015 03:30 PM, Evandro Menezes wrote:
>
>    On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:
>
>    On 20 October 2015 at 00:40, Evandro Menezes  
>wrote:
>
>    In the existing targets, it seems that it's always faster to zero 
>up a DF
>
>    register with "movi %d0, #0" instead of "fmov %d0, xzr".
>
>    This patch modifies the respective pattern.
>
>
>    Hi Evandro,
>
>    This patch changes the generic, micro-architecture independent instruction
>    selection. The ARM ARM (C3.5.3) makes a specific recommendation about
>    the choice of instruction in this situation and the current
>    implementation in GCC follows that recommendation.  Wilco has also
>    picked up on this issue he has the same patch internal to ARM along
>    with an ongoing discussion with ARM architecture folk regarding this
>    recommendation.  I'm reluctant to take this patch right now on the
>    basis that it runs contrary to ARM ARM recommendation pending the
>    conclusion of Wilco's discussion with ARM architecture folk.
>
>
>    Have you had a chance to discuss this internally further?

Yes, it was decided to remove the recommendation from future ARM ARM's.

Several review comments on your patch 
(https://patchwork.ozlabs.org/patch/532736):

* This should be added to movhf, movsf and movdf - movtf already does this.
* It is important to set the "fp" and "simd" attributes so that the movi 
variant can
   only be selected if it is available.

Cheers,
Wilco



Re: [PATCH][ARM] Fix PR target/69245 Rewrite arm_set_current_function

2016-01-22 Thread Kyrill Tkachov

Hi Christian,

On 22/01/16 14:07, Christian Bruel wrote:

Hi Kyrill,

On 01/21/2016 01:22 PM, Kyrill Tkachov wrote:

Hi Christian,

On 21/01/16 10:36, Christian Bruel wrote:

The current arm_set_current_function was both awkward and buggy. For instance, 
it used a partially set TARGET_OPTION from pragma_parse, while neither 
restore_target_globals nor arm_option_params_internal was reset. Another 
issue is that in some paths, target_reinit was not called due to an old cached 
target_option_current_node value, for instance with

foo{}
#pragma GCC target ...

foo was called with global_options set from the old GCC target (which was 
wrong) and correct rtl values.

This is a reimplementation of the function. Hoping to be easier to read (and 
maintain). Solves the current issues seen so far.

regtested for arm-linux-gnueabi -mfpu=vfp, -mfpu=neon,-mfpu=neon-fp-armv8


Thanks for the patch, I'll try it out.
In the meantime there's a couple of style and typo nits...

+  /* Make sure that target_reinit is called for next function, since
+ TREE_TARGET_OPTION might change with the #pragma even if there are
+ no target attribute attached to the function.  */

s/attribute/attributes

-  arm_previous_fndecl = fndecl;
+  /* if no attribute, use the mode set by the current pragma target.  */
+  if (! new_tree)
+new_tree = target_option_current_node;
+

s/if/If/

+  /* now target_reinit can save the state for later. */
+  TREE_TARGET_GLOBALS (new_tree)
+= save_target_globals_default_opts ();

s/now/Now/



While playing on my side, I realized that we could simplify the patch further 
by removing the need to set and use target_option_current_node, since this is 
redundant with what handle_pragma_push/pop_options does.
Also, since the functions inside a pragma GCC target region will have 
DECL_FUNCTION_SPECIFIC_TARGET set already, we don't seem to need a special case 
for those.

With this V2, arm_set_current_function is becoming more minimalist and still 
fixes the current issues. Could you test this version instead ?



Thanks, I'll check this out instead.
I've played a bit with your previous version and the effect on the testcases 
looked ok, but I have a couple of
comments on the testcase in the meantime

Index: gcc/testsuite/gcc.target/arm/pr69245.c
===
--- gcc/testsuite/gcc.target/arm/pr69245.c  (revision 0)
+++ gcc/testsuite/gcc.target/arm/pr69245.c  (working copy)
@@ -0,0 +1,24 @@
+/* PR target/69245 */
+/* Test that pop_options restores the vfp fpu mode.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-add-options arm_fp } */
+
+#pragma GCC target "fpu=vfp"
+
+#pragma GCC push_options
+#pragma GCC target "fpu=neon"
+int a, c, d;
+float b;
+static int fn1 ()
+{
+  return 0;
+}
+#pragma GCC pop_options
+
+void fn2 ()
+{
+  d = b * c + a;
+}
+
+/* { dg-final { scan-assembler-times "\.fpu vfp" 1 } } */


PR 69245 is an ICE, whereas your testcase doesn't ICE without the patch; it just 
fails the
scan-assembler check. I'd like to have the testcase trigger the ICE without 
your patch.
For that we need -O2 in dg-options.
Also, the "fpu=vfp" pragma you put in the beginning doesn't allow the ICE to 
trigger, presumably
because it triggers a different path through the pragma option popping code.
So removing that pragma and instead changing the dg-add-options from arm_fp to 
arm_vfp3 (which is floating-point without the vfma instruction, so the ICE can 
trigger) does the trick for me.
Also the "fpu=neon" pragma should also be changed to be "fpu=neon-vfpv4" 
because that setting allows
the vfma instruction which is being wrongly considered in fn2().
I suppose you'll then want to change the scan-assembler directive to look for 
\.fpu vfp3.

Thanks,
Kyrill



Re: Suspicious code in fold-const.c

2016-01-22 Thread Andrew MacLeod

On 01/22/2016 06:03 AM, Richard Biener wrote:

On Fri, Jan 22, 2016 at 12:06 AM, Andrew MacLeod  wrote:

I was trying the ttype prototype for static type checking on fold-const.c to
see how long it would take me to convert such a large file, and it choked on
this snippet of code in fold_unary_loc, around lines 7690-7711:

suspect code tagged with (*)

  if ((CONVERT_EXPR_CODE_P (code)
|| code == NON_LVALUE_EXPR)
   && TREE_CODE (tem) == COND_EXPR
   && TREE_CODE (TREE_OPERAND (tem, 1)) == code
   && TREE_CODE (TREE_OPERAND (tem, 2)) == code
   (*)  && ! VOID_TYPE_P (TREE_OPERAND (tem, 1))
   (*)  && ! VOID_TYPE_P (TREE_OPERAND (tem, 2))
   && (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (tem, 1), 0))
   == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (tem, 2), 0)))
   && (! (INTEGRAL_TYPE_P (TREE_TYPE (tem))
  && (INTEGRAL_TYPE_P
  (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (tem, 1),
0
  && TYPE_PRECISION (TREE_TYPE (tem)) <= BITS_PER_WORD)
   || flag_syntax_only))
 tem = build1_loc (loc, code, type,
   build3 (COND_EXPR,
   TREE_TYPE (TREE_OPERAND
  (TREE_OPERAND (tem, 1),
0)),
   TREE_OPERAND (tem, 0),
   TREE_OPERAND (TREE_OPERAND (tem, 1),
0),
   TREE_OPERAND (TREE_OPERAND (tem, 2),
 0)));

and with:
#define VOID_TYPE_P(NODE)  (TREE_CODE (NODE) == VOID_TYPE)

I don't think this is what was intended. it would expand into:

   && TREE_CODE (TREE_OPERAND (tem, 1)) == code
   && TREE_CODE (TREE_OPERAND (tem, 2)) == code
&& ! (TREE_CODE (TREE_OPERAND (tem, 1)) == VOID_TYPE)
&& ! (TREE_CODE (TREE_OPERAND (tem, 2)) == VOID_TYPE)

the latter two would be obviously true if the first 2 were true.

My guess is this is probably suppose to be
&& ! VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (tem, 1)))
  && ! VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (tem, 2)))

but I'm not sure.   Any guesses whats intended here?

Not sure, it might be to detect some of the x ? : throw () constructs
but not sure how those would survive the previous == code check.
Maybe a ? (void) ... : (void) ... is supposed to be detected.

The type check below would catch that as well
(huh?  a flag_syntax_only check in fold-const.c!?)

I'd say change to ! VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (tem, 1))
to match what the VOID_TYPE_P check does above this block.


Thats what im going with for now anyway :-)

A second one has shown up which is less obvious.   Are we in the habit 
of using  void_type_node as expressions? I would have guessed not.   I'm 
going to guess this is another case of forgetting TREE_TYPE(x)


tree_invalid_nonnegative_warnv_p has the following snippet:

  enum tree_code code = TREE_CODE (t);
  if (TYPE_UNSIGNED (TREE_TYPE (t)))
return true;

  switch (code)
{
case TARGET_EXPR:
  {
tree temp = TARGET_EXPR_SLOT (t);
t = TARGET_EXPR_INITIAL (t);

/* If the initializer is non-void, then it's a normal expression
   that will be assigned to the slot.  */
(*)  if (!VOID_TYPE_P (t))
 (*) return RECURSE (t);

/* Otherwise, the initializer sets the slot in some way. One common
   way is an assignment statement at the end of the 
initializer.  */

while (1)
  {
if (TREE_CODE (t) == BIND_EXPR)
  t = expr_last (BIND_EXPR_BODY (t));
else if (TREE_CODE (t) == TRY_FINALLY_EXPR


The RECURSE macro calls tree_expr_nonnegative_warnv_p(), which performs 
a switch() on the class of TREE_CODE, and handles cases: tcc_binary:  
tcc_comparison: tcc_unary: tcc_constant: tcc_declaration: 
tcc_reference:   so nothing for tcc_type.


It then performs a switch on the TREE_CODE and handles a bunch of 
EXPRs, then falls into the default case, which calls back into 
tree_invalid_nonnegative_warnv_p, where it ends up doing nothing for a 
type node.


Which means this ends up also executing code to no purpose.

I suspect this should also be
if (!VOID_TYPE_P (TREE_TYPE (t)))

What do you think?

Andrew





Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"

2016-01-22 Thread Bernd Schmidt

On 01/22/2016 09:36 AM, Jakub Jelinek wrote:


I think it is a bad idea to go against what the user wrote.  Warning that
some code might not be efficient?  Perhaps (if properly guarded with some
warning option one can turn off, either on a per-source file or using
pragmas even more fine grained).  But by default not offloading?  That is
just wrong.


I'm leaning more towards Thomas' side of the argument. The kernels 
construct is a hint, a "do your best" request to the compiler. If the 
compiler sees that it can't parallelize a loop inside a kernels region, 
it's probably best not to offload it.



Bernd



Re: Speedup configure and build with system.h

2016-01-22 Thread Michael Matz
Hi,

On Fri, 22 Jan 2016, Oleg Endo wrote:

> <string> and <algorithm> have been put into system.h because there have 
> been problems with malloc poisoning and C++ stdlib implementation other 
> than libstdc++, which sometimes pull other headers which then cause 
> trouble.  The fix for this set of errors was to include some of the 
> stdlib headers in system.h before anything else.

Richard meant to remove use of std::string in the compiler, at which point 
it's not necessary to include <string> anywhere, in system.h or wherever.

That's a separate discussion, though (I would agree with it).


Ciao,
Michael.


[PATCH][Testsuite] Fix scan-tree-dump failures with vect_multiple_sizes

2016-01-22 Thread Alan Lawrence
Since r230292, these tests in gcc.dg/vect have been failing on ARM, AArch64, 
and x86_64 with -march=haswell (among others - when prefer_avx128 is true):

vect-outer-1-big-array.c scan-tree-dump-times vect "grouped access in outer 
loop" 2
vect-outer-1.c scan-tree-dump-times vect "grouped access in outer loop" 2
vect-outer-1a-big-array.c scan-tree-dump-times vect "grouped access in outer 
loop" 2
vect-outer-1a.c scan-tree-dump-times vect "grouped access in outer loop" 2
vect-outer-1b-big-array.c scan-tree-dump-times vect "grouped access in outer 
loop" 2
vect-outer-1b.c scan-tree-dump-times vect "grouped access in outer loop" 2
vect-outer-2b.c scan-tree-dump-times vect "grouped access in outer loop" 2
vect-outer-3b.c scan-tree-dump-times vect "grouped access in outer loop" 4
vect-reduc-dot-s8b.c scan-tree-dump-times vect "vect_recog_widen_mult_pattern: 
detected" 2
(plus all the -flto -ffat-lto-objects equivalents).

This is because that commit changed vect_analyze_loop{,_2} to bail out early
and avoid retrying with a different vector size on finding a fatal error;
all the above tests are conditioned on { target vect_multiple_sizes }.

Hence, drop the vect_multiple_sizes version of the scan-tree-dump, as we now
expect those messages to show up once everywhere.

The optional extra would be to add a message that vect_analyze_loop was failing
with *fatal* error, and scan for that, but that doesn't really seem warranted.

Tested vect.exp on aarch64-none-elf, arm-none-eabi, and x86_64 linux with 
-march=haswell.

OK for trunk?

Cheers, Alan

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-outer-1-big-array.c: Drop vect_multiple_sizes;
use same scan-tree-dump-times on all platforms.
* gcc.dg/vect/vect-outer-1.c: Likewise.
* gcc.dg/vect/vect-outer-1a-big-array.c: Likewise.
* gcc.dg/vect/vect-outer-1a.c: Likewise.
* gcc.dg/vect/vect-outer-1b-big-array.c: Likewise.
* gcc.dg/vect/vect-outer-1b.c: Likewise.
* gcc.dg/vect/vect-outer-2b.c: Likewise.
* gcc.dg/vect/vect-outer-3b.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-outer-1-big-array.c  | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-1.c| 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-1a-big-array.c | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-1a.c   | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-1b-big-array.c | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-1b.c   | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-2b.c   | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-outer-3b.c   | 3 +--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  | 3 +--
 9 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-1-big-array.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-1-big-array.c
index 6c61b09..63918ad 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-1-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-1-big-array.c
@@ -22,5 +22,4 @@ foo (){
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
*-*-* } } } */
-/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" { 
target { ! vect_multiple_sizes } } } } */
-/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 2 "vect" { 
target vect_multiple_sizes } } } */
+/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" } 
} */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-1.c
index 5fdaa83..b1bcbc2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-1.c
@@ -22,5 +22,4 @@ foo (){
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
*-*-* } } } */
-/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" { 
target { ! vect_multiple_sizes } } } } */
-/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 2 "vect" { 
target vect_multiple_sizes } } } */
+/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" } 
} */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-1a-big-array.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-1a-big-array.c
index 68b25f9..98dfcfb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-outer-1a-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-1a-big-array.c
@@ -20,5 +20,4 @@ foo (){
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail 
*-*-* } } } */
-/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" { 
target { ! vect_multiple_sizes } } } } */
-/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 2 "vect" { 
target vect_multiple_sizes } } } */
+/* { dg-final { scan-tree-dump-times "grouped access in outer loop" 1 "vect" } 
} */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-1a.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-1a.c
index 85649af..0200fb4 100644
--- 

Re: [PATCH PR68542]

2016-01-22 Thread Yuri Rumyantsev
Richard,

I fixed all remarks pointed by you in vectorizer part of patch. Could
you take a look on modified patch.

Uros,

Could you please review i386 part of patch related to support of
conditional branches with vector comparison.

Bootstrap and regression testing did not show any new failures.
Is it OK for trunk?

Thanks.
Yuri.

ChangeLog:

2016-01-22  Yuri Rumyantsev  

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Add support for conditional
brnach with vector comparison.
* config/i386/sse.md (define_expand "cbranch4): Add define-expand
for vector comparion with eq/ne only.
(optimize_mask_stores): New function.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
vectorized loops having masked stores after vec_info destroy.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-mask-store-move-1.c: New test.
* testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

2016-01-20 15:24 GMT+03:00 Richard Biener :
> On Mon, Jan 18, 2016 at 3:50 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> Here is the second part of patch which really performs mask stores and
>> all statements related to it to new basic block guarded by test on
>> zero mask. New test is also added.
>>
>> Is it OK for trunk?
>
> +  /* Pick up all masked stores in loop if any.  */
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +{
> +  stmt = gsi_stmt (gsi);
>
> you fail to iterate over all BBs of the loop here.  Please follow
> other uses in the
> vectorizer.
>
> +  while (!worklist.is_empty ())
> +{
> +  gimple *last, *last_store;
> +  edge e, efalse;
> +  tree mask;
> +  basic_block store_bb, join_bb;
> +  gimple_stmt_iterator gsi_to;
> +  /* tree arg3; */
>
> remove
>
> +  tree vdef, new_vdef;
> +  gphi *phi;
> +  bool first_dump;
> +  tree vectype;
> +  tree zero;
> +
> +  last = worklist.pop ();
> +  mask = gimple_call_arg (last, 2);
> +  /* Create new bb.  */
>
> bb should be initialized from gimple_bb (last), not loop->header
>
> +  e = split_block (bb, last);
>
> +   gsi_from = gsi_for_stmt (stmt1);
> +   gsi_to = gsi_start_bb (store_bb);
> +   gsi_move_before (&gsi_from, &gsi_to);
> +   update_stmt (stmt1);
>
> I think the update_stmt is redundant and you should be able to
> keep two gsis throughout the loop, from and to, no?
>
> +   /* Put other masked stores with the same mask to STORE_BB.  */
> +   if (worklist.is_empty ()
> +   || gimple_call_arg (worklist.last (), 2) != mask
> +   || !is_valid_sink (worklist.last (), last_store))
>
> as I understand the code the last check is redundant with the invariant
> you track if you verify the stmt you breaked from the inner loop is
> actually equal to worklist.last () and you add a flag to track whether
> you did visit a load (vuse) in the sinking loop you didn't end up sinking.
>
> + /* Issue different messages depending on FIRST_DUMP.  */
> + if (first_dump)
> +   {
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "Move MASK_STORE to new bb#%d\n",
> +  store_bb->index);
> + first_dump = false;
> +   }
> + else
> +   dump_printf_loc (MSG_NOTE, vect_location,
> +"Move MASK_STORE to created bb\n");
>
> just add a separate dump when you create the BB, "Created new bb#%d for ..."
> to avoid this.
>
> Note that I can't comment on the x86 backend part so that will need to
> be reviewed by somebody
> else.
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2016-01-18  Yuri Rumyantsev  
>>
>> PR middle-end/68542
>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>> comparison with boolean result.
>> * config/i386/sse.md (define_expand "cbranch4): Add define-expand
>> for vector comparion with eq/ne only.
>> * tree-vect-loop.c (is_valid_sink): New function.
>> (optimize_mask_stores): Likewise.
>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>> has_mask_store field of vect_info.
>> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
>> vectorized loops having masked stores.
>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>> correspondent macros.
>> (optimize_mask_stores): Add prototype.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>>
>> 2016-01-18 17:07 GMT+03:00 Richard Biener :
>>> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev 

Re: [PATCH 4/5] Don't mark targets of unconditional jumps with side effects as FALLTHRU.

2016-01-22 Thread Marcin Kościelnicki

On 22/01/16 08:44, Andreas Krebbel wrote:

On 01/22/2016 12:10 AM, Jeff Law wrote:

On 01/21/2016 03:05 AM, Andreas Krebbel wrote:

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:

When an unconditional jump with side effects targets an immediately
following label, rtl_tidy_fallthru_edge is called.  Since it has side
effects, it doesn't remove the jump, but the label is still marked
as fallthru.  This later causes a verification error.  Do nothing in this
case instead.

gcc/ChangeLog:

* cfgrtl.c (rtl_tidy_fallthru_edge): Bail for unconditional jumps
with side effects.


The change looks ok to me (although I'm not able to approve it). Could you 
please run regression
tests on x86_64 with that change?

Perhaps a short comment in the code would be good.

I think the patch is technically fine, the question is does it fix a
visible bug?  I read the series as new feature enablement so I put this
patch into my gcc7 queue.


We need the patch for the S/390 split-stack implementation which we would like 
to see in GCC 6.  I'm
aware that this isn't stage 3 material but people seem to have reasons to 
really want split stack on
S/390 asap and we would have to backport this feature anyway. Therefore I would 
prefer to have it in
the official release already. That's the only common code change we would need 
for that.

I've started a bootstrap and regression test for the patch also on Power.

Do you see a chance we can get this into GCC 6?

Bye,

-Andreas-



I've tested the patch on x86_64, no regressions.

I'm not entirely sure if the patch needs to go in for the current 
version of split-stack support.


This patch fixed a showstopper bug on g5 CPUs when the patch still 
supported them.  I haven't seen this bug with the z900 sequences (which 
are now the only ones left), but since we're still using unconditional 
jumps with side effects, I left it in just to be safe.  The testsuite 
passes on s390x -fsplit-stack both with the patch and without it.


So, I don't know.  It seems to work now, probably because no 
optimization pass has a reason to touch that jump, but it may start to 
fail if someone adds a new optimization that tries to be smart with our 
prologue.


Re: Speedup configure and build with system.h

2016-01-22 Thread Michael Matz
Hi,

On Thu, 21 Jan 2016, Richard Biener wrote:

> I'm inclined to say #define INCLUDE_ALGORITHM is a better name,

I've done this.  On a different (slower) machine than the one from the 
initial mail:

without patch, -j31 bootstrap all,ada:
real    35m2.655s
user    395m28.135s
sys     12m10.814s

with patch, -j31 bootstrap all,ada:
real    31m45.942s
user    364m17.566s
sys     11m1.173s

So, even real-time savings of 3 minutes with a -j31 build; I'll take that 
:)

> and please convert the (bogus) ISL way of achieving a similar thing.

But I've not done this, as I'm not too satisfied with the result.  See 
below; we get rid of the small USES_ISL condition around poisoning 
calloc/strdup, but pay with it for a larger include block in system.h and 
the fact that now any changes to the ISL include list result in a 
recompilation of everything as system.h, not just graphite.h is changed.  
We'd trade a small hack for a larger one for policy reasons.  Let me know 
if you think it's nevertheless better.


Ciao,
Michael.

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 0544700..dabc0b9 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -19,7 +19,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 562cee0..8108227 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index fe8a71a..170e535 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index efa39bf..a3f8ada 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -19,7 +19,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index d026d4f..cea629b 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -19,7 +19,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 92ab2f9..a46b63c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 
@@ -47,16 +47,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "domwalk.h"
 #include "tree-ssa-propagate.h"
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
 #include "graphite.h"
 
 /* Assigns to RES the value of the INTEGER_CST T.  */
diff --git a/gcc/graphite.c b/gcc/graphite.c
index 3236006..c85823d 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
The wiki page http://gcc.gnu.org/wiki/Graphite contains pointers to
the related work.  */
 
-#define USES_ISL
+#define INCLUDE_ISL
 
 #include "config.h"
 #include "system.h"
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 2f36ded..30c4d2c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -23,29 +23,6 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_GRAPHITE_POLY_H
 
 #include "sese.h"
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
-/* isl 0.15 or later.  */
-#include 
-
-#else
-/* isl 0.14 or 0.13.  */
-# define isl_stat int
-# define isl_stat_ok 0
-#endif
 
 typedef struct poly_dr *poly_dr_p;
 
diff --git a/gcc/system.h b/gcc/system.h
index 8151e0a..d5b2f85 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -669,6 +669,31 @@ extern int vsnprintf (char *, size_t, const char *, 
va_list);
 #include 
 #endif
 

Re: [PATCH] Fix the remaining PR c++/24666 blockers (arrays decay to pointers too early)

2016-01-22 Thread Patrick Palka

On Thu, 21 Jan 2016, Patrick Palka wrote:


On Thu, 21 Jan 2016, Jason Merrill wrote:


On 01/19/2016 10:30 PM, Patrick Palka wrote:

 * g++.dg/template/unify17.C: XFAIL.


Hmm, I'm not comfortable with deliberately regressing this testcase.


  template 
-void bar (void (T[5])); // { dg-error "array of 'void'" }
+void bar (void (T[5])); // { dg-error "array of 'void'" "" { xfail
*-*-* } }


Can we work it so that T[5] also is un-decayed in the DECL_ARGUMENTS of 
bar, but decayed in the TYPE_ARG_TYPES?


I think so, I'll try it.


Well, I tried it and the result is really ugly and it only "somewhat"
works.  (Maybe I'm just missing something obvious though.)  The ugliness
comes from the fact that decaying an array parameter type of a function
type embedded deep within some tree structure requires rebuilding all
the tree structures in between to avoid issues due to tree sharing.
This approach only "somewhat" works because it currently looks through
function, pointer, reference and array types.  And I just noticed that
this approach does not work at all for USING_DECLs because no PARM_DECL
is ever retained anyway in that case.

I think a better, complete fix for this issue would be to, one way or
another, be able to get at the PARM_DECLs that correspond to a given
FUNCTION_TYPE.  Say, if, the TREE_CHAIN of a FUNCTION_TYPE optionally
pointed to its PARM_DECLs, or something.  What do you think?

In the meantime, at this stage, I am personally most comfortable with
the previous patch (the one that XFAILs unify17.C).


diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index f4604b6..c70eb12 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9082,6 +9082,92 @@ check_var_type (tree identifier, tree type)
   return type;
 }

+/* Given a type T, return a copy of T within which each array parameter type of
+   each function type embedded in T has been decayed to the corresponding
+   pointer type.  If no decaying was done, return T.
+
+   For example, if T corresponds to the type
+
+   void (** (char, void (&) (int[5]), char[7])) (int (*) (long[10]));
+ ~~   ~~~ ~~~
+
+   then the type returned by this function corresponds to
+
+   void (** (char, void (&) (int *), char *)) (int (*) (long *))
+ ~   ~~ ~~
+*/
+
+static tree
+decay_array_parms_r (tree t)
+{
+  if (t == NULL_TREE)
+return t;
+
+  if (FUNC_OR_METHOD_TYPE_P (t))
+{
+  auto_vec new_types;
+  new_types.reserve (8);
+  bool any_changed_p = false;
+
+  for (tree arg = TYPE_ARG_TYPES (t);
+  arg != NULL_TREE && arg != void_list_node;
+  arg = TREE_CHAIN (arg))
+   {
+ tree old_type = TREE_VALUE (arg);
+ tree new_type;
+
+ if (TREE_CODE (old_type) == ARRAY_TYPE)
+   new_type = decay_array_parms_r (type_decays_to (old_type));
+ else
+   new_type = decay_array_parms_r (old_type);
+
+ if (old_type != new_type)
+   any_changed_p = true;
+
+ new_types.safe_push (new_type);
+   }
+
+  tree old_ret_type = TREE_TYPE (t);
+  tree new_ret_type = decay_array_parms_r (old_ret_type);
+  if (old_ret_type != new_ret_type)
+   any_changed_p = true;
+
+  if (!any_changed_p)
+   return t;
+
+  tree new_type_arg_types = NULL_TREE;
+  tree arg;
+  int i = 0;
+  for (arg = TYPE_ARG_TYPES (t);
+  arg != NULL_TREE && arg != void_list_node;
+  arg = TREE_CHAIN (arg), i++)
+   new_type_arg_types = tree_cons (TREE_PURPOSE (arg),
+   new_types[i],
+   new_type_arg_types);
+  new_type_arg_types = nreverse (new_type_arg_types);
+  if (arg == void_list_node)
+   new_type_arg_types = chainon (new_type_arg_types, void_list_node);
+
+  return build_function_type (new_ret_type, new_type_arg_types);
+}
+  else if (TREE_CODE (t) == POINTER_TYPE)
+{
+  tree old_type_type = TREE_TYPE (t);
+  tree new_type_type = decay_array_parms_r (old_type_type);
+  if (old_type_type != new_type_type)
+   return build_pointer_type (new_type_type);
+}
+  else if (TREE_CODE (t) == REFERENCE_TYPE)
+{
+  tree old_type_type = TREE_TYPE (t);
+  tree new_type_type = decay_array_parms_r (old_type_type);
+  if (old_type_type != new_type_type)
+   return cp_build_reference_type (new_type_type, TYPE_REF_IS_RVALUE (t));
+}
+
+  return t;
+}
+
 /* Given declspecs and a declarator (abstract or otherwise), determine
the name and type of the object declared and construct a DECL node
for it.
@@ -10213,6 +10299,15 @@ grokdeclarator (const cp_declarator *declarator,

type = build_function_type (type, arg_types);

+   /* Decay all the (dependent) array parameter types embedded in this
+  function type into their corresponding pointer types.  This is
+  

Re: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2016-01-22 Thread Andreas Schwab
This breaks bootstrap on aarch64 by miscompiling the stage2 compiler.

../../../libgomp/priority_queue.h:422:11: internal compiler error: RTL flag 
check: MEM_VOLATILE_P used with unexpected rtx code 'mem' in 
set_mem_attributes_minus_bitpos, at emit-rtl.c:1833
   if (list->tasks == node)
   ^~~

0x5c3077 ???
../sysdeps/aarch64/start.S:81

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH PR68542]

2016-01-22 Thread H.J. Lu
On Fri, Jan 22, 2016 at 6:29 AM, Yuri Rumyantsev  wrote:
> Richard,
>
> I fixed all remarks pointed by you in vectorizer part of patch. Could
> you take a look on modified patch.
>
> Uros,
>
> Could you please review i386 part of patch related to support of
> conditional branches with vector comparison.
>
> Bootstrap and regression testing did not show any new failures.
> Is it OK for trunk?
>
> Thanks.
> Yuri.
>
> ChangeLog:
>
> 2016-01-22  Yuri Rumyantsev  
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Add support for conditional
> brnach with vector comparison.
  ^ Typo.
> * config/i386/sse.md (define_expand "cbranch4): Add define-expand
> for vector comparion with eq/ne only.
> (optimize_mask_stores): New function.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
> vectorized loops having masked stores after vec_info destroy.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
> (optimize_mask_stores): Add prototype.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
> * testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>


-- 
H.J.


Re: [PATCH][ARM][4/4] Adjust gcc.target/arm/wmul-[123].c tests

2016-01-22 Thread Kyrill Tkachov

Hi Bernd,

On 22/01/16 14:53, Bernd Schmidt wrote:

On 01/22/2016 10:52 AM, Kyrill Tkachov wrote:


AFAICT the new sequence is better than the old one even for
-mtune=cortex-a9 since it contains two fewer instructions.


Just curious (I think this patch series is good but will leave it to the arm 
folks) - are these instructions equally expensive? Some CPUs are faster when 
doing widening multiplies on smaller objects.



The widening multiplies are indeed faster on some targets (which is why we want 
to keep them in the wmul-[12].c tests).
But for wmul-3.c the new sequence uses fewer instructions. So, while the 
resulting sequences should be
of similar performance overall, the new sequence has a smaller code size.

Kyrill



Bernd




[Patch Obvious] gcc.dg/vect/bb-slp-pr68892.c requires vectorization of doubles

2016-01-22 Thread James Greenhalgh

Hi,

As title. This testcase fails on arm-none-linux-gnueabihf, because we don't
have vectorization of doubles there.

Committed as obvious as revision 232731.

Thanks,
James

---
2016-01-22  James Greenhalgh  

* gcc.dg/vect/bb-slp-pr68892.c: Require vect_double.
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr68892.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr68892.c
index 648fe481..ba51b76 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr68892.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr68892.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-fvect-cost-model=dynamic" } */
+/* { dg-require-effective-target vect_double } */
 
 double a[128][128];
 double b[128];


Re: [PATCH, rs6000] Fix PR63354

2016-01-22 Thread David Edelsohn
On Fri, Jan 22, 2016 at 12:42 AM, Bill Schmidt
 wrote:
> Hi,
>
> On Thu, 2016-01-21 at 21:21 -0600, Bill Schmidt wrote:
>> The testcase will need a slight adjustment, as currently it fails on
>> powerpc64 with -m32 testing.  Working on a fix.
>>
>> Bill
>>
>
> This patch adjusts the gcc.target/powerpc/pr63354 test to require 64-bit
> code generation, and also restricts the test to Linux targets, as this
> is necessary for using -mprofile-kernel.  Tested on
> powerpc64-unknown-linux-gnu configured with --with-cpu=power7 and
> testing with -m32; the test is now correctly skipped there.  Is this
> okay for trunk?
>
> Thanks,
> Bill
>
>
> 2016-01-22  Bill Schmidt  
>
> * gcc.target/powerpc/pr63354.c: Restrict to Linux targets with
> 64-bit support.

Okay.

Thanks, David


Re: [PATCH][ARM][4/4] Adjust gcc.target/arm/wmul-[123].c tests

2016-01-22 Thread Bernd Schmidt

On 01/22/2016 10:52 AM, Kyrill Tkachov wrote:


AFAICT the new sequence is better than the old one even for
-mtune=cortex-a9 since it contains two fewer instructions.


Just curious (I think this patch series is good but will leave it to the 
arm folks) - are these instructions equally expensive? Some CPUs are 
faster when doing widening multiplies on smaller objects.



Bernd


Re: [PATCH, rs6000] Fix PR63354

2016-01-22 Thread Bill Schmidt
OK, thanks, Joseph!  I'll make that adjustment later today.

Bill

On Fri, 2016-01-22 at 15:51 +, Joseph Myers wrote:
> On Thu, 21 Jan 2016, Bill Schmidt wrote:
> 
> > +/* { dg-do compile { target { powerpc64*-linux-* } } } */
> 
> That's suboptimal; you should allow powerpc*-*-linux* targets so that the 
> test is also run for --enable-targets=all powerpc-linux builds when 
> testing a -m64 multilib.
> 




[PATCH] Improve TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS callback

2016-01-22 Thread Wilco Dijkstra
Improve TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS target hook. It turns out there
is another case where the register allocator uses the union of register classes
without checking that the cost of the resulting register class is lower than
both (see https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01765.html ).  

This happens when the cost of the best and alternative class are both lower 
than the
memory cost.  In this case we typically end up with ALL_REGS as the allocno
class, which almost invariably results in bad allocations with many redundant
int<->FP moves (which are expensive on various cores).  AArch64 is affected by
this significantly due to supporting many scalar integer operations in SIMD.

Currently the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS hook on AArch64 forces the
class to GENERAL_REGS if the allocno class is ALL_REGS and the register has an
integer mode.  This is bad if the best class happens to be FP_REGS.  To handle
this case as well, an extra argument is needed in the hook to pass the best
class.  If the allocno class is ALL_REGS, but the best class isn't, we use the
best class instead (rather than using the mode).

Previously this might happen (for the vdup_lane.c regression discussed in the 
linked post):

r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
 a1 (r79,l0) best GENERAL_REGS, allocno GENERAL_REGS

a1(r79,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 
FP_LO_REGS:0,0 FP_REGS:0,0 ALL_REGS:1,1 MEM:9000,9000

The proposed allocno is ALL_REGS (despite having the highest cost!) and is then
forced by the hook to GENERAL_REGS because r79 has integer mode.  However 
FP_REGS
has the lowest cost.  After this patch the choice is as follows:

r79: preferred FP_REGS, alternative GENERAL_REGS, allocno FP_REGS
 a1 (r79,l0) best FP_REGS, allocno FP_REGS

As a result it is now no longer a requirement to use register move costs that 
are larger than the memory move cost.  So it will be feasible to use realistic
costs for both without a huge penalty.


ChangeLog:
2016-01-22  Wilco Dijkstra  

gcc/
* ira-costs.c (find_costs_and_classes): Add extra argument.
* target.def (ira_change_pseudo_allocno_class): Add parameter.
* targhooks.h (ira_change_pseudo_allocno_class): Likewise.
* targhooks.c (ira_change_pseudo_allocno_class): Likewise.
* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class)
Add best_class parameter, and return it if not ALL_REGS.
* config/mips/mips.c (mips_ira_change_pseudo_allocno_class): Add 
parameter.
* doc/tm.texi (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Update target 
hook.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
2584f16d345b3d015d577dd28c08a73ee3e0b0fb..f3f750ed9486a583df3753e4e67584432e9586bb
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -726,18 +726,31 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char 
*msg)
 
 /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have
-   the same cost even if ALL_REGS has a much larger cost.  This results in bad
-   allocations and spilling.  To avoid this we force the class to GENERAL_REGS
-   if the mode is integer.  */
+   the same cost even if ALL_REGS has a much larger cost.  ALL_REGS is also
+   used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory
+   cost (in this case the best class is the lowest cost one).  Using ALL_REGS
+   irrespectively of its cost results in bad allocations with many redundant
+   int<->FP moves which are expensive on various cores.
+   To avoid this we don't allow ALL_REGS as the allocno class, but force a
+   decision between FP_REGS and GENERAL_REGS.  We use the allocno class if it
+   isn't ALL_REGS.  Similarly, use the best class if it isn't ALL_REGS.
+   Otherwise set the allocno class depending on the mode.
+   The result of this is that it is no longer inefficient to have a higher
+   memory move cost than the register move cost.
+*/
 
 static reg_class_t
-aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class)
+aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class,
+reg_class_t best_class)
 {
   enum machine_mode mode;
 
   if (allocno_class != ALL_REGS)
 return allocno_class;
 
+  if (best_class != ALL_REGS)
+return best_class;
+
   mode = PSEUDO_REGNO_MODE (regno);
   return FLOAT_MODE_P (mode) || VECTOR_MODE_P (mode) ? FP_REGS : GENERAL_REGS;
 }
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 
6145944fbf7584fbafc6915cbe5fd855e1d21dc8..95f8bc06e03b605757d191a4448d974a587d623c
 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -19883,7 +19883,8 @@ mips_lra_p (void)
 /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.  */
 
 static reg_class_t

Re: [PATCH, rs6000] Fix PR63354

2016-01-22 Thread Joseph Myers
On Thu, 21 Jan 2016, Bill Schmidt wrote:

> +/* { dg-do compile { target { powerpc64*-linux-* } } } */

That's suboptimal; you should allow powerpc*-*-linux* targets so that the 
test is also run for --enable-targets=all powerpc-linux builds when 
testing a -m64 multilib.

-- 
Joseph S. Myers
jos...@codesourcery.com