Re: [PATCH] libgomp: Add comment to clarify last_team usage

2015-07-03 Thread Jakub Jelinek
On Fri, Jul 03, 2015 at 03:09:27PM +0200, Sebastian Huber wrote:
 libgomp/ChangeLog
 2015-07-03  Sebastian Huber  <sebastian.hu...@embedded-brains.de>
 
   * libgomp.h (gomp_thread_pool): Comment last_team field.
 ---
  libgomp/libgomp.h | 3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
 index 5272f01..5ed0f78 100644
 --- a/libgomp/libgomp.h
 +++ b/libgomp/libgomp.h
 @@ -458,6 +458,9 @@ struct gomp_thread_pool
struct gomp_thread **threads;
unsigned threads_size;
unsigned threads_used;
 +  /* The last team is used for non-nested teams to delay their destruction to
 +     make sure all the threads in the team move on to the pool's barrier before
 +     the team's barrier is destroyed.  */
struct gomp_team *last_team;
/* Number of threads running in this contention group.  */
unsigned long threads_busy;
 -- 
 1.8.4.5
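The deferred-destruction idea described in that comment can be modelled in a few lines. This is a hypothetical, single-threaded sketch: `team_sketch`, `pool_sketch` and `end_team` are illustrative names, not libgomp's types or functions, and the real code also has to handle nested teams and the barriers themselves.

```cpp
#include <cstddef>

// Hypothetical stand-in for struct gomp_team.  It only counts how many
// teams are alive; libgomp's team would also own the team barrier.
struct team_sketch
{
  static int live;
  int id;
  explicit team_sketch (int i) : id (i) { ++live; }
  ~team_sketch () { --live; }
};
int team_sketch::live = 0;

// Hypothetical stand-in for struct gomp_thread_pool.
struct pool_sketch
{
  team_sketch *last_team;
  pool_sketch () : last_team (NULL) {}
  ~pool_sketch () { delete last_team; }
};

// Called when a non-nested team finishes.  The finished team is parked in
// the pool instead of being destroyed, because its threads may still be
// leaving the team's barrier.  The previously parked team is destroyed
// instead: by the time another team has run and finished, its threads are
// assumed to have moved on to the pool's barrier.
void
end_team (pool_sketch &pool, team_sketch *team)
{
  delete pool.last_team;
  pool.last_team = team;
}
```

At most one finished team is kept alive at a time, so the delay costs one team's worth of memory.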

Ok for trunk.

Jakub


Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-03 Thread Martin Liška
On 07/03/2015 03:07 PM, Richard Sandiford wrote:
 Martin Jambor <mjam...@suse.cz> writes:
 On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:
 Trevor Saunders <tbsau...@tbsaunde.org> writes:
 On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
 Martin Liška <mli...@suse.cz> writes:
 diff --git a/gcc/asan.c b/gcc/asan.c
 index e89817e..dabd6f1 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -362,20 +362,20 @@ struct asan_mem_ref
/* Pool allocation new operator.  */
inline void *operator new (size_t)
{
 -return pool.allocate ();
 +return ::new (pool.allocate ()) asan_mem_ref ();
}
  
/* Delete operator utilizing pool allocation.  */
inline void operator delete (void *ptr)
{
 -pool.remove ((asan_mem_ref *) ptr);
 +pool.remove (ptr);
}
  
/* Memory allocation pool.  */
 -  static pool_allocator<asan_mem_ref> pool;
 +  static pool_allocator pool;
  };

 I'm probably going over old ground/wounds, sorry, but what's the benefit
 of having this sort of pattern?  Why not simply have object_allocators
 and make callers use pool.allocate () and pool.remove (x) (with pool.remove
 calling the destructor) instead of new and delete?  It feels wrong to me
 to tie the data type to a particular allocation object like this.

 Well the big question is what does allocate() do about construction?  It
 seems weird for it to not call the ctor, but I'm not sure we can do a
 good job of forwarding args to allocate() with C++98.

 If you need non-default constructors then:

   new (pool) type (aaa, bbb)...;

 doesn't seem too bad.  I agree object_allocator's allocate () should call
 the constructor.


 but then the pool allocator must not call placement new on the
 allocated memory itself because that would result in double
 construction.
 
 But we're talking about two different methods.  The normal allocator
 object_allocator<T>::allocate () would use placement new and return a
 pointer to the new object while operator new (size_t, object_allocator<T> &)
 wouldn't call placement new and would just return a pointer to the memory.
 
 And using the pool allocator functions directly has the nice property
 that you can tell when a delete/remove isn't necessary because the pool
 itself is being cleared.

 Well, all these cases involve a pool with static storage lifetime, right?
 So actually if you don't delete things in these pools they are
 effectively leaked.

 They might have a static storage lifetime now, but it doesn't seem like
 a good idea to hard-bake that into the interface

 Does that mean that operators new and delete are considered evil?
 
 Not IMO.  Just that static load-time-initialized caches are not
 necessarily a good thing.  That's effectively what the pool
 allocator is.
 
 (by saying that for
 these types you should use new and delete, but for other pool-allocated
 types you should use object_allocators).

 Depending on what kind of pool allocator you use, you will be forced
 to either call placement new or not, so the inconsistency will be
 there anyway.
 
 But how we handle argument-taking constructors is a problem that needs
 to be solved for the pool-allocated objects that don't use a single
 static type-specific pool.  And once we solve that, we get consistency
 across all pools:
 
 - if you want a new object and argumentless construction is OK,
   use pool.allocate ()
 
 - if you want a new object and need to pass arguments to the constructor,
   use new (pool) some_type (arg1, arg2, ...)
 
 Maybe I just have bad memories
 from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
 of state that was obviously static in the old days, but that needed
 to become non-static to support vaguely-efficient switching between
 different subtargets.  The same kind of thing is likely to happen again.
 I assume things like the jit would prefer not to have new global state
 with load-time construction.

 I'm not sure I follow this branch of the discussion; the allocators of
 any kind can surely be dynamically allocated themselves?
 
 Sure, but either (a) you keep the pools as a static part of the class
 and some initialisation and finalisation code that has tendrils into
 all such classes or (b) you move the static pool outside of the
 class to some new (still global) state.  Explicit pool allocation,
 like in the C days, gives you the option of putting the pool wherever
 it needs to go without relying on the principle that you can get to
 it from global state.
 
 Thanks,
 Richard
 

Ok Richard.

I've finally understood your suggestions and I would suggest the following:

+ I will add a new method to object_allocator<T> that will return allocated
  memory (void *) without calling any constructor
+ object_allocator<T>::allocate will call placement new for a
  parameterless ctor
+ I will remove all overloaded operators new/delete on e.g. et_forest, ...
+ For these classes, I will add void *operator new (size_t, object_allocator<T> &)
+ Pool 

Re: C++ PATCH for c++/66748 (ICE with abi_tag on enum)

2015-07-03 Thread Jason Merrill

OK, thanks.

Jason


Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-03 Thread Jakub Jelinek
On Fri, Jul 03, 2015 at 03:41:29PM +0200, Richard Biener wrote:
  The fallout (at least on x86_64) is surprisingly small, i.e. none, just
  gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
  to a bug in the vectorizer.  Jakub has a patch and knows the details.
  As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
  (that was the motivation of this pass).

Here is the fix for that.

The problem is that for simd clone calls, if they have void return type,
STMT_VINFO_VECTYPE is NULL.  If vectorize_simd_clone_call succeeds,
that is fine, but if it doesn't, we can fall into all the other
vectorizable_* functions, and some of them compute some variables
IMHO prematurely.  It doesn't make sense to compute nunits/ncopies
etc. if stmt isn't even an assignment etc.
So, this patch adjusts the few routines that had this problem,
so that we check is_gimple_assign and gimple_assign_rhs_code or whatever
is the quick GIMPLE test those functions use to find if stmt is of interest
to them, and only when it is, compute whatever they need later.
As NULL STMT_VINFO_VECTYPE can happen only for calls, all these functions
don't ICE anymore.
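The reordering can be illustrated with a small model. This is a hypothetical sketch: `stmt_sketch`, `vectype_sketch` and `ncopies_for_assignment` are invented stand-ins for GCC's stmt_vec_info, vector types and the vectorizable_* routines, and the vectorization factor is passed in directly. It only shows the order of operations: the cheap statement-kind test comes first, so a NULL vector type (as for a simd clone call returning void) is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a vector type: only its number of lanes.
struct vectype_sketch
{
  unsigned nunits;
};

// Hypothetical stand-in for a statement and its stmt_vec_info.
struct stmt_sketch
{
  bool is_assign;                 // plays the role of is_gimple_assign (stmt)
  const vectype_sketch *vectype;  // NULL for a simd clone call returning void
};

// Returns 0 when the statement is not of interest (the "return false"
// path), otherwise the number of vector copies needed.
int
ncopies_for_assignment (const stmt_sketch &stmt, unsigned vf, bool slp)
{
  // 1. Cheap check on the kind of statement, before touching vectype.
  if (!stmt.is_assign)
    return 0;

  // 2. Only now is it safe to look at the vector type.
  unsigned nunits = stmt.vectype->nunits;

  // Multiple types in SLP are handled by creating the appropriate number
  // of vectorized stmts per SLP node, so NCOPIES is always 1 there.
  int ncopies = slp ? 1 : static_cast<int> (vf / nunits);
  assert (ncopies >= 1);
  return ncopies;
}
```

With a vectorization factor of 8 and a four-lane vector type this yields two copies, and one copy under SLP.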

Ok for trunk if it passes bootstrap/regtest?

In the pr59984.c testcase, with Marek's patch and this patch, one loop in
test is already vectorized (the ICE was on the other one), I'll work on
recognizing multiples of GOMP_SIMD_LANE () as linear next, so that we
vectorize also the loop with bar.  Without Marek's patch we weren't
vectorizing any of the two loops.

2015-07-03  Jakub Jelinek  <ja...@redhat.com>

PR tree-optimization/66718
* tree-vect-stmts.c (vectorizable_assignment, vectorizable_store,
vectorizable_load, vectorizable_condition): Move vectype,
nunits, ncopies computation after checking what kind of statement
stmt is.

--- gcc/tree-vect-stmts.c.jj  2015-06-30 14:08:45.0 +0200
+++ gcc/tree-vect-stmts.c   2015-07-03 14:06:28.843573210 +0200
@@ -4043,13 +4043,11 @@ vectorizable_assignment (gimple stmt, gi
   tree scalar_dest;
   tree op;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   tree new_temp;
   tree def;
   gimple def_stmt;
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int i, j;
   vec<tree> vec_oprnds = vNULL;
@@ -4060,16 +4058,6 @@ vectorizable_assignment (gimple stmt, gi
   enum tree_code code;
   tree vectype_in;
 
-  /* Multiple types in SLP are handled by creating the appropriate number of
- vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in
- case of SLP.  */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
-    ncopies = 1;
-  else
-    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
-
-  gcc_assert (ncopies >= 1);
-
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
 
@@ -4095,6 +4083,19 @@ vectorizable_assignment (gimple stmt, gi
   if (code == VIEW_CONVERT_EXPR)
 op = TREE_OPERAND (op, 0);
 
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  /* Multiple types in SLP are handled by creating the appropriate number of
+ vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in
+ case of SLP.  */
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+
   if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
                              &def_stmt, &def, &dt[0], &vectype_in))
 {
@@ -5006,7 +5007,6 @@ vectorizable_store (gimple stmt, gimple_
   tree vec_oprnd = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree elem_type;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
@@ -5020,7 +5020,6 @@ vectorizable_store (gimple stmt, gimple_
   tree dataref_ptr = NULL_TREE;
   tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int j;
   gimple next_stmt, first_stmt = NULL;
@@ -5039,28 +5038,6 @@ vectorizable_store (gimple stmt, gimple_
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   tree aggr_type;
 
-  if (loop_vinfo)
-    loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  /* Multiple types in SLP are handled by creating the appropriate number of
-     vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in
-     case of SLP.  */
-  if (slp || PURE_SLP_STMT (stmt_info))
-    ncopies = 1;
-  else
-    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
-
-  gcc_assert (ncopies >= 1);
-
-  /* FORNOW. This restriction should be 

Re: [PATCH 0/3] [ARM] PR63870 improve error messages for NEON vldN_lane/vstN_lane

2015-07-03 Thread Alan Lawrence

Charles Baylis wrote:

These patches are a port of the changes that do the same thing for AArch64 (see
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01984.html)

The first patch ports over some infrastructure, and the second converts the
vldN_lane and vstN_lane intrinsics. The changes required for vget_lane and
vset_lane will be done in a future patch.

The third patch includes the test cases from the AArch64 version, except that
the xfails for arm targets have been removed. If this series gets approved
before the AArch64 patch, I will commit the tests with xfail for aarch64
targets.


Given the large number of test cases, essentially because of test framework 
limitations, does it make sense to put these in their own directory? Just a thought.


Cheers, Alan



Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-03 Thread Richard Biener
On Fri, 3 Jul 2015, Richard Biener wrote:

 On Fri, 3 Jul 2015, Marek Polacek wrote:
 
  This patch implements a new pass, called laddress, which deals with
  lowering ADDR_EXPR assignments.  Such lowering ought to help the
  vectorizer, but it also could expose more CSE opportunities, maybe
  help reassoc, etc.  It's only active when optimize != 0.
  
  So e.g.
_1 = (sizetype) i_9;
_7 = _1 * 4;
_4 = &b + _7;
  instead of
_4 = &b[i_9];
  
  This triggered 14105 times during the regtest and 6392 times during
  the bootstrap.
  
  The fallout (at least on x86_64) is surprisingly small, i.e. none, just
  gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
  to a bug in the vectorizer.  Jakub has a patch and knows the details.
  As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
  (that was the motivation of this pass).
  
  This doesn't introduce any kind of verification nor PROP_laddress.
  Don't know if we want that, but hopefully it can be done as a follow-up
  if we do.
 
 Yes.  At the moment nothing requires lowered address form so this is
 merely an optimization (and not a bug for some later pass to
 re-introduce un-lowered non-invariant addresses).  I can imagine
 that for example IVOPTs could be simplified if we didn't have this
 kind of addresses in the IL.
 
  Do we want to move some optimizations into this new pass, e.g.
  from fwprop?
 
 I think we might want to re-try forwprop_into_addr_expr before lowering
 the address.  Well, but that's maybe just over-cautious.
 
  Thoughts?
 
 Please move the pass before crited; crited and pre are supposed to
 go together.
 
 Otherwise looks ok to me.
 
 Thanks,
 Richard.
 
  Bootstrapped/regtested on x86_64-linux.
  
  2015-07-03  Marek Polacek  <pola...@redhat.com>
  
  PR tree-optimization/66718
  * Makefile.in (OBJS): Add tree-ssa-laddress.o. 
  * passes.def: Schedule pass_laddress.
  * timevar.def (DEFTIMEVAR): Add TV_TREE_LADDRESS.
  * tree-pass.h (make_pass_laddress): Declare.
  * tree-ssa-laddress.c: New file.
  
  * gcc.dg/vect/vect-126.c: New test.
  
  diff --git gcc/Makefile.in gcc/Makefile.in
  index 89eda96..2574b98 100644
  --- gcc/Makefile.in
  +++ gcc/Makefile.in
  @@ -1447,6 +1447,7 @@ OBJS = \
  tree-ssa-dse.o \
  tree-ssa-forwprop.o \
  tree-ssa-ifcombine.o \
  +   tree-ssa-laddress.o \

I'd say gimple-laddress.c is a better fit.  There is nothing
SSA specific in the pass and 'tree' is legacy...

  tree-ssa-live.o \
  tree-ssa-loop-ch.o \
  tree-ssa-loop-im.o \
  diff --git gcc/passes.def gcc/passes.def
  index 0d8356b..ac16e8a 100644
  --- gcc/passes.def
  +++ gcc/passes.def
  @@ -214,6 +214,7 @@ along with GCC; see the file COPYING3.  If not see
 NEXT_PASS (pass_cse_sincos);
 NEXT_PASS (pass_optimize_bswap);
 NEXT_PASS (pass_split_crit_edges);
  +  NEXT_PASS (pass_laddress);
 NEXT_PASS (pass_pre);
 NEXT_PASS (pass_sink_code);
 NEXT_PASS (pass_asan);
  diff --git gcc/testsuite/gcc.dg/vect/vect-126.c gcc/testsuite/gcc.dg/vect/vect-126.c
  index e69de29..66a5821 100644
  --- gcc/testsuite/gcc.dg/vect/vect-126.c
  +++ gcc/testsuite/gcc.dg/vect/vect-126.c
  @@ -0,0 +1,64 @@
  +/* PR tree-optimization/66718 */
  +/* { dg-do compile } */
  +/* { dg-additional-options "-mavx2" { target avx_runtime } } */
  +
  +int *a[1024], b[1024];
  +struct S { int u, v, w, x; };
  +struct S c[1024];
  +int d[1024][10];
  +
  +void
  +f0 (void)
  +{
  +  for (int i = 0; i < 1024; i++)
  +    a[i] = &b[0];
  +}
  +
  +void
  +f1 (void)
  +{
  +  for (int i = 0; i < 1024; i++)
  +    {
  +      int *p = &b[0];
  +      a[i] = p + i;
  +    }
  +}
  +
  +void
  +f2 (int *p)
  +{
  +  for (int i = 0; i < 1024; i++)
  +    a[i] = &p[i];
  +}
  +
  +void
  +f3 (void)
  +{
  +  for (int i = 0; i < 1024; i++)
  +    a[i] = &b[i];
  +}
  +
  +void
  +f4 (void)
  +{
  +  int *p = &c[0].v;
  +  for (int i = 0; i < 1024; i++)
  +    a[i] = &p[4 * i];
  +}
  +
  +void
  +f5 (void)
  +{
  +  for (int i = 0; i < 1024; i++)
  +    a[i] = &c[i].v;
  +}
  +
  +void
  +f6 (void)
  +{
  +  for (int i = 0; i < 1024; i++)
  +    for (unsigned int j = 0; j < 10; j++)
  +      a[i] = &d[i][j];
  +}
  +
  +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 7 "vect" { target vect_condition } } } */
  diff --git gcc/timevar.def gcc/timevar.def
  index efac4b7..fcc2fe0 100644
  --- gcc/timevar.def
  +++ gcc/timevar.def
  @@ -275,6 +275,7 @@ DEFTIMEVAR (TV_GIMPLE_SLSR   , "straight-line strength reduction")
   DEFTIMEVAR (TV_VTABLE_VERIFICATION   , "vtable verification")
   DEFTIMEVAR (TV_TREE_UBSAN, "tree ubsan")
   DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl")
  +DEFTIMEVAR (TV_TREE_LADDRESS , "address lowering")
   
   /* Everything else in rest_of_compilation not included above.  */
   DEFTIMEVAR (TV_EARLY_LOCAL  , "early local passes")
  diff --git gcc/tree-pass.h gcc/tree-pass.h
  index 

Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-03 Thread Richard Biener
On Fri, 3 Jul 2015, Marek Polacek wrote:

 This patch implements a new pass, called laddress, which deals with
 lowering ADDR_EXPR assignments.  Such lowering ought to help the
 vectorizer, but it also could expose more CSE opportunities, maybe
 help reassoc, etc.  It's only active when optimize != 0.
 
 So e.g.
   _1 = (sizetype) i_9;
   _7 = _1 * 4;
   _4 = &b + _7;
 instead of
   _4 = &b[i_9];
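Spelled out in C++, the lowering replaces the address of an array element with an explicit byte offset added to the array's base address. This is only an illustration of the arithmetic in the dump above, not code from the laddress pass; it uses `sizeof (int)` where the dump has the constant 4 (the size of int on x86_64), and it assumes a non-negative, in-range index so that the C++ pointer arithmetic stays well defined.

```cpp
#include <cstddef>

int b[1024];

// Before lowering:  _4 = &b[i_9];
int *
addr_of_element (int i)
{
  return &b[i];
}

// After lowering:
//   _1 = (sizetype) i_9;
//   _7 = _1 * 4;
//   _4 = &b + _7;
int *
addr_lowered (int i)
{
  std::size_t idx = static_cast<std::size_t> (i);  // _1
  std::size_t offset = idx * sizeof (int);         // _7 (4 on x86_64)
  char *base = reinterpret_cast<char *> (&b[0]);   // &b, viewed as bytes
  return reinterpret_cast<int *> (base + offset);  // _4
}
```

Once the multiplication and addition are separate statements, they are ordinary scalar operations that CSE, reassoc and the vectorizer can work on.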
 
 This triggered 14105 times during the regtest and 6392 times during
 the bootstrap.
 
 The fallout (at least on x86_64) is surprisingly small, i.e. none, just
 gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
 to a bug in the vectorizer.  Jakub has a patch and knows the details.
 As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
 (that was the motivation of this pass).
 
 This doesn't introduce any kind of verification nor PROP_laddress.
 Don't know if we want that, but hopefully it can be done as a follow-up
 if we do.

Yes.  At the moment nothing requires lowered address form so this is
merely an optimization (and not a bug for some later pass to
re-introduce un-lowered non-invariant addresses).  I can imagine
that for example IVOPTs could be simplified if we didn't have this
kind of addresses in the IL.

 Do we want to move some optimizations into this new pass, e.g.
 from fwprop?

I think we might want to re-try forwprop_into_addr_expr before lowering
the address.  Well, but that's maybe just over-cautious.

 Thoughts?

Please move the pass before crited; crited and pre are supposed to
go together.

Otherwise looks ok to me.

Thanks,
Richard.

 Bootstrapped/regtested on x86_64-linux.
 
 2015-07-03  Marek Polacek  <pola...@redhat.com>
 
   PR tree-optimization/66718
   * Makefile.in (OBJS): Add tree-ssa-laddress.o. 
   * passes.def: Schedule pass_laddress.
   * timevar.def (DEFTIMEVAR): Add TV_TREE_LADDRESS.
   * tree-pass.h (make_pass_laddress): Declare.
   * tree-ssa-laddress.c: New file.
 
   * gcc.dg/vect/vect-126.c: New test.
 
 diff --git gcc/Makefile.in gcc/Makefile.in
 index 89eda96..2574b98 100644
 --- gcc/Makefile.in
 +++ gcc/Makefile.in
 @@ -1447,6 +1447,7 @@ OBJS = \
   tree-ssa-dse.o \
   tree-ssa-forwprop.o \
   tree-ssa-ifcombine.o \
 + tree-ssa-laddress.o \
   tree-ssa-live.o \
   tree-ssa-loop-ch.o \
   tree-ssa-loop-im.o \
 diff --git gcc/passes.def gcc/passes.def
 index 0d8356b..ac16e8a 100644
 --- gcc/passes.def
 +++ gcc/passes.def
 @@ -214,6 +214,7 @@ along with GCC; see the file COPYING3.  If not see
NEXT_PASS (pass_cse_sincos);
NEXT_PASS (pass_optimize_bswap);
NEXT_PASS (pass_split_crit_edges);
 +  NEXT_PASS (pass_laddress);
NEXT_PASS (pass_pre);
NEXT_PASS (pass_sink_code);
NEXT_PASS (pass_asan);
 diff --git gcc/testsuite/gcc.dg/vect/vect-126.c gcc/testsuite/gcc.dg/vect/vect-126.c
 index e69de29..66a5821 100644
 --- gcc/testsuite/gcc.dg/vect/vect-126.c
 +++ gcc/testsuite/gcc.dg/vect/vect-126.c
 @@ -0,0 +1,64 @@
 +/* PR tree-optimization/66718 */
 +/* { dg-do compile } */
 +/* { dg-additional-options "-mavx2" { target avx_runtime } } */
 +
 +int *a[1024], b[1024];
 +struct S { int u, v, w, x; };
 +struct S c[1024];
 +int d[1024][10];
 +
 +void
 +f0 (void)
 +{
 +  for (int i = 0; i < 1024; i++)
 +    a[i] = &b[0];
 +}
 +
 +void
 +f1 (void)
 +{
 +  for (int i = 0; i < 1024; i++)
 +    {
 +      int *p = &b[0];
 +      a[i] = p + i;
 +    }
 +}
 +
 +void
 +f2 (int *p)
 +{
 +  for (int i = 0; i < 1024; i++)
 +    a[i] = &p[i];
 +}
 +
 +void
 +f3 (void)
 +{
 +  for (int i = 0; i < 1024; i++)
 +    a[i] = &b[i];
 +}
 +
 +void
 +f4 (void)
 +{
 +  int *p = &c[0].v;
 +  for (int i = 0; i < 1024; i++)
 +    a[i] = &p[4 * i];
 +}
 +
 +void
 +f5 (void)
 +{
 +  for (int i = 0; i < 1024; i++)
 +    a[i] = &c[i].v;
 +}
 +
 +void
 +f6 (void)
 +{
 +  for (int i = 0; i < 1024; i++)
 +    for (unsigned int j = 0; j < 10; j++)
 +      a[i] = &d[i][j];
 +}
 +
 +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 7 "vect" { target vect_condition } } } */
 diff --git gcc/timevar.def gcc/timevar.def
 index efac4b7..fcc2fe0 100644
 --- gcc/timevar.def
 +++ gcc/timevar.def
 @@ -275,6 +275,7 @@ DEFTIMEVAR (TV_GIMPLE_SLSR   , "straight-line strength reduction")
  DEFTIMEVAR (TV_VTABLE_VERIFICATION   , "vtable verification")
  DEFTIMEVAR (TV_TREE_UBSAN, "tree ubsan")
  DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl")
 +DEFTIMEVAR (TV_TREE_LADDRESS , "address lowering")
  
  /* Everything else in rest_of_compilation not included above.  */
  DEFTIMEVAR (TV_EARLY_LOCAL, "early local passes")
 diff --git gcc/tree-pass.h gcc/tree-pass.h
 index 2808dad..c47b22e 100644
 --- gcc/tree-pass.h
 +++ gcc/tree-pass.h
 @@ -393,6 +393,7 @@ extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
  extern 

Re: C++ PATCH to change default dialect to C++14

2015-07-03 Thread Jason Merrill

On 07/02/2015 07:41 PM, Jim Wilson wrote:

The code compiles with -std=c++98.  It does not compile with -std=c++14.
So this testcase should be fixed to work with c++14.


Done.

Jason



[PATCH] rs6000: Add testcase for shifts

2015-07-03 Thread Segher Boessenkool
This new test tests that all shifts of int compile to exactly one
machine instruction, not two as in the PR (which was a problem in
combine).  Tested on powerpc64-linux, with the usual options
(-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); okay for trunk?


Segher


2015-07-03  Segher Boessenkool  <seg...@kernel.crashing.org>

gcc/testsuite/
PR rtl-optimization/66706
* gcc.target/powerpc/shift-int.c: New testcase.

---
 gcc/testsuite/gcc.target/powerpc/shift-int.c | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shift-int.c

diff --git a/gcc/testsuite/gcc.target/powerpc/shift-int.c b/gcc/testsuite/gcc.target/powerpc/shift-int.c
new file mode 100644
index 000..fe696ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/shift-int.c
@@ -0,0 +1,23 @@
+/* Check that shifts do not get unnecessary extends.
+   See PR66706 for a case where this failed.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Each function should compile to exactly two instructions.  */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 16 } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+blr} 8 } } */
+
+
+typedef unsigned u;
+typedef signed s;
+
+u rot(u x, u n) { return (x << n) | (x >> (32 - n)); }
+u shl(u x, u n) { return x << n; }
+u shr(u x, u n) { return x >> n; }
+s asr(s x, u n) { return x >> n; }
+
+u roti(u x) { return (x << 23) | (x >> 9); }
+u shli(u x) { return x << 23; }
+u shri(u x) { return x >> 23; }
+s asri(s x) { return x >> 23; }
-- 
1.8.1.4
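The rotate functions in this test use the (x << n) | (x >> (32 - n)) idiom, which combine is expected to turn into a single rotate instruction. A quick way to check that the constant form really computes a rotation is to compare it against a bit-by-bit reference. This sketch is not part of the testcase; it assumes a 32-bit unsigned type, spelled `std::uint32_t` here, and `rotl_ref` is a hypothetical helper.

```cpp
#include <cstdint>

// Same expression as roti () in the testcase, with an explicit 32-bit type.
std::uint32_t
roti (std::uint32_t x)
{
  return (x << 23) | (x >> 9);
}

// Reference rotate-left: rotate by one bit, N times (slow but obvious).
std::uint32_t
rotl_ref (std::uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    x = (x << 1) | (x >> 31);
  return x;
}
```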



[PATCH 0/2][trunk+5 backport][ARM] PR/65956 Implement AAPCS updates for alignment attribute

2015-07-03 Thread Alan Lawrence
This patch series implements the changes/additions to the ARM ABI proposed at 
https://gcc.gnu.org/ml/gcc/2015-07/msg00040.html .


The first patch is the ABI update. This is an ABI-breaking change for any code 
using __attribute__((aligned(...))) on a public interface (a case not previously 
defined by the AAPCS).


This causes a regression in gcc.c-torture/execute/20040709-1.c at -O0 (only), 
and align_rec2.c fails; both are due to a latent bug where we can emit strd/ldrd 
on an odd-numbered register in ARM state. The second patch prevents such illegal 
instructions and fixes both tests.


On trunk, tested via bootstrap + check-gcc on arm-none-linux-gnueabihf 
(cortex-a15+neon). Also cross-tested arm-none-eabi with a number of variants.


On gcc-5-branch, patches rebase cleanly, tested via profiledbootstrap + 
check-gcc. (Yes, profiledbootstrap succeeds.)




Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-03 Thread Richard Sandiford
Martin Jambor <mjam...@suse.cz> writes:
 On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:
 Trevor Saunders <tbsau...@tbsaunde.org> writes:
  On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
  Martin Liška <mli...@suse.cz> writes:
   diff --git a/gcc/asan.c b/gcc/asan.c
   index e89817e..dabd6f1 100644
   --- a/gcc/asan.c
   +++ b/gcc/asan.c
   @@ -362,20 +362,20 @@ struct asan_mem_ref
  /* Pool allocation new operator.  */
  inline void *operator new (size_t)
  {
   -return pool.allocate ();
   +return ::new (pool.allocate ()) asan_mem_ref ();
  }

  /* Delete operator utilizing pool allocation.  */
  inline void operator delete (void *ptr)
  {
   -pool.remove ((asan_mem_ref *) ptr);
   +pool.remove (ptr);
  }

  /* Memory allocation pool.  */
   -  static pool_allocator<asan_mem_ref> pool;
   +  static pool_allocator pool;
};
  
  I'm probably going over old ground/wounds, sorry, but what's the benefit
  of having this sort of pattern?  Why not simply have object_allocators
  and make callers use pool.allocate () and pool.remove (x) (with pool.remove
  calling the destructor) instead of new and delete?  It feels wrong to me
  to tie the data type to a particular allocation object like this.
 
  Well the big question is what does allocate() do about construction?  It
  seems weird for it to not call the ctor, but I'm not sure we can do a
  good job of forwarding args to allocate() with C++98.
 
 If you need non-default constructors then:
 
   new (pool) type (aaa, bbb)...;
 
 doesn't seem too bad.  I agree object_allocator's allocate () should call
 the constructor.
 

 but then the pool allocator must not call placement new on the
 allocated memory itself because that would result in double
 construction.

But we're talking about two different methods.  The normal allocator
object_allocator<T>::allocate () would use placement new and return a
pointer to the new object while operator new (size_t, object_allocator<T> &)
wouldn't call placement new and would just return a pointer to the memory.

  And using the pool allocator functions directly has the nice property
  that you can tell when a delete/remove isn't necessary because the pool
  itself is being cleared.
 
  Well, all these cases involve a pool with static storage lifetime, right?
  So actually if you don't delete things in these pools they are
  effectively leaked.
 
 They might have a static storage lifetime now, but it doesn't seem like
 a good idea to hard-bake that into the interface

 Does that mean that operators new and delete are considered evil?

Not IMO.  Just that static load-time-initialized caches are not
necessarily a good thing.  That's effectively what the pool
allocator is.

 (by saying that for
 these types you should use new and delete, but for other pool-allocated
 types you should use object_allocators).

 Depending on what kind of pool allocator you use, you will be forced
 to either call placement new or not, so the inconsistency will be
 there anyway.

But how we handle argument-taking constructors is a problem that needs
to be solved for the pool-allocated objects that don't use a single
static type-specific pool.  And once we solve that, we get consistency
across all pools:

- if you want a new object and argumentless construction is OK,
  use pool.allocate ()

- if you want a new object and need to pass arguments to the constructor,
  use new (pool) some_type (arg1, arg2, ...)
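The two rules above can be sketched as follows. This is a hypothetical, simplified allocator written for illustration (`object_allocator_sketch`, `allocate_raw` and `release_raw` are invented names, and raw chunks come straight from `::operator new` rather than from pool blocks); it is not GCC's object_allocator. `allocate ()` constructs with placement new, while the placement `operator new` overload only hands back memory, so `new (pool) T (args)` constructs each object exactly once.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical, simplified object allocator (not GCC's implementation).
template <typename T>
class object_allocator_sketch
{
public:
  ~object_allocator_sketch ()
  {
    // Frees chunks on the free list only; objects never removed are leaked.
    for (std::size_t i = 0; i < m_free.size (); i++)
      ::operator delete (m_free[i]);
  }

  // Raw memory for one T; no constructor runs.
  void *allocate_raw ()
  {
    if (m_free.empty ())
      return ::operator new (sizeof (T));
    void *chunk = m_free.back ();
    m_free.pop_back ();
    return chunk;
  }

  // Give back memory that holds no live object.
  void release_raw (void *chunk)
  {
    m_free.push_back (chunk);
  }

  // Argumentless construction: memory plus placement new.
  T *allocate ()
  {
    return ::new (allocate_raw ()) T ();
  }

  // Destroy the object, then recycle its memory.
  void remove (T *object)
  {
    object->~T ();
    release_raw (object);
  }

private:
  std::vector<void *> m_free;
};

// Placement form used as "new (pool) some_type (arg1, arg2)".  It returns
// memory only (the size is assumed to be sizeof (T)); the new-expression
// then runs the constructor once.
template <typename T>
void *
operator new (std::size_t, object_allocator_sketch<T> &pool)
{
  return pool.allocate_raw ();
}

// Matching placement delete; only called if the constructor throws.
template <typename T>
void
operator delete (void *chunk, object_allocator_sketch<T> &pool)
{
  pool.release_raw (chunk);
}
```

Both creation paths end in the same `remove ()`, and nothing ties the element type to a single static pool.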

 Maybe I just have bad memories
 from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
 of state that was obviously static in the old days, but that needed
 to become non-static to support vaguely-efficient switching between
 different subtargets.  The same kind of thing is likely to happen again.
 I assume things like the jit would prefer not to have new global state
 with load-time construction.

 I'm not sure I follow this branch of the discussion; the allocators of
 any kind can surely be dynamically allocated themselves?

Sure, but either (a) you keep the pools as a static part of the class
and some initialisation and finalisation code that has tendrils into
all such classes or (b) you move the static pool outside of the
class to some new (still global) state.  Explicit pool allocation,
like in the C days, gives you the option of putting the pool wherever
it needs to go without relying on the principle that you can get to
it from global state.

Thanks,
Richard


Re: [PATCH] Allow embedded timestamps by C/C++ macros to be set externally

2015-07-03 Thread Dhole
On 06/30/2015 06:23 PM, Manuel López-Ibáñez wrote:
 On 30 June 2015 at 17:18, Dhole <dh...@openmailbox.org> wrote:
 In the debian reproducible builds project we have considered several
 options to address this issue. We considered redefining the __DATE__ and
 __TIME__ defines by command line flags passed to gcc, but as you say,
 that triggers warnings, which could become errors when building with
 -Werror and thus may require manual intervention on many packages.
 
 Well, it would require adding -Wno-something (-Wno-reproducible?
 -Wno-unreproducible? or perhaps simply -freproducible? ) to some
 CFLAGS/CXXFLAGS. Is that too much manual intervention? (I'm asking
 sincerely, perhaps indeed it is).

Our idea with the SOURCE_DATE_EPOCH env var was to find a general
solution for all toolchain packages involved in the build process that
embed timestamps. We already have a patched version of a package used
during Debian builds (debhelper) which sets the SOURCE_DATE_EPOCH in the
build environment. With the submitted patch to GCC nothing else would be
needed, and we believe it would be useful to other projects working on
reproducible builds, as they would only need to set the
SOURCE_DATE_EPOCH env var during their build process. Modifying the
CFLAGS/CXXFLAGS would need more intervention during the build process,
and this would be a solution only useful for GCC and not other toolchain
packages. It could be done, but we'd prefer the general approach.

As mentioned before, we are trying to create a standard way of modifying
timestamp embedding behavior for any package with the SOURCE_DATE_EPOCH.
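As a rough sketch of how a compiler could honour the variable, the hypothetical helper below (`format_epoch` is an invented name; this is not the code from the submitted patch) parses SOURCE_DATE_EPOCH as a count of seconds since the Unix epoch and formats it the way __DATE__ ("Mmm dd yyyy", day padded with a space) and __TIME__ ("hh:mm:ss") are spelled. Interpreting the value as UTC and rejecting anything other than a plain non-negative integer are assumptions made for this sketch.

```cpp
#include <cerrno>
#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical helper: turn a SOURCE_DATE_EPOCH value into strings shaped
// like __DATE__ and __TIME__.  Returns false for values it does not accept.
bool
format_epoch (const char *value, std::string &date, std::string &time_of_day)
{
  if (value == NULL || *value == '\0')
    return false;

  // Accept only a plain non-negative decimal number of seconds.
  char *end = NULL;
  errno = 0;
  long long seconds = std::strtoll (value, &end, 10);
  if (errno != 0 || *end != '\0' || seconds < 0)
    return false;

  std::time_t t = static_cast<std::time_t> (seconds);
  std::tm *utc = std::gmtime (&t);  // interpret the timestamp as UTC
  if (utc == NULL)
    return false;

  char buf[32];
  // %e pads single-digit days with a space, as __DATE__ does ("Jul  3 2015");
  // %b yields English month abbreviations in the default "C" locale.
  std::strftime (buf, sizeof buf, "%b %e %Y", utc);
  date = buf;
  std::strftime (buf, sizeof buf, "%H:%M:%S", utc);
  time_of_day = buf;
  return true;
}
```

A caller would pass `std::getenv ("SOURCE_DATE_EPOCH")` and fall back to the current time when the helper returns false.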

 This could be a big hammer option that simply disables any warning
 that is not relevant for reproducible builds (the default being
 -Wsomething), for example avoid emitting -Wbuiltin-macro-redefined
 warnings in the specific cases of __TIME__ and __DATE__. Just an idea,
 the maintainers would need to say if they would accept such an option.
 
 Cheers,
 
 Manuel.
 

I'm looking forward to hearing opinions from the maintainers :)

Regards,
-- 
Dhole





Re: [v3 PATCH] Implement Fundamentals v2 propagate_const

2015-07-03 Thread Jonathan Wakely

On 03/07/15 14:40 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64. Patch gzipped to avoid polluting people's
mailboxes with a 45k patch.


Thanks very much, I made a few whitespace changes and committed it (as
attached) after testing.

I've also updated the status tables in the docs, see the second
attachment.


patch.txt.gz
Description: application/gzip
commit 6b5c94dc814c2aea9c457d0eb6cb965e9a7011f1
Author: Jonathan Wakely <jwak...@redhat.com>
Date:   Fri Jul 3 14:02:53 2015 +0100

* doc/xml/manual/status_cxx2017.xml: Update status table.
* doc/html/manual/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 07e2dbe..491e024 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -112,7 +112,7 @@ not in any particular release.
   </entry>
   <entry>Cleaning-up noexcept in the Library</entry>
   <entry>Partial</entry>
-  <entry/>
+  <entry>Changes to basic_string not complete.</entry>
 </row>
 
 <row>
@@ -177,14 +177,13 @@ not in any particular release.
 </row>
 
 <row>
-  <?dbhtml bgcolor="#C8B0B0" ?>
   <entry>
<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4387.html">
  N4387
</link>
   </entry>
   <entry> Improving pair and tuple, revision 3 </entry>
-  <entry>N</entry>
+  <entry>Y</entry>
   <entry/>
 </row>
 
@@ -304,14 +303,13 @@ not in any particular release.
 
 
 <row>
-  <?dbhtml bgcolor="#C8B0B0" ?>
   <entry>
<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4388.html">
  N4388
</link>
   </entry>
   <entry>Const-Propagating Wrapper</entry>
-  <entry>N</entry>
+  <entry>Y</entry>
   <entry>Library Fundamentals 2 TS</entry>
 </row>
 


Re: [PATCH] rs6000: Add testcase for shifts

2015-07-03 Thread David Edelsohn
On Fri, Jul 3, 2015 at 10:16 AM, Segher Boessenkool
seg...@kernel.crashing.org wrote:
 This new test tests that all shifts of int compile to exactly one
 machine instruction, not two as in the PR (which was a problem in
 combine).  Tested on powerpc64-linux, with the usual options
 (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); okay for trunk?


 Segher


 2015-07-03  Segher Boessenkool  seg...@kernel.crashing.org

 gcc/testsuite/
 PR rtl-optimization/66706
 * gcc.target/powerpc/shift-int.c: New testcase.

Okay.

Thanks, David


[Patch docs obvious AArch64] Fix position of -moverride documentation

2015-07-03 Thread James Greenhalgh
Hi,

-moverride is not a feature modifier, so it is currently misplaced in the
documentation.

Fix that by moving it out to the general AArch64 options section.

Checked in the HTML output that it is now in a sensible place, and committed
it as attached as obvious as revision 225384.

Thanks,
James

---
2015-07-03  James Greenhalgh  james.greenha...@arm.com

* doc/invoke.texi (moverride): Move to correct section.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 844d7edaecf2bc6642324ad8513f7c2add0ee486..1dfce1143027cef86d8fbf59580035e6d25f1189 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12496,6 +12496,15 @@ the target processor for which to tune for performance (as if
 by @option{-mtune}).  Where this option is used in conjunction
 with @option{-march} or @option{-mtune}, those options take precedence
 over the appropriate part of this option.
+
+@item -moverride=@var{string}
+@opindex moverride
+Override tuning decisions made by the back-end in response to a
+@option{-mtune=} switch.  The syntax, semantics, and accepted values
+for @var{string} in this option are not guaranteed to be consistent
+across releases.
+
+This option is only intended to be useful when developing GCC.
 @end table
 
 @subsubsection @option{-march} and @option{-mcpu} Feature Modifiers
@@ -12526,14 +12535,6 @@ Enable Limited Ordering Regions support.
 @item rdma
 Enable ARMv8.1 Advanced SIMD instructions.
 
-@item -moverride=@var{string}
-@opindex master
-Override tuning decisions made by the back-end in response to a
-@option{-mtune=} switch.  The syntax, semantics, and accepted values
-for @var{string} in this option are not guaranteed to be consistent
-across releases.
-
-This option is only intended to be useful when developing GCC.
 @end table
 
 That is, @option{crypto} implies @option{simd} implies @option{fp}.


[PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-03 Thread Alan Lawrence
These include tests of structs, scalars, and vectors; only general-purpose
registers are affected by the ABI rules for alignment, but we can restrict the
vector test to use the base AAPCS.


Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the 
latter showing an internal inconsistency, the first two merely that GCC did not 
obey the new ABI).


With this patch, align_rec2.c fails, as does
gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug
where we can emit strd/ldrd on an odd-numbered register in ARM state; the
second patch fixes this.


gcc/ChangeLog:

* config/arm/arm.c (arm_needs_doubleword_align): Drop any outer
alignment attribute, exploring one level down for aggregates.

gcc/testsuite/ChangeLog:

* gcc.target/arm/aapcs/align1.c: New.
* gcc.target/arm/aapcs/align_rec1.c: New.
* gcc.target/arm/aapcs/align2.c: New.
* gcc.target/arm/aapcs/align_rec2.c: New.
* gcc.target/arm/aapcs/align3.c: New.
* gcc.target/arm/aapcs/align_rec3.c: New.
* gcc.target/arm/aapcs/align4.c: New.
* gcc.target/arm/aapcs/align_rec4.c: New.
* gcc.target/arm/aapcs/align_vararg1.c: New.
* gcc.target/arm/aapcs/align_vararg2.c: New.
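The rule described by the arm.c ChangeLog entry can be modelled outside GCC as below. This is a hypothetical, simplified C++ sketch, not GCC's tree API: `type_model`, its fields and `kParmBoundary` are illustrative names, and the value 32 is the assumed PARM_BOUNDARY for 32-bit ARM. Non-aggregates use the alignment of their main variant, so an `aligned` attribute on a typedef is ignored; aggregates look one level down at the declared alignment of each field.

```cpp
#include <cassert>
#include <vector>

// Assumed PARM_BOUNDARY for 32-bit ARM, in bits.
constexpr unsigned kParmBoundary = 32;

// Hypothetical stand-in for a type: all alignments are in bits.
struct type_model {
  bool aggregate;                      // struct, union or array?
  unsigned main_variant_align;         // alignment ignoring typedef attributes
  std::vector<unsigned> field_aligns;  // declared alignment of each field
};

// Mirrors the patched arm_needs_doubleword_align for the case where a type
// is available: the outer aligned attribute is dropped, and aggregates are
// explored one level down.
bool needs_doubleword_align(const type_model &t) {
  if (!t.aggregate)
    return t.main_variant_align > kParmBoundary;
  for (unsigned a : t.field_aligns)
    if (a > kParmBoundary)
      return true;
  return false;
}
```

Under this model the `alignedint` and `overaligned` types from align1.c and align2.c do not need doubleword alignment, while a struct with an `aligned (8)` field, such as `struct s` in align3.c, does.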
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
 static bool
 arm_needs_doubleword_align (machine_mode mode, const_tree type)
 {
-  return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
-	  || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
+  if (!type)
+    return PARM_BOUNDARY < GET_MODE_ALIGNMENT (mode);
+
+  if (!AGGREGATE_TYPE_P (type))
+    return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY;
+
+  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    if (DECL_ALIGN (field) > PARM_BOUNDARY)
+      return true;
+
+  return false;
 }
 
 
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
new file mode 100644
index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
@@ -0,0 +1,29 @@
+/* Test AAPCS layout (alignment).  */
+
+/* { dg-do run { target arm_eabi } } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O" } */
+
+#ifndef IN_FRAMEWORK
+#define TESTFILE "align1.c"
+
+typedef __attribute__((aligned (8))) int alignedint;
+
+alignedint a = 11;
+alignedint b = 13;
+alignedint c = 17;
+alignedint d = 19;
+alignedint e = 23;
+alignedint f = 29;
+
+#include "abitest.h"
+#else
+  ARG (alignedint, a, R0)
+  /* Attribute suggests R2, but we should use only natural alignment:  */
+  ARG (alignedint, b, R1)
+  ARG (alignedint, c, R2)
+  ARG (alignedint, d, R3)
+  ARG (alignedint, e, STACK)
+  /* Attribute would suggest STACK + 8 but should be ignored:  */
+  LAST_ARG (alignedint, f, STACK + 4)
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
new file mode 100644
index ..992da53c606c793f25278152406582bb993719d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
@@ -0,0 +1,30 @@
+/* Test AAPCS layout (alignment).  */
+
+/* { dg-do run { target arm_eabi } } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O" } */
+
+#ifndef IN_FRAMEWORK
+#define TESTFILE "align2.c"
+
+/* The underlying struct here has alignment 4.  */
+typedef struct __attribute__((aligned (8)))
+  {
+    int x;
+    int y;
+  } overaligned;
+
+/* A couple of instances, at 8-byte-aligned memory locations.  */
+overaligned a = { 2, 3 };
+overaligned b = { 5, 8 };
+
+#include "abitest.h"
+#else
+  ARG (int, 7, R0)
+  /* Alignment should be 4.  */
+  ARG (overaligned, a, R1)
+  ARG (int, 9, R3)
+  ARG (int, 10, STACK)
+  /* Alignment should be 4.  */
+  LAST_ARG (overaligned, b, STACK + 4)
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align3.c b/gcc/testsuite/gcc.target/arm/aapcs/align3.c
new file mode 100644
index ..81ad3f587a95aae52ec601ce5a60b198e5351edf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/align3.c
@@ -0,0 +1,42 @@
+/* Test AAPCS layout (alignment).  */
+
+/* { dg-do run { target arm_eabi } } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3" } */
+
+#ifndef IN_FRAMEWORK
+#define TESTFILE "align3.c"
+
+/* Struct will be aligned to 8.  */
+struct s
+  {
+    int x;
+    /* 4 bytes padding here.  */
+    __attribute__((aligned (8))) int y;
+    /* 4 bytes padding here.  */
+  };
+
+typedef struct s __attribute__((aligned (4))) underaligned;
+
+#define EXPECTED_STRUCT_SIZE 16
+extern void link_failure (void);
+int
+foo ()
+{
+  /* Optimization gets rid of this before linking. 

[PATCH 2/2][ARM] fix movdi expander to avoid illegal ldrd/strd

2015-07-03 Thread Alan Lawrence
The previous patch caused a regression in gcc.c-torture/execute/20040709-1.c at 
-O0 (only), and the new align_rec2.c test fails, both outputting an illegal 
assembler instruction (ldrd on an odd-numbered reg) from output_move_double in 
arm.c. Most routes have checks against such an illegal instruction, but 
expanding a function call can directly name such impossible register (pairs), 
bypassing the normal checks.


gcc/ChangeLog:

* config/arm/arm.md (movdi): Avoid odd-number ldrd/strd in ARM state.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 164ac13a26289bf755c89e78a8a5f751883c6039..c6718282d2555f8cf9a4e9111b1393e1f7704983 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5415,6 +5415,42 @@
   if (!REG_P (operands[0]))
 	operands[1] = force_reg (DImode, operands[1]);
 }
+  if (REG_P (operands[0]) && REGNO (operands[0]) < FIRST_VIRTUAL_REGISTER
+      && !HARD_REGNO_MODE_OK (REGNO (operands[0]), DImode))
+    {
+      /* Avoid LDRD's into an odd-numbered register pair in ARM state
+         when expanding function calls.  */
+      gcc_assert (can_create_pseudo_p ());
+      if (MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))
+        {
+          /* Perform load into legal reg pair first, then move.  */
+          rtx reg = gen_reg_rtx (DImode);
+          emit_insn (gen_movdi (reg, operands[1]));
+          operands[1] = reg;
+        }
+      emit_move_insn (gen_lowpart (SImode, operands[0]),
+                      gen_lowpart (SImode, operands[1]));
+      emit_move_insn (gen_highpart (SImode, operands[0]),
+                      gen_highpart (SImode, operands[1]));
+      DONE;
+    }
+  else if (REG_P (operands[1]) && REGNO (operands[1]) < FIRST_VIRTUAL_REGISTER
+           && !HARD_REGNO_MODE_OK (REGNO (operands[1]), DImode))
+    {
+      /* Avoid LDRD's into an odd-numbered register pair in ARM state
+         when expanding function prologue.  */
+      gcc_assert (can_create_pseudo_p ());
+      rtx split_dest = (MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))
+                       ? gen_reg_rtx (DImode)
+                       : operands[0];
+      emit_move_insn (gen_lowpart (SImode, split_dest),
+                      gen_lowpart (SImode, operands[1]));
+      emit_move_insn (gen_highpart (SImode, split_dest),
+                      gen_highpart (SImode, operands[1]));
+      if (split_dest != operands[0])
+        emit_insn (gen_movdi (operands[0], split_dest));
+      DONE;
+    }
   
 )
 


Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-03 Thread Martin Jambor
Hi,

On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:
 Trevor Saunders tbsau...@tbsaunde.org writes:
  On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
  Martin Liška mli...@suse.cz writes:
   diff --git a/gcc/asan.c b/gcc/asan.c
   index e89817e..dabd6f1 100644
   --- a/gcc/asan.c
   +++ b/gcc/asan.c
   @@ -362,20 +362,20 @@ struct asan_mem_ref
  /* Pool allocation new operator.  */
  inline void *operator new (size_t)
  {
   -    return pool.allocate ();
   +    return ::new (pool.allocate ()) asan_mem_ref ();
  }

  /* Delete operator utilizing pool allocation.  */
  inline void operator delete (void *ptr)
  {
   -    pool.remove ((asan_mem_ref *) ptr);
   +    pool.remove (ptr);
  }

  /* Memory allocation pool.  */
   -  static pool_allocator<asan_mem_ref> pool;
   +  static pool_allocator pool;
};
  
  I'm probably going over old ground/wounds, sorry, but what's the benefit
  of having this sort of pattern?  Why not simply have object_allocators
  and make callers use pool.allocate () and pool.remove (x) (with pool.remove
  calling the destructor) instead of new and delete?  It feels wrong to me
  to tie the data type to a particular allocation object like this.
 
  Well the big question is what does allocate() do about construction?  It
  seems weird for it to not call the ctor, but I'm not sure we can do a
  good job of forwarding args to allocate() with C++98.
 
 If you need non-default constructors then:
 
   new (pool) type (aaa, bbb)...;
 
 doesn't seem too bad.  I agree object_allocator's allocate () should call
 the constructor.
 

but then the pool allocator must not call placement new on the
allocated memory itself because that would result in double
construction.  And calling placement new was explicitly requested in
the previous thread about allocators, so we still need two kinds of
allocators, typed and untyped.  Or just the untyped allocators and
requiring that users construct their objects via placement new.  In
fact, they might have to call placement new even if there is no
constructor because of the weird aliasing issue.  Two kinds of
pool-allocators seem the lesser evil to me.
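The double-construction concern can be sketched in standalone C++. Everything here (`raw_pool`, `object_pool`, `mem_ref`) is hypothetical and much simpler than GCC's pool_allocator. The untyped pool only hands out raw blocks, so a class-level `operator new` built on it must not construct the object itself: the new-expression already runs the constructor. The typed pool constructs in `allocate ()` via placement new and destroys in `remove ()`, so its callers must not construct again. The variadic argument forwarding needs C++11, whereas GCC itself was limited to C++98 at the time, as noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical untyped pool: hands out raw, uninitialized blocks of one
// fixed size and recycles blocks that are returned to it.
class raw_pool {
public:
  explicit raw_pool(std::size_t block_size) : block_size_(block_size) {}
  raw_pool(const raw_pool &) = delete;
  raw_pool &operator=(const raw_pool &) = delete;
  ~raw_pool() {
    for (void *p : free_)
      ::operator delete(p);
  }
  void *allocate() {
    if (!free_.empty()) {
      void *p = free_.back();
      free_.pop_back();
      return p;
    }
    return ::operator new(block_size_);
  }
  // Takes a block back for reuse; no destructor is run here.
  void remove(void *p) { free_.push_back(p); }

private:
  std::size_t block_size_;
  std::vector<void *> free_;
};

// Hypothetical typed pool: allocate() constructs, remove() destroys.
template <typename T>
class object_pool {
public:
  object_pool() : raw_(sizeof(T)) {}
  template <typename... Args>
  T *allocate(Args &&...args) {
    return ::new (raw_.allocate()) T(std::forward<Args>(args)...);
  }
  void remove(T *obj) {
    obj->~T();
    raw_.remove(obj);
  }

private:
  raw_pool raw_;
};

// A type whose operator new/delete sit on top of the untyped pool.
// operator new returns raw memory only; the constructor runs exactly once,
// as part of the new-expression.
struct mem_ref {
  static int constructions;
  static raw_pool pool;
  int start, size;
  mem_ref(int s, int n) : start(s), size(n) { ++constructions; }
  static void *operator new(std::size_t) { return pool.allocate(); }
  static void operator delete(void *p) { pool.remove(p); }
};
int mem_ref::constructions = 0;
raw_pool mem_ref::pool(sizeof(mem_ref));

// Creates one object through each interface; returns constructor calls.
int constructions_for_two_objects() {
  int before = mem_ref::constructions;
  mem_ref *a = new mem_ref(1, 2);     // class operator new, then the ctor
  delete a;
  object_pool<mem_ref> typed;
  mem_ref *b = typed.allocate(3, 4);  // placement new inside allocate()
  typed.remove(b);
  return mem_ref::constructions - before;
}

// True when a block freed with delete is handed out again by new.
bool class_new_reuses_blocks() {
  mem_ref *a = new mem_ref(0, 0);
  void *first = a;
  delete a;
  mem_ref *b = new mem_ref(0, 0);
  bool same = (static_cast<void *>(b) == first);
  delete b;
  return same;
}
```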

  And using the pool allocator functions directly has the nice property
  that you can tell when a delete/remove isn't necessary because the pool
  itself is being cleared.
 
  Well, all these cases involve a pool with static storage lifetime, right?
  So actually if you don't delete things in these pools they are
  effectively leaked.
 
 They might have a static storage lifetime now, but it doesn't seem like
 a good idea to hard-bake that into the interface

Does that mean that operators new and delete are considered evil?

 (by saying that for
 these types you should use new and delete, but for other pool-allocated
 types you should use object_allocators).

Depending on what kind of pool allocator you use, you will be forced
to either call placement new or not, so the inconsistency will be
there anyway.

I'm using pool allocators for classes with non-default constructors a
lot in the HSA branch so I'd appreciate an early settlement of this
issue.  I think I slightly prefer overloading new and delete to using
placement new (at least in new code) because then users just allocate
stuff as usual and there is one central point where things can be
changed.  But I do not have strong feelings and will comply with
whatever we can agree on.

 Maybe I just have bad memories
 from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
 of state that was obviously static in the old days, but that needed
 to become non-static to support vaguely-efficient switching between
 different subtargets.  The same kind of thing is likely to happen again.
 I assume things like the jit would prefer not to have new global state
 with load-time construction.

I'm not sure I follow this branch of the discussion; surely allocators
of any kind can themselves be dynamically allocated?

Thanks,

Martin


RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-03 Thread Marek Polacek
This patch implements a new pass, called laddress, which deals with
lowering ADDR_EXPR assignments.  Such lowering ought to help the
vectorizer, but it also could expose more CSE opportunities, maybe
help reassoc, etc.  It's only active when optimize != 0.

So e.g.
  _1 = (sizetype) i_9;
  _7 = _1 * 4;
  _4 = &b + _7;
instead of
  _4 = &b[i_9];
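The address arithmetic in that lowering can be checked in plain C++: the sketch below computes `&b[i]` directly and again step by step, mirroring `_1`, `_7` and `_4` from the dump. The global `b` and both function names are illustrative only, and `sizeof (int)` stands in for the constant 4, which assumes a 4-byte `int` on the target.

```cpp
#include <cassert>
#include <cstddef>

int b[1024];  // illustrative array standing in for 'b' in the dump

// The address as written in the source: &b[i].
int *addr_direct(int i) { return &b[i]; }

// The same address computed the way the lowered GIMPLE does it.
int *addr_lowered(int i) {
  std::size_t t1 = static_cast<std::size_t>(i);  // _1 = (sizetype) i_9;
  std::size_t t7 = t1 * sizeof(int);             // _7 = _1 * 4;
  // _4 = &b + _7;  (a byte offset from the start of the array)
  return reinterpret_cast<int *>(reinterpret_cast<char *>(b) + t7);
}
```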

This triggered 14105 times during the regtest and 6392 times during
the bootstrap.

The fallout (at least on x86_64) is surprisingly small, i.e. none, just
gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
to a bug in the vectorizer.  Jakub has a patch and knows the details.
As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
(that was the motivation of this pass).

This doesn't introduce any kind of verification nor PROP_laddress.
Don't know if we want that, but hopefully it can be done as a follow-up
if we do.  Do we want to move some optimizations into this new pass, e.g.
from fwprop?

Thoughts?

Bootstrapped/regtested on x86_64-linux.

2015-07-03  Marek Polacek  pola...@redhat.com

PR tree-optimization/66718
* Makefile.in (OBJS): Add tree-ssa-laddress.o. 
* passes.def: Schedule pass_laddress.
* timevar.def (DEFTIMEVAR): Add TV_TREE_LADDRESS.
* tree-pass.h (make_pass_laddress): Declare.
* tree-ssa-laddress.c: New file.

* gcc.dg/vect/vect-126.c: New test.

diff --git gcc/Makefile.in gcc/Makefile.in
index 89eda96..2574b98 100644
--- gcc/Makefile.in
+++ gcc/Makefile.in
@@ -1447,6 +1447,7 @@ OBJS = \
tree-ssa-dse.o \
tree-ssa-forwprop.o \
tree-ssa-ifcombine.o \
+   tree-ssa-laddress.o \
tree-ssa-live.o \
tree-ssa-loop-ch.o \
tree-ssa-loop-im.o \
diff --git gcc/passes.def gcc/passes.def
index 0d8356b..ac16e8a 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -214,6 +214,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_split_crit_edges);
+  NEXT_PASS (pass_laddress);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
   NEXT_PASS (pass_asan);
diff --git gcc/testsuite/gcc.dg/vect/vect-126.c gcc/testsuite/gcc.dg/vect/vect-126.c
index e69de29..66a5821 100644
--- gcc/testsuite/gcc.dg/vect/vect-126.c
+++ gcc/testsuite/gcc.dg/vect/vect-126.c
@@ -0,0 +1,64 @@
+/* PR tree-optimization/66718 */
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target avx_runtime } } */
+
+int *a[1024], b[1024];
+struct S { int u, v, w, x; };
+struct S c[1024];
+int d[1024][10];
+
+void
+f0 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &b[0];
+}
+
+void
+f1 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    {
+      int *p = &b[0];
+      a[i] = p + i;
+    }
+}
+
+void
+f2 (int *p)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &p[i];
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &b[i];
+}
+
+void
+f4 (void)
+{
+  int *p = &c[0].v;
+  for (int i = 0; i < 1024; i++)
+    a[i] = &p[4 * i];
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &c[i].v;
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    for (unsigned int j = 0; j < 10; j++)
+      a[i] = &d[i][j];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 7 "vect" { target vect_condition } } } */
diff --git gcc/timevar.def gcc/timevar.def
index efac4b7..fcc2fe0 100644
--- gcc/timevar.def
+++ gcc/timevar.def
@@ -275,6 +275,7 @@ DEFTIMEVAR (TV_GIMPLE_SLSR   , "straight-line strength reduction")
 DEFTIMEVAR (TV_VTABLE_VERIFICATION   , "vtable verification")
 DEFTIMEVAR (TV_TREE_UBSAN, "tree ubsan")
 DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl")
+DEFTIMEVAR (TV_TREE_LADDRESS , "address lowering")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL  , "early local passes")
diff --git gcc/tree-pass.h gcc/tree-pass.h
index 2808dad..c47b22e 100644
--- gcc/tree-pass.h
+++ gcc/tree-pass.h
@@ -393,6 +393,7 @@ extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt);
 extern unsigned int tail_merge_optimize (unsigned int);
 extern gimple_opt_pass *make_pass_profile (gcc::context *ctxt);
diff --git gcc/tree-ssa-laddress.c gcc/tree-ssa-laddress.c
index e69de29..3f69d7d 100644
--- gcc/tree-ssa-laddress.c
+++ gcc/tree-ssa-laddress.c
@@ -0,0 +1,137 @@
+/* Lower and optimize address expressions.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Marek Polacek pola...@redhat.com
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or 

[PATCH] libgomp: Add comment to clarify last_team usage

2015-07-03 Thread Sebastian Huber
libgomp/ChangeLog
2015-07-03  Sebastian Huber  sebastian.hu...@embedded-brains.de

* libgomp.h (gomp_thread_pool): Comment last_team field.
---
 libgomp/libgomp.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5272f01..5ed0f78 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -458,6 +458,9 @@ struct gomp_thread_pool
   struct gomp_thread **threads;
   unsigned threads_size;
   unsigned threads_used;
+  /* The last team is used for non-nested teams to delay their destruction to
+     make sure all the threads in the team move on to the pool's barrier before
+     the team's barrier is destroyed.  */
   struct gomp_team *last_team;
   /* Number of threads running in this contention group.  */
   unsigned long threads_busy;
-- 
1.8.4.5



Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)

2015-07-03 Thread Richard Biener
On July 3, 2015 4:06:26 PM GMT+02:00, Jakub Jelinek ja...@redhat.com wrote:
On Fri, Jul 03, 2015 at 03:41:29PM +0200, Richard Biener wrote:
  The fallout (at least on x86_64) is surprisingly small, i.e. none, just
  gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due
  to a bug in the vectorizer.  Jakub has a patch and knows the details.
  As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants
  (that was the motivation of this pass).

Here is the fix for that.

The problem is that for simd clone calls, if they have void return type,
STMT_VINFO_VECTYPE is NULL.  If vectorize_simd_clone_call succeeds,
that is fine, but if it doesn't, we can fall into all the other
vectorizable_* functions, and some of them compute some variables
IMHO prematurely.  It doesn't make sense to compute nunits/ncopies
etc. if stmt isn't even an assignment etc.
So, this patch adjusts the few routines that had this problem,
so that we check is_gimple_assign and gimple_assign_rhs_code or whatever
is the quick GIMPLE test those functions use to find if stmt is of interest
to them, and only when it is, compute whatever they need later.
As NULL STMT_VINFO_VECTYPE can happen only for calls, all these functions
don't ICE anymore.

Ok for trunk if it passes bootstrap/regtest?

OK.

Thanks,
Richard.

In the pr59984.c testcase, with Marek's patch and this patch, one loop in
test is already vectorized (the ICE was on the other one), I'll work on
recognizing multiples of GOMP_SIMD_LANE () as linear next, so that we
vectorize also the loop with bar.  Without Marek's patch we weren't
vectorizing any of the two loops.

2015-07-03  Jakub Jelinek  ja...@redhat.com

   PR tree-optimization/66718
   * tree-vect-stmts.c (vectorizable_assignment, vectorizable_store,
   vectorizable_load, vectorizable_condition): Move vectype,
   nunits, ncopies computation after checking what kind of statement
   stmt is.
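The reordering can be illustrated with a hypothetical, heavily simplified C++ model; none of these types are GCC's. The routine first checks whether the statement is one it handles, and only then dereferences the vector type, which may be null for a call returning void. `ncopies` follows the same formula as the patch: 1 for SLP, otherwise the vectorization factor divided by the number of vector lanes.

```cpp
#include <cassert>

// Hypothetical model types, not GCC's.
enum class stmt_kind { assign, call };

struct vec_type { unsigned nunits; };

struct stmt_model {
  stmt_kind kind;
  const vec_type *vectype;  // may be null, e.g. for a call returning void
  bool slp;
};

// Returns the number of vector copies needed, or 0 when the statement is
// not an assignment.  The kind check comes first, so a null vectype on a
// call is never dereferenced.
unsigned assignment_ncopies(const stmt_model &s, unsigned vf) {
  if (s.kind != stmt_kind::assign)
    return 0;
  unsigned nunits = s.vectype->nunits;
  unsigned ncopies = s.slp ? 1 : vf / nunits;
  assert(ncopies >= 1);
  return ncopies;
}

// Example four-lane vector type used by the checks.
const vec_type v4si = {4};
```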

--- gcc/tree-vect-stmts.c.jj   2015-06-30 14:08:45.0 +0200
+++ gcc/tree-vect-stmts.c  2015-07-03 14:06:28.843573210 +0200
@@ -4043,13 +4043,11 @@ vectorizable_assignment (gimple stmt, gi
   tree scalar_dest;
   tree op;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   tree new_temp;
   tree def;
   gimple def_stmt;
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int i, j;
   vec<tree> vec_oprnds = vNULL;
@@ -4060,16 +4058,6 @@ vectorizable_assignment (gimple stmt, gi
   enum tree_code code;
   tree vectype_in;
 
-  /* Multiple types in SLP are handled by creating the appropriate number of
-     vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in
-     case of SLP.  */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
-    ncopies = 1;
-  else
-    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
-
-  gcc_assert (ncopies >= 1);
-
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
 
@@ -4095,6 +4083,19 @@ vectorizable_assignment (gimple stmt, gi
   if (code == VIEW_CONVERT_EXPR)
     op = TREE_OPERAND (op, 0);
 
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  /* Multiple types in SLP are handled by creating the appropriate number of
+     vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in
+     case of SLP.  */
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+
   if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
                              &def_stmt, &def, &dt[0], &vectype_in))
     {
@@ -5006,7 +5007,6 @@ vectorizable_store (gimple stmt, gimple_
   tree vec_oprnd = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree elem_type;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
@@ -5020,7 +5020,6 @@ vectorizable_store (gimple stmt, gimple_
   tree dataref_ptr = NULL_TREE;
   tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int j;
   gimple next_stmt, first_stmt = NULL;
@@ -5039,28 +5038,6 @@ vectorizable_store (gimple stmt, gimple_
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   tree aggr_type;
 
-  if (loop_vinfo)
-    loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  /* Multiple types in SLP are handled by creating the appropriate number of
-     vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in
-     case of SLP.  */
-  if (slp || PURE_SLP_STMT (stmt_info))
-    ncopies = 1;
-  else
-    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) 

Re: [PATCH 0/2][trunk+5 backport][ARM] PR/65956 Implement AAPCS updates for alignment attribute

2015-07-03 Thread Richard Biener
On July 3, 2015 5:24:24 PM GMT+02:00, Alan Lawrence alan.lawre...@arm.com wrote:
This patch series implements the changes/additions to the ARM ABI
proposed at https://gcc.gnu.org/ml/gcc/2015-07/msg00040.html .

The first patch is the ABI update. This is an ABI-breaking change for
any code using __attribute__((aligned(...))) on a public interface (a case
not previously defined by the AAPCS).

This causes a regression of gcc.c-torture/execute/20040709-1.c at -O0
(only), and align_rec2.c fails, both due to a latent bug where we can
emit strd/ldrd on an odd-numbered register in ARM state. The second patch
prevents such illegal instructions and fixes both tests.

On trunk, tested via bootstrap + check-gcc on arm-none-linux-gnueabihf
(cortex-a15+neon). Also cross-tested arm-none-eabi with a number of
variants.

On gcc-5-branch, patches rebase cleanly, tested via profiledbootstrap +
check-gcc. (Yes, profiledbootstrap succeeds.)

Just FYI, the back port is OK to apply once the trunk side is approved.

Thanks,
Richard.




[patch] libstdc++/66742 use allocators correctly in list::sort()

2015-07-03 Thread Jonathan Wakely

In list::sort() we use 65 list objects as temporary storage,
splicing and swapping elements between lists.

However the lists are default constructed, with no allocator argument,
which is wrong because the allocator type might not be default
constructible, and even more wrong because splicing and merging
between lists with non-equal allocators is undefined behaviour.

So if this->get_allocator() != allocator_type() then we have undefined
behaviour.

The attached patch replaces the fixed-size array of 64
default-constructed lists with a new container-like type,
_ListSortBuf, which is initially empty (so we don't create lists we
don't need) but will grow up to a maximum size (which I kept at 64).
As the container grows it initializes new elements with the correct
allocator, so that every list used in the sort uses the same
allocator.

As well as reducing the number of lists we construct when sorting this
also allows us to range-check and ensure we don't overflow the
fixed-size array (we now get an exception if that happens, although
that's probably not possible even on a 64-bit machine).

Unfortunately this seems to hurt performance; presumably the extra
indirections through the _ListSortBuf, rather than just an array of lists,
confuse the optimisers.

Does anyone see any better solution to this? (other than rewriting the
whole sort function, which I think has been proposed)
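One way to see the constraint is a standalone bottom-up merge sort in the style of list::sort in which every temporary list is constructed from the sorted list's allocator. This is a hypothetical C++11 sketch, not the proposed patch: `tagged_alloc` is an invented stateful allocator with no default constructor, and a `std::vector` of lists stands in for `_ListSortBuf`, growing on demand instead of using a fixed-size buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

// Invented stateful allocator with no default constructor.
template <typename T>
struct tagged_alloc {
  using value_type = T;
  int tag;
  explicit tagged_alloc(int t) : tag(t) {}
  template <typename U>
  tagged_alloc(const tagged_alloc<U> &other) : tag(other.tag) {}
  T *allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T *p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const tagged_alloc<T> &a, const tagged_alloc<U> &b) {
  return a.tag == b.tag;
}
template <typename T, typename U>
bool operator!=(const tagged_alloc<T> &a, const tagged_alloc<U> &b) {
  return !(a == b);
}

// Bottom-up merge sort: every temporary list shares L's allocator, so all
// splice/merge/swap calls are between lists with equal allocators.
template <typename T, typename A>
void sort_with_allocator(std::list<T, A> &l) {
  if (l.size() < 2)
    return;
  std::list<T, A> carry(l.get_allocator());
  // Bucket i is either empty or holds a sorted run of 2^i elements.
  std::vector<std::list<T, A>> buckets;
  while (!l.empty()) {
    carry.splice(carry.begin(), l, l.begin());
    std::size_t i = 0;
    for (; i < buckets.size() && !buckets[i].empty(); ++i) {
      buckets[i].merge(carry);
      carry.swap(buckets[i]);
    }
    if (i == buckets.size())
      buckets.emplace_back(l.get_allocator());  // grow with the right allocator
    carry.swap(buckets[i]);
  }
  for (std::size_t i = 1; i < buckets.size(); ++i)
    buckets[i].merge(buckets[i - 1]);
  l.swap(buckets.back());
}

// Sorts a copy of IN held in a list that uses tagged_alloc.
std::vector<int> sorted_copy(const std::vector<int> &in) {
  std::list<int, tagged_alloc<int>> l(in.begin(), in.end(),
                                      tagged_alloc<int>(42));
  sort_with_allocator(l);
  return std::vector<int>(l.begin(), l.end());
}
```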
commit ba5b393a09022907f9aee2b539ad14fc1fc8a42d
Author: Jonathan Wakely jwak...@redhat.com
Date:   Fri Jul 3 15:20:36 2015 +0100

	PR libstdc++/66742
	* include/bits/list.tcc (_ListSortBuf): Define.
	(list::sort): Use _ListSortBuf.
	* testsuite/23_containers/list/operations/66742.cc: New.

diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc
index 714d9b5..f335af4 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -440,6 +440,53 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	  }
   }
 
+  // A simple array-like container with a fixed maximum size.
+  template<typename _List, size_t _Size>
+    class _ListSortBuf
+    {
+      template<typename _Tp, typename _Alloc>
+        friend class list;
+
+      struct __attribute__((__aligned__(__alignof__(_List)))) _Elem
+      {
+        unsigned char _M_buf[sizeof(_List)];
+        operator _List*() { return reinterpret_cast<_List*>(_M_buf); }
+      };
+      _Elem _M_buf[_Size];
+
+      typedef typename _List::allocator_type allocator_type;
+
+      struct _Impl : allocator_type
+      {
+        _Impl(const allocator_type& __a) : allocator_type(__a), _M_size(0) { }
+        size_t _M_size;
+      } _M_impl;
+
+      _ListSortBuf(const allocator_type& __a) : _M_impl(__a) { }
+
+      ~_ListSortBuf()
+      {
+        while (_M_impl._M_size)
+          static_cast<_List*>(_M_buf[--_M_impl._M_size])->~_List();
+      }
+
+      _List* begin() { return _M_buf[0]; }
+      _List* end() { return begin() + _M_impl._M_size; }
+
+      void _M_grow()
+      {
+        if (_M_impl._M_size == _Size)
+          __throw_bad_alloc();
+        ::new(static_cast<void*>(_M_buf + _M_impl._M_size)) _List(_M_impl);
+        ++_M_impl._M_size;
+      }
+
+      bool empty() const { return _M_impl._M_size == 0; }
+
+      _ListSortBuf(const _ListSortBuf&);
+      _ListSortBuf& operator=(const _ListSortBuf&);
+    };
+
   template<typename _Tp, typename _Alloc>
     void
     list<_Tp, _Alloc>::
@@ -448,33 +495,36 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       // Do nothing if the list has length 0 or 1.
       if (this->_M_impl._M_node._M_next != &this->_M_impl._M_node
           && this->_M_impl._M_node._M_next->_M_next != &this->_M_impl._M_node)
-      {
-        list __carry;
-        list __tmp[64];
-        list * __fill = __tmp;
-        list * __counter;
+        {
+          list __carry(get_allocator());
+          _ListSortBuf<list, 64> __tmp(get_allocator());
+          list* __counter;
 
-        do
-          {
-            __carry.splice(__carry.begin(), *this, begin());
+          do
+            {
+              __carry.splice(__carry.begin(), *this, begin());
 
-            for(__counter = __tmp;
-                __counter != __fill && !__counter->empty();
-                ++__counter)
-              {
-                __counter->merge(__carry);
-                __carry.swap(*__counter);
-              }
-            __carry.swap(*__counter);
-            if (__counter == __fill)
-              ++__fill;
-          }
-        while ( !empty() );
+              for(__counter = __tmp.begin();
+                  __counter != __tmp.end() && !__counter->empty();
+                  ++__counter)
+                {
+                  __counter->merge(__carry);
+                  __carry.swap(*__counter);
+                }
+              if (__counter == __tmp.end())
+                __tmp._M_grow();
+              __carry.swap(*__counter);
+            }
+          while ( !empty() );
 
-        for (__counter = __tmp + 1; __counter != __fill; ++__counter)
-          __counter->merge(*(__counter - 1));
-        swap( *(__fill - 1) );
-      }
+          if (!__tmp.empty())
+            {
+              for (__counter = __tmp.begin() + 1; __counter != __tmp.end();
+                   ++__counter)
+                __counter->merge(*(__counter - 1));
+              swap( *(__tmp.end() - 1) );
+            }
+        }
     }
 
   template<typename _Tp, typename _Alloc>
@@ -526,31 +576,34 @@ 

Re: [PATCH 2/2][ARM] fix movdi expander to avoid illegal ldrd/strd

2015-07-03 Thread Richard Earnshaw
On 03/07/15 16:27, Alan Lawrence wrote:
 The previous patch caused a regression in
 gcc.c-torture/execute/20040709-1.c at -O0 (only), and the new
 align_rec2.c test fails, both outputting an illegal assembler
 instruction (ldrd on an odd-numbered reg) from output_move_double in
 arm.c. Most routes have checks against such an illegal instruction, but
 expanding a function call can directly name such impossible register
 (pairs), bypassing the normal checks.
 
 gcc/ChangeLog:
 
 * config/arm/arm.md (movdi): Avoid odd-number ldrd/strd in ARM state.
 

OK.

R.

 arm_overalign_2.patch
 
 
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 164ac13a26289bf755c89e78a8a5f751883c6039..c6718282d2555f8cf9a4e9111b1393e1f7704983 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -5415,6 +5415,42 @@
if (!REG_P (operands[0]))
   operands[1] = force_reg (DImode, operands[1]);
  }
 +  if (REG_P (operands[0]) && REGNO (operands[0]) < FIRST_VIRTUAL_REGISTER
 +      && !HARD_REGNO_MODE_OK (REGNO (operands[0]), DImode))
 +{
 +  /* Avoid LDRD's into an odd-numbered register pair in ARM state
 +  when expanding function calls.  */
 +  gcc_assert (can_create_pseudo_p ());
 +  if (MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))
 + {
 +   /* Perform load into legal reg pair first, then move.  */
 +   rtx reg = gen_reg_rtx (DImode);
 +   emit_insn (gen_movdi (reg, operands[1]));
 +   operands[1] = reg;
 + }
 +  emit_move_insn (gen_lowpart (SImode, operands[0]),
 +   gen_lowpart (SImode, operands[1]));
 +  emit_move_insn (gen_highpart (SImode, operands[0]),
 +   gen_highpart (SImode, operands[1]));
 +  DONE;
 +}
 +  else if (REG_P (operands[1]) && REGNO (operands[1]) < FIRST_VIRTUAL_REGISTER
 +	   && !HARD_REGNO_MODE_OK (REGNO (operands[1]), DImode))
 +{
 +  /* Avoid LDRD's into an odd-numbered register pair in ARM state
 +  when expanding function prologue.  */
 +  gcc_assert (can_create_pseudo_p ());
 +  rtx split_dest = (MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))
 +? gen_reg_rtx (DImode)
 +: operands[0];
 +  emit_move_insn (gen_lowpart (SImode, split_dest),
 +   gen_lowpart (SImode, operands[1]));
 +  emit_move_insn (gen_highpart (SImode, split_dest),
 +   gen_highpart (SImode, operands[1]));
 +  if (split_dest != operands[0])
 + emit_insn (gen_movdi (operands[0], split_dest));
 +  DONE;
 +}

  )
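For readers less familiar with the constraint behind this expander: in ARM state an LDRD/STRD needs its first transfer register to be even-numbered (and not r14, whose partner would be the PC), so a DImode value bound for an odd-numbered pair has to be moved as two SImode halves instead. The C++ sketch below models only that decision and the low/high split, assuming a little-endian target; `ldrd_regno_ok`, `di_load_insns`, `si_pair` and `split_di` are hypothetical names, not GCC functions.

```cpp
#include <cstdint>

// Hypothetical model: can an ARM-state LDRD/STRD start at core register
// `regno`? The first register must be even, and r14 is excluded because
// its partner would be r15 (the PC).
constexpr bool ldrd_regno_ok(unsigned regno)
{
    return regno < 14 && regno % 2 == 0;
}

// Instructions needed to load a DImode value into the pair starting at
// `regno`: one LDRD when legal, otherwise two single-word loads.
constexpr int di_load_insns(unsigned regno)
{
    return ldrd_regno_ok(regno) ? 1 : 2;
}

// Conceptual analogue of gen_lowpart/gen_highpart: the two SImode halves
// of a DImode value on a little-endian target.
struct si_pair
{
    std::uint32_t low;
    std::uint32_t high;
};

constexpr si_pair split_di(std::uint64_t value)
{
    return si_pair{static_cast<std::uint32_t>(value),
                   static_cast<std::uint32_t>(value >> 32)};
}
```

For example, `di_load_insns(3)` is 2, which corresponds to the two `emit_move_insn` calls in the expander.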
  
 



Re: [patch] libstdc++/66742 use allocators correctly in list::sort()

2015-07-03 Thread Daniel Krügler
2015-07-03 17:51 GMT+02:00 Jonathan Wakely jwak...@redhat.com:
 As well as reducing the number of lists we construct when sorting this
 also allows us to range-check and ensure we don't overflow the
 fixed-size array (we now get an exception if that happens, although
 that's probably not possible even on a 64-bit machine).

 Unfortunately this seems to hurt performance, presumably the extra
 indirections to the _ListSortBuf rather than just an array of lists
 confuse the optimisers.

 Does anyone see any better solution to this? (other than rewriting the
 whole sort function, which I think has been proposed)

I have not yet thought about better solutions, but:

- Isn't it necessary to cope with possibly final allocators when
unconditionally forming the derived member class

struct _Impl : allocator_type

? Maybe you could just define that as a non-deriving aggregate?

- Daniel
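The bin-based bottom-up merge sort being restructured in this patch can be sketched in isolation. The C++ sketch below assumes plain `std::list` and `std::vector`, so it deliberately ignores the allocator-propagation problem the patch is actually about. Bin k holds either nothing or one sorted run; each new element is carried upward through occupied bins, and a fixed cap of 64 bins is enforced with an exception, mirroring the range check Jonathan describes. `bottom_up_sort` is a hypothetical name, not libstdc++ code.

```cpp
#include <cstddef>
#include <list>
#include <stdexcept>
#include <vector>

// Hypothetical bottom-up merge sort over std::list using "bins" of sorted
// runs, in the style of libstdc++'s list::sort().
template <typename T>
void bottom_up_sort(std::list<T>& lst)
{
    if (lst.size() < 2)
        return;

    const std::size_t max_bins = 64;   // fixed capacity, as in the patch
    std::vector<std::list<T>> bins;    // bins[k] is empty or a sorted run
    std::list<T> carry;

    while (!lst.empty()) {
        // Move one element from the input into the carry list.
        carry.splice(carry.begin(), lst, lst.begin());

        std::size_t k = 0;
        // Merge the carry upward through occupied bins.
        for (; k < bins.size() && !bins[k].empty(); ++k) {
            bins[k].merge(carry);   // bins[k] now holds the merged run
            carry.swap(bins[k]);    // carry takes it, bins[k] becomes empty
        }
        if (k == bins.size()) {
            if (bins.size() == max_bins)
                throw std::length_error("bottom_up_sort: too many bins");
            bins.emplace_back();
        }
        carry.swap(bins[k]);        // park the run in the first free bin
    }

    // Merge the remaining runs together, lowest bin first.
    for (std::size_t k = 1; k < bins.size(); ++k)
        bins[k].merge(bins[k - 1]);
    lst.swap(bins.back());
}
```

Usage: after `std::list<int> l{5, 3, 9, 1}; bottom_up_sort(l);` the list holds 1, 3, 5, 9.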


Re: [patch] libstdc++/66742 use allocators correctly in list::sort()

2015-07-03 Thread Jonathan Wakely

On 03/07/15 18:56 +0200, Daniel Krügler wrote:

- Isn't it necessary to cope with possibly final allocators when
unconditionally forming the derived member class

struct _Impl : allocator_type


If the allocator was final we couldn't even instantiate std::list
because of this in _List_base:

 struct _List_impl
 : public _Node_alloc_type
 {

This is https://gcc.gnu.org/PR60921 and I'm going to fix it
everywhere, so I'm not concerned about this one place yet.
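Daniel's concern can be illustrated with a common workaround for `final` allocators, sketched here in C++14 (for `std::is_final`): inherit from the allocator only when it is empty and not `final`, and otherwise store it as a data member. This is a generic sketch with hypothetical names (`alloc_holder`, `final_int_allocator`); it is not the fix planned for PR60921.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical helper: derive from the allocator (empty-base optimisation)
// only when that is allowed and useful, i.e. the type is empty and not final.
template <typename Alloc,
          bool UseBase = std::is_empty<Alloc>::value
                         && !std::is_final<Alloc>::value>
struct alloc_holder : private Alloc
{
    explicit alloc_holder(const Alloc& a) : Alloc(a) {}
    Alloc& get() { return static_cast<Alloc&>(*this); }
};

// Fallback for final (or stateful) allocators: keep a data member instead.
template <typename Alloc>
struct alloc_holder<Alloc, false>
{
    explicit alloc_holder(const Alloc& a) : alloc_(a) {}
    Alloc& get() { return alloc_; }

private:
    Alloc alloc_;
};

// An allocator that cannot be used as a base class.
struct final_int_allocator final : std::allocator<int>
{
};
```

With this split, `alloc_holder<final_int_allocator>` still compiles, at the cost of not being an empty class.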


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-03 Thread Richard Biener
On July 3, 2015 6:11:13 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote:
On 03/07/15 16:26, Alan Lawrence wrote:
 These include tests of structs, scalars, and vectors - only
 general-purpose registers are affected by the ABI rules for alignment,
 but we can restrict the vector test to use the base AAPCS.
 
 Prior to this patch, align2.c, align3.c and align_rec1.c were failing
 (the latter showing an internal inconsistency, the first two merely that
 GCC did not obey the new ABI).
 
 With this patch, the align_rec2.c fails, and also
 gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent
 bug where we can emit strd/ldrd on an odd-numbered register in ARM
 state, fixed by the second patch.
 
 gcc/ChangeLog:
 
 * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer
 alignment attribute, exploring one level down for aggregates.
 
 gcc/testsuite/ChangeLog:
 
 * gcc.target/arm/aapcs/align1.c: New.
 * gcc.target/arm/aapcs/align_rec1.c: New.
 * gcc.target/arm/aapcs/align2.c: New.
 * gcc.target/arm/aapcs/align_rec2.c: New.
 * gcc.target/arm/aapcs/align3.c: New.
 * gcc.target/arm/aapcs/align_rec3.c: New.
 * gcc.target/arm/aapcs/align4.c: New.
 * gcc.target/arm/aapcs/align_rec4.c: New.
 * gcc.target/arm/aapcs/align_vararg1.c: New.
 * gcc.target/arm/aapcs/align_vararg2.c: New.
 
 arm_overalign_1.patch
 
 
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
  static bool
  arm_needs_doubleword_align (machine_mode mode, const_tree type)
  {
 -  return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
 -	  || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
 +  if (!type)
 +    return PARM_BOUNDARY < GET_MODE_ALIGNMENT (mode);
 +
 +  if (!AGGREGATE_TYPE_P (type))
 +    return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY;
 +
 +  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 +    if (DECL_ALIGN (field) > PARM_BOUNDARY)
 +      return true;
 +

Is this behavior correct for unions or aggregates with record or union members?


Technically this is incorrect since AGGREGATE_TYPE_P includes ARRAY_TYPE
and ARRAY_TYPE doesn't have TYPE_FIELDS.  I doubt we could reach that
case though (unless there's a language that allows passing arrays by value).

For array types I think you need to check TYPE_ALIGN (TREE_TYPE (type)).

R.

 +  return false;
  }
  
  
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
 new file mode 100644
 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
 @@ -0,0 +1,29 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target arm32 } */
 +/* { dg-options "-O" } */
 +
 +#ifndef IN_FRAMEWORK
 +#define TESTFILE "align1.c"
 +
 +typedef __attribute__((aligned (8))) int alignedint;
 +
 +alignedint a = 11;
 +alignedint b = 13;
 +alignedint c = 17;
 +alignedint d = 19;
 +alignedint e = 23;
 +alignedint f = 29;
 +
 +#include "abitest.h"
 +#else
 +  ARG (alignedint, a, R0)
 +  /* Attribute suggests R2, but we should use only natural alignment:  */
 +  ARG (alignedint, b, R1)
 +  ARG (alignedint, c, R2)
 +  ARG (alignedint, d, R3)
 +  ARG (alignedint, e, STACK)
 +  /* Attribute would suggest STACK + 8 but should be ignored:  */
 +  LAST_ARG (alignedint, f, STACK + 4)
 +#endif
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
 new file mode 100644
 index ..992da53c606c793f25278152406582bb993719d2
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
 @@ -0,0 +1,30 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target arm32 } */
 +/* { dg-options "-O" } */
 +
 +#ifndef IN_FRAMEWORK
 +#define TESTFILE "align2.c"
 +
 +/* The underlying struct here has alignment 4.  */
 +typedef struct __attribute__((aligned (8)))
 +  {
 +int x;
 +int y;
 +  } overaligned;
 +
 +/* A couple of instances, at 8-byte-aligned memory locations.  */
 +overaligned a = { 2, 3 };
 +overaligned b = { 5, 8 };
 +
 +#include "abitest.h"
 +#else
 +  ARG (int, 7, R0)
 +  /* Alignment should be 4.  */
 +  ARG (overaligned, a, R1)
 +  ARG (int, 9, R3)
 +  ARG (int, 10, STACK)
 +  /* Alignment should be 4.  */
 +  LAST_ARG (overaligned, b, STACK + 4)
 +#endif
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align3.c b/gcc/testsuite/gcc.target/arm/aapcs/align3.c
 new file mode 100644
 index ..81ad3f587a95aae52ec601ce5a60b198e5351edf
 --- /dev/null
 +++ 

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-03 Thread Richard Earnshaw
On 03/07/15 16:26, Alan Lawrence wrote:
 These include tests of structs, scalars, and vectors - only
 general-purpose registers are affected by the ABI rules for alignment,
 but we can restrict the vector test to use the base AAPCS.
 
 Prior to this patch, align2.c, align3.c and align_rec1.c were failing
 (the latter showing an internal inconsistency, the first two merely that
 GCC did not obey the new ABI).
 
 With this patch, the align_rec2.c fails, and also
 gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent
 bug where we can emit strd/ldrd on an odd-numbered register in ARM
 state, fixed by the second patch.
 
 gcc/ChangeLog:
 
 * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer
 alignment attribute, exploring one level down for aggregates.
 
 gcc/testsuite/ChangeLog:
 
 * gcc.target/arm/aapcs/align1.c: New.
 * gcc.target/arm/aapcs/align_rec1.c: New.
 * gcc.target/arm/aapcs/align2.c: New.
 * gcc.target/arm/aapcs/align_rec2.c: New.
 * gcc.target/arm/aapcs/align3.c: New.
 * gcc.target/arm/aapcs/align_rec3.c: New.
 * gcc.target/arm/aapcs/align4.c: New.
 * gcc.target/arm/aapcs/align_rec4.c: New.
 * gcc.target/arm/aapcs/align_vararg1.c: New.
 * gcc.target/arm/aapcs/align_vararg2.c: New.
 
 arm_overalign_1.patch
 
 
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
  static bool
  arm_needs_doubleword_align (machine_mode mode, const_tree type)
  {
 -  return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
 -	  || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
 +  if (!type)
 +    return PARM_BOUNDARY < GET_MODE_ALIGNMENT (mode);
 +
 +  if (!AGGREGATE_TYPE_P (type))
 +    return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY;
 +
 +  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 +    if (DECL_ALIGN (field) > PARM_BOUNDARY)
 +      return true;
 +

Technically this is incorrect since AGGREGATE_TYPE_P includes ARRAY_TYPE
and ARRAY_TYPE doesn't have TYPE_FIELDS.  I doubt we could reach that
case though (unless there's a language that allows passing arrays by value).

For array types I think you need to check TYPE_ALIGN (TREE_TYPE (type)).

R.
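To make the array case concrete, here is a toy C++ model of the patched rule, extended with the element-type check suggested above for arrays. `type_desc` and `needs_doubleword_align` are hypothetical stand-ins for GCC's tree nodes and for `arm_needs_doubleword_align`; alignments are in bits and PARM_BOUNDARY is taken to be 32, as on 32-bit ARM.

```cpp
#include <vector>

// Toy stand-in for a GCC type node; all names are hypothetical.
// Alignments are in bits.
struct type_desc
{
    enum kind_t { SCALAR, RECORD, ARRAY } kind;
    unsigned main_variant_align;        // like TYPE_ALIGN (TYPE_MAIN_VARIANT (t))
    unsigned type_align;                // like TYPE_ALIGN (t), attributes included
    std::vector<unsigned> field_align;  // RECORD only: DECL_ALIGN of each field
    const type_desc* element;           // ARRAY only: like TREE_TYPE (t)
};

const unsigned parm_boundary = 32;      // PARM_BOUNDARY on 32-bit ARM

// Model of the patched rule, with arrays handled via their element type.
bool needs_doubleword_align(const type_desc& t)
{
    switch (t.kind) {
    case type_desc::SCALAR:
        // An aligned attribute on a typedef is dropped.
        return t.main_variant_align > parm_boundary;
    case type_desc::RECORD:
        // The record's own alignment is ignored; look one level down.
        for (unsigned a : t.field_align)
            if (a > parm_boundary)
                return true;
        return false;
    case type_desc::ARRAY:
        // Suggested handling: arrays have no TYPE_FIELDS.
        return t.element != nullptr && t.element->type_align > parm_boundary;
    }
    return false;
}
```

In this model the `overaligned` struct from align2.c (two `int` fields, record aligned to 64 bits by attribute) does not need doubleword alignment, while an array of 64-bit integers does.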

 +  return false;
  }
  
  
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
 new file mode 100644
 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
 @@ -0,0 +1,29 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target arm32 } */
 +/* { dg-options "-O" } */
 +
 +#ifndef IN_FRAMEWORK
 +#define TESTFILE "align1.c"
 +
 +typedef __attribute__((aligned (8))) int alignedint;
 +
 +alignedint a = 11;
 +alignedint b = 13;
 +alignedint c = 17;
 +alignedint d = 19;
 +alignedint e = 23;
 +alignedint f = 29;
 +
 +#include "abitest.h"
 +#else
 +  ARG (alignedint, a, R0)
 +  /* Attribute suggests R2, but we should use only natural alignment:  */
 +  ARG (alignedint, b, R1)
 +  ARG (alignedint, c, R2)
 +  ARG (alignedint, d, R3)
 +  ARG (alignedint, e, STACK)
 +  /* Attribute would suggest STACK + 8 but should be ignored:  */
 +  LAST_ARG (alignedint, f, STACK + 4)
 +#endif
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
 new file mode 100644
 index ..992da53c606c793f25278152406582bb993719d2
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
 @@ -0,0 +1,30 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target arm32 } */
 +/* { dg-options "-O" } */
 +
 +#ifndef IN_FRAMEWORK
 +#define TESTFILE "align2.c"
 +
 +/* The underlying struct here has alignment 4.  */
 +typedef struct __attribute__((aligned (8)))
 +  {
 +int x;
 +int y;
 +  } overaligned;
 +
 +/* A couple of instances, at 8-byte-aligned memory locations.  */
 +overaligned a = { 2, 3 };
 +overaligned b = { 5, 8 };
 +
 +#include "abitest.h"
 +#else
 +  ARG (int, 7, R0)
 +  /* Alignment should be 4.  */
 +  ARG (overaligned, a, R1)
 +  ARG (int, 9, R3)
 +  ARG (int, 10, STACK)
 +  /* Alignment should be 4.  */
 +  LAST_ARG (overaligned, b, STACK + 4)
 +#endif
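The expected placements in align1.c and align2.c follow from the AAPCS core-register rules: an argument that needs doubleword alignment first rounds the next core register number (NCRN) up to an even register and, once on the stack, rounds the next stacked argument address (NSAA) up to 8 bytes. The simplified C++ model below (hypothetical `arg_desc`, `placement` and `layout`; splitting an argument between r3 and the stack is not modelled) reproduces both tests when the over-aligned types are treated as 4-byte aligned, and shows where `b` in align1.c would have gone had the attribute been honoured.

```cpp
#include <vector>

// Hypothetical argument description: size and alignment in bytes.
struct arg_desc
{
    unsigned size;
    unsigned align;
};

// Where an argument ended up: first core register (0..3) or stack offset.
struct placement
{
    bool in_reg;
    unsigned where;
};

// Simplified AAPCS base-standard allocation for core registers and stack
// (rules C.3, C.4 and C.6 to C.8); splitting an argument between r3 and
// the stack (rule C.5) is not modelled.
std::vector<placement> layout(const std::vector<arg_desc>& args)
{
    std::vector<placement> out;
    unsigned ncrn = 0;  // next core register number
    unsigned nsaa = 0;  // next stacked argument offset from SP
    for (const arg_desc& a : args) {
        unsigned words = (a.size + 3) / 4;
        bool dword = a.align >= 8;
        if (dword && ncrn % 2 != 0)
            ++ncrn;                             // C.3: round NCRN up to even
        if (words <= 4 - ncrn) {                // C.4: fits in r0-r3
            out.push_back(placement{true, ncrn});
            ncrn += words;
            continue;
        }
        ncrn = 4;                               // C.6: no more core registers
        if (dword)
            nsaa = (nsaa + 7) & ~7u;            // C.7: 8-byte-align the slot
        out.push_back(placement{false, nsaa});  // C.8: copy to the stack
        nsaa += words * 4;
    }
    return out;
}
```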
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align3.c b/gcc/testsuite/gcc.target/arm/aapcs/align3.c
 new file mode 100644
 index ..81ad3f587a95aae52ec601ce5a60b198e5351edf
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align3.c
 @@ -0,0 +1,42 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target 

[Patch, obvious] Guard inform with warning return value

2015-07-03 Thread Paolo Carlini

Hi,

I noticed this nit in a conditional for C++11 attributes. I'm going to
commit the patch below as obvious.


Thanks,
Paolo.

/
2015-07-03  Paolo Carlini  paolo.carl...@oracle.com

* attribs.c (decl_attributes): Guard inform with the return value
of the preceding warning.
Index: attribs.c
===
--- attribs.c   (revision 225384)
+++ attribs.c   (working copy)
@@ -469,10 +469,10 @@ decl_attributes (tree *node, tree attributes, int
  /* This is a c++11 attribute that appertains to a
 type-specifier, outside of the definition of, a class
 type.  Ignore it.  */
- warning (OPT_Wattributes, "attribute ignored");
- inform (input_location,
- "an attribute that appertains to a type-specifier "
- "is ignored");
+ if (warning (OPT_Wattributes, "attribute ignored"))
+   inform (input_location,
+   "an attribute that appertains to a type-specifier "
+   "is ignored");
  continue;
}
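The reasoning behind the guard is that `warning` returns whether the diagnostic was actually emitted (it is not when, for example, `-Wno-attributes` is in effect), so an unguarded `inform` can print an orphaned note. A small C++ sketch of the pattern, with a hypothetical `diag` type standing in for GCC's diagnostic machinery:

```cpp
#include <string>
#include <vector>

// Hypothetical diagnostic sink loosely modelled on GCC's warning()/inform():
// warning() reports whether it actually emitted anything.
struct diag
{
    bool warnings_enabled = true;   // false models -Wno-attributes
    std::vector<std::string> output;

    bool warning(const std::string& msg)
    {
        if (!warnings_enabled)
            return false;
        output.push_back("warning: " + msg);
        return true;
    }

    void inform(const std::string& msg) { output.push_back("note: " + msg); }
};

// Guarded form, as in the patch: the note is only issued together with
// the warning it explains.
void ignore_type_attribute(diag& d)
{
    if (d.warning("attribute ignored"))
        d.inform("an attribute that appertains to a type-specifier "
                 "is ignored");
}
```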
 


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-03 Thread Jakub Jelinek
On Fri, Jul 03, 2015 at 04:26:02PM +0100, Alan Lawrence wrote:
 These include tests of structs, scalars, and vectors - only general-purpose
 registers are affected by the ABI rules for alignment, but we can restrict
 the vector test to use the base AAPCS.
 
 Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the
 latter showing an internal inconsistency, the first two merely that GCC did
 not obey the new ABI).
 
 With this patch, the align_rec2.c fails, and also
 gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug
 where we can emit strd/ldrd on an odd-numbered register in ARM state, fixed
 by the second patch.
 
 gcc/ChangeLog:
 
   * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer
   alignment attribute, exploring one level down for aggregates.

Can you please also add the testcase from
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00278.html
to your patch set?  Or I can commit it separately after it is approved
(if it is).

Jakub


[PATCH] Fixes accidental renaming of gdb.py file (i.e. libstdc++.so.6.0.22-gdb.py)

2015-07-03 Thread Michael Darling
The addition of libstdc++fs broke an inexact and fragile method in the
libstdc++-v3/python makefile, so it mis-names a python script after
libstdc++fs rather than libstdc++.

With DESTDIR /usr/lib, toolexeclibdir ../lib, and the .so version of
6.0.21, this makefile used to install the python script to
/usr/lib/libstdc++.so.6.0.21-gdb.py.

Once libstdc++fs was added, this makefile installs the python script
to /usr/lib/libstdc++fs.a-gdb.py.

This makefile examines files named libstdc++* in
DESTDIR/toolexeclibdir, excluding: symlinks; *.la files; and previous
*-gdb.py files.  Its comments report it is done this way because
libtool hides the real names from us.

This patch changes the makefile so it examines files named libstdc++.*
(notice the addition of the dot.)  Although this is still not an
optimum method, it at least puts the makefile on the right track
again.  Adding the dot is more future-proof than excluding files
starting with libstdc++fs, because of the possibility of future
additions of similarly named libraries.
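The effect of the extra dot can be checked with POSIX `fnmatch`, which implements the same shell pattern syntax as the `for` loop's glob. The sketch below (hypothetical `pick_library`) replays the filtering over an example directory listing in C-locale sorted order. It assumes the loop ends up using the last name that survives the filters (the rest of the loop is not quoted here), and it does not model the symlink check.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Hypothetical model of the install loop: walk a sorted directory listing,
// keep names matching `pattern`, skip *-gdb.py and *.la, and (by assumption)
// remember the last surviving name. Symlink filtering is not modelled.
std::string pick_library(const std::vector<std::string>& sorted_names,
                         const char* pattern)
{
    std::string picked;
    for (const std::string& name : sorted_names) {
        if (fnmatch(pattern, name.c_str(), 0) != 0)
            continue;                   // glob does not match
        if (fnmatch("*-gdb.py", name.c_str(), 0) == 0
            || fnmatch("*.la", name.c_str(), 0) == 0)
            continue;                   // excluded by the case statement
        picked = name;
    }
    return picked;
}
```

With the listing `libstdc++.a`, `libstdc++.la`, `libstdc++.so.6.0.21`, `libstdc++fs.a`, the old pattern `libstdc++*` ends on `libstdc++fs.a`, while `libstdc++.*` ends on `libstdc++.so.6.0.21`.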

The patch below is also an attachment to this email.



Index: libstdc++-v3/ChangeLog
===
--- libstdc++-v3/ChangeLog(revision 225409)
+++ libstdc++-v3/ChangeLog(working copy)
@@ -1,3 +1,9 @@
+2015-07-03  Michael Darling  darli...@gmail.com
+
+* python/Makefile.am: python script name based off libstdc++.* rather
+than libstdc++*, to avoid being mis-named after libstdc++fs.
+* python/Makefile.in: Regenerate.
+
 2015-07-03  Jonathan Wakely  jwak...@redhat.com

 * doc/xml/manual/status_cxx2017.xml: Update status table.
Index: libstdc++-v3/python/Makefile.am
===
--- libstdc++-v3/python/Makefile.am(revision 225409)
+++ libstdc++-v3/python/Makefile.am(working copy)
@@ -45,11 +45,11 @@
 @$(mkdir_p) $(DESTDIR)$(toolexeclibdir)
 ## We want to install gdb.py as SOMETHING-gdb.py.  SOMETHING is the
 ## full name of the final library.  We want to ignore symlinks, the
-## .la file, and any previous -gdb.py file.  This is inherently
-## fragile, but there does not seem to be a better option, because
-## libtool hides the real names from us.
+## .la file, any previous -gdb.py file, and libstdc++fs*.  This is
+## inherently fragile, but there does not seem to be a better option,
+## because libtool hides the real names from us.
 @here=`pwd`; cd $(DESTDIR)$(toolexeclibdir); \
-  for file in libstdc++*; do \
+  for file in libstdc++.*; do \
 case $$file in \
   *-gdb.py) ;; \
   *.la) ;; \
Index: libstdc++-v3/python/Makefile.in
===
--- libstdc++-v3/python/Makefile.in(revision 225409)
+++ libstdc++-v3/python/Makefile.in(working copy)
@@ -547,7 +547,7 @@
 install-data-local: gdb.py
 @$(mkdir_p) $(DESTDIR)$(toolexeclibdir)
 @here=`pwd`; cd $(DESTDIR)$(toolexeclibdir); \
-  for file in libstdc++*; do \
+  for file in libstdc++.*; do \
 case $$file in \
   *-gdb.py) ;; \
   *.la) ;; \


gcc.libstdc++-v3.python.dot.fix.patch


Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-03 Thread Richard Sandiford
Hi Martin,

Martin Liška mli...@suse.cz writes:
 On 07/03/2015 03:07 PM, Richard Sandiford wrote:
 Martin Jambor mjam...@suse.cz writes:
 On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:
 Trevor Saunders tbsau...@tbsaunde.org writes:
 On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
 Martin Liška mli...@suse.cz writes:
 diff --git a/gcc/asan.c b/gcc/asan.c
 index e89817e..dabd6f1 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -362,20 +362,20 @@ struct asan_mem_ref
/* Pool allocation new operator.  */
inline void *operator new (size_t)
{
 -return pool.allocate ();
 +return ::new (pool.allocate ()) asan_mem_ref ();
}
  
/* Delete operator utilizing pool allocation.  */
inline void operator delete (void *ptr)
{
 -pool.remove ((asan_mem_ref *) ptr);
 +pool.remove (ptr);
}
  
/* Memory allocation pool.  */
 -  static pool_allocator<asan_mem_ref> pool;
 +  static pool_allocator pool;
  };

 I'm probably going over old ground/wounds, sorry, but what's the benefit
 of having this sort of pattern?  Why not simply have object_allocators
 and make callers use pool.allocate () and pool.remove (x) (with
 pool.remove
 calling the destructor) instead of new and delete?  It feels wrong to me
 to tie the data type to a particular allocation object like this.

 Well the big question is what does allocate() do about construction?  It
 seems weird for it to not call the ctor, but I'm not sure we can do a
 good job of forwarding args to allocate() with C++98.

 If you need non-default constructors then:

   new (pool) type (aaa, bbb)...;

 doesn't seem too bad.  I agree object_allocator's allocate () should call
 the constructor.


 but then the pool allocator must not call placement new on the
 allocated memory itself because that would result in double
 construction.
 
 But we're talking about two different methods.  The normal allocator
 object_allocator<T>::allocate () would use placement new and return a
 pointer to the new object while operator new (size_t, object_allocator<T> &)
 wouldn't call placement new and would just return a pointer to the memory.
 
 And using the pool allocator functions directly has the nice property
 that you can tell when a delete/remove isn't necessary because the pool
 itself is being cleared.

 Well, all these cases involve a pool with static storage lifetime right?
 so actually if you don't delete things in these pool they are
 effectively leaked.

 They might have a static storage lifetime now, but it doesn't seem like
 a good idea to hard-bake that into the interface

 Does that mean that operators new and delete are considered evil?
 
 Not IMO.  Just that static load-time-initialized caches are not
 necessarily a good thing.  That's effectively what the pool
 allocator is.
 
 (by saying that for
 these types you should use new and delete, but for other pool-allocated
 types you should use object_allocators).

 Depending on what kind of pool allocator you use, you will be forced
 to either call placement new or not, so the inconsistency will be
 there anyway.
 
 But how we handle argument-taking constructors is a problem that needs
 to be solved for the pool-allocated objects that don't use a single
 static type-specific pool.  And once we solve that, we get consistency
 across all pools:
 
 - if you want a new object and argumentless construction is OK,
   use pool.allocate ()
 
 - if you want a new object and need to pass arguments to the constructor,
   use new (pool) some_type (arg1, arg2, ...)
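The two entry points in that list, together with the concern about double construction, can be sketched with a small typed pool. `simple_pool`, `allocate_raw` and `release_raw` are hypothetical names; this is not GCC's `pool_allocator` or `object_allocator`, just an illustration of how `pool.allocate ()` (default construction) and `new (pool) T (args)` (raw memory, constructed exactly once by the new-expression) can coexist.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical typed pool illustrating the interface discussed above.
template <typename T>
class simple_pool
{
public:
    simple_pool() = default;
    simple_pool(const simple_pool&) = delete;
    simple_pool& operator=(const simple_pool&) = delete;

    // Frees every block; objects still alive are not destroyed.
    ~simple_pool()
    {
        for (void* block : blocks_)
            ::operator delete(block);
    }

    // Raw storage for one T, no construction.
    void* allocate_raw()
    {
        if (!free_.empty()) {
            void* block = free_.back();
            free_.pop_back();
            return block;
        }
        void* block = ::operator new(sizeof(T));
        blocks_.push_back(block);
        return block;
    }

    // Return unconstructed (or already destroyed) storage to the pool.
    void release_raw(void* block) { free_.push_back(block); }

    // pool.allocate (): storage plus argument-less construction.
    T* allocate() { return ::new (allocate_raw()) T(); }

    // pool.remove (x): run the destructor, then recycle the storage.
    void remove(T* obj)
    {
        obj->~T();
        release_raw(obj);
    }

private:
    std::vector<void*> blocks_;  // every block obtained from operator new
    std::vector<void*> free_;    // blocks available for reuse
};

// new (pool) T (args...): hands out raw memory only; the new-expression
// then runs T's constructor once, so nothing is constructed twice.
template <typename T>
void* operator new(std::size_t size, simple_pool<T>& pool)
{
    assert(size <= sizeof(T));
    (void)size;
    return pool.allocate_raw();
}

// Matching placement delete, used only if that constructor throws.
template <typename T>
void operator delete(void* block, simple_pool<T>& pool)
{
    pool.release_raw(block);
}
```

Because the pool is an ordinary object here, it can live wherever its owner needs it rather than as a static member of the pooled class.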
 
 Maybe I just have bad memories
 from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
 of state that was obviously static in the old days, but that needed
 to become non-static to support vaguely-efficient switching between
 different subtargets.  The same kind of thing is likely to happen again.
 I assume things like the jit would prefer not to have new global state
 with load-time construction.

 I'm not sure I follow this branch of the discussion, the allocators of
 any kind surely can dynamically allocated themselves?
 
 Sure, but either (a) you keep the pools as a static part of the class
 and some initialisation and finalisation code that has tendrils into
 all such classes or (b) you move the static pool outside of the
 class to some new (still global) state.  Explicit pool allocation,
 like in the C days, gives you the option of putting the pool wherever
 it needs to go without relying on the principle that you can get to
 it from global state.
 
 Thanks,
 Richard
 

 Ok Richard.

 I've just finally understood your suggestions and I would suggest the following:

 + I will add a new method to object_allocator<T> that will return
 allocated memory (void*)
 (w/o calling any construction)
 + object_allocator<T>::allocate will call placement new for a
 parameterless ctor
 + I will remove all overwritten operators new/delete on e.g. et_forest, ...
 + For these classes, I will add void* 

Re: Fix PR52482, libitm compilation in OSX ppc with old cctools

2015-07-03 Thread Mike Stump
On Jul 3, 2015, at 4:16 AM, Carlos Sánchez de La Lama csanchez...@gmail.com 
wrote:
 PR52482 seems to be caused by old gas not supporting named parameters in
 macros. Xcode-2.5 (last available for OSX PPC) gas version is 1.38.
 
 Patch is against gcc-4.8.4, but affected lines have not changed in SVN HEAD.

Ok.

I dropped this into all active release branches as well.

If anyone spots any problems with the change, let me know.

If you do a test suite run, feel free to email it to the test results list.

[PR66726] Factor conversion out of COND_EXPR

2015-07-03 Thread Kugan
Please find attached a patch that attempts to fix PR66726 by factoring the
conversion out of COND_EXPR, as explained in the PR.

Bootstrapped and regression tested on x86-64-none-linux-gnu with no new
regressions. Is this OK for trunk?

Thanks,
Kugan


gcc/testsuite/ChangeLog:

2015-07-03  Kugan Vivekanandarajah  kug...@linaro.org
Jeff Law  l...@redhat.com

PR middle-end/66726
* gcc.dg/tree-ssa/pr66726.c: New test.

gcc/ChangeLog:

2015-07-03  Kugan Vivekanandarajah  kug...@linaro.org

PR middle-end/66726
* tree-ssa-phiopt.c (factor_out_conditional_conversion): New function.
(tree_ssa_phiopt_worker): Call factor_out_conditional_conversion.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c
index e69de29..b636c8f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c
@@ -0,0 +1,13 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2" } */
+
+extern unsigned short mode_size[];
+int
+oof (int mode)
+{
+  return (64 < mode_size[mode] ? 64 : mode_size[mode]);
+}
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt2" } } */
+
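At the source level the testcase computes `64 < (int) mode_size[mode] ? 64 : (int) mode_size[mode]`, so the PHI merges a converted `unsigned short` with the constant 64. Factoring the conversion out selects in `unsigned short` and converts once, which is just a MIN in the narrow type; that is only valid because 64 is representable in `unsigned short` (the `int_fits_type_p` check in the patch). A C++ sketch that checks this equivalence exhaustively; all names are hypothetical.

```cpp
#include <algorithm>
#include <climits>

// Shape of the testcase: both PHI arguments are ints, one of them the
// result of converting an unsigned short.
int before(unsigned short m)
{
    return 64 < m ? 64 : static_cast<int>(m);
}

// After factoring the conversion out: select in unsigned short and
// convert the result once.
int after(unsigned short m)
{
    unsigned short t = 64 < m ? static_cast<unsigned short>(64) : m;
    return static_cast<int>(t);
}

// The narrow selection is just a MIN in unsigned short.
int as_min(unsigned short m)
{
    return static_cast<int>(std::min<unsigned short>(m, 64));
}

// Rough analogue of int_fits_type_p (cst, unsigned short): the constant
// may only be converted to the narrow type if no value is lost.
bool fits_ushort(long cst)
{
    return cst >= 0 && cst <= USHRT_MAX;
}

// Exhaustive check that before == after == as_min.
bool equivalent_for_all()
{
    for (unsigned v = 0; v <= USHRT_MAX; ++v) {
        unsigned short m = static_cast<unsigned short>(v);
        if (before(m) != after(m) || after(m) != as_min(m))
            return false;
    }
    return true;
}
```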
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index d2a5cee..e8af086 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 static unsigned int tree_ssa_phiopt_worker (bool, bool);
 static bool conditional_replacement (basic_block, basic_block,
 edge, edge, gphi *, tree, tree);
+static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, tree);
 static int value_replacement (basic_block, basic_block,
  edge, edge, gimple, tree, tree);
 static bool minmax_replacement (basic_block, basic_block,
@@ -342,6 +343,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads)
cfgchanged = true;
  else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
+ else if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1))
+   cfgchanged = true;
}
 }
 
@@ -410,6 +413,108 @@ replace_phi_edge_with_variable (basic_block cond_block,
  bb->index);
 }
 
+/* PR66726: Factor conversion out of COND_EXPR.  If the arguments of the PHI
+   stmt are CONVERT_STMTs, factor out the conversion and perform it on the
+   result of the PHI stmt.  */
+
+static bool
+factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
+  tree arg0, tree arg1)
+{
+  gimple def0 = NULL, def1 = NULL, new_stmt;
+  tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE;
+  tree temp, result;
+  gimple_stmt_iterator gsi;
+
+  /* One of the arguments has to be an SSA_NAME and the other argument can
+     be an SSA_NAME or an INTEGER_CST.  */
+  if ((TREE_CODE (arg0) != SSA_NAME
+       && TREE_CODE (arg0) != INTEGER_CST)
+      || (TREE_CODE (arg1) != SSA_NAME
+	  && TREE_CODE (arg1) != INTEGER_CST)
+      || (TREE_CODE (arg0) == INTEGER_CST
+	  && TREE_CODE (arg1) == INTEGER_CST))
+return false;
+
+  /* Handle only PHI statements with two arguments.  TODO: If all
+ other arguments to PHI are INTEGER_CST, we can handle more
+ than two arguments too.  */
+  if (gimple_phi_num_args (phi) != 2)
+return false;
+
+  /* If arg0 is an SSA_NAME and the stmt which defines arg0 is
+     a CONVERT_STMT, use its RHS operand as new_arg0.  */
+  if (TREE_CODE (arg0) == SSA_NAME)
+{
+  def0 = SSA_NAME_DEF_STMT (arg0);
+  if (!is_gimple_assign (def0)
+ || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def0)))
+   return false;
+  new_arg0 = gimple_assign_rhs1 (def0);
+}
+
+  /* If arg1 is an SSA_NAME and the stmt which defines arg1 is
+     a CONVERT_STMT, use its RHS operand as new_arg1.  */
+  if (TREE_CODE (arg1) == SSA_NAME)
+{
+  def1 = SSA_NAME_DEF_STMT (arg1);
+  if (!is_gimple_assign (def1)
+ || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def1)))
+   return false;
+  new_arg1 = gimple_assign_rhs1 (def1);
+}
+
+  /* If arg0 is an INTEGER_CST, fold it to new type.  */
+  if (TREE_CODE (arg0) != SSA_NAME)
+{
+  if (!POINTER_TYPE_P (TREE_TYPE (new_arg1))
+	  && int_fits_type_p (arg0, TREE_TYPE (new_arg1)))
+   new_arg0 = fold_convert (TREE_TYPE (new_arg1), arg0);
+  else
+   return false;
+}
+
+  /* If arg1 is an INTEGER_CST, fold it to new type.  */
+  if (TREE_CODE (arg1) != SSA_NAME)
+{
+  if (!POINTER_TYPE_P (TREE_TYPE (new_arg0))
+	  && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
+   new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
+  else
+   return false;
+}
+
+  /* If the types of new_arg0 and new_arg1 differ, bail out.  */
+  if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1))
+return false;
+
+  /* Replace the PHI stmt with the new_arg0 and new_arg1.  Also insert
+ 

[PATCH] PR fortran/66725 -- Fix multiple ICEs

2015-07-03 Thread Steve Kargl
It seems that the matching of various specifiers in
OPEN, CLOSE, and WRITE was written with much confidence
that the user would not do something stupid.

The attached patch fixes multiple ICEs.  Regression tested
on i386-*-freebsd.  OK to commit?

PS: There are other ICEs caused by ill-formed specifiers.
This patch does not address those.
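The crashes appear to come from code that reads a specifier's value as a character string (as `compare_to_allowed_values` does with `value.character.string`) without first checking that the expression has character type; the patch adds `is_char_type` before those reads. A much simplified C++ model of that guard, with hypothetical `const_expr`, `allowed` and `check_access` standing in for gfortran's data structures:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, much simplified stand-in for a constant gfc_expr.
struct const_expr
{
    enum kind_t { INTEGER, CHARACTER } type;
    long ival;         // meaningful only for INTEGER
    const char* sval;  // meaningful only for CHARACTER
};

std::vector<std::string> diagnostics;  // collected error messages

// Model of is_char_type: diagnose instead of touching a missing string.
bool is_char_type(const char* name, const const_expr& e)
{
    if (e.type != const_expr::CHARACTER) {
        diagnostics.push_back(std::string(name)
                              + " requires a scalar-default-char-expr");
        return false;
    }
    return true;
}

// Model of compare_to_allowed_values: assumes `value` is a valid string.
bool allowed(const char* value, const char* const* list)
{
    for (; *list != nullptr; ++list)
        if (std::strcmp(value, *list) == 0)
            return true;
    return false;
}

// ACCESS= check: the type test must come before sval is dereferenced.
bool check_access(const const_expr& access)
{
    static const char* const values[] = { "SEQUENTIAL", "DIRECT", "STREAM",
                                          nullptr };
    if (!is_char_type("ACCESS", access))
        return false;
    return allowed(access.sval, values);
}
```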


2015-07-03  Steven G. Kargl  ka...@gcc.gnu.org

* io.c (is_char_type): New function to test for BT_CHARACTER
(gfc_match_open, gfc_match_close, match_dt_element): Use it.

2015-07-03  Steven G. Kargl  ka...@gcc.gnu.org
* gfortran.dg/pr66725.f90: New test.

-- 
Steve
Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c	(revision 225367)
+++ gcc/fortran/io.c	(working copy)
@@ -1242,6 +1242,19 @@ gfc_match_format (void)
 }
 
 
+static bool
+is_char_type (const char *name, gfc_expr *e)
+{
+  if (e->ts.type != BT_CHARACTER)
+{
+  gfc_error ("%s requires a scalar-default-char-expr at %L",
+		   name, &e->where);
+  return false;
+}
+  return true;
+}
+
+
 /* Match an expression I/O tag of some sort.  */
 
 static match
@@ -1870,6 +1883,9 @@ gfc_match_open (void)
   static const char *access_f2003[] = { "STREAM", NULL };
   static const char *access_gnu[] = { "APPEND", NULL };
 
+  if (!is_char_type ("ACCESS", open->access))
+	goto cleanup;
+
   if (!compare_to_allowed_values ("ACCESS", access_f95, access_f2003,
   access_gnu,
   open->access->value.character.string,
@@ -1882,6 +1898,9 @@ gfc_match_open (void)
 {
   static const char *action[] = { "READ", "WRITE", "READWRITE", NULL };
 
+  if (!is_char_type ("ACTION", open->action))
+	goto cleanup;
+
   if (!compare_to_allowed_values ("ACTION", action, NULL, NULL,
   open->action->value.character.string,
   "OPEN", warn))
@@ -1895,6 +1914,9 @@ gfc_match_open (void)
 			   "not allowed in Fortran 95"))
 	goto cleanup;
 
+  if (!is_char_type ("ASYNCHRONOUS", open->asynchronous))
+	goto cleanup;
+
   if (open->asynchronous->expr_type == EXPR_CONSTANT)
 	{
 	  static const char * asynchronous[] = { "YES", "NO", NULL };
@@ -1913,6 +1935,9 @@ gfc_match_open (void)
 			   "not allowed in Fortran 95"))
 	goto cleanup;
 
+  if (!is_char_type ("BLANK", open->blank))
+	goto cleanup;
+
   if (open->blank->expr_type == EXPR_CONSTANT)
 	{
 	  static const char *blank[] = { "ZERO", "NULL", NULL };
@@ -1931,6 +1956,9 @@ gfc_match_open (void)
 			   "not allowed in Fortran 95"))
 	goto cleanup;
 
+  if (!is_char_type ("DECIMAL", open->decimal))
+	goto cleanup;
+
   if (open->decimal->expr_type == EXPR_CONSTANT)
 	{
 	  static const char * decimal[] = { "COMMA", "POINT", NULL };
@@ -1949,6 +1977,9 @@ gfc_match_open (void)
 	{
 	  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };
 
+	if (!is_char_type ("DELIM", open->delim))
+	  goto cleanup;
+
 	  if (!compare_to_allowed_values ("DELIM", delim, NULL, NULL,
 	  open->delim->value.character.string,
 	  "OPEN", warn))
@@ -1962,7 +1993,10 @@ gfc_match_open (void)
   if (!gfc_notify_std (GFC_STD_F2003, "ENCODING= at %C "
 			   "not allowed in Fortran 95"))
 	goto cleanup;
-
+
+  if (!is_char_type ("ENCODING", open->encoding))
+	goto cleanup;
+
   if (open->encoding->expr_type == EXPR_CONSTANT)
 	{
 	  static const char * encoding[] = { "DEFAULT", "UTF-8", NULL };
@@ -1979,6 +2013,9 @@ gfc_match_open (void)
 {
   static const char *form[] = { "FORMATTED", "UNFORMATTED", NULL };
 
+  if (!is_char_type ("FORM", open->form))
+	goto cleanup;
+
   if (!compare_to_allowed_values ("FORM", form, NULL, NULL,
   open->form->value.character.string,
   "OPEN", warn))
@@ -1990,6 +2027,9 @@ gfc_match_open (void)
 {
   static const char *pad[] = { "YES", "NO", NULL };
 
+  if (!is_char_type ("PAD", open->pad))
+	goto cleanup;
+
   if (!compare_to_allowed_values ("PAD", pad, NULL, NULL,
   open->pad->value.character.string,
   "OPEN", warn))
@@ -2001,6 +2041,9 @@ gfc_match_open (void)
 {
   static const char *position[] = { "ASIS", "REWIND", "APPEND", NULL };
 
+  if (!is_char_type ("POSITION", open->position))
+	goto cleanup;
+
   if (!compare_to_allowed_values ("POSITION", position, NULL, NULL,
   open->position->value.character.string,
   "OPEN", warn))
@@ -2014,6 +2057,9 @@ gfc_match_open (void)
 			   "not allowed in Fortran 95"))
   goto cleanup;
 
+  if (!is_char_type ("ROUND", open->round))
+	goto cleanup;
+
   if (open->round->expr_type == EXPR_CONSTANT)
 	{
 	  static const char * round[] = { "UP", "DOWN", "ZERO", "NEAREST",
@@ -2034,6 +2080,9 @@ gfc_match_open (void)
 			   "not allowed in Fortran 95"))
 	goto cleanup;
 
+  if (!is_char_type ("SIGN", open->sign))
+	goto cleanup;
+
   if (open->sign->expr_type == EXPR_CONSTANT)
 	{
 	  static const char * sign[] = { "PLUS", "SUPPRESS", "PROCESSOR_DEFINED",
@@ -2071,6 +2120,9 @@ gfc_match_open (void)
   static const char *status[] = { "OLD", "NEW", "SCRATCH",
 	"REPLACE", 

[PATCH] MIPS: fix failing branch range checks for micromips

2015-07-03 Thread Andrew Bennett
Hi,

The current branch range tests assume that MIPS branch instructions have a
16-bit branch offset which is shifted left by 2.  Unfortunately, for microMIPS
this offset is shifted by only 1, which halves the branch range and causes the
branch-[2,4,6,10,12].c tests to fail.
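To make the arithmetic concrete, the C++ sketch below computes the byte reach of a signed 16-bit offset field scaled by 4 (MIPS) or by 2 (microMIPS). It is only an illustration of the offset-field arithmetic; it assumes the displacement is measured from the delay-slot instruction and ignores every other encoding detail, and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers: byte reach of a branch whose signed 16-bit offset
// field is scaled by (1 << shift).  Displacements are assumed to be taken
// relative to the instruction after the branch (its delay slot).
constexpr std::int32_t max_forward_bytes (unsigned shift)
{
  return static_cast<std::int32_t> (INT16_MAX) * (1 << shift);
}

constexpr std::int32_t max_backward_bytes (unsigned shift)
{
  return static_cast<std::int32_t> (INT16_MIN) * (1 << shift);
}

// MIPS scales the offset by 4 (shift 2), microMIPS by 2 (shift 1).
constexpr unsigned mips_shift = 2;
constexpr unsigned micromips_shift = 1;

static_assert (max_forward_bytes (mips_shift) == 0x1fffc, "MIPS: +128KB - 4");
static_assert (max_forward_bytes (micromips_shift) == 0xfffe,
               "microMIPS: +64KB - 2");
```

A test that pads a function to just under the MIPS limit is therefore already out of range for microMIPS.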
 
The following patch fixes this issue by, first, adding a new macro to
branch-helper.h which outputs the correct number of nops to describe the
maximum positive range of a 16-bit microMIPS branch offset (assuming the
branch instruction has a delay slot).  Second, it breaks up the
branch-[2,4,6,10,12].c files into MIPS tests (which have -mno-micromips added
to them) and microMIPS tests (which use the new macro).

I have tested this on the mips-mti-elf target using
mips32r2/{-mno-micromips/-mmicromips} test options and there are no new
regressions.
  
There is a follow-up patch that I will be working on that will update the
other branch tests to correctly test out-of-range branch behaviour for
microMIPS.  Currently these are passing because the MIPS branch range offset
is large enough.  These offsets will need to be reduced for microMIPS to
verify that the compiler is calculating branch ranges correctly.

The ChangeLog and patch are below.

Ok to commit?


Many thanks,
Andrew

testsuite/
* gcc.target/mips/branch-2.c: Add -mno-micromips to dg-options.
* gcc.target/mips/branch-4.c: Ditto.
* gcc.target/mips/branch-6.c: Ditto. 
* gcc.target/mips/branch-8.c: Ditto.
* gcc.target/mips/branch-10.c: Ditto.
* gcc.target/mips/branch-12.c: Ditto.
* gcc.target/mips/branch-umips-2.c: New file.
* gcc.target/mips/branch-umips-4.c: New file.
* gcc.target/mips/branch-umips-6.c: New file. 
* gcc.target/mips/branch-umips-8.c: New file.
* gcc.target/mips/branch-umips-10.c: New file.
* gcc.target/mips/branch-umips-12.c: New file.
* gcc.target/mips/branch-helper.h (OCCUPY_0xfffc): New define.



diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c b/gcc/testsuite/gcc.target/mips/branch-10.c
index e2b1b5f..00569b0 100644
--- a/gcc/testsuite/gcc.target/mips/branch-10.c
+++ b/gcc/testsuite/gcc.target/mips/branch-10.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mshared -mabi=n32" } */
+/* { dg-options "-mshared -mabi=n32 -mno-micromips" } */
 /* { dg-final { scan-assembler-not "(\\\$28|%gp_rel|%got)" } } */
 /* { dg-final { scan-assembler-not "\tjr\t\\\$1\n" } } */
 
diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c b/gcc/testsuite/gcc.target/mips/branch-12.c
index 4aef160..7d7580b 100644
--- a/gcc/testsuite/gcc.target/mips/branch-12.c
+++ b/gcc/testsuite/gcc.target/mips/branch-12.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mshared -mabi=64" } */
+/* { dg-options "-mshared -mabi=64 -mno-micromips" } */
 /* { dg-final { scan-assembler-not "(\\\$28|%gp_rel|%got)" } } */
 /* { dg-final { scan-assembler-not "\tjr\t\\\$1\n" } } */
 
diff --git a/gcc/testsuite/gcc.target/mips/branch-2.c b/gcc/testsuite/gcc.target/mips/branch-2.c
index 6409c4c..241e885 100644
--- a/gcc/testsuite/gcc.target/mips/branch-2.c
+++ b/gcc/testsuite/gcc.target/mips/branch-2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mshared -mabi=32" } */
+/* { dg-options "-mshared -mabi=32 -mno-micromips" } */
 /* { dg-final { scan-assembler-not "(\\\$25|\\\$28|cpload)" } } */
 /* { dg-final { scan-assembler-not "\tjr\t\\\$1\n" } } */
 /* { dg-final { scan-assembler-not "\\.cprestore" } } */
diff --git a/gcc/testsuite/gcc.target/mips/branch-4.c b/gcc/testsuite/gcc.target/mips/branch-4.c
index 31e4909..923e6d4 100644
--- a/gcc/testsuite/gcc.target/mips/branch-4.c
+++ b/gcc/testsuite/gcc.target/mips/branch-4.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mshared -mabi=n32" } */
+/* { dg-options "-mshared -mabi=n32 -mno-micromips" } */
 /* { dg-final { scan-assembler-not "(\\\$25|\\\$28|%gp_rel|%got)" } } */
 /* { dg-final { scan-assembler-not "\tjr\t\\\$1\n" } } */
 
diff --git a/gcc/testsuite/gcc.target/mips/branch-6.c b/gcc/testsuite/gcc.target/mips/branch-6.c
index 77e0340..2c75ab1 100644
--- a/gcc/testsuite/gcc.target/mips/branch-6.c
+++ b/gcc/testsuite/gcc.target/mips/branch-6.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mshared -mabi=64" } */
+/* { dg-options "-mshared -mabi=64 -mno-micromips" } */
 /* { dg-final { scan-assembler-not "(\\\$25|\\\$28|%gp_rel|%got)" } } */
 /* { dg-final { scan-assembler-not "\tjr\t\\\$1\n" } } */
 
diff --git a/gcc/testsuite/gcc.target/mips/branch-8.c b/gcc/testsuite/gcc.target/mips/branch-8.c
index ba5f954..85df6b8 100644
--- a/gcc/testsuite/gcc.target/mips/branch-8.c
+++ b/gcc/testsuite/gcc.target/mips/branch-8.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mshared -mabi=32" } */
+/* { dg-options "-mshared -mabi=32 -mno-micromips" } */
 /* { dg-final { scan-assembler-not "(\\\$28|cpload|cprestore)" } } */
 /* { dg-final { scan-assembler-not "\tjr\t\\\$1\n" } } */
 
diff --git a/gcc/testsuite/gcc.target/mips/branch-helper.h b/gcc/testsuite/gcc.target/mips/branch-helper.h
index 85399be..bc4a31f 100644
--- 

[gomp] Move openacc vector worker single handling to RTL

2015-07-03 Thread Nathan Sidwell
This patch reorganizes the handling of vector and worker single modes and their 
transitions to/from partitioned mode out of omp-low and into mach-dep-reorg. 
That allows the regular middle end optimizers to behave normally -- with two 
exceptions, see below.


There are no libgomp regressions, and a number of progressions -- mainly private 
variables now 'just work'.


The approach taken is to have expand_omp_for_static_(no)chunk emit OpenACC
builtins at the start and end of the loop -- the points where execution should
transition into a partitioned mode and back to single mode.  I've actually used
a single builtin with a constant argument to say whether it is the head or tail
of the loop.  You could consider these to be like 'fork' and 'join' primitives,
if that helps.


We cope with multi-mode loops over (say) worker and vector dimensions by
emitting two loop heads and tails in nested sequence, i.e. 'head-worker,
head-vector, loop, tail-vector, tail-worker'.  Thus at a transition we only
have to consider one particular axis.
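The nesting requirement can be pictured as a bracket-matching check. The C++ below is a hypothetical model, not code from the patch: each marker is a (head/tail, axis) pair, and a sequence is well formed when every tail closes the most recently opened head on the same axis, which is what lets each transition concern a single axis.

```cpp
#include <vector>

// Hypothetical model of the loop markers, not GCC's representation.
enum class axis { worker, vector };
enum class kind { head, tail };

struct marker
{
  kind k;
  axis a;
};

// True when every tail closes the innermost open head on the same axis
// and no head is left open, e.g.
//   head-worker, head-vector, tail-vector, tail-worker.
bool
properly_nested (const std::vector<marker> &seq)
{
  std::vector<axis> pending;  // stack of heads not yet closed
  for (const marker &m : seq)
    {
      if (m.k == kind::head)
        pending.push_back (m.a);
      else
        {
          if (pending.empty () || pending.back () != m.a)
            return false;
          pending.pop_back ();
        }
    }
  return pending.empty ();
}
```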


These builtins are made known to the duplication and merging optimizations as
not to be duplicated or merged (see builtin_unique_p).  For instance, the jump
threading optimizer already has to check that operations on the potentially
threaded path are suitable for duplication, and this is an additional test
there.  The tail-merging optimizer similarly has to determine that tails are
identical, and that is never true for this particular builtin.  The intent is
that the loops are then maintained as single-entry-single-exit all the way
through to RTL expansion.


Where and when these builtins are expanded to target-specific code is not fixed.
In the case of PTX they go all the way to RTL expansion.


At RTL expansion the builtins are expanded to volatile unspecs.  We insert 'pre' 
markers too, as some code needs to know the last instruction before the 
transition.  These are uncopyable, and AFAICT RTL doesn't do tail merging (or at 
least I've not encountered it) so again these cause the SESE nature of the loop 
to be preserved all the way to mach-dep-reorg.


That's where the fun starts.  We scan the CFG looking for the loop markers. 
First we break basic blocks so the head and tail markers are the first insns of 
their block.  That prevents us needing a mode transition mid block.  We then 
rescan the graph discovering loops and adding each block to the loop in which it 
resides.  The entire function is modeled as a NULL loop.


Once that is done we walk the loop structure and insert state propagation code 
at the loop head points.  For vector propagation that'll be a sequence of PTX 
shuffle instructions.  For worker propagation it is a bit more complicated.  At 
the pre-head marker, we insert a spill of state to .shared memory (executed by 
the single active worker) and at the head marker we insert a fill (executed by 
all workers).  We also insert a sync barrier before the fill.  More on where 
that memory comes from later.


Finally we walk the loop structure again, inserting block or loop neutering 
code.  Where possible we try to skip entire blocks[*], but the basic approach
is the same.  We insert a branch-around at the start of the initial block and, if
needed, insert propagation code at the end of the final block (which might be 
the same block).  The vector-propagation case is again a simple shuffle, but the 
worker case is a spill/sync/fill sequence, with the spill done by the single 
active worker.  The subsequent unified branch is marked with an unspec operand, 
rather than relying on detecting the data flow.
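The worker-single sequence described above (neuter the block, let the single active worker compute and spill, synchronize, then have every worker fill) can be modeled sequentially. This C++ sketch is a hypothetical simulation, not PTX or code from the patch: an array stands in for the per-worker registers, a local variable for the .shared buffer, and the barrier is implicit because the phases run one after another.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t num_workers = 4;  // hypothetical worker count

// Per-worker copy of one live value (the "register" state).
using regs = std::array<int, num_workers>;

// Simulates one worker-single region followed by the transition back to
// partitioned mode.  Only worker 0 executes BLOCK; the others branch
// around it.  Worker 0 spills the result, all workers synchronize, and
// then every worker fills its copy from the shared storage.
regs
run_worker_single (regs state, int (*block) (int))
{
  int shared_buf = 0;  // stands in for the file-scope .shared array

  // Phase 1: neutered block plus spill, executed by the active worker only.
  for (std::size_t w = 0; w < num_workers; ++w)
    {
      if (w != 0)
        continue;  // branch-around for inactive workers
      state[w] = block (state[w]);
      shared_buf = state[w];  // spill
    }

  // Phase 2: barrier (implicit: phase 1 has fully completed here).

  // Phase 3: fill, executed by all workers.
  for (std::size_t w = 0; w < num_workers; ++w)
    state[w] = shared_buf;

  return state;
}
```

After the region every worker holds the value computed by worker 0, which is the state partitioned mode expects.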


Note, the branch around is inserted using hidden branches that appear to the 
rest of the compiler as volatile unspecs referring to a later label.  I don't 
think the expense of creating new blocks is necessary or worthwhile -- this is 
flow control the compiler doesn't need to know about (if it did, I argue that 
we're inserting this too early).


The worker spill/fill storage is a file-scope array variable, sized during
compilation and emitted directly at the end of the compilation process.  Again,
this is not registered with the rest of the compiler, because (a) I wasn't sure
how to, and (b) I considered this an internal bit of the backend.  It is shared
by all functions in this TU.  Unfortunately PTX doesn't appear to support
COMMON, so making it shared across all TUs appears difficult -- one can always
use LTO optimization anyway.


IMHO this is a step towards putting target-dependent handling in the target 
compiler and out of the more generic host-side compiler.


The changelog is separated into 3 parts
- a) general infrastructure
- b) additions
- c) deletions.

comments?

nathan

[*] a possible optimization is to do superblock discovery, and skip those in a 
similar manner to loop skipping.
2015-07-02  Nathan Sidwell  nat...@codesourcery.com

Infrastructure:
* builtins.h (builtin_unique_p): Declare.
* builtins.c 

Re: [gomp] Move openacc vector worker single handling to RTL

2015-07-03 Thread Jakub Jelinek
On Fri, Jul 03, 2015 at 06:51:57PM -0400, Nathan Sidwell wrote:
 IMHO this is a step towards putting target-dependent handling in the target
 compiler and out of the more generic host-side compiler.
 
 The changelog is separated into 3 parts
 - a) general infrastructure
 - b) additions
 - c) deletions.
 
 comments?

Thanks for working on it.

If the builtins are not meant to be used by users directly (I assume they
aren't) nor have a 1-1 correspondence to a library routine, it is much
better to emit them as internal calls (see internal-fn.{c,def}) instead of
BUILT_IN_NORMAL functions.

Jakub


Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-07-03 Thread Richard Earnshaw
On 03/07/15 19:24, Richard Biener wrote:
 On July 3, 2015 6:11:13 PM GMT+02:00, Richard Earnshaw 
 richard.earns...@foss.arm.com wrote:
 On 03/07/15 16:26, Alan Lawrence wrote:
 These include tests of structs, scalars, and vectors - only
 general-purpose registers are affected by the ABI rules for
 alignment, but we can restrict the vector test to use the base AAPCS.

 Prior to this patch, align2.c, align3.c and align_rec1.c were failing
 (the latter showing an internal inconsistency, the first two merely
 that GCC did not obey the new ABI).

 With this patch, align_rec2.c fails, and also
 gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a
 latent bug where we can emit strd/ldrd on an odd-numbered register in
 ARM state, fixed by the second patch.

 gcc/ChangeLog:

 * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer
 alignment attribute, exploring one level down for aggregates.

 gcc/testsuite/ChangeLog:

 * gcc.target/arm/aapcs/align1.c: New.
 * gcc.target/arm/aapcs/align_rec1.c: New.
 * gcc.target/arm/aapcs/align2.c: New.
 * gcc.target/arm/aapcs/align_rec2.c: New.
 * gcc.target/arm/aapcs/align3.c: New.
 * gcc.target/arm/aapcs/align_rec3.c: New.
 * gcc.target/arm/aapcs/align4.c: New.
 * gcc.target/arm/aapcs/align_rec4.c: New.
 * gcc.target/arm/aapcs/align_vararg1.c: New.
 * gcc.target/arm/aapcs/align_vararg2.c: New.

 arm_overalign_1.patch


 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
  static bool
  arm_needs_doubleword_align (machine_mode mode, const_tree type)
  {
 -  return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
 -	  || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
 +  if (!type)
 +    return PARM_BOUNDARY < GET_MODE_ALIGNMENT (mode);
 +
 +  if (!AGGREGATE_TYPE_P (type))
 +    return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY;
 +
 +  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 +    if (DECL_ALIGN (field) > PARM_BOUNDARY)
 +      return true;
 +
 
 Is this behavior correct for unions or aggregates with record or union 
 members?

Yes, at least that was my intention.  It's an error in the wording of
the proposed change, which I think should say "composite types", not
"aggregate types".
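The rule under discussion can be summarized with a toy model. This C++ is a hypothetical sketch, not GCC's tree API: `toy_type` and its fields are illustrative only, alignments are in bits, and `parm_boundary` is 32 as for ARM. The array case applies the same rule to the element type, which is one reading (an assumption) of the reviewer's suggestion to look at TYPE_ALIGN (TREE_TYPE (type)).

```cpp
#include <vector>

constexpr unsigned parm_boundary = 32;  // bits; ARM's PARM_BOUNDARY

enum class toy_kind { scalar, composite, array };

// Hypothetical stand-in for a type node.
struct toy_type
{
  toy_kind kind;
  unsigned main_variant_align;         // alignment ignoring attributes
  std::vector<unsigned> field_aligns;  // declared alignment of each field
  const toy_type *element;             // element type, for arrays only
};

// Any outer alignment attribute is dropped: scalars use the alignment of
// their main variant, composites (structs and unions) look one level down
// at their fields, and arrays defer to their element type.
bool
needs_doubleword_align (const toy_type &t)
{
  switch (t.kind)
    {
    case toy_kind::scalar:
      return t.main_variant_align > parm_boundary;
    case toy_kind::composite:
      for (unsigned a : t.field_aligns)
        if (a > parm_boundary)
          return true;
      return false;
    case toy_kind::array:
      return needs_doubleword_align (*t.element);
    }
  return false;
}
```

Under this model a `typedef __attribute__((aligned (8))) int` and the overaligned struct of two ints from the tests both come out as word-aligned, while a `long long` or a struct containing one still needs doubleword alignment.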

R.

 

 Technically this is incorrect since AGGREGATE_TYPE_P includes
 ARRAY_TYPE and ARRAY_TYPE doesn't have TYPE_FIELDS.  I doubt we could
 reach that case though (unless there's a language that allows passing
 arrays by value).

 For array types I think you need to check TYPE_ALIGN (TREE_TYPE (type)).

 R.

 +  return false;
  }
  
  
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
 new file mode 100644
 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c
 @@ -0,0 +1,29 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target arm32 } */
 +/* { dg-options "-O" } */
 +
 +#ifndef IN_FRAMEWORK
 +#define TESTFILE "align1.c"
 +
 +typedef __attribute__((aligned (8))) int alignedint;
 +
 +alignedint a = 11;
 +alignedint b = 13;
 +alignedint c = 17;
 +alignedint d = 19;
 +alignedint e = 23;
 +alignedint f = 29;
 +
 +#include "abitest.h"
 +#else
 +  ARG (alignedint, a, R0)
 +  /* Attribute suggests R2, but we should use only natural alignment:  */
 +  ARG (alignedint, b, R1)
 +  ARG (alignedint, c, R2)
 +  ARG (alignedint, d, R3)
 +  ARG (alignedint, e, STACK)
 +  /* Attribute would suggest STACK + 8 but should be ignored:  */
 +  LAST_ARG (alignedint, f, STACK + 4)
 +#endif
 diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
 new file mode 100644
 index ..992da53c606c793f25278152406582bb993719d2
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c
 @@ -0,0 +1,30 @@
 +/* Test AAPCS layout (alignment).  */
 +
 +/* { dg-do run { target arm_eabi } } */
 +/* { dg-require-effective-target arm32 } */
 +/* { dg-options "-O" } */
 +
 +#ifndef IN_FRAMEWORK
 +#define TESTFILE "align2.c"
 +
 +/* The underlying struct here has alignment 4.  */
 +typedef struct __attribute__((aligned (8)))
 +  {
 +    int x;
 +    int y;
 +  } overaligned;
 +
 +/* A couple of instances, at 8-byte-aligned memory locations.  */
 +overaligned a = { 2, 3 };
 +overaligned b = { 5, 8 };
 +
 +#include "abitest.h"
 +#else
 +  ARG (int, 7, R0)
 +  /* Alignment should be 4.  */
 +  ARG (overaligned, a, R1)
 +  ARG (int, 9, R3)
 +  ARG (int, 10, STACK)
 +  /* Alignment should be 4.  */
 +  LAST_ARG (overaligned, b, STACK + 4)
 +#endif
 diff --git 

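To see how the expected locations in align1.c and align2.c follow from the alignment decision, here is a simplified model of AAPCS core-register allocation in C++. It is a sketch under stated assumptions, not GCC's implementation: arguments are one or two words, an argument needing doubleword alignment rounds the next core register up to an even number and the stack offset up to 8, and an argument that does not fit in the remaining registers goes entirely on the stack (register/stack splitting is not modeled).

```cpp
#include <string>
#include <vector>

// Hypothetical argument description for the model.
struct arg
{
  unsigned words;  // size in 4-byte words (1 or 2 here)
  bool dw_align;   // true if the argument needs 8-byte alignment
};

// Returns "R<n>" for the first core register used by each argument,
// or "STACK+<offset>" for stacked arguments.
std::vector<std::string>
assign (const std::vector<arg> &args)
{
  unsigned ncrn = 0;  // next core register number (r0-r3)
  unsigned nsaa = 0;  // next stacked argument offset, in bytes
  std::vector<std::string> where;
  for (const arg &a : args)
    {
      if (a.dw_align && ncrn % 2 != 0)
        ncrn++;  // round up to an even register
      if (ncrn + a.words <= 4)
        {
          where.push_back ("R" + std::to_string (ncrn));
          ncrn += a.words;
        }
      else
        {
          ncrn = 4;  // no more core registers for later arguments
          if (a.dw_align)
            nsaa = (nsaa + 7) & ~7u;  // round stack offset up to 8
          where.push_back ("STACK+" + std::to_string (nsaa));
          nsaa += 4 * a.words;
        }
    }
  return where;
}
```

With natural (4-byte) alignment the six `alignedint` arguments land in R0-R3, STACK+0 and STACK+4, while honoring the attribute would push the second one to R2; the align2.c sequence (int, 8-byte struct, int, int, struct) comes out as R0, R1, R3, STACK+0, STACK+4.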
[PATCH PR66702]Skip pr48052 on targets not support vect_int_mult

2015-07-03 Thread Bin Cheng
Hi,
The test failed on SPARC because SPARC doesn't support vect_int_mult.  This
patch adds the prerequisite condition and thus skips the test on such platforms.

An obvious change, will apply it in 24h.

Thanks,
bin

gcc/testsuite/ChangeLog
2015-07-02  Bin Cheng  bin.ch...@arm.com

PR tree-optimization/66720
* gcc.dg/vect/pr48052.c: Use dg-require-effective-target
vect_int_mult.
Index: gcc/testsuite/gcc.dg/vect/pr48052.c
===
--- gcc/testsuite/gcc.dg/vect/pr48052.c (revision 225094)
+++ gcc/testsuite/gcc.dg/vect/pr48052.c (working copy)
@@ -1,9 +1,9 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target vect_int_mult } */
 
 int foo(int* A, int* B,  unsigned start, unsigned BS)
 {
-  int s;
+  int s = 0;
   for (unsigned k = start;  k < start + BS; k++)
 {
   s += A[k] * B[k];
@@ -14,7 +14,7 @@ int foo(int* A, int* B,  unsigned start, unsigned
 
 int bar(int* A, int* B, unsigned BS)
 {
-  int s;
+  int s = 0;
   for (unsigned k = 0;  k < BS; k++)
 {
   s += A[k] * B[k];

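The loop being vectorized is an integer multiply-accumulate reduction, which is why the test needs vect_int_mult. Below is a self-contained C++ version of the `bar` loop; the `return s;` is an assumption, since the diff above is cut off. It also shows why the accumulator is now initialized: without `s = 0` the sum would start from an indeterminate value.

```cpp
// Dot product of A[0..BS) and B[0..BS), following the loop in bar ().
int
bar (const int *A, const int *B, unsigned BS)
{
  int s = 0;  // the reduction must start from zero
  for (unsigned k = 0; k < BS; k++)
    s += A[k] * B[k];  // integer multiply feeding the reduction
  return s;  // assumed: not shown in the truncated diff
}
```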

Re: C++ PATCH to change default dialect to C++14

2015-07-03 Thread Richard Biener
On Fri, Jul 3, 2015 at 1:41 AM, Jim Wilson jim.wil...@linaro.org wrote:
 On 07/01/2015 11:17 PM, Jim Wilson wrote:
 On Wed, Jul 1, 2015 at 10:21 PM, Jason Merrill ja...@redhat.com wrote:
 This document also says that "A workaround until libraries get updated is to
 include <cstddef> or <stddef.h> before any headers from that library."
 Can you try modifying the graphite* files accordingly?

 Right.  I forgot to try that.  Trying it now, I see that my build gets
 past the point that it failed, so this does appear to work.  I won't
 be able to finish a proper test until tomorrow, but for now this patch
 seems to work.

 Since the patch to include system.h before the isl header did not work,
 I went ahead and tested this patch to add stddef.h includes before the
 isl headers.  I tested it with an x86_64 bootstrap and make check.
 There were no problems caused by my patch.

Ok then.

I presume it might still cause issues on some hosts in the end.

At some point we talked about doing something like


#define WANT_ISL_HEADERS
#include "system.h"

and include isl headers from system.h at the appropriate location
if WANT_ISL_HEADERS is defined.

Richard.

 Though as a side effect of doing this, I discovered another minor
 problem with the C++ version change.  This caused one additional
 testsuite failure.  It also caused a bunch of tests to start working,
 which is nice, but the new failure needs to be addressed.

 /home/wilson/FOSS/GCC/gcc-svn/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c: In function 'void test_double_int_round_udiv()':
 /home/wilson/FOSS/GCC/gcc-svn/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c:13:45: error: narrowing conversion of '-1' from 'int' to 'long unsigned int' inside { } [-Wnarrowing]
double_int dmax = { -1, HOST_WIDE_INT_MAX };
  ^
 /home/wilson/FOSS/GCC/gcc-svn/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c:14:33: error: narrowing conversion of '-1' from 'int' to 'long unsigned int' inside { } [-Wnarrowing]
double_int dnegone = { -1, -1 };
  ^
 ...
 FAIL: gcc.dg/plugin/wide-int_plugin.c compilation

 The code compiles with -std=c++98.  It does not compile with -std=c++14.
  So this testcase should be fixed to work with c++14.  Or the c++14
 support should be fixed if it is broken.

 Jim
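The error comes from the C++11 list-initialization rules: converting the constant `-1` (an `int`) to an unsigned member inside braces is a narrowing conversion and is ill-formed, while C++98 accepted it. One possible fix is to make the conversion explicit. The sketch below uses a hypothetical stand-in for double_int with an unsigned low word and a signed high word (matching the types in the error message), and assumes a host where HOST_WIDE_INT is `long`, so LONG_MAX plays the role of HOST_WIDE_INT_MAX.

```cpp
#include <climits>

// Hypothetical stand-in for double_int: unsigned low word, signed high word.
struct toy_double_int
{
  unsigned long low;
  long high;
};

// { -1, LONG_MAX } is ill-formed in C++11/14 (narrowing int -> unsigned long);
// an explicit conversion is accepted in every dialect.
const toy_double_int dmax = { static_cast<unsigned long> (-1), LONG_MAX };
const toy_double_int dnegone = { static_cast<unsigned long> (-1), -1 };
```

Only the unsigned member needs the cast; `-1` for the signed high word is not a narrowing conversion.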



Re: [Patch ARM-AArch64/testsuite Neon intrinsics: vget_lane

2015-07-03 Thread Marcus Shawcroft
On 2 July 2015 at 14:44, Christophe Lyon christophe.l...@linaro.org wrote:
 Hi,

 Here is the missing test for ARM/AArch64 AdvSIMD intrinsic: vget_lane.

 Tested on arm, armeb, aarch64 and aarch64_be targets (using QEMU).

 The tests all pass, except on armeb where vgetq_lane_s64 and
 vgetq_lane_u64 fail. I haven't investigated in detail yet.

 OK for trunk?

 2015-07-02  Christophe Lyon  christophe.l...@linaro.org

 * gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: New testcase.


OK /Marcus


Re: [Fortran f951, C++14] Fix trans-common.c compilation failure on AIX

2015-07-03 Thread Richard Biener
On Thu, Jul 2, 2015 at 10:49 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Jul 02, 2015 at 04:47:13PM -0400, David Edelsohn wrote:
 I can change the patch to include it after system.h, if that is
 preferred.  That order also works on AIX.

 If including it right after system.h works, it is preapproved.

Note that after config.h is generally better (considering all the #poison
stuff in system.h).

Not using std::map but GCC's own hash_map would be preferred, though.
(otherwise at some point we'll end up including all of libstdc++ from
system.h given host compiler weirdness and workarounds for include
stuff - which is what system.h is for)

Richard.

 Jakub


Re: [Fortran f951, C++14] Fix trans-common.c compilation failure on AIX

2015-07-03 Thread Jakub Jelinek
On Fri, Jul 03, 2015 at 10:32:38AM +0200, Richard Biener wrote:
 On Thu, Jul 2, 2015 at 10:49 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Thu, Jul 02, 2015 at 04:47:13PM -0400, David Edelsohn wrote:
  I can change the patch to include it after system.h, if that is
  preferred.  That order also works on AIX.
 
  If including it right after system.h works, it is preapproved.
 
 Note that after config.h is generally better (considering all the #poison
 stuff in system.h).
 
 Not using std::map but GCC's own hash_map would be preferred, though.
 (otherwise at some point we'll end up including all of libstdc++ from
 system.h given host compiler weirdness and workarounds for include
 stuff - which is what system.h is for)

Can we poison std::map and other templates we want to avoid in GCC sources,
so that people wouldn't be tempted to use it?

Jakub


Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)

2015-07-03 Thread Marek Polacek
Ping^4.

On Fri, Jun 26, 2015 at 10:08:51AM +0200, Marek Polacek wrote:
 I'm pinging the C++ parts.
 
 On Fri, Jun 19, 2015 at 12:44:36PM +0200, Marek Polacek wrote:
  Ping.
  
  On Fri, Jun 12, 2015 at 11:07:29AM +0200, Marek Polacek wrote:
   Ping.
   
   On Fri, Jun 05, 2015 at 10:55:08AM +0200, Marek Polacek wrote:
On Thu, Jun 04, 2015 at 09:04:19PM +, Joseph Myers wrote:
 The C changes are OK.

Jason, do you want to approve the C++ parts?

Marek


Re: [Fortran f951, C++14] Fix trans-common.c compilation failure on AIX

2015-07-03 Thread Richard Biener
On Fri, Jul 3, 2015 at 10:37 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Jul 03, 2015 at 10:32:38AM +0200, Richard Biener wrote:
 On Thu, Jul 2, 2015 at 10:49 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Thu, Jul 02, 2015 at 04:47:13PM -0400, David Edelsohn wrote:
  I can change the patch to include it after system.h, if that is
  preferred.  That order also works on AIX.
 
  If including it right after system.h works, it is preapproved.

 Note that after config.h is generally better (considering all the #poison
 stuff in system.h).

 Not using std::map but GCC's own hash_map would be preferred, though.
 (otherwise at some point we'll end up including all of libstdc++ from
 system.h given host compiler weirdness and workarounds for include
 stuff - which is what system.h is for)

 Can we poison std::map and other templates we want to avoid in GCC sources,
 so that people wouldn't be tempted to use it?

Won't they just include <map> before system.h then? ;)

This needs to be caught by patch review I fear.

Richard.

 Jakub


Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-03 Thread Richard Sandiford
Trevor Saunders tbsau...@tbsaunde.org writes:
 On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
 Martin Liška mli...@suse.cz writes:
  diff --git a/gcc/asan.c b/gcc/asan.c
  index e89817e..dabd6f1 100644
  --- a/gcc/asan.c
  +++ b/gcc/asan.c
  @@ -362,20 +362,20 @@ struct asan_mem_ref
 /* Pool allocation new operator.  */
 inline void *operator new (size_t)
 {
  -    return pool.allocate ();
  +    return ::new (pool.allocate ()) asan_mem_ref ();
 }
   
 /* Delete operator utilizing pool allocation.  */
 inline void operator delete (void *ptr)
 {
  -    pool.remove ((asan_mem_ref *) ptr);
  +    pool.remove (ptr);
 }
   
 /* Memory allocation pool.  */
  -  static pool_allocator<asan_mem_ref> pool;
  +  static pool_allocator pool;
   };
 
 I'm probably going over old ground/wounds, sorry, but what's the benefit
 of having this sort of pattern?  Why not simply have object_allocators
 and make callers use pool.allocate () and pool.remove (x) (with pool.remove
 calling the destructor) instead of new and delete?  It feels wrong to me
 to tie the data type to a particular allocation object like this.

 Well the big question is what does allocate() do about construction?  It
 seems weird for it not to call the ctor, but I'm not sure we can do a
 good job of forwarding args to allocate() with C++98.

If you need non-default constructors then:

  new (pool) type (aaa, bbb)...;

doesn't seem too bad.  I agree object_allocator's allocate () should call
the constructor.
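A minimal sketch of the interface being argued for, where allocate () constructs the object and remove () destroys it, might look like the C++ below. The class is hypothetical (it is not GCC's pool_allocator or object_allocator): it recycles freed slots through a free list and only supports default construction, which sidesteps the C++98 argument-forwarding concern raised above.

```cpp
#include <new>
#include <vector>

// Hypothetical typed pool: allocate () runs T's default constructor,
// remove () runs ~T () and puts the slot on a free list for reuse.
template <typename T>
class toy_object_allocator
{
  union slot
  {
    slot *next;                                     // used while the slot is free
    alignas (T) unsigned char storage[sizeof (T)];  // used while an object lives here
  };

public:
  toy_object_allocator () = default;
  toy_object_allocator (const toy_object_allocator &) = delete;
  toy_object_allocator &operator= (const toy_object_allocator &) = delete;

  // Frees the raw memory only; objects still live are not destroyed.
  ~toy_object_allocator ()
  {
    for (slot *s : blocks_)
      ::operator delete (s);
  }

  T *allocate ()
  {
    slot *s = free_list_;
    if (s)
      free_list_ = s->next;
    else
      {
        // For brevity, one slot per request instead of larger blocks.
        s = static_cast<slot *> (::operator new (sizeof (slot)));
        blocks_.push_back (s);
      }
    return ::new (static_cast<void *> (s)) T ();  // construct in place
  }

  void remove (T *obj)
  {
    obj->~T ();  // destroy before recycling the storage
    slot *s = reinterpret_cast<slot *> (obj);
    s->next = free_list_;
    free_list_ = s;
  }

private:
  slot *free_list_ = nullptr;
  std::vector<slot *> blocks_;
};
```

Callers then write `T *x = pool.allocate ();` and `pool.remove (x);` and never need a class-specific operator new.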

 However it seems kind of weird that the operator new here is calling
 placement new on the object it allocates.

Yeah.

 And using the pool allocator functions directly has the nice property
 that you can tell when a delete/remove isn't necessary because the pool
 itself is being cleared.

 Well, all these cases involve a pool with static storage lifetime, right?
 So actually if you don't delete things in these pools they are
 effectively leaked.

They might have a static storage lifetime now, but it doesn't seem like
a good idea to hard-bake that into the interface (by saying that for
these types you should use new and delete, but for other pool-allocated
types you should use object_allocators).  Maybe I just have bad memories
from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
of state that was obviously static in the old days, but that needed
to become non-static to support vaguely-efficient switching between
different subtargets.  The same kind of thing is likely to happen again.
I assume things like the jit would prefer not to have new global state
with load-time construction.

Thanks,
Richard


Re: [PATCH] PR target/66746: Failure to compile #include <x86intrin.h> with -miamcu

2015-07-03 Thread Uros Bizjak
On Fri, Jul 3, 2015 at 5:53 AM, H.J. Lu hjl.to...@gmail.com wrote:
 x86intrin.h has useful intrinsics for instructions for IA MCU.  This
 patch adds __iamcu__ check to x86intrin.h and ia32intrin.h.

 OK for trunk?

 H.J.
 ---
 gcc/

 PR target/66746
 * config/i386/ia32intrin.h (__crc32b): Don't define if __iamcu__
 is defined.
 (__crc32w): Likewise.
 (__crc32d): Likewise.
 (__rdpmc): Likewise.
 (__rdtscp): Likewise.
 (_rdpmc): Likewise.
 (_rdtscp): Likewise.
 * config/i386/x86intrin.h: Only include ia32intrin.h if __iamcu__
 is defined.

 gcc/testsuite/

 PR target/66746
 * gcc.target/i386/pr66746.c: New file.

OK.

Thanks,
Uros.

  gcc/config/i386/ia32intrin.h| 16 +++-
  gcc/config/i386/x86intrin.h |  5 +
  gcc/testsuite/gcc.target/i386/pr66746.c | 10 ++
  3 files changed, 30 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/i386/pr66746.c

 diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h
 index 1f728c8..b8d1c31 100644
 --- a/gcc/config/i386/ia32intrin.h
 +++ b/gcc/config/i386/ia32intrin.h
 @@ -49,6 +49,8 @@ __bswapd (int __X)
return __builtin_bswap32 (__X);
  }

 +#ifndef __iamcu__
 +
  #ifndef __SSE4_2__
  #pragma GCC push_options
  #pragma GCC target("sse4.2")
 @@ -82,6 +84,8 @@ __crc32d (unsigned int __C, unsigned int __V)
  #pragma GCC pop_options
  #endif /* __DISABLE_SSE4_2__ */

 +#endif /* __iamcu__ */
 +
  /* 32bit popcnt */
  extern __inline int
  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 @@ -90,6 +94,8 @@ __popcntd (unsigned int __X)
return __builtin_popcount (__X);
  }

 +#ifndef __iamcu__
 +
  /* rdpmc */
  extern __inline unsigned long long
  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 @@ -98,6 +104,8 @@ __rdpmc (int __S)
return __builtin_ia32_rdpmc (__S);
  }

 +#endif /* __iamcu__ */
 +
  /* rdtsc */
  extern __inline unsigned long long
  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 @@ -106,6 +114,8 @@ __rdtsc (void)
return __builtin_ia32_rdtsc ();
  }

 +#ifndef __iamcu__
 +
  /* rdtscp */
  extern __inline unsigned long long
  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 @@ -114,6 +124,8 @@ __rdtscp (unsigned int *__A)
return __builtin_ia32_rdtscp (__A);
  }

 +#endif /* __iamcu__ */
 +
  /* 8bit rol */
  extern __inline unsigned char
  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 @@ -290,9 +302,11 @@ __writeeflags (unsigned int X)
  #define _bit_scan_reverse(a)   __bsrd(a)
  #define _bswap(a)  __bswapd(a)
  #define _popcnt32(a)   __popcntd(a)
 +#ifndef __iamcu__
  #define _rdpmc(a)  __rdpmc(a)
 -#define _rdtsc()   __rdtsc()
  #define _rdtscp(a) __rdtscp(a)
 +#endif /* __iamcu__ */
 +#define _rdtsc()   __rdtsc()
  #define _rotwl(a,b)    __rolw((a), (b))
  #define _rotwr(a,b)    __rorw((a), (b))
  #define _rotl(a,b)     __rold((a), (b))
 diff --git a/gcc/config/i386/x86intrin.h b/gcc/config/i386/x86intrin.h
 index 6f7b1f6..be0a1a1 100644
 --- a/gcc/config/i386/x86intrin.h
 +++ b/gcc/config/i386/x86intrin.h
 @@ -26,6 +26,8 @@

  #include <ia32intrin.h>

 +#ifndef __iamcu__
 +
  #include <mmintrin.h>

  #include <xmmintrin.h>
 @@ -86,4 +88,7 @@
  #include <xsavecintrin.h>

  #include <mwaitxintrin.h>
 +
 +#endif /* __iamcu__ */
 +
  #endif /* _X86INTRIN_H_INCLUDED */
 diff --git a/gcc/testsuite/gcc.target/i386/pr66746.c b/gcc/testsuite/gcc.target/i386/pr66746.c
 new file mode 100644
 index 000..3ef77bf
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/pr66746.c
 @@ -0,0 +1,10 @@
 +/* { dg-do compile { target ia32 } } */
 +/* { dg-options "-O2 -miamcu" } */
 +
 +/* Defining away extern and __inline results in all of them being
 +   compiled as proper functions.  */
 +
 +#define extern
 +#define __inline
 +
 +#include <x86intrin.h>
 --
 2.4.3



Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-03 Thread Martin Liška
On 07/03/2015 10:55 AM, Richard Sandiford wrote:
 Trevor Saunders tbsau...@tbsaunde.org writes:
 On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
 Martin Liška mli...@suse.cz writes:
 diff --git a/gcc/asan.c b/gcc/asan.c
 index e89817e..dabd6f1 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -362,20 +362,20 @@ struct asan_mem_ref
/* Pool allocation new operator.  */
inline void *operator new (size_t)
{
 -    return pool.allocate ();
 +    return ::new (pool.allocate ()) asan_mem_ref ();
}
  
/* Delete operator utilizing pool allocation.  */
inline void operator delete (void *ptr)
{
 -pool.remove ((asan_mem_ref *) ptr);
 +pool.remove (ptr);
}
  
/* Memory allocation pool.  */
 -  static pool_allocator<asan_mem_ref> pool;
 +  static pool_allocator pool;
  };

 I'm probably going over old ground/wounds, sorry, but what's the benefit
 of having this sort of pattern?  Why not simply have object_allocators
 and make callers use pool.allocate () and pool.remove (x) (with pool.remove
 calling the destructor) instead of new and delete?  It feels wrong to me
 to tie the data type to a particular allocation object like this.

 Well the big question is what does allocate() do about construction?  if
 it seems weird for it to not call the ctor, but I'm not sure we can do a
 good job of forwarding args to allocate() with C++98.
 
 If you need non-default constructors then:
 
   new (pool) type (aaa, bbb)...;
 
 doesn't seem too bad.  I agree object_allocator's allocate () should call
 the constructor.

Hello.

I do not insist on having new/delete operators for the aforementioned class.
However, I don't know of a different approach that would construct the object
in the allocate method without utilizing placement new.
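A minimal sketch of an allocator whose allocate () constructs the object and whose remove () destroys it, as discussed above. The class here is a hypothetical free-list pool, not GCC's actual pool_allocator/object_allocator from alloc-pool.h. Since C++98 cannot forward arbitrary constructor arguments, it only default-constructs; callers needing other constructors would still need the `new (pool) type (aaa, bbb)` style quoted above.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical object_allocator: hands out slots big enough for one T.
// allocate () default-constructs a T in the slot with placement new;
// remove () runs ~T () and puts the slot on a free list for reuse.
template <typename T>
class object_allocator
{
public:
  object_allocator () {}

  // Frees the raw storage without running destructors of objects that
  // are still live, much like clearing a pool wholesale.
  ~object_allocator ()
  {
    for (std::size_t i = 0; i < blocks_.size (); ++i)
      ::operator delete (blocks_[i]);
  }

  T *allocate ()
  {
    void *slot;
    if (!free_.empty ())
      {
        slot = free_.back ();
        free_.pop_back ();
      }
    else
      {
        slot = ::operator new (sizeof (T));
        blocks_.push_back (slot);
      }
    return ::new (slot) T ();  // construction happens here
  }

  void remove (T *obj)
  {
    obj->~T ();                // destruction happens here
    free_.push_back (obj);
  }

private:
  // Non-copyable (C++98 idiom: declared, never defined).
  object_allocator (const object_allocator &);
  object_allocator &operator= (const object_allocator &);

  std::vector<void *> blocks_;  // every slot ever obtained
  std::vector<void *> free_;    // slots available for reuse
};
```

With this shape, callers write `pool.allocate ()` / `pool.remove (p)` directly and the type itself needs no class-specific operator new/delete tied to one static pool.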

 
 However it seems kind of weird the operator new here is calling the
 placement new on the object it allocates.
 
 Yeah.
 
 And using the pool allocator functions directly has the nice property
 that you can tell when a delete/remove isn't necessary because the pool
 itself is being cleared.

 Well, all these cases involve a pool with static storage lifetime, right?
 So actually if you don't delete things in these pools they are
 effectively leaked.
 
 They might have a static storage lifetime now, but it doesn't seem like
 a good idea to hard-bake that into the interface (by saying that for
 these types you should use new and delete, but for other pool-allocated
 types you should use object_allocators).  Maybe I just have bad memories
 from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot
 of state that was obviously static in the old days, but that needed
 to become non-static to support vaguely-efficient switching between
 different subtargets.  The same kind of thing is likely to happen again.
 I assume things like the jit would prefer not to have new global state
 with load-time construction.

I agree that it's global state. But even before my transformation, the code
used static variables, which are a similar problem from, e.g., the JIT
perspective. The best approach would be to encapsulate these static allocators
in a class (a pass?). That's quite a lot of work.

Thanks,
Martin

 
 Thanks,
 Richard
 



Re: [Patch SRA] Fix PR66119 by calling get_move_ratio in SRA

2015-07-03 Thread Richard Biener
On Tue, 30 Jun 2015, James Greenhalgh wrote:

 
 On Fri, Jun 26, 2015 at 06:10:00PM +0100, Jakub Jelinek wrote:
  On Fri, Jun 26, 2015 at 06:03:34PM +0100, James Greenhalgh wrote:
   --- /dev/null
   +++ b/gcc/testsuite/g++.dg/pr66119.C
 
  I think generally testcases shouldn't be added into g++.dg/ directly,
  but subdirectories.  So g++.dg/opt/ ?
 
   @@ -0,0 +1,69 @@
   +/* PR66119 - MOVE_RATIO is not constant in a compiler run, so Scalar
   +   Reduction of Aggregates must ask the back-end more than once what
   +   the value of MOVE_RATIO now is.  */
   +
   +/* { dg-do compile  { target i?86-*-* x86_64-*-* } }  */
 
  In g++.dg/, dejagnu cycles through all 3 major -std=c* versions,
  thus using -std=c++11 is inappropriate.
  If the test requires c++11, instead you do
  // { dg-do compile { target { { i?86-*-* x86_64-*-* } && c++11 } } }
 
   +/* { dg-options "-std=c++11 -O3 -mavx -fdump-tree-sra -march=slm" { target avx_runtime } } */
 
  and remove -std=c++11 here.  I don't see any point in guarding it with
  avx_runtime, after all, if not avx_runtime, the test will be compiled with
  -O0 and thus very likely fail the scan-tree-dump test.
 
  As it is dg-do compile test only, you have no dependency on assembler nor
  linker nor runtime.
  But I'd add -mtune=slm too.
 
 Thanks, I'm used to the dance we try to do to get Neon enabled/disabled
 correctly when testing multilib environments on ARM so tried to
 overengineer things!
 
 I've updated the testcase as you suggested, and moved it to g++.dg/opt.
 
 OK?

Looks good to me now.

Thanks,
Richard.

 Thanks,
 James
 
 ---
 gcc/
 
 2015-06-30  James Greenhalgh  james.greenha...@arm.com
 
   PR tree-optimization/66119
   * toplev.c (process_options): Don't set up default values for
   the sra_max_scalarization_size_{speed,size} parameters.
   * tree-sra.c (analyze_all_variable_accesses): If no values
   have been set for the sra_max_scalarization_size_{speed,size}
   parameters, call get_move_ratio to get target defaults.
 
 gcc/testsuite/
 
 2015-06-30  James Greenhalgh  james.greenha...@arm.com
 
   * g++.dg/opt/pr66119.C: New.
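The idea behind the fix can be modelled in a few lines. All names below (target_state, move_ratio_for, sra_param, max_scalarization_size) are hypothetical, not GCC's: a parameter the user did not set is resolved by asking the target at the point of use, so a change of the move ratio during the run is seen, whereas a default computed once in process_options would freeze whatever was in effect at startup. Scaling the ratio by a word size is an assumption of this sketch.

```cpp
// Hypothetical model of a target whose move ratio may change during a run.
struct target_state
{
  unsigned move_ratio_speed;
  unsigned move_ratio_size;
};

// Hypothetical stand-in for asking the back end (the role get_move_ratio
// plays in the patch).
static unsigned
move_ratio_for (const target_state &t, bool speed)
{
  return speed ? t.move_ratio_speed : t.move_ratio_size;
}

// A user-settable limit; a negative value means "not given by the user".
struct sra_param
{
  long value;
  bool is_set () const { return value >= 0; }
};

// Resolve the limit each time it is needed instead of caching a default
// once during option processing.
static unsigned long
max_scalarization_size (const sra_param &p, const target_state &t,
                        bool speed, unsigned units_per_word)
{
  if (p.is_set ())
    return (unsigned long) p.value;
  return (unsigned long) move_ratio_for (t, speed) * units_per_word;
}
```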
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [Ping, Patch, Fortran, PR58586, v5] ICE with derived type with allocatable component passed by value

2015-07-03 Thread Andre Vehreschild
Ping!

Version increment only to reflect rebasing on current trunk.

Bootstraps and regtests fine on x86_64-linux-gnu/f21.

I am tempted to follow Paul's method of setting a deadline for objections: if
there are none, I will commit the patch next Friday (just kidding). I am more
interested in a review. The patch has lived in my code base for several months
and is used to compile rather sophisticated Fortran code without issues. So I
expect no big trouble on trunk, given that the patch addresses a rather rarely
(;-)) used construct.

Ok for trunk?

Regards,
Andre

On Tue, 19 May 2015 16:01:37 +0200
Andre Vehreschild ve...@gmx.de wrote:

 Hi,
 
 attached is the most recent version of the patch for 58586. It adapts to
 recent trunk and addresses the caveats so far, i.e. the testcases in the
 comments now compile and run again w/o errors.
 
 Bootstraps and regtests fine on x86_64-linux-gnu/f21.
 
 Comments?
 
 - Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr58586_5.clog
Description: Binary data
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index efafabc..d16bf13 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -14083,10 +14083,15 @@ resolve_symbol (gfc_symbol *sym)
 
   if ((!a->save && !a->dummy && !a->pointer
 	   && !a->in_common && !a->use_assoc
-	   && (a->referenced || a->result)
-	   && !(a->function && sym != sym->result))
+	   && !a->result && !a->function)
 	  || (a->dummy && a->intent == INTENT_OUT && !a->pointer))
 	apply_default_init (sym);
+  else if (a->function && sym->result && a->access != ACCESS_PRIVATE
+	       && (sym->ts.u.derived->attr.alloc_comp
+		   || sym->ts.u.derived->attr.pointer_comp))
+	/* Mark the result symbol to be referenced, when it has allocatable
+	   components.  */
+	sym->result->attr.referenced = 1;
 }
 
   if (sym->ts.type == BT_CLASS && sym->ns == gfc_current_ns
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index b4f75ba..aec2018 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -5885,9 +5885,33 @@ gfc_generate_function_code (gfc_namespace * ns)
   tmp = gfc_trans_code (ns->code);
   gfc_add_expr_to_block (&body, tmp);
 
-  if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node)
+  if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node
+  || (sym->result && sym->result != sym
+	  && sym->result->ts.type == BT_DERIVED
+	  && sym->result->ts.u.derived->attr.alloc_comp))
 {
+  bool artificial_result_decl = false;
   tree result = get_proc_result (sym);
+  gfc_symbol *rsym = sym == sym->result ? sym : sym->result;
+
+  /* Make sure that a function returning an object with
+	 alloc/pointer_components always has a result, where at least
+	 the allocatable/pointer components are set to zero.  */
+  if (result == NULL_TREE && sym->attr.function
+	  && ((sym->result->ts.type == BT_DERIVED
+	       && (sym->attr.allocatable
+		   || sym->attr.pointer
+		   || sym->result->ts.u.derived->attr.alloc_comp
+		   || sym->result->ts.u.derived->attr.pointer_comp))
+	      || (sym->result->ts.type == BT_CLASS
+		  && (CLASS_DATA (sym)->attr.allocatable
+		      || CLASS_DATA (sym)->attr.class_pointer
+		      || CLASS_DATA (sym->result)->attr.alloc_comp
+		      || CLASS_DATA (sym->result)->attr.pointer_comp))))
+	{
+	  artificial_result_decl = true;
+	  result = gfc_get_fake_result_decl (sym, 0);
+	}
 
   if (result != NULL_TREE && sym->attr.function && !sym->attr.pointer)
 	{
@@ -5907,16 +5931,30 @@ gfc_generate_function_code (gfc_namespace * ns)
 			null_pointer_node));
 	}
 	  else if (sym->ts.type == BT_DERIVED
-		   && sym->ts.u.derived->attr.alloc_comp
 		   && !sym->attr.allocatable)
 	{
-	  rank = sym->as ? sym->as->rank : 0;
-	  tmp = gfc_nullify_alloc_comp (sym->ts.u.derived, result, rank);
-	  gfc_add_expr_to_block (&init, tmp);
+	  gfc_expr *init_exp;
+	  /* Arrays are not initialized using the default initializer of
+		 their elements.  Therefore only check if a default
+		 initializer is available when the result is scalar.  */
+	  init_exp = rsym->as ? NULL : gfc_default_initializer (&rsym->ts);
+	  if (init_exp)
+		{
+		  tmp = gfc_trans_structure_assign (result, init_exp, 0);
+		  gfc_free_expr (init_exp);
+		  gfc_add_expr_to_block (&init, tmp);
+		}
+	  else if (rsym->ts.u.derived->attr.alloc_comp)
+		{
+		  rank = rsym->as ? rsym->as->rank : 0;
+		  tmp = gfc_nullify_alloc_comp (rsym->ts.u.derived, result,
+		rank);
+		  gfc_prepend_expr_to_block (&body, tmp);
+		}
 	}
 	}
 
-  if (result == NULL_TREE)
+  if (result == NULL_TREE || artificial_result_decl)
 	{
 	  /* TODO: move to the appropriate place in resolve.c.  */
 	  if (warn_return_type && sym == sym->result)
@@ -5926,7 +5964,7 @@ gfc_generate_function_code (gfc_namespace * ns)
 	  if (warn_return_type)
 	TREE_NO_WARNING(sym->backend_decl) = 1;
 	}
-  else
+  if (result != NULL_TREE)
 	gfc_add_expr_to_block (&body, gfc_generate_return ());
 }
 
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 7747a67..195f7a4 100644

[PATCH][match-and-simplify] Properly canonicalize operand order for sub-expressions

2015-07-03 Thread Richard Biener

I observed that we fail to match patterns because, when valueizing
sub-expression operands, we fail to re-canonicalize the operand order
and thus try matching (1 + a) - 1 instead of the canonical
(a + 1) - 1.  The following fixes this, at least for commutative
tree codes.  For comparisons, which we also canonicalize in the
plumbing (by changing the comparison code via
swap_tree_comparison), this isn't as easily done.  I'm thinking
about a proper solution there.
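To see why the re-canonicalization matters, here is a toy matcher (hypothetical types and names, not genmatch's generated code): after valueization turns an operand into a constant, a commutative node such as `1 + a` must be put back into canonical order (`a + 1`) before a pattern written against the canonical form can match. The swap rule used here, "constants go second", is only the first of the rules tree_swap_operands_p applies.

```cpp
#include <string>
#include <utility>

enum op_code { OP_PLUS, OP_MINUS, OP_MULT };

// Whether swapping the two operands preserves meaning.
static bool
is_commutative (op_code c)
{
  return c == OP_PLUS || c == OP_MULT;
}

// Toy operand: either a named variable or an integer constant.
struct operand
{
  bool is_const;
  long cst;          // meaningful when is_const
  std::string name;  // meaningful otherwise
};

struct binary_expr
{
  op_code code;
  operand op0, op1;
};

// Simplified stand-in for tree_swap_operands_p: put constants second.
static bool
should_swap (const operand &a, const operand &b)
{
  return a.is_const && !b.is_const;
}

// Re-canonicalize after valueization may have changed the operands.
static void
canonicalize (binary_expr &e)
{
  if (is_commutative (e.code) && should_swap (e.op0, e.op1))
    std::swap (e.op0, e.op1);
}

// Matches the inner "X + 1" of (a + 1) - 1, with the constant second.
static bool
matches_x_plus_1 (const binary_expr &e)
{
  return e.code == OP_PLUS && !e.op0.is_const
         && e.op1.is_const && e.op1.cst == 1;
}
```

Non-commutative codes such as OP_MINUS are left untouched, which mirrors why the patch restricts the swap to commutative_tree_code and commutative_ternary_tree_code.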

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-07-03  Richard Biener  rguent...@suse.de

* genmatch.c (commutative_tree_code, commutative_ternary_tree_code):
Copy from tree.c
(dt_operand::gen_gimple_expr): After valueizing operands
re-canonicalize operand order for commutative tree codes.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 225368)
+++ gcc/genmatch.c  (working copy)
@@ -175,6 +175,62 @@ END_BUILTINS
 };
 #undef DEF_BUILTIN
 
+/* Return true if CODE represents a commutative tree code.  Otherwise
+   return false.  */
+bool
+commutative_tree_code (enum tree_code code)
+{
+  switch (code)
+{
+case PLUS_EXPR:
+case MULT_EXPR:
+case MULT_HIGHPART_EXPR:
+case MIN_EXPR:
+case MAX_EXPR:
+case BIT_IOR_EXPR:
+case BIT_XOR_EXPR:
+case BIT_AND_EXPR:
+case NE_EXPR:
+case EQ_EXPR:
+case UNORDERED_EXPR:
+case ORDERED_EXPR:
+case UNEQ_EXPR:
+case LTGT_EXPR:
+case TRUTH_AND_EXPR:
+case TRUTH_XOR_EXPR:
+case TRUTH_OR_EXPR:
+case WIDEN_MULT_EXPR:
+case VEC_WIDEN_MULT_HI_EXPR:
+case VEC_WIDEN_MULT_LO_EXPR:
+case VEC_WIDEN_MULT_EVEN_EXPR:
+case VEC_WIDEN_MULT_ODD_EXPR:
+  return true;
+
+default:
+  break;
+}
+  return false;
+}
+
+/* Return true if CODE represents a ternary tree code for which the
+   first two operands are commutative.  Otherwise return false.  */
+bool
+commutative_ternary_tree_code (enum tree_code code)
+{
+  switch (code)
+{
+case WIDEN_MULT_PLUS_EXPR:
+case WIDEN_MULT_MINUS_EXPR:
+case DOT_PROD_EXPR:
+case FMA_EXPR:
+  return true;
+
+default:
+  break;
+}
+  return false;
+}
+
 
 /* Base class for all identifiers the parser knows.  */
 
@@ -1996,6 +2052,25 @@ dt_operand::gen_gimple_expr (FILE *f)
   child_opname, child_opname);
   fprintf (f, "{\n");
 }
+  /* While the toplevel operands are canonicalized by the caller
+ after valueizing operands of sub-expressions we have to
+ re-canonicalize operand order.  */
+  if (operator_id *code = dyn_cast <operator_id *> (id))
+{
+  /* ???  We can't canonicalize tcc_comparison operands here
+ because that requires changing the comparison code which
+we already matched...  */
+  if (commutative_tree_code (code->code)
+ || commutative_ternary_tree_code (code->code))
+   {
+ char child_opname0[20], child_opname1[20];
+ gen_opname (child_opname0, 0);
+ gen_opname (child_opname1, 1);
+ fprintf (f, "if (tree_swap_operands_p (%s, %s, false))\n"
+	   "std::swap (%s, %s);\n", child_opname0, child_opname1,
+  child_opname0, child_opname1);
+   }
+}
 
   return n_ops;
 }


Re: [PATCH 2/2] Add leon3r0 and leon3r0v7 CPU targets

2015-07-03 Thread Eric Botcazou
 Thank you for the patch in your other mail that changes this!

You're welcome.

 We were also thinking of the instruction timing information found in the
 leon_costs and leon3_costs. We took a look at the values in leon_costs
 and they seem to fit well with the UT699, except for division. We got a
 bit unsure as to what leon system they are based on, as the division
 cost was wrong also for the AT697F, which is the most common leon2
 system. Would it be ok to update the division cost values of leon_costs
 so that they match UT699 and AT697F?

Sure.

 In general, depending on how one instantiates a leon system and which FPU
 is selected, you will get different timing. Is there a recommended way
 of adding support for this without adding additional CPU targets?
 We are considering adding support for GRFPU-lite, which only differs in
 the timing.

One could add a -mtune-fpu switch.  Did you look at other architectures in the 
GCC tree that would have similar requirements?

-- 
Eric Botcazou


Fix PR52482, libitm compilation in OSX ppc with old cctools

2015-07-03 Thread Carlos Sánchez de La Lama
Hi all,

PR52482 seems to be caused by old gas not supporting named parameters in
macros. The gas version in Xcode 2.5 (the last one available for OSX PPC) is 1.38.

Patch is against gcc-4.8.4, but affected lines have not changed in SVN HEAD.

BR

Carlos

diff -ur gcc-4.8.4.old/libitm/config/powerpc/sjlj.S gcc-4.8.4/libitm/config/powerpc/sjlj.S
--- gcc-4.8.4.old/libitm/config/powerpc/sjlj.S	2014-04-04 16:17:55.0 +0200
+++ gcc-4.8.4/libitm/config/powerpc/sjlj.S	2015-07-03 11:34:23.0 +0200
@@ -83,16 +83,16 @@
 	bl	\name
 .endm
 #elif defined(_CALL_DARWIN)
-.macro FUNC name
+.macro FUNC
 	.globl	_$0
 _$0:
 .endmacro
-.macro END name
+.macro END
 .endmacro
-.macro HIDDEN name
+.macro HIDDEN
 	.private_extern _$0
 .endmacro
-.macro CALL name
+.macro CALL
 	bl	_$0
 .endmacro
 # ifdef __ppc64__

-- 
'Whoever has the power in society determines what can be studied, determines
what can be observed, determines what can be thought.'

Michael Crichton, Micro (2011)


C++ PATCH for c++/66748 (ICE with abi_tag on enum)

2015-07-03 Thread Marek Polacek
The following testcase was breaking because we were trying to
access TYPE_LANG_SPECIFIC via CLASSTYPE_TEMPLATE_* macros without
first checking that we indeed have a CLASS_TYPE.
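The bug pattern is general: a kind-specific accessor is applied to a node without first checking its kind. Below is a toy model with hypothetical types (not GCC's tree, TYPE_LANG_SPECIFIC or the CLASSTYPE_* macros) in which enums carry no class-specific data, so the class-only query must be guarded by a kind test, just as the fix guards CLASSTYPE_TEMPLATE_INSTANTIATION with CLASS_TYPE_P.

```cpp
enum type_kind { KIND_CLASS, KIND_ENUM };

// Data that only class types carry in this model.
struct class_info
{
  bool template_instantiation;
  bool template_specialization;
};

struct type_node
{
  type_kind kind;
  class_info *lang_specific;  // non-null only for KIND_CLASS here
};

static bool
class_type_p (const type_node *t)
{
  return t->kind == KIND_CLASS;
}

// Unsafe variant, for contrast: dereferences lang_specific for any kind.
//   return t->lang_specific->template_instantiation;

// Safe: check the kind first, mirroring CLASS_TYPE_P (*node) && ...
static bool
template_instantiation_p (const type_node *t)
{
  return class_type_p (t) && t->lang_specific->template_instantiation;
}

static bool
template_specialization_p (const type_node *t)
{
  return class_type_p (t) && t->lang_specific->template_specialization;
}
```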

Bootstrapped/regtested on x86_64-linux, ok for trunk/5/4.9?

2015-07-03  Marek Polacek  pola...@redhat.com

PR c++/66748
* tree.c (handle_abi_tag_attribute): Check for CLASS_TYPE_P before
accessing TYPE_LANG_SPECIFIC node.

* g++.dg/abi/abi-tag15.C: New test.

diff --git gcc/cp/tree.c gcc/cp/tree.c
index 0d1112c..22d5b3a 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -3654,13 +3654,15 @@ handle_abi_tag_attribute (tree* node, tree name, tree args,
 name, *node);
  goto fail;
}
-  else if (CLASSTYPE_TEMPLATE_INSTANTIATION (*node))
+  else if (CLASS_TYPE_P (*node)
+	   && CLASSTYPE_TEMPLATE_INSTANTIATION (*node))
{
   warning (OPT_Wattributes, "ignoring %qE attribute applied to "
    "template instantiation %qT", name, *node);
  goto fail;
}
-  else if (CLASSTYPE_TEMPLATE_SPECIALIZATION (*node))
+  else if (CLASS_TYPE_P (*node)
+	   && CLASSTYPE_TEMPLATE_SPECIALIZATION (*node))
{
   warning (OPT_Wattributes, "ignoring %qE attribute applied to "
    "template specialization %qT", name, *node);
diff --git gcc/testsuite/g++.dg/abi/abi-tag15.C gcc/testsuite/g++.dg/abi/abi-tag15.C
index e69de29..bfda3a2 100644
--- gcc/testsuite/g++.dg/abi/abi-tag15.C
+++ gcc/testsuite/g++.dg/abi/abi-tag15.C
@@ -0,0 +1,3 @@
+// PR c++/66748
+
+enum __attribute__((abi_tag("foo"))) E {}; // { dg-error "redeclaration of" }

Marek


[v3 PATCH] Implement Fundamentals v2 propagate_const

2015-07-03 Thread Ville Voutilainen
Tested on Linux-PPC64. Patch gzipped to avoid polluting people's
mailboxes with a 45k patch.

2015-07-03  Ville Voutilainen  ville.voutilai...@gmail.com
Implement std::experimental::fundamentals_v2::propagate_const.
* include/Makefile.am: Add propagate_const.
* include/Makefile.in: Add propagate_const.
* include/experimental/propagate_const: New.
* testsuite/experimental/propagate_const/assignment/copy.cc: Likewise.
* testsuite/experimental/propagate_const/assignment/move.cc: Likewise.
* testsuite/experimental/propagate_const/assignment/move_neg.cc:
Likewise.
* testsuite/experimental/propagate_const/cons/copy.cc: Likewise.
* testsuite/experimental/propagate_const/cons/default.cc: Likewise.
* testsuite/experimental/propagate_const/cons/move.cc: Likewise.
* testsuite/experimental/propagate_const/cons/move_neg.cc: Likewise.
* testsuite/experimental/propagate_const/hash/1.cc: Likewise.
* testsuite/experimental/propagate_const/observers/1.cc: Likewise.
* testsuite/experimental/propagate_const/relops/1.cc: Likewise.
* testsuite/experimental/propagate_const/requirements1.cc: Likewise.
* testsuite/experimental/propagate_const/requirements2.cc: Likewise.
* testsuite/experimental/propagate_const/requirements3.cc: Likewise.
* testsuite/experimental/propagate_const/requirements4.cc: Likewise.
* testsuite/experimental/propagate_const/requirements5.cc: Likewise.
* testsuite/experimental/propagate_const/swap/1.cc: Likewise.
* testsuite/experimental/propagate_const/typedefs.cc: Likewise.


propagate_const.diff.gz
Description: GNU Zip compressed data
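For readers unfamiliar with the component: propagate_const wraps a pointer-like object so that, when the wrapper is accessed through a const path, the pointee is treated as const too. The class below is a deliberately simplified, hypothetical deep_const_ptr showing only that core idea for raw pointers; it is not the libstdc++ implementation, which also covers smart pointers, conversions, comparisons, hashing and swap, as the test names above suggest.

```cpp
// Simplified illustration of const propagation through a raw pointer.
template <typename T>
class deep_const_ptr
{
public:
  explicit deep_const_ptr (T *p) : p_ (p) {}

  // Non-const access path: the pointee is mutable.
  T *operator-> () { return p_; }
  T &operator* () { return *p_; }

  // Const access path: the pointee is treated as const as well.
  const T *operator-> () const { return p_; }
  const T &operator* () const { return *p_; }

private:
  T *p_;
};
```

With a plain `T *` member, a const member function of the owning class could still call non-const functions on the pointee; a wrapper like this closes that gap.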


Re: [PATCH 2/2] Add leon3r0 and leon3r0v7 CPU targets

2015-07-03 Thread Daniel Cederman

One could add a -mtune-fpu switch.  Did you look at other architectures in the
GCC tree that would have similar requirements?



Thank you for the suggestion about adding a -mtune-fpu switch. I have 
not yet looked at the other architectures, but will do so before proceeding.


--
Daniel Cederman


[PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC

2015-07-03 Thread Daniel Cederman
__builtin_apply* and __builtin_return access the floating point registers on
SPARC even when compiling with -msoft-float.
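A standalone model of the predicate change, for illustration only. In GCC's SPARC register numbering, register 8 is %o0 (the integer return register) and register 32 is %f0 (the first floating-point register). With this patch, %f0 only counts as a function-value register when an FPU is available. The saved_value_regs helper is hypothetical. It lists the registers a result block would have to save under each configuration.

```cpp
#include <vector>

// Mirrors the patched predicate: %o0 (8) always, %f0 (32) only with an FPU.
static bool
function_value_regno_p (unsigned regno, bool target_fpu)
{
  return regno == 8 || (target_fpu && regno == 32);
}

// Hypothetical helper: the registers a result block would need to save.
static std::vector<unsigned>
saved_value_regs (bool target_fpu)
{
  std::vector<unsigned> regs;
  for (unsigned regno = 0; regno < 64; ++regno)
    if (function_value_regno_p (regno, target_fpu))
      regs.push_back (regno);
  return regs;
}
```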

gcc/ChangeLog:

2015-06-26  Daniel Cederman  ceder...@gaisler.com

* config/sparc/sparc.c (sparc_function_value_regno_p): Floating
  point registers cannot be used when compiling for a target
  without FPU.
* config/sparc/sparc.md: A function cannot return a value in a
  floating point register when compiled without floating point
  support.
---
 gcc/config/sparc/sparc.c  |  2 +-
 gcc/config/sparc/sparc.md | 26 --
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 2556eec..e0d40a5 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -7403,7 +7403,7 @@ sparc_libcall_value (machine_mode mode,
 static bool
 sparc_function_value_regno_p (const unsigned int regno)
 {
-  return (regno == 8 || regno == 32);
+  return (regno == 8 || (TARGET_FPU && regno == 32));
 }
 
 /* Do what is necessary for `va_start'.  We look at the current function
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index a561877..c296913 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -6398,7 +6398,6 @@
   
 {
   rtx valreg1 = gen_rtx_REG (DImode, 8);
-  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
   rtx result = operands[1];
 
   /* Pass constm1 to indicate that it may expect a structure value, but
@@ -6407,8 +6406,12 @@
 
   /* Save the function value registers.  */
   emit_move_insn (adjust_address (result, DImode, 0), valreg1);
-  emit_move_insn (adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8),
- valreg2);
+  if (TARGET_FPU)
+{
+  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
+  emit_move_insn (adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8),
+  valreg2);
+}
 
   /* The optimizer does not know that the call sets the function value
  registers we stored in the result block.  We avoid problems by
@@ -6620,7 +6623,6 @@
   
 {
   rtx valreg1 = gen_rtx_REG (DImode, 24);
-  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
   rtx result = operands[0];
 
   if (! TARGET_ARCH64)
@@ -6637,14 +6639,18 @@
   emit_insn (gen_update_return (rtnreg, value));
 }
 
-  /* Reload the function value registers.  */
+  /* Reload the function value registers.
+ Put USE insns before the return.  */
   emit_move_insn (valreg1, adjust_address (result, DImode, 0));
-  emit_move_insn (valreg2,
- adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
-
-  /* Put USE insns before the return.  */
   emit_use (valreg1);
-  emit_use (valreg2);
+
+  if (TARGET_FPU)
+{
+  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
+  emit_move_insn (valreg2,
+  adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
+  emit_use (valreg2);
+}
 
   /* Construct the return.  */
   expand_naked_return ();
-- 
2.4.3



[PATCH] Update instruction cost for LEON

2015-07-03 Thread Daniel Cederman
gcc/ChangeLog:

2015-07-03  Daniel Cederman  ceder...@gaisler.com

* config/sparc/sparc.c (struct processor_costs): Set div cost
for leon to match UT699 and AT697F. Set mul cost for leon3 to
match standard leon3.
---
 gcc/config/sparc/sparc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index e0d40a5..54341c5 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -251,8 +251,8 @@ struct processor_costs leon_costs = {
   COSTS_N_INSNS (5), /* imul */
   COSTS_N_INSNS (5), /* imulX */
   0, /* imul bit factor */
-  COSTS_N_INSNS (5), /* idiv */
-  COSTS_N_INSNS (5), /* idivX */
+  COSTS_N_INSNS (35), /* idiv */
+  COSTS_N_INSNS (35), /* idivX */
   COSTS_N_INSNS (1), /* movcc/movr */
   0, /* shift penalty */
 };
@@ -272,8 +272,8 @@ struct processor_costs leon3_costs = {
   COSTS_N_INSNS (15), /* fdivd */
   COSTS_N_INSNS (22), /* fsqrts */
   COSTS_N_INSNS (23), /* fsqrtd */
-  COSTS_N_INSNS (5), /* imul */
-  COSTS_N_INSNS (5), /* imulX */
+  COSTS_N_INSNS (1), /* imul */
+  COSTS_N_INSNS (1), /* imulX */
   0, /* imul bit factor */
   COSTS_N_INSNS (35), /* idiv */
   COSTS_N_INSNS (35), /* idivX */
-- 
2.4.3
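To put the new numbers in perspective: in GCC, `COSTS_N_INSNS (N)` expands to `(N) * 4`, so the table entries are relative weights. The sketch below uses a local copy of that macro and a hypothetical two-field table. It computes how many multiplies one division is worth under the old and new leon/leon3 values from the patch.

```cpp
// Local copy of GCC's definition (rtl.h).
#define COSTS_N_INSNS(N) ((N) * 4)

// Hypothetical subset of struct processor_costs.
struct int_costs
{
  int imul;
  int idiv;
};

// Values before and after the patch above.
static const int_costs leon_old  = { COSTS_N_INSNS (5), COSTS_N_INSNS (5) };
static const int_costs leon_new  = { COSTS_N_INSNS (5), COSTS_N_INSNS (35) };
static const int_costs leon3_old = { COSTS_N_INSNS (5), COSTS_N_INSNS (35) };
static const int_costs leon3_new = { COSTS_N_INSNS (1), COSTS_N_INSNS (35) };

// How many multiplies one division is worth under a given table.
static int
div_to_mul_ratio (const int_costs &c)
{
  return c.idiv / c.imul;
}
```

Roughly speaking, a higher ratio makes it more attractive for the compiler to expand division by a constant into a multiply-and-shift sequence instead of emitting a divide instruction.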



[PATCH] save takes a single integer (register or 13-bit signed immediate)

2015-07-03 Thread Daniel Cederman
This removes a warning about operand 0 missing a mode.
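The `rI` constraint in this pattern accepts either a register (`r`) or a constant that fits SPARC's 13-bit signed immediate field (`I`), i.e. -4096 to 4095. A minimal range check, written as a hypothetical standalone function rather than GCC's own SPARC_SIMM13_P macro:

```cpp
// True if V fits the 13-bit signed immediate field: -2^12 .. 2^12 - 1.
static bool
fits_simm13 (long long v)
{
  return v >= -4096 && v <= 4095;
}
```

A frame adjustment outside this range cannot be encoded as the immediate operand of `save` and would need to be placed in a register first.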

gcc/ChangeLog:

2015-06-26  Daniel Cederman  ceder...@gaisler.com

* config/sparc/sparc.md: Window save takes a single integer
---
 gcc/config/sparc/sparc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index c296913..66f7306 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -6490,7 +6490,7 @@
 
 (define_insn "window_save"
   [(unspec_volatile
-   [(match_operand 0 "arith_operand" "rI")]
+   [(match_operand:SI 0 "arith_operand" "rI")]
UNSPECV_SAVEW)]
   "!TARGET_FLAT"
   "save\t%%sp, %0, %%sp"
-- 
2.4.3