date:20151126

Re: [PATCH 2/6] Fix memory leak in tree-ssa

2015-11-26 Thread Martin Liška


On 11/23/2015 02:48 PM, marxin wrote:

gcc/ChangeLog:

2015-11-20  Martin Liska  

* tree-ssa.c (redirect_edge_var_map_destroy): Release
vectors that are used as a second argument of a hash_map.
---
  gcc/tree-ssa.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index 02fca4c..db7d065 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -121,6 +121,11 @@ redirect_edge_var_map_vector (edge e)
  void
  redirect_edge_var_map_destroy (void)
  {
+  if (edge_var_maps)
+for (hash_map::iterator it =
+edge_var_maps->begin (); it != edge_var_maps->end (); ++it)
+  (*it).second.release ();
+
delete edge_var_maps;
edge_var_maps = NULL;
  }



Hi.

Trevor has fixed the behavior of hash_map so that it now releases both the
key and the value, so this patch is no longer needed.

Martin
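
To illustrate why the explicit release loop becomes redundant once the
container destroys its values, here is a minimal sketch.  It is only a toy:
it uses std::unordered_map rather than GCC's hash_map, the allocation is
simulated by a counter, and raw_vec, owning_vec and live_after_map_destroyed
are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <unordered_map>

// Number of simulated buffers currently "allocated".
static int live_buffers = 0;

// Hypothetical vec-like type: it has no destructor, so the buffer is only
// given back when release () is called explicitly.
struct raw_vec
{
  bool allocated = false;

  void create ()
  {
    if (!allocated)
      {
        allocated = true;
        ++live_buffers;
      }
  }

  void release ()
  {
    if (allocated)
      {
        allocated = false;
        --live_buffers;
      }
  }
};

// Hypothetical auto_vec-like type: the destructor calls release ().
struct owning_vec : raw_vec
{
  owning_vec () = default;
  owning_vec (const owning_vec &) = delete;
  owning_vec &operator= (const owning_vec &) = delete;
  ~owning_vec () { release (); }
};

// Fill a map with two values, destroy it, and report how many simulated
// buffers are still live afterwards.
template <typename Value>
int live_after_map_destroyed ()
{
  live_buffers = 0;
  {
    std::unordered_map<int, Value> map;
    map[1].create ();
    map[2].create ();
  }
  return live_buffers;
}
```

With raw_vec values the two buffers outlive the map, which is the situation
the original patch worked around by hand; with owning_vec values nothing
is left over once the map destroys its elements.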

Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c

2015-11-26 Thread Bernd Schmidt


On 11/26/2015 09:53 PM, Martin Liška wrote:

Is the patch still a candidate for merging in the current stage3, or should I
leave it for the next stage1?
What about the first patch, or the patch where I only applied the
whitespace replacement?


As I said previously, the one to just replace whitespace is ok for now. 
Please ping the other one when stage1 opens (I expect it'll need changes 
by then).



Bernd

Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c

2015-11-26 Thread Martin Liška


On 11/20/2015 12:15 PM, Martin Liška wrote:

On 11/20/2015 03:14 AM, Bernd Schmidt wrote:

BTW, I'm with whoever said absolutely no way to the idea of making automatic 
changes like this as part of a commit hook.

I think the whitespace change can go in if it hasn't already, but I think the 
other one still has enough problems that I'll say - leave it for the next stage 
1.


@@ -178,8 +173,9 @@ warn_uninitialized_vars (bool warn_possibly_uninitialized)

FOR_EACH_BB_FN (bb, cfun)
  {
-  bool always_executed = dominated_by_p (CDI_POST_DOMINATORS,
- single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb);
+  bool always_executed
+= dominated_by_p (CDI_POST_DOMINATORS,
+  single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb);
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
  {


Better to pull the single_succ into its own variable perhaps?


@@ -1057,7 +1039,8 @@ prune_uninit_phi_opnds_in_unrealizable_paths (gphi *phi,
   *visited_flag_phis = BITMAP_ALLOC (NULL);

 if (bitmap_bit_p (*visited_flag_phis,
-SSA_NAME_VERSION (gimple_phi_result (flag_arg_def
+SSA_NAME_VERSION (
+  gimple_phi_result (flag_arg_def
   return false;

 bitmap_set_bit (*visited_flag_phis,


Pull the gimple_phi_result into a separate variable, or just leave it unchanged 
for now, the modified version is too ugly to live.


 bitmap_clear_bit (*visited_flag_phis,
-SSA_NAME_VERSION (gimple_phi_result (flag_arg_def)));
+SSA_NAME_VERSION (
+  gimple_phi_result (flag_arg_def)));
 continue;
   }


Here too.


-  all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths (phi,
- uninit_opnds,
- as_a  (flag_def),
- boundary_cst,
- cmp_code,
- visited_phis,
- _flag_phis);
+  all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths
+(phi, uninit_opnds, as_a (flag_def), boundary_cst, cmp_code,
+ visited_phis, _flag_phis);


I'd rather shorten the name of the function, even if it goes against the spirit 
of the novel writing month.


-  if (gphi *use_phi = dyn_cast  (use_stmt))
-use_bb = gimple_phi_arg_edge (use_phi,
-  PHI_ARG_INDEX_FROM_USE (use_p))->src;
+  if (gphi *use_phi = dyn_cast (use_stmt))
+use_bb
+  = gimple_phi_arg_edge (use_phi, PHI_ARG_INDEX_FROM_USE (use_p))->src;
 else
   use_bb = gimple_bb (use_stmt);


There are some changes of this nature and I'm not sure it's an improvement. 
Leave them out for now?


@@ -2345,8 +2309,8 @@ warn_uninitialized_phi (gphi *phi, vec *worklist,
   }

 /* Now check if we have any use of the value without proper guard.  */
-  uninit_use_stmt = find_uninit_use (phi, uninit_opnds,
- worklist, added_to_worklist);
+  uninit_use_stmt
+= find_uninit_use (phi, uninit_opnds, worklist, added_to_worklist);

 /* All uses are properly guarded.  */
 if (!uninit_use_stmt)


Here too.


@@ -2396,12 +2359,24 @@ public:
 {}

 /* opt_pass methods: */
-  opt_pass * clone () { return new pass_late_warn_uninitialized (m_ctxt); }
-  virtual bool gate (function *) { return gate_warn_uninitialized (); }
+  opt_pass *clone ();
+  virtual bool gate (function *);


This may technically violate our coding standards, but it's consistent with a 
lot of similar cases. Since coding standards are about enforcing consistency, 
I'd drop this change.


   namespace {

   const pass_data pass_data_early_warn_uninitialized =
   {
-  GIMPLE_PASS, /* type */
+  GIMPLE_PASS,   /* type */
 "*early_warn_uninitialized", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
-  TV_TREE_UNINIT, /* tv_id */
-  PROP_ssa, /* properties_required */
-  0, /* properties_provided */
-  0, /* properties_destroyed */
-  0, /* todo_flags_start */
-  0, /* todo_flags_finish */
+  OPTGROUP_NONE,   /* optinfo_flags */
+  TV_TREE_UNINIT,   /* tv_id */
+  PROP_ssa,   /* properties_required */
+  0,   /* properties_provided */
+  0,   /* properties_destroyed */
+  0,   /* todo_flags_start */
+  0,   /* todo_flags_finish */
   };


Likewise. Seems to be done practically nowhere.


   class pass_early_warn_uninitialized : public gimple_opt_pass
@@ -2519,14 +2491,23 @@ public:
 {}

 /* opt_pass methods: */
-  virtual bool gate (function *) { return gate_warn_uninitialized (); }
-  virtual unsigned int execute (function *)
-{
-  return execute_early_warn_uninitialized ();
-}
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *);


Likewise.


Bernd


Hi.

Enhanced patch should cover all notes pointed in the

Re: [PATCH 3/6] Fix memory leaks in IPA devirt

2015-11-26 Thread Martin Liška


On 11/23/2015 11:29 PM, Trevor Saunders wrote:

On Mon, Nov 23, 2015 at 02:48:37PM +0100, marxin wrote:

gcc/ChangeLog:

2015-11-20  Martin Liska  

* ipa-devirt.c (ipa_devirt): Use auto_vec instead
of a local-scope vec. Release final_warning_records.
---
  gcc/ipa-devirt.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index e74f853..6003c92 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -3837,7 +3837,7 @@ ipa_devirt (void)

if (warn_suggest_final_methods)
{
- vec decl_warnings_vec = vNULL;
+ auto_vec decl_warnings_vec;

  final_warning_records->decl_warnings.traverse
 
(_warnings_vec);
@@ -3887,7 +3887,8 @@ ipa_devirt (void)
  decl, count, dyn_count);
}
}
-   
+
+  final_warning_records->type_warnings.release ();
delete (final_warning_records);


You should be able to just make
final_warning_record::type_warnings an auto_vec, right?  That
seems less error-prone, though this is certainly fine for now.

Trev


final_warning_records = 0;
  }
--
2.6.3




Hi.

Here is v2 of the patch, which incorporates the ideas suggested by Trevor.

Ready to be installed?
Thanks,
Martin
From 2362e45abccc28a8fc6ed9ad6cbc69a9bee888c7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 23 Nov 2015 14:48:37 +0100
Subject: [PATCH 2/6] Fix memory leaks in IPA devirt

gcc/ChangeLog:

2015-11-20  Martin Liska  

	* ipa-devirt.c (ipa_devirt): Use auto_vec instead
	of a local-scope vec.
	(struct final_warning_record): Use auto_vec instead
	of vec.
---
 gcc/ipa-devirt.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index e74f853..1539bb9 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -2987,7 +2987,7 @@ struct decl_warn_count
 struct final_warning_record
 {
   gcov_type dyn_count;
-  vec type_warnings;
+  auto_vec type_warnings;
   hash_map decl_warnings;
 };
 struct final_warning_record *final_warning_records;
@@ -3609,7 +3609,6 @@ ipa_devirt (void)
   if (warn_suggest_final_methods || warn_suggest_final_types)
 {
   final_warning_records = new (final_warning_record);
-  final_warning_records->type_warnings = vNULL;
   final_warning_records->type_warnings.safe_grow_cleared (odr_types.length ());
   free_polymorphic_call_targets_hash ();
 }
@@ -3837,7 +3836,7 @@ ipa_devirt (void)
 
   if (warn_suggest_final_methods)
 	{
-	  vec decl_warnings_vec = vNULL;
+	  auto_vec decl_warnings_vec;
 
 	  final_warning_records->decl_warnings.traverse
 	 (_warnings_vec);
@@ -3887,7 +3886,7 @@ ipa_devirt (void)
 			  decl, count, dyn_count);
 	}
 	}
-	
+
   delete (final_warning_records);
   final_warning_records = 0;
 }
-- 
2.6.3

RE: [PATCH] MIPS/GCC/doc: Reorder `-mcompact-branches='

2015-11-26 Thread Maciej W. Rozycki

On Thu, 26 Nov 2015, Moore, Catherine wrote:

> >  OK to apply?
> > 
>  Yes -- thanks.

 Applied, thanks for your review.

  Maciej

fix formatting

2015-11-26 Thread Mike Stump

I checked this in to fix a formatting issue.  != binds more tightly than &&.
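
For reference, a small self-contained check of the precedence rule behind
this formatting fix.  The function names are hypothetical and exist only for
this illustration; the point is that an unparenthesized `a != b && c` groups
as `(a != b) && c`, so the line break belongs before the `&&`.

```cpp
#include <cassert>

// The condition as it would be written without parentheses.
constexpr bool unparenthesized (int a, int b, bool c)
{
  return a != b && c;
}

// The grouping the language actually uses: comparison first.
constexpr bool comparison_first (int a, int b, bool c)
{
  return (a != b) && c;
}

// The grouping that the old line break visually suggested.
constexpr bool conjunction_first (int a, int b, bool c)
{
  return a != (b && c);
}

// With a = 2, b = 0, c = false the two groupings disagree, and the
// unparenthesized form follows the comparison-first one.
static_assert (unparenthesized (2, 0, false)
               == comparison_first (2, 0, false), "!= binds tighter");
static_assert (unparenthesized (2, 0, false)
               != conjunction_first (2, 0, false), "groupings differ");
```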


Index: lra-constraints.c
===
--- lra-constraints.c   (revision 230982)
+++ lra-constraints.c   (working copy)
@@ -2556,8 +2556,8 @@ process_alt_operands (int only_alternati
 another operand as an operand matching the earlyclobber
 operand can be also the same.  */
  if (first_conflict_j == last_conflict_j
- && operand_reg[last_conflict_j]
- != NULL_RTX && ! curr_alt_match_win[last_conflict_j]
+ && operand_reg[last_conflict_j] != NULL_RTX
+ && ! curr_alt_match_win[last_conflict_j]
  && REGNO (operand_reg[i]) == REGNO (operand_reg[last_conflict_j]))
{
  curr_alt_win[last_conflict_j] = false;

Re: [PATCH 1/6] Fix memory leak in cilk

2015-11-26 Thread Martin Liška


On 11/23/2015 02:48 PM, marxin wrote:

gcc/c/ChangeLog:

2015-11-20  Martin Liska  

PR c++/68312
* c-array-notation.c (fix_builtin_array_notation_fn):
Use release_vec_vec instead of vec::release.
(build_array_notation_expr): Likewise.
(fix_conditional_array_notations_1): Likewise.
(fix_array_notation_expr): Likewise.
(fix_array_notation_call_expr): Likewise.

gcc/cp/ChangeLog:

2015-11-20  Martin Liska  

PR c++/68312
* cp-array-notation.c (expand_sec_reduce_builtin):
Likewise.
(create_array_refs): Replace argument with const reference.
(expand_an_in_modify_expr): Likewise.
(cp_expand_cond_array_notations): Likewise.
(expand_unary_array_notation_exprs): Likewise.

gcc/c-family/ChangeLog:

2015-11-20  Martin Liska  

PR c++/68312
* array-notation-common.c (cilkplus_extract_an_triplets):
Release vector of vectors.
* cilk.c (gimplify_cilk_spawn): Free allocated memory.

gcc/ChangeLog:

2015-11-20  Martin Liska  

PR c++/68312
* vec.h (release_vec_vec): New function.
---
  gcc/c-family/array-notation-common.c |  2 ++
  gcc/c-family/cilk.c  |  1 +
  gcc/c/c-array-notation.c | 38 ++
  gcc/cp/cp-array-notation.c   | 52 ++--
  gcc/vec.h| 12 +
  5 files changed, 55 insertions(+), 50 deletions(-)

diff --git a/gcc/c-family/array-notation-common.c 
b/gcc/c-family/array-notation-common.c
index 4f7072b..5f2209d 100644
--- a/gcc/c-family/array-notation-common.c
+++ b/gcc/c-family/array-notation-common.c
@@ -636,6 +636,8 @@ cilkplus_extract_an_triplets (vec *list, 
size_t size, size_t rank,
  fold_build1 (CONVERT_EXPR, integer_type_node,
   ARRAY_NOTATION_STRIDE (ii_tree));
  }
+
+  release_vec_vec (array_exprs);
  }

  /* Replaces all the __sec_implicit_arg functions in LIST with the induction
diff --git a/gcc/c-family/cilk.c b/gcc/c-family/cilk.c
index e75e20c..1167b2b 100644
--- a/gcc/c-family/cilk.c
+++ b/gcc/c-family/cilk.c
@@ -844,6 +844,7 @@ gimplify_cilk_spawn (tree *spawn_p)
call2, build_empty_stmt (EXPR_LOCATION (call1)));
append_to_statement_list (spawn_expr, spawn_p);

+  free (arg_array);
return GS_OK;
  }

diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c
index 21f8684..49f5f7b 100644
--- a/gcc/c/c-array-notation.c
+++ b/gcc/c/c-array-notation.c
@@ -98,7 +98,7 @@ make_triplet_val_inv (location_t loc, tree *value)

  static void
  create_cmp_incr (location_t loc, vec *node, size_t rank,
-vec an_info)
+const vec _info)
  {
for (size_t ii = 0; ii < rank; ii++)
  {
@@ -122,7 +122,7 @@ create_cmp_incr (location_t loc, vec *node, 
size_t rank,
  */

  static vec *
-create_array_refs (location_t loc, vec an_info,
+create_array_refs (location_t loc, const vec _info,
   vec an_loop_info, size_t size, size_t rank)
  {
tree ind_mult, ind_incr;
@@ -205,7 +205,7 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
location_t location = UNKNOWN_LOCATION;
tree loop_with_init = alloc_stmt_list ();
vec an_info = vNULL;
-  vec an_loop_info = vNULL;
+  auto_vec an_loop_info;
enum built_in_function an_type =
  is_cilkplus_reduce_builtin (CALL_EXPR_FN (an_builtin_fn));
if (an_type == BUILT_IN_NONE)
@@ -593,8 +593,7 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree 
*new_var)
  }
append_to_statement_list_force (body, _with_init);

-  an_info.release ();
-  an_loop_info.release ();
+  release_vec_vec (an_info);

return loop_with_init;
  }
@@ -614,7 +613,7 @@ build_array_notation_expr (location_t location, tree lhs, 
tree lhs_origtype,
tree array_expr_lhs = NULL_TREE, array_expr_rhs = NULL_TREE;
tree array_expr = NULL_TREE;
tree an_init = NULL_TREE;
-  vec cond_expr = vNULL;
+  auto_vec cond_expr;
tree body, loop_with_init = alloc_stmt_list();
tree scalar_mods = NULL_TREE;
vec *rhs_array_operand = NULL, *lhs_array_operand = NULL;
@@ -624,7 +623,7 @@ build_array_notation_expr (location_t location, tree lhs, 
tree lhs_origtype,
tree new_modify_expr, new_var = NULL_TREE, builtin_loop = NULL_TREE;
size_t rhs_list_size = 0, lhs_list_size = 0;
vec lhs_an_info = vNULL, rhs_an_info = vNULL;
-  vec lhs_an_loop_info = vNULL, rhs_an_loop_info = vNULL;
+  auto_vec lhs_an_loop_info, rhs_an_loop_info;

/* If either of this is true, an error message must have been send out
   already.  Not necessary to send out multiple error messages.  */
@@ -881,14 +880,9 @@ build_array_notation_expr (location_t location, tree lhs, 
tree lhs_origtype,
  }

[PATCH 7/N] Fix newly introduced memory leak in tree-ssa-loop-ivopts.c

2015-11-26 Thread Martin Liška


Hi.

There's one more patch that fixes really a lot of memory leaks related to loop 
ivopts.
The regression was introduced by r230647.

Patch was tested in the series with the rest and the compiler bootstraps 
successfully.

Ready for trunk?
Thanks,
Martin
From 1f06962c8f126de5aa847882dadba4b95fc89bfc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 25 Nov 2015 13:06:07 +0100
Subject: [PATCH 6/6] Fix newly introduced memory leak in
 tree-ssa-loop-ivopts.c

gcc/ChangeLog:

2015-11-25  Martin Liska  

	* hash-traits.h (struct typed_delete_remove): New function.
	(typed_delete_remove ::remove): Likewise.
	* tree-ssa-loop-ivopts.c (struct iv_common_cand): Replace
	auto_vec with vec.
	(record_common_cand): Replace XNEW with operator new.
---
 gcc/hash-traits.h  | 23 +++
 gcc/tree-ssa-loop-ivopts.c |  6 +++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/hash-traits.h b/gcc/hash-traits.h
index 450354a..3997ede 100644
--- a/gcc/hash-traits.h
+++ b/gcc/hash-traits.h
@@ -38,6 +38,23 @@ typed_free_remove ::remove (Type *p)
   free (p);
 }
 
+/* Helpful type for removing with delete.  */
+
+template 
+struct typed_delete_remove
+{
+  static inline void remove (Type *p);
+};
+
+
+/* Remove with delete.  */
+
+template 
+inline void
+typed_delete_remove ::remove (Type *p)
+{
+  delete p;
+}
 
 /* Helpful type for a no-op remove.  */
 
@@ -260,6 +277,12 @@ struct nofree_ptr_hash : pointer_hash , typed_noop_remove  {};
 template 
 struct free_ptr_hash : pointer_hash , typed_free_remove  {};
 
+/* Traits for pointer elements that should be freed via delete operand when an
+   element is deleted.  */
+
+template 
+struct delete_ptr_hash : pointer_hash , typed_delete_remove  {};
+
 /* Traits for elements that point to gc memory.  The pointed-to data
must be kept across collections.  */
 
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 98dc451..d7a0e9e 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -253,13 +253,13 @@ struct iv_common_cand
   tree base;
   tree step;
   /* IV uses from which this common candidate is derived.  */
-  vec uses;
+  auto_vec uses;
   hashval_t hash;
 };
 
 /* Hashtable helpers.  */
 
-struct iv_common_cand_hasher : free_ptr_hash 
+struct iv_common_cand_hasher : delete_ptr_hash 
 {
   static inline hashval_t hash (const iv_common_cand *);
   static inline bool equal (const iv_common_cand *, const iv_common_cand *);
@@ -3127,7 +3127,7 @@ record_common_cand (struct ivopts_data *data, tree base,
   slot = data->iv_common_cand_tab->find_slot (, INSERT);
   if (*slot == NULL)
 {
-  *slot = XNEW (struct iv_common_cand);
+  *slot = new iv_common_cand ();
   (*slot)->base = base;
   (*slot)->step = step;
   (*slot)->uses.create (8);
-- 
2.6.3

[PR67335] drop dummy zero from reverse VTA ops, fix infinite recursion

2015-11-26 Thread Alexandre Oliva

VTA's cselib expression hashing compares expressions with the same
hash before adding them to the hash table.  When there is a collision
involving a self-referencing expression, we could get infinite
recursion, in spite of the cycle breakers already in place.  The
problem is currently latent in the trunk, because by chance we don't
get a collision.

Such value cycles are often introduced by reverse_op; most often,
they're indirect, and then value canonicalization takes care of the
cycle, but if the reverse operation simplifies to the original value,
we used to issue a (plus V (const_int 0)), because at some point
adding a plain value V to a location list as a reverse_op equivalence
caused other problems.

(Jakub, do you by any chance still remember what those problems were,
 some 5+ years ago?)

This dummy zero, in turn, caused the value canonicalizer to not fully
realize the equivalence, leading to more complex graphs and,
occasionally, to infinite recursion when comparing such
value-plus-zero expressions recursively.

Simply using V solves the infinite recursion from the PR testcase,
since the extra equivalence and the preexisting value canonicalization
together prevent recursion while the unrecognized equivalence
wouldn't, but it exposed another infinite recursion in
memrefs_conflict_p: get_addr had a cycle breaker in place, to skip RTL
referencing values introduced after the one we're examining, but it
wouldn't break the cycle if the value itself appeared in the
expression being examined.

After removing the dummy zero above, this kind of cycle in the
equivalence graph is no longer introduced by VTA itself, but dummy
zeros are also present in generated code, such as in the 32-bit x86's
pro_epilogue_adjust_stack_si_add epilogue insn generated as part of
the builtin longjmp in _Unwind_RaiseException building libgcc's
unwind-dw2.o.  So, break the recursion cycle for them too.
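
As a rough illustration of the cut-off change in refs_newer_value_p, here is
a toy model.  It is a sketch under strong assumptions: an expression is
reduced to the list of uids of the values it mentions, and the type and
function names are hypothetical, not the RTL or cselib interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: an expression is just the uids of the VALUEs it
// refers to.
using toy_expr = std::vector<int>;

// Old test: only values introduced strictly after V stop the walk.
bool refs_strictly_newer (const toy_expr &expr, int minuid)
{
  for (int uid : expr)
    if (uid > minuid)
      return true;
  return false;
}

// New test: a reference to V itself also stops the walk.
bool refs_newer_or_same (const toy_expr &expr, int minuid)
{
  for (int uid : expr)
    if (uid >= minuid)
      return true;
  return false;
}
```

For a value with uid 5 whose location is an expression that mentions
uid 5 again (the value-plus-zero case), the strict comparison does not flag
the expression, so a walker would follow it back into itself; the
non-strict comparison flags it and the cycle is cut.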


for  gcc/ChangeLog

PR debug/67355
* var-tracking.c (reverse_op): Don't add dummy zero to reverse
ops that simplify back to the original value.
* alias.c (refs_newer_value_p): Cut off recursion for
expressions containing the original value.
---
 gcc/alias.c|4 ++--
 gcc/var-tracking.c |5 -
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/alias.c b/gcc/alias.c
index 9a642dd..d868da3 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -2072,7 +2072,7 @@ base_alias_check (rtx x, rtx x_base, rtx y, rtx y_base,
 }
 
 /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
-   that of V.  */
+   (or equal to) that of V.  */
 
 static bool
 refs_newer_value_p (const_rtx expr, rtx v)
@@ -2080,7 +2080,7 @@ refs_newer_value_p (const_rtx expr, rtx v)
   int minuid = CSELIB_VAL_PTR (v)->uid;
   subrtx_iterator::array_type array;
   FOR_EACH_SUBRTX (iter, array, expr, NONCONST)
-if (GET_CODE (*iter) == VALUE && CSELIB_VAL_PTR (*iter)->uid > minuid)
+if (GET_CODE (*iter) == VALUE && CSELIB_VAL_PTR (*iter)->uid >= minuid)
   return true;
   return false;
 }
diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 9185bfd..07eea84 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -5774,11 +5774,6 @@ reverse_op (rtx val, const_rtx expr, rtx_insn *insn)
return;
}
   ret = simplify_gen_binary (code, GET_MODE (val), val, arg);
-  if (ret == val)
-   /* Ensure ret isn't VALUE itself (which can happen e.g. for
-  (plus (reg1) (reg2)) when reg2 is known to be 0), as that
-  breaks a lot of routines during var-tracking.  */
-   ret = gen_rtx_fmt_ee (PLUS, GET_MODE (val), val, const0_rtx);
   break;
 default:
   gcc_unreachable ();


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Re: [PR67335] drop dummy zero from reverse VTA ops, fix infinite recursion

2015-11-26 Thread Alexandre Oliva

On Nov 26, 2015, Alexandre Oliva  wrote:

> for  gcc/ChangeLog

>   PR debug/67355
>   * var-tracking.c (reverse_op): Don't add dummy zero to reverse
>   ops that simplify back to the original value.
>   * alias.c (refs_newer_value_p): Cut off recursion for
>   expressions containing the original value.

Doh, I forgot an important part of the email.

The patch was regstrapped on x86_64-linux-gnu and i686-linux-gnu, so far
only in the GCC 5 branch, where both described problems appear.  I
suppose it might be a bit too late for 5.3, though :-(

Is this ok to install in the trunk (assuming regstrap succeeds there
too), and in the GCC 5 branch some time after that?

(I'm going to be away all week next week, so I'd rather wait till I
return to check it in, so that I can address any fallout more promptly)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Re: [PATCH 7/N] Fix newly introduced memory leak in tree-ssa-loop-ivopts.c

2015-11-26 Thread Bin.Cheng

On Fri, Nov 27, 2015 at 5:08 AM, Martin Liška  wrote:
> Hi.
>
> There's one more patch that fixes really a lot of memory leaks related to loop
> ivopts.
> The regression was introduced by r230647.
>
> Patch was tested in the series with the rest and the compiler bootstraps
> successfully.
>
> Ready for trunk?

Hi Martin,
Thanks for fixing my issue.  The IVO part of the patch is OK.
Just so I understand: iv_common_cand is freed via free_ptr_hash,
and thus typed_free_remove.  So what leaks is the iv_use * vector in
struct iv_common_cand, right?  I did forget to free that.
BTW, how do you monitor memory use in GCC?  Maybe I can run the same test
for my future patches.

Thanks,
bin
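
The difference between the two removal traits discussed here can be sketched
as follows.  This is a toy under stated assumptions: free_remove,
delete_remove, use_list and candidate are hypothetical stand-ins for
typed_free_remove, typed_delete_remove, the vector member and
iv_common_cand, and the vector's storage is simulated by a counter.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Number of simulated use vectors whose storage is still held.
static int live_use_lists = 0;

// Stand-in for a vector member that gives its storage back in its
// destructor.
struct use_list
{
  use_list () { ++live_use_lists; }
  ~use_list () { --live_use_lists; }
};

// Stand-in for the hash table element.
struct candidate
{
  int base;
  int step;
  use_list uses;
};

// Removal by free: the storage goes away, no destructor runs.
template <typename Type>
struct free_remove
{
  static void remove (Type *p) { std::free (p); }
};

// Removal by delete: the destructor runs, then the storage goes away.
template <typename Type>
struct delete_remove
{
  static void remove (Type *p) { delete p; }
};

int live_lists_after_free ()
{
  live_use_lists = 0;
  void *raw = std::malloc (sizeof (candidate));
  if (!raw)
    return -1;
  candidate *c = new (raw) candidate ();  // construct in malloc'ed storage
  free_remove<candidate>::remove (c);     // ~candidate is never called
  return live_use_lists;
}

int live_lists_after_delete ()
{
  live_use_lists = 0;
  candidate *c = new candidate ();
  delete_remove<candidate>::remove (c);   // ~candidate releases the member
  return live_use_lists;
}
```

In this model the element itself is returned in both cases; what the
free-based path leaves behind is the member's storage, which matches the
reading that the leak was in the vector inside the struct.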

[patch] Fix PR c++/68290

2015-11-26 Thread Eric Botcazou

Hi,

this is a variant of the just fixed PR c++/68434 on SPARC64/Solaris:

/vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/concepts/auto1.C:16:6: 
internal compiler error: canonical types differ for identical types C and C

with the same underlying issue, which is that TYPE_CANONICAL is computed 
before PLACEHOLDER_TYPE_CONSTRAINTS is set in make_constrained_auto.

Tested on x86_64-suse-linux, OK for the mainline?


2015-11-26  Eric Botcazou  

PR c++/68290
* constraint.cc (make_constrained_auto): Move to...
* pt.c (make_auto_1): Add set_canonical parameter and set
TYPE_CANONICAL on the type only if it is true.
(make_decltype_auto): Adjust call to make_auto_1.
(make_auto): Likewise.
(splice_late_return_type): Likewise.
(make_constrained_auto): ...here.  Call make_auto_1 instead of
make_auto and pass false.  Set TYPE_CANONICAL directly.

-- 
Eric Botcazou

Index: constraint.cc
===
--- constraint.cc	(revision 230924)
+++ constraint.cc	(working copy)
@@ -1353,32 +1353,6 @@ finish_template_introduction (tree tmpl_
 }
 
 
-/* Make a "constrained auto" type-specifier. This is an
-   auto type with constraints that must be associated after
-   deduction.  The constraint is formed from the given
-   CONC and its optional sequence of arguments, which are
-   non-null if written as partial-concept-id.  */
-tree
-make_constrained_auto (tree con, tree args)
-{
-  tree type = make_auto();
-
-  /* Build the constraint. */
-  tree tmpl = DECL_TI_TEMPLATE (con);
-  tree expr;
-  if (VAR_P (con))
-expr = build_concept_check (tmpl, type, args);
-  else
-expr = build_concept_check (build_overload (tmpl, NULL_TREE), type, args);
-
-  tree constr = make_predicate_constraint (expr);
-  PLACEHOLDER_TYPE_CONSTRAINTS (type) = constr;
-
-  /* Attach the constraint to the type declaration. */
-  tree decl = TYPE_NAME (type);
-  return decl;
-}
-
 /* Given the predicate constraint T from a constrained-type-specifier, extract
its TMPL and ARGS.  FIXME why do we need two different forms of
constrained-type-specifier?  */
Index: pt.c
===
--- pt.c	(revision 230924)
+++ pt.c	(working copy)
@@ -23467,10 +23467,10 @@ make_args_non_dependent (vec

Re: [PING] Re: [PATCH] c++/67913, 67917 - fix new expression with wrong number of elements

2015-11-26 Thread Martin Sebor


On 11/26/2015 10:45 AM, Martin Sebor wrote:

On 11/26/2015 04:33 AM, Ramana Radhakrishnan wrote:



Cookies on ARM are 8-bytes [1], but sizeof ((size_t) n) is only 4-bytes,
so this check will fail (We'll ask for 500 bytes, the test here will only
be looking for 496).

Would it undermine the test for other architectures if I were to swap out
the != for a >= ? I think that is in line with the "argument large enough
for the array" that this test is looking for, but would not catch bugs where
we were allocating more memory than necessary.

Otherwise I can spin a patch which skips the test for ARM targets.



I didn't want to skip this for ARM, instead something that takes into
account the cookie size - (very gratuitous hack was to just add 4 in a
#ifdef __arm__ block). Something like attached, brown paper bag
warning ;)


Thanks. I'll commit it today after some testing.


I've checked in a slightly modified version of your patch in r230987.



I should probably also check to see if there are other such targets
and try to find a way to generalize the test. (There should be a way
to expose the cookie size to programs, otherwise they have no way to
avoid buffer overflow in array forms of placement new).


There don't appear to be other targets besides ARM that override
the default cookie size.  I opened enhancement c++/68571 - provide
__builtin_cookie_size, to provide an API to make it possible for
programs to query this constant.

Martin
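
Until something like the proposed builtin exists, a program can only observe
the cookie indirectly.  The following sketch shows one way to do that, by
recording the size an array new-expression requests from a class-specific
operator new[].  The names tracked, last_request and array_new_overhead are
hypothetical, and the value returned is whatever the target ABI uses, so no
particular number is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Size passed to the most recent call of tracked::operator new[].
static std::size_t last_request = 0;

struct tracked
{
  int payload;

  // A user-provided destructor makes the type non-trivially destructible,
  // which is what normally makes the array form store an element count.
  ~tracked () {}

  static void *operator new[] (std::size_t bytes)
  {
    last_request = bytes;
    void *p = std::malloc (bytes);
    if (!p)
      throw std::bad_alloc ();
    return p;
  }

  static void operator delete[] (void *p) noexcept { std::free (p); }
};

// Return how many bytes beyond n * sizeof (tracked) the array
// new-expression asked for.
std::size_t array_new_overhead (std::size_t n)
{
  tracked *a = new tracked[n];
  std::size_t overhead = last_request - n * sizeof (tracked);
  delete[] a;
  return overhead;
}
```

A test written against this helper can compare sizes with the measured
overhead instead of hard-coding sizeof (size_t), which is the assumption
that failed on ARM above.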

Re: Fix verify_type ICE during Ada bootstrap

2015-11-26 Thread Jan Hubicka

> On Tue, 24 Nov 2015, Jan Hubicka wrote:
> 
> > > > 
> > > > We do already wrap all bases into MEM_REFs at streaming time, it would
> > > > be easy to adjust it to make it effectively alias-set zero.  But of
> > > > course the overhead and the downstream effects of having more MEM_REFs
> > > > (we strip the unneeded ones at stream-in) are unknown (compared to
> > > > the effect of disabling inlining).
> > > 
> > > Hmm, I can test in on Firefox (once I get it back to working condition).
> > 
> > One way would be to keep the current MEM_REF stripping and the conditional in
> > get_alias_set on strict aliasing, but extend the inliner to introduce them at the
> > point where -fno-strict-aliasing is inlined into -fstrict-aliasing.  That way we
> > could drop the code in lto-streamer-out that forcibly sets the alias set to 0 when
> > get_alias_set == 0 and hopefully get all code transitions right.
> 
> Yeah, that could also work.  We can also rewrite overflow stuff
> this way to do overflow related inlining (in one direction only?).
> That is, when inlining !strict-overflow into strict-overflow code
> re-write arithmetic to unsigned during inlining.
> 
> Sth for next stage1.
> 
> Maybe you can open an enhancement PR for these cases.

I will certainly do so.
Sadly, the more I understand the implementation, the easier it is for me to
construct wrong-code examples (see the testcases below).

I think the whole idea of storing TYPE_ALIAS_SET at stream-out time is not
working well. First of all, it does not solve the optimization attribute, and second,
we can randomly lose the info (on a prestreamed type, or on types where canonical
type merging prevails with a non-0 alias set type) or push a random type to alias
set 0 (where canonical type merging prevails in the opposite direction).  I do not
see how to fix it easily: canonical type merging cannot distinguish between
alias set 0 types and others unless we make it clear that the derived types
cannot alias (which I think they can).  I suppose the only way here would be to
force all alias set 0 types to be variants and revisit all the code to check
this before going to the main variant type.  Compared to that, I like
the solution with a flag in MEM_REF better, but that of course is an invasive
change where we will need to revisit all MEM_REF construction to set the
flag correctly.

I wonder what you would think of the following patch.  It basically makes
the type representation completely agnostic of -fstrict-aliasing (it should
be, because -fstrict-aliasing is a function-local property, while types are
not) and makes -fstrict-aliasing be evaluated purely at the time we ask the
TBAA oracle. I disabled the TYPE_ALIAS_SET streaming and instead I assert
that LTO's implementation will be compatible (which caught some surprises
where we tamper with the alias set in rtti.c and free_lang_data).

I modified the inliner to make the -fstrict-aliasing setting infectious. That is, instead of
forbidding the inlining, it simply drops the flag in the caller. Moreover, I play
the game with COMDATs and the fact that whenever you inline a comdat w/o
an explicit optimization attribute into a caller w/o an explicit optimization
attribute, you may assume that the function body is valid under the caller's flags.
We already play this game in can_inline_p.  This means that no inlines are
blocked in Firefox.
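
The "infectious" behaviour can be pictured with a toy model like the one
below.  This is only a sketch: function_options and merge_on_inline are
hypothetical names, GCC's real per-function option handling is different,
and the COMDAT special case is deliberately left out.

```cpp
#include <cassert>

// Hypothetical per-function option record.
struct function_options
{
  bool strict_aliasing;
};

// Toy merge step: inlining is never refused because of the flag; instead a
// callee built without strict aliasing demotes the caller.
void merge_on_inline (function_options &caller,
                      const function_options &callee)
{
  if (!callee.strict_aliasing)
    caller.strict_aliasing = false;
}
```

The merge only ever moves the caller towards the more conservative setting,
so after inlining the combined body is compiled under assumptions that hold
for both original functions.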

Of course it is always dodgy to change optimization flags after early
optimizations that may have made code previously valid wrt -fstrict-aliasing
invalid. I reviewed the 26 uses of -fstrict-aliasing in the compiler and it
seems that only ipa-icf and fold-const may result in such a transform. So I
disabled them pre-inlining (which I think is a good idea for the time being, until we
make -fstrict-aliasing part of MEM_REF: both transforms are far less important
than inlining). Once we update to MEM_REF we can easily drop this.

The patch bootstraps/regtests on x86_64-linux and seems to do a decent job on
Firefox (actually increasing the effectiveness of TBAA; only 26 functions are demoted
to -fno-strict-aliasing because of the new code in ipa-inline-transform).  I
plan to do more testing tomorrow (I still can't build the Firefox binary to run
some benchmarks).

Honza

* ipa-inline-transform.c (inline_call): Merge -fno-strict-aliasing
if needed.
* ipa-icf-gimple.c (func_checker::compatible_types_p): Pass true
to get_alias_set.
* alias.c (get_alias_set): Add new strict flag.
(new_alias_set): Always produce new set.
(record_component_aliases): Pass true to get_alias_set.
* alias.h (get_alias_set): New parameter STRICT which is false by
default.
* fold-const.c (operand_equal_p): Before inlining do not permit
any transformations that would be invalid if code became strict-aliasing.
* tree-streamer-out.c
 (pack_ts_type_common_value_fields): Do not stream TYPE_ALIAS_SET;
sanity check that no alias set 0 info is lost.
* tree-streamer-in.c (unpack_ts_type_common_value_fields): Do not
stream in

RE: Fix 61441

2015-11-26 Thread Saraswati, Sujoy (OSTL)

Hi,

> I think the general principle is:
> 
> * The caller decides whether folding is desirable (whether it would lose
> exceptions, for example).
> 
> * The real.c code is called only when the caller has decided that folding is
> desirable, and should always produce the correct output (which for a
> conversion means producing a quiet NaN from a signaling NaN).
> 
> So both places need changes, but real_convert is where the code that makes
> it a quiet NaN should go.
> 
> Another place in the patch that looks incorrect: the changes to fold-const-
> call.c calling real_powi and checking if the result is a signaling NaN.  The 
> result
> of real_powi should never be a signaling NaN.
> Rather, real_powi should produce a quiet NaN if its input is a signaling NaN,
> and the callers should check if the argument is a signaling NaN when deciding
> whether to fold, not if the result is.

I made the changes accordingly and will post the patches now.

Regards,
Sujoy
 
> --
> Joseph S. Myers
> jos...@codesourcery.com
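
The division of labour Joseph describes can be sketched in a few lines of C. This is only a toy model under stated assumptions: `toy_real`, `toy_may_fold` and `toy_add` are hypothetical stand-ins, not GCC's `REAL_VALUE_TYPE` or any real.c function. The caller-side predicate decides whether folding is desirable, and the arithmetic routine always returns a quiet NaN when handed a NaN.

```c
#include <stdbool.h>

/* Toy stand-in for GCC's REAL_VALUE_TYPE; all names here are
   hypothetical and exist only for illustration.  */
enum toy_class { TOY_NORMAL, TOY_NAN };

struct toy_real
{
  enum toy_class cl;
  bool signalling;
  double payload;
};

/* Caller-side policy: refuse to fold when signaling NaNs are honored
   and either operand is an sNaN, because folding would lose the
   exception.  */
static bool
toy_may_fold (bool honor_snans, const struct toy_real *a,
              const struct toy_real *b)
{
  bool a_snan = a->cl == TOY_NAN && a->signalling;
  bool b_snan = b->cl == TOY_NAN && b->signalling;
  return !(honor_snans && (a_snan || b_snan));
}

/* Callee-side arithmetic: only reached once the caller has decided to
   fold, and always produces a quiet NaN for a NaN operand.  */
static struct toy_real
toy_add (const struct toy_real *a, const struct toy_real *b)
{
  struct toy_real r;
  if (a->cl == TOY_NAN || b->cl == TOY_NAN)
    {
      r = a->cl == TOY_NAN ? *a : *b;
      r.signalling = false;
      return r;
    }
  r.cl = TOY_NORMAL;
  r.signalling = false;
  r.payload = a->payload + b->payload;
  return r;
}
```

With signaling NaNs honored, `toy_may_fold` rejects an sNaN operand and `toy_add` is never called; with them ignored, the fold goes ahead and yields a quiet NaN.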

Fix 61441 [ 1/5] Add REAL_VALUE_ISSIGNALING_NAN

2015-11-26 Thread Saraswati, Sujoy (OSTL)

Hi,
  This series of patches fixes PR61441.  The fix is broken into 5 patches. 

  The first one adds REAL_VALUE_ISSIGNALING_NAN. 

  2015-11-26  Sujoy Saraswati 

   PR tree-optimization/61441
   * real.c (real_issignaling_nan): New.
   * real.h (real_issignaling_nan, REAL_VALUE_ISSIGNALING_NAN): New.

Index: gcc/real.c
===
--- gcc/real.c  (revision 230851)
+++ gcc/real.c  (working copy)
@@ -1195,6 +1195,13 @@ real_isnan (const REAL_VALUE_TYPE *r)
   return (r->cl == rvc_nan);
 }

+/* Determine whether a floating-point value X is a signalling NaN.  */
+bool
+real_issignaling_nan (const REAL_VALUE_TYPE *r)
+{
+  return real_isnan (r) && r->signalling;
+}
+
 /* Determine whether a floating-point value X is finite.  */

 bool
Index: gcc/real.h
===
--- gcc/real.h  (revision 230851)
+++ gcc/real.h  (working copy)
@@ -262,6 +262,9 @@ extern bool real_isinf (const REAL_VALUE_TYPE *);
 /* Determine whether a floating-point value X is a NaN.  */
 extern bool real_isnan (const REAL_VALUE_TYPE *);

+/* Determine whether a floating-point value X is a signalling NaN.  */
+extern bool real_issignaling_nan (const REAL_VALUE_TYPE *);
+
 /* Determine whether a floating-point value X is finite.  */
 extern bool real_isfinite (const REAL_VALUE_TYPE *);

@@ -357,6 +360,9 @@ extern const struct real_format arm_half_format;
 /* Determine whether a floating-point value X is a NaN.  */
 #define REAL_VALUE_ISNAN(x)real_isnan (&(x))

+/* Determine whether a floating-point value X is a signalling NaN.  */
+#define REAL_VALUE_ISSIGNALING_NAN(x)  real_issignaling_nan (&(x))
+
 /* Determine whether a floating-point value X is negative.  */
 #define REAL_VALUE_NEGATIVE(x) real_isneg (&(x))
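
For readers who want to see what the new predicate distinguishes at the bit level, here is a minimal sketch on IEEE 754 binary64 bit patterns. It assumes the common encoding in which the most significant fraction bit is the quiet bit (as on x86 and ARM; some older targets invert this), and the helper names `bits_isnan` and `bits_issignaling_nan` are hypothetical, not part of GCC. It works on raw bits so that the platform never gets a chance to quiet the value on load.

```c
#include <stdbool.h>
#include <stdint.h>

/* binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits.  */
#define TOY_EXP_MASK   UINT64_C (0x7ff0000000000000)
#define TOY_FRAC_MASK  UINT64_C (0x000fffffffffffff)
/* Assumption: the top fraction bit set means "quiet".  */
#define TOY_QUIET_BIT  UINT64_C (0x0008000000000000)

/* NaN: exponent all ones and a nonzero fraction.  */
static bool
bits_isnan (uint64_t bits)
{
  return (bits & TOY_EXP_MASK) == TOY_EXP_MASK
         && (bits & TOY_FRAC_MASK) != 0;
}

/* Signaling NaN: a NaN whose quiet bit is clear.  */
static bool
bits_issignaling_nan (uint64_t bits)
{
  return bits_isnan (bits) && (bits & TOY_QUIET_BIT) == 0;
}
```

Infinity (exponent all ones, zero fraction) is neither kind of NaN, which is why the fraction test comes first.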

Fix 61441 [2/5] Use REAL_VALUE_ISSIGNALING_NAN instead of REAL_VALUE_ISNAN where appropriate

2015-11-26 Thread Saraswati, Sujoy (OSTL)

This patch uses REAL_VALUE_ISSIGNALING_NAN instead of REAL_VALUE_ISNAN to avoid 
the operation for sNaN.

Regards,
Sujoy

2015-11-26  Sujoy Saraswati 

PR tree-optimization/61441
* fold-const.c (const_binop): Use REAL_VALUE_ISSIGNALING_NAN instead
of REAL_VALUE_ISNAN to avoid the operation for sNaN.
* simplify-rtx.c (simplify_const_binary_operation): Same.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 230851)
+++ gcc/fold-const.c(working copy)
@@ -1150,9 +1150,10 @@ const_binop (enum tree_code code, tree arg1, tree
   mode = TYPE_MODE (type);

   /* Don't perform operation if we honor signaling NaNs and
-either operand is a NaN.  */
+either operand is a signaling NaN.  */
   if (HONOR_SNANS (mode)
- && (REAL_VALUE_ISNAN (d1) || REAL_VALUE_ISNAN (d2)))
+  && (REAL_VALUE_ISSIGNALING_NAN (d1)
+  || REAL_VALUE_ISSIGNALING_NAN (d2)))
return NULL_TREE;

   /* Don't perform operation if it would raise a division

Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 230851)
+++ gcc/simplify-rtx.c  (working copy)
@@ -3892,7 +3892,8 @@ simplify_const_binary_operation (enum rtx_code cod
  real_convert (&f1, mode, CONST_DOUBLE_REAL_VALUE (op1));

  if (HONOR_SNANS (mode)
- && (REAL_VALUE_ISNAN (f0) || REAL_VALUE_ISNAN (f1)))
+  && (REAL_VALUE_ISSIGNALING_NAN (f0)
+  || REAL_VALUE_ISSIGNALING_NAN (f1)))
return 0;

  if (code == DIV

Fix 61441 [3/5] Remove flag_errno_math check for RINT

2015-11-26 Thread Saraswati, Sujoy (OSTL)

Hi,
 This patch removes the flag_errno_math check for RINT, treating it like
nearbyint.
Regards,
Sujoy

  2015-11-26  Sujoy Saraswati 

PR tree-optimization/61441
* match.pd (f(x) -> x): Removed flag_errno_math check for RINT.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 230851)
+++ gcc/match.pd(working copy)
@@ -2565,16 +2565,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (fns (fns @0))
   (fns @0)))
 /* f(x) -> x if x is integer valued and f does nothing for such values.  */
-(for fns (TRUNC FLOOR CEIL ROUND NEARBYINT)
+(for fns (TRUNC FLOOR CEIL ROUND NEARBYINT RINT)
  (simplify
   (fns integer_valued_real_p@0)
   @0))
-/* Same for rint.  We have to check flag_errno_math because
-   integer_valued_real_p accepts +Inf, -Inf and NaNs as integers.  */
-(if (!flag_errno_math)
- (simplify
-  (RINT integer_valued_real_p@0)
-  @0))

 /* hypot(x,0) and hypot(0,x) -> abs(x).  */
 (simplify

Fix 61441 [4/5] Produce quiet NaN for real value operations

2015-11-26 Thread Saraswati, Sujoy (OSTL)

Hi,
 This patch makes the NaN results of real value operations quiet NaNs,
irrespective of the flag_signaling_nans flag. The caller is responsible for
avoiding the operation if flag_signaling_nans is on.
Regards,
Sujoy

2015-11-26  Sujoy Saraswati 

PR tree-optimization/61441
* real.c (do_add): Make resulting NaN value to be qNaN.
 (do_multiply, do_divide, do_fix_trunc): Same.
 (real_arithmetic, real_ldexp, real_convert): Same.
 (real_isinteger): Updated comment stating it returns false for 
sNaN.

===
diff -u -p a/gcc/real.c b/gcc/real.c
--- a/gcc/real.c2015-11-25 10:35:29.059583459 +0530
+++ b/gcc/real.c2015-11-25 15:07:53.604085529 +0530
@@ -541,6 +541,10 @@ do_add (REAL_VALUE_TYPE *r, const REAL_V
 case CLASS2 (rvc_normal, rvc_inf):
   /* R + Inf = Inf.  */
   *r = *b;
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   r->sign = sign ^ subtract_p;
   return false;

@@ -554,6 +558,10 @@ do_add (REAL_VALUE_TYPE *r, const REAL_V
 case CLASS2 (rvc_inf, rvc_normal):
   /* Inf + R = Inf.  */
   *r = *a;
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   return false;

 case CLASS2 (rvc_inf, rvc_inf):
@@ -676,6 +684,10 @@ do_multiply (REAL_VALUE_TYPE *r, const R
 case CLASS2 (rvc_nan, rvc_nan):
   /* ANY * NaN = NaN.  */
   *r = *b;
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   r->sign = sign;
   return false;

@@ -684,6 +696,10 @@ do_multiply (REAL_VALUE_TYPE *r, const R
 case CLASS2 (rvc_nan, rvc_inf):
   /* NaN * ANY = NaN.  */
   *r = *a;
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   r->sign = sign;
   return false;

@@ -826,6 +842,10 @@ do_divide (REAL_VALUE_TYPE *r, const REA
 case CLASS2 (rvc_nan, rvc_nan):
   /* ANY / NaN = NaN.  */
   *r = *b;
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   r->sign = sign;
   return false;

@@ -834,6 +854,10 @@ do_divide (REAL_VALUE_TYPE *r, const REA
 case CLASS2 (rvc_nan, rvc_inf):
   /* NaN / ANY = NaN.  */
   *r = *a;
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   r->sign = sign;
   return false;

@@ -964,6 +988,10 @@ do_fix_trunc (REAL_VALUE_TYPE *r, const
 case rvc_zero:
 case rvc_inf:
 case rvc_nan:
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   break;

 case rvc_normal:
@@ -1022,7 +1050,13 @@ real_arithmetic (REAL_VALUE_TYPE *r, int

 case MIN_EXPR:
   if (op1->cl == rvc_nan)
+  {
*r = *op1;
+/* Make resulting NaN value to be qNaN. The caller has the
+   responsibility to avoid the operation if flag_signaling_nans
+   is on.  */
+r->signalling = 0;
+  }
   else if (do_compare (op0, op1, -1) < 0)
*r = *op0;
   else
@@ -1031,7 +1065,13 @@ real_arithmetic (REAL_VALUE_TYPE *r, int

 case MAX_EXPR:
   if (op1->cl == rvc_nan)
+  {
*r = *op1;
+/* Make resulting NaN value to be qNaN. The caller has the
+   responsibility to avoid the operation if flag_signaling_nans
+   is on.  */
+r->signalling = 0;
+  }
   else if (do_compare (op0, op1, 1) < 0)
*r = *op1;
   else
@@ -1162,6 +1202,10 @@ real_ldexp (REAL_VALUE_TYPE *r, const RE
 case rvc_zero:
 case rvc_inf:
 case rvc_nan:
+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  r->signalling = 0;
   break;

 case rvc_normal:
@@ -2731,6 +2775,12 @@ real_convert (REAL_VALUE_TYPE *r, format

   round_for_format (fmt, r);

+  /* Make resulting NaN value to be qNaN. The caller has the
+ responsibility to avoid the operation if flag_signaling_nans
+ is on.  */
+  if (r->cl == rvc_nan)
+r->signalling = 0;
+
   /* round_for_format de-normalizes denormals.  Undo just that
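
The effect of the repeated `r->signalling = 0` assignments can be illustrated on raw binary64 bit patterns. This is a hedged sketch, not what real.c does internally: real.c keeps a separate `signalling` flag, whereas the hypothetical helper `quiet_result_bits` below sets the quiet bit directly, assuming the usual encoding where the top fraction bit marks a quiet NaN.

```c
#include <stdint.h>

/* Assumed binary64 encoding: exponent all ones plus a nonzero
   fraction is a NaN, and the top fraction bit marks it as quiet.  */
#define Q_EXP_MASK   UINT64_C (0x7ff0000000000000)
#define Q_FRAC_MASK  UINT64_C (0x000fffffffffffff)
#define Q_QUIET_BIT  UINT64_C (0x0008000000000000)

/* Return BITS with any NaN turned into a quiet NaN.  The sign and the
   remaining payload bits are preserved; non-NaN values pass through
   unchanged.  */
static uint64_t
quiet_result_bits (uint64_t bits)
{
  int is_nan = (bits & Q_EXP_MASK) == Q_EXP_MASK
               && (bits & Q_FRAC_MASK) != 0;
  return is_nan ? (bits | Q_QUIET_BIT) : bits;
}
```

Applying the helper to an already quiet NaN or to an ordinary number is a no-op, which matches the patch's approach of clearing the flag unconditionally in the NaN paths.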

Fix 61441 [5/5] Disable various transformations for signaling NaN operands

2015-11-26 Thread Saraswati, Sujoy (OSTL)

Hi,
 This patch avoids various transformations with signaling NaN operands when 
flag_signaling_nans is on, to avoid folding which would lose exceptions. A test 
case for this change is also added as part of this patch.
Regards,
Sujoy
  
  2015-11-26  Sujoy Saraswati 

PR tree-optimization/61441
* fold-const.c (const_binop): Convert sNaN to qNaN when
flag_signaling_nans is off.
(const_unop): Avoid the operation, other than NEGATE and
ABS, if flag_signaling_nans is on and the operand is an sNaN.
(fold_convert_const_real_from_real): Avoid the operation if
flag_signaling_nans is on and the operand is an sNaN.
(integer_valued_real_unary_p): Update comment stating it
returns false for sNaN values.
(integer_valued_real_binary_p, integer_valued_real_call_p): Same.
(integer_valued_real_single_p): Same.
(integer_valued_real_invalid_p, integer_valued_real_p): Same.
* fold-const-call.c (fold_const_pow): Avoid the operation
if flag_signaling_nans is on and the operand is an sNaN.
(fold_const_builtin_load_exponent): Same.
(fold_const_call_sss): Same for BUILT_IN_POWI.
* gimple-fold.c (gimple_assign_integer_valued_real_p): Same.
(gimple_call_integer_valued_real_p): Same.
(gimple_phi_integer_valued_real_p): Same.
(gimple_stmt_integer_valued_real_p): Same.
* simplify-rtx.c (simplify_const_unary_operation): Avoid the
operation if flag_signaling_nans is on and the operand is an sNaN.
(simplify_const_binary_operation): Same.
* tree-ssa-math-opts.c (gimple_expand_builtin_pow): Avoid the
operation if flag_signaling_nans is on and the operand is an sNaN.

PR tree-optimization/61441
* gcc.dg/pr61441.c: New testcase.

===
diff -u -p a/gcc/fold-const.c b/gcc/fold-const.c
--- a/gcc/fold-const.c  2015-11-25 15:24:49.656116740 +0530
+++ b/gcc/fold-const.c  2015-11-25 15:25:07.712117294 +0530
@@ -1166,9 +1166,21 @@ const_binop (enum tree_code code, tree a
   /* If either operand is a NaN, just return it.  Otherwise, set up
 for floating-point trap; we return an overflow.  */
   if (REAL_VALUE_ISNAN (d1))
-   return arg1;
+  {
+/* Make resulting NaN value to be qNaN when flag_signaling_nans
+   is off.  */
+d1.signalling = 0;
+t = build_real (type, d1);
+return t;
+  }
   else if (REAL_VALUE_ISNAN (d2))
-   return arg2;
+  {
+/* Make resulting NaN value to be qNaN when flag_signaling_nans
+   is off.  */
+d2.signalling = 0;
+t = build_real (type, d2);
+return t;
+  }

   inexact = real_arithmetic (&value, code, &d1, &d2);
   real_convert (&result, mode, &value);
@@ -1538,6 +1550,15 @@ const_binop (enum tree_code code, tree t
 tree
 const_unop (enum tree_code code, tree type, tree arg0)
 {
+  /* Don't perform the operation, other than NEGATE and ABS, if
+ flag_signaling_nans is on and the operand is a signaling NaN.  */
+  if (TREE_CODE (arg0) == REAL_CST
+  && HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg0)))
+  && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg0))
+  && code != NEGATE_EXPR
+  && code != ABS_EXPR)
+return NULL_TREE;
+
   switch (code)
 {
 CASE_CONVERT:
@@ -1949,6 +1970,12 @@ fold_convert_const_real_from_real (tree
   REAL_VALUE_TYPE value;
   tree t;

+  /* Don't perform the operation if flag_signaling_nans is on
+ and the operand is a signaling NaN.  */
+  if (HONOR_SNANS (TYPE_MODE (TREE_TYPE (arg1)))
+  && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
+return NULL_TREE;
+
   real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
   t = build_real (type, value);

@@ -13420,7 +13447,7 @@ tree_single_nonzero_warnv_p (tree t, boo

 /* Return true if the floating point result of (CODE OP0) has an
integer value.  We also allow +Inf, -Inf and NaN to be considered
-   integer values.
+   integer values. Return false for signalling NaN.

DEPTH is the current nesting depth of the query.  */

@@ -13453,7 +13480,7 @@ integer_valued_real_unary_p (tree_code c

 /* Return true if the floating point result of (CODE OP0 OP1) has an
integer value.  We also allow +Inf, -Inf and NaN to be considered
-   integer values.
+   integer values. Return false for signalling NaN.

DEPTH is the current nesting depth of the query.  */

@@ -13477,8 +13504,8 @@ integer_valued_real_binary_p (tree_code

 /* Return true if the floating point result of calling FNDECL with arguments
ARG0 and ARG1 has an integer value.  We also allow +Inf, -Inf and NaN to be
-   considered integer values.  If FNDECL takes fewer than 2 arguments,
-   the remaining ARGn are null.
+   considered integer values. Return

Re: RFC: C++ delayed folding merge

2015-11-26 Thread Thomas Schwinge

Hi!

On Mon, 9 Nov 2015 01:30:34 -0500, Jason Merrill  wrote:
> I'm planning to merge the C++ delayed folding branch this week [...]

In r230554,
Cesar already fixed up cp_fold_r to also care for OACC_LOOP, but another
thing I just wondered about is whether this function also needs to do
something for operand 6 (OMP_FOR_ORIG_DECLS) of OMP_FOR and its variants?
If not, would it make sense to note that in a source code comment, given
that any other operand is being handled?

> --- a/gcc/cp/cp-gimplify.c
> +++ b/gcc/cp/cp-gimplify.c

> +/* Perform any pre-gimplification folding of C++ front end trees to
> +   GENERIC.
> +   Note:  The folding of none-omp cases is something to move into
> + the middle-end.  As for now we have most foldings only on GENERIC
> + in fold-const, we need to perform this before transformation to
> + GIMPLE-form.  */
> +
> +static tree
> +cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data)
> +{
> +  tree stmt;
> +  struct cp_genericize_data *wtd = (struct cp_genericize_data *) data;
> +  enum tree_code code;
> +
> +  *stmt_p = stmt = cp_fold (*stmt_p, wtd->fold_hash);
> +
> +  code = TREE_CODE (stmt);
> +  if (code == OMP_FOR || code == OMP_SIMD || code == OMP_DISTRIBUTE
> +  || code == OMP_TASKLOOP || code == CILK_FOR || code == CILK_SIMD)
> +{
> +  tree x;
> +  int i, n;
> +
> +  cp_walk_tree (&OMP_FOR_BODY (stmt), cp_fold_r, data, NULL);
> +  cp_walk_tree (&OMP_FOR_CLAUSES (stmt), cp_fold_r, data, NULL);
> +  cp_walk_tree (&OMP_FOR_INIT (stmt), cp_fold_r, data, NULL);
> +  x = OMP_FOR_COND (stmt);
> +  if (x && TREE_CODE_CLASS (TREE_CODE (x)) == tcc_comparison)
> + {
> +   cp_walk_tree (&TREE_OPERAND (x, 0), cp_fold_r, data, NULL);
> +   cp_walk_tree (&TREE_OPERAND (x, 1), cp_fold_r, data, NULL);
> + } 
> +  else if (x && TREE_CODE (x) == TREE_VEC)
> + {
> +   n = TREE_VEC_LENGTH (x);
> +   for (i = 0; i < n; i++)
> + {
> +   tree o = TREE_VEC_ELT (x, i);
> +   if (o && TREE_CODE_CLASS (TREE_CODE (o)) == tcc_comparison)
> + cp_walk_tree (&TREE_OPERAND (o, 1), cp_fold_r, data, NULL);
> + }
> + }
> +  x = OMP_FOR_INCR (stmt);
> +  if (x && TREE_CODE (x) == TREE_VEC)
> + {
> +   n = TREE_VEC_LENGTH (x);
> +   for (i = 0; i < n; i++)
> + {
> +   tree o = TREE_VEC_ELT (x, i);
> +   if (o && TREE_CODE (o) == MODIFY_EXPR)
> + o = TREE_OPERAND (o, 1);
> +   if (o && (TREE_CODE (o) == PLUS_EXPR || TREE_CODE (o) == 
> MINUS_EXPR
> + || TREE_CODE (o) == POINTER_PLUS_EXPR))
> + {
> +   cp_walk_tree (&TREE_OPERAND (o, 0), cp_fold_r, data, NULL);
> +   cp_walk_tree (&TREE_OPERAND (o, 1), cp_fold_r, data, NULL);
> + }
> + }
> + }
> +  cp_walk_tree (&OMP_FOR_PRE_BODY (stmt), cp_fold_r, data, NULL);
> +  *walk_subtrees = 0;
> +}
> +
> +  return NULL;
> +}


Grüße
 Thomas



Re: [PATCH] Convert manual unsigned +/- overflow checking into {ADD,SUB}_OVERFLOW (PR target/67089)

2015-11-26 Thread Richard Biener

On Wed, 25 Nov 2015, Marc Glisse wrote:

> On Wed, 25 Nov 2015, Jakub Jelinek wrote:
> 
> > > The same is true whether we write it b > a or (a - b) > a (I don't think
> > > PRE
> > > + SCCVN avoid increasing register pressure).
> > > 
> > > > So, I'd really prefer doing x-y>x to y>x only for single use.
> > > 
> > > Ok (for now).
> > 
> > Do you plan to work on that (my match.pd experience is smaller than yours),
> > or should I add to my todo list?
> 
> Are we talking stage 3 or next stage 1? If you want something for stage 3, I
> think you'll have to do it, it shouldn't be much longer than
> 
> (for cmp (gt le)
>  (simplify
>   (cmp (minus:s @0 @1) @0)
>   (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
>(cmp @1 @0
> 
> and a similar one for x against floats or whatever, but I could be wrong)

floats should be fine here.  But eventually saturating types need to
be excluded?

Note that the :s on the minus has no effect as the result is a
single operation.  If you want to restrict this nevertheless
you need a && single_use (@2) and capture the minus with
(minus@2 @0 @1) instead.

Richard.
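
The identity behind the proposed pattern, x - y > x if and only if y > x for wrapping unsigned arithmetic (and likewise for <=), can be checked exhaustively on a small type. The sketch below does so for 8-bit values; the function name `minus_compare_identity_holds` is made up for this illustration, and the check says nothing about saturating types, which Richard notes may need to be excluded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check, for every pair of 8-bit unsigned values, that
     (a - b) >  a  <=>  b >  a
     (a - b) <= a  <=>  b <= a
   where the subtraction wraps modulo 256.  */
static bool
minus_compare_identity_holds (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        /* The cast truncates to 8 bits, modelling unsigned wraparound.  */
        uint8_t diff = (uint8_t) (a - b);
        if ((diff > a) != (b > a))
          return false;
        if ((diff <= a) != (b <= a))
          return false;
      }
  return true;
}
```

The reasoning is short: when b > a the difference wraps to 256 - (b - a) + ... which exceeds a, and when b <= a the difference a - b cannot exceed a.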

[gomp4] Merge trunk r230627 (2015-11-19) into gomp-4_0-branch

2015-11-26 Thread Thomas Schwinge

Hi!

Committed to gomp-4_0-branch in r230925:

commit 3ce2a2ee891a51909c23b2cb1a13f230c6a75e36
Merge: d60891e 62efaf6
Author: tschwinge 
Date:   Thu Nov 26 09:02:30 2015 +

svn merge -r 230275:230627 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230925 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas



Re: GCC 5.3 Status Report (2015-11-20)

2015-11-26 Thread Richard Biener

On Wed, Nov 25, 2015 at 7:59 PM, David Edelsohn  wrote:
> On Wed, Nov 25, 2015 at 11:57 AM, Paolo Bonzini  wrote:
>
>> Patch committed to upstream libtool, thanks for your understanding.
>
> Great!
>
> How can I have the patch backported to GCC trunk and 5-branch libtool,
> and then rebuild configure with the appropriate versions of autoconf?
> I have not been able to install the correct versions of autoconf to
> produce configure without gratuitous changes.

I've never had problems building the FSF tarballs of the versions we use
and using those to re-generate files.  Sometimes it shows others
re-generated with slightly different versions (thus producing unrelated
changes) though.

I originally planned to do RC1 tomorrow (Friday) but if I don't see any
traces of this patch I'll wait until Monday but no longer.

Thanks,
Richard.

> Thanks, David

Re: RFA: PATCH to gimple_canonical_types_compatible_p for middle-end/66214

2015-11-26 Thread Richard Biener

On Wed, Nov 25, 2015 at 4:55 PM, Jason Merrill  wrote:
> The problem here is that we're trying to compare the TYPE_FIELDS of two
> variants of an incomplete type, which doesn't make sense; we shouldn't
> expect TYPE_FIELDS of an incomplete type to be meaningful.
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?

Hmm, originally the code wasn't supposed to be called for incomplete types
as you generally can't compare them.  But now that the verifier uses the
predicate it instead should have this guard.

Richard.

Re: RFA: PATCH to gimple_canonical_types_compatible_p for middle-end/66214

2015-11-26 Thread Richard Biener

On Thu, Nov 26, 2015 at 7:32 AM, Jan Hubicka  wrote:
> Hi,
> what about this?
>
> Index: tree.c
> ===
> --- tree.c  (revision 230924)
> +++ tree.c  (working copy)
> @@ -13424,6 +13424,12 @@ gimple_canonical_types_compatible_p (con
>{
> tree f1, f2;
>
> +   /* Don't try to compare variants of an incomplete type, before
> +  TYPE_FIELDS has been copied around.  */
> +   if (!COMPLETE_TYPE_P (t1) && !COMPLETE_TYPE_P (t2))
> + return true;
> +

As said, you shouldn't call this function on variants.  It wasn't
designed for that.  Please do the above
check where necessary in the caller (the verifier).

Overloading this for verification and canonical type compute now bites back...

Richard.

> if (TYPE_REVERSE_STORAGE_ORDER (t1) != TYPE_REVERSE_STORAGE_ORDER 
> (t2))
>   return false;
>
> @@ -13710,28 +13716,35 @@ verify_type (const_tree t)
> }
>  }
>else if (RECORD_OR_UNION_TYPE_P (t))
> -for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
> -  {
> -   /* TODO: verify properties of decls.  */
> -   if (TREE_CODE (fld) == FIELD_DECL)
> - ;
> -   else if (TREE_CODE (fld) == TYPE_DECL)
> - ;
> -   else if (TREE_CODE (fld) == CONST_DECL)
> - ;
> -   else if (TREE_CODE (fld) == VAR_DECL)
> - ;
> -   else if (TREE_CODE (fld) == TEMPLATE_DECL)
> - ;
> -   else if (TREE_CODE (fld) == USING_DECL)
> - ;
> -   else
> - {
> -   error ("Wrong tree in TYPE_FIELDS list");
> -   debug_tree (fld);
> -   error_found = true;
> - }
> -  }
> +{
> +  if (TYPE_FIELDS (t) && !COMPLETE_TYPE_P (t) && in_lto_p)
> +   {
> + error ("TYPE_FIELDS defined in incomplete type");
> + error_found = true;
> +   }
> +  for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
> +   {
> + /* TODO: verify properties of decls.  */
> + if (TREE_CODE (fld) == FIELD_DECL)
> +   ;
> + else if (TREE_CODE (fld) == TYPE_DECL)
> +   ;
> + else if (TREE_CODE (fld) == CONST_DECL)
> +   ;
> + else if (TREE_CODE (fld) == VAR_DECL)
> +   ;
> + else if (TREE_CODE (fld) == TEMPLATE_DECL)
> +   ;
> + else if (TREE_CODE (fld) == USING_DECL)
> +   ;
> + else
> +   {
> + error ("Wrong tree in TYPE_FIELDS list");
> + debug_tree (fld);
> + error_found = true;
> +   }
> +   }
> +}
>else if (TREE_CODE (t) == INTEGER_TYPE
>|| TREE_CODE (t) == BOOLEAN_TYPE
>|| TREE_CODE (t) == OFFSET_TYPE

Re: Fix 61441 [3/5] Remove flag_errno_math check for RINT

2015-11-26 Thread Richard Biener

On Thu, Nov 26, 2015 at 9:31 AM, Saraswati, Sujoy (OSTL)
 wrote:
> Hi,
>  This patch removes flag_errno_math check for RINT, treating it similar to 
> nearbyint.
> Regards,
> Sujoy

Ok.

Richard.

>   2015-11-26  Sujoy Saraswati 
>
> PR tree-optimization/61441
> * match.pd (f(x) -> x): Removed flag_errno_math check for RINT.
>
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 230851)
> +++ gcc/match.pd(working copy)
> @@ -2565,16 +2565,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(fns (fns @0))
>(fns @0)))
>  /* f(x) -> x if x is integer valued and f does nothing for such values.  */
> -(for fns (TRUNC FLOOR CEIL ROUND NEARBYINT)
> +(for fns (TRUNC FLOOR CEIL ROUND NEARBYINT RINT)
>   (simplify
>(fns integer_valued_real_p@0)
>@0))
> -/* Same for rint.  We have to check flag_errno_math because
> -   integer_valued_real_p accepts +Inf, -Inf and NaNs as integers.  */
> -(if (!flag_errno_math)
> - (simplify
> -  (RINT integer_valued_real_p@0)
> -  @0))
>
>  /* hypot(x,0) and hypot(0,x) -> abs(x).  */
>  (simplify

Re: [PATCH 3/4] [ARM] PR63870 Add test cases

2015-11-26 Thread James Greenhalgh

On Thu, Nov 26, 2015 at 09:41:15AM +, Charles Baylis wrote:
> Hi James,
> 
> Ping. This needs an ack from an AArch64 reviewer/maintainer

Fine by me, it will considerably clean up my test results for ARM!

Thanks,
James

Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-26 Thread Richard Biener

On Wed, Nov 25, 2015 at 4:54 PM, Michael Matz  wrote:
> Hi,
>
> On Wed, 25 Nov 2015, Bernd Schmidt wrote:
>
>> So here's a very basic version which I think is appropriate for the
>> current stage, and can be extended later. Ok if it passes testing?
>
> When we're improving that place, we should really only consider ASMs that
> change memory state to be problematic ("&& gimple_vdef (stmt)").

Ok with the change suggested by Micha for the asm()s.  Note that I
originally used gimple_vuse () instead of gimple_vdef () as even
reading random memory is a barrier for the compiler to move stores
across it (not reads, of course).  Which is why I also considered
pure (global memory reading) calls to be a barrier (for the stores).

Of course as we don't consider regular assign statement reads (or stores)
to be a "barrier" in the sense that matters here (we're not looking for
memory optimization barriers!) this might be moot and then the
middle-end will effectively require all synchronization barriers (which we
are looking for(?)) to appear as clobbering memory.

Thanks,
Richard.

>
> Ciao,
> Michael.

Re: [C++ Patch] PR 68087

2015-11-26 Thread Richard Biener

On Wed, Nov 25, 2015 at 5:09 PM, Paolo Carlini  wrote:
> Hi,
>
> On 11/25/2015 04:59 PM, Markus Trippelsdorf wrote:
>>>
>>> Index: cp/constexpr.c
>>> ===
>>> --- cp/constexpr.c  (revision 230865)
>>> +++ cp/constexpr.c  (working copy)
>>> @@ -1799,8 +1799,8 @@ cxx_eval_array_reference (const constexpr_ctx *ctx
>>> gcc_unreachable ();
>>>   }
>>>   -  i = tree_to_shwi (index);
>>> -  if (i < 0)
>>> +  if (!tree_fits_shwi_p (index)
>>> +  || (i = tree_to_shwi (index)) < 0)
>>
>> Last time Richard pointed out that:
>>  if (wi::lts_p (index, 0))
>> is more idiomatic.
>
> I see, but isn't used anywhere else in the whole cp/ and in the case at
> issue we still need to assign to 'i', thus I would rather follow the
> existing practice in the front-end...

You can also use tree_int_cst_sgn (index) == -1 (which uses wi::neg_p
(index) internally).

Richard.

> Paolo.
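
The ordering issue in the patch (test that the constant fits before converting it, then test the sign) can be modelled outside GCC. The following is only a sketch under assumptions: `toy_check_index` and its enum are hypothetical, a 64-bit pattern plus a signedness flag stands in for an INTEGER_CST, and none of this is the `tree_fits_shwi_p` / `tree_to_shwi` implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for this illustration.  */
enum toy_index_status
{
  TOY_INDEX_OK,
  TOY_INDEX_TOO_LARGE,   /* unsigned value does not fit in int64_t */
  TOY_INDEX_NEGATIVE     /* signed value is below zero */
};

/* BITS is the 64-bit pattern of an index constant and IS_UNSIGNED says
   how to interpret it.  Only when the value fits and is non-negative is
   it stored in *OUT; the conversion is never attempted otherwise.  */
static enum toy_index_status
toy_check_index (uint64_t bits, bool is_unsigned, int64_t *out)
{
  if (bits > (uint64_t) INT64_MAX)
    return is_unsigned ? TOY_INDEX_TOO_LARGE : TOY_INDEX_NEGATIVE;
  *out = (int64_t) bits;
  return TOY_INDEX_OK;
}
```

Both failure cases are rejected before any narrowing happens, which mirrors the shape of the `!tree_fits_shwi_p (index) || (i = tree_to_shwi (index)) < 0` condition in the patch.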

Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

2015-11-26 Thread Jakub Jelinek

On Tue, Nov 10, 2015 at 11:39:36AM +0100, Jakub Jelinek wrote:
> On Tue, Nov 03, 2015 at 05:25:53PM +0300, Alexander Monakov wrote:
> > Here's an alternative patch that does not depend on exposure of 
> > shared-memory
> > address space, and does not try to use pass_late_lower_omp.  It's based on
> > Bernd's suggestion to transform
> 
> FYI, I've committed a new testcase to gomp-4_5-branch that covers various
> target data sharing/team sharing/privatization parallel
> sharing/privatization offloading cases.

And another testcase, this time using only OpenMP 4.0 features, and trying
to test the behavior of addressable vars in declare target functions where
it is not clear if they are executed in teams, distribute or parallel for
contexts.

I wanted to look at what LLVM generates here (tried LLVM trunk), but it is
unable to parse #pragma omp distribute or #pragma omp declare target,
so it is hard to guess anything.

Tested with XeonPhi offloading as well as host fallback, committed to trunk.

2015-11-26  Jakub Jelinek  

* testsuite/libgomp.c/target-35.c: New test.

--- libgomp/testsuite/libgomp.c/target-35.c (revision 0)
+++ libgomp/testsuite/libgomp.c/target-35.c (working copy)
@@ -0,0 +1,129 @@
+#include <omp.h>
+#include <stdlib.h>
+
+#pragma omp declare target
+__attribute__((noinline))
+void
+foo (int x, int y, int z, int *a, int *b)
+{
+  if (x == 0)
+{
+  int i, j;
+  for (i = 0; i < 64; i++)
+   #pragma omp parallel for shared (a, b)
+   for (j = 0; j < 32; j++)
+ foo (3, i, j, a, b);
+}
+  else if (x == 1)
+{
+  int i, j;
+  #pragma omp distribute dist_schedule (static, 1)
+  for (i = 0; i < 64; i++)
+   #pragma omp parallel for shared (a, b)
+   for (j = 0; j < 32; j++)
+ foo (3, i, j, a, b);
+}
+  else if (x == 2)
+{
+  int j;
+  #pragma omp parallel for shared (a, b)
+  for (j = 0; j < 32; j++)
+   foo (3, y, j, a, b);
+}
+  else
+{
+  #pragma omp atomic
+  b[y] += z;
+  #pragma omp atomic
+  *a += 1;
+}
+}
+
+__attribute__((noinline))
+int
+bar (int x, int y, int z)
+{
+  int a, b[64], i;
+  a = 8;
+  for (i = 0; i < 64; i++)
+b[i] = i;
+  foo (x, y, z, &a, b);
+  if (x == 0)
+{
+  if (a != 8 + 64 * 32)
+   return 1;
+  for (i = 0; i < 64; i++)
+   if (b[i] != i + 31 * 32 / 2)
+ return 1;
+}
+  else if (x == 1)
+{
+  int c = omp_get_num_teams ();
+  int d = omp_get_team_num ();
+  int e = d;
+  int f = 0;
+  for (i = 0; i < 64; i++)
+   if (i == e)
+ {
+   if (b[i] != i + 31 * 32 / 2)
+ return 1;
+   f++;
+   e = e + c;
+ }
+   else if (b[i] != i)
+ return 1;
+  if (a < 8 || a > 8 + f * 32)
+   return 1;
+}
+  else if (x == 2)
+{
+  if (a != 8 + 32)
+   return 1;
+  for (i = 0; i < 64; i++)
+   if (b[i] != i + (i == y ? 31 * 32 / 2 : 0))
+ return 1;
+}
+  else if (x == 3)
+{
+  if (a != 8 + 1)
+   return 1;
+  for (i = 0; i < 64; i++)
+   if (b[i] != i + (i == y ? z : 0))
+ return 1;
+}
+  return 0;
+}
+#pragma omp end declare target
+
+int
+main ()
+{
+  int i, j, err = 0;
+  #pragma omp target map(tofrom:err)
+  #pragma omp teams reduction(+:err)
+  err += bar (0, 0, 0);
+  if (err)
+abort ();
+  #pragma omp target map(tofrom:err)
+  #pragma omp teams reduction(+:err)
+  err += bar (1, 0, 0);
+  if (err)
+abort ();
+  #pragma omp target map(tofrom:err)
+  #pragma omp teams reduction(+:err)
+  #pragma omp distribute
+  for (i = 0; i < 64; i++)
+err += bar (2, i, 0);
+  if (err)
+abort ();
+  #pragma omp target map(tofrom:err)
+  #pragma omp teams reduction(+:err)
+  #pragma omp distribute
+  for (i = 0; i < 64; i++)
+  #pragma omp parallel for reduction(+:err)
+for (j = 0; j < 32; j++)
+  err += bar (3, i, j);
+  if (err)
+abort ();
+  return 0;
+}


Jakub
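
The expected values checked in bar for the x == 0 case follow from simple counting, which a serial model makes explicit. The sketch below is not part of the testcase: `serial_model` and `serial_model_matches` are hypothetical helpers that run the same 64 x 32 iteration space without any OpenMP constructs, so each (i, j) pair adds j to b[i] and 1 to a.

```c
#define TOY_ROWS 64
#define TOY_COLS 32

/* Serial equivalent of foo (0, ...): every (i, j) pair reaches the
   final branch, which performs b[i] += j and *a += 1.  */
static void
serial_model (int *a, int *b)
{
  for (int i = 0; i < TOY_ROWS; i++)
    for (int j = 0; j < TOY_COLS; j++)
      {
        b[i] += j;
        *a += 1;
      }
}

/* Return 1 if the serial model reproduces the values bar expects for
   x == 0: a grows by 64 * 32 and each b[i] by 0 + 1 + ... + 31, which
   is 31 * 32 / 2.  */
static int
serial_model_matches (void)
{
  int a = 8, b[TOY_ROWS];
  for (int i = 0; i < TOY_ROWS; i++)
    b[i] = i;
  serial_model (&a, b);
  if (a != 8 + 64 * 32)
    return 0;
  for (int i = 0; i < TOY_ROWS; i++)
    if (b[i] != i + 31 * 32 / 2)
      return 0;
  return 1;
}
```

The atomics in the real test exist so that the parallel execution reaches the same totals as this serial walk.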

Re: [PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions

2015-11-26 Thread Kyrill Tkachov



On 24/11/15 00:15, Segher Boessenkool wrote:

On Thu, Nov 19, 2015 at 03:20:22PM +, Kyrill Tkachov wrote:

Hmmm, so the answer to that is a bit further down the validate_replacement:
path.
It's the code after the big comment:
   /* See if this is a PARALLEL of two SETs where one SET's destination is
  a register that is unused and this isn't marked as an instruction that
  might trap in an EH region.  In that case, we just need the other SET.
  We prefer this over the PARALLEL.

  This can occur when simplifying a divmod insn.  We *must* test for this
  case here because the code below that splits two independent SETs
  doesn't
  handle this case correctly when it updates the register status.

  It's pointless doing this if we originally had two sets, one from
  i3, and one from i2.  Combining then splitting the parallel results
  in the original i2 again plus an invalid insn (which we delete).
  The net effect is only to move instructions around, which makes
  debug info less accurate.  */

The code extracts all the valid sets inside the PARALLEL and calls
recog_for_combine on them
individually, ignoring the clobber.

Before I made this use is_parallel_of_n_reg_sets the code used to test
if it is a parallel of two sets, and no clobbers allowed.  So it would
never allow a clobber of zero.  But now it does.  I'll fix this in
is_parallel_of_n_reg_sets.

Thanks for finding the problem!


Thanks for fixing the wrong-code issue.
As I mentioned on IRC, this patch improves codegen on aarch64 as well.
I've re-checked SPEC2006 and it seems to improve codegen around
multiply-extend-accumulate instructions. For example, the sequence:
    mov     w4, 64
    mov     x1, 24
    smaddl  x1, w9, w4, x1 // multiply-sign-extend-accumulate
    add     x1, x3, x1

becomes something like this:
    mov     w3, 64
    smaddl  x1, w9, w3, x0
    add     x1, x1, 24  // constant 24 propagated into the add

Another case transformed the multiply-extend into something cheaper:
    mov     x0, 40
    mov     w22, 32
    umaddl  x22, w21, w22, x0 // multiply-zero-extend-accumulate

becomes:
    ubfiz   x22, x21, 5, 32 // ASHIFT+extend
    add     x22, x22, 40

which should always be beneficial.
From what I can see we don't lose any of the multiply-extend-accumulate
 opportunities that we gained from the original combine patch.
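
The equivalence of the second pair of sequences can be written down in C as a sanity check. This is only a model of the arithmetic, assuming the usual A64 semantics (umaddl zero-extends both 32-bit sources before multiplying and adding; ubfiz with lsb 5 and width 32 places the low 32 bits at bit 5 and zeroes the rest); the function names are invented for the illustration.

```c
#include <stdint.h>

/* Model of: mov x0, 40; mov w22, 32; umaddl x22, w21, w22, x0
   i.e. zero-extend the low 32 bits of x21, multiply by 32, add 40.  */
static uint64_t
via_umaddl (uint64_t x21)
{
  uint32_t w21 = (uint32_t) x21;
  return (uint64_t) w21 * 32u + 40;
}

/* Model of: ubfiz x22, x21, 5, 32; add x22, x22, 40
   i.e. take the low 32 bits of x21, shift left by 5, add 40.  */
static uint64_t
via_ubfiz_add (uint64_t x21)
{
  return ((x21 & UINT64_C (0xffffffff)) << 5) + 40;
}
```

Multiplying a zero-extended 32-bit value by 32 cannot overflow 64 bits, so the shift form is exact for every input.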

So can we take this patch in as well?

Thanks,
Kyrill


Segher

[PATCH] GCC-5 Backport of pr67037

2015-11-26 Thread Bernd Edlinger


Hi,

I have boot-strapped & reg-tested this patch now
on arm-linux-gnueabihf with all languages enabled.

Is it OK for the gcc-5-branch?


Thanks
Bernd.

gcc:
2015-11-26  Bernd Edlinger  

Backport from mainline
2015-09-30  Bernd Edlinger  

PR rtl-optimization/67037
* lra-constraints.c (process_addr_reg): Use copy_rtx when necessary.

testsuite:
2015-11-26  Bernd Edlinger  

Backport from mainline
2015-09-30  Bernd Edlinger  

PR rtl-optimization/67037
* gcc.c-torture/execute/pr67037.c: New test.
--- gcc/lra-constraints.c.jj	2015-09-25 23:06:08.0 +0200
+++ gcc/lra-constraints.c	2015-09-29 13:29:01.695783261 +0200
@@ -1339,7 +1339,7 @@ process_addr_reg (rtx *loc, bool check_o
   if (after != NULL)
 {
   start_sequence ();
-  lra_emit_move (reg, new_reg);
+  lra_emit_move (before_p ? copy_rtx (reg) : reg, new_reg);
   emit_insn (*after);
   *after = get_insns ();
   end_sequence ();
--- /dev/null	2015-09-28 14:17:37.079363115 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr67037.c	2015-09-30 08:04:22.794285807 +0200
@@ -0,0 +1,49 @@
+long (*extfunc)();
+
+static inline void lstrcpynW( short *d, const short *s, int n )
+{
+unsigned int count = n;
+
+while ((count > 1) && *s)
+{
+count--;
+*d++ = *s++;
+}
+if (count) *d = 0;
+}
+
+int __attribute__((noinline,noclone))
+badfunc(int u0, int u1, int u2, int u3,
+  short *fsname, unsigned int fsname_len)
+{
+static const short ntfsW[] = {'N','T','F','S',0};
+char superblock[2048+3300];
+int ret = 0;
+short *p;
+
+if (extfunc())
+return 0;
+p = (void *)extfunc();
+if (p != 0)
+goto done;
+
+extfunc(superblock);
+
+lstrcpynW(fsname, ntfsW, fsname_len);
+
+ret = 1;
+done:
+return ret;
+}
+
+static long f()
+{
+return 0;
+}
+
+int main()
+{
+short buf[6];
+extfunc = f;
+return !badfunc(0, 0, 0, 0, buf, 6);
+}

Re: [PATCH] GCC-5 Backport of pr67037

2015-11-26 Thread Richard Biener

On Thu, 26 Nov 2015, Bernd Edlinger wrote:

> 
> Hi,
> 
> I have boot-strapped & reg-tested this patch now
> on arm-linux-gnueabihf with all languages enabled.
> 
> Is it OK for the gcc-5-branch?

Ok.

Thanks,
Richard.

[PATCH, PR target/68416, i386, MPX] Add bounds registers to ALL_REGS set

2015-11-26 Thread Ilya Enkovich

Hi,

This patch fixes redundant bndmov problem by adding bounds registers to
ALL_REGS set.  This patch was bootstrapped and regtested on
x86_64-unknown-linux-gnu.  OK for trunk and gcc-5-branch?

Thanks,
Ilya
--
gcc/

2015-11-26  Vladimir Makarov  

PR target/68416
* config/i386/i386.h (enum reg_class): Add
bounds registers to ALL_REGS.

gcc/testsuite/

2015-11-26  Ilya Enkovich  

PR target/68416
* gcc.target/i386/mpx/pr68416.c: New test.


diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ceda472..e69c9cc 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1457,7 +1457,7 @@ enum reg_class
 { 0x1ff1,0xffe0,   0x1f },   /* FLOAT_INT_SSE_REGS */\
{ 0x0,   0x0, 0x1fc0 },   /* MASK_EVEX_REGS */   \
{ 0x0,   0x0, 0x1fe0 },   /* MASK_REGS */ \
-{ 0x,0x, 0x1fff }\
+{ 0x,0x,0x1 }\
 }
 
 /* The same information, inverted:
diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68416.c b/gcc/testsuite/gcc.target/i386/mpx/pr68416.c
new file mode 100644
index 000..10587ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr68416.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mmpx -fcheck-pointer-bounds" } */
+/* { dg-final { scan-assembler-not "bndmov" } } */
+
+int
+foo(int **arr, int i)
+{
+  return (*arr)[i];
+}

Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-26 Thread Kyrill Tkachov



On 20/11/15 16:44, Andre Vieira wrote:

Hi Kyrill
On 20/11/15 11:51, Kyrill Tkachov wrote:

Hi Andre,

On 18/11/15 09:44, Andre Vieira wrote:

On 17/11/15 10:10, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote:

On 16/11/15 12:07, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:

Hi,

   This patch changes the target support mechanism to make it
recognize any ARM 'M' profile as a non-neon supporting target. The
current check only tests for armv6 architectures and earlier, and
does not account for armv7-m.

   This is correct because there is no 'M' profile that supports neon
and the current test is not sufficient to exclude armv7-m.

   Tested by running regressions for this testcase for various ARM
targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira 

 * gcc/testsuite/lib/target-supports.exp
(check_effective_target_arm_neon_ok_nocache): Added check
   for M profile.



 From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00
2001
From: Andre Simoes Dias Vieira 
Date: Fri, 13 Nov 2015 11:16:34 +
Subject: [PATCH] Disable neon testing for armv7-m

---
  gcc/testsuite/lib/target-supports.exp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index
75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6
100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2854,7 +2854,7 @@ proc
check_effective_target_arm_neon_ok_nocache { } {
  int dummy;
  /* Avoid the case where a test adds -mfpu=neon, but the
toolchain is
 configured for -mcpu=arm926ej-s, for example.  */
-#if __ARM_ARCH < 7
+#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
  #error Architecture too old for NEON.


Could you fix this #error message while you're here?

Why can't we change this test to look for the __ARM_NEON macro from
ACLE:

#if __ARM_NEON < 1
   #error NEON is not enabled
#endif

Thanks,
James



There is a check for this already:
'check_effective_target_arm_neon'. I think the idea behind
arm_neon_ok is to check whether the hardware would support neon,
whereas arm_neon is to check whether neon was enabled, i.e.
-mfpu=neon was used or a mcpu was passed that has neon enabled by
default.

The comments for 'check_effective_target_arm_neon_ok_nocache'
highlight this, though maybe the comments for
check_effective_target_arm_neon could be better.

# Return 1 if this is an ARM target supporting -mfpu=neon
# -mfloat-abi=softfp or equivalent options.  Some multilibs may be
# incompatible with these options.  Also set et_arm_neon_flags to the
# best options to add.

proc check_effective_target_arm_neon_ok_nocache
...
/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
configured for -mcpu=arm926ej-s, for example.  */
...


and

# Return 1 if this is a ARM target with NEON enabled.

proc check_effective_target_arm_neon


OK, got it - sorry for my mistake, I had the two procs confused.

I'd still like to see the error message fixed "Architecture too old
for NEON."
is not an accurate description of the problem.

Thanks,
James



This OK?



This is ok,
I've committed for you with the slightly tweaked ChangeLog entry:
2015-11-20  Andre Vieira  

 * lib/target-supports.exp
 (check_effective_target_arm_neon_ok_nocache): Add check
 for M profile.

as r230653.

Thanks,
Kyrill



Cheers,
Andre




Thank you. Would there be any objections to backporting this to gcc-5-branch? I
checked, it applies cleanly and it's a simple enough way of preventing a lot of
FAILs for armv7-m.



I agree.
I've committed this to the GCC 5 branch for you as r230930.

Thanks,
Kyrill


Best Regards,
Andre

Re: [RFC] Getting LTO incremental linking work

2015-11-26 Thread Richard Biener

On Wed, 25 Nov 2015, Jan Hubicka wrote:

> > > 
> > >  1) linker plugin is modified to pass -flinker-output to lto wrapper
> > > linker-output is either dyn (.so), pie or exec
> > > for incremental linking I added .rel for 3) and noltorel for 1)
> > > 
> > > currently it does rel because 3) (nor 2) can not be done when incremental
> > > linking is done on both LTO and non-LTO objects.
> > 
> > That's because the result would be a "fat" object where both pieces
> > would be needed.  Btw, I wonder why you are not running into the
> 
> Yep, we would end up with both LTO and non-LTO in one object and because
> we have no way to claim just part of it in next linking, the non-LTO will
> be ignored (just as is the case with fat objects)
> 
> > same issues as me when producing linker plugin output (the "merged"
> > LTO IL) that is LTO IL.  Ah, possibly because the link is incremental,
> > and thus all special-handling of LTO sections is disabled.
> 
> Yep, i just throw in the LTO IL and linker passes it through .
> > 
> > > > In this case linker
> > > > plugin outputs warnings about code quality loss and switches to
> > > > noltorel.
> > > >  2) with -flinker-output the lto wrapper behaves same way as with
> > > > -flto-partition=none.
> > > >  3) lto frontend parses -flinker-output and sets our internal flags accordingly.
> > > > I added new flag_incremental_linking to inform middle-end about the fact
> > > > that the output is going to be statically linked again.  This disables
> > > the privatization of hidden symbols and if set to 2 it also triggers
> > > the LTO IL streaming
> > 
> > I wonder why it behaves like -flto-partition=none in the case it does
> > not need to do LTO IL streaming (which I hope does LTO IL streaming
> > only?  or does this implement fat objects "correctly"?).  Can't
> 
> Yes, I do stream LTO il into assembler file, like normal -flto build would do
> for non-lto1 frontend.  So I produce one .s file that I need assembler to be
> called on.  By default lto-wrapper thinks we do WPA and it would look for list
> of ltrans partitions and execute ltranses that I do not want to happen.
> 
> Since no codegen is done we have no use for ltranses.  It would be nice to spit
> the .o file through simple-object interface.  Sadly we can't do that because
> simple-object won't put the LTO marker symbols in.  Something I want to track
> and drop assembler stage from LTO generation in general
> https://gcc.gnu.org/ml/gcc/2014-09/msg00340.html
> 
> Well, one case where WPA would help is production of fat-objects.  Currently
> it works (by compiling the LTO data into assembly again) but it is not done
> in parallel.  I suppose we could deal with this later - it is non-critical.
> My longer term plan is to make WPA parallelization independent of LTO - it
> makes sense when you build one large non-LTO object, too.
> 
> > we still parallelize the build via LTRANS and then incrementally
> > link the result (I suppose the linker will do that for us with the
> > linker plugin outputs already?)?
> > 
> > -flto-partition=none itself isn't more memory intensive than
> > WPA in these days, it's only about compile-time, correct?
> 
> It is.  Just by streaming everything in and out we "compress" the memory layout
> noticeably.  -flto-partition=one has smaller peak than -flto-partition=none.
> But again, here all this triggers with -ffat-objects only.
> > 
> > Your patch means that Andis/HJs work is no longer needed and we can
> > drop the section suffixes again?
> 
> Maybe. It is different implementation of same thing. They can be both used,
> though I suppose real incremental linking is better in longer term than
> section merging.
> > > 
> > > Does anyone see problems with this approach? I think this is easy enough 
> > > and fixes PR67548 so it may still get to mainline?
> > 
> > Yes, it would be a very nice feature to have indeed.
> > 
> > I don't see anything trying to change things with the collect2 path?
> 
> Hmm, with collect2 we don't even support static libraries, do we need to support
> incremental link?  I suppose collect2 can recognize -r and LTO objects and spawn
> the linker same way.
> > 
> > > > I need to do more testing, but in general I think the implementation is OK 
> > > > as it is.  We need a way to force noltorel model for testsuite, as the
> > > new default will bypass codegen for all our -r -nostdlib testcases.
> > 
> > Maybe we can turn most of them to -shared?
> 
> Would that work on all targets? (i.e. mingw?).

We do have some testcases using -shared already, they require us to use
PIC flags though AFAICS.  -shared isn't 1:1 equivalent to -r -nostdlib...

> For testing purposes I suppose I will add a flag. It should also silence 
> the linker plugin warning about generating assembly early. -rno-lto 
> perhaps?

What about allowing -flinker-output=XXX at link time as a driver option
and not overriding it if it is already present?

> >

Re: [PATCH, PR target/68416, i386, MPX] Add bounds registers to ALL_REGS set

2015-11-26 Thread Uros Bizjak

Hello!

> gcc/
>
> 2015-11-26  Vladimir Makarov  
>
> PR target/68416
> * config/i386/i386.h (enum reg_class): Add
> bounds registers to ALL_REGS.
>
> gcc/testsuite/
>
> 2015-11-26  Ilya Enkovich  
>
> PR target/68416
> * gcc.target/i386/mpx/pr68416.c: New test.

OK for mainline and gcc-5 after gcc 5.3 is released.

Thanks,
Uros.

Re: [PING] Re: [PATCH] c++/67913, 67917 - fix new expression with wrong number of elements

2015-11-26 Thread James Greenhalgh

On Thu, Nov 05, 2015 at 12:30:08PM -0700, Martin Sebor wrote:
> On 11/02/2015 09:55 PM, Jason Merrill wrote:
> >On 10/26/2015 10:06 PM, Martin Sebor wrote:
> >>+  if (TREE_CONSTANT (maybe_constant_value (outer_nelts)))
> >>+{
> >>+  if (tree_int_cst_lt (max_outer_nelts_tree, outer_nelts))
> >
> >maybe_constant_value may return a constant, but that doesn't mean that
> >outer_nelts was already constant; if it wasn't, the call to
> >tree_int_cst_lt will fail.
> 
> Thanks for the hint. I wasn't able to trigger the failure. I suspect
> outer_nelts must have already been folded at this point because the
> maybe_constant_value call isn't necessary. I removed it.
> 
> >Since we're moving toward delayed folding, I'd prefer to use the result
> >of maybe_constant_value only for this diagnostic, and then continue to
> >pass the unfolded value along.
> 
> Sure. Done in the attached patch.
> 
> Martin

> gcc/cp/ChangeLog
> 
> 2015-10-19  Martin Sebor  
> 
>   PR c++/67913
>   PR c++/67927
>   * call.c (build_operator_new_call): Do not assume size_check
>   is non-null, analogously to the top half of the function.
>   * init.c (build_new_1): Detect and diagnose array sizes in
>   excess of the maximum of roughly SIZE_MAX / 2.
>   Insert a runtime check only for arrays with a non-constant size.
>   (build_new): Detect and diagnose negative array sizes.
> 
> gcc/testsuite/ChangeLog
> 
> 2015-10-19  Martin Sebor  
> 
>   * init/new45.C: New test to verify that operator new is invoked
>   with or without overhead for a cookie.

This new test fails for ARM targets, snipping some of the diff...

> +inline __attribute__ ((always_inline))
> +void* operator new[] (size_t n)
> +{
> +// Verify that array new is invoked with an argument large enough
> +// for the array and a size_t cookie to store the number of elements.
> +// (This holds for classes with user-defined types but not POD types).
> +if (n != N * sizeof (UDClass) + sizeof n) abort ();
> +return malloc (n);
> +}

Cookies on ARM are 8-bytes [1], but sizeof ((size_t) n) is only 4-bytes,
so this check will fail (We'll ask for 500 bytes, the test here will only
be looking for 496).

Would it undermine the test for other architectures if I were to swap out
the != for a >= ? I think that is in line with the "argument large enough
for the array" that this test is looking for, but would not catch bugs where
we were allocating more memory than necessary.
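
To make the arithmetic concrete, here is a small C sketch of the two checks. The constants are assumptions chosen only to reproduce the 500 and 496 byte figures quoted above (492 bytes of array data, a 4-byte size_t, an 8-byte ARM cookie); the function names are hypothetical and the real test uses N and UDClass instead.

```c
#include <stddef.h>

/* Assumed figures: 492 bytes of array data, a 4-byte size_t and an
   8-byte array cookie, matching the 500 vs. 496 bytes discussed above.  */
enum { ARRAY_BYTES = 492, SIZEOF_SIZE_T = 4, ARM_COOKIE_BYTES = 8 };

/* The check as written in the test: the request must be exactly the
   array size plus one size_t.  */
int
exact_check_passes (size_t requested)
{
  return requested == ARRAY_BYTES + SIZEOF_SIZE_T;
}

/* The relaxed check proposed above: the request must be at least the
   array size plus one size_t.  */
int
relaxed_check_passes (size_t requested)
{
  return requested >= ARRAY_BYTES + SIZEOF_SIZE_T;
}
```

With an 8-byte cookie the request is 500 bytes, so the exact check rejects it and the relaxed check accepts it. The relaxed check would also accept any larger request, which is the over-allocation blind spot mentioned above.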

Otherwise I can spin a patch which skips the test for ARM targets.

Thanks,
James

Re: [RFC] Getting LTO incremental linking work

2015-11-26 Thread Richard Biener

On Thu, 26 Nov 2015, Jan Hubicka wrote:

> > > Moreover we do have all infrastructure ready to implement 3).  Our tree merging
> > > and symbol table handling is fully incremental and I think I made a patch to
> > > implement it today.   The scheme is easy:
> > 
> > What happens when .S (assembler) files are part of the incremental object?
> > The kernel does that. Your patch would do the final generation in this case,
> > right?
> 
> Yes, it will spit out warning (which can be silenced if -Wl,-rnolto is used) and turn
> the whole object into non-LTO one.
> > 
> > In theory we could change the build system to avoid that case though, but
> > it would need some changes.
> > 
> > It would be better if that could be handled somehow.

The final output of the incremental link would need to be two objects,
one with the LTO IL and one with the incrementally linked non-LTO
objects.  The only way to make it "one" object is a static archive?
Or extend ELF to behave as a "container" for multiple sub-objects...

> How does this work with your patchset?  Ideally we should have way to claim
> only portions of object files, but we don't have that. If we claim the file,
> the symbols in real symbol table are not visible.
> 
> I suppose we could play games here with slim LTO: claim the file, see if
> there are any symbols defined in the non-LTO symbol table and if so,
> read the symbol table and tell linker about the symbols and at the very end
> include the offending object file in the list of objects returned back to
> linker.

This is what I was trying with early-LTO-debug btw... the slim object
also contains early debug sections which I don't "claim" and I feed
the objects back to the linker (as plugin output), expecting it to
drop the LTO IL and take the early debug sections...

> The linker then should take the symbols it wants.  There would be some fun
> involved, because the resolution info we get will consider the symbols
> defined in that object file to be IR which would need to be compensated for.

A sensible option might be to simply error on incrementally linking
slim-LTO with non-LTO objects.  For fat objects we could either
drop LTO or error as well.

Fixing this on the user (Makefile) side would be easiest.  But it
has to use two incrementally linked objects in this case of course
so it wouldn't be very transparent.

Richard.

[PATCH, i386] Use scalar mask for 16-byte and 32-byte vectors when possible

2015-11-26 Thread Ilya Enkovich

Hi,

This patch allows usage of scalar masks for ymm and xmm registers when target 
supports it.  Bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for 
trunk?
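
As a rough model of the condition being changed, the following C sketch returns the width of the scalar mask, or 0 when the scalar-mask path is not taken. It is an illustration only: the TARGET_* macros are turned into plain int parameters, and smallest_mode_for_size is approximated by picking the smallest of 8, 16, 32 or 64 bits that holds one bit per element. The function name is hypothetical.

```c
/* Returns the mask width in bits when a scalar (integer) mask would be
   used, or 0 when the vector-mask fallback applies.  The avx512*
   parameters stand in for the TARGET_AVX512F/VL/BW macros.  */
unsigned
scalar_mask_bits (unsigned nunits, unsigned vector_size,
                  int avx512f, int avx512vl, int avx512bw)
{
  unsigned elem_size = vector_size / nunits;

  if ((avx512f && vector_size == 64)
      || (avx512vl && (vector_size == 32 || vector_size == 16)))
    {
      if (elem_size == 4 || elem_size == 8 || avx512bw)
        {
          /* One mask bit per element, rounded up to an integer width.  */
          unsigned bits = 8;
          while (bits < nunits)
            bits *= 2;
          return bits;
        }
    }
  return 0;
}
```

For the new test (32 chars in a 32-byte vector with AVX512BW and AVX512VL) this model yields a 32-bit mask; without AVX512VL the same vector falls back to the non-scalar path.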

Thanks,
Ilya
--
gcc/

2015-11-26  Ilya Enkovich  

* config/i386/i386.c (ix86_get_mask_mode): Use scalar
modes for 32 and 16 byte vectors when possible.

gcc/testsuite/

2015-11-26  Ilya Enkovich  

* gcc.dg/vect/vect-32-chars.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 83749d5..d7c359f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -53443,7 +53443,8 @@ ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
   unsigned elem_size = vector_size / nunits;
 
   /* Scalar mask case.  */
-  if (TARGET_AVX512F && vector_size == 64)
+  if ((TARGET_AVX512F && vector_size == 64)
+  || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16)))
 {
   if (elem_size == 4 || elem_size == 8 || TARGET_AVX512BW)
return smallest_mode_for_size (nunits, MODE_INT);
diff --git a/gcc/testsuite/gcc.dg/vect/vect-32-chars.c b/gcc/testsuite/gcc.dg/vect/vect-32-chars.c
new file mode 100644
index 000..0af5d2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-32-chars.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx512bw -mavx512vl" { target { i?86-*-* x86_64-*-* } } } */
+
+char a[32];
+char b[32];
+char c[32];
+
+void test()
+{
+  int i = 0;
+  for (i = 0; i < 32; i++)
+if (b[i] > 0)
+  a[i] = c[i];
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { i?86-*-* x86_64-*-* } } } } */

Re: [PATCH] New version of libmpx with new memmove wrapper

2015-11-26 Thread Ilya Enkovich

2015-11-25 18:41 GMT+03:00 Aleksandra Tsvetkova :
> gcc/testsuite/ChangeLog
> 2015-10-27  Tsvetkova Alexandra  
>
> * gcc.target/i386/mpx/memmove.c: New test for __mpx_wrapper_memmove.
>
> libmpx/ChangeLog
> 2015-10-28  Tsvetkova Alexandra  
>
> * mpxrt/Makefile.am (libmpx_la_LDFLAGS): Add -version-info option.
> * libmpxwrap/Makefile.am (libmpx_la_LDFLAGS): Likewise + includes fixed.
> * libmpx/Makefile.in: Regenerate.
> * mpxrt/Makefile.in: Regenerate.
> * libmpxwrap/Makefile.in: Regenerate.
> * mpxrt/libtool-version: New version.
> * libmpxwrap/libtool-version: Likewise.
> * mpxrt/libmpx.map: Add new version and a new symbol.
> * mpxrt/mpxrt.h: New file.
> * mpxrt/mpxrt.c (NUM_L1_BITS): Moved to mpxrt.h.
> (REG_IP_IDX): Moved to mpxrt.h.
> (REX_PREFIX): Moved to mpxrt.h.
> (XSAVE_OFFSET_IN_FPMEM): Moved to mpxrt.h.
> (MPX_L1_SIZE): Moved to mpxrt.h.
> * libmpxwrap/mpx_wrappers.c: (__mpx_wrapper_memmove): Rewritten.
> (mpx_pointer): New type.
> (mpx_bt_entry): New type.
> (alloc_bt): New function.
> (get_bt): New function.
> (copy_if_possible): New function.
> (copy_if_possible_from_end): New function.
> (move_bounds): New function.
>
> Memmove became 2 times slower on 8 bytes. On bigger lengths (>64 bytes) it
> became up to 3.5 times better with pointers, up to 21 times better without
> pointers.

+  bd_type bd = get_bd ();
+  if (!(bd))
+return;

Add explicit typecast.

+  /* No MPX or not enough bytes for the pointer,
+therefore, not necessary to copy.  */
+  if ((n > sizeof (void *)) || (src != dst))
+move_bounds (dst, src, n);

Comment doesn't match condition.  I believe condition should be ((n >=
sizeof (void *)) && (src != dst)).
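
A minimal C sketch of the suggested condition, pulled out into a hypothetical helper (needs_bounds_move is not a name from the patch) so that its behaviour can be checked in isolation:

```c
#include <stddef.h>

/* Nonzero when bounds should be moved: the region is large enough to
   hold at least one pointer and source and destination differ.  This
   follows the condition suggested in the review, not the posted patch.  */
int
needs_bounds_move (const void *dst, const void *src, size_t n)
{
  return n >= sizeof (void *) && src != dst;
}
```

With this form a copy shorter than one pointer, or a copy onto itself, skips move_bounds, which is what the comment in the patch describes.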

Otherwise patch looks good. We need to make sure spec benchmark
failure is due to improved checking quality before committing it.

Thanks,
Ilya

Re: [PATCH] Fix debug fallout of proposed PR68162 fix

2015-11-26 Thread Richard Biener

On Wed, 18 Nov 2015, Richard Biener wrote:

> 
> The following patch makes sure we still emit a DW_TAG_typedef for
> the element typedef in gcc.dg/debug/dwarf2/pr47939-4.c after a change
> to how the C frontend structures the variant chain of arrays.
> 
> It makes dwarf2out _not_ re-build the variant for arrays (like it
> does for vectors).
> 
> Patch was tested by Joseph (also on gdb testsuite?) and I'm currently
> re-testing on x86_64-unknown-linux-gnu.
> 
> Ok for trunk and GCC 5 branch?

Ping?

Thanks,
Richard.

> Thanks,
> Richard.
> 
> 2015-11-18  Richard Biener  
> 
>   PR c/68162
>   * dwarf2out.c (gen_type_die_with_usage): Keep variant types
>   of arrays.
> 
> Index: gcc/dwarf2out.c
> ===
> --- gcc/dwarf2out.c   (revision 230428)
> +++ gcc/dwarf2out.c   (working copy)
> @@ -20784,9 +20784,10 @@ gen_type_die_with_usage (tree type, dw_d
>/* We are going to output a DIE to represent the unqualified version
>   of this type (i.e. without any const or volatile qualifiers) so
>   get the main variant (i.e. the unqualified version) of this type
> - now.  (Vectors are special because the debugging info is in the
> + now.  (Vectors and arrays are special because the debugging info is in the
>   cloned type itself).  */
> -  if (TREE_CODE (type) != VECTOR_TYPE)
> +  if (TREE_CODE (type) != VECTOR_TYPE
> +  && TREE_CODE (type) != ARRAY_TYPE)
>  type = type_main_variant (type);
>  
>/* If this is an array type with hidden descriptor, handle it first.  */

Re: Fix PR 67609

2015-11-26 Thread Richard Henderson


On 11/25/2015 06:14 PM, James Greenhalgh wrote:

On Tue, Oct 27, 2015 at 01:21:51PM -0700, Richard Henderson wrote:

   * aarch64 is almost certainly vulnerable, since it deleted its
CANNOT_CHANGE_MODE_CLASS implementation in January.  I haven't tried to create
a test case that fails for it, but I'm certain it's possible.


The best I've come up with so far needs some union-hackery that I'm not
convinced is legal, and looks like:

   typedef union
   {
 double v[2];
 double s __attribute__ ((vector_size (16)));
   } data;

   data reg;

   void
   set_lower (double b)
   {
 dodgy_data stack_var;
 double __attribute__ ((vector_size (16))) one = { 1.0, 1.0 };
 stack_var.s = reg.s;
 stack_var.s += one;
 stack_var.v[0] += b;
 reg.s = stack_var.s;
   }


Modulo the one typo, this is a valid test case.
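
For reference, a compilable variant of the quoted test case is sketched below. It assumes the typo is the undeclared type name dodgy_data, replaced here by data, and it introduces a v2df typedef for readability. It relies on the GNU C vector_size extension and on union type punning as GCC documents it.

```c
/* Two doubles packed into one 16-byte vector (GNU C extension).  */
typedef double v2df __attribute__ ((vector_size (16)));

typedef union
{
  double v[2];
  v2df s;
} data;

data reg;

void
set_lower (double b)
{
  data stack_var;               /* "dodgy_data" in the quoted snippet.  */
  v2df one = { 1.0, 1.0 };
  stack_var.s = reg.s;
  stack_var.s += one;           /* Both lanes get +1.0.  */
  stack_var.v[0] += b;          /* Only the low lane gets +b.  */
  reg.s = stack_var.s;
}
```

Starting from reg = { 2.0, 3.0 }, a correct compilation of set_lower (0.5) should leave reg.v[0] at 3.5 and reg.v[1] at 4.0; the miscompilation discussed in this thread would lose the upper lane.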


This shows the issue going back to GCC 4.9, the code we generate for
AArch64 looks like:

set_lower:
fmov v2.2d, 1.0e+0
adrp x0, reg
ldr q1, [x0, #:lo12:reg]
fadd v1.2d, v1.2d, v2.2d
orr v2.16b, v1.16b, v1.16b
fadd d2, d0, d1 //   <- Clobbered stack_var.v[1].
str q2, [x0, #:lo12:reg] // <- Wrote zeroes to the top half of reg
ret


And yes, we mis-compile it, for the same reason:

(insn 13 12 14 2 (set (subreg:DF (reg/v:TI 78 [ stack_var ]) 0)
(reg:DF 85)) /localhome/devel/rth/z.c:16 54 {*movdf_aarch64}
 (nil))

becomes

(insn 13 11 16 2 (set (subreg:DF (reg/v:TI 78 [ stack_var ]) 0)
(plus:DF (reg:DF 32 v0 [ b ])
(subreg:DF (reg:V2DF 82) 0))) /localhome/devel/rth/z.c:16 804 {adddf3}
 (expr_list:REG_DEAD (reg:V2DF 82)
(expr_list:REG_DEAD (reg:DF 32 v0 [ b ])
(nil))))

becomes

(insn:TI 13 11 17 (set (reg:DF 34 v2 [orig:78 stack_var ] [78])
(plus:DF (reg:DF 32 v0 [ b ])
(reg:DF 33 v1 [82]))) /localhome/devel/rth/z.c:16 804 {adddf3}
 (expr_list:REG_DEAD (reg:DF 33 v1 [82])
(expr_list:REG_DEAD (reg:DF 32 v0 [ b ])
(nil))))


Reading the documentation you add below, we'd need to bring back
CANNOT_CHANGE_MODE_CLASS for AArch64 and forbid changes from wide registers
to 64-bit (and larger) values. Is this right?


Not exactly -- forbid BITS_PER_WORD (64-bit) subregs of hard registers > 
BITS_PER_WORD.  See the verbiage I added to the i386 backend for this.




 Are these workarounds intended
to be temporary, or is the midend bug likely to be fixed?


Not in the near term, no.  We'd need to replace subreg, which does 3 jobs
simultaneously, with something else that's less ambiguous.



r~

[PATCH][calls.c][4.9/5 backport] PR rtl-optimization/67226: Take into account pretend_args_size when checking stack offsets for sibcall optimisation

2015-11-26 Thread Kyrill Tkachov


Hi all,

This is a backport to GCC 4.9 and 5 of the patch at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03082.html
The only difference from the trunk version is that we perform #ifdef checks on
STACK_GROWS_DOWNWARD because on these branches we have not moved away from
conditional compilation of those bits.

Bernd approved the backports at 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03090.html
so I'll be committing this to the branches.

Bootstrapped and tested on arm, aarch64, x86_64.
Confirmed that the testcase fails on all the branches before this patch and 
passes with it.

Thanks,
Kyrill

2015-11-26  Kyrylo Tkachov  
Bernd Schmidt  

PR rtl-optimization/67226
* calls.c (store_one_arg): Take into account
crtl->args.pretend_args_size when checking for overlap between
arg->value and argblock + arg->locate.offset during sibcall
optimization.

2015-11-26  Kyrylo Tkachov  

PR rtl-optimization/67226
* gcc.c-torture/execute/pr67226.c: New test.
diff --git a/gcc/calls.c b/gcc/calls.c
index 0987dd0..ee8ea5f 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -4959,6 +4959,13 @@ store_one_arg (struct arg_data *arg, rtx argblock, int flags,
 	  if (XEXP (x, 0) != crtl->args.internal_arg_pointer)
 		i = INTVAL (XEXP (XEXP (x, 0), 1));
 
+	  /* arg.locate doesn't contain the pretend_args_size offset,
+		 it's part of argblock.  Ensure we don't count it in I.  */
+#ifdef STACK_GROWS_DOWNWARD
+		i -= crtl->args.pretend_args_size;
+#else
+		i += crtl->args.pretend_args_size;
+#endif
 	  /* expand_call should ensure this.  */
 	  gcc_assert (!arg->locate.offset.var
 			  && arg->locate.size.var == 0
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67226.c b/gcc/testsuite/gcc.c-torture/execute/pr67226.c
new file mode 100644
index 000..c533496
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr67226.c
@@ -0,0 +1,42 @@
+struct assembly_operand
+{
+  int type, value, symtype, symflags, marker;
+};
+
+struct assembly_operand to_input, from_input;
+
+void __attribute__ ((__noinline__, __noclone__))
+assemblez_1 (int internal_number, struct assembly_operand o1)
+{
+  if (o1.type != from_input.type)
+__builtin_abort ();
+}
+
+void __attribute__ ((__noinline__, __noclone__))
+t0 (struct assembly_operand to, struct assembly_operand from)
+{
+  if (to.value == 0)
+assemblez_1 (32, from);
+  else
+__builtin_abort ();
+}
+
+int
+main (void)
+{
+  to_input.value = 0;
+  to_input.type = 1;
+  to_input.symtype = 2;
+  to_input.symflags = 3;
+  to_input.marker = 4;
+
+  from_input.value = 5;
+  from_input.type = 6;
+  from_input.symtype = 7;
+  from_input.symflags = 8;
+  from_input.marker = 9;
+
+  t0 (to_input, from_input);
+
+  return 0;
+}

[PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-26 Thread Kyrill Tkachov


Hi all,

In this PR we have an IF-THEN-JOIN formation, i.e. no ELSE block, and we have a
situation where the THEN block modifies a register used in emit_b, so emit_b must
be emitted before the THEN block. However, a bug in the logic that performs these
checks ends up with us emitting emit_a+then_bb followed by emit_b+else_bb.

The fix is pretty simple and involves emitting emit_b (+ else_bb, which is empty
in this case) if modified_in_a is true, even if emit_a is NULL. If emit_a is NULL,
noce_emit_bb will handle it properly and not do anything bad, so we're safe.

Bootstrapped and tested on arm, aarch64, x86_64.

Ok for trunk?

Thanks,
Kyrill

2015-11-26  Kyrylo Tkachov  

PR rtl-optimization/68506
* ifcvt.c (noce_try_cmove_arith): Try emitting the else basic block
first if emit_a exists or then_bb modifies 'b'.

2015-11-26  Kyrylo Tkachov  

PR rtl-optimization/68506
* gcc.c-torture/execute/pr68506.c: New test.
commit 08a371d20793bd57e9a68b85beaf2cab0804ed48
Author: Kyrylo Tkachov 
Date:   Tue Nov 24 11:49:30 2015 +

PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index af7a3b9..3e3dc8d 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2220,7 +2220,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 	  }
 
 }
-if (emit_a && modified_in_a)
+if (emit_a || modified_in_a)
   {
 	modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);
 	if (tmp_b && else_bb)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68506.c b/gcc/testsuite/gcc.c-torture/execute/pr68506.c
new file mode 100644
index 000..15984ed
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68506.c
@@ -0,0 +1,63 @@
+/* { dg-options "-fno-builtin-abort" } */
+
+int a, b, m, n, o, p, s, u, i;
+char c, q, y;
+short d;
+unsigned char e;
+static int f, h;
+static short g, r, v;
+unsigned t;
+
+extern void abort ();
+
+int
+fn1 (int p1)
+{
+  return a ? p1 : p1 + a;
+}
+
+unsigned char
+fn2 (unsigned char p1, int p2)
+{
+  return p2 >= 2 ? p1 : p1 >> p2;
+}
+
+static short
+fn3 ()
+{
+  int w, x = 0;
+  for (; p < 31; p++)
+{
+  s = fn1 (c | ((1 && c) == c));
+  t = fn2 (s, x);
+  c = (unsigned) c > -(unsigned) ((o = (m = d = t) == p) <= 4UL) && n;
+  v = -c;
+  y = 1;
+  for (; y; y++)
+	e = v == 1;
+  d = 0;
+  for (; h != 2;)
+	{
+	  for (;;)
+	{
+	  if (!m)
+		abort ();
+	  r = 7 - f;
+	  x = e = i | r;
+	  q = u * g;
+	  w = b == q;
+	  if (w)
+		break;
+	}
+	  break;
+	}
+}
+  return x;
+}
+
+int
+main ()
+{
+  fn3 ();
+  return 0;
+}

Re: [PATCH] Fix pattern causing C_MAYBE_CONST_EXPRs leak into gimplifier (PR c/68513)

2015-11-26 Thread Marek Polacek

On Wed, Nov 25, 2015 at 03:16:53PM +, Joseph Myers wrote:
> > Wonder if we couldn't use some FE specific bit on the SAVE_EXPR to say
> > whether c_fully_fold_internal has already processed it or not, and just
> > get rid of c_save_expr, in c_fully_fold* recurse into SAVE_EXPRs too, but
> > only if that bit is not yet already set, and set it afterwards.
> 
> I suppose you could do that, in line with the general principle of 
> reducing early folding (as long as you ensure that folding the contents of 
> a SAVE_EXPR results in modifying that SAVE_EXPR so that all pointers to it 
> stay pointing to the same tree node).

I had a go at this, but I'm now skeptical about removing c_save_expr.
save_expr calls fold (), so we need to ensure that we don't pass any
C_MAYBE_CONST_EXPRs into it, meaning that we'd need to call c_fully_fold before
save_expr anyway...

So maybe go the "remove C_MAYBE_CONST_EXPRs in SAVE_EXPRs in c_gimplify_expr"
way?

Marek

Re: [PATCH] Fix pattern causing C_MAYBE_CONST_EXPRs leak into gimplifier (PR c/68513)

2015-11-26 Thread Jakub Jelinek

On Thu, Nov 26, 2015 at 12:15:48PM +0100, Marek Polacek wrote:
> On Wed, Nov 25, 2015 at 03:16:53PM +, Joseph Myers wrote:
> > > Wonder if we couldn't use some FE specific bit on the SAVE_EXPR to say
> > > whether c_fully_fold_internal has already processed it or not, and just
> > > get rid of c_save_expr, in c_fully_fold* recurse into SAVE_EXPRs too, but
> > > only if that bit is not yet already set, and set it afterwards.
> > 
> > I suppose you could do that, in line with the general principle of 
> > reducing early folding (as long as you ensure that folding the contents of 
> > a SAVE_EXPR results in modifying that SAVE_EXPR so that all pointers to it 
> > stay pointing to the same tree node).
> 
> I had a go at this, but I'm now skeptical about removing c_save_expr.
> save_expr calls fold (), so we need to ensure that we don't pass any
> C_MAYBE_CONST_EXPRs into it, meaning that we'd need to call c_fully_fold before
> save_expr anyway...

I bet that is too hard for stage3, but IMHO if we want delayed folding, we
want to delay it even for SAVE_EXPRs.  Supposedly the reason why save_expr
calls fold is to determine the cases where there is no point to create the
SAVE_EXPR.  But perhaps it should just fold for the purpose of testing that
and return the original expression if after folding it is simple
arithmetic, and otherwise wrap the original expression into SAVE_EXPR.
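
A minimal sketch of the "fold only for the purpose of testing" idea, as a toy
model rather than GCC code.  Everything named toy_* is hypothetical, and
"folds to a constant" stands in for "simple arithmetic" purely as an
assumption for illustration.  The point it shows is that the caller always
gets back the original, unfolded expression, either bare or wrapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy expression codes; hypothetical stand-ins, not GCC tree codes.  */
enum toy_code { TOY_CONST, TOY_CALL, TOY_PLUS, TOY_SAVE };

struct toy_expr
{
  enum toy_code code;
  long value;                  /* Used by TOY_CONST only.  */
  struct toy_expr *op0, *op1;  /* Operands, or NULL.  */
};

/* Allocate a toy node (never freed in this sketch).  */
static struct toy_expr *
toy_build (enum toy_code code, long value,
           struct toy_expr *op0, struct toy_expr *op1)
{
  struct toy_expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->code = code;
  e->value = value;
  e->op0 = op0;
  e->op1 = op1;
  return e;
}

/* Return 1 and store the value in *OUT if E folds to a constant.
   E itself is never modified: the fold is only a test.  */
static int
toy_folds_to_constant (const struct toy_expr *e, long *out)
{
  long a, b;

  switch (e->code)
    {
    case TOY_CONST:
      *out = e->value;
      return 1;
    case TOY_PLUS:
      if (toy_folds_to_constant (e->op0, &a)
          && toy_folds_to_constant (e->op1, &b))
        {
          *out = a + b;
          return 1;
        }
      return 0;
    default:
      return 0;
    }
}

/* Return E unchanged if wrapping it would be pointless, otherwise
   wrap the original (still unfolded) E in a TOY_SAVE node.  */
static struct toy_expr *
toy_save_expr (struct toy_expr *e)
{
  long ignored;

  if (e->code == TOY_SAVE || toy_folds_to_constant (e, &ignored))
    return e;
  return toy_build (TOY_SAVE, 0, e, NULL);
}
```

With this model, 1 + 2 comes back as the same TOY_PLUS node, while an
expression containing a TOY_CALL comes back wrapped in TOY_SAVE with the
original node as its operand.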

Though in this particular case, where save_expr is just an optimization it
is perhaps premature optimization and we should not perform that.

For stage3, I agree, some other fix is needed (and one usable also for the 5
branch).

Jakub

Re: [PATCH] Fix pattern causing C_MAYBE_CONST_EXPRs leak into gimplifier (PR c/68513)

2015-11-26 Thread Richard Biener

On Thu, 26 Nov 2015, Jakub Jelinek wrote:

> On Thu, Nov 26, 2015 at 12:15:48PM +0100, Marek Polacek wrote:
> > On Wed, Nov 25, 2015 at 03:16:53PM +, Joseph Myers wrote:
> > > > Wonder if we couldn't use some FE specific bit on the SAVE_EXPR to say
> > > > whether c_fully_fold_internal has already processed it or not, and just
> > > > get rid of c_save_expr, in c_fully_fold* recurse into SAVE_EXPRs too, but
> > > > only if that bit is not yet already set, and set it afterwards.
> > > 
> > > I suppose you could do that, in line with the general principle of 
> > > reducing early folding (as long as you ensure that folding the contents of
> > > a SAVE_EXPR results in modifying that SAVE_EXPR so that all pointers to it
> > > stay pointing to the same tree node).
> > 
> > I had a go at this, but I'm now skeptical about removing c_save_expr.
> > save_expr calls fold (), so we need to ensure that we don't pass any
> > C_MAYBE_CONST_EXPRs into it, meaning that we'd need to call c_fully_fold before
> > save_expr anyway...
> 
> I bet that is too hard for stage3, but IMHO if we want delayed folding, we
> want to delay it even for SAVE_EXPRs.  Supposedly the reason why save_expr
> calls fold is to determine the cases where there is no point to create the
> SAVE_EXPR.  But perhaps it should just fold for the purpose of testing that
> and return the original expression if after folding it is simple
> arithmetics, and wrap the original expression into SAVE_EXPR.

Btw, I tried to remove that fold () at some point but it spectacularly
regressed (though before the C++ early folding work) in constexpr
cases.

> Though in this particular case, where save_expr is just an optimization it
> is perhaps premature optimization and we should not perform that.
> 
> For stage3, I agree, some other fix is needed (and one usable also for the 5
> branch).

I'm currently testing the genmatch.c patch ... (which might make this
situation even worse).  I don't think we have the issue on the 5 branch
so much (because of way less patterns in match.pd).

Richard.

Re: [PATCH] Improve verification of loop->latch in verify_loop_structure

2015-11-26 Thread Alan Lawrence

This caused an ICE compiling value.c from gdb on
aarch64-none-linux-gnu; the testcase, after preprocessing on aarch64,
ICEs on both aarch64 and x86_64, but is about 1MB - I'm working on
reducing that down to something small enough to post...

$ ./gcc/xgcc -B ./gcc -O2 -g value.c
../../binutils-gdb/gdb/value.c: In function ‘show_convenience’:
../../binutils-gdb/gdb/value.c:2615:1: error: loop 3’s latch is missing
../../binutils-gdb/gdb/value.c:2615:1: internal compiler error: in
verify_loop_structure, at cfgloop.c:1669
0x71e653 verify_loop_structure()
/work/alalaw01/src2/gcc/gcc/cfgloop.c:1669
0x97c6ae checking_verify_loop_structure
/work/alalaw01/src2/gcc/gcc/cfgloop.h:325
0x97c6ae loop_optimizer_init(unsigned int)
/work/alalaw01/src2/gcc/gcc/loop-init.c:106
0x97c78a rtl_loop_init
/work/alalaw01/src2/gcc/gcc/loop-init.c:398
0x97c78a execute
/work/alalaw01/src2/gcc/gcc/loop-init.c:425

--Alan

Re: [PING] Re: [PATCH] c++/67913, 67917 - fix new expression with wrong number of elements

2015-11-26 Thread Ramana Radhakrishnan


> Cookies on ARM are 8-bytes [1], but sizeof ((size_t) n) is only 4-bytes,
> so this check will fail (We'll ask for 500 bytes, the test here will only
> be looking for 496).
> 
> Would it undermine the test for other architectures if I were to swap out
> the != for a >= ? I think that is in line with the "argument large enough
> for the array" that this test is looking for, but would not catch bugs where
> we were allocating more memory than necessary.
> 
> Otherwise I can spin a patch which skips the test for ARM targets.
> 

I didn't want to skip this for ARM; I'd rather have something that takes the
cookie size into account (a very gratuitous hack is to just add 4 in an
#ifdef __arm__ block). Something like the attached, brown paper bag warning ;)
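
To make the arithmetic concrete, here is a rough sketch in plain C (not the
test itself).  The function names are made up for illustration, and the
492-byte payload is only inferred from the 500 and 496 figures quoted above
(500 minus the 8-byte cookie), so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Size that operator new[] is expected to be asked for: the array
   payload plus the cookie.  Both sizes are passed in, nothing here
   queries a real ABI.  */
static size_t
expected_array_new_size (size_t payload_bytes, size_t cookie_bytes)
{
  /* Guard against wraparound in the sum.  */
  assert (payload_bytes <= (size_t) -1 - cookie_bytes);
  return payload_bytes + cookie_bytes;
}

/* Mirror of the test's check: nonzero if the request N is exactly
   payload plus cookie.  */
static int
request_matches (size_t n, size_t payload_bytes, size_t cookie_bytes)
{
  return n == expected_array_new_size (payload_bytes, cookie_bytes);
}
```

With a 4-byte cookie (sizeof (size_t) on 32-bit ARM) the check expects 496
and rejects the 500-byte request; with the 8-byte ARM EABI cookie it expects
500 and the request matches.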
 

* g++.dg/init/new45.C: Adjust for cookie size on arm.

regards
Ramana

> Thanks,
> James
> 
diff --git a/gcc/testsuite/g++.dg/init/new45.C b/gcc/testsuite/g++.dg/init/new45.C
index 92dac18..31473a3 100644
--- a/gcc/testsuite/g++.dg/init/new45.C
+++ b/gcc/testsuite/g++.dg/init/new45.C
@@ -29,8 +29,16 @@ void* operator new[] (size_t n)
 // Verify that array new is invoked with an argument large enough
 // for the array and a size_t cookie to store the number of elements.
 // (This holds for classes with user-defined types but not POD types).
-if (n != N * sizeof (UDClass) + sizeof n) abort ();
-return malloc (n);
+  size_t val = N * sizeof (UDClass) + sizeof n;
+
+// On ARM EABI the cookie is always 8 bytes as per Section 3.2.2
+// of http://infocenter.arm.com/help/topic/com.arm.doc.ihi0041d/IHI0041D_cppabi.pdf
+#if defined( __arm__) && defined(__ARM_EABI__)
+  val = val + 4;
+#endif
+
+  if (n != val) abort ();
+  return malloc (n);
 }
 
 inline __attribute__ ((always_inline))
@@ -60,8 +68,16 @@ void* operator new[] (size_t n, UDClass *p)
 // Verify that placement array new overload for a class type with
 // a user-defined ctor and dtor is invoked with an argument large
 // enough for the array and a cookie.
-if (n != N * sizeof (UDClass) + sizeof n) abort ();
-return p;
+  size_t val = N * sizeof (UDClass) + sizeof n;
+
+// On ARM EABI the cookie is always 8 bytes as per Section 3.2.2
+// of http://infocenter.arm.com/help/topic/com.arm.doc.ihi0041d/IHI0041D_cppabi.pdf
+#if defined( __arm__) && defined(__ARM_EABI__)
+  val = val + 4;
+#endif
+
+  if (n != val) abort ();
+  return p;
 }
 
 // UDClassllocate a sufficiently large buffer to construct arrays into.

Re: [PATCH, PR target/68416, i386, MPX] Add bounds registers to ALL_REGS set

2015-11-26 Thread Ilya Enkovich

2015-11-26 13:15 GMT+03:00 Uros Bizjak :
> Hello!
>
>> gcc/
>>
>> 2015-11-26  Vladimir Makarov  
>>
>> PR target/68416
>> * config/i386/i386.h (enum reg_class): Add
>> bounds registers to ALL_REGS.
>>
>> gcc/testsuite/
>>
>> 2015-11-26  Ilya Enkovich  
>>
>> PR target/68416
>> * gcc.target/i386/mpx/pr68416.c: New test.
>
> OK for mainline and gcc-5 after gcc 5.3 is released.

Do you think this patch is not safe?  This would be useful for 5.3
because it significantly reduces MPX overhead on some benchmarks.

Thanks,
Ilya

>
> Thanks,
> Uros.

[PATCH] Fix PR67203

2015-11-26 Thread Richard Biener


This fixes g++.dg/tree-ssa/pr61034.C by making the expected result
dependent on PUSH_ARGS_REVERSED (via an explicit list of targets).

With swapping the final cd_dce and dse passes I get all targets
to produce zero 'free' calls in .optimized but that doesn't
sound appropriate at this stage.

And instead of swapping their order, DSE and DCE should be merged.  I'll
see what I can do for next stage1 or maybe this is an interesting
GSoC project as well.

Tested on x86_64, applied.

Richard.

2015-11-26  Richard Biener  

PR testsuite/67203
* g++.dg/tree-ssa/pr61034.C: Make expected optimization result
dependent on PUSH_ARGS_REVERSED.  Drop optimization level and
also monitor final optimization result.

Index: gcc/testsuite/g++.dg/tree-ssa/pr61034.C
===
--- gcc/testsuite/g++.dg/tree-ssa/pr61034.C (revision 230925)
+++ gcc/testsuite/g++.dg/tree-ssa/pr61034.C (working copy)
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O3 -fdump-tree-fre2" }
+// { dg-options "-O2 -fdump-tree-fre2 -fdump-tree-optimized" }
 
 #define assume(x) if(!(x))__builtin_unreachable()
 
@@ -43,5 +43,12 @@ bool f(I a, I b, I c, I d) {
 // This works only if everything is inlined into 'f'.
 
 // { dg-final { scan-tree-dump-times ";; Function" 1 "fre2" } }
-// { dg-final { scan-tree-dump-times "free" 10 "fre2" } }
 // { dg-final { scan-tree-dump-times "unreachable" 11 "fre2" } }
+
+// Note that depending on PUSH_ARGS_REVERSED we are presented with
+// a different initial CFG and thus the final outcome is different
+
+// { dg-final { scan-tree-dump-times "free" 10 "fre2" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump-times "free" 3 "optimized" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump-times "free" 14 "fre2" { target aarch64-*-* ia64-*-* arm-*-* hppa*-*-* sparc*-*-* powerpc*-*-* } } }
+// { dg-final { scan-tree-dump-times "free" 4 "optimized" { target aarch64-*-* ia64-*-* arm-*-* hppa*-*-* sparc*-*-* powerpc*-*-* } } }

[PATCH] -Wshift-overflow: Warn for shifting sign bit out of a negative number

2015-11-26 Thread Paolo Bonzini

maybe_warn_shift_overflow is checking for patterns such as (1 << 31)
and not warning for them.

However, if the shifted value is negative, a shift by a non-zero
amount will always shift *out* of the sign bit rather than into it.
Thus it should be warned about, even if the value only requires
one bit more than the precision of the LHS.
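
As a rough illustration of the distinction (a standalone sketch, not the code
in c-common.c), the following models the "bits required" computation and the
proposed sign check.  All names here are made up for the example, and
min_signed_precision is only assumed to behave like
wi::min_precision (op0, SIGNED) for these inputs.

```c
#include <assert.h>

enum shift_class { SHIFT_FITS, SHIFT_INTO_SIGN_BIT, SHIFT_OVERFLOW };

/* Minimum number of bits needed to hold V as a two's complement
   signed value: 1 needs 2 bits, -10 needs 5, INT_MIN needs 32.  */
static unsigned
min_signed_precision (long long v)
{
  unsigned bits = 1;  /* The sign bit.  */
  unsigned long long magnitude = (unsigned long long) (v < 0 ? ~v : v);

  while (magnitude != 0)
    {
      bits++;
      magnitude >>= 1;
    }
  return bits;
}

/* Bits needed to represent OP0 << SHIFT without losing information.  */
static unsigned
bits_required (long long op0, unsigned shift)
{
  return min_signed_precision (op0) + shift;
}

/* Classify OP0 << SHIFT for a signed type of PREC0 bits.  Only a
   non-negative OP0 can shift a 1 *into* the sign bit; a negative OP0
   that needs PREC0 + 1 bits has shifted its sign bit *out*.  */
static enum shift_class
classify_shift (long long op0, unsigned shift, unsigned prec0)
{
  unsigned min_prec;

  assert (prec0 > 0);
  min_prec = bits_required (op0, shift);
  if (min_prec <= prec0)
    return SHIFT_FITS;
  if (op0 >= 0 && min_prec == prec0 + 1)
    return SHIFT_INTO_SIGN_BIT;
  return SHIFT_OVERFLOW;
}
```

For a 32-bit int, both 1 << 31 and INT_MIN << 1 need 33 bits, but only the
first is classified as shifting into the sign bit; the second is classified
as an overflow, which is the case the patch wants to warn about.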

Ok for trunk?

Paolo

gcc:
* c-family/c-common.c (maybe_warn_shift_overflow): Warn on all
overflows if shifting 1 out of the sign bit.

gcc/testsuite:
* c-c++-common/Wshift-overflow-1.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-2.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-3.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-4.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-6.c: Test shifting 1 out of the sign bit.
* c-c++-common/Wshift-overflow-7.c: Test shifting 1 out of the sign bit.

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 230930)
+++ c-family/c-common.c (working copy)
@@ -12631,8 +12631,11 @@ maybe_warn_shift_overflow (location_t loc, tree op
 
   unsigned int min_prec = (wi::min_precision (op0, SIGNED)
   + TREE_INT_CST_LOW (op1));
-  /* Handle the left-shifting 1 into the sign bit case.  */
-  if (min_prec == prec0 + 1)
+  /* Handle the case of left-shifting 1 into the sign bit.
+   * However, shifting 1 _out_ of the sign bit, as in
+   * INT_MIN << 1, is considered an overflow.
+   */
+  if (!tree_int_cst_sign_bit(op0) && min_prec == prec0 + 1)
 {
   /* Never warn for C++14 onwards.  */
   if (cxx_dialect >= cxx14)
Index: testsuite/c-c++-common/Wshift-overflow-1.c
===
--- testsuite/c-c++-common/Wshift-overflow-1.c  (revision 230930)
+++ testsuite/c-c++-common/Wshift-overflow-1.c  (working copy)
@@ -8,6 +8,9 @@
 #define LLONGM1 (sizeof (long long) * __CHAR_BIT__ - 1)
 #define LLONGM2 (sizeof (long long) * __CHAR_BIT__ - 2)
 
+#define INT_MIN (-__INT_MAX__-1)
+#define LONG_LONG_MIN (-__LONG_LONG_MAX__-1)
+
 int i1 = 1 << INTM1;
 int i2 = 9 << INTM1; /* { dg-warning "requires 36 bits to represent" } */
 int i3 = 10 << INTM2; /* { dg-warning "requires 35 bits to represent" } */
@@ -18,6 +21,7 @@
 int i8 = -10 << INTM2; /* { dg-warning "requires 35 bits to represent" } */
 int i9 = -__INT_MAX__ << 2; /* { dg-warning "requires 34 bits to represent" } */
 int i10 = -__INT_MAX__ << INTM1; /* { dg-warning "requires 63 bits to represent" } */
+int i11 = INT_MIN << 1; /* { dg-warning "requires 33 bits to represent" } */
 
 int r1 = 1 >> INTM1;
 int r2 = 9 >> INTM1;
@@ -46,6 +50,7 @@
 long long int l8 = -10LL << LLONGM2; /* { dg-warning "requires 67 bits to represent" } */
 long long int l9 = -__LONG_LONG_MAX__ << 2; /* { dg-warning "requires 66 bits to represent" } */
 long long int l10 = -__LONG_LONG_MAX__ << LLONGM1; /* { dg-warning "requires 127 bits to represent" } */
+long long int l11 = LONG_LONG_MIN << 1; /* { dg-warning "requires 65 bits to represent" } */
 
 void
 fn (void)
Index: testsuite/c-c++-common/Wshift-overflow-2.c
===
--- testsuite/c-c++-common/Wshift-overflow-2.c  (revision 230930)
+++ testsuite/c-c++-common/Wshift-overflow-2.c  (working copy)
@@ -8,6 +8,9 @@
 #define LLONGM1 (sizeof (long long) * __CHAR_BIT__ - 1)
 #define LLONGM2 (sizeof (long long) * __CHAR_BIT__ - 2)
 
+#define INT_MIN (-__INT_MAX__-1)
+#define LONG_LONG_MIN (-__LONG_LONG_MAX__-1)
+
 int i1 = 1 << INTM1;
 int i2 = 9 << INTM1;
 int i3 = 10 << INTM2;
@@ -18,6 +21,7 @@
 int i8 = -10 << INTM2;
 int i9 = -__INT_MAX__ << 2;
 int i10 = -__INT_MAX__ << INTM1;
+int i11 = INT_MIN << 1;
 
 int r1 = 1 >> INTM1;
 int r2 = 9 >> INTM1;
@@ -46,6 +50,7 @@
 long long int l8 = -10LL << LLONGM2;
 long long int l9 = -__LONG_LONG_MAX__ << 2;
 long long int l10 = -__LONG_LONG_MAX__ << LLONGM1;
+long long int l11 = LONG_LONG_MIN << 1;
 
 void
 fn (void)
Index: testsuite/c-c++-common/Wshift-overflow-3.c
===
--- testsuite/c-c++-common/Wshift-overflow-3.c  (revision 230930)
+++ testsuite/c-c++-common/Wshift-overflow-3.c  (working copy)
@@ -9,6 +9,9 @@
 #define LLONGM1 (sizeof (long long) * __CHAR_BIT__ - 1)
 #define LLONGM2 (sizeof (long long) * __CHAR_BIT__ - 2)
 
+#define INT_MIN (-__INT_MAX__-1)
+#define LONG_LONG_MIN (-__LONG_LONG_MAX__-1)
+
 int i1 = 1 << INTM1;
 int i2 = 9 << INTM1; /* { dg-warning "requires 36 bits to represent" } */
 int i3 = 10 << INTM2; /* { dg-warning "requires 35 bits to represent" } */
@@ -19,6 +22,7 @@
 int i8 = -10 << INTM2; /* { dg-warning "requires 35 bits to represent" } */
 int i9 = -__INT_MAX__ << 2; /* { dg-warning "requires 34 bits to represent" } */
 int i10 = -__INT_MAX__ << INTM1; /* { dg-warning "requires 63 bits to represent" } */
+int i11 = INT_MIN << 1;

Re: [PATCH] Improve verification of loop->latch in verify_loop_structure

2015-11-26 Thread Alan Lawrence

Here's a reduced testcase (reduced to the point of generating lots of
warnings, I'm compiling with -O2 -w, on x86_64):

struct __jmp_buf_tag
  {
  };
typedef struct __jmp_buf_tag sigjmp_buf[1];
extern struct cmd_list_element *showlist;
struct internalvar
{
  struct internalvar *next;
};
static struct internalvar *internalvars;
struct internalvar *
create_internalvar (const char *name)
{
  struct internalvar *var = ((struct internalvar *) xmalloc (sizeof
(struct internalvar)));
  internalvars = var;
}

void
show_convenience (char *ignore, int from_tty)
{
  struct gdbarch *gdbarch = get_current_arch ();
  int varseen = 0;
  for (struct internalvar *var = internalvars; var; var = var->next)
{
  if (!varseen)
varseen = 1;
  sigjmp_buf *buf = exceptions_state_mc_init ();
  __sigsetjmp ();
  while (exceptions_state_mc_action_iter ())
while (exceptions_state_mc_action_iter_1 ())
  ;
}
  if (!varseen)
  printf_unfiltered ();
}

On 26 November 2015 at 11:33, Alan Lawrence  wrote:
> This caused an ICE compiling value.c from gdb on
> aarch64-none-linux-gnu; the testcase, after preprocessing on aarch64,
> ICEs on both aarch64 and x86_64, but is about 1MB - I'm working on
> reducing that down to something small enough to post...
>
> $ ./gcc/xgcc -B ./gcc -O2 -g value.c
> ../../binutils-gdb/gdb/value.c: In function ‘show_convenience’:
> ../../binutils-gdb/gdb/value.c:2615:1: error: loop 3’s latch is missing
> ../../binutils-gdb/gdb/value.c:2615:1: internal compiler error: in
> verify_loop_structure, at cfgloop.c:1669
> 0x71e653 verify_loop_structure()
> /work/alalaw01/src2/gcc/gcc/cfgloop.c:1669
> 0x97c6ae checking_verify_loop_structure
> /work/alalaw01/src2/gcc/gcc/cfgloop.h:325
> 0x97c6ae loop_optimizer_init(unsigned int)
> /work/alalaw01/src2/gcc/gcc/loop-init.c:106
> 0x97c78a rtl_loop_init
> /work/alalaw01/src2/gcc/gcc/loop-init.c:398
> 0x97c78a execute
> /work/alalaw01/src2/gcc/gcc/loop-init.c:425
>
> --Alan

[PATCH] Add PR rtl-optimization/68249 and PR rtl-optimization/68321 testcases

2015-11-26 Thread Jakub Jelinek

Hi!

These PRs are duplicates of PR68194, which has been fixed recently.
I've added the testcases for them to the trunk and 5 branch.

2015-11-26  Jakub Jelinek  

PR rtl-optimization/68249
PR rtl-optimization/68321
* gcc.c-torture/execute/pr68249.c: New test.
* gcc.c-torture/execute/pr68321.c: New test.

--- gcc/testsuite/gcc.c-torture/execute/pr68249.c.jj	2015-11-26 12:39:43.204789597 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr68249.c	2015-11-26 12:39:23.0 +0100
@@ -0,0 +1,36 @@
+/* PR rtl-optimization/68249 */
+
+int a, b, c, g, k, l, m, n;
+char h;
+
+void
+fn1 ()
+{
+  for (; k; k++)
+{
+  m = b || c < 0 || c > 1 ? : c;
+  g = l = n || m < 0 || (m > 1) > 1 >> m ? : 1 << m;
+}
+  l = b + 1;
+  for (; b < 1; b++)
+h = a + 1;
+}
+
+int
+main ()
+{
+  char j; 
+  for (; a < 1; a++)
+{
+  fn1 ();
+  if (h)
+   j = h;
+  if (j > c)
+   g = 0;
+}
+
+  if (h != 1) 
+__builtin_abort (); 
+
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr68321.c.jj	2015-11-26 12:39:43.204789597 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr68321.c	2015-11-26 12:39:05.0 +0100
@@ -0,0 +1,38 @@
+/* PR rtl-optimization/68321 */
+
+int e = 1, u = 5, t2, t5, i, k;
+int a[1], b, m;
+char n, t;
+
+int
+fn1 (int p1)
+{
+  int g[1];
+  for (;;)
+{
+  if (p1 / 3)
+   for (; t5;)
+ u || n;
+  t2 = p1 & 4;
+  if (b + 1)
+   return 0;
+  u = g[0];
+}
+}
+
+int
+main ()
+{
+  for (; e >= 0; e--)
+{
+  char c;
+  if (!m)
+   c = t;
+  fn1 (c);
+}
+  
+  if (a[t2] != 0) 
+__builtin_abort (); 
+
+  return 0;
+}

Jakub

Fix PR c++/68527

2015-11-26 Thread Eric Botcazou

This is a tree checking failure on invalid C++ code with -fdump-ada-spec.
I guess that we could simply bail out if there are errors in the code, but we 
already have guards for error_mark_node so the patch adds a couple more.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2015-11-26  Eric Botcazou  

PR c++/68527
* c-ada-spec.c (dump_nested_types): Add guard for error_mark_node.
(print_ada_struct_decl): Likewise.

-- 
Eric Botcazou
Index: c-family/c-ada-spec.c
===
--- c-family/c-ada-spec.c	(revision 230924)
+++ c-family/c-ada-spec.c	(working copy)
@@ -2461,7 +2461,8 @@ dump_nested_types (pretty_printer *buffe
   field = TYPE_FIELDS (outer);
   while (field)
 {
-  if ((TREE_TYPE (field) != outer
+  if (((TREE_TYPE (field) != outer
+	&& TREE_TYPE (field) != error_mark_node)
 	   || (TREE_CODE (TREE_TYPE (field)) == POINTER_TYPE
 	   && TREE_TYPE (TREE_TYPE (field)) != outer))
 	   && (!TYPE_NAME (TREE_TYPE (field))
@@ -3230,9 +3231,10 @@ print_ada_struct_decl (pretty_printer *b
 		}
 	}
 	  /* Avoid printing the structure recursively.  */
-	  else if ((TREE_TYPE (tmp) != node
-		   || (TREE_CODE (TREE_TYPE (tmp)) == POINTER_TYPE
-		   && TREE_TYPE (TREE_TYPE (tmp)) != node))
+	  else if (((TREE_TYPE (tmp) != node
+		 && TREE_TYPE (tmp) != error_mark_node)
+		|| (TREE_CODE (TREE_TYPE (tmp)) == POINTER_TYPE
+			&& TREE_TYPE (TREE_TYPE (tmp)) != node))
 		   && TREE_CODE (tmp) != TYPE_DECL
 		   && !TREE_STATIC (tmp))
 	{

Re: [PATCH, PR target/68416, i386, MPX] Add bounds registers to ALL_REGS set

2015-11-26 Thread Uros Bizjak

On Thu, Nov 26, 2015 at 12:46 PM, Ilya Enkovich  wrote:
> 2015-11-26 13:15 GMT+03:00 Uros Bizjak :

>>> 2015-11-26  Vladimir Makarov  
>>>
>>> PR target/68416
>>> * config/i386/i386.h (enum reg_class): Add
>>> bounds registers to ALL_REGS.
>>>
>>> gcc/testsuite/
>>>
>>> 2015-11-26  Ilya Enkovich  
>>>
>>> PR target/68416
>>> * gcc.target/i386/mpx/pr68416.c: New test.
>>
>> OK for mainline and gcc-5 after gcc 5.3 is released.
>
> Do you think this patch is not safe?  This would be useful for 5.3
> because it significantly reduces MPX overhead on some benchmarks.

Although it looks safe, it is just a couple of days before the
release. Let's ask release manager.

Uros.

Re: [PATCH] Improve verification of loop->latch in verify_loop_structure

2015-11-26 Thread Richard Biener

On Thu, 26 Nov 2015, Alan Lawrence wrote:

> This caused an ICE compiling value.c from gdb on
> aarch64-none-linux-gnu; the testcase, after preprocessing on aarch64,
> ICEs on both aarch64 and x86_64, but is about 1MB - I'm working on
> reducing that down to something small enough to post...
> 
> $ ./gcc/xgcc -B ./gcc -O2 -g value.c
> ../../binutils-gdb/gdb/value.c: In function ‘show_convenience’:
> ../../binutils-gdb/gdb/value.c:2615:1: error: loop 3’s latch is missing
> ../../binutils-gdb/gdb/value.c:2615:1: internal compiler error: in
> verify_loop_structure, at cfgloop.c:1669
> 0x71e653 verify_loop_structure()
> /work/alalaw01/src2/gcc/gcc/cfgloop.c:1669
> 0x97c6ae checking_verify_loop_structure
> /work/alalaw01/src2/gcc/gcc/cfgloop.h:325
> 0x97c6ae loop_optimizer_init(unsigned int)
> /work/alalaw01/src2/gcc/gcc/loop-init.c:106
> 0x97c78a rtl_loop_init
> /work/alalaw01/src2/gcc/gcc/loop-init.c:398
> 0x97c78a execute
> /work/alalaw01/src2/gcc/gcc/loop-init.c:425

See also PR68549 for why I think this happens "by design".  Thus
I think we need to revert the checking for the case when
LOOPS_MAY_HAVE_MULTIPLE_LATCHES is set, for now.

Richard.

[patch] Copy-edit the Option Summary in invoke.texi

2015-11-26 Thread Jonathan Wakely


At https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html we document
-Waggressive-loop-optimizations but you can't find that option at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html because we
document -Wno-aggressive-loop-optimizations instead. Similarly, you
can't find -Wpedantic-ms-format in the full listing, because we
document the negative form, -Wno-pedantic-ms-format, but list *both*
in the summary. This patch fixes those mistakes.

I've also tried to put the list back into alphabetical order, and
re-justified the list a bit to avoid some especially short lines (I
don't understand the inconsistent use of single or double spaces
between options, so if there's some logic to that I've not followed
it, but I think this is an improvement).

OK for trunk?


commit 625ae12118ba144be5a9bd5d2315660be81de5f5
Author: Jonathan Wakely 
Date:   Thu Nov 26 11:25:58 2015 +

Copy-edit the Option Summary in invoke.texi

	* doc/invoke.texi (Option Summary): Use negative form of
	-Waggressive-loop-optimizations, remove redundant -Wpedantic-ms-format,
	sort alphabetically and re-justify.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 53a0467..34f5e1a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -242,39 +242,37 @@ Objective-C and Objective-C++ Dialects}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
 -pedantic-errors @gol
 -w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
--Waggressive-loop-optimizations -Warray-bounds -Warray-bounds=@var{n} @gol
--Wbool-compare -Wduplicated-cond -Wframe-address @gol
--Wno-attributes -Wno-builtin-macro-redefined @gol
+-Wno-aggressive-loop-optimizations -Warray-bounds -Warray-bounds=@var{n} @gol
+-Wno-attributes -Wbool-compare -Wno-builtin-macro-redefined @gol
 -Wc90-c99-compat -Wc99-c11-compat @gol
 -Wc++-compat -Wc++11-compat -Wc++14-compat -Wcast-align  -Wcast-qual  @gol
 -Wchar-subscripts -Wclobbered  -Wcomment -Wconditionally-supported  @gol
--Wconversion -Wcoverage-mismatch -Wdate-time -Wdelete-incomplete -Wno-cpp  @gol
+-Wconversion -Wcoverage-mismatch -Wno-cpp -Wdate-time -Wdelete-incomplete @gol
 -Wno-deprecated -Wno-deprecated-declarations -Wno-designated-init @gol
 -Wdisabled-optimization @gol
 -Wno-discarded-qualifiers -Wno-discarded-array-qualifiers @gol
--Wno-div-by-zero -Wdouble-promotion -Wempty-body  -Wenum-compare @gol
--Wno-endif-labels -Werror  -Werror=* @gol
--Wfatal-errors  -Wfloat-equal  -Wformat  -Wformat=2 @gol
+-Wno-div-by-zero -Wdouble-promotion -Wduplicated-cond @gol
+-Wempty-body  -Wenum-compare -Wno-endif-labels @gol
+-Werror  -Werror=* -Wfatal-errors -Wfloat-equal  -Wformat  -Wformat=2 @gol
 -Wno-format-contains-nul -Wno-format-extra-args -Wformat-nonliteral @gol
--Wformat-security  -Wformat-signedness  -Wformat-y2k @gol
+-Wformat-security  -Wformat-signedness  -Wformat-y2k -Wframe-address @gol
 -Wframe-larger-than=@var{len} -Wno-free-nonheap-object -Wjump-misses-init @gol
 -Wignored-qualifiers  -Wincompatible-pointer-types @gol
 -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int @gol
 -Winit-self  -Winline  -Wno-int-conversion @gol
 -Wno-int-to-pointer-cast -Wno-invalid-offsetof @gol
--Wnull-dereference @gol
--Winvalid-pch -Wlarger-than=@var{len}  -Wunsafe-loop-optimizations @gol
+-Winvalid-pch -Wlarger-than=@var{len} @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmemset-transposed-args @gol
 -Wmisleading-indentation -Wmissing-braces @gol
 -Wmissing-field-initializers -Wmissing-include-dirs @gol
 -Wno-multichar  -Wnonnull  -Wnormalized=@r{[}none@r{|}id@r{|}nfc@r{|}nfkc@r{]} @gol
--Wodr  -Wno-overflow  -Wopenmp-simd @gol
--Woverride-init-side-effects @gol
--Woverlength-strings  -Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
--Wparentheses  -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
+-Wnull-dereference -Wodr  -Wno-overflow  -Wopenmp-simd  @gol
+-Woverride-init-side-effects -Woverlength-strings @gol
+-Wpacked  -Wpacked-bitfield-compat  -Wpadded @gol
+-Wparentheses -Wno-pedantic-ms-format @gol
 -Wplacement-new -Wpointer-arith  -Wno-pointer-to-int-cast @gol
--Wredundant-decls  -Wno-return-local-addr @gol
+-Wno-pragmas -Wredundant-decls  -Wno-return-local-addr @gol
 -Wreturn-type  -Wsequence-point  -Wshadow  -Wno-shadow-ivar @gol
 -Wshift-overflow -Wshift-overflow=@var{n} @gol
 -Wshift-count-negative -Wshift-count-overflow -Wshift-negative-value @gol
@@ -289,7 +287,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
 -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs @gol
 -Wtype-limits  -Wundef @gol
--Wuninitialized  -Wunknown-pragmas  -Wno-pragmas @gol
+-Wuninitialized  -Wunknown-pragmas  -Wunsafe-loop-optimizations @gol
 -Wunsuffixed-float-constants  -Wunused  -Wunused-function @gol
 -Wunused-label  -Wunused-local-typedefs -Wunused-parameter @gol
 -Wno-unused-result -Wunused-value @gol

Re: [PATCH, PR target/68416, i386, MPX] Add bounds registers to ALL_REGS set

2015-11-26 Thread Richard Biener

On Thu, 26 Nov 2015, Uros Bizjak wrote:

> On Thu, Nov 26, 2015 at 12:46 PM, Ilya Enkovich  wrote:
> > 2015-11-26 13:15 GMT+03:00 Uros Bizjak :
> 
> >>> 2015-11-26  Vladimir Makarov  
> >>>
> >>> PR target/68416
> >>> * config/i386/i386.h (enum reg_class): Add
> >>> bounds registers to ALL_REGS.
> >>>
> >>> gcc/testsuite/
> >>>
> >>> 2015-11-26  Ilya Enkovich  
> >>>
> >>> PR target/68416
> >>> * gcc.target/i386/mpx/pr68416.c: New test.
> >>
> >> OK for mainline and gcc-5 after gcc 5.3 is released.
> >
> > Do you think this patch is not safe?  This would be useful for 5.3
> > because it significantly reduces MPX overhead on some benchmarks.
> 
> Although it looks safe, it is just a couple of days before the
> release. Let's ask release manager.

I don't consider MPX support release critical so fine with me for 5.3.

Richard.

Re: [PATCH, PR target/68416, i386, MPX] Add bounds registers to ALL_REGS set

2015-11-26 Thread Ilya Enkovich

2015-11-26 15:18 GMT+03:00 Richard Biener :
> On Thu, 26 Nov 2015, Uros Bizjak wrote:
>
>> On Thu, Nov 26, 2015 at 12:46 PM, Ilya Enkovich  wrote:
>> > 2015-11-26 13:15 GMT+03:00 Uros Bizjak :
>>
>> >>> 2015-11-26  Vladimir Makarov  
>> >>>
>> >>> PR target/68416
>> >>> * config/i386/i386.h (enum reg_class): Add
>> >>> bounds registers to ALL_REGS.
>> >>>
>> >>> gcc/testsuite/
>> >>>
>> >>> 2015-11-26  Ilya Enkovich  
>> >>>
>> >>> PR target/68416
>> >>> * gcc.target/i386/mpx/pr68416.c: New test.
>> >>
>> >> OK for mainline and gcc-5 after gcc 5.3 is released.
>> >
>> > Do you think this patch is not safe?  This would be useful for 5.3
>> > because it significantly reduces MPX overhead on some benchmarks.
>>
>> Although it looks safe, it is just a couple of days before the
>> release. Let's ask release manager.
>
> I don't consider MPX support release critical so fine with me for 5.3.

Thanks!

Ilya

>
> Richard.

Re: [PATCH] Fix pattern causing C_MAYBE_CONST_EXPRs leak into gimplifier (PR c/68513)

2015-11-26 Thread Joseph Myers

On Thu, 26 Nov 2015, Marek Polacek wrote:

> I had a go at this, but I'm now skeptical about removing c_save_expr.
> save_expr calls fold (), so we need to ensure that we don't pass any
> C_MAYBE_CONST_EXPRs into it, meaning that we'd need to call c_fully_fold before
> save_expr anyway...
> 
> So maybe go the "remove C_MAYBE_CONST_EXPRs in SAVE_EXPRs in c_gimplify_expr"
> way?

I believe it should be safe for gimplification to process 
C_MAYBE_CONST_EXPR in the same way c_fully_fold_internal does.  That is, 
this should not affect correctness.  If a C_MAYBE_CONST_EXPR got through 
to gimplification, in some cases it may mean that something did not get 
properly folded with c_fully_fold as it should have done - but if the move 
to match.pd means all optimizations currently done with fold end up 
working on GIMPLE as well, any missed optimizations from this should 
disappear (and if we can solve the diagnostics issues, eventually fewer 
calls to c_fully_fold should be needed and they should be more about 
checking what can occur in constant expressions and less about folding for 
optimization).

The general principle of delaying folding also means that we should move 
away from convert_* folding things.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCHES, PING*5] Enhance standard DWARF for Ada

2015-11-26 Thread Pierre-Marie de Rodat


On 11/25/2015 07:35 PM, Jason Merrill wrote:

Actually, even though my patches introduce DWARF procedures for only one
case (size functions from stor-layout.c), they don’t necessarily come
from code generation (GENERIC): they are just a way to factorize common
DWARF operations. Thinking more about it, it may be more sound to store
stack slot diffs instead of FUNCTION_DECL nodes in dwarf_proc_decl_table.


Makes sense.


Done! (I replaced the dwarf_proc_decl_table hash table with a
dwarf_proc_stack_usage_map hash_map.) Here's an update for the only
affected patch. Regtested again on x86_64-linux.


--
Pierre-Marie de Rodat
From 46826e401566c26ad77e2bb6b782cc6034b96fd3 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Thu, 3 Jul 2014 14:16:09 +0200
Subject: [PATCH 2/8] DWARF: handle variable-length records and variant parts

Enhance the DWARF back-end to emit proper descriptions for
variable-length records as well as variant parts in records.

In order to achieve this, generate DWARF expressions ("location
descriptions" in dwarf2out's parlance) for size and data member location
attributes.  Also match QUAL_UNION_TYPE data types as variant parts,
assuming the formers appear only to implement the latters (which is the
case at the moment: only the Ada front-end emits them).

Note that very few debuggers can handle these descriptions (GDB does not
yet), so in order to ease the transition enable these only when
-fgnat-encodings=minimal.

gcc/ada/ChangeLog:

	* gcc-interface/decl.c (gnat_to_gnu_entity): Disable ___XVS GNAT
	encodings when -fgnat-encodings=minimal.
	(components_to_record): Disable ___XVE, ___XVN, ___XVU and
	___XVZ GNAT encodings when -fgnat-encodings=minimal.
	* gcc-interface/utils.c (maybe_pad_type): Disable __XVS GNAT
	encodings when -fgnat-encodings=minimal.

gcc/ChangeLog:

	* function.h (struct function): Add a preserve_body field.
	* cgraph.c (cgraph_node::release_body): Preserve bodies when
	asked to by the preserve_body field.
	* stor-layout.c (finalize_size_functions): Keep a copy of the
	original function tree and set the preserve_body field in the
	function structure.
	* dwarf2out.h (dw_discr_list_ref): New typedef.
	(enum dw_val_class): Add value classes for discriminant values
	and discriminant lists.
	(struct dw_discr_value): New structure.
	(struct dw_val_node): Add discriminant values and discriminant
	lists to the union.
	(struct dw_loc_descr_node): Add frame_offset_rel and
	dw_loc_frame_offset (only for checking) fields to handle DWARF
	procedures generation.
	(struct dw_discr_list_node): New structure.
	* dwarf2out.c (new_loc_descr): Initialize the
	dw_loc_frame_offset field.
	(dwarf_proc_stack_usage_map): New.
	(dw_val_equal_p): Handle discriminants.
	(size_of_discr_value): New.
	(size_of_discr_list): New.
	(size_of_die): Handle discriminants.
	(add_loc_descr_to_each): New.
	(add_loc_list): New.
	(print_discr_value): New.
	(print_dw_val): Handle discriminants.
	(value_format): Handle discriminants.
	(output_discr_value): New.
	(output_die): Handle discriminants.
	(output_loc_operands): Handle DW_OP_call2 and DW_OP_call4.
	(uint_loc_descriptor): New.
	(uint_comparison_loc_list): New.
	(loc_list_from_uint_comparison): New.
	(add_discr_value): New.
	(add_discr_list): New.
	(AT_discr_list): New.
	(loc_descr_to_next_no_op): New.
	(free_loc_descr): New.
	(loc_descr_without_nops): New.
	(struct loc_descr_context): Add a dpi field.
	(struct dwarf_procedure_info): New helper structure.
	(new_dwarf_proc_die): New.
	(is_handled_procedure_type): New.
	(resolve_args_picking_1): New.
	(resolve_args_picking): New.
	(function_to_dwarf_procedure): New.
	(copy_dwarf_procedure): New.
	(copy_dwarf_procs_ref_in_attrs): New.
	(copy_dwarf_procs_ref_in_dies): New.
	(break_out_comdat_types): Copy DWARF procedures along with the
	types that reference them.
	(loc_list_from_tree): Rename into loc_list_from_tree_1.  Handle
	CALL_EXPR in the cases suitable for DWARF procedures.  Handle
	for PARM_DECL when generating a location description for a DWARF
	procedure.  Handle big unsigned INTEGER_CST nodes.  Handle
	NON_LVALUE_EXPR, EXACT_DIV_EXPR and all unsigned comparison
	operators.  Add a wrapper for loc_list_from_tree that strips
	DW_OP_nop operations from the result.
	(type_byte_size): New.
	(struct vlr_context): New helper structure.
	(field_byte_offset): Change signature to return either a
	constant offset or a location description for dynamic ones.
	Handle dynamic byte offsets with constant bit offsets and handle
	fields in variant parts.
	(add_data_member_location): Change signature to handle dynamic
	member offsets and fields in variant parts.  Update call to
	field_byte_offset.  Handle location lists.  Emit a variable data
	member location only when -fgnat-encodings=minimal.
	(add_bound_info): Emit self-referential bounds only when
	-fgnat-encodings=minimal.
	(add_byte_size_attribute): Use type_byte_size in order to handle
	dynamic type sizes.  Emit variable

Re: [PATCH 01/15] Selftest framework (unittests v4)

2015-11-26 Thread Bernd Schmidt


On 11/25/2015 11:47 PM, David Malcolm wrote:

FWIW, the reason I special-cased the linked list was to avoid any
dynamic memory allocation: the ctors run before main, so I wanted to
keep them as simple as possible.


Is there any particular reason for this? C++ doesn't disallow memory 
allocation in global constructors, does it?



Putting the linked list directly into
those objects means that running the ctors is a simple case of wiring up
some pointers: the memory is already statically allocated.  (also, one
thing I want to test is vec<> itself [1]).


Ok so use a C++ list instead of a vec. My days of using C++ for personal 
projects are 15 years in the past so maybe I'm not an authority, but 
that's how I feel the language is supposed to be used - use provided 
data structures rather than coding them up over and over.



I do want some level of determinism over test ordering, for the sake of
everyone's sanity.  It's probably simplest to either hardcode the order,
or have priority levels.  I favor the former (and right now am leaning
towards a very explicit no-magic approach with no auto-registration,
given the linker issues I've been seeing with auto-registration).


I guess that works too. Certainly explicit function calls are 
preferable over #including other C files as a workaround for such a 
problem.
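
For reference, the two registration styles discussed above could look roughly
like the sketch below. All names here (toy_test, run_all_auto,
run_all_explicit) are made up for illustration and are not taken from the
selftest patch. The first style wires up an intrusive, statically allocated
list from constructors; the second just calls the tests in a fixed order.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not the selftest API from the patch.

// Small log so the execution order of the toy tests can be observed.
char order_log[8];
int order_len = 0;

void test_a () { order_log[order_len++] = 'a'; }
void test_b () { order_log[order_len++] = 'b'; }

// Style 1: auto-registration through an intrusive list.  The constructor
// only links pointers, so no dynamic allocation happens before main.
struct toy_test
{
  void (*fn) ();
  toy_test *next;

  explicit toy_test (void (*f) ()) : fn (f), next (head)
  {
    head = this;   // push to front: later registrations run first
  }

  static toy_test *head;
};

toy_test *toy_test::head = nullptr;

toy_test reg_a (test_a);
toy_test reg_b (test_b);

// Walk the list and run every registered test; returns how many ran.
int run_all_auto ()
{
  int n = 0;
  for (toy_test *t = toy_test::head; t; t = t->next, ++n)
    t->fn ();
  return n;
}

// Style 2: explicit calls, with the order spelled out in one place.
void run_all_explicit ()
{
  test_a ();
  test_b ();
}
```

In this sketch the automatic order follows the initialization order of the
registration objects, which is only well defined within one translation unit,
whereas the explicit version is deterministic by construction.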


I still wish others would chime in on the rest of the issues we've 
discussed (run to first failure vs. providing elaborate test summaries), 
I want to make my preference clear but I don't want to dictate it.



Bernd

Re: [PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions

2015-11-26 Thread Segher Boessenkool

On Thu, Nov 26, 2015 at 09:50:50AM +, Kyrill Tkachov wrote:
> As I mentioned on IRC, this patch improves codegen on aarch64 as well.
> I've re-checked SPEC2006 and it seems to improve codegen around 
> multiply-extend-accumulate
> instructions. For example the sequence:
> mov     w4, 64
> mov     x1, 24
> smaddl  x1, w9, w4, x1 // multiply-sign-extend-accumulate
> add     x1, x3, x1
> 
> becomes something like this:
> mov     w3, 64
> smaddl  x1, w9, w3, x0
> add     x1, x1, 24  // constant 24 propagated into the add

So combine isn't smart enough to combine those last three into those
last two.  Yeah that makes sense.

> Another was transforming the multiply-extend into something cheaper:
> mov     x0, 40
> mov     w22, 32
> umaddl  x22, w21, w22, x0 // multiply-zero-extend-accumulate
> 
> becomes:
> ubfiz   x22, x21, 5, 32 // ASHIFT+extend
> add     x22, x22, 40
> 
> which should be always beneficial.

But it only applies given some other preconditions, right?  Either way,
it makes sense that this one is also too complicated for combine.
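
As a sanity check on the second transformation, the two sequences can be
modelled in plain C++.  This is only a sketch of the arithmetic, assuming the
usual A64 semantics of umaddl (zero-extending multiply-accumulate) and ubfiz
(insert the low WIDTH bits at position LSB, zeroing the rest); the function
names before/after are made up.

```cpp
#include <cassert>
#include <cstdint>

// Models "umaddl xd, wn, wm, xa": xd = xa + zext (wn) * zext (wm).
std::uint64_t umaddl (std::uint32_t wn, std::uint32_t wm, std::uint64_t xa)
{
  return xa + static_cast<std::uint64_t> (wn) * static_cast<std::uint64_t> (wm);
}

// Models "ubfiz xd, xn, lsb, width": low WIDTH bits of xn, shifted left by
// LSB, all other bits zero.  Assumes 0 < width and lsb + width <= 64.
std::uint64_t ubfiz (std::uint64_t xn, unsigned lsb, unsigned width)
{
  const std::uint64_t mask
    = width >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return (xn & mask) << lsb;
}

// Original sequence: mov x0, 40; mov w22, 32; umaddl x22, w21, w22, x0.
std::uint64_t before (std::uint64_t x21)
{
  return umaddl (static_cast<std::uint32_t> (x21), 32, 40);
}

// New sequence: ubfiz x22, x21, 5, 32; add x22, x22, 40.
std::uint64_t after (std::uint64_t x21)
{
  return ubfiz (x21, 5, 32) + 40;
}
```

Multiplying the zero-extended low word by 32 is the same as shifting it left
by 5, so both functions agree for every input, including values with garbage
in the upper half of x21.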

> From what I can see we don't lose any of the multiply-extend-accumulate
>  opportunities that we gained from the original combine patch.
> 
> So can we take this patch in as well?

See the patch mail...


Segher

Re: [PATCH][combine] PR rtl-optimization/68381: Only restrict pure simplification in mult-extend subst case, allow other substitutions

2015-11-26 Thread Segher Boessenkool

On Thu, Nov 19, 2015 at 10:26:25AM +, Kyrill Tkachov wrote:
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2015-11-19  Kyrylo Tkachov  
> 
> PR rtl-optimization/68381
> * combine.c (subst): Do not return clobber of zero in widening mult
> case.  Just return x unchanged if it is a no-op substitution.
> 
> 2015-11-19  Kyrylo Tkachov  
> 
> PR rtl-optimization/68381
> * gcc.c-torture/execute/pr68381.c: New test.

This is fine for trunk.  Thanks.


Segher

[PATCH] Fix genmatch SAVE_TEMPS usage for multi-uses

2015-11-26 Thread Richard Biener


This fixes the issue that genmatch wraps captures in SAVE_EXPRs
only for correctness reasons right now (for TREE_SIDE_EFFECTS
captures) but not to avoid duplicating expensive computations.

The following fixes that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk,
testing on the branch right now.

Richard.

2015-11-26  Richard Biener  

* genmatch.c (dt_simplify::gen_1): For generic wrap all
multi-result-use captures in a SAVE_EXPR.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 230924)
+++ gcc/genmatch.c  (working copy)
@@ -3112,16 +3111,10 @@ dt_simplify::gen_1 (FILE *f, int indent,
  {
if (cinfo.info[i].same_as != (unsigned)i)
  continue;
-   if (!cinfo.info[i].force_no_side_effects_p
-   && cinfo.info[i].result_use_count > 1)
- {
-   fprintf_indent (f, indent,
-   "if (TREE_SIDE_EFFECTS (captures[%d]))\n",
-   i);
-   fprintf_indent (f, indent,
-   "  captures[%d] = save_expr (captures[%d]);\n",
-   i, i);
- }
+   if (cinfo.info[i].result_use_count > 1)
+ fprintf_indent (f, indent,
+ "captures[%d] = save_expr (captures[%d]);\n",
+ i, i);
  }
  for (unsigned j = 0; j < e->ops.length (); ++j)
{
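
To illustrate why a capture that is used more than once in the result wants
wrapping even without side effects, here is a toy model.  It is not GCC's tree
or SAVE_EXPR machinery: thunk, save_once and twice are hypothetical names, and
the "2*x becomes x+x" shape is just an assumed example of a result that uses
its capture twice.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Toy stand-in for an expression: calling it evaluates it.
using thunk = std::function<int ()>;

// Rough analogue of wrapping an operand in a SAVE_EXPR: the operand is
// evaluated on first use and the cached value is returned afterwards.
thunk save_once (thunk operand)
{
  auto done = std::make_shared<bool> (false);
  auto value = std::make_shared<int> (0);
  return [=] () {
    if (!*done)
      {
        *value = operand ();
        *done = true;
      }
    return *value;
  };
}

// A simplification result that uses its capture twice.
thunk twice (thunk capture)
{
  return [capture] () { return capture () + capture (); };
}
```

Without save_once the operand is computed twice, which is harmless for
correctness when it has no side effects but duplicates whatever work it does;
with save_once it is computed once.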

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-26 Thread Bernd Schmidt


On 11/26/2015 12:12 PM, Kyrill Tkachov wrote:

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index af7a3b9..3e3dc8d 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2220,7 +2220,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
  }

  }
-if (emit_a && modified_in_a)
+if (emit_a || modified_in_a)
{
Having stared at it in the debugger for a while, I think I managed to 
convince myself that this is correct. So, OK.


A few other comments. This whole if block is indented too far, please 
fix while you're there. Also eliminate the unnecessary blank lines 
before closing braces (two instances inside this if block). There are 
other formatting errors in this function, but those are best left alone 
for now.



modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);


Can this ever be true? We arrange for emit_b to set a new pseudo, don't 
we? Are we allowing cases where we copy a pattern that sets more than 
one register, and is that safe?



Bernd

[PATCH] Fix PR68554

2015-11-26 Thread Richard Biener


Committed.

Richard.

2015-11-26  Richard Biener  

PR testsuite/68554
* gcc.dg/vect/bb-slp-subgroups-2.c: Require vect_perm.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c  (revision 230942)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_perm } */
 /* PR tree-optimization/67682.  */
 
 #include "tree-vect.h"

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-26 Thread Kyrill Tkachov



On 26/11/15 13:40, Bernd Schmidt wrote:

On 11/26/2015 12:12 PM, Kyrill Tkachov wrote:

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index af7a3b9..3e3dc8d 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2220,7 +2220,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
}

  }
-if (emit_a && modified_in_a)
+if (emit_a || modified_in_a)
{

Having stared at it in the debugger for a while, I think I managed to convince 
myself that this is correct. So, OK.



Thanks.

A few other comments. This whole if block is indented too far, please fix while you're there. Also eliminate the unnecessary blank lines before closing braces (two instances inside this if block). There are other formatting errors in this 
function, but those are best left alone for now.


Ok, I'll fix the indentation in that if-else block




  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a, emit_b);


Can this ever be true? We arrange for emit_b to set a new pseudo, don't we? Are 
we allowing cases where we copy a pattern that sets more than one register, and 
is that safe?



You're right, this statement always sets modified_in_b to false. We reject
anything but single_set insns by this point in the code. I'll replace that
with modified_in_b = false;

Thanks,
Kyrill


Bernd

[PATCH] Fix PR68555

2015-11-26 Thread Richard Biener


Committed.

Richard.

2015-11-26  Richard Biener  

PR testsuite/68555
* gcc.dg/vect/bb-slp-10.c: Adjust pattern, use target selector
and not XFAIL.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-10.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-10.c   (revision 230959)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-10.c   (working copy)
@@ -49,6 +49,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "unsupported alignment in basic block." "slp2" { xfail vect_element_align } } } */
+/* { dg-final { scan-tree-dump "bad data alignment in basic block" "slp2" { target { ! vect_element_align } } } } */
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" { target vect_element_align } } } */

Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute

2015-11-26 Thread Alan Lawrence

On 6 November 2015 at 16:59, Jakub Jelinek  wrote:
>
> In any case, to manually reproduce, compile
> gnatmake -g -gnatws macrosub.adb
> with GCC 5.1.1 (before the ARM changes) and then try to run that process against
> GCC 5.2.1 (after the ARM changes) libgnat-5.so, which is what make check
> does (it uses host_gnatmake to compile the support stuff, so ideally the
> processes built by host gcc/gnatmake should not be run with the
> LD_LIBRARY_PATH=$ADA_INCLUDE_PATH:$BASE:$LD_LIBRARY_PATH
> in the environment, and others should).
> In macrosub in particular, the problem is in:
>   WHILE NOT END_OF_FILE (INFILE1) LOOP
>GET_LINE (INFILE1, A_LINE, A_LENGTH);
> in FILL_TABLE, where A_LINE'First is 0 and A_LINE'Last is 400 (if I remember
> right), but if you step into GET_LINE compiled by GCC 5.2.1, Item'First
> and Item'Last don't match that.

Ok, I see the mismatch now.

However, to get there, I had to use my 5.1 gnatmake -g -gnatws
macrosub.ads --rts=/path/to/5.2/arm-none-linux-gnueabihf/libada, as if
I ran 5.1 gnatmake without that flag, I did not manage to get the
wrong value passed/received with LD_LIBRARY_PATH set to any of
build-5.2/gcc/ada/rts, build-5.2/arm-none-linux-gnueabihf/libada,
build-5.2/arm-none-linux-gnueabihf/libada/adalib (any further
suggestions?). [Also I note 'LD_DEBUG=all ./macrosub' does not show
libgnat being loaded that way.]

With 5.1 gnatmake -g -gnatws macrosub.ads
--rts=/path/to/5.2/arm-none-linux-gnueabihf/libada :

$ gdb ./macrosub
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
[snip]
Reading symbols from ./macrosub...done.
(gdb) break get_line
Breakpoint 1 at 0x1aeec: get_line. (4 locations)
(gdb) run
Starting program:
/home/alalaw01/build-5.1.0/gcc/testsuite/ada/acats/support/macrosub
BEGINNING MACRO SUBSTITUTIONS.

Breakpoint 1, ada.text_io.get_line (item=...) at a-tigeli.adb:41
41  procedure Get_Line
(gdb) print item'first
$1 = -443273216
(gdb) print item'last
$2 = -514850813
(gdb) n
146        FIO.Check_Read_Status (AP (File));
(gdb) n
152        if Item'First > Item'Last then
(gdb) print item'first
$3 = 1
(gdb) print item'last
$4 = 0
(gdb) up
#1  0x0001f34c in getsubs.fill_table () at getsubs.adb:122
122        GET_LINE (INFILE1, A_LINE, A_LENGTH);
(gdb) print a_line'first
$5 = 1
(gdb) print a_line'last
$6 = 400

So yes, we have an ABI change; which is not entirely unexpected. So,
questions

(1) Why does LD_LIBRARY_PATH affect your system, not mine (i.e. if
this is because my gnatmake is building with static linking, then
why). This is maybe the least interesting question so I'm leaving it
for now...
(2) If/when LD_LIBRARY_PATH does have an effect - as you say, things
compiled with host gnatmake, should be run against host libraries, not
against target libraries. Otherwise, potentially *any* gcc ABI change
can break the build process, right? So I think this is of interest
regardless of the ARM AAPCS change, but I will be slightly
presumptuous and hope that the Adacore folk will pick this up...[CC
Eric]
(3) Has the ARM AAPCS had an effect that we didn't mean it to? I don't
see any evidence so far that this is _necessarily_ the case, but I
will look into this, bearing Florian's advice in mind (thanks!)...

--Alan

[PATCH] MIPS/GCC/doc: Reorder `-mcompact-branches='

2015-11-26 Thread Maciej W. Rozycki

Move the `-mcompact-branches=' option out of the middle of a block of 
floating-point options.  The option is not related to FP in any way.  
Place it immediately below other branch instruction selection options.

gcc/
* doc/invoke.texi (Option Summary) : Reorder
`-mcompact-branches='.
(MIPS Options): Likewise.
---

 OK to apply?

  Maciej

gcc-mips-compact-branches-doc-fix.diff
Index: gcc/gcc/doc/invoke.texi
===
--- gcc.orig/gcc/doc/invoke.texi2015-11-23 21:02:46.594781253 +
+++ gcc/gcc/doc/invoke.texi 2015-11-23 23:12:45.009558919 +
@@ -793,7 +793,6 @@ Objective-C and Objective-C++ Dialects}.
 -mgp32  -mgp64  -mfp32  -mfpxx  -mfp64  -mhard-float  -msoft-float @gol
 -mno-float  -msingle-float  -mdouble-float @gol
 -modd-spreg -mno-odd-spreg @gol
--mcompact-branches=@var{policy} @gol
 -mabs=@var{mode}  -mnan=@var{encoding} @gol
 -mdsp  -mno-dsp  -mdspr2  -mno-dspr2 @gol
 -mmcu -mmno-mcu @gol
@@ -824,6 +823,7 @@ Objective-C and Objective-C++ Dialects}.
 -mfix-vr4130  -mno-fix-vr4130  -mfix-sb1  -mno-fix-sb1 @gol
 -mflush-func=@var{func}  -mno-flush-func @gol
 -mbranch-cost=@var{num}  -mbranch-likely  -mno-branch-likely @gol
+-mcompact-branches=@var{policy} @gol
 -mfp-exceptions -mno-fp-exceptions @gol
 -mvr4130-align -mno-vr4130-align -msynci -mno-synci @gol
 -mrelax-pic-calls -mno-relax-pic-calls -mmcount-ra-address @gol
@@ -17602,30 +17602,6 @@ for the o32 ABI.  This is the default fo
 support these registers.  When using the o32 FPXX ABI, @option{-mno-odd-spreg}
 is set by default.
 
-@item -mcompact-branches=never
-@itemx -mcompact-branches=optimal
-@itemx -mcompact-branches=always
-@opindex mcompact-branches=never
-@opindex mcompact-branches=optimal
-@opindex mcompact-branches=always
-These options control which form of branches will be generated.  The
-default is @option{-mcompact-branches=optimal}.
-
-The @option{-mcompact-branches=never} option ensures that compact branch
-instructions will never be generated.
-
-The @option{-mcompact-branches=always} option ensures that a compact
-branch instruction will be generated if available.  If a compact branch
-instruction is not available, a delay slot form of the branch will be
-used instead.
-
-This option is supported from MIPS Release 6 onwards.
-
-The @option{-mcompact-branches=optimal} option will cause a delay slot
-branch to be used if one is available in the current ISA and the delay
-slot is successfully filled.  If the delay slot is not filled, a compact
-branch will be chosen if one is available.
-
 @item -mabs=2008
 @itemx -mabs=legacy
 @opindex mabs=2008
@@ -18189,6 +18165,30 @@ and processors that implement those arch
 Likely instructions are not be generated by default because the MIPS32
 and MIPS64 architectures specifically deprecate their use.
 
+@item -mcompact-branches=never
+@itemx -mcompact-branches=optimal
+@itemx -mcompact-branches=always
+@opindex mcompact-branches=never
+@opindex mcompact-branches=optimal
+@opindex mcompact-branches=always
+These options control which form of branches will be generated.  The
+default is @option{-mcompact-branches=optimal}.
+
+The @option{-mcompact-branches=never} option ensures that compact branch
+instructions will never be generated.
+
+The @option{-mcompact-branches=always} option ensures that a compact
+branch instruction will be generated if available.  If a compact branch
+instruction is not available, a delay slot form of the branch will be
+used instead.
+
+This option is supported from MIPS Release 6 onwards.
+
+The @option{-mcompact-branches=optimal} option will cause a delay slot
+branch to be used if one is available in the current ISA and the delay
+slot is successfully filled.  If the delay slot is not filled, a compact
+branch will be chosen if one is available.
+
 @item -mfp-exceptions
 @itemx -mno-fp-exceptions
 @opindex mfp-exceptions
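
The selection rules described in the documentation text above can be
summarised as a small decision function.  This is only a sketch of the prose,
not the MIPS back end's code; choose_branch and its parameters are
hypothetical, and the fallback cases marked below are assumptions that the
documentation does not spell out.

```cpp
#include <cassert>

enum class policy { never, optimal, always };
enum class branch_form { delay_slot, compact };

// Pick a branch form following the -mcompact-branches= description.
branch_form choose_branch (policy p, bool compact_available,
                           bool delay_slot_available, bool slot_filled)
{
  switch (p)
    {
    case policy::never:
      // Compact branches are never generated.
      return branch_form::delay_slot;

    case policy::always:
      // Compact if available, otherwise the delay slot form.
      return compact_available ? branch_form::compact
                               : branch_form::delay_slot;

    case policy::optimal:
      // Prefer a delay slot branch when it exists and its slot was filled.
      if (delay_slot_available && slot_filled)
        return branch_form::delay_slot;
      if (compact_available)
        return branch_form::compact;
      // Assumption: with no compact form, fall back to the delay slot form.
      return branch_form::delay_slot;
    }
  // Assumption: unreachable for valid policies; keep compilers quiet.
  return branch_form::delay_slot;
}
```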

[PTX] simplify call emission

2015-11-26 Thread Nathan Sidwell


I've committed this patch to simplify some more call emission machinery.

write_func_decl_from_insn was doing more work than necessary.
1) it doesn't need to examine the callee to figure out whether this is an 
indirect call or not.  Its callers have already done this, and can pass in the 
relevant info (name or NULL).


2) it doesn't need to deal with split regs.  That was already done when the call 
was expanded.


nvptx_output_call_insn had the same issue with split regs.

I changed the formatting slightly, so the proto-1.c testcase needed a tweak. 
While there I changed the fn name from nearly-but-not-quite acc_on_device, to 
'foo', so it didn't look superficially like a builtin test.
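
For readers skimming the patch, the simplified emission boils down to a
running-separator loop.  Below is a standalone sketch of that shape: the
function write_decl is hypothetical, GCC's machine modes are replaced by
ready-made PTX type suffixes such as ".u32", and the "[1]" suffix for QImode
and HImode arguments is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for the patched write_func_decl_from_insn.  NAME is NULL for
// an indirect call; RESULT_TYPE is NULL when there is no return value.
void write_decl (std::stringstream &s, const char *name,
                 const char *result_type,
                 const std::vector<std::string> &arg_types)
{
  if (!name)
    {
      s << "\t.callprototype ";
      name = "_";
    }
  else
    {
      s << "\n// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
      s << "\t.extern .func ";
    }

  if (result_type)
    s << "(.param" << result_type << " %rval) ";

  s << name;

  // The separator starts as the opening parenthesis and becomes a comma
  // after the first argument, so no special first-iteration case is needed.
  const char *sep = " (";
  for (std::size_t i = 0; i < arg_types.size (); i++)
    {
      s << sep << ".param" << arg_types[i] << " %arg" << i + 1;
      sep = ", ";
    }
  if (!arg_types.empty ())
    s << ")";
  s << ";\n";
}
```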


nathan
2015-11-26  Nathan Sidwell  

	* config/nvptx/nvptx.c (write_func_decl_from_insn): Replace callee
	arg with name.  Don't deal with split regs.  Tweak formatting.
	(nvptx_expand_call): Adjust write_func_decl_from_insn call.
	(nvptx_output_call_insn): Don't deal with split regs here.

	testsuite/
	* gcc.target/nvptx/proto-1.c: Adjust expected asm.


Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 230963)
+++ config/nvptx/nvptx.c	(working copy)
@@ -719,69 +719,46 @@ nvptx_output_return (void)
generated by emit_library_call for which no decl exists.  */
 
 static void
-write_func_decl_from_insn (std::stringstream &s, rtx result, rtx pat,
-			   rtx callee)
+write_func_decl_from_insn (std::stringstream &s, const char *name,
+			   rtx result, rtx pat)
 {
-  bool callprototype = register_operand (callee, Pmode);
-  const char *name = "_";
-  if (!callprototype)
+  if (!name)
+{
+  s << "\t.callprototype ";
+  name = "_";
+}
+  else
 {
-  name = XSTR (callee, 0);
-  name = nvptx_name_replacement (name);
   s << "\n// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
+  s << "\t.extern .func ";
 }
-  s << (callprototype ? "\t.callprototype\t" : "\t.extern .func ");
 
   if (result != NULL_RTX)
-{
-  s << "(.param";
-  s << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
- false);
-  s << " ";
-  if (callprototype)
-	s << "_";
-  else
-	s << "%out_retval";
-  s << ")";
-}
+s << "(.param"
+  << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)), false)
+  << " %rval) ";
 
   s << name;
 
+  const char *sep = " (";
   int arg_end = XVECLEN (pat, 0);
-  
-  if (1 < arg_end)
+  for (int i = 1; i < arg_end; i++)
 {
-  const char *comma = "";
-  s << " (";
-  for (int i = 1; i < arg_end; i++)
-	{
-	  rtx t = XEXP (XVECEXP (pat, 0, i), 0);
-	  machine_mode mode = GET_MODE (t);
-	  machine_mode split = maybe_split_mode (mode);
-	  int count = 1;
-
-	  if (split != VOIDmode)
-	{
-	  mode = split;
-	  count = 2;
-	}
-
-	  while (count--)
-	{
-	  s << comma << ".param";
-	  s << nvptx_ptx_type_from_mode (mode, false);
-	  s << " ";
-	  if (callprototype)
-		s << "_";
-	  else
-		s << "%arg" << i - 1;
-	  if (mode == QImode || mode == HImode)
-		s << "[1]";
-	  comma = ", ";
-	}
-	}
-  s << ")";
+  /* We don't have to deal with mode splitting here, as that was
+	 already done when generating the call sequence.  */
+  machine_mode mode = GET_MODE (XEXP (XVECEXP (pat, 0, i), 0));
+
+  s << sep
+	<< ".param"
+	<< nvptx_ptx_type_from_mode (mode, false)
+	<< " %arg"
+	<< i;
+  if (mode == QImode || mode == HImode)
+	s << "[1]";
+  sep = ", ";
 }
+  if (arg_end != 1)
+s << ")";
   s << ";\n";
 }
 
@@ -905,10 +882,7 @@ nvptx_expand_call (rtx retval, rtx addre
   && stdarg_p (cfun->machine->funtype))
 {
   varargs = gen_reg_rtx (Pmode);
-  if (Pmode == DImode)
-	emit_move_insn (varargs, stack_pointer_rtx);
-  else
-	emit_move_insn (varargs, stack_pointer_rtx);
+  emit_move_insn (varargs, stack_pointer_rtx);
   cfun->machine->has_call_with_varargs = true;
 }
   vec = rtvec_alloc (nargs + 1 + (varargs ? 1 : 0));
@@ -951,7 +925,11 @@ nvptx_expand_call (rtx retval, rtx addre
   if (*slot == NULL)
 	{
 	  *slot = callee;
-	  write_func_decl_from_insn (func_decls, retval, pat, callee);
+
+	  const char *name = XSTR (callee, 0);
+	  if (decl_type)
+	name = nvptx_name_replacement (name);
+	  write_func_decl_from_insn (func_decls, name, retval, pat);
 	}
 }
 
@@ -1798,7 +1776,7 @@ nvptx_assemble_undefined_decl (FILE *fil
 const char *
 nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
 {
-  char buf[256];
+  char buf[16];
   static int labelno;
   bool needs_tgt = register_operand (callee, Pmode);
   rtx pat = PATTERN (insn);
@@ -1825,36 +1803,22 @@ nvptx_output_call_insn (rtx_insn *insn,
   labelno++;
   ASM_OUTPUT_LABEL (asm_out_file, buf);
   std::stringstream s;
-  write_func_decl_from_insn (s, result, pat, callee);
+  write_func_decl_from_insn (s, NULL,

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Bernd Schmidt


On 11/25/2015 01:20 PM, Richard Sandiford wrote:

This series fixes PR 68432, a regression caused by my internal-functions-
for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
condition but conditionally FAIL if we're optimising for size:

   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
   && !flag_trapping_math)
 {
   if (TARGET_ROUND)
emit_insn (gen_sse4_1_round2
   (operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
   else if (optimize_insn_for_size_p ())
 FAIL;
   else
ix86_expand_rint (operands[0], operands[1]);
 }


How many such cases are there? Is it just the ix86 patterns? And, could 
the same effect be achieved by just moving the optimize_insn_for_size_p 
test into the predicate (as some existing patterns already do), and then 
testing the predicate while ensuring that optimize_insn_for_x returns 
the right value? That seems like a minimal fix, and I think one that 
would be vastly more appropriate for stage 3. The alternative splitting 
looks error-prone and may not be optimal, and I still have misgivings 
about the new attribute syntax and its application to define_expands.



Bernd

Re: [PATCH, PING*4] PR debug/53927: fix value for DW_AT_static_link

2015-11-26 Thread Pierre-Marie de Rodat


Thank you Jason!

On 11/25/2015 09:35 PM, Eric Botcazou wrote:

We try to declare variables only at the first use point now I think.


Fixed, thanks!


+  /* Debugging information needs to compute the frame base address of the
+nestee frame out of the static chain from the nested frame.

"parent frame"


Just to make sure: instead of “nestee”: fixed.


No useless period: "debug info"


Fixed.


"(through the debugging information)" sounds superfluous. "in order not to..."


Done.


+  fb_decl = make_node (FIELD_DECL);
+  name = concat ("FRAME_BASE.",
+IDENTIFIER_POINTER (DECL_NAME (root->context)),
+NULL);
+  DECL_NAME (fb_decl) = get_identifier (name);
+  free (name);

Let's avoid this concat/free business and use a simpler name.


Right, I put instead:

DECL_NAME (fb_decl) = get_identifier ("FRAME_BASE.PARENT");


TYPE_FIELDS (root->frame_type)
   = chainon (TYPE_FIELDS (root->frame_type), fb_decl);


Much better, thanks. :-)

Here’s the updated patch. Regtested again on x86_64-linux.
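
For anyone wondering why the patch stores the frame base in the FRAME record
instead of using a constant offset, here is a toy model of the alignment
issue.  It assumes a downward-growing stack and a made-up placement rule
(object placed just below the frame base, then aligned down); the real layout
is decided elsewhere in GCC and the names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round ADDR down to a multiple of ALIGN, which must be a power of two.
std::uintptr_t align_down (std::uintptr_t addr, std::uintptr_t align)
{
  return addr & ~(align - 1);
}

// Distance from the frame base down to an object of SIZE bytes that must be
// aligned to ALIGN, under the toy placement rule described above.
std::uintptr_t frame_object_offset (std::uintptr_t frame_base,
                                    std::uintptr_t size,
                                    std::uintptr_t align)
{
  std::uintptr_t object = align_down (frame_base - size, align);
  return frame_base - object;
}
```

With two frame bases that are both 16-byte aligned (0x7fff0010 and
0x7fff0020), a 40-byte object aligned to 16 sits at the same offset in both
frames, but one aligned to 64 does not, so a debugger could not recover the
frame base from the static chain by adding a constant.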

--
Pierre-Marie de Rodat
From 3f067709d86fea108ef4debcc7f9b39cde91e644 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Wed, 25 Feb 2015 14:48:24 +0100
Subject: [PATCH] DWARF: fix loc. descr. generation for DW_AT_static_link

gcc/ChangeLog:

	PR debug/53927
	* tree-nested.c (finalize_nesting_tree_1): Append a field to
	hold the frame base address.
	* dwarf2out.c (gen_subprogram_die): Generate for
	DW_AT_static_link a location description that computes the value
	of this field.
---
 gcc/dwarf2out.c   | 20 +++---
 gcc/tree-nested.c | 62 +--
 2 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index f184750..5249fca 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19113,9 +19113,23 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
   compute_frame_pointer_to_fb_displacement (cfa_fb_offset);
 
   if (fun->static_chain_decl)
-	add_AT_location_description
-	  (subr_die, DW_AT_static_link,
-	   loc_list_from_tree (fun->static_chain_decl, 2, NULL));
+	{
+	  /* DWARF requires here a location expression that computes the
+	 address of the enclosing subprogram's frame base.  The machinery
+	 in tree-nested.c is supposed to store this specific address in the
+	 last field of the FRAME record.  */
+	  const tree frame_type
+	= TREE_TYPE (TREE_TYPE (fun->static_chain_decl));
+	  const tree fb_decl = tree_last (TYPE_FIELDS (frame_type));
+
+	  tree fb_expr
+	= build1 (INDIRECT_REF, frame_type, fun->static_chain_decl);
+	  fb_expr = build3 (COMPONENT_REF, TREE_TYPE (fb_decl),
+			fb_expr, fb_decl, NULL_TREE);
+
+	  add_AT_location_description (subr_die, DW_AT_static_link,
+   loc_list_from_tree (fb_expr, 0, NULL));
+	}
 }
 
   /* Generate child dies for template paramaters.  */
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 1f6311c..cf6b0d8 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -2718,10 +2718,10 @@ fold_mem_refs (tree *const , void *data ATTRIBUTE_UNUSED)
   return true;
 }
 
-/* Do "everything else" to clean up or complete state collected by the
-   various walking passes -- lay out the types and decls, generate code
-   to initialize the frame decl, store critical expressions in the
-   struct function for rtl to find.  */
+/* Do "everything else" to clean up or complete state collected by the various
+   walking passes -- create a field to hold the frame base address, lay out the
+   types and decls, generate code to initialize the frame decl, store critical
+   expressions in the struct function for rtl to find.  */
 
 static void
 finalize_nesting_tree_1 (struct nesting_info *root)
@@ -2737,20 +2737,70 @@ finalize_nesting_tree_1 (struct nesting_info *root)
  out at this time.  */
   if (root->frame_type)
 {
+  /* Debugging information needs to compute the frame base address of the
+	 parent frame out of the static chain from the nested frame.
+
+	 The static chain is the address of the FRAME record, so one could
+	 imagine it would be possible to compute the frame base address just
+	 adding a constant offset to this address.  Unfortunately, this is not
+	 possible: if the FRAME object has alignment constraints that are
+	 stronger than the stack, then the offset between the frame base and
+	 the FRAME object will be dynamic.
+
+	 What we do instead is to append a field to the FRAME object that holds
+	 the frame base address: then debug info just has to fetch this
+	 field.  */
+
+  /* Debugging information will refer to the CFA as the frame base
+	 address: we will do the same here.  */
+  const tree frame_addr_fndecl
+= builtin_decl_explicit (BUILT_IN_DWARF_CFA);
+
+  /* Create a field in the FRAME record to hold the frame base address for
+	 this stack frame.  Since it will be used only by the

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-26 Thread Bernd Schmidt


On 11/26/2015 02:52 PM, Kyrill Tkachov wrote:


On 26/11/15 13:40, Bernd Schmidt wrote:

On 11/26/2015 12:12 PM, Kyrill Tkachov wrote:

  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a,
emit_b);


Can this ever be true? We arrange for emit_b to set a new pseudo,
don't we? Are we allowing cases where we copy a pattern that sets more
than one register, and is that safe?


You're right, this statement always sets modified_in_b to false. We
reject anything but single_set insns
by this point in the code. I'll replace that with modified_in_b = false;


Note that there's a mirrored test for modified_in_a, and both are 
already initialized to false. Also - careful with single_set, it can 
return true even for multiple sets in case there's a REG_DEAD note on 
one of them. You might want to strengthen your tests to also include 
!multiple_sets. Then, maybe instead of deleting these tests, turn them 
into gcc_checking_asserts.



Bernd

Re: [PR68432 20/22] Record attributes for define_expand

2015-11-26 Thread Bernd Schmidt


On 11/25/2015 05:08 PM, Richard Sandiford wrote:

Also, using a string like that rather than some kind of
identifier or a define_icode_attr maybe isn't the best approach?


By "some kind of identifier" do you just mean replacing "code,alternative"
with a string that doesn't have a comma?


Yeah. It really looks too much like the definition of attribute values IMO.


The problem with define_icode_attr is that you get combinatorial
explosion with the type of the return value.  At the moment we just have
integers and enums (which can be defined by define_enum_attr as well as
define_attr), but who knows what we'll have in future? :-)


Maybe

(define_typed_attr {enum,int} {insn,icode,icodealt}
 [all the usual stuff])


Bernd

[PATCH RFC 1/2] MIPS/GCC: Factor out LINK_SPEC

2015-11-26 Thread Maciej W. Rozycki

Factor out common linker specs to LINK_SPEC, to be included from 
individual SUBTARGET_LINK_SPEC definitions, along the lines of 
SUBTARGET_ASM_SPEC, SUBTARGET_CC1_SPEC, SUBTARGET_CPP_SPEC, etc.  This 
essentially revives the use of the LINK_SPEC definition from mips.h 
which by now has become overridden by nearly all MIPS subtargets.  The 
only target omitted from this change is VxWorks.

There is no functional change here in principle, however this does 
change the handling of `-mips*' options a bit for some subtargets which 
used different glob patterns or just listed all individual options meant 
to be supported.  I think such a unification of the MIPS target as a 
whole does make sense, especially in the face of `-mips*' options being 
legacy aliases for corresponding `-march=mips*' options which do not 
receive any special per-subtarget treatment.

Also `%{shared}' despite being common needs special treatment because of 
how `%(netbsd_link_spec)' has been defined.
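
As a rough illustration of how the pieces compose, the sketch below relies on nothing more than adjacent string literal concatenation. The TOY_* names are hypothetical, the common part is shortened to a single `%{mips*}' glob, and the empty override is only an assumed example of what a subtarget might choose.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the spec macros.
#define TOY_COMMON_SPEC "%(endian_spec) %{G*} %{mips*}"
#define TOY_DEFAULT_SHARED_SPEC " %{shared}"  // note the leading space
#define TOY_EMPTY_SHARED_SPEC ""              // assumed subtarget override

// Adjacent string literals are concatenated by the compiler, so the
// shared piece can be swapped without touching the rest of the spec.
#define TOY_LINK_SPEC(SHARED) \
  TOY_COMMON_SPEC SHARED " %(subtarget_link_spec)"

std::string toy_default_link_spec() {
  return TOY_LINK_SPEC(TOY_DEFAULT_SHARED_SPEC);
}

std::string toy_override_link_spec() {
  return TOY_LINK_SPEC(TOY_EMPTY_SHARED_SPEC);
}
```

Keeping the separating space inside the shared piece means that an empty override does not leave a doubled space in the resulting spec string.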

gcc/
* gcc/config/mips/mips.h [!LINK_SHARED_SPEC] (LINK_SHARED_SPEC):
New macro.
[!SUBTARGET_LINK_SPEC] (SUBTARGET_LINK_SPEC): Likewise.
(LINK_SPEC): Replace `%{shared}' with LINK_SHARED_SPEC, add 
`%(subtarget_link_spec)'.
(EXTRA_SPECS): Add "subtarget_link_spec".
* config/mips/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): Remove
`%{G*}', `%{EB}', `%{EL}', `%{mips*}' and `%{shared}'.
(LINK_SPEC): Rename macro to...
(SUBTARGET_LINK_SPEC): ... this.
* gcc/config/mips/linux-common.h (LINK_SPEC): Rename macro to...
(SUBTARGET_LINK_SPEC): ... this.
* gcc/config/mips/netbsd.h (LINK_SHARED_SPEC): New macro.
(LINK_SPEC): Rename macro to...
(SUBTARGET_LINK_SPEC): ... this.  Remove `%(endian_spec)', 
`%{G*}' and all individual `%{mips*}' pieces.
* gcc/config/mips/sde.h (LINK_SPEC): Rename macro to...
(SUBTARGET_LINK_SPEC): ... this.  Remove `%(endian_spec)',
`%{G*}', all individual `%{mips*}' pieces and `%{shared}'.
---
gcc-mips-subtarget-link-spec.diff
Index: gcc/gcc/config/mips/gnu-user.h
===
--- gcc.orig/gcc/config/mips/gnu-user.h 2015-09-18 00:39:02.199086616 +0100
+++ gcc/gcc/config/mips/gnu-user.h  2015-09-18 01:46:22.970510955 +0100
@@ -54,7 +54,6 @@ along with GCC; see the file COPYING3.  
 
 #undef GNU_USER_TARGET_LINK_SPEC
 #define GNU_USER_TARGET_LINK_SPEC "\
-  %{G*} %{EB} %{EL} %{mips*} %{shared} \
   %{!shared: \
 %{!static: \
   %{rdynamic:-export-dynamic} \
@@ -66,8 +65,8 @@ along with GCC; see the file COPYING3.  
   %{mabi=64:-m" GNU_USER_LINK_EMULATION64 "} \
   %{mabi=32:-m" GNU_USER_LINK_EMULATION32 "}"
 
-#undef LINK_SPEC
-#define LINK_SPEC GNU_USER_TARGET_LINK_SPEC
+#undef SUBTARGET_LINK_SPEC
+#define SUBTARGET_LINK_SPEC GNU_USER_TARGET_LINK_SPEC
 
 #undef SUBTARGET_ASM_SPEC
 #define SUBTARGET_ASM_SPEC \
Index: gcc/gcc/config/mips/linux-common.h
===
--- gcc.orig/gcc/config/mips/linux-common.h 2015-09-18 00:39:02.206172580 +0100
+++ gcc/gcc/config/mips/linux-common.h  2015-09-18 00:39:19.891178351 +0100
@@ -27,8 +27,8 @@ along with GCC; see the file COPYING3.  
 ANDROID_TARGET_OS_CPP_BUILTINS();  \
   } while (0)
 
-#undef  LINK_SPEC
-#define LINK_SPEC  \
+#undef  SUBTARGET_LINK_SPEC
+#define SUBTARGET_LINK_SPEC\
   LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LINK_SPEC,  \
   GNU_USER_TARGET_LINK_SPEC " " ANDROID_LINK_SPEC)
 
Index: gcc/gcc/config/mips/mips.h
===
--- gcc.orig/gcc/config/mips/mips.h 2015-09-18 00:39:02.218248501 +0100
+++ gcc/gcc/config/mips/mips.h  2015-09-18 02:04:47.233407839 +0100
@@ -1334,11 +1334,27 @@ FP_ASM_SPEC "\
 
 /* Extra switches sometimes passed to the linker.  */
 
+/* SUBTARGET_LINK_SPEC is always passed to the linker.  It may be
+   overridden by subtargets.  */
+
+#ifndef SUBTARGET_LINK_SPEC
+#define SUBTARGET_LINK_SPEC ""
+#endif
+
+/* LINK_SHARED_SPEC is usually set to `%{shared}', however some
+   subtargets may have other means to include this spec and wish
+   to override this macro.  Notice the leading space.  */
+
+#ifndef LINK_SHARED_SPEC
+#define LINK_SHARED_SPEC " %{shared}"
+#endif
+
 #ifndef LINK_SPEC
 #define LINK_SPEC "\
 %(endian_spec) \
-%{G*} %{mips1} %{mips2} %{mips3} %{mips4} %{mips32*} %{mips64*} \
-%{shared}"
+%{G*} %{mips1} %{mips2} %{mips3} %{mips4} %{mips32*} %{mips64*}" \
+LINK_SHARED_SPEC " \
+%(subtarget_link_spec)"
 #endif  /* LINK_SPEC defined */
 
 
@@ -1382,6 +1398,7 @@ FP_ASM_SPEC "\
   { "subtarget_cpp_spec", SUBTARGET_CPP_SPEC },
\
   { "subtarget_asm_debugging_spec", SUBTARGET_ASM_DEBUGGING_SPEC },\
   {

[PATCH RFC 0/2] GCC: MIPS IEEE Std 754 NaN interlinking support

2015-11-26 Thread Maciej W. Rozycki

Hi,

 This implements the GCC interface for IEEE Std 754 NaN interlinking, 
following the recommendations set out in the "MIPS ABI Extension for
IEEE Std 754 Non-Compliant Interlinking" document available here:
.

 Two patches comprise the change, of which one is a preparatory clean-up 
and the other one implements the feature added.

 These changes have passed manual testing by compiling and linking objects 
with different options relevant here, i.e. the `-mieee=' and 
`-mrelaxed-nan=' options added here and the `-mnan=' option already 
supported.  No full regression testing has been run.

 At this time this is an RFC only and therefore patches are posted despite 
stage 3.  These patches are not meant to be merged before the document 
referred above has been finalised and corresponding binutils patches this 
feature relies on committed.

  Maciej

[PATCH RFC 2/2] MIPS/GCC: IEEE Std 754 NaN interlinking support

2015-11-26 Thread Maciej W. Rozycki

Implement the GCC interface for IEEE Std 754 NaN interlinking, following 
"MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking" 
 and
the recommendations set out there as follows:

* implement driver and compiler command-line options to control the 
  compliance mode:

  -mieee=strict        -- to select the strict compliance mode enforcing
                          compliance checks down to the static link and
                          therefore suitable for use cases where IEEE
                          Std 754 compliance is critical,

  -mieee=relaxed-exec  -- to select the relaxed compliance mode for
                          executable links only, and the strict
                          compliance mode for other execution modes,
                          effectively producing objects and shared
                          libraries still usable with software which is
                          strictly compliant with IEEE Std 754,

  -mieee=relaxed       -- to select the relaxed compliance mode,

  -mieee=force-relaxed -- to select the relaxed compliance mode,
                          additionally suppressing warnings from
                          suspicious static links.

* implement corresponding configuration options to control the default 
  compliance mode:

  --with-ieee=strict   -- to select the strict compliance mode as 
  the default,

  --with-ieee=relaxed-exec -- to select the relaxed compliance mode for 
  executable links only as the default.

* implement compiler command-line options to control the NaN compliance 
  mode:

  -mrelaxed-nan=none  -- to output code suitable for use cases where 
 IEEE Std 754 compliance is critical and 
 therefore including `.ieee strict; .ieee warn' 
 in assembly produced,

  -mrelaxed-nan=exec  -- to output code suitable for use cases where 
 IEEE Std 754 compliance is not required, 
 however still usable with software which is 
 strictly compliant with IEEE Std 754, and 
 therefore including `.ieee strict; .ieee 
 nowarn' in assembly produced,

  -mrelaxed-nan=all   -- to output code suitable for use cases where 
 IEEE Std 754 compliance is never required and 
 therefore including `.ieee relaxed; .ieee 
 nowarn' in assembly produced,

  -mrelaxed-nan=force -- same as `-mrelaxed-nan=all' to simplify 
 handling in specs.
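
The mapping from the `-mrelaxed-nan=' settings listed above to the assembly directives can be summarised in a small sketch. The enum and function names here are hypothetical; only the directive strings are taken from the description above.

```cpp
#include <string>

// Hypothetical enumerators; the patch's own enum values may be named
// differently.
enum class ToyRelaxedNan { None, Exec, All, Force };

// Return the `.ieee' directives described for each setting.
std::string toy_ieee_directives(ToyRelaxedNan mode) {
  switch (mode) {
    case ToyRelaxedNan::None:
      return ".ieee strict; .ieee warn";
    case ToyRelaxedNan::Exec:
      return ".ieee strict; .ieee nowarn";
    case ToyRelaxedNan::All:
    case ToyRelaxedNan::Force:  // described as the same as `all'
      return ".ieee relaxed; .ieee nowarn";
  }
  return "";
}
```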

There is one issue with this change in that a `-mrelaxed-nan=' option 
supplied on a GCC command line does not override the default set with a 
`--with-ieee=' option.  This is because the driver stuffs the initial 
`-mieee=' option produced with OPTION_DEFAULT_SPECS *after* any 
`-mrelaxed-nan=' option provided by the user.  I take it as a driver bug
as user-supplied options are supposed to be placed last on the command 
line so as to allow any earlier or implied options to be overridden.
Therefore I am not going to address this problem with this change; the
user is advised to use `-mieee=' options wherever possible anyway.

gcc/
* flags.h (set_fast_math_flags): New prototype.
* opts.c (set_fast_math_flags): Export.
* common/config/mips/mips-common.c (mips_handle_option): Handle 
`-mieee=' options.
* config/mips/mips-opts.h (mips_relaxed_nan_setting): New enum.
(mips_ieee_setting): Likewise.
* config/mips/mips.h (OPTION_DEFAULT_SPECS): Add `ieee'.
(ASM_SPEC): Handle `-mieee=' and `-mrelaxed-nan=' options.
(LINK_SPEC): Likewise.
[!HAVE_AS_IEEE] (HAVE_AS_IEEE): New macro.
* config/mips/mips.c (mips_file_start): Handle `-mrelaxed-nan=' 
options.
* config/mips/mips.opt (mrelaxed-nan=, mieee=): New options.
* doc/install.texi (Configuration): Add `--with-ieee=' options.
* doc/invoke.texi (Option Summary) : Add 
`-mrelaxed-nan=' and `-mieee=' options.
(MIPS Options): Likewise.
* config.gcc : Handle `--with-ieee=' options.
(all_defaults): Add `ieee'.
* configure.ac : Check for GAS `-mieee=' support.
* config.in: Regenerate.
* configure: Regenerate.

gcc/testsuite/
* gcc.target/mips/ieee-force-relaxed.c: New test.
* gcc.target/mips/ieee-relaxed-exec.c: New test.
* gcc.target/mips/ieee-relaxed.c: New test.
* gcc.target/mips/ieee-strict.c: New test.
* gcc.target/mips/relaxed-nan-all.c: New test.
* gcc.target/mips/relaxed-nan-exec.c: New test.
* gcc.target/mips/relaxed-nan-force.c: New test.
* gcc.target/mips/relaxed-nan-none.c: New test.
---

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-26 Thread Kyrill Tkachov



On 26/11/15 14:23, Bernd Schmidt wrote:

On 11/26/2015 02:52 PM, Kyrill Tkachov wrote:


On 26/11/15 13:40, Bernd Schmidt wrote:

On 11/26/2015 12:12 PM, Kyrill Tkachov wrote:

  modified_in_b = emit_b != NULL_RTX && modified_in_p (orig_a,
emit_b);


Can this ever be true? We arrange for emit_b to set a new pseudo,
don't we? Are we allowing cases where we copy a pattern that sets more
than one register, and is that safe?


You're right, this statement always sets modified_in_b to false. We
reject anything but single_set insns
by this point in the code. I'll replace that with modified_in_b = false;


Note that there's a mirrored test for modified_in_a, and both are already 
initialized to false.


Yeah, that can be changed to just false too. I'll do that in the next revision.

Also - careful with single_set, it can return true even for multiple sets in case there's a REG_DEAD note on one of them. You might want to strengthen your tests to also include !multiple_sets. Then, maybe instead of deleting these tests, 
turn them into gcc_checking_asserts.




I see. I think the best place to do that would be in insn_valid_noce_process_p 
and just get it to return
false if multiple_sets (insn) is true.

Would it be ok if I did that as a separate follow-up patch?
We don't have a testcase where this actually causes trouble and I'd like to 
keep the fix for
this PR as self-contained as possible.

Thanks,
Kyrill





Bernd

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/68506: Fix emitting order of insns in IF-THEN-JOIN case

2015-11-26 Thread Bernd Schmidt


On 11/26/2015 03:35 PM, Kyrill Tkachov wrote:

Would it be ok if I did that as a separate follow-up patch?
We don't have a testcase where this actually causes trouble and I'd like
to keep the fix for
this PR as self-contained as possible.


Sure.


Bernd

Re: [PATCH] Improve verification of loop->latch in verify_loop_structure

2015-11-26 Thread Tom de Vries


On 26/11/15 13:15, Richard Biener wrote:

On Thu, 26 Nov 2015, Alan Lawrence wrote:


This caused an ICE compiling value.c from gdb on
aarch64-none-linux-gnu; the testcase, after preprocessing on aarch64,
ICEs on both aarch64 and x86_64, but is about 1MB - I'm working on
reducing that down to something small enough to post...

$ ./gcc/xgcc -B ./gcc -O2 -g value.c
../../binutils-gdb/gdb/value.c: In function ‘show_convenience’:
../../binutils-gdb/gdb/value.c:2615:1: error: loop 3’s latch is missing
../../binutils-gdb/gdb/value.c:2615:1: internal compiler error: in
verify_loop_structure, at cfgloop.c:1669
0x71e653 verify_loop_structure()
 /work/alalaw01/src2/gcc/gcc/cfgloop.c:1669
0x97c6ae checking_verify_loop_structure
 /work/alalaw01/src2/gcc/gcc/cfgloop.h:325
0x97c6ae loop_optimizer_init(unsigned int)
 /work/alalaw01/src2/gcc/gcc/loop-init.c:106
0x97c78a rtl_loop_init
 /work/alalaw01/src2/gcc/gcc/loop-init.c:398
0x97c78a execute
 /work/alalaw01/src2/gcc/gcc/loop-init.c:425


See also PR68549 for why I think this happens "by design".  Thus
I think we need to revert the checking when
LOOPS_MAY_HAVE_MULTIPLE_LATCHES for now.


I'll revert the whole patch for now.

Thanks,
- Tom

Improving the cxx0x_warning.h diagnostic

2015-11-26 Thread Jonathan Wakely


We have lots of headers that do this:

#if __cplusplus < 201103L
# include 
#else

and that file has a #error (not #warning as the name would suggest).

Unfortunately a #error does not stop compilation, so when users try to
compile C++11 source code (which includes standard headers) and they
don't use the right -std option, they are likely to get that #error
message, followed by a cascade of later errors due to the use of C++11
syntax or library types.

We could solve this!

--- a/libstdc++-v3/include/bits/c++0x_warning.h
+++ b/libstdc++-v3/include/bits/c++0x_warning.h
@@ -29,9 +29,11 @@
#define _CXX0X_WARNING_H 1

#if __cplusplus < 201103L
-#error This file requires compiler and library support for the \
-ISO C++ 2011 standard. This support is currently experimental, and must be \
-enabled with the -std=c++11 or -std=gnu++11 compiler options.
+#error This file requires compiler and library support for at least \
+the ISO C++ 2011 standard, which must be enabled with \
+the -std=c++11 or -std=gnu++11 compiler options.
+// Include a non-existent file to terminate compilation:
+#include <__no_such_header__>
#endif

#endif

When a header cannot be included we stop during preprocessing and
never even try to compile the C++11 code that follows, so the user
gets nothing more than:

In file included from /home/jwakely/gcc/6/include/c++/6.0.0/thread:35:0,
from th.cc:1:
/home/jwakely/gcc/6/include/c++/6.0.0/bits/c++0x_warning.h:32:2: error: #error 
This file requires compiler and library support for at least the ISO C++ 2011 
standard, which must be enabled with the -std=c++11 or -std=gnu++11 compiler 
options.
#error This file requires compiler and library support for at least \
 ^

/home/jwakely/gcc/6/include/c++/6.0.0/bits/c++0x_warning.h:36:30: fatal error: 
__no_such_header__: No such file or directory
#include <__no_such_header__>
 ^

compilation terminated.


I'm not very happy with the __no_such_header__ part, but we could
bikeshed a better name. The point is that the compilation stops
immediately, and the last errors printed are the ones about using the
wrong -std option.

Is this a good idea?

Too late for 6.0?

Re: [PATCH, PING*4] PR debug/53927: fix value for DW_AT_static_link

2015-11-26 Thread Eric Botcazou

> Here’s the updated patch. Regtested again on x86_64-linux.

Thanks, it looks good as far as I'm concerned (modulo the missing "in order" 
before "not to shift all other offsets" in a comment).

-- 
Eric Botcazou

Re: [PATCH, PING*4] PR debug/53927: fix value for DW_AT_static_link

2015-11-26 Thread Pierre-Marie de Rodat


On 11/26/2015 03:50 PM, Eric Botcazou wrote:

Here’s the updated patch. Regtested again on x86_64-linux.


Thanks, it looks good as far as I'm concerned (modulo the missing "in order"
before "not to shift all other offsets" in a comment).


Fixed and pushed. Thank you again for reviewing!

--
Pierre-Marie de Rodat

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Richard Sandiford

Bernd Schmidt  writes:
> On 11/25/2015 01:20 PM, Richard Sandiford wrote:
>> This series fixes PR 68432, a regression caused by my internal-functions-
>> for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
>> condition but conditionally FAIL if we're optimising for size:
>>
>>if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
>>&& !flag_trapping_math)
>>  {
>>if (TARGET_ROUND)
>>  emit_insn (gen_sse4_1_round2
>> (operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
>>else if (optimize_insn_for_size_p ())
>>  FAIL;
>>else
>>  ix86_expand_rint (operands[0], operands[1]);
>>  }
>
> How many such cases are there? Is it just the ix86 patterns?

Yeah, just x86 AFAICT.

> And, could the same effect be achieved by just moving the
> optimize_insn_for_size_p test into the predicate (as some existing
> patterns already do), and then testing the predicate while ensuring
> that optimize_insn_for_x returns the right value?

That would mean that the validity of a gimple call would depend on both
the target predicates and whether the block containing the statement
is optimised for size or speed.  So whenever we want to test whether
a gimple call is valid, we'd need to generate rtl for its arguments
and pass them to the target predicates.  We'd also need to be aware
that moving a call between blocks could make it invalid (because
we might be moving a call from a block optimised for speed to a block
optimised for size).  I don't think those are the kinds of thing that
gimple passes would normally expect.

It seems better to use FAILs and predicates for correctness only
and use other ways of representing size/speed decisions.  And since we
already have another way for rtl, it seems like a good idea to use it
for gimple too.

Thanks,
Richard

Re: [PATCH] libstdc++: Fix libstdc++/67440: pretty-printing of a const set fails

2015-11-26 Thread Jonathan Wakely


On 25/11/15 19:55 +, Jonathan Wakely wrote:

On 25 November 2015 at 17:29, Alan Lawrence  wrote:

On 16/11/15 21:04, Doug Evans wrote:


Hi.

Apologies for the delay.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67440

Tested with current trunk.

2015-11-16  Doug Evans  

 PR libstdc++/67440
 * python/libstdcxx/v6/printers.py (find_type): Handle "const" in
 type name.
 * testsuite/libstdc++-prettyprinters/debug.cc: Add test for
 const set.
 * testsuite/libstdc++-prettyprinters/simple.cc: Ditto.
 * testsuite/libstdc++-prettyprinters/simple11.cc: Ditto.



On gcc-5-branch, the debug.cc and simple.cc tests don't seem to compile, on
either x86_64-none-linux-gnu or aarch64-none-linux-gnu. I get errors like:

/work/alalaw01/src/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:
In function 'int main()':
/work/alalaw01/src/gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc:77:43:
error: in C++98 'const_intset' must be initialized by constructor, not by
'{...}'
   const std::set<int> const_intset = {2, 3};
   ^


Which should have failed to compile on trunk as well, but we're
missing a -std=gnu++98 in the simple.cc testcase, so on trunk it uses
the -std=gnu++14 default. I'll add -std=gnu++98 to the test.


I've committed this to trunk, and will apply it to gcc-5-branch after
I finish testing it on the branch.

commit 0ee15827a1132aeb960ff613ecafd8351a2535e0
Author: Jonathan Wakely 
Date:   Thu Nov 26 15:10:15 2015 +

Ensure pretty-printer test uses C++98 mode

	* testsuite/libstdc++-prettyprinters/simple.cc: Add -std=gnu++98 to
	dg-options and avoid use of uniform-init.

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
index 68c4d83..e1956bf 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
@@ -1,7 +1,7 @@
 // If you modify this, please update simple11.cc and debug.cc as well.
 
 // { dg-do run }
-// { dg-options "-g -O0" }
+// { dg-options "-g -O0 -std=gnu++98" }
 
 // Copyright (C) 2011-2015 Free Software Foundation, Inc.
 //
@@ -74,7 +74,10 @@ main()
 // { dg-final { note-test mpiter {{first = "zardoz", second = 23}} } }
 
   // PR 67440
-  const std::set<int> const_intset = {2, 3};
+  std::set<int> intset;
+  intset.insert(2);
+  intset.insert(3);
+  const std::set<int> const_intset = intset;
 // { dg-final { note-test const_intset {std::set with 2 elements = {[0] = 2, [1] = 3}} } }
 
   std::set sp;

[PATCH 1/7][ARM] Add support for ARMv8.1.

2015-11-26 Thread Matthew Wahab


Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch set adds the command line options and internal feature
macros. Following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.
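
For readers unfamiliar with the internal feature macros, the sketch below models a feature set held in two words, which is why the new ARMv8.1 bit can be (1 << 0) of a second word. Everything named Toy*/toy_* is hypothetical, and the first-word bit positions are made up for the example; only the idea of a two-word set and the second-word bit 0 come from the patch.

```cpp
#include <cstdint>

// Hypothetical two-word feature set; not the definition used by the port.
struct ToyFeatureSet {
  std::uint32_t word[2];
};

// Made-up first-word bits, for illustration only.
const std::uint32_t TOY_FL_ARCH8 = 1u << 0;
const std::uint32_t TOY_FL_CRC32 = 1u << 1;
// Second-word bit, using bit 0 as FL2_ARCH8_1 does in the patch.
const std::uint32_t TOY_FL2_ARCH8_1 = 1u << 0;

ToyFeatureSet toy_make(std::uint32_t low, std::uint32_t high) {
  ToyFeatureSet fs;
  fs.word[0] = low;
  fs.word[1] = high;
  return fs;
}

// A flag is present only if all of its bits are set in the right word.
bool toy_has_word1(const ToyFeatureSet &fs, std::uint32_t flags) {
  return (fs.word[0] & flags) == flags;
}

bool toy_has_word2(const ToyFeatureSet &fs, std::uint32_t flags) {
  return (fs.word[1] & flags) == flags;
}
```

In this model an "armv8-a" entry would be toy_make (TOY_FL_ARCH8, 0) and an "armv8.1-a" entry toy_make (TOY_FL_ARCH8, TOY_FL2_ARCH8_1); the two flags share a bit position but never collide because they live in different words.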

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Is this ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
* config/arm/arm-protos.h (FL2_ARCH8_1): New.
(FL2_FOR_ARCH8_1A): New.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch8_1): New.
(arm_option_override): Set arm_arch8_1.
* config/arm/arm.h (TARGET_NEON_RDMA): New.
(arm_arch8_1): Declare.
* doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
"armv8.1-a+crc".
(ARM Options, -mfpu): Fix a typo.
From 3ee3a16839c1c316906e33f5384da05ee70dd831 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 11:31:25 +0100
Subject: [PATCH 1/7] [ARM] Add ARMv8.1 architecture flags and options.

Change-Id: I6bb0c7f020613a1a17e40bccc28b00c30d644c70
---
 gcc/config/arm/arm-arches.def |  5 +
 gcc/config/arm/arm-protos.h   |  3 +++
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  |  4 
 gcc/config/arm/arm.h  |  6 ++
 gcc/doc/invoke.texi   |  6 +++---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..6c83153 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH ("armv8.1-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.1-a+crc",cortexa53, 8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e4b8fb3..c3eb6d3 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -388,6 +388,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -416,6 +418,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41..db17f6e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,16 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8.1-a) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(30)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e0cdc20..8cbf364 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -817,6 +817,9 @@ int arm_arch7em =

RE: [PATCH] MIPS/GCC/doc: Reorder `-mcompact-branches='

2015-11-26 Thread Moore, Catherine



> -Original Message-
> From: Maciej W. Rozycki [mailto:ma...@imgtec.com]
> Sent: Thursday, November 26, 2015 9:01 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Moore, Catherine; Matthew Fortune
> Subject: [PATCH] MIPS/GCC/doc: Reorder `-mcompact-branches='
> 
> Move the `-mcompact-branches=' option out of the middle of a block of
> floating-point options.  The option is not related to FP in any way.
> Place it immediately below other branch instruction selection options.
> 
>   gcc/
>   * doc/invoke.texi (Option Summary) : Reorder
>   `-mcompact-branches='.
>   (MIPS Options): Likewise.
> ---
> 
>  OK to apply?
> 
 Yes -- thanks.

[PATCH 2/7][ARM] Multilib support for ARMv8.1.

2015-11-26 Thread Matthew Wahab


This patch sets up multilib support for ARMv8.1, treating it as a
synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
instructions are only generated for the new, instruction-specific
intrinsics, mapping to ARMv8 rather than adding a new multilib variant
is sufficient.
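
The effect of the added matches can be pictured as a simple alias lookup. This is only a sketch of the idea, with a hypothetical toy_multilib_arch function; the real selection is done by the multilib machinery from the MULTILIB_MATCHES entries.

```cpp
#include <map>
#include <string>

// Hypothetical alias table mirroring the "Arch Matches" entries: each
// key is a -march value treated as a synonym for the mapped value.
std::string toy_multilib_arch(const std::string &march) {
  static const std::map<std::string, std::string> aliases = {
    {"armv8-a+crc", "armv8-a"},
    {"armv8.1-a", "armv8-a"},
    {"armv8.1-a+crc", "armv8-a"},
  };
  std::map<std::string, std::string>::const_iterator it = aliases.find(march);
  return it == aliases.end() ? march : it->second;
}
```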

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
matches for "armv8-a".

From 9cd389bf72cff391423e17423f4624904aff5474 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 23 Oct 2015 09:37:12 +0100
Subject: [PATCH 2/7] [ARM] Multilib support for ARMv8.1

Change-Id: I65ee77768e22452ac15452cf6d4fdec3079ef852
---
 gcc/config/arm/t-aprofile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index cf34161..b23f1bc 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -98,6 +98,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a+crc
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3
-- 
2.1.4

[PATCH 3/7][ARM] Add patterns for new instructions

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
introduced in the ARMv8.1 architecture. The instructions are made
available when -march=armv8.1-a is enabled with suitable fpu settings,
such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.
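
For reference, a scalar model of what one 16-bit lane of these instructions computes, as I understand the architecture's description: a doubled product is added to (or subtracted from) the accumulator scaled up by the element size, a rounding constant is added, and the high half is returned with signed saturation. The toy_* names are hypothetical and this is a sketch for orientation, not a statement of what the patterns expand to.

```cpp
#include <cstdint>

// Clamp a wide value to the signed 16-bit range.
std::int16_t toy_saturate16(std::int64_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return static_cast<std::int16_t>(v);
}

// Floor division by 2^16, written out so that the result does not
// depend on how right shifts of negative values behave.
std::int64_t toy_floor_div_2_16(std::int64_t v) {
  return v >= 0 ? v / 65536 : -((-v + 65535) / 65536);
}

// One lane of a vqrdmlah-style operation on 16-bit elements.
std::int16_t toy_vqrdmlah_s16(std::int16_t acc, std::int16_t a,
                              std::int16_t b) {
  std::int64_t product = static_cast<std::int64_t>(a) * b;
  std::int64_t sum =
      static_cast<std::int64_t>(acc) * 65536 + 2 * product + 32768;
  return toy_saturate16(toy_floor_div_2_16(sum));
}

// One lane of a vqrdmlsh-style operation: the doubled product is
// subtracted instead of added.
std::int16_t toy_vqrdmlsh_s16(std::int16_t acc, std::int16_t a,
                              std::int16_t b) {
  std::int64_t product = static_cast<std::int64_t>(a) * b;
  std::int64_t sum =
      static_cast<std::int64_t>(acc) * 65536 - 2 * product + 32768;
  return toy_saturate16(toy_floor_div_2_16(sum));
}
```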

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/iterators.md (VQRDMLH_AS): New.
(neon_rdma_as): New.
* config/arm/neon.md
(neon_vqrdmlh): New.
(neon_vqrdmlh_lane): New.
* config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
(UNSPEC_VQRDMLSH): New.

From fea646491d51548b775fdfb5a4fd6d6bc72d4c83 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 12:00:50 +0100
Subject: [PATCH 3/7] [ARM] Add patterns for new instructions.

Change-Id: Ia84c345019c7beda2d3c6c39074043d2e005347a
---
 gcc/config/arm/iterators.md |  5 +
 gcc/config/arm/neon.md  | 45 +
 gcc/config/arm/unspecs.md   |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..c7a6880 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -362,6 +362,8 @@
 (define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
UNSPEC_SHA1P])
 
+(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
 ;;
 ;; Mode attributes
 ;;
@@ -831,3 +833,6 @@
(simple_return " && use_simple_return_p ()")])
 (define_code_attr return_cond_true [(return " && USE_RETURN_INSN (TRUE)")
(simple_return " && use_simple_return_p ()")])
+
+;; Attributes for VQRDMLAH/VQRDMLSH
+(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..844ef5e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2014,6 +2014,18 @@
   [(set_attr "type" "neon_sat_mul_")]
 )
 
+;; vqrdmlah, vqrdmlsh
+(define_insn "neon_vqrdmlh"
+  [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
+	(unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "0")
+		   (match_operand:VMDQI 2 "s_register_operand" "w")
+		   (match_operand:VMDQI 3 "s_register_operand" "w")]
+		  VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+  "vqrdmlh.\t%0, %2, %3"
+  [(set_attr "type" "neon_sat_mla__long")]
+)
+
 (define_insn "neon_vqdmlal"
   [(set (match_operand: 0 "s_register_operand" "=w")
 (unspec: [(match_operand: 1 "s_register_operand" "0")
@@ -3176,6 +3188,39 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_sat_mul__scalar_q")]
 )
 
+;; vqrdmlah_lane, vqrdmlsh_lane
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMQI 0 "s_register_operand" "=w")
+	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "0")
+		  (match_operand:VMQI 2 "s_register_operand" "w")
+		  (match_operand: 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%q0, %q2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMDI 0 "s_register_operand" "=w")
+	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "0")
+		  (match_operand:VMDI 2 "s_register_operand" "w")
+		  (match_operand:VMDI 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%P0, %P2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "neon_vmla_lane"
   [(set (match_operand:VMD 0 "s_register_operand" "=w")
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 44d4e7d..e7ae9a2 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -360,5 +360,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VQRDMLAH
+  UNSPEC_VQRDMLSH
 ])
 
-- 
2.1.4

[PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as they are when
-march=armv8.1-a is enabled with suitable fpu options.
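
For illustration, user code could test the macro along the following
lines. This is only a minimal sketch: the helper name have_qrdmx is
made up for the example and is not part of the patch.

```c
/* Sketch: report at compile time whether the ARMv8.1 vqrdmlah and
   vqrdmlsh instructions are available.  have_qrdmx is a hypothetical
   helper name, not something the patch provides.  */
int
have_qrdmx (void)
{
#ifdef __ARM_FEATURE_QRDMX
  return 1;
#else
  return 0;
#endif
}
```

On a compiler or target that does not define the macro, the helper
simply returns 0, so callers can fall back to another code path.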

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_QRDMX.

From 4009cf5c0455429a415be9ca239ac09ac86b17dd Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

Change-Id: I26cde507e8844a731e4fd857fbd30bf87f213f89
---
 gcc/config/arm/arm-c.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c336a16..6bf740b 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -66,6 +66,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_SAT", TARGET_ARM_SAT);
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRYPTO", TARGET_CRYPTO);
 
+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
   if (unaligned_access)
 builtin_define ("__ARM_FEATURE_UNALIGNED");
   if (TARGET_CRC32)
-- 
2.1.4

[PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds ARMv8.1 support to GCC DejaGnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.

The main changes are
- add_options_for_arm_v8_1a_neon: Call
  check_effective_target_arm_v8_1a_neon_ok to select a suitable set of
  options.
- check_effective_target_arm_v8_1a_neon_ok: Test possible command line
  options, recording the first set that works.
- check_effective_target_arm_v8_1a_neon_hw: Add a test for ARM targets.
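
As a rough sketch of how a test might use these procedures, following
the usual DejaGnu naming convention for effective-target and
add-options directives (the function body is a made-up placeholder,
not a test from this series):

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_v8_1a_neon_ok } */
/* { dg-add-options arm_v8_1a_neon } */

/* Hypothetical placeholder body; a real test would exercise the
   ARMv8.1 instructions or intrinsics here.  */
int
dummy (int x)
{
  return x + 1;
}
```

The directives are ordinary C comments, so the file still compiles
outside the testsuite harness.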

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

testsuite/
2015-11-26  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

Re: [wwwdocs] Update C++ conformance status

2015-11-26 Thread Jonathan Wakely


On 21/11/15 16:54 +0100, Gerald Pfeifer wrote:

On Sat, 21 Nov 2015, Jonathan Wakely wrote:

I forgot to respond to this, and never committed the patch, sorry.

I've committed the changes to htdocs/projects/cxx0x.html now, but
not the htdocs/bugs/index.html change.


I wasn't opposed to the bugs/index.html change, mind.  Only
wondering about the 3.x info.


I agree that the 3.x info is not useful on that page. Maybe we should
just drop the whole "Common problems when upgrading the compiler"
section, because info on 3.x is outdated, nearly everybody understands
that C++ compilers conform to the standard these days (even MS got in
on that act eventually ;-) and the info about breaking the C++ ABI with
every major release is just wrong!


Version-specific changes like the ones described here, and how to
cope with them, are usually covered in gcc-*/porting_to.html these
days, perhaps we should add pointers from bugs.html?

I agree with your thoughts and went ahead and made a first set of
changes along these lines (patch below).


LGTM.


Absolutely go ahead and trim (or remove) this further.

There is also a section about "C++ non-bugs" where I am not sure
the current contents still makes a lot of sense?


Agreed, most of it isn't useful now (although some is).

I'll make another pass at it next week, thanks.

[PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah and vqrdmlsh forms of the intrinsics to
the arm_neon.h header, together with the ARM builtins used to implement
them. The intrinsics are available when -march=armv8.1-a is enabled
together with appropriate fpu options.
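
As an illustration only, the sketch below shows one of the new
intrinsics next to a scalar model of a single 16-bit element. The
model reflects my reading of the instruction (accumulate the rounded
high half of the doubled product, then saturate) and is an assumption,
not text from the patch. The names qrdmlah_s16_ref and
mla_rounding_high are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one 16-bit element of vqrdmlah (assumed semantics):
   acc plus the rounded high half of 2*a*b, saturated to int16_t.
   Assumes >> on a negative int64_t is an arithmetic shift.  */
int16_t
qrdmlah_s16_ref (int16_t acc, int16_t a, int16_t b)
{
  int64_t sum = (int64_t) acc * 65536 + 2 * (int64_t) a * b + 32768;
  int64_t res = sum >> 16;

  if (res > INT16_MAX)
    res = INT16_MAX;
  if (res < INT16_MIN)
    res = INT16_MIN;
  return (int16_t) res;
}

#ifdef __ARM_FEATURE_QRDMX
#include <arm_neon.h>

/* Only built when the feature macro is defined.  */
int16x4_t
mla_rounding_high (int16x4_t acc, int16x4_t a, int16x4_t b)
{
  return vqrdmlah_s16 (acc, a, b);
}
#endif
```

The scalar model builds on any host, which makes it convenient for
checking expected values when writing tests for the intrinsic.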

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
(vqrdmlahq_s16, vqrdmlahq_s32): New.
(vqrdmlsh_s16, vqrdmlsh_s32): New.
(vqrdmlshq_s16, vqrdmlshq_s32): New.
* config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".

From 93e9db5bf06172f18f4e89e9533c66d8a0c4f2ca Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:21:44 +0100
Subject: [PATCH 6/7] [ARM] Add neon intrinsics vqrdmlah, vqrdmlsh.

Change-Id: Ic40ff4d477f36ec01714c68e3b83b66208c7958b
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..b617f80 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1158,6 +1158,56 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlshv4si (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0b719df..8d5c0ca 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -45,6 +45,8 @@ VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
 VAR2 (TERNOP, vqdmlal, v4hi, v2si)
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR4 (TERNOP, vqrdmlah, v4hi, v2si, v8hi, v4si)
+VAR4 (TERNOP, vqrdmlsh, v4hi, v2si, v8hi, v4si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-- 
2.1.4

[PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-11-26 Thread Matthew Wahab


Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.
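
As a hedged illustration of the lane forms, the sketch below models
the 64-bit, 16-bit-element variant with plain arrays: every element of
the second operand is multiplied by one selected element of the third.
The per-element arithmetic is my assumption about the instruction
rather than something stated in the patch, and the names
qrdmlah_elem_ref, qrdmlah_lane_s16_ref and mla_by_lane1 are
hypothetical.

```c
#include <stdint.h>

/* Assumed per-element behaviour: acc plus the rounded high half of
   2*a*b, saturated.  Assumes arithmetic >> on negative values.  */
int16_t
qrdmlah_elem_ref (int16_t acc, int16_t a, int16_t b)
{
  int64_t sum = (int64_t) acc * 65536 + 2 * (int64_t) a * b + 32768;
  int64_t res = sum >> 16;

  if (res > INT16_MAX)
    res = INT16_MAX;
  if (res < INT16_MIN)
    res = INT16_MIN;
  return (int16_t) res;
}

/* Array model of the lane form: b[i] is multiplied by c[lane].  */
void
qrdmlah_lane_s16_ref (int16_t res[4], const int16_t acc[4],
		      const int16_t b[4], const int16_t c[4], int lane)
{
  for (int i = 0; i < 4; i++)
    res[i] = qrdmlah_elem_ref (acc[i], b[i], c[lane]);
}

#ifdef __ARM_FEATURE_QRDMX
#include <arm_neon.h>

/* The lane index must be a compile-time constant.  */
int16x4_t
mla_by_lane1 (int16x4_t acc, int16x4_t b, int16x4_t c)
{
  return vqrdmlah_lane_s16 (acc, b, c, 1);
}
#endif
```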

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

* config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
(vqrdmlahq_lane_s32): New.
(vqrdmlah_lane_s16): New.
(vqrdmlah_lane_s32): New.
(vqrdmlshq_lane_s16): New.
(vqrdmlshq_lane_s32): New.
(vqrdmlsh_lane_s16): New.
(vqrdmlsh_lane_s32): New.
* config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
"vqrdmlsh_lane".

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Bernd Schmidt


On 11/26/2015 04:13 PM, Richard Sandiford wrote:


That would mean that the validity of a gimple call would depend on both
the target predicates and whether the block containing the statement
is optimised for size or speed.  So whenever we want to test whether
a gimple call is valid, we'd need to generate rtl for its arguments
and pass them to the target predicates.  We'd also need to be aware
that moving a call between blocks could make it invalid (because
we might be moving a call from a block optimised for speed to a block
optimised for size).  I don't think those are the kinds of thing that
gimple passes would normally expect.


In your world, would we move such calls into places where we currently 
would reject expanding them? I.e., would the expanders no longer fail, 
even if the target does not want to expand something when optimizing for 
size?


The other question is, you mention the need to generate rtl. I don't 
quite see why - the predicates (or insn conditions, to follow the 
terminology in the manual) for a named pattern aren't allowed to look at 
operands. Surely these conditions are already taken into account in your 
internal_fn work?



It seems better to use FAILs and predicates for correctness only
and use other ways of representing size/speed decisions.  And since we
already have another way for rtl, it seems like a good idea to use it
for gimple too.


I'm looking for a minimal fix for gcc-6, and a 22-patch series that 
rewrites lots of target patterns isn't that. If someone else wants to 
approve it then fine, but I think this is not a good approach for now.


To avoid having to retest validity when moving an internal function, 
could you just make the availability test run the predicate with both 
for_speed and for_size options, and require that the pattern is valid 
for both? That should give you a definitive answer as to whether you can 
later expand the insn, and I'd call that good enough for now.
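
A rough sketch of that idea, purely for illustration: every name below
is hypothetical and none of it is GCC's actual interface. The
availability check asks the enabling condition under both settings and
only reports the pattern as usable when both agree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a pattern's enabling condition, evaluated
   for a block optimised for speed (true) or for size (false).  */
typedef bool (*pattern_enabled_fn) (bool for_speed);

/* Usable anywhere only if enabled for both speed and size, so moving
   the call between blocks cannot make it invalid.  */
bool
usable_in_any_block (pattern_enabled_fn enabled)
{
  return enabled (true) && enabled (false);
}

/* Two made-up example conditions.  */
bool
always_enabled (bool for_speed)
{
  (void) for_speed;
  return true;
}

bool
speed_only (bool for_speed)
{
  return for_speed;
}
```

With these examples, always_enabled is reported as usable and
speed_only is not, which is the conservative answer the suggestion
asks for.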


I wish we'd taken some more time to think through the consequences of 
the original internal_fn patchset.



Bernd
