Re: [PATCH] Fix PR83418

2017-12-15 Thread Richard Biener
On Thu, 14 Dec 2017, Richard Biener wrote:

> On December 14, 2017 4:43:42 PM GMT+01:00, Jeff Law  wrote:
> >On 12/14/2017 01:54 AM, Richard Biener wrote:
> >> 
> >> IVOPTs (at least) leaves unfolded stmts in the IL and VRP overzealously
> >> asserts they cannot happen.
> >> 
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >> 
> >> Richard.
> >> 
> >> 2017-12-14  Richard Biener  
> >> 
> >>PR tree-optimization/83418
> >>* vr-values.c (vr_values::extract_range_for_var_from_comparison_expr):
> >>Instead of asserting we don't get unfolded comparisons deal with
> >>them.
> >> 
> >>* gcc.dg/torture/pr83418.c: New testcase.
> >I think this also potentially affects dumping.  I've seen the dumper
> >crash trying to access an INTEGER_CST where we expected to find an
> >SSA_NAME while iterating over a statement's operands.
> >
> >I haven't submitted the workaround because I hadn't tracked down the
> >root cause to verify something deeper isn't wrong.
> 
> Yes, I've seen this as well; see my comment in the PR.  The issue is that DOM
> calls the VRP analysis (and dump) routines with out-of-date operands during
> optimize_stmt.

I had the following in my tree to allow dumping.

Richard.

Index: gcc/tree-ssa-dom.c
===================================================================
--- gcc/tree-ssa-dom.c  (revision 255640)
+++ gcc/tree-ssa-dom.c  (working copy)
@@ -2017,6 +2017,7 @@ dom_opt_dom_walker::optimize_stmt (basic
 undefined behavior that get diagnosed if they're left in the
 IL because we've attached range information to new
 SSA_NAMES.  */
+ update_stmt_if_modified (stmt);
  edge taken_edge = NULL;
  evrp_range_analyzer.vrp_visit_cond_stmt (as_a <gcond *> (stmt),
   &taken_edge);
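
A toy model may make the failure mode concrete.  It is not GCC code:
toy_stmt, toy_propagate_constant and toy_update_if_modified are names
invented for this sketch, and the single cached operand only stands in for
a statement's operand cache.  The point is just that an analysis reading
the cache between the rewrite and the refresh sees the old operand kind.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operand kinds, loosely mirroring SSA_NAME vs. INTEGER_CST.  */
enum toy_kind { TOY_SSA_NAME, TOY_INTEGER_CST };

struct toy_stmt
{
  enum toy_kind operand;         /* The operand really in the statement.  */
  enum toy_kind cached_operand;  /* What analyses and dumpers look at.  */
  bool modified;                 /* Set whenever OPERAND is rewritten.  */
};

/* Return true if the cache no longer describes the statement.  */
static bool
toy_cache_is_stale (const struct toy_stmt *s)
{
  return s->cached_operand != s->operand;
}

/* Replace the operand with a constant, as a propagation step might,
   without touching the cache.  */
static void
toy_propagate_constant (struct toy_stmt *s)
{
  s->operand = TOY_INTEGER_CST;
  s->modified = true;
}

/* Refresh the cache, but only if the statement was marked modified.  */
static void
toy_update_if_modified (struct toy_stmt *s)
{
  if (s->modified)
    {
      s->cached_operand = s->operand;
      s->modified = false;
    }
  /* Holds as long as every rewrite sets the modified flag.  */
  assert (!toy_cache_is_stale (s));
}
```

In this model, calling toy_update_if_modified before handing the statement
to an analysis plays the role that the added update_stmt_if_modified call
plays in the hunk above.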



Re: [SFN] Bootstrap broken

2017-12-15 Thread Jakub Jelinek
Hi!

I'll try to read it in more detail later today, but one thing I've noticed:

On Thu, Dec 14, 2017 at 11:51:29PM -0200, Alexandre Oliva wrote:
> @@ -5380,7 +5410,6 @@ verify_gimple_in_cfg (struct function *fn, bool verify_nothrow)
> err |= err2;
>   }
>  
> -  bool label_allowed = true;
>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>   {
> gimple *stmt = gsi_stmt (gsi);
> @@ -5397,19 +5426,6 @@ verify_gimple_in_cfg (struct function *fn, bool verify_nothrow)
> err2 = true;
>   }
>  
> -   /* Labels may be preceded only by debug markers, not debug bind
> -  or source bind or any other statements.  */
> -   if (gimple_code (stmt) == GIMPLE_LABEL)
> - {
> -   if (!label_allowed)
> - {
> -   error ("gimple label in the middle of a basic block");
> -   err2 = true;
> - }
> - }
> -   else if (!gimple_debug_begin_stmt_p (stmt))
> - label_allowed = false;
> -

Please don't revert the above 2 hunks.  Instead just remove the
> -   else if (!gimple_debug_begin_stmt_p (stmt))
> - label_allowed = false;
lines only from it and adjust the comment.  We want to verify there are
no statements before labels.

Jakub


Re: [PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)

2017-12-15 Thread Richard Biener
On Thu, 14 Dec 2017, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase FAILs -fcompare-debug, because one COND_EXPR
> branch from the FE during gimplifications is just >
> which doesn't have TREE_SIDE_EFFECTS, but for -gstatement-frontiers it
> is a STATEMENT_LIST which contains # DEBUG BEGIN_STMT and that  >.

Ugh...  the issue is that this difference might cause many other
-fcompare-debug issues, like when folding things?  Why is it a
STATEMENT_LIST rather than a COMPOUND_EXPR?

>  Neither # DEBUG BEGIN_STMT nor that NOP_EXPR have
> TREE_SIDE_EFFECTS, but STATEMENT_LIST has TREE_SIDE_EFFECTS already from
> make_node and the gimplifier (and apparently the C++ FE too) checks
> just that bit.  With { { { 0; } { 1; } { 2; } { 3; } } } one can end up
> with quite large STATEMENT_LIST subtrees which in reality still don't
> have any side-effects.
> Maintaining accurate TREE_SIDE_EFFECTS bit on STATEMENT_LIST would be hard,
> if we would only add and never remove statements, then we could just clear
> it during make_node and set it whenever adding TREE_SIDE_EFFECTS statement
> into it, but as soon as we sometimes remove from STATEMENT_LIST, or merge
> STATEMENT_LISTs etc., maintaining this is IMHO too expensive, especially
> when we usually just will not care about it.
> So, I think it is better to just compute this lazily in the few spots where
> we are interested in this, in real-world testcases most of the
> STATEMENT_LISTs will have side-effects and should find them pretty early
> when walking the tree.
> As a side-effect, this patch will handle those
> { { { 0; } { 1; } { 2; } { 3; } } } and similar then/else statement lists
> better.

I don't like this too much.  If at all, then we should do "real" lazy
computation, like adding a TREE_SIDE_EFFECTS_VALID flag on STATEMENT_LIST,
keeping TREE_SIDE_EFFECTS up-to-date when easily possible and when doing
the expensive thing cache the result.  That said, I'm not convinced
this will fix -fcompare-debug issues for good.  Is it really necessary
to introduce this IL difference so early and in such an intrusive way?

Can't we avoid adding # DEBUG BEGIN_STMT when there's not already
a STATEMENT_LIST around for example?

Thanks,
Richard.
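
The "real" lazy computation suggested above could look roughly like the
following toy sketch.  It is not GCC code: toy_node, its side_effects_valid
field and toy_side_effects_p are invented for illustration, and the sketch
deliberately leaves open how the valid bit would be cleared when a list
(or one of its sub-lists) changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_STMTS 8

/* A toy node: either a leaf statement or a list of child nodes.  */
struct toy_node
{
  bool is_list;
  bool side_effects;        /* Leaf: the real bit.  List: cached answer.  */
  bool side_effects_valid;  /* List only: can the cached answer be used?  */
  size_t n_children;
  struct toy_node *children[TOY_MAX_STMTS];
};

/* Return true if NODE has side effects.  Lists are walked only when no
   valid cached answer exists, and the result is remembered.  */
static bool
toy_side_effects_p (struct toy_node *node)
{
  if (node == NULL)
    return false;
  if (!node->is_list)
    return node->side_effects;

  assert (node->n_children <= TOY_MAX_STMTS);
  if (!node->side_effects_valid)
    {
      bool se = false;
      /* Stop at the first child that has side effects.  */
      for (size_t i = 0; i < node->n_children && !se; i++)
        se = toy_side_effects_p (node->children[i]);
      node->side_effects = se;
      node->side_effects_valid = true;
    }
  return node->side_effects;
}
```

A list node here starts out with side_effects set but not valid, mimicking
a freshly made STATEMENT_LIST; the first query replaces that pessimistic
default with the computed value.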

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-12-14  Jakub Jelinek  
> 
>   PR debug/83419
>   * tree.h (statement_with_side_effects_p): Declare.
>   * tree.c (statement_with_side_effects_p): New function.
>   * gimplify.c (shortcut_cond_expr, gimplify_cond_expr): Use it.
> 
>   * cp-gimplify.c (genericize_if_stmt, gimplify_expr_stmt,
>   cp_fold): Use statement_with_side_effects_p instead of
>   just TREE_SIDE_EFFECTS.
> 
>   * gcc.dg/pr83419.c: New test.
> 
> --- gcc/tree.h.jj 2017-12-12 09:48:15.000000000 +0100
> +++ gcc/tree.h 2017-12-14 13:22:34.157858781 +0100
> @@ -4780,6 +4780,7 @@ extern tree obj_type_ref_class (const_tr
>  extern bool types_same_for_odr (const_tree type1, const_tree type2,
>   bool strict=false);
>  extern bool contains_bitfld_component_ref_p (const_tree);
> +extern bool statement_with_side_effects_p (tree);
>  extern bool block_may_fallthru (const_tree);
>  extern void using_eh_for_cleanups (void);
>  extern bool using_eh_for_cleanups_p (void);
> --- gcc/tree.c.jj 2017-12-12 09:48:15.000000000 +0100
> +++ gcc/tree.c 2017-12-14 13:21:25.857752480 +0100
> @@ -12296,6 +12296,26 @@ contains_bitfld_component_ref_p (const_t
>return false;
>  }
>  
> +/* Return true if STMT has side-effects.  This is like
> +   TREE_SIDE_EFFECTS (stmt), except it returns false for NULL and if STMT
> +   is a STATEMENT_LIST, it recurses on the statements.  */
> +
> +bool
> +statement_with_side_effects_p (tree stmt)
> +{
> +  if (stmt == NULL_TREE)
> +return false;
> +  if (TREE_CODE (stmt) != STATEMENT_LIST)
> +return TREE_SIDE_EFFECTS (stmt);
> +
> +  for (tree_stmt_iterator i = tsi_start (stmt);
> +   !tsi_end_p (i); tsi_next (&i))
> +if (statement_with_side_effects_p (tsi_stmt (i)))
> +  return true;
> +
> +  return false;
> +}
> +
>  /* Try to determine whether a TRY_CATCH expression can fall through.
> This is a subroutine of block_may_fallthru.  */
>  
> --- gcc/gimplify.c.jj 2017-12-14 11:53:34.907142223 +0100
> +++ gcc/gimplify.c 2017-12-14 13:18:19.261184074 +0100
> @@ -3637,8 +3637,8 @@ shortcut_cond_expr (tree expr)
>tree *true_label_p;
>tree *false_label_p;
>bool emit_end, emit_false, jump_over_else;
> -  bool then_se = then_ && TREE_SIDE_EFFECTS (then_);
> -  bool else_se = else_ && TREE_SIDE_EFFECTS (else_);
> +  bool then_se = statement_with_side_effects_p (then_);
> +  bool else_se = statement_with_side_effects_p (else_);
>  
>/* First do simple transformations.  */
>if (!else_se)
> @@ -3656,7 +3656,7 @@ shortcut_cond_expr (tree expr)
> if (rexpr_has_location (pred))
>   SET_EXPR_LOCATION (expr, rexpr_location (pred));
> then_

Re: [PATCH] Fix (-A) - B -> (-B) - A optimization in fold_binary_loc (PR tree-optimization/83269)

2017-12-15 Thread Richard Biener
On Thu, 14 Dec 2017, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcase shows, the (-A) - B -> (-B) - A optimization can't
> be done the way it is if the negation of A is performed in type with
> wrapping behavior while the subtraction is done in signed type (with the
> same precision), as if A is (unsigned) INT_MIN, then (int) -(unsigned) INT_MIN
> is INT_MIN and INT_MIN - B is different from (-B) - INT_MIN.
> The reason we can see this is because we check that arg0 is NEGATE_EXPR, but
> arg0 is STRIP_NOPS from op0.  If the NEGATE_EXPR is already done in signed
> type, then it would be already UB if A was INT_MIN and so we can safely do
> it.
> 
> Whether we perform the subtraction in the unsigned type or just don't
> optimize I think doesn't matter that much, at least the only spot during
> x86_64-linux and i686-linux bootstraps/regtests this new condition triggered
> was the new testcase, nothing else.  So if you instead prefer to punt, I can
> tweak the patch, move the negated condition to the if above it.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I think a better fix would be to just check TREE_CODE (op0) == NEGATE_EXPR
and use op0, like we do for op1 (probably fixed that earlier).  I'd rather
not complicate the fold-const.c code more at this point.

Richard.
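
The numbers behind the quoted description can be checked with a small
sketch.  It assumes a 32-bit int, and the helper names (toy_original,
toy_swapped, toy_swapped_unsigned, toy_fits_int) are invented here; the
signed results are computed in long long so the helpers themselves never
overflow.

```c
#include <assert.h>
#include <limits.h>

/* Value of (-A) - B with the subtraction done in int.  NEG_A is the
   first operand after negation and conversion to int.  */
static long long
toy_original (int neg_a, int b)
{
  return (long long) neg_a - b;
}

/* Value of (-B) - A after the swap, with A already converted to int.  */
static long long
toy_swapped (int a, int b)
{
  return -(long long) b - a;
}

/* The same swap carried out entirely in unsigned arithmetic, which
   wraps instead of overflowing.  */
static unsigned int
toy_swapped_unsigned (unsigned int a, int b)
{
  /* The worked numbers in the text assume a 32-bit unsigned int.  */
  assert (sizeof (unsigned int) * CHAR_BIT == 32);
  return (0u - (unsigned int) b) - a;
}

/* Return nonzero if V is representable as an int.  */
static int
toy_fits_int (long long v)
{
  return v >= INT_MIN && v <= INT_MAX;
}
```

With A = (unsigned) INT_MIN and B = -0x7fffffff, toy_original (INT_MIN, B)
is -1 and fits in int, toy_swapped (INT_MIN, B) is 4294967295 and does
not, and toy_swapped_unsigned (0x80000000u, B) is 0xffffffff, i.e. the bit
pattern of -1 again.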

> 2017-12-14  Jakub Jelinek  
> 
>   PR tree-optimization/83269
>   * fold-const.c (fold_binary_loc): Perform (-A) - B -> (-B) - A
>   subtraction in arg0's type if type is signed and arg0 is unsigned.
>   Formatting fix.
> 
>   * gcc.c-torture/execute/pr83269.c: New test.
> 
> --- gcc/fold-const.c.jj   2017-12-08 00:50:27.000000000 +0100
> +++ gcc/fold-const.c  2017-12-14 17:42:31.221398170 +0100
> @@ -9098,8 +9098,8 @@ expr_not_equal_to (tree t, const wide_in
> return NULL_TREE.  */
>  
>  tree
> -fold_binary_loc (location_t loc,
> -  enum tree_code code, tree type, tree op0, tree op1)
> +fold_binary_loc (location_t loc, enum tree_code code, tree type,
> +  tree op0, tree op1)
>  {
>enum tree_code_class kind = TREE_CODE_CLASS (code);
>tree arg0, arg1, tem;
> @@ -9770,10 +9770,34 @@ fold_binary_loc (location_t loc,
>/* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
>if (TREE_CODE (arg0) == NEGATE_EXPR
> && negate_expr_p (op1))
> - return fold_build2_loc (loc, MINUS_EXPR, type,
> - negate_expr (op1),
> - fold_convert_loc (loc, type,
> -   TREE_OPERAND (arg0, 0)));
> + {
> +   /* If arg0 is e.g. unsigned int and type is int, then we need to
> +  perform the subtraction in arg0's type, because if A is
> +  INT_MIN at runtime, the original expression can be well defined
> +  while the latter is not.  See PR83269.  */
> +   if (ANY_INTEGRAL_TYPE_P (type)
> +   && TYPE_OVERFLOW_UNDEFINED (type)
> +   && ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> +   && !TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0)))
> + {
> +   /* Don't do this when sanitizing, as by doing the subtraction
> +  in unsigned type we won't notice if the original program
> +  has been buggy.  */
> +   if (!TYPE_OVERFLOW_SANITIZED (type))
> + {
> +   tem = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (arg0),
> +  fold_convert_loc (loc,
> +TREE_TYPE (arg0),
> +negate_expr (op1)),
> +  TREE_OPERAND (arg0, 0));
> +   return fold_convert_loc (loc, type, tem);
> + }
> + }
> +   else
> + return fold_build2_loc (loc, MINUS_EXPR, type, negate_expr (op1),
> + fold_convert_loc (loc, type,
> +   TREE_OPERAND (arg0, 0)));
> + }
>  
>/* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
>__complex__ ( x, -y ).  This is not the same for SNaNs or if
> --- gcc/testsuite/gcc.c-torture/execute/pr83269.c.jj  2017-12-14 17:43:24.534710997 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr83269.c 2017-12-14 17:43:10.000000000 +0100
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/83269 */
> +
> +int
> +main ()
> +{
> +#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ > 4 && __CHAR_BIT__ == 8
> +  volatile unsigned char a = 1;
> +  long long b = 0x80000000L;
> +  int c = -((int)(-b) - (-0x7fffffff * a));
> +  if (c != 1)
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Richard Biener
On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor  wrote:
> On 12/14/2017 03:43 AM, Richard Biener wrote:
>>
>> On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:
>>>
>>> On 12/12/2017 05:35 PM, Jeff Law wrote:


>>>> On 12/12/2017 01:15 PM, Martin Sebor wrote:
>>>>>
>>>>>
>>>>> Bug 83373 - False positive reported by -Wstringop-overflow, is
>>>>> another example of warning triggered by a missed optimization
>>>>> opportunity, this time in the strlen pass.  The optimization
>>>>> is discussed in pr78450 - strlen(s) return value can be assumed
>>>>> to be less than the size of s.  The gist of it is that the result
>>>>> of strlen(array) can be assumed to be less than the size of
>>>>> the array (except in the corner case of last struct members).
>>>>>
>>>>> To avoid the false positive the attached patch adds this
>>>>> optimization to the strlen pass.  Although the patch passes
>>>>> bootstrap and regression tests for all front-ends I'm not sure
>>>>> the way it determines the upper bound of the range is 100%
>>>>> correct for languages with arrays with a non-zero lower bound.
>>>>> Maybe it's just not as tight as it could be.
>>>>
>>>>
>>>> What about something hideous like
>>>>
>>>> struct fu {
>>>>   char x1[10];
>>>>   char x2[10];
>>>>   int avoid_trailing_array;
>>>> }
>>>>
>>>> Where objects stored in x1 are not null terminated.  Are we in the realm
>>>> of undefined behavior at that point (I hope so)?
>>>
>>>
>>>
>>> Yes, this is undefined.  Pointer arithmetic (either direct or
>>> via standard library functions) is only defined for pointers
>>> to the same object or subobject.  So even something like
>>>
>>>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>>>
>>> is undefined.
>>
>>
>> There's nothing undefined here - computing the pointer pointing
>> to one-after-the-last element of an array is valid (you are just
>> not allowed to dereference it).
>
>
> Right, and memcpy dereferences it, so it's undefined.

That's an interpretation of the standard that I don't share.

Also, if I have struct f { int i; int j; }; and an int * that points
to the j member, you say I have no standard-conforming way
to get at a pointer to the i member from this, right?  Because
the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe
(int *)((char *)p - offsetof (struct f, j))
should be valid?  This means that pfu->x1 + 10 is a valid pointer
into *pfu no matter what you say and you can dereference it.

Richard.

> Martin
>
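
For reference, the offsetof expression from the reply above can be written
out as a complete helper.  This is only a sketch: toy_i_from_j is a name
invented here, and whether the construct is strictly conforming is exactly
what is being disputed in this thread, so the code shows what the
expression computes rather than settling that question.

```c
#include <assert.h>
#include <stddef.h>

struct f { int i; int j; };

/* Given a pointer to the j member of some struct f, step back by the
   offset of j to reach the start of the enclosing object, which is also
   where its i member lives.  */
static int *
toy_i_from_j (int *pj)
{
  assert (pj != NULL);
  return (int *) ((char *) pj - offsetof (struct f, j));
}
```

This is the same pattern as the widely used container_of idiom: for a
struct f x, toy_i_from_j (&x.j) yields &x.i.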


Re: [PATCH] Fix -fcompare-debug due to DEBUG_BEGIN_STMTs (PR debug/83419)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 09:34:44AM +0100, Richard Biener wrote:
> Ugh...  the issue is that this difference might cause many other
> -fcompare-debug issues, like when folding things?  Why is it a
> STATEMENT_LIST rather than a COMPOUND_EXPR?

I believe most of the other foldings don't use TREE_SIDE_EFFECTS on whole
statements, just on expressions.  The possible exception is
STATEMENT_EXPRESSIONs.  As for COMPOUND_EXPR, you mean use it only if
we actually don't create a STATEMENT_LIST for other reasons?
Don't we optimize away the COMPOUND_EXPR lhs if it doesn't have
TREE_SIDE_EFFECTS?  And we'd need the COMPOUND_EXPR to have no
TREE_SIDE_EFFECTS as a whole.

> >  Neither # DEBUG BEGIN_STMT nor that NOP_EXPR have
> > TREE_SIDE_EFFECTS, but STATEMENT_LIST has TREE_SIDE_EFFECTS already from
> > make_node and the gimplifier (and apparently the C++ FE too) checks
> > just that bit.  With { { { 0; } { 1; } { 2; } { 3; } } } one can end up
> > with quite large STATEMENT_LIST subtrees which in reality still don't
> > have any side-effects.
> > Maintaining accurate TREE_SIDE_EFFECTS bit on STATEMENT_LIST would be hard,
> > if we would only add and never remove statements, then we could just clear
> > it during make_node and set it whenever adding TREE_SIDE_EFFECTS statement
> > into it, but as soon as we sometimes remove from STATEMENT_LIST, or merge
> > STATEMENT_LISTs etc., maintaining this is IMHO too expensive, especially
> > when we usually just will not care about it.
> > So, I think it is better to just compute this lazily in the few spots where
> > we are interested in this, in real-world testcases most of the
> > STATEMENT_LISTs will have side-effects and should find them pretty early
> > when walking the tree.
> > As a side-effect, this patch will handle those
> > { { { 0; } { 1; } { 2; } { 3; } } } and similar then/else statement lists
> > better.
> 
> I don't like this too much.  If at all, then we should do "real" lazy
> computation, like adding a TREE_SIDE_EFFECTS_VALID flag on STATEMENT_LIST,
> keeping TREE_SIDE_EFFECTS up-to-date when easily possible and when doing

How would that be possible?  I have 3 nested STATEMENT_LISTs, and remove the
only statement with TREE_SIDE_EFFECTS from the innermost one.  I can clear
TREE_SIDE_EFFECTS_VALID from that STATEMENT_LIST easily, but what would fix
up the 2 parent ones?

> the expensive thing cache the result.  That said, I'm not convinced
> this will fix -fcompare-debug issues for good.  Is it really necessary
> to introduce this IL difference so early and in such an intrusive way?
> 
> Can't we avoid adding # DEBUG BEGIN_STMT when there's not already
> a STATEMENT_LIST around for example?

I'll defer that to Alex.  Or we could surely just unset TREE_SIDE_EFFECTS
when parsing a STATEMENT_LIST that contains just a single !TREE_SIDE_EFFECTS
statement other than the # DEBUG BEGIN_STMT markers.  The real question is
what we do without -g when removing stuff from the STATEMENT_LISTs.  Do we
fold those into the only statement if we end up with just one, or optimize
away completely if it contains none, or do we just keep around
STATEMENT_LIST containing just the 0; or nothing at all?
If that is the case and whether there is a STATEMENT_LIST or not depends
purely on whether we've ever created one, then perhaps clearing the
TREE_SIDE_EFFECTS during parsing, or starting with clear TREE_SIDE_EFFECTS
in make_node for STATEMENT_LIST and updating it on additions to the
STATEMENT_LIST would do the trick.

Jakub
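
The last alternative (start with the bit clear and set it on additions)
can be sketched with a toy list.  Nothing here is GCC code: toy_list,
toy_list_init and toy_list_append are invented names, statements are
reduced to a single "has side effects" flag, and removal is not modelled,
which is the case the message above identifies as the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_STMTS 8

struct toy_list
{
  bool side_effects;   /* Starts clear in this model.  */
  size_t n;
  bool stmt_side_effects[TOY_MAX_STMTS];
};

/* Create an empty list with the side-effects bit clear.  */
static void
toy_list_init (struct toy_list *l)
{
  l->side_effects = false;
  l->n = 0;
}

/* Append a statement; the list gains side effects as soon as any
   appended statement has them.  */
static void
toy_list_append (struct toy_list *l, bool stmt_side_effects)
{
  assert (l->n < TOY_MAX_STMTS);
  l->stmt_side_effects[l->n++] = stmt_side_effects;
  if (stmt_side_effects)
    l->side_effects = true;
}
```

In this model a list holding only a debug marker and a "0;" statement
(both appended with false) keeps the bit clear, so it would be treated the
same with and without the marker.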


Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
 wrote:
> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
> isn't needed with the new VECTOR_CST layout.  It's really just the
> original patch with bits removed, but just in case:
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?

To keep things simple at this point OK.  Note that I'd eventually
like to see this as VEC_PERM_EXPR .
For reductions when we need { x, 0, ... } we now have to use a
VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
to merge it with {0, ... }, right?  Rather than VEC_PERM_EXPR 

Thanks,
Richard.
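
The two-step construction of { x, 0, ... } mentioned above can be modelled
on plain arrays.  This is only a sketch with a fixed element count:
TOY_NELTS, toy_vec_duplicate and toy_vec_perm are invented names, and the
permute helper assumes the usual convention that selector values below the
element count pick from the first input and the rest pick from the second.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NELTS 4

/* Every element of OUT equals X, the documented meaning of
   VEC_DUPLICATE_EXPR.  */
static void
toy_vec_duplicate (int out[TOY_NELTS], int x)
{
  for (size_t i = 0; i < TOY_NELTS; i++)
    out[i] = x;
}

/* Two-input permute: selector values 0..N-1 pick from A and values
   N..2N-1 pick from B.  */
static void
toy_vec_perm (int out[TOY_NELTS], const int a[TOY_NELTS],
              const int b[TOY_NELTS], const size_t sel[TOY_NELTS])
{
  for (size_t i = 0; i < TOY_NELTS; i++)
    {
      assert (sel[i] < 2 * TOY_NELTS);
      out[i] = sel[i] < TOY_NELTS ? a[sel[i]] : b[sel[i] - TOY_NELTS];
    }
}
```

Duplicating x and then permuting the result with an all-zero vector using
the selector { 0, 4, 5, 6 } gives { x, 0, 0, 0 } in this model.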

> Richard
>
>
> 2017-12-15  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * doc/generic.texi (VEC_DUPLICATE_EXPR): Document.
> (VEC_COND_EXPR): Add missing @tindex.
> * doc/md.texi (vec_duplicate@var{m}): Document.
> * tree.def (VEC_DUPLICATE_EXPR): New tree codes.
> * tree.c (build_vector_from_val): Add stubbed-out handling of
> variable-length vectors, using VEC_DUPLICATE_EXPR.
> (uniform_vector_p): Handle VEC_DUPLICATE_EXPR.
> * cfgexpand.c (expand_debug_expr): Likewise.
> * tree-cfg.c (verify_gimple_assign_unary): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
> * fold-const.c (const_unop): Fold VEC_DUPLICATE_EXPRs of a constant.
> (test_vec_duplicate_folding): New function.
> (fold_const_c_tests): Call it.
> * optabs.def (vec_duplicate_optab): New optab.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
> * optabs.h (expand_vector_broadcast): Declare.
> * optabs.c (expand_vector_broadcast): Make non-static.  Try using
> vec_duplicate_optab.
> * expr.c (store_constructor): Try using vec_duplicate_optab for
> uniform vectors.
> (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/doc/generic.texi 2017-12-15 00:24:47.498459276 +0000
> @@ -1768,6 +1768,7 @@ a value from @code{enum annot_expr_kind}
>
>  @node Vectors
>  @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1779,9 +1780,14 @@ a value from @code{enum annot_expr_kind}
>  @tindex VEC_PACK_TRUNC_EXPR
>  @tindex VEC_PACK_SAT_EXPR
>  @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
>  @tindex SAD_EXPR
>
>  @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/doc/md.texi 2017-12-15 00:24:47.499459075 +0000
> @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
>  the vector mode @var{m}, or a vector mode with the same element mode and
>  smaller number of elements.
>
> +@cindex @code{vec_duplicate@var{m}} instruction pattern
> +@item @samp{vec_duplicate@var{m}}
> +Initialize vector output operand 0 so that each element has the value given
> +by scalar input operand 1.  The vector has mode @var{m} and the scalar has
> +the mode appropriate for one element of @var{m}.
> +
> +This pattern only handles duplicates of non-constant inputs.  Constant
> +vectors go through the @code{mov@var{m}} pattern instead.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree.def 2017-12-15 00:24:47.505457868 +0000
> @@ -537,6 +537,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
> 1 and 2 are NULL.  The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0.  */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
> vector operands.
>
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree.c  2017-12-15 00:24

Re: [PATCH] Fix (-A) - B -> (-B) - A optimization in fold_binary_loc (PR tree-optimization/83269)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 09:38:52AM +0100, Richard Biener wrote:
> On Thu, 14 Dec 2017, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > As the following testcase shows, the (-A) - B -> (-B) - A optimization can't
> > be done the way it is if the negation of A is performed in type with
> > wrapping behavior while the subtraction is done in signed type (with the
> > same precision), as if A is (unsigned) INT_MIN, then (int) -(unsigned) INT_MIN
> > is INT_MIN and INT_MIN - B is different from (-B) - INT_MIN.
> > The reason we can see this is because we check that arg0 is NEGATE_EXPR, but
> > arg0 is STRIP_NOPS from op0.  If the NEGATE_EXPR is already done in signed
> > type, then it would be already UB if A was INT_MIN and so we can safely do
> > it.
> > 
> > Whether we perform the subtraction in the unsigned type or just don't
> > optimize I think doesn't matter that much, at least the only spot during
> > x86_64-linux and i686-linux bootstraps/regtests this new condition triggered
> > was the new testcase, nothing else.  So if you instead prefer to punt, I can
> > tweak the patch, move the negated condition to the if above it.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> I think a better fix would be to just check TREE_CODE (op0) == NEGATE_EXPR
> and use op0, like we do for op1 (probably fixed that earlier).  I'd rather
> not complicate the fold-const.c code more at this point.

That would regress the case when type is unsigned.  If you don't want to
complicate fold-const.c, my preference would be to add the extra && !, it
isn't that much.

Of course, a question is why this optimization hasn't been moved to match.pd
when others had been.

2017-12-15  Jakub Jelinek  

PR tree-optimization/83269
* fold-const.c (fold_binary_loc): Perform (-A) - B -> (-B) - A
subtraction in arg0's type if type is signed and arg0 is unsigned.
Formatting fix.

* gcc.c-torture/execute/pr83269.c: New test.

--- gcc/fold-const.c.jj 2017-12-08 00:50:27.000000000 +0100
+++ gcc/fold-const.c 2017-12-14 17:42:31.221398170 +0100
@@ -9098,8 +9098,8 @@ expr_not_equal_to (tree t, const wide_in
return NULL_TREE.  */
 
 tree
-fold_binary_loc (location_t loc,
-enum tree_code code, tree type, tree op0, tree op1)
+fold_binary_loc (location_t loc, enum tree_code code, tree type,
+tree op0, tree op1)
 {
   enum tree_code_class kind = TREE_CODE_CLASS (code);
   tree arg0, arg1, tem;
@@ -9769,11 +9769,18 @@ fold_binary_loc (location_t loc,
 
   /* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
   if (TREE_CODE (arg0) == NEGATE_EXPR
- && negate_expr_p (op1))
-   return fold_build2_loc (loc, MINUS_EXPR, type,
-   negate_expr (op1),
-   fold_convert_loc (loc, type,
- TREE_OPERAND (arg0, 0)));
+ && negate_expr_p (op1)
+ /* If arg0 is e.g. unsigned int and type is int, then this could
+introduce UB, because if A is INT_MIN at runtime, the original
+expression can be well defined while the latter is not.
+See PR83269.  */
+ && !(ANY_INTEGRAL_TYPE_P (type)
+  && TYPE_OVERFLOW_UNDEFINED (type)
+  && ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
+  && !TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))))
+   return fold_build2_loc (loc, MINUS_EXPR, type, negate_expr (op1),
+   fold_convert_loc (loc, type,
+ TREE_OPERAND (arg0, 0)));
 
   /* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
 __complex__ ( x, -y ).  This is not the same for SNaNs or if
--- gcc/testsuite/gcc.c-torture/execute/pr83269.c.jj 2017-12-14 17:43:24.534710997 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr83269.c   2017-12-14 17:43:10.000000000 +0100
@@ -0,0 +1,14 @@
+/* PR tree-optimization/83269 */
+
+int
+main ()
+{
+#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ > 4 && __CHAR_BIT__ == 8
+  volatile unsigned char a = 1;
+  long long b = 0x80000000L;
+  int c = -((int)(-b) - (-0x7fffffff * a));
+  if (c != 1)
+__builtin_abort ();
+#endif
+  return 0;
+}


Jakub


Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 1:34 AM, Richard Sandiford
 wrote:
> Similarly to the update 05 patch, this patch just adds VEC_SERIES_EXPR,
> since the VEC_SERIES_CST isn't needed with the new VECTOR_CST layout.
> build_vec_series now uses the new VECTOR_CST layout, but otherwise
> this is just the original patch with bits removed.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?

Given we need to use VEC_DUPLICATE + VEC_PERM for {x, 0... }(?)
how about doing VEC_DUPLICATE and PLUS for this one?  Or is 'step'
allowed to be non-constant?  It seems to be.

Ah well.

OK.

Thanks,
Richard.
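
The alternative floated above (a duplicate of the base plus a ramp) can be
compared with the direct definition on plain arrays.  This is a toy sketch
for integer elements only: TOY_NELTS, toy_vec_series and toy_dup_plus_ramp
are invented names, and the ramp { 0, step, 2*step, ... } is computed at
run time here, whereas the question in the message is whether it could be
a constant vector.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NELTS 4

/* Direct definition: element I is BASE + I * STEP.  */
static void
toy_vec_series (int out[TOY_NELTS], int base, int step)
{
  assert (out != NULL);
  for (size_t i = 0; i < TOY_NELTS; i++)
    out[i] = base + (int) i * step;
}

/* The same vector built as duplicate (BASE) + series (0, STEP).  */
static void
toy_dup_plus_ramp (int out[TOY_NELTS], int base, int step)
{
  int dup[TOY_NELTS], ramp[TOY_NELTS];

  for (size_t i = 0; i < TOY_NELTS; i++)
    dup[i] = base;
  toy_vec_series (ramp, 0, step);
  for (size_t i = 0; i < TOY_NELTS; i++)
    out[i] = dup[i] + ramp[i];
}
```

Both helpers produce the same elements as long as no intermediate value
overflows; for base 3 and step 2 that is { 3, 5, 7, 9 }.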

> Richard
>
>
> 2017-12-15  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * doc/generic.texi (VEC_SERIES_EXPR): Document.
> * doc/md.texi (vec_series@var{m}): Document.
> * tree.def (VEC_SERIES_EXPR): New tree code.
> * tree.h (build_vec_series): Declare.
> * tree.c (build_vec_series): New function.
> * cfgexpand.c (expand_debug_expr): Handle VEC_SERIES_EXPR.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * expr.c (expand_expr_real_2): Likewise.
> * optabs-tree.c (optab_for_tree_code): Likewise.
> * tree-cfg.c (verify_gimple_assign_binary): Likewise.
> * fold-const.c (const_binop): Fold VEC_SERIES_EXPRs of constants.
> * expmed.c (make_tree): Handle VEC_SERIES.
> * optabs.def (vec_series_optab): New optab.
> * optabs.h (expand_vec_series_expr): Declare.
> * optabs.c (expand_vec_series_expr): New function.
> * tree-vect-generic.c (expand_vector_operations_1): Check that
> the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-12-15 00:30:46.596993903 +0000
> +++ gcc/doc/generic.texi 2017-12-15 00:30:46.911991495 +0000
> @@ -1769,6 +1769,7 @@ a value from @code{enum annot_expr_kind}
>  @node Vectors
>  @subsection Vectors
>  @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1788,6 +1789,14 @@ a value from @code{enum annot_expr_kind}
>  This node has a single operand and represents a vector in which every
>  element is equal to that operand.
>
> +@item VEC_SERIES_EXPR
> +This node represents a vector formed from a scalar base and step,
> +given as the first and second operands respectively.  Element @var{i}
> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
> +
> +This node is restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2017-12-15 00:30:46.596993903 +0000
> +++ gcc/doc/md.texi 2017-12-15 00:30:46.912991487 +0000
> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{vec_series@var{m}} instruction pattern
> +@item @samp{vec_series@var{m}}
> +Initialize vector output operand 0 so that element @var{i} is equal to
> +operand 1 plus @var{i} times operand 2.  In other words, create a linear
> +series whose base value is operand 1 and whose step is operand 2.
> +
> +The vector output has mode @var{m} and the scalar inputs have the mode
> +appropriate for one element of @var{m}.  This pattern is not used for
> +floating-point vectors, in order to avoid having to specify the
> +rounding behavior for @var{i} > 1.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree.def 2017-12-15 00:30:46.919991433 +0000
> @@ -540,6 +540,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
>  /* Represents a vector in which every element is equal to operand 0.  */
>  DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>
> +/* Vector series created from a start (base) value and a step.
> +
> +   A = VEC_SERIES_EXPR (B, C)
> +
> +   means
> +
> +   for (i = 0; i < N; i++)
> + A[i] = B + C * i;  */
> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
> vector operands.
>
> Index: gcc/tree.h
> =

Re: [14/nn] Add helpers for shift count modes

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 1:48 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>>>  wrote:
>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>>  wrote:
>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>> of a scalar shift amount, given the mode of the values
>>>>>> being shifted.
>>>>>>
>>>>>> One long-standing problem has been to decide what this mode
>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>> the corresponding target pattern says?  (In which case what
>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>
>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>> routine is used more often than it is in this patch.  As it
>>>>>> stands the patch does not change the generated code.
>>>>>>
>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>> for constant shift amounts, again given the mode of the value
>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>
>>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>> looks good to me I'd rather have that ??? in this function (and
>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>
>>> OK.  I'd gone for word_mode because that's what expand_binop uses
>>> for CONST_INTs:
>>>
>>>   op1_mode = (GET_MODE (op1) != VOIDmode
>>>   ? as_a  (GET_MODE (op1))
>>>   : word_mode);
>>>
>>> But using the inner mode should be fine too.  The patch below does that.
>>>
>>>>> In the end it's up to insn recognizing to convert the op to the
>>>>> expected mode and for generic RTL it's us that should decide
>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>> integer so why not simply use a mode that is large enough to
>>>>> make the constant fit?
>>>
>>> ...but I can do that instead if you think it's better.
>>>
>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>> expertise.
>>>>
>>>> To add a little bit - shift amounts is maybe the only(?) place
>>>> where a modeless CONST_INT makes sense!  So "fixing"
>>>> that first sounds backwards.
>>>
>>> But even here they have a mode conceptually, since out-of-range shift
>>> amounts are target-defined rather than undefined.  E.g. if the target
>>> interprets the shift amount as unsigned, then for a shift amount
>>> (const_int -1) it matters whether the mode is QImode (and so we're
>>> shifting by 255) or HImode (and so we're shifting by 65535).
>>
>> I think RTL is well-defined (at least I hope so ...) and machine constraints
>> need to be modeled explicitly (like embedding an implicit bit_and in
>> shift patterns).
>
> Well, RTL is well-defined in the sense that if you have
>
>   (ashift X (foo:HI ...))
>
> then the shift amount must be interpreted as HImode rather than some
> other mode.  The problem here is to define a default choice of mode for
> const_ints, in cases where the shift is being created out of the blue.
>
> Whether the shift amount is effectively signed or unsigned isn't defined
> by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
> out-of-range values, and the behaviour for out-of-range RTL shifts is
> specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.
>
> I think the revised patch does implement your suggestion of using the
> integer equivalent of the inner mode as the default, but we need to
> decide whether to go with it, go with the original word_mode approach
> (taken from existing expand_binop code) or something else.  Something
> else could include the widest supported integer mode, so that we never
> change the value.

I guess it's pretty arbitrary what we choose (but we might need to adjust
targets?).  For something like this an appealing choice would be something
that is host and target independent, like [u]int32_t or, given that
CONST_INT is always 64 bits and signed now, int64_t aka HOST_WIDE_INT (bad
name now).  That means it's the "infinite precision" thing that fits
into CONST_INT ;)

Richard.

> Thanks,
> Richard
>
>>> OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
>>>
>>> Jeff Law  writes:
>>>> On 10/26/2017 06:06 AM, Richard Biener wrote:
>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>>  wrote:
>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>> of a scalar shift amount, given the mode of the values
>>>>>> being shifted.
>>>>>>
>>>>>> One long-standing problem has been to decide what this mode
>>>>>>
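
The QImode versus HImode point made in this thread (a count of -1 read as 255 or as 65535) can be checked with a few lines of C.  This is a rough sketch only: shift_count_as_unsigned is a made-up helper, and it assumes a target that interprets the count as unsigned in a mode of the given width.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a CONST_INT-style shift count when the target treats the
   count as unsigned in a mode of BITS bits (8 for QImode, 16 for
   HImode).  Hypothetical helper, for illustration only.  */
static uint64_t
shift_count_as_unsigned (int64_t count, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  uint64_t mask = bits == 64 ? UINT64_MAX : (((uint64_t) 1 << bits) - 1);
  return (uint64_t) count & mask;
}
```

With count -1 the helper returns 255 for an 8-bit mode and 65535 for a 16-bit mode, so the same modeless constant names two different shifts.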

Re: [001/nnn] poly_int: add poly-int.h

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 4:40 AM, Martin Sebor  wrote:
> On 12/07/2017 03:48 PM, Jeff Law wrote:
>>
>> On 12/07/2017 03:38 PM, Richard Sandiford wrote:
>>
>>>> So I think that's the final ack on this series.
>>>
>>>
>>> Thanks to both of you, really appreciate it!
>>
>> Sorry it took so long.
>>
>>>
>>>> Richard S. can you confirm?  I fully expect the trunk has moved some
>>>> and the patches will need adjustments -- consider adjustments which
>>>> work in a manner similar to the patches to date pre-approved.
>>>
>>>
>>> Yeah, that's now all of the poly_int patches.  I still owe you replies
>>> to some of them -- I'll get to that as soon as I can.
>>
>> NP.  I don't think any of the questions were all that significant.
>> Those which were I think you already responded to.
>
>
> I am disappointed that the no-op ctor issue hasn't been adequately
> addressed.  No numbers were presented as to the difference it makes
> to have the ctor do the expected thing (i.e., initialize the object).
> In my view, the choice seems arbitrarily in favor of a hypothetical
> performance improvement at -O0 without regard to the impact on
> correctness.  We have recently seen the adverse effects of similar
> choices in other areas: the hash table insertion[*] and the related
> offset_int initialization.

As we're coming from a C code base, not initializing stuff is what I expect.
I'm really surprised to see a lot of default initialization done in places
where it only hurts compile-time (of GCC at least where we need to
optimize that away).

Richard.

> Martin
>
> [*] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82977
>
> PS To be clear, the numbers I asked for were those showing
> the difference between a no-op ctor and one that initializes
> the object to some determinate state, whatever that is.  IIUC
> the numbers in the following post show the aggregate slowdown
> for many or most of the changes in the series, not just
> the ctor.  If the numbers were significant, I suggested
> a solution to explicitly request a no-op ctor to make
> the default safe and eliminate the overhead where it mattered.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01028.html


Patch ping^2

2017-12-15 Thread Jakub Jelinek
Hi!

I'd like to ping a bunch of patches:

http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02521.html
  PR c++/83205 - diagnose invalid std::tuple_size::value for structured
  bindings; the follow-up with plural spelling is approved already

http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02629.html
  PR c++/83217 - handle references to non-complete type in structured
  bindings

http://gcc.gnu.org/ml/gcc-patches/2017-12/msg00184.html
  PR c++/81197 - fix ICE with structured bindings and extend_ref_init_temps

http://gcc.gnu.org/ml/gcc-patches/2017-12/msg00391.html
  PR c++/83300 - RFC fix for C++ late attribute handling

http://gcc.gnu.org/ml/gcc-patches/2017-12/msg00507.html
  PR c++/80135, c++/81922 - fix ICEs with flexible array member initialization
  in nested contexts

http://gcc.gnu.org/ml/gcc-patches/2017-12/msg00395.html
  PR sanitizer/81281 - further improvements for match.pd
   (T)(P + A) - (T)(P + B) -> (T)A - (T)B optimization

http://gcc.gnu.org/ml/gcc-patches/2017-12/msg00346.html
  PR target/41455, target/82935 - use tail calls in aggregate copy or clear
  expanded as memcpy or memset call if possible

Jakub


Re: [PATCH] Fix (-A) - B -> (-B) - A optimization in fold_binary_loc (PR tree-optimization/83269)

2017-12-15 Thread Richard Biener
On Fri, 15 Dec 2017, Jakub Jelinek wrote:

> On Fri, Dec 15, 2017 at 09:38:52AM +0100, Richard Biener wrote:
> > On Thu, 14 Dec 2017, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > As the following testcase shows, the (-A) - B -> (-B) - A optimization
> > > can't be done the way it is if the negation of A is performed in type
> > > with wrapping behavior while the subtraction is done in signed type
> > > (with the same precision), as if A is (unsigned) INT_MIN, then
> > > (int) -(unsigned) INT_MIN is INT_MIN and INT_MIN - B is different from
> > > (-B) - INT_MIN.
> > > The reason we can see this is because we check that arg0 is NEGATE_EXPR,
> > > but arg0 is STRIP_NOPS from op0.  If the NEGATE_EXPR is already done in
> > > signed type, then it would be already UB if A was INT_MIN and so we can
> > > safely do it.
> > > 
> > > Whether we perform the subtraction in the unsigned type or just don't
> > > optimize I think doesn't matter that much, at least the only spot during
> > > x86_64-linux and i686-linux bootstraps/regtests this new condition
> > > triggered was the new testcase, nothing else.  So if you instead prefer
> > > to punt, I can tweak the patch, move the negated condition to the if
> > > above it.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > I think a better fix would be to just check TREE_CODE (op0) == NEGATE_EXPR
> > and use op0, like we do for op1 (probably fixed that earlier).  I'd rather
> > not complicate the fold-const.c code more at this point.
> 
> That would regress the case when type is unsigned.  If you don't want to
> complicate fold-const.c, my preference would be to add the extra && !, it
> isn't that much.

Ok, that works for me.

> Of course, a question is why this optimization hasn't been moved to match.pd
> when others had been.

Mostly laziness and the "fear" of match.pd negate_expr_p not being
powerful enough (it isn't recursive like the fold-const.c one and it
doesn't have all the complicated cases).

Thanks,
Richard.

> 2017-12-15  Jakub Jelinek  
> 
>   PR tree-optimization/83269
>   * fold-const.c (fold_binary_loc): Perform (-A) - B -> (-B) - A
>   subtraction in arg0's type if type is signed and arg0 is unsigned.
>   Formatting fix.
> 
>   * gcc.c-torture/execute/pr83269.c: New test.
> 
> --- gcc/fold-const.c.jj   2017-12-08 00:50:27.0 +0100
> +++ gcc/fold-const.c  2017-12-14 17:42:31.221398170 +0100
> @@ -9098,8 +9098,8 @@ expr_not_equal_to (tree t, const wide_in
> return NULL_TREE.  */
>  
>  tree
> -fold_binary_loc (location_t loc,
> -  enum tree_code code, tree type, tree op0, tree op1)
> +fold_binary_loc (location_t loc, enum tree_code code, tree type,
> +  tree op0, tree op1)
>  {
>enum tree_code_class kind = TREE_CODE_CLASS (code);
>tree arg0, arg1, tem;
> @@ -9769,11 +9769,18 @@ fold_binary_loc (location_t loc,
>  
>/* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
>if (TREE_CODE (arg0) == NEGATE_EXPR
> -   && negate_expr_p (op1))
> - return fold_build2_loc (loc, MINUS_EXPR, type,
> - negate_expr (op1),
> - fold_convert_loc (loc, type,
> -   TREE_OPERAND (arg0, 0)));
> +   && negate_expr_p (op1)
> +   /* If arg0 is e.g. unsigned int and type is int, then this could
> +  introduce UB, because if A is INT_MIN at runtime, the original
> +  expression can be well defined while the latter is not.
> +  See PR83269.  */
> +   && !(ANY_INTEGRAL_TYPE_P (type)
> +&& TYPE_OVERFLOW_UNDEFINED (type)
> +&& ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> +        && !TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))))
> + return fold_build2_loc (loc, MINUS_EXPR, type, negate_expr (op1),
> + fold_convert_loc (loc, type,
> +   TREE_OPERAND (arg0, 0)));
>  
>/* Fold __complex__ ( x, 0 ) - __complex__ ( 0, y ) to
>__complex__ ( x, -y ).  This is not the same for SNaNs or if
> --- gcc/testsuite/gcc.c-torture/execute/pr83269.c.jj  2017-12-14 17:43:24.534710997 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr83269.c 2017-12-14 17:43:10.0 +0100
> @@ -0,0 +1,14 @@
> +/* PR tree-optimization/83269 */
> +
> +int
> +main ()
> +{
> +#if __SIZEOF_INT__ == 4 && __SIZEOF_LONG_LONG__ > 4 && __CHAR_BIT__ == 8
> +  volatile unsigned char a = 1;
> +  long long b = 0x8000L;
> +  int c = -((int)(-b) - (-0x7fff * a));
> +  if (c != 1)
> +__builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
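
To make the INT_MIN argument from this thread concrete, here is a small model of both forms evaluated exactly in 64 bits.  It is a sketch under assumptions, not code from the patch: it assumes a 32-bit int, and the helper names (as_int32, fits_int32, original_form, swapped_form) are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit unsigned value as two's complement, computed in
   64 bits so nothing relies on implementation-defined conversions.  */
static int64_t
as_int32 (uint32_t u)
{
  return u > (uint32_t) INT32_MAX ? (int64_t) u - 4294967296LL : (int64_t) u;
}

static int
fits_int32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* (int) (-A) - B, with the negation done in the wrapping unsigned type.  */
static int64_t
original_form (uint32_t a, int32_t b)
{
  uint32_t neg_a = (uint32_t) (0u - a);	/* wraps, well defined */
  return as_int32 (neg_a) - (int64_t) b;
}

/* (-B) - (int) A, the swapped form, evaluated exactly in 64 bits.  */
static int64_t
swapped_form (uint32_t a, int32_t b)
{
  return -(int64_t) b - as_int32 (a);
}

/* A = (unsigned) INT_MIN, B = -1: the original stays in range, while
   the swapped form would overflow a 32-bit signed subtraction.  */
void
pr83269_selfcheck (void)
{
  assert (fits_int32 (original_form (0x80000000u, -1)));
  assert (!fits_int32 (swapped_form (0x80000000u, -1)));
}
```

For ordinary inputs such as A = 5, B = 3 both forms give -8, which is why the fold is fine whenever the negation already happens in the signed type.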


Re: [PATCH] make canonicalize_condition keep its promise

2017-12-15 Thread Segher Boessenkool
On Thu, Dec 14, 2017 at 01:43:35PM -0700, Jeff Law wrote:
> On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> >   There is no existing loop structure. This starts with a memcmp() call
> > and then goes down through the builtin expansion mechanism, which is
> > ultimately expanding the pattern cmpmemsi which is where my code is
> > generating a loop that finishes with bdnzt. The code that's ultimately
> > generated looks like this:
> Understood.  But what I still struggle with is how you're getting into
> check_simple_exit to begin with and whether or not that should be happening.
> 
> The only way to get into check_simple_exit is via find_simple_exit which
> is only called from get_simple_loop_desc.
> 
> And if you're calling get_simple_loop_desc, then there is some kind of
> loop structure in place AFAICT that contains this insn which is rather
> surprising.

Why?  It *is* a loop!

Or are you wondering why loop-iv.c is involved?  get_simple_loop_desc
in there is also called from much later in RTL passes.

> > I really think the ultimate problem here is that both
> > canonicalize_condition and get_condition promise in their documenting
> > comments that they will return something that has a cond at the root of
> > the rtx, or 0 if they don't understand what they're given. In this case
> > they do not understand the rtx of bdnzt and are returning rtx rooted
> > with an and, not a cond. This may seem like papering over the problem,
> > but I think it is legitimate for these functions to return 0 when the
> > branch insn in question does not have a simple cond at the heart of it.
> > And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> > Ultimately, yes something better ought to be done here.
> 
> 
> 
> Your pattern has the form:
> 
>   [(set (pc)
>   (if_then_else
> (and
>(ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
>(const_int 1))
>(match_operator 3 "branch_comparison_operator"
> [(match_operand 4 "cc_reg_operand" "y,y,y,y")
>  (const_int 0)]))
> (label_ref (match_operand 0))
> (pc)))
>(set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
>   (plus:P (match_dup 1)
>   (const_int -1)))
>(clobber (match_scratch:P 5 "=X,X,&r,r"))
>(clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>(clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> 
> 
> 
> That's a form that get_condition knows how to parse.  It's going to pull
> out the condition which looks like this:
> 
> 
> (and
>(ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
>(const_int 1))
>(match_operator 3 "branch_comparison_operator"
> [(match_operand 4 "cc_reg_operand" "y,y,y,y")
>  (const_int 0)]))
> 
> And pass that down to canonicalize_condition.  That doesn't look like
> something canonicalize_condition should handle and thus it ought to be
> returning NULL_RTX.

Yes exactly.

> However, I'm still concerned about how we got to a point where this is
> happening.  So while we can fix canonicalize_condition to reject this
> form (and you can argue we should and I'd generally agree with you), it
> could well be papering over a problem earlier.

canonicalize_condition does not do what its documentation says it does.
Fixing that is not papering over a problem.  Of course there could be a
problem elsewhere, sure.  But *this* problem is blocking Aaron's other
patches right now (which are approved and ready to go in).


Segher


[PATCH] RL78 pragma address

2017-12-15 Thread Sebastian Perta
Hello 

The following patch adds a new pragma, "pragma address", for RL78.
The patch also updates extend.texi and adds a test case to the regression
suite.  For the test case, I checked that the test is getting picked up in
gcc.log.  Unfortunately, for the .texi part I don't know where to look or
what to do to get the documentation generated.
This is similar to the pragma address implemented for M32C.

Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rl78-sim

Please let me know if this is OK, Thank you!
Sebastian

Index: ChangeLog
===
--- ChangeLog   (revision 255643)
+++ ChangeLog   (working copy)
@@ -1,3 +1,19 @@
+2017-12-14  Sebastian Perta  
+
+   * config/rl78/rl78.c (rl78_get_pragma_address): New function
+   * config/rl78/rl78.c (rl78_note_pragma_address): New function 
+   * config/rl78/rl78.c (rl78_output_aligned_common): use .set instead 
+   of .comm for pragma address variables
+   * config/rl78/rl78.c (rl78_insert_attributes): make pragma address 
+   variables volatile
+   * config/rl78/rl78-c.c (rl78_pragma_address): New function
+   * config/rl78/rl78-c.c (rl78_register_pragmas): registered the new 
+   pragma address
+   * config/rl78/rl78-protos.h: New declaration rl78_note_pragma_address
+   * doc/extend.texi: Added documentation for RL78 pragmas
+   * testsuite/gcc.target/rl78/test_pragma_address.c: New file
+   
+   
 2017-12-14  Andreas Schwab  
 
PR bootstrap/83396
Index: config/rl78/rl78-c.c
===
--- config/rl78/rl78-c.c(revision 255643)
+++ config/rl78/rl78-c.c(working copy)
@@ -23,7 +23,42 @@
 #include "coretypes.h"
 #include "tm.h"
 #include "c-family/c-common.h"
+#include "c-family/c-pragma.h"
+#include "rl78-protos.h"
 
+/* Implements the "pragma ADDRESS" pragma.  This pragma takes a
+   variable name and an address, and arranges for that variable to be
+   "at" that address.  The variable is also made volatile.  */
+static void
+rl78_pragma_address (cpp_reader * reader ATTRIBUTE_UNUSED)
+{
+  /* on off */
+  tree var, addr;
+  enum cpp_ttype type;
+
+  type = pragma_lex (&var);
+  if (type == CPP_NAME)
+{
+  type = pragma_lex (&addr);
+  if (type == CPP_NUMBER)
+   {
+ if (var != error_mark_node)
+   {
+ unsigned uaddr = tree_to_uhwi (addr);
+ rl78_note_pragma_address (IDENTIFIER_POINTER (var), uaddr);
+   }
+
+ type = pragma_lex (&var);
+ if (type != CPP_EOF)
+   {
+ error ("junk at end of #pragma ADDRESS");
+   }
+ return;
+   }
+}
+  error ("malformed #pragma ADDRESS variable address");
+}
+
 /* Implements REGISTER_TARGET_PRAGMAS.  */
 void
 rl78_register_pragmas (void)
@@ -30,4 +65,7 @@
 {
   c_register_addr_space ("__near", ADDR_SPACE_NEAR);
   c_register_addr_space ("__far", ADDR_SPACE_FAR);
+  
+  c_register_pragma (NULL, "ADDRESS", rl78_pragma_address);
+  c_register_pragma (NULL, "address", rl78_pragma_address);
 }
Index: config/rl78/rl78-protos.h
===
--- config/rl78/rl78-protos.h   (revision 255643)
+++ config/rl78/rl78-protos.h   (working copy)
@@ -52,6 +52,7 @@
 int            rl78_sfr_p (rtx x);
 void           rl78_output_aligned_common (FILE *, tree, const char *,
                                            int, int, int);
+void           rl78_note_pragma_address (const char *varname, unsigned address);
 
 int            rl78_one_far_p (rtx *operands, int num_operands);
 
Index: config/rl78/rl78.c
===
--- config/rl78/rl78.c  (revision 255643)
+++ config/rl78/rl78.c  (working copy)
@@ -4565,6 +4565,30 @@
   fputs (str2, file);
 }
 
+struct GTY(()) pragma_entry {
+  const char *varname;
+  unsigned address;
+};
+typedef struct pragma_entry pragma_entry;
+
+/* Hash table of pragma info.  */
+static GTY(()) hash_map *pragma_htab;
+
+static bool
+rl78_get_pragma_address (const char *varname, unsigned *address)
+{
+  if (!pragma_htab)
+return false;
+
+  unsigned int *slot = pragma_htab->get (varname);
+  if (slot)
+{
+  *address = *slot;
+  return true;
+}
+  return false;
+}
+
 void
 rl78_output_aligned_common (FILE *stream,
tree decl ATTRIBUTE_UNUSED,
@@ -4571,6 +4595,7 @@
const char *name,
int size, int align, int global)
 {
+  unsigned int address;
   /* We intentionally don't use rl78_section_tag() here.  */
   if (name[0] == '@' && name[2] == '.')
 {
@@ -4609,14 +4634,34 @@
   assemble_name (stream, name);
   fprintf (stream, "\n");
 }
-  fprintf (stream, "\t.comm\t");
-  assemble_name (stream, name);
-  fprintf (stream, ",%u,%u\n", size, align / BITS_PE

Re: [PATCH] Further improvements for the (T)(P+A)-(T)(P+B) folding (PR sanitizer/81281)

2017-12-15 Thread Richard Biener
On Thu, 7 Dec 2017, Jakub Jelinek wrote:

> Hi!
> 
> When committing the previous PR81281 patch, I've removed all the @@0 cases
> on plus:c, used @0 instead, to make sure we don't regress.
> 
> This patch readds those where possible.  For the cases where there is
> just P and A, it was mostly a matter of @@0 and convert? instead of convert
> plus using type from @1 instead of @0, though if @0 is INTEGER_CST, what we
> usually end up with is a (plus (convert (plus @1 @0) @2) where @2 negated
> is equal to @0, so the patch adds a simplification for that too.
> 
> For the case with P, A and B, the patch limits it to the case where either
> both A and B are narrower or both are wider.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Comments below.

> 2017-12-07  Jakub Jelinek  
> 
>   PR sanitizer/81281
>   * match.pd ((T)(P + A) - (T)P -> (T) A): Use @@0 instead of @0 and
>   convert? on @0 instead of convert.  Check type of @1, not @0.
>   Add a simplify for (T)(P + A) + Q where -Q is equal to P.
>   ((T)P - (T)(P + A) -> -(T) A): Use @@0 instead of @0 and
>   convert? on @0 instead of convert.  Check type of @1, not @0.
>   ((T)(P + A) - (T)(P + B) -> (T)A - (T)B): Use @@0 instead of @0,
>   only optimize if either both @1 and @2 types are narrower
>   precision, or both are wider or equal precision, and in the former
>   case only if both have undefined overflow.
> 
>   * gcc.dg/pr81281-3.c: New test.
> 
> --- gcc/match.pd.jj   2017-12-07 14:00:51.083048186 +0100
> +++ gcc/match.pd  2017-12-07 15:17:49.132784931 +0100
> @@ -1784,8 +1784,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  
>/* (T)(P + A) - (T)P -> (T) A */
>(simplify
> -   (minus (convert (plus:c @0 @1))
> -(convert @0))
> +   (minus (convert (plus:c @@0 @1))
> +(convert? @0))
> (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
>   /* For integer types, if A has a smaller type
>  than T the result depends on the possible
> @@ -1794,10 +1794,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  However, if an overflow in P + A would cause
>  undefined behavior, we can assume that there
>  is no overflow.  */
> - || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> -         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))))
> + || (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1))))

Given @1 and @@0 are in the same plus this change isn't technically
necessary but it makes it clearer which type we look at (thus ok).

>  (convert @1)))
>(simplify
> +   (plus (convert (plus @1 INTEGER_CST@0)) INTEGER_CST@2)
> +   (with { bool overflow;
> +wide_int w = wi::neg (wi::to_wide (@2), &overflow); }
> +    (if (wi::to_widest (@0) == widest_int::from (w, TYPE_SIGN (TREE_TYPE (@2)))
> +      && (!overflow
> +          || (INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +              && TYPE_UNSIGNED (TREE_TYPE (@2))))
> +  && (element_precision (type) <= element_precision (TREE_TYPE (@1))
> +  /* For integer types, if A has a smaller type
> + than T the result depends on the possible
> + overflow in P + A.
> + E.g. T=size_t, A=(unsigned)429497295, P>0.
> + However, if an overflow in P + A would cause
> + undefined behavior, we can assume that there
> + is no overflow.  */
> +  || (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +              && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)))))

I think we don't need to worry about definedness of overflow.  All
that matters is whether twos complement arithmetic will simplify
the expression to (convert @1).  Specifically the possible overflow
of the negation of @2 for the case element_precision (type) <= 
element_precision (TREE_TYPE (@1)) shouldn't matter, likewise
for the widening case (we'd never get the equality).

Don't we want to compare @0 and -@2 in the type of @2?  Like
for (unsigned int)(unsigned-long-x + 0x10005) + -5U which
we should be able to simplify?  For the widening case that would
work as well as far as I can see?

If you can split out this new pattern the rest is ok with honoring
the comment below.

> +     (convert @1))))
> +  (simplify
> (minus (convert (pointer_plus @@0 @1))
>  (convert @0))
> (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
> @@ -1818,8 +1837,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  
>/* (T)P - (T)(P + A) -> -(T) A */
>(simplify
> -   (minus (convert @0)
> -(convert (plus:c @0 @1)))
> +   (minus (convert? @0)
> +(convert (plus:c @@0 @1)))
> (if (INTEGRAL_TYPE_P (type)
>   && TYPE_OVERFLOW_UNDEFINED (type)
>   && element_precision (type) <= element_precision (TREE_TYPE (@1)))
> @@ -1833,8 +1852,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   However, if an overflow in P + A would cause
>   undefined behavior, we can assume that there
>   is
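
The precision condition on the (T)(P + A) - (T)P -> (T)A fold can be illustrated with fixed-width unsigned types.  This is only a sketch of the arithmetic, not of match.pd itself; the function names are invented, and uint32_t/uint64_t stand in for the narrower and wider types.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing case: T (32 bits) is no wider than the type of P and A
   (64 bits), so (T)(P + A) - (T)P equals (T)A in modular arithmetic.  */
static int
narrowing_identity_holds (uint64_t p, uint64_t a)
{
  uint32_t lhs = (uint32_t) (p + a) - (uint32_t) p;
  return lhs == (uint32_t) a;
}

/* Widening case: T (64 bits) is wider than the unsigned 32-bit type of
   P and A, so a wrap in P + A stays visible after the conversion.  */
static int
widening_identity_holds (uint32_t p, uint32_t a)
{
  uint64_t lhs = (uint64_t) (uint32_t) (p + a) - (uint64_t) p;
  return lhs == (uint64_t) a;
}

/* The shape of the example in the patch comment: wide T, A equal to
   the largest unsigned value, P > 0.  */
void
pr81281_selfcheck (void)
{
  assert (narrowing_identity_holds (UINT64_MAX, 7));
  assert (!widening_identity_holds (1u, UINT32_MAX));
}
```

When P + A does not wrap, for example P = 1 and A = 5, the widening form also holds, which is why undefined overflow in the inner type is enough to allow the fold.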

Re: [PATCH] Use tail calls to memcpy/memset even for structure assignments (PR target/41455, PR target/82935)

2017-12-15 Thread Richard Biener
On Wed, 6 Dec 2017, Jakub Jelinek wrote:

> Hi!
> 
> Aggregate assignments and clears aren't in GIMPLE represented as calls,
> and while often they expand inline, sometimes we emit libcalls for them.
> This patch allows us to tail call those libcalls if there is nothing
> after them.  The patch changes the tailcall pass, so that it recognizes
> a = b; and c = {}; statements under certain conditions as potential tail
> calls returning void, and if it finds good tail call candidates, it marks
> them specially.  Because we have only a single bit left for GIMPLE_ASSIGN,
> I've decided to wrap the rhs1 into a new internal call, so
> a = b; will be transformed into a = TAILCALL_ASSIGN (b); and
> c = {}; will be transformed into c = TAILCALL_ASSIGN ();
> The rest of the patch is about propagating the flag (may use tailcall if
> the emit_block_move or clear_storage is the last thing emitted) down
> through expand_assignment and functions it calls.
> 
> Those functions use 1-3 other flags, so instead of adding another bool
> to all of them (next to nontemporal, call_param_p, reverse) I've decided
> to pass around a bitmask of flags.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hum, it doesn't look pretty ;)  Can we defer this to stage1 given
it's a long-standing issue and we have quite big changes going in still?

Thanks,
Richard.

> 2017-12-06  Jakub Jelinek  
> 
>   PR target/41455
>   PR target/82935
>   * internal-fn.def (TAILCALL_ASSIGN): New internal function.
>   * internal-fn.c (expand_LAUNDER): Pass EXPAND_FLAG_NORMAL to
>   expand_assignment.
>   (expand_TAILCALL_ASSIGN): New function.
>   * tree-tailcall.c (struct tailcall): Adjust comment.
>   (find_tail_calls): Recognize also aggregate assignments and
>   aggregate clearing as possible tail calls.  Use is_gimple_assign
>   instead of gimple_code check.
>   (optimize_tail_call): Rewrite aggregate assignments or aggregate
>   clearing in tail call positions using IFN_TAILCALL_ASSIGN
>   internal function.
>   * tree-outof-ssa.c (insert_value_copy_on_edge): Adjust store_expr
>   caller.
>   * tree-chkp.c (chkp_expand_bounds_reset_for_mem): Adjust
>   expand_assignment caller.
>   * function.c (assign_parm_setup_reg): Likewise.
>   * ubsan.c (ubsan_encode_value): Likewise.
>   * cfgexpand.c (expand_call_stmt, expand_asm_stmt): Likewise.
>   (expand_gimple_stmt_1): Likewise.  Fix up formatting.
>   * calls.c (initialize_argument_information): Adjust store_expr caller.
>   * expr.h (enum expand_flag): New.
>   (expand_assignment): Replace bool argument with enum expand_flag.
>   (store_expr_with_bounds, store_expr): Replace int, bool, bool arguments
>   with enum expand_flag.
>   * expr.c (expand_assignment): Replace nontemporal argument with flags.
>   Assert no bits other than EXPAND_FLAG_NONTEMPORAL and
>   EXPAND_FLAG_TAILCALL are set.  Adjust store_expr, store_fields and
>   store_expr_with_bounds callers.
>   (store_expr_with_bounds): Replace call_param_p, nontemporal and
>   reverse args with flags argument.  Adjust recursive calls.  Pass
>   BLOCK_OP_TAILCALL to clear_storage and expand_block_move if
>   EXPAND_FLAG_TAILCALL is set.  Call clear_storage directly for
>   EXPAND_FLAG_TAILCALL assignments from empty CONSTRUCTOR.
>   (store_expr): Replace call_param_p, nontemporal and reverse args
>   with flags argument.  Adjust store_expr_with_bounds caller.
>   (store_constructor_field): Adjust store_field caller.
>   (store_constructor): Adjust store_expr and expand_assignment callers.
>   (store_field): Replace nontemporal and reverse arguments with flags
>   argument.  Adjust store_expr callers.  Pass BLOCK_OP_TAILCALL to
>   emit_block_move if EXPAND_FLAG_TAILCALL is set.
>   (expand_expr_real_2): Adjust store_expr and store_field callers.
>   (expand_expr_real_1): Adjust store_expr and expand_assignment callers.
> 
>   * gcc.target/i386/pr41455.c: New test.
> 
> --- gcc/internal-fn.def.jj2017-12-06 09:02:30.072952012 +0100
> +++ gcc/internal-fn.def   2017-12-06 16:56:20.958518104 +0100
> @@ -254,6 +254,11 @@ DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF
>  /* Divmod function.  */
>  DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
>  
> +/* Special markup for aggregate copy or clear that can be implemented
> +   using a tailcall.  lhs = rhs1; is represented by
> +   lhs = TAILCALL_ASSIGN (rhs1); and lhs = {}; by lhs = TAILCALL_ASSIGN ();  */
> +DEF_INTERNAL_FN (TAILCALL_ASSIGN, ECF_NOTHROW | ECF_LEAF, NULL)
> +
>  #undef DEF_INTERNAL_INT_FN
>  #undef DEF_INTERNAL_FLT_FN
>  #undef DEF_INTERNAL_FLT_FLOATN_FN
> --- gcc/internal-fn.c.jj  2017-12-06 09:02:29.968953307 +0100
> +++ gcc/internal-fn.c 2017-12-06 18:00:15.993826828 +0100
> @@ -2672,7 +2672,7 @@ expand_LAUNDER (internal_fn, gcall *call
>if (!lhs)
>  return;
>  
> 
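
For context, the source-level shape this patch is about looks roughly like the following.  It is a hedged sketch, not the new gcc.target/i386/pr41455.c test: the struct size and function names are made up, and whether a given target actually emits a memcpy or memset libcall for these statements depends on the target and options.

```c
#include <assert.h>
#include <string.h>

struct big { char data[4096]; };

/* The aggregate copy is the last statement, so a libcall emitted for it
   (memcpy) is a tail call candidate of the kind the patch looks for.  */
void
copy_big (struct big *dst, const struct big *src)
{
  *dst = *src;
}

/* Likewise for an aggregate clear, which may expand to memset.  */
void
clear_big (struct big *dst)
{
  *dst = (struct big) { { 0 } };
}

/* Behaviour must be the same whether or not a tail call is used.  */
void
big_selfcheck (void)
{
  static struct big a, b;
  memset (&a, 'x', sizeof a);
  copy_big (&b, &a);
  assert (memcmp (&a, &b, sizeof a) == 0);
  clear_big (&b);
  assert (b.data[0] == 0 && b.data[4095] == 0);
}
```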

[patch] More robust fix for PR target/66488

2017-12-15 Thread Eric Botcazou
Hi,

this PR was about the blow-up of the garbage collector on x86_64-w64-mingw32 
when more than 3 GB are allocated.  The fix was to set HOST_BITS_PER_PTR to 
the appropriate value (64) in config/i386/xm-mingw32.h.

This means that the same issue can happen on other P64 hosts so the attached 
patch replaces the fix by a more robust variant.  And I'm proposing that it be 
installed on all active branches (the original fix is not on the 6 branch).

Tested on x86_64-w64-mingw32 (6 branch) and x86_64-suse-linux (mainline), OK?


2017-12-15  Eric Botcazou  

PR target/66488
* ggc-page.c (HOST_BITS_PER_PTR): Do not define here...
* hwint.h (HOST_BITS_PER_PTR): ...but here instead.
* config/i386/xm-mingw32.h (HOST_BITS_PER_PTR): Delete.

-- 
Eric Botcazou

Index: config/i386/xm-mingw32.h
===
--- config/i386/xm-mingw32.h	(revision 255622)
+++ config/i386/xm-mingw32.h	(working copy)
@@ -37,8 +37,3 @@ along with GCC; see the file COPYING3.
"long long" values.  Instead, we use "I64".  */
 #define HOST_LONG_LONG_FORMAT "I64"
 #endif
-
-/* this is to prevent gcc-heap.c from assuming sizeof(long) == sizeof(intptr_t) */
-#ifdef __x86_64__
-#	define HOST_BITS_PER_PTR 64
-#endif
Index: ggc-page.c
===
--- ggc-page.c	(revision 255622)
+++ ggc-page.c	(working copy)
@@ -92,11 +92,6 @@ along with GCC; see the file COPYING3.
  4: Object marks as well.  */
 #define GGC_DEBUG_LEVEL (0)
 
-#ifndef HOST_BITS_PER_PTR
-#define HOST_BITS_PER_PTR  HOST_BITS_PER_LONG
-#endif
-
-
 /* A two-level tree is used to look up the page-entry for a given
pointer.  Two chunks of the pointer's bits are extracted to index
the first and second levels of the tree, as follows:
Index: hwint.h
===
--- hwint.h	(revision 255622)
+++ hwint.h	(working copy)
@@ -14,6 +14,7 @@
 #define HOST_BITS_PER_SHORT (CHAR_BIT * SIZEOF_SHORT)
 #define HOST_BITS_PER_INT   (CHAR_BIT * SIZEOF_INT)
 #define HOST_BITS_PER_LONG  (CHAR_BIT * SIZEOF_LONG)
+#define HOST_BITS_PER_PTR   (CHAR_BIT * SIZEOF_VOID_P)
 
 /* The string that should be inserted into a printf style format to
indicate a "long" operand.  */


[Ada] Fix inconsistent usage of Machine in s-fatgen.adb

2017-12-15 Thread Pierre-Marie de Rodat
System.Fat_Gen is a generic unit implementing support routines for
floating-point attributes, for example the 'Machine attribute.  These routines
themselves make use of the 'Machine attribute, some of them by calling the
Machine support routine directly, others by using the attribute.

Consistency dictates that a single idiom be used and the latter is to be
preferred, since it generates better code for targets without excessive
precision issues, i.e. all of them except for x86 and x86-64.

No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Eric Botcazou  

* exp_attr.adb (Is_Inline_Floating_Point_Attribute): Fix comment.
* libgnat/s-fatgen.adb (Model): Use Machine attribute.
(Truncation): Likewise.

Index: libgnat/s-fatgen.adb
===
--- libgnat/s-fatgen.adb(revision 255678)
+++ libgnat/s-fatgen.adb(working copy)
@@ -394,7 +394,7 @@
 
function Model (X : T) return T is
begin
-  return Machine (X);
+  return T'Machine (X);
end Model;
 
--
@@ -739,10 +739,11 @@
   Result := abs X;
 
   if Result >= Radix_To_M_Minus_1 then
- return Machine (X);
+ return T'Machine (X);
 
   else
- Result := Machine (Radix_To_M_Minus_1 + Result) - Radix_To_M_Minus_1;
+ Result :=
+   T'Machine (Radix_To_M_Minus_1 + Result) - Radix_To_M_Minus_1;
 
  if Result > abs X then
 Result := Result - 1.0;
Index: exp_attr.adb
===
--- exp_attr.adb(revision 255678)
+++ exp_attr.adb(working copy)
@@ -8274,7 +8274,7 @@
--  Start of processing for Is_Inline_Floating_Point_Attribute
 
begin
-  --  Machine and Model can be expanded by the GCC and AAMP back ends only
+  --  Machine and Model can be expanded by the GCC back end only
 
   if Id = Attribute_Machine or else Id = Attribute_Model then
  return Is_GCC_Target;


Re: [PATCH] Use tail calls to memcpy/memset even for structure assignments (PR target/41455, PR target/82935)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 10:30:32AM +0100, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Hum, it doesn't look pretty ;)  Can we defer this to stage1 given
> it's a long-standing issue and we have quite big changes going in still?

Ok, deferred.

Jakub


Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-15 Thread Sudakshina Das

Hi

On 14/12/17 18:26, Kyrill Tkachov wrote:


On 14/12/17 18:17, Sudi Das wrote:

Hi

On 14/12/17 17:37, Christophe Lyon wrote:
> On 14 December 2017 at 17:05, Sudakshina Das  wrote:
>> Hi
>>
>> This patch is a follow up on my previous patch with r255536 that was a
>> back-port for fixing a wrong code generation
>> (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
>> As pointed out by Christophe Lyon
>> (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case
>> started to fail on the new dejagnu for arm-none-linux-gnueabi and
>> arm-none-eabi.
>> This patch just removes the dg-add-options from the test case because I
>> think dg-options has all that is needed anyway.
>>
>> Testing: Since I could not reproduce the failure on my machine, Christophe
>> would it be possible for you to check if this patch fixes the
>> regression for you?
>>
>
> Manually  tested on one of the offending configs, it did the trick.
> Thanks
>

Thank you so much. I will wait for an OK and commit it!



Thanks Sudi and Christophe.
The patch is ok with an appropriate ChangeLog entry.




Thanks Kyrill. I added the ChangeLog entry. Committed with r255681.

Sudi


Kyrill


Sudi

> Christophe
>
>> Thanks
>> Sudi







[PATCH] Fix PR83388

2017-12-15 Thread Richard Biener

The following fixes removal of sanitizer IFN calls during LTO streaming
in when not linking with -fsanitize= to not interfere with IPA reference
nodes that might be attached to those stmts.  The easiest idea I could
come up with that would also work with IPA passes referring to those
refs is to replace the IFN call with a NOP-like one.  The following
patch does this, the next DCE pass will then remove those stmts
(or at -O0 expand will expand them to nothing).  If we start to
use this replacement trick for calls with a LHS DCE would need to
be taught to replace the LHS with a default-def (or if aggregate
just ignore it).  OTOH such replacement would be fishy.
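
As a toy model of the replacement idea, assuming nothing about the real GIMPLE data structures: the enum, struct, flag macros and function below are all hypothetical and only mimic the shape of the switch in input_function.  The point is that a statement which is rewritten in place keeps its identity, so anything still pointing at it stays valid:

```c
#include <stddef.h>

/* Hypothetical stand-ins for internal function codes; not GCC's enum.  */
enum sketch_ifn
{
  SKETCH_IFN_UBSAN_NULL,
  SKETCH_IFN_ASAN_MARK,
  SKETCH_IFN_NOP,
  SKETCH_IFN_OTHER
};

/* Hypothetical stand-in for a call statement.  */
struct sketch_call
{
  enum sketch_ifn fn;
};

/* Hypothetical stand-ins for the flag_sanitize bits.  */
#define SKETCH_SANITIZE_NULL    1u
#define SKETCH_SANITIZE_ADDRESS 2u

/* Turn every sanitizer call whose sanitizer is not in ENABLED into a
   NOP call, in place.  No element is removed or moved, so pointers
   into CALLS remain valid.  Return the number of calls replaced.  */
size_t
sketch_neutralize_calls (struct sketch_call *calls, size_t n,
			 unsigned enabled)
{
  size_t replaced = 0;
  for (size_t i = 0; i < n; i++)
    {
      int replace = 0;
      switch (calls[i].fn)
	{
	case SKETCH_IFN_UBSAN_NULL:
	  replace = (enabled & SKETCH_SANITIZE_NULL) == 0;
	  break;
	case SKETCH_IFN_ASAN_MARK:
	  replace = (enabled & SKETCH_SANITIZE_ADDRESS) == 0;
	  break;
	default:
	  break;
	}
      if (replace)
	{
	  calls[i].fn = SKETCH_IFN_NOP;
	  replaced++;
	}
    }
  return replaced;
}
```

A later cleanup pass (DCE in the real compiler) is then free to drop the NOP calls once nothing refers to them any more.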

Bootstrap and regtest in progress.

Any comments?

Thanks,
Richard.

2017-12-15  Richard Biener  

PR lto/83388
* internal-fn.def (IFN_NOP): Add.
* internal-fn.c (expand_NOP): Do nothing.
* lto-streamer-in.c (input_function): Instead of removing
sanitizer calls replace them with IFN_NOP calls.

* gcc.dg/lto/pr83388_0.c: New testcase.

Index: gcc/internal-fn.def
===
--- gcc/internal-fn.def (revision 255678)
+++ gcc/internal-fn.def (working copy)
@@ -254,6 +254,9 @@ DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF
 /* Divmod function.  */
 DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
 
+/* A NOP function with aribtrary arguments and return value.  */
+DEF_INTERNAL_FN (NOP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
Index: gcc/internal-fn.c
===
--- gcc/internal-fn.c   (revision 255678)
+++ gcc/internal-fn.c   (working copy)
@@ -2722,6 +2722,14 @@ expand_DIVMOD (internal_fn, gcall *call_
   target, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Expand a NOP.  */
+
+static void
+expand_NOP (internal_fn, gcall *call_stmt)
+{
+  /* Nothing.  But it shouldn't really prevail.  */
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
output operand and NARGS input operands.  */
 
Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c   (revision 255678)
+++ gcc/lto-streamer-in.c   (working copy)
@@ -1136,41 +1136,47 @@ input_function (tree fn_decl, struct dat
  if (is_gimple_call (stmt)
  && gimple_call_internal_p (stmt))
{
+ bool replace = false;
  switch (gimple_call_internal_fn (stmt))
{
case IFN_UBSAN_NULL:
  if ((flag_sanitize
  & (SANITIZE_NULL | SANITIZE_ALIGNMENT)) == 0)
-   remove = true;
+   replace = true;
  break;
case IFN_UBSAN_BOUNDS:
  if ((flag_sanitize & SANITIZE_BOUNDS) == 0)
-   remove = true;
+   replace = true;
  break;
case IFN_UBSAN_VPTR:
  if ((flag_sanitize & SANITIZE_VPTR) == 0)
-   remove = true;
+   replace = true;
  break;
case IFN_UBSAN_OBJECT_SIZE:
  if ((flag_sanitize & SANITIZE_OBJECT_SIZE) == 0)
-   remove = true;
+   replace = true;
  break;
case IFN_UBSAN_PTR:
  if ((flag_sanitize & SANITIZE_POINTER_OVERFLOW) == 0)
-   remove = true;
+   replace = true;
  break;
case IFN_ASAN_MARK:
  if ((flag_sanitize & SANITIZE_ADDRESS) == 0)
-   remove = true;
+   replace = true;
  break;
case IFN_TSAN_FUNC_EXIT:
  if ((flag_sanitize & SANITIZE_THREAD) == 0)
-   remove = true;
+   replace = true;
  break;
default:
  break;
}
- gcc_assert (!remove || gimple_call_lhs (stmt) == NULL_TREE);
+ if (replace)
+   {
+ gimple_call_set_internal_fn (as_a  (stmt),
+  IFN_NOP);
+ update_stmt (stmt);
+   }
}
}
  if (remove)
Index: gcc/testsuite/gcc.dg/lto/pr83388_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr83388_0.c(nonexistent)
+++ gcc/testsuite/gcc.dg/lto/pr83388_0.c(working copy)
@@ -0,0 +1,18 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -O2 -flto -fsanitize=nu

[Ada] Ignore external calls from instances for elaboration

2017-12-15 Thread Pierre-Marie de Rodat
This patch restores the functionality of debug switch -gnatdL to the behavior
prior to revision 255412.  The existing behavior has been associated with
switch -gnatd_i.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Hristian Kirtchev  

* debug.adb: Move the functionality of -gnatdL to -gnatd_i. Restore
the behavior of -gnatdL from before revision 255412.
* sem_elab.adb: Update the section of compiler switches.
(Build_Call_Marker): Do not create a marker for a call which originates
from an expanded spec or body of an instantiated generic, does not invoke
a generic formal subprogram, the target is external to the instance,
and -gnatdL is in effect.
(In_External_Context): New routine.
(Process_Conditional_ABE_Activation_Impl): Update the uses of -gnatdL
and associated flag.
(Process_Conditional_ABE_Call): Update the uses of -gnatdL and
associated flag.
* switch-c.adb (Scan_Front_End_Switches): Switch -gnatJ now sets switch
-gnatd_i.
* exp_unst.adb: Minor typo fixes and edits.

gcc/testsuite/

2017-12-15  Hristian Kirtchev  

* gnat.dg/abe_pkg.adb, gnat.dg/abe_pkg.ads: New testcase.
Index: checks.adb
===
--- checks.adb  (revision 255678)
+++ checks.adb  (working copy)
@@ -6819,7 +6819,7 @@
 
   if Nkind (N) /= N_Attribute_Reference
 and then (not Is_Entity_Name (N)
-or else Treat_As_Volatile (Entity (N)))
+   or else Treat_As_Volatile (Entity (N)))
   then
  Force_Evaluation (N, Mode => Strict);
   end if;
Index: debug.adb
===
--- debug.adb   (revision 255678)
+++ debug.adb   (working copy)
@@ -153,7 +153,7 @@
--  d_f
--  d_g
--  d_h
-   --  d_i
+   --  d_i  Ignore activations and calls to instances for elaboration
--  d_j
--  d_k
--  d_l
@@ -479,8 +479,8 @@
--   error messages are target dependent and irrelevant.
 
--  dL   The compiler ignores calls in instances and invoke subprograms
-   --   which are external to the instance for the static elaboration
-   --   model. This switch is orthogonal to d.G.
+   --   which are external to the instance for both the static and dynamic
+   --   elaboration models.
 
--  dM   Assume all variables have been modified, and ignore current value
--   indications. This debug flag disconnects the tracking of constant
@@ -734,8 +734,7 @@
--  d.G  Previously the compiler ignored calls via generic formal parameters
--   when doing the analysis for the static elaboration model. This is
--   now fixed, but we provide this debug flag to revert to the previous
-   --   situation of ignoring such calls to aid in transition. This switch
-   --   is orthogonal to dL.
+   --   situation of ignoring such calls to aid in transition.
 
--  d.H  Sets ASIS_GNSA_Mode to True. This signals the front end to suppress
--   the call to gigi in ASIS_Mode.
@@ -832,6 +831,10 @@
--   control, conditional entry calls, timed entry calls, and requeue
--   statements in both the static and dynamic elaboration models.
 
+   --  d_i  The compiler ignores calls and task activations when they target a
+   --   subprogram or task type defined in an external instance for both
+   --   the static and dynamic elaboration models.
+
--  d_p  The compiler ignores calls to subprograms which verify the run-time
--   semantics of invariants and postconditions in both the static and
--   dynamic elaboration models.
Index: exp_ch6.adb
===
--- exp_ch6.adb (revision 255680)
+++ exp_ch6.adb (working copy)
@@ -5356,7 +5356,7 @@
 
  Else_Statements => New_List (
Make_Raise_Program_Error (Loc,
-  Reason => PE_All_Guards_Closed)));
+ Reason => PE_All_Guards_Closed)));
 
  --  If a separate initialization assignment was created
  --  earlier, append that following the assignment of the
Index: exp_ch7.adb
===
--- exp_ch7.adb (revision 255680)
+++ exp_ch7.adb (working copy)
@@ -4200,13 +4200,11 @@

 
procedure Expand_Cleanup_Actions (N : Node_Id) is
-  pragma Assert
-(Nkind_In (N,
-   N_Extended_Return_Statement,
-   N_Block_Statement,
-   N_Subprogram_Body,
-   N_Task_Body,
-   N_Entry_Body));
+  pragma Assert (Nkind_In (N, N_Block_Statement,
+  N_Entry_Body,
+  N_Extended_Ret

[Ada] Completing expression function need not trigger loading of package body

2017-12-15 Thread Pierre-Marie de Rodat
This patch prevents expression functions which complete previous declarations
in a package spec from loading the body of the package spec on the basis that
the expression function body is needed for inlining. This in turn prevents the
generation of spurious dependencies on units in ALI files.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Hristian Kirtchev  

* inline.adb (Add_Inlined_Body): Do not add a function which is
completed by an expression function defined in the same context as the
initial declaration because the completing body is not in a package
body.
(Is_Non_Loading_Expression_Function): New routine.

gcc/testsuite/

2017-12-15  Hristian Kirtchev  

* gnat.dg/expr_func_main.adb, gnat.dg/expr_func_pkg.ads,
gnat.dg/expr_func_pkg.adb: New testcase.
Index: inline.adb
===
--- inline.adb  (revision 255678)
+++ inline.adb  (working copy)
@@ -298,10 +298,65 @@
   --  Inline_Package means that the call is considered for inlining and
   --  its package compiled and scanned for more inlining opportunities.
 
+  function Is_Non_Loading_Expression_Function
+(Id : Entity_Id) return Boolean;
+  --  Determine whether arbitrary entity Id denotes a subprogram which is
+  --  either
+  --
+  --* An expression function
+  --
+  --* A function completed by an expression function where both the
+  --  spec and body are in the same context.
+
   function Must_Inline return Inline_Level_Type;
   --  Inlining is only done if the call statement N is in the main unit,
   --  or within the body of another inlined subprogram.
 
+  
+  -- Is_Non_Loading_Expression_Function --
+  
+
+  function Is_Non_Loading_Expression_Function
+(Id : Entity_Id) return Boolean
+  is
+ Body_Decl : Node_Id;
+ Body_Id   : Entity_Id;
+ Spec_Decl : Node_Id;
+
+  begin
+ --  A stand-alone expression function is transformed into a spec-body
+ --  pair in-place. Since both the spec and body are in the same list,
+ --  the inlining of such an expression function does not need to load
+ --  anything extra.
+
+ if Is_Expression_Function (Id) then
+return True;
+
+ --  A function may be completed by an expression function
+
+ elsif Ekind (Id) = E_Function then
+Spec_Decl := Unit_Declaration_Node (Id);
+
+if Nkind (Spec_Decl) = N_Subprogram_Declaration then
+   Body_Id := Corresponding_Body (Spec_Decl);
+
+   if Present (Body_Id) then
+  Body_Decl := Unit_Declaration_Node (Body_Id);
+
+  --  The inlining of a completing expression function does
+  --  not need to load anything extra when both the spec and
+  --  body are in the same context.
+
+  return
+Was_Expression_Function (Body_Decl)
+  and then Parent (Spec_Decl) = Parent (Body_Decl);
+   end if;
+end if;
+ end if;
+
+ return False;
+  end Is_Non_Loading_Expression_Function;
+
   -
   -- Must_Inline --
   -
@@ -415,10 +470,12 @@
  Set_Needs_Debug_Info (E, False);
   end if;
 
-  --  If the subprogram is an expression function, then there is no need to
-  --  load any package body since the body of the function is in the spec.
+  --  If the subprogram is an expression function, or is completed by one
+  --  where both the spec and body are in the same context, then there is
+  --  no need to load any package body since the body of the function is
+  --  in the spec.
 
-  if Is_Expression_Function (E) then
+  if Is_Non_Loading_Expression_Function (E) then
  Set_Is_Called (E);
  return;
   end if;
Index: ../testsuite/gnat.dg/expr_func_main.adb
===
--- ../testsuite/gnat.dg/expr_func_main.adb (revision 0)
+++ ../testsuite/gnat.dg/expr_func_main.adb (revision 0)
@@ -0,0 +1,9 @@
+--  { dg-do compile }
+
+with Expr_Func_Pkg; use Expr_Func_Pkg;
+
+procedure Expr_Func_Main is
+   Val : Boolean := Expr_Func (456);
+begin
+   null;
+end Expr_Func_Main;
Index: ../testsuite/gnat.dg/expr_func_pkg.adb
===
--- ../testsuite/gnat.dg/expr_func_pkg.adb  (revision 0)
+++ ../testsuite/gnat.dg/expr_func_pkg.adb  (revision 0)
@@ -0,0 +1,7 @@
+package body Expr_Func_Pkg is
+   function Func (Val : Integer) return Boolean is
+   begin
+  Error;  --  { dg-error "\"Error\" is undefined" }
+  return Val = 123;
+   end Func;
+end Expr_Func_Pkg;
In

[Ada] Compiler crash with -gnatd.1 (force unnesting of subprograms)

2017-12-15 Thread Pierre-Marie de Rodat
This patch fixes a crash in the compiler when enabling unnesting of subprograms
on a generic unit.

The following must compile quietly:

gcc -c -gnatg -gnatd.1 a-btgbso.adb

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Ed Schonberg  

* exp_unst.adb (Unnest_Subprograms): Nothing to do if the main unit is
a generic package body. Unnesting is only an issue when generating
code, and if the main unit is generic then nested instance bodies have
not been created and analyzed, and unnesting will crash in the absence
of those bodies.

Index: exp_unst.adb
===
--- exp_unst.adb(revision 255680)
+++ exp_unst.adb(working copy)
@@ -302,6 +302,16 @@
  return;
   end if;
 
+  --  If the main unit is a package body then we need to examine the spec
+  --  to determine whether the main unit is generic (the scope stack is not
+  --  present when this is called on the main unit).
+
+  if Ekind (Cunit_Entity (Main_Unit)) = E_Package_Body
+and then Is_Generic_Unit (Spec_Entity (Cunit_Entity (Main_Unit)))
+  then
+ return;
+  end if;
+
   --  At least for now, do not unnest anything but main source unit
 
   if not In_Extended_Main_Source_Unit (Subp_Body) then
@@ -553,8 +563,8 @@
Ent := Entity (Name (N));
 
--  We are only interested in calls to subprograms nested
-   --  within Subp. Calls to Subp itself or to subprograms that
-   --  are outside the nested structure do not affect us.
+   --  within Subp. Calls to Subp itself or to subprograms
+   --  that are outside the nested structure do not affect us.
 
if Scope_Within (Ent, Subp) then
 
@@ -1653,7 +1663,6 @@
 if Present (STT.ARECnF)
   and then Nkind (CTJ.N) /= N_Attribute_Reference
 then
-
--  CTJ.N is a call to a subprogram which may require a pointer
--  to an activation record. The subprogram containing the call
--  is CTJ.From and the subprogram being called is CTJ.To, so we


[Ada] Crash on subprogram instantiation in nested package

2017-12-15 Thread Pierre-Marie de Rodat
This patch fixes a crash on a subprogram instance that appears within a package
that declares the actual type for the instance, when the corresponding type is
a private or incomplete formal type.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada

2017-12-15  Ed Schonberg  

* sem_ch6.adb (Possible_Freeze): Do not set Delayed_Freeze on a
subprogram instantiation, now that the enclosing wrapper package
carries an explicit freeze node. This prevents freeze nodes for the
subprogram from appearing in the wrong scope. This is relevant when the
generic subprogram has a private or incomplete formal type and the
instance appears within a package that declares the actual type for the
instantiation, and that type has itself a delayed freeze.

gcc/testsuite/

2017-12-15  Ed Schonberg  

* gnat.dg/subp_inst.adb, gnat.dg/subp_inst_pkg.adb,
gnat.dg/subp_inst_pkg.ads: New testcase.
Index: sem_ch6.adb
===
--- sem_ch6.adb (revision 255678)
+++ sem_ch6.adb (working copy)
@@ -5834,8 +5834,21 @@
   -
 
   procedure Possible_Freeze (T : Entity_Id) is
+ Scop : constant Entity_Id := Scope (Designator);
   begin
- if Has_Delayed_Freeze (T) and then not Is_Frozen (T) then
+ --  If the subprogram appears within a package instance (which
+ --  may be the wrapper package of a subprogram instance) the
+ --  freeze node for that package will freeze the subprogram at
+ --  the proper place, so do not emit a freeze node for the
+ --  subprogram, given that it may appear in the wrong scope.
+
+ if Ekind (Scop) = E_Package
+   and then not Comes_From_Source (Scop)
+   and then Is_Generic_Instance (Scop)
+ then
+null;
+
+ elsif Has_Delayed_Freeze (T) and then not Is_Frozen (T) then
 Set_Has_Delayed_Freeze (Designator);
 
  elsif Is_Access_Type (T)
Index: ../testsuite/gnat.dg/subp_inst.adb
===
--- ../testsuite/gnat.dg/subp_inst.adb  (revision 0)
+++ ../testsuite/gnat.dg/subp_inst.adb  (revision 0)
@@ -0,0 +1,26 @@
+--  { dg-do compile }
+with Subp_Inst_Pkg;
+procedure Subp_Inst is
+   procedure Test_Access_Image is
+  package Nested is
+ type T is private;
+
+ type T_General_Access is access all T;
+ type T_Access is access T;
+ function Image1 is new Subp_Inst_Pkg.Image (T, T_Access);
+ function Image2 is new Subp_Inst_Pkg.Image (T, T_General_Access);
+ function Image3 is new Subp_Inst_Pkg.T_Image (T);
+  private
+ type T is null record;
+  end Nested;
+
+  A : aliased Nested.T;
+  AG : aliased constant Nested.T_General_Access := A'Access;
+  AA : aliased constant Nested.T_Access := new Nested.T;
+   begin
+  null;
+   end Test_Access_Image;
+
+begin
+   Test_Access_Image;
+end Subp_Inst;
Index: ../testsuite/gnat.dg/subp_inst_pkg.adb
===
--- ../testsuite/gnat.dg/subp_inst_pkg.adb  (revision 0)
+++ ../testsuite/gnat.dg/subp_inst_pkg.adb  (revision 0)
@@ -0,0 +1,20 @@
+with Ada.Unchecked_Conversion;
+with System.Address_Image;
+package body Subp_Inst_Pkg is
+
+   function Image (Val : T_Access) return String is
+  function Convert is new Ada.Unchecked_Conversion
+ (T_Access, System.Address);
+   begin
+  return System.Address_Image (Convert (Val));
+   end Image;
+
+   function T_Image (Val : access T) return String is
+  type T_Access is access all T;
+  function Convert is new Ada.Unchecked_Conversion
+ (T_Access, System.Address);
+   begin
+  return System.Address_Image (Convert (Val));
+   end T_Image;
+
+end Subp_Inst_Pkg;
Index: ../testsuite/gnat.dg/subp_inst_pkg.ads
===
--- ../testsuite/gnat.dg/subp_inst_pkg.ads  (revision 0)
+++ ../testsuite/gnat.dg/subp_inst_pkg.ads  (revision 0)
@@ -0,0 +1,13 @@
+package Subp_Inst_Pkg is
+   pragma Pure;
+
+   generic
+  type T;
+  type T_Access is access T;
+   function Image (Val : T_Access) return String;
+
+   generic
+  type T;
+   function T_Image (Val : access T) return String;
+
+end Subp_Inst_Pkg;


Re: [PATCH] Fix PR83388

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 11:08:43AM +0100, Richard Biener wrote:
> --- gcc/internal-fn.def   (revision 255678)
> +++ gcc/internal-fn.def   (working copy)
> @@ -254,6 +254,9 @@ DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF
>  /* Divmod function.  */
>  DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
>  
> +/* A NOP function with aribtrary arguments and return value.  */

arbitrary

> +static void
> +expand_NOP (internal_fn, gcall *call_stmt)
> +{
> +  /* Nothing.  But it shouldn't really prevail.  */

It could with -O0 (but who would use -flto -O0) or with -fno-tree-dce
-fno-tree-whateverelse.

LGTM otherwise.

Jakub


[PATCH] Avoid excessive function type casts with splay-trees

2017-12-15 Thread Bernd Edlinger
Hi,

when working on the -Wcast-function-type patch I noticed some rather
ugly and non-portable function type casts that are necessary to accomplish
some actually very simple tasks.

Often functions taking pointer arguments are called with a different signature
taking uintptr_t arguments, which is IMHO not really safe to do...

The attached patch adds a context argument to the callback functions but
keeps the existing interface as far as possible.
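
To show the shape of the idea without quoting the patch, here is a minimal sketch; the sketch_* types and functions are hypothetical and are not the libiberty declarations.  A callback that receives an extra context pointer can be a real function of the right type, so neither strcmp nor any other function has to be cast to a different signature:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key type and extended comparison callback; the real
   splay-tree.h declarations may differ.  */
typedef uintptr_t sketch_key;
typedef int (*sketch_compare_ex_fn) (sketch_key, sketch_key, void *);

/* Compare two keys that hold pointers to strings.  The context is
   unused; the conversion happens on the arguments, not on the
   function type.  */
int
sketch_compare_strings (sketch_key k1, sketch_key k2, void *user_data)
{
  (void) user_data;
  return strcmp ((const char *) k1, (const char *) k2);
}

/* Example context: count how often the comparison runs.  */
struct sketch_stats
{
  unsigned comparisons;
};

/* Compare two integer keys and record the call in the context.  */
int
sketch_compare_counted (sketch_key k1, sketch_key k2, void *user_data)
{
  struct sketch_stats *stats = (struct sketch_stats *) user_data;
  stats->comparisons++;
  return (k1 > k2) - (k1 < k2);
}

/* What a tree operation would do: pass the stored context through.  */
int
sketch_lookup_order (sketch_compare_ex_fn cmp, void *user_data,
		     sketch_key a, sketch_key b)
{
  return cmp (a, b, user_data);
}
```

The same context pointer is what lets a typed C++ wrapper find its own object again from inside a plain C callback.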


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

include:
2017-12-15  Bernd Edlinger  

* splay-tree.h (splay_tree_compare_ex_fn, splay_tree_delete_key_ex_fn,
splay_tree_delete_value_ex_fn): New function types.
(splay_tree_s): Update to use new function types.
(splay_tree_ex_new): Declare new constructor.
(splay_tree_compare_strings, splay_tree_delete_pointers,
splay_tree_compare_wrapper, splay_tree_delete_key_wrapper,
splay_tree_delete_value_wrapper, splay_tree_xmalloc_allocate,
splay_tree_xmalloc_deallocate): Declare new utility functions.

libiberty:
2017-12-15  Bernd Edlinger  

* splay-tree.c (splay_tree_delete_helper, splay_tree_splay,
splay_tree_insert, splay_tree_remove, splay_tree_lookup,
splay_tree_predecessor, splay_tree_successor): Adjust.
(splay_tree_new_typed_alloc): Call splay_tree_ex_new.
(splay_tree_ex_new): New constructor.
(splay_tree_compare_strings, splay_tree_delete_pointers,
splay_tree_compare_wrapper, splay_tree_delete_key_wrapper,
splay_tree_delete_value_wrapper): New utility functions.
(splay_tree_xmalloc_allocate, splay_tree_xmalloc_deallocate): Export.

gcc:
2017-12-15  Bernd Edlinger  

* typed-splay-tree.h (typed_splay_tree::m_compare_outer_fn,
typed_splay_tree::m_delete_key_outer_fn,
typed_splay_tree::m_delete_value_outer_fn): New data members.
(typed_splay_tree::compare_inner_fn,
typed_splay_tree::delete_key_inner_fn,
typed_splay_tree::delete_value_inner_fn): New helper functions.
(typed_splay_tree::typed_splay_tree): Use splay_tree_ex_new.
* tree-dump.c (dump_node): Use splay_tree_delete_pointers.

c-family:
2017-12-15  Bernd Edlinger  

* c-lex.c (get_fileinfo): Use splay_tree_compare_strings and
splay_tree_delete_pointers.

cp:
2017-12-15  Bernd Edlinger  

* decl2.c (start_static_storage_duration_function): Use
splay_tree_delete_pointers.
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	(revision 255661)
+++ gcc/c-family/c-lex.c	(working copy)
@@ -101,11 +101,9 @@ get_fileinfo (const char *name)
   struct c_fileinfo *fi;
 
   if (!file_info_tree)
-file_info_tree = splay_tree_new ((splay_tree_compare_fn)
- (void (*) (void)) strcmp,
+file_info_tree = splay_tree_new (splay_tree_compare_strings,
  0,
- (splay_tree_delete_value_fn)
- (void (*) (void)) free);
+ splay_tree_delete_pointers);
 
   n = splay_tree_lookup (file_info_tree, (splay_tree_key) name);
   if (n)
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c	(revision 255661)
+++ gcc/cp/decl2.c	(working copy)
@@ -3558,8 +3558,7 @@ start_static_storage_duration_function (unsigned c
   priority_info_map = splay_tree_new (splay_tree_compare_ints,
 	  /*delete_key_fn=*/0,
 	  /*delete_value_fn=*/
-	  (splay_tree_delete_value_fn)
-	  (void (*) (void)) free);
+	  splay_tree_delete_pointers);
 
   /* We always need to generate functions for the
 	 DEFAULT_INIT_PRIORITY so enter it now.  That way when we walk
Index: gcc/tree-dump.c
===
--- gcc/tree-dump.c	(revision 255661)
+++ gcc/tree-dump.c	(working copy)
@@ -736,8 +736,7 @@ dump_node (const_tree t, dump_flags_t flags, FILE
   di.flags = flags;
   di.node = t;
   di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
-			 (splay_tree_delete_value_fn)
-			 (void (*) (void)) free);
+			 splay_tree_delete_pointers);
 
   /* Queue up the first node.  */
   queue (&di, t, DUMP_NONE);
Index: gcc/typed-splay-tree.h
===
--- gcc/typed-splay-tree.h	(revision 255661)
+++ gcc/typed-splay-tree.h	(working copy)
@@ -63,7 +63,30 @@ class typed_splay_tree
 
   static value_type node_to_value (splay_tree_node node);
 
- private:
+  compare_fn m_compare_outer_fn;
+  static int compare_inner_fn (splay_tree_key k1, splay_tree_key k2,
+			   void *user_data)
+  {
+typed_splay_tree *myself = (typed_splay_tree *) user_data;
+return myself->m_compare_outer_fn ((key_type) k1, (key_type) k2);
+  }
+
+  delete_key_fn m_delete_key_outer_fn;
+  static void delete_key_inner_fn (splay_tree_key key, void *user_data)
+  {
+typed_splay_tree

Re: [patch] More robust fix for PR target/66488

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 10:38 AM, Eric Botcazou  wrote:
> Hi,
>
> this PR was about the blow-up of the garbage collector on x86_64-w64-mingw32
> when more than 3 GB are allocated.  The fix was to set HOST_BITS_PER_PTR to
> the appropriate value (64) in config/i386/xm-mingw32.h.
>
> This means that the same issue can happen on other P64 hosts so the attached
> patch replaces the fix by a more robust variant.  And I'm proposing that it be
> installed on all active branches (the original fix is not on the 6 branch).
>
> Tested on x86_64-w64-mingw32 (6 branch) and x86_64-suse-linux (mainline), OK?

Ok.

Thanks,
Richard.

>
> 2017-12-15  Eric Botcazou  
>
> PR target/66488
> * ggc-page.c (HOST_BITS_PER_PTR): Do not define here...
> * hwint.h (HOST_BITS_PER_PTR): ...but here instead.
> * config/i386/xm-mingw32.h (HOST_BITS_PER_PTR): Delete.
>
> --
> Eric Botcazou


Re: [PATCH] Avoid excessive function type casts with splay-trees

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 10:44:54AM +, Bernd Edlinger wrote:
> when working on the -Wcast-function-type patch I noticed some rather
> ugly and non-portable function type casts that are necessary to accomplish
> some actually very simple tasks.
> 
> Often functions taking pointer arguments are called with a different signature
> taking uintptr_t arguments, which is IMHO not really safe to do...
> 
> The attached patch adds a context argument to the callback functions but
> keeps the existing interface as far as possible.

Just formatting nits, not full review:

> +  return strcmp ((char*) k1, (char*) k2);

char * instead of char*, please.

> +void
> +splay_tree_delete_key_wrapper (splay_tree_key key, void *fn)
> +{
> +  splay_tree_delete_key_fn delete_key = (splay_tree_delete_key_fn) (uintptr_t) fn;

Too long line, should be:
  splay_tree_delete_key_fn delete_key
= (splay_tree_delete_key_fn) (uintptr_t) fn;

> +void
> +splay_tree_delete_value_wrapper (splay_tree_value value, void *fn)
> +{
> +  splay_tree_delete_value_fn delete_value = (splay_tree_delete_value_fn) (uintptr_t) fn;

Ditto.

Jakub


Re: [PATCH] Fix PR83388

2017-12-15 Thread Richard Biener
On Fri, 15 Dec 2017, Jakub Jelinek wrote:

> On Fri, Dec 15, 2017 at 11:08:43AM +0100, Richard Biener wrote:
> > --- gcc/internal-fn.def (revision 255678)
> > +++ gcc/internal-fn.def (working copy)
> > @@ -254,6 +254,9 @@ DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF
> >  /* Divmod function.  */
> >  DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
> >  
> > +/* A NOP function with aribtrary arguments and return value.  */
> 
> arbitrary
> 
> > +static void
> > +expand_NOP (internal_fn, gcall *call_stmt)
> > +{
> > +  /* Nothing.  But it shouldn't really prevail.  */
> 
> It could with -O0 (but who would use -flto -O0) or with -fno-tree-dce
> -fno-tree-whateverelse.

Yeah, with the -O0 variant of the testcase it does prevail and then
we expand it to nothing.

Richard.


Re: [PATCH] Further improvements for the (T)(P+A)-(T)(P+B) folding (PR sanitizer/81281)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 10:28:52AM +0100, Richard Biener wrote:
> > --- gcc/match.pd.jj 2017-12-07 14:00:51.083048186 +0100
> > +++ gcc/match.pd    2017-12-07 15:17:49.132784931 +0100
> > @@ -1784,8 +1784,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  
> >/* (T)(P + A) - (T)P -> (T) A */
> >(simplify
> > -   (minus (convert (plus:c @0 @1))
> > -(convert @0))
> > +   (minus (convert (plus:c @@0 @1))
> > +(convert? @0))
> > (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
> > /* For integer types, if A has a smaller type
> >than T the result depends on the possible
> > @@ -1794,10 +1794,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >However, if an overflow in P + A would cause
> >undefined behavior, we can assume that there
> >is no overflow.  */
> > -   || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > -   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0
> > +   || (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> > +   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1
> 
> Given @1 and @@0 are in the same plus this change isn't technically
> necessary but it makes it clearer which type we look at (thus ok).

My understanding is that it is necessary, because @@0 could have different
type from @0 and TREE_TYPE (@0) is the type of where @0 is used rather
than @@0.

> >  (convert @1)))
> >(simplify
> > +   (plus (convert (plus @1 INTEGER_CST@0)) INTEGER_CST@2)
> > +   (with { bool overflow;
> > +  wide_int w = wi::neg (wi::to_wide (@2), &overflow); }
> > +(if (wi::to_widest (@0) == widest_int::from (w, TYPE_SIGN (TREE_TYPE 
> > (@2)))
> > +&& (!overflow
> > +|| (INTEGRAL_TYPE_P (TREE_TYPE (@2))
> > +&& TYPE_UNSIGNED (TREE_TYPE (@2
> > +&& (element_precision (type) <= element_precision (TREE_TYPE (@1))
> > +/* For integer types, if A has a smaller type
> > +   than T the result depends on the possible
> > +   overflow in P + A.
> > +   E.g. T=size_t, A=(unsigned)429497295, P>0.
> > +   However, if an overflow in P + A would cause
> > +   undefined behavior, we can assume that there
> > +   is no overflow.  */
> > +|| (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> > +&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)
> 
> I think we don't need to worry about definedness of overflow.  All

We are talking about
(int) (x + 0x80000000U) + INT_MIN
I think you're right that we can still optimize that to (int) x.

> that matters is whether twos complement arithmetic will simplify
> the expression to (convert @1).  Specifically the possible overflow
> of the negation of @2 for the case element_precision (type) <= 
> element_precision (TREE_TYPE (@1)) shouldn't matter, likewise
> for the widening case (we'd never get the equality).
> 
> Don't we want to compare @0 and -@2 in the type of @2?  Like
> for (unsigned int)(unsigned-long-x + 0x100000005) + -5U which
> we should be able to simplify?  For the widening case that would
> work as well as far as I can see?

So, we can have several cases, the narrowing one, e.g.:
(int)(unsigned-long-long-x + 0x100000005ULL) + -5
(unsigned)(long-long-x + 0x100000005LL) + -5U
(int)(unsigned-long-long-x + 0x1fffffffbULL) + 5
(unsigned)(long-long-x + 0x1fffffffbLL) + 5U
same precision:
(int)(unsigned-x + 5U) + -5
(unsigned)(int-x + 5) + -5U
(int)(unsigned-x + -5U) + 5
(unsigned)(int-x + -5) + 5
and widening ones:
(long long)(int-x + 5) + -5LL
(unsigned long long)(int-x + 5) + -5ULL
(long long)(int-x + -5) + 5LL
(unsigned long long)(int-x + -5) + 5ULL
You mean we should effectively (though on wide_int/widest_int)
fold_unary (MINUS_EXPR, TREE_TYPE (@2), fold_convert (TREE_TYPE (@2), @0))
and compare that to @2?
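
To make the cases above concrete, here is a rough sketch in plain C (not
match.pd code; the helper names are made up for illustration, and 32-bit
int / 64-bit long long are assumed).  It evaluates the unfolded expression
with wrapping unsigned arithmetic and checks the "negate and compare in the
type of @2" condition.  The last helper illustrates why a widening
conversion over a wrapping inner addition would presumably still need the
overflow check: the constants match but the fold changes the value.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing case, e.g. (int)(ull_x + C0) + C2.  Everything is done in
   unsigned arithmetic so the demo itself never hits signed overflow.  */
static uint32_t narrow_unfolded(uint64_t x, uint64_t c0, uint32_t c2)
{
  return (uint32_t)(x + c0) + c2;   /* truncate to 32 bits, then add C2 */
}

/* Hypothetical form of the suggested test: convert C0 to the (narrow)
   type of C2, negate it there, and compare with C2.  */
static int negated_constants_match(uint64_t c0, uint32_t c2)
{
  return (uint32_t)(0u - (uint32_t)c0) == c2;
}

/* Widening case with a wrapping inner type:
   (long long)(unsigned_x + C0) + C2.  */
static int64_t widen_unfolded(uint32_t x, uint32_t c0, int64_t c2)
{
  return (int64_t)(uint32_t)(x + c0) + c2;
}
```

For example, negated_constants_match (0x100000005ULL, (uint32_t)-5) holds
and narrow_unfolded (123, 0x100000005ULL, (uint32_t)-5) gives 123 again,
whereas widen_unfolded (0xffffffffu, 5u, -5) gives -1 rather than
4294967295.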

> If you can split out this new pattern the rest is ok with honoring
> the comment below.

Ok (will need to comment out the corresponding testcase, done below).

> > -|| (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > -&& TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0
> > +(if (((element_precision (type) <= element_precision (TREE_TYPE (@1)))
> > + == (element_precision (type) <= element_precision (TREE_TYPE (@1

Yeah, this @1 above should have been @2.  Thanks for catching this.

So for now like this?

2017-12-15  Jakub Jelinek  

PR sanitizer/81281
* match.pd ((T)(P + A) - (T)P -> (T) A): Use @@0 instead of @0 and
convert? on @0 instead of convert.  Check type of @1, not @0.
((T)P - (T)(P + A) -> -(T) A): Use @@0 instead of @0 and
convert? on @0 instead of convert.  Check type of @1, not @0.
((T)(P + A) - (T)(P + B) -> (T)A - (T)B): Use @@0 instead of @0,
only optimize if either both @1 and @2 types are narrower
precision, or both are wider or equal precision, and in the former
case only if both have undefined overflow.

* gcc.dg/pr81281-3.c: New test.

--- gcc/match.pd.jj 2017-12-07 18:04:54.580750329 +0100
+++ gcc/match.pd  

Re: [PATCH] Further improvements for the (T)(P+A)-(T)(P+B) folding (PR sanitizer/81281)

2017-12-15 Thread Richard Biener
On Fri, 15 Dec 2017, Jakub Jelinek wrote:

> On Fri, Dec 15, 2017 at 10:28:52AM +0100, Richard Biener wrote:
> > > --- gcc/match.pd.jj   2017-12-07 14:00:51.083048186 +0100
> > > +++ gcc/match.pd  2017-12-07 15:17:49.132784931 +0100
> > > @@ -1784,8 +1784,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  
> > >/* (T)(P + A) - (T)P -> (T) A */
> > >(simplify
> > > -   (minus (convert (plus:c @0 @1))
> > > -(convert @0))
> > > +   (minus (convert (plus:c @@0 @1))
> > > +(convert? @0))
> > > (if (element_precision (type) <= element_precision (TREE_TYPE (@1))
> > >   /* For integer types, if A has a smaller type
> > >  than T the result depends on the possible
> > > @@ -1794,10 +1794,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  However, if an overflow in P + A would cause
> > >  undefined behavior, we can assume that there
> > >  is no overflow.  */
> > > - || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > - && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0
> > > + || (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1
> > 
> > Given @1 and @@0 are in the same plus this change isn't technically
> > necessary but it makes it clearer which type we look at (thus ok).
> 
> My understanding is that it is necessary, because @@0 could have different
> type from @0 and TREE_TYPE (@0) is the type of where @0 is used rather
> than @@0.

Using @@0 also guarantees you to get this specific operand when
later referring to it via @0.

> > >  (convert @1)))
> > >(simplify
> > > +   (plus (convert (plus @1 INTEGER_CST@0)) INTEGER_CST@2)
> > > +   (with { bool overflow;
> > > +wide_int w = wi::neg (wi::to_wide (@2), &overflow); }
> > > +(if (wi::to_widest (@0) == widest_int::from (w, TYPE_SIGN (TREE_TYPE 
> > > (@2)))
> > > +  && (!overflow
> > > +  || (INTEGRAL_TYPE_P (TREE_TYPE (@2))
> > > +  && TYPE_UNSIGNED (TREE_TYPE (@2
> > > +  && (element_precision (type) <= element_precision (TREE_TYPE (@1))
> > > +  /* For integer types, if A has a smaller type
> > > + than T the result depends on the possible
> > > + overflow in P + A.
> > > + E.g. T=size_t, A=(unsigned)429497295, P>0.
> > > + However, if an overflow in P + A would cause
> > > + undefined behavior, we can assume that there
> > > + is no overflow.  */
> > > +  || (INTEGRAL_TYPE_P (TREE_TYPE (@1))
> > > +  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)
> > 
> > I think we don't need to worry about definedness of overflow.  All
> 
> We are talking about
> (int) (x + 0x80000000U) + INT_MIN
> I think you're right that we can still optimize that to (int) x.
> 
> > that matters is whether twos complement arithmetic will simplify
> > the expression to (convert @1).  Specifically the possible overflow
> > of the negation of @2 for the case element_precision (type) <= 
> > element_precision (TREE_TYPE (@1)) shouldn't matter, likewise
> > for the widening case (we'd never get the equality).
> > 
> > Don't we want to compare @0 and -@2 in the type of @2?  Like
> > for (unsigned int)(unsigned-long-x + 0x100000005) + -5U which
> > we should be able to simplify?  For the widening case that would
> > work as well as far as I can see?
> 
> So, we can have several cases, the narrowing one, e.g.:
> (int)(unsigned-long-long-x + 0x100000005ULL) + -5
> (unsigned)(long-long-x + 0x100000005LL) + -5U
> (int)(unsigned-long-long-x + 0x1fffffffbULL) + 5
> (unsigned)(long-long-x + 0x1fffffffbLL) + 5U
> same precision:
> (int)(unsigned-x + 5U) + -5
> (unsigned)(int-x + 5) + -5U
> (int)(unsigned-x + -5U) + 5
> (unsigned)(int-x + -5) + 5
> and widening ones:
> (long long)(int-x + 5) + -5LL
> (unsigned long long)(int-x + 5) + -5ULL
> (long long)(int-x + -5) + 5LL
> (unsigned long long)(int-x + -5) + 5ULL
> You mean we should effectively (though on wide_int/widest_int)
> fold_unary (MINUS_EXPR, TREE_TYPE (@2), fold_convert (TREE_TYPE (@2), @0))
> and compare that to @2?

I think so.

> > If you can split out this new pattern the rest is ok with honoring
> > the comment below.
> 
> Ok (will need to comment out the corresponding testcase, done below).
> 
> > > -  || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > -  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0
> > > +(if (((element_precision (type) <= element_precision (TREE_TYPE 
> > > (@1)))
> > > +   == (element_precision (type) <= element_precision (TREE_TYPE (@1
> 
> Yeah, this @1 above should have been @2.  Thanks for catching this.
> 
> So for now like this?

Yes.

Thanks,
Richard.

> 2017-12-15  Jakub Jelinek  
> 
>   PR sanitizer/81281
>   * match.pd ((T)(P + A) - (T)P -> (T) A): Use @@0 instead of @0 and
>   convert? on @0 instead of convert.  Check type of @1, not @0.
>   ((T)P - (T)(P + A) -> -(T) A): Use @@0 instead of @0 and
>   convert? on @0 instead of convert.  Check type of @1, not @0.
>   ((T)(P + A) - (T)(P + B) -> (T)A - (T)B): Use @@

[Ada] Spurious warning on default initialized object

2017-12-15 Thread Pierre-Marie de Rodat
This patch updates the implications that pragma Default_Initial_Condition has
on full default initialization of objects and types. According to the SPARK RM,
the pragma may appear without an expression

   7.3.3 The aspect_definition may be omitted; this is semantically equivalent
 to specifying a static Boolean_expression having the value True.

which also satisfies the notion of "full default initialization" in SPARK

   3.1   A type is said to define full default initialization if it is

* a private type whose Default_Initial_Condition aspect is
  specified to be a Boolean_expression.

The end result is that an object is now considered fully default initialized
for warning purposes. Prior to this patch, the compiler would warn on a read
of an object when

   * The object has default initialization
   * The object type carries pragma Default_Initial_Condition without an
 expression
   * No value is provided in between the object declaration and read

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Hristian Kirtchev  

* exp_util.adb (Add_Own_DIC): Ensure that the expression of the pragma
is available.
(Is_Verifiable_DIC_Pragma): Moved from Sem_Util.
* sem_util.adb (Has_Full_Default_Initialization):
Has_Fully_Default_Initializing_DIC_Pragma is now used to determine
whether a type has full default initialization due to pragma
Default_Initial_Condition.
(Has_Fully_Default_Initializing_DIC_Pragma): New routine.
(Is_Verifiable_DIC_Pragma): Moved to Exp_Util.
* sem_util.ads (Has_Fully_Default_Initializing_DIC_Pragma): New
routine.
(Is_Verifiable_DIC_Pragma): Moved to Exp_Util.
* sem_warn.adb (Is_OK_Fully_Initialized):
Has_Fully_Default_Initializing_DIC_Pragma is now used to determine
whether a type has full default initialization due to pragma
Default_Initial_Condition.

gcc/testsuite/

2017-12-15  Hristian Kirtchev  

* gnat.dg/dflt_init_cond.adb, gnat.dg/dflt_init_cond_pkg.ads: New
testcase.
Index: exp_util.adb
===
--- exp_util.adb(revision 255683)
+++ exp_util.adb(working copy)
@@ -165,6 +165,10 @@
--  Force evaluation of bounds of a slice, which may be given by a range
--  or by a subtype indication with or without a constraint.
 
+   function Is_Verifiable_DIC_Pragma (Prag : Node_Id) return Boolean;
+   --  Determine whether pragma Default_Initial_Condition denoted by Prag has
+   --  an assertion expression that should be verified at run time.
+
function Make_CW_Equivalent_Type
  (T : Entity_Id;
   E : Node_Id) return Entity_Id;
@@ -1500,6 +1504,7 @@
   --  Start of processing for Add_Own_DIC
 
   begin
+ pragma Assert (Present (DIC_Expr));
  Expr := New_Copy_Tree (DIC_Expr);
 
  --  Perform the following substitution:
@@ -1733,8 +1738,6 @@
  --  Produce an empty completing body in the following cases:
  --* Assertions are disabled
  --* The DIC Assertion_Policy is Ignore
- --* Pragma DIC appears without an argument
- --* Pragma DIC appears with argument "null"
 
  if No (Stmts) then
 Stmts := New_List (Make_Null_Statement (Loc));
@@ -8715,6 +8718,21 @@
   and then Is_Itype (Full_Typ);
end Is_Untagged_Private_Derivation;
 
+   --
+   -- Is_Verifiable_DIC_Pragma --
+   --
+
+   function Is_Verifiable_DIC_Pragma (Prag : Node_Id) return Boolean is
+  Args : constant List_Id := Pragma_Argument_Associations (Prag);
+
+   begin
+  --  To qualify as verifiable, a DIC pragma must have a non-null argument
+
+  return
+Present (Args)
+  and then Nkind (Get_Pragma_Arg (First (Args))) /= N_Null;
+   end Is_Verifiable_DIC_Pragma;
+
---
-- Is_Volatile_Reference --
---
Index: sem_util.adb
===
--- sem_util.adb(revision 255680)
+++ sem_util.adb(working copy)
@@ -10384,19 +10384,16 @@
 
function Has_Full_Default_Initialization (Typ : Entity_Id) return Boolean is
   Comp : Entity_Id;
-  Prag : Node_Id;
 
begin
-  --  A type subject to pragma Default_Initial_Condition is fully default
-  --  initialized when the pragma appears with a non-null argument. Since
-  --  any type may act as the full view of a private type, this check must
-  --  be performed prior to the specialized tests below.
+  --  A type subject to pragma Default_Initial_Condition may be fully
+  --  default initialized depending on inheritance and the argument of
+  --  the pragma. Since any type may act as the full view of a private
+  --  type, this check must be performed prior t

[Ada] Optimizing allocators for arrays with non-static upper bound

2017-12-15 Thread Pierre-Marie de Rodat
This patch extends the optimization of allocators for arrays of non-controlled
components, when the qualified expression for the aggregate has an
unconstrained type and the upper bound of the aggregate is non-static. In this
case it is safe to build the array in the allocated object, instead of first
creating a temporary for the aggregate, then allocating the object, and then
assigning the temporary to the object, as mandated by the dynamic semantics
of initialized allocators. This optimization is particularly useful when the
size of the aggregate may be too large to be built on the stack.

Executing the following:

   gnatmake -q foo
   ./foo

must yield:

   10000000

---
with Text_IO; use Text_IO;
procedure Foo is

   type Record_Type is record
  I : Integer;
   end record;

   type Array_Type is array (Positive range <>) of Record_Type;
   type Array_Access is access all Array_Type;

   function Get_Last return Integer is
   begin
  return 10_000_000;
   end Get_Last;

   A : Array_Access := new Array_Type'(1 .. Get_Last => (I => 0));
begin
   Put_Line (Integer'Image (A'Length));
end Foo;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Ed Schonberg  

* exp_aggr.adb (In_Place_Assign_OK): Extend the predicate to recognize
an array aggregate in an allocator, when the designated type is
unconstrained and the upper bound of the aggregate belongs to the base
type of the index.

Index: exp_aggr.adb
===
--- exp_aggr.adb(revision 255678)
+++ exp_aggr.adb(working copy)
@@ -5537,13 +5537,29 @@
Get_Index_Bounds (Obj_In, Obj_Lo, Obj_Hi);
 
if not Compile_Time_Known_Value (Aggr_Lo)
- or else not Compile_Time_Known_Value (Aggr_Hi)
  or else not Compile_Time_Known_Value (Obj_Lo)
  or else not Compile_Time_Known_Value (Obj_Hi)
  or else Expr_Value (Aggr_Lo) /= Expr_Value (Obj_Lo)
- or else Expr_Value (Aggr_Hi) /= Expr_Value (Obj_Hi)
then
   return False;
+
+   --  For an assignment statement we require static matching
+   --  of bounds. Ditto for an allocator whose qualified
+   --  expression is a constrained type. If the expression in
+   --  the allocator is an unconstrained array, we accept an
+   --  upper bound that is not static, to allow for non-static
+   --  expressions of the base type. Clearly there are further
+   --  possibilities (with diminishing returns) for safely
+   --  building arrays in place here.
+
+   elsif Nkind (Parent (N)) = N_Assignment_Statement
+ or else Is_Constrained (Etype (Parent (N)))
+   then
+  if not Compile_Time_Known_Value (Aggr_Hi)
+   or else Expr_Value (Aggr_Hi) /= Expr_Value (Obj_Hi)
+  then
+ return False;
+  end if;
end if;
 
Next_Index (Aggr_In);


[Ada] Reject certain constants as constituents

2017-12-15 Thread Pierre-Marie de Rodat
This patch updates the analysis of pragma Refined_State to reject constants
which are used as refinement constituents and are either

   * Part of the visible state of a package

   * Part of the hidden state of a package, and lack indicator Part_Of.


-- Source --


--  var.ads

package Var
  with SPARK_Mode,
   Initializes => Input
is
   Input : Integer := 0;
end Var;

--  pack.ads

with Var;

package Pack
  with SPARK_Mode,
   Abstract_State => State
is
   procedure Force_Body;

private
   Const_1 : constant Integer := Var.Input;
   Const_2 : constant Integer := 2 with Part_Of => State;

   Var_1 : Integer := 1;
   Var_2 : Integer := 2 with Part_Of => State;

   package Priv_Pack is
  Const_3 : constant Integer := Var.Input;
  Const_4 : constant Integer := 4 with Part_Of => State;

  Var_3 : Integer := 3;
  Var_4 : Integer := 4 with Part_Of => State;
   end Priv_Pack;
end Pack;

--  pack.adb

package body Pack
  with SPARK_Mode,
   Refined_State =>
 (State =>
   (Const_1, --  Error
Const_2, --  OK
Var_1,   --  Error
Var_2,   --  OK
Priv_Pack.Const_3,   --  Error
Priv_Pack.Const_4,   --  OK
Priv_Pack.Var_3, --  Error
Priv_Pack.Var_4, --  OK
Const_5, --  OK
Const_6, --  OK
Body_Pack.Const_7,   --  OK
Body_Pack.Const_8))  --  OK
is
   Const_5 : constant Integer := Var.Input;
   Const_6 : constant Integer := 6;

   package Body_Pack is
  Const_7 : constant Integer := Var.Input;
  Const_8 : constant Integer := 8;
   end Body_Pack;

   procedure Force_Body is begin null; end Force_Body;
end Pack;


-- Compilation and output --


$ gcc -c -gnatf pack.adb
pack.adb:5:13: cannot use "Const_1" in refinement, constituent is not a hidden
  state of package "Pack"
pack.adb:7:13: cannot use "Var_1" in refinement, constituent is not a hidden
  state of package "Pack"
pack.adb:9:22: cannot use "Const_3" in refinement, constituent is not a hidden
  state of package "Pack"
pack.adb:11:22: cannot use "Var_3" in refinement, constituent is not a hidden
  state of package "Pack"
pack.ads:13:04: indicator Part_Of is required in this context (SPARK RM
  7.2.6(2))
pack.ads:13:04: "Var_1" is declared in the private part of package "Pack"
pack.ads:20:07: indicator Part_Of is required in this context (SPARK RM
  7.2.6(2))
pack.ads:20:07: "Var_3" is declared in the private part of package "Pack"

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Hristian Kirtchev  

* sem_prag.adb (Match_Constituent): Do not quietly accept constants as
suitable constituents.
* exp_util.adb: Minor reformatting.

Index: exp_util.adb
===
--- exp_util.adb(revision 255683)
+++ exp_util.adb(working copy)
@@ -165,6 +165,10 @@
--  Force evaluation of bounds of a slice, which may be given by a range
--  or by a subtype indication with or without a constraint.
 
+   function Is_Verifiable_DIC_Pragma (Prag : Node_Id) return Boolean;
+   --  Determine whether pragma Default_Initial_Condition denoted by Prag has
+   --  an assertion expression that should be verified at run time.
+
function Make_CW_Equivalent_Type
  (T : Entity_Id;
   E : Node_Id) return Entity_Id;
@@ -1500,6 +1504,7 @@
   --  Start of processing for Add_Own_DIC
 
   begin
+ pragma Assert (Present (DIC_Expr));
  Expr := New_Copy_Tree (DIC_Expr);
 
  --  Perform the following substitution:
@@ -1733,8 +1738,6 @@
  --  Produce an empty completing body in the following cases:
  --* Assertions are disabled
  --* The DIC Assertion_Policy is Ignore
- --* Pragma DIC appears without an argument
- --* Pragma DIC appears with argument "null"
 
  if No (Stmts) then
 Stmts := New_List (Make_Null_Statement (Loc));
@@ -8715,6 +8718,21 @@
   and then Is_Itype (Full_Typ);
end Is_Untagged_Private_Derivation;
 
+   --
+   -- Is_Verifiable_DIC_Pragma --
+   --
+
+   function Is_Verifiable_DIC_Pragma (Prag : Node_Id) return Boolean is
+  Args : constant List_Id := Pragma_Argument_Associations (Prag);
+
+   begin
+  --  To qualify as verifia

[PATCH, PR83327] Fix liveness analysis in lra for spilled-into hard regs

2017-12-15 Thread Tom de Vries

[ was: Re: patch to fix PR82353 ]

On 12/14/2017 06:01 PM, Vladimir Makarov wrote:



On 12/13/2017 07:34 AM, Tom de Vries wrote:

On 10/16/2017 10:38 PM, Vladimir Makarov wrote:

This is another version of the patch to fix

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82353

The patch was successfully bootstrapped on x86-64 with Go and Ada.

Committed as rev. 253796.


Hi Vladimir,

AFAIU this bit of the patch makes sure that the flags register shows up 
in the bb_livein of the bb in which it's used (and not defined before 
the use), but not in the bb_liveout of the predecessors of that bb.


I wonder if that's a compile-speed optimization, or an oversight.

Hi, Tom.  It was just a minimal fix.  I prefer minimal fixes for LRA 
because even for me it is hard to predict in many cases how the patch 
will affect all the targets.  Therefore many LRA patches have a few 
iterations before becoming final.




I see, thanks for the explanation.

I remember that I had some serious problems in the past when I tried to 
implement fixed hard reg liveness propagation in LRA.  It was long ago 
so we could try it again.  If you send patch you mentioned to gcc 
mailing list, I'll review and approve it.


Here it is. It applies cleanly to trunk (and to gcc-7-branch if you 
first backport r253796, the fix for PR82353).


I have not tested this on trunk so far, only on the internal branch for 
gcn based on gcc 7.1 with the gcc testsuite, where it fixes a wrong-code 
bug in gcc.dg/vect/no-scevccp-outer-10.c and causes no regressions.


--

The problem on a minimal version of the test-case looks as follows:

I.

At ira, we have a def and use of 'reg:BI 605', the def in bb2 and the 
use in bb3:

...
(note 44 32 33 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

   ...

(insn 269 54 64 2 (set (reg:BI 605)
(le:BI (reg/v:SI 491 [ n ])
(const_int 0 [0]))) 23 {cstoresi4}
 (nil))

   

(code_label 250 228 56 3 7 (nil) [1 uses])
(note 56 250 58 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

   ...

(jump_insn 62 60 63 3 (set (pc)
(if_then_else (ne:BI (reg:BI 605)
(const_int 0 [0]))
(label_ref 242)
(pc))) "no-scevccp-outer-10.c":19 21 {cjump}
 (int_list:REG_BR_PROB 1500 (nil))
 -> 242)
(note 63 62 66 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
...

And in lra, we decide to spill it into a hard register:
...
  Spill r605 into hr95
...

Resulting in this code:
...
(insn 385 386 64 2 (set (reg:BI 95 s95)
(reg:BI 18 s18 [605])) 3 {*movbi}
 (nil))

  ...

(insn 404 60 405 3 (set (reg:BI 18 s18 [605])
(reg:BI 95 s95)) "no-scevccp-outer-10.c":19 3 {*movbi}
 (nil))
...


II.

However, a bit later in lra we decide to assign r94,r95 to DImode pseudo 
833:

...
   Assign 94 to reload r833 (freq=60)
...

Resulting in this code:
...
(insn 629 378 390 2 (set (reg:DI 94 s94 [833])
(plus:DI (reg/f:DI 16 s16)
(const_int -8 [0xfffffffffffffff8]))) 35 {addptrdi3}
 (nil))
...

This clobbers the def of s95 in insn 385.


III.

Consequently, the insn is removed in the dce in the jump pass:
...
DCE: Deleting insn 385
deleting insn with uid = 385.
...

--

Analysis:

The decision to assign r94,r95 to DImode pseudo 833 happens because r95 
is not marked in the conflict_hard_regs of lra_reg_info[833] during 
liveness analysis.


There's code in make_hard_regno_born to set the reg in the 
conflict_hard_regs of live pseudos, but at the point that r95 becomes 
live, the r833 pseudo is not live.


Then there's code in mark_pseudo_live to set the hard_regs_live in the 
conflict_hard_regs of the pseudo, but at the point that r833 becomes 
live, r95 is not set in hard_regs_live, due to the fact that it's not 
set in df_get_live_out (bb2).


In other words, the root cause is that hard reg liveness propagation is 
not done.


--

Proposed Solution:

The patch addresses the problem, by:
- marking the hard regs that have been used in lra_spill in
  hard_regs_spilled_into
- using hard_regs_spilled_into in lra_create_live_ranges to
  make sure those registers are marked in the conflict_hard_regs
  of pseudos that overlap with the spill register usage
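
As an illustration only (a toy model, not LRA code: the set type, the
register count and the helper names are all invented here), the effect of
the missing conflict and of the proposed marking can be sketched like this.
The pseudo needs the pair s94,s95; it only sees s95 as a conflict once the
spilled-into register is folded into the set of hard regs live at the point
where the pseudo is born.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_HARD_REGS 128   /* toy size, not a real target's register count */

typedef struct { bool live[N_HARD_REGS]; } hard_reg_set;

/* True if none of the NREGS registers starting at FIRST is a conflict.  */
static bool can_assign(const hard_reg_set *conflicts, int first, int nregs)
{
  for (int r = first; r < first + nregs; r++)
    if (conflicts->live[r])
      return false;
  return true;
}

/* Model of the scenario: s95 carries the spilled value from bb2 into bb3,
   and a DImode pseudo born in bb2 wants the pair s94,s95.  */
static bool dimode_pseudo_may_use_s94(bool mark_spilled_into)
{
  hard_reg_set spilled_into, live_at_birth;
  memset(&spilled_into, 0, sizeof spilled_into);
  memset(&live_at_birth, 0, sizeof live_at_birth);

  spilled_into.live[95] = true;

  /* Without propagation s95 is absent from the live-out of bb2, so it is
     absent from the hard regs live when the pseudo becomes live.  */
  if (mark_spilled_into)
    for (int r = 0; r < N_HARD_REGS; r++)
      live_at_birth.live[r] |= spilled_into.live[r];

  /* The pseudo conflicts with whatever hard regs are live at its birth.  */
  return can_assign(&live_at_birth, 94, 2);
}
```

With mark_spilled_into false the pair looks free (the wrong-code case
described above); with it true the assignment is rejected.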

[ I've also tried an approach where I didn't use hard_regs_spilled_into, 
but tried to propagate all hard regs. I figured out that I needed to 
mask out eliminable_regset.  Also I needed to mask out 
lra_no_alloc_regs, but that could be due to gcn-specific problems 
(pointers take 2 hard regs), I'm not yet sure. Anyway, in the submitted 
patch I tried to avoid these problems and went for the more minimal 
approach. ]


In order to get the patch accepted for trunk, I think we need:
- bootstrap and reg-test on x86_64
- build and reg-test on mips (the only primary platform that has the
  spill_class hook enabled)

Any comm

[PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-15 Thread Bin Cheng
Hi,
As explained in the PR, given the test case below:
int a[8][10] = { [2][5] = 4 }, c;

int
main ()
{
  short b;
  int i, d;
  for (b = 4; b >= 0; b--)
    for (c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
  for (i = 0; i < 8; i++)
    for (d = 0; d < 10; d++)
      if (a[i][d] != (i == 3 && d == 6) * 4)
        __builtin_abort ();
  return 0;
}

the loop nest is illegal for vectorization without reversing the inner loop.  The
issue is in the data dependence checking of the vectorizer; I believe the mentioned
revision just exposed it.  Previously the vectorization was skipped because of an
unsupported memory operation.  The outer loop vectorization unrolls the outer loop
into:

  for (b = 4; b > 0; b -= 4)
    {
      for (c = 0; c <= 6; c++)
        a[c + 1][6] = a[c][5];
      for (c = 0; c <= 6; c++)
        a[c + 1][5] = a[c][4];
      for (c = 0; c <= 6; c++)
        a[c + 1][4] = a[c][3];
      for (c = 0; c <= 6; c++)
        a[c + 1][3] = a[c][2];
    }
Then the four inner loops are fused into:
  for (b = 4; b > 0; b -= 4)
    {
      for (c = 0; c <= 6; c++)
        {
          a[c + 1][6] = a[c][5];  // S1
          a[c + 1][5] = a[c][4];  // S2
          a[c + 1][4] = a[c][3];
          a[c + 1][3] = a[c][2];
        }
    }
The loop fusion needs to meet the dependence requirement.  Basically, GCC's data
dependence analyzer does not model dependences between references in sibling loops,
but in practice the fusion requirement can be checked by analyzing all data
references after fusion and verifying that there is no backward data dependence.

Apparently, the requirement is violated because we have backward data dependence
between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the inner
loop, the outer loop would become legal for vectorization.
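
For illustration, the effect can be reproduced with a small scalar model (a
sketch only: the function names are made up, and the "vectorized" version
simply performs the four outer-loop lanes b = 4..1 per inner iteration,
followed by a scalar epilogue for b = 0, which is an assumption about the
shape of the generated code rather than what the vectorizer literally
emits).

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 8, COLS = 10 };

static void init(int a[ROWS][COLS])
{
  memset(a, 0, sizeof(int) * ROWS * COLS);
  a[2][5] = 4;
}

/* Original execution order: b is the outer loop, c the inner loop.
   REVERSE_INNER walks c from 6 down to 0 instead of 0 up to 6.  */
static void run_scalar(int a[ROWS][COLS], int reverse_inner)
{
  for (int b = 4; b >= 0; b--)
    for (int k = 0; k <= 6; k++)
      {
        int c = reverse_inner ? 6 - k : k;
        a[c + 1][b + 2] = a[c][b + 1];
      }
}

/* Model of outer-loop vectorization with four lanes (b = 4..1): for each
   c, load all lanes, then store all lanes.  b = 0 is a scalar epilogue.  */
static void run_outer_vectorized(int a[ROWS][COLS], int reverse_inner)
{
  for (int k = 0; k <= 6; k++)
    {
      int c = reverse_inner ? 6 - k : k;
      int lane[4];
      for (int b = 4; b >= 1; b--)
        lane[4 - b] = a[c][b + 1];
      for (int b = 4; b >= 1; b--)
        a[c + 1][b + 2] = lane[4 - b];
    }
  for (int k = 0; k <= 6; k++)
    {
      int c = reverse_inner ? 6 - k : k;
      a[c + 1][2] = a[c][1];
    }
}

/* Nonzero if both execution orders leave the array in the same state.  */
static int same_result(int reverse_inner)
{
  int s[ROWS][COLS], v[ROWS][COLS];
  init(s);
  init(v);
  run_scalar(s, reverse_inner);
  run_outer_vectorized(v, reverse_inner);
  return memcmp(s, v, sizeof s) == 0;
}
```

In this model same_result (0) is false, because S2 at c == 1 overwrites
a[2][5] before S1 at c == 2 reads it, while same_result (1) is true for the
reversed inner loop.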

This patch fixes the issue by enforcing the dependence check.  It also adds two
tests, one that shouldn't be vectorized and one that should.  Bootstrapped and
tested on x86_64 and AArch64.  Is it OK?

Thanks,
bin
2017-12-15  Bin Cheng  

PR tree-optimization/81740
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): In case
of outer loop vectorization, check backward dependence at inner loop
if dependence at outer loop is reversed.

gcc/testsuite
2017-12-15  Bin Cheng  

PR tree-optimization/81740
* gcc.dg/vect/pr81740-1.c: New test.
* gcc.dg/vect/pr81740-2.c: Refine test.

From c0c8cfae08c0bde2cec41a8d3abcbfea0bd2e211 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 14 Dec 2017 15:32:02 +
Subject: [PATCH] pr81740-20171212.txt

---
 gcc/testsuite/gcc.dg/vect/pr81740-1.c | 17 +
 gcc/testsuite/gcc.dg/vect/pr81740-2.c | 21 +
 gcc/tree-vect-data-refs.c | 11 +++
 3 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr81740-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr81740-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-1.c 
b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
new file mode 100644
index 0000000..d90aba5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr81740-1.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+int a[8][10] = { [2][5] = 4 }, c;
+
+int
+main ()
+{
+  short b;
+  int i, d;
+  for (b = 4; b >= 0; b--)
+for (c = 0; c <= 6; c++)
+  a[c + 1][b + 2] = a[c][b + 1];
+  for (i = 0; i < 8; i++)
+for (d = 0; d < 10; d++)
+  if (a[i][d] != (i == 3 && d == 6) * 4)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr81740-2.c 
b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
new file mode 100644
index 0000000..fb5b300
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr81740-2.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_int } */
+
+int a[8][10] = { [2][5] = 4 }, c;
+
+int
+main ()
+{
+  short b;
+  int i, d;
+  for (b = 4; b >= 0; b--)
+for (c = 6; c >= 0; c--)
+  a[c + 1][b + 2] = a[c][b + 1];
+  for (i = 0; i < 8; i++)
+for (d = 0; d < 10; d++)
+  if (a[i][d] != (i == 3 && d == 6) * 4)
+__builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect"  } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 996d156..3b780cf1 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -435,6 +435,17 @@ vect_analyze_data_ref_dependence (struct 
data_dependence_relation *ddr,
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "dependence distance negative.\n");
+ /* When doing outer loop vectorization, we need to check if there is
+backward dependence at inner loop level if dependence at the outer
+loop is reversed.  See PR81740 for more information.  */
+ if (nested_in_vect_loop_p (loop, DR_STMT (dra))
+ || nested_in_vect_loop_p (loop, DR_STMT (drb)))
+   {
+ unsigned inner_depth = index_in_loop_nest (loop->inner->num,
+ 

[Ada] Crash on expression function and discriminant-dependent component

2017-12-15 Thread Pierre-Marie de Rodat
This patch fixes a crash on an expression function that is a completion, when
the return expression includes a reference to a discriminant-dependent
component. An expression function that is a completion freezes all types
referenced in the expression, but some itypes are excluded because they are
frozen elsewhere (in the case of a discriminant-dependent component, when the
type itself is frozen).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Ed Schonberg  

* sem_ch6.adb (Freeze_Expr_Types): Do not emit a freeze node for
an itype that is the type of a discriminant-dependent component.

Fixes QC04-017.

gcc/testsuite/

2017-12-15  Ed Schonberg  

* gnat.dg/expr_func2.ads, gnat.dg/expr_func2.adb: New testcase.
Index: sem_ch6.adb
===
--- sem_ch6.adb (revision 255683)
+++ sem_ch6.adb (working copy)
@@ -366,10 +366,13 @@
 
 procedure Check_And_Freeze_Type (Typ : Entity_Id) is
 begin
-   --  Skip Itypes created by the preanalysis
+   --  Skip Itypes created by the preanalysis, and itypes
+   --  whose scope is another type (i.e. component subtypes
+   --  that depend on a discriminant),
 
if Is_Itype (Typ)
- and then Scope_Within_Or_Same (Scope (Typ), Def_Id)
+ and then (Scope_Within_Or_Same (Scope (Typ), Def_Id)
+   or else Is_Type (Scope (Typ)))
then
   return;
end if;
Index: ../testsuite/gnat.dg/expr_func2.ads
===
--- ../testsuite/gnat.dg/expr_func2.ads (revision 0)
+++ ../testsuite/gnat.dg/expr_func2.ads (revision 0)
@@ -0,0 +1,22 @@
+package Expr_Func2 is
+
+   type T_Index is range 1 .. 255;
+
+   type T_Table is array (T_Index range <>) of Boolean;
+
+   type T_Variable_Table (N : T_Index := T_Index'First) is record
+  Table : T_Table (1 .. N);
+   end record;
+
+   type T_A_Variable_Table is access T_Variable_Table;
+
+   function Element (A_Variable_Table : T_A_Variable_Table) return Boolean;
+
+private
+
+   function Element (A_Variable_Table : T_A_Variable_Table) return Boolean is
+ (A_Variable_Table.all.Table (1));
+
+   procedure Foo;
+
+end Expr_Func2;
Index: ../testsuite/gnat.dg/expr_func2.adb
===
--- ../testsuite/gnat.dg/expr_func2.adb (revision 0)
+++ ../testsuite/gnat.dg/expr_func2.adb (revision 0)
@@ -0,0 +1,5 @@
+--  { dg-do compile }
+
+package body Expr_Func2 is
+   procedure Foo is null;
+end Expr_Func2;


[Ada] Verify Part_Of indicator in non-SPARK code

2017-12-15 Thread Pierre-Marie de Rodat
This patch modifies the analysis of Part_Of indicators to verify their
associated rules even when the indicator appears in non-SPARK code. This
prevents possible tampering with Part_Of constituents of single concurrent
types outside of SPARK code.


-- Source --


--  pack.ads

pragma Profile (Ravenscar);
pragma Partition_Elaboration_Policy (Sequential);

package Pack with SPARK_Mode is
   protected PO is
   end PO;

   X : Boolean := True with Part_Of => PO;
end Pack;

--  pack.adb

package body Pack is
   protected body PO is
   end PO;
begin
   X := not X;   --  OK
end Pack;

--  flip.adb

pragma Profile (Ravenscar);
pragma Partition_Elaboration_Policy (Sequential);

with Pack; use Pack;

procedure Flip with SPARK_Mode => Off is
begin
   X := not X;   --  Error
end Flip;


-- Compilation and output --


$ gcc -c flip.adb
$ gcc -c pack.adb
flip.adb:8:04: reference to variable "X" cannot appear in this context
flip.adb:8:04: "X" is constituent of single protected type "PO"
flip.adb:8:13: reference to variable "X" cannot appear in this context
flip.adb:8:13: "X" is constituent of single protected type "PO"

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Hristian Kirtchev  

* sem_prag.adb (Analyze_Part_Of): The context-specific portion of the
analysis is now directed to several specialized routines.
(Check_Part_Of_Abstract_State): New routine.
(Check_Part_Of_Concurrent_Type): New routine. Reimplement the checks
involving the item, the single concurrent type, and their respective
contexts.
* sem_res.adb (Resolve_Entity_Name): Potential constituents of a single
concurrent type are now recorded regardless of the SPARK mode.
* sem_util.adb (Check_Part_Of_Reference): Split some of the tests in
individual predicates.  A Part_Of reference is legal when it appears
within the statement list of the object's immediately enclosing
package.
(Is_Enclosing_Package_Body): New routine.
(Is_Internal_Declaration_Or_Body): New routine.
(Is_Single_Declaration_Or_Body): New routine.
(Is_Single_Task_Pragma): New routine.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 255685)
+++ sem_prag.adb(working copy)
@@ -3168,71 +3168,26 @@
   Encap_Id : out Entity_Id;
   Legal: out Boolean)
is
-  Encap_Typ   : Entity_Id;
-  Item_Decl   : Node_Id;
-  Pack_Id : Entity_Id;
-  Placement   : State_Space_Kind;
-  Parent_Unit : Entity_Id;
+  procedure Check_Part_Of_Abstract_State;
+  pragma Inline (Check_Part_Of_Abstract_State);
+  --  Verify the legality of indicator Part_Of when the encapsulator is an
+  --  abstract state.
 
-   begin
-  --  Assume that the indicator is illegal
+  procedure Check_Part_Of_Concurrent_Type;
+  pragma Inline (Check_Part_Of_Concurrent_Type);
+  --  Verify the legality of indicator Part_Of when the encapsulator is a
+  --  single concurrent type.
 
-  Encap_Id := Empty;
-  Legal:= False;
+  --
+  -- Check_Part_Of_Abstract_State --
+  --
 
-  if Nkind_In (Encap, N_Expanded_Name,
-  N_Identifier,
-  N_Selected_Component)
-  then
- Analyze   (Encap);
- Resolve_State (Encap);
+  procedure Check_Part_Of_Abstract_State is
+ Pack_Id : Entity_Id;
+ Placement   : State_Space_Kind;
+ Parent_Unit : Entity_Id;
 
- Encap_Id := Entity (Encap);
-
- --  The encapsulator is an abstract state
-
- if Ekind (Encap_Id) = E_Abstract_State then
-null;
-
- --  The encapsulator is a single concurrent type (SPARK RM 9.3)
-
- elsif Is_Single_Concurrent_Object (Encap_Id) then
-null;
-
- --  Otherwise the encapsulator is not a legal choice
-
- else
-SPARK_Msg_N
-  ("indicator Part_Of must denote abstract state, single "
-   & "protected type or single task type", Encap);
-return;
- end if;
-
-  --  This is a syntax error, always report
-
-  else
- Error_Msg_N
-   ("indicator Part_Of must denote abstract state, single protected "
-& "type or single task type", Encap);
- return;
-  end if;
-
-  --  Catch a case where indicator Part_Of denotes the abstract view of a
-  --  variable which appears as an abstract state (SPARK RM 10.1.2 2).
-
-  if From_Limited_With (Encap_Id)
-and then Present (Non_Limited_View (Encap_Id))
-and then Ekind (Non_Limited_View (Encap_Id)) = E_

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
> Hi,
> As explained in the PR, given below test case:
> int a[8][10] = { [2][5] = 4 }, c;
>
> int
> main ()
> {
>   short b;
>   int i, d;
>   for (b = 4; b >= 0; b--)
> for (c = 0; c <= 6; c++)
>   a[c + 1][b + 2] = a[c][b + 1];
>   for (i = 0; i < 8; i++)
> for (d = 0; d < 10; d++)
>   if (a[i][d] != (i == 3 && d == 6) * 4)
> __builtin_abort ();
>   return 0;
>
> the loop nest is illegal for vectorization without reversing inner loop.  The 
> issue
> is in data dependence checking of vectorizer, I believe the mentioned 
> revision just
> exposed this.  Previously the vectorization is skipped because of unsupported 
> memory
> operation.  The outer loop vectorization unrolls the outer loop into:
>
>   for (b = 4; b > 0; b -= 4)
>   {
> for (c = 0; c <= 6; c++)
>   a[c + 1][6] = a[c][5];
> for (c = 0; c <= 6; c++)
>   a[c + 1][5] = a[c][4];
> for (c = 0; c <= 6; c++)
>   a[c + 1][4] = a[c][3];
> for (c = 0; c <= 6; c++)
>   a[c + 1][3] = a[c][2];
>   }
> Then four inner loops are fused into:
>   for (b = 4; b > 0; b -= 4)
>   {
> for (c = 0; c <= 6; c++)
> {
>   a[c + 1][6] = a[c][5];  // S1
>   a[c + 1][5] = a[c][4];  // S2
>   a[c + 1][4] = a[c][3];
>   a[c + 1][3] = a[c][2];
> }
>   }

Note that they are not really "fused" but they are interleaved.  With
GIMPLE in mind that makes a difference, you should get the equivalent of

   for (c = 0; c <= 6; c++)
     {
       tem1 = a[c][5];
       tem2 = a[c][4];
       tem3 = a[c][3];
       tem4 = a[c][2];
       a[c+1][6] = tem1;
       a[c+1][5] = tem2;
       a[c+1][4] = tem3;
       a[c+1][3] = tem4;
     }
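
To make the difference concrete, here is a minimal, self-contained sketch
(not GCC output) that runs the original loop nest next to a hand-written
model of the interleaved form above.  The function names, the grouping of
b = 4..1 into one vector iteration, and the scalar epilogue for b == 0 are
assumptions made purely for illustration.

```c
#include <string.h>

enum { ROWS = 8, COLS = 10 };

/* Reset the array to the test case's initial state: only a[2][5] is set.  */
static void
init (int a[ROWS][COLS])
{
  memset (a, 0, sizeof (int[ROWS][COLS]));
  a[2][5] = 4;
}

/* The loop nest as written in the PR test case.  */
static void
run_scalar (int a[ROWS][COLS])
{
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
}

/* Assumed model of outer-loop vectorization with VF 4: for each c, all four
   loads happen first, then all four stores.  */
static void
run_interleaved (int a[ROWS][COLS])
{
  for (int c = 0; c <= 6; c++)
    {
      int tem[4];
      for (int k = 0; k < 4; k++)   /* "vector load" of columns 5, 4, 3, 2 */
        tem[k] = a[c][5 - k];
      for (int k = 0; k < 4; k++)   /* "vector store" to columns 6, 5, 4, 3 */
        a[c + 1][6 - k] = tem[k];
    }

  /* Scalar epilogue for the remaining b == 0 iteration.  */
  for (int c = 0; c <= 6; c++)
    a[c + 1][2] = a[c][1];
}

/* Value of a[3][6] after the original loop nest.  */
static int
result_scalar (void)
{
  int a[ROWS][COLS];
  init (a);
  run_scalar (a);
  return a[3][6];
}

/* Value of a[3][6] after the interleaved model.  */
static int
result_interleaved (void)
{
  int a[ROWS][COLS];
  init (a);
  run_interleaved (a);
  return a[3][6];
}
```

With these definitions result_scalar () returns 4, as the test case expects,
while result_interleaved () returns 0: the store to a[2][5] at c == 1
clobbers the value that the load of a[2][5] at c == 2 was supposed to see.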

> The loop fusion needs to meet the dependence requirement.  Basically, GCC's 
> data
> dependence analyzer does not model dep between references in sibling loops, 
> but
> in practice, fusion requirement can be checked by analyzing all data 
> references
> after fusion, and there is no backward data dependence.
>
> Apparently, the requirement is violated because we have backward data 
> dependence
> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the 
> inner
> loop, the outer loop would become legal for vectorization.
>
> This patch fixes the issue by enforcing dependence check.  It also adds two 
> tests
> with one shouldn't be vectorized and the other should.  Bootstrap and test on 
> x86_64
> and AArch64.  Is it OK?

I think you have identified the spot where things go wrong but I'm not
sure you fix the problem fully.  The spot you patch is (loop is the
outer loop):

  loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
...
  FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
{
  int dist = dist_v[loop_depth];
...
  if (dist > 0 && DDR_REVERSED_P (ddr))
{
  /* If DDR_REVERSED_P the order of the data-refs in DDR was
 reversed (to make distance vector positive), and the actual
 distance is negative.  */
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "dependence distance negative.\n");

where you add

+ /* When doing outer loop vectorization, we need to check if there is
+backward dependence at inner loop level if dependence at the outer
+loop is reversed.  See PR81740 for more information.  */
+ if (nested_in_vect_loop_p (loop, DR_STMT (dra))
+ || nested_in_vect_loop_p (loop, DR_STMT (drb)))
+   {
+ unsigned inner_depth = index_in_loop_nest (loop->inner->num,
+DDR_LOOP_NEST (ddr));
+ if (dist_v[inner_depth] < 0)
+   return true;
+   }

but I don't understand how the dependence direction with respect to the
outer loop matters here.

Given there's DDR_REVERSED on the outer loop distance what does that
mean for the inner loop distance given the quite non-obvious code handling
this case in tree-data-ref.c:

  /* Verify a basic constraint: classic distance vectors should
 always be lexicographically positive.

 Data references are collected in the order of execution of
 the program, thus for the following loop

 | for (i = 1; i < 100; i++)
 |   for (j = 1; j < 100; j++)
 | {
 |   t = T[j+1][i-1];  // A
 |   T[j][i] = t + 2;  // B
 | }

 references are collected following the direction of the wind:
 A then B.  The data dependence tests are performed also
 following this order, such that we're looking at the distance
 separating the elements accessed by A from the elements later
 accessed by B.  But in this example, the distance returned by
 test_dep (A, B) is lexicographically negative (-1, 1), that
 means that the access A occurs later than B 

[PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2017-12-15 Thread Sudakshina Das

Hi

This patch fixes the inconsistent behavior observed at -O3 for the 
unordered comparisons. According to the online docs 
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
all of the following should not raise an FP exception:

- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these was generating exception-raising 
instructions such as fcmgt. This patch changes the emitted instructions 
so that they do not raise exceptions. We first check each 
operand for NaNs and force any elements containing NaN to zero before 
using them in the compare.


Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : 
a, isnan (b) ? 0.0 : b))



The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).


Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new test case.


Is this ok for trunk? This will probably need a back-port to 
gcc-7-branch as well.


Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
	* config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for
	UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
* gcc.target/aarch64/pr81647.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
 	  break;
 	}
   /* Fall through.  */
-case UNGE:
+case UNLT:
   std::swap (operands[2], operands[3]);
   /* Fall through.  */
-case UNLE:
+case UNGT:
 case GT:
   comparison = gen_aarch64_cmgt;
   break;
@@ -2745,10 +2745,10 @@
 	  break;
 	}
   /* Fall through.  */
-case UNGT:
+case UNLE:
   std::swap (operands[2], operands[3]);
   /* Fall through.  */
-case UNLT:
+case UNGE:
 case GE:
   comparison = gen_aarch64_cmge;
   break;
@@ -2771,21 +2771,35 @@
 case UNGT:
 case UNLE:
 case UNLT:
-case NE:
-  /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 invert the result, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a UNGE b -> !(b GT a)
-	 a UNGT b -> !(b GE a)
-	 a UNLE b -> !(a GT b)
-	 a UNLT b -> !(a GE b)
-	 a   NE b -> !(a EQ b)  */
-  gcc_assert (comparison != NULL);
-  emit_insn (comparison (operands[0], operands[2], operands[3]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  {
+	/* All of the above must not raise any FP exceptions.  Thus we first
+	   check each operand for NaNs and force any elements containing NaN to
+	   zero before using them in the compare.
+	   Example: UN (a, b) -> UNORDERED (a, b) |
+ (cm (isnan (a) ? 0.0 : a,
+	  isnan (b) ? 0.0 : b))
+	   We use the following transformations for doing the comparisions:
+	   a UNGE b -> a GE b
+	   a UNGT b -> a GT b
+	   a UNLE b -> b GE a
+	   a UNLT b -> b GT a.  */
+
+	rtx tmp0 = gen_reg_rtx (mode);
+	rtx tmp1 = gen_reg_rtx (mode);
+	rtx tmp2 = gen_reg_rtx (mode);
+	emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2]));
+	emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3]));
+	emit_insn (gen_and3 (tmp2, tmp0, tmp1));
+	emit_insn (gen_and3 (tmp0, tmp0,
+			lowpart_subreg (mode, operands[2], mode)));
+	emit_insn (gen_and3 (tmp1, tmp1,
+			lowpart_subreg (mode, operands[3], mode)));
+	gcc_assert (comparison != NULL);
+	emit_insn (comparison (operands[0],
+			   lowpart_subreg (mode, tmp0, mode),
+			   lowpart_subreg (mode, tmp1, mode)));
+	emit_insn (gen_orn3 (operands[0], tmp2, operands[0]));
+  }
   break;
 
 case LT:
@@ -2793,25 +2807,19 @@
 case GT:
 case GE:
 case EQ:
+case NE:
   /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
 	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
 	 a GE b -> a GE b
 	 a GT b -> a GT b
 	 a LE b -> b GE a
 	 a LT b -> b GT a
-	 a EQ b -> a EQ b  */
+	 a EQ b -> a EQ b
+	 a NE b -> ~(a EQ b)  */
   gcc_assert (comparison != NULL);
   emit_insn (comparison (operands[0], operands[2], operands[3]));
-  break;
-
-case UNEQ:
-  /* We first check (a > b ||  b > a) which is !UNEQ, inverting
-	 this result will then give us (a == b || a UNORDERED b).  */
-  emit_insn (gen_aarch64_cmgt (operands[0],
-	 operands[2], operands[3]));
-  emit_insn (gen_aarch64_cmgt (tmp, operands[3], operands[2]));
-  emit

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-15 Thread Bin.Cheng
On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
 wrote:
> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
>> Hi,
>> As explained in the PR, given below test case:
>> int a[8][10] = { [2][5] = 4 }, c;
>>
>> int
>> main ()
>> {
>>   short b;
>>   int i, d;
>>   for (b = 4; b >= 0; b--)
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][b + 2] = a[c][b + 1];
>>   for (i = 0; i < 8; i++)
>> for (d = 0; d < 10; d++)
>>   if (a[i][d] != (i == 3 && d == 6) * 4)
>> __builtin_abort ();
>>   return 0;
>>
>> the loop nest is illegal for vectorization without reversing inner loop.  
>> The issue
>> is in data dependence checking of vectorizer, I believe the mentioned 
>> revision just
>> exposed this.  Previously the vectorization is skipped because of 
>> unsupported memory
>> operation.  The outer loop vectorization unrolls the outer loop into:
>>
>>   for (b = 4; b > 0; b -= 4)
>>   {
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][6] = a[c][5];
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][5] = a[c][4];
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][4] = a[c][3];
>> for (c = 0; c <= 6; c++)
>>   a[c + 1][3] = a[c][2];
>>   }
>> Then four inner loops are fused into:
>>   for (b = 4; b > 0; b -= 4)
>>   {
>> for (c = 0; c <= 6; c++)
>> {
>>   a[c + 1][6] = a[c][5];  // S1
>>   a[c + 1][5] = a[c][4];  // S2
>>   a[c + 1][4] = a[c][3];
>>   a[c + 1][3] = a[c][2];
>> }
>>   }
>
> Note that they are not really "fused" but they are interleaved.  With
> GIMPLE in mind
> that makes a difference, you should get the equivalent of
>
>for (c = 0; c <= 6; c++)
>  {
>tem1 = a[c][5];
>tem2 = a[c][4];
>tem3 = a[c][3];
>tem4 = a[c][2];
>a[c+1][6] = tem1;
>a[c +1][5] = tem2;
> a[c+1][4] = tem3;
> a[c+1][3] = tem4;
>  }
Yeah, I will double check if this abstract breaks the patch and how.

>
>> The loop fusion needs to meet the dependence requirement.  Basically, GCC's 
>> data
>> dependence analyzer does not model dep between references in sibling loops, 
>> but
>> in practice, fusion requirement can be checked by analyzing all data 
>> references
>> after fusion, and there is no backward data dependence.
>>
>> Apparently, the requirement is violated because we have backward data 
>> dependence
>> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the 
>> inner
>> loop, the outer loop would become legal for vectorization.
>>
>> This patch fixes the issue by enforcing dependence check.  It also adds two 
>> tests
>> with one shouldn't be vectorized and the other should.  Bootstrap and test 
>> on x86_64
>> and AArch64.  Is it OK?
>
> I think you have identified the spot where things go wrong but I'm not
> sure you fix the problem fully.  The spot you patch is (loop is the
> outer loop):
>
>   loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
> ...
>   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
> {
>   int dist = dist_v[loop_depth];
> ...
>   if (dist > 0 && DDR_REVERSED_P (ddr))
> {
>   /* If DDR_REVERSED_P the order of the data-refs in DDR was
>  reversed (to make distance vector positive), and the actual
>  distance is negative.  */
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "dependence distance negative.\n");
>
> where you add
>
> + /* When doing outer loop vectorization, we need to check if there is
> +backward dependence at inner loop level if dependence at the 
> outer
> +loop is reversed.  See PR81740 for more information.  */
> + if (nested_in_vect_loop_p (loop, DR_STMT (dra))
> + || nested_in_vect_loop_p (loop, DR_STMT (drb)))
> +   {
> + unsigned inner_depth = index_in_loop_nest (loop->inner->num,
> +DDR_LOOP_NEST (ddr));
> + if (dist_v[inner_depth] < 0)
> +   return true;
> +   }
>
> but I don't understand how the dependence direction with respect to the
> outer loop matters here.
If the direction w.r.t. the outer loop is positive by itself, i.e.,
reversed_p equals false, then dist is checked against max_vf.  In
this case, it's not possible to have references refer to the same
object?
On the other hand, dist is not checked at all for the reversed case.
Maybe an additional check "dist < max_vf" can relax the patch a bit.
>
> Given there's DDR_REVERSED on the outer loop distance what does that
> mean for the inner loop distance given the quite non-obvious code handling
> this case in tree-data-ref.c:
>
>   /* Verify a basic constraint: classic distance vectors should
>  always be lexicographically positive.
>
>  Data references are collected in the order of execution of
>  the program, thus for the following loop
>
>

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-15 Thread Bin.Cheng
On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng  wrote:
> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
>  wrote:
>> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
>>> Hi,
>>> As explained in the PR, given below test case:
>>> int a[8][10] = { [2][5] = 4 }, c;
>>>
>>> int
>>> main ()
>>> {
>>>   short b;
>>>   int i, d;
>>>   for (b = 4; b >= 0; b--)
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][b + 2] = a[c][b + 1];
>>>   for (i = 0; i < 8; i++)
>>> for (d = 0; d < 10; d++)
>>>   if (a[i][d] != (i == 3 && d == 6) * 4)
>>> __builtin_abort ();
>>>   return 0;
>>>
>>> the loop nest is illegal for vectorization without reversing inner loop.  
>>> The issue
>>> is in data dependence checking of vectorizer, I believe the mentioned 
>>> revision just
>>> exposed this.  Previously the vectorization is skipped because of 
>>> unsupported memory
>>> operation.  The outer loop vectorization unrolls the outer loop into:
>>>
>>>   for (b = 4; b > 0; b -= 4)
>>>   {
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][6] = a[c][5];
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][5] = a[c][4];
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][4] = a[c][3];
>>> for (c = 0; c <= 6; c++)
>>>   a[c + 1][3] = a[c][2];
>>>   }
>>> Then four inner loops are fused into:
>>>   for (b = 4; b > 0; b -= 4)
>>>   {
>>> for (c = 0; c <= 6; c++)
>>> {
>>>   a[c + 1][6] = a[c][5];  // S1
>>>   a[c + 1][5] = a[c][4];  // S2
>>>   a[c + 1][4] = a[c][3];
>>>   a[c + 1][3] = a[c][2];
>>> }
>>>   }
>>
>> Note that they are not really "fused" but they are interleaved.  With
>> GIMPLE in mind
>> that makes a difference, you should get the equivalent of
>>
>>for (c = 0; c <= 6; c++)
>>  {
>>tem1 = a[c][5];
>>tem2 = a[c][4];
>>tem3 = a[c][3];
>>tem4 = a[c][2];
>>a[c+1][6] = tem1;
>>a[c +1][5] = tem2;
>> a[c+1][4] = tem3;
>> a[c+1][3] = tem4;
>>  }
> Yeah, I will double check if this abstract breaks the patch and how.
Hmm, I think this doesn't break it, well at least for part of the
analysis, because it is the loop-carried (backward) dependence that goes
wrong; interleaving or not within the same iteration doesn't matter here.

Thanks,
bin
>
>>
>>> The loop fusion needs to meet the dependence requirement.  Basically, GCC's 
>>> data
>>> dependence analyzer does not model dep between references in sibling loops, 
>>> but
>>> in practice, fusion requirement can be checked by analyzing all data 
>>> references
>>> after fusion, and there is no backward data dependence.
>>>
>>> Apparently, the requirement is violated because we have backward data 
>>> dependence
>>> between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we reverse the 
>>> inner
>>> loop, the outer loop would become legal for vectorization.
>>>
>>> This patch fixes the issue by enforcing dependence check.  It also adds two 
>>> tests
>>> with one shouldn't be vectorized and the other should.  Bootstrap and test 
>>> on x86_64
>>> and AArch64.  Is it OK?
>>
>> I think you have identified the spot where things go wrong but I'm not
>> sure you fix the problem fully.  The spot you patch is (loop is the
>> outer loop):
>>
>>   loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
>> ...
>>   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>> {
>>   int dist = dist_v[loop_depth];
>> ...
>>   if (dist > 0 && DDR_REVERSED_P (ddr))
>> {
>>   /* If DDR_REVERSED_P the order of the data-refs in DDR was
>>  reversed (to make distance vector positive), and the actual
>>  distance is negative.  */
>>   if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>  "dependence distance negative.\n");
>>
>> where you add
>>
>> + /* When doing outer loop vectorization, we need to check if there 
>> is
>> +backward dependence at inner loop level if dependence at the 
>> outer
>> +loop is reversed.  See PR81740 for more information.  */
>> + if (nested_in_vect_loop_p (loop, DR_STMT (dra))
>> + || nested_in_vect_loop_p (loop, DR_STMT (drb)))
>> +   {
>> + unsigned inner_depth = index_in_loop_nest (loop->inner->num,
>> +DDR_LOOP_NEST 
>> (ddr));
>> + if (dist_v[inner_depth] < 0)
>> +   return true;
>> +   }
>>
>> but I don't understand how the dependence direction with respect to the
>> outer loop matters here.
> If the direction wrto outer loop is positive by itself, i.e,
> reversed_p equals to false, then dist is checked against max_vf.  In
> this case, it's not possible to have references refer to the same
> object?
> On the other hand, dist is not checked at all for reversed case.
> Maybe an additional check "dist < max_vf" can relax the patch a bit.
>>
>> Given there's DD

Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Wilco Dijkstra
Hi Qing,

Just looking at a very high level, I have a few comments:

1. Constant folding str(n)cmp - folding is done separately in fold-const-call.c
   and gimple-fold.c.  There is already code for folding strcmp and strncmp,
   so we shouldn't need to add new foldings.  Or do you have an example that
   isn't folded as expected? If so, a fix should be added to the existing code.

2. Why check for str(n)cmp == 0 / != 0? There is no need to explicitly check
   for equality comparisons since folding into memcmp is always good.

3. Why handle strncmp? There is already code to convert strncmp into strcmp,
   so why repeat that again in a different way? It just seems to make the
   code significantly more complex for no benefit.

You can achieve the same effect by just optimizing strcmp into memcmp when
legal without checking for equality comparison.  As a result you can
considerably reduce the size of this patch while handling more useful cases.
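
A small sketch of why the equality check is not needed (illustrative only;
the helper names are made up and this is not the folding code from the
patch): when one argument is a string of known length n and the other
argument is known to have at least n + 1 readable bytes, memcmp over n + 1
bytes stops at the same first differing byte as strcmp, so the sign of the
result agrees, not just the == 0 outcome.

```c
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1.  */
static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Compare BUF against LIT with memcmp instead of strcmp.
   Assumption: BUF has at least strlen (LIT) + 1 readable bytes.  */
static int
strcmp_as_memcmp (const char *buf, const char *lit)
{
  return memcmp (buf, lit, strlen (lit) + 1);
}
```

For any buf that satisfies the size assumption, sign (strcmp_as_memcmp (buf,
lit)) matches sign (strcmp (buf, lit)).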

Wilco



Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab

2017-12-15 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
>  wrote:
>> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
>> isn't needed with the new VECTOR_CST layout.  It's really just the
>> original patch with bits removed, but just in case:
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>> OK to install?
>
> To keep things simple at this point OK.  Note that I'd eventually
> like to see this as VEC_PERM_EXPR .
> For reductions when we need { x, 0, ... } we now have to use a
> VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
> to merge it with {0, ... }, right?  Rather than VEC_PERM_EXPR  { 0, 1, 1, 1 }>

That's where the shift-left-and-insert-scalar thing (IFN_SHL_INSERT)
comes in.  But yeah, allowing scalars as operands to VEC_PERM_EXPRs
would mean it could represent both VEC_DUPLICATE_EXPR and IFN_SHL_INSERT.
I guess the question is whether that's better than extending CONSTRUCTOR
(or a replacement) to use the VECTOR_CST encoding.  I realise you don't
like CONSTRUCTOR in gimple though...

I promise to look at either of those for GCC 9 if you think they're
better, but they'll be more invasive for other targets.

Thanks,
Richard


[PATCH] Fix PR81877

2017-12-15 Thread Richard Biener

The following removes safelen handling from LIM - it is not really
useful information to it and it was used to derive incorrect conclusions
about dependences.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'll commit this on Monday to leave some time for comments (but many
are already in the PR).

Richard.

2017-12-15  Richard Biener  

PR tree-optimization/81877
* tree-ssa-loop-im.c (ref_indep_loop_p): Remove safelen parameters.
(outermost_indep_loop): Adjust.
(ref_indep_loop_p_1): Likewise.  Remove safelen handling again.
(can_sm_ref_p): Adjust.

* g++.dg/torture/pr81877.C: New testcase.
* g++.dg/vect/pr70729.cc: XFAIL.
* g++.dg/vect/pr70729-nest.cc: XFAIL.

Index: gcc/tree-ssa-loop-im.c
===
--- gcc/tree-ssa-loop-im.c  (revision 255678)
+++ gcc/tree-ssa-loop-im.c  (working copy)
@@ -199,7 +199,7 @@ static struct
 static bitmap_obstack lim_bitmap_obstack;
 static obstack mem_ref_obstack;
 
-static bool ref_indep_loop_p (struct loop *, im_mem_ref *, struct loop *);
+static bool ref_indep_loop_p (struct loop *, im_mem_ref *);
 static bool ref_always_accessed_p (struct loop *, im_mem_ref *, bool);
 
 /* Minimum cost of an expensive expression.  */
@@ -548,10 +548,10 @@ outermost_indep_loop (struct loop *outer
aloop != loop;
aloop = superloop_at_depth (loop, loop_depth (aloop) + 1))
 if ((!ref->stored || !bitmap_bit_p (ref->stored, aloop->num))
-   && ref_indep_loop_p (aloop, ref, loop))
+   && ref_indep_loop_p (aloop, ref))
   return aloop;
 
-  if (ref_indep_loop_p (loop, ref, loop))
+  if (ref_indep_loop_p (loop, ref))
 return loop;
   else
 return NULL;
@@ -2150,20 +2150,13 @@ record_dep_loop (struct loop *loop, im_m
 }
 
 /* Returns true if REF is independent on all other memory
-   references in LOOP.  REF_LOOP is where REF is accessed, SAFELEN is the
-   safelen to apply.  */
+   references in LOOP.  */
 
 static bool
-ref_indep_loop_p_1 (int safelen, struct loop *loop, im_mem_ref *ref,
-   bool stored_p, struct loop *ref_loop)
+ref_indep_loop_p_1 (struct loop *loop, im_mem_ref *ref, bool stored_p)
 {
   stored_p |= (ref->stored && bitmap_bit_p (ref->stored, loop->num));
 
-  if (loop->safelen > safelen
-  /* Check that REF is accessed inside LOOP.  */
-  && (loop == ref_loop || flow_loop_nested_p (loop, ref_loop)))
-safelen = loop->safelen;
-
   bool indep_p = true;
   bitmap refs_to_check;
 
@@ -2174,32 +2167,6 @@ ref_indep_loop_p_1 (int safelen, struct
 
   if (bitmap_bit_p (refs_to_check, UNANALYZABLE_MEM_ID))
 indep_p = false;
-  else if (safelen > 1)
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- fprintf (dump_file,"REF is independent due to safelen %d\n",
-  safelen);
- print_generic_expr (dump_file, ref->mem.ref, TDF_SLIM);
- fprintf (dump_file, "\n");
-   }
-
-  /* We need to recurse to properly handle UNANALYZABLE_MEM_ID.  */
-  struct loop *inner = loop->inner;
-  while (inner)
-   {
- if (!ref_indep_loop_p_1 (safelen, inner, ref, stored_p, ref_loop))
-   {
- indep_p = false;
- break;
-   }
- inner = inner->next;
-   }
-
-  /* Avoid caching here as safelen depends on context and refs
- are shared between different contexts.  */
-  return indep_p;
-}
   else
 {
   if (bitmap_bit_p (&ref->indep_loop, LOOP_DEP_BIT (loop->num, stored_p)))
@@ -2210,7 +2177,7 @@ ref_indep_loop_p_1 (int safelen, struct
   struct loop *inner = loop->inner;
   while (inner)
{
- if (!ref_indep_loop_p_1 (safelen, inner, ref, stored_p, ref_loop))
+ if (!ref_indep_loop_p_1 (inner, ref, stored_p))
{
  indep_p = false;
  break;
@@ -2264,14 +2231,14 @@ ref_indep_loop_p_1 (int safelen, struct
 }
 
 /* Returns true if REF is independent on all other memory references in
-   LOOP.  REF_LOOP is the loop where REF is accessed.  */
+   LOOP.  */
 
 static bool
-ref_indep_loop_p (struct loop *loop, im_mem_ref *ref, struct loop *ref_loop)
+ref_indep_loop_p (struct loop *loop, im_mem_ref *ref)
 {
   gcc_checking_assert (MEM_ANALYZABLE (ref));
 
-  return ref_indep_loop_p_1 (0, loop, ref, false, ref_loop);
+  return ref_indep_loop_p_1 (loop, ref, false);
 }
 
 /* Returns true if we can perform store motion of REF from LOOP.  */
@@ -2307,7 +2274,7 @@ can_sm_ref_p (struct loop *loop, im_mem_
 
   /* And it must be independent on all other memory references
  in LOOP.  */
-  if (!ref_indep_loop_p (loop, ref, loop))
+  if (!ref_indep_loop_p (loop, ref))
 return false;
 
   return true;
Index: gcc/testsuite/g++.dg/torture/pr81877.C
===
--- gcc/testsuite/g++.dg/torture/pr81877.C  (nonexistent)
+++ gcc/testsuite/g++.

[PATCH] Fix PR77291

2017-12-15 Thread Richard Biener

This adjusts array_at_struct_end_p to more properly adhere to its
documentation - if an underlying object constrains the size of the
trailing array we still have to return true in case there's room
for at least one excess element compared to what the array domain
specifies.
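
The size test can be sketched in plain integer arithmetic as below.  This is
only a model of the condition, with a hypothetical helper name and plain
integers standing in for the tree/wide-int values used in tree.c: the array
is treated as extensible when one element more than the domain declares
still fits into the bytes of the underlying object that follow the array's
start.

```c
#include <stdbool.h>
#include <stdint.h>

/* DOM_MIN/DOM_MAX: bounds of the array domain.
   ELT_SIZE: size of one element in bytes.
   DECL_SIZE: size of the underlying declared object in bytes.
   ARRAY_OFFSET: byte offset of the array within that object.  */
static bool
one_more_element_fits (int64_t dom_min, int64_t dom_max, int64_t elt_size,
                       int64_t decl_size, int64_t array_offset)
{
  int64_t declared_elts = dom_max - dom_min + 1;
  return (declared_elts + 1) * elt_size <= decl_size - array_offset;
}
```

For a one-element unsigned char array at offset 0 inside a 42-byte object
(min 0, max 0, element size 1) the helper returns true; if the object were
only 1 byte large it would return false.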

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-12-15  Richard Biener  

PR middle-end/77291
* tree.c (array_at_struct_end_p): Return true if the underlying
object has space for at least one element in excess of what
the array domain specifies.

* gcc.dg/Warray-bounds-25.c: New testcase.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 255678)
+++ gcc/tree.c  (working copy)
@@ -12615,6 +12615,7 @@ array_at_struct_end_p (tree ref)
   if (TREE_CODE (ref) == STRING_CST)
 return false;
 
+  tree ref_to_array = ref;
   while (handled_component_p (ref))
 {
   /* If the reference chain contains a component reference to a
@@ -12653,35 +12654,41 @@ array_at_struct_end_p (tree ref)
   /* The array now is at struct end.  Treat flexible arrays as
  always subject to extend, even into just padding constrained by
  an underlying decl.  */
-  if (! TYPE_SIZE (atype))
+  if (! TYPE_SIZE (atype)
+  || ! TYPE_DOMAIN (atype)
+  || ! TYPE_MAX_VALUE (TYPE_DOMAIN (atype)))
 return true;
 
-  tree size = NULL;
-
   if (TREE_CODE (ref) == MEM_REF
   && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
-{
-  size = TYPE_SIZE (TREE_TYPE (ref));
-  ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
-}
+ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
 
   /* If the reference is based on a declared entity, the size of the array
  is constrained by its given domain.  (Do not trust commons PR/69368).  */
   if (DECL_P (ref)
-  /* Be sure the size of MEM_REF target match.  For example:
-
-  char buf[10];
-  struct foo *str = (struct foo *)&buf;
-
-  str->trailin_array[2] = 1;
-
-is valid because BUF allocate enough space.  */
-
-  && (!size || (DECL_SIZE (ref) != NULL
-   && operand_equal_p (DECL_SIZE (ref), size, 0)))
   && !(flag_unconstrained_commons
-  && VAR_P (ref) && DECL_COMMON (ref)))
-return false;
+  && VAR_P (ref) && DECL_COMMON (ref))
+  && DECL_SIZE_UNIT (ref)
+  && TREE_CODE (DECL_SIZE_UNIT (ref)) == INTEGER_CST)
+{
+  /* Check whether the array domain covers all of the available
+ padding.  */
+  HOST_WIDE_INT offset;
+  if (TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (atype))) != INTEGER_CST)
+   return true;
+  if (! get_addr_base_and_unit_offset (ref_to_array, &offset))
+   return true;
+
+  /* If at least one extra element fits it is a flexarray.  */
+  if (wi::les_p ((wi::to_offset (TYPE_MAX_VALUE (TYPE_DOMAIN (atype)))
+ - wi::to_offset (TYPE_MIN_VALUE (TYPE_DOMAIN (atype)))
+ + 2)
+* wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (atype))),
+wi::to_offset (DECL_SIZE_UNIT (ref)) - offset))
+   return true;
+
+  return false;
+}
 
   return true;
 }
Index: gcc/testsuite/gcc.dg/Warray-bounds-25.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-25.c (nonexistent)
+++ gcc/testsuite/gcc.dg/Warray-bounds-25.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+
+struct Rec {
+  unsigned char data[1];  // actually variable length
+};
+
+union U {
+  unsigned char buf[42];
+  struct Rec rec;
+};
+
+int Load()
+{
+  union U u;
+  return u.rec.data[1]; /* { dg-bogus "array bound" } */
+}


Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 1:35 PM, Bin.Cheng  wrote:
> On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng  wrote:
>> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
>>  wrote:
>>> On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
 Hi,
 As explained in the PR, given below test case:
 int a[8][10] = { [2][5] = 4 }, c;

 int
 main ()
 {
   short b;
   int i, d;
   for (b = 4; b >= 0; b--)
 for (c = 0; c <= 6; c++)
   a[c + 1][b + 2] = a[c][b + 1];
   for (i = 0; i < 8; i++)
 for (d = 0; d < 10; d++)
   if (a[i][d] != (i == 3 && d == 6) * 4)
 __builtin_abort ();
   return 0;

 the loop nest is illegal for vectorization without reversing the inner loop.
 The issue is in the data dependence checking of the vectorizer; I believe the
 mentioned revision just exposed this.  Previously the vectorization was
 skipped because of an unsupported memory operation.  The outer loop
 vectorization unrolls the outer loop into:

   for (b = 4; b > 0; b -= 4)
   {
 for (c = 0; c <= 6; c++)
   a[c + 1][6] = a[c][5];
 for (c = 0; c <= 6; c++)
   a[c + 1][5] = a[c][4];
 for (c = 0; c <= 6; c++)
   a[c + 1][4] = a[c][3];
 for (c = 0; c <= 6; c++)
   a[c + 1][3] = a[c][2];
   }
 Then four inner loops are fused into:
   for (b = 4; b > 0; b -= 4)
   {
 for (c = 0; c <= 6; c++)
 {
   a[c + 1][6] = a[c][5];  // S1
   a[c + 1][5] = a[c][4];  // S2
   a[c + 1][4] = a[c][3];
   a[c + 1][3] = a[c][2];
 }
   }
>>>
>>> Note that they are not really "fused" but they are interleaved.  With
>>> GIMPLE in mind
>>> that makes a difference, you should get the equivalent of
>>>
>>>for (c = 0; c <= 6; c++)
>>>  {
>>>tem1 = a[c][5];
>>>tem2 = a[c][4];
>>>tem3 = a[c][3];
>>>tem4 = a[c][2];
>>>a[c+1][6] = tem1;
>>>a[c +1][5] = tem2;
>>> a[c+1][4] = tem3;
>>> a[c+1][3] = tem4;
>>>  }
>> Yeah, I will double check if this abstract breaks the patch and how.
> Hmm, I think this doesn't break it, at least for part of the
> analysis: it is the loop-carried (backward) dependence that goes wrong,
> so interleaving or not within the same iteration doesn't matter here.

I think the idea is that forward dependences are always fine (negative distance)
to vectorize.  But with backward dependences we have to adhere to max_vf.

It looks like for outer loop vectorization we only look at the distances in the
outer loop but never at inner ones.  But the same applies here, and isn't that
independent of the distances with respect to the outer loop?

But maybe I'm misunderstanding how "distances" work here.
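
To make the failure concrete, here is a small self-contained model of the two
execution orders.  It is only a sketch: run_interleaved is a hand-written
stand-in for what 4-wide outer-loop vectorization does (four loads, then four
stores per inner iteration, plus a scalar epilogue for b = 0), not what the
vectorizer actually emits, and all function names are made up for the
illustration.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 8, COLS = 10 };

static void
init (int a[ROWS][COLS])
{
  memset (a, 0, sizeof (int) * ROWS * COLS);
  a[2][5] = 4;
}

/* The loop nest from the PR, executed in source order.  */
static void
run_scalar (int a[ROWS][COLS])
{
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
}

/* Model of the interleaved form: lanes 0..3 correspond to b = 4..1.  */
static void
run_interleaved (int a[ROWS][COLS])
{
  for (int c = 0; c <= 6; c++)
    {
      int tem[4];
      for (int lane = 0; lane < 4; lane++)
        tem[lane] = a[c][5 - lane];
      for (int lane = 0; lane < 4; lane++)
        a[c + 1][6 - lane] = tem[lane];
    }
  /* Scalar epilogue for the remaining iteration b = 0.  */
  for (int c = 0; c <= 6; c++)
    a[c + 1][2] = a[c][1];
}

/* Number of array elements on which the two orders disagree.  */
static int
count_mismatches (void)
{
  int s[ROWS][COLS], v[ROWS][COLS];
  int n = 0;
  init (s);
  init (v);
  run_scalar (s);
  run_interleaved (v);
  assert (s[3][6] == 4);   /* what the testcase expects */
  for (int i = 0; i < ROWS; i++)
    for (int d = 0; d < COLS; d++)
      n += s[i][d] != v[i][d];
  return n;
}
```

In the interleaved order the store to a[2][5] at c = 1 (lane b = 3) happens
before the load of a[2][5] at c = 2 (lane b = 4), so the value 4 is lost and
a[3][6] ends up 0.  That is the inner-loop backward dependence between S1 and
S2.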

Richard.

> Thanks,
> bin
>>
>>>
 The loop fusion needs to meet the dependence requirement.  Basically, GCC's
 data dependence analyzer does not model dependences between references in
 sibling loops, but in practice the fusion requirement can be checked by
 analyzing all data references after fusion and verifying that there is no
 backward data dependence.

 Apparently, the requirement is violated because we have a backward data
 dependence between references (a[c][5], a[c+1][5]) in S1/S2.  Note, if we
 reverse the inner loop, the outer loop would become legal for vectorization.

 This patch fixes the issue by enforcing the dependence check.  It also adds
 two tests, one that shouldn't be vectorized and one that should.  Bootstrapped
 and tested on x86_64 and AArch64.  Is it OK?
>>>
>>> I think you have identified the spot where things go wrong but I'm not
>>> sure you fix the
>>> problem fully.  The spot you patch is (loop is the outer loop):
>>>
>>>   loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
>>> ...
>>>   FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
>>> {
>>>   int dist = dist_v[loop_depth];
>>> ...
>>>   if (dist > 0 && DDR_REVERSED_P (ddr))
>>> {
>>>   /* If DDR_REVERSED_P the order of the data-refs in DDR was
>>>  reversed (to make distance vector positive), and the actual
>>>  distance is negative.  */
>>>   if (dump_enabled_p ())
>>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>>  "dependence distance negative.\n");
>>>
>>> where you add
>>>
>>> + /* When doing outer loop vectorization, we need to check if there is
>>> +backward dependence at inner loop level if dependence at the outer
>>> +loop is reversed.  See PR81740 for more information.  */
>>> + if (nested_in_vect_loop_p (loop, DR_STMT (dra))
>>> + || nested_in_vect_loop_p (loop, DR_STMT (drb)))
>>> +   {
>>> + unsigned inner_depth = index_in_loop_nest (loop->inner->num,
>>> +  

Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab

2017-12-15 Thread Richard Biener
On Fri, Dec 15, 2017 at 1:52 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
>>  wrote:
>>> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
>>> isn't needed with the new VECTOR_CST layout.  It's really just the
>>> original patch with bits removed, but just in case:
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>> OK to install?
>>
>> To keep things simple at this point OK.  Note that I'd eventually
>> like to see this as VEC_PERM_EXPR .
>> For reductions when we need { x, 0, ... } we now have to use a
>> VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
>> to merge it with {0, ... }, right?  Rather than VEC_PERM_EXPR > { 0, 1, 1, 1 }>
>
> That's where the shift-left-and-insert-scalar thing (IFN_SHL_INSERT)
> comes in.  But yeah, allowing scalars as operands to VEC_PERM_EXPRs
> would mean it could represent both VEC_DUPLICATE_EXPR and IFN_SHL_INSERT.
> I guess the question is whether that's better than extending CONSTRUCTOR
> (or a replacement) to use the VECTOR_CST encoding.  I realise you don't
> like CONSTRUCTOR in gimple though...
>
> I promise to look at either of those for GCC 9 if you think they're
> better, but they'll be more invasive for other targets.

Thanks.
Richard.

> Thanks,
> Richard


[Ada] Missing error on illegal initialization item

2017-12-15 Thread Pierre-Marie de Rodat
This patch modifies the analysis of pragma Initializes to detect an illegal
null initialization item.


-- Source --


--  remote.ads

package Remote is
   Y : Integer := 0;
end Remote;

--  pack.ads

with Remote;

package Pack
   with SPARK_Mode,
Initializes => (null => Remote.Y)
is
   X : Integer := 0;
end Pack;


-- Compilation and output --


$ gcc -c pack.ads
pack.ads:5:25: initialization item must denote object or state

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Hristian Kirtchev  

* sem_prag.adb (Analyze_Initialization_Item): Remove the specialized
processing for a null initialization item. Such an item is always
illegal.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 255692)
+++ sem_prag.adb(working copy)
@@ -2752,10 +2752,6 @@
   --  A list of all initialization items processed so far. This list is
   --  used to detect duplicate items.
 
-  Non_Null_Seen : Boolean := False;
-  Null_Seen : Boolean := False;
-  --  Flags used to check the legality of a null initialization list
-
   States_And_Objs : Elist_Id := No_Elist;
   --  A list of all abstract states and objects declared in the visible
   --  declarations of the related package. This list is used to detect the
@@ -2785,91 +2781,67 @@
  Item_Id : Entity_Id;
 
   begin
- --  Null initialization list
+ Analyze   (Item);
+ Resolve_State (Item);
 
- if Nkind (Item) = N_Null then
-if Null_Seen then
-   SPARK_Msg_N ("multiple null initializations not allowed", Item);
+ if Is_Entity_Name (Item) then
+Item_Id := Entity_Of (Item);
 
-elsif Non_Null_Seen then
-   SPARK_Msg_N
- ("cannot mix null and non-null initialization items", Item);
-else
-   Null_Seen := True;
-end if;
+if Present (Item_Id)
+  and then Ekind_In (Item_Id, E_Abstract_State,
+  E_Constant,
+  E_Variable)
+then
+   --  When the initialization item is undefined, it appears as
+   --  Any_Id. Do not continue with the analysis of the item.
 
- --  Initialization item
+   if Item_Id = Any_Id then
+  null;
 
- else
-Non_Null_Seen := True;
+   --  The state or variable must be declared in the visible
+   --  declarations of the package (SPARK RM 7.1.5(7)).
 
-if Null_Seen then
-   SPARK_Msg_N
- ("cannot mix null and non-null initialization items", Item);
-end if;
+   elsif not Contains (States_And_Objs, Item_Id) then
+  Error_Msg_Name_1 := Chars (Pack_Id);
+  SPARK_Msg_NE
+("initialization item & must appear in the visible "
+ & "declarations of package %", Item, Item_Id);
 
-Analyze   (Item);
-Resolve_State (Item);
+   --  Detect a duplicate use of the same initialization item
+   --  (SPARK RM 7.1.5(5)).
 
-if Is_Entity_Name (Item) then
-   Item_Id := Entity_Of (Item);
+   elsif Contains (Items_Seen, Item_Id) then
+  SPARK_Msg_N ("duplicate initialization item", Item);
 
-   if Present (Item_Id)
- and then Ekind_In (Item_Id, E_Abstract_State,
- E_Constant,
- E_Variable)
-   then
-  --  When the initialization item is undefined, it appears as
-  --  Any_Id. Do not continue with the analysis of the item.
+   --  The item is legal, add it to the list of processed states
+   --  and variables.
 
-  if Item_Id = Any_Id then
- null;
+   else
+  Append_New_Elmt (Item_Id, Items_Seen);
 
-  --  The state or variable must be declared in the visible
-  --  declarations of the package (SPARK RM 7.1.5(7)).
-
-  elsif not Contains (States_And_Objs, Item_Id) then
- Error_Msg_Name_1 := Chars (Pack_Id);
- SPARK_Msg_NE
-   ("initialization item & must appear in the visible "
-& "declarations of package %", Item, Item_Id);
-
-  --  Detect a duplicate use of the same initialization item
-  --  (SPARK RM 7.1.5(5)).
-
-  elsif Contains (Items_Seen, Item_Id) then
- SPARK_Msg_N ("duplicate initialization ite

[Ada] Fix incorrect assignment to array with Component_Size clause

2017-12-15 Thread Pierre-Marie de Rodat
This change fixes a wrong translation of the assignment of an aggregate
made up of a single Others choice to an array whose nominal size of the
component type is the storage unit and which is subject to a Component_Size
clause that effectively bumps this size.

The compiler was generating a call to memset in this case, which filled
the gap between the nominal size and the component size with copies of
the single Others value instead of zero/sign-extending it appropriately.
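
The effect can be modelled in C.  This is only a sketch under assumptions:
uint16_t slots stand in for an array whose Component_Size is 16 while the
component type nominally occupies one storage unit, and the two function names
are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N = 4 };

/* What the old translation amounted to: every byte of every slot gets
   a copy of the value, including the gap above the nominal size.  */
static void
fill_with_memset (uint16_t arr[N], uint8_t value)
{
  memset (arr, value, N * sizeof arr[0]);
}

/* What the assignment means: each component holds the value,
   zero-extended to the component size.  */
static void
fill_per_component (uint16_t arr[N], uint8_t value)
{
  for (int i = 0; i < N; i++)
    arr[i] = value;
}

/* Returns nonzero if both strategies give the same array contents.  */
static int
strategies_agree (uint8_t value)
{
  uint16_t a[N], b[N];
  fill_with_memset (a, value);
  fill_per_component (b, value);
  return memcmp (a, b, sizeof a) == 0;
}
```

With the value 5 the memset variant leaves 0x0505 in each slot instead of
0x0005.  The two only agree when the value has the same byte pattern repeated
across the whole component, such as 0, which is what condition 6 in the
comment below is about.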

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Eric Botcazou  

* exp_aggr.adb: Fix for QC04-027 (incorrect assignment to array
with Component_Size clause):

* exp_aggr.adb (Aggr_Assignment_OK_For_Backend): Use
Component_Size of the innermost array instead of Esize of its
component type to exclude inappropriate array types, including
packed array types.

gcc/testsuite/

2017-12-15  Eric Botcazou  

* gnat.dg/component_size.adb: New testcase.
Index: exp_aggr.adb
===
--- exp_aggr.adb(revision 255693)
+++ exp_aggr.adb(working copy)
@@ -4895,14 +4895,14 @@
 
   --1. N consists of a single OTHERS choice, possibly recursively
 
-  --2. The array type is not packed
+  --2. The array type has no null ranges (the purpose of this is to
+  --   avoid a bogus warning for an out-of-range value).
 
   --3. The array type has no atomic components
 
-  --4. The array type has no null ranges (the purpose of this is to
-  --   avoid a bogus warning for an out-of-range value).
+  --4. The component type is elementary
 
-  --5. The component type is elementary
+  --5. The component size is a multiple of Storage_Unit
 
   --6. The component size is Storage_Unit or the value is of the form
   --   M * (1 + A**1 + A**2 + .. A**(K-1)) where A = 2**(Storage_Unit)
@@ -4918,6 +4918,7 @@
  Expr  : Node_Id := N;
  Low   : Node_Id;
  High  : Node_Id;
+ Csiz  : Uint;
  Remainder : Uint;
  Value : Uint;
  Nunits: Nat;
@@ -4933,14 +4934,6 @@
return False;
 end if;
 
-if Present (Packed_Array_Impl_Type (Ctyp)) then
-   return False;
-end if;
-
-if Has_Atomic_Components (Ctyp) then
-   return False;
-end if;
-
 Index := First_Index (Ctyp);
 while Present (Index) loop
Get_Index_Bounds (Index, Low, High);
@@ -4964,6 +4957,11 @@
Expr := Expression (First (Component_Associations (Expr)));
 end loop;
 
+if Has_Atomic_Components (Ctyp) then
+   return False;
+end if;
+
+Csiz := Component_Size (Ctyp);
 Ctyp := Component_Type (Ctyp);
 
 if Is_Atomic_Or_VFA (Ctyp) then
@@ -4978,20 +4976,19 @@
 return False;
  end if;
 
- --  All elementary types are supported
+ --  Access types need to be dealt with specially
 
- if not Is_Elementary_Type (Ctyp) then
-return False;
- end if;
+ if Is_Access_Type (Ctyp) then
 
- --  However access types need to be dealt with specially
+--  Component_Size is not set by Layout_Type if the component
+--  type is an access type ???
 
- if Is_Access_Type (Ctyp) then
+Csiz := Esize (Ctyp);
 
 --  Fat pointers are rejected as they are not really elementary
 --  for the backend.
 
-if Esize (Ctyp) /= System_Address_Size then
+if Csiz /= System_Address_Size then
return False;
 end if;
 
@@ -5002,16 +4999,27 @@
 if Nkind (Expr) /= N_Null and then not Is_Entity_Name (Expr) then
return False;
 end if;
+
+ --  Scalar types are OK if their size is a multiple of Storage_Unit
+
+ elsif Is_Scalar_Type (Ctyp) then
+
+if Csiz mod System_Storage_Unit /= 0 then
+   return False;
+end if;
+
+ --  Composite types are rejected
+
+ else
+return False;
  end if;
 
  --  The expression needs to be analyzed if True is returned
 
  Analyze_And_Resolve (Expr, Ctyp);
 
- --  The back end uses the Esize as the precision of the type
+ Nunits := UI_To_Int (Csiz) / System_Storage_Unit;
 
- Nunits := UI_To_Int (Esize (Ctyp)) / System_Storage_Unit;
-
  if Nunits = 1 then
 return True;
  end if;
Index: ../testsuite/gnat.dg/component_size.adb
===
--- ../testsuite/gnat.dg/component_size.adb (revision 0)
+++ ../testsuite/gnat.dg/component_size.adb (revision 0)
@@

[Ada] Spurious error and missing warning on static predicate

2017-12-15 Thread Pierre-Marie de Rodat
This patch handles properly a static predicate on a scalar type that
is trivially true. Previous to this patch the compiler rejected the
predicate on the incorrect grounds that it was not a static expression.

Compiling bad_days.ads must yield:

   bad_days.ads:4:34: warning: predicate is redundant (always True)

---
package Bad_Days is
 type Day is (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
 subtype Day_Bad is Day with 
 Static_Predicate => Day_Bad in Day;
end Bad_Days;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Ed Schonberg  

* exp_ch4.adb (Expand_N_In): Do not replace a membership test on a
scalar type with a validity test when the membership appears in a
predicate expression, to prevent a spurious error when predicate is
specified static.
* sem_ch13.adb (Build_Predicate_Functions): Add warning if a static
predicate, after constant-folding, reduces to True and is thus
redundant.
* par-ch4.adb: Typo fixes and minor reformattings.

Index: exp_ch4.adb
===
--- exp_ch4.adb (revision 255693)
+++ exp_ch4.adb (working copy)
@@ -6015,10 +6015,20 @@
   --  have a test in the generic that makes sense with some types
   --  and not with other types.
 
-  and then not In_Instance
+  --  Similarly, do not rewrite membership as a validity check if
+  --  within the predicate function for the type.
+
 then
-   Substitute_Valid_Check;
-   goto Leave;
+   if In_Instance
+ or else (Ekind (Current_Scope) = E_Function
+   and then Is_Predicate_Function (Current_Scope))
+   then
+  null;
+
+   else
+  Substitute_Valid_Check;
+  goto Leave;
+   end if;
 end if;
 
 --  If we have an explicit range, do a bit of optimization based on
Index: par-ch4.adb
===
--- par-ch4.adb (revision 255693)
+++ par-ch4.adb (working copy)
@@ -645,8 +645,8 @@
  --  case of a name which can be extended in the normal manner.
  --  This case is handled by LP_State_Name or LP_State_Expr.
 
- --  (Ada2020) : the expression can be a reduction_expression_
- --  psarameter, i.e. a box or  < Simple_Expression >
+ --  (Ada 2020): the expression can be a reduction_expression_
+ --  parameter, i.e. a box or < Simple_Expression >.
 
  --  Note: if and case expressions (without an extra level of
  --  parentheses) are permitted in this context).
@@ -679,7 +679,7 @@
  end if;
 
  --  Here we have an expression after all, which may be a reduction
- --  expression with a binary operator
+ --  expression with a binary operator.
 
  if Token = Tok_Less then
 Scan; -- past <
@@ -2894,7 +2894,7 @@
Node1 := P_Name;
return Node1;
 
---  Ada2020: reduction expression parameter
+--  Ada 2020: reduction expression parameter
 
 when Tok_Less =>
Scan; -- past <
Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 255678)
+++ sem_ch13.adb(working copy)
@@ -11919,6 +11919,12 @@
   then
  return True;
 
+  elsif Is_Entity_Name (Expr)
+and then Entity (Expr) = Standard_True
+  then
+ Error_Msg_N ("predicate is redundant (always True)?", Expr);
+ return True;
+
   --  That's an exhaustive list of tests, all other cases are not
   --  predicate-static, so we return False.
 
Index: sem_ch4.adb
===
--- sem_ch4.adb (revision 255693)
+++ sem_ch4.adb (working copy)
@@ -4155,7 +4155,7 @@
   and then Parent (Loop_Par) /= N
 then
--  The parser cannot distinguish between a loop specification
-   --  and an iterator specification. If after pre-analysis the
+   --  and an iterator specification. If after preanalysis the
--  proper form has been recognized, rewrite the expression to
--  reflect the right kind. This is needed for proper ASIS
--  navigation. If expansion is enabled, the transformation is
@@ -4378,7 +4378,7 @@
   and then Parent (Loop_Par) /= N
 then
--  The parser cannot distinguish between a loop specification
-   --  and an iterator specification. If after pre-analysis the
+   --  and an iterator specification. If after preanalysis the
--  proper form has been recognized, rewrite the

[Ada] Concurrent types in pragma Initializes

2017-12-15 Thread Pierre-Marie de Rodat
Concurrent types and single concurrent types can now appear in the input list
of pragma Initializes as long as the type encloses the pragma.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Hristian Kirtchev  

* sem_prag.adb (Analyze_Input_Item): Allow concurrent types to appear
within the input list of Initializes. Remove the uses of Input_OK.

gcc/testsuite/

2017-12-15  Hristian Kirtchev  

* gnat.dg/initializes.ads, gnat.dg/initializes.adb: New testcase.
Index: sem_prag.adb
===
--- sem_prag.adb(revision 255693)
+++ sem_prag.adb(working copy)
@@ -2867,7 +2867,6 @@
 
  procedure Analyze_Input_Item (Input : Node_Id) is
 Input_Id : Entity_Id;
-Input_OK : Boolean := True;
 
  begin
 --  Null input list
@@ -2908,6 +2907,8 @@
  E_In_Parameter,
  E_In_Out_Parameter,
  E_Out_Parameter,
+ E_Protected_Type,
+ E_Task_Type,
  E_Variable)
   then
  --  The input cannot denote states or objects declared
@@ -2933,11 +2934,11 @@
null;
 
 else
-   Input_OK := False;
Error_Msg_Name_1 := Chars (Pack_Id);
SPARK_Msg_NE
  ("input item & cannot denote a visible object or "
   & "state of package %", Input, Input_Id);
+   return;
 end if;
  end if;
 
@@ -2945,26 +2946,25 @@
  --  (SPARK RM 7.1.5(5)).
 
  if Contains (Inputs_Seen, Input_Id) then
-Input_OK := False;
 SPARK_Msg_N ("duplicate input item", Input);
+return;
  end if;
 
- --  Input is legal, add it to the list of processed inputs
+ --  At this point it is known that the input is legal. Add
+ --  it to the list of processed inputs.
 
- if Input_OK then
-Append_New_Elmt (Input_Id, Inputs_Seen);
+ Append_New_Elmt (Input_Id, Inputs_Seen);
 
-if Ekind (Input_Id) = E_Abstract_State then
-   Append_New_Elmt (Input_Id, States_Seen);
-end if;
+ if Ekind (Input_Id) = E_Abstract_State then
+Append_New_Elmt (Input_Id, States_Seen);
+ end if;
 
-if Ekind_In (Input_Id, E_Abstract_State,
-   E_Constant,
-   E_Variable)
-  and then Present (Encapsulating_State (Input_Id))
-then
-   Append_New_Elmt (Input_Id, Constits_Seen);
-end if;
+ if Ekind_In (Input_Id, E_Abstract_State,
+E_Constant,
+E_Variable)
+   and then Present (Encapsulating_State (Input_Id))
+ then
+Append_New_Elmt (Input_Id, Constits_Seen);
  end if;
 
   --  The input references something that is not a state or an
Index: ../testsuite/gnat.dg/initializes.adb
===
--- ../testsuite/gnat.dg/initializes.adb(revision 0)
+++ ../testsuite/gnat.dg/initializes.adb(revision 0)
@@ -0,0 +1,33 @@
+--  { dg-do compile }
+
+package body Initializes is
+   protected body PO is
+  procedure Proc is
+ package Inner with Initializes => (Y => PO) is  --  OK
+Y : Boolean := X;
+ end Inner;
+
+ procedure Nested with Global => PO is   --  OK
+ begin
+null;
+ end Nested;
+  begin
+ Nested;
+  end Proc;
+   end PO;
+
+   protected body PT is
+  procedure Proc is
+ package Inner with Initializes => (Y => PT) is  --  OK
+Y : Boolean := X;
+ end Inner;
+
+ procedure Nested with Global => PT is   --  OK
+ begin
+null;
+ end Nested;
+  begin
+ Nested;
+  end Proc;
+   end PT;
+end Initializes;
Index: ../testsuite/gnat.dg/initializes.ads
===
-

[Ada] Spurious 'W' ALI line due to implicit with clause

2017-12-15 Thread Pierre-Marie de Rodat
This patch "fixes" an issue where an implicit with clause generated to emulate
an implicit Elaborate[_All] pragma appears on a 'W' line in the ALI file. As a
result, the 'W' line may introduce a spurious build dependency in GPRbuild.


-- Source --


--  func.ads

function Func return Boolean;

--  func.adb

function Func return Boolean is begin return True; end Func;

--  gen.ads

generic
package Gen is
   procedure Force_Body;
end Gen;

--  gen.adb

with Func;

package body Gen is
   Val : constant Boolean := Func;

   procedure Force_Body is begin null; end Force_Body;
end Gen;

--  pack.ads

with Gen;

package Pack is
   package Inst is new Gen;
end Pack;

--  main.adb

with Pack;

procedure Main is begin null; end Main;


-- Compilation and output --


$ gnatmake -q main.adb
$ grep -c "Z func" pack.ali
1

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Hristian Kirtchev  

* sem_elab.adb (Ensure_Prior_Elaboration_Static): Mark the generated
with clause as being implicit for an instantiation in order to
circumvent an issue with 'W' and 'Z' line encodings in ALI files.

Index: sem_elab.adb
===
--- sem_elab.adb(revision 255683)
+++ sem_elab.adb(working copy)
@@ -3585,6 +3585,16 @@
  Set_Implicit_With (Clause);
  Set_Library_Unit  (Clause, Unit_Cunit);
 
+ --  The following is a kludge to satisfy a GPRbuild requirement. In
+ --  general, internal with clauses should be encoded on a 'Z' line in
+ --  ALI files, but due to an old bug, they are encoded as source with
+ --  clauses on a 'W' line. As a result, these "semi-implicit" clauses
+ --  introduce spurious build dependencies in GPRbuild. The only way to
+ --  eliminate this effect is to mark the implicit clauses as generated
+ --  for an instantiation.
+
+ Set_Implicit_With_From_Instantiation (Clause);
+
  Append_To (Items, Clause);
   end if;
 


[Ada] Spurious error on equality operator on incomplete type

2017-12-15 Thread Pierre-Marie de Rodat
This patch fixes a spurious error on a declaration for an equality
operator whose operands have an incomplete type, when the same declarative
part includes another such equality operator on another incomplete type which
is used as an actual in an earlier instantiation.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Ed Schonberg  

* sem_ch6.adb (Conforming_Types): Two incomplete types are conforming
when one of them is used as a generic actual, but only within an
instantiation.
* einfo.ads: Clarify use of flag Used_As_Generic_Actual.

gcc/testsuite/

2017-12-15  Ed Schonberg  

* gnat.dg/incomplete6.adb, gnat.dg/incomplete6.ads: New testcase.
Index: einfo.ads
===
--- einfo.ads   (revision 255690)
+++ einfo.ads   (working copy)
@@ -4583,7 +4583,9 @@
 
 --Used_As_Generic_Actual (Flag222)
 --   Defined in all entities, set if the entity is used as an argument to
---   a generic instantiation. Used to tune certain warning messages.
+--   a generic instantiation. Used to tune certain warning messages, and
+--   in checking type conformance within an instantiation that involves
+--   incomplete formal and actual types.
 
 --Uses_Lock_Free (Flag188)
 --   Defined in protected type entities. Set to True when the Lock Free
Index: sem_ch6.adb
===
--- sem_ch6.adb (revision 255693)
+++ sem_ch6.adb (working copy)
@@ -7666,10 +7666,12 @@
  return True;
 
   --  In Ada 2012, incomplete types (including limited views) can appear
-  --  as actuals in instantiations.
+  --  as actuals in instantiations, where they are conformant to the
+  --  corresponding incomplete formal.
 
   elsif Is_Incomplete_Type (Type_1)
 and then Is_Incomplete_Type (Type_2)
+and then In_Instance
 and then (Used_As_Generic_Actual (Type_1)
or else Used_As_Generic_Actual (Type_2))
   then
Index: ../testsuite/gnat.dg/incomplete6.adb
===
--- ../testsuite/gnat.dg/incomplete6.adb(revision 0)
+++ ../testsuite/gnat.dg/incomplete6.adb(revision 0)
@@ -0,0 +1,15 @@
+--  { dg-do compile }
+
+package body Incomplete6 is
+
+   function "=" (Left, Right : Vint) return Boolean is
+   begin
+  return Left.Value = Right.Value;
+   end;
+   
+   function "=" (Left, Right : Vfloat) return Boolean is
+   begin
+  return Left.Value = Right.Value;
+   end;
+
+end;
Index: ../testsuite/gnat.dg/incomplete6.ads
===
--- ../testsuite/gnat.dg/incomplete6.ads(revision 0)
+++ ../testsuite/gnat.dg/incomplete6.ads(revision 0)
@@ -0,0 +1,22 @@
+with Ada.Unchecked_Conversion;
+
+package Incomplete6 is
+   
+   type Vint;
+   function "=" (Left, Right : Vint) return Boolean;
+
+   type Vint is record
+  Value : Integer;
+   end record;
+
+   function To_Integer is new 
+ Ada.Unchecked_Conversion(Source => Vint, Target => Integer);
+   
+   type Vfloat;
+   function "=" (Left, Right : in Vfloat) return Boolean;
+
+   type Vfloat is record
+  Value : Float;
+   end record;
+
+end;


[Ada] Spurious alias error on access to array indexed by non-standard enum

2017-12-15 Thread Pierre-Marie de Rodat
This patch prevents the propagation of spurious errors about the prefix of
access being non-aliased when getting the access to an array indexed by an
enumeration with a custom representation.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

2017-12-15  Justin Squirek  

* sem_attr.adb (Resolve_Attribute): Modify check for aliased view on
prefix to use the prefix's original node to avoid looking at expanded
conversions for certain array types.

gcc/testsuite/

2017-12-15  Justin Squirek  

* gnat.dg/aliasing4.adb: New testcase.
Index: sem_attr.adb
===
--- sem_attr.adb(revision 255678)
+++ sem_attr.adb(working copy)
@@ -1,7 +1,7 @@
   and then not (Nkind (P) = N_Selected_Component
  and then
Is_Overloadable (Entity (Selector_Name (P
-  and then not Is_Aliased_View (P)
+  and then not Is_Aliased_View (Original_Node (P))
   and then not In_Instance
   and then not In_Inlined_Body
   and then Comes_From_Source (N)


[Ada] Added warning on membership tests

2017-12-15 Thread Pierre-Marie de Rodat
RM 4.5.2 (28.1/3) specifies that (except for records and limited types) a
membership operation uses the predefined equality, regardless of whether
user-defined equality for the type is available. This can be confusing
and deserves a new warning.

Compiling code.adb must yield:

  code.adb:19:42: warning: membership test on "Var" uses predefined equality
  code.adb:19:42: warning: even if user-defined equality exists
  (RM 4.5.2 (28.1/3)

--
with Ada.Characters.Handling;
with Ada.Text_IO; use Ada.Text_IO;
procedure Code is
   type Var is new Character;

   function "=" (C1, C2 : Var) return Boolean;

   function "=" (C1, C2 : Var) return Boolean is
  use Ada.Characters.Handling;
   begin
  return To_Lower (Character (C1)) = To_Lower (Character (C2));
   end "=";

   V : Var := 'A';

begin
   Put_Line ("equal " & Boolean'Image (V = 'a'));

   Put_Line ("in" & Boolean'Image (V in 'a' | 'o'));
end Code;

Tested on x86_64-pc-linux-gnu, committed on trunk

2017-12-15  Ed Schonberg  

* sem_res.adb (Resolve_Membership_Op): Add warning on a membership
operation on a scalar type for which there is a user-defined equality
operator.

Index: sem_res.adb
===
--- sem_res.adb (revision 255694)
+++ sem_res.adb (working copy)
@@ -9086,6 +9086,21 @@
end loop;
 end;
  end if;
+
+ --  RM 4.5.2 (28.1/3) specifies that for types other than records or
+ --  limited types, evaluation of a membership test uses the predefined
+ --  equality for the type. This may be confusing to users, and the
+ --  following warning appears useful for the most common case.
+
+ if Is_Scalar_Type (Ltyp)
+   and then Present (Get_User_Defined_Eq (Ltyp))
+ then
+Error_Msg_NE
+  ("membership test on& uses predefined equality?", N, Ltyp);
+Error_Msg_N
+  ("\even if user-defined equality exists (RM 4.5.2 (28.1/3)?", N);
+ end if;
+
   end Resolve_Set_Membership;
 
--  Start of processing for Resolve_Membership_Op


[PATCH] Swap affects_type_identity and handler fields in attribute_spec

2017-12-15 Thread Jakub Jelinek
Hi!

As I said earlier, since Martin has added a new field to attribute_spec and
all out-of-tree FEs and backends need adjustment anyway, I'd like to take the
opportunity to swap the affects_type_identity and handler fields.
Previously we had:
const char *, 2x int, 3x bool, 1x fnptr, 1x bool, 1x exclusions *
fields in the structure, so 48 bytes on LP64 and 28 bytes on ILP32 hosts,
with the patch we have:
const char *, 2x int, 4x bool, 1x fnptr, 1x exclusions *
so 40 bytes on LP64 and 24 bytes on ILP32 hosts.
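
For anyone who wants to check the numbers, the layout can be reproduced with
two stand-in structs.  This is a simplified sketch, not the real definition
from tree-core.h: the handler signature and the exclusions type are
placeholders, and the sizes quoted above only hold on hosts with the usual
LP64 or ILP32 sizes and alignments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct exclusions;                 /* placeholder for the exclusions type */
typedef int (*handler_fn) (void);  /* placeholder handler signature */

/* Field order before the patch: the lone bool after the function
   pointer costs a full pointer-sized slot because of padding.  */
struct spec_old
{
  const char *name;
  int min_length;
  int max_length;
  bool decl_required;
  bool type_required;
  bool function_type_required;
  handler_fn handler;
  bool affects_type_identity;
  const struct exclusions *exclude;
};

/* Field order after the patch: the four bools are adjacent.  */
struct spec_new
{
  const char *name;
  int min_length;
  int max_length;
  bool decl_required;
  bool type_required;
  bool function_type_required;
  bool affects_type_identity;
  handler_fn handler;
  const struct exclusions *exclude;
};

/* Bytes saved per attribute table entry on the current host.  */
static size_t
bytes_saved (void)
{
  assert (sizeof (struct spec_new) <= sizeof (struct spec_old));
  return sizeof (struct spec_old) - sizeof (struct spec_new);
}
```

On x86_64-linux this should report 48 and 40 bytes for the two structs, i.e.
8 bytes saved per attribute table entry.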

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-15  Jakub Jelinek  

* tree-core.h (struct attribute_spec): Swap affects_type_identity and
handler fields.
* config/alpha/alpha.c (vms_attribute_table): Swap
affects_type_identity and handler fields, adjust comments.
* config/mips/mips.c (mips_attribute_table): Likewise.
* config/visium/visium.c (visium_attribute_table): Likewise.
* config/epiphany/epiphany.c (epiphany_attribute_table): Likewise.
* config/microblaze/microblaze.c (microblaze_attribute_table):
Likewise.
* config/spu/spu.c (spu_attribute_table): Likewise.
* config/mcore/mcore.c (mcore_attribute_table): Likewise.
* config/arc/arc.c (arc_attribute_table): Likewise.
* config/m68k/m68k.c (m68k_attribute_table): Likewise.
* config/v850/v850.c (v850_handle_interrupt_attribute,
v850_handle_data_area_attribute): Formatting fixes.
(v850_attribute_table): Swap affects_type_identity and handler
fields, adjust comments.
* config/m32r/m32r.c (m32r_attribute_table): Likewise.
* config/arm/arm.c (arm_attribute_table): Likewise.
* config/avr/avr.c (avr_attribute_table): Likewise.
* config/s390/s390.c (s390_attribute_table): Likewise.
* config/sh/sh.c (sh_attribute_table): Likewise.
* config/i386/i386.c (ix86_handle_cconv_attribute,
ix86_handle_callee_pop_aggregate_return): Formatting fixes.
(ix86_attribute_table): Swap affects_type_identity and handler
fields, adjust comments.
* config/i386/cygming.h (SUBTARGET_ATTRIBUTE_TABLE): Likewise.
* config/sparc/sparc.c (sparc_attribute_table): Likewise.
* config/m32c/m32c.c (m32c_attribute_table): Likewise.
* config/sol2.h (SOLARIS_ATTRIBUTE_TABLE): Likewise.
* config/ia64/ia64.c (ia64_attribute_table): Likewise.
* config/msp430/msp430.c (msp430_attribute_table): Likewise.
* config/rx/rx.c (rx_attribute_table): Likewise.
* config/cr16/cr16.c (cr16_attribute_table): Likewise.
* config/h8300/h8300.c (h8300_attribute_table): Likewise.
* config/nvptx/nvptx.c (nvptx_attribute_table): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_attribute_table): Likewise.
* config/darwin.h (SUBTARGET_ATTRIBUTE_TABLE): Likewise.
* config/stormy16/stormy16.c (xstormy16_attribute_table): Likewise.
* config/bfin/bfin.c (bfin_attribute_table): Likewise.
* config/rs6000/rs6000.c (rs6000_attribute_table): Likewise.
* config/rl78/rl78.c (rl78_attribute_table): Likewise.
* config/nds32/nds32.c (nds32_attribute_table): Likewise.
* doc/plugins.texi (user_attr): Likewise.  Add NULL for
exclude.
* attribs.c (empty_attribute_table): Swap affects_type_identity and
handler fields.
(register_scoped_attributes, decl_attributes): Formatting fixes.
ada/
* gcc-interface/utils.c (gnat_internal_attribute_table): Swap
affects_type_identity and handler fields, adjust comments.
brig/
* brig-lang.c (brig_attribute_table): Swap affects_type_identity
and handler fields, adjust comments.
c-family/
* c-attribs.c (c_common_attribute_table,
c_common_format_attribute_table): Swap affects_type_identity
and handler fields, adjust comments.
cp/
* tree.c (cxx_attribute_table, std_attribute_table): Swap
affects_type_identity and handler fields, adjust comments.
fortran/
* f95-lang.c (gfc_attribute_table): Swap affects_type_identity
and handler fields, adjust comments.
lto/
* lto-lang.c (lto_attribute_table, lto_format_attribute_table): Swap
affects_type_identity and handler fields, adjust comments.
testsuite/
* g++.dg/plugin/attribute_plugin.c (user_attr): Swap
affects_type_identity and handler fields, add NULL for exclude.

--- gcc/tree-core.h.jj  2017-12-08 00:50:28.0 +0100
+++ gcc/tree-core.h 2017-12-15 12:13:01.413612746 +0100
@@ -1929,6 +1929,8 @@ struct attribute_spec {
  and from a function return type (which is not itself a function
  pointer type) to the function type.  */
   bool function_type_required;
+  /* Specifies if attribute affects type's identity.  */
+  bool affects_type_identity;
   /* Function to handle this attribute.  NODE points to the node to which
  

[PR 81616] Deferring FMA transformations in tight loops

2017-12-15 Thread Martin Jambor

Hello,

the patch below prevents creation of fused-multiply-and-add instructions
in the widening_mul gimple pass on the Zen-based AMD CPUs and as a
result fixes regressions of native znver1 tuning when compared to
generic tuning in:

  - the matrix.c testcase of PR 81616 (straightforward matrix
multiplication) at -O2 and -O3 which is currently 60% (!),

  - SPEC 2006 454.calculix at -O2, which is currently over 20%, and

  - SPEC 2017 510.parest at -O2 and -Ofast, which is currently also
about 20% in both cases.

The basic idea is to detect loops in the following form:


# accumulator_111 = PHI <0.0(5), accumulator_66(6)>
...
_65 = _14 * _16;
accumulator_66 = _65 + accumulator_111;

and prevent FMAs from being created for it.  Because at least the parest
and calculix cases require it, it also deals with more than one chain of
FMA candidates that feed the next one's addend:



# accumulator_111 = PHI <0.0(5), accumulator_66(6)>
...
_65 = _14 * _16;
accumulator_55 = _65 + accumulator_111;
_65 = _24 * _36;
accumulator_66 = _65 + accumulator_55;
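
For readers who prefer source code to GIMPLE, here is a minimal sketch of
the kind of loop that produces the accumulator pattern above.  The function
names are hypothetical and not part of the patch.  The first function keeps
the multiply and the add separate, so only the add sits on the loop-carried
dependency through the accumulator; the second spells the fused form out
with std::fma, which puts the whole FMA on that chain.  Note that a
compiler may still contract the first form into an FMA on its own,
depending on the floating-point contraction settings in effect.

```cpp
#include <cmath>
#include <cstddef>

// Unfused accumulation: the product does not depend on the accumulator,
// only the addition does.
double dot_unfused(const double *a, const double *b, std::size_t n)
{
  double accumulator = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    {
      double product = a[i] * b[i];         // like _65 = _14 * _16
      accumulator = product + accumulator;  // like accumulator_66 = _65 + accumulator_111
    }
  return accumulator;
}

// Fused accumulation: every iteration's FMA needs the previous result.
double dot_fused(const double *a, const double *b, std::size_t n)
{
  double accumulator = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    accumulator = std::fma(a[i], b[i], accumulator);
  return accumulator;
}
```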

Unfortunately, to really get rid of the calculix regression, the
algorithm cannot just look at one BB at a time but also has to work for
cases like the following:

 1  void mult(void)
 2  {
 3  int i, j, k, l;
 4  
 5 for(i=0; ic[i][j] += p1->a[i][k] * p1->b[k][j];
12  p2->c[i][j] += p2->a[i][k] * p2->b[k][j];
13   }
14}
15 }
16  }

I suppose that the best optimization for the above would be to split the
loops, but one could probably construct at least an artificial testcase
where the FMAs would keep enough locality that it is not the case.  The
mechanism can be easily extended to keep track of not just one chain but
a few, preferably as a followup, if people think it makes sense.
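
The listing above lost part of its text in the archive, so the following is
only a sketch of the shape of code being discussed, not a reconstruction of
the original testcase.  The struct layout, the SIZE constant and the
parameter passing are assumptions made so that the example is
self-contained.  The point it illustrates is that two independent
accumulation chains (one through p1->c, one through p2->c) are interleaved
in the same innermost loop body.

```cpp
#include <cstddef>

// Hypothetical dimensions and layout, chosen only for illustration.
constexpr std::size_t SIZE = 4;

struct matrices
{
  double a[SIZE][SIZE];
  double b[SIZE][SIZE];
  double c[SIZE][SIZE];
};

// Two matrix products computed in one loop nest.  Each statement in the
// innermost loop is an FMA candidate whose addend comes from the previous
// iteration of the k loop.
void mult(matrices *p1, matrices *p2)
{
  for (std::size_t i = 0; i < SIZE; i++)
    for (std::size_t j = 0; j < SIZE; j++)
      for (std::size_t k = 0; k < SIZE; k++)
        {
          p1->c[i][j] += p1->a[i][k] * p1->b[k][j];
          p2->c[i][j] += p2->a[i][k] * p2->b[k][j];
        }
}
```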

An interesting observation is that the matrix multiplication does not
suffer the penalty when compiled with -O3 -mprefer-vector-width=256.
Apparently the 256 vector processing can hide the latency penalty when
internally it is split into two halves.  The same goes for 512 bit
vectors.  That is why the patch leaves those alone - well, there is a param
for the threshold which is set to zero for everybody but znver1.  If
maintainers of any other architecture suspect that their FMAs might
suffer a similar latency problem, they can easily try tweaking that
parameter and see what happens with the matrix multiplication example.
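
As a rough illustration of how such a threshold could gate the deferring
logic, consider the sketch below.  The helper name and the exact comparison
are assumptions based on the description in this message (a value of zero
disables the mechanism, and 256-bit and 512-bit vectors are left alone when
the parameter is 128); it is not the code from the patch.

```cpp
// Hypothetical helper: decide whether an FMA candidate operating on a type
// of TYPE_SIZE_BITS bits should be subject to the deferring logic, given
// the value of the "avoid FMA max bits" parameter.
bool should_defer_fma(unsigned type_size_bits, unsigned avoid_fma_max_bits)
{
  // Zero means the target did not ask for any FMA avoidance.
  if (avoid_fma_max_bits == 0)
    return false;
  // Only types up to the threshold are affected; wider vectors are not.
  return type_size_bits <= avoid_fma_max_bits;
}
```

With the parameter set to 128, as the i386 hunk does for znver1 tuning,
scalar doubles and 128-bit vectors would be candidates for deferring while
256-bit vectors would not.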

I have bootstrapped and tested the patch on x86_64-linux (as it is and
also with the param set to 256 by default to make it trigger).  I have
also measured run-times of all benchmarks in SPEC 2006 FP and SPEC 2017
FPrate and the only changes are the big improvements of calculix and
parest.

After I address any comments and/or suggestions, would it be OK for
trunk?

Thanks,

Martin


2017-12-13  Martin Jambor  

PR target/81616
* params.def: New parameter PARAM_AVOID_FMA_MAX_BITS.
* tree-ssa-math-opts.c: Include domwalk.h.
(convert_mult_to_fma_1): New function.
(fma_transformation_info): New type.
(fma_deferring_state): Likewise.
(cancel_fma_deferring): New function.
(result_of_phi): Likewise.
(last_fma_candidate_feeds_initial_phi): Likewise.
(convert_mult_to_fma): Added deferring logic, split actual
transformation to convert_mult_to_fma_1.
(math_opts_dom_walker): New type.
(math_opts_dom_walker::after_dom_children): New method, body moved
here from pass_optimize_widening_mul::execute, added deferring logic
bits.
(pass_optimize_widening_mul::execute): Moved most of code to
math_opts_dom_walker::after_dom_children.
* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): New.
* config/i386/i386.c (ix86_option_override_internal): Added
maybe_setting of PARAM_AVOID_FMA_MAX_BITS.
---
 gcc/config/i386/i386.c   |   5 +
 gcc/config/i386/x86-tune.def |   4 +
 gcc/params.def   |   5 +
 gcc/tree-ssa-math-opts.c | 521 ---
 4 files changed, 407 insertions(+), 128 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e323102cef5..224544fe04f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4888,6 +4888,11 @@ ix86_option_override_internal (bool main_args_p,
(cf_protection_level) (opts->x_flag_cf_protection | CF_SET);
 }
 
+  if (ix86_tune_features [X86_TUNE_AVOID_128FMA_CHAINS])
+maybe_set_param_value (PARAM_AVOID_FMA_MAX_BITS, 128,
+  opts->x_param_values,
+  opts_set->x_param_values);
+
   return true;
 }
 
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 25f28e3cfc1..1b6f5f8816b 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -399,6 +399,10 @@ DEF_TUNE (X86_TUNE_SLOW_P

Re: [PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2017-12-15 Thread Jason Merrill
On Thu, Dec 14, 2017 at 2:25 PM, David Malcolm  wrote:
> On Mon, 2017-12-11 at 21:10 -0500, Jason Merrill wrote:
>> On 11/10/2017 04:45 PM, David Malcolm wrote:
>> > The initial version of the patch kit added location wrapper nodes
>> > around constants and uses-of-declarations, along with some other
>> > places in the parser (typeid, alignof, sizeof, offsetof).
>> >
>> > This version takes a much more minimal approach: it only adds
>> > location wrapper nodes around the arguments at callsites, thus
>> > not adding wrapper nodes around uses of constants and decls in
>> > other
>> > locations.
>> >
>> > It keeps them for the other places in the parser (typeid, alignof,
>> > sizeof, offsetof).
>> >
>> > In addition, for now, each site that adds wrapper nodes is guarded
>> > with !processing_template_decl, suppressing the creation of wrapper
>> > nodes when processing template declarations.  This is to simplify
>> > the patch kit so that we don't have to support wrapper nodes during
>> > template expansion.
>>
>> Hmm, it should be easy to support them, since NON_LVALUE_EXPR and
>> VIEW_CONVERT_EXPR don't otherwise appear in template trees.
>>
>> Jason
>
> I don't know if it's "easy"; it's at least non-trivial.
>
> I attempted to support them in the obvious way by adding the two codes
> to the switch statement tsubst_copy, reusing the case used by NOP_EXPR
> and others, but ran into an issue when dealing with template parameter
> packs.

> Attached is the reproducer I've been testing with (minimized using
> "delta" from a stdlib reproducer); my code was failing with:
>
> ../../src/cp-stdlib.ii: In instantiation of ‘struct 
> allocator_traits >’:
> ../../src/cp-stdlib.ii:31:8:   required from ‘struct 
> __alloc_traits, char>’
> ../../src/cp-stdlib.ii:43:75:   required from ‘class basic_string allocator >’
> ../../src/cp-stdlib.ii:47:58:   required from here
> ../../src/cp-stdlib.ii:27:55: sorry, unimplemented: use of 
> ‘type_pack_expansion’ in template
>  -> decltype(_S_construct(__a, __p, forward<_Args>(__args)...))  {   }
>^~
>
> The issue is that normally "__args" would be a PARM_DECL of type
> TYPE_PACK_EXPANSION, and that's handled by tsubst_decl, but on adding a
> wrapper node we now have a VIEW_CONVERT_EXPR of the same type i.e.
> TYPE_PACK_EXPANSION wrapping the PARM_DECL.
>
> When tsubst traverses the tree, the VIEW_CONVERT_EXPR is reached first,
> and it attempts to substitute the type TYPE_PACK_EXPANSION, which leads
> to the "sorry".
>
> If I understand things right, during substitution, only tsubst_decl on
> PARM_DECL can handle nodes with type with code TYPE_PACK_EXPANSION.
>
> The simplest approach seems to be to not create wrapper nodes for decls
> of type TYPE_PACK_EXPANSION, and that seems to fix the issue.

That does seem simplest.

> Alternatively I can handle TYPE_PACK_EXPANSION for VIEW_CONVERT_EXPR in
> tsubst by remapping the type to that of what they wrap after
> substitution; doing so also fixes the issue.

This will be more correct.  For the wrappers you don't need all the
handling that we currently have for NOP_EXPR and such; since we know
they don't change the type, we can substitute what they wrap, and then
rewrap the result.

Jason
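
The "substitute what they wrap, then rewrap" idea can be shown on a toy
expression tree.  Everything in the sketch below (the Node type, its
fields, the substitute function and the string-based type names) is
hypothetical and far simpler than GCC's tree representation or tsubst; it
only shows the order of operations: the wrapper's own type is never
substituted directly, it is taken from the substituted operand.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy expression node; not GCC's tree representation.
struct Node
{
  enum Kind { PARM, LOCATION_WRAPPER } kind;
  std::string type;               // type name, e.g. "T" before substitution
  int location;                   // only meaningful for LOCATION_WRAPPER
  std::shared_ptr<Node> operand;  // only set for LOCATION_WRAPPER
};

// Maps template parameter names to the types they are replaced with.
using Substitution = std::map<std::string, std::string>;

std::shared_ptr<Node> substitute(const std::shared_ptr<Node> &node,
                                 const Substitution &args)
{
  if (node->kind == Node::LOCATION_WRAPPER)
    {
      // Substitute the wrapped operand first, then build a new wrapper
      // around the result, reusing the location and the operand's type.
      std::shared_ptr<Node> inner = substitute(node->operand, args);
      return std::make_shared<Node>(
        Node{Node::LOCATION_WRAPPER, inner->type, node->location, inner});
    }

  // Plain parameter: replace its type if the substitution names it.
  auto found = args.find(node->type);
  std::string new_type = found != args.end() ? found->second : node->type;
  return std::make_shared<Node>(Node{Node::PARM, new_type, 0, nullptr});
}
```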


Re: [PR C++/59930] template friend classes & default args

2017-12-15 Thread Nathan Sidwell

On 12/14/2017 02:31 PM, Nathan Sidwell wrote:
PR 59930 concerns some problems with templated friend classes (of 
templates).  In trying to clean up our handling, I discovered we were 
accepting default args of such things.  This is ill-formed:


[temp.param]/12 'A default template-argument shall not be specified in a 
friend class template declaration.'


This patch addresses that problem by extending check_default_tmpl_args 
to deal with such friends.
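
A small example of the rule quoted from [temp.param] may help.  The names
S1 and S2 follow the comments in the patch, but the example itself is only
an illustration and is not taken from the testsuite changes.  The
ill-formed declaration is left commented out so that the rest compiles.

```cpp
// Primary template; a default template-argument is fine here.
template <typename T = int> struct S1
{
  T value;
};

template <typename T> struct S2
{
  // Ill-formed per the [temp.param] wording quoted above, because a
  // friend class template declaration specifies a default argument:
  //   template <typename U = int> friend struct S1;

  // Well-formed: the same friend declaration without a default argument.
  template <typename U> friend struct S1;
};
```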


I realized we don't have to check default args here in the friend 
FUNCTION_DECL case, because we do that later, when we know whether it's 
a definition, introducing decl or redeclaration.  This makes it a little 
clearer as to why we don't tell check_default_tmpl_args it's a friend 
function.


nathan
--
Nathan Sidwell
2017-12-15  Nathan Sidwell  

	PR c++/59930
	* decl.c (xref_tag_1): Correct comments about template friends and
	default args.
	* friend.c (make_friend_class): Move comments concerning
	self-friendliness to code dealing with such.
	* pt.c (check_default_tmpl_args): Deal with template friend
	classes too.
	(push_template_decl_real): Check default args for non-function
	template friends.

	PR c++/59930
	* g++.dg/cpp0x/temp_default4.C: Adjust diagnostic.
	* g++.old-deja/g++.pt/friend23.C: Likewise.
	* g++.old-deja/g++.pt/friend24.C: Delete.

Index: cp/decl.c
===
--- cp/decl.c	(revision 255691)
+++ cp/decl.c	(working copy)
@@ -13538,37 +13538,28 @@ xref_tag_1 (enum tag_types tag_code, tre
 	 processing a (member) template declaration of a template
 	 class, we must be very careful; consider:
 
-	   template 
-	   struct S1
+	   template  struct S1
 
-	   template 
-	   struct S2
-	   { template 
-	   friend struct S1; };
+	   template  struct S2
+	   {
+	 template  friend struct S1;
+	   };
 
 	 Here, the S2::S1 declaration should not be confused with the
 	 outer declaration.  In particular, the inner version should
-	 have a template parameter of level 2, not level 1.  This
-	 would be particularly important if the member declaration
-	 were instead:
-
-	   template  friend struct S1;
-
-	 say, when we should tsubst into `U' when instantiating
-	 S2.  On the other hand, when presented with:
-
-	   template 
-	   struct S1 {
-	 template 
-	 struct S2 {};
-	 template 
-	 friend struct S2;
+	 have a template parameter of level 2, not level 1.
+
+	 On the other hand, when presented with:
+
+	   template  struct S1
+	   {
+	 template  struct S2 {};
+	 template  friend struct S2;
 	   };
 
-	 we must find the inner binding eventually.  We
-	 accomplish this by making sure that the new type we
-	 create to represent this declaration has the right
-	 TYPE_CONTEXT.  */
+	 the friend must find S1::S2 eventually.  We accomplish this
+	 by making sure that the new type we create to represent this
+	 declaration has the right TYPE_CONTEXT.  */
   context = TYPE_CONTEXT (t);
   t = NULL_TREE;
 }
@@ -13622,9 +13613,10 @@ xref_tag_1 (enum tag_types tag_code, tre
 	  return error_mark_node;
 	}
 
-  /* Make injected friend class visible.  */
   if (scope != ts_within_enclosing_non_class && TYPE_HIDDEN_P (t))
 	{
+	  /* This is no longer an invisible friend.  Make it
+	 visible.  */
 	  tree decl = TYPE_NAME (t);
 
 	  DECL_ANTICIPATED (decl) = false;
Index: cp/friend.c
===
--- cp/friend.c	(revision 255691)
+++ cp/friend.c	(working copy)
@@ -283,21 +283,18 @@ make_friend_class (tree type, tree frien
 return;
 
   if (friend_depth)
-/* If the TYPE is a template then it makes sense for it to be
-   friends with itself; this means that each instantiation is
-   friends with all other instantiations.  */
 {
+  /* [temp.friend] Friend declarations shall not declare partial
+	 specializations.  */
   if (CLASS_TYPE_P (friend_type)
 	  && CLASSTYPE_TEMPLATE_SPECIALIZATION (friend_type)
 	  && uses_template_parms (friend_type))
 	{
-	  /* [temp.friend]
-	 Friend declarations shall not declare partial
-	 specializations.  */
 	  error ("partial specialization %qT declared %<friend%>",
 		 friend_type);
 	  return;
 	}
+
   if (TYPE_TEMPLATE_INFO (friend_type)
 	  && !PRIMARY_TEMPLATE_P (TYPE_TI_TEMPLATE (friend_type)))
 	{
@@ -311,7 +308,11 @@ make_friend_class (tree type, tree frien
 	  return;
 	}
 }
-  else if (same_type_p (type, friend_type))
+
+  /* It makes sense for a template class to be friends with itself,
+ that means the instantiations can be friendly.  Other cases are
+ not so meaningful.  */
+  if (!friend_depth && same_type_p (type, friend_type))
 {
   if (complain)
 	warning (0, "class %qT is implicitly friends with itself",
Index: cp/pt.c
===
--- cp/pt.c	(revision 255691)
+++ cp/pt.c	(working copy)
@@ -4980,9 +4980,10 @@ fixed_parameter_pack_p (tree parm)
a pr

Re: [PATCH] Swap affects_type_identity and handler fields in attribute_spec

2017-12-15 Thread Jason Merrill
On Fri, Dec 15, 2017 at 9:18 AM, Jakub Jelinek  wrote:
> Hi!
>
> As I said earlier, I'd like to take the opportunity that Martin has added
> new field into attribute_spec and all out of tree FEs and backends need
> adjustment anyway to swap the affects_type_identity and handler fields.
> Previously we had:
> const char *, 2x int, 3x bool, 1x fnptr, 1x bool, 1x exclusions *
> fields in the structure, so 48 bytes on LP64 and 28 bytes on ILP32 hosts,
> with the patch we have:
> const char *, 2x int, 4x bool, 1x fnptr, 1x exclusions *
> so 40 bytes on LP64 and 24 bytes on ILP32 hosts.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.
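
The size difference quoted above comes from padding.  The sketch below
mirrors the two field orders with stand-in types: the field names follow
attribute_spec, but the handler signature and the exclusions type are
simplified placeholders, so this is an illustration of the layout effect
rather than the real declaration.  With the lone bool placed after the
function pointer, the compiler has to pad both after the first three bools
and after the fourth; grouping all four bools removes one of those padding
runs.

```cpp
#include <cstddef>

struct exclusions;                 // placeholder for the exclusions type
typedef void *(*handler_fn)(void); // simplified stand-in for the handler type

// Field order before the swap: 3x bool, fnptr, 1x bool.
struct spec_before
{
  const char *name;
  int min_length;
  int max_length;
  bool decl_required;
  bool type_required;
  bool function_type_required;
  handler_fn handler;
  bool affects_type_identity;
  const exclusions *exclude;
};

// Field order after the swap: 4x bool, then fnptr.
struct spec_after
{
  const char *name;
  int min_length;
  int max_length;
  bool decl_required;
  bool type_required;
  bool function_type_required;
  bool affects_type_identity;
  handler_fn handler;
  const exclusions *exclude;
};
```

On a typical LP64 host sizeof (spec_before) and sizeof (spec_after) would
be expected to match the 48 and 40 bytes mentioned in the patch
description, though the exact values depend on the ABI.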


Re: [14/nn] Add helpers for shift count modes

2017-12-15 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Dec 15, 2017 at 1:48 AM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
>>>  wrote:
>>>> Richard Biener  writes:
>>>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>>>>  wrote:
>>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>>>  wrote:
>>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>>> of a scalar shift amount, given the mode of the values
>>>>>>> being shifted.
>>>>>>>
>>>>>>> One long-standing problem has been to decide what this mode
>>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>>> the corresponding target pattern says?  (In which case what
>>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>>
>>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>>> routine is used more often than it is in this patch.  As it
>>>>>>> stands the patch does not change the generated code.
>>>>>>>
>>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>>> for constant shift amounts, again given the mode of the value
>>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>>
>>>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>>> looks good to me I'd rather have that ??? in this function (and
>>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>>
>>>> OK.  I'd gone for word_mode because that's what expand_binop uses
>>>> for CONST_INTs:
>>>>
>>>>   op1_mode = (GET_MODE (op1) != VOIDmode
>>>>               ? as_a <scalar_int_mode> (GET_MODE (op1))
>>>>               : word_mode);
>>>>
>>>> But using the inner mode should be fine too.  The patch below does that.
>>>>
>>>>>> In the end it's up to insn recognizing to convert the op to the
>>>>>> expected mode and for generic RTL it's us that should decide
>>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>>> integer so why not simply use a mode that is large enough to
>>>>>> make the constant fit?
>>>>
>>>> ...but I can do that instead if you think it's better.
>>>>
>>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>>> expertise.
>>>>>
>>>>> To add a little bit - shift amounts is maybe the only(?) place
>>>>> where a modeless CONST_INT makes sense!  So "fixing"
>>>>> that first sounds backwards.
>>>>
>>>> But even here they have a mode conceptually, since out-of-range shift
>>>> amounts are target-defined rather than undefined.  E.g. if the target
>>>> interprets the shift amount as unsigned, then for a shift amount
>>>> (const_int -1) it matters whether the mode is QImode (and so we're
>>>> shifting by 255) or HImode (and so we're shifting by 65535).
>>>
>>> I think RTL is well-defined (at least I hope so ...) and machine constraints
>>> need to be modeled explicitly (like embedding an implicit bit_and in
>>> shift patterns).
>>
>> Well, RTL is well-defined in the sense that if you have
>>
>>   (ashift X (foo:HI ...))
>>
>> then the shift amount must be interpreted as HImode rather than some
>> other mode.  The problem here is to define a default choice of mode for
>> const_ints, in cases where the shift is being created out of the blue.
>>
>> Whether the shift amount is effectively signed or unsigned isn't defined
>> by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
>> out-of-range values, and the behaviour for out-of-range RTL shifts is
>> specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.
>>
>> I think the revised patch does implement your suggestion of using the
>> integer equivalent of the inner mode as the default, but we need to
>> decide whether to go with it, go with the original word_mode approach
>> (taken from existing expand_binop code) or something else.  Something
>> else could include the widest supported integer mode, so that we never
>> change the value.
>
> I guess it's pretty arbitrary what we choose (but we might need to adjust
> targets?).  For something like this an appealing choice would be sth
> that is host and target independent, like [u]int32_t or given CONST_INT
> is always 64bits now and signed int64_t aka HOST_WIDE_INT (bad
> name now).  That means it's the "infinite precision" thing that fits
> into CONST_INT ;)

Sounds OK to me.  How about the attached?

Thanks,
Richard
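
The point made earlier in the thread about (const_int -1) can be checked
with plain integer arithmetic.  The helper below is hypothetical and has
nothing to do with the RTL API; it simply reinterprets a signed constant as
an unsigned count of a given width, the way a target that treats the shift
amount as unsigned would see it in an 8-bit (QImode-like) or 16-bit
(HImode-like) mode.

```cpp
#include <cstdint>

// Reinterpret VALUE as an unsigned quantity that is MODE_BITS wide.
std::uint64_t count_as_unsigned(std::int64_t value, unsigned mode_bits)
{
  std::uint64_t mask = mode_bits >= 64
    ? ~std::uint64_t{0}
    : (std::uint64_t{1} << mode_bits) - 1;
  return static_cast<std::uint64_t>(value) & mask;
}
```

Calling count_as_unsigned (-1, 8) gives 255 and count_as_unsigned (-1, 16)
gives 65535, which is why the choice of default mode matters for
out-of-range shift amounts.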


2017-12-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN

RE: [compare-debug] use call loc for nop_endbr

2017-12-15 Thread Tsimbalist, Igor V
> -Original Message-
> From: Alexandre Oliva [mailto:aol...@redhat.com]
> Sent: Thursday, December 14, 2017 7:37 PM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [compare-debug] use call loc for nop_endbr
> 
> On Dec 14, 2017, "Tsimbalist, Igor V"  wrote:
> 
> >> Regstrapping with -fcompare-debug on stage3 host and target builds on
> >> x86_64- and i686-linux-gnu; ok to install?
> 
> > Ok from me.
> 
> Thanks, I went ahead and installed it.
> 
> > Am I correct the error you had was related to improper location
> information,
> 
> Yeah, only location information.
> 
> > I will try to skip NOTE insns only.
> 
> You probably want to skip debug insns and notes, too.  Actually, IIRC
> you insert these insns after var-tracking, so you probably only have to
> deal with notes.  You don't have to, but if bindings are intended to
> take effect right after the call, it would probably be nice if they
> still did so, e.g., even if you happen to single-step out of the call
> and stop at the nop_endbr insn.
Yes, I expect this behavior.

> BTW, is this the subject of a Cauldron 2017 talk in which I raised an
> issue about PLT entries possibly needing special opcodes to enable them
> to be used as call targets or somesuch?  I had initially retracted my
> question, when it was stated that only indirect calls needed special
> treatment, but later I realized that in some cases PLT entries *are*
> used as function addresses even for functions that have their addresses
> taken.  Please let me know if you're familiar with the issue and would
> like me to detail the problem.
Please give more info. I do not remember all details but PLT entries
were changed to have an endbr instruction (if this is relevant to your question :).
HJ did this.

Thanks,
Igor

> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [001/nnn] poly_int: add poly-int.h

2017-12-15 Thread Jeff Law
On 12/15/2017 02:08 AM, Richard Biener wrote:
> On Fri, Dec 15, 2017 at 4:40 AM, Martin Sebor  wrote:
>> On 12/07/2017 03:48 PM, Jeff Law wrote:
>>>
>>> On 12/07/2017 03:38 PM, Richard Sandiford wrote:
>>>
>>>>> So I think that's the final ack on this series.
>>>>
>>>>
>>>> Thanks to both of you, really appreciate it!
>>>
>>> Sorry it took so long.
>>>
>>>>
>>>>> Richard S. can you confirm?  I fully expect the trunk has moved some
>>>>> and the patches will need adjustments -- consider adjustments which
>>>>> work in a manner similar to the patches to date pre-approved.
>>>>
>>>>
>>>> Yeah, that's now all of the poly_int patches.  I still owe you replies
>>>> to some of them -- I'll get to that as soon as I can.
>>>
>>> NP.  I don't think any of the questions were all that significant.
>>> Those which were I think you already responded to.
>>
>>
>> I am disappointed that the no-op ctor issue hasn't been adequately
>> addressed.  No numbers were presented as to the difference it makes
>> to have the ctor do the expected thing (i.e., initialize the object).
>> In my view, the choice seems arbitrarily in favor of a hypothetical
>> performance improvement at -O0 without regard to the impact on
>> correctness.  We have recently seen the adverse effects of similar
>> choices in other areas: the hash table insertion[*] and the related
>> offset_int initialization.
> 
> As we're coming from a C code base not initializing stuff is what I expect.
> I'm really surprised to see a lot of default initialization done in places
> where it only hurts compile-time (of GCC at least where we need to
> optimize that away).
I suspect a lot of the default initializations were done when Kaveh and
others were working to get us -Wuninitialized clean -- which happened
when uninitialized warnings were still done in RTL (flow.c).

I've long wished we had marked the initializations which were done
solely to make -Wuninitialized happy because it would be a good way to
measure progress on our analysis & optimization passes' ability to
prove the paths weren't executable.

WRT the nop ctor issue, I had a slight leaning towards initializing
them, but I certainly could argue either side.  I think the long term
goal really should be to move to C++11 where it can be done right.

jeff


Re: [002/nnn] poly_int: IN_TARGET_CODE

2017-12-15 Thread Jeff Law
On 12/14/2017 06:08 PM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 10/23/2017 10:58 AM, Richard Sandiford wrote:
>>> This patch makes each target-specific TU define an IN_TARGET_CODE macro,
>>> which is used to decide whether poly_int<1, C> should convert to C.
>>>
>>>
>>> 2017-10-23  Richard Sandiford  
>>> Alan Hayward  
>>> David Sherwood  
>>>
>>> gcc/
>>> * genattrtab.c (write_header): Define IN_TARGET_CODE to 1 in the
>>> target C file.
>>> * genautomata.c (main): Likewise.
>>> * genconditions.c (write_header): Likewise.
>>> * genemit.c (main): Likewise.
>>> * genextract.c (print_header): Likewise.
>>> * genopinit.c (main): Likewise.
>>> * genoutput.c (output_prologue): Likewise.
>>> * genpeep.c (main): Likewise.
>>> * genpreds.c (write_insn_preds_c): Likewise.
>>> * genrecog.c (writer_header): Likewise.
>>> * config/aarch64/aarch64-builtins.c (IN_TARGET_CODE): Define.
>>> * config/aarch64/aarch64-c.c (IN_TARGET_CODE): Likewise.
>>> * config/aarch64/aarch64.c (IN_TARGET_CODE): Likewise.
>>> * config/aarch64/cortex-a57-fma-steering.c (IN_TARGET_CODE): Likewise.
>>> * config/aarch64/driver-aarch64.c (IN_TARGET_CODE): Likewise.
>>> * config/alpha/alpha.c (IN_TARGET_CODE): Likewise.
>>> * config/alpha/driver-alpha.c (IN_TARGET_CODE): Likewise.
>>> * config/arc/arc-c.c (IN_TARGET_CODE): Likewise.
>>> * config/arc/arc.c (IN_TARGET_CODE): Likewise.
>>> * config/arc/driver-arc.c (IN_TARGET_CODE): Likewise.
>>> * config/arm/aarch-common.c (IN_TARGET_CODE): Likewise.
>>> * config/arm/arm-builtins.c (IN_TARGET_CODE): Likewise.
>>> * config/arm/arm-c.c (IN_TARGET_CODE): Likewise.
>>> * config/arm/arm.c (IN_TARGET_CODE): Likewise.
>>> * config/arm/driver-arm.c (IN_TARGET_CODE): Likewise.
>>> * config/avr/avr-c.c (IN_TARGET_CODE): Likewise.
>>> * config/avr/avr-devices.c (IN_TARGET_CODE): Likewise.
>>> * config/avr/avr-log.c (IN_TARGET_CODE): Likewise.
>>> * config/avr/avr.c (IN_TARGET_CODE): Likewise.
>>> * config/avr/driver-avr.c (IN_TARGET_CODE): Likewise.
>>> * config/avr/gen-avr-mmcu-specs.c (IN_TARGET_CODE): Likewise.
>>> * config/bfin/bfin.c (IN_TARGET_CODE): Likewise.
>>> * config/c6x/c6x.c (IN_TARGET_CODE): Likewise.
>>> * config/cr16/cr16.c (IN_TARGET_CODE): Likewise.
>>> * config/cris/cris.c (IN_TARGET_CODE): Likewise.
>>> * config/darwin.c (IN_TARGET_CODE): Likewise.
>>> * config/epiphany/epiphany.c (IN_TARGET_CODE): Likewise.
>>> * config/epiphany/mode-switch-use.c (IN_TARGET_CODE): Likewise.
>>> * config/epiphany/resolve-sw-modes.c (IN_TARGET_CODE): Likewise.
>>> * config/fr30/fr30.c (IN_TARGET_CODE): Likewise.
>>> * config/frv/frv.c (IN_TARGET_CODE): Likewise.
>>> * config/ft32/ft32.c (IN_TARGET_CODE): Likewise.
>>> * config/h8300/h8300.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/djgpp.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/driver-i386.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/driver-mingw32.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/host-cygwin.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/host-i386-darwin.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/host-mingw32.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/i386-c.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/i386.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/intelmic-mkoffload.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/msformat-c.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/winnt-cxx.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/winnt-stubs.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/winnt.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/x86-tune-sched-atom.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/x86-tune-sched-bd.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/x86-tune-sched-core.c (IN_TARGET_CODE): Likewise.
>>> * config/i386/x86-tune-sched.c (IN_TARGET_CODE): Likewise.
>>> * config/ia64/ia64-c.c (IN_TARGET_CODE): Likewise.
>>> * config/ia64/ia64.c (IN_TARGET_CODE): Likewise.
>>> * config/iq2000/iq2000.c (IN_TARGET_CODE): Likewise.
>>> * config/lm32/lm32.c (IN_TARGET_CODE): Likewise.
>>> * config/m32c/m32c-pragma.c (IN_TARGET_CODE): Likewise.
>>> * config/m32c/m32c.c (IN_TARGET_CODE): Likewise.
>>> * config/m32r/m32r.c (IN_TARGET_CODE): Likewise.
>>> * config/m68k/m68k.c (IN_TARGET_CODE): Likewise.
>>> * config/mcore/mcore.c (IN_TARGET_CODE): Likewise.
>>> * config/microblaze/microblaze-c.c (IN_TARGET_CODE): Likewise.
>>> * config/microblaze/microblaze.c (IN_TARGET_CODE): Likewise.
>>> * config/mips/driver-native.c (IN_TARGET_CODE): Likewise.
>>> * config/mips/frame-header-opt.c (IN_TARGET_CODE): Likewise.
>>> * config/mips/mips.c (IN_TARGET_CODE): Likewise.
>>> * config/mmix/mmix.c (IN_TARGET_CODE): Likewise.
>>> * config/mn10300/mn10300.c (IN_TARGET_CODE): Likew

Re: [compare-debug] use call loc for nop_endbr

2017-12-15 Thread H.J. Lu
On Fri, Dec 15, 2017 at 7:17 AM, Tsimbalist, Igor V
 wrote:
>> -Original Message-
>> From: Alexandre Oliva [mailto:aol...@redhat.com]
>> Sent: Thursday, December 14, 2017 7:37 PM
>> To: Tsimbalist, Igor V 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [compare-debug] use call loc for nop_endbr
>>
>> On Dec 14, 2017, "Tsimbalist, Igor V"  wrote:
>>
>> >> Regstrapping with -fcompare-debug on stage3 host and target builds on
>> >> x86_64- and i686-linux-gnu; ok to install?
>>
>> > Ok from me.
>>
>> Thanks, I went ahead and installed it.
>>
>> > Am I correct the error you had was related to improper location
>> information,
>>
>> Yeah, only location information.
>>
>> > I will try to skip NOTE insns only.
>>
>> You probably want to skip debug insns and notes, too.  Actually, IIRC
>> you insert these insns after var-tracking, so you probably only have to
>> deal with notes.  You don't have to, but if bindings are intended to
>> take effect right after the call, it would probably be nice if they
>> still did so, e.g., even if you happen to single-step out of the call
>> and stop at the nop_endbr insn.
> Yes, I expect this behavior.
>
>> BTW, is this the subject of a Cauldron 2017 talk in which I raised an
>> issue about PLT entries possibly needing special opcodes to enable them
>> to be used as call targets or somesuch?  I had initially retracted my
>> question, when it was stated that only indirect calls needed special
>> treatment, but later I realized that in some cases PLT entries *are*
>> used as function addresses even for functions that have their addresses
>> taken.  Please let me know if you're familiar with the issue and would
>> like me to detail the problem.
> Please give more info. I do not remember all details but PLT entries
> were changed to have an endbr instruction (if this is relevant to your
> question :).
> HJ did this.

PLT is covered.  See x86 psABI for CET:

https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI


-- 
H.J.


Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Martin Sebor

On 12/15/2017 01:48 AM, Richard Biener wrote:

On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor  wrote:

On 12/14/2017 03:43 AM, Richard Biener wrote:


On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor  wrote:


On 12/12/2017 05:35 PM, Jeff Law wrote:



On 12/12/2017 01:15 PM, Martin Sebor wrote:



Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.



What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
}

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?




Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.



There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).



Right, and memcpy dereferences it, so it's undefined.


That's an interpretation of the standard that I don't share.


It's not an interpretation.  It's a basic rule of the languages
that the standards are explicit about.  In C11 you will find
this specified in detail in 6.5.6, paragraphs 7 and 8 (of
particular relevance to your question below is p7: "a pointer
to an object that is not an element of an array behaves the same
as a pointer to the first element of an array of length one.")


Also, if I have struct f { int i; int j; };  and a int * that points
to the j member you say I have no standard conforming way
to get at a pointer to the i member from this, right?


Correct.  See above.


Because
the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe (int *)((char *)p - offsetof
(struct f, j))
should be valid?


No, not really.  It works in practice but it's not well-defined.
It doesn't matter how you get at the result.  What matters is
what you start with.  As Jeff said, to derive a pointer to
distinct subobjects of a larger object you need to start with
a pointer to the larger object and treat it as an array of
chars.
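
A minimal sketch of the approach described above, assuming the struct f from the question: the starting point is a pointer to the enclosing object, viewed as an array of bytes, and the member is reached through offsetof. The helper name read_int_member is hypothetical and only serves the illustration.

```c
#include <stddef.h>
#include <string.h>

struct f { int i; int j; };

/* Read the int member located OFFSET bytes into *OBJ.  The starting
   point is a pointer to the enclosing struct, viewed as an array of
   bytes, rather than a pointer to a sibling member.  */
static int
read_int_member (const struct f *obj, size_t offset)
{
  const unsigned char *bytes = (const unsigned char *) obj;
  int value;
  memcpy (&value, bytes + offset, sizeof value);
  return value;
}
```

With a struct f s in hand, read_int_member (&s, offsetof (struct f, i)) yields s.i without ever doing arithmetic on a pointer to s.j.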


This means that pfu->x1 + 10 is a valid pointer
into *pfu no matter what you say and you can dereference it.


No.

As another hopefully more convincing example consider a multi-
dimensional array A[2][2].  The value of the offset of A[i][j]
is i * sizeof A[i] + j * sizeof A[i][j].  With that, the offset of
A[1][0] is sizeof A[1] + 0, and so would be the offset of A[0][2]. But
that doesn't make A[0][2] a valid reference to an element of
A (because A[0] has only two elements, A[0][0] and A[0][1]),
or A[0] + 2 a dereferenceable pointer.  It's a pointer that
points just past the last element of the array A[0].  That
there's another array right after A[0] (namely A[1]) is
immaterial, same as in the struct f example above.
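
To make the arithmetic concrete, here is a small sketch that assumes char elements (the mail does not state the element type). The helper names are hypothetical. element_offset only does integer arithmetic, so it can be asked about the out-of-bounds index pair (0, 2); actual_offset forms real addresses and must only be called with valid indices.

```c
#include <stddef.h>

static char A[2][2];

/* Byte offset that the index pair (i, j) would correspond to.  Pure
   integer arithmetic: no pointer is formed, so any pair can be asked.  */
static size_t
element_offset (size_t i, size_t j)
{
  return i * sizeof A[0] + j * sizeof A[0][0];
}

/* Byte distance of the element A[i][j] from the start of A.  Only
   valid for i < 2 and j < 2.  */
static size_t
actual_offset (size_t i, size_t j)
{
  return (size_t) ((char *) &A[i][j] - (char *) A);
}
```

The two index pairs (1, 0) and (0, 2) map to the same byte offset, yet only the first names an element of A.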

Martin


Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Qing Zhao
Hi, Jakub,

thanks a lot for your detailed review.

> On Dec 14, 2017, at 2:45 PM, Jakub Jelinek  wrote:
> 
> On Thu, Dec 14, 2017 at 01:45:21PM -0600, Qing Zhao wrote:
>> 2017-12-11  Qing Zhao  mailto:qing.z...@oracle.com>>
> 
> No " " in ChangeLog entries please.

This was a paste error when I copied this part from my terminal into the mail
editor; it is not in my real code.
I will double-check next time before sending out email.
> 
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -2541,6 +2541,198 @@ handle_builtin_memcmp (gimple_stmt_iterator *gsi)
>>   return false;
>> }
>> 
>> +/* Given an index to the strinfo vector, compute the string length for the
>> +   corresponding string. Return -1 when unknown.  */
>> + 
>> +static HOST_WIDE_INT 
>> +compute_string_length (int idx)
>> +{
>> +  HOST_WIDE_INT string_leni = -1; 
>> +  gcc_assert (idx != 0);
>> +
>> +  if (idx < 0)
>> +string_leni = ~idx;
>> +  else
>> +{
>> +  strinfo *si = get_strinfo (idx);
>> +  if (si)
>> +{
>> +  tree const_string_len = get_string_length (si);
>> +  string_leni
>> += (const_string_len && tree_fits_uhwi_p (const_string_len)
>> +   ? tree_to_uhwi(const_string_len) : -1); 
> 
> So, you are returning a signed HWI, then clearly tree_fits_uhwi_p and
> tree_to_uhwi are inappropriate, you should have used tree_fits_shwi_p
> and tree_to_shwi.  Space after function name is missing too.
> And, as you start by initializing string_leni to -1, there is no
> point to write it this way rather than
> if (const_string_len && tree_fits_shwi_p (const_string_len))
>   string_leni = tree_to_shwi (const_string_len);

Originally it returned an unsigned HWI, but later I changed it to return a
signed one since I need a negative value to represent the UNKNOWN state.

I will fix this.
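
For readers without the GCC sources at hand, the pattern under review can be sketched in standalone C. This is only an analogy, not the patch: struct maybe_length and length_or_unknown are hypothetical stand-ins for the tree value and for compute_string_length, and the range check plays the role that tree_fits_shwi_p plays in the suggestion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a length that may or may not be known.  */
struct maybe_length
{
  bool known;
  uint64_t value;
};

/* Return the length as a signed value, or -1 when it is unknown or
   does not fit in the signed type.  Checking against the signed range
   keeps -1 unambiguous as the "unknown" marker.  */
static int64_t
length_or_unknown (struct maybe_length len)
{
  if (len.known && len.value <= (uint64_t) INT64_MAX)
    return (int64_t) len.value;
  return -1;
}
```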
> 
>> +}
>> +}
> 
> Maybe also do
>  if (string_leni < 0)
>return -1;

Yes, this might be safer.
> 
>> +  return string_leni;
> 
> unless the callers just look for negative value as unusable.
> 
>> +  tree len = gimple_call_arg (stmt, 2);
>> +  if (tree_fits_uhwi_p (len))
>> +length = tree_to_uhwi (len);
> 
> Similarly to above, you are mixing signed and unsigned HWIs too much.

same reason as above :-),  I will fix this.

> 
>> +  if (gimple_code (ustmt) == GIMPLE_ASSIGN)
> 
>  if (is_gimple_assign (ustmt))
> 
> Usually we use use_stmt instead of ustmt.
Okay.
> 
>> +{
>> +  gassign *asgn = as_a <gassign *> (ustmt);
> 
> No need for the gassign and ugly as_a, gimple_assign_rhs_code
> as well as gimple_assign_rhs2 can be called on gimple * too.

I copied this part of the code unchanged from the routine “handle_builtin_memcpy”.

I will change it as you suggested.


>> +  tree_code code = gimple_assign_rhs_code (asgn);
>> +  if ((code != EQ_EXPR && code != NE_EXPR)
>> +  || !integer_zerop (gimple_assign_rhs2 (asgn)))
>> +return true;
>> +}
>> +  else if (gimple_code (ustmt) == GIMPLE_COND)
>> +{
>> +  tree_code code = gimple_cond_code (ustmt);
>> +  if ((code != EQ_EXPR && code != NE_EXPR)
>> +  || !integer_zerop (gimple_cond_rhs (ustmt)))
>> +return true;
> 
> There is another case you are missing, assign stmt with
> gimple_assign_rhs_code COND_EXPR, where gimple_assign_rhs1 is
> tree with TREE_CODE EQ_EXPR or NE_EXPR with TREE_OPERAND (rhs1, 1)
> integer_zerop.

a little confused here:

in the current code:
. the first case is:  result = strcmp() != 0
. the second case is:if (strcmp() != 0)

so, the missing case you mentioned above is:

result = if (strcmp() != 0) 

or something else?
> 
>> +  /* When both arguments are known, and their strlens are unequal, we can 
>> + safely fold the call to a non-zero value for strcmp;
>> + otherwise, do nothing now.  */
>> +  if (idx1 != 0 && idx2 != 0)
>> +{
>> +  HOST_WIDE_INT const_string_leni1 = -1;
>> +  HOST_WIDE_INT const_string_leni2 = -1;
>> +  const_string_leni1 = compute_string_length (idx1);
>> +  const_string_leni2 = compute_string_length (idx2);
> 
> Why do you initialize the vars when you immediately overwrite it?

just a habit to declare a variable with initialization :-).

> Just do
>  HOST_WIDE_INT const_string_leni1 = compute_string_length (idx1);

I can change it like this.
> etc.
> 
>> +  /* When one of args is constant string.  */
>> +  tree var_string;
>> +  HOST_WIDE_INT const_string_leni = -1;
>> +  
>> +  if (idx1)
>> +{
>> +  const_string_leni = compute_string_length (idx1);
>> +  var_string = arg2;
>> +} 
>> +  else if (idx2)
>> +{
>> +  const_string_leni = compute_string_length (idx2);
>> +  var_string = arg1;
>> +} 
> 
> Haven't you checked earlier that one of idx1 and idx2 is non-zero?

Yes.  

it’s guaranteed that one and ONLY one of idx1 and idx2 is non-zero
when getting here.

> If so, then the el

Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Richard Biener
On December 15, 2017 4:58:14 PM GMT+01:00, Martin Sebor  
wrote:
>On 12/15/2017 01:48 AM, Richard Biener wrote:
>> On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor 
>wrote:
>>> On 12/14/2017 03:43 AM, Richard Biener wrote:

>>>> On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor wrote:
>>>>>
>>>>> On 12/12/2017 05:35 PM, Jeff Law wrote:
>>>>>>
>>>>>>
>>>>>> On 12/12/2017 01:15 PM, Martin Sebor wrote:
>>>>>>>
>>>>>>>
>>>>>>> Bug 83373 - False positive reported by -Wstringop-overflow, is
>>>>>>> another example of warning triggered by a missed optimization
>>>>>>> opportunity, this time in the strlen pass.  The optimization
>>>>>>> is discussed in pr78450 - strlen(s) return value can be assumed
>>>>>>> to be less than the size of s.  The gist of it is that the result
>>>>>>> of strlen(array) can be assumed to be less than the size of
>>>>>>> the array (except in the corner case of last struct members).
>>>>>>>
>>>>>>> To avoid the false positive the attached patch adds this
>>>>>>> optimization to the strlen pass.  Although the patch passes
>>>>>>> bootstrap and regression tests for all front-ends I'm not sure
>>>>>>> the way it determines the upper bound of the range is 100%
>>>>>>> correct for languages with arrays with a non-zero lower bound.
>>>>>>> Maybe it's just not as tight as it could be.
>>>>>>
>>>>>>
>>>>>> What about something hideous like
>>>>>>
>>>>>> struct fu {
>>>>>>   char x1[10];
>>>>>>   char x2[10];
>>>>>>   int avoid_trailing_array;
>>>>>> }
>>>>>>
>>>>>> Where objects stored in x1 are not null terminated.  Are we in the realm
>>>>>> of undefined behavior at that point (I hope so)?
>>>>>
>>>>>
>>>>>
>>>>> Yes, this is undefined.  Pointer arithmetic (either direct or
>>>>> via standard library functions) is only defined for pointers
>>>>> to the same object or subobject.  So even something like
>>>>>
>>>>>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>>>>>
>>>>> is undefined.
>>>>
>>>>
>>>> There's nothing undefined here - computing the pointer pointing
>>>> to one-after-the-last element of an array is valid (you are just
>>>> not allowed to dereference it).
>>>
>>>
>>> Right, and memcpy dereferences it, so it's undefined.
>>
>> That's an interpretation of the standard that I don't share.
>
>It's not an interpretation.  It's a basic rule of the languages
>that the standards are explicit about.  In C11 you will find
>this specified in detail in 6.5.6, paragraphs 7 and 8 (of
>particular relevance to your question below is p7: "a pointer
>to an object that is not an element of an array behaves the same
>as a pointer to the first element of an array of length one.")

I know. 

>> Also, if I have struct f { int i; int j; };  and a int * that points
>> to the j member you say I have no standard conforming way
>> to get at a pointer to the i member from this, right?
>
>Correct.  See above.
>
>> Because
>> the pointer points to an 'int' object.  But it also points within
>> a struct f object!  So at least maybe (int *)((char *)p - offsetof
>> (struct f, j))
>> should be valid?
>
>No, not really.  It works in practice but it's not well-defined.
>It doesn't matter how you get at the result.  What matters is
>what you start with.  As Jeff said, to derive a pointer to
>distinct subobjects of a larger object you need to start with
>a pointer to the larger object and treat it as an array of
>chars.

Those are obviously not the constraints under which people use C and C++, so I
see no way to enforce this within GIMPLE.

>> This means that pfu->x1 + 10 is a valid pointer
>> into *pfu no matter what you say and you can dereference it.
>
>No.
>
>As another hopefully more convincing example consider a multi-
>dimensional array A[2][2].  The value of the offset of A[i][j]
>is i * sizeof A[i] + j * sizeof A[i][j].  With that, the offset of
>A[1][0] is sizeof A[1] + 0, and so would be the offset of A[0][2]. But
>that doesn't make A[0][2] a valid reference to an element of
>A (because A[0] has only two elements, A[0][0] and A[0][1]),
>or A[0] + 2 a dereferenceable pointer.  It's a pointer that
>points just past the last element of the array A[0].  That
>there's another array right after A[0] (namely A[1]) is
>immaterial, same as in the struct f example above.

I know. Dependence analysis relies on this. We've had bugs in the past with gcc 
itself introducing such bogus references. 

Richard. 

>
>Martin



[committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-15 Thread Jeff Law
I hate this patch.

The fundamental problem we have is that there are times when we very
much want to thread jumps and yet there are other times when we do not.

To date we have been able to largely select between those two by looking
at the shape of the CFG and the jump thread to see how threading a
particular jump would impact the shape of the CFG (particularly loops in
the CFG).

In this BZ we have a loop like this:

        2
        |
        3<-------+
        |        |
        4<---+   |
       / \   |   |
      5   6  |   |
       \  /  |   |
         7   |   |
        / \  |   |
       8  11-+   |
      / \        |
     9  10-------+
     |
     E

We want to thread the path (6, 7) (7, 11).  ie, there's a path through
that inner loop where we don't need to do the loop test.  Here's an
example (from libgomp testsuite)

(N is 1500)

  for (j = 0; j < N; j++)
    {
      if (j > 500)
        {
          x[i][j] = i + j + 3;
          y[j] = i*j + 10;
        }
      else
        x[i][j] = x[i][j]*3;
    }



Threading (in effect) puts the loop test into both arms of the
conditional, then removes it from the ELSE arm because the condition is
always "keep looping" in that arm.
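
As a source-level illustration of that effect (a hand-written sketch, not compiler output), the two functions below compute the same thing for one row of x. The names row_original and row_threaded are hypothetical, and the row is passed as a plain int pointer to keep the example self-contained; N is 1500 as in the test.

```c
#define N 1500

/* The loop as written in the test, for a single row of x.  */
static void
row_original (int i, int *x, int *y)
{
  for (int j = 0; j < N; j++)
    {
      if (j > 500)
        {
          x[j] = i + j + 3;
          y[j] = i * j + 10;
        }
      else
        x[j] = x[j] * 3;
    }
}

/* Roughly what threading produces: the loop test is duplicated into
   both arms and then dropped from the ELSE arm, where j <= 500
   guarantees that the incremented j is still below N.  */
static void
row_threaded (int i, int *x, int *y)
{
  int j = 0;
  for (;;)
    {
      if (j > 500)
        {
          x[j] = i + j + 3;
          y[j] = i * j + 10;
          j++;
          if (j >= N)
            break;
        }
      else
        {
          x[j] = x[j] * 3;
          j++;
        }
    }
}
```

The second form has two distinct back edges to the loop header, which is the kind of shape change the rest of this mail is about.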

This plays poorly with loop parallelization -- I tried to follow what's
going on in there and just simply got lost.  I got as far as seeing that
the parallelization code thought there were loop carried dependencies.
At least that's how it looked to me.  But I don't see any loop carried
dependencies in the code.

You might naturally ask if threading this is actually wise.  It seems to
broadly fit into the cases where we throttle threading so as not to muck
around too much with the loop structure.  It's not terrible to detect a
CFG of this shape and avoid threading.

BUT


You can have essentially the same shape CFG (not 100% the same, but the
same key characteristics), but where jump threading simplifies things in
ways that are good for the SLP vectorizer (vect/bb-slp-16.c) or where
jump threading avoids spurious warnings (graphite/scop-4.c)

Initially I thought I'd seen a key difference in the contents of the
latch block, but I think I was just up too late and mis-read the dump.

So I don't see anything in the CFG shape or the contents of the blocks
that can be reasonably analyzed at jump threading time.  Given we have
two opposite needs and no reasonable way I can find to select between
them, I'm resorting to a disgusting hack.  Namely to avoid threading
through the latch to another point in the loop *when loop
parallelization is enabled*.

Let me be clear.  This is a hack.  I don't like it, not one little bit.
But I don't see a way to resolve the regression without introducing
other regressions and the loop parallelization code is a total mystery
to me.

Bootstrapped on x86_64 and regression tested with and without graphite.
Confirmed it fixes the graphite related regressions mentioned in the BZ
on x86_64.

Committing to the trunk and hanging my head in shame.

Jeff

commit aa9f2e239944cc5baafdae431c821b900e7f37a9
Author: Jeff Law 
Date:   Fri Dec 15 11:09:50 2017 -0500

PR tree-optimization/83410
* tree-ssa-threadupdate.c (thread_block_1): Avoid certain jump
threads when parallelizing loops.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8830638d226..2d53e24b4c1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-12-12  Jeff Law  
+
+   PR tree-optimization/83410
+   * tree-ssa-threadupdate.c (thread_block_1): Avoid certain jump
+   threads when parallelizing loops.
+
 2017-12-15  Jakub Jelinek  
 
* tree-core.h (struct attribute_spec): Swap affects_type_identity and
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 045905eceb7..63ad8f9c953 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1333,6 +1333,31 @@ thread_block_1 (basic_block bb, bool noloop_only, bool joiners)
 
  if (i != path->length ())
continue;
+
+ /* Loop parallelization can be confused by the result of
+threading through the loop exit test back into the loop.
+However, threading those jumps seems to help other codes.
+
+I have been unable to find anything related to the shape of
+the CFG, the contents of the affected blocks, etc which would
+allow a more sensible test than what we're using below which
+merely avoids the optimization when parallelizing loops.  */
+ if (flag_tree_parallelize_loops > 1)
+   {
+ for (i = 1; i < path->length (); i++)
+   if (bb->loop_father == e2->src->loop_father
+   && loop_exits_from_bb_p (bb->loop_father,
+(*path)[i]->e->src)
+   && !loop_exit_edge_p (bb->loop_father, e2))
+ break;
+
+ if (i != path->length ())
+

Re: [PATCH] Fix PR83418

2017-12-15 Thread Jeff Law
On 12/15/2017 01:10 AM, Richard Biener wrote:
> On Thu, 14 Dec 2017, Richard Biener wrote:
> 
>> On December 14, 2017 4:43:42 PM GMT+01:00, Jeff Law  wrote:
>>> On 12/14/2017 01:54 AM, Richard Biener wrote:

>>>> IVOPTs (at least) leaves unfolded stmts in the IL and VRP overzealously
>>>> asserts they cannot happen.
>>>>
>>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>>>>
>>>> Richard.
>>>>
>>>> 2017-12-14  Richard Biener
>>>>
>>>>    PR tree-optimization/83418
>>>>    * vr-values.c (vr_values::extract_range_for_var_from_comparison_expr):
>>>>    Instead of asserting we don't get unfolded comparisons deal with
>>>>    them.
>>>>
>>>>    * gcc.dg/torture/pr83418.c: New testcase.
>>> I think this also potentially affects dumping.  I've seen the dumper
>>> crash trying to access a INTEGER_CST where we expected to find an
>>> SSA_NAME while iterating over a statement's operands.
>>>
>>> I haven't submitted the workaround because I hadn't tracked down the
>>> root cause to verify something deeper isn't wrong.
>>
>> Yes, I've seen this as well, see my comment in the PR. The issue is that DOM 
>> calls VRP analyze (and dump) routines with not up to date operands during 
>> optimize_stmt. 
> 
> I had the following in my tree to allow dumping.
> 
> Richard.
> 
> Index: gcc/tree-ssa-dom.c
> ===
> --- gcc/tree-ssa-dom.c  (revision 255640)
> +++ gcc/tree-ssa-dom.c  (working copy)
> @@ -2017,6 +2017,7 @@ dom_opt_dom_walker::optimize_stmt (basic
>  undefined behavior that get diagnosed if they're left in 
> the
>  IL because we've attached range information to new
>  SSA_NAMES.  */
> + update_stmt_if_modified (stmt);
>   edge taken_edge = NULL;
>   evrp_range_analyzer.vrp_visit_cond_stmt (as_a  
> (stmt),
>&taken_edge);
> 
I think this implies something earlier changed a statement without
updating it.

jeff


Re: [PATCH] Fix PR83418

2017-12-15 Thread Richard Biener
On December 15, 2017 5:27:14 PM GMT+01:00, Jeff Law  wrote:
>On 12/15/2017 01:10 AM, Richard Biener wrote:
>> On Thu, 14 Dec 2017, Richard Biener wrote:
>> 
>>> On December 14, 2017 4:43:42 PM GMT+01:00, Jeff Law 
>wrote:
>>>> On 12/14/2017 01:54 AM, Richard Biener wrote:
>>>>>
>>>>> IVOPTs (at least) leaves unfolded stmts in the IL and VRP overzealously
>>>>> asserts they cannot happen.
>>>>>
>>>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>>>>>
>>>>> Richard.
>>>>>
>>>>> 2017-12-14  Richard Biener
>>>>>
>>>>>   PR tree-optimization/83418
>>>>>   * vr-values.c (vr_values::extract_range_for_var_from_comparison_expr):
>>>>>   Instead of asserting we don't get unfolded comparisons deal with
>>>>>   them.
>>>>>
>>>>>   * gcc.dg/torture/pr83418.c: New testcase.
>>>> I think this also potentially affects dumping.  I've seen the dumper
>>>> crash trying to access a INTEGER_CST where we expected to find an
>>>> SSA_NAME while iterating over a statement's operands.
>>>>
>>>> I haven't submitted the workaround because I hadn't tracked down the
>>>> root cause to verify something deeper isn't wrong.
>>>
>>> Yes, I've seen this as well, see my comment in the PR. The issue is
>that DOM calls VRP analyze (and dump) routines with not up to date
>operands during optimize_stmt. 
>> 
>> I had the following in my tree to allow dumping.
>> 
>> Richard.
>> 
>> Index: gcc/tree-ssa-dom.c
>> ===
>> --- gcc/tree-ssa-dom.c  (revision 255640)
>> +++ gcc/tree-ssa-dom.c  (working copy)
>> @@ -2017,6 +2017,7 @@ dom_opt_dom_walker::optimize_stmt (basic
>>  undefined behavior that get diagnosed if they're
>left in 
>> the
>>  IL because we've attached range information to new
>>  SSA_NAMES.  */
>> + update_stmt_if_modified (stmt);
>>   edge taken_edge = NULL;
>>   evrp_range_analyzer.vrp_visit_cond_stmt (as_a 
>
>> (stmt),
>>&taken_edge);
>> 
>I think this implies something earlier changed a statement without
>updating it.

DOM itself does this and delays updating on purpose as an optimization. That
doesn't work well when dispatching into different code.
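
The hazard can be shown with a toy model in plain C. This is only an analogy with hypothetical names (struct stmt, set_operand, analyze); it is not GCC's statement representation. A modification sets a flag instead of refreshing derived data, and any consumer that reads the derived data before update_if_modified runs sees a stale view.

```c
#include <stdbool.h>

/* Toy statement: two operands plus data derived from them.  */
struct stmt
{
  int operands[2];
  int cached_sum;   /* derived data, refreshed lazily */
  bool modified;
};

/* Change an operand but defer refreshing the derived data.  */
static void
set_operand (struct stmt *s, int idx, int value)
{
  s->operands[idx] = value;
  s->modified = true;
}

/* Bring the derived data up to date if anything changed.  */
static void
update_if_modified (struct stmt *s)
{
  if (s->modified)
    {
      s->cached_sum = s->operands[0] + s->operands[1];
      s->modified = false;
    }
}

/* A consumer that trusts the derived data.  */
static int
analyze (const struct stmt *s)
{
  return s->cached_sum;
}
```

Calling update_if_modified before handing the statement to the consumer is the toy equivalent of the one-line change in the diff quoted above.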

Richard. 

>jeff



Re: [PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2017-12-15 Thread David Malcolm
On Fri, 2017-12-15 at 10:01 -0500, Jason Merrill wrote:
> On Thu, Dec 14, 2017 at 2:25 PM, David Malcolm 
> wrote:
> > On Mon, 2017-12-11 at 21:10 -0500, Jason Merrill wrote:
> > > On 11/10/2017 04:45 PM, David Malcolm wrote:
> > > > The initial version of the patch kit added location wrapper
> > > > nodes
> > > > around constants and uses-of-declarations, along with some
> > > > other
> > > > places in the parser (typeid, alignof, sizeof, offsetof).
> > > > 
> > > > This version takes a much more minimal approach: it only adds
> > > > location wrapper nodes around the arguments at callsites, thus
> > > > not adding wrapper nodes around uses of constants and decls in
> > > > other
> > > > locations.
> > > > 
> > > > It keeps them for the other places in the parser (typeid,
> > > > alignof,
> > > > sizeof, offsetof).
> > > > 
> > > > In addition, for now, each site that adds wrapper nodes is
> > > > guarded
> > > > with !processing_template_decl, suppressing the creation of
> > > > wrapper
> > > > nodes when processing template declarations.  This is to
> > > > simplify
> > > > the patch kit so that we don't have to support wrapper nodes
> > > > during
> > > > template expansion.
> > > 
> > > Hmm, it should be easy to support them, since NON_LVALUE_EXPR and
> > > VIEW_CONVERT_EXPR don't otherwise appear in template trees.
> > > 
> > > Jason
> > 
> > I don't know if it's "easy"; it's at least non-trivial.
> > 
> > I attempted to support them in the obvious way by adding the two
> > codes
> > to the switch statement tsubst_copy, reusing the case used by
> > NOP_EXPR
> > and others, but ran into an issue when dealing with template
> > parameter
> > packs.
> > Attached is the reproducer I've been testing with (minimized using
> > "delta" from a stdlib reproducer); my code was failing with:
> > 
> > ../../src/cp-stdlib.ii: In instantiation of ‘struct
> > allocator_traits >’:
> > ../../src/cp-stdlib.ii:31:8:   required from ‘struct
> > __alloc_traits, char>’
> > ../../src/cp-stdlib.ii:43:75:   required from ‘class
> > basic_string >’
> > ../../src/cp-stdlib.ii:47:58:   required from here
> > ../../src/cp-stdlib.ii:27:55: sorry, unimplemented: use of
> > ‘type_pack_expansion’ in template
> >  -> decltype(_S_construct(__a, __p,
> > forward<_Args>(__args)...))  {   }
> >^~
> > 
> > The issue is that normally "__args" would be a PARM_DECL of type
> > TYPE_PACK_EXPANSION, and that's handled by tsubst_decl, but on
> > adding a
> > wrapper node we now have a VIEW_CONVERT_EXPR of the same type i.e.
> > TYPE_PACK_EXPANSION wrapping the PARM_DECL.
> > 
> > When tsubst traverses the tree, the VIEW_CONVERT_EXPR is reached
> > first,
> > and it attempts to substitute the type TYPE_PACK_EXPANSION, which
> > leads
> > to the "sorry".
> > 
> > If I understand things right, during substitution, only tsubst_decl
> > on
> > PARM_DECL can handle nodes with type with code TYPE_PACK_EXPANSION.
> > 
> > The simplest approach seems to be to not create wrapper nodes for
> > decls
> > of type TYPE_PACK_EXPANSION, and that seems to fix the issue.
> 
> That does seem simplest.
> 
> > Alternatively I can handle TYPE_PACK_EXPANSION for
> > VIEW_CONVERT_EXPR in
> > tsubst by remapping the type to that of what they wrap after
> > substitution; doing so also fixes the issue.
> 
> This will be more correct.  For the wrappers you don't need all the
> handling that we currently have for NOP_EXPR and such; since we know
> they don't change the type, we can substitute what they wrap, and
> then
> rewrap the result.

(nods; I have this working)
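
For readers following along, the "substitute what they wrap, then rewrap" idea can be sketched on a toy tree in C. Everything here is hypothetical (struct node, wrap_with_location, substitute, integer type ids); it is not the C++ front end's tsubst code, only the shape of the approach.

```c
#include <stdlib.h>

enum node_kind { LEAF, LOCATION_WRAPPER };

struct node
{
  enum node_kind kind;
  int type;              /* toy type id */
  int location;          /* only meaningful for wrappers */
  struct node *operand;  /* only meaningful for wrappers */
};

static struct node *
new_node (enum node_kind kind, int type, int location, struct node *operand)
{
  struct node *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->kind = kind;
  n->type = type;
  n->location = location;
  n->operand = operand;
  return n;
}

/* A wrapper never changes the type: it takes it from what it wraps.  */
static struct node *
wrap_with_location (struct node *operand, int location)
{
  return new_node (LOCATION_WRAPPER, operand->type, location, operand);
}

/* Replace type FROM_TYPE by TO_TYPE.  For a wrapper, the wrapper's own
   type is never substituted; the operand is substituted and the result
   is rewrapped at the same location.  */
static struct node *
substitute (const struct node *n, int from_type, int to_type)
{
  if (n->kind == LOCATION_WRAPPER)
    return wrap_with_location (substitute (n->operand, from_type, to_type),
                               n->location);
  return new_node (LEAF, n->type == from_type ? to_type : n->type, 0, NULL);
}
```

Because the wrapper's type is recomputed from the substituted operand, the substitution never has to handle a type that only makes sense on the wrapped declaration.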

I've been debugging the other issues that I ran into when removing the
"!processing_template_decl" filter on making wrapper nodes (ICEs and
other errors on valid code).  They turn out to relate to wrappers
around decls of type TEMPLATE_TYPE_PARM; having these wrappers leads to
such VIEW_CONVERT_EXPRs turning up in unexpected places.

I could try to track all those places down, but it seems much simpler
to just add an exclusion to adding wrapper nodes around decls of type
TEMPLATE_TYPE_PARM.  On doing that my smoketests with the C++ stdlib
work again.  Does that sound reasonable?

Thanks
Dave


Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 10:08:03AM -0600, Qing Zhao wrote:
> a little confused here:
> 
> in the current code:
>   . the first case is:  result = strcmp() != 0
>   . the second case is:if (strcmp() != 0)
> 
> so, the missing case you mentioned above is:
> 
> result = if (strcmp() != 0) 
> 
> or something else?

result = (strcmp () != 0 ? 15 : 37);
or similar.  Though, usually COND_EXPRs are added by the tree-if-conversion
pass, so you might need -ftree-loop-if-convert option and it probably needs
to be within some loop which will have just a single bb after the
if-conversion.
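
At the source level the three shapes look roughly like the sketch below. The function names are hypothetical, and whether the third one really ends up as a COND_EXPR assignment depends on the options and passes mentioned above; the code only shows what the user-visible forms are.

```c
#include <string.h>

/* Case 1: the comparison result is assigned.  */
static int
used_in_assignment (const char *a, const char *b)
{
  int result = strcmp (a, b) != 0;
  return result;
}

/* Case 2: the comparison controls a branch.  */
static int
used_in_condition (const char *a, const char *b)
{
  if (strcmp (a, b) != 0)
    return 1;
  return 0;
}

/* Case 3: the comparison selects between two values inside a loop
   whose body is a single basic block once the conditional has been
   if-converted.  */
static void
used_in_select (const char *a, const char *const *b, int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = strcmp (a, b[i]) != 0 ? 15 : 37;
}
```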
> > 
> >> +  /* When both arguments are known, and their strlens are unequal, we can 
> >> + safely fold the call to a non-zero value for strcmp;
> >> + otherwise, do nothing now.  */
> >> +  if (idx1 != 0 && idx2 != 0)
> >> +{
> >> +  HOST_WIDE_INT const_string_leni1 = -1;
> >> +  HOST_WIDE_INT const_string_leni2 = -1;
> >> +  const_string_leni1 = compute_string_length (idx1);
> >> +  const_string_leni2 = compute_string_length (idx2);
> > 
> > Why do you initialize the vars when you immediately overwrite it?
> 
> just a habit to declare a variable with initialization :-).
> 
> > Just do
> >  HOST_WIDE_INT const_string_leni1 = compute_string_length (idx1);
> 
> I can change it like this.
> > etc.
> > 
> >> +  /* When one of args is constant string.  */
> >> +  tree var_string;
> >> +  HOST_WIDE_INT const_string_leni = -1;
> >> +  
> >> +  if (idx1)
> >> +{
> >> +  const_string_leni = compute_string_length (idx1);
> >> +  var_string = arg2;
> >> +} 
> >> +  else if (idx2)
> >> +{
> >> +  const_string_leni = compute_string_length (idx2);
> >> +  var_string = arg1;
> >> +} 
> > 
> > Haven't you checked earlier that one of idx1 and idx2 is non-zero?
> 
> Yes.  
> 
> it’s guaranteed that one and ONLY one of idx1 and idx2 is non-zero
> when getting here.
> 
> > If so, then the else if (idx2) just might confuse -Wuninitialized,
> 
> Okay.
> 
> > if you just use else, you don't need to initialize const_string_leni
> > either.
> 
> I think that const_string_leni still needs to be initialized in this case,
> because when idx2 is non-zero,  
> const_string_leni is initialized to compute_string_length (idx2). 

Sure.  But
  type uninitialized_var;
  if (cond1)
    uninitialized_var = foo;
  else if (cond2)
    uninitialized_var = bar;
  use (uninitialized_var);
is a coding style which asks for -Wmaybe-uninitialized warnings.  In order
not to warn, the compiler has to prove that cond1 || cond2 is always true,
which might not always be easy for the compiler.
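
A compilable version of the two styles, with hypothetical names and a made-up computation standing in for compute_string_length. Both functions assume, as in the patch, that at least one of idx1 and idx2 is non-zero on entry.

```c
/* Style with "else if": the variable needs a dummy initializer,
   otherwise the compiler has to prove idx1 || idx2 on its own.  */
static int
pick_with_else_if (int idx1, int idx2)
{
  int len = -1;
  if (idx1)
    len = idx1 * 10;
  else if (idx2)
    len = idx2 * 10;
  return len;
}

/* Style with plain "else": every path assigns the variable, so no
   initializer is needed and there is nothing left to prove.  */
static int
pick_with_else (int idx1, int idx2)
{
  int len;
  if (idx1)
    len = idx1 * 10;
  else
    len = idx2 * 10;
  return len;
}
```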

> > This is something that looks problematic to me.  get_range_strlen returns
> > some conservative upper bound on the string length, which is fine if
> > var_string points to say a TREE_STATIC variable where you know the allocated
> > size, or automatic variable.  But if somebody passes you a pointer to a
> > structure and the source doesn't contain aggregate copying for it, not sure
> > if you can take for granted that all the bytes are readable after the '\0'
> > in the string.  Hopefully at least for flexible array members and arrays in
> > such positions get_range_strlen will not provide the upper bound, but even
> > in other cases it doesn't feel safe to me.
> 
> this is the part that took me most of the time during the implementation. 
> 
> I have considered the following 3 approaches to decide the size of the 
> variable array:
> 
>   A. use “compute_builtin_object_size” in tree-object-size.h to decide 
> the size of the
> object.   However, even with the simplest case, it cannot provide the 
> information. 

compute_builtin_object_size with modes 0 or 1 computes upper bound, what you
are really looking for is lower bound, so that would be mode 2, though that
mode isn't actually used in real-world code and thus might be not fully
tested.

>   B. use “get_range_strlen” in gimple-fold.h to decide the size of the 
> object.  however, 
> it cannot provide valid info for simple array, either. 

get_range_strlen returns a range.  The minval is not what you're looking
for: that is the minimum string length, so it might be too short for your
purposes.  And maxval is an upper bound, but you are looking for a lower
bound: you need a guarantee that this amount of memory can be accessed, even
if there is a 0 in the first byte.
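
The requirement can be spelled out in user-level C. This is a sketch with hypothetical names, not the transformation in the patch: memcmp may be used in place of strcmp for an equality test only when the caller can guarantee that enough bytes of the variable string are readable, whatever the string currently contains.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Equality test of VAR against the literal LIT.  VAR_READABLE_BYTES is
   a guaranteed lower bound on how many bytes may be read starting at
   VAR.  memcmp is allowed to touch all N bytes, so it is used only
   when that bound covers the literal including its terminating NUL;
   otherwise fall back to strcmp, which stops at VAR's own NUL.  */
static bool
equals_literal (const char *var, size_t var_readable_bytes, const char *lit)
{
  size_t n = strlen (lit) + 1;
  if (var_readable_bytes >= n)
    return memcmp (var, lit, n) == 0;
  return strcmp (var, lit) == 0;
}
```

For a char buf[8] the bound is sizeof buf; for a bare pointer with no known object size there is no such bound, which is the case the review is worried about.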

> > Furthermore, in the comments you say that you do it only for small strings,
> > but in the patch I can't see any upper bound, so you could transform strlen
> > that would happen to return say just 1 or 2 with a function call that
> > possibly reads megabytes of data (memcmp may read all bytes, not just stop
> > at the first difference).
> 
> do you mean that for a very short constant string we should NOT change it to a
> call to memcmp, and should instead just
> inline it as a byte comparison sequence?

I mean we should never ever replace strcmp or strncmp call with lib

Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 09:19:14AM -0700, Jeff Law wrote:
> +   /* Loop parallelization can be confused by the result of
> +  threading through the loop exit test back into the loop.
> +  However, threading those jumps seems to help other codes.
> +
> +  I have been unable to find anything related to the shape of
> +  the CFG, the contents of the affected blocks, etc which would
> +  allow a more sensible test than what we're using below which
> +  merely avoids the optimization when parallelizing loops.  */
> +   if (flag_tree_parallelize_loops > 1)

Is there no jump threading (dom or vrp) after the parloops pass?
If there is, it would be nice to only do this if the parloops pass
has not been invoked yet.

Jakub


Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-15 Thread Richard Biener
On December 15, 2017 5:19:14 PM GMT+01:00, Jeff Law  wrote:
>I hate this patch.
>
>The fundamental problem we have is that there are times when we very
>much want to thread jumps and yet there are other times when we do not.
>
>To date we have been able to largely select between those two by
>looking
>at the shape of the CFG and the jump thread to see how threading a
>particular jump would impact the shape of the CFG (particularly loops
>in
>the CFG).
>
>In this BZ we have a loop like this:
>
>        2
>        |
>        3<-------+
>        |        |
>        4<---+   |
>       / \   |   |
>      5   6  |   |
>       \  /  |   |
>         7   |   |
>        / \  |   |
>       8  11-+   |
>      / \        |
>     9  10-------+
>     |
>     E
>
>We want to thread the path (6, 7) (7, 11).  ie, there's a path through
>that inner loop where we don't need to do the loop test.  Here's an
>example (from libgomp testsuite)
>
>(N is 1500)
>
>  for (j = 0; j < N; j++)
>    {
>      if (j > 500)
>        {
>          x[i][j] = i + j + 3;
>          y[j] = i*j + 10;
>        }
>      else
>        x[i][j] = x[i][j]*3;
>    }
>
>
>
>Threading (in effect) puts the loop test into both arms of the
>conditional, then removes it from the ELSE arm because the condition is
>always "keep looping" in that arm.
>
>This plays poorly with loop parallelization -- I tried to follow what's
>going on in there and just simply got lost.  I got as far as seeing
>that
>the parallelization code thought there were loop carried dependencies.
>At least that's how it looked to me.  But I don't see any loop carried
>dependencies in the code.

Hmm. I'll double check if I remember on Monday. 

>
>You might naturally ask if threading this is actually wise.  It seems to
>broadly fit into the cases where we throttle threading so as not to muck
>around too much with the loop structure.  It's not terrible to detect a
>CFG of this shape and avoid threading.
>
>BUT
>
>
>You can have essentially the same shape CFG (not 100% the same, but the
>same key characteristics), but where jump threading simplifies things in
>ways that are good for the SLP vectorizer (vect/bb-slp-16.c) or where
>jump threading avoids spurious warnings (graphite/scop-4.c).
>
>Initially I thought I'd seen a key difference in the contents of the
>latch block, but I think I was just up too late and mis-read the dump.
>
>So I don't see anything in the CFG shape or the contents of the blocks
>that can be reasonably analyzed at jump threading time.  Given we have
>two opposite needs and no reasonable way I can find to select between
>them, I'm resorting to a disgusting hack.  Namely to avoid threading
>through the latch to another point in the loop *when loop
>parallelization is enabled*.
>
>Let me be clear.  This is a hack.  I don't like it, not one little bit.
>But I don't see a way to resolve the regression without introducing
>other regressions and the loop parallelization code is a total mystery
>to me.
>
>Bootstrapped on x86_64 and regression tested with and without graphite.
>Confirmed it fixes the graphite related regressions mentioned in the BZ
>on x86_64.
>
>Committing to the trunk and hanging my head in shame.

I'd not have worried much about auto parallelization or graphite here.  It
does look like a missed handling there.

Richard. 


>Jeff



Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Qing Zhao

> On Dec 15, 2017, at 10:42 AM, Jakub Jelinek  wrote:
> 
> On Fri, Dec 15, 2017 at 10:08:03AM -0600, Qing Zhao wrote:
>> a little confused here:
>> 
>> in the current code:
>>  . the first case is:  result = strcmp() != 0
>>  . the second case is:if (strcmp() != 0)
>> 
>> so, the missing case you mentioned above is:
>> 
>>result = if (strcmp() != 0) 
>> 
>> or something else?
> 
> result = (strcmp () != 0 ? 15 : 37);
> or similar.  Though, usually COND_EXPRs are added by the tree-if-conversion
> pass, so you might need -ftree-loop-if-convert option and it probably needs
> to be within some loop which will have just a single bb after the
> if-conversion.

I see. thanks.

>> 
>>> if you just use else, you don't need to initialize const_string_leni
>>> either.
>> 
>> I think that const_string_leni still needs to be initialized in this case,
>> because when idx2 is non-zero,
>> const_string_leni is initialized to compute_string_length (idx2).
> 
> Sure.  But
>  type uninitialized_var;
>  if (cond1)
>uninitialized_var = foo;
>  else if (cond2)
>uninitialized_var = bar;
>  use (uninitialized_var);
> is a coding style which asks for -Wmaybe-uninitialized warnings, in order
> not to warn, the compiler has to prove that cond1 || cond2 is always true,
> which might not be always easy for the compiler.

in my case, I already initialize the “uninitialized_var” when declaring it:

  HOST_WIDE_INT const_string_leni = -1;

  if (idx1)
    {
      const_string_leni = compute_string_length (idx1);
      var_string = arg2;
    }
  else if (idx2)
    {
      const_string_leni = compute_string_length (idx2);
      var_string = arg1;
    }

so, -Wmaybe-uninitialized should NOT issue a warning, right?

but anyway, I can change the above as follows:

  HOST_WIDE_INT const_string_leni = -1;

  if (idx1)
    {
      const_string_leni = compute_string_length (idx1);
      var_string = arg2;
    }
  else
    {
      gcc_assert (idx2);
      const_string_leni = compute_string_length (idx2);
      var_string = arg1;
    }

is this better?

> 
>>> This is something that looks problematic to me.  get_range_strlen returns
>>> some conservative upper bound on the string length, which is fine if
>>> var_string points to say a TREE_STATIC variable where you know the allocated
>>> size, or automatic variable.  But if somebody passes you a pointer to a
>>> structure and the source doesn't contain aggregate copying for it, not sure
>>> if you can take for granted that all the bytes are readable after the '\0'
>>> in the string.  Hopefully at least for flexible array members and arrays in
>>> such positions get_range_strlen will not provide the upper bound, but even
>>> in other cases it doesn't feel safe to me.
>> 
>> this is the part that took me most of the time during the implementation. 
>> 
>> I have considered the following 3 approaches to decide the size of the 
>> variable array:
>> 
>>  A. use “compute_builtin_object_size” in tree-object-size.h to decide
>> the size of the object.  However, even with the simplest case, it cannot
>> provide the information.
> 
> compute_builtin_object_size with modes 0 or 1 computes upper bound, what you
> are really looking for is lower bound,

you mean: 0, 1 is for maximum object size, and 2 is for minimum object size?

yes, I am looking for minimum object size for this optimization. 

> so that would be mode 2, though that
> mode isn't actually used in real-world code and thus might be not fully
> tested.

so, using this routine with mode 2 should be the right approach? and we
would need to test it fully too?

> 
>>  B. use “get_range_strlen” in gimple-fold.h to decide the size of the 
>> object.  however, 
>> it cannot provide valid info for simple array, either. 
> 
> get_range_strlen returns you a range, the minval is not what you're looking
> for, that is the minimum string length, so might be too short for your
> purposes.  And maxval is an upper bound, but you are looking for lower
> bound, you need guarantees this amount of memory can be accessed, even if
> there is 0 in the first byte.

my understanding is that get_range_strlen returns the minimum and maximum
length of the string pointed to by the pointer, and the maximum length of the
string is determined by the size of the allocated memory pointed to by the
pointer, so it should serve my purpose.  did I misunderstand it?

> 
>>> Furthermore, in the comments you say that you do it only for small strings,
>>> but in the patch I can't see any upper bound, so you could transform strlen
>>> that would happen to return say just 1 or 2 with a function call that
>>> possibly reads megabytes of data (memcmp may read all bytes, not just stop
>>> at the first difference).
>> 
>> do you mean for a very short constant string, we should NOT change it to a
>> call to memcmp?  instead we should just
>> inline it with a byte comparison sequence?
> 
> I mean we should never ever replace strcmp or

Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

2017-12-15 Thread Martin Sebor

On 12/15/2017 09:17 AM, Richard Biener wrote:

On December 15, 2017 4:58:14 PM GMT+01:00, Martin Sebor wrote:

On 12/15/2017 01:48 AM, Richard Biener wrote:

On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor wrote:

On 12/14/2017 03:43 AM, Richard Biener wrote:

On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor wrote:

On 12/12/2017 05:35 PM, Jeff Law wrote:

On 12/12/2017 01:15 PM, Martin Sebor wrote:

Bug 83373 - False positive reported by -Wstringop-overflow, is
another example of warning triggered by a missed optimization
opportunity, this time in the strlen pass.  The optimization
is discussed in pr78450 - strlen(s) return value can be assumed
to be less than the size of s.  The gist of it is that the result
of strlen(array) can be assumed to be less than the size of
the array (except in the corner case of last struct members).

To avoid the false positive the attached patch adds this
optimization to the strlen pass.  Although the patch passes
bootstrap and regression tests for all front-ends I'm not sure
the way it determines the upper bound of the range is 100%
correct for languages with arrays with a non-zero lower bound.
Maybe it's just not as tight as it could be.



What about something hideous like

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
};

Where objects stored in x1 are not null terminated.  Are we in the realm
of undefined behavior at that point (I hope so)?




Yes, this is undefined.  Pointer arithmetic (either direct or
via standard library functions) is only defined for pointers
to the same object or subobject.  So even something like

 memcpy (pfu->x1, pfu->x1 + 10, 10);

is undefined.



There's nothing undefined here - computing the pointer pointing
to one-after-the-last element of an array is valid (you are just
not allowed to dereference it).



Right, and memcpy dereferences it, so it's undefined.


That's interpretation of the standard that I don't share.


It's not an interpretation.  It's a basic rule of the languages
that the standards are explicit about.  In C11 you will find
this specified in detail in 6.5.6, paragraphs 7 and 8 (of
particular relevance to your question below is p7: "a pointer
to an object that is not an element of an array behaves the same
as a pointer to the first element of an array of length one.")


I know.


Also, if I have struct f { int i; int j; };  and an int * that points
to the j member you say I have no standard conforming way
to get at a pointer to the i member from this, right?


Correct.  See above.


Because the pointer points to an 'int' object.  But it also points within
a struct f object!  So at least maybe
(int *)((char *)p - offsetof (struct f, j))
should be valid?


No, not really.  It works in practice but it's not well-defined.
It doesn't matter how you get at the result.  What matters is
what you start with.  As Jeff said, to derive a pointer to
distinct subobjects of a larger object you need to start with
a pointer to the larger object and treat it as an array of
chars.
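A minimal sketch of the derivation described just above: start from a pointer to the enclosing object, view it as an array of chars, and add the member offset. The helper name member_at is hypothetical. The block deliberately does not go from a pointer to j back to i, since whether that is valid is exactly what is disputed in this thread.

```c
#include <assert.h>
#include <stddef.h>

struct f { int i; int j; };

/* Derive a pointer to a member of *OBJ from a pointer to the whole
   object by going through its char representation.  OFFSET must be the
   offsetof value of one of the two int members.  */
static int *
member_at (struct f *obj, size_t offset)
{
  assert (offset == offsetof (struct f, i)
          || offset == offsetof (struct f, j));
  return (int *) ((char *) obj + offset);
}
```

For example, member_at (&v, offsetof (struct f, j)) yields the same address as &v.j, and the same call with the offset of i yields &v.i.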


Those are obviously not constraints people use C and C++ under, so I see no
way to enforce this within gimple.


There's code out there that relies on all sorts of undefined
behavior.  It's a judgment call in each instance as to how much
of such code exists and how important it is.  In this case, I'd
expect it to be confined to low-level software like OS kernels
and such whose authors use C as a more convenient assembly
language to talk directly to the hardware.  Programmers in other
domains are usually more conscious of the requirements and limited
guarantees of the language and less willing to make assumptions
based on what this or that processor lets them get away with.

That being said, it certainly is possible to enforce this
constraint within GIMPLE.  My -Wrestrict patch does it to
an extent for the memory and string built-ins.  The -Warray-bounds
patch I submitted for offsets does it for all other expressions.
Neither patch exposed any such code in the Linux kernel, so it
doesn't look like abuses of this sort are common even in low-level
code.

Martin


Re: [PATCH][Middle-end]2nd patch of PR78809 and PR83026

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 11:17:37AM -0600, Qing Zhao wrote:
>   HOST_WIDE_INT const_string_leni = -1;
> 
>   if (idx1)
> {
>   const_string_leni = compute_string_length (idx1);
>   var_string = arg2;
> }
>   else if (idx2)
> {
>   const_string_leni = compute_string_length (idx2);
>   var_string = arg1;
> }
> 
> so, the -Wmaybe-uninitialized should NOT issue warning, right?

Well, you had the var_string var uninitialized, so that is what I was
talking about.

> but anyway, I can change the above as following:
> 
>  HOST_WIDE_INT const_string_leni = -1;

And here you don't need to initialize it.

>   if (idx1)
> {
>   const_string_leni = compute_string_length (idx1);
>   var_string = arg2;
> }
>   else
> {
>   gcc_assert (idx2);
>   const_string_leni = compute_string_length (idx2);
>   var_string = arg1;
> }
> 
> is this better?

Yes, though the gcc_assert could be just gcc_checking_assert (idx2);
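For reference, the if/else shape agreed on above can be sketched outside of GCC like this. It is a stand-alone approximation under stated assumptions: constant_side_length is a made-up name, strlen stands in for compute_string_length, and plain assert stands in for gcc_assert/gcc_checking_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exactly one of CONST1/CONST2 is expected to be non-null.  Because the
   second branch is a plain `else', every path assigns both LEN and
   *VAR_STRING, so neither needs a dummy initializer and the compiler
   does not have to prove that "cond1 || cond2" always holds.  */
static size_t
constant_side_length (const char *const1, const char *const2,
                      const char *arg1, const char *arg2,
                      const char **var_string)
{
  size_t len;	/* Deliberately left without an initializer.  */

  if (const1)
    {
      len = strlen (const1);
      *var_string = arg2;
    }
  else
    {
      assert (const2);
      len = strlen (const2);
      *var_string = arg1;
    }

  return len;
}
```

With the `else if (cond2)' form, a path where neither condition holds leaves the variables unassigned as far as the compiler can tell, which is what invites -Wmaybe-uninitialized.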

> > so that would be mode 2, though that
> > mode isn't actually used in real-world code and thus might be not fully
> > tested.
> 
> so, using this routine with mode 2 should be the right approach to go? 
> and we need fully testing on this too?

It has been a while since I wrote it, so it would need careful analysis.

> >>B. use “get_range_strlen” in gimple-fold.h to decide the size of the 
> >> object.  however, 
> >> it cannot provide valid info for simple array, either. 
> > 
> > get_range_strlen returns you a range, the minval is not what you're looking
> > for, that is the minimum string length, so might be too short for your
> > purposes.  And maxval is an upper bound, but you are looking for lower
> > bound, you need guarantees this amount of memory can be accessed, even if
> > there is 0 in the first byte.
> 
> my understanding is that: get_range_strlen returns the minimum and maximum 
> length of the string pointed by the 
> pointer, and the maximum length of the string is determined by the size of 
> the allocated memory pointed by the
> pointer, so, it should serve my purpose,   did I misunderstand it?

What I'm worried about is:
struct S { int a; char b[64]; };
struct T { struct S c; char d; };
int
foo (struct T *x)
{
  return strcmp (x->c.b,
                 "01234567890123456789012345678901234567890123456789") == 0;
}
int
bar (void)
{
  struct S *p = malloc (offsetof (struct S, b) + 8);
  p->a = 123;
  strcpy (p->b, "0123456");
  return foo ((struct T *) p);
}
etc.  where if you transform that into
memcmp (x->c.b, "01234567890123456789012345678901234567890123456789", 51) == 0
it will segfault, whereas strcmp would not.
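To make the difference in memory accesses concrete, here is a small model, not the libc implementations: a byte-wise strcmp that records how many bytes of its first argument it touched, and a worst-case memcmp that touches all n bytes (which a real memcmp is permitted to do). Both function names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise strcmp model; *BYTES_READ receives the number of bytes of A
   that were examined, including the terminating NUL if it was reached.  */
static int
counting_strcmp (const char *a, const char *b, size_t *bytes_read)
{
  assert (bytes_read != NULL);
  for (size_t k = 0;; k++)
    {
      unsigned char ca = (unsigned char) a[k];
      unsigned char cb = (unsigned char) b[k];
      if (ca != cb || ca == '\0')
        {
          *bytes_read = k + 1;
          return (ca > cb) - (ca < cb);
        }
    }
}

/* Worst-case memcmp model: every one of the N bytes is read, even after
   the first difference has been seen.  */
static int
counting_memcmp_worst_case (const char *a, const char *b, size_t n,
                            size_t *bytes_read)
{
  int result = 0;

  assert (bytes_read != NULL);
  for (size_t k = 0; k < n; k++)
    {
      unsigned char ca = (unsigned char) a[k];
      unsigned char cb = (unsigned char) b[k];
      if (result == 0 && ca != cb)
        result = (ca > cb) - (ca < cb);
    }
  *bytes_read = n;
  return result;
}
```

Comparing a buffer holding "0123456" against the 50-character literal, the strcmp model stops after 8 bytes, which fits the 8 bytes allocated for b in the example above, while the worst-case memcmp model with a length of 51 reads 51 bytes.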

> >  But if we find out during
> > expansion we don't want to expand it inline, we should fall back to calling
> > strcmp or strncmp.
> 
> under what situation we will NOT expand the memcmp_eq call inline?

  target = expand_builtin_memcmp (exp, target, fcode == BUILT_IN_MEMCMP_EQ);
  if (target)
    return target;
  if (fcode == BUILT_IN_MEMCMP_EQ)
    {
      tree newdecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
      TREE_OPERAND (exp, 1) = build_fold_addr_expr (newdecl);
    }
is what builtins.c has, so it certainly allows for that possibility.
Now, both expand_builtin_memcmp and emit_block_cmp_hints have several cases
where they fail.  E.g. can_do_by_pieces decides it is too expensive to do it
inline, or emit_block_cmp_via_cmpmem fails because the target doesn't have
a cmpmemsi expander.  There are various other cases.

Also, note that some target might have cmpstr*si expanders implemented, but
not cmpmemsi, in which case trying to optimize strcmp as memcmp_eq might be a
severe pessimization.

Jakub


[committed] Add one further testcase for PR80631

2017-12-15 Thread Jakub Jelinek
Hi!

When backporting the PR80631 fix to 7.x, I've noticed there is no runtime
FAIL in any of the tests, only the scan-tree-dump test failures.

So, I've committed following test that FAILs on x86_64-linux without the
patch and succeeds with it.

2017-12-15  Jakub Jelinek  

PR tree-optimization/80631
* gcc.target/i386/avx2-pr80631.c: New test.

--- gcc/testsuite/gcc.target/i386/avx2-pr80631.c.jj 2017-12-15 18:29:56.714301404 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-pr80631.c2017-12-15 18:29:37.0 +0100
@@ -0,0 +1,23 @@
+/* PR tree-optimization/80631 */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -mavx2 -fno-vect-cost-model" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+
+#define N 8
+
+static void
+avx2_test (void)
+{
+  int v[N], k;
+  for(k = 0; k < N; k++)
+v[k] = k;
+  v[0] = 77;
+  int found_index = -1;
+  for (k = 0; k < N; k++)
+if (v[k] == 77)
+  found_index = k;
+  if (found_index != 0)
+abort ();
+}

Jakub


Re: [committed][PR tree-optimization/83410] Avoid some jump threads when parallelizing loops

2017-12-15 Thread Jeff Law
On 12/15/2017 09:45 AM, Jakub Jelinek wrote:
> On Fri, Dec 15, 2017 at 09:19:14AM -0700, Jeff Law wrote:
>> +  /* Loop parallelization can be confused by the result of
>> + threading through the loop exit test back into the loop.
>> + However, threading those jumps seems to help other codes.
>> +
>> + I have been unable to find anything related to the shape of
>> + the CFG, the contents of the affected blocks, etc which would
>> + allow a more sensible test than what we're using below which
>> + merely avoids the optimization when parallelizing loops.  */
>> +  if (flag_tree_parallelize_loops > 1)
> 
> Is there no jump threading (dom or vrp) after the parloops pass?
> If there is, it would be nice to only do this if the parloops pass
> has not been invoked yet.
That's precisely how it works -- there's a pre-existing guard.

So prior to the loop optimizers, we're conservative about potentially
mucking up the loop structure.  After the loop optimizers we allow more
aggressive threading.

Jeff


Re: [PR81165] discount killed stmts when sizing blocks for threading

2017-12-15 Thread Jeff Law
On 12/11/2017 10:17 PM, Alexandre Oliva wrote:
> On Dec 11, 2017, Jeff Law  wrote:
> 
> I've updated it according to richi's and your feedback.  Regstrapped on
> {x86_64,i686}-linux-gnu.  Ok to install?
> 
> 
> We limit the amount of copying for jump threading based on counting
> stmts.  This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed.  Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
> 
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
> 
> for  gcc/ChangeLog
> 
>   PR tree-optimization/81165
>   * tree-ssa-threadedge.c (uses_in_bb): New.
>   (estimate_threading_killed_stmts): New.
>   (estimate_threading_killed_stmts): New overload.
>   (record_temporary_equivalences_from_stmts_at_dest): Add path
>   parameter; adjust caller.  Expand limit when it's hit.
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR tree-optimization/81165
>   * gcc.dg/pr81165.c: New.


> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 91793bfa59d3..0f5b943aa9a0 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -170,6 +170,160 @@ threadedge_valueize (tree t)
>return t;
>  }
>  
> +/* Return how many uses of T there are within BB, as long as there
> +   aren't any uses outside BB.  If there are any uses outside BB,
> +   return -1 if there's at most one use within BB, or -2 if there is
> +   more than one use within BB.  */
> +
> +static int
> +uses_in_bb (tree t, basic_block bb)
> +{
> +  int uses = 0;
> +  bool outside_bb = false;
> +
> +  imm_use_iterator iter;
> +  use_operand_p use_p;
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, t)
> +{
> +  if (is_gimple_debug (USE_STMT (use_p)))
> + continue;
> +
> +  if (gimple_bb (USE_STMT (use_p)) != bb)
> + outside_bb = true;
> +  else
> + uses++;
> +
> +  if (outside_bb && uses > 1)
> + return -2;
> +}
> +
> +  if (outside_bb)
> +return -1;
> +
> +  return uses;
> +}
> +
> +/* Starting from the final control flow stmt in BB, assuming it will
> +   be removed, follow uses in to-be-removed stmts back to their defs
> +   and count how many defs are to become dead and be removed as
> +   well.  */
> +
> +static int
> +estimate_threading_killed_stmts (basic_block bb)
> +{
> +  int killed_stmts = 0;
> +  hash_map ssa_remaining_uses;
> +  auto_vec dead_worklist;
> +
> +  /* If the block has only two predecessors, threading will turn phi
> + dsts into either src, so count them as dead stmts.  */
> +  bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
> +
> +  if (drop_all_phis)
> +for (gphi_iterator gsi = gsi_start_phis (bb);
> +  !gsi_end_p (gsi); gsi_next (&gsi))
> +  {
> + gphi *phi = gsi.phi ();
> + tree dst = gimple_phi_result (phi);
> +
> + /* We don't count virtual PHIs as stmts in
> +record_temporary_equivalences_from_phis.  */
> + if (virtual_operand_p (dst))
> +   continue;
> +
> + killed_stmts++;
> +  }
> +
> +  if (gsi_end_p (gsi_last_bb (bb)))
> +return killed_stmts;
> +
> +  gimple *stmt = gsi_stmt (gsi_last_bb (bb));
> +  if (gimple_code (stmt) != GIMPLE_COND
> +  && gimple_code (stmt) != GIMPLE_GOTO
> +  && gimple_code (stmt) != GIMPLE_SWITCH)
> +return killed_stmts;
> +
> +  dead_worklist.quick_push (stmt);
> +  while (!dead_worklist.is_empty ())
> +{
> +  stmt = dead_worklist.pop ();
> +
> +  ssa_op_iter iter;
> +  use_operand_p use_p;
> +  FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> + {
> +   tree t = USE_FROM_PTR (use_p);
> +   gimple *def = SSA_NAME_DEF_STMT (t);
> +
> +   if (gimple_bb (def) == bb
> +   && (gimple_code (def) != GIMPLE_PHI
> +   || !drop_all_phis)
> +   && !gimple_has_side_effects (def))
> + {
> +   int *usesp = ssa_remaining_uses.get (t);
> +   int uses;
> +
> +   if (usesp)
> + uses = *usesp;
> +   else
> + uses = uses_in_bb (t, bb);
> +
> +   gcc_assert (uses);
> +
> +   /* Don't bother recording the expected use count if we
> +  won't find any further uses within BB.  */
> +   if (!usesp && (uses < -1 || uses > 1))
> + {
> +   usesp = &ssa_remaining_uses.get_or_insert (t);
> +   *usesp = uses;
> + }
> +
> +   if (uses < 0)
> + continue;
> +
> +   --uses;
> +   if (usesp)
> + *usesp = uses;
> +
> +   if (!uses)
> + {
> +   killed_stmts++;
> +   i

Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization

2017-12-15 Thread Bin.Cheng
On Fri, Dec 15, 2017 at 1:19 PM, Richard Biener
 wrote:
> On Fri, Dec 15, 2017 at 1:35 PM, Bin.Cheng  wrote:
>> On Fri, Dec 15, 2017 at 12:09 PM, Bin.Cheng  wrote:
>>> On Fri, Dec 15, 2017 at 11:55 AM, Richard Biener
>>>  wrote:
 On Fri, Dec 15, 2017 at 12:30 PM, Bin Cheng  wrote:
> Hi,
> As explained in the PR, given below test case:
> int a[8][10] = { [2][5] = 4 }, c;
>
> int
> main ()
> {
>   short b;
>   int i, d;
>   for (b = 4; b >= 0; b--)
>     for (c = 0; c <= 6; c++)
>       a[c + 1][b + 2] = a[c][b + 1];
>   for (i = 0; i < 8; i++)
>     for (d = 0; d < 10; d++)
>       if (a[i][d] != (i == 3 && d == 6) * 4)
>         __builtin_abort ();
>   return 0;
> }
>
> the loop nest is illegal for vectorization without reversing the inner loop.
> The issue is in the data dependence checking of the vectorizer; I believe
> the mentioned revision just exposed this.  Previously the vectorization was
> skipped because of an unsupported memory operation.  The outer loop
> vectorization unrolls the outer loop into:
>
>   for (b = 4; b > 0; b -= 4)
>     {
>       for (c = 0; c <= 6; c++)
>         a[c + 1][6] = a[c][5];
>       for (c = 0; c <= 6; c++)
>         a[c + 1][5] = a[c][4];
>       for (c = 0; c <= 6; c++)
>         a[c + 1][4] = a[c][3];
>       for (c = 0; c <= 6; c++)
>         a[c + 1][3] = a[c][2];
>     }
> Then four inner loops are fused into:
>   for (b = 4; b > 0; b -= 4)
>     {
>       for (c = 0; c <= 6; c++)
>         {
>           a[c + 1][6] = a[c][5];  // S1
>           a[c + 1][5] = a[c][4];  // S2
>           a[c + 1][4] = a[c][3];
>           a[c + 1][3] = a[c][2];
>         }
>     }

 Note that they are not really "fused" but they are interleaved.  With
 GIMPLE in mind
 that makes a difference, you should get the equivalent of

   for (c = 0; c <= 6; c++)
     {
       tem1 = a[c][5];
       tem2 = a[c][4];
       tem3 = a[c][3];
       tem4 = a[c][2];
       a[c+1][6] = tem1;
       a[c+1][5] = tem2;
       a[c+1][4] = tem3;
       a[c+1][3] = tem4;
     }
>>> Yeah, I will double check if this abstract breaks the patch and how.
>> Hmm, I think this doesn't break it, well at least for part of the
>> analysis, because it is loop carried (backward) dependence goes wrong,
>> interleaving or not with the same iteration doesn't matter here.
>
> I think the idea is that forward dependences are always fine (negative
> distance) to vectorize.  But with backward dependences we have to adhere
> to max_vf.
>
> It looks like for outer loop vectorization we only look at the distances in
> the outer loop but never at inner ones.  But here the same applies but isn't
> that independent of the distances with respect to the outer loop?
>
> But maybe I'm misunderstanding how "distances" work here.
Hmm, I am not sure I understand "distance" correctly.  With the
description in a book like "Optimizing Compilers for Modern
Architectures", distance is "# of iteration of sink ref - # of
iteration of source ref".  Given the example below:
  for (i = 0; i < N; ++i)
    {
      x = arr[idx_1];  // S1
      arr[idx_2] = x;  // S2
    }
if S1 is source ref, distance = idx_2 - idx_1, and distance > 0.  Also
this is forward dependence.  For example, idx_1 is i + 1 and idx_2 is
i;
If S2 is source ref, distance = idx_1 - idx_2, and distance < 0.  Also
this is backward dependence.  For example idx_1 is i and idx_2 is i +
1;

In GCC, we always try to subtract idx_2 from idx_1 first when computing the
classic distance, so we can end up with a negative distance in the case of
a backward dependence.  When this happens at the dependence-carrying level,
we manually reverse it.  When this happens at an inner loop level, we
simply keep the negative distance.  And it's meaningless to talk about
forward/backward given the dependence is carried at the outer loop level.

Outer loop vectorization is interesting.  The problematic case has a
backward dependence carried by the outer loop.  Because we don't check
dist vs. max_vf for such a dep, the unrolled references could have outer
loop indices equal to each other, as in a[c][5] and a[c+1][5].  So it's
as if the outer loop index were resolved as equal.  Now there is a
dependence only if c == c' + 1.  I think the previous comment on fusion
still holds for this, and we now need to check the backward dependence
between the two references at the inner loop level.  I guess it's a bit
tricky to model the dependence between a[c][5] and a[c+1][5] using the
original references and dependence.  I think it's okay to do that
because the distance of a[c][5] and a[c+1][5] is what we computed
previously for the original references at the inner loop level.

Not sure if I have missed something important.
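The effect under discussion can be reproduced at source level with a small model. This is only a sketch of the interleaving described earlier in this message (loads first, then stores, four outer iterations at a time, with the remaining b == 0 iteration run afterwards); it is not what the vectorizer literally emits, and the function names are made up.

```c
#include <assert.h>
#include <string.h>

/* Set up the array as in the PR's test case: all zero except a[2][5].  */
static void
init_array (int a[8][10])
{
  assert (a != NULL);
  memset (a, 0, 8 * sizeof a[0]);
  a[2][5] = 4;
}

/* The loop nest exactly as written in the test case.  */
static void
run_original (int a[8][10])
{
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
}

/* Outer iterations b = 4..1 interleaved inside one inner loop, followed
   by the leftover b = 0 iteration.  */
static void
run_interleaved (int a[8][10])
{
  for (int c = 0; c <= 6; c++)
    {
      int tem1 = a[c][5];
      int tem2 = a[c][4];
      int tem3 = a[c][3];
      int tem4 = a[c][2];
      a[c + 1][6] = tem1;
      a[c + 1][5] = tem2;
      a[c + 1][4] = tem3;
      a[c + 1][3] = tem4;
    }
  for (int c = 0; c <= 6; c++)
    a[c + 1][2] = a[c][1];
}
```

In the original order a[2][5] is still 4 when it is copied into a[3][6]. In the interleaved order the store to a[2][5] in inner iteration c == 1 happens before the load of a[2][5] in iteration c == 2, so a[3][6] ends up 0, which is the inner-loop backward dependence the check has to catch.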

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>

> The loop fusion needs to meet the dependence requirement.  Basically, 
> GCC's data
> dependenc

Re: [PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2017-12-15 Thread Jason Merrill
On Fri, Dec 15, 2017 at 11:35 AM, David Malcolm  wrote:
> On Fri, 2017-12-15 at 10:01 -0500, Jason Merrill wrote:
>> On Thu, Dec 14, 2017 at 2:25 PM, David Malcolm 
>> wrote:
>> > On Mon, 2017-12-11 at 21:10 -0500, Jason Merrill wrote:
>> > > On 11/10/2017 04:45 PM, David Malcolm wrote:
>> > > > The initial version of the patch kit added location wrapper
>> > > > nodes
>> > > > around constants and uses-of-declarations, along with some
>> > > > other
>> > > > places in the parser (typeid, alignof, sizeof, offsetof).
>> > > >
>> > > > This version takes a much more minimal approach: it only adds
>> > > > location wrapper nodes around the arguments at callsites, thus
>> > > > not adding wrapper nodes around uses of constants and decls in
>> > > > other
>> > > > locations.
>> > > >
>> > > > It keeps them for the other places in the parser (typeid,
>> > > > alignof,
>> > > > sizeof, offsetof).
>> > > >
>> > > > In addition, for now, each site that adds wrapper nodes is
>> > > > guarded
>> > > > with !processing_template_decl, suppressing the creation of
>> > > > wrapper
>> > > > nodes when processing template declarations.  This is to
>> > > > simplify
>> > > > the patch kit so that we don't have to support wrapper nodes
>> > > > during
>> > > > template expansion.
>> > >
>> > > Hmm, it should be easy to support them, since NON_LVALUE_EXPR and
>> > > VIEW_CONVERT_EXPR don't otherwise appear in template trees.
>> > >
>> > > Jason
>> >
>> > I don't know if it's "easy"; it's at least non-trivial.
>> >
>> > I attempted to support them in the obvious way by adding the two
>> > codes
>> > to the switch statement tsubst_copy, reusing the case used by
>> > NOP_EXPR
>> > and others, but ran into an issue when dealing with template
>> > parameter
>> > packs.
>> > Attached is the reproducer I've been testing with (minimized using
>> > "delta" from a stdlib reproducer); my code was failing with:
>> >
>> > ../../src/cp-stdlib.ii: In instantiation of ‘struct
>> > allocator_traits >’:
>> > ../../src/cp-stdlib.ii:31:8:   required from ‘struct
>> > __alloc_traits, char>’
>> > ../../src/cp-stdlib.ii:43:75:   required from ‘class
>> > basic_string >’
>> > ../../src/cp-stdlib.ii:47:58:   required from here
>> > ../../src/cp-stdlib.ii:27:55: sorry, unimplemented: use of
>> > ‘type_pack_expansion’ in template
>> >  -> decltype(_S_construct(__a, __p,
>> > forward<_Args>(__args)...))  {   }
>> >^~
>> >
>> > The issue is that normally "__args" would be a PARM_DECL of type
>> > TYPE_PACK_EXPANSION, and that's handled by tsubst_decl, but on
>> > adding a
>> > wrapper node we now have a VIEW_CONVERT_EXPR of the same type i.e.
>> > TYPE_PACK_EXPANSION wrapping the PARM_DECL.
>> >
>> > When tsubst traverses the tree, the VIEW_CONVERT_EXPR is reached
>> > first,
>> > and it attempts to substitute the type TYPE_PACK_EXPANSION, which
>> > leads
>> > to the "sorry".
>> >
>> > If I understand things right, during substitution, only tsubst_decl
>> > on
>> > PARM_DECL can handle nodes with type with code TYPE_PACK_EXPANSION.
>> >
>> > The simplest approach seems to be to not create wrapper nodes for
>> > decls
>> > of type TYPE_PACK_EXPANSION, and that seems to fix the issue.
>>
>> That does seem simplest.
>>
>> > Alternatively I can handle TYPE_PACK_EXPANSION for
>> > VIEW_CONVERT_EXPR in
>> > tsubst by remapping the type to that of what they wrap after
>> > substitution; doing so also fixes the issue.
>>
>> This will be more correct.  For the wrappers you don't need all the
>> handling that we currently have for NOP_EXPR and such; since we know
>> they don't change the type, we can substitute what they wrap, and
>> then
>> rewrap the result.
>
> (nods; I have this working)
>
> I've been debugging the other issues that I ran into when removing the
> "!processing_template_decl" filter on making wrapper nodes (ICEs and
> other errors on valid code).  They turn out to relate to wrappers
> around decls of type TEMPLATE_TYPE_PARM; having these wrappers leads to
> such VIEW_CONVERT_EXPRs turning up in unexpected places.

Hmm, that's odd.  What kind of decls?  A variable which happens to
have a template parameter for a type shouldn't be a problem.

> I could try to track all those places down, but it seems much simpler
> to just add an exclusion to adding wrapper nodes around decls of type
> TEMPLATE_TYPE_PARM.  On doing that my smoketests with the C++ stdlib
> work again.  Does that sound reasonable?

Jason


Re: [PATCH] Avoid excessive function type casts with splay-trees

2017-12-15 Thread Bernd Edlinger
On 12/15/17 11:51, Jakub Jelinek wrote:
> On Fri, Dec 15, 2017 at 10:44:54AM +, Bernd Edlinger wrote:
>> when working on the -Wcast-function-type patch I noticed some rather
>> ugly and non-portable function type casts that are necessary to accomplish
>> some actually very simple tasks.
>>
>> Often functions taking pointer arguments are called with a different
>> signature taking uintptr_t arguments, which is IMHO not really safe to do...
>>
>> The attached patch adds a context argument to the callback functions but
>> keeps the existing interface as far as possible.
> 
> Just formatting nits, not full review:
> 
>> +  return strcmp ((char*) k1, (char*) k2);
> 
> char * instead of char*, please.
> 
>> +void
>> +splay_tree_delete_key_wrapper (splay_tree_key key, void *fn)
>> +{
>> +  splay_tree_delete_key_fn delete_key = (splay_tree_delete_key_fn) (uintptr_t) fn;
> 
> Too long line, should be:
>splay_tree_delete_key_fn delete_key
>  = (splay_tree_delete_key_fn) (uintptr_t) fn;
> 
>> +void
>> +splay_tree_delete_value_wrapper (splay_tree_value value, void *fn)
>> +{
>> +  splay_tree_delete_value_fn delete_value = (splay_tree_delete_value_fn) (uintptr_t) fn;
> 
> Ditto.
> 

Yes, thanks.

Updated patch attached.
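For readers without the attachment, the general shape of the context-argument approach can be sketched as follows. All demo_* names are invented for this illustration and are not the libiberty declarations; the round trip of a function pointer through uintptr_t and void * mirrors the cast quoted above and is not guaranteed by ISO C, although it works on the usual hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uintptr_t demo_key;

/* Old-style callback and new-style callback with a context argument.  */
typedef int (*demo_compare_fn) (demo_key, demo_key);
typedef int (*demo_compare_ex_fn) (demo_key, demo_key, void *);

/* A properly typed string comparison, so callers no longer need to cast
   strcmp itself to a callback type.  */
static int
demo_compare_strings (demo_key k1, demo_key k2, void *ctx)
{
  (void) ctx;
  return strcmp ((const char *) k1, (const char *) k2);
}

/* Adapter that lets an old-style callback be used through the new
   interface: the old function pointer travels in the context slot.  */
static int
demo_compare_wrapper (demo_key k1, demo_key k2, void *fn)
{
  demo_compare_fn compare = (demo_compare_fn) (uintptr_t) fn;

  assert (compare != NULL);
  return compare (k1, k2);
}

/* An example old-style callback.  */
static int
demo_compare_ints (demo_key a, demo_key b)
{
  return (a > b) - (a < b);
}
```

The point of the design is that every callback is invoked through a pointer of its real type; only the data-like context pointer is converted, instead of calling a function through an incompatible signature.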


Bernd.
include:
2017-12-15  Bernd Edlinger  

* splay-tree.h (splay_tree_compare_ex_fn, splay_tree_delete_key_ex_fn,
splay_tree_delete_value_ex_fn): New function types.
(splay_tree_s): Update to use new function types.
(splay_tree_ex_new): Declare new constructor.
(splay_tree_compare_strings, splay_tree_delete_pointers,
splay_tree_compare_wrapper, splay_tree_delete_key_wrapper,
splay_tree_delete_value_wrapper, splay_tree_xmalloc_allocate,
splay_tree_xmalloc_deallocate): Declare new utility functions.

libiberty:
2017-12-15  Bernd Edlinger  

* splay-tree.c (splay_tree_delete_helper, splay_tree_splay,
splay_tree_insert, splay_tree_remove, splay_tree_lookup,
splay_tree_predecessor, splay_tree_successor): Adjust.
(splay_tree_new_typed_alloc): Call splay_tree_ex_new.
(splay_tree_ex_new): New constructor.
(splay_tree_compare_strings, splay_tree_delete_pointers,
splay_tree_compare_wrapper, splay_tree_delete_key_wrapper,
splay_tree_delete_value_wrapper): New utility functions.
(splay_tree_xmalloc_allocate, splay_tree_xmalloc_deallocate): Export.

gcc:
2017-12-15  Bernd Edlinger  

* typed-splay-tree.h (typed_splay_tree::m_compare_outer_fn,
typed_splay_tree::m_delete_key_outer_fn,
typed_splay_tree::m_delete_value_outer_fn): New data members.
(typed_splay_tree::compare_inner_fn,
typed_splay_tree::delete_key_inner_fn,
typed_splay_tree::delete_value_inner_fn): New helper functions.
(typed_splay_tree::typed_splay_tree): Use splay_tree_ex_new.
* tree-dump.c (dump_node): Use splay_tree_delete_pointers.

c-family:
2017-12-15  Bernd Edlinger  

* c-lex.c (get_fileinfo): Use splay_tree_compare_strings and
splay_tree_delete_pointers.

cp:
2017-12-15  Bernd Edlinger  

* decl2.c (start_static_storage_duration_function): Use
splay_tree_delete_pointers.
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	(revision 255661)
+++ gcc/c-family/c-lex.c	(working copy)
@@ -101,11 +101,9 @@ get_fileinfo (const char *name)
   struct c_fileinfo *fi;
 
   if (!file_info_tree)
-file_info_tree = splay_tree_new ((splay_tree_compare_fn)
- (void (*) (void)) strcmp,
+file_info_tree = splay_tree_new (splay_tree_compare_strings,
  0,
- (splay_tree_delete_value_fn)
- (void (*) (void)) free);
+ splay_tree_delete_pointers);
 
   n = splay_tree_lookup (file_info_tree, (splay_tree_key) name);
   if (n)
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c	(revision 255661)
+++ gcc/cp/decl2.c	(working copy)
@@ -3558,8 +3558,7 @@ start_static_storage_duration_function (unsigned c
   priority_info_map = splay_tree_new (splay_tree_compare_ints,
 	  /*delete_key_fn=*/0,
 	  /*delete_value_fn=*/
-	  (splay_tree_delete_value_fn)
-	  (void (*) (void)) free);
+	  splay_tree_delete_pointers);
 
   /* We always need to generate functions for the
 	 DEFAULT_INIT_PRIORITY so enter it now.  That way when we walk
Index: gcc/tree-dump.c
===
--- gcc/tree-dump.c	(revision 255661)
+++ gcc/tree-dump.c	(working copy)
@@ -736,8 +736,7 @@ dump_node (const_tree t, dump_flags_t flags, FILE
   di.flags = flags;
   di.node = t;
   di.nodes = splay_tree_new (splay_tree_compare_pointers, 0,
-			 (splay_tree_delete_value_fn)
-			 (void (*) (void)) free);
+			 splay_tree_delete_pointers);
 
   /* Queue up the first node.  */
   queue (&di, t, DUMP_NONE);
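
Note on the wrapper approach discussed in this thread: the following is a
minimal sketch, not the code from the attached patch.  It assumes
hypothetical demo_* typedefs that only mimic the shape of the splay-tree
callback types (keys and values passed as uintptr_t).  The point it tries
to show is that a small function with exactly the callback signature can
do the pointer conversion internally, so strcmp and free never need to be
cast to a different function type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the callback typedefs; these are NOT the
   declarations from splay-tree.h.  */
typedef uintptr_t demo_key;
typedef uintptr_t demo_value;
typedef int (*demo_compare_fn) (demo_key, demo_key);
typedef void (*demo_delete_value_fn) (demo_value);

/* Has exactly the callback signature and converts the keys back to
   strings inside, instead of casting strcmp to demo_compare_fn.  */
int
demo_compare_strings (demo_key k1, demo_key k2)
{
  return strcmp ((const char *) k1, (const char *) k2);
}

/* Same idea for values that are really malloc'ed pointers.  */
void
demo_delete_pointer (demo_value value)
{
  free ((void *) value);
}

/* A container-like user that only calls through the declared types.  */
struct demo_table
{
  demo_compare_fn compare;
  demo_delete_value_fn delete_value;
};

/* Return nonzero if the two string keys compare equal in table T.  */
int
demo_keys_equal (const struct demo_table *t, const char *a, const char *b)
{
  assert (t != NULL && t->compare != NULL);
  return t->compare ((demo_key) a, (demo_key) b) == 0;
}
```

With wrappers like these, a table can be set up as
{ demo_compare_strings, demo_delete_pointer } without any function type
cast at the call site, which is roughly what the c-lex.c hunk above does
with the real helpers.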

Re: [C++ PATCH] Fix ICE on invalid std::tuple_size<...>::value (PR c++/83205)

2017-12-15 Thread Jason Merrill

On 11/29/2017 08:19 PM, Martin Sebor wrote:

On 11/29/2017 03:32 PM, Jakub Jelinek wrote:

+  if (!tree_fits_uhwi_p (tsize))
+    {
+  error_at (loc, "%u names provided while %qT decomposes into "


When count is 1 as in the test below the error isn't grammatically
correct ("1 names").  I see that the same message is already issued
elsewhere in the function so this seems like an opportunity to use
the right form here and also fix the other one at the same time or
in a followup.  The error_n function exists to issue the right form
for the language, singular or plural.  It's not as convenient when
the sentence contains two terms that may be singular or plural,
but that can also be dealt with.


Agreed.

Jason



Re: [C++ PATCH] Fix ICE with structured binding & to incomplete type (PR c++/83217)

2017-12-15 Thread Jason Merrill

OK.


Re: [C++ PATCH] Fix ICE on invalid std::tuple_size<...>::value (PR c++/83205)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 02:01:36PM -0500, Jason Merrill wrote:
> On 11/29/2017 08:19 PM, Martin Sebor wrote:
> > On 11/29/2017 03:32 PM, Jakub Jelinek wrote:
> > > +  if (!tree_fits_uhwi_p (tsize))
> > > +    {
> > > +  error_at (loc, "%u names provided while %qT decomposes into "
> > 
> > When count is 1 as in the test below the error isn't grammatically
> > correct ("1 names").  I see that the same message is already issued
> > elsewhere in the function so this seems like an opportunity to use
> > the right form here and also fix the other one at the same time or
> > in a followup.  The error_n function exists to issue the right form
> > for the language, singular or plural.  It's not as convenient when
> > the sentence contains two terms that may be singular or plural,
> > but that can also be dealt with.
> 
> Agreed.

Yeah, I've implemented it as an incremental patch.
So
http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02521.html
for the ICE and
http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02538.html
on top of it.  The latter is what Nathan approved already, the former
needs review.

Jakub


Re: Add an "early rematerialisation" pass

2017-12-15 Thread Jeff Law
On 12/14/2017 12:32 PM, Richard Biener wrote:
> 
> On x86_64 all xmm registers are caller saved for example. That means all FP 
> regs and all vectors. (yeah, stupid ABI decision)
But that's precisely what I would expect if one was looking to maintain
backwards compatibility within the core runtime libraries.

If you make something callee-saved, then you have to have space for it
in the setjmp buffer.  Expanding that buffer is an ABI change and thus
*highly* discouraged.

jeff
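
Note on the setjmp point: a rough sketch, under the assumption of a
hypothetical demo_handler structure, of why the size of jmp_buf is part
of the ABI.  Any structure that embeds a jmp_buf has its later members
placed after it, so code compiled against one jmp_buf size cannot be
mixed with code compiled against another.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical structure embedding a jmp_buf.  The offset of error_code
   depends on sizeof (jmp_buf), which in turn has to be large enough for
   every register that setjmp must preserve.  */
struct demo_handler
{
  jmp_buf env;
  volatile int error_code;  /* volatile: modified between setjmp and longjmp */
};

/* Record CODE and unwind to the matching setjmp.  */
static void
demo_fail (struct demo_handler *h, int code)
{
  assert (h != NULL);
  h->error_code = code;
  longjmp (h->env, 1);
}

/* Return the code that was recorded before the non-local jump.  */
int
demo_run (int code)
{
  struct demo_handler h;
  h.error_code = 0;
  if (setjmp (h.env) == 0)
    demo_fail (&h, code);   /* does not return normally */
  return h.error_code;
}
```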


Re: Add an "early rematerialisation" pass

2017-12-15 Thread Richard Biener
On December 15, 2017 8:10:33 PM GMT+01:00, Jeff Law  wrote:
>On 12/14/2017 12:32 PM, Richard Biener wrote:
>> 
>> On x86_64 all xmm registers are caller saved for example. That means
>> all FP regs and all vectors. (yeah, stupid ABI decision)
>But that's precisely what I would expect if one was looking to maintain
>backwards compatibility within the core runtime libraries.
>
>If you make something callee-saved, then you have to have space for it
>in the setjmp buffer.  Expanding that buffer is an ABI change and thus
>*highly* discouraged.

Yes. But the initial 64bit ABI already had 8 xmm regs. Probably all used for 
parameter passing as well, but... 

For all further extensions I agree. 

Richard. 

>jeff



[PATCH] Fix PR83439

2017-12-15 Thread Richard Biener

Goofed up a backport to the GCC 7 branch.

Bootstrap & regtest in progress.

Richard.

2017-12-15  Richard Biener  

PR bootstrap/83439
* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
Adjust remaining gimple_set_modified to use the modified
variable instead.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 255701)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4673,7 +4673,7 @@ eliminate_dom_walker::before_dom_childre
  == void_type_node))
gimple_call_set_fntype (call_stmt, TREE_TYPE (fn));
  maybe_remove_unused_call_args (cfun, call_stmt);
- gimple_set_modified (stmt, true);
+ modified = true;
}
}
}


Re: [C++ PATCH] Fix ICE on invalid std::tuple_size<...>::value (PR c++/83205)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 08:09:20PM +0100, Jakub Jelinek wrote:
> On Fri, Dec 15, 2017 at 02:01:36PM -0500, Jason Merrill wrote:
> > On 11/29/2017 08:19 PM, Martin Sebor wrote:
> > > On 11/29/2017 03:32 PM, Jakub Jelinek wrote:
> > > > +  if (!tree_fits_uhwi_p (tsize))
> > > > +    {
> > > > +  error_at (loc, "%u names provided while %qT decomposes into "
> > > 
> > > When count is 1 as in the test below the error isn't grammatically
> > > correct ("1 names").  I see that the same message is already issued
> > > elsewhere in the function so this seems like an opportunity to use
> > > the right form here and also fix the other one at the same time or
> > > in a followup.  The error_n function exists to issue the right form
> > > for the language, singular or plural.  It's not as convenient when
> > > the sentence contains two terms that may be singular or plural,
> > > but that can also be dealt with.
> > 
> > Agreed.
> 
> Yeah, I've implemented it as an incremental patch.
> So
> http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02521.html
> for the ICE and
> http://gcc.gnu.org/ml/gcc-patches/2017-11/msg02538.html
> on top of it.  The latter is what Nathan approved already, the former
> needs review.

If it helps any, here are the 2 patches combined, re-tested on x86_64-linux
with check-c++-all.

2017-12-15  Jakub Jelinek  

PR c++/83205
* decl.c (cp_finish_decomp): Handle the case when tsize is not
error_mark_node, but doesn't fit into uhwi.  Split up count != eltscnt
and !tree_fits_uhwi_p (tsize) error_at calls into error_n and inform_n
to handle plural forms properly.

* g++.dg/cpp1z/decomp3.C: Adjust for structured binding count
mismatch diagnostics split into error and warning with plural
forms.
* g++.dg/cpp1z/decomp10.C: Likewise.
* g++.dg/cpp1z/decomp32.C: New test.

--- gcc/cp/decl.c.jj    2017-12-15 20:40:00.601221086 +0100
+++ gcc/cp/decl.c   2017-12-15 20:42:25.917445216 +0100
@@ -7427,11 +7427,20 @@ cp_finish_decomp (tree decl, tree first,
{
cnt_mismatch:
  if (count > eltscnt)
-   error_at (loc, "%u names provided while %qT decomposes into "
-  "%wu elements", count, type, eltscnt);
+   error_n (loc, count,
+"%u name provided for structured binding",
+"%u names provided for structured binding", count);
  else
-   error_at (loc, "only %u names provided while %qT decomposes into "
-  "%wu elements", count, type, eltscnt);
+   error_n (loc, count,
+"only %u name provided for structured binding",
+"only %u names provided for structured binding", count);
+ /* Some languages have special plural rules even for large values,
+but it is periodic with period of 10, 100, 1000 etc.  */
+ inform_n (loc, eltscnt > INT_MAX
+? (eltscnt % 100) + 100 : eltscnt,
+   "while %qT decomposes into %wu element",
+   "while %qT decomposes into %wu elements",
+   type, eltscnt);
  goto error_out;
}
   eltype = TREE_TYPE (type);
@@ -7500,6 +7509,15 @@ cp_finish_decomp (tree decl, tree first,
 "constant expression", type);
  goto error_out;
}
+  if (!tree_fits_uhwi_p (tsize))
+   {
+ error_n (loc, count,
+  "%u name provided for structured binding",
+  "%u names provided for structured binding", count);
+ inform (loc, "while %qT decomposes into %E elements",
+ type, tsize);
+ goto error_out;
+   }
   eltscnt = tree_to_uhwi (tsize);
   if (count != eltscnt)
goto cnt_mismatch;
--- gcc/testsuite/g++.dg/cpp1z/decomp3.C.jj 2017-11-30 11:18:00.078805693 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp3.C    2017-12-15 20:42:25.917445216 +0100
@@ -51,16 +51,21 @@ int arr[4];
 void
 test3 (A &b, B c)
 {
-  auto [ d, e, f ] = arr;  // { dg-error "only 3 names provided while 'int .4.' decomposes into 4 elements" }
-   // { dg-warning "structured bindings only available with -std=c..17 or -std=gnu..17" "" { target c++14_down } .-1 }
-  auto & [ g, h, i, j, k ] = arr;  // { dg-error "5 names provided while 'int .4.' decomposes into 4 elements" }
-   // { dg-warning "structured bindings only available with -std=c..17 or -std=gnu..17" "" { target c++14_down } .-1 }
-  auto [ l, m ] = b;   // { dg-error "only 2 names provided while 'A' decomposes into 3 elements" }
-   // { dg-warning "structured bindings only available with -std=c..17 or -std=gnu..17" "" { target c++14_down } .-1 }
-  auto & [ n, o, p, q ] = b;   // { dg-error "4 names provided 
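
Note on the plural handling in the decl.c hunk above: the following is a
simplified sketch, not GCC code.  It assumes English-only plural rules
and hypothetical demo_* helpers in place of error_n/inform_n.  It shows
the two ideas from the patch: picking the singular or plural message from
the count, and folding a count larger than INT_MAX into int range as
(n % 100) + 100, which keeps the value large while preserving its last
two decimal digits for languages whose plural rules depend on them.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Fold N into int range for plural-form selection.  Values above
   INT_MAX become (N % 100) + 100, so the result is still >= 100 and
   has the same remainder modulo 100 (and hence modulo 10) as N.  */
int
demo_plural_arg (unsigned long long n)
{
  return n > (unsigned long long) INT_MAX ? (int) (n % 100) + 100 : (int) n;
}

/* English-only stand-in for an error_n-style call: use the singular
   message for exactly one name and the plural one otherwise.  Returns
   the number of characters snprintf would have written.  */
int
demo_format_names (char *buf, size_t size, unsigned count)
{
  assert (buf != NULL);
  if (demo_plural_arg (count) == 1)
    return snprintf (buf, size,
                     "%u name provided for structured binding", count);
  return snprintf (buf, size,
                   "%u names provided for structured binding", count);
}
```

A real translation catalog would select among more than two forms; the
count == 1 test here only stands in for that lookup.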

Re: Add an "early rematerialisation" pass

2017-12-15 Thread Jeff Law
On 12/14/2017 12:26 PM, Richard Sandiford wrote:

>>> How does it relate to what LRA can do?  AFAIK LRA doesn't try to find
>>> any global optimal solution and previous hardreg assignments may work
>>> against it?
> 
> Yeah, both of those are problems.  But the more important problem is
> that it can't increase the live ranges of input registers as easily.
> Doing it before RA means that IRA gets to see the new ranges.
LRA does not work on a global basis.  It's somewhere between basic block
and extended basic block in its scope.

Remat (along with caller-saves) is really just a case of range splitting
in my mind.  So you really want the pass to either directly integrate
with IRA or run prior to IRA.

You can do splitting in response to failure to get a hard register and
try to hook back into IRA to color those new objects.  I had reasonably
good success with that approach when I was looking at the allocators
prior to LRA.

Basically I let IRA do its thing.  When it was done I walked through the
IL splitting ranges to make hard registers available at key points.
Then I'd call back into IRA (using existing mechanisms) to try
allocation again for the allocnos that had not been colored and any new
ones.  The key was there was some very simple and easy range splitting
you could do on already allocated allocnos that in turn would free up
hard registers.

That kind of model doesn't seem to fit here terribly well.  It's not a
lack of hard regs that's the problem, but simply not having any hard
regs available across calls.  So splitting the range of some allocno
that did get a hard register isn't going to help color any of the
allocnos that did not get a register.


If we go back further (circa 1998) we did a pre-allocation range
splitting pass.  We had it working marginally OK, but never really as
well as we wanted.

In that model we looked at pseudos that were likely going to be hard to
allocate and split them into multiple new pseudos.  We tracked the
relationship between the new and original pseudo so that reload could
shove them back together in cases where that made the most sense.  We
had copyin/copyout insns to move back and forth between the range copies
and the original pseudo as needed.

I don't remember the heuristics that drove when/where to split.
Meissner might since my recollection is that he did the major lifting
there.

But again, I don't think that model works here either.  It did nothing
WRT remat.

I know we pondered remat in the context of revamping caller-saves in the
early 90s to help Sparc FP.  But my recollection was that once we had
caller-saves handling the basics well, the performance gains were enough
that digging into remat was never really explored.

Anyway, that's a bit of history.  IMHO remat has to run prior to
allocation or integrated with allocation.   In general I'd expect
running before and independent of IRA to be easier to implement, but
slightly less performant than tightly integrated with IRA.

In addition to potentially avoiding spilling, we have an added benefit
for SVE that we avoid variable sized stack frames if we can eliminate
*all* instances of SVE registers live across calls.

I'm guessing that they're relatively rare to begin with based on
comments within the actual code.


> 
>>> That said - I would have expected remat to be done before the
>>> first scheduling pass?  Even before pass_sms (not sure
>>> what pass_live_range_shrinkage does).  Or be integrated
>>> with scheduling and its register pressure cost model.
> 
> SMS shouldn't be a problem.  Early remat wouldn't introduce new
> instructions into a loop unless the loop also had a call, which would
> prevent SMS.  And although it's theoretically possible that it could
> remove instructions from a loop, that would only happen if:
> 
>   (a) the instruction actually computes the same value every time, so
>   could have been moved outside the loop; and
> 
>   (b) the result is only used after a following call (and in particular
>   isn't used within the loop itself)
> 
> (a) is a missed optimisation and (b) seems unlikely.
> 
> Integrating remat into scheduling would make it much less powerful,
> since scheduling does only limited code motion between blocks.
> 
> Doing it before scheduling would be good in principle, but there
> would then need to be a fake dependency between the call and remat
> instructions to stop the scheduler moving the remat instructions
> back before the call.  Adding early remat was a way of avoiding such
> fake dependencies in "every" pass, but it might be that scheduling
> is one case in which the dependencies make sense.
> 
> Either way, being able to run the pass before scheduling seems
> like a future enhancement, blocked on a future enhancement to
> the scheduler.
I'd expect it to be post-scheduling simply because otherwise you have to
ensure scheduling doesn't muck it back up.  And there's been talk t

> 
>>> Also I would have expected the approach to apply to all modes,

Re: Add an "early rematerialisation" pass

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 08:14:38PM +0100, Richard Biener wrote:
> On December 15, 2017 8:10:33 PM GMT+01:00, Jeff Law  wrote:
> >On 12/14/2017 12:32 PM, Richard Biener wrote:
> >> 
> >> On x86_64 all xmm registers are caller saved for example. That means
> >> all FP regs and all vectors. (yeah, stupid ABI decision)
> >But that's precisely what I would expect if one was looking to maintain
> >backwards compatibility within the core runtime libraries.
> >
> >If you make something callee-saved, then you have to have space for it
> >in the setjmp buffer.  Expanding that buffer is an ABI change and thus
> >*highly* discouraged.
> 
> Yes. But the initial 64bit ABI already had 8 xmm regs. Probably all used for 
> parameter passing as well, but... 
> 
> For all further extensions I agree. 

Another issue is that the ?mm registers keep growing in size, and having
e.g. low 128-bits of the registers call-saved and upper bits call-used is
just weird (I believe that is the MS ABI).

Jakub


Re: [C++ RFC PATCH] Fix ICE with late attributes in templates (PR c++/83300)

2017-12-15 Thread Jason Merrill

On 12/07/2017 11:45 AM, Jakub Jelinek wrote:

save_template_attributes ignored flags, when ATTR_FLAG_TYPE_IN_PLACE
wasn't set on a type, it would happily attach the attributes to some
existing type (in this case to integer_type_node).

My first approach was to just call build_type_attribute_variant, but
that ICEs on g++.dg/cpp0x/alias-decl-59.C, because there *decl_p is
UNDERLYING_TYPE, which the generic type_hash_canon
build_type_attribute_variant calls doesn't like.


Ah, because it calls layout_type.  What if we did this?

Jason
diff --git a/gcc/tree.c b/gcc/tree.c
index ed1852b3e66..4883b711624 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6445,7 +6445,8 @@ type_hash_canon (unsigned int hashcode, tree type)
 
   /* The TYPE_ALIGN field of a type is set by layout_type(), so we
  must call that routine before comparing TYPE_ALIGNs.  */
-  layout_type (type);
+  if (TREE_CODE (type) < NUM_TREE_CODES)
+layout_type (type);
 
   in.hash = hashcode;
   in.type = type;


Re: [C++ PATCH] Harmonize C++ flexible array member initialization with C (PR c++/80135, PR c++/81922)

2017-12-15 Thread Jason Merrill

On 12/08/2017 11:15 AM, Jakub Jelinek wrote:

Hi!

Martin's patch a few years ago started allowing flexible array members
inside of nested aggregates, similarly to what we were doing in C.
But C rejects cases where we in nested context try to initialize a flexible
array member with a non-empty initializer, because that is something that
can't really work. Say if a flexible array member is inside of a struct
and we are initializing an array of such structs, we can't really have
each array element with a different width based on how large was the
initializer for a particular element's flexible array member.
After Martin's change, we were accepting those and silently generating bogus
assembly (claiming some size of elements but the initializer really followed
the sizes of what was added there), then I think Nathan added some
verification and since then we usually just ICE on those.

This patch does the similar thing in the C++ FE to what the C FE does, i.e.
allow empty initializers of flexible array members ( {}, not "" as that is
already non-zero size) everywhere, and for others allow them only for the
outermost struct/class/union.
Allowing the empty flexible array members is IMHO useful, people can have
say some general structure that is sometimes used as toplevel object and
can be initialized with arbitrarily sized array, and sometimes just use it
inside other structs or arrays if the array isn't needed.
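
Note on why a nested flexible array member cannot carry data: a small
sketch in portable C, assuming a hypothetical demo_msg type.  The
flexible array member adds nothing to sizeof, so storage for its
elements exists only when the enclosing object is over-allocated.  An
array element or a nested member always occupies exactly
sizeof (struct demo_msg) bytes, which leaves no room for a per-element
initializer of varying length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type with a flexible array member.  */
struct demo_msg
{
  size_t len;
  char text[];   /* contributes no size of its own */
};

/* Top-level use: allocate the fixed part plus room for the string.
   The caller owns the result and releases it with free.  */
struct demo_msg *
demo_msg_new (const char *s)
{
  assert (s != NULL);
  size_t len = strlen (s);
  struct demo_msg *m = malloc (sizeof *m + len + 1);
  if (m == NULL)
    return NULL;
  m->len = len;
  memcpy (m->text, s, len + 1);
  return m;
}

/* The space every nested or array-element instance gets: only the
   fixed part, regardless of how much text one would like to store.  */
size_t
demo_msg_fixed_size (void)
{
  return sizeof (struct demo_msg);
}
```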

digest_init_r already had a nested argument, but it wasn't actually the
nesting this patch is looking for, because nested true is already in
processing the CONSTRUCTOR for flexible array member, so I've changed it to
an int that tracks limited depth information (just 0 (former nested ==
false), 1 and 2 (both former nested == true), where 2 is used when we
digest_init_r once or more times more).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-08  Jakub Jelinek  

PR c++/80135
PR c++/81922
* typeck2.c (digest_init_r): Change nested argument type from bool to
int.  Use code instead of TREE_CODE (type) where possible.  If
nested == 2, diagnose initialization of flexible array member with
STRING_CST.  Pass nested to process_init_constructor.  Formatting fix.
(digest_init, digest_init_flags): Adjust digest_init_r caller.
(massage_init_elt): Add nested argument.  Pass 2 instead of 1 to
digest_init_r's nested argument if nested is non-zero.
(process_init_constructor_array): Add nested argument.  If nested == 2,
diagnose initialization of flexible array member with non-empty
braced enclosed list.  Pass nested to massage_init_elt.
(process_init_constructor_record, process_init_constructor_union): Add
nested argument, pass it to massage_init_elt.
(process_init_constructor): Add nested argument, pass it to
process_init_constructor_{array,record,union}.
* init.c (find_field_init): Return NULL_TREE if init is
error_mark_node.  Don't look through nested CONSTRUCTORs.


So this change is because the caller is only interested in flexible 
arrays, which can't be deeply nested anymore?  In that case, this is no 
longer a general purpose function and should be called find_flexarray_init.


OK with that change.

Jason


Re: [C++ RFC PATCH] Fix ICE with late attributes in templates (PR c++/83300)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 03:02:50PM -0500, Jason Merrill wrote:
> On 12/07/2017 11:45 AM, Jakub Jelinek wrote:
> > save_template_attributes ignored flags, when ATTR_FLAG_TYPE_IN_PLACE
> > wasn't set on a type, it would happily attach the attributes to some
> > existing type (in this case to integer_type_node).
> > 
> > My first approach was to just call build_type_attribute_variant, but
> > that ICEs on g++.dg/cpp0x/alias-decl-59.C, because there *decl_p is
> > UNDERLYING_TYPE, which the generic type_hash_canon
> > build_type_attribute_variant calls doesn't like.
> 
> Ah, because it calls layout_type.  What if we did this?
> 
> Jason

> diff --git a/gcc/tree.c b/gcc/tree.c
> index ed1852b3e66..4883b711624 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6445,7 +6445,8 @@ type_hash_canon (unsigned int hashcode, tree type)
>  
>/* The TYPE_ALIGN field of a type is set by layout_type(), so we
>   must call that routine before comparing TYPE_ALIGNs.  */
> -  layout_type (type);
> +  if (TREE_CODE (type) < NUM_TREE_CODES)
> +layout_type (type);
>  
>in.hash = hashcode;
>in.type = type;

I think that can't be sufficient, because type_cache_hasher::equal
has:
  switch (TREE_CODE (a->type))
{
...
default:
  return 0;
}

  if (lang_hooks.types.type_hash_eq != NULL)
return lang_hooks.types.type_hash_eq (a->type, b->type);

  return 1;
}

so for types it doesn't know about it will just always return 0.
Or is that what we want for the FE specific types?

Another possibility would be to return 0; for default only if
lang_hooks.types.type_hash_eq is NULL, and otherwise defer to
the langhook, plus changing the C++ and Ada langhooks to do
something with them if needed.
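
Note on the two behaviours being compared: a simplified model, not the
code from tree.c.  The demo_* types, the two tree codes and the hook
signature are all hypothetical.  The first function mirrors the
structure quoted above, where an unknown code hits the default case and
returns 0 before the hook is ever consulted.  The second models the
suggested alternative, where the default case gives up only when no hook
is installed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes; the last one stands for a front-end
   specific type the generic code knows nothing about.  */
enum demo_code { DEMO_INTEGER, DEMO_POINTER, DEMO_FRONTEND_ONLY };

struct demo_type
{
  enum demo_code code;
  int precision;
};

typedef int (*demo_eq_hook) (const struct demo_type *,
                             const struct demo_type *);

/* Quoted structure: unknown codes never compare equal.  */
int
demo_equal_strict (const struct demo_type *a, const struct demo_type *b,
                   demo_eq_hook hook)
{
  assert (a != NULL && b != NULL);
  if (a->code != b->code)
    return 0;
  switch (a->code)
    {
    case DEMO_INTEGER:
    case DEMO_POINTER:
      if (a->precision != b->precision)
        return 0;
      break;
    default:
      return 0;
    }
  return hook != NULL ? hook (a, b) : 1;
}

/* Suggested alternative: for unknown codes, defer to the hook if any.  */
int
demo_equal_deferring (const struct demo_type *a, const struct demo_type *b,
                      demo_eq_hook hook)
{
  assert (a != NULL && b != NULL);
  if (a->code != b->code)
    return 0;
  switch (a->code)
    {
    case DEMO_INTEGER:
    case DEMO_POINTER:
      if (a->precision != b->precision)
        return 0;
      break;
    default:
      if (hook == NULL)
        return 0;
      break;
    }
  return hook != NULL ? hook (a, b) : 1;
}

/* Example hook: treat two types as equal if their precision matches.  */
int
demo_hook_same_precision (const struct demo_type *a,
                          const struct demo_type *b)
{
  return a->precision == b->precision;
}
```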

Jakub


Re: [C++ PATCH] Harmonize C++ flexible array member initialization with C (PR c++/80135, PR c++/81922)

2017-12-15 Thread Jakub Jelinek
On Fri, Dec 15, 2017 at 03:09:53PM -0500, Jason Merrill wrote:
> So this change is because the caller is only interested in flexible arrays,
> which can't be deeply nested anymore?  In that case, this is no longer a

Yes.

> general purpose function and should be called find_flexarray_init.

Done and committed.  Thanks.
As a follow up, I'll try (next week) to avoid walking the whole CONSTRUCTOR
and just look at the last element in it instead in find_flexarray_init.

Jakub

