Re: [Patch/cfgexpand]: also consider assembler_name to call expand_main_function

2012-03-21 Thread Richard Guenther
On Tue, 20 Mar 2012, Tristan Gingold wrote:

 
 On Mar 20, 2012, at 5:01 PM, Richard Guenther wrote:
 
  On Tue, 20 Mar 2012, Tristan Gingold wrote:
  
  
  On Mar 20, 2012, at 3:19 PM, Richard Guenther wrote:
  
  [...]
  
  I'd rather get away from using a global main_identifier_node, instead
  make that frontend specific, and introduce targetm.main_assembler_name
  which the assembler-name creating langhook would make sure to use
  when mangling what the FE thinks main is.  main_identifier_node should
  not serve any purpose outside of Frontends.
  
  But I see both as a possible cleanup opportunity, not a necessary change.
  
  Something along these lines ?
  
  Yes, but I'd simply call the hook at the places you now use
  main_assembler_name and not create a global tree node for it.
 
 But we use it at the beginning of cgraph_finalize_function, so caching it
 makes sense, doesn't it ?

Well, maybe ;)  I have no strong opinion here.

Richard.


Re: [PATCH] Replace a SRA FIXME with an assert

2012-03-21 Thread Richard Guenther
On Tue, 20 Mar 2012, Martin Jambor wrote:

 Hi,
 
 On Tue, Mar 20, 2012 at 04:08:31PM +0100, Richard Guenther wrote:
  On Tue, 20 Mar 2012, Martin Jambor wrote:
  
   Hi,
   
   this patch which removes one of only two FIXMEs in tree-sra.c has been
   sitting in my patch queue for over a year.  Yesterday I noticed it
   there, bootstrapped and tested it on x86_64-linux and it passed.
   
   I'd like to either commit it or just remove the comment, if there
   are likely still size inconsistencies in assignments but we are not
   planning to do anything about them in the foreseeable future (and perhaps
   add a note to the bug).
   
   So, which should it be?
  
  Well.  Aggregate assignments can still be off I think, especially
  because of the disconnect between TYPE_SIZE and DECL_SIZE in
  some cases, considering *p = x; with typeof (x) == typeof (*p)
  (tail-padding re-use).
  
  The comments in PR40058 hint that that issue might be fixed,
  but I also remember issues with Ada.
 
 The other FIXME in tree-sra.c suggests that Ada can produce
 VIEW_CONVERT_EXPRs with a different size than its argument, perhaps
 that is it (I'll try removing that one too).

Yeah, it does that.

  
  GIMPLE verification ensures compatible types (but not a match
  of type_size / decl_size which will be exposed by get_ref_base_and_extent)
  
  But the real question is what do you want to guard against here?
  The assert at least looks like it is going to trigger at some point,
  but would it be a problem if the sizes do not match?
  
 
 I really can't remember what exactly happened but I do remember it did
 lead to a bug (it has been part of the check-in of the new SRA, so svn
 history does not help).  We copy access tree children across
 assignments and also change the type of the LHS access to a scalar if
 the RHS access is a scalar (assignments into a structure containing
 just one scalar) and both could lead to some access tree children
 covering larger part of the aggregate than the parent, making the
 children un-findable or even creating overlaps which are prohibited
 for SRA candidates.
 
 But as I wrote before, I'll be happy to just remove the FIXME comment.

I'd just remove the comment then.

Richard.

 Martin
 
 
  Richard.
  
  
   2011-01-06  Martin Jambor  mjam...@suse.cz
   
 * tree-sra.c (build_accesses_from_assign): Make size equality test
 an assert.
   
   Index: src/gcc/tree-sra.c
   ===
   --- src.orig/gcc/tree-sra.c
   +++ src/gcc/tree-sra.c
   @@ -1175,13 +1175,11 @@ build_accesses_from_assign (gimple stmt)
         && !lacc->grp_unscalarizable_region
         && !racc->grp_unscalarizable_region
         && AGGREGATE_TYPE_P (TREE_TYPE (lhs))
   -     /* FIXME: Turn the following line into an assert after PR 40058 is
   -        fixed.  */
   -     && lacc->size == racc->size
         && useless_type_conversion_p (lacc->type, racc->type))
       {
         struct assign_link *link;

   +     gcc_assert (lacc->size == racc->size);
         link = (struct assign_link *) pool_alloc (link_pool);
         memset (link, 0, sizeof (struct assign_link));
 
 

-- 
Richard Guenther rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

[PATCH] Fix PR52636

2012-03-21 Thread Richard Guenther

This fixes PR52636: now that we treat all constants as constants,
we need to convert them to the appropriate vector type.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-03-20  Richard Guenther  rguent...@suse.de

PR tree-optimizer/52636
* tree-vect-slp.c (vect_get_constant_vectors): Convert constants
to the appropriate type.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 185563)
--- gcc/tree-vect-slp.c (working copy)
*** vect_get_constant_vectors (tree op, slp_
*** 2363,2368 
--- 2363,2374 
  
/* Create 'vect_ = {op0,op1,...,opn}'.  */
number_of_places_left_in_vector--;
+ if (constant_p
+     && !types_compatible_p (TREE_TYPE (vector_type), TREE_TYPE (op)))
+   {
+ op = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (vector_type), op);
+ gcc_assert (op && CONSTANT_CLASS_P (op));
+   }
  elts[number_of_places_left_in_vector] = op;
  
if (number_of_places_left_in_vector == 0)


Re: [Patch/cfgexpand]: also consider assembler_name to call expand_main_function

2012-03-21 Thread Tristan Gingold

On Mar 20, 2012, at 6:17 PM, Jan Hubicka wrote:

 On Tue, 20 Mar 2012, Tristan Gingold wrote:
 
 
 On Mar 15, 2012, at 10:37 AM, Richard Guenther wrote:
 
 On Wed, 14 Mar 2012, Tristan Gingold wrote:
 [?]
 
 
 Well.  To make this work in LTO the main function (thus, the program
 entry point) should be marked at cgraph level and all users of
 MAIN_NAME_P should instead check a flag on the cgraph node.
 
 Will write a predicate in tree.[ch].
 
 Please instead transition main-ness to the cgraph.
 
 Yep, I also agree that it is something cgraph code should care about instead
 of random places across the whole middle-end.
 diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
 index bd21169..7a7a774 100644
 --- a/gcc/cfgexpand.c
 +++ b/gcc/cfgexpand.c
 @@ -4513,9 +4513,8 @@ gimple_expand_cfg (void)
 
   /* If this function is `main', emit a call to `__main'
  to run global initializers, etc.  */
 -  if (DECL_NAME (current_function_decl)
 -      && MAIN_NAME_P (DECL_NAME (current_function_decl))
 -      && DECL_FILE_SCOPE_P (current_function_decl))
 +  if (DECL_FILE_SCOPE_P (current_function_decl)
 +      && cgraph_main_function_p (cgraph_get_node (current_function_decl)))
 expand_main_function ();
 
 The DECL_FILE_SCOPE_P check is redundant, please remove them everywhere
 you call cgraph_main_function_p.  I suppose returning false if the
 cgraph node is NULL in cgraph_main_function_p would be good.
 
 How do we handle the cases before cgraph is built with this approach?

Only front-end code needs to check whether a function is main before adding
it to the cgraph.  As each front-end should know which function is main, this is
not an issue for them.

 +/* Return true iff NODE is the main function (main in C).  */
 +static inline bool
 +cgraph_main_function_p (struct cgraph_node *node)
 +{
 +  return node->local.main_function;
 
 node && node->local.main_function
 
 Well, cgraph strategy is to ICE when NODE is NULL :)
 We could have a cgraph_main_function_decl_p wrapper that does the NULL
 handling, but I still don't see how this helps - i.e. when you don't have
 a cgraph node you don't have info whether the function is main or not,
 so you should not even try to ask.
 In what cases do we ICE here?

We don't ICE here - as long as cgraph_main_function_p is called after the front-end.

 +}
 +
 /* Walk all functions with body defined.  */
 #define FOR_EACH_FUNCTION_WITH_GIMPLE_BODY(node) \
for ((node) = cgraph_first_function_with_gimple_body (); (node); \
 diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
 index 516f187..4a59f63 100644
 --- a/gcc/cgraphunit.c
 +++ b/gcc/cgraphunit.c
 @@ -346,6 +346,10 @@ cgraph_finalize_function (tree decl, bool nested)
   notice_global_symbol (decl);
   node->local.finalized = true;
   node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
 +  node->local.main_function =
 +    DECL_FILE_SCOPE_P (decl)
 +    && ((!DECL_ASSEMBLER_NAME_SET_P (decl) && MAIN_NAME_P (DECL_NAME (decl)))
 +        || decl_assembler_name_equal (decl, main_identifier_node));
 
 If we finalize a function we should always create an assembler name,
 thus I'd change the above to
 
  node->local.main_function = decl_assembler_name_equal (decl, main_identifier_node);
 
 btw, decl_assembler_name_equal doesn't seem to remove target-specific
 mangling - do some OSes mangle main differently (I'm thinking of
 leading underscores or complete renames)?  Thus, I guess the
 targets might want to be able to provide the main_identifier_assember_name
 you use here.
 
 Yes, the name of the main function is mangled, i.e. it is _main on djgpp as far
 as I remember.  This is why we have the main_identifier_node go through the
 mangling procedure.


USER_LABEL_PREFIX is handled by decl_assembler_name_equal.

One way to simplify that is to change the NESTED argument of
cgraph_finalize_function to LEVEL, which could be either main, top or nested.
With this mechanism, every front-end will explicitly tell the middle-end which
function is the main entry point.
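
For illustration, a minimal sketch of the LEVEL idea follows.  Every name in
it (function_level, node_sketch, finalize_function_sketch,
main_function_p_sketch) is hypothetical and only models the proposal; none of
them is an actual GCC declaration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not real GCC types.  */
enum function_level { LEVEL_MAIN, LEVEL_TOP, LEVEL_NESTED };

struct node_sketch
{
  bool finalized;
  bool main_function;   /* Recorded once, when the front end finalizes.  */
};

/* Models a finalize routine whose NESTED flag became a LEVEL argument:
   the front end states explicitly whether the function is the entry point.  */
static void
finalize_function_sketch (struct node_sketch *node, enum function_level level)
{
  node->finalized = true;
  node->main_function = (level == LEVEL_MAIN);
}

/* Models the predicate with the NULL check suggested in the review.  */
static bool
main_function_p_sketch (const struct node_sketch *node)
{
  return node != NULL && node->main_function;
}
```

With such an interface the middle-end would only read the flag and would never
compare identifiers or assembler names itself.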

Thoughts ?

Tristan.



[PATCH] Bug fix in store_bit_field_1 for big endian targets (issue 51893)

2012-03-21 Thread Aurelien Buhrig
Hi,

This patch (for 4.6) fixes a wrong subword index computation in
store_bit_field_1 for big endian targets when value is at least 4 times
bigger than a word (DI REG value with HI words).
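
To make the index arithmetic concrete, here is a small sketch of the two
formulas from the patch below.  The helper names are made up for this example,
and it assumes that nwords counts the words of the bit-field being stored; the
numbers (a 2-word field taken from an 8-byte value with 2-byte words) are
illustrative only.

```c
#include <assert.h>

/* Subword index before the patch: counts back from the number of words
   being stored (assumed meaning of nwords).  */
static unsigned int
wordnum_old (int backwards, unsigned int nwords, unsigned int i)
{
  return backwards ? nwords - i - 1 : i;
}

/* Subword index after the patch: counts back from the number of words
   in the mode of the value, i.e. GET_MODE_SIZE (fieldmode) / UNITS_PER_WORD.  */
static unsigned int
wordnum_new (int backwards, unsigned int mode_size,
             unsigned int units_per_word, unsigned int i)
{
  return backwards ? mode_size / units_per_word - i - 1 : i;
}
```

For a big-endian ("backwards") store with i == 0, the old formula yields word 1
while the new one yields word 3, the last word of the 4-word value.  The two
agree only when the field spans the whole value.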

It fixes a regression on gcc.c-torture/execute/bitfld-3.c for my current
backend port.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51893

OK to commit?

Aurélien
--- gcc-4.6.1.orig/gcc/expmed.c 2011-05-22 21:02:59.0 +0200
+++ src/gcc/expmed.c2012-01-19 09:32:04.0 +0100
@@ -589,7 +589,10 @@
{
  /* If I is 0, use the low-order word in both field and target;
 if I is 1, use the next to lowest word; and so on.  */
- unsigned int wordnum = (backwards ? nwords - i - 1 : i);
+ unsigned int wordnum = (backwards
+  ? GET_MODE_SIZE(fieldmode)/UNITS_PER_WORD
+- i - 1 
+  : i);
  unsigned int bit_offset = (backwards
 ? MAX ((int) bitsize - ((int) i + 1)
* BITS_PER_WORD,


Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread Richard Guenther
On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski pins...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
 Greetings,

 Now that we're into stage 1 again, I'd like to submit the first round of
 changes for dominator-based strength reduction, which will address
 issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
 attaching two patches: the smaller (slsr-part1) is the patch I'm
 submitting for approval today, while the larger (slsr-fyi) is for
 reference only, but may be useful if questions arise about how the small
 patch fits into the intended whole.

 This patch contains the logic for identifying strength reduction
 candidates, and makes replacements only for those candidates where the
 stride is a fixed constant.  Replacement for candidates with fixed but
 unknown strides are not implemented herein, but that logic can be viewed
 in the larger patch.  This patch does not address strength reduction of
 data reference expressions, or candidates with conditional increments;
 those issues will be dealt with in future patches.

 The cost model is built on the one used by tree-ssa-ivopts.c, and I've
 added some new instruction costs to that model in place.  It might
 eventually be good to divorce that modeling code from IVOPTS, but that's
 an orthogonal patch and somewhat messy.

 I think this is the wrong way to do straight line strength reduction
 considering we have a nice value numbering system which should be easy
 to extend to support it.

Well, it is easy to handle very specific easy cases like

a = i * 2;
b = i * 3;
c = i * 4;

to transform it to

a = i * 2;
b = a + i;
c = b + i;

but already

a = i * 2;
b = i * 4;
c = i * 6;

would need extra special code.  The easy case could be handled in eliminate ()
by, when seeing A * CST, looking up A * (CST - 1) and, if that succeeds,
transforming it to VAL + A.  The cost issue is the increased lifetime of VAL.
I've done this simple case at some point, but it failed to handle the common
associated cases, when we transform (a + 1) * 2, (a + 1) * 3, etc. to
a * 2 + 2, a * 3 + 3, etc.  I think it is the re-association in case of a
strength-reduction opportunity that makes the separate pass better?  How would
you suggest handling this case in the VN framework?  Detect the a * 3 + 3
pattern and then do two lookups, one for a * 2 and one for val + 2?  But then
we still don't have a value for a + 1 to re-use ...
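
As a sanity check of the two examples above, the following sketch compares the
multiply form with a reduced form.  The function names are invented for this
example, and the replacement chosen for the 2/4/6 case (adding the first
result as the stride) is just one assumed possibility, not what any pass
actually emits.

```c
#include <assert.h>

struct triple { int a, b, c; };

/* i * 2, i * 3, i * 4 computed with three multiplications.  */
static struct triple
mult_234 (int i)
{
  struct triple t = { i * 2, i * 3, i * 4 };
  return t;
}

/* The same values with one multiplication and two additions of i.  */
static struct triple
reduced_234 (int i)
{
  struct triple t;
  t.a = i * 2;
  t.b = t.a + i;
  t.c = t.b + i;
  return t;
}

/* i * 2, i * 4, i * 6: consecutive constants differ by 2, not 1.  */
static struct triple
mult_246 (int i)
{
  struct triple t = { i * 2, i * 4, i * 6 };
  return t;
}

/* Assumed replacement: the stride is i * 2, which is already in t.a.  */
static struct triple
reduced_246 (int i)
{
  struct triple t;
  t.a = i * 2;
  t.b = t.a + t.a;
  t.c = t.b + t.a;
  return t;
}

static int
same (struct triple x, struct triple y)
{
  return x.a == y.a && x.b == y.b && x.c == y.c;
}
```

The second pair shows why looking up A * (CST - 1) is not enough: the lookup
has to find a candidate whose constant differs by the stride, and the stride
itself has to be available as a value.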

Bill, experimenting with pattern detection in eliminate () would be a
possibility.

Thanks,
Richard.



 Thanks,
 Andrew pinski



 Thanks,
 Bill


 gcc:

 2012-03-18  Bill Schmidt  wschm...@linux.vnet.ibm.com

        * tree-pass.h (pass_strength_reduction): New decl.
        * tree-ssa-loop-ivopts.c (add_cost): Remove #undef; rename to
        add_regs_cost.
        (multiply_regs_cost): New function.
        (add_const_cost): Likewise.
        (extend_or_trunc_cost): Likewise.
        (negate_cost): Likewise.
        (get_address_cost): Rename add_cost to add_regs_cost.
        (force_expr_to_var_cost): Likewise.
        (get_computation_cost_at): Likewise.
        (determine_iv_cost): Likewise.
        * timevar.def (TV_TREE_SLSR): New timevar.
        * tree-ssa-strength-reduction.c: New.
        * tree-flow.h (add_regs_cost): New decl.
        (multiply_regs_cost): Likewise.
        (add_const_cost): Likewise.
        (extend_or_trunc_cost): Likewise.
        (negate_cost): Likewise.
        * Makefile.in (tree-ssa-strength-reduction.o): New dependencies.
        * passes.c (init_optimization_passes): Add pass_strength_reduction.

 gcc/testsuite:

 2012-03-18  Bill Schmidt  wschm...@linux.vnet.ibm.com

        * gcc.dg/tree-ssa/slsr-1.c: New test.
        * gcc.dg/tree-ssa/slsr-2.c: Likewise.
        * gcc.dg/tree-ssa/slsr-3.c: Likewise.
        * gcc.dg/tree-ssa/slsr-4.c: Likewise.



Re: [patch tree-optimization]: Fix for PR 45397 part 2 of 2

2012-03-21 Thread Richard Guenther
On Thu, Mar 15, 2012 at 3:45 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:46 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:09 PM, Kai Tietz ktiet...@googlemail.com wrote:
 Hi,

 this is the second part of the patch for this problem.  It adds some
 basic simplifications for ==/!=
 comparisons for eliminating redundant operands.

 It adds the following patterns:
  -X ==/!= Z - X -> Z ==/!= 0.
  ~X ==/!= Z ^ X -> Z ==/!= ~0
  X ==/!= X - Y -> Y == 0
  X ==/!= X + Y -> Y == 0
  X ==/!= X ^ Y -> Y == 0
  (X - Y) ==/!= (Z - Y) -> X ==/!= Z
  (Y - X) ==/!= (Y - Z) -> X ==/!= Z
  (X + Y) ==/!= (X + Z) -> Y ==/!= Z
  (X + Y) ==/!= (Z + X) -> Y ==/!= Z
  (X ^ Y) ==/!= (Z ^ X) -> Y ==/!= Z

 Can you re-base this patch to work without the previous one?  Also
 please coordinate with Andrew.  Note that all of these(?) simplifications
 are already done by fold_comparison which we could share if you'd split
 out the EXPR_P op0/op1 cases with separated operands/code.

 Richard.

 Hmm, fold_comparison doesn't do the same thing as it checks for
 possible overflow.  This is true for comparisons not being ==/!= or
 having operands of non-integral type.  But for ==/!= with integral
 typed arguments the overflow doesn't matter at all.  And exactly this
 is what the patch implements here.

 fold_comparison does not check for overflow for ==/!=.

 This optimization of course is just desired in non-AST form, as we
 otherwise lose information in the FE.  Therefore I didn't add it to
 fold_const.

 Which pieces are not already in fold-const btw?  forwprop already
 re-constructs trees for the defs of the lhs/rhs of a comparison.

 Richard.

 I have tried to use a call to fold_build2 here instead, and I
 noticed that it didn't optimize a single case (besides the - and
 ~ case on both sides).

 I see in fold-const, for example, that the pattern 'X +- C1 CMP Y +- C2'
 to 'X CMP Y +- C2 +- C1' explicitly checks for it.

 ...
 /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
   X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
   the resulting offset is smaller in absolute value than the
   original one.  */
 if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
     && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
 ...

Because the transform is not valid if Y +- C2 +- C1 overflows.  It is not valid
because overflow is undefined, not because the comparison would do the
wrong thing.  You'd have to change the addition to unsigned.
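
A small sketch of that point, with helper names invented for this example: if
the additions are done in unsigned (wrapping) arithmetic, dropping the common
operand is safe for ==, while an ordering comparison can change its result
when one side wraps.

```c
#include <assert.h>
#include <limits.h>

/* (a + b) == (c + a) with the additions done in unsigned arithmetic, so
   wrap-around is well defined.  Equality modulo 2^N of a + b and c + a
   holds exactly when b == c.  */
static int
eq_sum_wrapping (int a, int b, int c)
{
  return ((unsigned) a + (unsigned) b) == ((unsigned) c + (unsigned) a);
}

/* The simplified form once the common operand a is dropped.  */
static int
eq_simplified (int b, int c)
{
  return b == c;
}

/* The same idea for '<' is not valid under wrapping: the sums may wrap
   on one side only.  */
static int
lt_sum_wrapping (unsigned a, unsigned b, unsigned c)
{
  return (a + b) < (a + c);
}
```

For example, lt_sum_wrapping (UINT_MAX, 1, 0) is 1 although 1 < 0 is false,
whereas eq_sum_wrapping and eq_simplified agree even when a is INT_MAX.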

 The same for pattern X +- C1 CMP C2 to X CMP C2 +- C1.

Well, this is obviously just a missed optimization in fold-const.c then.  Mind
conditionalizing the overflow check to codes not NE_EXPR or EQ_EXPR?

 The cases for '(X + Y) ==/!= (Z + X)' and co have the same issue or
 are simply not present.

That's true.  I suppose they were considered too special to worry about.
Did you see these cases in real code?

 Sorry fold_const doesn't cover this at all.

It covers part of it.

 Kai


Re: [patch tree-optimization]: Fix for PR 45397 part 2 of 2

2012-03-21 Thread Kai Tietz
2012/3/21 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 3:45 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:46 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:09 PM, Kai Tietz ktiet...@googlemail.com 
 wrote:
 Hi,

 this is the second part of the patch for this problem.  It adds some
 basic simplifications for ==/!=
 comparisons for eliminating redundant operands.

 It adds the following patterns:
  -X ==/!= Z - X -> Z ==/!= 0.
  ~X ==/!= Z ^ X -> Z ==/!= ~0
  X ==/!= X - Y -> Y == 0
  X ==/!= X + Y -> Y == 0
  X ==/!= X ^ Y -> Y == 0
  (X - Y) ==/!= (Z - Y) -> X ==/!= Z
  (Y - X) ==/!= (Y - Z) -> X ==/!= Z
  (X + Y) ==/!= (X + Z) -> Y ==/!= Z
  (X + Y) ==/!= (Z + X) -> Y ==/!= Z
  (X ^ Y) ==/!= (Z ^ X) -> Y ==/!= Z

 Can you re-base this patch to work without the previous one?  Also
 please coordinate with Andrew.  Note that all of these(?) simplifications
 are already done by fold_comparison which we could share if you'd split
 out the EXPR_P op0/op1 cases with separated operands/code.

 Richard.

 Hmm, fold_comparison doesn't do the same thing as it checks for
 possible overflow.  This is true for comparisons not being ==/!= or
 having operands of non-integral type.  But for ==/!= with integral
 typed arguments the overflow doesn't matter at all.  And exactly this
 is what the patch implements here.

 fold_comparison does not check for overflow for ==/!=.

 This optimization of course is just desired in non-AST form, as we
 otherwise lose information in the FE.  Therefore I didn't add it to
 fold_const.

 Which pieces are not already in fold-const btw?  forwprop already
 re-constructs trees for the defs of the lhs/rhs of a comparison.

 Richard.

 I have tried to use a call to fold_build2 here instead, and I
 noticed that it didn't optimize a single case (besides the - and
 ~ case on both sides).

 I see in fold-const, for example, that the pattern 'X +- C1 CMP Y +- C2'
 to 'X CMP Y +- C2 +- C1' explicitly checks for it.

 ...
 /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
   X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
   the resulting offset is smaller in absolute value than the
   original one.  */
 if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
     && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
 ...

 Because the transform is not valid if Y +- C2 +- C1 overflows.  It is not 
 valid
 because overflow is undefined, not because the comparison would do the
 wrong thing.  You'd have to change the addition to unsigned.

 The same for pattern X +- C1 CMP C2 to X CMP C2 +- C1.

 Well, this is obviously just a missed optimization in fold-const.c then.  Mind
 conditionalizing the overflow check to codes not NE_EXPR or EQ_EXPR?

 The cases for '(X + Y) ==/!= (Z + X)' and co have the same issue or
 are simply not present.

 That's true.  I suppose they were considered too special to worry about.
 Did you see these cases in real code?

 Sorry fold_const doesn't cover this at all.

 It covers part of it.

 Kai

Sure, the test code shown in this patch isn't that unusual.
Especially in gimple (by using different statements) such constructs
do occur.

Eg.:

int f1 (int a, int b, int c)
{
  if ((a + b) == (c + a))
   return 1;
  return 0;
}

int f2 (int a, int b, int c)
{
  if ((a ^ b) == (a  ^ c))
   return 1;
  return 0;
}


int f3 (int a, int b)
{
  if (-a == (b - a))
   return 1;
  return 0;
}

In all those cases the use of the common variable a should be optimized out.
Instead we are producing pretty weak code for those cases.

Kai


Re: [patch tree-optimization]: Fix for PR 45397 part 2 of 2

2012-03-21 Thread Richard Guenther
On Wed, Mar 21, 2012 at 10:56 AM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/21 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 3:45 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:46 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2012/3/15 Richard Guenther richard.guent...@gmail.com:
 On Thu, Mar 15, 2012 at 2:09 PM, Kai Tietz ktiet...@googlemail.com 
 wrote:
 Hi,

 this is the second part of the patch for this problem.  It adds some
 basic simplifications for ==/!=
 comparisons for eliminating redundant operands.

 It adds the following patterns:
  -X ==/!= Z - X -> Z ==/!= 0.
  ~X ==/!= Z ^ X -> Z ==/!= ~0
  X ==/!= X - Y -> Y == 0
  X ==/!= X + Y -> Y == 0
  X ==/!= X ^ Y -> Y == 0
  (X - Y) ==/!= (Z - Y) -> X ==/!= Z
  (Y - X) ==/!= (Y - Z) -> X ==/!= Z
  (X + Y) ==/!= (X + Z) -> Y ==/!= Z
  (X + Y) ==/!= (Z + X) -> Y ==/!= Z
  (X ^ Y) ==/!= (Z ^ X) -> Y ==/!= Z

 Can you re-base this patch to work without the previous one?  Also
 please coordinate with Andrew.  Note that all of these(?) simplifications
 are already done by fold_comparison which we could share if you'd split
 out the EXPR_P op0/op1 cases with separated operands/code.

 Richard.

 Hmm, fold_comparison doesn't do the same thing as it checks for
 possible overflow.  This is true for comparisons not being ==/!= or
 having operands of non-integral type.  But for ==/!= with integral
 typed arguments the overflow doesn't matter at all.  And exactly this
 is what the patch implements here.

 fold_comparison does not check for overflow for ==/!=.

 This optimization of course is just desired in non-AST form, as we
 otherwise lose information in the FE.  Therefore I didn't add it to
 fold_const.

 Which pieces are not already in fold-const btw?  forwprop already
 re-constructs trees for the defs of the lhs/rhs of a comparison.

 Richard.

 I have tried to use a call to fold_build2 here instead, and I
 noticed that it didn't optimize a single case (besides the - and
 ~ case on both sides).

 I see in fold-const, for example, that the pattern 'X +- C1 CMP Y +- C2'
 to 'X CMP Y +- C2 +- C1' explicitly checks for it.

 ...
 /* Transform comparisons of the form X +- C1 CMP Y +- C2 to
   X CMP Y +- C2 +- C1 for signed X, Y.  This is valid if
   the resulting offset is smaller in absolute value than the
   original one.  */
 if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg0))
     && (TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
 ...

 Because the transform is not valid if Y +- C2 +- C1 overflows.  It is not 
 valid
 because overflow is undefined, not because the comparison would do the
 wrong thing.  You'd have to change the addition to unsigned.

 The same for pattern X +- C1 CMP C2 to X CMP C2 +- C1.

 Well, this is obviously just a missed optimization in fold-const.c then.  
 Mind
 conditionalizing the overflow check to codes not NE_EXPR or EQ_EXPR?

 The cases for '(X + Y) ==/!= (Z + X)' and co have the same issue or
 are simply not present.

 That's true.  I suppose they were considered too special to worry about.
 Did you see these cases in real code?

 Sorry fold_const doesn't cover this at all.

 It covers part of it.

 Kai

 Sure, the test code shown in this patch isn't that unusual.
 Especially in gimple (by using different statements) such constructs
 do occur.

 Eg.:

 int f1 (int a, int b, int c)
 {
  if ((a + b) == (c + a))
   return 1;
  return 0;
 }

 int f2 (int a, int b, int c)
 {
  if ((a ^ b) == (a  ^ c))
   return 1;
  return 0;
 }


 int f3 (int a, int b)
 {
  if (-a == (b - a))
   return 1;
  return 0;
 }

 In all those cases the use of the common variable a should be optimized out.
 Instead we are producing pretty weak code for those cases.

True, I agree we should try to handle these.  Did you talk to Andrew
with respect to the gimple-combining thing he is working on?

Richard.

 Kai


Re: [RFC PATCH 0/3] Fixing expansion of misaligned MEM_REFs on strict-alignment targets

2012-03-21 Thread Martin Jambor
Hi,

On Tue, Mar 20, 2012 at 08:16:04PM +0100, Georg-Johann Lay wrote:
 Martin Jambor wrote:
  Hi,
  
  this is another iteration of my attempts to fix expansion of
  misaligned memory accesses on strict-alignment platforms (which was
  suggested by Richi in
  http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00931.html and my first
  attempt was posted as
  http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00319.html).
  
  This time I got further, to a large extent thanks to parts of Richi's
  fixes of PR 50444 which cleaned up expr.c considerably.  I have
  successfully bootstrapped the combined patch on x86_64-linux,
  i686-linux, ia64-linux (without Ada) and sparc64-linux (without Java).
  I have run the c and c++ testsuites on individual patches on sparc64
  and ia64 too.
  
  Nevertheless, since I still lack experience in this area, there will
  almost certainly be comments and suggestions and therefore I have
  divided the three main changes to three different patches, so that
  they are easier to comment on by both me and anybody reviewing them.
  
  Thanks in advance for any comments,
  
  Martin
 
 Hi Martin.
 
 The new test cases make implications on the size of int: for example
  they crash for targets with sizeof(int) == 2
 

Crash?  I assume the tests misaligned-expand-[12].c abort because of
the comparison with 0xdeadbeef constant, misaligned-expand-3.c should
not be affected by the size of int.  If that is the case, does the
following patch fix the issue for you?
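
A sketch of what presumably goes wrong, modelled on a host with wider int:
int16_t stands in for a 16-bit int and uint32_t for the unsigned long type
that 0xdeadbeef gets on such a target.  The helper names and the CST_AS_INT16
macro are made up for this example, and it assumes that storing the constant
into a 16-bit int keeps the low 16 bits (0xbeef, i.e. -16657).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed result of converting 0xdeadbeef to a 16-bit int: the low
   16 bits, 0xbeef, read as a signed value.  */
#define CST_AS_INT16 ((int16_t) -16657)

/* Comparison against the uncast constant: the stored value is
   sign-extended and converted to the wider unsigned type, giving
   0xffffbeef, which differs from 0xdeadbeef.  */
static int
compare_without_cast (int16_t stored)
{
  return (uint32_t) stored == UINT32_C (0xdeadbeef);
}

/* Comparison against the constant cast to the narrow type first, as
   '#define cst (int) 0xdeadbeef' does: both sides are truncated alike.  */
static int
compare_with_cast (int16_t stored)
{
  return stored == CST_AS_INT16;
}
```

Under these assumptions the uncast comparison fails for the stored value and
the cast one succeeds, which would match the reported aborts.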

Sorry for the inconvenience,

Martin


2012-03-21  Martin Jambor  mjam...@suse.cz

* gcc.dg/misaligned-expand-1.c (cst): Cast to int.
* gcc.dg/misaligned-expand-2.c (cst): Likewise.

Index: src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
===
--- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-1.c
+++ src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
@@ -14,7 +14,7 @@ foo (myint *p)
   return *p;
 }
 
-#define cst 0xdeadbeef
+#define cst (int) 0xdeadbeef
 #define NUM 8
 
 struct blah
Index: src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
===
--- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-2.c
+++ src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
@@ -14,7 +14,7 @@ foo (myint *p, unsigned int i)
   *p = i;
 }
 
-#define cst 0xdeadbeef
+#define cst (int) 0xdeadbeef
 #define NUM 8
 
 struct blah
@@ -25,8 +25,6 @@ struct blah
 
 struct blah g;
 
-#define cst 0xdeadbeef
-
 int
 main (int argc, char **argv)
 {




Re: [PATCH][ARM] Improve use of conditional execution in thumb mode.

2012-03-21 Thread Andrew Stubbs

On 19/03/12 14:48, Richard Earnshaw wrote:

OK.


Committed.

Andrew


[PATCH, ARM] Don't force vget_lane returning a 64-bit result to transfer to core registers

2012-03-21 Thread Richard Earnshaw
Semantically the neon intrinsic vgetq_lane_[su]64 returns a 64 bit
sub-object of a 128-bit vector; there's no real need for the intrinsic
to map onto a specific machine instruction.

Indeed, if we force a particular instruction that moves the result into a
core register, but then want to use the result in the vector unit, we
don't really want to have to move the result back to the other register
bank.  However, that's what we do today.

This patch changes the way we expand these operations so that we
no longer force selection of the get-lane operation.

A side effect of this change is that we now spit out the fmrrd mnemonic
rather than the vmov equivalent.  As a consequence I've updated the
testsuite to allow for this change.  The changes to the ML files are
pretty mechanical, but I don't speak ML so it would be helpful if
another pair of eyes could check that bit over and tell me if I've
missed something subtle.

Tested on trunk and gcc-4.7, but only installed on trunk.

R.

* neon.md (neon_vget_lanev2di): Use gen_lowpart and gen_highpart.
* config/arm/neon.ml (Fixed_return_reg): Renamed to fixed_vector_reg.
All callers changed.
(Fixed_core_reg): New feature.
(Vget_lane [sizes S64 and U64]): Add Fixed_core_reg.  Allow
fmrrd in disassembly.
* neon-testgen.ml: Handle Fixed_core_reg.

testsuite/
* gcc.target/arm/neon/vgetQ_laneu64.c: Regenerated.
* gcc.target/arm/neon/vgetQ_lanes64.c: Likewise.

--- config/arm/neon-testgen.ml  (revision 185587)
+++ config/arm/neon-testgen.ml  (local)
@@ -79,9 +79,12 @@ let emit_automatics chan c_types feature
   (* The intrinsic returns a value.  We need to do explict register
  allocation for vget_low tests or they fail because of copy
  elimination.  *)
-  ((if List.mem Fixed_return_reg features then
+  ((if List.mem Fixed_vector_reg features then
   Printf.fprintf chan "  register %s out_%s asm (\"d18\");\n"
  return_ty return_ty
+else if List.mem Fixed_core_reg features then
+  Printf.fprintf chan "  register %s out_%s asm (\"r0\");\n"
+ return_ty return_ty
 else
   Printf.fprintf chan "  %s out_%s;\n" return_ty return_ty);
   emit ())
--- config/arm/neon.md  (revision 185587)
+++ config/arm/neon.md  (local)
@@ -2720,14 +2720,24 @@ (define_expand neon_vget_lanedi
 })
 
 (define_expand "neon_vget_lanev2di"
-  [(match_operand:DI 0 "s_register_operand" "=r")
-   (match_operand:V2DI 1 "s_register_operand" "w")
-   (match_operand:SI 2 "immediate_operand" "i")
-   (match_operand:SI 3 "immediate_operand" "i")]
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:V2DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "immediate_operand" "")
+   (match_operand:SI 3 "immediate_operand" "")]
   "TARGET_NEON"
 {
-  neon_lane_bounds (operands[2], 0, 2);
-  emit_insn (gen_vec_extractv2di (operands[0], operands[1], operands[2]));
+  switch (INTVAL (operands[2]))
+{
+case 0:
+  emit_move_insn (operands[0], gen_lowpart (DImode, operands[1]));
+  break;
+case 1:
+  emit_move_insn (operands[0], gen_highpart (DImode, operands[1]));
+  break;
+default:
+  neon_lane_bounds (operands[2], 0, 1);
+  FAIL;
+}
   DONE;
 })
 
--- config/arm/neon.ml  (revision 185587)
+++ config/arm/neon.ml  (local)
@@ -234,7 +234,8 @@ type features =
cases.  The function supplied must return the integer to be written
into the testcase for the argument number (0-based) supplied to it.  *)
   | Const_valuator of (int -> int)
-  | Fixed_return_reg
+  | Fixed_vector_reg
+  | Fixed_core_reg
 
 exception MixedMode of elts * elts
 
@@ -1009,7 +1010,8 @@ let ops =
 Vget_lane,
   [InfoWord;
Disassembles_as [Use_operands [| Corereg; Corereg; Dreg |]];
-   Instruction_name ["vmov"]; Const_valuator (fun _ -> 0)],
+   Instruction_name ["vmov"; "fmrrd"]; Const_valuator (fun _ -> 0);
+   Fixed_core_reg],
   Use_operands [| Corereg; Qreg; Immed |],
   "vgetQ_lane", notype_2, [S64; U64];
 
@@ -1125,7 +1127,7 @@ let ops =
   notype_1, pf_su_8_64;
 Vget_low, [Instruction_name ["vmov"];
Disassembles_as [Use_operands [| Dreg; Dreg |]];
-  Fixed_return_reg],
+  Fixed_vector_reg],
   Use_operands [| Dreg; Qreg |], "vget_low",
   notype_1, pf_su_8_32;
  Vget_low, [No_op],
--- testsuite/gcc.target/arm/neon/vgetQ_lanes64.c   (revision 185587)
+++ testsuite/gcc.target/arm/neon/vgetQ_lanes64.c   (local)
@@ -10,11 +10,11 @@
 
 void test_vgetQ_lanes64 (void)
 {
-  int64_t out_int64_t;
+  register int64_t out_int64_t asm ("r0");
   int64x2_t arg0_int64x2_t;
 
   out_int64_t = vgetq_lane_s64 (arg0_int64x2_t, 0);
 }
 
-/* { dg-final { scan-assembler "vmov\[ \]+\[rR\]\[0-9\]+, \[rR\]\[0-9\]+, \[dD\]\[0-9\]+!?\(\[ \]+@\[a-zA-Z0-9 \]+\)?\n" } } */
+/* { dg-final { scan-assembler 

Re: C++ PATCH to mangling of 'new auto'

2012-03-21 Thread Jakub Jelinek
On Wed, Mar 21, 2012 at 12:04:18AM -0400, Jason Merrill wrote:
 This also seems like it might be a candidate for 4.7.0.  What do you
 think, Jakub?

Ok for 4.7.0. 

Jakub


RE: [Patch,AVR]: Hack around PR rtl-optimization/52543, Take #2

2012-03-21 Thread Weddington, Eric


 -Original Message-
 From: Georg-Johann Lay 
 Sent: Tuesday, March 20, 2012 1:56 PM
 To: gcc-patches@gcc.gnu.org
 Cc: Denis Chertykov; Weddington, Eric
 Subject: Re: [Patch,AVR]: Hack around PR rtl-optimization/52543, Take
#2
 
 And here is the patch...
 
 Georg-Johann Lay wrote:
 

Please commit.


Re: [PATCH] Fix PR52636

2012-03-21 Thread Paolo Carlini

On 03/21/2012 09:06 AM, Richard Guenther wrote:

2012-03-20  Richard Guentherrguent...@suse.de

PR tree-optimizer/52636

Typo ;)

Paolo.


Re: [RFC PATCH 0/3] Fixing expansion of misaligned MEM_REFs on strict-alignment targets

2012-03-21 Thread Georg-Johann Lay
Martin Jambor wrote:
 Hi,
 
 On Tue, Mar 20, 2012 at 08:16:04PM +0100, Georg-Johann Lay wrote:
 Martin Jambor wrote:
 Hi,

 this is another iteration of my attempts to fix expansion of
 misaligned memory accesses on strict-alignment platforms (which was
 suggested by Richi in
 http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00931.html and my first
 attempt was posted as
 http://gcc.gnu.org/ml/gcc-patches/2012-01/msg00319.html).

 This time I got further, to a large extent thanks to parts of Richi's
 fixes of PR 50444 which cleaned up expr.c considerably.  I have
 successfully bootstrapped the combined patch on x86_64-linux,
 i686-linux, ia64-linux (without Ada) and sparc64-linux (without Java).
 I have run the c and c++ testsuites on individual patches on sparc64
 and ia64 too.

 Nevertheless, since I still lack experience in this area, there will
 almost certainly be comments and suggestions and therefore I have
 divided the three main changes to three different patches, so that
 they are easier to comment on by both me and anybody reviewing them.

 Thanks in advance for any comments,

 Martin
 Hi Martin.

 The new test cases make assumptions about the size of int: for example,
  they crash on targets with sizeof(int) == 2

 
 Crash?  I assume the tests misaligned-expand-[12].c abort because of
 the comparison with 0xdeadbeef constant, misaligned-expand-3.c should

Yes. With crash I meant failing at run-time as opposed to a fail at
compile-time.

 not be affected by size of int.  If that is the case, does the
 following patch fix the issue for you?

Yes, the tests pass now. I went ahead and applied your patch:

http://gcc.gnu.org/viewcvs?view=revision&revision=185602

Johann
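
For readers wondering why the uncast constant fails when int is 16 bits wide, here is a minimal sketch.  It uses int16_t as a stand-in for a 16-bit int so that it can be run on an ordinary host; the type and helper names are made up for this illustration and are not taken from the test cases.  The narrowing conversion is implementation-defined (GCC keeps the low 16 bits).

```c
#include <stdint.h>

/* Stand-in for 'int' on a target where sizeof (int) == 2.
   This typedef is an assumption made for the illustration.  */
typedef int16_t narrow_int;

/* Store the test constant through the narrow type, roughly as the
   tests do through their 'myint' object.  The conversion is
   implementation-defined; GCC keeps the low 16 bits (0xbeef).  */
static narrow_int
store_cst (void)
{
  return (narrow_int) 0xdeadbeef;
}

/* Old form: compare against the bare constant.  The constant does not
   fit in the narrow type, so after the usual arithmetic conversions
   the two sides can never be equal.  */
static int
matches_uncast (narrow_int v)
{
  return v == 0xdeadbeef;
}

/* New form: cast the constant to the narrow type first, so both sides
   are truncated in the same way before the comparison.  */
static int
matches_cast (narrow_int v)
{
  return v == (narrow_int) 0xdeadbeef;
}
```

With the value produced by store_cst, matches_uncast returns 0 and matches_cast returns 1, which mirrors the run-time failure and the effect of the cast in the patch.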


 Martin
 
 
 2012-03-21  Martin Jambor  mjam...@suse.cz
 
   * gcc.dg/misaligned-expand-1.c (cst): Cast to int.
   * gcc.dg/misaligned-expand-2.c (cst): Likewise.
 
 Index: src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
 ===
 --- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-1.c
 +++ src/gcc/testsuite/gcc.dg/misaligned-expand-1.c
 @@ -14,7 +14,7 @@ foo (myint *p)
return *p;
  }
  
 -#define cst 0xdeadbeef
 +#define cst (int) 0xdeadbeef
  #define NUM 8
  
  struct blah
 Index: src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
 ===
 --- src.orig/gcc/testsuite/gcc.dg/misaligned-expand-2.c
 +++ src/gcc/testsuite/gcc.dg/misaligned-expand-2.c
 @@ -14,7 +14,7 @@ foo (myint *p, unsigned int i)
*p = i;
  }
  
 -#define cst 0xdeadbeef
 +#define cst (int) 0xdeadbeef
  #define NUM 8
  
  struct blah
 @@ -25,8 +25,6 @@ struct blah
  
  struct blah g;
  
 -#define cst 0xdeadbeef
 -
  int
  main (int argc, char **argv)
  {
 
 
 



Re: remove wrong code in immed_double_const

2012-03-21 Thread Richard Sandiford
Mike Stump mikest...@comcast.net writes:
 diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
 index de45a22..0c6dc45 100644
 --- a/gcc/doc/rtl.texi
 +++ b/gcc/doc/rtl.texi
 @@ -1530,7 +1530,9 @@ Represents either a floating-point constant of mode 
 @var{m} or an
  integer constant too large to fit into @code{HOST_BITS_PER_WIDE_INT}
  bits but small enough to fit within twice that number of bits (GCC
  does not provide a mechanism to represent even larger constants).  In
 -the latter case, @var{m} will be @code{VOIDmode}.
 +the latter case, @var{m} will be @code{VOIDmode}.  For integral values
 +the value is a signed value, meaning the top bit of
 +@code{CONST_DOUBLE_HIGH} is a sign bit.
  
  @findex CONST_DOUBLE_LOW
  If @var{m} is @code{VOIDmode}, the bits of the value are stored in

Sounds good.

 diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
 index 78ddfc3..c0b24e4 100644
 --- a/gcc/emit-rtl.c
 +++ b/gcc/emit-rtl.c
 @@ -531,10 +531,9 @@ immed_double_const (HOST_WIDE_INT i0, HOST_WIDE_INT i1, 
 enum machine_mode mode)
  
   1) If GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT, then we use
   gen_int_mode.
 - 2) GET_MODE_BITSIZE (mode) == 2 * HOST_BITS_PER_WIDE_INT, but the value 
 of
 - the integer fits into HOST_WIDE_INT anyway (i.e., i1 consists only
 - from copies of the sign bit, and sign of i0 and i1 are the same),  then
 - we return a CONST_INT for i0.
 + 2) If the value of the integer fits into HOST_WIDE_INT anyway
 +(i.e., i1 consists only from copies of the sign bit, and sign
 + of i0 and i1 are the same), then we return a CONST_INT for i0.
   3) Otherwise, we create a CONST_DOUBLE for i0 and i1.  */
if (mode != VOIDmode)
  {

This too.

 diff --git a/gcc/explow.c b/gcc/explow.c
 index 2fae1a1..6284d61 100644
 --- a/gcc/explow.c
 +++ b/gcc/explow.c
 @@ -96,6 +96,9 @@ plus_constant (rtx x, HOST_WIDE_INT c)

For this I think we should make plus_constant a wrapper:

/* Return an rtx for the sum of X and the integer C.  */

rtx
plus_constant (rtx x, HOST_WIDE_INT c)
{
  return plus_constant_mode (GET_MODE (x), x, c);
}

/* Return an rtx for the sum of X and the integer C, given that X
   has mode MODE.  */

rtx
plus_constant_mode (enum machine_mode mode, rtx x, HOST_WIDE_INT c)
{
  ...innards of current plus_constant, without the mode = ...
}

Reason being...

switch (code)
  {
  case CONST_INT:
 +  if (GET_MODE_BITSIZE (mode) > HOST_WIDE_INT)
 + /* Punt for now.  */
 + goto overflow;
return GEN_INT (INTVAL (x) + c);
  
  case CONST_DOUBLE:

...this won't work as things stand, since CONST_INT always has VOIDmode.
(I'm on a slow mission to fix that.)

I agree that this is a pre-existing bug.  Callers that want to be
CONST_INT-safe should use the new plus_constant_mode instead of
plus_constant.  Once they do, we should assert here that mode isn't
VOIDmode.  But since it's an existing bug that also affects 2-HWI
constants, I agree that replacing calls to plus_constant with calls
to plus_constant_mode is a separate fix.

I don't think it's a good idea to punt to a PLUS though.
(plus (const_int X) (const_int Y)) isn't canonical rtl,
and could cause other problems.

Suggest instead we reuse the CONST_DOUBLE code for CONST_INT,
with l1 set to INTVAL and h1 set to the sign extension.

 @@ -103,10 +106,14 @@ plus_constant (rtx x, HOST_WIDE_INT c)
   unsigned HOST_WIDE_INT l1 = CONST_DOUBLE_LOW (x);
   HOST_WIDE_INT h1 = CONST_DOUBLE_HIGH (x);
   unsigned HOST_WIDE_INT l2 = c;
 - HOST_WIDE_INT h2 = c < 0 ? ~0 : 0;
 + HOST_WIDE_INT h2 = c < 0 ? ~(HOST_WIDE_INT)0 : 0;
   unsigned HOST_WIDE_INT lv;
   HOST_WIDE_INT hv;
  
 + if (GET_MODE_BITSIZE (mode) > 2*HOST_WIDE_INT)
 +   /* Punt for now.  */
 +   goto overflow;
 +
   add_double (l1, h1, l2, h2, &lv, &hv);

Nicely, add_double returns true on overflow, so I think
we should replace the punt with:

   if (add_double_with_sign (l1, h1, l2, h2, &lv, &hv, false))
 gcc_assert (GET_MODE_BITSIZE (mode) <= 2 * HOST_WIDE_INT);

(Seems better to explicitly specify the sign, even though
add_double would be equivalent.)
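
To make the suggestion concrete, here is a minimal sketch of a signed double-word add that reports overflow, with a single-word constant sign-extended into the high word first.  It is not GCC's add_double_with_sign; the struct and function names are invented for the example, and a 64-bit word is assumed in place of HOST_WIDE_INT.

```c
#include <stdint.h>
#include <stdbool.h>

/* A double-word integer: LOW holds the low 64 bits and HIGH the high
   64 bits; the top bit of HIGH is the sign bit.  */
struct dword
{
  uint64_t low;
  int64_t high;
};

/* Sign-extend the single word C into a double word, the way the
   review proposes treating a CONST_INT operand.  */
static struct dword
dword_from_word (int64_t c)
{
  struct dword d;
  d.low = (uint64_t) c;
  d.high = c < 0 ? ~(int64_t) 0 : 0;
  return d;
}

/* Store A + B in *SUM and return true if the signed double-word
   addition overflowed.  */
static bool
dword_add (struct dword a, struct dword b, struct dword *sum)
{
  uint64_t lo = a.low + b.low;
  uint64_t carry = lo < a.low;	/* Carry out of the low word.  */
  uint64_t hi = (uint64_t) a.high + (uint64_t) b.high + carry;

  sum->low = lo;
  sum->high = (int64_t) hi;

  /* Overflow happens exactly when both operands have the same sign
     and the result's sign differs from it.  */
  return ((((uint64_t) a.high ^ hi) & ((uint64_t) b.high ^ hi)) >> 63) != 0;
}
```

A caller in the spirit of the review would only accept an overflowing result when the mode is no wider than two words, and assert otherwise.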

 @@ -141,6 +148,9 @@ plus_constant (rtx x, HOST_WIDE_INT c)
break;
  
  case PLUS:
 +  if (GET_MODE_BITSIZE (mode) > HOST_WIDE_INT)
 + /* Punt for now.  */
 + goto overflow;
/* The interesting case is adding the integer to a sum.
Look for constant term in the sum and combine
with C.  For an integer constant term, we make a combined

For this I think we should change the recursive CONSTANT_P call
to use plus_constant_mode (with the mode of the PLUS) instead of
plus_constant.  It will then be correct for CONST_INT, and we can
remove the special CONST_INT case.

 diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
 index ce4eab4..37e46b1 100644
 --- a/gcc/simplify-rtx.c
 +++ b/gcc/simplify-rtx.c
 @@ -101,6 +101,7 @@ mode_signbit_p (enum machine_mode mode, const_rtx x)
  
if (width  

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread William J. Schmidt
On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote:
 On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski pins...@gmail.com wrote:
  On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
  wschm...@linux.vnet.ibm.com wrote:
  Greetings,
 
  Now that we're into stage 1 again, I'd like to submit the first round of
  changes for dominator-based strength reduction, which will address
  issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
  attaching two patches: the smaller (slsr-part1) is the patch I'm
  submitting for approval today, while the larger (slsr-fyi) is for
  reference only, but may be useful if questions arise about how the small
  patch fits into the intended whole.
 
  This patch contains the logic for identifying strength reduction
  candidates, and makes replacements only for those candidates where the
  stride is a fixed constant.  Replacement for candidates with fixed but
  unknown strides are not implemented herein, but that logic can be viewed
  in the larger patch.  This patch does not address strength reduction of
  data reference expressions, or candidates with conditional increments;
  those issues will be dealt with in future patches.
 
  The cost model is built on the one used by tree-ssa-ivopts.c, and I've
  added some new instruction costs to that model in place.  It might
  eventually be good to divorce that modeling code from IVOPTS, but that's
  an orthogonal patch and somewhat messy.
 
  I think this is the wrong way to do straight line strength reduction
  considering we have a nice value numbering system which should be easy
  to extend to support it.
 
 Well, it is easy to handle very specific easy cases like
 
 a = i * 2;
 b = i * 3;
 c = i * 4;
 
 to transform it to
 
 a = i * 2;
 b = a + i;
 c = b + i;
 
 but already
 
 a = i * 2;
 b = i * 4;
 c = i * 6;
 
 would need extra special code.  The easy case could be handled in eliminate ()
 by, when seeing A * CST, looking up A * (CST - 1) and if that
 succeeds, transform
 it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done 
 this
 simple case at some point, but it failed to handle the common associated 
 cases,
 when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
 3, etc.  I think
 it is the re-association in case of a strength-reduction opportunity
 that makes the
 separate pass better?  How would you suggest handling this case in the
 VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
 a * 2 and one for val + 2?  But then we still don't have a value for a + 1
 to re-use ...

And it becomes even more difficult with more complex scenarios.
Consider:

a = x + (3 * s);
b = x + (5 * s);
c = x + (7 * s);

The framework I've developed recognizes that this group of instructions
is related, and that it is profitable to replace them as follows:

a = x + (3 * s);
t = 2 * s;
b = a + t;
c = b + t;

The introduced multiply by 2 (one shift) is far cheaper than the two
multiplies that it replaces.  However, suppose you have instead:

a = x + (2 * s);
b = x + (8 * s);

Now it isn't profitable to replace this by:

a = x + (2 * s);
t = 6 * s;
b = a + t;

since a multiply by 6 (2 shifts, one add) is more costly than a multiply
by 8 (one shift).  To make these decisions correctly requires analyzing
all the related statements together, which value numbering as it stands
is not equipped to do.  Logic to handle these cases is included in my
larger fyi patch.
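
As a quick sanity check of the first replacement above, the following sketch computes the three values both ways.  It only demonstrates that the two forms agree; it says nothing about the cost model, and the function names are invented for the example.

```c
/* Original form: three independent multiplies by 3, 5 and 7.  */
static void
chain_original (long x, long s, long out[3])
{
  out[0] = x + (3 * s);
  out[1] = x + (5 * s);
  out[2] = x + (7 * s);
}

/* Strength-reduced form: one multiply by 3, one multiply by 2 (a
   shift), and two adds that reuse the previous result.  */
static void
chain_reduced (long x, long s, long out[3])
{
  long a = x + (3 * s);
  long t = 2 * s;
  long b = a + t;
  long c = b + t;

  out[0] = a;
  out[1] = b;
  out[2] = c;
}
```

Both functions fill out[] with the same values for any x and s that do not overflow.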

As another example, consider conditionally-executed increments:

a = i * 5;
if (...)
  i = i + 1;
b = i * 5;

This can be correctly and profitably strength-reduced as:

a = i * 5;
t = a;
if (...)
  {
i = i + 1;
t = t + 5;
  }
b = t;

(This is an approximation to the actual phi representation, which I've
omitted for clarity.)  Again, this kind of analysis is not something
that fits naturally into value numbering.  I don't yet have this in the
fyi patch, but have it largely working in a private version.
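
The conditional case can be checked the same way.  This is a source-level sketch of the transformation described above, not the phi-based form the pass would actually see; the function names and the 'taken' flag standing in for the elided condition are assumptions of the example.

```c
/* Original form: B is recomputed with a second multiply after the
   conditional increment of I.  */
static long
cond_original (long i, int taken)
{
  long a = i * 5;
  long b;

  (void) a;
  if (taken)
    i = i + 1;
  b = i * 5;
  return b;
}

/* Strength-reduced form: T tracks I * 5, so the increment of I by 1
   becomes an increment of T by 5 and the second multiply goes away.  */
static long
cond_reduced (long i, int taken)
{
  long a = i * 5;
  long t = a;

  if (taken)
    {
      i = i + 1;
      t = t + 5;
    }
  return t;
}
```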

My conclusion is that if strength reduction is done in value numbering,
it must either be a very limited form of strength reduction, or the kind
of logic I've developed that considers chains of related candidates
together must be glued onto value numbering.  I think the latter would
be a mistake, as it would introduce much unnecessary complexity to what
is currently a very clean approach to PRE; the strength reduction would
become an ugly wart that people would complain about.  I think it's far
cleaner to keep the two issues separate.

 
 Bill, experimenting with pattern detection in eliminate () would be a
 possibility.

For the reasons expressed above, I don't think that would get very far
or make anyone very happy...

I appreciate Andrew's view that value numbering is a logical place to do
strength reduction, but after considering the problem over the last few
months I have to disagree.  If you don't mind, at this point I would
prefer to have my current patch considered on its merits.

Thanks,
Bill

 
 

Re: remove wrong code in immed_double_const

2012-03-21 Thread Michael Matz
Hi,

On Tue, 20 Mar 2012, Mike Stump wrote:

  Actually you did.  I've tried yesterday to come up with a text that 
  would do the same (because I agree with you that deleting the assert 
  changes the spec of the function,
 
 The spec of the function is the text above the definition of the 
 function, coupled with the information in the .texi file, would you 
 agree?

Actually, I wouldn't.  The real spec includes many pieces of information, 
the comments (that can be incomplete or become out of date), the .texi 
docu (which can be even more incomplete and out of date), the code (which 
can conflict with the comments and still be the correct variant) and the 
current usage (which can conflict with everything of the above).  asserts 
are IMO even a nice way of documenting parts of the spec.

 If so, could you please quote the text of the spec which would 
 be violated by removing the assert?  Could you please give a specific 
 value with which we could talk about that shows a violation of the spec.

Richard did so.  If the high bit of i1 is set then currently you will get 
a negative number produced no matter the absolute value of it.  That's IMO 
part of the spec: high bit set means a negative number, with "negative" as
defined by the various routines dealing with CONST_INT or CONST_DOUBLE interpreted 
in the modes allowed for creating them.

If you were to allow modes of larger size than what could be completely 
specified with the two HWI arguments i0 and i1, then it suddenly depends 
on the absolute value if you get a negative or positive number.  For small 
negative numbers (those for which i1 is ~0 and i0 < 0) you'll still get a 
negative CONST_INT.  For large negative numbers you'll get a CONST_DOUBLE, 
that when interpreted in the large requested mode (which is the only thing 
that makes sense) is positive.  It doesn't matter that it's still 
negative when interpreted in a smaller mode.

Hence all values where i1 is between ((HWI)1 << (hwibits-1)) and 
((HWI)~0)-1 are the values you're searching for, that show the problem.  
As you correctly note the routine will of course generate the exact same 
CONST_DOUBLE object no matter the mode given, but they have to be 
interpreted together with the given mode.

This positive/negative inconsistency doesn't make sense to allow, and the 
assert ensures that it isn't allowed.

Now, this inconsistency can also be avoided via different means.  By 
extending the block comment in front of the function for instance, but 
then the assert would make even more sense.  So Richard's proposal to move 
the assert is better: The problem occurs only with large positive or 
negative values (i1 not 0 or ~0), so the mode-size check can be moved 
after the GEN_INT call.

This would have the seemingly strange effect of disallowing too large 
modes only for large values, but that's simply a side-effect of 
CONST_DOUBLE and the whole associated machinery not being able to 
consistently deal with constants wider than 2*HWI_BITS.
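
The two classes of values discussed above can be written down as small predicates.  This is only a sketch of the conditions as stated in this message, assuming a 64-bit HWI; the function names are invented and nothing here is taken from emit-rtl.c.

```c
#include <stdint.h>
#include <stdbool.h>

/* True when the pair (i1:i0) is just the sign extension of i0, i.e.
   the value fits in a single word: i1 is 0 and i0 is non-negative, or
   i1 is ~0 and i0 is negative.  These are the values for which a
   CONST_INT is returned.  */
static bool
fits_single_word (int64_t i0, int64_t i1)
{
  return (i1 == 0 && i0 >= 0) || (i1 == ~(int64_t) 0 && i0 < 0);
}

/* True when the high word lies in the range named above, from
   (HWI)1 << (hwibits-1) up to ((HWI)~0)-1: the sign bit is set but
   the word is not all ones.  */
static bool
high_word_in_problem_range (int64_t i1)
{
  return i1 >= INT64_MIN && i1 <= ~(int64_t) 0 - 1;
}
```

A mode-size check placed after the single-word case would then only be reached for pairs where fits_single_word is false.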

 My position is simple, the spec is what is above the definition and the 
 .texi files, and the stuff inside the definition are interesting 
 implementation details of that spec, which _never_ modify the spec.

As an abstract goal that's good.  But reality is that this isn't the case.  
GCC is quite excellently commented, but it doesn't fit the ideal.  Using 
the fact that it isn't ideal to claim that the spec doesn't say anything 
about large modes (when the assert clearly disallows them) is absurd.

 My position is that 0 is a value which the spec defines, and for which 
 we assert.  Please quote the line from the spec that defines what we do 
 in that case.  I've never seen anyone quote such a line.  To support 
 your position, I will insist on a direct quote from the spec.

This line disallows the value 0 with large modes:

  gcc_assert (GET_MODE_BITSIZE (mode) == 2 * HOST_BITS_PER_WIDE_INT);

I insist on it being part of the spec.  Moving the assert changes the spec 
to allow 0 (and generally small positive and negative numbers) to also be 
generated for larger modes.  If you so want to change the spec nobody 
would be opposed.

 if I is 42, we abort.  To back the position that spec must not be 
 changed, you need to explain at least one thing for which the wrong 
 thing will happen if the spec did change.  If you want to go down that 
 path, you will need to furnish one example where badness happens with 0, 
 not 2, not 3, but 0.

Huh.  Removing the assert wouldn't only allow 0, but also other values.  
I don't understand your argumentation: because for 0 nothing bad happens, 
that proves that nothing bad happens for any other values which we would 
also allow, hence I can remove the assert?  It of course doesn't prove 
anything at all.  In any case, above I have given the values that will be 
problematic (and they don't include 0), and a way of changing the spec to 
disallow only them, instead of all values.  Actually Richard S. did so, I 
just repeated him.

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread Richard Guenther
On Wed, 21 Mar 2012, William J. Schmidt wrote:

 On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote:
  On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski pins...@gmail.com wrote:
   On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
   wschm...@linux.vnet.ibm.com wrote:
   Greetings,
  
   Now that we're into stage 1 again, I'd like to submit the first round of
   changes for dominator-based strength reduction, which will address
   issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
   attaching two patches: the smaller (slsr-part1) is the patch I'm
   submitting for approval today, while the larger (slsr-fyi) is for
   reference only, but may be useful if questions arise about how the small
   patch fits into the intended whole.
  
   This patch contains the logic for identifying strength reduction
   candidates, and makes replacements only for those candidates where the
   stride is a fixed constant.  Replacement for candidates with fixed but
   unknown strides are not implemented herein, but that logic can be viewed
   in the larger patch.  This patch does not address strength reduction of
   data reference expressions, or candidates with conditional increments;
   those issues will be dealt with in future patches.
  
   The cost model is built on the one used by tree-ssa-ivopts.c, and I've
   added some new instruction costs to that model in place.  It might
   eventually be good to divorce that modeling code from IVOPTS, but that's
   an orthogonal patch and somewhat messy.
  
   I think this is the wrong way to do straight line strength reduction
   considering we have a nice value numbering system which should be easy
   to extend to support it.
  
  Well, it is easy to handle very specific easy cases like
  
  a = i * 2;
  b = i * 3;
  c = i * 4;
  
  to transform it to
  
  a = i * 2;
  b = a + i;
  c = b + i;
  
  but already
  
  a = i * 2;
  b = i * 4;
  c = i * 6;
  
  would need extra special code.  The easy case could be handled in eliminate 
  ()
  by, when seeing A * CST, looking up A * (CST - 1) and if that
  succeeds, transform
  it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done 
  this
  simple case at some point, but it failed to handle the common associated 
  cases,
  when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
  3, etc.  I think
  it is the re-association in case of a strength-reduction opportunity
  that makes the
  separate pass better?  How would you suggest handling this case in the
  VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
  a * 2 and one for val + 2?  But then we still don't have a value for a + 1
  to re-use ...
 
 And it becomes even more difficult with more complex scenarios.
 Consider:
 
 a = x + (3 * s);
 b = x + (5 * s);
 c = x + (7 * s);
 
 The framework I've developed recognizes that this group of instructions
 is related, and that it is profitable to replace them as follows:
 
 a = x + (3 * s);
 t = 2 * s;
 b = a + t;
 c = b + t;
 
 The introduced multiply by 2 (one shift) is far cheaper than the two
 multiplies that it replaces.  However, suppose you have instead:
 
 a = x + (2 * s);
 b = x + (8 * s);
 
 Now it isn't profitable to replace this by:
 
 a = x + (2 * s);
 t = 6 * s;
 b = a + t;
 
 since a multiply by 6 (2 shifts, one add) is more costly than a multiply
 by 8 (one shift).  To make these decisions correctly requires analyzing
 all the related statements together, which value numbering as it stands
 is not equipped to do.  Logic to handle these cases is included in my
 larger fyi patch.
 
 As another example, consider conditionally-executed increments:
 
 a = i * 5;
 if (...)
   i = i + 1;
 b = i * 5;
 
 This can be correctly and profitably strength-reduced as:
 
 a = i * 5;
 t = a;
 if (...)
   {
 i = i + 1;
 t = t + 5;
   }
 b = t;
 
 (This is an approximation to the actual phi representation, which I've
 omitted for clarity.)  Again, this kind of analysis is not something
 that fits naturally into value numbering.  I don't yet have this in the
 fyi patch, but have it largely working in a private version.
 
 My conclusion is that if strength reduction is done in value numbering,
 it must either be a very limited form of strength reduction, or the kind
 of logic I've developed that considers chains of related candidates
 together must be glued onto value numbering.  I think the latter would
 be a mistake, as it would introduce much unnecessary complexity to what
 is currently a very clean approach to PRE; the strength reduction would
 become an ugly wart that people would complain about.  I think it's far
 cleaner to keep the two issues separate.

I agree.

  
  Bill, experimenting with pattern detection in eliminate () would be a
  possibility.
 
 For the reasons expressed above, I don't think that would get very far
 or make anyone very happy...
 
 I appreciate Andrew's view that value numbering is a logical place to do
 strength 

Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread Richard Earnshaw
On 21/03/12 13:40, William J. Schmidt wrote:
 On Wed, 2012-03-21 at 10:33 +0100, Richard Guenther wrote:
 On Mon, Mar 19, 2012 at 2:19 AM, Andrew Pinski pins...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 6:12 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
 Greetings,

 Now that we're into stage 1 again, I'd like to submit the first round of
 changes for dominator-based strength reduction, which will address
 issues from PR22586, PR35308, PR46556, and perhaps others.  I'm
 attaching two patches: the smaller (slsr-part1) is the patch I'm
 submitting for approval today, while the larger (slsr-fyi) is for
 reference only, but may be useful if questions arise about how the small
 patch fits into the intended whole.

 This patch contains the logic for identifying strength reduction
 candidates, and makes replacements only for those candidates where the
 stride is a fixed constant.  Replacement for candidates with fixed but
 unknown strides are not implemented herein, but that logic can be viewed
 in the larger patch.  This patch does not address strength reduction of
 data reference expressions, or candidates with conditional increments;
 those issues will be dealt with in future patches.

 The cost model is built on the one used by tree-ssa-ivopts.c, and I've
 added some new instruction costs to that model in place.  It might
 eventually be good to divorce that modeling code from IVOPTS, but that's
 an orthogonal patch and somewhat messy.

 I think this is the wrong way to do straight line strength reduction
 considering we have a nice value numbering system which should be easy
 to extend to support it.

 Well, it is easy to handle very specific easy cases like

 a = i * 2;
 b = i * 3;
 c = i * 4;

 to transform it to

 a = i * 2;
 b = a + i;
 c = b + i;

 but already

 a = i * 2;
 b = i * 4;
 c = i * 6;

 would need extra special code.  The easy case could be handled in eliminate 
 ()
 by, when seeing A * CST, looking up A * (CST - 1) and if that
 succeeds, transform
 it to VAL + A.  Cost issues are increasing the lifetime of VAL.  I've done 
 this
 simple case at some point, but it failed to handle the common associated 
 cases,
 when we transform (a + 1) * 2, (a + 1) * 3, etc. to a * 2 + 2, a * 3 +
 3, etc.  I think
 it is the re-association in case of a strength-reduction opportunity
 that makes the
 separate pass better?  How would you suggest handling this case in the
 VN framework?  Detect the a * 3 + 3 pattern and then do two lookups, one for
 a * 2 and one for val + 2?  But then we still don't have a value for a + 1
 to re-use ...
 
 And it becomes even more difficult with more complex scenarios.
 Consider:
 
 a = x + (3 * s);
 b = x + (5 * s);
 c = x + (7 * s);
 
 The framework I've developed recognizes that this group of instructions
 is related, and that it is profitable to replace them as follows:
 
 a = x + (3 * s);
 t = 2 * s;
 b = a + t;
 c = b + t;
 

Given that CPUs often have shift+add, that's not necessarily best
either.  Also, on pipelined super-scalar systems you're serializing a
problem when it might be better to improve the parallelism.

The best sequence on ARM would probably be something like

a = x + (3 * s);
b = a + (2 * s); (ADD b, a, s, LSL #1)
c = a + (4 * s); (ADD c, a, s, LSL #2).
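
In C terms the sequence looks roughly like the sketch below.  The function name is invented for the example, and unsigned arithmetic is used so that the shifts are well defined; whether a compiler maps each line to a single shifted-operand ADD depends on the target.

```c
/* b and c are both derived from a with one shift-and-add each, so
   neither depends on the other.  */
static void
chain_shift_add (unsigned long x, unsigned long s, unsigned long out[3])
{
  unsigned long a = x + (3 * s);
  unsigned long b = a + (s << 1);	/* a + 2 * s == x + 5 * s */
  unsigned long c = a + (s << 2);	/* a + 4 * s == x + 7 * s */

  out[0] = a;
  out[1] = b;
  out[2] = c;
}
```

Because c is computed from a rather than from b, the last two additions can be issued independently.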


R.



Re: [Patch,AVR]: Hack around PR rtl-optimization/52543, Take #2

2012-03-21 Thread Georg-Johann Lay
Steven Bosscher wrote:
 On Tue, Mar 20, 2012 at 8:54 PM, Georg-Johann Lay a...@gjlay.de wrote:
 Dropping the first patch which does not work because at expand-time there
 must not be pre-/post-modify addressing :-(
 
 Have you tried to fix that, instead? Or at least ask around a bit to
 see what people would think about that idea? The reasons why things
 are the way they are, may not be applicable anymore.

No, I didn't try to fix it. I am not experienced enough in that field.

Moreover, at least as far as avr is concerned, using post-inc would
just be a hack, too.

 For example, perhaps the only reason for not having pre-/post-modify
 addressing modes earlier is that the old flow dataflow framework
 didn't handle them. And it doesn't seem to be so black-and-white: The
 very pass you ran into problems with first, cprop, does handle
 pre-/post-modify addresses in local cprop. Some other passes simply
 take the conservative path and drop pre-/post-modify (like CSE, which

The problems were not only in cprop but also in cselib.

 doesn't record values from them). It may be a relatively small job to
 make everything accept them, and you may be doing something that's also
 helpful for other targets.
 
 Ciao!
 Steven



Re: [PATCH] Straight line strength reduction, part 1

2012-03-21 Thread William J. Schmidt
On Wed, 2012-03-21 at 13:57 +, Richard Earnshaw wrote:
 On 21/03/12 13:40, William J. Schmidt wrote:
  
  And it becomes even more difficult with more complex scenarios.
  Consider:
  
  a = x + (3 * s);
  b = x + (5 * s);
  c = x + (7 * s);
  
  The framework I've developed recognizes that this group of instructions
  is related, and that it is profitable to replace them as follows:
  
  a = x + (3 * s);
  t = 2 * s;
  b = a + t;
  c = b + t;
  
 
 Given that CPUs often have shift+add, that's not necessarily best
 either.  Also, on pipelined super-scalar systems you're serializing a
 problem when it might be better to improve the parallelism.
 
 The best sequence on ARM would probably be something like
 
 a = x + (3 * s);
 b = a + (2 * s); (ADD b, a, s, LSL #1)
 c = a + (4 * s); (ADD c, a, s, LSL #2).
 

These are good points, and I hope you'll keep an eye on this work as it
proceeds.  I should have been less categorical about stating the
profitability of the transformation.  My intent is that the cost model
will reflect the capabilities of the target machine, and for the machine
I'm most familiar with the transformation as shown is best.  Getting to
the optimal sequence that you show for ARM could be an interesting
challenge that might require additional logic in the cost model.  I'll
add it to my list of things to think about.

Thanks,
Bill

 
 R.
 



Re: struct siginfo vs. siginfo_t

2012-03-21 Thread Thomas Schwinge
Hi!

On Thu, 15 Mar 2012 11:57:00 -0400, Carlos O'Donell car...@systemhalted.org 
wrote:
 On Thu, Mar 15, 2012 at 11:05 AM, Thomas Schwinge
 tho...@codesourcery.com wrote:
  On 26 Feb 2012 18:17:52 -, drep...@sourceware.org wrote:
  http://sources.redhat.com/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=4efeffc1d583597e4f52985b9747269e47b754e2
 
  commit 4efeffc1d583597e4f52985b9747269e47b754e2
  Author: Ulrich Drepper drep...@gmail.com
  Date:   Sun Feb 26 13:17:27 2012 -0500
 
      Fix up POSIX testing in conformtest
 
  [...]
  +     * sysdeps/unix/sysv/linux/bits/siginfo.h: Don't name siginfo_t
  +     struct.  [...]
  [...]
 
  diff --git a/sysdeps/unix/sysv/linux/bits/siginfo.h 
  b/sysdeps/unix/sysv/linux/bits/siginfo.h
  index ecef39d..0635e2f 100644
  --- a/sysdeps/unix/sysv/linux/bits/siginfo.h
  +++ b/sysdeps/unix/sysv/linux/bits/siginfo.h
  [...]
  @@ -47,7 +47,7 @@ typedef union sigval
   #  define __SI_PAD_SIZE     ((__SI_MAX_SIZE / sizeof (int)) - 3)
   # endif
 
  -typedef struct siginfo
  +typedef struct
     {
       int si_signo;            /* Signal number.  */
       int si_errno;            /* If non-zero, an errno value associated 
  with
  [...]
 
  This change breaks GCC:
 
     In file included from 
  /scratch/tschwing/FM_sh-linux-gnu-mk2/src/gcc-mainline/libgcc/unwind-dw2.c:377:0:
     ./md-unwind-support.h: In function 'sh_fallback_frame_state':
     ./md-unwind-support.h:182:17: error: field 'info' has incomplete type
 
  In my case, this is really libgcc/config/sh/linux-unwind.h:
 
     [...]
        181            struct rt_sigframe {
        182              struct siginfo info;
        183              struct ucontext uc;
        184            } *rt_ = context->cfa;
     [...]
 
 POSIX says you get siginfo_t *not* struct siginfo, please fix the code.

There is one usage in boehm-gc/os_dep.c, but it is only used if
SUNOS5SIGS is defined, which it is only if one of SUNOS5, DRSNX, HPUX, or
FREEBSD is defined, which are all not using Linux-based glibc ports.

Likewise, gcc/ada/init.c has a struct __siginfo occurrence, but only for
__FreeBSD__.

config/rs6000/linux-unwind.h uses ``char siginfo[128]'', and
config/s390/linux-unwind.h also uses a constant.

I tested the following patch for sh-linux-gnu.  This only covers one
configuration, but the change is pretty mechanical anyway, and every place
that used to refer to struct siginfo must already have had signal.h in
its include path, which is the same file that declares siginfo_t.
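
To illustrate the point outside of libgcc: a minimal sketch, assuming a
POSIX system whose signal.h provides siginfo_t.  The struct and function
names (demo_rt_sigframe, demo_info_size, demo_marker_offset) are made up
for this example and do not appear in any of the unwinder headers; the
int member merely stands in for the ucontext that follows in the real
frames.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical frame layout, for illustration only.  The real
   rt_sigframe definitions live in the per-target linux-unwind.h
   files touched by the patch.  */
struct demo_rt_sigframe
{
  siginfo_t info;   /* POSIX guarantees the typedef, not a struct tag.  */
  int marker;       /* Placeholder for the ucontext that follows.  */
};

/* Size of the signal-info member.  */
size_t
demo_info_size (void)
{
  return sizeof (((struct demo_rt_sigframe *) 0)->info);
}

/* Offset of the member that follows the signal info.  */
size_t
demo_marker_offset (void)
{
  return offsetof (struct demo_rt_sigframe, marker);
}
```

Because only the typedef is used, this compiles the same way whether or
not the C library names the underlying struct.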

OK to commit?  This should probably also go into any active release
branches, to keep them buildable once this glibc change ripples through?

libgcc/
* config/alpha/linux-unwind.h (alpha_fallback_frame_state): Use
siginfo_t instead of struct siginfo.
* config/bfin/linux-unwind.h (bfin_fallback_frame_state): Likewise.
* config/i386/linux-unwind.h (x86_fallback_frame_state): Likewise.
* config/ia64/linux-unwind.h (ia64_fallback_frame_state)
(ia64_handle_unwabi): Likewise.
* config/mips/linux-unwind.h (mips_fallback_frame_state): Likewise.
* config/pa/linux-unwind.h (pa32_fallback_frame_state): Likewise.
* config/sh/linux-unwind.h (shmedia_fallback_frame_state)
(sh_fallback_frame_state): Likewise.
* config/tilepro/linux-unwind.h (tile_fallback_frame_state): Likewise.
* config/xtensa/linux-unwind.h (xtensa_fallback_frame_state): Likewise.

diff --git a/libgcc/config/alpha/linux-unwind.h 
b/libgcc/config/alpha/linux-unwind.h
index 4c811dc..f747053 100644
--- a/libgcc/config/alpha/linux-unwind.h
+++ b/libgcc/config/alpha/linux-unwind.h
@@ -49,7 +49,7 @@ alpha_fallback_frame_state (struct _Unwind_Context *context,
   else if (pc[1] == 0x201f015f)/* lda $0,NR_rt_sigreturn */
 {
   struct rt_sigframe {
-   struct siginfo info;
+   siginfo_t info;
struct ucontext uc;
   } *rt_ = context->cfa;
   sc = &rt_->uc.uc_mcontext;
diff --git a/libgcc/config/bfin/linux-unwind.h 
b/libgcc/config/bfin/linux-unwind.h
index 88c8285..6e8f1ad 100644
--- a/libgcc/config/bfin/linux-unwind.h
+++ b/libgcc/config/bfin/linux-unwind.h
@@ -48,10 +48,10 @@ bfin_fallback_frame_state (struct _Unwind_Context *context,
 {
   struct rt_sigframe {
int sig;
-   struct siginfo *pinfo;
+   siginfo_t *pinfo;
void *puc;
char retcode[8];
-   struct siginfo info;
+   siginfo_t info;
struct ucontext uc;
   } *rt_ = context->cfa;
 
diff --git a/libgcc/config/i386/linux-unwind.h 
b/libgcc/config/i386/linux-unwind.h
index f17a46c..33810c5 100644
--- a/libgcc/config/i386/linux-unwind.h
+++ b/libgcc/config/i386/linux-unwind.h
@@ -139,9 +139,9 @@ x86_fallback_frame_state (struct _Unwind_Context *context,
 {
   struct rt_sigframe {
int sig;
-   struct siginfo *pinfo;
+   siginfo_t *pinfo;
void *puc;
-   struct siginfo info;
+   siginfo_t info;
struct ucontext uc;
  

[RFC][PATCH] A change to do_while_loop_p()

2012-03-21 Thread Razya Ladelsky
Hi,

I need to use do_while_loop_p, but I'm not sure its functionality is what 
I expected it to be.

This is the part that I do not understand:

/* If the header contains just a condition, it is not a do-while loop.  */
  stmt = last_and_only_stmt (loop->header);
  if (stmt
      && gimple_code (stmt) == GIMPLE_COND)
    return false;

The header could contain a condition which is not the loop's exit 
condition but rather a part of its body, so why do we rule out such a 
loop as a do-while loop?

I ran into this in a loop (the outer loop) extracted from bwaves 
benchmark:

  do k=1,nz
 km1=mod(k+nz-2,nz)+1
 kp1=mod(k,nz)+1
 do j=1,ny
jm1=mod(j+ny-2,ny)+1
jp1=mod(j,ny)+1
.
 enddo
  enddo
 
which was translated to:

D.2361_17 = *ny_16(D);

<bb 5>:
  # k_3 = PHI <1(4), k_562(25)>
  if (D.2361_17 > 0)
    goto <bb 8>;
  else
    goto <bb 6>;

<bb 6>:
  k_562 = k_3 + 1;
  # DEBUG k => k_562
  if (k_3 == D.1583_270)
    goto <bb 7>;  <---   return
  else
    goto <bb 25>;

<bb 25>:
  goto <bb 5>;

<bb 8>:  <-- starting the body of the second loop
  pretmp.318_776 = (integer(kind=8)) k_3;
  pretmp.318_777 = stride.92_20 * pretmp.318_776;
... 



bb 5 is the header of the outer loop, and bb 25 is the latch.
According to do_while_loop_p ()  this is NOT a do while loop, but it
seems that it should be.
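
A toy model may make the question concrete.  The sketch below is not GCC
code: the types (toy_block, toy_loop) and functions are hypothetical
stand-ins that only model "the header is a lone condition", "the latch is
empty" and "which block holds the exit test".  It compares the current
check with the proposed one on the outer loop from the dump above.

```c
#include <stddef.h>

enum toy_stmt { TOY_ASSIGN, TOY_COND };

/* A basic block reduced to a short list of statement kinds.  */
struct toy_block
{
  enum toy_stmt stmts[4];
  int n_stmts;
};

struct toy_loop
{
  const struct toy_block *header;
  const struct toy_block *latch;
  const struct toy_block *exit_src;  /* Block with the exit test, or NULL.  */
};

/* Nonzero if the block consists of exactly one conditional.  */
static int
only_cond_p (const struct toy_block *bb)
{
  return bb->n_stmts == 1 && bb->stmts[0] == TOY_COND;
}

/* Current rule: a header that is just a condition disqualifies the loop.  */
int
toy_do_while_old (const struct toy_loop *loop)
{
  if (loop->latch->n_stmts != 0)
    return 0;
  if (only_cond_p (loop->header))
    return 0;
  return 1;
}

/* Proposed rule: disqualify only if that condition is the exit test
   (or if no single exit is known, as in the patch's fallback).  */
int
toy_do_while_new (const struct toy_loop *loop)
{
  if (loop->latch->n_stmts != 0)
    return 0;
  if (only_cond_p (loop->header)
      && (loop->exit_src == NULL || loop->exit_src == loop->header))
    return 0;
  return 1;
}

/* Model of the outer loop: bb 5 is the header (a lone condition),
   bb 6 holds the exit test, bb 25 is the empty latch.  */
int
toy_bwaves_outer (int use_new)
{
  static const struct toy_block bb5 = { { TOY_COND }, 1 };
  static const struct toy_block bb6 = { { TOY_ASSIGN, TOY_COND }, 2 };
  static const struct toy_block bb25 = { { TOY_ASSIGN }, 0 };
  struct toy_loop outer = { &bb5, &bb25, &bb6 };

  return use_new ? toy_do_while_new (&outer) : toy_do_while_old (&outer);
}
```

In this model the old rule rejects the outer loop and the proposed rule
accepts it, which is the difference being asked about.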

 I am attaching a patch to change do_while_loop_p() assuming that what I 
understand is indeed correct,
 Please let me know if I'm right,

Thank you,
Razya

Index: tree-ssa-loop-ch.c
===
--- tree-ssa-loop-ch.c  (revision 185604)
+++ tree-ssa-loop-ch.c  (working copy)
@@ -107,6 +107,8 @@ should_duplicate_loop_header_p (basic_block header
 bool
 do_while_loop_p (struct loop *loop)
 {
+  edge exit_edge;
+  gimple cond_stmt;
   gimple stmt = last_stmt (loop->latch);
 
   /* If the latch of the loop is not empty, it is not a do-while loop.  */
@@ -116,8 +118,14 @@ do_while_loop_p (struct loop *loop)
 
   /* If the header contains just a condition, it is not a do-while loop.  */
   stmt = last_and_only_stmt (loop->header);
+  exit_edge = single_dom_exit (loop);
+  if (exit_edge)
+    cond_stmt = last_stmt (exit_edge->src);
+  else
+    cond_stmt =stmt;
   if (stmt
-      && gimple_code (stmt) == GIMPLE_COND)
+      && gimple_code (stmt) == GIMPLE_COND
+      && stmt == cond_stmt)
     return false;
 
   return true;
=

[fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Rainer Orth
As reported in PR other/52626, make check in fixincludes is currently
failing since I neglected to adapt the baseline for the Solaris 8
removal ;-(  I always meant to run make check, but forgot.

On the other hand, it would be really helpful if fixincludes make check
could emit DejaGnu-style fixincludes.{sum, log} files which would
automatically be picked up by make mail-report.log and make failures
immediately obvious.

The following patch fixes this, tested with make check in fixincludes on
i386-pc-solaris2.11.

Ok for mainline?

Rainer


2012-03-21  Rainer Orth  r...@cebitec.uni-bielefeld.de

PR other/52626
* tests/base/pthread.h [SOLARIS_MUTEX_INIT_2_CHECK]
(PTHREAD_COND_INITIALIZER): Adapt for solaris_cond_init removal.

# HG changeset patch
# Parent aa297e98c9489d9734c7a503fc3275455c33985e
Fix pthread.h failure (PR other/52626)

diff --git a/fixincludes/tests/base/pthread.h b/fixincludes/tests/base/pthread.h
--- a/fixincludes/tests/base/pthread.h
+++ b/fixincludes/tests/base/pthread.h
@@ -83,9 +83,9 @@ extern int __sigsetjmp (struct __jmp_buf
 #define PTHREAD_MUTEX_INITIALIZER	{{{0},0}, {{{0}}}, {0}}
 #endif
 #if __STDC__ - 0 == 0 && !defined(_NO_LONGLONG)
-#define PTHREAD_COND_INITIALIZER	{{{0}, 0, 0x4356}, 0}	/* DEFAULTCV */
+#define PTHREAD_COND_INITIALIZER	{{{0}, 0}, 0}	/* DEFAULTCV */
 #else
-#define PTHREAD_COND_INITIALIZER	{{{0}, 0, 0x4356}, {0}}	/* DEFAULTCV */
+#define PTHREAD_COND_INITIALIZER	{{{0}, 0}, {0}}	/* DEFAULTCV */
 #endif
 #if __STDC__ - 0 == 0 && !defined(_NO_LONGLONG)
 #define	PTHREAD_MUTEX_INITIALIZER		/* = DEFAULTMUTEX */	\

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: remove wrong code in immed_double_const

2012-03-21 Thread Mike Stump
On Mar 21, 2012, at 6:47 AM, Michael Matz wrote:
 Actually, I wouldn't.

Ok, thanks for explaining.  In light of that, I'd just say, I want to change 
the spec, the details don't change any for me, only the terminology I might 
use.  The problem is the spec is causing aborts on valid code, and hence, 
either, the code must be duplicated and fixed, or the code has to be fixed.  I 
don't see any value in duplicating the code, so, I am left with fixing the spec 
so that valid programs produce valid code.

 If the high bit of i1 is set then currently you will get 
 a negative number produced no matter the absolute value of it.

Ok, in the new patch, I'm pushing to change the spec so that the value is sign 
extended and fixing all the code that doesn't conform to that spec.  Richard 
seems to be agreeable with the basic idea, though, we are now sorting out all 
the little details to make that happen.  If there is any down-side or details 
we missed or got wrong, please chime in.

 For large negative numbers you'll get a CONST_DOUBLE, 
 that when interpreted in the large requested mode (which is the only thing 
 that makes sense) is positive.

In the new patch, we use sign extension, and when the high bit is set, the 
value is interpreted as a negative number in a larger mode.  I'll test signed 
and unsigned constants coming in from above to ensure the right thing happens.  
Above, the signedness is tracked via the type.  In the rtl constant, it isn't, 
so that code will need an assert to prevent large unsigned values from taking 
this code path.
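
As a rough illustration of the sign-extension rule (not the actual
immed_double_const code): the sketch below assumes int64_t plays the role
of HOST_WIDE_INT, and toy_double_int / toy_sign_extend are names invented
for this example.

```c
#include <stdint.h>

/* Toy stand-in for a two-word constant.  */
struct toy_double_int
{
  int64_t low;
  int64_t high;
};

/* Widen LOW to two words by sign extension: the high word is all
   ones when the top bit of LOW is set, and zero otherwise.  */
struct toy_double_int
toy_sign_extend (int64_t low)
{
  struct toy_double_int r;

  r.low = low;
  r.high = low < 0 ? -1 : 0;
  return r;
}
```

Under this rule a low word with the high bit set always denotes a
negative value in the wider mode, which is why a large unsigned value
needs to be caught before it reaches this path.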

 Hence all values where i1 is between ((HWI)1 << (hwibits-1)) and 
 ((HWI)~0)-1 are the values you're searching for, that show the problem.

Presently, I am fixing _all_ problems shown with those values.  If you know of 
any that we don't address, love to hear about them.

 This positive/negative inconsistency doesn't make sense to allow, and the 
 assert ensures that it isn't allowed.

We don't need the assert when there is no inconsistency; I believe that resolving 
any inconsistencies should remove the need for the assert.

 This would have the seemingly strange effect of disallowing too large 
 modes only for large values, but that's simply a side-effect of 
 CONST_DOUBLE and the whole associated machinery not being able to 
 consistently deal with constants wider than 2*HWI_BITS.

I'll move that assert up to the code that has the type information for the 
constant.

 if I is 42, we abort.  To back the position that spec must not be 
 changed, you need to explain at least one thing for which the wrong 
 thing will happen if the spec did change.  If you want to go down that 
 path, you will need to furnish one example where badness happens with 0, 
 not 2, not 3, but 0.
 
 Huh.  Removing the assert wouldn't only allow 0, but also other values.  
 I don't understand your argumentation: because for 0 nothing bad happens, 
 that proves that nothing bad happens for any other values which we would 
 also allow, hence I can remove the assert?

Right, it merely proves that the assert is wrong and needs fixing.  Once you 
accept that, then we progress on exactly what it should be.  This is now all 
mostly moot, given that I'm fine with changing the spec as Richard suggested to 
be a sign-extended constant.  Once we have that nice and concrete definition, 
then any code that conflicts with it is buggy, and we just fix it.  Seems like a 
nice way forward to me.

 It of course doesn't prove anything at all.

:-)  Only the point I wanted to make; that 0 is safe.  As such, it proves that 
the spec, as you might call it, is wrong.  Once that spec is proven wrong, 
trivially, fixing it, isn't unreasonable.


[IA-64] Work around bug in unwinder

2012-03-21 Thread Eric Botcazou
Another latent issue exposed on IA-64 (both Linux and VMS) by GCC 4.7: the LC 
(Loop Counter) register isn't preserved by the unwinder.

The compiler generates unwind info for LC and unwind-ia64.c:uw_install_context 
restores it if this is deemed necessary.  The hitch is that "deemed necessary" 
means "register saved at some point along the path between thrower and catcher 
and going through _Unwind_RaiseException".  Now if a register isn't saved along 
this path but clobbered very late, namely in uw_install_context, then nothing 
restores it before the longjmp.

unwind-ia64.c:uw_install_context reads:

static void __attribute__((noreturn))
uw_install_context (struct _Unwind_Context *current __attribute__((unused)),
                    struct _Unwind_Context *target)
{
  unsigned long ireg_buf[4], ireg_nat = 0, ireg_pr = 0;
  long i;

  /* Copy integer register data from the target context to a
     temporary buffer.  Do this so that we can frob AR.UNAT
     to get the NaT bits for these registers set properly.  */
  for (i = 4; i <= 7; ++i)
    {
      char nat;
      void *t = target->ireg[i - 2].loc;
      if (t)
        {
          unw_access_gr (target, i, &ireg_buf[i - 4], &nat, 0);
          ireg_nat |= (long)nat << (((size_t)&ireg_buf[i - 4] >> 3) & 0x3f);
          /* Set p6 - p9.  */
          ireg_pr |= 4L << i;
        }
    }

and it clobbers LC because of the loop when compiled with GCC 4.7 and above.

Bootstrapped/regtested on IA-64/Linux, OK for the mainline?  Do we also want it 
for 4.7.1?


2012-03-21  Eric Botcazou  ebotca...@adacore.com

* config/ia64/unwind-ia64.c (uw_install_context): Manually save LC
if it hasn't been previously saved.


-- 
Eric Botcazou
Index: libgcc/config/ia64/unwind-ia64.c
===
--- libgcc/config/ia64/unwind-ia64.c	(revision 185395)
+++ libgcc/config/ia64/unwind-ia64.c	(working copy)
@@ -2171,8 +2171,20 @@ uw_install_context (struct _Unwind_Conte
 		struct _Unwind_Context *target)
 {
   unsigned long ireg_buf[4], ireg_nat = 0, ireg_pr = 0;
+  unsigned long saved_lc;
   long i;
 
+  /* ??? LC is a fixed register so the call to __builtin_unwind_init in
+ uw_init_context doesn't cause it to be saved.  In case it isn't in
+ the user frames either, we need to manually do it here, lest it be
+ clobbered by the loop just below.  */
+  if (target->lc_loc == NULL)
+    {
+      register unsigned long lc asm ("ar.lc");
+      saved_lc = lc;
+      target->lc_loc = &saved_lc;
+    }
+
   /* Copy integer register data from the target context to a
  temporary buffer.  Do this so that we can frob AR.UNAT
  to get the NaT bits for these registers set properly.  */


Re: [fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Bruce Korb
Hi Rainer,

On Wed, Mar 21, 2012 at 9:25 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 As reported in PR other/52626, make check in fixincludes is currently
 failing since I neglected to adapt the baseline for the Solaris 8
 removal ;-(  I always meant to run make check, but forgot.

 On the other hand, it would be really helpful if fixincludes make check
 could emit DejaGnu-style fixincludes.{sum, log} files which would
 automatically be picked up by make mail-report.log and make failures
 immediately obvious.

Patch welcome!  I, myself, don't know what "emit DejaGnu-style
fixincludes.{sum, log} files" would mean.

 The following patch fixes this, tested with make check in fixincludes on
 i386-pc-solaris2.11.

 Ok for mainline?

I'm sure you examined the difference by hand and confirmed that the change is
expected.  Then yes, please, by all means.  Thank you!  - Bruce


[pph] Fix x1mbstate_t.h (issue5872043)

2012-03-21 Thread Diego Novillo
Fix x1mbstate_t.h.

This patch fixes the parser segmentation fault caused by a name
lookup failure (details in
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01369.html).

I am not 100% sure that this is the right fix, but Jason seems to
think that the theory behind this is fine (the parser does not allow a
USING_DECL to be set in the bindings of an identifier).  I left a
FIXME note to help future debugging.


2012-03-21   Diego Novillo  dnovi...@google.com

cp/ChangeLog.pph
* name-lookup.c (pph_set_namespace_decl_binding): Ignore
USING_DECLs.

testsuite/ChangeLog.pph
* g++.dg/pph/x1mbstate_t.h: Mark fixed.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 947708d..1b33ce3 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -6271,7 +6271,11 @@ pph_set_namespace_decl_binding (tree decl, 
cp_binding_level *bl, int flags)
 {
   /* Set the namespace identifier binding for a single decl.  */
   tree id = DECL_NAME (decl);
-  if (id)
+  /* FIXME pph.  USING_DECLs do not seem to be used in bindings by
+ the parser. This was causing the SEGV in
+ testsuite/g++.dg/pph/x1mbstate_t.h.  It's unclear whether this is
+ the right fix.  */
+  if (id && TREE_CODE (decl) != USING_DECL)
 pph_set_identifier_binding (id, decl, bl, flags);
 }
 
diff --git a/gcc/testsuite/g++.dg/pph/x1mbstate_t.h 
b/gcc/testsuite/g++.dg/pph/x1mbstate_t.h
index c07a0cc..4d473e4 100644
--- a/gcc/testsuite/g++.dg/pph/x1mbstate_t.h
+++ b/gcc/testsuite/g++.dg/pph/x1mbstate_t.h
@@ -1,9 +1,8 @@
-// { dg-xfail-if identifier bindings not set properly { *-*-* } { 
-fpph-map=pph.map} }
-// { dg-bogus .*Segmentation fault ICE trying to parse std::mbstate_t  { 
xfail *-*-* } 0 }
 #ifndef _X1_MBSTATE_H
 #define _X1_MBSTATE_H
 #include "x0mbstate_t.h"
-// Name lookup for std::mbstate_t fails here.  Instead of returning the global
-// type_decl for mbstate_t, it returns the usings ::mbstate_t declaration.
+// Name lookup for std::mbstate_t was failingfails here.  Instead of returning
+// the global type_decl for mbstate_t, it was returning the
+// usings ::mbstate_t declaration.
 typedef std::mbstate_t state_type;
 #endif

--
This patch is available for review at http://codereview.appspot.com/5872043


[PATCH] Partial Transition fix attempt

2012-03-21 Thread redbrain
This is an attempt in tree-mudflap.c to change usage of fold_convert to 
build_int_cst (see http://gcc.gnu.org/wiki/Partial_Transitions). I am not 100% 
sure this is correct, but maybe someone can shed some light.

---
 gcc/tree-mudflap.c |   38 --
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-mudflap.c b/gcc/tree-mudflap.c
index e4f6ec0..684b7b4 100644
--- a/gcc/tree-mudflap.c
+++ b/gcc/tree-mudflap.c
@@ -567,8 +567,7 @@ mf_build_check_statement_for (tree base, tree limit,
 
   /* Build: __mf_base = (uintptr_t) base address expression.  */
   seq = gimple_seq_alloc ();
-  t = fold_convert_loc (location, mf_uintptr_type,
-   unshare_expr (base));
+  t = build_int_cst_type (mf_uintptr_type, TREE_INT_CST_LOW (unshare_expr 
(base)));
   t = force_gimple_operand (t, stmts, false, NULL_TREE);
   gimple_seq_add_seq (seq, stmts);
   g = gimple_build_assign (mf_base, t);
@@ -576,8 +575,7 @@ mf_build_check_statement_for (tree base, tree limit,
   gimple_seq_add_stmt (seq, g);
 
   /* Build: __mf_limit = (uintptr_t) limit address expression.  */
-  t = fold_convert_loc (location, mf_uintptr_type,
-   unshare_expr (limit));
+  t = build_int_cst_type (mf_uintptr_type, TREE_INT_CST_LOW (unshare_expr 
(limit)));
   t = force_gimple_operand (t, stmts, false, NULL_TREE);
   gimple_seq_add_seq (seq, stmts);
   g = gimple_build_assign (mf_limit, t);
@@ -853,18 +851,21 @@ mf_xform_derefs_1 (gimple_stmt_iterator *iter, tree *tp,
if (elt)
  elt = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (elt)),
elt);
-addr = fold_convert_loc (location, ptr_type_node, elt ? elt : 
base);
+   addr = build_int_cst_type (ptr_type_node, elt ? TREE_INT_CST_LOW 
(elt) :
+  TREE_INT_CST_LOW (base));
 addr = fold_build_pointer_plus_loc (location,
addr, byte_position (field));
   }
 else
   addr = build1 (ADDR_EXPR, build_pointer_type (type), t);
-
+   
 limit = fold_build2_loc (location, MINUS_EXPR, mf_uintptr_type,
- fold_build2_loc (location, PLUS_EXPR, 
mf_uintptr_type,
- fold_convert (mf_uintptr_type, addr),
- size),
- integer_one_node);
+fold_build2_loc (location, PLUS_EXPR, 
mf_uintptr_type,
+ build_int_cst_type 
(mf_uintptr_type,
+ 
TREE_INT_CST_LOW (addr)),
+ 
+ size),
+integer_one_node);
   }
   break;
 
@@ -908,17 +909,17 @@ mf_xform_derefs_1 (gimple_stmt_iterator *iter, tree *tp,
   return;
 
 bpu = bitsize_int (BITS_PER_UNIT);
-ofs = fold_convert (bitsizetype, TREE_OPERAND (t, 2));
+   ofs = build_int_cst_type (bitsizetype, TREE_INT_CST_LOW (TREE_OPERAND 
(t, 2)));
 rem = size_binop_loc (location, TRUNC_MOD_EXPR, ofs, bpu);
 ofs = size_binop_loc (location, TRUNC_DIV_EXPR, ofs, bpu);
 
-size = fold_convert (bitsizetype, TREE_OPERAND (t, 1));
+size = build_int_cst_type (bitsizetype, TREE_INT_CST_LOW (TREE_OPERAND 
(t, 1)));
 size = size_binop_loc (location, PLUS_EXPR, size, rem);
 size = size_binop_loc (location, CEIL_DIV_EXPR, size, bpu);
-size = fold_convert (sizetype, size);
+size = build_int_cst_type (sizetype, TREE_INT_CST_LOW (size));
 
 addr = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
-addr = fold_convert (ptr_type_node, addr);
+addr = build_int_cst_type (ptr_type_node, TREE_INT_CST_LOW (addr));
 addr = fold_build_pointer_plus_loc (location, addr, ofs);
 
 base = addr;
@@ -1048,8 +1049,8 @@ mx_register_decls (tree decl, gimple_seq seq, location_t 
location)
 
  /* Variable-sized objects should have sizes already been
 gimplified when we got here. */
- size = fold_convert (size_type_node,
-  TYPE_SIZE_UNIT (TREE_TYPE (decl)));
+ size = build_int_cst_type (size_type_node,
+TREE_INT_CST_LOW (TYPE_SIZE_UNIT 
(TREE_TYPE (decl;
  gcc_assert (is_gimple_val (size));
 
 
@@ -1233,11 +1234,12 @@ mudflap_register_call (tree obj, tree object_size, tree 
varname)
   tree arg, call_stmt;
 
   arg = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (obj)), obj);
-  arg = fold_convert (ptr_type_node, arg);
+  arg = build_int_cst_type (ptr_type_node, TREE_INT_CST_LOW (arg));
 
   call_stmt = build_call_expr (mf_register_fndecl, 4,
   arg,
-  fold_convert 

Re: [C11-atomic] test invalid hoist across and acquire load

2012-03-21 Thread Andrew MacLeod

On 03/21/2012 01:35 PM, Aldy Hernandez wrote:
In the test below, we cannot cache [x] or [y] before the load of flag1 
or before the load of flag2.  This is because the corresponding 
store/release can flush a different value of x or y:


+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+    i = x + y;
+
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+    a = 10;
+  j = x + y;



Actually, does it need to be that complicated?

can't you simply have the other_thread process monotonically increase 
x by 1 every cycle?


then if the load is hoisted and commoned, 
simulate_thread_final_verify() can simply check whether i == j; if so, it 
knows that x was loaded as a common value and reused when calculating 
j.  With the other thread increasing x every cycle, they should never 
be the same value.
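
A single-threaded sketch of that idea, assuming nothing about the real
simulate-thread harness: toy_x, toy_other_thread_step and
toy_reads_match are names invented here, and the "hoisted" flag merely
models a compiler that reuses the first load.

```c
/* Shared variable bumped by the simulated "other thread".  */
static unsigned int toy_x;

/* One cycle of the other thread: monotonically increase x.  */
static void
toy_other_thread_step (void)
{
  toy_x++;
}

/* Perform the two reads.  If HOISTED is nonzero, model a compiler
   that reuses the first load of x for the second value.  Return
   nonzero when i == j, i.e. when the verifier should report a
   failure.  */
int
toy_reads_match (int hoisted)
{
  unsigned int cached, i, j;

  toy_x = 1;
  cached = toy_x;                 /* First load of x.  */
  i = cached;
  toy_other_thread_step ();       /* Other thread runs in between.  */
  j = hoisted ? cached : toy_x;   /* Reuse versus reload.  */
  return i == j;
}
```

With a reload the two values differ by construction, so equality can
only come from a reused load.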


Andrew


Re: [C11-atomic] test invalid hoist across and acquire load

2012-03-21 Thread Andrew MacLeod

On 03/21/2012 01:35 PM, Aldy Hernandez wrote:


The pass at fault here is the combine stack adjustment RTL pass.  I 
have not looked into why this is happening, but I wanted to get this 
test into the branch lest we forget about it.


Is this OK for the branch?  Is my understanding correct?


Fine for the C11-atomic branch..  we'll accumulate all the failing tests 
there for now... and then address them when we go memory-model 
compliance hunting...


Andrew


[google][4.6]Bump param value of default function size limit for auto cloning

2012-03-21 Thread Sriraman Tallam
Hi,

  I am bumping up the default param value of  function size limit for
auto cloning. Since auto cloning happens on inlined functions, the
original value does not catch some cases in one of our benchmarks.

  Automatic function versioning is only available in the
google/gcc-4_6 branch. I am working on porting this to trunk. Please
see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01230.html for
description.

Thanks,
-Sri.


Index: params.def
===
--- params.def  (revision 185514)
+++ params.def  (working copy)
@@ -1040,7 +1040,7 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
 DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
  "autoclone-function-size-limit",
  "Do not auto clone functions beyond this size.",
- 450, 0, 10)
+ 5000, 0, 10)

 /*
 Local variables:


[C11-atomic] test invalid hoist across and acquire load

2012-03-21 Thread Aldy Hernandez
In the test below, we cannot cache [x] or [y] before the load of flag1 
or before the load of flag2.  This is because the corresponding 
store/release can flush a different value of x or y:


+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+    i = x + y;
+
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+    a = 10;
+  j = x + y;

For example, on x86-64, we are hoisting x and y before the load of 
flag2:


        movl    flag1(%rip), %eax
        movl    x(%rip), %edx   <-- hoist of X
        testl   %eax, %eax
        movl    y(%rip), %eax   <-- hoist of Y
        je      .L2
        leal    (%edx,%eax), %ecx
        movl    %ecx, i(%rip)
.L2:
        movl    flag2(%rip), %ecx
        testl   %ecx, %ecx
        je      .L3
        movl    $10, a(%rip)
.L3:
        addl    %edx, %eax      <-- x/y may have changed by the
                                    acquire of flag2.
        movl    %eax, j(%rip)
        ret

(For that matter, we are also hoisting x before the actual test of 
flag1 as well, but I believe this is allowed since flag1 has already 
been loaded.)
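
For reference, the release/acquire pairing this relies on can be
sketched with the same __atomic builtins.  This is only an illustration:
toy_flag, toy_payload, toy_publish and toy_consume are hypothetical
names, and the two functions are meant to run in different threads even
though nothing here creates one.

```c
static int toy_flag;      /* Synchronization flag.  */
static int toy_payload;   /* Ordinary data published via the flag.  */

/* Writer side: store the data, then release the flag.  The release
   store makes the earlier write to toy_payload visible to any
   thread whose acquire load observes the flag as set.  */
void
toy_publish (int value)
{
  toy_payload = value;
  __atomic_store_n (&toy_flag, 1, __ATOMIC_RELEASE);
}

/* Reader side: the payload must be read after the acquire load,
   never from a copy loaded before it.  */
int
toy_consume (int fallback)
{
  if (__atomic_load_n (&toy_flag, __ATOMIC_ACQUIRE))
    return toy_payload;
  return fallback;
}
```

A compiler that loaded toy_payload ahead of the acquire load in
toy_consume would be making the same kind of hoist as the one shown in
the assembly above.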


The pass at fault here is the combine stack adjustment RTL pass.  I have 
not looked into why this is happening, but I wanted to get this test 
into the branch lest we forget about it.


Is this OK for the branch?  Is my understanding correct?

Aldy

Index: atomic-hoist-1.c
===
--- atomic-hoist-1.c(revision 0)
+++ atomic-hoist-1.c(revision 0)
@@ -0,0 +1,96 @@
+/* { dg-do link } */
+/* { dg-require-effective-target sync_int_long } */
+/* { dg-final { simulate-thread } } */
+
+/* Test that a hoist is not performed across an acquire barrier.  */
+
+#include <stdio.h>
+#include "simulate-thread.h"
+
+int flag1=1, flag2=1;
+
+unsigned int x=1, y=2, i=0x1234, j=0x5678, a;
+
+/* These two tables are random numbers such that there are no two
+   pairs between the both tables that yield the same sum.  */
+
+unsigned int table1[16] = {
+  24747, 19512, 3692, 25985,
+  25079, 24, 3310, 22073,
+  4026, 25641, 35240, 35542,
+  24783, 17378, 12184, 23755
+};
+
+unsigned int table2[16] = {
+  2467, 37461, 14064, 36460,
+  46434, 8387, 42174, 36763,
+  49205, 48759, 10526, 3446,
+  14035, 2195, 6798, 38782
+};
+
+int table_cycle_size = 16;
+
+/* At each instruction, get a new X and Y from the tables to later
+   verify that we have not reused a value incorrectly.  */
+void simulate_thread_other_threads ()
+{
+  static int current = 0;
+
+  if (++current >= table_cycle_size)
+    current = 0;
+  x = table1[current];
+  y = table2[current];
+}
+
+/* Return true if error, otherwise 0.  */
+int verify_result ()
+{
+  /* [i] should not equal [j], because that would mean that we hoisted
+ [x] and [y] instead of loading them again.  */
+  int fail = i == j;
+  if (fail)
+    printf("FAIL: i (%u) should not equal j (%u)\n", i, j);
+  return fail;
+}
+
+int simulate_thread_step_verify ()
+{
+  return verify_result ();
+}
+
+int simulate_thread_final_verify ()
+{
+  return verify_result ();
+}
+
+__attribute__((noinline))
+void simulate_thread_main()
+{
+  /* The values of x or y should not be hoisted across reads of
+ flag[12].
+
+ For example, when the second load below synchronizes with another
+ thread, the synchronization is with a release, and that release
+ may cause a stored value of x/y to be flushed and become visible.
+ So, for this case, it is incorrect for CSE/CSA/and-others to
+ hoist x or y above the load of flag2.  */
+
+  /* Execute loads with value changing at various cyclic values.  */
+  if (__atomic_load_n (&flag1, __ATOMIC_ACQUIRE))
+    i = x + y;
+ 
+  if (__atomic_load_n (&flag2, __ATOMIC_ACQUIRE))
+    a = 10;
+  j = x + y;
+
+  /* Since x and y have been changing at each instruction above, i and j
+ should be different.  If they are the same, we have hoisted
+ something incorrectly.  */
+}
+
+main()
+{
+  simulate_thread_main ();
+  simulate_thread_done ();
+  return 0;
+}


Re: [google][4.6]Bump param value of default function size limit for auto cloning

2012-03-21 Thread Xinliang David Li
ok.

thanks,

David

On Wed, Mar 21, 2012 at 11:20 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

  I am bumping up the default param value of  function size limit for
 auto cloning. Since auto cloning happens on inlined functions, the
 original value does not catch some cases in one of our benchmarks.

  Automatic function versioning is only available in the
 google/gcc-4_6 branch. I am working on porting this to trunk. Please
 see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01230.html for
 description.

 Thanks,
 -Sri.


 Index: params.def
 ===
 --- params.def  (revision 185514)
 +++ params.def  (working copy)
 @@ -1040,7 +1040,7 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
  DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
           "autoclone-function-size-limit",
           "Do not auto clone functions beyond this size.",
 -         450, 0, 10)
 +         5000, 0, 10)

  /*
  Local variables:


[PATCH, alpha]: Use generic config/elfos.h headers

2012-03-21 Thread Uros Bizjak
Hello!

Attached patch enables alpha to use generic config/elfos.h headers on
linux and *bsd targets. The most important difference to generic
elfos.h is in

* config/alpha/elf.h (TARGET_ASM_FILE_START_FILE_DIRECTIVE): Undefine.

and

* config/alpha/alpha.h (NO_DOLLAR_IN_LABEL): Undefine.

Otherwise, the patch is very carefully written to not change anything
in alpha.h (it only #undefines some symbols before #define and removes
READONLY_DATA_SECTION_ASM_OP, which is always defined elsewhere), so the VMS
target should continue to work without problems.
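
The #undef-before-#define pattern can be shown in isolation.  In this
sketch TOY_USER_LABEL_PREFIX and toy_user_label_prefix are made-up
names; the first definition stands in for whatever a generic header
such as elfos.h might provide, which is an assumption of the example
rather than a statement about its contents.

```c
/* Stand-in for a definition inherited from a generic header.  */
#define TOY_USER_LABEL_PREFIX "_"

/* Target header: drop the inherited definition first, then install
   the target's own.  Without the #undef, redefining the macro with
   a different body draws a diagnostic.  */
#undef TOY_USER_LABEL_PREFIX
#define TOY_USER_LABEL_PREFIX ""

/* Report the prefix that ends up in effect.  */
const char *
toy_user_label_prefix (void)
{
  return TOY_USER_LABEL_PREFIX;
}
```

The #undef is harmless when no earlier definition exists, which is what
lets the same header serve targets that do not include the generic one.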

2012-03-21  Uros Bizjak  ubiz...@gmail.com

* config.gcc (alpha*-*-linux*): Add elfos.h to tm_file.
(alpha*-*-freebsd*): Ditto.
(alpha*-*-netbsd*): Ditto.
(alpha*-*-openbsd*): Ditto.
* config/alpha/elf.h (OBJECT_FORMAT_ELF): Remove.
(DWARF2_DEBUGGING_INFO): Remove.
(PREFERRED_DEBUGGING_TYPE): Remove.
(ASM_FINAL_SPEC): Remove.
(IDENT_ASM_OP): Remove.
(ASM_OUTPUT_IDENT): Remove.
(SKIP_ASM_OP): Remove.
(ASM_OUTPUT_SKIP): Remove.
(ALIGN_ASM_OP): Remove.
(ASM_OUTPUT_BEFORE_CASE_LABEL): Remove.
(ASM_OUTPUT_CASE_LABEL): Remove.
(ASM_OUTPUT_EXTERNAL_LIBCALL): Remove.
(COMMON_ASM_OP): Remove.
(ASM_OUTPUT_ALIGNED_COMMON): Remove.
(ASCII_DATA_ASM_OP): Remove.
(READONLY_DATA_SECTION_ASM_OP): Remove.
(INIT_SECTION_ASM_OP): Remove.
(FINI_SECTION_ASM_OP): Remove.
(ASM_SECTION_START_OP): Remove.
(ASM_OUTPUT_SECTION_START_FILE): Remove.
(TARGET_ASM_NAMED_SECTION): Remove.
(TARGET_ASM_SELECT_SECTION): Remove.
(MAKE_DECL_ONE_ONLY): Remove.
(TYPE_ASM_OP): Remove.
(SIZE_ASM_OP): Remove.
(ASM_WEAKEN_LABEL): Remove.
(TYPE_OPERAND_FMT): Remove.
(ASM_DECLARE_RESULT): Remove.
(ASM_DECLARE_OBJECT_NAME): Remove.
(ASM_FINISH_DECLARE_OBJECT): Remove.
(ELF_ASCII_ESCAPES): Remove.
(ELF_STRING_LIMIT): Remove.
(STRING_ASM_OP): Remove.
(ASM_OUTPUT_EXTERNAL): Remove.
(TARGET_ASM_FILE_START_FILE_DIRECTIVE): Undefine.
* config/alpha/alpha.h (PCC_BITFIELD_TYPE_MATTERS): Undefine
before define.
(ASM_DECLARE_FUNCTION_NAME): Ditto.
(ASM_DECLARE_FUNCTION_SIZE): Ditto.
(USER_LABEL_PREFIX): Ditto.
(ASM_GENERATE_INTERNAL_LABEL): Ditto.
(ASM_OUTPUT_ASCII): Ditto.
(ASM_OUTPUT_SKIP): Ditto.
(READONLY_DATA_SECTION_ASM_OP): Remove.
(ASM_OUTPUT_CASE_LABEL): Remove.
(NO_DOLLAR_IN_LABEL): Undefine.

Patch was bootstrapped and regression tested on alphaev68-pc-linux-gnu.

OK for mainline?

Uros.
Index: config/alpha/alpha.h
===
--- config/alpha/alpha.h(revision 185578)
+++ config/alpha/alpha.h(working copy)
@@ -284,6 +284,7 @@
 #define STRUCTURE_SIZE_BOUNDARY 8
 
 /* A bit-field declared as `int' forces `int' alignment for the struct.  */
+#undef PCC_BITFIELD_TYPE_MATTERS
 #define PCC_BITFIELD_TYPE_MATTERS 1
 
 /* No data type wants to be aligned rounder than this.  */
@@ -700,11 +701,13 @@
 
 /* This macro produces the initial definition of a function.  */
 
+#undef ASM_DECLARE_FUNCTION_NAME
 #define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \
   alpha_start_function(FILE,NAME,DECL);
 
 /* This macro closes up a function definition for the assembler.  */
 
+#undef ASM_DECLARE_FUNCTION_SIZE
 #define ASM_DECLARE_FUNCTION_SIZE(FILE,NAME,DECL) \
   alpha_end_function(FILE,NAME,DECL)
 
@@ -982,10 +985,6 @@
 
 #define TEXT_SECTION_ASM_OP "\t.text"
 
-/* Output before read-only data.  */
-
-#define READONLY_DATA_SECTION_ASM_OP "\t.rdata"
-
 /* Output before writable data.  */
 
 #define DATA_SECTION_ASM_OP "\t.data"
@@ -1022,12 +1021,14 @@
 
 /* The prefix to add to user-visible assembler symbols.  */
 
+#undef USER_LABEL_PREFIX
 #define USER_LABEL_PREFIX ""
 
 /* This is how to output a label for a jump table.  Arguments are the same as
for (*targetm.asm_out.internal_label), except the insn for the jump table is
passed.  */
 
+#undef ASM_OUTPUT_CASE_LABEL
 #define ASM_OUTPUT_CASE_LABEL(FILE,PREFIX,NUM,TABLEINSN)   \
 { ASM_OUTPUT_ALIGN (FILE, 2); (*targetm.asm_out.internal_label) (FILE, PREFIX, NUM); }
 
@@ -1036,12 +1037,14 @@
PREFIX is the class of label and NUM is the number within the class.
This is suitable for output with `assemble_name'.  */
 
+#undef ASM_GENERATE_INTERNAL_LABEL
 #define ASM_GENERATE_INTERNAL_LABEL(LABEL,PREFIX,NUM)  \
   sprintf ((LABEL), "*$%s%ld", (PREFIX), (long)(NUM))
 
 /* We use the default ASCII-output routine, except that we don't write more
than 50 characters since the assembler doesn't support very long lines.  */
 
+#undef ASM_OUTPUT_ASCII
 #define ASM_OUTPUT_ASCII(MYFILE, MYSTRING, MYLENGTH) \
   do {   \
 FILE 

[patch][PR52640] Fix quadratic behavior with many referenced extern functions

2012-03-21 Thread Steven Bosscher
Hello,

The test case for this bug triggers O(extern_decls**2) behavior
because value_member traverses the pending_assemble_externals list
from start to end for every new extern decl.

The solution I've picked is to add a pointer set, and while there I
made pending_assemble_externals a VEC instead of a TREE_LIST. I've
also added a FIXME to clarify that this whole situation of having
places calling assemble_external is not desirable.
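
To illustrate the idea (not the patch itself): membership is answered by a hash set in roughly constant time, while a growable array keeps the insertion order that the later traversal needs. A minimal sketch in plain C; struct pending_queue and the pq_* functions are hypothetical stand-ins and are not GCC's VEC or pointer_set_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pending queue plus its pointer set;
   these are not GCC's VEC or pointer_set_t.  */
struct pending_queue
{
  const void **items;  /* queued pointers, in insertion order */
  size_t n_items, cap_items;
  const void **slots;  /* open-addressing table used as the set */
  size_t n_slots;      /* always a power of two */
};

void
pq_init (struct pending_queue *q)
{
  q->n_items = 0;
  q->cap_items = 16;
  q->items = malloc (q->cap_items * sizeof *q->items);
  q->n_slots = 32;
  q->slots = calloc (q->n_slots, sizeof *q->slots);
  assert (q->items != NULL && q->slots != NULL);
}

void
pq_free (struct pending_queue *q)
{
  free (q->items);
  free (q->slots);
}

/* Insert P into the table; return 1 if it was already there.  */
static int
pq_set_insert (const void **slots, size_t n_slots, const void *p)
{
  size_t i = (size_t) (((uintptr_t) p >> 3) * 2654435761u) & (n_slots - 1);

  while (slots[i] != NULL)
    {
      if (slots[i] == p)
        return 1;
      i = (i + 1) & (n_slots - 1);
    }
  slots[i] = p;
  return 0;
}

/* Queue the non-null pointer P unless it is already queued.
   Returns 1 if P was newly added, 0 for a repeat.  */
int
pq_add (struct pending_queue *q, const void *p)
{
  size_t i;

  if (pq_set_insert (q->slots, q->n_slots, p))
    return 0;

  if (q->n_items == q->cap_items)
    {
      q->cap_items *= 2;
      q->items = realloc (q->items, q->cap_items * sizeof *q->items);
      assert (q->items != NULL);
    }
  q->items[q->n_items++] = p;

  /* Rehash into a table twice the size once it is more than half full.  */
  if (q->n_items * 2 > q->n_slots)
    {
      free (q->slots);
      q->n_slots *= 2;
      q->slots = calloc (q->n_slots, sizeof *q->slots);
      assert (q->slots != NULL);
      for (i = 0; i < q->n_items; i++)
        pq_set_insert (q->slots, q->n_slots, q->items[i]);
    }
  return 1;
}
```

Calling pq_add repeatedly with the same pointers leaves each one queued exactly once, without rescanning the queue on every call.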

On gcc110, the test case compiles in ~4s with the patch, and ~24s
without. If I add LIM5(Y) and LIM5(Z), the compile time is ~15s with
the patch, and ~372s without the patch. I am not sure what is
reasonable for this kind of test case, but on a smaller machine the
test case will probably blow up if I add those two extra LIM5s.

Anyway. Bootstrapped&tested on powerpc64-unknown-linux-gnu.
OK for trunk?
OK for all open release branches too?

Ciao!
Steven

gcc/
* varasm.c (pending_assemble_externals): Make a VEC.
(pending_assemble_externals_set): New pointer set.
(process_pending_assemble_externals): Traverse the VEC instead
of the TREE_LIST. Destroy the pointer set.
(assemble_external): See if decl is in pending_assemble_externals_set,
and add it to pending_assemble_externals if necessary.
(init_varasm_once): Allocate pending_assemble_externals and
pending_assemble_externals_set.

testsuite/
* gcc.c-torture/compile/limits-externdecl.c: New test for PR52640.

Index: varasm.c
===
--- varasm.c(revision 185603)
+++ varasm.c(working copy)
@@ -2097,8 +2097,15 @@ contains_pointers_p (tree type)
the compilation unit is finalized.  This is the best we can do for
right now (i.e. stage 3 of GCC 4.0) - the right thing is to delay
it all the way to final.  See PR 17982 for further discussion.  */
-static GTY(()) tree pending_assemble_externals;
+static GTY(()) VEC(tree,gc) *pending_assemble_externals;
 
+/* FIXME: Trunk is at GCC 4.8 now and the above problem still hasn't been
+   addressed properly.  This caused PR 52640 due to O(external_decls**2)
+   lookups in the pending_assemble_externals queue in assemble_external.
+   Paper over with this pointer set.  (And pending_assemble_externals even
+   was a TREE_LIST before?!)  */
+static struct pointer_set_t *pending_assemble_externals_set;
+
 #ifdef ASM_OUTPUT_EXTERNAL
 /* True if DECL is a function decl for which no out-of-line copy exists.
It is assumed that DECL's assembler name has been set.  */
@@ -2146,11 +2153,14 @@ void
 process_pending_assemble_externals (void)
 {
 #ifdef ASM_OUTPUT_EXTERNAL
-  tree list;
-  for (list = pending_assemble_externals; list; list = TREE_CHAIN (list))
-    assemble_external_real (TREE_VALUE (list));
+  size_t i;
+  tree decl;
 
-  pending_assemble_externals = 0;
+  FOR_EACH_VEC_ELT (tree, pending_assemble_externals, i, decl)
+    assemble_external_real (decl);
+  VEC_free (tree, gc, pending_assemble_externals);
+
+  pointer_set_destroy (pending_assemble_externals_set);
 #endif
 }
 
@@ -2191,9 +2201,8 @@ assemble_external (tree decl ATTRIBUTE_UNUSED)
 weak_decls = tree_cons (NULL, decl, weak_decls);
 
 #ifdef ASM_OUTPUT_EXTERNAL
-  if (value_member (decl, pending_assemble_externals) == NULL_TREE)
-    pending_assemble_externals = tree_cons (NULL, decl,
-					    pending_assemble_externals);
+  if (! pointer_set_insert (pending_assemble_externals_set, decl))
+    VEC_safe_push (tree, gc, pending_assemble_externals, decl);
 #endif
 }
 
@@ -6168,6 +6177,11 @@ init_varasm_once (void)
 
   if (readonly_data_section == NULL)
 readonly_data_section = text_section;
+
+#ifdef ASM_OUTPUT_EXTERNAL
+  pending_assemble_externals = VEC_alloc (tree, gc, 12);
+  pending_assemble_externals_set = pointer_set_create ();
+#endif
 }
 
 enum tls_model
Index: testsuite/gcc.c-torture/compile/limits-externdecl.c
===
--- testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 0)
+++ testsuite/gcc.c-torture/compile/limits-externdecl.c (revision 0)
@@ -0,0 +1,56 @@
+/* Inspired by the test case for PR middle-end/52640.  */
+
+typedef struct
+{
+char *value;
+} REFERENCE;
+
+/* Add a few extern int Xx (); declarations.  */
+#undef DEF
+#undef LIM1
+#undef LIM2
+#undef LIM3
+#undef LIM4

Re: [RFC, patch] powerpc64 FreeBSD support

2012-03-21 Thread Andreas Tobler

On 02.03.12 17:28, David Edelsohn wrote:

On Fri, Mar 2, 2012 at 11:04 AM, Andreas Tobler <andreast-l...@fgznet.ch> wrote:


the attached patch adds support for powerpc64-*-freebsd*.
Results are/were sent to the test results list.

A few words about the patch.
I chose to add separate freebsd* files because FreeBSD supports fewer
PowerPC CPUs than Linux does. For example, FreeBSD currently runs only
on 970 CPUs. POWER support on the kernel side is a work in progress,
but it is growing very slowly.

Also, powerpc FreeBSD has 64-bit long doubles while linux-ppc has 128-bit.

The architecture itself is a 64-bit one which can execute 32-bit
binaries, but its primary bit width is 64-bit. We do not want a
32-bit compiler which can build 64-bit objects.

To have a clearer picture and not to influence linux-ppc I thought
having its own set of files would be easier.

I'd appreciate feedback on the patch.

There are some issues with the tests, I'm cleaning and trying to make
them work. In certain areas I need to fix the kernel or the libc first
and this takes some time.

One thing is the libstdc++ test I had to skip. Here the test tries to
allocate int_max mem. And this blows my machine :)

Here the CL's.

Thank you!
Andreas

gcc:


2011-12-22  Andreas Tobler  <andre...@fgznet.ch>

* configure.ac (HAVE_LD_NO_DOT_SYMBOLS): Add powerpc64-*-freebsd*.
Introduce emul_name to select the right linker emulation for
powerpc64-*-freebsd*.
* configure: Regenerate.
* config.gcc: Add bits to support powerpc64-*-freebsd*.
* config/rs6000/freebsd.h (POWERPC_FREEBSD): Define.
* config/rs6000/freebsd64.h: New file.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Use
POWERPC_FREEBSD.
(rs6000_savres_strategy): Likewise.
(rs6000_savres_routine_name): Likewise.
(rs6000_elf_file_end): Likewise.
* config/rs6000/t-freebsd64: New file.
* config/rs6000/sysv4.h (SUBTARGET_OVERRIDE_OPTIONS): Set the
rs6000_current_abi for 64-bit FreeBSD to ABI_AIX.

libgcc:
---

2011-12-22  Andreas Tobler  <andre...@fgznet.ch>

* config.host: Add bits to support powerpc64-*-freebsd*.
* config/rs6000/freebsd-unwind.h: New file.
* config/rs6000/t-freebsd64: New file.

libstdc++:
--
2011-12-22  Andreas Tobler  <andre...@fgznet.ch>

* testsuite/23_containers/vector/bool/modifiers/insert/31370.cc: Skip
this test on powerpc64-*-freebsd*.


This patch is okay for trunk now that it has opened for 4.8 release.
I thought that Loren might comment.


Thank you David!

I committed this patch to trunk, r185613-r185615.

Results are here:

http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg02067.html

I'll continue to improve the port...

Again, thanks.
Andreas





Re: [PATCH] Preserve loops from tree to RTL loop optimizers

2012-03-21 Thread Richard Sandiford
Richard Guenther rguent...@suse.de writes:
 This patch makes us preserve loop structures from the start of tree
 loop optimizers to the end of RTL loop optimizers.  It uses a new
 property, PROP_loops to indicate we want to preserve them and
 massages loop_optimizer_init/finalize to honor that.

 On the RTL side the CFG manipulation was not prepared to handle
 updating loops, so this patch fills in enough to pass bootstrap
 and regtesting.  We still do too much loop fixing from cleanup_cfg
 basically because callers do not tell cleanup_cfg if they have
 modified the CFG themselves (CSE for example does in some cases).
 It was suggested to use a new flag to cleanup_cfg to do that,
 other suggestions welcome.

 Bootstrapped on x86_64-unknown-linux-gnu, testing shows some
 remaining libstdc++ errors, I am investigating them now but
 don't expect major issues.

 Comments?  The patch is on top of the early RTL pass merge.

Thanks for doing this (and for keeping the ~PROP_loops case around for
passes after rtl_loop_done -- I have a patch that uses it for SMS).

Richard


[patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher
Hello,

There is no reason for the ObjC front end to call assemble_external on
these symbols, the middle-end handles this just fine via
add_builtin_function.

Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven

objc/
* objc-act.c (objc_build_ivar_assignment): Do not call assemble_external.
(objc_build_global_assignment): Likewise.
(objc_build_strong_cast_assignment): Likewise.
* objc-next-runtime-abi-01.c: Cleanup commented-out assemble_external.
* objc-next-runtime-abi-02.c: Likewise.
* objc-gnu-runtime-abi-01.c: Likewise.

Index: objc-act.c
===
--- objc-act.c	(revision 185603)
+++ objc-act.c	(working copy)
@@ -3553,7 +3553,6 @@ objc_build_ivar_assignment (tree outervar, tree lh
 		tree_cons (NULL_TREE, offs,
 		NULL_TREE)));
 
-  assemble_external (func);
   return build_function_call (input_location, func, func_params);
 }
 
@@ -3566,7 +3565,6 @@ objc_build_global_assignment (tree lhs, tree rhs)
 		  build_unary_op (input_location, ADDR_EXPR, lhs, 0)),
 		NULL_TREE));
 
-  assemble_external (objc_assign_global_decl);
   return build_function_call (input_location,
 			  objc_assign_global_decl, func_params);
 }
@@ -3580,7 +3578,6 @@ objc_build_strong_cast_assignment (tree lhs, tree
 		  build_unary_op (input_location, ADDR_EXPR, lhs, 0)),
 		NULL_TREE));
 
-  assemble_external (objc_assign_strong_cast_decl);
   return build_function_call (input_location,
 			  objc_assign_strong_cast_decl, func_params);
 }
Index: objc-next-runtime-abi-01.c
===
--- objc-next-runtime-abi-01.c	(revision 185603)
+++ objc-next-runtime-abi-01.c	(working copy)
@@ -977,7 +977,6 @@ next_runtime_abi_01_get_category_super_ref (locati
   /* else do it the slow way.  */
   add_class_reference (super_name);
   super_class = (inst_meth ? objc_get_class_decl : objc_get_meta_class_decl);
-/* assemble_external (super_class);*/
   super_name = my_build_string_pointer (IDENTIFIER_LENGTH (super_name) + 1,
 	IDENTIFIER_POINTER (super_name));
   /* super_class = objc_get{Meta}Class(CLASS_SUPER_NAME); */
Index: objc-next-runtime-abi-02.c
===
--- objc-next-runtime-abi-02.c	(revision 185603)
+++ objc-next-runtime-abi-02.c	(working copy)
@@ -1509,7 +1509,6 @@ next_runtime_abi_02_get_category_super_ref (locati
   /* ??? Do we need to add the class ref anway for zero-link?  */
   /* else do it the slow way.  */
   super_class = (inst_meth ? objc_get_class_decl : objc_get_meta_class_decl);
-  /* assemble_external (super_class); */
   super_name = my_build_string_pointer (IDENTIFIER_LENGTH (super_name) + 1,
 	IDENTIFIER_POINTER (super_name));
   /* super_class = objc_get{Meta}Class(CLASS_SUPER_NAME); */
Index: objc-gnu-runtime-abi-01.c
===
--- objc-gnu-runtime-abi-01.c	(revision 185603)
+++ objc-gnu-runtime-abi-01.c	(working copy)
@@ -574,8 +574,6 @@ gnu_runtime_abi_01_get_class_reference (tree ident
 		(IDENTIFIER_LENGTH (ident) + 1,
 		 IDENTIFIER_POINTER (ident)));
 
-  /* FIXME: Do we need this assemble_external() ? */
-  /* assemble_external (objc_get_class_decl);*/
   return build_function_call (input_location, objc_get_class_decl, params);
 }
 
@@ -839,8 +837,6 @@ gnu_runtime_abi_01_get_category_super_ref (locatio
 
   add_class_reference (super_name);
   super_class = (inst_meth ? objc_get_class_decl : objc_get_meta_class_decl);
-  /* FIXME: Do we need this assemble_external() ? */
-  /* assemble_external (super_class);*/
   super_name = my_build_string_pointer (IDENTIFIER_LENGTH (super_name) + 1,
 	IDENTIFIER_POINTER (super_name));
   /* super_class = get_{meta_}class(CLASS_SUPER_NAME);  */


Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Iain Sandoe

Hi Steven,

On 21 Mar 2012, at 21:09, Steven Bosscher wrote:

There is no reason for the ObjC front end to call assemble_external on
these symbols, the middle-end handles this just fine via
add_builtin_function.


Ah, that's the bit I'd yet to figure out ...


Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven

objc/
	* objc-act.c (objc_build_ivar_assignment): Do not call assemble_external.
	(objc_build_global_assignment): Likewise.
	(objc_build_strong_cast_assignment): Likewise.
	* objc-next-runtime-abi-01.c: Cleanup commented-out assemble_external.
	* objc-next-runtime-abi-02.c: Likewise.
	* objc-gnu-runtime-abi-01.c: Likewise.
cleanup_objc_assemble_external.diff


... this would allow us to close PR17982?

... and make progress on PR24777? (I'm not sure where exactly we need  
to go with this one - we have different sets of calls depending on the  
runtime)


Iain



Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher
On Wed, Mar 21, 2012 at 10:23 PM, Iain Sandoe
develo...@sandoe-acoustics.co.uk wrote:
 objc/
        * objc-act.c (objc_build_ivar_assignment): Do not call
 assemble_external.
        (objc_build_global_assignment): Likewise.
        (objc_build_strong_cast_assignment): Likewise.
        * objc-next-runtime-abi-01.c: Cleanup commented-out
 assemble_external.
        * objc-next-runtime-abi-02.c: Likewise.
        * objc-gnu-runtime-abi-01.c: Likewise.
 cleanup_objc_assemble_external.diff


 ... this would allow us to close PR17982?

I believe so, but I have to look into it a bit deeper to understand
what the remaining assemble_external calls are for. The ones in
config/* are OK, they are all for writing out multiple-inheritance
thunks. The ones that need checking are:

calls.c:1649: assemble_external (fndecl);
expr.c:1422:  assemble_external (block_move_fn);
expr.c:2794:  assemble_external (block_clear_fn);
expr.c:7423:  assemble_external (exp);
expr.c:9022:  assemble_external (exp);
final.c:2745: assemble_external (t);
final.c:3497:   assemble_external (t);
final.c:3568:   assemble_external (SYMBOL_REF_DECL (x));
toplev.c:489:  assemble_external (decl);


 ... and make progress on PR24777? (I'm not sure where exactly we need to go
 with this one - we have different sets of calls depending on the runtime)

The FIXMEs in that PR do not exist anymore. Perhaps this removed them:

2011-10-11  Michael Meissner  meissner at linux dot vnet dot ibm dot com

* objc-next-runtime-abi-01.c (objc_build_exc_ptr): Delete old
interface with two parallel arrays to hold standard builtin
declarations, and replace it with a function based interface that
can support creating builtins on the fly in the future.  Change
all uses, and poison the old names.  Make sure 0 is not a
legitimate builtin index.
* objc-next-runtime-abi-02.c (objc_build_exc_ptr): Ditto.
* objc-gnu-runtime-abi-01.c (objc_build_exc_ptr): Ditto.

In any case, if there's nothing left to fix for PR24777, I suppose it
can be closed as FIXED.

Ciao!
Steven


Re: [Patch, fortran] PR 49010/24518 MOD/MODULO fixes

2012-03-21 Thread Janne Blomqvist
PING

On Wed, Mar 14, 2012 at 01:03, Janne Blomqvist
blomqvist.ja...@gmail.com wrote:
 Hi,

 the attached patch implements a few fixes and cleanups for the MOD and
 MODULO intrinsics.

 - When the arguments are constant, use mpfr_fmod instead of the naive
 algorithms which are numerically unstable for large arguments. This
 extends the PR 24518 fix to constant arguments as well, and makes the
 compile-time evaluation match the runtime implementation which also
 uses fmod in the same manner.

 - Remove the old fallback path for the case builtin_fmod is not
 available, as the builtin is AFAICS always available.

 The patch does not per se fix the corner-case bug as reported in PR
 49010, in fact it makes it worse in a way as with the patch the result
 if the arguments are parameters is the same as the runtime result
 (previously, the compile-time result was correct). But, I think we
 should leave it as it is. Due to the reasons above, we're not using
 the naive algorithms anyway, and IMHO -0.0 is quite a good
 approximation for +0.0 anyway. One might even argue that due to the
 numerical instability, specifying the naive algorithms is a bug in the
 standard.

 The patch adds notes to the documentation about the usage of fmod, so
 users interested in corner-case behavior can look up how that function
 is supposed to behave on their target. FWIW, AFAICS MPFR and glibc
 fmod conform to the behavior specified in C99 Annex F.

 Regtested on x86_64-unknown-linux-gnu, Ok for trunk?

 2012-03-14  Janne Blomqvist  j...@gcc.gnu.org

        PR fortran/49010
        PR fortran/24518
        * intrinsic.texi (MOD,MODULO): Mention usage of fmod instead of
        naive algorithm.
        * simplify.c (gfc_simplify_mod): Use mpfr_fmod.
        (gfc_simplify_modulo): Likewise.
        * trans-intrinsic.c (gfc_conv_intrinsic_mod): Remove fallback as
        builtin_fmod is always available.


 --
 Janne Blomqvist



-- 
Janne Blomqvist


[google] Minor cleanup and test fixes for -mpatch-functions-for-instrumentation. (issue5877043)

2012-03-21 Thread Harshit Chopra
2012-03-21   Harshit Chopra  hars...@google.com

  Minor changes:
i386.c: made check_should_patch_current_function C90 compatible.
i386.md: Added '\t' to bytes generated by
 ix86_output_function_nops_prologue_epilogue for proper formatting
 of assembly.
patch-functions-*.c: Fixed verification in tests. Added a test to verify
 nop-bytes generated for sibling calls and another test
 to verify a binary with nop-bytes runs properly.
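
For context on the C90 item: C90 does not allow a declaration after a statement within a block, so the declaration moves to the top and the initialization becomes a plain assignment. A minimal sketch of that shape, where demo_min_instructions is a hypothetical function and not code from i386.c:

```c
#include <assert.h>

/* Hypothetical helper; it only mirrors the shape of the change.  */
int
demo_min_instructions (int param_value)
{
  int min_instructions;  /* C90: declared before any statement */

  if (param_value < 0)
    return 0;

  /* Assigned here, where C99 code would have declared it.  */
  min_instructions = param_value;
  return min_instructions;
}

/* Sanity check of both paths through the helper.  */
void
demo_check_min_instructions (void)
{
  assert (demo_min_instructions (7) == 7);
  assert (demo_min_instructions (-1) == 0);
}
```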

* gcc/config/i386/i386.c (check_should_patch_current_function):
* gcc/config/i386/i386.md:
* gcc/testsuite/gcc.target/i386/patch-functions-1.c (void foo):
(int main):
* gcc/testsuite/gcc.target/i386/patch-functions-2.c:
* gcc/testsuite/gcc.target/i386/patch-functions-3.c:
* gcc/testsuite/gcc.target/i386/patch-functions-4.c:
* gcc/testsuite/gcc.target/i386/patch-functions-5.c:
* gcc/testsuite/gcc.target/i386/patch-functions-6.c:
* gcc/testsuite/gcc.target/i386/patch-functions-7.c:
* gcc/testsuite/gcc.target/i386/patch-functions-8.c (int foo):
(int bar):
* gcc/testsuite/gcc.target/i386/patch-functions-sibling-call.c:

Testing method:
  make check-gcc RUNTESTFLAGS="i386.exp=patch-functions* --target_board=\"unix{-m32,}\""

Patch to be applied to google/main.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 08bd5f0..be1f7a4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10981,6 +10981,7 @@ check_should_patch_current_function (void)
   const char* func_name = NULL;
   struct loops loops;
   int num_loops = 0;
+  int min_functions_instructions;
 
   /* Patch the function if it has at least a loop.  */
   if (!patch_functions_ignore_loops)
@@ -11007,7 +11008,7 @@ check_should_patch_current_function (void)
   strcmp("main", func_name) == 0)
 return true;
 
-  int min_functions_instructions =
+  min_functions_instructions =
   PARAM_VALUE (PARAM_FUNCTION_PATCH_MIN_INSTRUCTIONS);
   if (min_functions_instructions > 0)
 {
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 08353ff..38a04ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11688,7 +11688,7 @@
   /* Emit 10 nop bytes after ret.  */
   if (ix86_output_function_nops_prologue_epilogue (asm_out_file,
                                                    FUNCTION_PATCH_EPILOGUE_SECTION,
-                                                   "ret",
+                                                   "\tret",
                                                    10))
     return "";
 }
@@ -11712,7 +11712,7 @@
   /* Emit 9 nop bytes after rep;ret.  */
   if (ix86_output_function_nops_prologue_epilogue (asm_out_file,
                                                    FUNCTION_PATCH_EPILOGUE_SECTION,
-                                                   "rep\;ret",
+                                                   "\trep\;ret",
                                                    9))
     return "";
 }
diff --git a/gcc/testsuite/gcc.target/i386/patch-functions-1.c b/gcc/testsuite/gcc.target/i386/patch-functions-1.c
index 308e8c3..aa1f424 100644
--- a/gcc/testsuite/gcc.target/i386/patch-functions-1.c
+++ b/gcc/testsuite/gcc.target/i386/patch-functions-1.c
@@ -1,5 +1,5 @@
 /* Verify -mpatch-functions-for-instrumentation works.  */
-/* { dg-do run } */
+/* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-mpatch-functions-for-instrumentation" } */
 
@@ -8,13 +8,16 @@
 /* Check nop-bytes at end.  */
 /* { dg-final { scan-assembler "ret(.*).byte\t0x90(.*).byte\t0x90" } } */
 
-void foo() {
+__attribute__ ((noinline))
+void foo()
+{
   /* Dummy loop.  */
   int x = 0;
   while (++x);
 }
 
-int main() {
+int main()
+{
   foo();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/patch-functions-2.c b/gcc/testsuite/gcc.target/i386/patch-functions-2.c
index 6baad32..78de867 100644
--- a/gcc/testsuite/gcc.target/i386/patch-functions-2.c
+++ b/gcc/testsuite/gcc.target/i386/patch-functions-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-mpatch-functions-for-instrumentation -mno-patch-functions-main-always" } */
 
@@ -8,11 +8,14 @@
 /* { dg-final { scan-assembler-not ".byte\t0xeb,0x09(.*).byte\t0x90" } } */
 /* { dg-final { scan-assembler-not "ret(.*).byte\t0x90(.*).byte\t0x90" } } */
 
-void foo() {
+__attribute__ ((noinline))
+void foo()
+{
   int x = 0;
 }
 
-int main() {
+int main()
+{
   foo();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/i386/patch-functions-3.c b/gcc/testsuite/gcc.target/i386/patch-functions-3.c
index 49b57a8..9e8eb52 100644
--- a/gcc/testsuite/gcc.target/i386/patch-functions-3.c
+++ b/gcc/testsuite/gcc.target/i386/patch-functions-3.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do compile } 

Re: [fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Mike Stump
On Mar 21, 2012, at 10:16 AM, Bruce Korb wrote:
 Patch welcome!  I, myself, don't know what emit DejaGnu-style
 fixincludes.{sum, log} files would mean.

Rather simple...  In a file called fixincludes.sum, put

  PASS: unique string

or

  FAIL: unique string

one per line, as many times as you want.  The unique strings should be 
meaningful to you in some way, and be stable over long periods of time (no 
`pwd` or `date` in them for example).  You can write a python script, an awk 
script, a bash script or a c program to generate this.  You can synthesize this 
from any source you can pull from, for example, the existing testing code or 
test report you might have.

Bonus points if you can total passes and failures:

exec >>$file
echo
echo "# of expected passes		$(cat $file | grep 'PASS:' | wc -l)"
echo "# of expected failures	$(cat $file | grep 'FAIL:' | wc -l)"

at the end of the file.  The above by the way, will add it (/bin/sh style), if 
you just create the $file.  That's it, done.  For example, if you just had a 
single, it all worked flawlessly thing, you could do:

if check-cmd; then
 echo PASS: fixinclude
else
 echo FAIL: fixinclude
fi

and then the code above, and voilà, you're done.  To create the .log file, cp 
fixincludes.sum fixincludes.log, if you have nothing better to do.
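
Since a C program is one of the options mentioned, here is a rough sketch of what such a generator could look like. It formats PASS:/FAIL: lines and the two totals into a buffer that the caller can then write to fixincludes.sum. struct test_result and format_sum are hypothetical names, the FAIL total is labelled "unexpected failures" the way DejaGnu summaries do, and the exact tab spacing is an assumption:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one check: a stable name and its outcome.  */
struct test_result
{
  const char *name;
  int passed;
};

/* Format one PASS:/FAIL: line per result into BUF, followed by the
   totals.  Returns the number of failures, or -1 if BUF is too small.  */
int
format_sum (char *buf, size_t size, const struct test_result *r, size_t n)
{
  size_t used = 0, i;
  int passes = 0, fails = 0, len;

  for (i = 0; i < n; i++)
    {
      len = snprintf (buf + used, size - used, "%s: %s\n",
                      r[i].passed ? "PASS" : "FAIL", r[i].name);
      if (len < 0 || (size_t) len >= size - used)
        return -1;
      used += (size_t) len;
      if (r[i].passed)
        passes++;
      else
        fails++;
    }

  len = snprintf (buf + used, size - used,
                  "\n# of expected passes\t\t%d\n"
                  "# of unexpected failures\t%d\n", passes, fails);
  if (len < 0 || (size_t) len >= size - used)
    return -1;
  return fails;
}

/* Sanity check with a single passing result.  */
void
demo_check_sum (void)
{
  struct test_result one = { "fixincludes", 1 };
  char buf[128];

  assert (format_sum (buf, sizeof buf, &one, 1) == 0);
  assert (strstr (buf, "PASS: fixincludes\n") == buf);
}
```

The names passed in should follow the same rule as above: meaningful, and stable over time.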


Re: [fixincludes] Fix pthread.h failure (PR other/52626)

2012-03-21 Thread Mike Stump
On Mar 21, 2012, at 2:46 PM, Mike Stump wrote:
 echo "# of expected failures	$(cat $file | grep 'FAIL:' | wc -l)"

Oh, and if you expect perfection, you should use:

echo "# of unexpected failures	$(cat $file | grep 'FAIL:' | wc -l)"

instead.


Re: [PATCH] Bug fix in store_bit_field_1 for big endian targets (issue 51893)

2012-03-21 Thread Eric Botcazou
 This patch (for 4.6) fixes a wrong subword index computation in
 store_bit_field_1 for big endian targets when value is at least 4 times
 bigger than a word (DI REG value with HI words).

 It fixes a regression on gcc.c-torture/execute/bitfld-3.c for my current
 backend port.

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51893

 OK to commit?

It is OK for mainline in principle but there is no ChangeLog entry and the 
patch doesn't follow the GNU Coding Style: TABs instead of spaces, spaces 
before parentheses, etc.  See the equivalent code in extract_bit_field_1.
Moreover you need to properly test it on a mainstream big-endian platform.

See http://gcc.gnu.org/contribute.html for a more complete reference.

-- 
Eric Botcazou


Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Mike Stump
On Mar 21, 2012, at 2:09 PM, Steven Bosscher wrote:
 There is no reason for the ObjC front end to call assemble_external on
 these symbols,

 OK for trunk?

Ok.  Watch for hate mail from Jack, if you guess wrong.  :-)


Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Mike Stump
On Mar 21, 2012, at 2:32 PM, Steven Bosscher wrote:
 In any case, if there's nothing left to fix for PR24777, I suppose it
 can be closed as FIXED.

I see all sorts of FIXME: in c-decl.c still...  Anyway, someone needs to sort 
out what is done and remains undone and update the FIXMEs...  I don't know 
which ones are dead.


Re: [PATCH, alpha]: Use generic config/elfos.h headers

2012-03-21 Thread Richard Henderson
On 03/21/12 12:28, Uros Bizjak wrote:
 Hello!
 
 Attached patch enables alpha to use generic config/elfos.h headers on
 linux and *bsd targets. The most important difference to generic
 elfos.h is in
 
   * config/alpha/elf.h (TARGET_ASM_FILE_START_FILE_DIRECTIVE): Undefine.

This one can be deleted, since

2012-03-14  Rainer Orth  r...@cebitec.uni-bielefeld.de

...
* config/alpha/alpha.c (alpha_file_start): Always assume
OBJECT_FORMAT_ELF.
Don't set targetm.asm_file_start_file_directive.
[!OBJECT_FORMAT_ELF]: Remove.
(TARGET_ASM_FILE_START_FILE_DIRECTIVE): Remove.

The patch looks ok, with,

 +#undef USER_LABEL_PREFIX
  #define USER_LABEL_PREFIX ""

This ought to be deletable.

  /* We use the default ASCII-output routine, except that we don't write more
 than 50 characters since the assembler doesn't support very long lines.  
 */
  
 +#undef ASM_OUTPUT_ASCII
  #define ASM_OUTPUT_ASCII(MYFILE, MYSTRING, MYLENGTH) \

Likewise.  Since RO deleted all non-gas support.


r~


Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher
On Wed, Mar 21, 2012 at 11:11 PM, Mike Stump mikest...@comcast.net wrote:
 On Mar 21, 2012, at 2:32 PM, Steven Bosscher wrote:
 In any case, if there's nothing left to fix for PR24777, I suppose it
 can be closed as FIXED.

 I see all sorts of FIXME: in c-decl.c still...  Anyway, someone needs to sort 
 out what is done and remains undone and update the FIXMEs...  I don't know 
 which ones are dead.


Ehm, yes. I was looking in the wrong place (objc/*). The weird
not-really-builtin functions are the ones added via
add_builtin_function but with builtin type NOT_BUILT_IN. Those
problems still appear to be there:

objc/objc-act.c:= add_builtin_function (TAG_EXCEPTIONTHROW, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-act.c:= add_builtin_function (TAG_SYNCENTER, temp_type, 0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_SYNCEXIT, temp_type, 0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_ENUMERATION_MUTATION, type, 0, NOT_BUILT_IN,
objc/objc-gnu-runtime-abi-01.c:= add_builtin_function (TAG_GETCLASS, type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_GETCLASS, type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_GETMETACLASS, type, 0, NOT_BUILT_IN, NULL, NULL_TREE);
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_SETJMP, temp_type, 0, NOT_BUILT_IN, NULL, NULL_TREE);
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_EXCEPTIONEXTRACT, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_EXCEPTIONTRYENTER, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_EXCEPTIONTRYEXIT, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_EXCEPTIONMATCH, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_ASSIGNIVAR, temp_type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_ASSIGNGLOBAL, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function (TAG_ASSIGNSTRONGCAST, temp_type, 0, NOT_BUILT_IN, NULL,
(+ some not found by grep because the add_builtin_function call spans
multiple lines)

FWIW, Java does this too:
java/decl.c:= add_builtin_function ("_Jv_MonitorEnter", t, 0, NOT_BUILT_IN,
java/decl.c:= add_builtin_function ("_Jv_MonitorExit", t, 0, NOT_BUILT_IN,

Ciao!
Steven


Re: [patch] Split parts of cse_insn out to a few new functions

2012-03-21 Thread Steven Bosscher
On Wed, Mar 21, 2012 at 1:13 AM, Ian Lance Taylor wrote:
 On Tue, Mar 20, 2012 at 2:06 PM, Steven Bosscher wrote:

 This patch splits a couple of pieces of cse_insn out to new functions.
 There are no functional changes, and no code generation differences as
 far as I could tell on x86_64 (-m64 and -m32).

Likewise for the attached patch.

 The purpose of the patch is to hopefully make cse_insn easier
 to understand. In a follow-up patch, I will make canonicalize_insn run
 only once per insn (it currently, i.e. before and after this patch,
 runs multiple times for CSE on extended basic blocks if a block is in
 multiple extended basic blocks).

That is what the attached patch does.
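
As a rough illustration of the once-per-block idea (not the patch itself): a per-block flag records that the canonicalization work has already run, so a block shared by several extended basic blocks is canonicalized only on the first visit. In this sketch struct demo_cse_state, demo_walk_path and DEMO_N_BLOCKS are hypothetical, a byte array stands in for the sbitmap, and a counter stands in for the calls to canonicalize_insn:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_N_BLOCKS 4  /* hypothetical tiny CFG */

struct demo_cse_state
{
  unsigned char canonicalized[DEMO_N_BLOCKS]; /* stands in for the sbitmap */
  int canonicalize_calls[DEMO_N_BLOCKS];      /* how often the work ran */
};

/* Walk one path (an extended basic block) given as block indices.
   The canonicalization stand-in runs only on a block's first visit;
   everything else a real pass does would still run on every visit.  */
void
demo_walk_path (struct demo_cse_state *s, const int *path, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      int bb = path[i];

      assert (bb >= 0 && bb < DEMO_N_BLOCKS);
      if (!s->canonicalized[bb])
        s->canonicalize_calls[bb]++;

      /* Mark the block once all of its insns have been handled.  */
      s->canonicalized[bb] = 1;
    }
}
```

With two paths that share a prefix, for example blocks 0, 1, 2 and blocks 0, 1, 3, the shared blocks 0 and 1 are counted once rather than twice.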

Bootstrapped&tested on x86_64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven

* cse.c (cse_canonicalized_basic_blocks): New simple bitmap to
tag basic blocks that have already been traversed at least once,
so that all insns have been canonicalized.
(cse_insn): Call canonicalize_insn only if the basic block that
contains insn is visited for the first time.
(cse_extended_basic_block): After visiting all insns in a basic
block, mark the block in cse_canonicalized_basic_blocks.
(cse_main): Setup and destroy cse_canonicalized_basic_blocks.

(cse_find_path): Micro-optimization, reorder one condition to
avoid a reference to cfun.

Index: cse.c
===
--- cse.c	(revision 185622)
+++ cse.c	(working copy)
@@ -551,6 +551,10 @@ static bitmap cse_ebb_live_in, cse_ebb_l
already as part of an already processed extended basic block.  */
 static sbitmap cse_visited_basic_blocks;
 
+/* A simple bitmap to track for which basic blocks all insns have been
+   canonicalized already.  */
+static sbitmap cse_canonicalized_basic_blocks;
+
 static bool fixed_base_plus_p (rtx x);
 static int notreg_cost (rtx, enum rtx_code, int);
 static int approx_reg_cost_1 (rtx *, void *);
@@ -4492,8 +4496,10 @@ cse_insn (rtx insn)
   /* Record all the SETs in this instruction.  */
   n_sets = find_sets_in_insn (insn, sets);
 
-  /* Substitute the canonical register where possible.  */
-  canonicalize_insn (insn, sets, n_sets);
+  /* If we have not visited this block before (as part of another extended
+     basic block), substitute the canonical register where possible.  */
+  if (!TEST_BIT (cse_canonicalized_basic_blocks, BLOCK_FOR_INSN (insn)->index))
+    canonicalize_insn (insn, sets, n_sets);
 
   /* If this insn has a REG_EQUAL note, store the equivalent value in SRC_EQV,
  if different, or if the DEST is a STRICT_LOW_PART.  The latter condition
@@ -6254,10 +6260,9 @@ cse_find_path (basic_block first_bb, str
 	  else
 	e = NULL;
 
-	  if (e
-	      && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
-	      && e->dest != EXIT_BLOCK_PTR
+	  if (e && e->dest != EXIT_BLOCK_PTR
 	      && single_pred_p (e->dest)
+	      && !((e->flags & EDGE_ABNORMAL_CALL) && cfun->has_nonlocal_label)
 	      /* Avoid visiting basic blocks twice.  The large comment
 		 above explains why this can happen.  */
 	      && !TEST_BIT (cse_visited_basic_blocks, e->dest->index))
@@ -6452,6 +6457,9 @@ cse_extended_basic_block (struct cse_bas
 	}
 	}
 
+  /* We have now canonicalized all insns in this basic block.  */
+  SET_BIT (cse_canonicalized_basic_blocks, bb->index);
+
   /* With non-call exceptions, we are not always able to update
 	 the CFG properly inside cse_insn.  So clean up possibly
 	 redundant EH edges here.  */
@@ -6555,6 +6563,10 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nr
   cse_visited_basic_blocks = sbitmap_alloc (last_basic_block);
   sbitmap_zero (cse_visited_basic_blocks);
 
+  /* Set up the table of already canonicalized basic blocks.  */
+  cse_canonicalized_basic_blocks = sbitmap_alloc (last_basic_block);
+  sbitmap_zero (cse_canonicalized_basic_blocks);
+
   /* Loop over basic blocks in reverse completion order (RPO),
  excluding the ENTRY and EXIT blocks.  */
   n_blocks = pre_and_rev_post_order_compute (NULL, rc_order, false);
@@ -6598,6 +6610,7 @@ cse_main (rtx f ATTRIBUTE_UNUSED, int nr
   free (reg_eqv_table);
   free (ebb_data.path);
   sbitmap_free (cse_visited_basic_blocks);
+  sbitmap_free (cse_canonicalized_basic_blocks);
   free (rc_order);
   rtl_hooks = general_rtl_hooks;
 


Re: [patch] Split parts of cse_insn out to a few new functions

2012-03-21 Thread Steven Bosscher
On Thu, Mar 22, 2012 at 12:09 AM, Steven Bosscher stevenb@gmail.com wrote:

        (cse_find_path): Micro-optimization, reorder one condition to
        avoid a reference to cfun.

Ah, and please ignore this bit. I don't know what I was thinking...


Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Iain Sandoe


On 21 Mar 2012, at 22:45, Steven Bosscher wrote:

On Wed, Mar 21, 2012 at 11:11 PM, Mike Stump mikest...@comcast.net wrote:

On Mar 21, 2012, at 2:32 PM, Steven Bosscher wrote:

In any case, if there's nothing left to fix for PR24777, I suppose it
can be closed as FIXED.

I see all sorts of FIXME: in c-decl.c still...  Anyway, someone needs
to sort out what is done and remains undone and update the FIXMEs...
I don't know which ones are dead.



Ehm, yes. I was looking in the wrong place (objc/*). The weird
not-really-builtin functions are the ones added via
add_builtin_function but with builtin type NOT_BUILT_IN. Those
problems still appear to be there:

objc/objc-act.c:= add_builtin_function (TAG_EXCEPTIONTHROW,
temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-act.c:= add_builtin_function (TAG_SYNCENTER, temp_type,
0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_SYNCEXIT, temp_type,
0, NOT_BUILT_IN,
objc/objc-act.c:= add_builtin_function (TAG_ENUMERATION_MUTATION,
type, 0, NOT_BUILT_IN,
objc/objc-gnu-runtime-abi-01.c:= add_builtin_function
(TAG_GETCLASS, type, 0, NOT_BUILT_IN,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_GETCLASS, type, 0, NOT_BUILT_IN,


snip


(TAG_ASSIGNGLOBAL, temp_type, 0, NOT_BUILT_IN, NULL,
objc/objc-next-runtime-abi-01.c:= add_builtin_function
(TAG_ASSIGNSTRONGCAST, temp_type, 0, NOT_BUILT_IN, NULL,
(+ some not found by grep because the add_builtin_function call spans
multiple lines)


conceptually, the issue is that there are multiple sets of built-ins
(potentially, one set for each runtime, and the sets are of different
sizes).  Thus, it's not just a case of turning these into regular
built-ins, without some mechanism to cater for overloading or
presence/absence of particular ones.






Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Steven Bosscher
On Thu, Mar 22, 2012 at 12:53 AM, Iain Sandoe
develo...@sandoe-acoustics.co.uk wrote:

 conceptually, the issue is that there are multiple sets of built-ins
 (potentially, one set for each runtime, and the sets are of different
 sizes).  Thus, it's not just a case of turning these into regular built-ins,
 without some mechanism to cater for overloading or presence/absence of
 particular ones.

I don't understand this. We're committed to one runtime per
compilation, right? If so, then we should create only one of the sets
at any time.

Ciao!
Steven


Re: [patch][objc] Do not call assemble_external

2012-03-21 Thread Iain Sandoe


On 22 Mar 2012, at 00:00, Steven Bosscher wrote:


On Thu, Mar 22, 2012 at 12:53 AM, Iain Sandoe
develo...@sandoe-acoustics.co.uk wrote:


conceptually, the issue is that there are multiple sets of built-ins
(potentially, one set for each runtime, and the sets are of different
sizes).  Thus, it's not just a case of turning these into regular
built-ins, without some mechanism to cater for overloading or
presence/absence of particular ones.


I don't understand this. We're committed to one runtime per
compilation, right? If so, then we should create only one of the sets
at any time.


Yes, that's true [notwithstanding an erroneous invocation of LTO with  
mixed inputs, which we should detect by alternate means]


but ...
.. don't the indices for built-ins need to be constant?
  (maybe that means that we'd just allocate as many as needed to  
cover all runtimes?)

.. or, otherwise, how does LTO know which set to invoke?
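
To make the "allocate as many as needed to cover all runtimes" idea concrete, here is a rough sketch and nothing more. Every name in it (objc_runtime_kind, objc_builtin_index, objc_builtin_present, objc_select_builtins) is hypothetical, and the presence table is invented for illustration; it does not claim to reflect which runtime really provides which function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime selector; one runtime per compilation.  */
enum objc_runtime_kind { OBJC_RT_GNU, OBJC_RT_NEXT, OBJC_RT_COUNT };

/* Hypothetical indices covering the union of all runtimes, so an
   index means the same thing whichever runtime is selected.  */
enum objc_builtin_index
{
  OBJC_BI_GETCLASS,
  OBJC_BI_EXCEPTION_THROW,
  OBJC_BI_ASSIGN_GLOBAL,
  OBJC_BI_ASSIGN_STRONGCAST,
  OBJC_BI_COUNT
};

/* Illustrative presence table (an assumption, not taken from GCC).  */
static const bool objc_builtin_present[OBJC_RT_COUNT][OBJC_BI_COUNT] = {
  [OBJC_RT_GNU] = {
    [OBJC_BI_GETCLASS] = true,
    [OBJC_BI_EXCEPTION_THROW] = true
  },
  [OBJC_RT_NEXT] = {
    [OBJC_BI_GETCLASS] = true,
    [OBJC_BI_EXCEPTION_THROW] = true,
    [OBJC_BI_ASSIGN_GLOBAL] = true,
    [OBJC_BI_ASSIGN_STRONGCAST] = true
  }
};

/* Fill ENABLED for runtime RT and return how many built-ins it has.
   Slots that the runtime lacks stay false but keep their index.  */
static size_t
objc_select_builtins (enum objc_runtime_kind rt, bool enabled[OBJC_BI_COUNT])
{
  size_t n = 0;
  for (int i = 0; i < OBJC_BI_COUNT; i++)
    {
      enabled[i] = objc_builtin_present[rt][i];
      if (enabled[i])
        n++;
    }
  return n;
}
```

The cost of this layout is a few unused slots per runtime; the benefit is that an index never changes meaning between runtimes. Whether that is enough for LTO is exactly the open question above.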

... sorry, I think my observation about pr24777 has pushed this thread  
off course - perhaps we'd be better putting this in the PR thread?


cheers
Iain



[C++ Patch] PR 49152

2012-03-21 Thread Paolo Carlini

Hi,

this diagnostic issue is about not even trying to print expressions in 
error messages involving operators, and print operand types instead. 
Just as an example, for:


struct X { int x; };
void trigger (X x []) { x [01] = 0; }

we currently print:

error: no match for ‘operator=’ in ‘*(x + 4u) = 0’

which the patch changes to:

error: no match for ‘operator=’ (operand types are ‘X’ and ‘int’)

Or, for the existing other/error10.C, from:

error: no match for ‘operator-’ in ‘-(* & a)’

to

error: no match for ‘operator-’ (operand type is ‘A<0>’)

Jon and Manuel checked clang and I checked what icc does: without the 
caret, I don't think we can do *much* better here, but, wrt the audit 
trail discussion, I'm proposing printing the actual operand types 
between parentheses - I got the general idea from icc - because we don't 
want to confuse parameters and arguments.
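
As a rough illustration of the message shape being proposed (not the actual op_error code in call.c, which goes through GCC's diagnostic machinery), the two forms could be produced like this. format_op_error is a hypothetical helper, and it uses plain ASCII quotes rather than the locale-dependent ones GCC prints.

```c
#include <stdio.h>

/* Hypothetical helper: write the proposed "no match" message for
   operator OP into BUF.  TYPE2 is NULL for a unary operator.
   Returns the value of snprintf.  */
static int
format_op_error (char *buf, size_t size, const char *op,
                 const char *type1, const char *type2)
{
  if (type2)
    return snprintf (buf, size,
                     "no match for 'operator%s' "
                     "(operand types are '%s' and '%s')",
                     op, type1, type2);
  return snprintf (buf, size,
                   "no match for 'operator%s' (operand type is '%s')",
                   op, type1);
}
```

Calling it with "=", "X" and "int" gives the binary form from the first example above; passing NULL as the second type gives the unary form used for error10.C.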


Tested x86_64-linux.

Thanks,
Paolo.

/cp
2012-03-21  Paolo Carlini  paolo.carl...@oracle.com

PR c++/49152
* call.c (op_error): Don't try to print expressions, print types.

/testsuite
2012-03-21  Paolo Carlini  paolo.carl...@oracle.com

PR c++/49152
* g++.dg/diagnostic/operator1.C: New.
* g++.dg/ext/label5.C: Adjust.
* g++.dg/ext/va-arg1.C: Likewise.
* g++.dg/other/error20.C: Likewise.
* g++.dg/other/error16.C: Likewise.
* g++.dg/other/error10.C: Likewise.
* g++.dg/parse/error30.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-err1.C: Likewise.
Index: testsuite/g++.dg/ext/label5.C
===
--- testsuite/g++.dg/ext/label5.C   (revision 185603)
+++ testsuite/g++.dg/ext/label5.C   (working copy)
@@ -2,5 +2,5 @@
 // PR c++/24052
 
 struct A { };
-int main() { b: A() && && b; } // { dg-error "A\\(\\) && && *b" }
+int main() { b: A() && && b; } // { dg-error "operand types are 'A' and 'void\\*'" }
 // { dg-message "candidate|operator|no known conversion" "additional" { target *-*-* } 5 }
Index: testsuite/g++.dg/ext/va-arg1.C
===
--- testsuite/g++.dg/ext/va-arg1.C  (revision 185603)
+++ testsuite/g++.dg/ext/va-arg1.C  (working copy)
@@ -4,5 +4,5 @@ struct A {};
 
 void foo()
 {
-  ++__builtin_va_arg(0, A); // { dg-error "'\\+\\+va_arg\\(0, A\\)'" }
+  ++__builtin_va_arg(0, A); // { dg-error "operand type is 'A'" }
 }
Index: testsuite/g++.dg/other/error20.C
===
--- testsuite/g++.dg/other/error20.C	(revision 185603)
+++ testsuite/g++.dg/other/error20.C	(working copy)
@@ -8,6 +8,6 @@ struct A// { dg-message operator=|no known con
 
 void bar (A a)
 {
-  a.foo () = 0; // { dg-error "A::foo\\(\\) = 0" }
+  a.foo () = 0; // { dg-error "operand types are 'A' and 'int'" }
   // { dg-message "candidate" "candidate note" { target *-*-* } 11 }
 }   
Index: testsuite/g++.dg/other/error16.C
===
--- testsuite/g++.dg/other/error16.C(revision 185603)
+++ testsuite/g++.dg/other/error16.C(working copy)
@@ -10,5 +10,5 @@ typedef Outer<X> XOuter;
 
 int main() {
   Outer<int>  ab;
-  ab.foo() == 1; // { dg-error "ab.Outer" }
+  ab.foo() == 1; // { dg-error "operand types are 'Outer<int>::Inner' and 'int'" }
 }
Index: testsuite/g++.dg/other/error10.C
===
--- testsuite/g++.dg/other/error10.C(revision 185603)
+++ testsuite/g++.dg/other/error10.C(working copy)
@@ -6,10 +6,9 @@ template<int> struct A {};
 
 template<int N>
 void foo(const A<N> &a)
-{ -A<N>(a); } // { dg-error "\\(\\* & a\\)" "" }
+{ -A<N>(a); } // { dg-error "operand type is 'A<0>'" }
 
 void bar()
 {
 foo(A<0>()); // { dg-message "required from here" "" }
 }
-
Index: testsuite/g++.dg/diagnostic/operator1.C
===
--- testsuite/g++.dg/diagnostic/operator1.C (revision 0)
+++ testsuite/g++.dg/diagnostic/operator1.C (revision 0)
@@ -0,0 +1,4 @@
+// PR c++/49152
+
+struct X { int x; }; 
+void trigger (X x []) { x [01] = 0; } // { dg-error "operand types are 'X' and 'int'" }
Index: testsuite/g++.dg/parse/error30.C
===
--- testsuite/g++.dg/parse/error30.C(revision 185603)
+++ testsuite/g++.dg/parse/error30.C(working copy)
@@ -8,5 +8,5 @@ struct A
   A(int);
 };
 
-A a = -A();	// { dg-error "10:no match for.*operator-.*in.*-A\\(\\)" }
-A b = -A(5);	// { dg-error "11:no match for.*operator-.*in.*-A\\(5\\)" }
+A a = -A();	// { dg-error "operand type is 'A'" }
+A b = -A(5);	// { dg-error "operand type is 'A'" }
Index: testsuite/g++.dg/cpp0x/lambda/lambda-err1.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-err1.C (revision 185603)
+++ 

Re: [google][4.6] Bug fixes to function reordering linker plugin to handle local and comdat functions. (issue 5851044)

2012-03-21 Thread tmsriram

Committed to google/gcc-4_6 after validation.

On 2012/03/21 05:07:33, davidxl wrote:

ok for google branches after checkin validation.



David




http://codereview.appspot.com/5851044/


[PATCH][Testsuite] XFAIL scev-3/4.c and add scev-5.c

2012-03-21 Thread Jiangning Liu
Hi,

This patch XFAILs scev-3.c and scev-4.c and adds scev-5.c.

The bug is going to be fixed after Richard Guenther fixes a series of
problems related to POINTER_PLUS_EXPR and sizetype precision.

Thanks,
-Jiangning 

ChangeLog for testsuite:

2012-03-21  Jiangning Liu  jiangning@arm.com

PR tree-optimization/52563
* gcc.dg/tree-ssa/scev-3.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-4.c: XFAIL on lp64.
* gcc.dg/tree-ssa/scev-5.c: New.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
index 28d5c93..ed63a18 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-3.c
@@ -14,5 +14,5 @@ f(int k)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
index 6c1e530..a538c32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-4.c
@@ -19,5 +19,5 @@ f(int k)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" { xfail lp64 } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
new file mode 100644
index 0000000..b9de36a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int *a_p;
+int a[1000];
+
+f(int k)
+{
+  long long i;
+
+  for (i=k; i<1000; i+=k) {
+    a_p = &a[i];
+    *a_p = 100;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "&a" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */





Re: [google][4.6]Bump param value of default function size limit for auto cloning

2012-03-21 Thread Sriraman Tallam
Submitted to google/gcc-4_6.

Thanks,
-Sri.

On Wed, Mar 21, 2012 at 11:38 AM, Xinliang David Li davi...@google.com wrote:
 ok.

 thanks,

 David

 On Wed, Mar 21, 2012 at 11:20 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

  I am bumping up the default param value of  function size limit for
 auto cloning. Since auto cloning happens on inlined functions, the
 original value does not catch some cases in one of our benchmarks.

  Automatic function versioning is only available in the
 google/gcc-4_6 branch. I am working on porting this to trunk. Please
 see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01230.html for
 description.

 Thanks,
 -Sri.


 Index: params.def
 ===
 --- params.def  (revision 185514)
 +++ params.def  (working copy)
 @@ -1040,7 +1040,7 @@ DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
  DEFPARAM (PARAM_MAX_FUNCTION_SIZE_FOR_AUTO_CLONING,
          autoclone-function-size-limit,
          Do not auto clone functions beyond this size.,
 -         450, 0, 10)
 +         5000, 0, 10)

  /*
  Local variables:


Re: [C++ Patch] PR 49152

2012-03-21 Thread Gabriel Dos Reis
On Wed, Mar 21, 2012 at 7:22 PM, Paolo Carlini paolo.carl...@oracle.com wrote:
 Hi,

 this diagnostic issue is about not even trying to print expressions in error
 messages involving operators, and print operand types instead. Just as an
 example, for:

 struct X { int x; };
 void trigger (X x []) { x [01] = 0; }

 we currently print:

 error: no match for ‘operator=’ in ‘*(x + 4u) = 0’

 which the patch changes to:

 error: no match for ‘operator=’ (operand types are ‘X’ and ‘int’)

 Or, for the existing other/error10.C, from:

 error: no match for ‘operator-’ in ‘-(* & a)’

 to

 error: no match for ‘operator-’ (operand type is ‘A<0>’)


Usually these things appear in much less simple expressions, possibly involving
the same symbol but with different meanings.  There ought to be a
way to give an indication of which symbol the diagnostic is about.
With the (imperfect) approach of printing expressions, at least some indication
is given of the expression involved.  Just printing the types with no indication
of what expression is causing trouble is more head-scratching.


 Jon and Manuel checked clang and I checked what icc does: without the caret,
 I don't think we can do *much* better here, but, wrt the audit trail

I think printing part or all of the expression is better in the absence of
carets.  We can improve on the pretty printing of expressions, for example.

 discussion, I'm proposing printing the actual operand types between
 parentheses - I got the general idea from icc - because we don't want to
 confuse parameters and arguments.

 Tested x86_64-linux.

 Thanks,
 Paolo.



PATCH COMMITTED: Add notes about Go to gcc-4.7/changes.html

2012-03-21 Thread Ian Lance Taylor
I committed this patch to add some notes about Go to
gcc-4.7/changes.html on the web site.

Ian

Index: gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.100
diff -u -r1.100 changes.html
--- gcc-4.7/changes.html	19 Mar 2012 16:05:46 -	1.100
+++ gcc-4.7/changes.html	22 Mar 2012 04:51:20 -
@@ -609,6 +609,17 @@
   </ul></li>
   </ul>
 
+<h3 id="go">Go</h3>
+  <ul>
+<li>GCC 4.7 implements
+  the <a href="http://weekly.golang.org/doc/go1.html">Go 1
+  language standard.</a>  The library support in 4.7.0 is not
+  quite complete, due to release timing.  Release 4.7.1 is
+  expected to include complete support.</li>
+<li>Go has been tested on GNU/Linux and Solaris platforms.  It may
+  work on other platforms as well.</li>
+  </ul>
+
 <!--
 <h3>Java (GCJ)</h3>
 -->