RE: [PATCH]Fix computation of offset in ivopt

2013-09-27 Thread bin.cheng


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of bin.cheng
 Sent: Friday, September 27, 2013 1:07 PM
 To: 'Richard Biener'
 Cc: GCC Patches
 Subject: RE: [PATCH]Fix computation of offset in ivopt
 
 
 
  -Original Message-
  From: Richard Biener [mailto:richard.guent...@gmail.com]
  Sent: Tuesday, September 24, 2013 6:31 PM
  To: Bin Cheng
  Cc: GCC Patches
  Subject: Re: [PATCH]Fix computation of offset in ivopt
 
  On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote:
 
  +   field = TREE_OPERAND (expr, 1);
  +   if (DECL_FIELD_BIT_OFFSET (field)
  +       && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
  + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
  +
  +   tmp = component_ref_field_offset (expr);
  +   if (top_compref
  +       && cst_and_fits_in_hwi (tmp))
  + {
  +   /* Strip the component reference completely.  */
  +   op0 = TREE_OPERAND (expr, 0);
  +   op0 = strip_offset_1 (op0, inside_addr, top_compref, off0);
  +   *offset = off0 + int_cst_value (tmp) + boffset /
 BITS_PER_UNIT;
  +   return op0;
  + }
 
  the failure paths seem mangled, that is, if cst_and_fits_in_hwi is
  false
 for
  either offset part you may end up doing half accounting and not
stripping.
 
  Btw, DECL_FIELD_BIT_OFFSET is always non-NULL.  I suggest to rewrite
  to
 
   if (!inside_addr)
 return orig_expr;
 
   tmp = component_ref_field_offset (expr);
   field = TREE_OPERAND (expr, 1);
   if (top_compref
       && cst_and_fits_in_hwi (tmp)
       && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
  {
...
  }
 Will be refined.
 
 
  note that this doesn't really handle overflows correctly as
 
  +   *offset = off0 + int_cst_value (tmp) + boffset /
  + BITS_PER_UNIT;
 
  may still overflow.
 Since it's unsigned + signed + signed, according to implicit conversion, the
 signed operands will be converted to unsigned, so the overflow would only
 happen when off0 is a huge number and tmp/boffset is a large positive number,
 right?  Do I need to check whether off0 is larger than the overflowed result?
 Also there is a signed/unsigned problem here; see below.
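
For reference, a minimal sketch of the conversion rule in play (hypothetical
values, assuming a 64-bit HOST_WIDE_INT):

  /* unsigned + signed: the signed operand is converted to unsigned and
     the sum is computed modulo 2^64, so a negative operand still
     "subtracts" correctly via wrap-around ...  */
  unsigned HOST_WIDE_INT off0 = 16;
  HOST_WIDE_INT tmp = -8;
  unsigned HOST_WIDE_INT sum = off0 + tmp;  /* 8 */

  /* ... but a huge off0 plus a large positive tmp silently wraps past
     the top, which is the overflow case described above.  */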
 
 
  @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data
 *data,
   bitmap_clear (*depends_on);
   }
 
  +  /* Sign-extend offset if utype has lower precision than
  +     HOST_WIDE_INT.  */
  +  offset = sext_hwi (offset, TYPE_PRECISION (utype));
  +
 
  offset is computed elsewhere in difference_cost and the bug to me
  seems that it is unsigned.  sign-extending it here is odd at least
  (and the
 extension
  should probably happen at sizetype precision, not that of utype).
 I agree.  The root cause is in strip_offset_1, in which offset is computed.
 Every time offset is computed in this function with a signed operand (like
 int_cst_value (tmp) above), we need to take care of the possible negative
 number problem.  Take this case as an example; we need to make the change
 below:
 
   case INTEGER_CST:
   //...
   *offset = int_cst_value (expr);
 change to
   case INTEGER_CST:
   //...
   *offset = sext_hwi (int_cst_value (expr), type);
 
 and
   case MULT_EXPR:
   //...
   *offset = sext_hwi (int_cst_value (expr), type);
 to
   case MULT_EXPR:
   //...
   HOST_WIDE_INT xxx = (HOST_WIDE_INT) off0 * int_cst_value (op1);
   *offset = sext_hwi (xxx, type);
 
 Any comments?

On second thought, I guess we can compute a signed offset in strip_offset_1 and
sign-extend it for strip_offset; thus we don't need to change every
computation of offset in that function.
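
For the record, sext_hwi from hwint.h sign-extends a value from a given bit
precision; a worked example of the proposed extension (assuming a 64-bit
HOST_WIDE_INT):

  /* sext_hwi (src, prec) treats the low PREC bits of SRC as a signed
     PREC-bit value and sign-extends it to the full HOST_WIDE_INT.  */
  HOST_WIDE_INT v = 0xfffffff8;        /* low 32 bits hold -8 */
  HOST_WIDE_INT s = sext_hwi (v, 32);  /* -8, not 4294967288 */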

Thanks.
bin





Re: OMP4/cilkplus: simd clone function mangling

2013-09-27 Thread Richard Biener
On Thu, Sep 26, 2013 at 9:35 PM, Aldy Hernandez al...@redhat.com wrote:
 +  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
 +     be cloned have a distinctive artificial label in addition to "omp
 +     declare simd".  */
 +  bool cilk_clone = flag_enable_cilkplus
 +    && lookup_attribute ("cilk plus elemental",
 +			 DECL_ATTRIBUTES (new_node->symbol.decl));
 +  if (cilk_clone)
 +    remove_attribute ("cilk plus elemental",
 +		      DECL_ATTRIBUTES (new_node->symbol.decl));


 Oh yeah, rth had asked me why I remove the attribute.  My initial thoughts
 were that whether or not a function is a simd clone can be accessed through
 the cgraph bits (node->simdclone != NULL for the clone, and
 node->has_simd_clones for the parent).  No sense keeping the attribute.
 But I can leave it if you think it's better.

Why have it in the first place if it's marked in the cgraph?

Richard.

 Aldy


Re: [google gcc-4_8] fix size_estimation for builtin_expect

2013-09-27 Thread Richard Biener
On Fri, Sep 27, 2013 at 12:23 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,

 builtin_expect should be a NOP in size_estimation. Indeed, the call
 stmt itself is 0 weight in size and time. But it may introduce
 an extra relation expr which has non-zero size/time. The end result
 is: with and without builtin_expect, we have different size/time estimations
 for early inlining.

 This patch fixes this problem.

 -Rong

 2013-09-26  Rong Xu  x...@google.com

   * ipa-inline-analysis.c (estimate_function_body_sizes): fix
 the size estimation for builtin_expect.
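
For context, a minimal example of the pattern being costed (hypothetical code,
not from the patch); with __builtin_expect the gimplifier emits an extra
comparison statement feeding the call, and it is roughly that statement's
size/time the patch cancels out:

  extern int bar (void);

  int
  foo (int a, int b)
  {
    /* Gimplifies to roughly:
	 t_1 = a < b;                      <- extra relation stmt, nonzero cost
	 u_2 = __builtin_expect (t_1, 0);  <- the call itself costs 0
	 if (u_2 != 0) ...  */
    if (__builtin_expect (a < b, 0))
      return bar ();
    return 0;
  }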

 This seems fine with a comment in the code explaining what it is about.
 I also think we want to support multiple builtin_expects in a BB so perhaps
 we want to have a pointer set of statements to ignore?

 To avoid spaghetti code, please just move the new logic into separate
 functions.

Looks like this could use tree-ssa.c:walk_use_def_chains (please
change its implementation as necessary, make it C++, etc. - you will
be the first user again).

Richard.

 Honza

 Index: ipa-inline-analysis.c
 ===
 --- ipa-inline-analysis.c (revision 202638)
 +++ ipa-inline-analysis.c (working copy)
 @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
/* Estimate static overhead for function prologue/epilogue and alignment. 
 */
int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
int size = overhead;
 +  gimple fix_expect_builtin;
 +
/* Benefits are scaled by probability of elimination that is in range
   0,2.  */
basic_block bb;
 @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
   }
   }

 +  fix_expect_builtin = NULL;
for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
   {
 gimple stmt = gsi_stmt (bsi);
 +   if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
 +{
 +  tree var = gimple_call_lhs (stmt);
 +  tree arg = gimple_call_arg (stmt, 0);
 +  use_operand_p use_p;
 +  gimple use_stmt;
 +  bool match = false;
 +  bool done = false;
 +  gcc_assert (var && arg);
 +  gcc_assert (TREE_CODE (var) == SSA_NAME);
 +
 +  while (TREE_CODE (arg) == SSA_NAME)
 +{
 +  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
 +  switch (gimple_assign_rhs_code (stmt_tmp))
 +{
 +  case LT_EXPR:
 +  case LE_EXPR:
 +  case GT_EXPR:
 +  case GE_EXPR:
 +  case EQ_EXPR:
 +  case NE_EXPR:
 +match = true;
 +done = true;
 +break;
 +  case NOP_EXPR:
 +break;
 +  default:
 +done = true;
 +break;
 +}
 +  if (done)
 +break;
 +  arg = gimple_assign_rhs1 (stmt_tmp);
 +}
 +
 +  if (match && single_imm_use (var, &use_p, &use_stmt))
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +{
 +  fix_expect_builtin = use_stmt;
 +}
 +}
 +
 +  /* We should see one builtin_expect call in one bb.  */
 +  break;
 +}
 +}
 +
 +  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
 + {
 +   gimple stmt = gsi_stmt (bsi);
 int this_size = estimate_num_insns (stmt, eni_size_weights);
 int this_time = estimate_num_insns (stmt, eni_time_weights);
 int prob;
 struct predicate will_be_nonconstant;

 +   if (stmt == fix_expect_builtin)
 +{
 +  this_size--;
 +  this_time--;
 +}
 +
 if (dump_file && (dump_flags & TDF_DETAILS))
   {
 fprintf (dump_file, "  ");



Re: [PATCH]Fix computation of offset in ivopt

2013-09-27 Thread Richard Biener
On Fri, Sep 27, 2013 at 7:07 AM, bin.cheng bin.ch...@arm.com wrote:


 -Original Message-
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Tuesday, September 24, 2013 6:31 PM
 To: Bin Cheng
 Cc: GCC Patches
 Subject: Re: [PATCH]Fix computation of offset in ivopt

 On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote:

 +   field = TREE_OPERAND (expr, 1);
 +   if (DECL_FIELD_BIT_OFFSET (field)
+       && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
 + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
 +
 +   tmp = component_ref_field_offset (expr);
 +   if (top_compref
+       && cst_and_fits_in_hwi (tmp))
 + {
 +   /* Strip the component reference completely.  */
 +   op0 = TREE_OPERAND (expr, 0);
 +   op0 = strip_offset_1 (op0, inside_addr, top_compref, off0);
 +   *offset = off0 + int_cst_value (tmp) + boffset /
 BITS_PER_UNIT;
 +   return op0;
 + }

 the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false
 for
 either offset part you may end up doing half accounting and not stripping.

 Btw, DECL_FIELD_BIT_OFFSET is always non-NULL.  I suggest to rewrite to

  if (!inside_addr)
return orig_expr;

  tmp = component_ref_field_offset (expr);
  field = TREE_OPERAND (expr, 1);
  if (top_compref
      && cst_and_fits_in_hwi (tmp)
      && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
 {
   ...
 }
 Will be refined.


 note that this doesn't really handle overflows correctly as

 +   *offset = off0 + int_cst_value (tmp) + boffset /
 + BITS_PER_UNIT;

 may still overflow.
 Since it's unsigned + signed + signed, according to implicit conversion,
 the signed operands will be converted to unsigned, so the overflow would only
 happen when off0 is a huge number and tmp/boffset is a large positive number,
 right?  Do I need to check whether off0 is larger than the overflowed
 result?  Also there is a signed/unsigned problem here; see below.


 @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
  bitmap_clear (*depends_on);
  }

 +  /* Sign-extend offset if utype has lower precision than
 +     HOST_WIDE_INT.  */
 +  offset = sext_hwi (offset, TYPE_PRECISION (utype));
 +

 offset is computed elsewhere in difference_cost and the bug to me seems
 that it is unsigned.  sign-extending it here is odd at least (and the
 extension
 should probably happen at sizetype precision, not that of utype).
 I agree.  The root cause is in strip_offset_1, in which offset is computed.
 Every time offset is computed in this function with a signed operand (like
 int_cst_value (tmp) above), we need to take care of the possible negative
 number problem.  Take this case as an example; we need to make the change below:

   case INTEGER_CST:
   //...
   *offset = int_cst_value (expr);
 change to
   case INTEGER_CST:
   //...
   *offset = sext_hwi (int_cst_value (expr), type);

 and
   case MULT_EXPR:
   //...
   *offset = sext_hwi (int_cst_value (expr), type);
 to
   case MULT_EXPR:
   //...
  HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1);
   *offset = sext_hwi (xxx, type);

 Any comments?

The issue is of course that we end up converting offsets to sizetype
at some point which makes them all appear unsigned.  The fix for this
is to simply interpret them as signed ... but it's really a mess ;)

Richard.

 Thanks.
 bin





Re: [PATCH, PR 57748] Check for out of bounds access, Part 2

2013-09-27 Thread Eric Botcazou
 Sure, but the modifier is not meant to force something into memory,
 especially when it is already in a register.  Remember, we are only
 talking of structures here, and we only want to access one member.
 
 It is more the other way round:
 It says: "You do not have to load the value into a register; if it is already
 in memory, I'm happy."

   EXPAND_MEMORY means we are interested in a memory result, even if
the memory is constant and we could have propagated a constant value.  */

We definitely want to propagate constant values here; look at the code below.
And it already lists explicit cases where we really need to spill to memory.

-- 
Eric Botcazou


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-27 Thread Eric Botcazou
 They don't need to be kept synchronised as such.  It's fine for the index
 to allow more than must_be_index_p.  But if you're not keen on the current
 structure, does the following look better?  Tested on x86_64-linux-gnu.
 
 Thanks,
 Richard
 
 
 gcc/
   * rtlanal.c (must_be_base_p, must_be_index_p): Delete.
   (binary_scale_code_p, get_base_term, get_index_term): New functions.
   (set_address_segment, set_address_base, set_address_index)
   (set_address_disp): Accept the argument unconditionally.
   (baseness): Remove must_be_base_p and must_be_index_p checks.
   (decompose_normal_address): Classify as much as possible in the
   main loop.

Yes, fine by me, thanks.

-- 
Eric Botcazou


Re: RFA: Store the REG_BR_PROB probability directly as an int

2013-09-27 Thread Eric Botcazou
 Thanks for the testing.  It also passes bootstrap on x86_64-linux-gnu.
 OK to install?

Yes, thanks.

-- 
Eric Botcazou


Re: [patch] Separate immediate uses and phi routines from tree-flow*.h

2013-09-27 Thread Richard Biener
On Thu, Sep 26, 2013 at 6:07 PM, Andrew MacLeod amacl...@redhat.com wrote:
 On 09/25/2013 04:49 AM, Richard Biener wrote:

 On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod amacl...@redhat.com
 wrote:

 This larger patch moves all the immediate use and operand routines from
 tree-flow.h into tree-ssa-operands.h.
 It also moves the basic phi routines and prototypes into a newly created
 tree-phinodes.h, or tree-ssa-operands.h if they belong there.
 And finally shuffles a couple of other routines which allows
 tree-ssa-operands.h to be removed from the gimple.h header file.

 of note or interest:

 1 - dump_decl_set() was defined in tree-into-ssa.c, but isn't really ssa
 specific. Its tree-specific, so normally I'd throw it into tree.c.
 Looking
 forward a little, its only used in a gimple context, so when we map to
 gimple_types it will need to be converted to/created for those. If it is
 in
 tree.c, I'll have to create a new version for gimple types, and then the
 routine in tree.c will become unused.  Based on that, I figured gimple.c
 is
 the place place for it.

 2 - has_zero_uses_1() and single_imm_use_1() were both in tree-cfg.c for
 some reason.. they've been moved to tree-ssa-operands.c

 3 - a few routines seem like basic gimple routines, but really turn out
 to
 require the operand infrastructure to implement... so they are moved to
 tree-ssa-operands.[ch] as well.  This sort of thing showed up when
 removing
 tree-ssa-operands.h from the gimple.h include file.  These were things
 like
 gimple_vuse_op, gimple_vdef_op, update_stmt, and update_stmt_if_modified

 Note that things like gimple_vuse_op are on the interface border between
 gimple (where the SSA operands are stored) and SSA operands.  So
 it's not so clear for them given they access internal gimple fields
 directly
 but use the regular SSA operand API.

 I'd prefer gimple_vuse_op and gimple_vdef_op to stay in gimple.[ch].


 Ugg. I incorporated what we talked about, and it was much messier than
 expected :-P.  I ended up with a chicken and egg problem between the
 gimple_v{use,def}_op routines in gimple-ssa.h  and the operand routines in
 tree-ssa-operands.h.   They both require each other, and I couldn't get
 things into a consistent state while they are in separate files.  It was
 actually the immediate use iterators which were requiring
 gimple_vuse_op()...  So I have created a new ssa-iterators.h file  to
 resolve this problem.  They build on the operand code and clearly has other
 prerequisites, so that seems reasonable to me...

 This in fact solves a couple of other little warts. It allows me to put both
 gimple_phi_arg_imm_use_ptr() and phi_arg_index_from_use() into
 tree-phinodes.h.

 It also exposes that gimple.c::walk_stmt_load_store_addr_ops() and friends
 actually depend on the existence of PHI nodes, meaning it really belongs on
 the gimple-ssa border as well. So I moved those into gimple-ssa.c

It doesn't depend on PHI nodes but it also works for PHI nodes.  So
I'd rather have it in gimple.c.

 And finally, it turns out that a lot of files include tree-flow.h and
 depend on it to include gimple.h rather than including it themselves.
 Since tree-flow.h is losing its kitchen-sink attribute, and I needed to move
 it to the bottom of the #include list for tree-ssa.h, I have temporarily
 included gimple.h at the top of tree-ssa.h to make sure it gets hauled in.
 When I remove tree-flow.h as the everyone includes it file, I'll add
 gimple.h in all the appropriate .c files and remove it from tree-ssa.h.   It
 would have just made this growing patch even more annoying for now.

 Does this seem reasonable?

Yes - try leaving walk_stmt_load_store_addr_ops in gimple.c though,
if that is technically possible.  Otherwise I guess I don't mind.

Thanks,
Richard.

 Bootstraps on x86_64-unknown-linux-gnu and currently running regressions.

 Andrew

 PS Oh and I noticed the macro name for tree-outof-ssa.h wasnt right, so I
 changed it too.

 Next I'll diverge into trying to sort through putting all the phi-related
 structs and such into tree-phinodes.h


Re: [PATCH, RTL] Prepare ARM build with LRA

2013-09-27 Thread Eric Botcazou
 below is a trivial patch, which makes both parts of the test signed.
 With this, bootstrap completes on powerpc-darwin9 - however, you might want
 to check that it still does what you intended.

Please install under PR middle-end/58547 if not already done.

-- 
Eric Botcazou


Re: Commit: MSP430: Pass -md on to assembler

2013-09-27 Thread nick clifton

Hi Mike,


I must say though, it seems wrong to have to provide a sign-extend pointer 
pattern when pointers (on the MSP430) are unsigned.


Agreed.  If we instead ask, is it sane for gcc to ever want to sign extend in 
this case, the answer appears to be no.  Why does it?  ptr_mode is SImode, and 
expand_builtin_next_arg is used to perform the addition in this mode.  It 
'just' knows that it can be sign extended… and just does it that way.  This 
seems like it is wrong.

Index: builtins.c
===
--- builtins.c  (revision 202634)
+++ builtins.c  (working copy)
@@ -4094,7 +4094,7 @@ expand_builtin_next_arg (void)
 return expand_binop (ptr_mode, add_optab,
		      crtl->args.internal_arg_pointer,
		      crtl->args.arg_offset_rtx,
-		      NULL_RTX, 0, OPTAB_LIB_WIDEN);
+		      NULL_RTX, POINTERS_EXTEND_UNSIGNED > 0, OPTAB_LIB_WIDEN);
  }

  /* Make it easier for the backends by protecting the valist argument

would fix this problem.  If this is done, the unmodified test case then doesn't 
abort.  Arguably, the extension should be done as the port directs.  It isn't 
clear to me why they do not.

Ok?
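
A minimal illustration of the difference at stake, widening a 20-bit address
to 32 bits (the values are hypothetical):

  #include <stdint.h>

  uint32_t addr20 = 0x80000;  /* valid 20-bit address with bit 19 set */

  /* Zero extension, what unsigned pointers need: 0x00080000.  */
  uint32_t zext = addr20 & 0xfffff;

  /* Sign extension from bit 19, what the generic code was doing:
     0xfff80000, a different address entirely.  */
  int32_t sext = (int32_t) (addr20 << 12) >> 12;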



OK by me, although I cannot approve that particular patch.

I did eventually find some test cases that exercised the sign-extend 
pointer pattern, so I was able to check the generated code - it worked OK.


But I ran into a very strange problem.  With your PARTIAL_INT_MODE_NAME 
patch applied GCC started erroneously eliminating NULL function pointer 
checks!  This was particularly noticeable in libgcc/crtstuff.c where for 
example:


  static void __attribute__((used))
  frame_dummy (void)
  {
static struct object object;
if (__register_frame_info)
  __register_frame_info (__EH_FRAME_BEGIN__, &object);

(this is a simplified version of the real code) ... is compiled as if it 
had been written as:


  static void __attribute__((used))
  frame_dummy (void)
  {
static struct object object;
 __register_frame_info (__EH_FRAME_BEGIN__, &object);


This only happens for the LARGE model (when pointers are PSImode) but I 
was baffled as to where it could be happening.  Have you come across 
anything like this?


Cheers
  Nick



Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set

2013-09-27 Thread Eric Botcazou
 Like the following.
 
 Bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
 Richard.
 
 2013-09-26  Richard Biener  rguent...@suse.de
 
   * alias.h (component_uses_parent_alias_set): Rename to ...
   (component_uses_parent_alias_set_from): ... this.
   * alias.c (component_uses_parent_alias_set): Rename to ...
   (component_uses_parent_alias_set_from): ... this and return
   the desired parent.
   (reference_alias_ptr_type_1): Use the result from
   component_uses_parent_alias_set_from instead of stripping
   components one at a time.
   * emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust.

FWIW it looks fine to me.

-- 
Eric Botcazou


Re: [ping] [PATCH] Silence an unused variable warning

2013-09-27 Thread Dodji Seketeli
Let's CC Vladimir on this easy one.

Cheers.

Jan-Benedict Glaw jbg...@lug-owl.de wrote:

 On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de 
 wrote:
 Hi!
 
 With the VAX target, I see this warning:
 
 g++ -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
 -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
 -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic 
 -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  
 -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. 
 -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include  
 -I../../../../gcc/gcc/../libdecnumber 
 -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
 -I../../../../gcc/gcc/../libbacktrace
 ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o
 ../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
 ../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable 
 ‘value_p’ [-Wunused-variable]
bool value_p;
 ^
 [...]

 Ping:

 http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01568.html
 `-- http://gcc.gnu.org/ml/gcc-patches/2013-09/txtnrNwaGiD3x.txt

 MfG, JBG

-- 
Dodji


Generic tuning in x86-tune.def 1/2

2013-09-27 Thread Jan Hubicka
Hi,
this is the second part of the generic tuning changes, sanitizing the tuning flags.
This patch again is supposed to deal with the obvious part only.
I will send separate patch for more changes.

The flags changed agree on all CPUs considered for generic (and their
optimization manuals) + amdfam10, core2 and Atom SLM.

I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
seems like an obvious omission (after double checking in the optimization
manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for Bulldozer cores.  The
implementation of this feature was always a bit weird and its main purpose was
to avoid terrible branch predictor degeneration on the older AMD branch
predictors.  I benchmarked both spec2k and 2k6 to verify there are no regressions.

Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements in 
specfp
benchmarks.

Bootstrapped/regtested x86_64-linux, will wait for comments and commit it
during the weekend.  I will be happy to revisit any of the generic tuning if
regressions pop up.

Overall this patch also brings small code size improvements for smaller
loads/stores and less padding at -O2. Differences are sub 0.1% however.

Honza
* x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for 
generic.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
(X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
(X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.
(X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
(X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.
Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def	(revision 202966)
+++ config/i386/x86-tune.def	(working copy)
@@ -115,9 +115,9 @@ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPEN
 	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMDFAM10
 	  | m_BDVER | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal",
-	  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER | m_SLM)
+	  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER | m_SLM | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
-	  m_COREI7 | m_BDVER | m_SLM)
+	  m_COREI7 | m_BDVER | m_BTVER | m_SLM | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, "sse_packed_single_insn_optimal",
 	  m_BDVER)
 /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
@@ -146,8 +146,7 @@ DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSION
 /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
    than 4 branch instructions in the 16 byte window.  */
 DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
-	  m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE
-	  | m_GENERIC)
+	  m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_ATHLON_K8 | m_AMDFAM10)
 DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
 	  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
 	  | m_AMD_MULTIPLE | m_GENERIC)
@@ -156,13 +155,13 @@ DEF_TUNE (X86_TUNE_USE_BT, "use_bt",
 DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec",
 	  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_GENERIC))
 DEF_TUNE (X86_TUNE_PAD_RETURNS, "pad_returns",
-	  m_AMD_MULTIPLE | m_GENERIC)
+	  m_ATHLON_K8 | m_AMDFAM10 | m_GENERIC)
 DEF_TUNE (X86_TUNE_PAD_SHORT_FUNCTION, "pad_short_function", m_ATOM)
 DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, "ext_80387_constants",
 	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
 	  | m_ATHLON_K8 | m_GENERIC)
 DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE, "avoid_vector_decode",
-	  m_K8 | m_GENERIC)
+	  m_K8)
 /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
    and SImode multiply, but 386 and 486 do HImode multiply faster.  */
 DEF_TUNE (X86_TUNE_PROMOTE_HIMODE_IMUL, "promote_himode_imul",
@@ -217,7 +216,7 @@ DEF_TUNE (X86_TUNE_REASSOC_INT_TO_PARALL
 /* X86_TUNE_REASSOC_FP_TO_PARALLEL: Try to produce parallel computations
    during reassociation of fp computation.  */
 DEF_TUNE (X86_TUNE_REASSOC_FP_TO_PARALLEL, "reassoc_fp_to_parallel",
-	  m_ATOM | m_SLM | m_HASWELL | m_BDVER1 | m_BDVER2)
+	  m_ATOM | m_SLM | m_HASWELL | m_BDVER1 | m_BDVER2 | m_GENERIC)
 /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
    regs instead of memory.  */
 DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, "general_regs_sse_spill",


Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-27 Thread Jonathan Wakely
On 27 September 2013 03:15, Tim Shen wrote:
 POSIX ERE says that escaping an ordinary char, say "\n", is not
 permitted, because 'n' is not a special char. However, they also say
 that: "Implementations are permitted to extend the language to allow
 these. Conforming applications cannot use such constructs."

 So let's support it, so as not to surprise users.

 Booted and tested under -m32 and -m64
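
A minimal example of the construct in question (assumes C++11 <regex>); with
the patch the escape is accepted and treated as the ordinary character:

  #include <regex>
  #include <cassert>

  int main()
  {
    // "\\n" escapes the ordinary char 'n': strictly invalid POSIX ERE,
    // but accepted as an extension and matched as a literal 'n'.
    std::regex re("\\n", std::regex::extended);
    assert(std::regex_match("n", re));
  }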

I'm wondering whether we want to have a stricter mode that doesn't
allow them, to help users avoid creating non-portable programs.  We
could check the value of the preprocessor macro __STRICT_ANSI__, which
is set by -std=c++11 but not by -std=gnu++11, although that's not
really the right flag. We want something more like the GNU shell
utils' POSIXLY_CORRECT.


Re: User-define literals for std::complex.

2013-09-27 Thread Jonathan Wakely
On 27 September 2013 05:17, Ed Smith-Rowland wrote:

 The complex user-defined literals finally passed (n3779) with the resolution
 to DR1473 allowing the suffix id to touch the quotes (I can't find it but I
 put it in not too long ago).

I think it's been approved by the LWG and looks like it will go to a
vote by the full committee, but let's wait for that to pass before
making any changes.


Re: [gomp4] Library side of depend clause support

2013-09-27 Thread Jakub Jelinek
On Fri, Sep 27, 2013 at 01:48:36AM +0200, Jakub Jelinek wrote:
 Perhaps.  What if I do just minor cleanup (use flexible array members for
 the reallocated vectors, and perhaps keep only the last out/inout task
 in the hash table chains rather than all of them), retest, commit and then
 we can discuss/incrementally improve it?

Here is what I've committed now; the incremental changes were really only
using a structure with a flexible array member for the dependers vectors,
removing/making redundant earlier !ent->is_in entries when adding !is_in into
the chain, and the addition of new testcases.

Let's improve it incrementally later.
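
For reference, a minimal use of the clause this implements (OpenMP 4.0 task
dependencies; the consume function is hypothetical):

  extern void consume (int);
  int x;

  void
  producer_consumer (void)
  {
    #pragma omp parallel
    #pragma omp single
    {
      #pragma omp task depend(out: x)
      x = 42;                   /* runs first */
      #pragma omp task depend(in: x)
      consume (x);              /* waits for the out task above */
    }
  }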

2013-09-27  Jakub Jelinek  ja...@redhat.com

* libgomp.h: Include stdlib.h.
(struct gomp_task_depend_entry,
struct gomp_dependers_vec): New types.
(struct gomp_task): Add dependers, depend_hash, depend_count,
num_dependees and depend fields.
(struct gomp_taskgroup): Add num_children field.
(gomp_finish_task): Free depend_hash if non-NULL.
* libgomp_g.h (GOMP_task): Add depend argument.
* hashtab.h: New file.
* task.c: Include hashtab.h.
(hash_entry_type): New typedef.
(htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
(gomp_init_task): Clear dependers, depend_hash and depend_count
fields.
(GOMP_task): Add depend argument, handle depend clauses.  Increment
num_children field in taskgroup.
(gomp_task_run_pre): Don't increment task_running_count here,
nor clear task_pending bit.
(gomp_task_run_post_handle_depend_hash,
gomp_task_run_post_handle_dependers,
gomp_task_run_post_handle_depend): New functions.
(gomp_task_run_post_remove_parent): Clear in_taskwait before
signalling corresponding semaphore.
(gomp_task_run_post_remove_taskgroup): Decrement num_children
field and make the decrement to 0 MEMMODEL_RELEASE operation,
rather than storing NULL to taskgroup->children.  Clear
in_taskgroup_wait before signalling corresponding semaphore.
(gomp_barrier_handle_tasks): Move task_running_count increment
and task_pending bit clearing here.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks
have been queued, wake other threads if needed.
(GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
than one new tasks have been queued, wake other threads if needed.
After waiting on taskwait_sem, enter critical section again.
(GOMP_taskgroup_start): Initialize num_children field.
(GOMP_taskgroup_end): Check num_children instead of children
before critical section.  If children is NULL, but num_children
is non-zero, wait on taskgroup_sem.  Call
gomp_task_run_post_handle_depend.  If more than one new tasks have
been queued, wake other threads if needed.  After waiting on
taskgroup_sem, enter critical section again.
* testsuite/libgomp.c/depend-1.c: New test.
* testsuite/libgomp.c/depend-2.c: New test.
* testsuite/libgomp.c/depend-3.c: New test.
* testsuite/libgomp.c/depend-4.c: New test.

--- libgomp/libgomp.h.jj	2013-09-26 09:43:10.903930832 +0200
+++ libgomp/libgomp.h	2013-09-27 09:05:17.025402127 +0200
@@ -39,6 +39,7 @@
 
 #include <pthread.h>
 #include <stdbool.h>
+#include <stdlib.h>
 
 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -253,7 +254,26 @@ enum gomp_task_kind
   GOMP_TASK_TIED
 };
 
+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};
+
+struct gomp_dependers_vec
+{
+  size_t n_elem;
+  size_t allocated;
+  struct gomp_task *elem[];
+};
 
 /* This structure describes a task to be run by a thread.  */
 
@@ -268,6 +288,10 @@ struct gomp_task
   struct gomp_task *next_taskgroup;
   struct gomp_task *prev_taskgroup;
   struct gomp_taskgroup *taskgroup;
+  struct gomp_dependers_vec *dependers;
+  struct htab *depend_hash;
+  size_t depend_count;
+  size_t num_dependees;
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -277,6 +301,7 @@ struct gomp_task
   bool final_task;
   bool copy_ctors_done;
   gomp_sem_t taskwait_sem;
+  struct gomp_task_depend_entry depend[];
 };
 
 struct gomp_taskgroup
@@ -286,6 +311,7 @@ struct gomp_taskgroup
   bool in_taskgroup_wait;
   bool cancelled;
   gomp_sem_t taskgroup_sem;
+  size_t num_children;
 };
 
 /* This structure describes a team of threads.  These are the threads
@@ -525,6 +551,8 @@ extern void gomp_barrier_handle_tasks (g
 static void inline
 gomp_finish_task (struct gomp_task *task)
 {
+  if (__builtin_expect (task->depend_hash != NULL, 0))
+    free (task->depend_hash);
   gomp_sem_destroy (task->taskwait_sem);
 }
 
--- 

[PING] [C++ PATCH] demangler fix (take 2)

2013-09-27 Thread Gary Benson
Gary Benson wrote:
 Hi all,
 
 This is a resubmission of my previous demangler fix [1] rewritten
 to avoid using hashtables and other libiberty features.
 
 From the above referenced email:
 
 d_print_comp maintains a certain amount of scope across calls (namely
 a stack of templates) which is used when evaluating references in
 template argument lists.  If such a reference is later used from a
 substitution then the scope in force at the time of the substitution is
 used.  This appears to be wrong (I say appears because I couldn't find
 anything in the API [2] to clarify this).
 
 The attached patch causes the demangler to capture the scope the first
 time such a reference is traversed, and to use that captured scope on
 subsequent traversals.  This fixes GDB PR 14963 [3] whereby a
 reference is resolved against the wrong template, causing an infinite
 loop and eventual stack overflow and segmentation fault.
 
 I've added the result to the demangler test suite, but I know of no
 way to check the validity of the demangled symbol other than by
 inspection (and I am no expert here!)  If anybody knows a way to
 check this then please let me know!  Otherwise, I hope this
 not-really-checked demangled version is acceptable.
 
 Thanks,
 Gary
 
 [1] http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00215.html
 [2] http://mentorembedded.github.io/cxx-abi/abi.html#mangling
 [3] http://sourceware.org/bugzilla/show_bug.cgi?id=14963
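
For anyone wanting to eyeball demanglings by hand, a minimal harness using
libiberty's public entry point (the mangled name below is just an arbitrary
example):

  #include <stdio.h>
  #include <stdlib.h>
  #include "demangle.h"

  int
  main (void)
  {
    /* DMGL_PARAMS | DMGL_ANSI: include parameter types in the output.  */
    char *s = cplus_demangle_v3 ("_Z3fooIiEvT_", DMGL_PARAMS | DMGL_ANSI);
    if (s != NULL)
      {
	printf ("%s\n", s);	/* prints: void foo<int>(int) */
	free (s);
      }
    return 0;
  }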
 
 -- 
 http://gbenson.net/
diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 89e108a..2ff8216 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,20 @@
+2013-09-17  Gary Benson  gben...@redhat.com
+
+   * cp-demangle.c (struct d_saved_scope): New structure.
+   (struct d_print_info): New fields saved_scopes and
+   num_saved_scopes.
+   (d_print_init): Initialize the above.
+   (d_print_free): New function.
+   (cplus_demangle_print_callback): Call the above.
+   (d_copy_templates): New function.
+   (d_print_comp): New variables saved_templates and
+   need_template_restore.
+   [DEMANGLE_COMPONENT_REFERENCE,
+   DEMANGLE_COMPONENT_RVALUE_REFERENCE]: Capture scope the first
+   time the component is traversed, and use the captured scope for
+   subsequent traversals.
+   * testsuite/demangle-expected: Add regression test.
+
 2013-09-10  Paolo Carlini  paolo.carl...@oracle.com
 
PR bootstrap/58386
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 70f5438..a199f6d 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -275,6 +275,18 @@ struct d_growable_string
   int allocation_failure;
 };
 
+/* A demangle component and some scope captured when it was first
+   traversed.  */
+
+struct d_saved_scope
+{
+  /* The component whose scope this is.  */
+  const struct demangle_component *container;
+  /* The list of templates, if any, that was current when this
+ scope was captured.  */
+  struct d_print_template *templates;
+};
+
 enum { D_PRINT_BUFFER_LENGTH = 256 };
 struct d_print_info
 {
@@ -302,6 +314,10 @@ struct d_print_info
   int pack_index;
   /* Number of d_print_flush calls so far.  */
   unsigned long int flush_count;
+  /* Array of saved scopes for evaluating substitutions.  */
+  struct d_saved_scope *saved_scopes;
+  /* Number of saved scopes in the above array.  */
+  int num_saved_scopes;
 };
 
 #ifdef CP_DEMANGLE_DEBUG
@@ -3665,6 +3681,30 @@ d_print_init (struct d_print_info *dpi, demangle_callbackref callback,
   dpi->opaque = opaque;
 
   dpi->demangle_failure = 0;
+
+  dpi->saved_scopes = NULL;
+  dpi->num_saved_scopes = 0;
+}
+
+/* Free a print information structure.  */
+
+static void
+d_print_free (struct d_print_info *dpi)
+{
+  int i;
+
+  for (i = 0; i < dpi->num_saved_scopes; i++)
+    {
+      struct d_print_template *ts, *tn;
+
+      for (ts = dpi->saved_scopes[i].templates; ts != NULL; ts = tn)
+	{
+	  tn = ts->next;
+	  free (ts);
+	}
+    }
+
+  free (dpi->saved_scopes);
 }
 
 /* Indicate that an error occurred during printing, and test for error.  */
@@ -3749,6 +3789,7 @@ cplus_demangle_print_callback (int options, demangle_callbackref callback, void *opaque)
 {
   struct d_print_info dpi;
+  int success;
 
   d_print_init (&dpi, callback, opaque);
 
@@ -3756,7 +3797,9 @@ cplus_demangle_print_callback (int options,
 
   d_print_flush (&dpi);
 
-  return ! d_print_saw_error (&dpi);
+  success = ! d_print_saw_error (&dpi);
+  d_print_free (&dpi);
+  return success;
 }
 
 /* Turn components into a human readable string.  OPTIONS is the
@@ -3913,6 +3956,36 @@ d_print_subexpr (struct d_print_info *dpi, int options,
 d_append_char (dpi, ')');
 }
 
+/* Return a shallow copy of the current list of templates.
+   On error d_print_error is called and a partial list may
+   be returned.  Whatever is returned must be freed.  */
+
+static struct d_print_template *
+d_copy_templates (struct 

[patch] Fix PR bootstrap/58509

2013-09-27 Thread Eric Botcazou
Hi,

this fixes the ICE during the build of the Ada runtime on the SPARC, a fallout 
of the recent inliner changes:
  http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01033.html

The ICE is triggered because the ldd peephole merges a MEM with MEM_NOTRAP_P 
and a contiguous MEM without MEM_NOTRAP_P, keeping the MEM_NOTRAP_P flag on 
the result.  As a consequence, an EH edge is eliminated and a BB is orphaned.

I think this shows that my above inliner patch was too gross: when you have 
successive inlining, you can quickly end up with a mess of trapping and non-
trapping memory accesses for the same object.  So the attached seriously 
refines it, restricting it to parameters with reference type and leaning 
towards being less conservative.  Again, this should only affect Ada.

Tested on x86_64-suse-linux, OK for the mainline?


2013-09-27  Eric Botcazou  ebotca...@adacore.com

PR bootstrap/58509
* ipa-prop.h (get_ancestor_addr_info): Declare.
* ipa-prop.c (get_ancestor_addr_info): Make public.
* tree-inline.c (is_parm): Rename into...
(is_ref_parm): ...this.
(is_based_on_ref_parm): New predicate.
(remap_gimple_op_r): Do not propagate TREE_THIS_NOTRAP on MEM_REF if
a parameter with reference type has been remapped and the result is
not based on another parameter with reference type.
(copy_tree_body_r): Likewise on INDIRECT_REF and MEM_REF.


2013-09-27  Eric Botcazou  ebotca...@adacore.com

* gnat.dg/specs/opt1.ads: New test.


-- 
Eric BotcazouIndex: tree-inline.c
===
--- tree-inline.c	(revision 202912)
+++ tree-inline.c	(working copy)
@@ -751,10 +751,11 @@ copy_gimple_bind (gimple stmt, copy_body
   return new_bind;
 }
 
-/* Return true if DECL is a parameter or a SSA_NAME for a parameter.  */
+/* Return true if DECL is a parameter with reference type or a SSA_NAME
+  for a parameter with reference type.  */
 
 static bool
-is_parm (tree decl)
+is_ref_parm (tree decl)
 {
   if (TREE_CODE (decl) == SSA_NAME)
 {
@@ -763,7 +764,40 @@ is_parm (tree decl)
 	return false;
 }
 
-  return (TREE_CODE (decl) == PARM_DECL);
+  return (TREE_CODE (decl) == PARM_DECL
+	  && TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE);
+}
+
+/* Return true if DECL is based on a parameter with reference type or a
+   SSA_NAME for a parameter with reference type.  */
+
+static bool
+is_based_on_ref_parm (tree decl)
+{
+  HOST_WIDE_INT offset;
+  tree obj, expr;
+  gimple def_stmt;
+
+  /* First the easy case.  */
+  if (is_ref_parm (decl))
+return true;
+
+  /* Then look for an SSA name whose defining statement is of the form:
+
+  D.1718_7 = parm_2(D)->f1;
+
+ where parm_2 is a parameter with reference type.  */
+  if (TREE_CODE (decl) != SSA_NAME)
+return false;
+  def_stmt = SSA_NAME_DEF_STMT (decl);
+  if (!def_stmt)
+return false;
+
+  expr = get_ancestor_addr_info (def_stmt, &obj, &offset);
+  if (!expr)
+return false;
+
+  return is_ref_parm (TREE_OPERAND (expr, 0));
 }
 
 /* Remap the GIMPLE operand pointed to by *TP.  DATA is really a
@@ -865,12 +899,13 @@ remap_gimple_op_r (tree *tp, int *walk_s
 	  TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old);
 	  TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old);
 	  TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old);
-	  /* We cannot propagate the TREE_THIS_NOTRAP flag if we have
-	 remapped a parameter as the property might be valid only
-	 for the parameter itself.  */
+	  /* We cannot always propagate the TREE_THIS_NOTRAP flag if we have
+	 remapped a parameter with reference type as the property may be
+	 valid only for the parameter.  */
 	  if (TREE_THIS_NOTRAP (old)
-	      && (!is_parm (TREE_OPERAND (old, 0))
-		  || (!id->transform_parameter && is_parm (ptr))))
+	      && (!is_ref_parm (TREE_OPERAND (old, 0))
+		  || !id->transform_parameter
+		  || is_based_on_ref_parm (ptr)))
 	TREE_THIS_NOTRAP (*tp) = 1;
 	  *walk_subtrees = 0;
 	  return NULL;
@@ -1092,12 +1127,13 @@ copy_tree_body_r (tree *tp, int *walk_su
 		  TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old);
 		  TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old);
 		  TREE_READONLY (*tp) = TREE_READONLY (old);
-		  /* We cannot propagate the TREE_THIS_NOTRAP flag if we
-			 have remapped a parameter as the property might be
-			 valid only for the parameter itself.  */
+		  /* We cannot always propagate the TREE_THIS_NOTRAP flag
+			 if we have remapped a parameter with reference type as
+			 the property may be valid only for the parameter.  */
 		  if (TREE_THIS_NOTRAP (old)
-			      && (!is_parm (TREE_OPERAND (old, 0))
-			  || (!id->transform_parameter && is_parm (ptr))))
+			      && (!is_ref_parm (TREE_OPERAND (old, 0))
+			  || !id->transform_parameter
+			  || is_based_on_ref_parm (ptr)))
 		TREE_THIS_NOTRAP (*tp) = 1;
 		}
 		}
@@ -1118,12 +1154,13 @@ copy_tree_body_r (tree *tp, int 

Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-27 Thread Paolo Carlini

On 9/27/13 4:34 AM, Jonathan Wakely wrote:

On 27 September 2013 03:15, Tim Shen wrote:

POSIX ERE says that escaping an ordinary char, say "\n", is not
permitted, because 'n' is not a special char. However, they also say
that: "Implementations are permitted to extend the language to allow
these. Conforming applications cannot use such constructs."

So let's support it, so as not to surprise users.

Booted and tested under -m32 and -m64

I'm wondering whether we want to have a stricter mode that doesn't
allow them, to help users avoid creating non-portable programs.  We
could check the value of the preprocessor macro __STRICT_ANSI__, which
is set by -std=c++11 but not by -std=gnu++11, although that's not
really the right flag. We want something more like the GNU shell
utils' POSIXLY_CORRECT.
Indeed. I think that for now __STRICT_ANSI__ can do. It's important to 
manage to accept those; otherwise, as we discovered yesterday, we easily 
reject quite a few rather sensible regexes users can write or find in 
examples: this started when Tim, upon my suggestion, tried the examples 
in the new edition of Nicolai Josuttis' book and found in one of those an 
escaped closing curly bracket (note, closing; open ones are definitely fine), 
which apparently most of the other implementations do not reject.


Paolo.


Re: OMP4/cilkplus: simd clone function mangling

2013-09-27 Thread Aldy Hernandez

On 09/27/13 03:18, Richard Biener wrote:

On Thu, Sep 26, 2013 at 9:35 PM, Aldy Hernandez al...@redhat.com wrote:

+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+     be cloned have a distinctive artificial label in addition to "omp
+     declare simd".  */
+  bool cilk_clone = flag_enable_cilkplus
+    && lookup_attribute ("cilk plus elemental",
+			 DECL_ATTRIBUTES (new_node->symbol.decl));
+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+		      DECL_ATTRIBUTES (new_node->symbol.decl));



Oh yeah, rth had asked me why I remove the attribute.  My initial thoughts
were that whether or not a function is a simd clone can be accessed through
the cgraph bits (node-simdclone != NULL for the clone, and
node-has_simd_clones for the parent).  No sense keeping the attribute.
But I can leave it if you think it's better.


Why have it in the first place if it's marked in the cgraph?


It would be placed there by the front-end when parsing Cilk Plus 
simd-enabled functions.  It's only in the omp stage that we transfer 
that information to the cgraph bits.




Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-27 Thread Jonathan Wakely
On 27 September 2013 13:32, Paolo Carlini wrote:
 On 9/27/13 4:34 AM, Jonathan Wakely wrote:

 On 27 September 2013 03:15, Tim Shen wrote:

 POSIX ERE says that escaping an ordinary char, say "\n", is not
 permitted, because 'n' is not a special char. However, they also say
 that: "Implementations are permitted to extend the language to allow
 these. Conforming applications cannot use such constructs."

 So let's support it, so as not to surprise users.

 Booted and tested under -m32 and -m64

 I'm wondering whether we want to have a stricter mode that doesn't
 allow them, to help users avoid creating non-portable programs.  We
 could check the value of the preprocessor macro __STRICT_ANSI__, which
 is set by -std=c++11 but not by -std=gnu++11, although that's not
 really the right flag. We want something more like the GNU shell
 utils' POSIXLY_CORRECT.

 Indeed. I think that for now __STRICT_ANSI__ can do. It's important to
 manage to accept those; otherwise, as we discovered yesterday, we easily
 reject quite a few rather sensible regexes users can write or find in
 examples: this started when Tim, upon my suggestion, tried the examples in
 the new edition of Nicolai Josuttis' book and found in one of those an escaped
 closing curly bracket (note, closing; open ones are definitely fine), which
 apparently most of the other implementations do not reject.

Ah I see.  I definitely agree it's good to accept that instead of
being unnecessarily strict, but other people will want the option of
strict conformance, so I think we can please everyone with something
like:

else
  {
#ifdef __STRICT_ANSI__
    __throw_regex_error(regex_constants::error_escape);
#else
    _M_token = _S_token_ord_char;
    _M_value.assign(1, __c);
#endif
  }


[committed] Fix move_sese_region_to_fn (PR middle-end/58551)

2013-09-27 Thread Jakub Jelinek
Hi!

I've committed the following fix to a regression introduced in 4.9
early loop construction.  SESE regions, as documented above
move_sese_region_to_fn, are allowed to contain calls to noreturn functions
like abort/exit.  But, basic blocks leading to noreturn functions aren't
actually placed in the loop inside of which the SESE region is present,
but directly inside of the outermost loop of the function.

So, we can't just change the loop_father of bbs belonging to
entry_bb's loop_father to the new function's outermost loop and move
loops which have their outer loop equal to entry_bb's loop_father
and have their header in the SESE region into the new function;
we also have to handle the outermost loop of the
original function the same way.

Bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by richi
on IRC, committed to trunk.

2013-09-27  Jakub Jelinek  ja...@redhat.com

PR middle-end/58551
* tree-cfg.c (move_sese_region_to_fn): Also move loops that
are children of outermost saved_cfun's loop, and set it up to
be moved to dest_cfun's outermost loop.  Fix up num_nodes adjustments
if loop != loop0 and SESE region contains bbs that belong to loop0.

* c-c++-common/gomp/pr58551.c: New test.

--- gcc/tree-cfg.c.jj   2013-09-13 14:41:28.0 +0200
+++ gcc/tree-cfg.c  2013-09-27 12:23:48.582217401 +0200
@@ -6662,12 +6662,13 @@ move_sese_region_to_fn (struct function
   struct function *saved_cfun = cfun;
   int *entry_flag, *exit_flag;
   unsigned *entry_prob, *exit_prob;
-  unsigned i, num_entry_edges, num_exit_edges;
+  unsigned i, num_entry_edges, num_exit_edges, num_nodes;
   edge e;
   edge_iterator ei;
   htab_t new_label_map;
   struct pointer_map_t *vars_map, *eh_map;
   struct loop *loop = entry_bb->loop_father;
+  struct loop *loop0 = get_loop (saved_cfun, 0);
   struct move_stmt_d d;
 
   /* If ENTRY does not strictly dominate EXIT, this cannot be an SESE
@@ -6760,16 +6761,29 @@ move_sese_region_to_fn (struct function
   set_loops_for_fn (dest_cfun, loops);
 
   /* Move the outlined loop tree part.  */
+  num_nodes = bbs.length ();
   FOR_EACH_VEC_ELT (bbs, i, bb)
 {
-      if (bb->loop_father->header == bb
-	  && loop_outer (bb->loop_father) == loop)
+      if (bb->loop_father->header == bb)
	{
	  struct loop *this_loop = bb->loop_father;
-	  flow_loop_tree_node_remove (bb->loop_father);
-	  flow_loop_tree_node_add (get_loop (dest_cfun, 0), this_loop);
-	  fixup_loop_arrays_after_move (saved_cfun, cfun, this_loop);
+ struct loop *outer = loop_outer (this_loop);
+ if (outer == loop
+ /* If the SESE region contains some bbs ending with
+a noreturn call, those are considered to belong
+to the outermost loop in saved_cfun, rather than
+the entry_bb's loop_father.  */
+ || outer == loop0)
+   {
+ if (outer != loop)
+   num_nodes -= this_loop-num_nodes;
+	      flow_loop_tree_node_remove (bb->loop_father);
+ flow_loop_tree_node_add (get_loop (dest_cfun, 0), this_loop);
+ fixup_loop_arrays_after_move (saved_cfun, cfun, this_loop);
+   }
}
+      else if (bb->loop_father == loop0 && loop0 != loop)
+   num_nodes--;
 
   /* Remove loop exits from the outlined region.  */
    if (loops_for_fn (saved_cfun)->exits)
@@ -6789,6 +6803,7 @@ move_sese_region_to_fn (struct function
 
   /* Setup a mapping to be used by move_block_to_fn.  */
   loop->aux = current_loops->tree_root;
+  loop0->aux = current_loops->tree_root;
 
   pop_cfun ();
 
@@ -6817,11 +6832,13 @@ move_sese_region_to_fn (struct function
 }
 
   loop->aux = NULL;
+  loop0->aux = NULL;
   /* Loop sizes are no longer correct, fix them up.  */
-  loop->num_nodes -= bbs.length ();
+  loop->num_nodes -= num_nodes;
   for (struct loop *outer = loop_outer (loop);
       outer; outer = loop_outer (outer))
-    outer->num_nodes -= bbs.length ();
+    outer->num_nodes -= num_nodes;
+  loop0->num_nodes -= bbs.length () - num_nodes;
 
   if (saved_cfun->has_simduid_loops || saved_cfun->has_force_vect_loops)
 {
--- gcc/testsuite/c-c++-common/gomp/pr58551.c.jj	2013-09-27 11:18:20.825251967 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr58551.c	2013-09-27 11:17:56.0 +0200
@@ -0,0 +1,33 @@
+/* PR middle-end/58551 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -fopenmp" } */
+
+void
+foo (int *a)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+#pragma omp task
+if (a[i])
+  __builtin_abort ();
+}
+
+void bar (int, int);
+
+void
+baz (int *a)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+#pragma omp task
+if (a[i])
+  {
+   int j, k;
+	for (j = 0; j < 10; j++)
+	  for (k = 0; k < 8; k++)
+	    bar (j, k);
+	for (k = 0; k < 12; k++)
+ bar (-1, k);
+   __builtin_abort ();
+  }
+}


Jakub


[patch] fix libstdc++/57465

2013-09-27 Thread Jonathan Wakely
PR libstdc++/57465
* include/std/functional
(_Function_base::_Base_manager::_M_not_empty_function): Fix overload
for pointers.
* testsuite/20_util/function/cons/57465.cc: New.


Tested x86_64-linux, committed to trunk.  I'll apply it to the
branches after it's been on trunk without problems for a while.
commit 55531e9c74a5f2b4699250b6b302d49f7dc8c5ae
Author: Jonathan Wakely jwakely@gmail.com
Date:   Wed Aug 7 01:38:39 2013 +0100

PR libstdc++/57465
* include/std/functional
(_Function_base::_Base_manager::_M_not_empty_function): Fix overload
for pointers.
* testsuite/20_util/function/cons/57465.cc: New.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 63ba777..73cddfe 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1932,7 +1932,7 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type)
 
templatetypename _Tp
  static bool
- _M_not_empty_function(const _Tp* __fp)
+ _M_not_empty_function(_Tp* const __fp)
  { return __fp; }
 
templatetypename _Class, typename _Tp
diff --git a/libstdc++-v3/testsuite/20_util/function/cons/57465.cc 
b/libstdc++-v3/testsuite/20_util/function/cons/57465.cc
new file mode 100644
index 000..44413fb
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function/cons/57465.cc
@@ -0,0 +1,31 @@
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// libstdc++/57465
+
+// { dg-options "-std=gnu++11" }
+
+#include <functional>
+#include <testsuite_hooks.h>
+
+int main()
+{
+  using F = void();
+  F* f = nullptr;
+  std::functionF x(f);
+  VERIFY( !x );
+}


[PATCH] Invalid unpoisoning of stack redzones on ARM

2013-09-27 Thread Yury Gribov

Hi all,

I've recently submitted a bug report regarding invalid unpoisoning of 
stack frame redzones 
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take 
a look at the proposed patch (a simple one-liner) and check whether it's ok 
for commit?


Thanks!

-Yuri
diff --git a/gcc/asan.c b/gcc/asan.c
index 32f1837..acb00ea 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 
   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0)));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-09-27 Thread Teresa Johnson
On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka hubi...@ucw.cz wrote:

 Why not just have probably_never_executed_bb_p simply return
 false when bb->frequency is non-zero (right now it does the opposite -

 We want to have frequencies guessed for functions that were not trained
 in the profiling run (that was the patch I posted earlier that I think did
 not go in yet).

Right, but for splitting and bb layout purposes, for these statically
guessed unprofiled functions we in fact don't want to do any splitting
or treat the bbs as never executed (which shouldn't be a change from
the status quo since all the bbs in these functions are currently 0
weight, it's only when we inline in the case of comdats that they
appear colder than the surrounding code, but in fact we don't want
this).

The only other caller to probably_never_executed_bb_p is
compute_function_frequency, but in the case of statically guessed
functions they will have profile_status != PROFILE_READ and won't
invoke probably_never_executed_bb_p. But re-reading our most recent
exchange on the comdat profile issue, it sounds like you were
suggesting guessing profiles for all 0-weight functions early, then
dropping them from PROFILE_READ to PROFILE_GUESSED only once we
determine in ipa-inline that there is a potentially non-zero call path
to them. In that case with the change I describe above to
probably_never_executed_bb_p, the 0-weight functions with 0 calls to
them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would
be bad as they would not be size optimized or moved into the cold
section.

So it seems like we want different handling of these guessed
frequencies in compute_function_frequency and bb-reorder.c. Actually I
think we can handle this by checking if the function entry block has a
0 count. If so, then we just look at the bb counts and not the
frequencies for determining bb hotness as the frequencies would
presumably have been statically-guessed. This will ensure that the
cgraph node continues to be marked unlikely and size-optimized. If the
function entry block has a non-zero count, then we look at both the bb
count and the bb frequency - if they are both zero then the bb is
probably never executed, but if either is non-zero then we should
treat the block as possibly executed (which will come into play for
splitting and bb layout).
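
(A sketch of that check; hypothetical code, not anything committed:)

  static bool
  bb_maybe_executed_p (basic_block bb)
  {
    if (ENTRY_BLOCK_PTR->count == 0)
      /* Entry count 0: the function was never sampled in the train run,
	 so its frequencies are statically guessed; judge by counts alone.  */
      return bb->count != 0;
    /* Profiled function: a non-zero count or a non-zero frequency
       suggests the block may be executed.  */
    return bb->count != 0 || bb->frequency != 0;
  }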

Teresa


 Currently I return true when the frequency indicates that the BB is executed
 at least in 1/4th of all executions.  With the cases discussed I see we may
 need to reduce this threshold.  In general I do not much like hard tests for 0
 because the meaning of 0 depends on REG_BR_FREQ_BASE, which is supposed to be
 changeable, and we may want to make frequencies sreal, too.

 I suppose we may introduce a --param for this.  You are also right that I should
 update probably_never_executed_edge_p (I intended so, but obviously the code
 ended up in mainline accidentally).

 I however saw at least one case of jump threading where this trick did not
 help: the jump threading update confused itself by scaling via counts rather
 than frequencies and ended up dropping everything to 0.  This makes it
 more tempting to try to go with sreals for those.

 Honza

 returns true when bb->frequency is 0)? Making this change removed a
 bunch of other failures. With this change as well, there are only 3
 cases that still fail with 1 train run that pass with 100. Need to
 look at those.

 
  Will you look into logic of do_jump or shall I try to dive in?

 I can take a look, but probably won't have a chance until late this
 week. If you don't get to it before then I will see if I can figure
 out why it is applying the branch probabilities this way.

 Teresa

 
  Honza



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: OMP4/cilkplus: simd clone function mangling

2013-09-27 Thread Jakub Jelinek
On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote:
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
return val;
  }
  
 +/* Return the default vector mangling ISA code when none is specified
 +   in a `processor' clause.  */
 +
 +static char
 +ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone
 + ATTRIBUTE_UNUSED)
 +{
 +  return 'x';
 +}

I think rth was suggesting using vecsize_mangle, vecsize_modifier or something 
else,
instead of ISA, because it won't represent the ISA on all targets.
It is just some magic letter used in mangling of the simd functions.

 +
 +  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
 + be cloned have a distinctive artificial label in addition to omp
 + declare simd.  */
  +  bool cilk_clone = flag_enable_cilkplus
  +    && lookup_attribute ("cilk plus elemental",
  +			  DECL_ATTRIBUTES (new_node->symbol.decl));

Formatting.  I'd say it should be
  bool cilk_clone
    = (flag_enable_cilkplus
       && lookup_attribute ("cilk plus elemental",
			    DECL_ATTRIBUTES (new_node->symbol.decl)));

 +  if (cilk_clone)
  +    remove_attribute ("cilk plus elemental",
  +		       DECL_ATTRIBUTES (new_node->symbol.decl));

I think it doesn't make sense to remove the attribute.

 +  pretty_printer vars_pp;

Do you really need two different pretty printers?
Can't you just print "_ZGV%c%c%d" into pp (is pp_printf
that cheap, wouldn't it be better to pp_string (pp, "_ZGV"),
2 pp_character + one pp_decimal_int?), and then do the loop over
the args, which right now writes into vars_pp, and finally
pp_underscore and pp_string the normally mangled name?
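
For concreteness, the single-printer scheme might look roughly like this
(a sketch using the pp_* helpers named above; vecsize_mangle, mask, simdlen,
nargs and arg_mangle[] are placeholders for the clone's properties):

  pretty_printer pp;
  pp_string (&pp, "_ZGV");
  pp_character (&pp, vecsize_mangle);	/* e.g. 'x' */
  pp_character (&pp, mask);		/* 'M' or 'N' */
  pp_decimal_int (&pp, simdlen);
  for (unsigned int i = 0; i < nargs; i++)
    pp_character (&pp, arg_mangle[i]);	/* 'v', 'l', 'u', ...  */
  pp_underscore (&pp);
  pp_string (&pp, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (old_decl)));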

 +/* Create a simd clone of OLD_NODE and return it.  */
 +
 +static struct cgraph_node *
 +simd_clone_create (struct cgraph_node *old_node)
 +{
 +  struct cgraph_node *new_node;
 +  new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
  +  NULL, NULL, "simdclone");
 +

My understanding of how IPA cloning etc. works is that you first
set up various data structures describing how you change the arguments
and only then actually do cgraph_function_versioning which already during
the copying will do some of the transformations of the IL.
But perhaps those transformations are too complicated to describe for
tree-inline.c to make them for you.

  +  tree attr = lookup_attribute ("omp declare simd",
  +				 DECL_ATTRIBUTES (node->symbol.decl));
 +  if (!attr)
 +return;
 +  do
 +{
 +  struct cgraph_node *new_node = simd_clone_create (node);
 +
 +  bool inbranch_clause;
 +  simd_clone_clauses_extract (new_node, TREE_VALUE (attr),
  +				  &inbranch_clause);
 +  simd_clone_compute_isa_and_simdlen (new_node);
 +  simd_clone_mangle (node, new_node);

As discussed on IRC, I was hoping that for OpenMP simd and selected
targets (e.g. i?86-linux and x86_64-linux) we could do better than that,
creating not just the one or two clones we create for Cilk+, where one can
select which CPU (and thus ISA) he wants to build the clones for, but
creating clones for all ISAs, and then, based on command line options,
either emit just one of them as the really optimized one and the others
just as thunks that call other simd clone functions or the normal
function, possibly several times.
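
As an illustration of the thunk idea (hypothetical code using GNU vector
extensions; the names and the 4-lane width are made up):

  typedef int v4si __attribute__ ((vector_size (16)));

  extern int foo (int);		/* the scalar original */

  /* A narrow clone emitted as a thunk: no vector code of its own,
     it just invokes the scalar function once per lane.  */
  v4si
  foo_simd4_thunk (v4si x)
  {
    v4si r;
    for (int i = 0; i < 4; i++)
      r[i] = foo (x[i]);
    return r;
  }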

Jakub


Re: [PATCH] Invalid unpoisoning of stack redzones on ARM

2013-09-27 Thread Jakub Jelinek
On Fri, Sep 27, 2013 at 06:10:41PM +0400, Yury Gribov wrote:
 Hi all,
 
 I've recently submitted a bug report regarding invalid unpoisoning
 of stack frame redzones
 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone
 take a look at the proposed patch (a simple one-liner) and check whether
 it's ok for commit?

Can you please be more verbose on why you think it is the right fix,
what exactly the problem is and why force_reg wasn't sufficient?
What exactly was XEXP (shadow_mem, 0) that force_reg didn't force it into
a pseudo?

Also, you are missing a ChangeLog entry.

 diff --git a/gcc/asan.c b/gcc/asan.c
 index 32f1837..acb00ea 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
  
    gcc_assert ((len & 3) == 0);
top_label = gen_label_rtx ();
 -  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
 +  addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0)));
shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
end = force_reg (Pmode, plus_constant (Pmode, addr, len));
emit_label (top_label);


Jakub


Add value range support into memcpy/memset expansion

2013-09-27 Thread Jan Hubicka
Hi,
this patch makes it possible to access value range info from setmem/movstr
that I plan to use in the i386 memcpy/memset expansion code.  It is all quite
straightforward except that I need to deal with cases where the max size does
not fit in a HOST_WIDE_INT, where I use the maximal value as a marker.  It is
then translated as a NULL pointer to the expander, which is a bit inconsistent
with other places that use -1 as the marker of an unknown value.

I also think we lose some cases because of TER replacing the SSA_NAME by
something else, but it seems to work in quite many cases.  This can probably
be handled incrementally by disabling TER here or by finally getting away from
expanding calls via the generic route.
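
As a sketch of how a target expander might consume the two new operands
(emit_inline_copy and emit_call_to_memcpy are made-up helper names; the
NULL-means-unknown convention is the one described above):

  /* Hypothetical use inside a setmem/movmem expander.  Operands 7 and 8
     arrive as MIN_SIZE_RTX and MAX_SIZE_RTX; a NULL max means unknown.  */
  unsigned HOST_WIDE_INT min_size = UINTVAL (min_size_rtx);
  unsigned HOST_WIDE_INT max_size
    = max_size_rtx ? UINTVAL (max_size_rtx) : (unsigned HOST_WIDE_INT) -1;

  if (max_size <= 64)
    /* Small bounded block: an inline unrolled sequence wins.  */
    emit_inline_copy (dst, src, min_size, max_size);
  else
    /* Large or unbounded: fall back to a library call.  */
    emit_call_to_memcpy (dst, src, len_rtx);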

Bootstrapped/regtested x86_64-linux, OK?

Honza

* doc/md.texi (setmem, movstr): Update documentation.
* builtins.c (determine_block_size): New function.
(expand_builtin_memcpy): Use it and pass it to
emit_block_move_hints.
(expand_builtin_memset_args): Use it and pass it to
set_storage_via_setmem.
* expr.c (emit_block_move_via_movmem): Add min_size/max_size parameters;
update call to expander.
(emit_block_move_hints): Add min_size/max_size parameters.
(clear_storage_hints): Likewise.
(set_storage_via_setmem): Likewise.
(clear_storage): Update.
* expr.h (emit_block_move_hints, clear_storage_hints,
set_storage_via_setmem): Update prototype.

Index: doc/md.texi
===
--- doc/md.texi (revision 202968)
+++ doc/md.texi (working copy)
@@ -5198,6 +5198,9 @@ destination and source strings are opera
 the expansion of this pattern should store in operand 0 the address in
 which the @code{NUL} terminator was stored in the destination string.
 
+This pattern also has several optional operands that are the same as in
+@code{setmem}.
+
 @cindex @code{setmem@var{m}} instruction pattern
 @item @samp{setmem@var{m}}
 Block set instruction.  The destination string is the first operand,
@@ -5217,6 +5220,8 @@ respectively.  The expected alignment di
 in a way that the blocks are not required to be aligned according to it in
 all cases. This expected alignment is also in bytes, just like operand 4.
 Expected size, when unknown, is set to @code{(const_int -1)}.
+Operand 7 is the minimal size of the block and operand 8 is the
+maximal size of the block (NULL if it can not be represented as CONST_INT).
 
 The use for multiple @code{setmem@var{m}} is as for @code{movmem@var{m}}.
 
Index: builtins.c
===
--- builtins.c  (revision 202968)
+++ builtins.c  (working copy)
@@ -3070,6 +3070,51 @@ builtin_memcpy_read_str (void *data, HOS
   return c_readstr (str + offset, mode);
 }
 
+/* LEN specifies the length of the block of the memcpy/memset operation.
+   Figure out its range and put it into MIN_SIZE/MAX_SIZE.  */
+
+static void
+determine_block_size (tree len, rtx len_rtx,
+ unsigned HOST_WIDE_INT *min_size,
+ unsigned HOST_WIDE_INT *max_size)
+{
+  if (CONST_INT_P (len_rtx))
+{
+  *min_size = *max_size = UINTVAL (len_rtx);
+  return;
+}
+  else
+{
+  double_int min, max;
+  if (TREE_CODE (len) == SSA_NAME
+      && get_range_info (len, &min, &max) == VR_RANGE)
+   {
+ if (min.fits_uhwi ())
+   *min_size = min.to_uhwi ();
+ else
+   *min_size = 0;
+ if (max.fits_uhwi ())
+   *max_size = max.to_uhwi ();
+ else
+   *max_size = (HOST_WIDE_INT)-1;
+   }
+  else
+   {
+ if (host_integerp (TYPE_MIN_VALUE (TREE_TYPE (len)), 1))
+   *min_size = tree_low_cst (TYPE_MIN_VALUE (TREE_TYPE (len)), 1);
+ else
+   *min_size = 0;
+ if (host_integerp (TYPE_MAX_VALUE (TREE_TYPE (len)), 1))
+   *max_size = tree_low_cst (TYPE_MAX_VALUE (TREE_TYPE (len)), 1);
+ else
+   *max_size = GET_MODE_MASK (GET_MODE (len_rtx));
+   }
+}
+  gcc_checking_assert (*max_size <=
+		       (unsigned HOST_WIDE_INT)
+		       GET_MODE_MASK (GET_MODE (len_rtx)));
+}
+
 /* Expand a call EXP to the memcpy builtin.
Return NULL_RTX if we failed, the caller should emit a normal call,
otherwise try to get the result in TARGET, if convenient (and in
@@ -3092,6 +3137,8 @@ expand_builtin_memcpy (tree exp, rtx tar
   rtx dest_mem, src_mem, dest_addr, len_rtx;
   HOST_WIDE_INT expected_size = -1;
   unsigned int expected_align = 0;
+  unsigned HOST_WIDE_INT min_size;
+  unsigned HOST_WIDE_INT max_size;
 
   /* If DEST is not a pointer type, call the normal function.  */
   if (dest_align == 0)
@@ -3111,6 +3158,7 @@ expand_builtin_memcpy (tree exp, rtx tar
   dest_mem = get_memory_rtx (dest, len);
   set_mem_align (dest_mem, dest_align);
   len_rtx = expand_normal (len);
+  

Re: Generic tuning in x86-tune.def 1/2

2013-09-27 Thread H.J. Lu
On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this is second part of the generic tuning changes sanitizing the tuning flags.
 This patch again is supposed to deal with the obvious part only.
 I will send separate patch for more changes.

 The flags changed agree on all CPUs considered for generic (and their
 optimization manuals) + amdfam10, core2 and Atom SLM.

 I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
 seems like an obvious omission (after double checking in the optimization
 manual) and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores.
 Implementation of this feature was always a bit weird and its main purpose
 was to avoid terrible branch predictor degeneration on the older AMD branch
 predictors.  I benchmarked both spec2k and 2k6 to verify there are no
 regressions.

 Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements 
 in specfp
 benchmarks.

 Bootstrapped/regtested x86_64-linux, will wait for comments and commit it
 during weekend.  I will be happy to revisit any of the generic tuning if
 regressions pop up.

 Overall this patch also brings small code size improvements for smaller
 loads/stores and less padding at -O2. Differences are sub 0.1% however.

 Honza
 * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for 
 generic.
 (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
 (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
 (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.

Can we drop generic on X86_TUNE_PAD_RETURNS?

 (X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
 (X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.


-- 
H.J.


Re: [ping] [PATCH] Silence an unused variable warning

2013-09-27 Thread Vladimir Makarov

On 13-09-27 4:55 AM, Dodji Seketeli wrote:

Let's CC Vladimir on this easy one.

Cheers.
All targets I know have ELIMINABLE_REGS defined.  Therefore it was not 
caught before.

The patch is ok for me.  Thanks.



Jan-Benedict Glaw jbg...@lug-owl.de wrote:


On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

Hi!

With the VAX target, I see this warning:

g++ -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
-I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. 
-I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include  
-I../../../../gcc/gcc/../libdecnumber -I../../../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../../../gcc/gcc/../libbacktrace
../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o
../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable 
‘value_p’ [-Wunused-variable]
bool value_p;
 ^

[...]





Re: [PATCH] Make jump thread path carry more information

2013-09-27 Thread Jeff Law

On 09/27/2013 08:42 AM, James Greenhalgh wrote:

On Thu, Sep 26, 2013 at 04:26:35AM +0100, Jeff Law wrote:

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Installed on trunk.


Hi Jeff,

This patch caused a regression on Arm and AArch64 in:

PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution,  -O3 
-fomit-frame-pointer

 From what I can see, the only place the behaviour of the threader has
changed is in this hunk:
Yes.  The old code was dropping the tail off the thread path; if we're 
seeing failures on the ARM port as a result of fixing that goof we 
obviously need to address them.


Let me take a looksie :-)

If you could pass along a .i file it'd be helpful in case I want to look 
at something under the debugger.



jeff



Re: Generic tuning in x86-tune.def 1/2

2013-09-27 Thread Jan Hubicka
 On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
  this is second part of the generic tuning changes sanitizing the tuning 
  flags.
  This patch again is supposed to deal with the obvious part only.
  I will send separate patch for more changes.
 
  The flags changed agree on all CPUs considered for generic (and their
  optimization manuals) + amdfam10, core2 and Atom SLM.
 
  I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
  seems like an obvious omission (after double checking in the optimization
  manual) and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores.
  Implementation of this feature was always a bit weird and its main purpose
  was to avoid terrible branch predictor degeneration on the older AMD branch
  predictors.  I benchmarked both spec2k and 2k6 to verify there are no
  regressions.
 
  Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements 
  in specfp
  benchmarks.
 
  Bootstrapped/regtested x86_64-linux, will wait for comments and commit it
  during weekend.  I will be happy to revisit any of the generic tuning if
  regressions pop up.
 
  Overall this patch also brings small code size improvements for smaller
  loads/stores and less padding at -O2. Differences are sub 0.1% however.
 
  Honza
  * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for 
  generic.
  (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
  (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
  (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.
 
 Can we drop generic on X86_TUNE_PAD_RETURNS?
It is on my list of not-so-obvious changes.  I tested it and removed it from
BDVER with the intention to drop it from generic.  But after further testing
I lean towards keeping it for some extra time.

I tested it on fam10 machines and it causes over 10% regressions on some
benchmarks, including bzip and botan (where it is up to a 4-fold regression).
Missing a return on amdfam10 hardware is bad, because it causes the return
stack to go out of sync.  At the same time I cannot really measure benefits
from disabling it - the code size cost is very small and the runtime cost on
non-amdfam10 cores is not important either, since the function call overhead
hides the extra nop quite easily.

So I would incline to apply extra care to this flag and keep it for an extra
release or two.  Most of gcc.opensuse.org testing runs on these machines and
adding random branch mispredictions will trash them.

On a related note, what would you think of X86_TUNE_PARTIAL_FLAG_REG_STALL?
I benchmarked it on my i5 notebook and it seems to have no measurable effect
on spec2k6.

I also did some benchmarking of the patch to disable alignments you proposed.
Unfortunately I can measure slowdowns on fam10/bdver, and on botan/hand-written
loops even for core.

I am considering dropping the branch target/function alignment and keeping
only loop alignment, but I did not test this yet.

Honza


Re: Add value range support into memcpy/memset expansion

2013-09-27 Thread Xinliang David Li
Nice extension. Test cases would be great to have.

thanks,

David

On Fri, Sep 27, 2013 at 7:50 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 this patch makes it possible to access value range info from setmem/movstr
 that I plan to use in the i386 memcpy/memset expansion code.  It is all quite
 straightforward except that I need to deal with cases where the max size does
 not fit in a HOST_WIDE_INT, where I use the maximal value as a marker.  It is
 then translated as a NULL pointer to the expander, which is a bit inconsistent
 with other places that use -1 as the marker of an unknown value.

 I also think we lose some cases because of TER replacing the SSA_NAME by
 something else, but it seems to work in quite many cases.  This can probably
 be handled incrementally by disabling TER here or by finally getting away from
 expanding calls via the generic route.

 Bootstrapped/regtested x86_64-linux, OK?

 Honza

 * doc/md.texi (setmem, movstr): Update documentation.
 * builtins.c (determine_block_size): New function.
 (expand_builtin_memcpy): Use it and pass it to
 emit_block_move_hints.
 (expand_builtin_memset_args): Use it and pass it to
 set_storage_via_setmem.
 * expr.c (emit_block_move_via_movmem): Add min_size/max_size 
 parameters;
 update call to expander.
 (emit_block_move_hints): Add min_size/max_size parameters.
 (clear_storage_hints): Likewise.
 (set_storage_via_setmem): Likewise.
 (clear_storage): Update.
 * expr.h (emit_block_move_hints, clear_storage_hints,
 set_storage_via_setmem): Update prototype.

 Index: doc/md.texi
 ===
 --- doc/md.texi (revision 202968)
 +++ doc/md.texi (working copy)
 @@ -5198,6 +5198,9 @@ destination and source strings are opera
  the expansion of this pattern should store in operand 0 the address in
  which the @code{NUL} terminator was stored in the destination string.

 +This pattern also has several optional operands that are the same as in
 +@code{setmem}.
 +
  @cindex @code{setmem@var{m}} instruction pattern
  @item @samp{setmem@var{m}}
  Block set instruction.  The destination string is the first operand,
 @@ -5217,6 +5220,8 @@ respectively.  The expected alignment di
  in a way that the blocks are not required to be aligned according to it in
  all cases. This expected alignment is also in bytes, just like operand 4.
  Expected size, when unknown, is set to @code{(const_int -1)}.
 +Operand 7 is the minimal size of the block and operand 8 is the
 +maximal size of the block (NULL if it can not be represented as CONST_INT).

  The use for multiple @code{setmem@var{m}} is as for @code{movmem@var{m}}.

 Index: builtins.c
 ===
 --- builtins.c  (revision 202968)
 +++ builtins.c  (working copy)
 @@ -3070,6 +3070,51 @@ builtin_memcpy_read_str (void *data, HOS
return c_readstr (str + offset, mode);
  }

 +/* LEN specifies the length of the block of the memcpy/memset operation.
 +   Figure out its range and put it into MIN_SIZE/MAX_SIZE.  */
 +
 +static void
 +determine_block_size (tree len, rtx len_rtx,
 + unsigned HOST_WIDE_INT *min_size,
 + unsigned HOST_WIDE_INT *max_size)
 +{
 +  if (CONST_INT_P (len_rtx))
 +{
 +  *min_size = *max_size = UINTVAL (len_rtx);
 +  return;
 +}
 +  else
 +{
 +  double_int min, max;
 +  if (TREE_CODE (len) == SSA_NAME
 +      && get_range_info (len, &min, &max) == VR_RANGE)
 +   {
 + if (min.fits_uhwi ())
 +   *min_size = min.to_uhwi ();
 + else
 +   *min_size = 0;
 + if (max.fits_uhwi ())
 +   *max_size = max.to_uhwi ();
 + else
 +   *max_size = (HOST_WIDE_INT)-1;
 +   }
 +  else
 +   {
 + if (host_integerp (TYPE_MIN_VALUE (TREE_TYPE (len)), 1))
 +   *min_size = tree_low_cst (TYPE_MIN_VALUE (TREE_TYPE (len)), 1);
 + else
 +   *min_size = 0;
 + if (host_integerp (TYPE_MAX_VALUE (TREE_TYPE (len)), 1))
 +   *max_size = tree_low_cst (TYPE_MAX_VALUE (TREE_TYPE (len)), 1);
 + else
 +   *max_size = GET_MODE_MASK (GET_MODE (len_rtx));
 +   }
 +}
 +  gcc_checking_assert (*max_size <=
 +		       (unsigned HOST_WIDE_INT)
 +		       GET_MODE_MASK (GET_MODE (len_rtx)));
 +}
 +
  /* Expand a call EXP to the memcpy builtin.
 Return NULL_RTX if we failed, the caller should emit a normal call,
 otherwise try to get the result in TARGET, if convenient (and in
 @@ -3092,6 +3137,8 @@ expand_builtin_memcpy (tree exp, rtx tar
rtx dest_mem, src_mem, dest_addr, len_rtx;
HOST_WIDE_INT expected_size = -1;
unsigned int expected_align = 0;
 +  unsigned HOST_WIDE_INT min_size;
 +  unsigned HOST_WIDE_INT max_size;

/* If DEST is not a pointer type, call the normal function.  */
  

Re: Generic tuning in x86-tune.def 1/2

2013-09-27 Thread H.J. Lu
On Fri, Sep 27, 2013 at 8:36 AM, Jan Hubicka hubi...@ucw.cz wrote:
 On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
   this is second part of the generic tuning changes sanitizing the tuning 
   flags.
  This patch again is supposed to deal with the obvious part only.
  I will send separate patch for more changes.
 
  The flags changed agree on all CPUs considered for generic (and their
  optimization manuals) + amdfam10, core2 and Atom SLM.
 
   I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
   seems like an obvious omission (after double checking in the optimization
   manual) and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores.
   Implementation of this feature was always a bit weird and its main purpose
   was to avoid terrible branch predictor degeneration on the older AMD branch
   predictors.  I benchmarked both spec2k and 2k6 to verify there are no
   regressions.
 
  Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice 
  improvements in specfp
  benchmarks.
 
  Bootstrapped/regtested x86_64-linux, will wait for comments and commit it
  during weekend.  I will be happy to revisit any of the generic tuning if
  regressions pop up.
 
  Overall this patch also brings small code size improvements for smaller
  loads/stores and less padding at -O2. Differences are sub 0.1% however.
 
  Honza
  * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for 
  generic.
  (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
   (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
  (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.

 Can we drop generic on X86_TUNE_PAD_RETURNS?
 It is on my list of not-so-obvious changes.  I tested it and removed it from
 BDVER with the intention to drop it from generic.  But after further testing
 I lean towards keeping it for some extra time.

 I tested it on fam10 machines and it causes over 10% regressions on some
 benchmarks, including bzip and botan (where it is up to a 4-fold regression).
 Missing a return on amdfam10 hardware is bad, because it causes the return
 stack to go out of sync.  At the same time I cannot really measure benefits
 from disabling it - the code size cost is very small and the runtime cost on
 non-amdfam10 cores is not important either, since the function call overhead
 hides the extra nop quite easily.

I see.

 So I would incline to apply extra care to this flag and keep it for an extra
 release or two.  Most of gcc.opensuse.org testing runs on these machines and
 adding random branch mispredictions will trash them.

 On a related note, what would you think of X86_TUNE_PARTIAL_FLAG_REG_STALL?
 I benchmarked it on my i5 notebook and it seems to have no measurable effect
 on spec2k6.

 I also did some benchmarking of the patch to disable alignments you proposed.
 Unfortunately I can measure slowdowns on fam10/bdver, and on botan/hand-written
 loops even for core.

I am not surprised about hand written loops.  Have you
tried SPEC CPU rate?

 I am considering dropping the branch target/function alignment and keeping
 only loop alignment, but I did not test this yet.

 Honza



-- 
H.J.


Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib

2013-09-27 Thread Steve Ellcey
On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote:

 I'm in two minds about whether further sticky tape of this form is the 
 right approach or whether the original patch should be reverted until a 
 proper fix that does not regress the tree can be found.
 
 Thoughts?
 
 2013-09-26  Marcus Shawcroft  marcus.shawcr...@arm.com
 
  * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement
  dependent on gcc_no_link.
 
 Cheers
 /Marcus

Well, I thought this patch would work for me, but it does not.  It looks
like gcc_no_link is set to 'no' on my target because, technically, I can
link even if I don't use a linker script.  I just can't find any
functions.


% cat x.c
int main (void) { return 0; }
% mips-mti-elf-gcc x.c -o x
/local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld:
 warning: cannot find entry symbol __start; defaulting to 00400098
% echo $?
0


% cat y.c
int main (void) { exit (0); }
% install-mips-mti-elf/bin/mips-mti-elf-gcc y.c -o y
/local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld:
 warning: cannot find entry symbol __start; defaulting to 00400098
/tmp/ccdG78PN.o: In function `main':
y.c:(.text+0x14): undefined reference to `exit'
collect2: error: ld returned 1 exit status
ubuntu-sellcey % echo $?
1


Steve Ellcey
sell...@mips.com




Remove enum ssa_mode

2013-09-27 Thread Diego Novillo
The gimple builder no longer supports normal form.  The ssa_mode
enum is no longer needed.

Committed to trunk.

* gimple.h (enum ssa_mode): Remove.
---
 gcc/gimple.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 3047ab4..a031c8d 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -34,15 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 
 typedef gimple gimple_seq_node;
 
-/* Types of supported temporaries.  GIMPLE temporaries may be symbols
-   in normal form (i.e., regular decls) or SSA names.  This enum is
-   used by create_gimple_tmp to tell it what kind of temporary the
-   caller wants.  */
-enum ssa_mode {
-M_SSA = 0,
-M_NORMAL
-};
-
 /* For each block, the PHI nodes that need to be rewritten are stored into
these vectors.  */
 typedef vec<gimple> gimple_vec;
-- 
1.8.4



Re: Context sensitive type inheritance graph walking

2013-09-27 Thread Martin Jambor
Hi,

sorry it took me so long, but it also took me quite a while to chew
through.  Please consider posting context diff in cases like this.
Nevertheless, most of the patch is a nice improvement.

On Wed, Sep 25, 2013 at 12:20:50PM +0200, Jan Hubicka wrote:
 Hi,
 this is updated version of 
 http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html
 
 It updates the type inheritance graph walking algorithm to be context 
 sensitive.
 Context is stored in ipa_polymorphic_call_context containing
   - OUTER_TYPE
   (the type of the memory object that contains the object whose method is
    called, NULL if unknown)
   - OFFSET
   offset within OUTER_TYPE where the object is located
   - MAYBE_IN_CONSTRUCTION
   true if the object may be in construction, i.e. it is a local or static
   variable.
   At the moment we do not try to disprove construction; that is partly
   done by ipa-prop and is planned to be merged here in future
   - MAYBE_DERIVED_TYPE
   if the OUTER_TYPE object actually may be a derivation of the type.
   For example when OUTER_TYPE was determined from the THIS parameter of a 
 method.
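 
 Spelled out as a structure, the context above would look roughly like
 this (a sketch; the field types are assumptions, the real definition is
 the one the patch adds to ipa-utils.h):
 
   struct ipa_polymorphic_call_context
   {
     HOST_WIDE_INT offset;	  /* Offset within OUTER_TYPE.  */
     tree outer_type;		  /* Type of the containing object,
				     NULL if unknown.  */
     bool maybe_in_construction;  /* The object may still be under
				     construction.  */
     bool maybe_derived_type;	  /* OUTER_TYPE may only be a base of
				     the actual dynamic type.  */
   };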
 
 There is cgraph_indirect_call_info that walks GIMPLE code and attempts to
 determine the context of a given call.  It looks for objects located in
 declarations (static vars/ automatic vars/parameters), objects passed by
 invisible references and objects passed as THIS pointers.
 The second two cases are new, the first case is already done by 
 gimple_extract_devirt_binfo_from_cst

and I assume we should really put the context there, rather than
reconstructing it from the edge.  Of course we must stop overloading
the offset field for that, are there any other obstacles?

 Context is determined by the cgraph construction code and stored in cgraph
 edges.  In future I expect ipa-prop to manipulate and update the contexts and
 propagate across them, but it is not implemented.  For this reason I did not
 add the context itself into the cgraph edge; merely I added the new info and
 hacked ipa-prop (temporarily) to simply throw away the actual info.
 
 The highlevel idea is to make get_class_context fill in info for known type
 and ancestor functions, too, and then we can have a function combining types
 + doing propagation across them, replacing the current code that deals with
 BINFOs directly and thus can not deal with all the cases above very well.
 
 possible_polymorphic_call_targets is now a bit more complex.  It is split into
   - get_class_context
   this function walks the OUTER_TYPE (if present) and looks up the type
   of the class actually containing the method being called.

   Previously similar logic was part of 
 gimple_extract_devirt_binfo_from_cst.
   - walk_bases
   When the type is in construction, all base types are walked and for those
   containing OTR_TYPE at the proper memory location the corresponding
   virtual methods are added to the list
   - record_binfos now walks the inner binfo from OUTER_TYPE to OTR_TYPE
   via get_binfo_at_offset.
 
 Bootstrapped/regtested x86_64-linux.  The patch is tested by the ipa-devirt9
 testcase.  I have four extra, but I would like to first fix the case where the
 devirtualization happens in TODO of early_local_passes, which is not dumped
 anywhere.  So I plan to post these incrementally once this code is hooked also
 into gimple folding.
 
 The patch results in 60% more devirtualizations on Firefox and 10% more
 speculative devirtualizations.  I think the main component missing now is code
 determining the dynamic type from a call to a constructor.  I have some
 prototype for this, too; I would like to discuss it incrementally.  I am not
 100% sure how much harder tracking of dynamic type changes becomes here.
 
 Martin, does it seem to make sense?
 
 Honza
 
   * cgraph.c (cgraph_create_indirect_edge): Use get_polymorphic_call_info.
   * cgraph.h (cgraph_indirect_call_info): Add outer_type, 
 maybe_in_construction
   and maybe_derived_type.
   * ipa-utils.h (ipa_polymorphic_call_context): New structure.
   (ipa_dummy_polymorphic_call_context): New global var.
   (possible_polymorphic_call_targets): Add context paramter.
   (dump_possible_polymorphic_call_targets): Likewise; update
   wrappers.
   (possible_polymorphic_call_target_p): Likewise.
   (get_polymorphic_call_info): New function.
   * ipa-devirt.c (ipa_dummy_polymorphic_call_context): New function.
   (add_type_duplicate): Remove forgotten debug output.
   (method_class_type): Add sanity check.
   (maybe_record_node): Add FINALP parameter.
   (record_binfo): Add OUTER_TYPE and OFFSET; walk the inner
   binfo by get_binfo_at_offset.
   (possible_polymorphic_call_targets_1): Add OUTER_TYPE/OFFSET parameters;
   pass them to record_binfo.
   (polymorphic_call_target_d): Add context and FINAL.
   (polymorphic_call_target_hasher::hash): Hash context.
   (polymorphic_call_target_hasher::equal): Compare context.
   

Re: [PATCH] Make jump thread path carry more information

2013-09-27 Thread James Greenhalgh
On Fri, Sep 27, 2013 at 04:32:10PM +0100, Jeff Law wrote:
 If you could pass along a .i file it'd be helpful in case I want to look 
 at something under the debugger.

I've opened http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 to save
everyone's inboxes.

Let me know if I can do anything else to help.

Cheers,
James



Re: [PATCH] Make jump thread path carry more information

2013-09-27 Thread Jeff Law

On 09/27/2013 10:48 AM, James Greenhalgh wrote:

On Fri, Sep 27, 2013 at 04:32:10PM +0100, Jeff Law wrote:

If you could pass along a .i file it'd be helpful in case I want to look
at something under the debugger.


I've opened http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 to save
everyone's inboxes.

Let me know if I can do anything else to help.

Thanks.  I'm investigating.

jeff


Re: [google gcc-4_8] fix size_estimation for builtin_expect

2013-09-27 Thread Rong Xu
On Fri, Sep 27, 2013 at 1:20 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, Sep 27, 2013 at 12:23 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,

 builtin_expect should be a NOP in size estimation.  Indeed, the call
 stmt itself has 0 weight in size and time.  But it may introduce
 an extra relational expr which has non-zero size/time.  The end result
 is: with and without builtin_expect, we have different size/time estimates
 for early inlining.

 This patch fixes this problem.

 -Rong

 2013-09-26  Rong Xu  x...@google.com

   * ipa-inline-analysis.c (estimate_function_body_sizes): fix
 the size estimation for builtin_expect.

 This seems fine with a comment in the code explaining what it is about.
 I also think we want to support multiple builtin_expects in a BB so perhaps
 we want to have a pointer set of statements to ignore?

 To avoid spaghetti code, please just move the new logic into separate 
 functions.

 Looks like this could use tree-ssa.c:walk_use_def_chains (please
 change its implementation as necessary, make it C++, etc. - you will
 be the first user again).


Thanks for the suggestion. But it seems walk_use_def_chains() was
removed by r201951.

-Rong

 Richard.

 Honza

 Index: ipa-inline-analysis.c
 ===
 --- ipa-inline-analysis.c (revision 202638)
 +++ ipa-inline-analysis.c (working copy)
 @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
/* Estimate static overhead for function prologue/epilogue and 
 alignment. */
int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
int size = overhead;
 +  gimple fix_expect_builtin;
 +
/* Benefits are scaled by probability of elimination that is in range
    <0,2>.  */
basic_block bb;
 @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
   }
   }

 +  fix_expect_builtin = NULL;
for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
   {
 gimple stmt = gsi_stmt (bsi);
 +   if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
 +{
 +  tree var = gimple_call_lhs (stmt);
 +  tree arg = gimple_call_arg (stmt, 0);
 +  use_operand_p use_p;
 +  gimple use_stmt;
 +  bool match = false;
 +  bool done = false;
  +      gcc_assert (var && arg);
 +  gcc_assert (TREE_CODE (var) == SSA_NAME);
 +
 +  while (TREE_CODE (arg) == SSA_NAME)
 +{
 +  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
 +  switch (gimple_assign_rhs_code (stmt_tmp))
 +{
 +  case LT_EXPR:
 +  case LE_EXPR:
 +  case GT_EXPR:
 +  case GE_EXPR:
 +  case EQ_EXPR:
 +  case NE_EXPR:
 +match = true;
 +done = true;
 +break;
 +  case NOP_EXPR:
 +break;
 +  default:
 +done = true;
 +break;
 +}
 +  if (done)
 +break;
 +  arg = gimple_assign_rhs1 (stmt_tmp);
 +}
 +
  +      if (match && single_imm_use (var, &use_p, &use_stmt))
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +{
 +  fix_expect_builtin = use_stmt;
 +}
 +}
 +
  +      /* we should see one builtin_expect call in one bb.  */
 +  break;
 +}
 +}
 +
 +  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
 + {
 +   gimple stmt = gsi_stmt (bsi);
 int this_size = estimate_num_insns (stmt, eni_size_weights);
 int this_time = estimate_num_insns (stmt, eni_time_weights);
 int prob;
 struct predicate will_be_nonconstant;

 +   if (stmt == fix_expect_builtin)
 +{
 +  this_size--;
 +  this_time--;
 +}
 +
  if (dump_file && (dump_flags & TDF_DETAILS))
   {
  fprintf (dump_file, "  ");



Re: OMP4/cilkplus: simd clone function mangling

2013-09-27 Thread Aldy Hernandez

On 09/27/13 09:23, Jakub Jelinek wrote:

On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote:

--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
return val;
  }

+/* Return the default vector mangling ISA code when none is specified
+   in a `processor' clause.  */
+
+static char
+ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone
+   ATTRIBUTE_UNUSED)
+{
+  return 'x';
+}


I think rth was suggesting using vecsize_mangle, vecsize_modifier or something 
else,
instead of ISA, because it won't represent the ISA on all targets.
It is just some magic letter used in mangling of the simd functions.


I thought he was only talking about the local vecsize_mangle() function, 
not the target hooks.  Fair enough, I have changed all the ISA 
references when they can be replaced with *mangle* or something similar.





+
+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+ be cloned have a distinctive artificial label in addition to omp
+ declare simd.  */
+  bool cilk_clone = flag_enable_cilkplus
+    && lookup_attribute ("cilk plus elemental",
+			  DECL_ATTRIBUTES (new_node->symbol.decl));


Formatting.  I'd say it should be
   bool cilk_clone
     = (flag_enable_cilkplus
	&& lookup_attribute ("cilk plus elemental",
			     DECL_ATTRIBUTES (new_node->symbol.decl)));


+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+		      DECL_ATTRIBUTES (new_node->symbol.decl));


I think it doesn't make sense to remove the attribute.


Done.




+  pretty_printer vars_pp;


Do you really need two different pretty printers?


Whoops, fixed.  Nice catch.


Can't you just print "_ZGV%c%c%d" into pp (is pp_printf
that cheap, wouldn't it be better to pp_string (pp, "_ZGV"),
2 pp_character + one pp_decimal_int?), and then do the loop over
the args, which right now writes into vars_pp and finally
pp_underscore and pp_string the normally mangled name?


pp_printf() would be cheap.  It's only used for a few cloned functions 
in a compilation unit.  I like printf.  It's pretty and clean.  Not 
using it is like saving sex for your old age ;-).  But just to keep you 
happy, I changed it... global maintainers are free to live their celibate 
monk lives as they see fit :).





+/* Create a simd clone of OLD_NODE and return it.  */
+
+static struct cgraph_node *
+simd_clone_create (struct cgraph_node *old_node)
+{
+  struct cgraph_node *new_node;
+  new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
+NULL, NULL, "simdclone");
+


My understanding of how IPA cloning etc. works is that you first
set up various data structures describing how you change the arguments
and only then actually do cgraph_function_versioning which already during
the copying will do some of the transformations of the IL.
But perhaps those transformations are too complicated to describe for
tree-inline.c to make them for you.


Sure, we can worry about that when we're emitting the actual clones (as 
discussed below), and when we start adapting the vectorizer.





+  tree attr = lookup_attribute ("omp declare simd",
+				DECL_ATTRIBUTES (node->symbol.decl));
+  if (!attr)
+return;
+  do
+{
+  struct cgraph_node *new_node = simd_clone_create (node);
+
+  bool inbranch_clause;
+  simd_clone_clauses_extract (new_node, TREE_VALUE (attr),
+			      &inbranch_clause);
+  simd_clone_compute_isa_and_simdlen (new_node);
+  simd_clone_mangle (node, new_node);


As discussed on IRC, I was hoping that for OpenMP simd and selected
targets (e.g. i?86-linux and x86_64-linux) we could do better than that,
creating not just the one or two clones we create for Cilk+, where one can
select which CPU (and thus ISA) he wants to build the clones for, but
creating clones for all ISAs, and then, based on command line options,
either emit just one of them as the really optimized one and the others
just as thunks that call other simd clone functions or the normal
function, possibly several times.


The thunk sounds like a good idea, long term.  How about we start by 
emitting all the variants up-front and then we can optimize these cases 
later?


I was thinking, in the absence of a `simdlen' clause, we can provide a 
target hook that returns a vector of (struct { int hw_vector_size; char 
vecsize_mangle; }) which would give us the different clone variants we 
should generate.  If the user provides `simdlen', we can continue 
generating just one clone (or two with *inbranch) with the present 
generic algorithm in simd_clone_compute_vecsize_and_simdlen().


Or do you have any other ideas?  But I'd like to leave the thunking 
business until after we get the general infrastructure working.


Aldy
diff --git a/gcc/ChangeLog 

Re: cost model patch

2013-09-27 Thread Xinliang David Li
Please review the changes.html change and suggest better wordings if possible:

Index: htdocs/gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.26
diff -u -r1.26 changes.html
--- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 - 1.26
+++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 -
@@ -37,6 +37,7 @@
   <ul>
 <li>AddressSanitizer, a fast memory error detector, is now
available on ARM.
 </li>
+<li>GCC introduces a new cost model for the vectorizer, called the
'cheap' model. The new cost model is intended to minimize compile
time, code size, and potential negative runtime impact introduced when
the vectorizer is turned on, at the expense of not getting the maximum
potential runtime speedup. The 'cheap' model will be the default when
the vectorizer is turned on at <code>-O2</code>. To override this, use
option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.</li>
   </ul>

 <h2>New Languages and Language specific improvements</h2>

thanks,

David


On Thu, Sep 26, 2013 at 11:09 AM, Xinliang David Li davi...@google.com wrote:
 On Thu, Sep 26, 2013 at 7:37 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li davi...@google.com 
 wrote:
 I took the liberty to pick up Richard's original fvect-cost-model
 patch and made some modification.

 What has not changed:
 1) option -ftree-vect-loop-version is removed;
 2) three cost models are introduced: cheap, dynamic, and unlimited;
 3) unless explicitly specified, cheap model is the default at O2 (e.g.
 when -ftree-loop-vectorize is used with -O2), and dynamic mode is the
 default for O3 and FDO
 4) alignment based versioning is disabled with cheap model.

 What has changed:
 1) peeling is also disabled with cheap model;
 2) alias check condition limit is reduced with cheap model, but not
 completely suppressed. Runtime alias check is a pretty important
 enabler.
 3) tree if conversion changes are not included.

 Does this patch look reasonable?

 In principle yes.  Note that it changes the behavior of -O2 -ftree-vectorize
 as -ftree-vectorize does not imply changing the default cost model.  I am
 fine with that, but eventually this will have some testsuite fallout.  This
 reorg would also need documenting in changes.html to make people
 aware of this.


 Here is the proposed change:


 Index: htdocs/gcc-4.9/changes.html
 ===
 RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
 retrieving revision 1.26
 diff -u -r1.26 changes.html
 --- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 - 1.26
 +++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 -
 @@ -37,6 +37,7 @@
    <ul>
  <li>AddressSanitizer, a fast memory error detector, is now
 available on ARM.
  </li>
 +<li>GCC introduces a new cost model for the vectorizer, called the
 'cheap' model. The new cost model is intended to minimize compile
 time, code size, and potential negative runtime impact introduced when
 the vectorizer is turned on, at the expense of not getting the maximum
 potential runtime speedup. The 'cheap' model will be the default when
 the vectorizer is turned on at <code>-O2</code>. To override this, use
 option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.</li>
    </ul>

  <h2>New Languages and Language specific improvements</h2>



 With completely disabling alignment peeling and alignment versioning
 you cut out targets that have no way of performing unaligned accesses.
 From looking at vect_no_align these are mips, sparc, ia64 and some arm.
 A compromise for them would be to allow peeling a single iteration
 and some alignment checks (like up to two?).


 Possibly. I think target owners can choose to do target specific
 tunings as follow up.


 Reducing the number of allowed alias-checks is ok, but I'd reduce it
 more than to 6 (was that an arbitrary number or is that the result of
 some benchmarking?)


 yes -- we found that it is not uncommon to have a loop with 2 or 3
 distinct source addresses and 1 or 2 target addresses.

 There are also tuning opportunities.  For instance, in cases where
 source addresses are derived from the same base, a consolidated alias
 check (against the whole access range instead of just checking the
 cross-iteration dependence for one unrolled iteration) can be done, as
 sketched below.
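 
 For illustration (example mine, not from the thread): a loop of that
 shape, and a whole-range overlap test of the kind a consolidated check
 could use:
 
   /* Three distinct source addresses, one target: the vectorizer would
      emit a runtime alias check of dst against each of a, b and c.  */
   void
   f (float *dst, float *a, float *b, float *c, int n)
   {
     for (int i = 0; i < n; i++)
       dst[i] = a[i] + b[i] + c[i];
   }
 
   /* Consolidated whole-range check (sketch; real code would compare the
      addresses as integers): one test per source instead of one test per
      unrolled-iteration pair.  */
   int
   no_overlap (float *dst, float *src, int n)
   {
     return src + n <= dst || dst + n <= src;
   }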

 I suppose all of the params could use some benchmarking to select
 a sweet spot in code size vs. runtime.

 Agree.



 I suppose the patch is ok as-is (if it actually works) if you provide
 a changelog and propose an entry for changes.html.  We can
 tune the params for the cheap model as followup.

 Ok. I will do more testing and check in the patch with proper
 ChangeLog. The changes.html change will be done separately.

 thanks,

 David



 Thanks for picking this up,
 Richard.

 thanks,

 David


libgo patch committed: Implement reflect.MakeFunc for amd64

2013-09-27 Thread Ian Lance Taylor
The Go standard library has an interesting function named
reflect.MakeFunc.  It takes a Go function F that accepts and returns a
slice of reflect.Value, and a function type T, and returns a pointer to
a function of type T that converts its arguments to reflect.Value, calls
F, and converts the returned reflect.Value into the appropriate return
types.  In effect this is the reverse of libffi: instead of describing a
function and calling it, we describe a function and permit it to be
called.

For gccgo I tried to implement this generically using the builtin
varargs functions, but that failed because I had no way to handle the
return type.  Many Go functions return multiple values, which in gccgo
is represented as returning a struct, and, of course, in some cases a
struct is returned by passing a hidden pointer as the first argument,
and in other cases is handled by splitting up the struct into different
register classes.  So handling this generically is essentially
impossible, at least without adding some more builtin functions to
somehow handle the return value, builtin functions that I couldn't
figure out how to even represent.

So I gave up and went for a processor-specific approach.  The idea is
that processor-specific assembly code will save all the relevant
registers into a struct, and pass them to processor-specific Go code
which will implement the calling convention.  This has the advantage
that I only need to deal with Go types, which in particular means no
worries about vector types.

This patch implements this approach for x86_64.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.8
branch.

Ian

diff -r 024105249263 libgo/Makefile.am
--- a/libgo/Makefile.am	Tue Sep 24 20:26:38 2013 -0700
+++ b/libgo/Makefile.am	Fri Sep 27 08:06:13 2013 -0700
@@ -895,9 +895,21 @@
 	go/path/match.go \
 	go/path/path.go
 
+if LIBGO_IS_X86_64
+go_reflect_makefunc_file = \
+	go/reflect/makefuncgo_amd64.go
+go_reflect_makefunc_s_file = \
+	go/reflect/makefunc_amd64.S
+else
+go_reflect_makefunc_file =
+go_reflect_makefunc_s_file = \
+	go/reflect/makefunc_dummy.c
+endif
+
 go_reflect_files = \
 	go/reflect/deepequal.go \
 	go/reflect/makefunc.go \
+	$(go_reflect_makefunc_file) \
 	go/reflect/type.go \
 	go/reflect/value.go
 
@@ -1761,6 +1773,7 @@
 	os.lo \
 	path.lo \
 	reflect-go.lo \
+	reflect/makefunc.lo \
 	regexp.lo \
 	runtime-go.lo \
 	sort.lo \
@@ -2147,6 +2160,9 @@
 	$(BUILDPACKAGE)
 reflect/check: $(CHECK_DEPS)
 	@$(CHECK)
+reflect/makefunc.lo: $(go_reflect_makefunc_s_file)
+	@$(MKDIR_P) reflect
	$(LTCOMPILE) -c -o $@ $<
 .PHONY: reflect/check
 
 @go_include@ regexp.lo.dep
diff -r 024105249263 libgo/go/reflect/all_test.go
--- a/libgo/go/reflect/all_test.go	Tue Sep 24 20:26:38 2013 -0700
+++ b/libgo/go/reflect/all_test.go	Fri Sep 27 08:06:13 2013 -0700
@@ -1430,11 +1430,13 @@
 	}
 }
 
-/*
-
-Not yet implemented for gccgo.
-
 func TestMakeFunc(t *testing.T) {
+	switch runtime.GOARCH {
+	case "amd64":
+	default:
+		t.Skip("MakeFunc not implemented for " + runtime.GOARCH)
+	}
+
 	f := dummy
 	fv := MakeFunc(TypeOf(f), func(in []Value) []Value { return in })
 	ValueOf(f).Elem().Set(fv)
@@ -1452,8 +1454,6 @@
 	}
 }
 
-*/
-
 type Point struct {
 	x, y int
 }
diff -r 024105249263 libgo/go/reflect/makefunc.go
--- a/libgo/go/reflect/makefunc.go	Tue Sep 24 20:26:38 2013 -0700
+++ b/libgo/go/reflect/makefunc.go	Fri Sep 27 08:06:13 2013 -0700
@@ -7,6 +7,7 @@
 package reflect
 
 import (
+	"runtime"
 	"unsafe"
 )
 
@@ -45,14 +46,33 @@
 		panic("reflect: call of MakeFunc with non-Func type")
 	}
 
+	switch runtime.GOARCH {
+	case "amd64":
+	default:
+		panic("reflect.MakeFunc not implemented for " + runtime.GOARCH)
+	}
+
 	t := typ.common()
 	ftyp := (*funcType)(unsafe.Pointer(t))
 
-	_, _ = t, ftyp
+	// Indirect Go func value (dummy) to obtain
+	// actual code address. (A Go func value is a pointer
+	// to a C function pointer. http://golang.org/s/go11func.)
+	dummy := makeFuncStub
+	code := **(**uintptr)(unsafe.Pointer(dummy))
 
-	panic("reflect MakeFunc not implemented")
+	impl := &makeFuncImpl{code: code, typ: ftyp, fn: fn}
+
+	return Value{t, unsafe.Pointer(impl), flag(Func) << flagKindShift}
 }
 
+// makeFuncStub is an assembly function that is the code half of
+// the function returned from MakeFunc. It expects a *callReflectFunc
+// as its context register, and its job is to invoke callReflect(ctxt, frame)
+// where ctxt is the context register and frame is a pointer to the first
+// word in the passed-in argument frame.
+func makeFuncStub()
+
 // makeMethodValue converts v from the rcvr+method index representation
 // of a method value to an actual method func value, which is
 // basically the receiver value with a special bit set, into a true
diff -r 024105249263 libgo/go/reflect/makefunc_amd64.S
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/libgo/go/reflect/makefunc_amd64.S	Fri Sep 27 08:06:13 2013 -0700
@@ -0,0 +1,107 @@
+# Copyright 2013 The Go Authors. All rights reserved.
+# Use of 

Re: [google gcc-4_8] fix size_estimation for builtin_expect

2013-09-27 Thread Rong Xu
On Thu, Sep 26, 2013 at 3:23 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,

  builtin_expect should be a NOP in size estimation.  Indeed, the call
  stmt itself has 0 weight in size and time.  But it may introduce
  an extra relational expr which has non-zero size/time.  The end result
  is: with and without builtin_expect, we have different size/time estimates
  for early inlining.

 This patch fixes this problem.

 -Rong

 2013-09-26  Rong Xu  x...@google.com

   * ipa-inline-analysis.c (estimate_function_body_sizes): fix
 the size estimation for builtin_expect.

  This seems fine with a comment in the code explaining what it is about.
  I also think we want to support multiple builtin_expects in a BB so perhaps
  we want to have a pointer set of statements to ignore?


Thanks for feedback. I'll add the comment and split the code into a
new function.

You have a good point about multiple builtin_expect.  I think I need
to remove the early exit (the break stmt).
But I would argue that using a pointer_set is probably not necessary.

Here is the reasoning: The typical usage of builtin_expect is like:
  if (__builtin_expect (a <= b, 1))  { ... }
The IR is:
  t1 = a <= b;
  t2 = (long int) t1;
  t3 = __builtin_expect (t2, 1);
  if (t3 != 0) goto ...
Without the builtin, the IR is
  if (a <= b) goto ...

The code in the patch checks the use of the builtin_expect return
value, to see if it's
used in a COND stmt.  So even if there are multiple builtins, only one can match.
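
A sketch of what the pointer-set variant could look like
(find_cond_fed_by_expect is a hypothetical helper standing in for the
use-def walk in the patch):

  /* Sketch only: collect every GIMPLE_COND fed by a __builtin_expect
     result in this bb, instead of remembering a single statement.  */
  struct pointer_set_t *stmts_to_discount = pointer_set_create ();

  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
    {
      gimple stmt = gsi_stmt (bsi);
      if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
	{
	  gimple use_stmt = find_cond_fed_by_expect (stmt);
	  if (use_stmt)
	    pointer_set_insert (stmts_to_discount, use_stmt);
	}
    }

  /* ... later, in the sizing loop ...  */
  if (pointer_set_contains (stmts_to_discount, stmt))
    {
      this_size--;
      this_time--;
    }

  /* ... and once the bb has been costed ...  */
  pointer_set_destroy (stmts_to_discount);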

-Rong

  To avoid spaghetti code, please just move the new logic into separate 
  functions.

 Honza

 Index: ipa-inline-analysis.c
 ===
 --- ipa-inline-analysis.c (revision 202638)
 +++ ipa-inline-analysis.c (working copy)
 @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
/* Estimate static overhead for function prologue/epilogue and alignment. 
 */
int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
int size = overhead;
 +  gimple fix_expect_builtin;
 +
/* Benefits are scaled by probability of elimination that is in range
    <0,2>.  */
basic_block bb;
 @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
   }
   }

 +  fix_expect_builtin = NULL;
for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
   {
 gimple stmt = gsi_stmt (bsi);
 +   if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
 +{
 +  tree var = gimple_call_lhs (stmt);
 +  tree arg = gimple_call_arg (stmt, 0);
 +  use_operand_p use_p;
 +  gimple use_stmt;
 +  bool match = false;
 +  bool done = false;
  +      gcc_assert (var && arg);
 +  gcc_assert (TREE_CODE (var) == SSA_NAME);
 +
 +  while (TREE_CODE (arg) == SSA_NAME)
 +{
 +  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
 +  switch (gimple_assign_rhs_code (stmt_tmp))
 +{
 +  case LT_EXPR:
 +  case LE_EXPR:
 +  case GT_EXPR:
 +  case GE_EXPR:
 +  case EQ_EXPR:
 +  case NE_EXPR:
 +match = true;
 +done = true;
 +break;
 +  case NOP_EXPR:
 +break;
 +  default:
 +done = true;
 +break;
 +}
 +  if (done)
 +break;
 +  arg = gimple_assign_rhs1 (stmt_tmp);
 +}
 +
  +      if (match && single_imm_use (var, &use_p, &use_stmt))
 +{
 +  if (gimple_code (use_stmt) == GIMPLE_COND)
 +{
 +  fix_expect_builtin = use_stmt;
 +}
 +}
 +
  +      /* we should see one builtin_expect call in one bb.  */
 +  break;
 +}
 +}
 +
 +  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi))
 + {
 +   gimple stmt = gsi_stmt (bsi);
 int this_size = estimate_num_insns (stmt, eni_size_weights);
 int this_time = estimate_num_insns (stmt, eni_time_weights);
 int prob;
 struct predicate will_be_nonconstant;

 +   if (stmt == fix_expect_builtin)
 +{
 +  this_size--;
 +  this_time--;
 +}
 +
  if (dump_file && (dump_flags & TDF_DETAILS))
   {
  fprintf (dump_file, "  ");



Re: Commit: MSP430: Pass -md on to assembler

2013-09-27 Thread Mike Stump
On Sep 27, 2013, at 1:48 AM, nick clifton ni...@redhat.com wrote:
 OK by me, although I cannot approve that particular patch.

I know, the intent is for someone that can, to approve it.

 But I ran into a very strange problem.  With your PARTIAL_INT_MODE_NAME patch 
 applied GCC started erroneously eliminating NULL function pointer checks!  
 This was particularly noticeable in libgcc/crtstuff.c where for example:

Index: stor-layout.c
===
--- stor-layout.c   (revision 202634)
+++ stor-layout.c   (working copy)
@@ -2821,7 +2821,7 @@ get_mode_bounds (enum machine_mode mode,
 enum machine_mode target_mode,
 rtx *mmin, rtx *mmax)
 {
-  unsigned size = GET_MODE_BITSIZE (mode);
+  unsigned size = GET_MODE_PRECISION (mode);
   unsigned HOST_WIDE_INT min_val, max_val;
 
    gcc_assert (size <= HOST_BITS_PER_WIDE_INT);

fixes this problem.  The problem is that we treat the maximum value of PSImode 
as -1, and then later we do:

    case EQ:
      /* x == y is always false for y out of range.  */
      if (val < mmin || val > mmax)
	return const0_rtx;
      break;

and the answer to the question `is 0 <= -1?' is no, so the entire test is 
eliminated as never able to be true.  After the fix, in your case, we get:

(gdb) p mmin
$72 = -524288
(gdb) p mmax
$73 = 524287

and the test becomes if (0 < -524288 || 0 > 524287), which is not true, so 
the test isn't eliminated as never true.
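
The arithmetic behind the two outcomes, reconstructed as an illustration
(the gen_int_mode truncation step is an assumption about the mechanics,
not something spelled out above):

  /* Illustration only.  msp430's PSImode: bitsize 32, precision 20.  */
  HOST_WIDE_INT bad  = ((HOST_WIDE_INT) 1 << (32 - 1)) - 1; /* from BITSIZE */
  /* gen_int_mode truncates BAD to the 20-bit mode and sign-extends,
     yielding -1 -- the bogus mmax seen above.  */
  HOST_WIDE_INT good = ((HOST_WIDE_INT) 1 << (20 - 1)) - 1; /* from PRECISION */
  /* GOOD == 524287, and the matching minimum is -524288 -- exactly the
     gdb values after the fix.  */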

Here, we see the test that protects this code uses GET_MODE_PRECISION:

(gdb) macro expand HWI_COMPUTABLE_MODE_P (PSImode)
expands to: ((((enum mode_class) mode_class[PSImode]) == MODE_INT || ((enum 
mode_class) mode_class[PSImode]) == MODE_PARTIAL_INT)
&& mode_precision[PSImode] <= (8 * 8))

so, clearly, someone was thinking about GET_MODE_PRECISION being in range, not 
GET_MODE_BITSIZE.

Ok?  [ for the rtl people ]

Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS

2013-09-27 Thread Mike Stump
Can the sh people weigh in on this?  Are the PSI and PDI precisions 32 and 64?

On Sep 17, 2013, at 10:24 AM, Mike Stump mikest...@comcast.net wrote:
 On Sep 16, 2013, at 8:41 PM, DJ Delorie d...@redhat.com wrote:
 m32c's PSImode is 24-bits, why does it have 32 in the macro?
 
 /* 24-bit pointers, in 32-bit units */
 -PARTIAL_INT_MODE (SI);
 +PARTIAL_INT_MODE_NAME (SI, 32, PSI);
 
 Sorry, fingers copied the wrong number.  Thanks for the catch.
 
 partial-1.diffs.txt



[google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Teresa Johnson
David and Rong,

The following patch will disable -g/-gmlt when instrumenting for LIPO
since they will affect the recorded ggc_memory used in the module
grouping decision. Added -fripa-allow-debug to override this behavior.

Passes regression tests and simple tests on the new functionality.

Ok for google/4_8?

Teresa

2013-09-27  Teresa Johnson  tejohn...@google.com

* opts.c (finish_options): Suppress -g/-gmlt when we are
building for LIPO instrumentation.
* common.opt (fripa-allow-debug): New option.

Index: opts.c
===
--- opts.c  (revision 202976)
+++ opts.c  (working copy)
@@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g
 #endif
   if (!opts->x_flag_fat_lto_objects && !HAVE_LTO_PLUGIN)
 error_at (loc, "-fno-fat-lto-objects are supported only with
linker plugin.");
-}
+}
   if ((opts->x_flag_lto_partition_balanced != 0)
      + (opts->x_flag_lto_partition_1to1 != 0)
      + (opts->x_flag_lto_partition_none != 0) >= 1)
 {
@@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g
   /* Turn on -ffunction-sections when -freorder-functions=* is used.  */
   if (opts->x_flag_reorder_functions > 1)
 opts->x_flag_function_sections = 1;
+
+  /* LIPO module grouping depends on the memory consumed by the profile-gen
+ parsing phase, recorded in a per-module ggc_memory field of the module
+ info struct. This will be higher when debug generation is on via
+ -g/-gmlt, which causes the FE to generate debug structures that will
+ increase the ggc_total_memory. This could in theory cause the module
+ groups to be slightly more conservative. Disable -g/-gmlt under
+ -fripa -fprofile-generate, but provide an option to override this
+ in case we actually need to debug an instrumented binary.  */
+  if (opts->x_profile_arc_flag
+      && opts->x_flag_dyn_ipa
+      && opts->x_debug_info_level != DINFO_LEVEL_NONE
+      && !opts->x_flag_dyn_ipa_allow_debug)
+    {
+      inform (loc,
+	      "Debug generation via -g option disabled under -fripa "
+	      "-fprofile-generate (use -fripa-allow-debug to override)");
+      set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set,
+		       loc);
+    }
 }

 #define LEFT_COLUMN	27
Index: common.opt
===
--- common.opt  (revision 202976)
+++ common.opt  (working copy)
@@ -1155,6 +1155,10 @@ fripa
 Common Report Var(flag_dyn_ipa)
 Perform Dynamic Inter-Procedural Analysis.

+fripa-allow-debug
+Common Report Var(flag_dyn_ipa_allow_debug) Init(0)
+Allow -g enablement for -fripa -fprofile-generate compiles.
+
 fripa-disallow-asm-modules
 Common Report Var(flag_ripa_disallow_asm_modules)
 Don't import an auxiliary module if it contains asm statements

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Xinliang David Li
On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote:
 David and Rong,

 The following patch will disable -g/-gmlt when instrumenting for LIPO
 since they will affect the recorded ggc_memory used in the module
 grouping decision. Added -fripa-allow-debug to override this behavior.

 Passes regression tests and simple tests on the new functionality.

 Ok for google/4_8?

 Teresa

 2013-09-27  Teresa Johnson  tejohn...@google.com

 * opts.c (finish_options): Suppress -g/-gmlt when we are
 building for LIPO instrumentation.
 * common.opt (fripa-allow-debug): New option.

 Index: opts.c
 ===
 --- opts.c  (revision 202976)
 +++ opts.c  (working copy)
 @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g
  #endif
    if (!opts->x_flag_fat_lto_objects && !HAVE_LTO_PLUGIN)
  error_at (loc, "-fno-fat-lto-objects are supported only with
 linker plugin.");
 -}
 +}


Unrelated format change?

Otherwise looks ok.

thanks,

David




Re: [PATCH v4 04/20] add configury

2013-09-27 Thread Gerald Pfeifer
Hi Tom,

On Mon, 23 Sep 2013, Tom Tromey wrote:
 This adds the configury needed for automatic dependency tracking.  It
 also adds some bits to the Makefile so we can begin converting
 (removing) explicit dependencies.
 
   * Makefile.in (CCDEPMODE, DEPDIR, depcomp, COMPILE.base)
   (COMPILE, POSTCOMPILE): New variables.
   (.cc.o .c.o): Use COMPILE, POSTCOMPILE.

I believe this may be breaking all my testers on FreeBSD 
(i386-unknown-freebsd10.0 for example).  The timing of when this
patchset went in fits pretty much when my builds started to break,
and I am wondering about some of the code.

Here is the failure mode:

gmake[2]: Entering directory `/scratch/tmp/gerald/OBJ-0927-1848/gcc'
g++ -c  -DIN_GCC_FRONTEND -g -O2 -DIN_GCC   -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common
-DHAVE_CONFIG_H -I. -Ic -I/scratch/tmp/gerald/gcc-HEAD/gcc ...[-I 
options]...
-o c/c-lang.o -MT c/c-lang.o -MMD -MP -MF c/.deps/c-lang.TPo
/scratch/tmp/gerald/gcc-HEAD/gcc/c/c-lang.c
cc1plus: error: unrecognized command line option -Wno-narrowing
gmake[2]: *** [c/c-lang.o] Error 1
gmake[1]: *** [install-gcc] Error 2
gmake: *** [install] Error 2

The issue is the invocation of g++ (the old system compiler, not what
we built) with -Wno-narrowing (a new option).

And looking at the code, I see

  +COMPILE.base = $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) -o $@
  +ifeq ($(CCDEPMODE),depmode=gcc3)
  +# Note a subtlety here: we use $(@D) for the directory part, to make
  +# things like the go/%.o rule work properly; but we use $(*F) for the
  +# file part, as we just want the file part of the stem, not the entire
  +# file name.
  +COMPILE = $(COMPILE.base) -MT $@ -MMD -MP -MF $(@D)/$(DEPDIR)/$(*F).TPo
  +POSTCOMPILE = @mv $(@D)/$(DEPDIR)/$(*F).TPo $(@D)/$(DEPDIR)/$(*F).Po
  +else
  +COMPILE = source='$<' object='$@' libtool=no \
  +DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) $(COMPILE.base)
  +POSTCOMPILE =
  +endif

Where does $(ALL_COMPILERFLAGS) come from?  If I read the code
right, we do disable these warnings for the stage1 build.  However,
the install compiler is the same -- so I guess we should disable
warnings there, too?

Gerald


[Patch, fortran, doc, committed] Fix DATE_AND_TIME example

2013-09-27 Thread Janne Blomqvist
Hello,

a format string in the example for DATE_AND_TIME contained a syntax
error. Committed as obvious.

2013-09-27  Janne Blomqvist  j...@gcc.gnu.org

* intrinsic.texi (DATE_AND_TIME): Fix example.


Index: intrinsic.texi
===
--- intrinsic.texi  (revision 202983)
+++ intrinsic.texi  (working copy)
@@ -3461,7 +3461,7 @@ program test_time_and_date
 call date_and_time(TIME=time)
 call date_and_time(VALUES=values)
 print '(a,2x,a,2x,a)', date, time, zone
-print '(8i5))', values
+print '(8i5)', values
 end program test_time_and_date
 @end smallexample


-- 
Janne Blomqvist


Re: Generic tuning in x86-tune.def 1/2

2013-09-27 Thread Andi Kleen
Jan Hubicka hubi...@ucw.cz writes:

 I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it
 seems like an obvious omission (after double checking in the optimization manual) and
 dropped X86_TUNE_FOUR_JUMP_LIMIT for Bulldozer cores.

When tuning for Intel SandyBridge+ it would be actually interesting to
use a 32 byte window instead of 16 bytes.

The decoded icache has a 3-jump limit per 32 bytes.

So if K8 support is dropped from generic, we could just change it over
to 32 bytes there?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Teresa Johnson
On Fri, Sep 27, 2013 at 12:01 PM, Xinliang David Li davi...@google.com wrote:
 On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote:
 David and Rong,

 The following patch will disable -g/-gmlt when instrumenting for LIPO
 since they will affect the recorded ggc_memory used in the module
 grouping decision. Added -fripa-allow-debug to override this behavior.

 Passes regression tests and simple tests on the new functionality.

 Ok for google/4_8?

 Teresa

 2013-09-27  Teresa Johnson  tejohn...@google.com

 * opts.c (finish_options): Suppress -g/-gmlt when we are
 building for LIPO instrumentation.
 * common.opt (fripa-allow-debug): New option.

 Index: opts.c
 ===
 --- opts.c  (revision 202976)
 +++ opts.c  (working copy)
 @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g
  #endif
    if (!opts->x_flag_fat_lto_objects && !HAVE_LTO_PLUGIN)
  error_at (loc, "-fno-fat-lto-objects are supported only with
 linker plugin.");
 -}
 +}


 Unrelated format change?

Well, related in the sense that it messed up my editor's
auto-indentation logic when making the change below. Ok to include the
formatting fix if I mention it in the commit log?

Teresa


 Otherwise looks ok.

 thanks,

 David





-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Xinliang David Li
ok.

David



Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Rong Xu
I don't quite understand here. We use the profile-generate memory
consumption to estimate the profile-use memory consumption.
We still have -g/-gmlt in profile-use compilation. Will this change
effectively underestimate the memory use in the use phase?

-Rong

On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote:
 David and Rong,

 The following patch will disable -g/-gmlt when instrumenting for LIPO
 since they will affect the recorded ggc_memory used in the module
 grouping decision. Added -fripa-allow-debug to override this behavior.

 Passes regression tests and simple tests on the new functionality.

 Ok for google/4_8?

 Teresa

 2013-09-27  Teresa Johnson  tejohn...@google.com

 * opts.c (finish_options): Suppress -g/-gmlt when we are
 building for LIPO instrumentation.
 * common.opt (fripa-allow-debug): New option.



Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-27 Thread Tim Shen
On Fri, Sep 27, 2013 at 9:37 AM, Jonathan Wakely jwakely@gmail.com wrote:
 Ah I see.  I definitely agree it's good to accept that instead of
 being unnecessarily strict, but other people will want the option of
 strict conformance, so I think we can please everyone with something
 like:

 else
   {
 #ifdef __STRICT_ANSI__
 __throw_regex_error(regex_constants::error_escape);
 #else
_M_token = _S_token_ord_char;
_M_value.assign(1, __c);
 #endif
   }

Sorry for the late reply.

Do I need to bootstrap again?


-- 
Tim Shen


a.patch
Description: Binary data


Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Teresa Johnson
The issue is that building the instrumented binary with and without,
say, -gmlt, may result in different module grouping.

Teresa

On Fri, Sep 27, 2013 at 1:18 PM, Rong Xu x...@google.com wrote:
 I don't quite understand here. We use the profile-generate memory
 consumption to estimate the profile-use memory consumption.
 We still have -g/-gmlt in profile-use compilation. Will this change
 effectively underestimate the memory use in the use phase?

 -Rong




-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation

2013-09-27 Thread Xinliang David Li
The key is that grouping results should not depend on the presence
of -g flags.  The downside of the patch is that it may slightly
underestimate the memory pressure at profile-use time, but that should
not be a big problem.

David

On Fri, Sep 27, 2013 at 1:18 PM, Rong Xu x...@google.com wrote:
 I don't quite understand here. We use the profile-generate memory
 consumption to estimate the profile-use memory consumption.
 We still have -g/-gmlt in profile-use compilation. Will this change
 effectively underestimate the memory use in the use phase?

 -Rong



Re: [PATCH v4 04/20] add configury

2013-09-27 Thread Tom Tromey
Gerald> And looking at the code, I see
Gerald>   +COMPILE.base = $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS)
-o $@
[...]
Gerald> Where does $(ALL_COMPILERFLAGS) come from?

Look a little further down in the patch:

 .cc.o .c.o:
-	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $< $(OUTPUT_OPTION)
+	$(COMPILE) $<
+	$(POSTCOMPILE)


... that is, the patches didn't change this part.  ALL_COMPILERFLAGS was
used before and it is used now.  I don't think this series touched how
this variable is computed, either.

Gerald> If I read the code
Gerald> right, we do disable these warnings for the stage1 build.  However,
Gerald> the install compiler is the same -- so I guess we should disable
Gerald> warnings there, too?

I'm afraid I don't understand.


If I were debugging this then I think I would start by looking in
config.log to see why the compiler accepted -Wno-narrowing.

Actually, I looked in my own config.log and I see it isn't very
informative:

configure:6280: checking whether gcc supports -Wnarrowing
configure:6306: result: yes

I suppose I would hack a "set -x" / "set +x" pair into configure
around the warning-checking section and then see what happens.

Tom


Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-27 Thread Paolo Carlini


Hi

Tim Shen timshe...@gmail.com wrote:
Do I need to bootstrap again?

Nah, only double check that the testcase you are un-xfail-ing uses 
-std=gnu++11, otherwise it will not pass ;)

Paolo



Re: [google gcc-4_8] alternate hirate for builtin_expect

2013-09-27 Thread Rong Xu
On Thu, Sep 26, 2013 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,

 Current default probability for builtin_expect is 0.9996.
 This makes the freq of unlikely bb very low (4), which
 suppresses the inlining of any calls within those bb.

 We used FDO data to measure the branch probability for
 the branch annotated with builtin_expect.  For google
 internal benchmarks, the weighted average
 (the profile count value as the weight) is 0.9081.

 Linux kernel is another program that is heavily annotated
 with builtin_expect. We measured its weighted average as 0.8717,
 using google search as the workload.

 This is interesting.  I was always unsure if programmers use builtin_expect
  more often to mark impossible paths (such as those leading to a crash) or just
  unlikely paths.  Your data seems to suggest the latter.


I can't find an official guideline on how this should be used. Maybe
some documentation in the gcc user guide would help.

 We probably also ought to get analyze_brprob working again. It was always
 useful for getting such data.


 This patch sets the alternate hirate probability for
 builtin_expect to 90%. With the alternate hirate, we measured
 performance improvement for google benchmarks and the Linux kernel.


   -Rong
 2013-09-26  Rong Xu  x...@google.com

   * params.def (DEFPARAM): New.
   * params.def: New.
   * predict.c (tree_predict_by_opcode): Alternate
 probability hirate for builtin_expect.

 This also seems reasonable for mainline.  Please add a comment
 to the parameter explaining how the value was determined.
 Please also add invoke.texi documentation.


Will do.

 For patches that seem reasonable for mainline in the FDO/IPA area,
 I would appreciate it if you just added me to CC, so I do not lose
 track of them.

Will do.

 Honza


libgo patch committed: Implement reflect.MakeFunc for 386

2013-09-27 Thread Ian Lance Taylor
Following up on my earlier patch, this patch implements the
reflect.MakeFunc function for 386.

Tom Tromey pointed out to me that the libffi closure support can
probably be used for this.  I was not aware of that support.  It
supports a lot more processors, and I should probably start using it.
The approach I am using does have a couple of advantages: it's more
efficient, and it doesn't require any type of writable executable
memory.  I can get away with that because indirect calls in Go always
pass a closure value.  So even when and if I do change to using libffi,
I might still keep this code for amd64 and 386.

Anyhow, for this patch, bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.8 branch.

Ian

diff -r 14a32fb52e24 libgo/Makefile.am
--- a/libgo/Makefile.am	Fri Sep 27 10:46:09 2013 -0700
+++ b/libgo/Makefile.am	Fri Sep 27 14:30:27 2013 -0700
@@ -901,10 +901,17 @@
 go_reflect_makefunc_s_file = \
 	go/reflect/makefunc_amd64.S
 else
+if LIBGO_IS_386
+go_reflect_makefunc_file = \
+	go/reflect/makefuncgo_386.go
+go_reflect_makefunc_s_file = \
+	go/reflect/makefunc_386.S
+else
 go_reflect_makefunc_file =
 go_reflect_makefunc_s_file = \
 	go/reflect/makefunc_dummy.c
 endif
+endif
 
 go_reflect_files = \
 	go/reflect/deepequal.go \
diff -r 14a32fb52e24 libgo/go/reflect/all_test.go
--- a/libgo/go/reflect/all_test.go	Fri Sep 27 10:46:09 2013 -0700
+++ b/libgo/go/reflect/all_test.go	Fri Sep 27 14:30:27 2013 -0700
@@ -1432,7 +1432,7 @@
 
 func TestMakeFunc(t *testing.T) {
 	switch runtime.GOARCH {
-	case "amd64":
+	case "amd64", "386":
 	default:
 		t.Skip("MakeFunc not implemented for " + runtime.GOARCH)
 	}
diff -r 14a32fb52e24 libgo/go/reflect/makefunc.go
--- a/libgo/go/reflect/makefunc.go	Fri Sep 27 10:46:09 2013 -0700
+++ b/libgo/go/reflect/makefunc.go	Fri Sep 27 14:30:27 2013 -0700
@@ -47,7 +47,7 @@
 	}
 
 	switch runtime.GOARCH {
-	case "amd64":
+	case "amd64", "386":
 	default:
 		panic("reflect.MakeFunc not implemented for " + runtime.GOARCH)
 	}
diff -r 14a32fb52e24 libgo/go/reflect/makefunc_386.S
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/libgo/go/reflect/makefunc_386.S	Fri Sep 27 14:30:27 2013 -0700
@@ -0,0 +1,111 @@
+# Copyright 2013 The Go Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style
+# license that can be found in the LICENSE file.
+
+# MakeFunc 386 assembly code.
+
+	.global reflect.makeFuncStub
+
+#ifdef __ELF__
+	.type reflect.makeFuncStub,@function
+#endif
+
+reflect.makeFuncStub:
+	.cfi_startproc
+
+	# Go does not provide any equivalent to the regparm function
+	# attribute, so on Go we do not need to worry about passing
+	# parameters in registers.  We just pass a pointer to the
+	# arguments on the stack.
+	#
+	# We do need to pick up the return values, though, so we pass
+	# a pointer to a struct that looks like this.
+	# struct {
+	#   esp uint32		// 0x0
+	#   eax uint32		// 0x4
+	#   st0 uint64		// 0x8
+	# }
+
+	pushl	%ebp
+	.cfi_def_cfa_offset 8
+	.cfi_offset %ebp, -8
+	movl	%esp, %ebp
+	.cfi_def_cfa_register %ebp
+
+	pushl	%ebx		# In case this is PIC.
+
+	subl	$36, %esp	# Enough for args and to align stack.
+	.cfi_offset %ebx, -12
+
+#ifdef __PIC__
+	call	__x86.get_pc_thunk.bx
+	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
+#endif
+
+	leal	8(%ebp), %eax	# Set esp field in struct.
+	movl	%eax, -24(%ebp)
+
+#ifdef __PIC__
+	call	__go_get_closure@PLT
+#else
+	call	__go_get_closure
+#endif
+
+	movl	%eax, 4(%esp)
+
+	leal	-24(%ebp), %eax
+	movl	%eax, (%esp)
+
+#ifdef __PIC__
+	call	reflect.MakeFuncStubGo@PLT
+#else
+	call	reflect.MakeFuncStubGo
+#endif
+
+	# Set return registers.
+
+	movl	-20(%ebp), %eax
+	fldl	-16(%ebp)
+
+#ifdef __SSE2__
+	# In case we are compiling with -msseregparm.  This won't work
+	# correctly if only SSE1 is supported, but that seems unlikely.
+	movsd	-16(%ebp), %xmm0
+#endif
+
+	addl	$36, %esp
+	popl	%ebx
+	.cfi_restore %ebx
+	popl	%ebp
+	.cfi_restore %ebp
+	.cfi_def_cfa %esp, 4
+
+	ret
+	.cfi_endproc
+
+#ifdef __ELF__
+	.size	reflect.makeFuncStub, . - reflect.makeFuncStub
+#endif
+
+#ifdef __PIC__
+	.section	.text.__x86.get_pc_thunk.bx,"axG",@progbits,__x86.get_pc_thunk.bx,comdat
+	.globl	__x86.get_pc_thunk.bx
+	.hidden	__x86.get_pc_thunk.bx
+#ifdef __ELF__
+	.type	__x86.get_pc_thunk.bx, @function
+#endif
+__x86.get_pc_thunk.bx:
+	.cfi_startproc
+	movl	(%esp), %ebx
+	ret
+	.cfi_endproc
+#ifdef __ELF__
+	.size	__x86.get_pc_thunk.bx, . - __x86.get_pc_thunk.bx
+#endif
+#endif
+
+#ifdef __ELF__
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
+#endif
diff -r 14a32fb52e24 libgo/go/reflect/makefuncgo_386.go
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/libgo/go/reflect/makefuncgo_386.go	Fri Sep 27 14:30:27 2013 -0700
@@ -0,0 +1,135 @@
+// Copyright 2013 The Go Authors.  All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// MakeFunc 386 

[Google] Adjust comdat-sharing-probability param for -Os

2013-09-27 Thread Easwaran Raman
This patch increases comdat-sharing-probability to 80 under -Os. This
reduces the amount of inlining and helps internal benchmarks.
Unfortunately, this causes slight regression on spec 2006. Ok for
google branches if all tests pass?

- Easwaran


comdat_sharing.patch
Description: Binary data


libgo patch committed: Copy stack values onto heap

2013-09-27 Thread Ian Lance Taylor
I realized that in the amd64 implementation of MakeFunc I forgot that
it's not OK to just take the address of a value on the stack, since the
function might hang onto the address.  The value needs to be copied onto
the heap first.  This patch implements that.  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.8
branch.
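
The same pitfall can be sketched in plain C (my illustration, not the
libgo code):

#include <stdlib.h>
#include <string.h>

/* Unsafe: the returned pointer aims into the caller's argument area,
   which the callee may retain past the call.  */
void *
capture_arg_bad (void *ap, size_t size)
{
  return ap;
}

/* Safe: copy the value onto the heap first, which is what the patch
   below does with unsafe_New and memmove.  */
void *
capture_arg_good (void *ap, size_t size)
{
  void *p = malloc (size);
  if (p)
    memcpy (p, ap, size);
  return p;
}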

Ian

diff -r cec0db170d82 libgo/go/reflect/makefuncgo_amd64.go
--- a/libgo/go/reflect/makefuncgo_amd64.go	Fri Sep 27 14:31:06 2013 -0700
+++ b/libgo/go/reflect/makefuncgo_amd64.go	Fri Sep 27 15:11:00 2013 -0700
@@ -431,8 +431,14 @@
 func amd64Memarg(in []Value, ap uintptr, rt *rtype) ([]Value, uintptr) {
 	ap = align(ap, ptrSize)
 	ap = align(ap, uintptr(rt.align))
-	p := Value{rt, unsafe.Pointer(ap), flag(rt.Kind()<<flagKindShift) | flagIndir}
-	in = append(in, p)
+
+	// We have to copy the argument onto the heap in case the
+	// function hangs onto the reflect.Value we pass it.
+	p := unsafe_New(rt)
+	memmove(p, unsafe.Pointer(ap), rt.size)
+
+	v := Value{rt, p, flag(rt.Kind()<<flagKindShift) | flagIndir}
+	in = append(in, v)
 	ap += rt.size
 	return in, ap
 }


Merge from GCC 4.8 branch to gccgo branch

2013-09-27 Thread Ian Lance Taylor
I've merged revision 202996 from the GCC 4.8 branch to the gccgo branch.

Ian


[PATCH] alternative hirate for builtin_expect

2013-09-27 Thread Rong Xu
Hi,



  Current default probability for builtin_expect is 0.9996.
This makes the freq of unlikely bb very low (4), which
suppresses the inlining of any calls within those bb.

We used FDO data to measure the branch probability for
the branch annotated with builtin_expect.
For google internal benchmarks, the weighted average
(the profile count value as the weight) is 0.9081.

Linux kernel is another program that is heavily annotated
with builtin_expect. We measured its weighted average as 0.8717,
using google search as the workload.
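
For concreteness, the weighted average quoted above is just a
count-weighted mean over the annotated branches. A stand-alone sketch
(the sample numbers are made up, not our measured data):

#include <stdio.h>

struct branch_sample { double count; double prob; };

/* Profile-count-weighted average of per-branch taken probabilities.  */
static double
weighted_avg_prob (const struct branch_sample *s, int n)
{
  double num = 0.0, den = 0.0;
  for (int i = 0; i < n; i++)
    {
      num += s[i].count * s[i].prob;	/* count acts as the weight */
      den += s[i].count;
    }
  return den > 0 ? num / den : 0.0;
}

int main (void)
{
  struct branch_sample samples[] = { {1000, 0.95}, {400, 0.70}, {50, 0.99} };
  printf ("weighted average: %.4f\n", weighted_avg_prob (samples, 3));
  return 0;
}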

This patch sets the alternate hirate probability for builtin_expect
to 90%. With the alternate hirate, we measured performance
improvement for google benchmarks and the Linux kernel.

An earlier discussion is
https://mail.google.com/mail/u/0/?pli=1#label/gcc-paches/1415c5910054630b

This new patch is for the trunk and addresses Honza's comments.

Honza: this new probability is off by default. When we backport to google
branch we will make it the default. Let me know if you want to do the same
here.

Thanks,

-Rong


p2_patch
Description: Binary data


[PATCH] fix size_estimation for builtin_expect

2013-09-27 Thread Rong Xu
Hi,

builtin_expect should be a NOP in size_estimation. Indeed, the call
stmt itself is 0 weight in size and time. But it may introduce
an extra relational expr which has non-zero size/time. The end result
is: with and without builtin_expect, we get different size/time estimates
for inlining.

This patch fixes this problem.

An earlier discussion of this patch is
  https://mail.google.com/mail/u/0/?pli=1#label/gcc-paches/1415c590ad8c5315

This new patch addresses Honza's comments.
It passes bootstrap and regression testing.

Richard: I looked at your tree-ssa.c:walk_use_def_chains() code. I think
that's overkill for this simple problem. Your code mostly deals
with recursively walking PHI nodes to find the real def stmts.
Here the traversal is within one BB, and I may need to continue through
multiple real assignments. Calling walk_use_def_chains would probably only use
the SSA_NAME_DEF_STMT() part of the code.

Thanks,

-Rong


p1_patch
Description: Binary data
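
To make the problem concrete, here is a hedged stand-alone example (my
illustration, not taken from the patch or its testsuite) of two
functions that ought to get identical inline size/time estimates;
before the fix, the extra relational statement feeding the
__builtin_expect call inflated the estimate for the second:

extern void rare_path (void);

void
without_expect (int x)
{
  if (x == 0)
    rare_path ();
}

void
with_expect (int x)
{
  /* Gimplification materializes "t = (x == 0); e = __builtin_expect (t, 0)";
     that extra relational stmt used to be counted in the size estimate.  */
  if (__builtin_expect (x == 0, 0))
    rare_path ();
}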


Re: [Google] Adjust comdat-sharing-probability param for -Os

2013-09-27 Thread Xinliang David Li
d.growth -= (info->size
* (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
+ 50) / 100;

What is the purpose of '50' here?

The patch is fine for Google branch.

Other tunings to think about -- I think the sharing probability should
not be a fixed value -- but depending on the function's charateristics
-- such as size, number of callsites etc. For instance, for small leaf
comdat functions, the sharing probability will be small.

David


On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman era...@google.com wrote:
 This patch increases comdat-sharing-probability to 80 under -Os. This
 reduces the amount of inlining and helps internal benchmarks.
 Unfortunately, this causes slight regression on spec 2006. Ok for
 google branches if all tests pass?

 - Easwaran


Re: [Google] Adjust comdat-sharing-probability param for -Os

2013-09-27 Thread Easwaran Raman
On Fri, Sep 27, 2013 at 4:08 PM, Xinliang David Li davi...@google.com wrote:
 d.growth -= (info->size
 * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
 + 50) / 100;

 What is the purpose of '50' here?

Rounding after division by 100.
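
A quick stand-alone illustration (made-up numbers) of the rounding:
integer division truncates toward zero, and adding half the divisor
first rounds to nearest instead.

#include <stdio.h>

int main (void)
{
  int size = 8;
  int keep_percent = 20;	/* i.e. 100 - comdat-sharing-probability */
  printf ("truncated: %d\n", size * keep_percent / 100);	/* prints 1 */
  printf ("rounded:   %d\n", (size * keep_percent + 50) / 100);	/* prints 2 */
  return 0;
}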

 The patch is fine for Google branch.

 Other tunings to think about -- I think the sharing probability should
  not be a fixed value but should depend on the function's characteristics,
  such as size, number of callsites, etc. For instance, for small leaf
 comdat functions, the sharing probability will be small.
 David


 On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman era...@google.com wrote:
 This patch increases comdat-sharing-probability to 80 under -Os. This
 reduces the amount of inlining and helps internal benchmarks.
 Unfortunately, this causes slight regression on spec 2006. Ok for
 google branches if all tests pass?

 - Easwaran


[PATCH] Relax the requirement of reduction pattern in GCC vectorizer.

2013-09-27 Thread Cong Hou
The current GCC vectorizer requires the following pattern as a simple
reduction computation:

   loop_header:
 a1 = phi <a0, a2>
 a3 = ...
 a2 = operation (a3, a1)

But a3 can also be defined outside of the loop. For example, the
following loop can benefit from vectorization but the GCC vectorizer
fails to vectorize it:


int foo(int v)
{
  int s = 1;
  ++v;
  for (int i = 0; i < 10; ++i)
s *= v;
  return s;
}


This patch relaxes the original requirement by also considering the
following pattern:


   a3 = ...
   loop_header:
 a1 = phi <a0, a2>
 a2 = operation (a3, a1)


A test case is also added. The patch is tested on x86-64.


thanks,
Cong



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 39c786e..45c1667 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2013-09-27  Cong Hou  co...@google.com
+
+ * tree-vect-loop.c: Relax the requirement of the reduction
+ pattern so that one operand of the reduction operation can
+ come from outside of the loop.
+
 2013-09-25  Tom Tromey  tro...@redhat.com

  * Makefile.in (PARTITION_H, LTO_SYMTAB_H, COMMON_TARGET_DEF_H)
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 09644d2..90496a2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-09-27  Cong Hou  co...@google.com
+
+ * gcc.dg/vect/vect-reduc-pattern-3.c: New test.
+
 2013-09-25  Marek Polacek  pola...@redhat.com

  PR sanitizer/58413
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 2871ba1..3c51c3b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2091,6 +2091,13 @@ vect_is_slp_reduction (loop_vec_info loop_info,
gimple phi, gimple first_stmt)
  a3 = ...
  a2 = operation (a3, a1)

+   or
+
+   a3 = ...
+   loop_header:
+ a1 = phi <a0, a2>
+ a2 = operation (a3, a1)
+
such that:
1. operation is commutative and associative and it is safe to
   change the order of the computation (if CHECK_REDUCTION is true)
@@ -2451,6 +2458,7 @@ vect_is_simple_reduction_1 (loop_vec_info
loop_info, gimple phi,
   if (def2 && def2 == phi
       && (code == COND_EXPR
	   || !def1 || gimple_nop_p (def1)
+	   || !flow_bb_inside_loop_p (loop, gimple_bb (def1))
	   || (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
	       && (is_gimple_assign (def1)
		   || is_gimple_call (def1)
@@ -2469,6 +2477,7 @@ vect_is_simple_reduction_1 (loop_vec_info
loop_info, gimple phi,
   if (def1 && def1 == phi
       && (code == COND_EXPR
	   || !def2 || gimple_nop_p (def2)
+	   || !flow_bb_inside_loop_p (loop, gimple_bb (def2))
	   || (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
	       && (is_gimple_assign (def2)
		   || is_gimple_call (def2)
diff --git gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
new file mode 100644
index 000..06a9416
--- /dev/null
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 10
+#define RES 1024
+
+/* A reduction pattern in which there is no data ref in
+   the loop and one operand is defined outside of the loop.  */
+
+__attribute__ ((noinline)) int
+foo (int v)
+{
+  int i;
+  int result = 1;
+
+  ++v;
+  for (i = 0; i < N; i++)
+result *= v;
+
+  return result;
+}
+
+int
+main (void)
+{
+  int res;
+
+  check_vect ();
+
+  res = foo (1);
+  if (res != RES)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump vect } } */
+


[PATCH] reimplement -fstrict-volatile-bitfields v4, part 0/2

2013-09-27 Thread Sandra Loosemore

Here is the latest version of my -fstrict-volatile-bitfields patches.

I have gone ahead and committed part 1 of the previous version of this 
patch series, which was approved back in mid-June.  That part just 
removes the warning about conflicts with packed structs (which everybody 
seemed to agree was a bad idea) but doesn't otherwise change the 
behavior of -fstrict-volatile-bitfields.


The code changes for the rest of the patch series are unchanged, but 
I've updated the test cases to remove dependencies on header files.
I refreshed the patches against mainline head and retested all parts of 
the patch series last week.


Part 1 of the current patch series incorporates parts 2 and 3 of the 
previous version (code changes and test cases for the various bugs that 
have been reported against -fstrict-volatile-bitfields).  Part 2 is the 
same as part 4 of the previous version (changing the target-specific 
defaults).  Hopefully having fewer parts to keep track of will make it 
easier to review.


-Sandra



[PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2

2013-09-27 Thread Sandra Loosemore
This patch fixes various -fstrict-volatile-bitfields wrong-code bugs, 
including instances where bitfield values were being quietly truncated 
as well as bugs relating to using the wrong width.  The code changes are 
identical to those in the previous version of this patch series 
(originally posted in June and re-pinged many times since then), but for 
this version I have cleaned up the test cases to remove dependencies on 
header files, as Bernd requested.
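
As a reminder of the class of code at stake, here is a minimal hedged 
example (mine, not from the testsuite): with -fstrict-volatile-bitfields 
the access below must be performed as a single load of the field's 
declared width (SImode here), which is what device-register code relies 
on.

/* A hypothetical device-register layout.  */
struct dev_reg
{
  volatile unsigned int ready : 1;
  volatile unsigned int mode  : 3;
  volatile unsigned int pad   : 28;
};

unsigned int
read_mode (struct dev_reg *r)
{
  /* With -fstrict-volatile-bitfields this must be one 32-bit volatile
     load, not a narrower or wider access.  */
  return r->mode;
}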


-Sandra

2013-09-28  Sandra Loosemore  san...@codesourcery.com

	PR middle-end/23623
	PR middle-end/48784
	PR middle-end/56341
	PR middle-end/56997

	gcc/
	* expmed.c (strict_volatile_bitfield_p): New function.
	(store_bit_field_1): Don't special-case strict volatile
	bitfields here.
	(store_bit_field): Handle strict volatile bitfields here instead.
	(store_fixed_bit_field): Don't special-case strict volatile
	bitfields here.
	(extract_bit_field_1): Don't special-case strict volatile
	bitfields here.
	(extract_bit_field): Handle strict volatile bitfields here instead.
	(extract_fixed_bit_field): Don't special-case strict volatile
	bitfields here.  Simplify surrounding code to resemble that in
	store_fixed_bit_field.
	* doc/invoke.texi (Code Gen Options): Update
	-fstrict-volatile-bitfields description.

	gcc/testsuite/
	* gcc.dg/pr23623.c: New test.
	* gcc.dg/pr48784-1.c: New test.
	* gcc.dg/pr48784-2.c: New test.
	* gcc.dg/pr56341-1.c: New test.
	* gcc.dg/pr56341-2.c: New test.
	* gcc.dg/pr56997-1.c: New test.
	* gcc.dg/pr56997-2.c: New test.
	* gcc.dg/pr56997-3.c: New test.
Index: gcc/expmed.c
===
--- gcc/expmed.c	(revision 203003)
+++ gcc/expmed.c	(working copy)
@@ -415,6 +415,42 @@ lowpart_bit_field_p (unsigned HOST_WIDE_
 return bitnum % BITS_PER_WORD == 0;
 }
 
+/* Return true if -fstrict-volatile-bitfields applies to an access of OP0
+   containing BITSIZE bits starting at BITNUM, with field mode FIELDMODE.  */
+
+static bool
+strict_volatile_bitfield_p (rtx op0, unsigned HOST_WIDE_INT bitsize,
+			unsigned HOST_WIDE_INT bitnum,
+			enum machine_mode fieldmode)
+{
+  unsigned HOST_WIDE_INT modesize = GET_MODE_BITSIZE (fieldmode);
+
+  /* -fstrict-volatile-bitfields must be enabled and we must have a
+ volatile MEM.  */
+  if (!MEM_P (op0)
+      || !MEM_VOLATILE_P (op0)
+      || flag_strict_volatile_bitfields <= 0)
+return false;
+
+  /* Non-integral modes likely only happen with packed structures.
+ Punt.  */
+  if (!SCALAR_INT_MODE_P (fieldmode))
+return false;
+
+  /* The bit size must not be larger than the field mode, and
+ the field mode must not be larger than a word.  */
+  if (bitsize > modesize || modesize > BITS_PER_WORD)
+return false;
+
+  /* Check for cases of unaligned fields that must be split.  */
+  if (bitnum % BITS_PER_UNIT + bitsize > modesize
+      || (STRICT_ALIGNMENT
+	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
+return false;
+
+  return true;
+}
+
 /* Return true if OP is a memory and if a bitfield of size BITSIZE at
bit number BITNUM can be treated as a simple value of mode MODE.  */
 
@@ -813,12 +849,8 @@ store_bit_field_1 (rtx str_rtx, unsigned
  cheap register alternative is available.  */
   if (MEM_P (op0))
 {
-  /* Do not use unaligned memory insvs for volatile bitfields when
-	 -fstrict-volatile-bitfields is in effect.  */
-  if (!(MEM_VOLATILE_P (op0)
-	 && flag_strict_volatile_bitfields > 0)
-	&& get_best_mem_extraction_insn (insv, EP_insv, bitsize, bitnum,
-					 fieldmode)
+  if (get_best_mem_extraction_insn (insv, EP_insv, bitsize, bitnum,
+	fieldmode)
 	  && store_bit_field_using_insv (insv, op0, bitsize, bitnum, value))
 	return true;
 
@@ -871,6 +903,27 @@ store_bit_field (rtx str_rtx, unsigned H
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
+  /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
+  if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, fieldmode))
+{
+
+  /* Storing any naturally aligned field can be done with a simple
+	 store.  For targets that support fast unaligned memory, any
+	 naturally sized, unit aligned field can be done directly.  */
+  if (simple_mem_bitfield_p (str_rtx, bitsize, bitnum, fieldmode))
+	{
+	  str_rtx = adjust_bitfield_address (str_rtx, fieldmode,
+	 bitnum / BITS_PER_UNIT);
+	  emit_move_insn (str_rtx, value);
+	}
+  else
+	/* Explicitly override the C/C++ memory model; ignore the
+	   bit range so that we can do the access in the mode mandated
+	   by -fstrict-volatile-bitfields instead.  */
+	store_fixed_bit_field (str_rtx, bitsize, bitnum, 0, 0, value);
+  return;
+}
+
   /* Under the C++0x memory model, we must not touch bits outside the
  bit region.  Adjust the address to start at the beginning of the
  bit region.  */
@@ -923,29 +976,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
   if (MEM_P (op0))
 {
-  unsigned HOST_WIDE_INT 

[PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-09-27 Thread Sandra Loosemore
This patch makes -fstrict-volatile-bitfields not be the default on any 
target.  It is unchanged from part 4 of the previous version (v3) of 
this patch set that I originally posted back in June and have been 
re-pinging many times since then.


Some reviewers of my original patch series argued quite strongly that 
the C/C++ language semantics ought to take precedence over any 
target-specific ABI in cases where they conflict.  I am neutral on this 
change myself, but if it is a requirement for approval of the other part 
of the patch that fixes the wrong-code bugs, I think users on targets 
such as ARM that currently default to enabling this flag would be better 
off specifying the flag explicitly to get code that does what they want, 
than getting broken code by default with no way to tell GCC to DTRT.  :-( 
 And that's the behavior we're stuck with if we do nothing or can't 
reach a consensus about what to do.  :-(


Bernd Edlinger has been working on a followup patch to issue warnings in 
cases where -fstrict-volatile-bitfields behavior conflicts with the new 
C/C++ memory model, which might help to ease the transition in the 
change of defaults.  I believe this was the last version posted:


http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00100.html

-Sandra

2013-09-28  Sandra Loosemore  san...@codesourcery.com

	gcc/
	* config/aarch64/aarch64.c (aarch64_override_options): Don't
	override flag_strict_volatile_bitfields.
	* config/arm/arm.c (arm_option_override): Likewise.
	* config/h8300/h8300.c (h8300_option_override): Likewise.
	* config/m32c/m32c.c (m32c_option_override): Likewise.
	* config/rx/rx.c (rx_option_override): Likewise.
	* config/sh/sh.c (sh_option_override): Likewise.
	* doc/invoke.texi (Code Gen Options): Document that
	-fstrict-volatile-bitfields is no longer the default on any target.

	gcc/testsuite/
	* gcc.target/arm/volatile-bitfields-1.c: Add explicit
	-fstrict-volatile-bitfields.
	* gcc.target/arm/volatile-bitfields-2.c: Likewise.
	* gcc.target/arm/volatile-bitfields-3.c: Likewise.
	* gcc.target/arm/volatile-bitfields-4.c: Likewise.
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c	(revision 203002)
+++ gcc/config/aarch64/aarch64.c	(working copy)
@@ -5141,10 +5141,6 @@ aarch64_override_options (void)
 
   aarch64_build_bitmask_table ();
 
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
-flag_strict_volatile_bitfields = 1;
-
   /* If the user did not specify a processor, choose the default
  one for them.  This will be the CPU set during configuration using
  --with-cpu, otherwise it is generic.  */
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 203002)
+++ gcc/config/arm/arm.c	(working copy)
@@ -2136,11 +2136,6 @@ arm_option_override (void)
 			   global_options.x_param_values,
 			   global_options_set.x_param_values);
 
-  /* ARM EABI defaults to strict volatile bitfields.  */
-  if (TARGET_AAPCS_BASED && flag_strict_volatile_bitfields < 0
-      && abi_version_at_least(2))
-flag_strict_volatile_bitfields = 1;
-
   /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we have deemed
  it beneficial (signified by setting num_prefetch_slots to 1 or more.)  */
   if (flag_prefetch_loop_arrays < 0
Index: gcc/config/h8300/h8300.c
===
--- gcc/config/h8300/h8300.c	(revision 203002)
+++ gcc/config/h8300/h8300.c	(working copy)
@@ -437,10 +437,6 @@ h8300_option_override (void)
 	 restore er6 though, so bump up the cost.  */
   h8300_move_ratio = 6;
 }
-
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-flag_strict_volatile_bitfields = 1;
 }
 
 /* Return the byte register name for a register rtx X.  B should be 0
Index: gcc/config/m32c/m32c.c
===
--- gcc/config/m32c/m32c.c	(revision 203002)
+++ gcc/config/m32c/m32c.c	(working copy)
@@ -416,10 +416,6 @@ m32c_option_override (void)
   if (TARGET_A24)
 flag_ivopts = 0;
 
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-flag_strict_volatile_bitfields = 1;
-
   /* r8c/m16c have no 16-bit indirect call, so thunks are involved.
  This is always worse than an absolute call.  */
   if (TARGET_A16)
Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 203002)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2691,10 +2691,6 @@ rx_option_override (void)
 	  }
   }
 
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-