RE: [PATCH]Fix computation of offset in ivopt
-----Original Message----- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Friday, September 27, 2013 1:07 PM To: 'Richard Biener' Cc: GCC Patches Subject: RE: [PATCH]Fix computation of offset in ivopt

-----Original Message----- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 24, 2013 6:31 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH]Fix computation of offset in ivopt

On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote:

+  field = TREE_OPERAND (expr, 1);
+  if (DECL_FIELD_BIT_OFFSET (field)
+      && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
+    boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
+
+  tmp = component_ref_field_offset (expr);
+  if (top_compref
+      && cst_and_fits_in_hwi (tmp))
+    {
+      /* Strip the component reference completely.  */
+      op0 = TREE_OPERAND (expr, 0);
+      op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
+      *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
+      return op0;
+    }

the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false for either offset part you may end up doing half accounting and not stripping. Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest rewriting to

  if (!inside_addr)
    return orig_expr;

  tmp = component_ref_field_offset (expr);
  field = TREE_OPERAND (expr, 1);
  if (top_compref
      && cst_and_fits_in_hwi (tmp)
      && cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field)))
    {
      ...
    }

Will be refined.

note that this doesn't really handle overflows correctly as

+      *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;

may still overflow.

Since it's unsigned + signed + signed, the implicit conversions convert the signed operands to unsigned, so the overflow would only happen when off0 is a huge number and tmp/boffset is a large positive number, right? Do I need to check whether off0 is larger than the overflowed result? Also there is a signed/unsigned problem here, see below.
@@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data,
       bitmap_clear (*depends_on);
     }

+  /* Sign-extend offset if utype has lower precision than
+     HOST_WIDE_INT.  */
+  offset = sext_hwi (offset, TYPE_PRECISION (utype));
+

offset is computed elsewhere in difference_cost and the bug to me seems that it is unsigned. sign-extending it here is odd at least (and the extension should probably happen at sizetype precision, not that of utype).

I agree. The root cause is in strip_offset_1, in which offset is computed. Every time offset is computed in this function with a signed operand (like int_cst_value (tmp) above), we need to take care of the possible negative-number problem. Take this case as an example; we need to make the change below:

  case INTEGER_CST:
    //...
    *offset = int_cst_value (expr);

change to

  case INTEGER_CST:
    //...
    *offset = sext_hwi (int_cst_value (expr), type);

and

  case MULT_EXPR:
    //...
    *offset = sext_hwi (int_cst_value (expr), type);

to

  case MULT_EXPR:
    //...
    HOST_WIDE_INT xxx = (HOST_WIDE_INT) off0 * int_cst_value (op1);
    *offset = sext_hwi (xxx, type);

Any comments? Thought twice, I guess we can compute a signed offset in strip_offset_1 and sign extend it for strip_offset, thus we don't need to change every computation of offset in that function. Thanks. bin
Re: OMP4/cilkplus: simd clone function mangling
On Thu, Sep 26, 2013 at 9:35 PM, Aldy Hernandez al...@redhat.com wrote:

+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+     be cloned have a distinctive artificial label in addition to omp
+     declare simd.  */
+  bool cilk_clone = flag_enable_cilkplus
+    && lookup_attribute ("cilk plus elemental",
+                         DECL_ATTRIBUTES (new_node->symbol.decl));
+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+                      DECL_ATTRIBUTES (new_node->symbol.decl));

Oh yeah, rth had asked me why I remove the attribute. My initial thoughts were that whether or not a function is a simd clone can be accessed through the cgraph bits (node->simdclone != NULL for the clone, and node->has_simd_clones for the parent). No sense keeping the attribute. But I can leave it if you think it's better.

Why have it in the first place if it's marked in the cgraph?

Richard.

Aldy
Re: [google gcc-4_8] fix size_estimation for builtin_expect
On Fri, Sep 27, 2013 at 12:23 AM, Jan Hubicka hubi...@ucw.cz wrote:

Hi, builtin_expect should be a NOP in size_estimation. Indeed, the call stmt itself is 0 weight in size and time. But it may introduce an extra relation expr which has non-zero size/time. The end result is: w/ and w/o builtin_expect, we get different size/time estimations for early inlining. This patch fixes this problem. -Rong

2013-09-26  Rong Xu  x...@google.com

	* ipa-inline-analysis.c (estimate_function_body_sizes): Fix
	the size estimation for builtin_expect.

This seems fine with a comment in the code saying what it is about. I also think we want to support multiple builtin_expects in a BB, so perhaps we want to have a pointer set of statements to ignore? To avoid spaghetti code, please just move the new logic into separate functions.

Looks like this could use tree-ssa.c:walk_use_def_chains (please change its implementation as necessary, make it C++, etc. - you will be the first user again).

Richard.

Honza

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c	(revision 202638)
+++ ipa-inline-analysis.c	(working copy)
@@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
   /* Estimate static overhead for function prologue/epilogue and alignment. */
   int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
   int size = overhead;
+  gimple fix_expect_builtin;
+
   /* Benefits are scaled by probability of elimination that is in range 0,2.
 */
   basic_block bb;
@@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
 	    }
 	}
+      fix_expect_builtin = NULL;
       for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
 	{
 	  gimple stmt = gsi_stmt (bsi);
+	  if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
+	    {
+	      tree var = gimple_call_lhs (stmt);
+	      tree arg = gimple_call_arg (stmt, 0);
+	      use_operand_p use_p;
+	      gimple use_stmt;
+	      bool match = false;
+	      bool done = false;
+	      gcc_assert (var && arg);
+	      gcc_assert (TREE_CODE (var) == SSA_NAME);
+
+	      while (TREE_CODE (arg) == SSA_NAME)
+		{
+		  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
+		  switch (gimple_assign_rhs_code (stmt_tmp))
+		    {
+		    case LT_EXPR:
+		    case LE_EXPR:
+		    case GT_EXPR:
+		    case GE_EXPR:
+		    case EQ_EXPR:
+		    case NE_EXPR:
+		      match = true;
+		      done = true;
+		      break;
+		    case NOP_EXPR:
+		      break;
+		    default:
+		      done = true;
+		      break;
+		    }
+		  if (done)
+		    break;
+		  arg = gimple_assign_rhs1 (stmt_tmp);
+		}
+
+	      if (match && single_imm_use (var, &use_p, &use_stmt))
+		{
+		  if (gimple_code (use_stmt) == GIMPLE_COND)
+		    fix_expect_builtin = use_stmt;
+		}
+
+	      /* We should see one builtin_expect call in one bb.  */
+	      break;
+	    }
+	}
+
+      for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
 	{
 	  gimple stmt = gsi_stmt (bsi);
 	  int this_size = estimate_num_insns (stmt, eni_size_weights);
 	  int this_time = estimate_num_insns (stmt, eni_time_weights);
 	  int prob;
 	  struct predicate will_be_nonconstant;
+	  if (stmt == fix_expect_builtin)
+	    {
+	      this_size--;
+	      this_time--;
+	    }
+
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    {
 	      fprintf (dump_file, "
Re: [PATCH]Fix computation of offset in ivopt
On Fri, Sep 27, 2013 at 7:07 AM, bin.cheng bin.ch...@arm.com wrote:

[... quoted exchange identical to the message above snipped ...]

Any comments?

The issue is of course that we end up converting offsets to sizetype at some point which makes them all appear unsigned. The fix for this is to simply interpret them as signed ... but it's really a mess ;)

Richard.

Thanks. bin
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Sure, but the modifier is not meant to force something into memory, especially when it is already in a register. Remember, we are only talking of structures here, and we only want to access one member. It is more the other way round: it says "you do not have to load the value into a register if it is already in memory".

I'm happy with "EXPAND_MEMORY means we are interested in a memory result, even if the memory is constant and we could have propagated a constant value.  */"

We definitely want to propagate constant values here, look at the code below. And it already lists explicit cases where we really need to spill to memory.

-- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
They don't need to be kept synchronised as such. It's fine for the index to allow more than must_be_index_p. But if you're not keen on the current structure, does the following look better? Tested on x86_64-linux-gnu. Thanks, Richard gcc/ * rtlanal.c (must_be_base_p, must_be_index_p): Delete. (binary_scale_code_p, get_base_term, get_index_term): New functions. (set_address_segment, set_address_base, set_address_index) (set_address_disp): Accept the argument unconditionally. (baseness): Remove must_be_base_p and must_be_index_p checks. (decompose_normal_address): Classify as much as possible in the main loop. Yes, fine by me, thanks. -- Eric Botcazou
Re: RFA: Store the REG_BR_PROB probability directly as an int
Thanks for the testing. It also passes bootstrap on x86_64-linux-gnu. OK to install? Yes, thanks. -- Eric Botcazou
Re: [patch] Separate immediate uses and phi routines from tree-flow*.h
On Thu, Sep 26, 2013 at 6:07 PM, Andrew MacLeod amacl...@redhat.com wrote:
On 09/25/2013 04:49 AM, Richard Biener wrote:
On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod amacl...@redhat.com wrote:

This larger patch moves all the immediate use and operand routines from tree-flow.h into tree-ssa-operands.h. It also moves the basic phi routines and prototypes into a newly created tree-phinodes.h, or tree-ssa-operands.h if they belong there. And finally it shuffles a couple of other routines, which allows tree-ssa-operands.h to be removed from the gimple.h header file. Of note or interest:

1 - dump_decl_set() was defined in tree-into-ssa.c, but isn't really ssa-specific. It's tree-specific, so normally I'd throw it into tree.c. Looking forward a little, it's only used in a gimple context, so when we map to gimple_types it will need to be converted to/created for those. If it is in tree.c, I'll have to create a new version for gimple types, and then the routine in tree.c will become unused. Based on that, I figured gimple.c is the right place for it.

2 - has_zero_uses_1() and single_imm_use_1() were both in tree-cfg.c for some reason; they've been moved to tree-ssa-operands.c.

3 - A few routines seem like basic gimple routines, but really turn out to require the operand infrastructure to implement, so they are moved to tree-ssa-operands.[ch] as well. This sort of thing showed up when removing tree-ssa-operands.h from the gimple.h include file. These were things like gimple_vuse_op, gimple_vdef_op, update_stmt, and update_stmt_if_modified.

Note that things like gimple_vuse_op are on the interface border between gimple (where the SSA operands are stored) and SSA operands. So it's not so clear for them, given they access internal gimple fields directly but use the regular SSA operand API. I'd prefer gimple_vuse_op and gimple_vdef_op to stay in gimple.[ch].

Ugh. I incorporated what we talked about, and it was much messier than expected :-P.
I ended up with a chicken-and-egg problem between the gimple_v{use,def}_op routines in gimple-ssa.h and the operand routines in tree-ssa-operands.h. They both require each other, and I couldn't get things into a consistent state while they were in separate files. It was actually the immediate use iterators which were requiring gimple_vuse_op... So I have created a new ssa-iterators.h file to resolve this problem. They build on the operand code and clearly have other prerequisites, so that seems reasonable to me...

This in fact solves a couple of other little warts. It allows me to put both gimple_phi_arg_imm_use_ptr() and phi_arg_index_from_use() into tree-phinodes.h. It also exposes that gimple.c::walk_stmt_load_store_addr_ops() and friends actually depend on the existence of PHI nodes, meaning it really belongs on the gimple-ssa border as well. So I moved those into gimple-ssa.c.

It doesn't depend on PHI nodes, but it also works for PHI nodes. So I'd rather have it in gimple.c.

And finally, it turns out that a lot of files include tree-flow.h and depend on it to include gimple.h rather than including it themselves. Since tree-flow.h is losing its kitchen-sink attribute, and I needed to move it to the bottom of the #include list for tree-ssa.h, I have temporarily included gimple.h at the top of tree-ssa.h to make sure it gets hauled in. When I remove tree-flow.h as the "everyone includes it" file, I'll add gimple.h in all the appropriate .c files and remove it from tree-ssa.h. It would have just made this growing patch even more annoying for now. Does this seem reasonable?

Yes - try leaving walk_stmt_load_store_addr_ops in gimple.c though, if that is technically possible. Otherwise I guess I don't mind.

Thanks, Richard.

Bootstraps on x86_64-unknown-linux-gnu and currently running regressions. Andrew

PS Oh and I noticed the macro name for tree-outof-ssa.h wasn't right, so I changed it too.
Next I'll diverge into trying to sort through putting all the phi-related structs and such into tree-phinodes.h
Re: [PATCH, RTL] Prepare ARM build with LRA
Below is a trivial patch which makes both parts of the test signed. With this, bootstrap completes on powerpc-darwin9 - however, you might want to check that it still does what you intended. Please install under PR middle-end/58547 if not already done. -- Eric Botcazou
Re: Commit: MSP430: Pass -md on to assembler
Hi Mike,

I must say though, it seems wrong to have to provide a sign-extend pointer pattern when pointers (on the MSP430) are unsigned.

Agreed. If we instead ask "is it sane for gcc to ever want to sign extend in this case", the answer appears to be no. Why does it? ptr_mode is SImode, and expand_builtin_next_arg is used to perform the addition in this mode. It 'just' knows that it can be sign extended… and just does it that way. This seems like it is wrong.

Index: builtins.c
===
--- builtins.c	(revision 202634)
+++ builtins.c	(working copy)
@@ -4094,7 +4094,7 @@ expand_builtin_next_arg (void)
   return expand_binop (ptr_mode, add_optab,
		       crtl->args.internal_arg_pointer,
		       crtl->args.arg_offset_rtx,
-		       NULL_RTX, 0, OPTAB_LIB_WIDEN);
+		       NULL_RTX, POINTERS_EXTEND_UNSIGNED > 0, OPTAB_LIB_WIDEN);
 }

 /* Make it easier for the backends by protecting the valist argument

would fix this problem. If this is done, the unmodified test case then doesn't abort. Arguably, the extension should be done as the port directs. It isn't clear to me why they do not. Ok?

OK by me, although I cannot approve that particular patch.

I did eventually find some test cases that exercised the sign-extend pointer pattern, so I was able to check the generated code - it worked OK. But I ran into a very strange problem. With your PARTIAL_INT_MODE_NAME patch applied GCC started erroneously eliminating NULL function pointer checks! This was particularly noticeable in libgcc/crtstuff.c where for example:

  static void __attribute__((used))
  frame_dummy (void)
  {
    static struct object object;
    if (__register_frame_info)
      __register_frame_info (__EH_FRAME_BEGIN__, &object);

(this is a simplified version of the real code) ... is compiled as if it had been written as:

  static void __attribute__((used))
  frame_dummy (void)
  {
    static struct object object;
    __register_frame_info (__EH_FRAME_BEGIN__, &object);

This only happens for the LARGE model (when pointers are PSImode) but I was baffled as to where it could be happening.
Have you come across anything like this? Cheers, Nick
Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set
Like the following. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2013-09-26 Richard Biener rguent...@suse.de * alias.h (component_uses_parent_alias_set): Rename to ... (component_uses_parent_alias_set_from): ... this. * alias.c (component_uses_parent_alias_set): Rename to ... (component_uses_parent_alias_set_from): ... this and return the desired parent. (reference_alias_ptr_type_1): Use the result from component_uses_parent_alias_set_from instead of stripping components one at a time. * emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust. FWIW it looks fine to me. -- Eric Botcazou
Re: [ping] [PATCH] Silence an unused variable warning
Let's CC Vladimir on this easy one. Cheers.

Jan-Benedict Glaw jbg...@lug-owl.de wrote:
On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

Hi! With the VAX target, I see this warning:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include -I../../../../gcc/gcc/../libdecnumber -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../../gcc/gcc/../libbacktrace ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o

../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable ‘value_p’ [-Wunused-variable]
   bool value_p;
        ^
[...]

Ping: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01568.html

-- http://gcc.gnu.org/ml/gcc-patches/2013-09/txtnrNwaGiD3x.txt

MfG, JBG

-- Dodji
Generic tuning in x86-tune.def 1/2
Hi, this is the second part of the generic tuning changes sanitizing the tuning flags. This patch again is supposed to deal with the obvious part only; I will send a separate patch for more changes. The flags changed agree on all CPUs considered for generic (and their optimization manuals) plus amdfam10, core2 and Atom SLM. I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to Bobcat tuning, since it seems like an obvious omission (after double checking in the optimization manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for Bulldozer cores. Implementation of this feature was always a bit weird and its main purpose was to avoid terrible branch predictor degeneration on the older AMD branch predictors. I benchmarked both spec2k and 2k6 to verify there are no regressions. Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements in specfp benchmarks. Bootstrapped/regtested x86_64-linux, will wait for comments and commit it during the weekend. I will be happy to revisit any of the generic tuning if regressions pop up. Overall this patch also brings small code size improvements for smaller loads/stores and less padding at -O2. Differences are sub 0.1% however. Honza

	* x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for
	generic.
	(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
	(X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and Bulldozer.
	(X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.
	(X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
	(X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.
Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def	(revision 202966)
+++ config/i386/x86-tune.def	(working copy)
@@ -115,9 +115,9 @@ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPEN
	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMDFAM10
	  | m_BDVER | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal",
-	  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER | m_SLM)
+	  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER | m_SLM | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
-	  m_COREI7 | m_BDVER | m_SLM)
+	  m_COREI7 | m_BDVER | m_BTVER | m_SLM | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL,
	  "sse_packed_single_insn_optimal", m_BDVER)
 /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
@@ -146,8 +146,7 @@ DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSION
 /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
    than 4 branch instructions in the 16 byte window. */
 DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
-	  m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE
-	  | m_GENERIC)
+	  m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_ATHLON_K8 | m_AMDFAM10)
 DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
	  m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
	  | m_AMD_MULTIPLE | m_GENERIC)
@@ -156,13 +155,13 @@ DEF_TUNE (X86_TUNE_USE_BT, "use_bt",
 DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec",
	  ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_GENERIC))
 DEF_TUNE (X86_TUNE_PAD_RETURNS, "pad_returns",
-	  m_AMD_MULTIPLE | m_GENERIC)
+	  m_ATHLON_K8 | m_AMDFAM10 | m_GENERIC)
 DEF_TUNE (X86_TUNE_PAD_SHORT_FUNCTION, "pad_short_function", m_ATOM)
 DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, "ext_80387_constants",
	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
	  | m_ATHLON_K8 | m_GENERIC)
 DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE, "avoid_vector_decode",
-	  m_K8 | m_GENERIC)
+	  m_K8)
 /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
    and SImode multiply, but 386 and 486 do
   HImode multiply faster. */
 DEF_TUNE (X86_TUNE_PROMOTE_HIMODE_IMUL, "promote_himode_imul",
@@ -217,7 +216,7 @@ DEF_TUNE (X86_TUNE_REASSOC_INT_TO_PARALL
 /* X86_TUNE_REASSOC_FP_TO_PARALLEL: Try to produce parallel computations
    during reassociation of fp computation. */
 DEF_TUNE (X86_TUNE_REASSOC_FP_TO_PARALLEL, "reassoc_fp_to_parallel",
-	  m_ATOM | m_SLM | m_HASWELL | m_BDVER1 | m_BDVER2)
+	  m_ATOM | m_SLM | m_HASWELL | m_BDVER1 | m_BDVER2 | m_GENERIC)
 /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
    regs instead of memory. */
 DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, "general_regs_sse_spill",
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On 27 September 2013 03:15, Tim Shen wrote: POSIX ERE says that escaping an ordinary char, say R\n, is not permitted, because 'n' is not a special char. However, they also say that: "Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs." So let's support it, so as not to surprise users. Booted and tested under -m32 and -m64.

I'm wondering whether we want to have a stricter mode that doesn't allow them, to help users avoid creating non-portable programs. We could check the value of the preprocessor macro __STRICT_ANSI__, which is set by -std=c++11 but not by -std=gnu++11, although that's not really the right flag. We want something more like the GNU shell utils' POSIXLY_CORRECT.
Re: User-define literals for std::complex.
On 27 September 2013 05:17, Ed Smith-Rowland wrote: The complex user-defined literals finally passed (n3779) with the resolution to DR1473 allowing the suffix id to touch the quotes (Can't find it but I put it in not too long ago). I think it's been approved by the LWG and looks like it will go to a vote by the full committee, but let's wait for that to pass before making any changes.
Re: [gomp4] Library side of depend clause support
On Fri, Sep 27, 2013 at 01:48:36AM +0200, Jakub Jelinek wrote: Perhaps. What if I do just minor cleanup (use flexible array members for the reallocated vectors, and perhaps keep only the last out/inout task in the hash table chains rather than all of them), retest, commit and then we can discuss/incrementally improve it?

Here is what I've committed now; the incremental changes were really only using a structure with flex array member for the dependers vectors, removing/making redundant earlier !ent->is_in when adding !is_in into the chain, and addition of new testcases. Let's improve it incrementally later.

2013-09-27  Jakub Jelinek  ja...@redhat.com

	* libgomp.h: Include stdlib.h.
	(struct gomp_task_depend_entry, struct gomp_dependers_vec): New types.
	(struct gomp_task): Add dependers, depend_hash, depend_count,
	num_dependees and depend fields.
	(struct gomp_taskgroup): Add num_children field.
	(gomp_finish_task): Free depend_hash if non-NULL.
	* libgomp_g.h (GOMP_task): Add depend argument.
	* hashtab.h: New file.
	* task.c: Include hashtab.h.
	(hash_entry_type): New typedef.
	(htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
	(gomp_init_task): Clear dependers, depend_hash and depend_count
	fields.
	(GOMP_task): Add depend argument, handle depend clauses.  Increment
	num_children field in taskgroup.
	(gomp_task_run_pre): Don't increment task_running_count here, nor
	clear task_pending bit.
	(gomp_task_run_post_handle_depend_hash,
	gomp_task_run_post_handle_dependers,
	gomp_task_run_post_handle_depend): New functions.
	(gomp_task_run_post_remove_parent): Clear in_taskwait before
	signalling corresponding semaphore.
	(gomp_task_run_post_remove_taskgroup): Decrement num_children field
	and make the decrement to 0 a MEMMODEL_RELEASE operation, rather
	than storing NULL to taskgroup->children.  Clear in_taskgroup_wait
	before signalling corresponding semaphore.
	(gomp_barrier_handle_tasks): Move task_running_count increment
	and task_pending bit clearing here.
	Call gomp_task_run_post_handle_depend.  If more than one new
	tasks have been queued, wake other threads if needed.
	(GOMP_taskwait): Call gomp_task_run_post_handle_depend.  If more
	than one new tasks have been queued, wake other threads if needed.
	After waiting on taskwait_sem, enter critical section again.
	(GOMP_taskgroup_start): Initialize num_children field.
	(GOMP_taskgroup_end): Check num_children instead of children
	before critical section.  If children is NULL, but num_children
	is non-zero, wait on taskgroup_sem.  Call
	gomp_task_run_post_handle_depend.  If more than one new tasks
	have been queued, wake other threads if needed.  After waiting
	on taskgroup_sem, enter critical section again.
	* testsuite/libgomp.c/depend-1.c: New test.
	* testsuite/libgomp.c/depend-2.c: New test.
	* testsuite/libgomp.c/depend-3.c: New test.
	* testsuite/libgomp.c/depend-4.c: New test.

--- libgomp/libgomp.h.jj	2013-09-26 09:43:10.903930832 +0200
+++ libgomp/libgomp.h	2013-09-27 09:05:17.025402127 +0200
@@ -39,6 +39,7 @@
 #include <pthread.h>
 #include <stdbool.h>
+#include <stdlib.h>

 #ifdef HAVE_ATTRIBUTE_VISIBILITY
 # pragma GCC visibility push(hidden)
@@ -253,7 +254,26 @@ enum gomp_task_kind
   GOMP_TASK_TIED
 };

+struct gomp_task;
 struct gomp_taskgroup;
+struct htab;
+
+struct gomp_task_depend_entry
+{
+  void *addr;
+  struct gomp_task_depend_entry *next;
+  struct gomp_task_depend_entry *prev;
+  struct gomp_task *task;
+  bool is_in;
+  bool redundant;
+};
+
+struct gomp_dependers_vec
+{
+  size_t n_elem;
+  size_t allocated;
+  struct gomp_task *elem[];
+};

 /* This structure describes a task to be run by a thread.
 */
@@ -268,6 +288,10 @@ struct gomp_task
   struct gomp_task *next_taskgroup;
   struct gomp_task *prev_taskgroup;
   struct gomp_taskgroup *taskgroup;
+  struct gomp_dependers_vec *dependers;
+  struct htab *depend_hash;
+  size_t depend_count;
+  size_t num_dependees;
   struct gomp_task_icv icv;
   void (*fn) (void *);
   void *fn_data;
@@ -277,6 +301,7 @@ struct gomp_task
   bool final_task;
   bool copy_ctors_done;
   gomp_sem_t taskwait_sem;
+  struct gomp_task_depend_entry depend[];
 };

 struct gomp_taskgroup
@@ -286,6 +311,7 @@ struct gomp_taskgroup
   bool in_taskgroup_wait;
   bool cancelled;
   gomp_sem_t taskgroup_sem;
+  size_t num_children;
 };

 /* This structure describes a team of threads.  These are the threads
@@ -525,6 +551,8 @@ extern void gomp_barrier_handle_tasks (g
 static void inline
 gomp_finish_task (struct gomp_task *task)
 {
+  if (__builtin_expect (task->depend_hash != NULL, 0))
+    free (task->depend_hash);
   gomp_sem_destroy (&task->taskwait_sem);
 }
---
[PING] [C++ PATCH] demangler fix (take 2)
Gary Benson wrote: Hi all, This is a resubmission of my previous demangler fix [1] rewritten to avoid using hashtables and other libiberty features. From the above referenced email:

d_print_comp maintains a certain amount of scope across calls (namely a stack of templates) which is used when evaluating references in template argument lists. If such a reference is later used from a substitution then the scope in force at the time of the substitution is used. This appears to be wrong (I say "appears" because I couldn't find anything in the API [2] to clarify this). The attached patch causes the demangler to capture the scope the first time such a reference is traversed, and to use that captured scope on subsequent traversals. This fixes GDB PR 14963 [3] whereby a reference is resolved against the wrong template, causing an infinite loop and eventual stack overflow and segmentation fault.

I've added the result to the demangler test suite, but I know of no way to check the validity of the demangled symbol other than by inspection (and I am no expert here!) If anybody knows a way to check this then please let me know! Otherwise, I hope this not-really-checked demangled version is acceptable.

Thanks, Gary

[1] http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00215.html
[2] http://mentorembedded.github.io/cxx-abi/abi.html#mangling
[3] http://sourceware.org/bugzilla/show_bug.cgi?id=14963

-- http://gbenson.net/

diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 89e108a..2ff8216 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,20 @@
+2013-09-17  Gary Benson  gben...@redhat.com
+
+	* cp-demangle.c (struct d_saved_scope): New structure.
+	(struct d_print_info): New fields saved_scopes and
+	num_saved_scopes.
+	(d_print_init): Initialize the above.
+	(d_print_free): New function.
+	(cplus_demangle_print_callback): Call the above.
+	(d_copy_templates): New function.
+	(d_print_comp): New variables saved_templates and
+	need_template_restore.
+ [DEMANGLE_COMPONENT_REFERENCE, + DEMANGLE_COMPONENT_RVALUE_REFERENCE]: Capture scope the first + time the component is traversed, and use the captured scope for + subsequent traversals. + * testsuite/demangle-expected: Add regression test. + 2013-09-10 Paolo Carlini paolo.carl...@oracle.com PR bootstrap/58386 diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 70f5438..a199f6d 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -275,6 +275,18 @@ struct d_growable_string int allocation_failure; }; +/* A demangle component and some scope captured when it was first + traversed. */ + +struct d_saved_scope +{ + /* The component whose scope this is. */ + const struct demangle_component *container; + /* The list of templates, if any, that was current when this + scope was captured. */ + struct d_print_template *templates; +}; + enum { D_PRINT_BUFFER_LENGTH = 256 }; struct d_print_info { @@ -302,6 +314,10 @@ struct d_print_info int pack_index; /* Number of d_print_flush calls so far. */ unsigned long int flush_count; + /* Array of saved scopes for evaluating substitutions. */ + struct d_saved_scope *saved_scopes; + /* Number of saved scopes in the above array. */ + int num_saved_scopes; }; #ifdef CP_DEMANGLE_DEBUG @@ -3665,6 +3681,30 @@ d_print_init (struct d_print_info *dpi, demangle_callbackref callback, dpi->opaque = opaque; dpi->demangle_failure = 0; + + dpi->saved_scopes = NULL; + dpi->num_saved_scopes = 0; +} + +/* Free a print information structure. */ + +static void +d_print_free (struct d_print_info *dpi) +{ + int i; + + for (i = 0; i < dpi->num_saved_scopes; i++) +{ + struct d_print_template *ts, *tn; + + for (ts = dpi->saved_scopes[i].templates; ts != NULL; ts = tn) + { + tn = ts->next; + free (ts); + } +} + + free (dpi->saved_scopes); } /* Indicate that an error occurred during printing, and test for error.
*/ @@ -3749,6 +3789,7 @@ cplus_demangle_print_callback (int options, demangle_callbackref callback, void *opaque) { struct d_print_info dpi; + int success; d_print_init (&dpi, callback, opaque); @@ -3756,7 +3797,9 @@ cplus_demangle_print_callback (int options, d_print_flush (&dpi); - return ! d_print_saw_error (&dpi); + success = ! d_print_saw_error (&dpi); + d_print_free (&dpi); + return success; } /* Turn components into a human readable string. OPTIONS is the @@ -3913,6 +3956,36 @@ d_print_subexpr (struct d_print_info *dpi, int options, d_append_char (dpi, ')'); } +/* Return a shallow copy of the current list of templates. + On error d_print_error is called and a partial list may + be returned. Whatever is returned must be freed. */ + +static struct d_print_template * +d_copy_templates (struct
[patch] Fix PR bootstrap/58509
Hi, this fixes the ICE during the build of the Ada runtime on the SPARC, a fallout of the recent inliner changes: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01033.html The ICE is triggered because the ldd peephole merges an MEM with MEM_NOTRAP_P and a contiguous MEM without MEM_NOTRAP_P, keeping the MEM_NOTRAP_P flag on the result. As a consequence, an EH edge is eliminated and a BB is orphaned. I think this shows that my above inliner patch was too gross: when you have successive inlining, you can quickly end up with a mess of trapping and non- trapping memory accesses for the same object. So the attached seriously refines it, restricting it to parameters with reference type and leaning towards being less conservative. Again, this should only affect Ada. Tested on x86_64-suse-linux, OK for the mainline? 2013-09-27 Eric Botcazou ebotca...@adacore.com PR bootstrap/58509 * ipa-prop.h (get_ancestor_addr_info): Declare. * ipa-prop.c (get_ancestor_addr_info): Make public. * tree-inline.c (is_parm): Rename into... (is_ref_parm): ...this. (is_based_on_ref_parm): New predicate. (remap_gimple_op_r): Do not propagate TREE_THIS_NOTRAP on MEM_REF if a parameter with reference type has been remapped and the result is not based on another parameter with reference type. (copy_tree_body_r): Likewise on INDIRECT_REF and MEM_REF. 2013-09-27 Eric Botcazou ebotca...@adacore.com * gnat.dg/specs/opt1.ads: New test. -- Eric BotcazouIndex: tree-inline.c === --- tree-inline.c (revision 202912) +++ tree-inline.c (working copy) @@ -751,10 +751,11 @@ copy_gimple_bind (gimple stmt, copy_body return new_bind; } -/* Return true if DECL is a parameter or a SSA_NAME for a parameter. */ +/* Return true if DECL is a parameter with reference type or a SSA_NAME + for a parameter with reference type. 
*/ static bool -is_parm (tree decl) +is_ref_parm (tree decl) { if (TREE_CODE (decl) == SSA_NAME) { @@ -763,7 +764,40 @@ is_parm (tree decl) return false; } - return (TREE_CODE (decl) == PARM_DECL); + return (TREE_CODE (decl) == PARM_DECL + && TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE); +} + +/* Return true if DECL is based on a parameter with reference type or a + SSA_NAME for a parameter with reference type. */ + +static bool +is_based_on_ref_parm (tree decl) +{ + HOST_WIDE_INT offset; + tree obj, expr; + gimple def_stmt; + + /* First the easy case. */ + if (is_ref_parm (decl)) +return true; + + /* Then look for an SSA name whose defining statement is of the form: + + D.1718_7 = parm_2(D)->f1; + + where parm_2 is a parameter with reference type. */ + if (TREE_CODE (decl) != SSA_NAME) +return false; + def_stmt = SSA_NAME_DEF_STMT (decl); + if (!def_stmt) +return false; + + expr = get_ancestor_addr_info (def_stmt, &obj, &offset); + if (!expr) +return false; + + return is_ref_parm (TREE_OPERAND (expr, 0)); } /* Remap the GIMPLE operand pointed to by *TP. DATA is really a @@ -865,12 +899,13 @@ remap_gimple_op_r (tree *tp, int *walk_s TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old); TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old); TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old); - /* We cannot propagate the TREE_THIS_NOTRAP flag if we have - remapped a parameter as the property might be valid only - for the parameter itself. */ + /* We cannot always propagate the TREE_THIS_NOTRAP flag if we have + remapped a parameter with reference type as the property may be + valid only for the parameter.
*/ if (TREE_THIS_NOTRAP (old) - && (!is_parm (TREE_OPERAND (old, 0)) || (!id->transform_parameter && is_parm (ptr + && (!is_ref_parm (TREE_OPERAND (old, 0)) + || !id->transform_parameter + || is_based_on_ref_parm (ptr))) TREE_THIS_NOTRAP (*tp) = 1; *walk_subtrees = 0; return NULL; @@ -1092,12 +1127,13 @@ copy_tree_body_r (tree *tp, int *walk_su TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old); TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old); TREE_READONLY (*tp) = TREE_READONLY (old); - /* We cannot propagate the TREE_THIS_NOTRAP flag if we - have remapped a parameter as the property might be - valid only for the parameter itself. */ + /* We cannot always propagate the TREE_THIS_NOTRAP flag + if we have remapped a parameter with reference type as + the property may be valid only for the parameter. */ if (TREE_THIS_NOTRAP (old) - && (!is_parm (TREE_OPERAND (old, 0)) || (!id->transform_parameter && is_parm (ptr + && (!is_ref_parm (TREE_OPERAND (old, 0)) + || !id->transform_parameter + || is_based_on_ref_parm (ptr))) TREE_THIS_NOTRAP (*tp) = 1; } } @@ -1118,12 +1154,13 @@ copy_tree_body_r (tree *tp, int
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On 9/27/13 4:34 AM, Jonathan Wakely wrote: On 27 September 2013 03:15, Tim Shen wrote: POSIX ERE says that escaping an ordinary char, say R\n, is not permitted, because 'n' is not a special char. However, they also say that: Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs. So let's support it, not to make users surprised. Booted and tested under -m32 and -m64. I'm wondering whether we want to have a stricter mode that doesn't allow them, to help users avoid creating non-portable programs. We could check the value of the preprocessor macro __STRICT_ANSI__, which is set by -std=c++11 but not by -std=gnu++11, although that's not really the right flag. We want something more like the GNU shell utils' POSIXLY_CORRECT. Indeed. I think that for now __STRICT_ANSI__ can do; it's important to manage to accept those, otherwise, as we discovered yesterday, we easily reject quite a few rather sensible regexes users can write or find in examples: this started when Tim, upon my suggestion, tried the examples in the new edition of Nicolai Josuttis' book and found in one of those an escaped closing curly bracket (note: closing; opening ones are definitely fine), which apparently most of the other implementations do not reject. Paolo.
Re: OMP4/cilkplus: simd clone function mangling
On 09/27/13 03:18, Richard Biener wrote: On Thu, Sep 26, 2013 at 9:35 PM, Aldy Hernandez al...@redhat.com wrote: + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to omp + declare simd. */ + bool cilk_clone = flag_enable_cilkplus + && lookup_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); + if (cilk_clone) +remove_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); Oh yeah, rth had asked me why I remove the attribute. My initial thoughts were that whether or not a function is a simd clone can be accessed through the cgraph bits (node->simdclone != NULL for the clone, and node->has_simd_clones for the parent). No sense keeping the attribute. But I can leave it if you think it's better. Why have it in the first place if it's marked in the cgraph? It would be placed there by the front-end when parsing Cilk Plus simd-enabled functions. It's only in the omp stage that we transfer that information to the cgraph bits.
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On 27 September 2013 13:32, Paolo Carlini wrote: On 9/27/13 4:34 AM, Jonathan Wakely wrote: On 27 September 2013 03:15, Tim Shen wrote: POSIX ERE says that escaping an ordinary char, say R\n, is not permitted, because 'n' is not a special char. However, they also say that: Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs. So let's support it, not to make users surprised. Booted and tested under -m32 and -m64. I'm wondering whether we want to have a stricter mode that doesn't allow them, to help users avoid creating non-portable programs. We could check the value of the preprocessor macro __STRICT_ANSI__, which is set by -std=c++11 but not by -std=gnu++11, although that's not really the right flag. We want something more like the GNU shell utils' POSIXLY_CORRECT. Indeed. I think that for now __STRICT_ANSI__ can do; it's important to manage to accept those, otherwise, as we discovered yesterday, we easily reject quite a few rather sensible regexes users can write or find in examples: this started when Tim, upon my suggestion, tried the examples in the new edition of Nicolai Josuttis' book and found in one of those an escaped closing curly bracket (note: closing; opening ones are definitely fine), which apparently most of the other implementations do not reject. Ah I see. I definitely agree it's good to accept that instead of being unnecessarily strict, but other people will want the option of strict conformance, so I think we can please everyone with something like: else { #ifdef __STRICT_ANSI__ __throw_regex_error(regex_constants::error_escape); #else _M_token = _S_token_ord_char; _M_value.assign(1, __c); #endif }
[committed] Fix move_sese_region_to_fn (PR middle-end/58551)
Hi! I've committed the following fix to a regression introduced in 4.9 early loop construction. SESE regions, as documented above move_sese_region_to_fn, are allowed to contain calls to noreturn functions like abort/exit. But, basic blocks leading to noreturn functions aren't actually placed in the loop inside of which the SESE region is present, but directly inside of the outermost loop of the function. So, we can't just change the loop_father of bbs belonging to entry_bb's loop_father to the new function's outermost loop and move loops which have their outer loop equal to entry_bb's loop_father and have their header in the SESE region into the new function, but we also have to handle the same way the outermost loop of the original function. Bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by richi on IRC, committed to trunk. 2013-09-27 Jakub Jelinek ja...@redhat.com PR middle-end/58551 * tree-cfg.c (move_sese_region_to_fn): Also move loops that are children of outermost saved_cfun's loop, and set it up to be moved to dest_cfun's outermost loop. Fix up num_nodes adjustments if loop != loop0 and SESE region contains bbs that belong to loop0. * c-c++-common/gomp/pr58551.c: New test. --- gcc/tree-cfg.c.jj 2013-09-13 14:41:28.0 +0200 +++ gcc/tree-cfg.c 2013-09-27 12:23:48.582217401 +0200 @@ -6662,12 +6662,13 @@ move_sese_region_to_fn (struct function struct function *saved_cfun = cfun; int *entry_flag, *exit_flag; unsigned *entry_prob, *exit_prob; - unsigned i, num_entry_edges, num_exit_edges; + unsigned i, num_entry_edges, num_exit_edges, num_nodes; edge e; edge_iterator ei; htab_t new_label_map; struct pointer_map_t *vars_map, *eh_map; struct loop *loop = entry_bb->loop_father; + struct loop *loop0 = get_loop (saved_cfun, 0); struct move_stmt_d d; /* If ENTRY does not strictly dominate EXIT, this cannot be an SESE @@ -6760,16 +6761,29 @@ move_sese_region_to_fn (struct function set_loops_for_fn (dest_cfun, loops); /* Move the outlined loop tree part.
*/ + num_nodes = bbs.length (); FOR_EACH_VEC_ELT (bbs, i, bb) { - if (bb->loop_father->header == bb - && loop_outer (bb->loop_father) == loop) + if (bb->loop_father->header == bb) { struct loop *this_loop = bb->loop_father; - flow_loop_tree_node_remove (bb->loop_father); - flow_loop_tree_node_add (get_loop (dest_cfun, 0), this_loop); - fixup_loop_arrays_after_move (saved_cfun, cfun, this_loop); + struct loop *outer = loop_outer (this_loop); + if (outer == loop + /* If the SESE region contains some bbs ending with +a noreturn call, those are considered to belong +to the outermost loop in saved_cfun, rather than +the entry_bb's loop_father. */ + || outer == loop0) + { + if (outer != loop) + num_nodes -= this_loop->num_nodes; + flow_loop_tree_node_remove (bb->loop_father); + flow_loop_tree_node_add (get_loop (dest_cfun, 0), this_loop); + fixup_loop_arrays_after_move (saved_cfun, cfun, this_loop); + } } + else if (bb->loop_father == loop0 && loop0 != loop) + num_nodes--; /* Remove loop exits from the outlined region. */ if (loops_for_fn (saved_cfun)->exits) @@ -6789,6 +6803,7 @@ move_sese_region_to_fn (struct function /* Setup a mapping to be used by move_block_to_fn. */ loop->aux = current_loops->tree_root; + loop0->aux = current_loops->tree_root; pop_cfun (); @@ -6817,11 +6832,13 @@ move_sese_region_to_fn (struct function } loop->aux = NULL; + loop0->aux = NULL; /* Loop sizes are no longer correct, fix them up.
*/ - loop->num_nodes -= bbs.length (); + loop->num_nodes -= num_nodes; for (struct loop *outer = loop_outer (loop); outer; outer = loop_outer (outer)) -outer->num_nodes -= bbs.length (); +outer->num_nodes -= num_nodes; + loop0->num_nodes -= bbs.length () - num_nodes; if (saved_cfun->has_simduid_loops || saved_cfun->has_force_vect_loops) { --- gcc/testsuite/c-c++-common/gomp/pr58551.c.jj 2013-09-27 11:18:20.825251967 +0200 +++ gcc/testsuite/c-c++-common/gomp/pr58551.c 2013-09-27 11:17:56.0 +0200 @@ -0,0 +1,33 @@ +/* PR middle-end/58551 */ +/* { dg-do compile } */ +/* { dg-options "-O0 -fopenmp" } */ + +void +foo (int *a) +{ + int i; + for (i = 0; i < 8; i++) +#pragma omp task +if (a[i]) + __builtin_abort (); +} + +void bar (int, int); + +void +baz (int *a) +{ + int i; + for (i = 0; i < 8; i++) +#pragma omp task +if (a[i]) + { + int j, k; + for (j = 0; j < 10; j++) + for (k = 0; k < 8; k++) + bar (j, k); + for (k = 0; k < 12; k++) + bar (-1, k); + __builtin_abort (); + } +} Jakub
[patch] fix libstdc++/57465
PR libstdc++/57465 * include/std/functional (_Function_base::_Base_manager::_M_not_empty_function): Fix overload for pointers. * testsuite/20_util/function/cons/57465.cc: New. Tested x86_64-linux, committed to trunk. I'll apply it to the branches after it's been on trunk without problems for a while. commit 55531e9c74a5f2b4699250b6b302d49f7dc8c5ae Author: Jonathan Wakely jwakely@gmail.com Date: Wed Aug 7 01:38:39 2013 +0100 PR libstdc++/57465 * include/std/functional (_Function_base::_Base_manager::_M_not_empty_function): Fix overload for pointers. * testsuite/20_util/function/cons/57465.cc: New. diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional index 63ba777..73cddfe 100644 --- a/libstdc++-v3/include/std/functional +++ b/libstdc++-v3/include/std/functional @@ -1932,7 +1932,7 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type) template<typename _Tp> static bool - _M_not_empty_function(const _Tp* __fp) + _M_not_empty_function(_Tp* const __fp) { return __fp; } template<typename _Class, typename _Tp> diff --git a/libstdc++-v3/testsuite/20_util/function/cons/57465.cc b/libstdc++-v3/testsuite/20_util/function/cons/57465.cc new file mode 100644 index 000..44413fb --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/function/cons/57465.cc @@ -0,0 +1,31 @@ +// Copyright (C) 2013 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. +// +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details.
+// +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// <http://www.gnu.org/licenses/>. + +// libstdc++/57465 + +// { dg-options "-std=gnu++11" } + +#include <functional> +#include <testsuite_hooks.h> + +int main() +{ + using F = void(); + F* f = nullptr; + std::function<F> x(f); + VERIFY( !x ); +}
[PATCH] Invalid unpoisoning of stack redzones on ARM
Hi all, I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit? Thanks! -Yuri diff --git a/gcc/asan.c b/gcc/asan.c index 32f1837..acb00ea 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len) gcc_assert ((len & 3) == 0); top_label = gen_label_rtx (); - addr = force_reg (Pmode, XEXP (shadow_mem, 0)); + addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0))); shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0); end = force_reg (Pmode, plus_constant (Pmode, addr, len)); emit_label (top_label);
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka hubi...@ucw.cz wrote: Why not just have probably_never_executed_bb_p simply return false when bb->frequency is non-zero (right now it does the opposite)? We want to have frequencies guessed for functions that were not trained in the profiling run (that was the patch I posted earlier that I think did not go in yet). Right, but for splitting and bb layout purposes, for these statically guessed unprofiled functions we in fact don't want to do any splitting or treat the bbs as never executed (which shouldn't be a change from the status quo since all the bbs in these functions are currently 0 weight; it's only when we inline in the case of comdats that they appear colder than the surrounding code, but in fact we don't want this). The only other caller of probably_never_executed_bb_p is compute_function_frequency, but in the case of statically guessed functions they will have profile_status != PROFILE_READ and won't invoke probably_never_executed_bb_p. But re-reading our most recent exchange on the comdat profile issue, it sounds like you were suggesting guessing profiles for all 0-weight functions early, then dropping them from PROFILE_READ to PROFILE_GUESSED only once we determine in ipa-inline that there is a potentially non-zero call path to them. In that case, with the change I describe above to probably_never_executed_bb_p, the 0-weight functions with 0 calls to them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would be bad as they would not be size-optimized or moved into the cold section. So it seems like we want different handling of these guessed frequencies in compute_function_frequency and bb-reorder.c. Actually I think we can handle this by checking if the function entry block has a 0 count. If so, then we just look at the bb counts and not the frequencies for determining bb hotness, as the frequencies would presumably have been statically guessed.
This will ensure that the cgraph node continues to be marked unlikely and size-optimized. If the function entry block has a non-zero count, then we look at both the bb count and the bb frequency - if they are both zero then the bb is probably never executed, but if either is non-zero then we should treat the block as possibly executed (which will come into play for splitting and bb layout). Teresa Currently I return true when the frequencies indicate that the BB is executed at least in 1/4th of all executions. With the cases discussed I see we may need to reduce this threshold. In general I do not much like hard tests for 0 because the meaning of 0 depends on REG_BR_FREQ_BASE, which is supposed to be changeable, and we may want to make frequencies sreal, too. I suppose we may introduce a --param for this. You are also right that I should update probably_never_executed_edge_p (I intended so, but obviously the code ended up in mainline accidentally). I however saw at least one case of jump threading where this trick did not help: the jump threading update confused itself by scaling via counts rather than frequencies and ended up with dropping everything to 0. This makes it more tempting to try to go with sreals for those. Honza returns true when bb->frequency is 0)? Making this change removed a bunch of other failures. With this change as well, there are only 3 cases that still fail with 1 train run that pass with 100. Need to look at those. Will you look into the logic of do_jump or shall I try to dive in? I can take a look, but probably won't have a chance until late this week. If you don't get to it before then I will see if I can figure out why it is applying the branch probabilities this way. Teresa Honza -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: OMP4/cilkplus: simd clone function mangling
On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote: --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val) return val; } +/* Return the default vector mangling ISA code when none is specified + in a `processor' clause. */ + +static char +ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone + ATTRIBUTE_UNUSED) +{ + return 'x'; +} I think rth was suggesting using vecsize_mangle, vecsize_modifier or something else, instead of ISA, because it won't represent the ISA on all targets. It is just some magic letter used in mangling of the simd functions. + + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to omp + declare simd. */ + bool cilk_clone = flag_enable_cilkplus + && lookup_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); Formatting. I'd say it should be bool cilk_clone = (flag_enable_cilkplus && lookup_attribute ("cilk plus elemental", DECL_ATTRIBUTES (new_node->symbol.decl))); + if (cilk_clone) +remove_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); I think it doesn't make sense to remove the attribute. + pretty_printer vars_pp; Do you really need two different pretty printers? Can't you just print "_ZGV%c%c%d" into pp (is pp_printf that cheap, wouldn't it be better to pp_string (pp, "_ZGV"), 2 pp_character + one pp_decimal_int?), and then do the loop over the args, which right now writes into vars_pp, and finally pp_underscore and pp_string the normally mangled name? +/* Create a simd clone of OLD_NODE and return it. */ + +static struct cgraph_node * +simd_clone_create (struct cgraph_node *old_node) +{ + struct cgraph_node *new_node; + new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false, + NULL, NULL, "simdclone"); + My understanding of how IPA cloning etc.
works is that you first set up various data structures describing how you change the arguments and only then actually do cgraph_function_versioning which already during the copying will do some of the transformations of the IL. But perhaps those transformations are too complicated to describe for tree-inline.c to make them for you. + tree attr = lookup_attribute ("omp declare simd", + DECL_ATTRIBUTES (node->symbol.decl)); + if (!attr) +return; + do +{ + struct cgraph_node *new_node = simd_clone_create (node); + + bool inbranch_clause; + simd_clone_clauses_extract (new_node, TREE_VALUE (attr), + &inbranch_clause); + simd_clone_compute_isa_and_simdlen (new_node); + simd_clone_mangle (node, new_node); As discussed on IRC, I was hoping that for OpenMP simd and selected targets (e.g. i?86-linux and x86_64-linux) we could do better than that, creating not just one or two clones as we do for Cilk+ where one can select which CPU (and thus ISA) he wants to build the clones for, but creating clones for all ISAs, and just based on command line options either emit just one of them as the really optimized one and the others just as thunks that would just call other simd clone functions or the normal function possibly several times. Jakub
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Fri, Sep 27, 2013 at 06:10:41PM +0400, Yury Gribov wrote: Hi all, I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit? Can you please be more verbose on why you think it is the right fix, what exactly is the problem and why force_reg wasn't sufficient? What exactly was XEXP (shadow_mem, 0) that force_reg didn't force into a pseudo? Also, you are missing a ChangeLog entry. diff --git a/gcc/asan.c b/gcc/asan.c index 32f1837..acb00ea 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len) gcc_assert ((len & 3) == 0); top_label = gen_label_rtx (); - addr = force_reg (Pmode, XEXP (shadow_mem, 0)); + addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0))); shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0); end = force_reg (Pmode, plus_constant (Pmode, addr, len)); emit_label (top_label); Jakub
Add value range support into memcpy/memset expansion
Hi, this patch makes it possible to access value range info from setmem/movstr that I plan to use in i386 memcpy/memset expansion code. It is all quite straightforward except that I need to deal with cases where the max size does not fit in a HOST_WIDE_INT, where I use the maximal value as a marker. It is then translated as a NULL pointer to the expander, which is a bit inconsistent with other places that use -1 as the marker of an unknown value. I also think we lose some cases because of TER replacing out the SSA_NAME by something else, but it seems to work in quite many cases. This can probably be tracked incrementally by disabling TER here or finally getting away from expanding calls via the generic route. Bootstrapped/regtested x86_64-linux, OK? Honza * doc/md.texi (setmem, movstr): Update documentation. * builtins.c (determine_block_size): New function. (expand_builtin_memcpy): Use it and pass it to emit_block_move_hints. (expand_builtin_memset_args): Use it and pass it to set_storage_via_setmem. * expr.c (emit_block_move_via_movmem): Add min_size/max_size parameters; update call to expander. (emit_block_move_hints): Add min_size/max_size parameters. (clear_storage_hints): Likewise. (set_storage_via_setmem): Likewise. (clear_storage): Update. * expr.h (emit_block_move_hints, clear_storage_hints, set_storage_via_setmem): Update prototype. Index: doc/md.texi === --- doc/md.texi (revision 202968) +++ doc/md.texi (working copy) @@ -5198,6 +5198,9 @@ destination and source strings are opera the expansion of this pattern should store in operand 0 the address in which the @code{NUL} terminator was stored in the destination string. +This pattern also has several optional operands that are the same as in +@code{setmem}. + @cindex @code{setmem@var{m}} instruction pattern @item @samp{setmem@var{m}} Block set instruction. The destination string is the first operand, @@ -5217,6 +5220,8 @@ respectively.
The expected alignment di in a way that the blocks are not required to be aligned according to it in all cases. This expected alignment is also in bytes, just like operand 4. Expected size, when unknown, is set to @code{(const_int -1)}. +Operand 7 is the minimal size of the block and operand 8 is the +maximal size of the block (NULL if it can not be represented as CONST_INT). The use for multiple @code{setmem@var{m}} is as for @code{movmem@var{m}}. Index: builtins.c === --- builtins.c (revision 202968) +++ builtins.c (working copy) @@ -3070,6 +3070,51 @@ builtin_memcpy_read_str (void *data, HOS return c_readstr (str + offset, mode); } +/* LEN specifies the length of the block of the memcpy/memset operation. + Figure out its range and put it into MIN_SIZE/MAX_SIZE. */ + +static void +determine_block_size (tree len, rtx len_rtx, + unsigned HOST_WIDE_INT *min_size, + unsigned HOST_WIDE_INT *max_size) +{ + if (CONST_INT_P (len_rtx)) +{ + *min_size = *max_size = UINTVAL (len_rtx); + return; +} + else +{ + double_int min, max; + if (TREE_CODE (len) == SSA_NAME + && get_range_info (len, &min, &max) == VR_RANGE) + { + if (min.fits_uhwi ()) + *min_size = min.to_uhwi (); + else + *min_size = 0; + if (max.fits_uhwi ()) + *max_size = max.to_uhwi (); + else + *max_size = (HOST_WIDE_INT)-1; + } + else + { + if (host_integerp (TYPE_MIN_VALUE (TREE_TYPE (len)), 1)) + *min_size = tree_low_cst (TYPE_MIN_VALUE (TREE_TYPE (len)), 1); + else + *min_size = 0; + if (host_integerp (TYPE_MAX_VALUE (TREE_TYPE (len)), 1)) + *max_size = tree_low_cst (TYPE_MAX_VALUE (TREE_TYPE (len)), 1); + else + *max_size = GET_MODE_MASK (GET_MODE (len_rtx)); + } +} + gcc_checking_assert (*max_size <= + (unsigned HOST_WIDE_INT) + GET_MODE_MASK (GET_MODE (len_rtx))); +}
Return NULL_RTX if we failed, the caller should emit a normal call, otherwise try to get the result in TARGET, if convenient (and in @@ -3092,6 +3137,8 @@ expand_builtin_memcpy (tree exp, rtx tar rtx dest_mem, src_mem, dest_addr, len_rtx; HOST_WIDE_INT expected_size = -1; unsigned int expected_align = 0; + unsigned HOST_WIDE_INT min_size; + unsigned HOST_WIDE_INT max_size; /* If DEST is not a pointer type, call the normal function. */ if (dest_align == 0) @@ -3111,6 +3158,7 @@ expand_builtin_memcpy (tree exp, rtx tar dest_mem = get_memory_rtx (dest, len); set_mem_align (dest_mem, dest_align); len_rtx = expand_normal (len); +
Re: Generic tuning in x86-tune.def 1/2
On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, this is the second part of the generic tuning changes sanitizing the tuning flags. This patch again is supposed to deal with the obvious part only. I will send a separate patch for more changes. The flags changed agree on all CPUs considered for generic (and their optimization manuals) + amdfam10, core2 and Atom SLM. I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it seems like an obvious omission (after double checking in the optimization manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores. Implementation of this feature was always a bit weird and its main purpose was to avoid terrible branch predictor degeneration on the older AMD branch predictors. I benchmarked both spec2k and 2k6 to verify there are no regressions. Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements in specfp benchmarks. Bootstrapped/regtested x86_64-linux, will wait for comments and commit it during the weekend. I will be happy to revisit any of the generic tuning if regressions pop up. Overall this patch also brings small code size improvements for smaller loads/stores and less padding at -O2. Differences are sub 0.1%, however. Honza * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for generic. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise. (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer. (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips. Can we drop generic on X86_TUNE_PAD_RETURNS? (X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic. (X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic. -- H.J.
Re: [ping] [PATCH] Silence an unused variable warning
On 13-09-27 4:55 AM, Dodji Seketeli wrote: Let's CC Vladimir on this easy one. Cheers. All targets I know have ELIMINABLE_REGS defined. Therefore it was not caught before. The patch is ok for me. Thanks. Jan-Benedict Glaw jbg...@lug-owl.de wrote: On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote: Hi! With the VAX target, I see this warning: g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include -I../../../../gcc/gcc/../libdecnumber -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../../gcc/gcc/../libbacktrace ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o ../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’: ../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable ‘value_p’ [-Wunused-variable] bool value_p; ^ [...]
Re: [PATCH] Make jump thread path carry more information
On 09/27/2013 08:42 AM, James Greenhalgh wrote: On Thu, Sep 26, 2013 at 04:26:35AM +0100, Jeff Law wrote: Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Installed on trunk. Hi Jeff, This patch caused a regression on Arm and AArch64 in: PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution, -O3 -fomit-frame-pointer From what I can see, the only place the behaviour of the threader has changed is in this hunk: Yes. The old code was dropping the tail off the thread path; if we're seeing failures on the ARM port as a result of fixing that goof we obviously need to address them. Let me take a looksie :-) If you could pass along a .i file it'd be helpful in case I want to look at something under the debugger. jeff
Re: Generic tuning in x86-tune.def 1/2
On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, this is the second part of the generic tuning changes sanitizing the tuning flags. This patch again is supposed to deal with the obvious part only. I will send a separate patch for more changes. The flags changed agree on all CPUs considered for generic (and their optimization manuals) + amdfam10, core2 and Atom SLM. I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it seems like an obvious omission (after double checking in the optimization manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores. The implementation of this feature was always a bit weird and its main purpose was to avoid terrible branch predictor degeneration on the older AMD branch predictors. I benchmarked both spec2k and 2k6 to verify there are no regressions. Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements in specfp benchmarks. Bootstrapped/regtested x86_64-linux, will wait for comments and commit it during the weekend. I will be happy to revisit any of the generic tuning if regressions pop up. Overall this patch also brings small code size improvements for smaller loads/stores and less padding at -O2. Differences are sub 0.1%, however. Honza * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for generic. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise. (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer. (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips. Can we drop generic on X86_TUNE_PAD_RETURNS? It is on my list for not-so-obvious changes. I tested and removed it from BDVER with the intention to drop it from generic. But after further testing I lean towards keeping it for some extra time. I tested it on fam10 machines and it causes over 10% regressions on some benchmarks, including bzip and botan (where it is up to a 4-fold regression). Missing a return on amdfam10 hardware is bad, because it causes the return stack to go out of sync.
At the same time I cannot really measure benefits from disabling it - the code size cost is very small and the runtime cost on non-amdfam10 cores is not important either, since the function call overhead hides the extra nop quite easily. So I am inclined to apply extra care to this flag and keep it for an extra release or two. Most of gcc.opensuse.org testing runs on these machines, and adding random branch mispredictions will trash them. On a related note, what would you think of X86_TUNE_PARTIAL_FLAG_REG_STALL? I benchmarked it on my i5 notebook and it seems to have no measurable effects on spec2k6. I also did some benchmarking of the patch to disable alignments that you proposed. Unfortunately I can measure slowdowns on fam10/bdver, and on botan/hand-written loops even for core. I am considering dropping the branch target/function alignment and keeping only loop alignment, but I did not test this yet. Honza
Re: Add value range support into memcpy/memset expansion
Nice extension. Test cases would be great to have. thanks, David On Fri, Sep 27, 2013 at 7:50 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, this patch makes it possible to access value range info from setmem/movstr expansion, which I plan to use in the i386 memcpy/memset expansion code. It is all quite straightforward except that I need to deal with cases where the max size does not fit in a HOST_WIDE_INT, where I use the maximal value as a marker. It is then translated as a NULL pointer to the expander, which is a bit inconsistent with other places that use -1 as the marker of an unknown value. I also think we lose some cases because of TER replacing the SSA_NAME by something else, but it seems to work in quite many cases. This can probably be handled incrementally by disabling TER here or finally getting away from expanding calls via the generic route. Bootstrapped/regtested x86_64-linux, OK? Honza * doc/md.texi (setmem, movstr): Update documentation. * builtins.c (determine_block_size): New function. (expand_builtin_memcpy): Use it and pass it to emit_block_move_hints. (expand_builtin_memset_args): Use it and pass it to set_storage_via_setmem. * expr.c (emit_block_move_via_movmem): Add min_size/max_size parameters; update call to expander. (emit_block_move_hints): Add min_size/max_size parameters. (clear_storage_hints): Likewise. (set_storage_via_setmem): Likewise. (clear_storage): Update. * expr.h (emit_block_move_hints, clear_storage_hints, set_storage_via_setmem): Update prototype. Index: doc/md.texi === --- doc/md.texi (revision 202968) +++ doc/md.texi (working copy) @@ -5198,6 +5198,9 @@ destination and source strings are opera the expansion of this pattern should store in operand 0 the address in which the @code{NUL} terminator was stored in the destination string. +This pattern also has several optional operands that are the same as in +@code{setmem}. + @cindex @code{setmem@var{m}} instruction pattern @item @samp{setmem@var{m}} Block set instruction.
The destination string is the first operand, @@ -5217,6 +5220,8 @@ respectively. The expected alignment di in a way that the blocks are not required to be aligned according to it in all cases. This expected alignment is also in bytes, just like operand 4. Expected size, when unknown, is set to @code{(const_int -1)}. +Operand 7 is the minimal size of the block and operand 8 is the +maximal size of the block (NULL if it cannot be represented as a CONST_INT). The use for multiple @code{setmem@var{m}} is as for @code{movmem@var{m}}. Index: builtins.c === --- builtins.c (revision 202968) +++ builtins.c (working copy) @@ -3070,6 +3070,51 @@ builtin_memcpy_read_str (void *data, HOS return c_readstr (str + offset, mode); } +/* LEN specifies the length of the block of the memcpy/memset operation. + Figure out its range and put it into MIN_SIZE/MAX_SIZE. */ + +static void +determine_block_size (tree len, rtx len_rtx, + unsigned HOST_WIDE_INT *min_size, + unsigned HOST_WIDE_INT *max_size) +{ + if (CONST_INT_P (len_rtx)) +{ + *min_size = *max_size = UINTVAL (len_rtx); + return; +} + else +{ + double_int min, max; + if (TREE_CODE (len) == SSA_NAME && get_range_info (len, &min, &max) == VR_RANGE) + { + if (min.fits_uhwi ()) + *min_size = min.to_uhwi (); + else + *min_size = 0; + if (max.fits_uhwi ()) + *max_size = max.to_uhwi (); + else + *max_size = (HOST_WIDE_INT)-1; + } + else + { + if (host_integerp (TYPE_MIN_VALUE (TREE_TYPE (len)), 1)) + *min_size = tree_low_cst (TYPE_MIN_VALUE (TREE_TYPE (len)), 1); + else + *min_size = 0; + if (host_integerp (TYPE_MAX_VALUE (TREE_TYPE (len)), 1)) + *max_size = tree_low_cst (TYPE_MAX_VALUE (TREE_TYPE (len)), 1); + else + *max_size = GET_MODE_MASK (GET_MODE (len_rtx)); + } +} + gcc_checking_assert (*max_size <= + (unsigned HOST_WIDE_INT) + GET_MODE_MASK (GET_MODE (len_rtx))); +} + /* Expand a call EXP to the memcpy builtin.
Return NULL_RTX if we failed, the caller should emit a normal call, otherwise try to get the result in TARGET, if convenient (and in @@ -3092,6 +3137,8 @@ expand_builtin_memcpy (tree exp, rtx tar rtx dest_mem, src_mem, dest_addr, len_rtx; HOST_WIDE_INT expected_size = -1; unsigned int expected_align = 0; + unsigned HOST_WIDE_INT min_size; + unsigned HOST_WIDE_INT max_size; /* If DEST is not a pointer type, call the normal function. */
Re: Generic tuning in x86-tune.def 1/2
On Fri, Sep 27, 2013 at 8:36 AM, Jan Hubicka hubi...@ucw.cz wrote: On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, this is the second part of the generic tuning changes sanitizing the tuning flags. This patch again is supposed to deal with the obvious part only. I will send a separate patch for more changes. The flags changed agree on all CPUs considered for generic (and their optimization manuals) + amdfam10, core2 and Atom SLM. I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it seems like an obvious omission (after double checking in the optimization manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores. The implementation of this feature was always a bit weird and its main purpose was to avoid terrible branch predictor degeneration on the older AMD branch predictors. I benchmarked both spec2k and 2k6 to verify there are no regressions. Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements in specfp benchmarks. Bootstrapped/regtested x86_64-linux, will wait for comments and commit it during the weekend. I will be happy to revisit any of the generic tuning if regressions pop up. Overall this patch also brings small code size improvements for smaller loads/stores and less padding at -O2. Differences are sub 0.1%, however. Honza * x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for generic. (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise. (X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer. (X86_TUNE_PAD_RETURNS): Drop for newer AMD chips. Can we drop generic on X86_TUNE_PAD_RETURNS? It is on my list for not-so-obvious changes. I tested and removed it from BDVER with the intention to drop it from generic. But after further testing I lean towards keeping it for some extra time. I tested it on fam10 machines and it causes over 10% regressions on some benchmarks, including bzip and botan (where it is up to a 4-fold regression).
Missing a return on amdfam10 hardware is bad, because it causes the return stack to go out of sync. At the same time I cannot really measure benefits from disabling it - the code size cost is very small and the runtime cost on non-amdfam10 cores is not important either, since the function call overhead hides the extra nop quite easily. I see. So I am inclined to apply extra care to this flag and keep it for an extra release or two. Most of gcc.opensuse.org testing runs on these machines, and adding random branch mispredictions will trash them. On a related note, what would you think of X86_TUNE_PARTIAL_FLAG_REG_STALL? I benchmarked it on my i5 notebook and it seems to have no measurable effects on spec2k6. I also did some benchmarking of the patch to disable alignments you proposed. Unfortunately I can measure slowdowns on fam10/bdver, and on botan/hand-written loops even for core. I am not surprised about hand-written loops. Have you tried SPEC CPU rate? I am considering dropping the branch target/function alignment and keeping only loop alignment, but I did not test this yet. Honza -- H.J.
Re: [PATCH] Fix libgfortran cross compile configury w.r.t newlib
On Thu, 2013-09-26 at 14:47 +0100, Marcus Shawcroft wrote: I'm in two minds about whether further sticky tape of this form is the right approach or whether the original patch should be reverted until a proper fix that does not regress the tree can be found. Thoughts? 2013-09-26 Marcus Shawcroft marcus.shawcr...@arm.com * configure.ac (AC_CHECK_FUNCS_ONCE): Make if statement dependent on gcc_no_link. Cheers /Marcus Well, I thought this patch would work for me, but it does not. It looks like gcc_no_link is set to 'no' on my target because, technically, I can link even if I don't use a linker script. I just can't find any functions. % cat x.c int main (void) { return 0; } % mips-mti-elf-gcc x.c -o x /local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld: warning: cannot find entry symbol __start; defaulting to 00400098 % echo $? 0 % cat y.c int main (void) { exit (0); } % install-mips-mti-elf/bin/mips-mti-elf-gcc y.c -o y /local/home/sellcey/nightly/install-mips-mti-elf/lib/gcc/mips-mti-elf/4.9.0/../../../../mips-mti-elf/bin/ld: warning: cannot find entry symbol __start; defaulting to 00400098 /tmp/ccdG78PN.o: In function `main': y.c:(.text+0x14): undefined reference to `exit' collect2: error: ld returned 1 exit status ubuntu-sellcey % echo $? 1 Steve Ellcey sell...@mips.com
Remove enum ssa_mode
The gimple builder no longer supports normal form, so the ssa_mode enum is no longer needed. Committed to trunk. * gimple.h (enum ssa_mode): Remove. --- gcc/gimple.h | 9 - 1 file changed, 9 deletions(-) diff --git a/gcc/gimple.h b/gcc/gimple.h index 3047ab4..a031c8d 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -34,15 +34,6 @@ along with GCC; see the file COPYING3. If not see typedef gimple gimple_seq_node; -/* Types of supported temporaries. GIMPLE temporaries may be symbols - in normal form (i.e., regular decls) or SSA names. This enum is - used by create_gimple_tmp to tell it what kind of temporary the - caller wants. */ -enum ssa_mode { -M_SSA = 0, -M_NORMAL -}; - /* For each block, the PHI nodes that need to be rewritten are stored into these vectors. */ typedef vec<gimple> gimple_vec; -- 1.8.4
Re: Context sensitive type inheritance graph walking
Hi, sorry it took me so long, but it also took me quite a while to chew through. Please consider posting context diffs in cases like this. Nevertheless, most of the patch is a nice improvement. On Wed, Sep 25, 2013 at 12:20:50PM +0200, Jan Hubicka wrote: Hi, this is an updated version of http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00936.html It updates the type inheritance graph walking algorithm to be context sensitive. Context is stored in ipa_polymorphic_call_context containing - OUTER_TYPE (the type of the memory object that contains the object whose method is called, NULL if unknown) - OFFSET, the offset within OUTER_TYPE where the object is located - MAYBE_IN_CONSTRUCTION, true if the object may be in construction, i.e. it is a local or static variable. At the moment we do not try to disprove construction; that is partly done by ipa-prop and is planned to be merged here in the future. - MAYBE_DERIVED_TYPE if the OUTER_TYPE object may actually be of a type derived from OUTER_TYPE. For example when OUTER_TYPE was determined from the THIS parameter of a method. There is get_polymorphic_call_info that walks GIMPLE code and attempts to determine the context of a given call. It looks for objects located in declarations (static vars/automatic vars/parameters), objects passed by invisible reference and objects passed as THIS pointers. The latter two cases are new; the first case is already done by gimple_extract_devirt_binfo_from_cst, and I assume we should really put the context there, rather than reconstructing it from the edge. Of course we must stop overloading the offset field for that; are there any other obstacles? The context is determined by the cgraph construction code and stored in cgraph edges. In the future I expect ipa-prop to manipulate and update the contexts and propagate across them, but that is not implemented. For this reason I did not add the context itself into the cgraph edge; I merely added the new info and hacked ipa-prop (temporarily) to simply throw away the actual info.
The high-level idea is to make get_class_context fill in the info for known type and ancestor functions, too, and then we can have a function combining the types and doing propagation across them, replacing the current code that deals with BINFOs directly and thus cannot deal with all the cases above very well. possible_polymorphic_call_targets is now a bit more complex. It is split into: - get_class_context: this function walks the OUTER_TYPE (if present) and looks up the type of the class actually containing the method being called. Previously similar logic was part of gimple_extract_devirt_binfo_from_cst. - walk_bases: when the type is in construction, all base types are walked and for those containing OTR_TYPE at the proper memory location the corresponding virtual methods are added to the list. - record_binfos now walks the inner binfo from OUTER_TYPE to OTR_TYPE via get_binfo_at_offset. Bootstrapped/regtested x86_64-linux. The patch is tested by the ipa-devirt9 testcase. I have four more, but I would like to first fix the case where the devirtualization happens in the TODO of early_local_passes, which is not dumped anywhere. So I plan to post these incrementally once this code is hooked also into gimple folding. The patch results in 60% more devirtualizations on Firefox and 10% more speculative devirtualizations. I think the main component missing now is code determining the dynamic type from a call to a constructor. I have a prototype for this, too; I would like to discuss it incrementally. I am not 100% sure how much harder tracking of dynamic type changes becomes here. Martin, does it seem to make sense? Honza * cgraph.c (cgraph_create_indirect_edge): Use get_polymorphic_call_info. * cgraph.h (cgraph_indirect_call_info): Add outer_type, maybe_in_construction and maybe_derived_type. * ipa-utils.h (ipa_polymorphic_call_context): New structure. (ipa_dummy_polymorphic_call_context): New global var. (possible_polymorphic_call_targets): Add context parameter.
(dump_possible_polymorphic_call_targets): Likewise; update wrappers. (possible_polymorphic_call_target_p): Likewise. (get_polymorphic_call_info): New function. * ipa-devirt.c (ipa_dummy_polymorphic_call_context): New function. (add_type_duplicate): Remove forgotten debug output. (method_class_type): Add sanity check. (maybe_record_node): Add FINALP parameter. (record_binfo): Add OUTER_TYPE and OFFSET; walk the inner binfo via get_binfo_at_offset. (possible_polymorphic_call_targets_1): Add OUTER_TYPE/OFFSET parameters; pass them to record_binfo. (polymorphic_call_target_d): Add context and FINAL. (polymorphic_call_target_hasher::hash): Hash context. (polymorphic_call_target_hasher::equal): Compare context.
Re: [PATCH] Make jump thread path carry more information
On Fri, Sep 27, 2013 at 04:32:10PM +0100, Jeff Law wrote: If you could pass along a .i file it'd be helpful in case I want to look at something under the debugger. I've opened http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 to save everyone's inboxes. Let me know if I can do anything else to help. Cheers, James
Re: [PATCH] Make jump thread path carry more information
On 09/27/2013 10:48 AM, James Greenhalgh wrote: On Fri, Sep 27, 2013 at 04:32:10PM +0100, Jeff Law wrote: If you could pass along a .i file it'd be helpful in case I want to look at something under the debugger. I've opened http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 to save everyone's inboxes. Let me know if I can do anything else to help. Thanks. I'm investigating. jeff
Re: [google gcc-4_8] fix size_estimation for builtin_expect
On Fri, Sep 27, 2013 at 1:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Sep 27, 2013 at 12:23 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, builtin_expect should be a NOP in size_estimation. Indeed, the call stmt itself is 0 weight in size and time. But it may introduce an extra relational expr which has non-zero size/time. The end result is: with and without builtin_expect, we get different size/time estimations for early inlining. This patch fixes this problem. -Rong 2013-09-26 Rong Xu x...@google.com * ipa-inline-analysis.c (estimate_function_body_sizes): Fix the size estimation for builtin_expect. This seems fine with a comment in the code explaining what it is about. I also think we want to support multiple builtin_expects in a BB, so perhaps we want to have a pointer set of statements to ignore? To avoid spaghetti code, please just move the new logic into separate functions. Looks like this could use tree-ssa.c:walk_use_def_chains (please change its implementation as necessary, make it C++, etc. - you will be the first user again). Thanks for the suggestion. But it seems walk_use_def_chains() was removed by r201951. -Rong Richard. Honza Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 202638) +++ ipa-inline-analysis.c (working copy) @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node * /* Estimate static overhead for function prologue/epilogue and alignment. */ int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE); int size = overhead; + gimple fix_expect_builtin; + /* Benefits are scaled by probability of elimination that is in range 0,2.
*/ basic_block bb; @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node * } } + fix_expect_builtin = NULL; for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); + if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT)) +{ + tree var = gimple_call_lhs (stmt); + tree arg = gimple_call_arg (stmt, 0); + use_operand_p use_p; + gimple use_stmt; + bool match = false; + bool done = false; + gcc_assert (var && arg); + gcc_assert (TREE_CODE (var) == SSA_NAME); + + while (TREE_CODE (arg) == SSA_NAME) +{ + gimple stmt_tmp = SSA_NAME_DEF_STMT (arg); + switch (gimple_assign_rhs_code (stmt_tmp)) +{ + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: +match = true; +done = true; +break; + case NOP_EXPR: +break; + default: +done = true; +break; +} + if (done) +break; + arg = gimple_assign_rhs1 (stmt_tmp); +} + + if (match && single_imm_use (var, &use_p, &use_stmt)) +{ + if (gimple_code (use_stmt) == GIMPLE_COND) +{ + fix_expect_builtin = use_stmt; +} +} + + /* We should see one builtin_expect call in one bb. */ + break; +} +} + + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) + { + gimple stmt = gsi_stmt (bsi); int this_size = estimate_num_insns (stmt, eni_size_weights); int this_time = estimate_num_insns (stmt, eni_time_weights); int prob; struct predicate will_be_nonconstant; + if (stmt == fix_expect_builtin) +{ + this_size--; + this_time--; +} + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, );
Re: OMP4/cilkplus: simd clone function mangling
On 09/27/13 09:23, Jakub Jelinek wrote: On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote: --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val) return val; } +/* Return the default vector mangling ISA code when none is specified + in a `processor' clause. */ + +static char +ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone + ATTRIBUTE_UNUSED) +{ + return 'x'; +} I think rth was suggesting using vecsize_mangle, vecsize_modifier or something else instead of ISA, because it won't represent the ISA on all targets. It is just some magic letter used in the mangling of the simd functions. I thought he was only talking about the local vecsize_mangle() function, not the target hooks. Fair enough, I have changed all the ISA references where they can be replaced with *mangle* or something similar. + + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to omp + declare simd. */ + bool cilk_clone = flag_enable_cilkplus + && lookup_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); Formatting. I'd say it should be bool cilk_clone = (flag_enable_cilkplus && lookup_attribute ("cilk plus elemental", DECL_ATTRIBUTES (new_node->symbol.decl))); + if (cilk_clone) +remove_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl)); I think it doesn't make sense to remove the attribute. Done. + pretty_printer vars_pp; Do you really need two different pretty printers? Whoops, fixed. Nice catch. Can't you just print _ZGV%c%c%d into pp (is pp_printf that cheap, wouldn't it be better to pp_string (pp, "_ZGV"), 2 pp_character + one pp_decimal_int?), and then do the loop over the args, which right now writes into vars_pp, and finally pp_underscore and pp_string the normally mangled name? pp_printf() would be cheap. It's only used for a few cloned functions in a compilation unit.
I like printf. It's pretty and clean. Not using it is like saving sex for your old age ;-). But just to keep you happy, I changed it... global maintainers are free to live their celibate monk lives as they see fit :). +/* Create a simd clone of OLD_NODE and return it. */ + +static struct cgraph_node * +simd_clone_create (struct cgraph_node *old_node) +{ + struct cgraph_node *new_node; + new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false, +NULL, NULL, "simdclone"); + My understanding of how IPA cloning etc. works is that you first set up various data structures describing how you change the arguments and only then actually do cgraph_function_versioning, which already during the copying will do some of the transformations of the IL. But perhaps those transformations are too complicated to describe for tree-inline.c to make them for you. Sure, we can worry about that when we're actually emitting the actual clones (as discussed below), and when we start adapting the vectorizer. + tree attr = lookup_attribute ("omp declare simd", + DECL_ATTRIBUTES (node->symbol.decl)); + if (!attr) +return; + do +{ + struct cgraph_node *new_node = simd_clone_create (node); + + bool inbranch_clause; + simd_clone_clauses_extract (new_node, TREE_VALUE (attr), + &inbranch_clause); + simd_clone_compute_isa_and_simdlen (new_node); + simd_clone_mangle (node, new_node); As discussed on IRC, I was hoping that for OpenMP simd and selected targets (e.g. i?86-linux and x86_64-linux) we could do better than that, creating not just one or two clones as we do for Cilk+ where one can select which CPU (and thus ISA) he wants to build the clones for, but creating clones for all ISAs, and just based on command line options either emit just one of them as the really optimized one and the others just as thunks that would call other simd clone functions or the normal function, possibly several times. The thunk sounds like a good idea, long term.
How about we start by emitting all the variants up-front and then we can optimize these cases later? I was thinking that, in the absence of a `simdlen' clause, we can provide a target hook that returns a vector of (struct { int hw_vector_size; char vecsize_mangle }) which would give us the different clone variants we should generate. If the user provides `simdlen', we can continue generating just one clone (or two with *inbranch) with the present generic algorithm in simd_clone_compute_vecsize_and_simdlen(). Or do you have any other ideas? But I'd like to leave the thunking business until after we get the general infrastructure working. Aldy diff --git a/gcc/ChangeLog
Re: cost model patch
Please review the changes.html change and suggest better wordings if possible: Index: htdocs/gcc-4.9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v retrieving revision 1.26 diff -u -r1.26 changes.html --- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 - 1.26 +++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 - @@ -37,6 +37,7 @@ <ul> <li>AddressSanitizer, a fast memory error detector, is now available on ARM. </li> +<li>GCC introduces a new cost model for the vectorizer, called the 'cheap' model. The new cost model is intended to minimize compile time, code size, and the potential negative runtime impact introduced when the vectorizer is turned on, at the expense of not getting the maximum potential runtime speedup. The 'cheap' model will be the default when the vectorizer is turned on at <code>-O2</code>. To override this, use the option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.</li> </ul> <h2>New Languages and Language specific improvements</h2> thanks, David On Thu, Sep 26, 2013 at 11:09 AM, Xinliang David Li davi...@google.com wrote: On Thu, Sep 26, 2013 at 7:37 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Sep 26, 2013 at 1:10 AM, Xinliang David Li davi...@google.com wrote: I took the liberty to pick up Richard's original fvect-cost-model patch and made some modifications. What has not changed: 1) option -ftree-vect-loop-version is removed; 2) three cost models are introduced: cheap, dynamic, and unlimited; 3) unless explicitly specified, the cheap model is the default at O2 (e.g. when -ftree-loop-vectorize is used with -O2), and dynamic mode is the default for O3 and FDO; 4) alignment based versioning is disabled with the cheap model. What has changed: 1) peeling is also disabled with the cheap model; 2) the alias check condition limit is reduced with the cheap model, but not completely suppressed. Runtime alias checking is a pretty important enabler. 3) tree if-conversion changes are not included. Does this patch look reasonable? In principle yes.
Note that it changes the behavior of -O2 -ftree-vectorize, as -ftree-vectorize does not imply changing the default cost model. I am fine with that, but eventually this will have some testsuite fallout. This reorg would also need documenting in changes.html to make people aware of this. Here is the proposed change: Index: htdocs/gcc-4.9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v retrieving revision 1.26 diff -u -r1.26 changes.html --- htdocs/gcc-4.9/changes.html 26 Aug 2013 14:16:31 - 1.26 +++ htdocs/gcc-4.9/changes.html 26 Sep 2013 18:02:33 - @@ -37,6 +37,7 @@ <ul> <li>AddressSanitizer, a fast memory error detector, is now available on ARM. </li> +<li>GCC introduces a new cost model for the vectorizer, called the 'cheap' model. The new cost model is intended to minimize compile time, code size, and the potential negative runtime impact introduced when the vectorizer is turned on, at the expense of not getting the maximum potential runtime speedup. The 'cheap' model will be the default when the vectorizer is turned on at <code>-O2</code>. To override this, use the option <code>-fvect-cost-model=[cheap|dynamic|unlimited]</code>.</li> </ul> <h2>New Languages and Language specific improvements</h2> With completely disabling alignment peeling and alignment versioning you cut out targets that have no way of performing unaligned accesses. From looking at vect_no_align these are mips, sparc, ia64 and some arm. A compromise for them would be to allow peeling a single iteration and some alignment checks (like up to two?). Possibly. I think target owners can choose to do target specific tunings as a follow up. Reducing the number of allowed alias checks is ok, but I'd reduce it to fewer than 6 (was that an arbitrary number or is that the result of some benchmarking?) yes -- we found that it is not uncommon to have a loop with 2 or 3 distinct source addresses and 1 or 2 target addresses. There are also tuning opportunities.
For instance, in cases where the source addresses are derived from the same base, a consolidated alias check (against the whole access range instead of just checking the cross-iteration dependence of one unrolled iteration) can be done. I suppose all of the params could use some benchmarking to select a sweet spot in code size vs. runtime. Agree. I suppose the patch is ok as-is (if it actually works) if you provide a changelog and propose an entry for changes.html. We can tune the params for the cheap model as a followup. Ok. I will do more testing and check in the patch with a proper ChangeLog. The changes.html change will be done separately. thanks, David Thanks for picking this up, Richard. thanks, David
libgo patch committed: Implement reflect.MakeFunc for amd64
The Go standard library has an interesting function named reflect.MakeFunc. It takes a Go function F that accepts and returns a slice of reflect.Value, and a function type T, and returns a pointer to a function of type T that converts its arguments to reflect.Value, calls F, and converts the returned reflect.Value into the appropriate return types. In effect this is the reverse of libffi: instead of describing a function and calling it, we describe a function and permit it to be called. For gccgo I tried to implement this generically using the builtin varargs functions, but that failed because I had no way to handle the return type. Many Go functions return multiple values, which in gccgo is represented as returning a struct, and, of course, in some cases a struct is returned by passing a hidden pointer as the first argument, and in other cases is handled by splitting up the struct into different register classes. So handling this generically is essentially impossible, at least without adding some more builtin functions to somehow handle the return value, builtin functions that I couldn't figure out how to even represent. So I gave up and went for a processor-specific approach. The idea is that processor-specific assembly code will save all the relevant registers into a struct, and pass them to processor-specific Go code which will implement the calling convention. This has the advantage that I only need to deal with Go types, which in particular means no worries about vector types. This patch implements this approach for x86_64. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. 
Ian diff -r 024105249263 libgo/Makefile.am --- a/libgo/Makefile.am Tue Sep 24 20:26:38 2013 -0700 +++ b/libgo/Makefile.am Fri Sep 27 08:06:13 2013 -0700 @@ -895,9 +895,21 @@ go/path/match.go \ go/path/path.go +if LIBGO_IS_X86_64 +go_reflect_makefunc_file = \ + go/reflect/makefuncgo_amd64.go +go_reflect_makefunc_s_file = \ + go/reflect/makefunc_amd64.S +else +go_reflect_makefunc_file = +go_reflect_makefunc_s_file = \ + go/reflect/makefunc_dummy.c +endif + go_reflect_files = \ go/reflect/deepequal.go \ go/reflect/makefunc.go \ + $(go_reflect_makefunc_file) \ go/reflect/type.go \ go/reflect/value.go @@ -1761,6 +1773,7 @@ os.lo \ path.lo \ reflect-go.lo \ + reflect/makefunc.lo \ regexp.lo \ runtime-go.lo \ sort.lo \ @@ -2147,6 +2160,9 @@ $(BUILDPACKAGE) reflect/check: $(CHECK_DEPS) @$(CHECK) +reflect/makefunc.lo: $(go_reflect_makefunc_s_file) + @$(MKDIR_P) reflect + $(LTCOMPILE) -c -o $@ $ .PHONY: reflect/check @go_include@ regexp.lo.dep diff -r 024105249263 libgo/go/reflect/all_test.go --- a/libgo/go/reflect/all_test.go Tue Sep 24 20:26:38 2013 -0700 +++ b/libgo/go/reflect/all_test.go Fri Sep 27 08:06:13 2013 -0700 @@ -1430,11 +1430,13 @@ } } -/* - -Not yet implemented for gccgo. 
-
 func TestMakeFunc(t *testing.T) {
+	switch runtime.GOARCH {
+	case "amd64":
+	default:
+		t.Skip("MakeFunc not implemented for " + runtime.GOARCH)
+	}
+
 	f := dummy
 	fv := MakeFunc(TypeOf(f), func(in []Value) []Value { return in })
 	ValueOf(&f).Elem().Set(fv)
@@ -1452,8 +1454,6 @@
 	}
 }
 
-*/
-
 type Point struct {
 	x, y int
 }
diff -r 024105249263 libgo/go/reflect/makefunc.go
--- a/libgo/go/reflect/makefunc.go	Tue Sep 24 20:26:38 2013 -0700
+++ b/libgo/go/reflect/makefunc.go	Fri Sep 27 08:06:13 2013 -0700
@@ -7,6 +7,7 @@
 package reflect
 
 import (
+	"runtime"
 	"unsafe"
 )
@@ -45,14 +46,33 @@
 		panic("reflect: call of MakeFunc with non-Func type")
 	}
 
+	switch runtime.GOARCH {
+	case "amd64":
+	default:
+		panic("reflect.MakeFunc not implemented for " + runtime.GOARCH)
+	}
+
 	t := typ.common()
 	ftyp := (*funcType)(unsafe.Pointer(t))
 
-	_, _ = t, ftyp
+	// Indirect Go func value (dummy) to obtain
+	// actual code address. (A Go func value is a pointer
+	// to a C function pointer. http://golang.org/s/go11func.)
+	dummy := makeFuncStub
+	code := **(**uintptr)(unsafe.Pointer(&dummy))
 
-	panic("reflect MakeFunc not implemented")
+	impl := &makeFuncImpl{code: code, typ: ftyp, fn: fn}
+
+	return Value{t, unsafe.Pointer(impl), flag(Func) << flagKindShift}
 }
 
+// makeFuncStub is an assembly function that is the code half of
+// the function returned from MakeFunc. It expects a *callReflectFunc
+// as its context register, and its job is to invoke callReflect(ctxt, frame)
+// where ctxt is the context register and frame is a pointer to the first
+// word in the passed-in argument frame.
+func makeFuncStub() + // makeMethodValue converts v from the rcvr+method index representation // of a method value to an actual method func value, which is // basically the receiver value with a special bit set, into a true diff -r 024105249263 libgo/go/reflect/makefunc_amd64.S --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/libgo/go/reflect/makefunc_amd64.S Fri Sep 27 08:06:13 2013 -0700 @@ -0,0 +1,107 @@ +# Copyright 2013 The Go Authors. All rights reserved. +# Use of
Re: [google gcc-4_8] fix size_estimation for builtin_expect
On Thu, Sep 26, 2013 at 3:23 PM, Jan Hubicka hubi...@ucw.cz wrote: Hi, builtin_expect should be a NOP in size_estimation. Indeed, the call stmt itself is 0 weight in size and time. But it may introduce an extra relational expr which has non-zero size/time. The end result is: for code with and without builtin_expect, we get different size/time estimations for early inlining. This patch fixes this problem. -Rong

2013-09-26  Rong Xu  x...@google.com

	* ipa-inline-analysis.c (estimate_function_body_sizes): fix
	the size estimation for builtin_expect.

This seems fine with a comment in the code explaining what it is about. I also think we want to support multiple builtin_expects in a BB, so perhaps we want to have a pointer set of statements to ignore? Thanks for the feedback. I'll add the comment and split the code into a new function. You have a good point about multiple builtin_expect calls. I think I need to remove the early exit (the break stmt). But I would argue that using pointer_set is probably not necessary. Here is the reasoning: the typical usage of builtin_expect is like:

  if (__builtin_expect (a <= b, 1)) { ... }

The IR is:

  t1 = a <= b;
  t2 = (long int) t1;
  t3 = __builtin_expect (t2, 1);
  if (t3 != 0) goto ...

Without the builtin, the IR is:

  if (a <= b) goto ...

The code in the patch checks the use of the builtin_expect return value to see if it's used in a COND stmt. So even if there are multiple builtins, only one can match. -Rong To avoid spaghetti code, please just move the new logic into separate functions. Honza

Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c	(revision 202638)
+++ ipa-inline-analysis.c	(working copy)
@@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node *
   /* Estimate static overhead for function prologue/epilogue and alignment. */
   int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
   int size = overhead;
+  gimple fix_expect_builtin;
+
   /* Benefits are scaled by probability of elimination that is in range [0, 2].
 */
   basic_block bb;
@@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node *
 	}
 	}
 
+  fix_expect_builtin = NULL;
   for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
 	{
 	  gimple stmt = gsi_stmt (bsi);
+	  if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT))
+	    {
+	      tree var = gimple_call_lhs (stmt);
+	      tree arg = gimple_call_arg (stmt, 0);
+	      use_operand_p use_p;
+	      gimple use_stmt;
+	      bool match = false;
+	      bool done = false;
+	      gcc_assert (var && arg);
+	      gcc_assert (TREE_CODE (var) == SSA_NAME);
+
+	      while (TREE_CODE (arg) == SSA_NAME)
+		{
+		  gimple stmt_tmp = SSA_NAME_DEF_STMT (arg);
+		  switch (gimple_assign_rhs_code (stmt_tmp))
+		    {
+		    case LT_EXPR:
+		    case LE_EXPR:
+		    case GT_EXPR:
+		    case GE_EXPR:
+		    case EQ_EXPR:
+		    case NE_EXPR:
+		      match = true;
+		      done = true;
+		      break;
+		    case NOP_EXPR:
+		      break;
+		    default:
+		      done = true;
+		      break;
+		    }
+		  if (done)
+		    break;
+		  arg = gimple_assign_rhs1 (stmt_tmp);
+		}
+
+	      if (match && single_imm_use (var, &use_p, &use_stmt))
+		{
+		  if (gimple_code (use_stmt) == GIMPLE_COND)
+		    {
+		      fix_expect_builtin = use_stmt;
+		    }
+		}
+
+	      /* We should see at most one builtin_expect call in one bb.  */
+	      break;
+	    }
+	}
+
+  for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
+	{
+	  gimple stmt = gsi_stmt (bsi);
 	  int this_size = estimate_num_insns (stmt, eni_size_weights);
 	  int this_time = estimate_num_insns (stmt, eni_time_weights);
 	  int prob;
 	  struct predicate will_be_nonconstant;
 
+	  if (stmt == fix_expect_builtin)
+	    {
+	      this_size--;
+	      this_time--;
+	    }
+
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    {
 	      fprintf (dump_file, );
Re: Commit: MSP430: Pass -md on to assembler
On Sep 27, 2013, at 1:48 AM, nick clifton ni...@redhat.com wrote: OK by me, although I cannot approve that particular patch. I know, the intent is for someone that can, to approve it. But I ran into a very strange problem. With your PARTIAL_INT_MODE_NAME patch applied GCC started erroneously eliminating NULL function pointer checks! This was particularly noticeable in libgcc/crtstuff.c, where for example:

Index: stor-layout.c
===
--- stor-layout.c	(revision 202634)
+++ stor-layout.c	(working copy)
@@ -2821,7 +2821,7 @@
 get_mode_bounds (enum machine_mode mode, enum machine_mode target_mode,
 		 rtx *mmin, rtx *mmax)
 {
-  unsigned size = GET_MODE_BITSIZE (mode);
+  unsigned size = GET_MODE_PRECISION (mode);
   unsigned HOST_WIDE_INT min_val, max_val;
 
   gcc_assert (size <= HOST_BITS_PER_WIDE_INT);

fixes this problem. The problem is that we treat the maximum value of PSImode as -1, and then later we do:

    case EQ:
      /* x == y is always false for y out of range.  */
      if (val < mmin || val > mmax)
        return const0_rtx;
      break;

and the answer to the question, is 0 <= -1, is no, so the entire test is eliminated as never able to be true. After the fix, in your case, we get:

(gdb) p mmin
$72 = -524288
(gdb) p mmax
$73 = 524287

and the test becomes if (0 < -524288 || 0 > 524287), which is not true, so the test isn't eliminated as never true. Here, we see the test that protects this code uses GET_MODE_PRECISION:

(gdb) macro expand HWI_COMPUTABLE_MODE_P (PSImode)
expands to: ((((enum mode_class) mode_class[PSImode]) == MODE_INT || ((enum mode_class) mode_class[PSImode]) == MODE_PARTIAL_INT) && mode_precision[PSImode] <= (8 * 8))

so, clearly, someone was thinking about GET_MODE_PRECISION being in range, not GET_MODE_BITSIZE. Ok? [ for the rtl people ]
Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS
Can the sh people weigh in on this? Are the PSI and PDI precisions 32 and 64? On Sep 17, 2013, at 10:24 AM, Mike Stump mikest...@comcast.net wrote: On Sep 16, 2013, at 8:41 PM, DJ Delorie d...@redhat.com wrote: m32c's PSImode is 24-bits, why does it have 32 in the macro? /* 24-bit pointers, in 32-bit units */ -PARTIAL_INT_MODE (SI); +PARTIAL_INT_MODE_NAME (SI, 32, PSI); Sorry, fingers copied the wrong number. Thanks for the catch. partial-1.diffs.txt
[google/4_8] Disable -g/-gmlt during LIPO instrumentation
David and Rong, The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior. Passes regression tests and simple tests on the new functionality. Ok for google/4_8? Teresa

2013-09-27  Teresa Johnson  tejohn...@google.com

	* opts.c (finish_options): Suppress -g/-gmlt when we are
	building for LIPO instrumention.
	* common.opt (fripa-allow-debug): New option.

Index: opts.c
===
--- opts.c	(revision 202976)
+++ opts.c	(working copy)
@@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g
 #endif
       if (!opts->x_flag_fat_lto_objects && !HAVE_LTO_PLUGIN)
 	error_at (loc, "-fno-fat-lto-objects are supported only with linker plugin.");
-}
+    }
   if ((opts->x_flag_lto_partition_balanced != 0) + (opts->x_flag_lto_partition_1to1 != 0)
       + (opts->x_flag_lto_partition_none != 0) >= 1)
     {
@@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g
   /* Turn on -ffunction-sections when -freorder-functions=* is used.  */
   if (opts->x_flag_reorder_functions > 1)
     opts->x_flag_function_sections = 1;
+
+  /* LIPO module grouping depends on the memory consumed by the profile-gen
+     parsing phase, recorded in a per-module ggc_memory field of the module
+     info struct.  This will be higher when debug generation is on via
+     -g/-gmlt, which causes the FE to generate debug structures that will
+     increase the ggc_total_memory.  This could in theory cause the module
+     groups to be slightly more conservative.  Disable -g/-gmlt under
+     -fripa -fprofile-generate, but provide an option to override this
+     in case we actually need to debug an instrumented binary.  */
+  if (opts->x_profile_arc_flag
+      && opts->x_flag_dyn_ipa
+      && opts->x_debug_info_level != DINFO_LEVEL_NONE
+      && !opts->x_flag_dyn_ipa_allow_debug)
+    {
+      inform (loc,
+	      "Debug generation via -g option disabled under -fripa "
+	      "-fprofile-generate (use -fripa-allow-debug to override)");
+      set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set,
+		       loc);
+    }
 }

 #define LEFT_COLUMN	27
Index: common.opt
===
--- common.opt	(revision 202976)
+++ common.opt	(working copy)
@@ -1155,6 +1155,10 @@
 fripa
 Common Report Var(flag_dyn_ipa)
 Perform Dynamic Inter-Procedural Analysis.

+fripa-allow-debug
+Common Report Var(flag_dyn_ipa_allow_debug) Init(0)
+Allow -g enablement for -fripa -fprofile-generate compiles.
+
 fripa-disallow-asm-modules
 Common Report Var(flag_ripa_disallow_asm_modules)
 Don't import an auxiliary module if it contains asm statements

-- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation
On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote: David and Rong, The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior. Passes regression tests and simple tests on the new functionality. Ok for google/4_8? Teresa 2013-09-27 Teresa Johnson tejohn...@google.com * opts.c (finish_options): Suppress -g/-gmlt when we are building for LIPO instrumention. * common.opt (fripa-allow-debug): New option. Index: opts.c === --- opts.c (revision 202976) +++ opts.c (working copy) @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g #endif if (!opts-x_flag_fat_lto_objects !HAVE_LTO_PLUGIN) error_at (loc, -fno-fat-lto-objects are supported only with linker plugin.); -} +} Unrelated format change? Otherwise looks ok. thanks, David if ((opts-x_flag_lto_partition_balanced != 0) + (opts-x_flag_lto_partition_1to1 != 0) + (opts-x_flag_lto_partition_none != 0) = 1) { @@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g /* Turn on -ffunction-sections when -freorder-functions=* is used. */ if (opts-x_flag_reorder_functions 1) opts-x_flag_function_sections = 1; + + /* LIPO module grouping depends on the memory consumed by the profile-gen + parsing phase, recorded in a per-module ggc_memory field of the module + info struct. This will be higher when debug generation is on via + -g/-gmlt, which causes the FE to generate debug structures that will + increase the ggc_total_memory. This could in theory cause the module + groups to be slightly more conservative. Disable -g/-gmlt under + -fripa -fprofile-generate, but provide an option to override this + in case we actually need to debug an instrumented binary. 
*/ + if (opts-x_profile_arc_flag + opts-x_flag_dyn_ipa + opts-x_debug_info_level != DINFO_LEVEL_NONE + !opts-x_flag_dyn_ipa_allow_debug) +{ + inform (loc, + Debug generation via -g option disabled under -fripa + -fprofile-generate (use -fripa-allow-debug to override)); + set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set, + loc); +} } #define LEFT_COLUMN27 Index: common.opt === --- common.opt (revision 202976) +++ common.opt (working copy) @@ -1155,6 +1155,10 @@ fripa Common Report Var(flag_dyn_ipa) Perform Dynamic Inter-Procedural Analysis. +fripa-allow-debug +Common Report Var(flag_dyn_ipa_allow_debug) Init(0) +Allow -g enablement for -fripa -fprofile-generate compiles. + fripa-disallow-asm-modules Common Report Var(flag_ripa_disallow_asm_modules) Don't import an auxiliary module if it contains asm statements -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH v4 04/20] add configury
Hi Tom, On Mon, 23 Sep 2013, Tom Tromey wrote: This adds the configury needed for automatic dependency tracking. It also adds some bits to the Makefile so we can begin converting (removing) explicit dependencies. * Makefile.in (CCDEPMODE, DEPDIR, depcomp, COMPILE.base) (COMPILE, POSTCOMPILE): New variables. (.cc.o .c.o): Use COMPILE, POSTCOMPILE. I believe this may be breaking all my testers on FreeBSD (i386-unknown-freebsd10.0 for example). The timing of when this patchset went in fits pretty much when my builds started to break and I am wondering about some code. Here is the failure mode: gmake[2]: Entering directory `/scratch/tmp/gerald/OBJ-0927-1848/gcc' g++ -c -DIN_GCC_FRONTEND -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -Ic -I/scratch/tmp/gerald/gcc-HEAD/gcc ...[-I options]... -o c/c-lang.o -MT c/c-lang.o -MMD -MP -MF c/.deps/c-lang.TPo /scratch/tmp/gerald/gcc-HEAD/gcc/c/c-lang.c cc1plus: error: unrecognized command line option -Wno-narrowing gmake[2]: *** [c/c-lang.o] Error 1 gmake[1]: *** [install-gcc] Error 2 gmake: *** [install] Error 2 The issue is the invocation of g++ (the old system compiler, not what we built) with -Wno-narrowing (a new option). And looking at the code, I see +COMPILE.base = $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) -o $@ +ifeq ($(CCDEPMODE),depmode=gcc3) +# Note a subtlety here: we use $(@D) for the directory part, to make +# things like the go/%.o rule work properly; but we use $(*F) for the +# file part, as we just want the file part of the stem, not the entire +# file name. 
+COMPILE = $(COMPILE.base) -MT $@ -MMD -MP -MF $(@D)/$(DEPDIR)/$(*F).TPo
+POSTCOMPILE = @mv $(@D)/$(DEPDIR)/$(*F).TPo $(@D)/$(DEPDIR)/$(*F).Po
+else
+COMPILE = source='$<' object='$@' libtool=no \
+	DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) $(COMPILE.base)
+POSTCOMPILE =
+endif

Where does $(ALL_COMPILERFLAGS) come from? If I read the code right, we do disable these warnings for the stage1 build. However, the install compiler is the same -- so I guess we should disable warnings there, too? Gerald
[Patch, fortran, doc, committed] Fix DATE_AND_TIME example
Hello, a format string in the example for DATE_AND_TIME contained a syntax error. Committed as obvious. 2013-09-27 Janne Blomqvist j...@gcc.gnu.org * intrinsic.texi (DATE_AND_TIME): Fix example. Index: intrinsic.texi === --- intrinsic.texi (revision 202983) +++ intrinsic.texi (working copy) @@ -3461,7 +3461,7 @@ program test_time_and_date call date_and_time(TIME=time) call date_and_time(VALUES=values) print '(a,2x,a,2x,a)', date, time, zone -print '(8i5))', values +print '(8i5)', values end program test_time_and_date @end smallexample -- Janne Blomqvist
Re: Generic tuning in x86-tune.def 1/2
Jan Hubicka hubi...@ucw.cz writes: I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to Bobcat tuning, since it seems like an obvious omission (after double-checking in the optimization manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for Bulldozer cores. When tuning for Intel Sandy Bridge and later it would actually be interesting to use a 32-byte window instead of 16 bytes. The decoded icache has a 3-jump limit per 32 bytes. So if K8 support is dropped from generic, we could just change it over to 32 bytes there? -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation
On Fri, Sep 27, 2013 at 12:01 PM, Xinliang David Li davi...@google.com wrote: On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote: David and Rong, The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior. Passes regression tests and simple tests on the new functionality. Ok for google/4_8? Teresa 2013-09-27 Teresa Johnson tejohn...@google.com * opts.c (finish_options): Suppress -g/-gmlt when we are building for LIPO instrumention. * common.opt (fripa-allow-debug): New option. Index: opts.c === --- opts.c (revision 202976) +++ opts.c (working copy) @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g #endif if (!opts-x_flag_fat_lto_objects !HAVE_LTO_PLUGIN) error_at (loc, -fno-fat-lto-objects are supported only with linker plugin.); -} +} Unrelated format change? Well, related in the sense that it messed up my editor's auto-indention logic when making the change below. Ok to include the formatting fix if I mention it in the commit log? Teresa Otherwise looks ok. thanks, David if ((opts-x_flag_lto_partition_balanced != 0) + (opts-x_flag_lto_partition_1to1 != 0) + (opts-x_flag_lto_partition_none != 0) = 1) { @@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g /* Turn on -ffunction-sections when -freorder-functions=* is used. */ if (opts-x_flag_reorder_functions 1) opts-x_flag_function_sections = 1; + + /* LIPO module grouping depends on the memory consumed by the profile-gen + parsing phase, recorded in a per-module ggc_memory field of the module + info struct. This will be higher when debug generation is on via + -g/-gmlt, which causes the FE to generate debug structures that will + increase the ggc_total_memory. This could in theory cause the module + groups to be slightly more conservative. 
Disable -g/-gmlt under + -fripa -fprofile-generate, but provide an option to override this + in case we actually need to debug an instrumented binary. */ + if (opts-x_profile_arc_flag + opts-x_flag_dyn_ipa + opts-x_debug_info_level != DINFO_LEVEL_NONE + !opts-x_flag_dyn_ipa_allow_debug) +{ + inform (loc, + Debug generation via -g option disabled under -fripa + -fprofile-generate (use -fripa-allow-debug to override)); + set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set, + loc); +} } #define LEFT_COLUMN27 Index: common.opt === --- common.opt (revision 202976) +++ common.opt (working copy) @@ -1155,6 +1155,10 @@ fripa Common Report Var(flag_dyn_ipa) Perform Dynamic Inter-Procedural Analysis. +fripa-allow-debug +Common Report Var(flag_dyn_ipa_allow_debug) Init(0) +Allow -g enablement for -fripa -fprofile-generate compiles. + fripa-disallow-asm-modules Common Report Var(flag_ripa_disallow_asm_modules) Don't import an auxiliary module if it contains asm statements -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation
ok. David On Fri, Sep 27, 2013 at 1:03 PM, Teresa Johnson tejohn...@google.com wrote: On Fri, Sep 27, 2013 at 12:01 PM, Xinliang David Li davi...@google.com wrote: On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote: David and Rong, The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior. Passes regression tests and simple tests on the new functionality. Ok for google/4_8? Teresa 2013-09-27 Teresa Johnson tejohn...@google.com * opts.c (finish_options): Suppress -g/-gmlt when we are building for LIPO instrumention. * common.opt (fripa-allow-debug): New option. Index: opts.c === --- opts.c (revision 202976) +++ opts.c (working copy) @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g #endif if (!opts-x_flag_fat_lto_objects !HAVE_LTO_PLUGIN) error_at (loc, -fno-fat-lto-objects are supported only with linker plugin.); -} +} Unrelated format change? Well, related in the sense that it messed up my editor's auto-indention logic when making the change below. Ok to include the formatting fix if I mention it in the commit log? Teresa Otherwise looks ok. thanks, David if ((opts-x_flag_lto_partition_balanced != 0) + (opts-x_flag_lto_partition_1to1 != 0) + (opts-x_flag_lto_partition_none != 0) = 1) { @@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g /* Turn on -ffunction-sections when -freorder-functions=* is used. */ if (opts-x_flag_reorder_functions 1) opts-x_flag_function_sections = 1; + + /* LIPO module grouping depends on the memory consumed by the profile-gen + parsing phase, recorded in a per-module ggc_memory field of the module + info struct. This will be higher when debug generation is on via + -g/-gmlt, which causes the FE to generate debug structures that will + increase the ggc_total_memory. 
This could in theory cause the module + groups to be slightly more conservative. Disable -g/-gmlt under + -fripa -fprofile-generate, but provide an option to override this + in case we actually need to debug an instrumented binary. */ + if (opts-x_profile_arc_flag + opts-x_flag_dyn_ipa + opts-x_debug_info_level != DINFO_LEVEL_NONE + !opts-x_flag_dyn_ipa_allow_debug) +{ + inform (loc, + Debug generation via -g option disabled under -fripa + -fprofile-generate (use -fripa-allow-debug to override)); + set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set, + loc); +} } #define LEFT_COLUMN27 Index: common.opt === --- common.opt (revision 202976) +++ common.opt (working copy) @@ -1155,6 +1155,10 @@ fripa Common Report Var(flag_dyn_ipa) Perform Dynamic Inter-Procedural Analysis. +fripa-allow-debug +Common Report Var(flag_dyn_ipa_allow_debug) Init(0) +Allow -g enablement for -fripa -fprofile-generate compiles. + fripa-disallow-asm-modules Common Report Var(flag_ripa_disallow_asm_modules) Don't import an auxiliary module if it contains asm statements -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation
I don't quite understand here. We use the profile-generate memory consumption to estimate the profile use memory consumption. we still have -g/-gmlt in profile-use compilation. Will this change effectively under estimate the memory use in the use phrase? -Rong On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote: David and Rong, The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior. Passes regression tests and simple tests on the new functionality. Ok for google/4_8? Teresa 2013-09-27 Teresa Johnson tejohn...@google.com * opts.c (finish_options): Suppress -g/-gmlt when we are building for LIPO instrumention. * common.opt (fripa-allow-debug): New option. Index: opts.c === --- opts.c (revision 202976) +++ opts.c (working copy) @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g #endif if (!opts-x_flag_fat_lto_objects !HAVE_LTO_PLUGIN) error_at (loc, -fno-fat-lto-objects are supported only with linker plugin.); -} +} if ((opts-x_flag_lto_partition_balanced != 0) + (opts-x_flag_lto_partition_1to1 != 0) + (opts-x_flag_lto_partition_none != 0) = 1) { @@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g /* Turn on -ffunction-sections when -freorder-functions=* is used. */ if (opts-x_flag_reorder_functions 1) opts-x_flag_function_sections = 1; + + /* LIPO module grouping depends on the memory consumed by the profile-gen + parsing phase, recorded in a per-module ggc_memory field of the module + info struct. This will be higher when debug generation is on via + -g/-gmlt, which causes the FE to generate debug structures that will + increase the ggc_total_memory. This could in theory cause the module + groups to be slightly more conservative. 
Disable -g/-gmlt under + -fripa -fprofile-generate, but provide an option to override this + in case we actually need to debug an instrumented binary. */ + if (opts-x_profile_arc_flag + opts-x_flag_dyn_ipa + opts-x_debug_info_level != DINFO_LEVEL_NONE + !opts-x_flag_dyn_ipa_allow_debug) +{ + inform (loc, + Debug generation via -g option disabled under -fripa + -fprofile-generate (use -fripa-allow-debug to override)); + set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set, + loc); +} } #define LEFT_COLUMN27 Index: common.opt === --- common.opt (revision 202976) +++ common.opt (working copy) @@ -1155,6 +1155,10 @@ fripa Common Report Var(flag_dyn_ipa) Perform Dynamic Inter-Procedural Analysis. +fripa-allow-debug +Common Report Var(flag_dyn_ipa_allow_debug) Init(0) +Allow -g enablement for -fripa -fprofile-generate compiles. + fripa-disallow-asm-modules Common Report Var(flag_ripa_disallow_asm_modules) Don't import an auxiliary module if it contains asm statements -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On Fri, Sep 27, 2013 at 9:37 AM, Jonathan Wakely jwakely@gmail.com wrote: Ah I see. I definitely agree it's good to accept that instead of being unnecessarily strict, but other people will want the option of strict conformance, so I think we can please everyone with something like:

	else
	  {
#ifdef __STRICT_ANSI__
	    __throw_regex_error(regex_constants::error_escape);
#else
	    _M_token = _S_token_ord_char;
	    _M_value.assign(1, __c);
#endif
	  }

Sorry for the late reply. Do I need to bootstrap again? -- Tim Shen a.patch Description: Binary data
Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation
The issue is that building the instrumented binary with and without, say, -gmlt, may result in different module grouping. Teresa On Fri, Sep 27, 2013 at 1:18 PM, Rong Xu x...@google.com wrote: I don't quite understand here. We use the profile-generate memory consumption to estimate the profile use memory consumption. we still have -g/-gmlt in profile-use compilation. Will this change effectively under estimate the memory use in the use phrase? -Rong On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote: David and Rong, The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior. Passes regression tests and simple tests on the new functionality. Ok for google/4_8? Teresa 2013-09-27 Teresa Johnson tejohn...@google.com * opts.c (finish_options): Suppress -g/-gmlt when we are building for LIPO instrumention. * common.opt (fripa-allow-debug): New option. Index: opts.c === --- opts.c (revision 202976) +++ opts.c (working copy) @@ -799,7 +799,7 @@ finish_options (struct gcc_options *opts, struct g #endif if (!opts-x_flag_fat_lto_objects !HAVE_LTO_PLUGIN) error_at (loc, -fno-fat-lto-objects are supported only with linker plugin.); -} +} if ((opts-x_flag_lto_partition_balanced != 0) + (opts-x_flag_lto_partition_1to1 != 0) + (opts-x_flag_lto_partition_none != 0) = 1) { @@ -852,6 +852,26 @@ finish_options (struct gcc_options *opts, struct g /* Turn on -ffunction-sections when -freorder-functions=* is used. */ if (opts-x_flag_reorder_functions 1) opts-x_flag_function_sections = 1; + + /* LIPO module grouping depends on the memory consumed by the profile-gen + parsing phase, recorded in a per-module ggc_memory field of the module + info struct. This will be higher when debug generation is on via + -g/-gmlt, which causes the FE to generate debug structures that will + increase the ggc_total_memory. 
This could in theory cause the module + groups to be slightly more conservative. Disable -g/-gmlt under + -fripa -fprofile-generate, but provide an option to override this + in case we actually need to debug an instrumented binary. */ + if (opts-x_profile_arc_flag + opts-x_flag_dyn_ipa + opts-x_debug_info_level != DINFO_LEVEL_NONE + !opts-x_flag_dyn_ipa_allow_debug) +{ + inform (loc, + Debug generation via -g option disabled under -fripa + -fprofile-generate (use -fripa-allow-debug to override)); + set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, 0, opts, opts_set, + loc); +} } #define LEFT_COLUMN27 Index: common.opt === --- common.opt (revision 202976) +++ common.opt (working copy) @@ -1155,6 +1155,10 @@ fripa Common Report Var(flag_dyn_ipa) Perform Dynamic Inter-Procedural Analysis. +fripa-allow-debug +Common Report Var(flag_dyn_ipa_allow_debug) Init(0) +Allow -g enablement for -fripa -fprofile-generate compiles. + fripa-disallow-asm-modules Common Report Var(flag_ripa_disallow_asm_modules) Don't import an auxiliary module if it contains asm statements -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [google/4_8] Disable -g/-gmlt during LIPO instrumentation
The key is that grouping results should not depend on the presence of -g flags. The downside of the patch is that it may slightly underestimate the memory pressure at profile-use time, but that should not be a big problem.

David

On Fri, Sep 27, 2013 at 1:18 PM, Rong Xu x...@google.com wrote:
> I don't quite understand here. We use the profile-generate memory consumption to estimate the profile-use memory consumption. We still have -g/-gmlt in the profile-use compilation. Will this change effectively underestimate the memory use in the use phase?
> -Rong

On Fri, Sep 27, 2013 at 11:50 AM, Teresa Johnson tejohn...@google.com wrote:
> David and Rong,
>
> The following patch will disable -g/-gmlt when instrumenting for LIPO since they will affect the recorded ggc_memory used in the module grouping decision. Added -fripa-allow-debug to override this behavior.
>
> Passes regression tests and simple tests on the new functionality. Ok for google/4_8?
>
> Teresa
>
> [...]

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH v4 04/20] add configury
Gerald> And looking at the code, I see
Gerald> +COMPILE.base = $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) -o $@
[...]
Gerald> Where does $(ALL_COMPILERFLAGS) come from?

Look a little further down in the patch:

 .cc.o .c.o:
-	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $< $(OUTPUT_OPTION)
+	$(COMPILE) $<
+	$(POSTCOMPILE)

... that is, the patches didn't change this part. ALL_COMPILERFLAGS was used before and it is used now. I don't think this series touched how this variable is computed, either.

Gerald> If I read the code right, we do disable these warnings for the
Gerald> stage1 build.  However, the install compiler is the same -- so I
Gerald> guess we should disable warnings there, too?

I'm afraid I don't understand. If I were debugging this then I think I would start by looking in config.log to see why the compiler accepted -Wno-narrowing. Actually, I looked in my own config.log and I see it isn't very informative:

configure:6280: checking whether gcc supports -Wnarrowing
configure:6306: result: yes

I suppose I would hack a set -x / set +x pair into configure around the warning-checking section and then see what happens.

Tom
Re: [Patch] Let ordinary escaping in POSIX regex be valid
Hi,

Tim Shen timshe...@gmail.com wrote:
> Do I need to bootstrap again?

Nah, only double-check that the testcase you are un-xfail-ing uses -std=gnu++11, otherwise it will not pass ;)

Paolo
Re: [google gcc-4_8] alternate hirate for builtin_expect
On Thu, Sep 26, 2013 at 3:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
> > Hi,
> >
> > The current default probability for builtin_expect is 0.9996. This makes the freq of the unlikely bb very low (4), which suppresses the inlining of any calls within those bb.
> >
> > We used FDO data to measure the branch probability for branches annotated with builtin_expect. For google internal benchmarks, the weighted average (with the profile count value as the weight) is 0.9081. The Linux kernel is another program that is heavily annotated with builtin_expect. We measured its weighted average as 0.8717, using google search as the workload.
>
> This is interesting. I was always unsure if programmers use builtin_expect more often to mark impossible paths (such as those leading to a crash) or just unlikely paths. Your data seem to suggest the second.

I haven't found an official guideline on how it should be used. Maybe some documentation in the gcc user guide would help.

> We probably also ought to get analyze_brprob working again. It was always useful for getting such data.
>
> > This patch sets the alternate hirate probability for builtin_expect to 90%. With the alternate hirate, we measured performance improvement for google benchmarks and the Linux kernel.
> >
> > -Rong
> >
> > 2013-09-26  Rong Xu  x...@google.com
> >
> > 	* params.def (DEFPARAM): New.
> > 	* params.def: New.
> > 	* predict.c (tree_predict_by_opcode): Alternate probability
> > 	hirate for builtin_expect.
>
> This also seems reasonable for mainline. Please add a comment to the parameter explaining how the value was determined. Please also add invoke.texi documentation.

Will do.

> For patches that seem reasonable for mainline in the FDO/IPA area, I would appreciate it if you just added me to CC, so I do not lose track of them.

Will do.

> Honza
libgo patch committed: Implement reflect.MakeFunc for 386
Following up on my earlier patch, this patch implements the reflect.MakeFunc function for 386.

Tom Tromey pointed out to me that the libffi closure support can probably be used for this. I was not aware of that support. It supports a lot more processors, and I should probably start using it. The approach I am using does have a couple of advantages: it's more efficient, and it doesn't require any type of writable executable memory. I can get away with that because indirect calls in Go always pass a closure value. So even when and if I do change to using libffi, I might still keep this code for amd64 and 386.

Anyhow, for this patch, bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch.

Ian

diff -r 14a32fb52e24 libgo/Makefile.am
--- a/libgo/Makefile.am	Fri Sep 27 10:46:09 2013 -0700
+++ b/libgo/Makefile.am	Fri Sep 27 14:30:27 2013 -0700
@@ -901,10 +901,17 @@
 go_reflect_makefunc_s_file = \
 	go/reflect/makefunc_amd64.S
 else
+if LIBGO_IS_386
+go_reflect_makefunc_file = \
+	go/reflect/makefuncgo_386.go
+go_reflect_makefunc_s_file = \
+	go/reflect/makefunc_386.S
+else
 go_reflect_makefunc_file =
 go_reflect_makefunc_s_file = \
 	go/reflect/makefunc_dummy.c
 endif
+endif
 
 go_reflect_files = \
 	go/reflect/deepequal.go \
diff -r 14a32fb52e24 libgo/go/reflect/all_test.go
--- a/libgo/go/reflect/all_test.go	Fri Sep 27 10:46:09 2013 -0700
+++ b/libgo/go/reflect/all_test.go	Fri Sep 27 14:30:27 2013 -0700
@@ -1432,7 +1432,7 @@
 func TestMakeFunc(t *testing.T) {
 	switch runtime.GOARCH {
-	case "amd64":
+	case "amd64", "386":
 	default:
 		t.Skip("MakeFunc not implemented for " + runtime.GOARCH)
 	}
diff -r 14a32fb52e24 libgo/go/reflect/makefunc.go
--- a/libgo/go/reflect/makefunc.go	Fri Sep 27 10:46:09 2013 -0700
+++ b/libgo/go/reflect/makefunc.go	Fri Sep 27 14:30:27 2013 -0700
@@ -47,7 +47,7 @@
 	}
 	switch runtime.GOARCH {
-	case "amd64":
+	case "amd64", "386":
 	default:
 		panic("reflect.MakeFunc not implemented for " + runtime.GOARCH)
 	}
diff -r 14a32fb52e24
libgo/go/reflect/makefunc_386.S
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/libgo/go/reflect/makefunc_386.S	Fri Sep 27 14:30:27 2013 -0700
@@ -0,0 +1,111 @@
+# Copyright 2013 The Go Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style
+# license that can be found in the LICENSE file.
+
+# MakeFunc 386 assembly code.
+
+	.global reflect.makeFuncStub
+
+#ifdef __ELF__
+	.type reflect.makeFuncStub,@function
+#endif
+
+reflect.makeFuncStub:
+	.cfi_startproc
+
+	# Go does not provide any equivalent to the regparm function
+	# attribute, so on Go we do not need to worry about passing
+	# parameters in registers.  We just pass a pointer to the
+	# arguments on the stack.
+	#
+	# We do need to pick up the return values, though, so we pass
+	# a pointer to a struct that looks like this.
+	# struct {
+	#   esp uint32	// 0x0
+	#   eax uint32	// 0x4
+	#   st0 uint64	// 0x8
+	# }
+
+	pushl	%ebp
+	.cfi_def_cfa_offset 8
+	.cfi_offset %ebp, -8
+	movl	%esp, %ebp
+	.cfi_def_cfa_register %ebp
+
+	pushl	%ebx		# In case this is PIC.
+
+	subl	$36, %esp	# Enough for args and to align stack.
+	.cfi_offset %ebx, -12
+
+#ifdef __PIC__
+	call	__x86.get_pc_thunk.bx
+	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
+#endif
+
+	leal	8(%ebp), %eax	# Set esp field in struct.
+	movl	%eax, -24(%ebp)
+
+#ifdef __PIC__
+	call	__go_get_closure@PLT
+#else
+	call	__go_get_closure
+#endif
+
+	movl	%eax, 4(%esp)
+
+	leal	-24(%ebp), %eax
+	movl	%eax, (%esp)
+
+#ifdef __PIC__
+	call	reflect.MakeFuncStubGo@PLT
+#else
+	call	reflect.MakeFuncStubGo
+#endif
+
+	# Set return registers.
+
+	movl	-20(%ebp), %eax
+	fldl	-16(%ebp)
+
+#ifdef __SSE2__
+	# In case we are compiling with -msseregparm.  This won't work
+	# correctly if only SSE1 is supported, but that seems unlikely.
+	movsd	-16(%ebp), %xmm0
+#endif
+
+	addl	$36, %esp
+	popl	%ebx
+	.cfi_restore %ebx
+	popl	%ebp
+	.cfi_restore %ebp
+	.cfi_def_cfa %esp, 4
+
+	ret
+	.cfi_endproc
+
+#ifdef __ELF__
+	.size	reflect.makeFuncStub, . - reflect.makeFuncStub
+#endif
+
+#ifdef __PIC__
+	.section	.text.__x86.get_pc_thunk.bx,"axG",@progbits,__x86.get_pc_thunk.bx,comdat
+	.globl	__x86.get_pc_thunk.bx
+	.hidden	__x86.get_pc_thunk.bx
+#ifdef __ELF__
+	.type	__x86.get_pc_thunk.bx, @function
+#endif
+__x86.get_pc_thunk.bx:
+	.cfi_startproc
+	movl	(%esp), %ebx
+	ret
+	.cfi_endproc
+#ifdef __ELF__
+	.size	__x86.get_pc_thunk.bx, . - __x86.get_pc_thunk.bx
+#endif
+#endif
+
+#ifdef __ELF__
+	.section	.note.GNU-stack,"",@progbits
+	.section	.note.GNU-split-stack,"",@progbits
+	.section	.note.GNU-no-split-stack,"",@progbits
+#endif
diff -r 14a32fb52e24 libgo/go/reflect/makefuncgo_386.go
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/libgo/go/reflect/makefuncgo_386.go	Fri Sep 27 14:30:27 2013 -0700
@@ -0,0 +1,135 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// MakeFunc 386
[Google] Adjust comdat-sharing-probability param for -Os
This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on spec 2006. Ok for google branches if all tests pass?

- Easwaran

comdat_sharing.patch
Description: Binary data
libgo patch committed: Copy stack values onto heap
I realized that in the amd64 implementation of MakeFunc I forgot that it's not OK to just take the address of a value on the stack, since the function might hang onto the address. The value needs to be copied onto the heap first. This patch implements that.

Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch.

Ian

diff -r cec0db170d82 libgo/go/reflect/makefuncgo_amd64.go
--- a/libgo/go/reflect/makefuncgo_amd64.go	Fri Sep 27 14:31:06 2013 -0700
+++ b/libgo/go/reflect/makefuncgo_amd64.go	Fri Sep 27 15:11:00 2013 -0700
@@ -431,8 +431,14 @@
 func amd64Memarg(in []Value, ap uintptr, rt *rtype) ([]Value, uintptr) {
 	ap = align(ap, ptrSize)
 	ap = align(ap, uintptr(rt.align))
-	p := Value{rt, unsafe.Pointer(ap), flag(rt.Kind()<<flagKindShift) | flagIndir}
-	in = append(in, p)
+
+	// We have to copy the argument onto the heap in case the
+	// function hangs onto the reflect.Value we pass it.
+	p := unsafe_New(rt)
+	memmove(p, unsafe.Pointer(ap), rt.size)
+
+	v := Value{rt, p, flag(rt.Kind()<<flagKindShift) | flagIndir}
+	in = append(in, v)
 	ap += rt.size
 	return in, ap
 }
Merge from GCC 4.8 branch to gccgo branch
I've merged revision 202996 from the GCC 4.8 branch to the gccgo branch. Ian
[PATCH] alternative hirate for builtin_expect
Hi,

The current default probability for builtin_expect is 0.9996. This makes the freq of the unlikely bb very low (4), which suppresses the inlining of any calls within those bb.

We used FDO data to measure the branch probability for branches annotated with builtin_expect. For google internal benchmarks, the weighted average (with the profile count value as the weight) is 0.9081. The Linux kernel is another program that is heavily annotated with builtin_expect. We measured its weighted average as 0.8717, using google search as the workload.

This patch sets the alternate hirate probability for builtin_expect to 90%. With the alternate hirate, we measured performance improvement for google benchmarks and the Linux kernel.

An earlier discussion is https://mail.google.com/mail/u/0/?pli=1#label/gcc-paches/1415c5910054630b

This new patch is for the trunk and addresses Honza's comments. Honza: this new probability is off by default. When we backport to the google branch we will make it the default. Let me know if you want to do the same here.

Thanks,
-Rong

p2_patch
Description: Binary data
[PATCH] fix size_estimation for builtin_expect
Hi,

builtin_expect should be a NOP in size estimation. Indeed, the call stmt itself has 0 weight in size and time. But it may introduce an extra relational expr which has non-zero size/time. The end result is that, with and without builtin_expect, we get different size/time estimates for inlining. This patch fixes that problem.

An earlier discussion of this patch is https://mail.google.com/mail/u/0/?pli=1#label/gcc-paches/1415c590ad8c5315

This new patch addresses Honza's comments. It passes bootstrap and regression.

Richard: I looked at your tree-ssa.c:walk_use_def_chains() code. I think that's overkill for this simple problem. Your code mostly deals with recursively walking the PHI nodes to find the real def stmts, while here the traversal stays within one BB and I may need to continue across multiple real assignments. Calling walk_use_def_chains would probably use only the SSA_NAME_DEF_STMT() part of the code.

Thanks,
-Rong

p1_patch
Description: Binary data
Re: [Google] Adjust comdat-sharing-probability param for -Os
> d.growth -= (info->size * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY)) + 50) / 100;

What is the purpose of '50' here?

The patch is fine for the Google branch. Other tunings to think about -- I think the sharing probability should not be a fixed value -- but should depend on the function's characteristics -- such as size, number of callsites etc. For instance, for small leaf comdat functions, the sharing probability will be small.

David

On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman era...@google.com wrote:
> This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on spec 2006. Ok for google branches if all tests pass?
>
> - Easwaran
Re: [Google] Adjust comdat-sharing-probability param for -Os
On Fri, Sep 27, 2013 at 4:08 PM, Xinliang David Li davi...@google.com wrote:
> > d.growth -= (info->size * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY)) + 50) / 100;
>
> What is the purpose of '50' here?

Rounding after division by 100.

> The patch is fine for the Google branch. Other tunings to think about -- I think the sharing probability should not be a fixed value -- but should depend on the function's characteristics -- such as size, number of callsites etc. For instance, for small leaf comdat functions, the sharing probability will be small.
>
> David
>
> On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman era...@google.com wrote:
> > This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on spec 2006. Ok for google branches if all tests pass?
> >
> > - Easwaran
[PATCH] Relax the requirement of reduction pattern in GCC vectorizer.
The current GCC vectorizer requires the following pattern as a simple reduction computation:

loop_header:
  a1 = phi <a0, a2>
  a3 = ...
  a2 = operation (a3, a1)

But a3 can also be defined outside of the loop. For example, the following loop can benefit from vectorization but the GCC vectorizer fails to vectorize it:

int foo(int v)
{
  int s = 1;
  ++v;
  for (int i = 0; i < 10; ++i)
    s *= v;
  return s;
}

This patch relaxes the original requirement by also considering the following pattern:

a3 = ...
loop_header:
  a1 = phi <a0, a2>
  a2 = operation (a3, a1)

A test case is also added. The patch is tested on x86-64.

thanks,
Cong

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 39c786e..45c1667 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2013-09-27  Cong Hou  co...@google.com
+
+	* tree-vect-loop.c: Relax the requirement of the reduction
+	pattern so that one operand of the reduction operation can
+	come from outside of the loop.
+
 2013-09-25  Tom Tromey  tro...@redhat.com
 
 	* Makefile.in (PARTITION_H, LTO_SYMTAB_H, COMMON_TARGET_DEF_H)
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 09644d2..90496a2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-09-27  Cong Hou  co...@google.com
+
+	* gcc.dg/vect/vect-reduc-pattern-3.c: New test.
+
 2013-09-25  Marek Polacek  pola...@redhat.com
 
 	PR sanitizer/58413
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 2871ba1..3c51c3b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2091,6 +2091,13 @@ vect_is_slp_reduction (loop_vec_info loop_info, gimple phi, gimple first_stmt)
      a3 = ...
      a2 = operation (a3, a1)
 
+   or
+
+   a3 = ...
+   loop_header:
+     a1 = phi <a0, a2>
+     a2 = operation (a3, a1)
+
    such that:
    1. operation is commutative and associative and it is safe to
       change the order of the computation (if CHECK_REDUCTION is true)
@@ -2451,6 +2458,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
   if (def2 && def2 == phi
       && (code == COND_EXPR
	  || !def1 || gimple_nop_p (def1)
+	  || !flow_bb_inside_loop_p (loop, gimple_bb (def1))
	  || (def1 && flow_bb_inside_loop_p (loop, gimple_bb (def1))
	      && (is_gimple_assign (def1)
		  || is_gimple_call (def1)
@@ -2469,6 +2477,7 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
   if (def1 && def1 == phi
       && (code == COND_EXPR
	  || !def2 || gimple_nop_p (def2)
+	  || !flow_bb_inside_loop_p (loop, gimple_bb (def2))
	  || (def2 && flow_bb_inside_loop_p (loop, gimple_bb (def2))
	      && (is_gimple_assign (def2)
		  || is_gimple_call (def2)
diff --git gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
new file mode 100644
index 000..06a9416
--- /dev/null
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-3.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 10
+#define RES 1024
+
+/* A reduction pattern in which there is no data ref in
+   the loop and one operand is defined outside of the loop.  */
+
+__attribute__ ((noinline)) int
+foo (int v)
+{
+  int i;
+  int result = 1;
+
+  ++v;
+  for (i = 0; i < N; i++)
+    result *= v;
+
+  return result;
+}
+
+int
+main (void)
+{
+  int res;
+
+  check_vect ();
+
+  res = foo (1);
+  if (res != RES)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
[PATCH] reimplement -fstrict-volatile-bitfields v4, part 0/2
Here is the latest version of my -fstrict-volatile-bitfields patches. I have gone ahead and committed part 1 of the previous version of this patch series, which was approved back in mid-June. That part just removes the warning about conflicts with packed structs (which everybody seemed to agree was a bad idea) but doesn't otherwise change the behavior of -fstrict-volatile-bitfields. The code changes for the rest of the patch series are unchanged, but I've updated the test cases to remove dependencies on header files. I refreshed the patches against mainline head and retested all parts of the patch series last week. Part 1 of the current patch series incorporates parts 2 and 3 of the previous version (code changes and test cases for the various bugs that have been reported against -fstrict-volatile-bitfields). Part 2 is the same as part 4 of the previous version (changing the target-specific defaults). Hopefully having fewer parts to keep track of will make it easier to review. -Sandra
[PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
This patch fixes various -fstrict-volatile-bitfields wrong-code bugs, including instances where bitfield values were being quietly truncated as well as bugs relating to using the wrong width. The code changes are identical to those in the previous version of this patch series (originally posted in June and re-pinged many times since then), but for this version I have cleaned up the test cases to remove dependencies on header files, as Bernd requested.

-Sandra

2013-09-28  Sandra Loosemore  san...@codesourcery.com

	PR middle-end/23623
	PR middle-end/48784
	PR middle-end/56341
	PR middle-end/56997

gcc/
	* expmed.c (strict_volatile_bitfield_p): New function.
	(store_bit_field_1): Don't special-case strict volatile
	bitfields here.
	(store_bit_field): Handle strict volatile bitfields here instead.
	(store_fixed_bit_field): Don't special-case strict volatile
	bitfields here.
	(extract_bit_field_1): Don't special-case strict volatile
	bitfields here.
	(extract_bit_field): Handle strict volatile bitfields here instead.
	(extract_fixed_bit_field): Don't special-case strict volatile
	bitfields here.  Simplify surrounding code to resemble that in
	store_fixed_bit_field.
	* doc/invoke.texi (Code Gen Options): Update
	-fstrict-volatile-bitfields description.

gcc/testsuite/
	* gcc.dg/pr23623.c: New test.
	* gcc.dg/pr48784-1.c: New test.
	* gcc.dg/pr48784-2.c: New test.
	* gcc.dg/pr56341-1.c: New test.
	* gcc.dg/pr56341-2.c: New test.
	* gcc.dg/pr56997-1.c: New test.
	* gcc.dg/pr56997-2.c: New test.
	* gcc.dg/pr56997-3.c: New test.

Index: gcc/expmed.c
===
--- gcc/expmed.c (revision 203003)
+++ gcc/expmed.c (working copy)
@@ -415,6 +415,42 @@ lowpart_bit_field_p (unsigned HOST_WIDE_
   return bitnum % BITS_PER_WORD == 0;
 }
 
+/* Return true if -fstrict-volatile-bitfields applies an access of OP0
+   containing BITSIZE bits starting at BITNUM, with field mode FIELDMODE.  */
+
+static bool
+strict_volatile_bitfield_p (rtx op0, unsigned HOST_WIDE_INT bitsize,
+			    unsigned HOST_WIDE_INT bitnum,
+			    enum machine_mode fieldmode)
+{
+  unsigned HOST_WIDE_INT modesize = GET_MODE_BITSIZE (fieldmode);
+
+  /* -fstrict-volatile-bitfields must be enabled and we must have a
+     volatile MEM.  */
+  if (!MEM_P (op0)
+      || !MEM_VOLATILE_P (op0)
+      || flag_strict_volatile_bitfields <= 0)
+    return false;
+
+  /* Non-integral modes likely only happen with packed structures.
+     Punt.  */
+  if (!SCALAR_INT_MODE_P (fieldmode))
+    return false;
+
+  /* The bit size must not be larger than the field mode, and
+     the field mode must not be larger than a word.  */
+  if (bitsize > modesize || modesize > BITS_PER_WORD)
+    return false;
+
+  /* Check for cases of unaligned fields that must be split.  */
+  if (bitnum % BITS_PER_UNIT + bitsize > modesize
+      || (STRICT_ALIGNMENT
+	  && bitnum % GET_MODE_ALIGNMENT (fieldmode) + bitsize > modesize))
+    return false;
+
+  return true;
+}
+
 /* Return true if OP is a memory and if a bitfield of size BITSIZE at
    bit number BITNUM can be treated as a simple value of mode MODE.  */
 
@@ -813,12 +849,8 @@ store_bit_field_1 (rtx str_rtx, unsigned
      cheap register alternative is available.  */
   if (MEM_P (op0))
     {
-      /* Do not use unaligned memory insvs for volatile bitfields when
-	 -fstrict-volatile-bitfields is in effect.  */
-      if (!(MEM_VOLATILE_P (op0)
-	    && flag_strict_volatile_bitfields > 0)
-	  && get_best_mem_extraction_insn (&insv, EP_insv, bitsize, bitnum,
-					   fieldmode)
+      if (get_best_mem_extraction_insn (&insv, EP_insv, bitsize, bitnum,
+					fieldmode)
	  && store_bit_field_using_insv (&insv, op0, bitsize, bitnum, value))
	return true;
 
@@ -871,6 +903,27 @@ store_bit_field (rtx str_rtx, unsigned H
		 enum machine_mode fieldmode,
		 rtx value)
 {
+  /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
+  if (strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, fieldmode))
+    {
+
+      /* Storing any naturally aligned field can be done with a simple
+	 store.  For targets that support fast unaligned memory, any
+	 naturally sized, unit aligned field can be done directly.  */
+      if (simple_mem_bitfield_p (str_rtx, bitsize, bitnum, fieldmode))
+	{
+	  str_rtx = adjust_bitfield_address (str_rtx, fieldmode,
+					     bitnum / BITS_PER_UNIT);
+	  emit_move_insn (str_rtx, value);
+	}
+      else
+	/* Explicitly override the C/C++ memory model; ignore the
+	   bit range so that we can do the access in the mode mandated
+	   by -fstrict-volatile-bitfields instead.  */
+	store_fixed_bit_field (str_rtx, bitsize, bitnum, 0, 0, value);
+      return;
+    }
+
   /* Under the C++0x memory model, we must not touch bits outside
      the bit region.  Adjust the address to start at the beginning
      of the bit region.  */
@@ -923,29 +976,12 @@ store_fixed_bit_field (rtx op0, unsigned
   if (MEM_P (op0))
     {
-      unsigned HOST_WIDE_INT
[PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
This patch makes -fstrict-volatile-bitfields not be the default on any target. It is unchanged from part 4 of the previous version (v3) of this patch set that I originally posted back in June and have been re-pinging many times since then.

Some reviewers of my original patch series argued quite strongly that the C/C++ language semantics ought to take precedence over any target-specific ABI in cases where they conflict. I am neutral on this change myself, but if it is a requirement for approval of the other part of the patch that fixes the wrong-code bugs, I think users on targets such as ARM that currently default to enabling this flag would be better off specifying the flag explicitly to get code that does what they want, than getting broken code by default and no way to tell GCC to DTRT. :-( And that's the behavior we're stuck with if we do nothing or can't reach a consensus about what to do. :-(

Bernd Edlinger has been working on a followup patch to issue warnings in cases where -fstrict-volatile-bitfields behavior conflicts with the new C/C++ memory model, which might help to ease the transition in the change of defaults. I believe this was the last version posted: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00100.html

-Sandra

2013-09-28  Sandra Loosemore  san...@codesourcery.com

gcc/
	* config/aarch64/aarch64.c (aarch64_override_options): Don't
	override flag_strict_volatile_bitfields.
	* config/arm/arm.c (arm_option_override): Likewise.
	* config/h8300/h8300.c (h8300_option_override): Likewise.
	* config/m32c/m32c.c (m32c_option_override): Likewise.
	* config/rx/rx.c (rx_option_override): Likewise.
	* config/sh/sh.c (sh_option_override): Likewise.
	* doc/invoke.texi (Code Gen Options): Document that
	-fstrict-volatile-bitfields is no longer the default on any target.

gcc/testsuite/
	* gcc.target/arm/volatile-bitfields-1.c: Add explicit
	-fstrict-volatile-bitfields.
	* gcc.target/arm/volatile-bitfields-2.c: Likewise.
	* gcc.target/arm/volatile-bitfields-3.c: Likewise.
	* gcc.target/arm/volatile-bitfields-4.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c (revision 203002)
+++ gcc/config/aarch64/aarch64.c (working copy)
@@ -5141,10 +5141,6 @@ aarch64_override_options (void)
 
   aarch64_build_bitmask_table ();
 
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
-    flag_strict_volatile_bitfields = 1;
-
   /* If the user did not specify a processor, choose the default
      one for them.  This will be the CPU set during configuration using
      --with-cpu, otherwise it is "generic".  */
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c (revision 203002)
+++ gcc/config/arm/arm.c (working copy)
@@ -2136,11 +2136,6 @@ arm_option_override (void)
			   global_options.x_param_values,
			   global_options_set.x_param_values);
 
-  /* ARM EABI defaults to strict volatile bitfields.  */
-  if (TARGET_AAPCS_BASED && flag_strict_volatile_bitfields < 0
-      && abi_version_at_least(2))
-    flag_strict_volatile_bitfields = 1;
-
   /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we have deemed
      it beneficial (signified by setting num_prefetch_slots to 1 or more.)  */
   if (flag_prefetch_loop_arrays < 0
Index: gcc/config/h8300/h8300.c
===
--- gcc/config/h8300/h8300.c (revision 203002)
+++ gcc/config/h8300/h8300.c (working copy)
@@ -437,10 +437,6 @@ h8300_option_override (void)
	 restore er6 though, so bump up the cost.  */
       h8300_move_ratio = 6;
     }
-
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-    flag_strict_volatile_bitfields = 1;
 }
 
 /* Return the byte register name for a register rtx X.  B should be 0
Index: gcc/config/m32c/m32c.c
===
--- gcc/config/m32c/m32c.c (revision 203002)
+++ gcc/config/m32c/m32c.c (working copy)
@@ -416,10 +416,6 @@ m32c_option_override (void)
   if (TARGET_A24)
     flag_ivopts = 0;
 
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-    flag_strict_volatile_bitfields = 1;
-
   /* r8c/m16c have no 16-bit indirect call, so thunks are involved.
      This is always worse than an absolute call.  */
   if (TARGET_A16)
Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c (revision 203002)
+++ gcc/config/rx/rx.c (working copy)
@@ -2691,10 +2691,6 @@ rx_option_override (void)
	}
     }
 
-  /* This target defaults to strict volatile bitfields.  */
-  if (flag_strict_volatile_bitfields < 0 && abi_version_at_least(2))
-