Re: Make change_decl_assembler_name functional with inline clones
On 8 April 2013 22:08:54 Jan Hubicka hubi...@ucw.cz wrote: Hi, this patch makes change_decl_assembler_name to do the right thing with inline clones. My original plan was to remove inline clones from assembler_name_hash, but it hits the problem that we currently need to make them unique for purposes of LTO sreaming. It is not hard to walk the clone tree and update it. Later we can reorg streaming to not rely on uniqueness of symbol names of function bodies not associated with a real symbol and perhaps simplify this somewhat. Bootstrapped/regtested x86_64-linux, will commit it shortly. PR lto/54095 * symtab.c (insert_to_assembler_name_hash): Handle clones. (unlink_from_assembler_name_hash): Likewise. (symtab_prevail_in_asm_name_hash, symtab_register_node, symtab_unregister_node, symtab_initialize_asm_name_hash, change_decl_assembler_name): Update. Index: symtab.c === *** symtab.c(revision 197551) --- symtab.c(working copy) *** eq_assembler_name (const void *p1, const *** 102,108 /* Insert NODE to assembler name hash. */ static void ! insert_to_assembler_name_hash (symtab_node node) { if (is_a varpool_node (node) DECL_HARD_REGISTER (node-symbol.decl)) return; --- 102,108 /* Insert NODE to assembler name hash. */ static void ! insert_to_assembler_name_hash (symtab_node node, bool with_clones) { if (is_a varpool_node (node) DECL_HARD_REGISTER (node-symbol.decl)) return; *** insert_to_assembler_name_hash (symtab_no *** 111,116 --- 111,119 if (assembler_name_hash) { void **aslot; + struct cgraph_node *cnode; + tree decl = node-symbol.decl; + tree name = DECL_ASSEMBLER_NAME (node-symbol.decl); Maybe use decl here? Thanks, Sent with AquaMail for Android http://www.aqua-mail.com
Re: FW: [PATCH] Avoid a few find_base_term calls in alias.c
On Tue, Apr 9, 2013 at 9:42 AM, Igor Zamyatin izamya...@gmail.com wrote: Richard, Any plans to commit your new fix? I did already. Richard. Thanks, Igor On Thu, Apr 4, 2013 at 3:33 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi Richard, You slightly change behavior of find_base_term, namely before your fix we have the following code: if (REG_P (tmp1) REG_POINTER (tmp1)) { rtx base = find_base_term (tmp1); if (base) return base; } but in your fix you added the following checks on base: tmp1 = find_base_term (tmp1); if (tmp1 != NULL_RTX ((REG_P (tmp1) REG_POINTER (tmp1)) || known_base_value_p (tmp1))) return tmp1; Why you added these checks since function arguments may not considered as bases. If you continue down the original function there is (quoted fully) /* If either operand is known to be a pointer, then use it to determine the base term. */ if (REG_P (tmp1) REG_POINTER (tmp1)) { rtx base = find_base_term (tmp1); if (base) return base; } if (REG_P (tmp2) REG_POINTER (tmp2)) { rtx base = find_base_term (tmp2); if (base) return base; } /* Neither operand was known to be a pointer. Go ahead and find the base term for both operands. */ tmp1 = find_base_term (tmp1); tmp2 = find_base_term (tmp2); so we call find_base_term on tmp1 anyway, and after the modification tmp1 = find_base_term (tmp1); if (tmp1 != NULL_RTX ((REG_P (tmp1) REG_POINTER (tmp1)) || known_base_value_p (tmp1))) return tmp1; we take it if it is non-NULL and ... ah, I see. We now check REG_P/REG_POINTER on the new value. That was indeed unintended. I'll fix it like below. Richard. 2013-04-04 Richard Biener rguent...@suse.de * alias.c (find_base_term): Fix thinko in previous change. Index: gcc/alias.c === --- gcc/alias.c (revision 197480) +++ gcc/alias.c (working copy) @@ -1687,16 +1687,16 @@ find_base_term (rtx x) term is from a pointer or is a named object or a special address (like an argument or stack reference), then use it for the base term. */ - tmp1 = find_base_term (tmp1); - if (tmp1 != NULL_RTX + rtx base = find_base_term (tmp1); + if (base != NULL_RTX ((REG_P (tmp1) REG_POINTER (tmp1)) -|| known_base_value_p (tmp1))) - return tmp1; - tmp2 = find_base_term (tmp2); - if (tmp2 != NULL_RTX +|| known_base_value_p (base))) + return base; + base = find_base_term (tmp2); + if (base != NULL_RTX ((REG_P (tmp2) REG_POINTER (tmp2)) -|| known_base_value_p (tmp2))) - return tmp2; +|| known_base_value_p (base))) + return base; /* We could not determine which of the two operands was the base register and which was the index. So we can determine
[PATCH, trivial] Make type_hash_lookup, type_hash_add static
This patch makes type_hash_lookup and type_hash_add in tree.c static. These functions are used only within tree.c, and I don't think there is any reason to export low-level interface to type hashing anyway. Will commit in 2 days unless someone complains. Bootstrapped on x86_64-linux-gnu. Thanks, -- Maxim Kuvyrkov KugelWorks gcc-static-type-hash.ChangeLog Description: Binary data gcc-static-type-hash.patch Description: Binary data
[PATCH] Remove unsave_expr_now
In propagate_tree_value and replace_exp_1 operating on SSA form are the only users of unsave_expr_now where really unsharing matters. Thus, use unshare_expr and remove unsave_expr_now. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2013-04-09 Richard Biener rguent...@suse.de * tree.h (unsave_expr_now): Remove. * tree-inline.c (mark_local_for_remap_r): Remove. (unsave_expr_1): Likewise. (unsave_r): Likewise. (unsave_expr_now): Likewise. * tree-ssa-copy.c (replace_exp_1): Use unshare_expr. (propagate_tree_value): Likewise. Index: gcc/tree.h === *** gcc/tree.h (revision 197574) --- gcc/tree.h (working copy) *** extern void indent_to (FILE *, int); *** 6018,6024 extern bool debug_find_tree (tree, tree); /* This is in tree-inline.c since the routine uses data structures from the inliner. */ - extern tree unsave_expr_now (tree); extern tree build_duplicate_type (tree); /* In calls.c */ --- 6018,6023 Index: gcc/tree-inline.c === *** gcc/tree-inline.c (revision 197574) --- gcc/tree-inline.c (working copy) *** eni_weights eni_time_weights; *** 114,122 static tree declare_return_variable (copy_body_data *, tree, tree, basic_block); static void remap_block (tree *, copy_body_data *); static void copy_bind_expr (tree *, int *, copy_body_data *); - static tree mark_local_for_remap_r (tree *, int *, void *); - static void unsave_expr_1 (tree); - static tree unsave_r (tree *, int *, void *); static void declare_inline_vars (tree, tree); static void remap_save_expr (tree *, void *, int *); static void prepend_lexical_block (tree current_block, tree new_block); --- 114,119 *** remap_save_expr (tree *tp, void *st_, in *** 4465,4601 *tp = t; } - /* Called via walk_tree. If *TP points to a DECL_STMT for a local label, -copies the declaration and enters it in the splay_tree in DATA (which is -really an `copy_body_data *'). */ - - static tree - mark_local_for_remap_r (tree *tp, int *walk_subtrees ATTRIBUTE_UNUSED, - void *data) - { - copy_body_data *id = (copy_body_data *) data; - - /* Don't walk into types. */ - if (TYPE_P (*tp)) - *walk_subtrees = 0; - - else if (TREE_CODE (*tp) == LABEL_EXPR) - { - tree decl = TREE_OPERAND (*tp, 0); - - /* Copy the decl and remember the copy. */ - insert_decl_map (id, decl, id-copy_decl (decl, id)); - } - - return NULL_TREE; - } - - /* Perform any modifications to EXPR required when it is unsaved. Does -not recurse into EXPR's subtrees. */ - - static void - unsave_expr_1 (tree expr) - { - switch (TREE_CODE (expr)) - { - case TARGET_EXPR: - /* Don't mess with a TARGET_EXPR that hasn't been expanded. - It's OK for this to happen if it was part of a subtree that - isn't immediately expanded, such as operand 2 of another - TARGET_EXPR. */ - if (TREE_OPERAND (expr, 1)) - break; - - TREE_OPERAND (expr, 1) = TREE_OPERAND (expr, 3); - TREE_OPERAND (expr, 3) = NULL_TREE; - break; - - default: - break; - } - } - - /* Called via walk_tree when an expression is unsaved. Using the -splay_tree pointed to by ST (which is really a `splay_tree'), -remaps all local declarations to appropriate replacements. */ - - static tree - unsave_r (tree *tp, int *walk_subtrees, void *data) - { - copy_body_data *id = (copy_body_data *) data; - struct pointer_map_t *st = id-decl_map; - tree *n; - - /* Only a local declaration (variable or label). */ - if ((TREE_CODE (*tp) == VAR_DECL !TREE_STATIC (*tp)) - || TREE_CODE (*tp) == LABEL_DECL) - { - /* Lookup the declaration. */ - n = (tree *) pointer_map_contains (st, *tp); - - /* If it's there, remap it. */ - if (n) - *tp = *n; - } - - else if (TREE_CODE (*tp) == STATEMENT_LIST) - gcc_unreachable (); - else if (TREE_CODE (*tp) == BIND_EXPR) - copy_bind_expr (tp, walk_subtrees, id); - else if (TREE_CODE (*tp) == SAVE_EXPR - || TREE_CODE (*tp) == TARGET_EXPR) - remap_save_expr (tp, st, walk_subtrees); - else - { - copy_tree_r (tp, walk_subtrees, NULL); - - /* Do whatever unsaving is required. */ - unsave_expr_1 (*tp); - } - - /* Keep iterating. */ - return NULL_TREE; - } - - /* Copies everything in EXPR and replaces variables, labels -and SAVE_EXPRs local to EXPR. */ - - tree - unsave_expr_now (tree expr) - { - copy_body_data id; - - /* There's nothing to do for NULL_TREE. */ - if (expr == 0) - return expr; - - /* Set up ID. */ - memset (id, 0, sizeof (id)); - id.src_fn = current_function_decl; - id.dst_fn =
Re: [PATCH] Don't forwprop into clobbers in some cases (PR tree-optimization/56854)
On Mon, 8 Apr 2013, Jakub Jelinek wrote: Hi! lhs ={v} {CLOBBER}; stmts right now allow only VAR_DECL or MEM_REF lhs, but the forwprop code below on the attached testcase attempts to propagate an ARRAY_REF (of MEM_REF) into it. Fixed by not propagating in that case, allowing arbitrary memory lhs is IMHO unnecessary and such lhs's wouldn't be very useful for DSE anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2013-04-08 Jakub Jelinek ja...@redhat.com PR tree-optimization/56854 * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Don't forward into clobber stmts if it would change MEM_REF lhs into non-MEM_REF. * g++.dg/torture/pr56854.C: New test. --- gcc/tree-ssa-forwprop.c.jj2013-02-25 23:51:21.0 +0100 +++ gcc/tree-ssa-forwprop.c 2013-04-08 16:12:37.0 +0200 @@ -826,7 +826,11 @@ forward_propagate_addr_expr_1 (tree name integer_zerop (TREE_OPERAND (lhs, 1)) useless_type_conversion_p (TREE_TYPE (TREE_OPERAND (def_rhs, 0)), - TREE_TYPE (gimple_assign_rhs1 (use_stmt + TREE_TYPE (gimple_assign_rhs1 (use_stmt))) +/* Don't forward anything into clobber stmts if it would result + in the lhs no longer being a MEM_REF. */ + (!gimple_clobber_p (use_stmt) +|| TREE_CODE (TREE_OPERAND (def_rhs, 0)) == MEM_REF)) { tree *def_rhs_basep = TREE_OPERAND (def_rhs, 0); tree new_offset, new_base, saved, new_lhs; --- gcc/testsuite/g++.dg/torture/pr56854.C.jj 2013-04-08 18:03:37.978009666 +0200 +++ gcc/testsuite/g++.dg/torture/pr56854.C2013-04-08 18:03:09.0 +0200 @@ -0,0 +1,24 @@ +// PR tree-optimization/56854 +// { dg-do compile } + +inline void * +operator new (__SIZE_TYPE__, void *p) throw () +{ + return p; +} + +struct A +{ + int a; + A () : a (0) {} + ~A () {} + A operator= (const A v) { this-~A (); new (this) A (v); return *this; } +}; +A b[4], c[4]; + +void +foo () +{ + for (int i = 0; i 4; ++i) +c[i] = b[i]; +} Jakub -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: Make lto-symtab to ignore conflicts in static functions
On Mon, 8 Apr 2013, Jan Hubicka wrote: Hi, currently lto-symtab is trying to resolve all duplicated declarations, including static variables where such duplicates should not happen. This conflicts with the plan to solve PR54095 by postponning renaming to the partitioning. This patch adds lto_symtab_symbol_p that disable merging on statics and keeps duplicate entries for a given asm name. Boostrapped/regtested x86_64-linux, OK? Ok. Thanks, Richard. Honza PR lto/54095 lto-symtab.c (lto_symtab_symbol_p): New function. (lto_symtab_resolve_can_prevail_p, lto_symtab_resolve_symbols, lto_symtab_resolve_symbols, lto_symtab_merge_decls_2, lto_symtab_merge_decls_1, lto_symtab_merge_cgraph_nodes_1): Skip static symbols. Index: lto-symtab.c === *** lto-symtab.c (revision 197551) --- lto-symtab.c (working copy) *** lto_symtab_resolve_replaceable_p (symtab *** 226,237 return false; } /* Return true if the symtab entry E can be the prevailing one. */ static bool lto_symtab_resolve_can_prevail_p (symtab_node e) { ! if (!symtab_real_symbol_p (e)) return false; /* The C++ frontend ends up neither setting TREE_STATIC nor --- 226,249 return false; } + /* Return true, if the symbol E should be resolved by lto-symtab. +Those are all real symbols that are not static (we handle renaming +of static later in partitioning). */ + + static bool + lto_symtab_symbol_p (symtab_node e) + { + if (!TREE_PUBLIC (e-symbol.decl)) + return false; + return symtab_real_symbol_p (e); + } + /* Return true if the symtab entry E can be the prevailing one. */ static bool lto_symtab_resolve_can_prevail_p (symtab_node e) { ! if (!lto_symtab_symbol_p (e)) return false; /* The C++ frontend ends up neither setting TREE_STATIC nor *** lto_symtab_resolve_symbols (symtab_node *** 261,267 /* Always set e-node so that edges are updated to reflect decl merging. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) ! if (symtab_real_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) --- 273,279 /* Always set e-node so that edges are updated to reflect decl merging. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) ! if (lto_symtab_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) *** lto_symtab_resolve_symbols (symtab_node *** 275,281 { /* Assert it's the only one. */ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! if (symtab_real_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) --- 287,293 { /* Assert it's the only one. */ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! if (lto_symtab_symbol_p (e) (e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY || e-symbol.resolution == LDPR_PREVAILING_DEF_IRONLY_EXP || e-symbol.resolution == LDPR_PREVAILING_DEF)) *** lto_symtab_resolve_symbols (symtab_node *** 310,317 /* Do a second round choosing one from the replaceable prevailing decls. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) { ! if (!lto_symtab_resolve_can_prevail_p (e) ! || !symtab_real_symbol_p (e)) continue; /* Choose the first function that can prevail as prevailing. */ --- 322,328 /* Do a second round choosing one from the replaceable prevailing decls. */ for (e = first; e; e = e-symbol.next_sharing_asm_name) { ! if (!lto_symtab_resolve_can_prevail_p (e)) continue; /* Choose the first function that can prevail as prevailing. */ *** lto_symtab_merge_decls_2 (symtab_node fi *** 365,375 /* Try to merge each entry with the prevailing one. */ for (e = prevailing-symbol.next_sharing_asm_name; e; e = e-symbol.next_sharing_asm_name) ! { ! if (!lto_symtab_merge (prevailing, e) !!diagnosed_p) ! mismatches.safe_push (e-symbol.decl); ! } if (mismatches.is_empty ()) return; --- 376,387 /* Try to merge each entry with the prevailing one. */ for (e = prevailing-symbol.next_sharing_asm_name;
Re: [PATCH, combine] Fix host-specific behavior in simplify_compare_const()
The attached patch has passed bootstrap/regression test on x86_64-unknown-linux-gnu with current main trunk. A plaintext gcc/ChangeLog is as below: 2013-04-06 Chung-Ju Wu jasonw...@gmail.com * combine.c (simplify_compare_const): Use GET_MODE_MASK to filter out unnecessary bits in the constant power of two case. On behalf of Andes Technology Co., we have signed FSF agreement. However, so far I don't have svn write access yet. Would you please help to commit this patch? Jeff has kindly offered to sponsor you for write access, so you should be able to install it yourself. Otherwise I'll be happy to do it for you. Thanks for fixing this problem in the combiner. -- Eric Botcazou
Re: RFA: Fix tree-optimization/55524
On Mon, Apr 8, 2013 at 5:10 PM, Joern Rennecke joern.renne...@embecosm.com wrote: This is basically the same patch as attached to the PR, except that I have changed the goto-loop into a do-while loop with a new comment; this caused the need for a lot of reformatting. Can you please include a testcase that shows the effect of this? bootstrapped regtested on i686-pc-linux-gnu. 2013-04-08 Joern Rennecke joern.renne...@embecosm.com * tree-ssa-math-opts.c (mult_to_fma_pass): New file static struct. (convert_mult_to_fma): In first pass, don't use an fms construct when we don't have an fms operation, but fmna. it's fnma I believe. (execute_optimize_widening_mul): Add a second pass if convert_mult_to_fma requests it. Index: gcc/tree-ssa-math-opts.c === --- gcc/tree-ssa-math-opts.c(revision 197578) +++ gcc/tree-ssa-math-opts.c(working copy) @@ -2461,6 +2461,12 @@ convert_plusminus_to_widen (gimple_stmt_ return true; } +static struct +{ + bool second_pass; + bool retry_request; +} mult_to_fma_pass; + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed. */ @@ -2570,6 +2576,22 @@ convert_mult_to_fma (gimple mul_stmt, tr return false; } + /* If the subtrahend (gimple_assign_rhs2 (use_stmt)) is computed +by a MULT_EXPR that we'll visit later, we might be able to +get a more profitable match with fnma. +OTOH, if we don't, a negate / fma pair has likely lower latency +that a mult / subtract pair. */ This makes it sound that this is purely an ordering issue, thus for x = a * b; y = c * d; z = x - y; instead of y = c * d; z = a * b + (-y); you want to generate x = a * b; z = -(c * d + (-x)); I fail to see why you need two passes for this rather than considering the case that the immediate use stmt of the multiplication we start from combines another multiplication with a MINUS_EXPR. That is ... + if (use_code == MINUS_EXPR !negate_p + gimple_assign_rhs1 (use_stmt) == result + optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing + optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing + mult_to_fma_pass.second_pass == false) see if that is the case here and simply not do the transform. A second pass will not recover from that without destroying the fnma pattern (testcase?) Richard. + { + /* ??? Could make setting of retry_request dependent on some +rtx_cost measure we evaluate beforehand. */ + mult_to_fma_pass.retry_request = true; + return false; + } /* We can't handle a * b + a * b. */ if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false; @@ -2657,76 +2679,89 @@ execute_optimize_widening_mul (void) memset (widen_mul_stats, 0, sizeof (widen_mul_stats)); - FOR_EACH_BB (bb) -{ - gimple_stmt_iterator gsi; - for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);) -{ - gimple stmt = gsi_stmt (gsi); - enum tree_code code; + /* We may run one or two passes. In the first pass, if have fnma, + but not fms, we don't synthesize fms so that we can get the maximum + matches for fnma. If we have therefore skipped opportunities to + synthesize fms, we'll run a second pass where we use any such + opportunities that still remain. */ + mult_to_fma_pass.retry_request = false; + do +{ + mult_to_fma_pass.second_pass = mult_to_fma_pass.retry_request; + FOR_EACH_BB (bb) + { + gimple_stmt_iterator gsi; - if (is_gimple_assign (stmt)) + for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);) { - code = gimple_assign_rhs_code (stmt); - switch (code) + gimple stmt = gsi_stmt (gsi); + enum tree_code code; + + if (is_gimple_assign (stmt)) { - case MULT_EXPR: - if (!convert_mult_to_widen (stmt, gsi) - convert_mult_to_fma (stmt, - gimple_assign_rhs1 (stmt), - gimple_assign_rhs2 (stmt))) + code = gimple_assign_rhs_code (stmt); + switch (code) { - gsi_remove (gsi, true); - release_defs (stmt); - continue; - } - break; - - case PLUS_EXPR: - case MINUS_EXPR: - convert_plusminus_to_widen (gsi, stmt, code); -
Re: useless cast blocking some optimization in gcc 4.7.3
On Mon, Apr 8, 2013 at 9:13 PM, Laurent Alfonsi laurent.alfo...@st.com wrote: Hello, I have identified a big performance regression between 4.6 and 4.7. (I have enclosed a pathological test). After investigation, it is because of the += statement applied on 2 signed chars. - It is now type-promoted to int when it is written result += foo(). (since 4.7) - it is type promoted to unsigned char when it is written result = result + foo(). The char-int-char cast is blocking some optimizations in later phases. Anyway, this doesn't look wrong, so I extended fold optimization in order to catch this case. (patch enclosed) The patch basically transforms : (TypeA) ( (TypeB) a1 + (TypeB) a2 )/* with a1 and a2 of the signed type TypeA */ into : a1 + a2 I believe this is legal for any licit a1/a2 input values (no overflow on signed char). No new failure on the two tested targets : sh-superh-elf and x86_64-unknown-linux-gnu. Should I enter a bugzilla to track this ? Is it ok for trunk ? Please open a bugzilla. No, the patch is not ok, as said by Marc. It's possible to shorten the operation by performing it using unsigned arithmetic, that is (signed)((unsigned)a1 + (unsigned)a2). But if really casts cause the issue you are seeing that will not help you. Richard. 2013-04-08 Laurent Alfonsi laurent.alfo...@st.com * fold-const.c (fold_unary_loc): Suppress useless type promotion. Thanks, Laurent
Re: [i386] Replace builtins with vector extensions
On Mon, Apr 8, 2013 at 10:47 PM, Marc Glisse marc.gli...@inria.fr wrote: On Sun, 7 Apr 2013, Marc Glisse wrote: extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_slli_epi16 (__m128i __A, int __B) { - return (__m128i)__builtin_ia32_psllwi128 ((__v8hi)__A, __B); + return (__m128i) ((__v8hi)__A __B); } Actually, I believe I have to keep using the builtins for shifts, because the intrinsics have well defined behavior for large __B whereas and don't. I seem to remember discussion in the PR(s) that the intrinsics should (and do for other compilers) expand to the desired instructions even when the corresponding instruction set is disabled. Using vector extension makes that harder to achieve. Other than that I am all for using the vector extensions, but I think you need carefully wrapped __extension__ markers so that with -std=c89 -pedantic you still can compile programs using the intrinsics? Richard. -- Marc Glisse
Re: [patch][sparc] define_c_enum for UNSPEC/UNSPECV
* config/sparc/sparc.md: Use define_c_enum for unspec and unspecv. OK, thanks. -- Eric Botcazou
Re: Comments on the suggestion to use infinite precision math for wide int.
On Mon, Apr 8, 2013 at 10:39 PM, Lawrence Crowl cr...@google.com wrote: On 4/8/13, Kenneth Zadeck zad...@naturalbridge.com wrote: The other problem, which i invite you to use the full power of your c++ sorcery on, is the one where defining an operator so that wide-int + unsigned hwi is either rejected or properly zero extended. If you can do this, I will go along with your suggestion that the internal rep should be sign extended. Saying that constants are always sign extended seems ok, but there are a huge number of places where we convert unsigned hwis as the second operand and i do not want that to be a trap. I went thru a round of this, where i did not post the patch because i could not make this work. And the number of places where you want to use an hwi as the second operand dwarfs the number of places where you want to use a small integer constant. You can use overloading, as in the following, which actually ignores handling the sign in the representation. class number { unsigned int rep1; int representation; public: number(int arg) : representation(arg) {} number(unsigned int arg) : representation(arg) {} friend number operator+(number, int); friend number operator+(number, unsigned int); friend number operator+(int, number); friend number operator+(unsigned int, number); }; number operator+(number n, int si) { return n.representation + si; } number operator+(number n, unsigned int ui) { return n.representation + ui; } number operator+(int si, number n) { return n.representation + si; } number operator+(unsigned int ui, number n) { return n.representation + ui; } That does not work for types larger than int/unsigned int as HOST_WIDE_INT usually is (it's long / unsigned long). When you pass an int or unsigned int to number operator+(unsigned long ui, number n); number operator+(long ui, number n) you get an ambiguity. You can fix that by relying on template argument deduction and specialization instead of on overloading and integer conversion rules. If the argument type is of a template type parameter, then you can test the template type via if (std::is_signedT::value) // sign extend else // zero extend See http://www.cplusplus.com/reference/type_traits/is_signed/. Yes, if we want to use the standard library. For what integer types is std::is_signed required to be implemented in C++98 (long long?)? Consider non-GCC host compilers. Richard. If you want to handle non-builtin types that are asigne dor unsigned, then you need to add a specialization for is_signed. -- Lawrence Crowl
Re: [RFA][PATCH] Improve VRP of COND_EXPR_CONDs -- v2
On Tue, Apr 9, 2013 at 3:54 AM, Jeff Law l...@redhat.com wrote: This incorporates the concrete suggestions from Steven Richi -- it doesn't do any refactoring of the VRP code. There's still stuff I'm looking at that might directly lead to some refactoring. In the mean time I'm submitting the obvious small improvements. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for trunk? Ok. Thanks, Richard. Jeff commit d6d1e36561b9022bbcdf157a886895f5bb0ef2ae Author: Jeff Law l...@redhat.com Date: Sat Apr 6 06:46:58 2013 -0600 * tree-vrp.c (simplify_cond_using_ranges): Simplify test of boolean when the boolean was created by converting a wider object which had a boolean range. * gcc.dg/tree-ssa/vrp87.c: New test diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6ee7d9c..110f61e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -16,8 +16,14 @@ 2013-04-08 Jeff Law l...@redhat.com + * tree-vrp.c (simplify_cond_using_ranges): Simplify test of boolean + when the boolean was created by converting a wider object which + had a boolean range. + +2013-04-08 Jeff Law l...@redhat.com + * gimple.c (canonicalize_cond_expr_cond): Rewrite x ^ y into x != y. - + 2013-04-08 Richard Biener rguent...@suse.de * gimple-pretty-print.c (debug_gimple_stmt): Do not print diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c new file mode 100644 index 000..7feff81 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp87.c @@ -0,0 +1,81 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-vrp2-details -fdump-tree-cddce2-details } */ + +struct bitmap_head_def; +typedef struct bitmap_head_def *bitmap; +typedef const struct bitmap_head_def *const_bitmap; + + +typedef unsigned long BITMAP_WORD; +typedef struct bitmap_element_def +{ + struct bitmap_element_def *next; + unsigned int indx; + BITMAP_WORD bits[((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u))]; +} bitmap_element; + + + + + + +typedef struct bitmap_head_def +{ + bitmap_element *first; + +} bitmap_head; + + + +static __inline__ unsigned char +bitmap_elt_ior (bitmap dst, bitmap_element * dst_elt, + bitmap_element * dst_prev, const bitmap_element * a_elt, + const bitmap_element * b_elt, unsigned char changed) +{ + + if (a_elt) +{ + + if (!changed dst_elt) + { + changed = 1; + } +} + else +{ + changed = 1; +} + return changed; +} + +unsigned char +bitmap_ior_into (bitmap a, const_bitmap b) +{ + bitmap_element *a_elt = a-first; + const bitmap_element *b_elt = b-first; + bitmap_element *a_prev = ((void *) 0); + unsigned char changed = 0; + + while (b_elt) +{ + + if (!a_elt || a_elt-indx == b_elt-indx) + changed = bitmap_elt_ior (a, a_elt, a_prev, a_elt, b_elt, changed); + else if (a_elt-indx b_elt-indx) + changed = 1; + b_elt = b_elt-next; + + +} + + return changed; +} + +/* Verify that VRP simplified an if statement. */ +/* { dg-final { scan-tree-dump Folded into: if.* vrp2} } */ +/* Verify that DCE after VRP2 eliminates a dead conversion + to a (Bool). */ +/* { dg-final { scan-tree-dump Deleting.*_Bool.*; cddce2} } */ +/* { dg-final { cleanup-tree-dump vrp2 } } */ +/* { dg-final { cleanup-tree-dump cddce2 } } */ + diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 250a506..4520c89 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -8584,6 +8584,45 @@ simplify_cond_using_ranges (gimple stmt) } } + /* If we have a comparison of a SSA_NAME boolean against + a constant (which obviously must be [0..1]), see if the + SSA_NAME was set by a type conversion where the source + of the conversion is another SSA_NAME with a range [0..1]. + + If so, we can replace the SSA_NAME in the comparison with + the RHS of the conversion. This will often make the type + conversion dead code which DCE will clean up. */ + if (TREE_CODE (op0) == SSA_NAME + (TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE + || (INTEGRAL_TYPE_P (TREE_TYPE (op)) + TYPE_PRECISION (TREE_TYPE (op0)) == 1)) + TREE_CODE (op1) == INTEGER_CST) +{ + gimple def_stmt = SSA_NAME_DEF_STMT (op0); + tree innerop; + + if (!is_gimple_assign (def_stmt) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) + return false; + + innerop = gimple_assign_rhs1 (def_stmt); + + if (TREE_CODE (innerop) == SSA_NAME) + { + value_range_t *vr = get_value_range (innerop); + + if (range_int_cst_p (vr) + operand_equal_p (vr-min, integer_zero_node, 0) + operand_equal_p (vr-max, integer_one_node, 0)) + { + tree newconst = fold_convert
Re: [PATCH v3]IPA: fixing inline fail report caused by overwritable functions.
On Tue, Apr 9, 2013 at 5:40 AM, Zhouyi Zhou zhouzho...@gmail.com wrote: On Mon, Apr 8, 2013 at 5:48 PM, Richard Biener richard.guent...@gmail.com wrote: Can you trigger this message to show up with -Winline before/after the patch? Can you please add a testcase then? Thanks Richard for reviewing, from my point of view about gcc and my invoking of gcc, -Winline only works on callees that be declared inline, but if the callee is declared inline, it will be AVAIL_AVAILABLE in function can_inline_edge_p, thus out of the range of my patch. Ah, indeed ... So I only add a testcase for fixing the tree dump, are there any thing more I can do? No. Patch is ok. Thanks, Richard. Regtested/bootstrapped on x86_64-linux ChangeLog: 2013-04-08 Zhouyi Zhou yizhouz...@ict.ac.cn * cif-code.def (OVERWRITABLE): correct the comment for overwritable function * ipa-inline.c (can_inline_edge_p): let dump mechanism report the inline fail caused by overwritable functions. * gcc.dg/tree-ssa/inline-11.c: New test Index: gcc/cif-code.def === --- gcc/cif-code.def(revision 197549) +++ gcc/cif-code.def(working copy) @@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE, /* Function is not inlinable. */ DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_(function not inlinable)) -/* Function is not overwritable. */ +/* Function is overwritable. */ DEFCIFCODE(OVERWRITABLE, N_(function body can be overwritten at link time)) /* Function is not an inlining candidate. */ Index: gcc/testsuite/gcc.dg/tree-ssa/inline-11.c === --- gcc/testsuite/gcc.dg/tree-ssa/inline-11.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/inline-11.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-einline } */ +int w; +int bar (void) __attribute__ ((weak)); +int bar (){ + w++; +} +void foo() +{ + bar(); +} +/* { dg-final { scan-tree-dump-times function body can be overwritten at link time 1 einline } } */ +/* { dg-final { cleanup-tree-dump einline } } */ Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c(revision 197549) +++ gcc/ipa-inline.c(working copy) @@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e else if (avail = AVAIL_OVERWRITABLE) { e-inline_failed = CIF_OVERWRITABLE; - return false; + inlinable = false; } else if (e-call_stmt_cannot_inline_p) {
Re: [patch][sparc] remove sparc machine_reorg, add machine specifc pass after delayed-branch scheduling
Bootstrapped and tested on sparc64-unknown-linux-gnu. OK for trunk? OK modulo a couple of nits: @@ -804,6 +801,136 @@ char sparc_hard_reg_printed[8]; struct gcc_target targetm = TARGET_INITIALIZER; +/* We use the machine specific reorg pass to enable workarounds for errata. + We need to have the (essentially) final form of the insn stream in order + to properly detect the various hazards. Therefore, this machine specific + pass runs as late as possible. The pass is inserted in the pass pipeline + at the end of sparc_options_override(). */ We use a machine specific pass to enable... No () for functions in comments. -- Eric Botcazou
Re: [PATCH, trivial] Make type_hash_lookup, type_hash_add static
On Tue, Apr 9, 2013 at 10:20 AM, Maxim Kuvyrkov ma...@kugelworks.com wrote: This patch makes type_hash_lookup and type_hash_add in tree.c static. These functions are used only within tree.c, and I don't think there is any reason to export low-level interface to type hashing anyway. Will commit in 2 days unless someone complains. Bootstrapped on x86_64-linux-gnu. Ok. Thanks, Richard. Thanks, -- Maxim Kuvyrkov KugelWorks
Re: [i386] Replace builtins with vector extensions
On Tue, 9 Apr 2013, Richard Biener wrote: On Mon, Apr 8, 2013 at 10:47 PM, Marc Glisse marc.gli...@inria.fr wrote: On Sun, 7 Apr 2013, Marc Glisse wrote: extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_slli_epi16 (__m128i __A, int __B) { - return (__m128i)__builtin_ia32_psllwi128 ((__v8hi)__A, __B); + return (__m128i) ((__v8hi)__A __B); } Actually, I believe I have to keep using the builtins for shifts, because the intrinsics have well defined behavior for large __B whereas and don't. I seem to remember discussion in the PR(s) that the intrinsics should (and do for other compilers) expand to the desired instructions even when the corresponding instruction set is disabled. emmintrin.h starts with: #ifndef __SSE2__ # error SSE2 instruction set not enabled The closest thing I can think of is issues with -mfpmath=387, but that shouldn't matter for full vector ops. Using vector extension makes that harder to achieve. Other than that I am all for using the vector extensions, but I think you need carefully wrapped __extension__ markers so that with -std=c89 -pedantic you still can compile programs using the intrinsics? The *intrin.h files already use __extension__ to create vectors, like: return __extension__ (__m128d){ __F, 0.0 }; but even when I remove it it does not warn with -std=c89 -pedantic. -- Marc Glisse
Re: [patch] Hash table changes from cxx-conversion branch
On Mon, Apr 8, 2013 at 11:45 PM, Lawrence Crowl cr...@googlers.com wrote: Ping? You didn't commit the ones I already approved? I don't want to go over them again ... Richard. On 3/31/13, Lawrence Crowl cr...@googlers.com wrote: On 3/28/13, Richard Biener richard.guent...@gmail.com wrote: On Mar 27, 2013 Lawrence Crowl cr...@googlers.com wrote: On 3/27/13, Richard Biener richard.guent...@gmail.com wrote: On Mar 23, 2013 Lawrence Crowl cr...@googlers.com wrote: This patch is a consolodation of the hash_table patches to the cxx-conversion branch. Update various hash tables from htab_t to hash_table. Modify types and calls to match. Ugh. Can you split it up somewhat ... like split target bits away at least? Targets may prefer to keep the old hashes for ease of branch maintainance. I will do that. * tree-ssa-live.c'var_map_base_init::tree_to_index New struct tree_int_map_hasher. I think this wants to be generalized - we have the common tree_map/tree_decl_map and tree_int_map maps in tree.h - those (and its users) should be tackled in a separate patch by providing common hashtable trails implementations. I will investigate for a separate patch. Remove unused: htab_t scop::original_pddrs SCOP_ORIGINAL_PDDRS Remove unused: insert_loop_close_phis insert_guard_phis debug_ivtype_map ivtype_map_elt_info new_ivtype_map_elt Unused function/type removal are obvious changes. Remove unused: dse.c bitmap clear_alias_sets dse.c bitmap disqualified_clear_alias_sets dse.c alloc_pool clear_alias_mode_pool dse.c dse_step2_spill dse.c dse_step5_spill graphds.h htab_t graph::indices See above. It wasn't obvious that the functions could be removed. :-) Are you saying you don't want these notations in the description? No, I was saying that removal of unused functions / types should be committed separately and do not need approval as they are obvious. If they are not obvious (I didn't look at that patch part), then posting separately still helps ;) I've split out the removals to separate patches. The remaining work is in two independent pieces. The changes within the config directory and the changes outside that directory. The descriptions and patch are attached compressed due to mailer size issues. Okay for trunk? -- Lawrence Crowl -- Lawrence Crowl
Re: [i386] Replace builtins with vector extensions
On Tue, Apr 09, 2013 at 11:08:38AM +0200, Marc Glisse wrote: The *intrin.h files already use __extension__ to create vectors, like: return __extension__ (__m128d){ __F, 0.0 }; but even when I remove it it does not warn with -std=c89 -pedantic. Even with -Wsystem-headers ? Jakub
Re: [i386] Replace builtins with vector extensions
On Tue, 9 Apr 2013, Jakub Jelinek wrote: On Tue, Apr 09, 2013 at 11:08:38AM +0200, Marc Glisse wrote: The *intrin.h files already use __extension__ to create vectors, like: return __extension__ (__m128d){ __F, 0.0 }; but even when I remove it it does not warn with -std=c89 -pedantic. Even with -Wsystem-headers ? Oups ;-) Ok, removing the existing __extension__ causes warnings (note that it can easily be worked around by initializing a variable instead of this compound literal, so it isn't vectors that pedantic complains about), but my changes do not warn. -- Marc Glisse
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/09/2013 01:47 AM, Robert Dewar wrote: Well the back end has all the information to figure this out I think! But anyway, for Ada, the current situation is just fine, and has the advantage that the -gnatG expanded code listing clearly shows in Ada source form, what is going on. Isn't this a bit optimistic, considering that run-time overflow checking currently does not use existing hardware support? -- Florian Weimer / Red Hat Product Security Team
Re: [PATCH] Fix PR48762
Alternatively, we can bump the minimum of that param, as usual ;) Let's do that and bump it to 1, my understanding is that 0 and 1 are equivalent for this param. -- Eric Botcazou
Re: [Fortran, RFC patch] Document naming and argument passing convention
On Mon, Apr 8, 2013 at 5:38 PM, Tobias Burnus bur...@net-b.de wrote: Dear all, attached is an updated version of the patch, which address the raised issues and some minor problems and omissions I found. OK for the trunk? +For Boolean (@code{LOGICAL}) arguments, please note that GCC expects +only the integer value 0 and 1. If a GNU Fortran @code{LOGICAL} +variable contains another integer value, the result is undefined. +As some other Fortran compilers use @math{-1} for @code{.TRUE.}, +extra care has to be taken -- such as passing the value as +@code{INTEGER}. (The same value restriction also applies to other +front ends of GCC, e.g. to GCC's C99 compiler for @code{_Bool} +or GCC's Ada compiler for @code{Complex}.) Presumably you meant Ada's @code{Bool} (or whatever the Ada boolean type is called)? Ok with that change. -- Janne Blomqvist
Re: [Fortran, RFC patch] Document naming and argument passing convention
Janne Blomqvist wrote: +For Boolean (@code{LOGICAL}) arguments, please note that GCC expects +only the integer value 0 and 1. If a GNU Fortran @code{LOGICAL} +variable contains another integer value, the result is undefined. +As some other Fortran compilers use @math{-1} for @code{.TRUE.}, +extra care has to be taken -- such as passing the value as +@code{INTEGER}. (The same value restriction also applies to other +front ends of GCC, e.g. to GCC's C99 compiler for @code{_Bool} +or GCC's Ada compiler for @code{Complex}.) Presumably you meant Ada's @code{Bool} (or whatever the Ada boolean type is called)? Ok with that change. Boolean. Committed as Rev. 197624. Thanks for the review! Tobias
Re: [patch libgcc]: Adjust cygming-crtbegin code to use weak
2013/4/9 Dave Korn dave.korn.cyg...@gmail.com: On 22/03/2013 08:44, Kai Tietz wrote: 2013-03-22 Kai Tietz kti...@redhat.com * config/i386/cygming-crtbegin.c (__register_frame_info): Make weak. (__deregister_frame_info): Likewise. Hi Kai, I read your explanation of the problem relating to x86-64 memory models over on the Cygwin dev list, and that explained your motivation for making this change; I see why it's not easy to get an *ABS* 0 reference there. So, providing dummy versions of the functions makes perfect sense to me, and certainly won't cause problems for i686. (I did a lot of testing, and the only problem I found is that a weak definition has to be provided on the linker command line *after* the file that contains the weak-with-zero-default definition if it is to override that; in the case here however we're going to be overriding the weak-with-default by a strong function declaration, so that issue does not arise.) I still have a comment or two about the patch itself: Index: libgcc/config/i386/cygming-crtbegin.c === --- libgcc/config/i386/cygming-crtbegin.c (Revision 196898) +++ libgcc/config/i386/cygming-crtbegin.c (Arbeitskopie) @@ -46,15 +46,33 @@ see the files COPYING3 and COPYING.RUNTIME respect #define LIBGCJ_SONAME libgcj_s.dll #endif - +#if DWARF2_UNWIND_INFO /* Make the declarations weak. This is critical for _Jv_RegisterClasses because it lives in libgcj.a */ -extern void __register_frame_info (const void *, struct object *) +extern void __register_frame_info (__attribute__((unused)) const void *, +__attribute__((unused)) struct object *) TARGET_ATTRIBUTE_WEAK; -extern void *__deregister_frame_info (const void *) +extern void *__deregister_frame_info (__attribute__((unused)) const void *) TARGET_ATTRIBUTE_WEAK; -extern void _Jv_RegisterClasses (const void *) TARGET_ATTRIBUTE_WEAK; +TARGET_ATTRIBUTE_WEAK void +__register_frame_info (__attribute__((unused)) const void *p, +__attribute__((unused)) struct object *o) +{} Braces should go on separate lines I think. +TARGET_ATTRIBUTE_WEAK void * +__deregister_frame_info (__attribute__((unused)) const void *p) +{ return (void*) 0; } Certainly here. +#endif /* DWARF2_UNWIND_INFO */ + +#if TARGET_USE_JCR_SECTION +extern void _Jv_RegisterClasses (__attribute__((unused)) const void *) + TARGET_ATTRIBUTE_WEAK; + +TARGET_ATTRIBUTE_WEAK void +_Jv_RegisterClasses (__attribute__((unused)) const void *p) +{} +#endif /* TARGET_USE_JCR_SECTION */ + #if defined(HAVE_LD_RO_RW_SECTION_MIXING) # define EH_FRAME_SECTION_CONST const #else Also, now that you've provided a default weak definition of the functions in the file itself, it's no longer possible for the function pointer variables (register_frame_fn, register_class_fn, deregister_frame_fn) to be zero, so you should remove the if () tests on them and just call them unconditionally. Hmm, well in standard-case you are right. But well there is still a chance that GetProcAddress returns NULL-pointer ... I think we should keep the if-check. cheers, DaveK Kai
[PATCH] Random cleanups
from my local tree. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2013-04-09 Richard Biener rguent...@suse.de java/ * expr.c (build_java_binop): Pass a type to build_int_cst. * tree-ssa-loop-manip.c (rewrite_into_loop_closed_ssa): Avoid work that is not necessary. Index: gcc/java/expr.c === --- gcc/java/expr.c (revision 197621) +++ gcc/java/expr.c (working copy) @@ -1531,7 +1531,7 @@ build_java_binop (enum tree_code op, tre } case LSHIFT_EXPR: case RSHIFT_EXPR: - mask = build_int_cst (NULL_TREE, + mask = build_int_cst (int_type_node, TYPE_PRECISION (TREE_TYPE (arg1)) - 1); arg2 = fold_build2 (BIT_AND_EXPR, int_type_node, arg2, mask); break; Index: gcc/tree-ssa-loop-manip.c === --- gcc/tree-ssa-loop-manip.c (revision 197621) +++ gcc/tree-ssa-loop-manip.c (working copy) @@ -489,7 +489,6 @@ find_uses_to_rename (bitmap changed_bbs, void rewrite_into_loop_closed_ssa (bitmap changed_bbs, unsigned update_flag) { - bitmap *loop_exits; bitmap *use_blocks; bitmap names_to_rename; @@ -505,11 +504,6 @@ rewrite_into_loop_closed_ssa (bitmap cha names_to_rename = BITMAP_ALLOC (loop_renamer_obstack); - /* An array of bitmaps where LOOP_EXITS[I] is the set of basic blocks - that are the destination of an edge exiting loop number I. */ - loop_exits = XNEWVEC (bitmap, number_of_loops ()); - get_loops_exits (loop_exits); - /* Uses of names to rename. We don't have to initialize this array, because we know that we will only have entries for the SSA names in NAMES_TO_RENAME. */ @@ -518,17 +512,26 @@ rewrite_into_loop_closed_ssa (bitmap cha /* Find the uses outside loops. */ find_uses_to_rename (changed_bbs, use_blocks, names_to_rename); - /* Add the PHI nodes on exits of the loops for the names we need to - rewrite. */ - add_exit_phis (names_to_rename, use_blocks, loop_exits); + if (!bitmap_empty_p (names_to_rename)) +{ + /* An array of bitmaps where LOOP_EXITS[I] is the set of basic blocks +that are the destination of an edge exiting loop number I. */ + bitmap *loop_exits = XNEWVEC (bitmap, number_of_loops ()); + get_loops_exits (loop_exits); + + /* Add the PHI nodes on exits of the loops for the names we need to +rewrite. */ + add_exit_phis (names_to_rename, use_blocks, loop_exits); + + free (loop_exits); + + /* Fix up all the names found to be used outside their original +loops. */ + update_ssa (TODO_update_ssa); +} bitmap_obstack_release (loop_renamer_obstack); free (use_blocks); - free (loop_exits); - - /* Fix up all the names found to be used outside their original - loops. */ - update_ssa (TODO_update_ssa); } /* Check invariants of the loop closed ssa form for the USE in BB. */
[patch] print SEQUENCE of insns in sched-vis.c
Hello, sched-vis.c would print insns in a SEQUENCE as a pattern, in effect printing only GET_CODE(insn). Fixed with this patch. Bootstrappedtested on powerpc64-unknown-linux-gnu and sparc64-unknown-linux-gnu with some dump inspections to make sure everything looks as expected now. OK for trunk? Ciao! Steven * sched-vis.c (print_pattern): Print SEQUENCE of insns as insns. Index: sched-vis.c === --- sched-vis.c (revision 197610) +++ sched-vis.c (working copy) @@ -51,6 +51,9 @@ along with GCC; see the file COPYING3. If not see static bool rtl_slim_pp_initialized = false; static pretty_printer rtl_slim_pp; +/* For insns we print patterns, and for some patterns we print insns... */ +static void print_insn_with_notes (pretty_printer *, const_rtx); + /* This recognizes rtx'en classified as expressions. These are always represent some action on values or results of other expression, that may be stored in objects representing values. */ @@ -562,14 +565,32 @@ print_pattern (pretty_printer *pp, const_rtx x, in break; case SEQUENCE: { - int i; - pp_string (pp, sequence{); - for (i = 0; i XVECLEN (x, 0); i++) + if (INSN_P (XVECEXP (x, 0, 0))) { - print_pattern (pp, XVECEXP (x, 0, i), verbose); - pp_character (pp, ';'); + /* Print the sequence insns indented. */ + const char * save_print_rtx_head = print_rtx_head; + char indented_print_rtx_head[32]; + + pp_newline (pp); + gcc_assert (strlen (print_rtx_head) sizeof (indented_print_rtx_head) - 4); + snprintf (indented_print_rtx_head, + sizeof (indented_print_rtx_head), + %s , print_rtx_head); + print_rtx_head = indented_print_rtx_head; + for (int i = 0; i XVECLEN (x, 0); i++) + print_insn_with_notes (pp, XVECEXP (x, 0, i)); + pp_printf (pp, %s , save_print_rtx_head); + print_rtx_head = save_print_rtx_head; } + else + { + for (int i = 0; i XVECLEN (x, 0); i++) + { + print_pattern (pp, XVECEXP (x, 0, i), verbose); + pp_character (pp, ';'); + } + } pp_character (pp, '}'); } break;
Re: [patch] print SEQUENCE of insns in sched-vis.c
Bootstrappedtested on powerpc64-unknown-linux-gnu and sparc64-unknown-linux-gnu with some dump inspections to make sure everything looks as expected now. OK for trunk? Yes, thanks. -- Eric Botcazou
Re: [PATCH] Fix PR48762
On Tue, Apr 09, 2013 at 11:45:04AM +0200, Eric Botcazou wrote: Alternatively, we can bump the minimum of that param, as usual ;) Let's do that and bump it to 1, my understanding is that 0 and 1 are equivalent for this param. Alright. So ok to apply this one (trunk/4.8)? 2013-04-09 Marek Polacek pola...@redhat.com PR tree-optimization/48762 * params.def (PARAM_MAX_CSE_INSNS): Increase the minimum value to 1. --- gcc/params.def.mp 2013-04-09 12:52:07.838278693 +0200 +++ gcc/params.def 2013-04-09 12:52:28.705349278 +0200 @@ -451,7 +451,7 @@ DEFPARAM(PARAM_MAX_GOTO_DUPLICATION_INSN DEFPARAM(PARAM_MAX_CSE_PATH_LENGTH, max-cse-path-length, The maximum length of path considered in cse, -10, 0, 0) +10, 1, 0) DEFPARAM(PARAM_MAX_CSE_INSNS, max-cse-insns, The maximum instructions CSE process before flushing, Marek
Re: RFA: Fix tree-optimization/55524
Quoting Richard Biener richard.guent...@gmail.com: On Mon, Apr 8, 2013 at 5:10 PM, Joern Rennecke joern.renne...@embecosm.com wrote: This is basically the same patch as attached to the PR, except that I have changed the goto-loop into a do-while loop with a new comment; this caused the need for a lot of reformatting. Can you please include a testcase that shows the effect of this? Attached. FWIW, the assembler is simpler to understand if you add the compilation option -mfp-mode=round-nearest. Without the patch, you then get: _f: fmul r1,r1,r3 mov r3, %low(#-2147483648) movt r3, %high(#-2147483648) eor r1,r3,r1 fmadd r1,r0,r2 mov r0,r1 rts And with the patch: _f: fmul r0,r0,r2 fmsub r0,r1,r3 rts 2013-04-08 Joern Rennecke joern.renne...@embecosm.com * tree-ssa-math-opts.c (mult_to_fma_pass): New file static struct. (convert_mult_to_fma): In first pass, don't use an fms construct when we don't have an fms operation, but fmna. it's fnma I believe. Oops, yes, typo. The patch itself has the right spelling. 2013-03-09 Joern Rennecke joern.renne...@embecosm.com * gcc.target/epiphany/fnma-1.c: New test. Index: gcc.target/epiphany/fnma-1.c === --- gcc.target/epiphany/fnma-1.c(revision 0) +++ gcc.target/epiphany/fnma-1.c(working copy) @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-final { scan-assembler-times fmsub\[ \ta-zA-Z0-9\]*, 1 } } */ + +float +f (float ar, float ai, float br, float bi) +{ + return ar * br - ai * bi; +}
[PATCH] More vectorizer TLC
This gets rid of slp_void_p and moves vect_get_place_in_interleaving_chain next to its only user. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2013-04-09 Richard Biener rguent...@suse.de * tree-vectorizer.h (slp_void_p): Remove. (slp_tree): Typedef before _slp_tree declaration. (struct _slp_tree): Use a vector of slp_tree as children. (vect_get_place_in_interleaving_chain): Remove. * tree-vect-data-refs.c (vect_get_place_in_interleaving_chain): Move ... * tree-vect-slp.c (vect_get_place_in_interleaving_chain): ... here and make static. (vect_free_slp_tree, vect_print_slp_tree, vect_mark_slp_stmts, vect_mark_slp_stmts_relevant, vect_slp_rearrange_stmts, vect_detect_hybrid_slp_stmts, vect_slp_analyze_node_operations, vect_schedule_slp_instance, vect_remove_slp_scalar_calls): Use slp_node instead of slp_void_p and adjust. Index: gcc/tree-vect-data-refs.c === *** gcc/tree-vect-data-refs.c (revision 197621) --- gcc/tree-vect-data-refs.c (working copy) *** vect_get_smallest_scalar_type (gimple st *** 129,159 } - /* Find the place of the data-ref in STMT in the interleaving chain that starts -from FIRST_STMT. Return -1 if the data-ref is not a part of the chain. */ - - int - vect_get_place_in_interleaving_chain (gimple stmt, gimple first_stmt) - { - gimple next_stmt = first_stmt; - int result = 0; - - if (first_stmt != GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) - return -1; - - while (next_stmt next_stmt != stmt) - { - result++; - next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt)); - } - - if (next_stmt) - return result; - else - return -1; - } - - /* Check if data references pointed by DR_I and DR_J are same or belong to same interleaving group. Return FALSE if drs are different, otherwise return TRUE. */ --- 129,134 Index: gcc/tree-vect-slp.c === *** gcc/tree-vect-slp.c (revision 197621) --- gcc/tree-vect-slp.c (working copy) *** static void *** 67,79 vect_free_slp_tree (slp_tree node) { int i; ! slp_void_p child; if (!node) return; FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! vect_free_slp_tree ((slp_tree) child); SLP_TREE_CHILDREN (node).release (); SLP_TREE_SCALAR_STMTS (node).release (); --- 67,79 vect_free_slp_tree (slp_tree node) { int i; ! slp_tree child; if (!node) return; FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! vect_free_slp_tree (child); SLP_TREE_CHILDREN (node).release (); SLP_TREE_SCALAR_STMTS (node).release (); *** vect_free_oprnd_info (vecslp_oprnd_info *** 168,173 --- 168,198 } + /* Find the place of the data-ref in STMT in the interleaving chain that starts +from FIRST_STMT. Return -1 if the data-ref is not a part of the chain. */ + + static int + vect_get_place_in_interleaving_chain (gimple stmt, gimple first_stmt) + { + gimple next_stmt = first_stmt; + int result = 0; + + if (first_stmt != GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) + return -1; + + do + { + if (next_stmt == stmt) + return result; + result++; + next_stmt = GROUP_NEXT_ELEMENT (vinfo_for_stmt (next_stmt)); + } + while (next_stmt); + + return -1; + } + + /* Get the defs for the rhs of STMT (collect them in OPRNDS_INFO), check that they are of a valid type and that they match the defs of the first stmt of the SLP group (stored in OPRNDS_INFO). */ *** vect_print_slp_tree (int dump_kind, slp_ *** 991,997 { int i; gimple stmt; ! slp_void_p child; if (!node) return; --- 1016,1022 { int i; gimple stmt; ! slp_tree child; if (!node) return; *** vect_print_slp_tree (int dump_kind, slp_ *** 1005,1011 dump_printf (dump_kind, \n); FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! vect_print_slp_tree (dump_kind, (slp_tree) child); } --- 1030,1036 dump_printf (dump_kind, \n); FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! vect_print_slp_tree (dump_kind, child); } *** vect_mark_slp_stmts (slp_tree node, enum *** 1019,1025 { int i; gimple stmt; ! slp_void_p child; if (!node) return; --- 1044,1050 { int i; gimple stmt; ! slp_tree child; if (!node) return; *** vect_mark_slp_stmts (slp_tree node, enum *** 1029,1035 STMT_SLP_TYPE (vinfo_for_stmt (stmt)) = mark; FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) ! vect_mark_slp_stmts ((slp_tree) child, mark, j); }
Re: Comments on the suggestion to use infinite precision math for wide int.
On 4/9/2013 5:39 AM, Florian Weimer wrote: On 04/09/2013 01:47 AM, Robert Dewar wrote: Well the back end has all the information to figure this out I think! But anyway, for Ada, the current situation is just fine, and has the advantage that the -gnatG expanded code listing clearly shows in Ada source form, what is going on. Isn't this a bit optimistic, considering that run-time overflow checking currently does not use existing hardware support? Not clear what you mean here, we don't rely on the back end for run-time overflow checking. What is over-optimistic here? BTW, existing hardware support can be a dubious thing, you have to be careful to evaluate costs, for instance you don't want to use INTO on modern x86 targets!
Re: [PATCH] Fix PR48762
On 04/09/2013 05:13 AM, Marek Polacek wrote: On Tue, Apr 09, 2013 at 11:45:04AM +0200, Eric Botcazou wrote: Alternatively, we can bump the minimum of that param, as usual ;) Let's do that and bump it to 1, my understanding is that 0 and 1 are equivalent for this param. Alright. So ok to apply this one (trunk/4.8)? 2013-04-09 Marek Polacek pola...@redhat.com PR tree-optimization/48762 * params.def (PARAM_MAX_CSE_INSNS): Increase the minimum value to 1. OK. Jeff
Re: [PATCH][Backport 4.7][ARM] Fix PR 56720
On 05/04/13 16:35, Kyrylo Tkachov wrote: Hi all, This patch is a backport of the fix for PR 56720 where we would ICE on arm-*-* when trying to expand vcond with a floating point unorderd comparison cases. The patch is almost identical to the trunk patch at: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00652.html except that it adds explicit handling of the LTGT case which does not require explicit handling in 4.8 and onwards. PR 56720 itself has been fixed on trunk and 4.8 but is present in 4.7 and 4.6 Ok for 4.7 now or when it reopens? Is this a regression? If so, from what? My feeling is that unless this has fixed a regression, it's sufficiently complex that it's not really appropriate to back-port it to a stable branch. R. Tested arm-none-eabi on qemu. Thanks, Kyrill gcc/ChangeLog 2013-04-05 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/56720 * config/arm/iterators.md (v_cmp_result): New mode attribute. * config/arm/neon.md (vcondmodemode): Handle unordered cases. gcc/testsuite/ChangeLog 2013-04-05 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/56720 * gcc.target/arm/neon-vcond-gt.c: New test. * gcc.target/arm/neon-vcond-ltgt.c: Likewise. * gcc.target/arm/neon-vcond-unordered.c: Likewise.= neon-vcond-4.7.patch
[PATCH] color diagnostics markers
On Mon, Apr 08, 2013 at 03:23:01PM +0200, Jakub Jelinek wrote: Anyway, I've kept the default as never for now. Here is an updated patch with %r/%R, fully bootstrapped/regtested now on x86_64-linux and i686-linux, tested on a couple of small testcases with -fdiagnostics-color{,=never,=auto,=always}, without GCC_COLORS in environment, with GCC_COLORS= or with GCC_COLORS='error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01;33:quote=01;34' and both on my default gnome-terminal settings (Linux console colors with white on black) and on default configured xterm (i.e. xterm colors, black on white). The default is still -fdiagnostics-color=never, can be changed later on. Ok for trunk? 2013-04-09 Manuel López-Ibáñez m...@gcc.gnu.org Jakub Jelinek ja...@redhat.com * opts.c: Include diagnostic-color.h. (common_handle_option): Handle OPT_fdiagnostics_color_. * Makefile.in (OBJS-libcommon): Add diagnostic-color.o. (diagnostic.o, opts.o, pretty-print.o): Depend on diagnostic-color.h. (diagnostic-color.o): New. * common.opt (fdiagnostics-color, fdiagnostics-color=): New options. (diagnostic_color_rule): New enum. * dwarf2out.c (gen_producer_string): Don't print -fdiagnostics-color*. * langhooks.c (lhd_print_error_function): Add %r locus and %R around the location string. * diagnostic.def: Add 3rd argument to DEFINE_DIAGNOSTIC_KIND macros, either NULL, or color kind. * diagnostic-color.c: New file. * diagnostic-color.h: New file. * diagnostic-core.h (DEFINE_DIAGNOSTIC_KIND): Adjust macro for 3 arguments. * doc/invoke.texi (-fdiagnostics-color): Document. * pretty-print.h (pp_show_color): Define. (struct pretty_print_info): Add show_color field. * diagnostic.c: Include diagnostic-color.h. (diagnostic_build_prefix): Adjust for 3 argument DEFINE_DIAGNOSTIC_KIND macros. Colorize error:, warning: etc. strings and also the location string. (diagnostic_show_locus): Colorize the caret line. * pretty-print.c: Include diagnostic-color.h. (pp_base_format): Handle %r and %R format specifiers. Colorize strings inside of % % quotes or quoted through q format modifier. c-family/ * c-format.c (gcc_diag_char_table, gcc_tdiag_char_table, gcc_cdiag_char_table, gcc_cxxdiag_char_table): Add %r and %R format specifiers. cp/ * error.c (cp_print_error_function, print_instantiation_partial_context_line, maybe_print_constexpr_context): Colorize locus strings. --- gcc/opts.c.jj 2013-04-09 08:34:52.577866059 +0200 +++ gcc/opts.c 2013-04-09 08:35:02.879844677 +0200 @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3. #include flags.h #include params.h #include diagnostic.h +#include diagnostic-color.h #include opts-diagnostic.h #include insn-attr-common.h #include common/common-target.h @@ -1497,6 +1498,11 @@ common_handle_option (struct gcc_options dc-show_caret = value; break; +case OPT_fdiagnostics_color_: + pp_show_color (dc-printer) + = colorize_init ((diagnostic_color_rule_t) value); + break; + case OPT_fdiagnostics_show_option: dc-show_option_requested = value; break; --- gcc/Makefile.in.jj 2013-04-09 08:34:52.554866101 +0200 +++ gcc/Makefile.in 2013-04-09 08:35:02.881844586 +0200 @@ -1465,7 +1465,7 @@ OBJS = \ # Objects in libcommon.a, potentially used by all host binaries and with # no target dependencies. -OBJS-libcommon = diagnostic.o pretty-print.o intl.o input.o version.o +OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o input.o version.o # Objects in libcommon-target.a, used by drivers and by the core # compiler and containing target-dependent code. @@ -2668,11 +2668,12 @@ fold-const.o : fold-const.c $(CONFIG_H) $(GIMPLE_H) realmpfr.h $(TREE_FLOW_H) diagnostic.o : diagnostic.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ version.h $(DEMANGLE_H) $(INPUT_H) intl.h $(BACKTRACE_H) $(DIAGNOSTIC_H) \ - diagnostic.def + diagnostic.def diagnostic-color.h +diagnostic-color.o : diagnostic-color.c $(CONFIG_H) $(SYSTEM_H) diagnostic-color.h opts.o : opts.c $(OPTS_H) $(OPTIONS_H) $(DIAGNOSTIC_CORE_H) $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(DUMPFILE_H) $(TM_H) \ $(DIAGNOSTIC_H) insn-attr-common.h intl.h $(COMMON_TARGET_H) \ - $(FLAGS_H) $(PARAMS_H) opts-diagnostic.h + $(FLAGS_H) $(PARAMS_H) opts-diagnostic.h diagnostic-color.h opts-global.o : opts-global.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(DIAGNOSTIC_H) $(OPTS_H) $(FLAGS_H) $(GGC_H) $(TREE_H) langhooks.h \ $(TM_H) $(RTL_H) $(DBGCNT_H) debug.h $(LTO_STREAMER_H) output.h \ @@ -3434,7 +3435,8 @@ params.o : params.c $(CONFIG_H) $(SYSTEM $(PARAMS_H) $(DIAGNOSTIC_CORE_H) pointer-set.o: pointer-set.c pointer-set.h $(CONFIG_H) $(SYSTEM_H) hooks.o: hooks.c $(CONFIG_H)
RE: [PATCH][Backport 4.7][ARM] Fix PR 56720
-Original Message- From: Richard Earnshaw Sent: 09 April 2013 14:12 To: Kyrylo Tkachov Cc: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan; 'Jakub Jelinek' Subject: Re: [PATCH][Backport 4.7][ARM] Fix PR 56720 On 05/04/13 16:35, Kyrylo Tkachov wrote: Hi all, This patch is a backport of the fix for PR 56720 where we would ICE on arm-*-* when trying to expand vcond with a floating point unorderd comparison cases. The patch is almost identical to the trunk patch at: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00652.html except that it adds explicit handling of the LTGT case which does not require explicit handling in 4.8 and onwards. PR 56720 itself has been fixed on trunk and 4.8 but is present in 4.7 and 4.6 Ok for 4.7 now or when it reopens? Is this a regression? If so, from what? I've been able to reproduce it in 4.6 and onwards but have not tried any earlier releases. My feeling is that unless this has fixed a regression, it's sufficiently complex that it's not really appropriate to back-port it to a stable branch. Fair enough, I don't think there's any urgency to get this fixed on earlier releases. We can drop this backport then. Thanks, Kyrill R. Tested arm-none-eabi on qemu. Thanks, Kyrill gcc/ChangeLog 2013-04-05 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/56720 * config/arm/iterators.md (v_cmp_result): New mode attribute. * config/arm/neon.md (vcondmodemode): Handle unordered cases. gcc/testsuite/ChangeLog 2013-04-05 Kyrylo Tkachov kyrylo.tkac...@arm.com PR target/56720 * gcc.target/arm/neon-vcond-gt.c: New test. * gcc.target/arm/neon-vcond-ltgt.c: Likewise. * gcc.target/arm/neon-vcond-unordered.c: Likewise.= neon-vcond-4.7.patch
[PATCH] Fix omp loop ICEs if iterators are addressable
Hi! The omp for loop iterator can be addressable e.g. because of shared use of it in a 2x nested parallel (in which case we can't use copy-in/out). The following patch tweaks expand_omp_for* to allow the iterators to be addressable. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. Will backport to 4.8 soon (and to 4.7 after 4.7.3 is released). 2013-04-09 Jakub Jelinek ja...@redhat.com PR middle-end/56883 * omp-low.c (expand_omp_for_generic, expand_omp_for_static_nochunk, expand_omp_for_static_chunk): Use simple_p = true in force_gimple_operand_gsi calls when assigning to addressable decls. * c-c++-common/gomp/pr56883.c: New test. --- gcc/omp-low.c.jj2013-02-07 08:59:48.0 +0100 +++ gcc/omp-low.c 2013-04-09 11:09:42.197181925 +0200 @@ -3920,8 +3920,10 @@ expand_omp_for_generic (struct omp_regio if (POINTER_TYPE_P (type)) t = fold_convert (signed_type_for (type), t); t = fold_convert (type, t); - t = force_gimple_operand_gsi (gsi, t, false, NULL_TREE, - false, GSI_CONTINUE_LINKING); + t = force_gimple_operand_gsi (gsi, t, + DECL_P (fd-loop.v) +TREE_ADDRESSABLE (fd-loop.v), + NULL_TREE, false, GSI_CONTINUE_LINKING); stmt = gimple_build_assign (fd-loop.v, t); gsi_insert_after (gsi, stmt, GSI_CONTINUE_LINKING); @@ -3952,8 +3954,11 @@ expand_omp_for_generic (struct omp_regio t = fold_build_pointer_plus (fd-loops[i].n1, t); else t = fold_build2 (PLUS_EXPR, itype, fd-loops[i].n1, t); - t = force_gimple_operand_gsi (gsi, t, false, NULL_TREE, - false, GSI_CONTINUE_LINKING); + t = force_gimple_operand_gsi (gsi, t, + DECL_P (fd-loops[i].v) +TREE_ADDRESSABLE (fd-loops[i].v), + NULL_TREE, false, + GSI_CONTINUE_LINKING); stmt = gimple_build_assign (fd-loops[i].v, t); gsi_insert_after (gsi, stmt, GSI_CONTINUE_LINKING); if (i != 0) @@ -3981,12 +3986,15 @@ expand_omp_for_generic (struct omp_regio t = fold_build_pointer_plus (vmain, fd-loop.step); else t = fold_build2 (PLUS_EXPR, type, vmain, fd-loop.step); - t = force_gimple_operand_gsi (gsi, t, false, NULL_TREE, - true, GSI_SAME_STMT); + t = force_gimple_operand_gsi (gsi, t, + DECL_P (vback) TREE_ADDRESSABLE (vback), + NULL_TREE, true, GSI_SAME_STMT); stmt = gimple_build_assign (vback, t); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); - t = build2 (fd-loop.cond_code, boolean_type_node, vback, iend); + t = build2 (fd-loop.cond_code, boolean_type_node, + DECL_P (vback) TREE_ADDRESSABLE (vback) ? t : vback, + iend); stmt = gimple_build_cond_empty (t); gsi_insert_before (gsi, stmt, GSI_SAME_STMT); @@ -4011,8 +4019,12 @@ expand_omp_for_generic (struct omp_regio e-probability = REG_BR_PROB_BASE / 8; t = fd-loops[i + 1].n1; - t = force_gimple_operand_gsi (gsi, t, false, NULL_TREE, - false, GSI_CONTINUE_LINKING); + t = force_gimple_operand_gsi (gsi, t, + DECL_P (fd-loops[i + 1].v) +TREE_ADDRESSABLE + (fd-loops[i + 1].v), + NULL_TREE, false, + GSI_CONTINUE_LINKING); stmt = gimple_build_assign (fd-loops[i + 1].v, t); gsi_insert_after (gsi, stmt, GSI_CONTINUE_LINKING); } @@ -4026,8 +4038,11 @@ expand_omp_for_generic (struct omp_regio else t = fold_build2 (PLUS_EXPR, vtype, fd-loops[i].v, fd-loops[i].step); - t = force_gimple_operand_gsi (gsi, t, false, NULL_TREE, - false, GSI_CONTINUE_LINKING); + t = force_gimple_operand_gsi (gsi, t, + DECL_P (fd-loops[i].v) +TREE_ADDRESSABLE (fd-loops[i].v), + NULL_TREE, false, + GSI_CONTINUE_LINKING); stmt = gimple_build_assign (fd-loops[i].v, t); gsi_insert_after (gsi, stmt, GSI_CONTINUE_LINKING); @@ -4036,8 +4051,12 @@ expand_omp_for_generic (struct omp_regio t = fd-loops[i].n2; t =
Re: RFA: Fix tree-optimization/55524
On Tue, Apr 9, 2013 at 1:25 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: On Mon, Apr 8, 2013 at 5:10 PM, Joern Rennecke joern.renne...@embecosm.com wrote: This is basically the same patch as attached to the PR, except that I have changed the goto-loop into a do-while loop with a new comment; this caused the need for a lot of reformatting. Can you please include a testcase that shows the effect of this? Attached. FWIW, the assembler is simpler to understand if you add the compilation option -mfp-mode=round-nearest. Without the patch, you then get: _f: fmul r1,r1,r3 mov r3, %low(#-2147483648) movt r3, %high(#-2147483648) eor r1,r3,r1 fmadd r1,r0,r2 mov r0,r1 rts And with the patch: _f: fmul r0,r0,r2 fmsub r0,r1,r3 rts This means the iteration doesn't help here. 2013-04-08 Joern Rennecke joern.renne...@embecosm.com * tree-ssa-math-opts.c (mult_to_fma_pass): New file static struct. (convert_mult_to_fma): In first pass, don't use an fms construct when we don't have an fms operation, but fmna. it's fnma I believe. Oops, yes, typo. The patch itself has the right spelling. 2013-03-09 Joern Rennecke joern.renne...@embecosm.com * gcc.target/epiphany/fnma-1.c: New test. Index: gcc.target/epiphany/fnma-1.c === --- gcc.target/epiphany/fnma-1.c(revision 0) +++ gcc.target/epiphany/fnma-1.c(working copy) @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-final { scan-assembler-times fmsub\[ \ta-zA-Z0-9\]*, 1 } } */ + +float +f (float ar, float ai, float br, float bi) +{ + return ar * br - ai * bi; +}
Re: Comments on the suggestion to use infinite precision math for wide int.
On 04/09/2013 02:41 PM, Robert Dewar wrote: On 4/9/2013 5:39 AM, Florian Weimer wrote: On 04/09/2013 01:47 AM, Robert Dewar wrote: Well the back end has all the information to figure this out I think! But anyway, for Ada, the current situation is just fine, and has the advantage that the -gnatG expanded code listing clearly shows in Ada source form, what is going on. Isn't this a bit optimistic, considering that run-time overflow checking currently does not use existing hardware support? Not clear what you mean here, we don't rely on the back end for run-time overflow checking. What is over-optimistic here? BTW, existing hardware support can be a dubious thing, you have to be careful to evaluate costs, for instance you don't want to use INTO on modern x86 targets! It's not aobut INTO, just access to the overflow flag. A simple function which adds its two Integer arguments compiles to this machine code (with -gnato -O2): .cfi_startproc movslq %esi, %rax movslq %edi, %rdx addq%rax, %rdx movl$2147483648, %eax addq%rax, %rdx movl$4294967295, %eax cmpq%rax, %rdx ja .L5 leal(%rsi,%rdi), %eax ret .L5: pushq %rax .cfi_def_cfa_offset 16 movl$3, %esi movl$.LC0, %edi xorl%eax, %eax call__gnat_rcheck_10 .cfi_endproc While it could be like this: .cfi_startproc movl%edi, %eax addl%esi, %eax jo .L5 ret .L5: pushq %rax .cfi_def_cfa_offset 16 movl$3, %esi movl$.LC0, %edi xorl%eax, %eax call__gnat_rcheck_10 .cfi_endproc Admittedly, I haven't benchmarked it, but at least from the code size aspect, it's a significant improvement. -- Florian Weimer / Red Hat Product Security Team
[PATCH] Vectorizer TLC, split SLP discovery and cost calculation
This splits discovering of a SLP opportunity (vect_build_slp_tree) and calculating the cost of a SLP instance (now vect_analyze_slp_cost). The immediate need for me is to support handling of mismatches in the SLP tree due to commutative operand order mismatch, one more cleanup patch for this reason will follow. In the end cost calculation should be delayed even more, but that's for later. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2013-04-09 Richard Biener rguent...@suse.de * tree-vect-slp.c (vect_get_and_check_slp_defs): Remove code dealing with cost. (vect_build_slp_tree): Likewise. (vect_analyze_slp_cost_1, vect_analyze_slp_cost): New functions calculating the cost of a SLP instance. (vect_analyze_slp_instance): Use it from here, after building the SLP tree. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 197629) +++ gcc/tree-vect-slp.c (working copy) @@ -199,11 +199,8 @@ vect_get_place_in_interleaving_chain (gi static bool vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo, - slp_tree slp_node, gimple stmt, -int ncopies_for_cost, bool first, - vecslp_oprnd_info *oprnds_info, -stmt_vector_for_cost *prologue_cost_vec, -stmt_vector_for_cost *body_cost_vec) + gimple stmt, bool first, + vecslp_oprnd_info *oprnds_info) { tree oprnd; unsigned int i, number_of_oprnds; @@ -211,8 +208,6 @@ vect_get_and_check_slp_defs (loop_vec_in gimple def_stmt; enum vect_def_type dt = vect_uninitialized_def; enum vect_def_type dt_op0 = vect_uninitialized_def; - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - tree lhs = gimple_get_lhs (stmt); struct loop *loop = NULL; enum tree_code rhs_code; bool different_types = false; @@ -344,22 +339,6 @@ vect_get_and_check_slp_defs (loop_vec_in { def_op0 = def; dt_op0 = dt; - /* Analyze costs (for the first stmt of the group only). */ - if (REFERENCE_CLASS_P (lhs)) - /* Store. */ -vect_model_store_cost (stmt_info, ncopies_for_cost, false, - dt, slp_node, prologue_cost_vec, - body_cost_vec); - else - { - enum vect_def_type dts[2]; - dts[0] = dt; - dts[1] = vect_uninitialized_def; - /* Not memory operation (we don't call this function for -loads). */ - vect_model_simple_cost (stmt_info, ncopies_for_cost, dts, - prologue_cost_vec, body_cost_vec); - } } } else @@ -479,13 +458,11 @@ vect_get_and_check_slp_defs (loop_vec_in static bool vect_build_slp_tree (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo, - slp_tree *node, unsigned int group_size, int *outside_cost, - int ncopies_for_cost, unsigned int *max_nunits, + slp_tree *node, unsigned int group_size, + unsigned int *max_nunits, vecint *load_permutation, vecslp_tree *loads, - unsigned int vectorization_factor, bool *loads_permuted, -stmt_vector_for_cost *prologue_cost_vec, -stmt_vector_for_cost *body_cost_vec) + unsigned int vectorization_factor, bool *loads_permuted) { unsigned int i; vecgimple stmts = SLP_TREE_SCALAR_STMTS (*node); @@ -750,11 +727,8 @@ vect_build_slp_tree (loop_vec_info loop_ if (REFERENCE_CLASS_P (lhs)) { /* Store. */ - if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, *node, - stmt, ncopies_for_cost, - (i == 0), oprnds_info, - prologue_cost_vec, - body_cost_vec)) + if (!vect_get_and_check_slp_defs (loop_vinfo, bb_vinfo, + stmt, (i == 0), oprnds_info)) { vect_free_oprnd_info (oprnds_info); return false; @@ -863,11 +837,6 @@ vect_build_slp_tree (loop_vec_info loop_ vect_free_oprnd_info (oprnds_info); return false; } - - /* Analyze costs (for the first stmt in the group). */ - vect_model_load_cost (vinfo_for_stmt (stmt), -ncopies_for_cost,
[PATCH SH] Error: unaligned opcodes detected in executable segment
Hello, This patch fixes label alignments after a ADDR_DIFF_VEC with byte offsets. The bug occurs with building the libgcc for a sh-elf target. See a reduced case here. The funny thing is that the assembly error given in Subject appears only on a debug section when compiled with -g, thus the emitted code without -g: .align 2 .L4: .byte .L3-.L5 .byte .L10-.L5 .byte .L7-.L5 .byte .L8-.L5 .byte .L9-.L5 .L3: mov.l .L13,r1 assembles silently, although unaligned... OK for trunk ? The failure occurs with the libgcc, so tested by allowing the sh-elf build to complete. Thanks, Christian 2013-04-09 Christian Bruel christian.br...@st.com * config/sh/sh.md (barrier_align): Use next/prev_active_insn instead of next/prev_real_insn. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 197633) +++ gcc/config/sh/sh.c (working copy) @@ -5842,7 +5842,7 @@ fixup_addr_diff_vecs (rtx first) int barrier_align (rtx barrier_or_label) { - rtx next = next_real_insn (barrier_or_label), pat, prev; + rtx next = next_active_insn (barrier_or_label), pat, prev; if (! next) return 0; @@ -5856,7 +5856,7 @@ barrier_align (rtx barrier_or_label) /* This is a barrier in front of a constant table. */ return 0; - prev = prev_real_insn (barrier_or_label); + prev = prev_active_insn (barrier_or_label); if (GET_CODE (PATTERN (prev)) == ADDR_DIFF_VEC) { pat = PATTERN (prev); int a; int foo (int sw) { int r=0; switch (sw) { case 1: r=a; break; case 2: r=2; break; case 3: r=3; break; case 4: r=4; break; case 5: r=5; break; } return r; }
Re: RFA: Fix tree-optimization/55524
Quoting Richard Biener richard.guent...@gmail.com: Oops, I missed your final comment when I wrote my first reply. I fail to see why you need two passes for this rather than considering the case that the immediate use stmt of the multiplication we start from combines another multiplication with a MINUS_EXPR. That is ... + if (use_code == MINUS_EXPR !negate_p + gimple_assign_rhs1 (use_stmt) == result + optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing + optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing + mult_to_fma_pass.second_pass == false) see if that is the case here and simply not do the transform. In other words, I would have to take gimple_assign_rh2 of use_stmt, then find the statement that sets that value, check if this is a multiply, and that it is not expanded as a widening multiply (thus duplicating most of the code in execute_optimize_widening_mul), and also check that if there are any other uses of the multiply result, that these are likewise reduced to fma somehow. At which point we hit recursion. This could easily double the size of this source file. Also, we are alternating between forward and backward searches, which could lead to quadratic complexity where we had linear otherwise. A second pass will not recover from that without destroying the fnma pattern (testcase?) The second pass is there to generate the fms where we didn't generate fnma. If fnma (as negate (fma)) has been generated, it will use up the MINUS_EXPR, so that fms cannot be formed anymore. This arrangement is a lot simpler than going for a depth-first decision.
Re: [PATCH] color diagnostics markers
Looks good to me if nobody else has comments. Jason
Re: RFA: Fix tree-optimization/55524
On Tue, Apr 9, 2013 at 4:53 PM, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Richard Biener richard.guent...@gmail.com: Oops, I missed your final comment when I wrote my first reply. I fail to see why you need two passes for this rather than considering the case that the immediate use stmt of the multiplication we start from combines another multiplication with a MINUS_EXPR. That is ... + if (use_code == MINUS_EXPR !negate_p + gimple_assign_rhs1 (use_stmt) == result + optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing + optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing + mult_to_fma_pass.second_pass == false) see if that is the case here and simply not do the transform. In other words, I would have to take gimple_assign_rh2 of use_stmt, then find the statement that sets that value, check if this is a multiply, and that it is not expanded as a widening multiply (thus duplicating most of the code in execute_optimize_widening_mul), and also check that if there are any other uses of the multiply result, that these are likewise reduced to fma somehow. At which point we hit recursion. This could easily double the size of this source file. Also, we are alternating between forward and backward searches, which could lead to quadratic complexity where we had linear otherwise. I don't see that. It's merely a complication of optimal handling of a * b +- c * d vs. just a * b +- c. The pass does simple pattern matching only, not doing a global optimal transform, so adding another special-case is reasonable. Special-casing just for single-use 2nd multiplication simplifies the cases for example. Richard. A second pass will not recover from that without destroying the fnma pattern (testcase?) The second pass is there to generate the fms where we didn't generate fnma. If fnma (as negate (fma)) has been generated, it will use up the MINUS_EXPR, so that fms cannot be formed anymore. This arrangement is a lot simpler than going for a depth-first decision.
Re: [PATCH v3]IPA: fixing inline fail report caused by overwritable functions.
Hi Richard, I do not have write access to GCC SVN repository, can you commit it for me? Thanks alot Cheers On Tue, Apr 9, 2013 at 5:04 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Apr 9, 2013 at 5:40 AM, Zhouyi Zhou zhouzho...@gmail.com wrote: On Mon, Apr 8, 2013 at 5:48 PM, Richard Biener richard.guent...@gmail.com wrote: Can you trigger this message to show up with -Winline before/after the patch? Can you please add a testcase then? Thanks Richard for reviewing, from my point of view about gcc and my invoking of gcc, -Winline only works on callees that be declared inline, but if the callee is declared inline, it will be AVAIL_AVAILABLE in function can_inline_edge_p, thus out of the range of my patch. Ah, indeed ... So I only add a testcase for fixing the tree dump, are there any thing more I can do? No. Patch is ok. Thanks, Richard. Regtested/bootstrapped on x86_64-linux ChangeLog: 2013-04-08 Zhouyi Zhou yizhouz...@ict.ac.cn * cif-code.def (OVERWRITABLE): correct the comment for overwritable function * ipa-inline.c (can_inline_edge_p): let dump mechanism report the inline fail caused by overwritable functions. * gcc.dg/tree-ssa/inline-11.c: New test Index: gcc/cif-code.def === --- gcc/cif-code.def(revision 197549) +++ gcc/cif-code.def(working copy) @@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE, /* Function is not inlinable. */ DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_(function not inlinable)) -/* Function is not overwritable. */ +/* Function is overwritable. */ DEFCIFCODE(OVERWRITABLE, N_(function body can be overwritten at link time)) /* Function is not an inlining candidate. */ Index: gcc/testsuite/gcc.dg/tree-ssa/inline-11.c === --- gcc/testsuite/gcc.dg/tree-ssa/inline-11.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/inline-11.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-einline } */ +int w; +int bar (void) __attribute__ ((weak)); +int bar (){ + w++; +} +void foo() +{ + bar(); +} +/* { dg-final { scan-tree-dump-times function body can be overwritten at link time 1 einline } } */ +/* { dg-final { cleanup-tree-dump einline } } */ Index: gcc/ipa-inline.c === --- gcc/ipa-inline.c(revision 197549) +++ gcc/ipa-inline.c(working copy) @@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e else if (avail = AVAIL_OVERWRITABLE) { e-inline_failed = CIF_OVERWRITABLE; - return false; + inlinable = false; } else if (e-call_stmt_cannot_inline_p) {
Re: useless cast blocking some optimization in gcc 4.7.3
I understand, Thanks for your answer. Looking at the standard, I was thinking the example you pointed was undefined. I have created a bugzilla (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56894) to track this big performance regression. My investigations pointed on the scev_const_prop optimization. The scalar evolution of result variable in the loop cannot be computed because of the cast : type integer_type 0xf7de32a0 char sizes-gimplified public string-flag type_6 QI[...] arg 0 polynomial_chrec 0xf7ecd180 type integer_type 0xf7de33c0 int sizes-gimplified public type_6 SI[...] arg 0 integer_cst 0xf7dcbe70 constant 2 arg 1 integer_cst 0xf7dcba80 constant 0 arg 2 integer_cst 0xf7dcba9c constant 1 I don't know how to fix the case. But I believe it should be looked at, as it is a quite big perf regression in some testcases. please advice. Also, I didn't find the bugzilla that led to this change of the += operation. I would be interested to have a look at it, if you can find it. Thanks Laurent On 09.04.2013 10:50, Richard Biener wrote: On Mon, Apr 8, 2013 at 9:13 PM, Laurent Alfonsi laurent.alfo...@st.com wrote: Hello, I have identified a big performance regression between 4.6 and 4.7. (I have enclosed a pathological test). After investigation, it is because of the += statement applied on 2 signed chars. - It is now type-promoted to int when it is written result += foo(). (since 4.7) - it is type promoted to unsigned char when it is written result = result + foo(). The char-int-char cast is blocking some optimizations in later phases. Anyway, this doesn't look wrong, so I extended fold optimization in order to catch this case. (patch enclosed) The patch basically transforms : (TypeA) ( (TypeB) a1 + (TypeB) a2 )/* with a1 and a2 of the signed type TypeA */ into : a1 + a2 I believe this is legal for any licit a1/a2 input values (no overflow on signed char). No new failure on the two tested targets : sh-superh-elf and x86_64-unknown-linux-gnu. Should I enter a bugzilla to track this ? Is it ok for trunk ? Please open a bugzilla. No, the patch is not ok, as said by Marc. It's possible to shorten the operation by performing it using unsigned arithmetic, that is (signed)((unsigned)a1 + (unsigned)a2). But if really casts cause the issue you are seeing that will not help you. Richard. 2013-04-08 Laurent Alfonsi laurent.alfo...@st.com * fold-const.c (fold_unary_loc): Suppress useless type promotion. Thanks, Laurent .
Re: [PATCH][ARM] minmax_arithsi for non-canonical operand order with MINUS operator
On 02/04/13 17:06, Kyrylo Tkachov wrote: From: Ramana Radhakrishnan [mailto:ramana@googlemail.com] Sent: 02 April 2013 11:10 To: Kyrylo Tkachov Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan Subject: Re: [PATCH][ARM] minmax_arithsi for non-canonical operand order with MINUS operator On Thu, Mar 21, 2013 at 6:09 PM, Kyrylo Tkachov kyrylo.tkac...@arm.com wrote: Hi all, This patch adds a splitter variant of the minmax_arithsi pattern for when the operator is non-commutative (MINUS) and the ordering of the operands is not canonical. That is, it will trigger for: #define MAX(a, b) (a b ? a : b) int foo (int a, int b, int c) { return c - MAX (a,b); } and will generate: cmp r1, r0 rsbge r0, r1, r2 rsblt r0, r0, r2 instead of the current: cmp r0, r1 movlt r0, r1 rsb r0, r0, r2 No regressions on arm-none-eabi. Ok for trunk? Split after reload please into cond-exec or use if_then_else instead if you are splitting before reload - I originally thought it to be safe when you asked me, but then have gone back and corrected myself. Read this thread . http://patches.linaro.org/6469/ . Hi Ramana, thanks for the review. How about this? We split after reload now. Using if_then_else got me a lot of unrecognisable instruction ICEs and delaying the split till after reload seemed like a cleaner approach. Tested arm-none-eabi on qemu. gcc/ 2013-04-02 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.md (minmax_arithsi_non_canon): New pattern. gcc/testsuite 2013-04-02 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/arm/minmax_minus.c: New test. OK. R.
Re: [PATCH,ARM][6/n] Split min and max patterns
On 18/02/13 18:44, Greta Yorsh wrote: Convert define_insn into define_insn_and_split for various min and max patterns that output multiple assembly instructions. Use movsicc to emit RTL. A separate patch will split movsicc. gcc/ 2013-02-14 Greta Yorsh greta.yo...@arm.com * config/arm/arm.md (arm_smax_insn): Convert define_insn into define_insn_and_split. (arm_smin_insn,arm_umaxsi3,arm_uminsi3): Likewise. OK. R.
Re: [PATCH,ARM][3/n] Split various patterns
On 18/02/13 18:38, Greta Yorsh wrote: Convert define_insn into define_insn_and_split for various patterns that output multiple assembly instructions. It appears that preparation statements in define_insn_and_split sometimes are called with which_alternative set to -1 even after reload. Therefore, preparation statements use conditions on the operands instead of which_alternative. gcc/ 2013-02-14 Greta Yorsh greta.yo...@arm.com * config/arm/arm.md (andsi_iorsi3_notsi): Convert define_insn into define_insn_and_split. (arm_negdi2,arm_abssi2,arm_neg_abssi2): Likewise. (arm_cmpdi_insn,arm_cmpdi_unsigned): Likewise. OK. R.
Re: Fill more delay slots in conditional returns
Ah, OK, yea, it makes perfect sense now. Approved. Thanks. If you wanted to get ambitious, rewriting reorg.c to compute and use dependency links to drive its candidate selection instead of the insane tracking code it uses would be a huge step forward, both in terms of cleanliness. It'd also help compile-time performance; we do a lot of work to track resources that would be totally unnecessary. Yes, I agree that it's quite convoluted but, as Steven already said, rewriting it to use a more modern framework is hard because of SEQUENCEs and, frankly, I have neither the real incentive nor the time to start such an endeavor. Instead I can raise the following couple of points that I also ran into with the private port I'm working on: 1. When fill_simple_delay_slots is scanning backwards to find insns to put into slots, it calls update_reg_dead_note which collects REG_DEAD notes and puts them on the insn to be moved. But emit_delay_sequence will later drop them on the floor. I have one case where preserving such a REG_DEAD note is correct and makes a difference (slot filled vs empty). 2. In resource.c, when mark_target_live_regs is doing its liveness analysis, it extracts the insns wrapped in USEs by reorg.c: /* If this insn is a USE made by update_block, we care about the underlying insn. */ if (code == INSN GET_CODE (PATTERN (insn)) == USE INSN_P (XEXP (PATTERN (insn), 0))) real_insn = XEXP (PATTERN (insn), 0); and proceeds with the liveness analysis without further ado. Now, the story is different in find_dead_or_set_registers, which does: if (GET_CODE (PATTERN (insn)) == USE) { /* If INSN is a USE made by update_block, we care about the underlying insn. Any registers set by the underlying insn are live since the insn is being done somewhere else. */ if (INSN_P (XEXP (PATTERN (insn), 0))) mark_set_resources (XEXP (PATTERN (insn), 0), res, 0, MARK_SRC_DEST_CALL); and thus pessimizes the analysis by making registers unnecessary live. Again I have one case where not pessimizing is correct and makes a difference. Experimental patch attached, clean on the C testsuite for the private port. What do you think? -- Eric BotcazouIndex: reorg.c === --- reorg.c (revision 197617) +++ reorg.c (working copy) @@ -549,6 +549,11 @@ emit_delay_sequence (rtx insn, rtx list, remove_note (tem, note); break; + case REG_DEP_TRUE: + /* This was a REG_DEAD note that we want to preserve. */ + PUT_REG_NOTE_KIND (note, REG_DEAD); + break; + case REG_LABEL_OPERAND: case REG_LABEL_TARGET: /* Keep the label reference count up to date. */ @@ -1806,8 +1811,10 @@ update_reg_dead_notes (rtx insn, rtx del if (reg_referenced_p (XEXP (link, 0), PATTERN (insn))) { - /* Move the REG_DEAD note from P to INSN. */ + /* Move the REG_DEAD note from P to INSN but smuggle it so that + emit_delay_sequence doesn't drop it on the floor. */ remove_note (p, link); + PUT_REG_NOTE_KIND (link, REG_DEP_TRUE); XEXP (link, 1) = REG_NOTES (insn); REG_NOTES (insn) = link; } Index: resource.c === --- resource.c (revision 197617) +++ resource.c (working copy) @@ -425,6 +425,7 @@ find_dead_or_set_registers (rtx target, for (insn = target; insn; insn = next) { rtx this_jump_insn = insn; + rtx real_insn = insn; next = NEXT_INSN (insn); @@ -443,7 +444,6 @@ find_dead_or_set_registers (rtx target, AND_COMPL_HARD_REG_SET (pending_dead_regs, needed.regs); AND_COMPL_HARD_REG_SET (res-regs, pending_dead_regs); CLEAR_HARD_REG_SET (pending_dead_regs); - continue; case BARRIER: @@ -454,14 +454,12 @@ find_dead_or_set_registers (rtx target, if (GET_CODE (PATTERN (insn)) == USE) { /* If INSN is a USE made by update_block, we care about the - underlying insn. Any registers set by the underlying insn - are live since the insn is being done somewhere else. */ + underlying insn. */ if (INSN_P (XEXP (PATTERN (insn), 0))) - mark_set_resources (XEXP (PATTERN (insn), 0), res, 0, -MARK_SRC_DEST_CALL); - - /* All other USE insns are to be ignored. */ - continue; + real_insn = XEXP (PATTERN (insn), 0); + else + /* All other USE insns are to be ignored. */ + continue; } else if (GET_CODE (PATTERN (insn)) == CLOBBER) continue; @@ -582,8 +580,8 @@ find_dead_or_set_registers (rtx target, } } - mark_referenced_resources (insn, needed, true); - mark_set_resources (insn, set, 0, MARK_SRC_DEST_CALL); + mark_referenced_resources (real_insn, needed, true); + mark_set_resources
Re: RFA: Fix tree-optimization/55524
Quoting Richard Biener richard.guent...@gmail.com: I don't see that. It's merely a complication of optimal handling of a * b +- c * d vs. just a * b +- c. The pass does simple pattern matching only, not doing a global optimal transform, so adding another special-case is reasonable. Special-casing just for single-use 2nd multiplication simplifies the cases for example. I have attached a version of the patch that uses this simpler test. Currently bootstrapping / regtesting on i686-pc-linux-gnu . gcc: 2013-04-09 Joern Rennecke joern.renne...@embecosm.com PR tree-optimization/55524 * tree-ssa-math-opts.c (convert_mult_to_fma): Don't use an fms construct when we don't have an fms operation, but fnma, and it looks likely that we'll be able to use the latter. gcc/testsuite: 2013-04-09 Joern Rennecke joern.renne...@embecosm.com PR tree-optimization/55524 * gcc.target/epiphany/fnma-1.c: New test. Index: testsuite/gcc.target/epiphany/fnma-1.c === --- testsuite/gcc.target/epiphany/fnma-1.c (revision 0) +++ testsuite/gcc.target/epiphany/fnma-1.c (working copy) @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-final { scan-assembler-times fmsub\[ \ta-zA-Z0-9\]*, 1 } } */ + +float +f (float ar, float ai, float br, float bi) +{ + return ar * br - ai * bi; +} Index: tree-ssa-math-opts.c === --- tree-ssa-math-opts.c(revision 197578) +++ tree-ssa-math-opts.c(working copy) @@ -2570,6 +2570,24 @@ convert_mult_to_fma (gimple mul_stmt, tr return false; } + /* If the subtrahend (gimple_assign_rhs2 (use_stmt)) is computed +by a MULT_EXPR that we'll visit later, we might be able to +get a more profitable match with fnma. +OTOH, if we don't, a negate / fma pair has likely lower latency +that a mult / subtract pair. */ + if (use_code == MINUS_EXPR !negate_p + gimple_assign_rhs1 (use_stmt) == result + optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing + optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) + { + tree rhs2 = gimple_assign_rhs2 (use_stmt); + gimple stmt2 = SSA_NAME_DEF_STMT (rhs2); + + if (has_single_use (rhs2) + gimple_assign_rhs_code (stmt2) == MULT_EXPR) + return false; + } + /* We can't handle a * b + a * b. */ if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false;
Re: Fill more delay slots in conditional returns
On Tue, Apr 9, 2013 at 6:15 PM, Eric Botcazou wrote: If you wanted to get ambitious, rewriting reorg.c to compute and use dependency links to drive its candidate selection instead of the insane tracking code it uses would be a huge step forward, both in terms of cleanliness. It'd also help compile-time performance; we do a lot of work to track resources that would be totally unnecessary. Yes, I agree that it's quite convoluted but, as Steven already said, rewriting it to use a more modern framework is hard because of SEQUENCEs and, frankly, I have neither the real incentive nor the time to start such an endeavor. I've actually picked up the idea again. This is just last week-end's work so don't expect much of it yet ;-) But I'm near the point where I want to see if I can actually make it fill some slots from the containing basic block and pushing it through verify_flow_info somehow. After that, the hard parts: branches and annulling. Ciao! Steven Index: reorg.c === --- reorg.c (revision 197610) +++ reorg.c (working copy) @@ -3847,6 +3847,10 @@ free (uid_to_ruid); crtl-dbr_scheduled_p = true; } + +#include sched-deps-graph.c +#include sched-dbr.c + #endif /* DELAY_SLOTS */ static bool @@ -3865,6 +3869,7 @@ rest_of_handle_delay_slots (void) { #ifdef DELAY_SLOTS + fill_delay_slots (); dbr_schedule (get_insns ()); #endif return 0; #include sched-int.h /* ??? That means dbr_schedule would come to depend on INSN_SCHEDULING?! */ //#ifdef INSN_SCHEDULING /* Bah, sched-deps needs this. That's a bug, but what the heck... */ static struct haifa_sched_info sched_dbr_info = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, NULL, NULL, NULL, NULL, NULL, NULL, 0 }; static struct sched_deps_info_def dbr_sched_deps_info = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0, 0 }; static struct common_sched_info_def dbr_common_sched_info; static void fill_delay_slots_from_bb (basic_block bb) { rtx head, tail; struct deps_desc tmp_deps; /* Compute dependencies. */ get_ebb_head_tail (bb, bb, head, tail); init_deps_global (); init_deps (tmp_deps, false); sched_analyze (tmp_deps, head, tail); free_deps (tmp_deps); finish_deps_global (); /* Print dependency information. */ if (dump_file sched_dump) { fprintf (sched_dump, ;; --- forward dependences: \n); fprintf (sched_dump, \n;; --- DBR Dependences --- for bb%d \n, bb-index); debug_dependencies (head, tail); } print_dependencies_graph (stderr, cfun, bb); rtx insn; FOR_BB_INSNS (bb, insn) { if (! NONDEBUG_INSN_P (insn) || (GET_CODE (PATTERN (insn)) == USE || GET_CODE (PATTERN (insn)) == CLOBBER)) continue; INSN_FROM_TARGET_P (insn) = 0; if (JUMP_P (insn)) INSN_ANNULLED_BRANCH_P (insn) = 0; int slots_to_fill = num_delay_slots (insn); if (slots_to_fill 0) { int flags; if (JUMP_P (insn)) flags = get_jump_flags (insn, JUMP_LABEL (insn)); else flags = get_jump_flags (insn, NULL_RTX); fprintf (stderr, // insn %d has %d delay slots to fill\n, INSN_UID (insn), slots_to_fill); for (rtx cand_insn = PREV_INSN (insn); cand_insn != BB_HEAD (bb); cand_insn = PREV_INSN (cand_insn)) { sd_iterator_def sd_it; dep_t dep; int n_forw_deps = 0; bool cand_feeds_call_insn = false; bool cand_deps_cross_insn = false; /* TRIAL must be a CALL_INSN or INSN. Skip USE and CLOBBER. */ if (! NONDEBUG_INSN_P (cand_insn) || (GET_CODE (PATTERN (cand_insn)) == USE || GET_CODE (PATTERN (cand_insn)) == CLOBBER)) continue; if (! eligible_for_delay (insn, 0, cand_insn, flags)) { fprintf (stderr, //\tinsn %d not eligible for delay slots of insn %d\n, INSN_UID (cand_insn), INSN_UID (insn)); continue; } rtx set = single_set (cand_insn); if (! set || multiple_sets (cand_insn)) continue; FOR_EACH_DEP (cand_insn, SD_LIST_FORW, sd_it, dep) { n_forw_deps++; if (DF_INSN_LUID (insn) DF_INSN_LUID (DEP_CON (dep))) cand_deps_cross_insn = true; else if (DF_INSN_LUID (insn) == DF_INSN_LUID (DEP_CON (dep)) CALL_P (insn) df_reg_used (insn, SET_DEST (set))) cand_feeds_call_insn = true; else // not a deps leaf, not a call arg setter { fprintf (stderr, //\tinsn %d is not suitable, blocked by insn %d\n, INSN_UID (cand_insn), INSN_UID (DEP_CON (dep))); n_forw_deps = -1; break; } } if (n_forw_deps 0) continue; if (n_forw_deps == 0) { fprintf (stderr, //\tinsn %d is an obvious candidate, dependencies leaf\n, INSN_UID (cand_insn)); fprintf (stderr, \tfn_%d_insn_uid_%d:e -
Re: [google gcc-4_7] offline profile merge (patchset 2) (issue8508048)
Ok for google branches. thanks, David On Mon, Apr 8, 2013 at 3:44 PM, Rong Xu x...@google.com wrote: Revised copyright info. -Rong 2013-04-08 Rong Xu x...@google.com * contrib/profile_merge.py: An offline profile merge tool. Index: contrib/profile_merge.py === --- contrib/profile_merge.py(revision 0) +++ contrib/profile_merge.py(revision 0) @@ -0,0 +1,1320 @@ +#!/usr/bin/python2.7 +# +#Copyright (C) 2013 +#Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# + + +Merge two or more gcda profile. + + +__author__ = 'Seongbae Park, Rong Xu' +__author_email__ = 'sp...@google.com, x...@google.com' + +import array +from optparse import OptionGroup +from optparse import OptionParser +import os +import struct +import zipfile + +new_histogram = None + + +class Error(Exception): + Exception class for profile module. + + +def ReadAllAndClose(path): + Return the entire byte content of the specified file. + + Args: +path: The path to the file to be opened and read. + + Returns: +The byte sequence of the content of the file. + + data_file = open(path, 'rb') + data = data_file.read() + data_file.close() + return data + + +def MergeCounters(objs, index, multipliers): + Accumulate the counter at index from all counters objs. + val = 0 + for j in xrange(len(objs)): +val += multipliers[j] * objs[j].counters[index] + return val + + +class DataObject(object): + Base class for various datum in GCDA/GCNO file. + + def __init__(self, tag): +self.tag = tag + + +class Function(DataObject): + Function and its counters. + + Attributes: +length: Length of the data on the disk +ident: Ident field +line_checksum: Checksum of the line number +cfg_checksum: Checksum of the control flow graph +counters: All counters associated with the function +file: The name of the file the function is defined in. Optional. +line: The line number the function is defined at. Optional. + + Function object contains other counter objects and block/arc/line objects. + + + def __init__(self, reader, tag, n_words): +Read function record information from a gcda/gcno file. + +Args: + reader: gcda/gcno file. + tag: funtion tag. + n_words: length of function record in unit of 4-byte. + +DataObject.__init__(self, tag) +self.length = n_words +self.counters = [] + +if reader: + pos = reader.pos + self.ident = reader.ReadWord() + self.line_checksum = reader.ReadWord() + self.cfg_checksum = reader.ReadWord() + + # Function name string is in gcno files, but not + # in gcda files. Here we make string reading optional. + if (reader.pos - pos) n_words: +reader.ReadStr() + + if (reader.pos - pos) n_words: +self.file = reader.ReadStr() +self.line_number = reader.ReadWord() + else: +self.file = '' +self.line_number = 0 +else: + self.ident = 0 + self.line_checksum = 0 + self.cfg_checksum = 0 + self.file = None + self.line_number = 0 + + def Write(self, writer): +Write out the function. + +writer.WriteWord(self.tag) +writer.WriteWord(self.length) +writer.WriteWord(self.ident) +writer.WriteWord(self.line_checksum) +writer.WriteWord(self.cfg_checksum) +for c in self.counters: + c.Write(writer) + + def EntryCount(self): +Return the number of times the function called. +return self.ArcCounters().counters[0] + + def Merge(self, others, multipliers): +Merge all functions in others into self. + +Args: + others: A sequence of Function objects + multipliers: A sequence of integers to be multiplied during merging. + +for o in others: + assert self.ident == o.ident + assert self.line_checksum == o.line_checksum + assert self.cfg_checksum == o.cfg_checksum + +for i in xrange(len(self.counters)): + self.counters[i].Merge([o.counters[i] for o in others], multipliers) + + def Print(self): +Print all the attributes in full detail. +print 'function: ident %d length %d line_chksum %x
Re: [patch] PR middle-end/43631
Premature ping? Also bootstrappedtested on ia64-unknown-linux-gnu now. On Sun, Apr 7, 2013 at 12:40 AM, Steven Bosscher wrote: Hello, In this PR43631, var-tracking notes are inserted on basic block boundaries and BB_HEAD/BB_END end up pointing to these notes. This breaks verify_flow_info. The problem is actually bigger than just the var-tracking notes. The general problem is that there are no rules for whether or not notes are part of a basic block or not. Some notes never appear inside a basic block, some notes always must appear inside a basic block, and some can appear virtually anywhere. With this patch I've chosen to maintain the invariant that BB_END must be an INSN, and never be a NOTE or BARRIER. The result is that NOTE_INSN_EH_REGION_BEG notes can be inside a basic block while the pairing NOTE_INSN_EH_REGION_END is outside the basic block. I don't think this is a problem: The same thing already happens with jump table data, barriers, etc. The nice thing is that with this patch, *finally* I have verify_flow_info pass after var-tracking. That's important because officially the CFG is freed after this pass (pass_free_cfg runs after pass_variable_tracking) but because verify_flow_info didn't pass after var-tracking the CFG was already invalidated before it was freed (BB_HEAD and BB_END would be incorrect). That has the effect that some of the machine-reorg passes that use the CFG never really had a correct CFG at all! With this patch and a hack to call compute_bb_for_insn, I now can call verify_flow_info in every pass up to final for the i386 port :-) Bootstrappedtested on x86_64-unknown-linux-gnu and powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven
Re: [patch libgcc]: Adjust cygming-crtbegin code to use weak
On 09/04/2013 11:37, Kai Tietz wrote: Hmm, well in standard-case you are right. But well there is still a chance that GetProcAddress returns NULL-pointer ... How would that actually happen? Removing any of those functions from libgcc or libjava would be a very serious ABI breakage; we're never going to do that. (And even if we do, the version number of the DLL would have to change and LoadLibrary wouldn't return anything, unless we changed the shared lib string, which would be the appropriate time to re-add a check.) It's not critical, but it's wasted code. cheers, DaveK
Re: [google gcc-4_7] offline profile merge tool (issue 8508048)
Have not had a chance to look at the whole patch yet, but here are some initial comments. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py File contrib/profile_merge.py (right): https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode53 contrib/profile_merge.py:53: data_file = open(path, 'rb') How about simpler version: with open(file, 'rb') as f: data = f.read() https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode59 contrib/profile_merge.py:59: def MergeCounters(objs, index, multipliers): Perhaps use function name that shows that it returns the value. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode78 contrib/profile_merge.py:78: length: Length of the data on the disk Minor style. Make sure format is coherent (ends with . or not, starts with capital letter or not). https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode119 contrib/profile_merge.py:119: self.ident = 0 It is ever useful to have a function object with no reader? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode138 contrib/profile_merge.py:138: return self.ArcCounters().counters[0] This will throw an exception if no counters exists. Is that expected? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode209 contrib/profile_merge.py:209: class Lines(DataObject): Where is this used? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1088 contrib/profile_merge.py:1088: dir: A path to the directory containing the gcda/gcno. A cleaner way could perhaps be to have a base profile archive class and then classes for ProfileArchieveDir and ProfileArchiveZip? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1153 contrib/profile_merge.py:1153: os.chdir(direc) How about using absolute paths to avoid the chdir logic? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1199 contrib/profile_merge.py:1199: outfile_path = output_path + '/' + f os.path.join() instead of +, here and other places where the pattern is used. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1213 contrib/profile_merge.py:1213: def Merge(self, archives, multipliers): The code seems to handle both the case when 'self' is inside archives and when it is not. Does both happen when running the code? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1233 contrib/profile_merge.py:1233: for i, a in enumerate(archives): It's hard to see what i is in this case. I would prefer to have information about arguments in the documentation at the top of the function, but more descriptive variable names and comments would also help. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1288 contrib/profile_merge.py:1288: group = OptionGroup(parser, 'Arguments', You don't seem to add any options to the group? It seems not necessary to use a group at all in this program. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1313 contrib/profile_merge.py:1313: input_profiles[0].Merge(input_profiles, profile_multipliers) Did you conside creating a new object instead of modifying the existing? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1317 contrib/profile_merge.py:1317: input_profiles[0].Write(output_profile) Warn before overwriting? https://codereview.appspot.com/8508048/
Re: RFC: add some static probes to libstdc++
Jonathan == Jonathan Wakely jwakely@gmail.com writes: Marc I thought you were going to suggest enhancing the configure test Marc so it fails on old systemtap (detects it as absent). Jonathan Ah yes, that's a much better idea! Here's a patch to do that. I tested it on x86-64 Fedora 18. I tested the failing path by using an old sdt.h. This required a number of iterations to ensure that the test failed for the right reasons. Ok? I think it should go on the 4.8 branch as well. Tom 2013-04-09 Tom Tromey tro...@redhat.com * configure, config.h.in: Rebuild. * configure.ac: Use GLIBCXX_CHECK_SDT_H. Don't check for sys/sdt.h. * acinclude.m4 (GLIBCXX_CHECK_SDT_H): New defun. diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 4d06b20..619fff0 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -3660,6 +3660,36 @@ AC_DEFUN([GLIBCXX_ENABLE_WERROR], [ ]) +dnl +dnl Check to see if sys/sdt.h exists and that it is suitable for use. +dnl Some versions of sdt.h were not compatible with C++11. +dnl +AC_DEFUN([GLIBCXX_CHECK_SDT_H], [ + AC_MSG_RESULT([for suitable sys/sdt.h]) + # Note that this test has to be run with the C language. + # Otherwise, sdt.h will try to include some headers from + # libstdc++ itself. + AC_LANG_SAVE + AC_LANG_C + AC_CACHE_VAL(glibcxx_cv_sys_sdt_h, [ +# Because we have to run the test in C, we use grep rather +# than the compiler to check for the bug. The bug is that +# were strings without trailing whitespace, causing g++ +# to look for operator. The pattern searches for the fixed +# output. +AC_EGREP_CPP([ \,\ ], [ + #include sys/sdt.h + int f() { STAP_PROBE(hi, bob); } +], [glibcxx_cv_sys_sdt_h=yes], [glibcxx_cv_sys_sdt_h=no]) + ]) + AC_LANG_RESTORE + if test $glibcxx_cv_sys_sdt_h = yes; then +AC_DEFINE(HAVE_SYS_SDT_H, 1, + [Define to 1 if you have a suitable sys/sdt.h header file]) + fi + AC_MSG_RESULT($glibcxx_cv_sys_sdt_h) +]) + # Macros from the top-level gcc directory. m4_include([../config/gc++filt.m4]) m4_include([../config/tls.m4]) diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac index de66406..73d430a 100644 --- a/libstdc++-v3/configure.ac +++ b/libstdc++-v3/configure.ac @@ -211,12 +211,13 @@ GLIBCXX_CHECK_SC_NPROCESSORS_ONLN GLIBCXX_CHECK_SC_NPROC_ONLN GLIBCXX_CHECK_PTHREADS_NUM_PROCESSORS_NP GLIBCXX_CHECK_SYSCTL_HW_NCPU +GLIBCXX_CHECK_SDT_H # Check for available headers. AC_CHECK_HEADERS([endian.h execinfo.h float.h fp.h ieeefp.h inttypes.h \ locale.h machine/endian.h machine/param.h nan.h stdint.h stdlib.h string.h \ strings.h sys/ipc.h sys/isa_defs.h sys/machine.h sys/param.h \ -sys/resource.h sys/sdt.h sys/sem.h sys/stat.h sys/time.h sys/types.h unistd.h \ +sys/resource.h sys/sem.h sys/stat.h sys/time.h sys/types.h unistd.h \ wchar.h wctype.h]) # Only do link tests if native. Else, hardcode.
Re: Fill more delay slots in conditional returns
On 04/09/2013 10:15 AM, Eric Botcazou wrote: Yes, I agree that it's quite convoluted but, as Steven already said, rewriting it to use a more modern framework is hard because of SEQUENCEs and, frankly, I have neither the real incentive nor the time to start such an endeavor. I understand completely. Instead I can raise the following couple of points that I also ran into with the private port I'm working on: 1. When fill_simple_delay_slots is scanning backwards to find insns to put into slots, it calls update_reg_dead_note which collects REG_DEAD notes and puts them on the insn to be moved. But emit_delay_sequence will later drop them on the floor. I have one case where preserving such a REG_DEAD note is correct and makes a difference (slot filled vs empty). I don't recall ever working on this aspect of reorg. The obvious worry is that with reorg moving stuff around those notes may not be valid anymore in the general case. This is further complicated by the fact that reorg mucks up the CFG enough that correct, incremental updates of life information might be exceedingly hard. 2. In resource.c, when mark_target_live_regs is doing its liveness analysis, it extracts the insns wrapped in USEs by reorg.c: /* If this insn is a USE made by update_block, we care about the underlying insn. */ if (code == INSN GET_CODE (PATTERN (insn)) == USE INSN_P (XEXP (PATTERN (insn), 0))) real_insn = XEXP (PATTERN (insn), 0); and proceeds with the liveness analysis without further ado. Now, the story is different in find_dead_or_set_registers, which does: Just a note, there's a formatting error in this code. The real_insn = statement is indented too far. Feel free to fix that :-) This looks like the safe/conservative thing for this code. We're marking when regs become live. Marking something live too early doesn't hurt here (from a correctness standpoint) as far as I can tell. Note that deaths are deferred until the next label. if (GET_CODE (PATTERN (insn)) == USE) { /* If INSN is a USE made by update_block, we care about the underlying insn. Any registers set by the underlying insn are live since the insn is being done somewhere else. */ if (INSN_P (XEXP (PATTERN (insn), 0))) mark_set_resources (XEXP (PATTERN (insn), 0), res, 0, MARK_SRC_DEST_CALL); and thus pessimizes the analysis by making registers unnecessary live. Again I have one case where not pessimizing is correct and makes a difference. Seems reasonable. This might just be an oversight by the original author. Effectively we're going to start marking the referenced resources with your patch. We still defer killing as far as I can tell. Experimental patch attached, clean on the C testsuite for the private port. What do you think? If you want to run with it, I won't object. My worry would be we might not see the fallout for a long time as the number of things that have to fall into place for an observable failure here are very high. Jeff
[google] Avoid warning when unused attribute applied to C++ member variables (issue8580044)
This patch is pending on trunk, but I would like to get this into google branches now as it is causing spurious warnings. Google ref b/8496800. This patch allows the unused attribute to be used without warning on C++ class members, which are of type FIELD_DECL. This is for compatibility with clang, which allows the attribute to be specified on class members and struct fields. It looks like more work would need to be done to implement the actual unused variable detection and warning on FIELD_DECLs, but this change will at least avoid the warning on the code that uses the unused attribute in these cases. The documentation at http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html also doesn't seem to preclude its use on C++ member variables. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for google branches? 2013-04-09 Teresa Johnson tejohn...@google.com * c-family/c-common.c (handle_unused_attribute): Index: c-family/c-common.c === --- c-family/c-common.c (revision 197640) +++ c-family/c-common.c (working copy) @@ -6320,6 +6320,7 @@ handle_unused_attribute (tree *node, tree name, tr if (TREE_CODE (decl) == PARM_DECL || TREE_CODE (decl) == VAR_DECL + || TREE_CODE (decl) == FIELD_DECL || TREE_CODE (decl) == FUNCTION_DECL || TREE_CODE (decl) == LABEL_DECL || TREE_CODE (decl) == TYPE_DECL) -- This patch is available for review at http://codereview.appspot.com/8580044
C++ PATCH for c++/25466 (runtime SEGV with typeid)
The standard is somewhat unclear here, but I think what makes sense is to always check whether the address is null rather than confine the check to when the immediate operand of typeid is an INDIRECT_REF. Tested x86_64-pc-linux-gnu, applying to trunk. commit 43d5a64843e3e18a74e59b1b368cd129d5f1de88 Author: Jason Merrill ja...@redhat.com Date: Tue Apr 9 14:11:01 2013 -0400 PR c++/25466 * rtti.c (build_typeid): Check the address of the argument rather than looking for an INDIRECT_REF. diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c index e83d666..b3c6687 100644 --- a/gcc/cp/rtti.c +++ b/gcc/cp/rtti.c @@ -326,18 +326,16 @@ build_typeid (tree exp, tsubst_flags_t complain) /* FIXME when integrating with c_fully_fold, mark resolves_to_fixed_type_p case as a non-constant expression. */ - if (INDIRECT_REF_P (exp) - TYPE_PTR_P (TREE_TYPE (TREE_OPERAND (exp, 0))) - TYPE_POLYMORPHIC_P (TREE_TYPE (exp)) + if (TYPE_POLYMORPHIC_P (TREE_TYPE (exp)) ! resolves_to_fixed_type_p (exp, nonnull) ! nonnull) { /* So we need to look into the vtable of the type of exp. - This is an lvalue use of expr then. */ - exp = mark_lvalue_use (exp); + Make sure it isn't a null lvalue. */ + exp = cp_build_addr_expr (exp, complain); exp = stabilize_reference (exp); - cond = cp_convert (boolean_type_node, TREE_OPERAND (exp, 0), - complain); + cond = cp_convert (boolean_type_node, exp, complain); + exp = cp_build_indirect_ref (exp, RO_NULL, complain); } exp = get_tinfo_decl_dynamic (exp, complain); diff --git a/gcc/testsuite/g++.dg/rtti/typeid10.C b/gcc/testsuite/g++.dg/rtti/typeid10.C new file mode 100644 index 000..47b45b1 --- /dev/null +++ b/gcc/testsuite/g++.dg/rtti/typeid10.C @@ -0,0 +1,36 @@ +// PR c++/25466 +// { dg-do run } + +#include typeinfo + +const std::type_info *a; + +template class T +bool is_polymorphic() { + bool result(false); + const std::type_info a1 = typeid( (result=true), *(T*)0); + a = a1; + return result; +} + +struct non_polymorphic {}; +struct polymorphic { virtual ~polymorphic() {} }; + + +int main() { + if (is_polymorphicint()) __builtin_abort(); + if (is_polymorphicnon_polymorphic()) __builtin_abort(); + try +{ + is_polymorphicpolymorphic(); + __builtin_abort(); // should have thrown bad_typeid +} + catch (std::bad_typeid) +{ + // OK +} + catch (...) +{ + __builtin_abort(); +} +}
[Patch, fortran] PR 40958 Compress module files with zlib
Hi, the attached patch reduces the size of module files on disk by compressing them with zlib and storing them in the gzip format (RFC 1952). I chose zlib because it's a) ubiquitous and b) there's already a copy of zlib in the GCC source tree, so this doesn't introduce any further build dependencies. Since the mod files with the patch are in the gzip format, one can use tools like zcat, zless, zgrep, zdiff etc. to inspect the uncompressed contents easily. As the gzip file format already contains a CRC32 checksum, the patch gets rid of the current MD5 checksumming, and instead compares the CRC32 to determine whether a mod file should be replaced or not. As the gz* functions can also handle uncompressed files, and because I replaced the line with the MD5 sum with an empty line, there was no need to bump the module version number. Based on some testing with CP2K_2009-05-01.f90 (a single file containing all of CP2K), the compile time with -fsyntax-only increased from about 46 to 48 seconds, and the total size of the module files were reduced from 124 MB to 13 MB. (IMHO the increase in compile time is modest enough that it's not worth doing the caching of uncompressed module files that I was previously thinking about, especially considering that large projects are invariably(?!) split over several source files, thereby reducing the effectiveness of such a caching scheme.) Regtested on x86_64-unknown-linux-gnu, Ok for trunk? 2013-04-09 Janne Blomqvist j...@gcc.gnu.org PR fortran/40958 * scanner.h: New file. * Make-lang.in: Dependencies on scanner.h. * scanner.c (gfc_directorylist): Move to scanner.h. * module.c: Don't include md5.h, include scanner.h and zlib.h. (MOD_VERSION): Add comment about backwards compatibility. (module_fp): Change type to gzFile. (ctx): Remove. (gzopen_included_file_1): New function. (gzopen_included_file): New function. (gzopen_intrinsic_module): New function. (write_char): Use gzputc. (read_crc32_from_module_file): New function. (read_md5_from_module_file): Remove. (gfc_dump_module): Use gz* functions instead of stdio, check gzip crc32 instead of md5. (read_module_to_tmpbuf): Use gz* functions instead of stdio. (gfc_use_module): Use gz* functions. testsuite: 2013-04-09 Janne Blomqvist j...@gcc.gnu.org PR fortran/40958 * lib/gcc-dg.exp (scan-module): Uncompress module file before scanning. * gfortran.dg/module_md5_1.f90: Remove. -- Janne Blomqvist modzlib.diff Description: Binary data
Re: RFC: add some static probes to libstdc++
On 9 April 2013 18:47, Tom Tromey wrote: Jonathan == Jonathan Wakely jwakely@gmail.com writes: Marc I thought you were going to suggest enhancing the configure test Marc so it fails on old systemtap (detects it as absent). Jonathan Ah yes, that's a much better idea! Here's a patch to do that. I tested it on x86-64 Fedora 18. I tested the failing path by using an old sdt.h. This required a number of iterations to ensure that the test failed for the right reasons. Ah yes, tricky. Ok? I think it should go on the 4.8 branch as well. OK for trunk and 4.8, thanks.
Re: [i386] Replace builtins with vector extensions
On Tue, 9 Apr 2013, Marc Glisse wrote: On Tue, 9 Apr 2013, Richard Biener wrote: I seem to remember discussion in the PR(s) that the intrinsics should (and do for other compilers) expand to the desired instructions even when the corresponding instruction set is disabled. emmintrin.h starts with: #ifndef __SSE2__ # error SSE2 instruction set not enabled Oh, re-reading your post, it looks like you mean we should change the current behavior, not just avoid regressions... My opinion on the intrinsics is that they are the portable way to use vectors on x86, but they are not equivalent to asm (which people should use if they don't want the compiler looking at their code). Knowingly generating SSE code with -mno-sse is not very appealing. However, the arguments in: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56298 make sense. I guess I'll forget about this patch. -- Marc Glisse
Re: [google] Avoid warning when unused attribute applied to C++ member variables (issue8580044)
Ok -- there are legitimate use cases for this. David On Tue, Apr 9, 2013 at 11:01 AM, Teresa Johnson tejohn...@google.com wrote: This patch is pending on trunk, but I would like to get this into google branches now as it is causing spurious warnings. Google ref b/8496800. This patch allows the unused attribute to be used without warning on C++ class members, which are of type FIELD_DECL. This is for compatibility with clang, which allows the attribute to be specified on class members and struct fields. It looks like more work would need to be done to implement the actual unused variable detection and warning on FIELD_DECLs, but this change will at least avoid the warning on the code that uses the unused attribute in these cases. The documentation at http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html also doesn't seem to preclude its use on C++ member variables. Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for google branches? 2013-04-09 Teresa Johnson tejohn...@google.com * c-family/c-common.c (handle_unused_attribute): Index: c-family/c-common.c === --- c-family/c-common.c (revision 197640) +++ c-family/c-common.c (working copy) @@ -6320,6 +6320,7 @@ handle_unused_attribute (tree *node, tree name, tr if (TREE_CODE (decl) == PARM_DECL || TREE_CODE (decl) == VAR_DECL + || TREE_CODE (decl) == FIELD_DECL || TREE_CODE (decl) == FUNCTION_DECL || TREE_CODE (decl) == LABEL_DECL || TREE_CODE (decl) == TYPE_DECL) -- This patch is available for review at http://codereview.appspot.com/8580044
[google gcc-4_7] offline profile tool (patchset 3) (issue8508048)
Hi, Patch set 3 for offline profile merge tool. (1) Rename to profile_tool for future function expansion. (2) Install to bin/ directory for easy user access. Thanks, -Rong 2013-04-09 Rong Xu x...@google.com * gcc/Makefile.in: Install profile_tool to bin directory. * contrib/profile_tool: A set of offline tools to analyze/manipulate gcov profiles. Current version only supports profile merge. Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 197613) +++ gcc/Makefile.in (working copy) @@ -752,6 +752,7 @@ GCC_INSTALL_NAME := $(shell echo gcc|sed '$(progra GCC_TARGET_INSTALL_NAME := $(target_noncanonical)-$(shell echo gcc|sed '$(program_transform_name)') CPP_INSTALL_NAME := $(shell echo cpp|sed '$(program_transform_name)') GCOV_INSTALL_NAME := $(shell echo gcov|sed '$(program_transform_name)') +PROFILE_TOOL_INSTALL_NAME := $(shell echo profile_tool|sed '$(program_transform_name)') # Setup the testing framework, if you have one EXPECT = `if [ -f $${rootme}/../expect/expect ] ; then \ @@ -4691,6 +4692,13 @@ install-common: native lang.install-common install rm -f $(DESTDIR)$(bindir)/$(GCOV_INSTALL_NAME)$(exeext); \ $(INSTALL_PROGRAM) gcov$(exeext) $(DESTDIR)$(bindir)/$(GCOV_INSTALL_NAME)$(exeext); \ fi +# Install profile_tool if it is available. + -if [ -f $(srcdir)/../contrib/profile_tool ]; \ + then \ + rm -f $(DESTDIR)$(bindir)/$(PROFILE_TOOL_INSTALL_NAME)$(exeext); \ + $(INSTALL_PROGRAM) $(srcdir)/../contrib/profile_tool \ + $(DESTDIR)$(bindir)/$(PROFILE_TOOL_INSTALL_NAME)$(exeext); \ + fi # Install the driver program as $(target_noncanonical)-gcc, # $(target_noncanonical)-gcc-$(version) Index: contrib/profile_tool === --- contrib/profile_tool(revision 0) +++ contrib/profile_tool(revision 0) @@ -0,0 +1,1320 @@ +#!/usr/bin/python2.7 +# +#Copyright (C) 2013 +#Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# + + +Merge two or more gcda profile. + + +__author__ = 'Seongbae Park, Rong Xu' +__author_email__ = 'sp...@google.com, x...@google.com' + +import array +from optparse import OptionGroup +from optparse import OptionParser +import os +import struct +import zipfile + +new_histogram = None + + +class Error(Exception): + Exception class for profile module. + + +def ReadAllAndClose(path): + Return the entire byte content of the specified file. + + Args: +path: The path to the file to be opened and read. + + Returns: +The byte sequence of the content of the file. + + data_file = open(path, 'rb') + data = data_file.read() + data_file.close() + return data + + +def MergeCounters(objs, index, multipliers): + Accumulate the counter at index from all counters objs. + val = 0 + for j in xrange(len(objs)): +val += multipliers[j] * objs[j].counters[index] + return val + + +class DataObject(object): + Base class for various datum in GCDA/GCNO file. + + def __init__(self, tag): +self.tag = tag + + +class Function(DataObject): + Function and its counters. + + Attributes: +length: Length of the data on the disk +ident: Ident field +line_checksum: Checksum of the line number +cfg_checksum: Checksum of the control flow graph +counters: All counters associated with the function +file: The name of the file the function is defined in. Optional. +line: The line number the function is defined at. Optional. + + Function object contains other counter objects and block/arc/line objects. + + + def __init__(self, reader, tag, n_words): +Read function record information from a gcda/gcno file. + +Args: + reader: gcda/gcno file. + tag: funtion tag. + n_words: length of function record in unit of 4-byte. + +DataObject.__init__(self, tag) +self.length = n_words +self.counters = [] + +if reader: + pos = reader.pos + self.ident = reader.ReadWord() + self.line_checksum = reader.ReadWord() + self.cfg_checksum = reader.ReadWord() + + # Function name string is in gcno files, but not + # in gcda files. Here we make string reading optional. + if (reader.pos - pos) n_words: +
Re: [google gcc-4_7] offline profile merge tool (issue 8508048)
https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py File contrib/profile_merge.py (right): https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode53 contrib/profile_merge.py:53: data_file = open(path, 'rb') On 2013/04/09 17:32:11, martint wrote: How about simpler version: with open(file, 'rb') as f: data = f.read() Done. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode59 contrib/profile_merge.py:59: def MergeCounters(objs, index, multipliers): changed the name to ReturnMergedCounters On 2013/04/09 17:32:11, martint wrote: Perhaps use function name that shows that it returns the value. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode78 contrib/profile_merge.py:78: length: Length of the data on the disk All the bullets sub-to Attributes are not capital. On 2013/04/09 17:32:11, martint wrote: Minor style. Make sure format is coherent (ends with . or not, starts with capital letter or not). https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode119 contrib/profile_merge.py:119: self.ident = 0 Probably not. 'reader' here is ProfileDataFile. We should have a valid gcda/gcno file for this program. But it does not hurt to have a empty reader. I'll leave as it's. On 2013/04/09 17:32:11, martint wrote: It is ever useful to have a function object with no reader? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode138 contrib/profile_merge.py:138: return self.ArcCounters().counters[0] that will not happen. all functions have at least one counter. On 2013/04/09 17:32:11, martint wrote: This will throw an exception if no counters exists. Is that expected? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode209 contrib/profile_merge.py:209: class Lines(DataObject): It's for reading in gcno file. We are not using this functionality for merge. On 2013/04/09 17:32:11, martint wrote: Where is this used? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1153 contrib/profile_merge.py:1153: os.chdir(direc) On 2013/04/09 17:32:11, martint wrote: How about using absolute paths to avoid the chdir logic? Done. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1199 contrib/profile_merge.py:1199: outfile_path = output_path + '/' + f On 2013/04/09 17:32:11, martint wrote: os.path.join() instead of +, here and other places where the pattern is used. Done. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1213 contrib/profile_merge.py:1213: def Merge(self, archives, multipliers): The caller in this program always has self in archives set. On 2013/04/09 17:32:11, martint wrote: The code seems to handle both the case when 'self' is inside archives and when it is not. Does both happen when running the code? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1233 contrib/profile_merge.py:1233: for i, a in enumerate(archives): i is the i-th profile archive. On 2013/04/09 17:32:11, martint wrote: It's hard to see what i is in this case. I would prefer to have information about arguments in the documentation at the top of the function, but more descriptive variable names and comments would also help. https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1313 contrib/profile_merge.py:1313: input_profiles[0].Merge(input_profiles, profile_multipliers) That will use more memory. We already use too much memory. And it only modifies in-memory object (the profile files are not changed). On 2013/04/09 17:32:11, martint wrote: Did you conside creating a new object instead of modifying the existing? https://codereview.appspot.com/8508048/diff/4001/contrib/profile_merge.py#newcode1317 contrib/profile_merge.py:1317: input_profiles[0].Write(output_profile) Refer to previous comment. Only in-memory objects are overwritten. On 2013/04/09 17:32:11, martint wrote: Warn before overwriting? https://codereview.appspot.com/8508048/
[Debug, Fortran] RFC patch for DW_TAG_namelist (PR fortran/37132)
Dear all, attached is a first attempt to implement DW_TAG_namelist support in GCC. (That's a DWARF2 features.) It nicely works for: module m real :: cc end module m subroutine sub() use m implicit none integer :: aa, bb namelist /nml/ aa, bb, cc aa = 5 end subroutine sub which gives: 24f: Abbrev Number: 3 (DW_TAG_namelist) 50 DW_AT_name: nml 54 DW_AT_sibling : 0x68 358: Abbrev Number: 4 (DW_TAG_namelist_item) 59 DW_AT_namelist_items: 0x68 35d: Abbrev Number: 4 (DW_TAG_namelist_item) 5e DW_AT_namelist_items: 0x72 362: Abbrev Number: 4 (DW_TAG_namelist_item) 63 DW_AT_namelist_items: 0xab 268: Abbrev Number: 5 (DW_TAG_variable) 69 DW_AT_name: aa etc. However, it fails for: subroutine nml_test(x1) integer :: x1 namelist /nml2/ x1 x1 = 5 end subroutine nml_test i.e. where x1 is a procedure argument. The problem is that lookup_decl_die cannot find it and force_decl_die doesn't like to handle it. Any idea? Tobias PS: Except for gfortran.dg/namelist_14.f90, gfortran.dg/namelist_69.f90 and gfortran.dg/namelist_70.f90 it regtested successfully for check-gfortran. (I didn't see a need for an all-language bootstrap + regtesting as long as there are still Fortran regressions.) 2013-04-09 Tobias Burnus bur...@net-b.de PR fortran/37132 * dbxout.c (dbx_debug_hooks, xcoff_debug_hooks): Add noop for namelist_decl. * sdbout.c (sdb_debug_hooks): Ditto. * vmsdbgout.c (vmsdbg_debug_hooks): Ditto. * debug.c (do_nothing_debug_hooks): Ditto. (debug_nothing_charstar_tree_vectree): New function. * debug.h (gcc_debug_hooks): Add namelist_decl. (debug_nothing_charstar_tree_vectree): New prototype. * dwarf2out.c (dwarf2out_namelist_decl): New function. (dwarf2_debug_hooks): Use it. 2013-04-09 Tobias Burnus bur...@net-b.de PR fortran/37132 * trans-decl.c (generate_namelist_decl): New function. (generate_local_vars): Call it. diff --git a/gcc/dbxout.c b/gcc/dbxout.c index 4d9fc4e..9e5388c 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -367,6 +367,7 @@ const struct gcc_debug_hooks dbx_debug_hooks = dbxout_global_decl, /* global_decl */ dbxout_type_decl, /* type_decl */ debug_nothing_tree_tree_tree_bool, /* imported_module_or_decl */ + debug_nothing_charstar_tree_vectree, /* namelist_decl */ debug_nothing_tree, /* deferred_inline_function */ debug_nothing_tree, /* outlining_inline_function */ debug_nothing_rtx, /* label */ @@ -403,6 +404,7 @@ const struct gcc_debug_hooks xcoff_debug_hooks = dbxout_global_decl, /* global_decl */ dbxout_type_decl, /* type_decl */ debug_nothing_tree_tree_tree_bool, /* imported_module_or_decl */ + debug_nothing_charstar_tree_vectree, /* namelist_decl */ debug_nothing_tree, /* deferred_inline_function */ debug_nothing_tree, /* outlining_inline_function */ debug_nothing_rtx, /* label */ diff --git a/gcc/debug.c b/gcc/debug.c index 7ed50b4..02106a5 100644 --- a/gcc/debug.c +++ b/gcc/debug.c @@ -46,6 +46,7 @@ const struct gcc_debug_hooks do_nothing_debug_hooks = debug_nothing_tree, /* global_decl */ debug_nothing_tree_int, /* type_decl */ debug_nothing_tree_tree_tree_bool, /* imported_module_or_decl */ + debug_nothing_charstar_tree_vectree, /* namelist_decl */ debug_nothing_tree, /* deferred_inline_function */ debug_nothing_tree, /* outlining_inline_function */ debug_nothing_rtx, /* label */ @@ -84,6 +85,11 @@ debug_nothing_tree_tree_tree_bool (tree t1 ATTRIBUTE_UNUSED, { } +void +debug_nothing_charstar_tree_vectree (const char *, tree, vectree *) +{ +} + bool debug_true_const_tree (const_tree block ATTRIBUTE_UNUSED) { diff --git a/gcc/debug.h b/gcc/debug.h index 886de17..b429016 100644 --- a/gcc/debug.h +++ b/gcc/debug.h @@ -18,6 +18,8 @@ #ifndef GCC_DEBUG_H #define GCC_DEBUG_H +#include vec.h + /* This structure contains hooks for the debug information output functions, accessed through the global instance debug_hooks set in toplev.c according to command line options. */ @@ -108,6 +110,9 @@ struct gcc_debug_hooks void (* imported_module_or_decl) (tree decl, tree name, tree context, bool child); + /* Debug information for namelists. */ + void (* namelist_decl) (const char *, tree, vectree *); + /* DECL is an inline function, whose body is present, but which is not being output at this point. */ void (* deferred_inline_function) (tree decl); @@ -158,6 +163,8 @@ extern void debug_nothing_int_int (unsigned int, unsigned int); extern void debug_nothing_tree (tree); extern void debug_nothing_tree_tree (tree, tree); extern void debug_nothing_tree_int (tree, int); +extern void debug_nothing_charstar_tree_vectree (const char *, tree, + vectree *); extern void
Drop more of the old alias handling code
Hi, this patch removes handling of alias pairs in ipa.c. This code is no longer needed now when we represent all aliases explicitly in the symbol table. Bootstrapped/regtested x86_64-linux. Honza * ipa.c (cgraph_externally_visible_p, varpool_externally_visible_p): Drop aliased parameter. (function_and_variable_visibility): Do not handle alias pairs. * cgraph.c (varpool_externally_visible_p): Update prototype. * varpool.c (varpool_add_new_variable): Update. Index: ipa.c === --- ipa.c (revision 197243) +++ ipa.c (working copy) @@ -573,7 +573,7 @@ cgraph_comdat_can_be_unshared_p (struct static bool cgraph_externally_visible_p (struct cgraph_node *node, -bool whole_program, bool aliased) +bool whole_program) { if (!node-local.finalized) return false; @@ -582,11 +582,6 @@ cgraph_externally_visible_p (struct cgra || DECL_EXTERNAL (node-symbol.decl))) return false; - /* Do not even try to be smart about aliased nodes. Until we properly - represent everything by same body alias, these are just evil. */ - if (aliased) -return true; - /* Do not try to localize built-in functions yet. One of problems is that we end up mangling their asm for WHOPR that makes it impossible to call them using the implicit built-in declarations anymore. Similarly this enables @@ -638,7 +633,7 @@ cgraph_externally_visible_p (struct cgra /* Return true when variable VNODE should be considered externally visible. */ bool -varpool_externally_visible_p (struct varpool_node *vnode, bool aliased) +varpool_externally_visible_p (struct varpool_node *vnode) { /* Do not touch weakrefs; while they are not externally visible, dropping their DECL_EXTERNAL flags confuse most @@ -652,11 +647,6 @@ varpool_externally_visible_p (struct var if (!DECL_COMDAT (vnode-symbol.decl) !TREE_PUBLIC (vnode-symbol.decl)) return false; - /* Do not even try to be smart about aliased nodes. Until we properly - represent everything by same body alias, these are just evil. */ - if (aliased) -return true; - /* If linker counts on us, we must preserve the function. */ if (symtab_used_from_object_file_p ((symtab_node) vnode)) return true; @@ -733,42 +723,9 @@ function_and_variable_visibility (bool w { struct cgraph_node *node; struct varpool_node *vnode; - struct pointer_set_t *aliased_nodes = pointer_set_create (); - struct pointer_set_t *aliased_vnodes = pointer_set_create (); - unsigned i; - alias_pair *p; - /* Discover aliased nodes. */ - FOR_EACH_VEC_SAFE_ELT (alias_pairs, i, p) -{ - if (dump_file) - fprintf (dump_file, Alias %s-%s, - IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (p-decl)), - IDENTIFIER_POINTER (p-target)); - - if ((node = cgraph_node_for_asm (p-target)) != NULL - !DECL_EXTERNAL (node-symbol.decl)) - { - if (!node-analyzed) - continue; - cgraph_mark_force_output_node (node); - pointer_set_insert (aliased_nodes, node); - if (dump_file) - fprintf (dump_file, node %s/%i, -cgraph_node_name (node), node-uid); - } - else if ((vnode = varpool_node_for_asm (p-target)) != NULL - !DECL_EXTERNAL (vnode-symbol.decl)) - { - vnode-symbol.force_output = 1; - pointer_set_insert (aliased_vnodes, vnode); - if (dump_file) - fprintf (dump_file, varpool node %s, -varpool_node_name (vnode)); - } - if (dump_file) - fprintf (dump_file, \n); -} + /* All aliases should be procssed at this point. */ + gcc_checking_assert (!alias_pairs || !alias_pairs-length()); FOR_EACH_FUNCTION (node) { @@ -817,9 +774,7 @@ function_and_variable_visibility (bool w !DECL_COMDAT (node-symbol.decl)) || TREE_PUBLIC (node-symbol.decl) || DECL_EXTERNAL (node-symbol.decl)); - if (cgraph_externally_visible_p (node, whole_program, - pointer_set_contains (aliased_nodes, -node))) + if (cgraph_externally_visible_p (node, whole_program)) { gcc_assert (!node-global.inlined_to); node-symbol.externally_visible = true; @@ -898,9 +853,7 @@ function_and_variable_visibility (bool w { if (!vnode-finalized) continue; - if (varpool_externally_visible_p - (vnode, -pointer_set_contains (aliased_vnodes, vnode))) + if (varpool_externally_visible_p (vnode)) vnode-symbol.externally_visible = true; else vnode-symbol.externally_visible = false; @@ -913,8 +866,6 @@ function_and_variable_visibility (bool w
Re: [PATCH SH] Error: unaligned opcodes detected in executable segment
Christian Bruel christian.br...@st.com wrote: This patch fixes label alignments after a ADDR_DIFF_VEC with byte offsets. The bug occurs with building the libgcc for a sh-elf target. See a reduced case here. The funny thing is that the assembly error given in Subject appears only on a debug section when compiled with -g, thus the emitted code without -g: .align 2 .L4: .byte .L3-.L5 .byte .L10-.L5 .byte .L7-.L5 .byte .L8-.L5 .byte .L9-.L5 .L3: mov.l .L13,r1 assembles silently, although unaligned... OK for trunk ? The failure occurs with the libgcc, so tested by allowing the sh-elf build to complete. OK. Regards, kaz