Re: [00/32] Support multiple ABIs in the same translation unit
On Wednesday, September 11, 2019, Richard Sandiford < richard.sandif...@arm.com> wrote:. > > > Sorry for the long write-up. > > Richard > *thanks* for the long write-up! Ciao! Steven
Re: [PATCH] rs6000: Make CSE'ing __tls_get_addr calls possible
On Sun, Mar 24, 2019 at 12:46 AM Segher Boessenkool wrote: > > CSE does not consider calls, not even const calls. This patch puts a > REG_EQUAL note on the pseudo we assign the __tls_get_addr result to, > so that those pseudos can be CSE'd and the extra calls deleted as dead > code. Hi Segher, There were REG_EQUAL notes on these tls calls in the past, but I recall removing them for one reason or another. So watch out for fall-out from this patch! ;-) Ciao! Steven
Re: [PATCH] Fix PR89150, GC of tree-form bitmaps
On Mon, Feb 4, 2019 at 2:16 PM Richard Biener wrote: > When I introduced tree-form bitmaps I forgot to think about GC. > The following drops the chain_prev annotation to make the marker > work for trees. I don't understand this patch. How are the nodes in a bitmap tree now to be found for marking? This patch should cause an ICE on test cases with bitmaps as trees with ENABLE_GC_ALWAYS_COLLECT. Ciao! Steven
Re: [PATCH] Remove a barrier when EDGE_CROSSING is remoed (PR lto/88858).
On Mon, Feb 4, 2019 at 9:10 AM Martin Liška wrote: > > @Honza: PING^1 > >>> + else > >>> + { > >>> + if (PREV_INSN (insn)) > >>> + SET_NEXT_INSN (PREV_INSN (insn)) = NEXT_INSN (insn); > >>> + if (NEXT_INSN (insn)) > >>> + SET_PREV_INSN (NEXT_INSN (insn)) = PREV_INSN (insn); > >>> + } > >>> + } This looks like remove_insn(). Ciao! steven
Re: [PATCH] sched-ebb.c: avoid moving table jumps (PR rtl-optimization/88423)
On Fri, Jan 18, 2019 at 5:43 PM David Malcolm wrote: > (1) What does GCC mean by "ebb" in this context? In this context, the EBB is what Muchnick would call a trace. Ciao! Steven
Re: [PATCH] Add splay-tree "view" for bitmap
On Fri, Oct 19, 2018 at 8:46 AM Richard Biener <> wrote: > Yeah. I also noticed some 'obvious' shortcomings in the heuristics... > I guess in the end well predicted branches in the out of line code are > important... What also would help is to put bitmaps on their own obstack to improve cache locality. As for the patch, I never hacked it with "production code" in mind, it was just a proof of concept. Not all of it is optimal or even safe as-is. For example you probably should add "gcc_checking_assert(!(BITMAP)->tree-form)" tests in the bmp_iter_*_init functions. And perhaps semi-splaying trees work better for the use cases of GCC (x.f. "Rehabilitation of an unloved child: semi-splaying"). I implemented classic splay trees because I could not find a semi-splay tree implementation in any of the usual text books while classic splay tree implementations were given in all of those books ;-) Ciao! Steven
Re: [PATCH][RFC] Fix PR63155 (some more)
On Wed, Sep 19, 2018 at 3:06 PM Richard Biener wrote: > If we'd only had a O(log n) search sparse bitmap implementation ... > (Steven posted patches to switch bitmap from/to such one but IIRC > that at least lacked bitmap_first_set_bit). But bitmap_first_set_bit would be easy to implement. Just take the left-most node of the splay tree. Actually all bit-tests would be easy to implement. It's only enumeration and set operations on the tree-views that would be complicated fluff (easier to "switch views" than re-implement). Impressive that you remember >5yr old patches like that ;-) Ciao! Steven
Re: [PATCH] Remove verify_no_unreachable_blocks
On Thu, Aug 23, 2018 at 1:18 PM Richard Biener <> wrote: > -/* Verify that there are no unreachable blocks in the current function. */ > - > -void > -verify_no_unreachable_blocks (void) > -{ > - find_unreachable_blocks (); > - > - basic_block bb; > - FOR_EACH_BB_FN (bb, cfun) > -gcc_assert ((bb->flags & BB_REACHABLE) != 0); Alternatively, just clear BB_REACHABLE here? FOR_EACH_BB_FN (bb, cfun) { gcc_assert ((bb->flags & BB_REACHABLE) != 0); bb->flags &= ~BB_REACHABLE; } The function is quite useful for debugging, I wouldn't remove it. Ciao! Steven
Re: [PATCH 4/4] rs6000: Delete old add+cmp patterns
On Thu, Aug 16, 2018 at 7:14 PM, Segher Boessenkool <> wrote: > There are some patterns that recognise the parallel of an add and a > compare, and split it back to the same two insns. This apparently > helped RIOS machines before RTL scheduling existed? Either way, it > isn't helpful anymore, and even hurts a tiny bit. So, delete it. It's been there since r251 (initial revision). On the topic of archaeology: Perhaps the time also finally has come to revisit this comment in rs6000.md: 13065; FIXME: This would probably be somewhat simpler if the Cygnus sibcall 13066; stuff was in GCC. <...> :-) Ciao! Steven
Re: [PATCH 2/4] Switch other switch expansion methods into classes.
On Tue, Jun 12, 2018 at 10:44 PM, Jeff Law wrote: > On 06/05/2018 01:15 AM, marxin wrote: >> >> + The definition of "much bigger" depends on whether we are >> + optimizing for size or for speed. If the former, the maximum >> + ratio range/count = 3, because this was found to be the optimal >> + ratio for size on i686-pc-linux-gnu, see PR11823. The ratio >> + 10 is much older, and was probably selected after an extensive >> + benchmarking investigation on numerous platforms. Or maybe it >> + just made sense to someone at some point in the history of GCC, >> + who knows... */ > "much older" is an understatement. I believe the magic "10" pre-dates > my involvement in GCC. You can find evidence of it as far back as > gcc-0.9. I doubt it was extensively benchmarked, and even if it was, > the targets on which it was benchmarked don't reflect modern target > reality in terms of importance. When I added this comment (https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/stmt.c?r1=189284&r2=189285&;) it as an attempt at humor. I should have turned the number into a PARAM at the time. Maybe that's something Martin could still do now? Ciao! Steven
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Thu, Aug 3, 2017 at 2:56 PM, Richard Biener wrote: > I think the main reason for not doing it early is the benefit is small > (unless it is GIMPLE optimizations triggering) Agree. > and we can't get rid of > switches completely because we eventually have to support casei RTL expansion. > (and no, computed goto with an array of label addresses at GIMPLE is really > too ugly ;)) What I would have done, is lower all gswitch statements that are to be lowered to something other than a tablejump. So by the time you get to RTL expansion, all remaining gswitch statements would be tablejump or casesi. Ciao! Steven
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > Hello. > > After some discussions with Honza, I've decided to convert current code in > stmt.c that > is responsible for switch expansion. More precisely, I would like to convert > the code > to expand gswitch statements on tree level. Currently the newly created pass > is executed > at the end of tree optimizations. Hah, something I promissed myself (and others) to do years ago! Thanks thanks thanks! :-) The end of the gimple optimizations is seems late for the lowering. Before there were gimple optimizations, switch lowering was part of the first compiler pass (to generate RTL from the front end) *before* all code transformation passes ("optimizations" ;-). Because the lowering of switch statements was rewritten as a gimple lowering pass, it now runs *after* optimizations. But to be honest, I can't think of any optimization opportunities exposed by lowering earlier than at the end of gimple optimizations. Years ago there was some RTL jump threading still done after lowering of small switch statements, but I can't find the related PRs anymore. > My plan for future is to inspire in [1] and come up with some more > sophisticated switch > expansions. For that I've been working on a paper where I'll summarize > statistics based > on what I've collected in openSUSE distribution with specially instrumented > GCC. If I'll be > happy I can also fit in to schedule of this year's Cauldron with a talk. Sayle's paper is a good starting point. Also interesting: http://llvm.org/devmtg/2017-02-04/Efficient-clustering-of-case-statements-for-indirect-branch-prediction.pdf About adjusting the size of jump tables to the capabilities of the CPU branch predictor. Mixed results but something to consider. Kannan & Proebsting, "CORRECTION TO ‘PRODUCING GOOD CODE FOR THE CASE STATEMENT’" About finding "clusers" of given density within the target values of the switch statement. The clustering algorithm as presented is N^2 in the number of case labels, but the idea is interesting and a simplified, cheaper approach may be possible. This is already used in LLVM, it seems. The real challenge will be to figure out how to pick from all these different approaches the right ones to lower a single switch statement. Ciao! Steven
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Wed, Aug 2, 2017 at 1:51 PM, Richard Biener wrote: > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: >> Hello. >> >> After some discussions with Honza, I've decided to convert current code in >> stmt.c that >> is responsible for switch expansion. More precisely, I would like to convert >> the code >> to expand gswitch statements on tree level. Currently the newly created pass >> is executed >> at the end of tree optimizations. Hah, something I promissed myself (and others) to do years ago! Thanks thanks thanks! :-) >> My plan for future is to inspire in [1] and come up with some more >> sophisticated switch >> expansions. For that I've been working on a paper where I'll summarize >> statistics based >> on what I've collected in openSUSE distribution with specially instrumented >> GCC. If I'll be >> happy I can also fit in to schedule of this year's Cauldron with a talk. Sayle's paper is a good starting point. Also interesting: >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Thoughts? > > First of all thanks. > > I think part of switch expansion moved to switch-conversion some time ago > (emit_case_bit_tests). So maybe the full lowering should be in at least > the same source file and it should maybe applied earlier for a subset of > cases (very low number of cases for example). > > Did you base the code on the RTL expansion code or did you re-write it from > scratch? > > No further comments sofar. > > Richard. > >> Martin >> >> [1] https://www.nextmovesoftware.com/technology/SwitchOptimization.pdf >> >> gcc/ChangeLog: >> >> 2017-07-31 Martin Liska >> >> * Makefile.in: Add gimple-switch-low.o. >> * passes.def: Include pass_lower_switch. >> * stmt.c (dump_case_nodes): Remove and move to >> gimple-switch-low.c. >> (case_values_threshold): Likewise. >> (expand_switch_as_decision_tree_p): Likewise. >> (emit_case_decision_tree): Likewise. >> (expand_case): Likewise. >> (balance_case_nodes): Likewise. >> (node_has_low_bound): Likewise. >> (node_has_high_bound): Likewise. >> (node_is_bounded): Likewise. >> (emit_case_nodes): Likewise. >> * timevar.def: Add TV_TREE_SWITCH_LOWERING. >> * tree-pass.h: Add make_pass_lower_switch. >> * gimple-switch-low.c: New file. >> >> gcc/testsuite/ChangeLog: >> >> 2017-07-31 Martin Liska >> >> * gcc.dg/tree-prof/update-loopch.c: Scan patterns in >> switchlower. >> * gcc.dg/tree-ssa/vrp104.c: Likewise. >> --- >> gcc/Makefile.in|1 + >> gcc/gimple-switch-low.c| 1226 >> >> gcc/passes.def |1 + >> gcc/stmt.c | 861 - >> gcc/testsuite/gcc.dg/tree-prof/update-loopch.c | 10 +- >> gcc/testsuite/gcc.dg/tree-ssa/vrp104.c |2 +- >> gcc/timevar.def|1 + >> gcc/tree-pass.h|1 + >> 8 files changed, 1236 insertions(+), 867 deletions(-) >> create mode 100644 gcc/gimple-switch-low.c >> >>
Re: [PATCH] Fix PR middle-end/81564: ICE in group_case_labels_stmt()
On Wed, Jul 26, 2017 at 9:35 PM, Peter Bergner wrote: > The test case for PR81564 exposes an issue where the case labels for a > switch statement point to blocks that have already been removed by an > earlier call to cleanup_tree_cfg(). In that case, the code in > group_case_labels_stmt() that does: How can a basic block be removed (apparently as unreachable) if there are still case labels leading to it? Apparently there is enough information to make CASE_LABEL be set to NULL. Why is the case label not just removed (or redirected to the default, or ...)? The patch feels like it's papering over another issue. group_case_labels is an optional thing to do, basically just a simplification. The compiler should run even if you never group the case labels... Ciao! Steven
Re: [PATCH][PR 59521] Respect probabilities when expanding switch statement
On Tue, Jul 18, 2017 at 9:04 AM, Yuri Gribov wrote: > Hi all, > > Currently all cases in switch statement are treated as having equal > probabilities which causes suboptimal code as demonstrated in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59521 . This patch > modifies expander to select pivot point for decision tree so that > probabilities of cases on the left are roughly equal to probabilities > on the right. > > Patch survives bootstrap and regtesting on x64 but has some issues: > * tests are fragile but I'm not sure how to make them better > * I haven't done any performance measurements - would these be needed? > I don't have access to SPEC these days, any other suggestions? > > Patch is jointly authored with Martin. Hi Yuri, Can you come up with test cases that don't scan the assembly output? Ideally the test case should check a dump file that is as close as possible to the code transformation, in this case the dump from pass_expand. Ciao! Steven
Re: [PATCH] Improve RTL ifcvt for empty else_bb (PR rtl-optimization/80491)
On Wed, Apr 26, 2017 at 1:25 PM, Jakub Jelinek wrote: > Or shall we just: > --- gcc/alias.c 2017-04-25 15:51:31.072923325 +0200 > +++ gcc/alias.c 2017-04-26 13:23:55.595048464 +0200 > @@ -3221,6 +3221,8 @@ memory_modified_in_insn_p (const_rtx mem > { >if (!INSN_P (insn)) > return false; > + if (CALL_P (insn)) > +return true; +"&& ! RTL_CONST_OR_PURE_CALL_P (insn)" ? Ciao! Steven >memory_modified = false; >note_stores (PATTERN (insn), memory_modified_1, CONST_CAST_RTX(mem)); >return memory_modified; > > instead of the call_crossed hacks? Then modified_between_p and > modified_in_p would return true for !MEM_READONLY_P MEMs crossing a call. > > Jakub
Re: [PATCH] Reenable RTL sharing verification
On Wed, Nov 30, 2016 at 1:08 PM, Jakub Jelinek wrote: > Hi! > > The http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01055.html > change broke all RTL sharing verification, even with --enable-checking=rtl > we don't verify anything for the last 3.5 years. Eh, I guess "oops!" doesn't quite cover that error. Sorry! Ciao! Steven
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener wrote: > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote: >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote: >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote: >>> > + cfg_layout_finalize (); >>> >>> I don't think you have to go into/out-of cfglayout mode for each iteration. >> >> Yeah probably. I was too lazy :-) It needs the cleanup_cfg every >> iteration though, I was afraid that interacts. > > Ick. Why would it need a cfg-cleanup every iteration? I don't believe it needs a cleanup on every iteration. One cleanup at the end should work fine. Ciao! Steven
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote: > This patch solves this problem by simply running the duplicate_computed_gotos > pass again, as long as it does any work. The patch looks much bigger than > it is, because I factored out two routines to simplify the control flow. It's made the patch a bit difficult to read. Condensing it a bit... > + for (;;) > { > + if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1) > + return 0; This test should not be needed in the loop. This pass can never collapse the function to a single basic block. > + clear_bb_flags (); > + cfg_layout_initialize (0); See comment below... > + basic_block bb; > + FOR_EACH_BB_FN (bb, fun) > + { > + /* Build the reorder chain for the original order of blocks. */ > + if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun)) > + bb->aux = bb->next_bb; > + } > > + duplicate_computed_gotos_find_candidates (fun, candidates, max_size); > > + bool changed = false; > + if (!bitmap_empty_p (candidates)) > + changed = duplicate_computed_gotos_do_duplicate (fun, candidates); > > + if (changed) > + fixup_partitions (); > + > + cfg_layout_finalize (); I don't think you have to go into/out-of cfglayout mode for each iteration. >/* Merge the duplicated blocks into predecessors, when possible. */ > + if (changed) > + cleanup_cfg (0); > + else > + break; > } Maybe a gcc_assert that the loop doesn't iterate more often than num_edges? Ciao! Steven
Re: [PATCH, libfortran] PR 48587 Newunit allocator
On Thu, Oct 13, 2016 at 5:16 PM, Janne Blomqvist wrote: > +static bool *newunits; You could make this a bitmap (like sbitmap). A bit more code but makes a potentially quadratic search (when opening many units) less time consuming. Ciao! Steven
Re: [PATCH] df: Keep return address register undefined until epilogue_completed
On Mon, Aug 29, 2016 at 6:50 PM, Segher Boessenkool wrote: > This patch changes that so that that def is only added after > epilogue_completed. ... > Does this work on all other targets? Guessing: not for targets where INCOMING_RETURN_ADDR_RTX is the stack register? That'd be at least h8300, rl78, and rx. Ciao! Steven
Re: Init df for split pass since some target use REG_NOTE in split pattern
On Mon, Aug 8, 2016 at 9:45 PM, Jeff Law wrote: > > I'm pretty sure we do _not_ want it for split_all_insns_noflow since that's > used when the CFG is not necessarily correct and thus I don't see how we can > reliably have death/unused notes. Actually, trying to initializing DF without a valid CFG will most likely just crash the compiler. DF doesn't work without a valid CFG. I expect the patch to fail for at least MIPS, SPARC, and ARM. Ciao! Steven
Re: [PATCH] Add code-hoisting to GIMPLE
On Mon, Jul 4, 2016 at 1:26 PM, Richard Biener wrote: > > The following patch is Stevens code-hoisting based on PRE forward-ported > and fixed for bootstrap plus the case of hoisting code across loops > which we generally do not want (expressions in the loop exit target block > are antic-in throughout the whole loop unless they are killed and thus > get inserted into the exit block and then PREd before the loop). > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > I'm going to try making the bitmap_set ops in do_hoist_insert a bit > faster - Steven, do you remember any issues with the approach from the > time you worked on it? Hi Richi, It's been almost 8 years since I worked on this, so I really don't recall much about this at all. Sorry :-) But thank you for picking this old work up! Ciao! Steven
Re: [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump
On Tue, May 3, 2016 at 8:59 AM, Segher Boessenkool wrote: > If the jump_block here contains just a return, we will crash later > in invert_jump. Don't allow that case. But if jump_block contains a return, then it isn't the EXIT_BLOCK for the function. Is the conditional jump a conditional return insn? Ciao! Steven
[patch] cleanups for vtable-verify
Hello, This patch is random cleanups on the vtable-verify code. OK for trunk? Ciao! Steven gcc/ * vtable-verify.h (verify_vtbl_ptr_fndecl): Add GTY markers. (num_vtable_map_nodes): Remove extern declaration. (vtbl_mangled_name_types, vtbl_mangled_name_ids): Likewise. * vtable-verify.c (num_vtable_map_nodes): Make static. (vtbl_mangled_name_types, vtbl_mangled_name_ids): Likewise. (verify_vtbl_ptr_fndecl): Remove redundant extern declaration. cp/ * vtable-class-hierarchy.c (vtable_find_or_create_map_decl): Make static. (vtv_compute_class_hierarchy_transitive_closure): Eliminate uses of num_vtable_map_nodes in lieu of vtbl_map_nodes_vec.length() and of vtbl_map_nodes_vec.iterate(). (guess_num_vtable_pointers, register_all_pairs, write_out_vtv_count_data, vtv_register_class_hierarchy_information, vtv_generate_init_routine): Likewise. gcc/ * vtable-verify.h (verify_vtbl_ptr_fndecl): Add GTY markers. (num_vtable_map_nodes): Remove extern declaration. (vtbl_mangled_name_types, vtbl_mangled_name_ids): Likewise. * vtable-verify.c (num_vtable_map_nodes): Make static. (vtbl_mangled_name_types, vtbl_mangled_name_ids): Likewise. (verify_vtbl_ptr_fndecl): Remove redundant extern declaration. cp/ * vtable-class-hierarchy.c (vtable_find_or_create_map_decl): Make static. (vtv_compute_class_hierarchy_transitive_closure): Eliminate uses of num_vtable_map_nodes in lieu of vtbl_map_nodes_vec.length() and of vtbl_map_nodes_vec.iterate(). (guess_num_vtable_pointers, register_all_pairs, write_out_vtv_count_data, vtv_register_class_hierarchy_information, vtv_generate_init_routine): Likewise. Index: vtable-verify.h === --- vtable-verify.h (revision 235623) +++ vtable-verify.h (working copy) @@ -27,12 +27,8 @@ along with GCC; see the file COPYING3. If not see be global because it needs to be initialized in the C++ front end, but used in the middle end (in the vtable verification pass). */ -extern tree verify_vtbl_ptr_fndecl; +extern GTY(()) tree verify_vtbl_ptr_fndecl; -/* Global variable keeping track of how many vtable map variables we - have created. */ -extern unsigned num_vtable_map_nodes; - /* Keep track of how many virtual calls we are actually verifying. */ extern int total_num_virtual_calls; extern int total_num_verified_vcalls; @@ -127,10 +123,6 @@ extern bool vtv_debug; /* The global vector of vtbl_map_nodes. */ extern vec vtbl_map_nodes_vec; -/* The global vectors for mangled class names for anonymous classes. */ -extern GTY(()) vec *vtbl_mangled_name_types; -extern GTY(()) vec *vtbl_mangled_name_ids; - extern void vtbl_register_mangled_name (tree, tree); extern struct vtbl_map_node *vtbl_map_get_node (tree); extern struct vtbl_map_node *find_or_create_vtbl_map_node (tree); Index: vtable-verify.c === --- vtable-verify.c (revision 235623) +++ vtable-verify.c (working copy) @@ -144,11 +144,13 @@ along with GCC; see the file COPYING3. If not see #include "vtable-verify.h" -unsigned num_vtable_map_nodes = 0; +/* Global variable keeping track of how many vtable map variables we + have created. */ +static unsigned num_vtable_map_nodes = 0; + int total_num_virtual_calls = 0; int total_num_verified_vcalls = 0; -extern GTY(()) tree verify_vtbl_ptr_fndecl; tree verify_vtbl_ptr_fndecl = NULL_TREE; /* Keep track of whether or not any virtual call were verified. */ @@ -305,10 +307,8 @@ static vtbl_map_table_type *vtbl_map_hash; vec vtbl_map_nodes_vec; /* Vector of mangled names for anonymous classes. */ -extern GTY(()) vec *vtbl_mangled_name_types; -extern GTY(()) vec *vtbl_mangled_name_ids; -vec *vtbl_mangled_name_types; -vec *vtbl_mangled_name_ids; +static GTY(()) vec *vtbl_mangled_name_types; +static GTY(()) vec *vtbl_mangled_name_ids; /* Look up class_type (a type decl for record types) in the vtbl_mangled_names_* vectors. This is a linear lookup. Return the associated mangled name for Index: cp/vtable-class-hierarchy.c === --- cp/vtable-class-hierarchy.c (revision 235623) +++ cp/vtable-class-hierarchy.c (working copy) @@ -137,8 +137,6 @@ struct work_node { struct work_node *next; }; -struct vtbl_map_node *vtable_find_or_create_map_decl (tree); - /* As part of vtable verification the compiler generates and inserts calls to __VLTVerifyVtablePointer, which is in libstdc++. This function builds and initializes the function decl that is used @@ -380,7 +378,7 @@ void vtv_compute_class_hierarchy_transitive_closure (void) { struct work_node *worklist = NULL; - sbitmap inserted = sbitmap_alloc (num_vtable_map_nodes); + sbitmap inserted = sbitmap_allo
Re: [PATCH] Ensure noce_convert_multiple_sets handles only multiple sets (PR rtl-optimization/69570)
On Mon, Feb 1, 2016 at 9:32 AM, Jakub Jelinek wrote: > Bootstrapped/regtested on > {x86_64,i686,ppc64,ppc64le,s390,s390x,aarch64}-linux, > ok for trunk? OK. Browny points for opting out of the loop over all insns in the basic block when count > limit. Ciao! Steven
Re: Try to update dominance info in tree-call-cdce.c
On Fri, Oct 30, 2015 at 2:11 PM, Richard Sandiford wrote: > Is the split_block change really so bad? IMHO: Yes. split_block just splits some basic block B into two blocks B1/B2 somewhere in the middle of B. The dominance relations between B1 and B2 are obvious and intuitive. The new flag to split_block is not. The dominance info is not updated but the loops are? Confusing, if you ask me... Your situation, where a pass knows the dominance relations will change, is unusual. The extra work to fix up the ET tree twice should be negligible anyway. Ciao! Steven
Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm
On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote: > + /* First, collect all edges that can be optimized by reordering blocks: > + simple jumps and conditional jumps, as well as the function entry edge. > */ > + > + int n = 0; > + edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0); > + > + basic_block bb; > + FOR_EACH_BB_FN (bb, cfun) > +{ > + rtx_insn *end = BB_END (bb); > + > + if (computed_jump_p (end) || tablejump_p (end, NULL, NULL)) > + continue; Should handle ASM jumps. > + FOR_ALL_BB_FN (bb, cfun) > +bb->aux = bb; Bit tricky for the ENTRY and EXIT blocks, that are not really basic blocks. After the pass, EXIT should not end up pointing to itself. Maybe use FOR_EACH_BB_FN and set ENTRY separately? Other than that, looks good to me. Ciao! Steven
Re: [PATCH 1/4] bb-reorder: Split out STC
On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote: > This first patch simply factors code a little bit. > > > 2015-09-23 Segher Boessenkool > > * bb-reorder.c (reorder_basic_blocks_software_trace_cache): New > function, factored out from ... > (reorder_basic_blocks): ... here. OK. Ciao! Steven
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law wrote: > > So there's a tight relationship between the implementation of > bbs_ok_for_cmove_arith and insn_valid_noce_process_p. If there wasn't, then > we'd probably be looking to use note_stores and note_uses. Perhaps I'm missing something, but what is wrong with using DF here instead of note_stores/note_uses? All the info on refs/mods of registers is available in the DF caches. Ciao! Steven
Re: [PATCH] Fix PR66168: ICE due to incorrect invariant register info
On Tue, May 19, 2015 at 12:17 PM, Thomas Preud'homme wrote: > 2015-05-18 Thomas Preud'homme > > PR rtl-optimization/66168 > * loop-invariant.c (move_invariant_reg): Set inv->reg to destination > of inv->insn when moving an invariant without introducing a temporary > register. Not OK. This will break in move_invariants() when it looks at REGNO (inv->reg). Ciao! Steven
Re: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
On Thu, Apr 16, 2015 at 10:43 AM, Thomas Preud'homme wrote: > 2015-04-15 Thomas Preud'homme > Steven Bosscher > > * cprop.c (cprop_reg_p): New. > (hash_scan_set): Use above function to check if register can be > propagated. > (find_avail_set): Return up to two sets, one whose source is > a register and one whose source is a constant. Sets are returned in > an array passed as parameter rather than as a return value. > (cprop_insn): Use a do while loop rather than a goto. Try each of the > sets returned by find_avail_set, starting with the one whose source is > a constant. Use cprop_reg_p to check if register can be propagated. > (do_local_cprop): Use cprop_reg_p to check if register can be > propagated. > (implicit_set_cond_p): Likewise. I wouldn't usually approve patches I've coded bits in myself. But this post is now 7 days old and it's Thomas' patch for 99%, so... OK for trunk. Can you please put steven at gcc.gnu.org for my e-mail address in the ChangeLog entry? Ciao! Steven
Re: [PR65768] Check rtx_cost when propagating constant
On Wed, Apr 15, 2015 at 9:53 AM, Kugan wrote: > 2015-04-15 Kugan Vivekanandarajah < > > Zhenqiang Chen <> > > PR target/65768 > * cprop.c (try_replace_reg): Check cost of constants before > propagating. > + > + /* For CONSTANT_P (to), loop2_invariant pass might hoist it out the loop. > + And it can be shared by different references. So skip propagation if > + it makes INSN's rtx cost higher. */ > + So only undo if the insn is inside a loop (i.e. BLOCK_FOR_INSN(insn)->loop_father != NULL) and this is a post-pass_loop2 cprop run? Ciao! Steven
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
On Tue, Apr 14, 2015 at 5:06 PM, Jiong Wang wrote: > but, after some investigation I found gcc actually generate difference > code when debug info enabled because > DEBUG_INSN will affect data flow analysis. It should not, so that's a bug. > So I think this stage2/3 binary difference is acceptable? No, they should be identical. If there's a difference, then there's a bug - which, it seems, you've already found, too. Ciao! Steven
Re: [PATCH] Fix rtl_split_edge (PR rtl-optimization/65761)
On Tue, Apr 14, 2015 at 2:58 PM, Jakub Jelinek wrote: > PR rtl-optimization/65761 > * cfgrtl.c (rtl_split_edge): For EDGE_CROSSING split, use > get_last_bb_insn (after) instead of NEXT_INSN (BB_END (after)). > > --- gcc/cfgrtl.c.jj 2015-04-12 21:50:18.0 +0200 > +++ gcc/cfgrtl.c2015-04-14 12:45:34.127722958 +0200 > @@ -1928,7 +1928,7 @@ rtl_split_edge (edge edge_in) >&& (edge_in->flags & EDGE_CROSSING)) > { >after = last_bb_in_partition (edge_in->src); > - before = NEXT_INSN (BB_END (after)); > + before = get_last_bb_insn (after); >/* The instruction following the last bb in partition should > be a barrier, since it cannot end in a fall-through. */ >gcc_checking_assert (BARRIER_P (before)); This is OK. Ciao! Steven
Re: breakage with "[PATCH] combine: Disregard clobbers in another test for two SETs (PR65693)"
On Thu, Apr 9, 2015 at 2:41 PM, Segher Boessenkool wrote: > It would be nice if there would be some cc0 target in the compile farm, > since cc0 isn't going away any time soon :-( In this case it would be enough to replace the "#ifndef/#ifdef HAVE_cc0" code with "if (HAVE_cc0)". That's the simplest way to avoid compile breakage. Likewise for so many other #ifdef code (HAVE_conditional_move, HAVE_trap, etc.). Perhaps something to work on in the next stage1... Ciao! Steven
Re: [PATCH] Disable ppc/spu context sensitive macros for CLK_ASM preprocessing (PR preprocessor/61977)
On Wed, Apr 1, 2015 at 10:40 AM, Jakub Jelinek wrote: > Hi! > > The context sensitive macros are inherently C/C++ specific, so trying to > expand them into anything when preprocessing assembler doesn't make any > sense to me. Why are the -c.c cpp builtins defined at all when preprocessing assembly? Or in other words: should these (supposedly) language-dependent hooks for cpp builtins be called if the pre-processor is called stand-alone? Ciao! Steven
Re: [PR64164] drop copyrename, integrate into expand
On Sat, Mar 28, 2015 at 8:21 PM, Alexandre Oliva wrote: > Regstrapped on x86_64-linux-gnu native and on i686-pc-linux-gnu native > on x86_64, so without lto plugin. The only regression is in > gcc.dg/guality/pr54200.c, that explicitly disables VTA. What about memory footprint? IIRC this pass was in part introduced to reduce the number of VAR_DECLs. Ciao! Steven
Re: [PATCH 3/3] Fix dbr_schedule for -freorder-blocks-and-partition
On Tue, Jan 27, 2015 at 12:52 AM, Kaz Kojima wrote: > This patch is to fix 2 issues found in dbr_schedule when trying to > fix PR target/64761. The first is relax_delay_slots removes > the jump insn in the insns like below: > > (jump_insn/j 74 58 59 (set (pc) (label_ref:SI 29)) ...) > (barrier 59 74 105) > (note 105 59 29 NOTE_INSN_SWITCH_TEXT_SECTIONS) > (code_label 29 105 30 31 "" [5 uses]) > (insn 31 30 32 (set (reg ... > > i.e. relax_delay_slot tries to delete the jump insn pointing to > the next active insn of that jump insn as a trivial jump even when > there is a NOTE_INSN_SWITCH_TEXT_SECTIONS note between that jump > and its next active insn. > The second issue is that relax_delay_slots does a variant of > follow jump optimization without checking targetm.can_follow_jump. > > -- > PR target/64761 > * reorg.c (switch_text_sections_between_p): New function. > (relax_delay_slots): Call it when testing if the jump insn > is removable. Use targetm.can_follow_jump when testing if > the conditional branch can follow an unconditional jump. This patch merely papers over another issue, probably a missing CROSSING_JUMP_P test. Ciao! Steven > diff --git a/reorg.c b/reorg.c > index 326fa53..2387910 100644 > --- a/reorg.c > +++ b/reorg.c > @@ -3211,6 +3211,19 @@ label_before_next_insn (rtx x, rtx scan_limit) >return insn; > } > > +/* Return TRUE if there is a NOTE_INSN_SWITCH_TEXT_SECTIONS note in between > + BEG and END. */ > + > +static bool > +switch_text_sections_between_p (const rtx_insn *beg, const rtx_insn *end) > +{ > + const rtx_insn *p; > + for (p = beg; p != end; p = NEXT_INSN (p)) > +if (NOTE_P (p) && NOTE_KIND (p) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > + return true; > + return false; > +} > + > > /* Once we have tried two ways to fill a delay slot, make a pass over the > code to try to improve the results and to do such things as more jump > @@ -3247,7 +3260,8 @@ relax_delay_slots (rtx_insn *first) > target_label = find_end_label (target_label); > > if (target_label && next_active_insn (target_label) == next > - && ! condjump_in_parallel_p (insn)) > + && ! condjump_in_parallel_p (insn) > + && ! (next && switch_text_sections_between_p (insn, next))) > { > delete_jump (insn); > continue; > @@ -3262,12 +3276,13 @@ relax_delay_slots (rtx_insn *first) > > /* See if this jump conditionally branches around an unconditional > jump. If so, invert this jump and point it to the target of the > -second jump. */ > +second jump. Check if it's possible on the target. */ > if (next && simplejump_or_return_p (next) > && any_condjump_p (insn) > && target_label > && next_active_insn (target_label) == next_active_insn (next) > - && no_labels_between_p (insn, next)) > + && no_labels_between_p (insn, next) > + && targetm.can_follow_jump (insn, next)) > { > rtx label = JUMP_LABEL (next); >
Re: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
On Mon, Mar 23, 2015 at 12:01 PM, Thomas Preud'homme wrote: > What about the cprop_reg_p that needs to be negated? Did I miss something > that makes it ok? You didn't miss anything. I sent the wrong patch. The one I tested on ppc64 also has the condition reversed: @@ -1328,9 +1329,8 @@ implicit_set_cond_p (const_rtx cond) if (GET_CODE (cond) != EQ && GET_CODE (cond) != NE) return false; - /* The first operand of COND must be a pseudo-reg. */ - if (! REG_P (XEXP (cond, 0)) - || HARD_REGISTER_P (XEXP (cond, 0))) + /* The first operand of COND must be a register we can propagate. */ + if (! cprop_reg_p (XEXP (cond, 0))) return false; /* The second operand of COND must be a suitable constant. */
Re: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
On Fri, Mar 20, 2015 at 11:27 AM, Thomas Preud'homme wrote: > Sorry, I missed the parenthesis. REG_P needs indeed to be kept. I'd be > tempted to use !HARD_REGISTER_P instead since REG_P is already > checked but I don't mind either way. I put the cprop_reg_p check there instead of !HARD_REGISTER_P because I like to be able to quickly find all places where a similar check is performed. The check is whether the reg is something that copy propagation can handle, and that is what I added cprop_reg_p for. (Note that cprop can _currently_ handle only pseudos but there is no reason why a limited set of hard regs can't be handled also, e.g. the flag registers like in targetm.fixed_condition_code_regs). In this case, the result is that REG_P is checked twice. But then again, cprop_reg_p will be inlined and the double check optimized away. Anyway, I guess we've bikeshedded long enough over this patch as it is :-) Let's post a final form and declare it OK for stage1. As for PSEUDO_REG_P: If it were up to me, I'd like to have in rtl.h: static bool hard_register_p (rtx x) { return (REG_P (x) && HARD_REGISTER_NUM_P (REGNO (x))); } static bool pseudo_register_p (rtx x) { return (REG_P (x) && !HARD_REGISTER_NUM_P (REGNO (x))); } and do away with all the FIRST_PSEUDO_REGISTER tests. But I've proposed this in the past and there was opposition. Perhaps when we introduce a rtx_reg class... Ciao! Steven
Re: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
On Tue, Feb 17, 2015 at 3:51 AM, Thomas Preud'homme wrote: >> > - else if (REG_P (src) >> > - && REGNO (src) >= FIRST_PSEUDO_REGISTER >> > - && REGNO (src) != regno) >> > - { >> > - if (try_replace_reg (reg_used, src, insn)) >> > + else if (src_reg && REG_P (src_reg) >> > + && REGNO (src_reg) >= FIRST_PSEUDO_REGISTER >> > + && REGNO (src_reg) != regno >> > + && try_replace_reg (reg_used, src_reg, insn)) >> >> Likewise for the REG_P and ">= FIRST_PSEUDO_REGISTER" tests here >> (with >> the equivalent and IMHO preferable HARD_REGISTER_P test in >> find_avail_set()). > > I'm not sure I follow you here. First, it seems to me that the equivalent > test is rather REG_P && !HARD_REGISTER_P since here it checks if it's > a pseudo register. > > Then, do you mean the test can be simply removed because of the > REG_P && !HARD_REGISTER_P in hash_scan_set () called indirectly by > compute_hash_table () when called in one_cprop_pass () before any > cprop_insn ()? Or do you mean I should move the check in > find_avail_set ()? What I meant, is that I believe the tests are already done in hash_scan_set and should be redundant in cprop_insn (i.e. the test can be replaced with gcc_[checking_]assert). I've attached a patch with some changes to it: introduce cprop_reg_p() to get rid of all the "REG_P && regno > FIRST_PSEUDO_REGISTER" tests. I still have the cprop_constant_p and cprop_reg_p tests in cprop_insn but this weekend I'll try with gcc_checking_asserts instead. Please have a look at the patch and let me know if you like it (given it's mostly yours I hope you do like it ;-) Bootstrapped & tested on powerpc64-unknown-linux-gnu. In building all of cc1, 35 extra copies are propagated with the patch. Ciao! Steven Index: cprop.c === --- cprop.c (revision 221523) +++ cprop.c (working copy) @@ -285,6 +285,15 @@ cprop_constant_p (const_rtx x) return CONSTANT_P (x) && (GET_CODE (x) != CONST || shared_const_p (x)); } +/* Determine whether the rtx X should be treated as a register that can + be propagated. Any pseudo-register is fine. */ + +static bool +cprop_reg_p (const_rtx x) +{ + return REG_P (x) && !HARD_REGISTER_P (x); +} + /* Scan SET present in INSN and add an entry to the hash TABLE. IMPLICIT is true if it's an implicit set, false otherwise. */ @@ -295,8 +304,7 @@ hash_scan_set (rtx set, rtx_insn *insn, struct has rtx src = SET_SRC (set); rtx dest = SET_DEST (set); - if (REG_P (dest) - && ! HARD_REGISTER_P (dest) + if (cprop_reg_p (dest) && reg_available_p (dest, insn) && can_copy_p (GET_MODE (dest))) { @@ -321,9 +329,8 @@ hash_scan_set (rtx set, rtx_insn *insn, struct has src = XEXP (note, 0), set = gen_rtx_SET (VOIDmode, dest, src); /* Record sets for constant/copy propagation. */ - if ((REG_P (src) + if ((cprop_reg_p (src) && src != dest - && ! HARD_REGISTER_P (src) && reg_available_p (src, insn)) || cprop_constant_p (src)) insert_set_in_table (dest, src, insn, table, implicit); @@ -821,15 +828,15 @@ try_replace_reg (rtx from, rtx to, rtx_insn *insn) return success; } -/* Find a set of REGNOs that are available on entry to INSN's block. Return - NULL no such set is found. */ +/* Find a set of REGNOs that are available on entry to INSN's block. If found, + SET_RET[0] will be assigned a set with a register source and SET_RET[1] a + set with a constant source. If not found the corresponding entry is set to + NULL. */ -static struct cprop_expr * -find_avail_set (int regno, rtx_insn *insn) +static void +find_avail_set (int regno, rtx_insn *insn, struct cprop_expr *set_ret[2]) { - /* SET1 contains the last set found that can be returned to the caller for - use in a substitution. */ - struct cprop_expr *set1 = 0; + set_ret[0] = set_ret[1] = NULL; /* Loops are not possible here. To get a loop we would need two sets available at the start of the block containing INSN. i.e. we would @@ -869,8 +876,10 @@ try_replace_reg (rtx from, rtx to, rtx_insn *insn) If the source operand changed, we may still use it for the next iteration of this loop, but we may not use it for substitutions. */ - if (cprop_constant_p (src) || reg_not_set_p (src, insn)) - set1 = set; + if (cprop_constant_p (src)) + set_ret[1] = set; + else if (reg_not_set_p (src, insn)) + set_ret[0] = set; /* If the source of the set is anything except a register, then we have reached the end of the copy chain. */ @@ -881,10 +890,6 @@ try_replace_reg (rtx from, rtx to, rtx_insn *insn) and see if we have an available copy into SRC. */ regno = REGNO (src); } - - /* SET1 holds the last set that was available and anticipatable at -
Re: [PATCH][gcse] Use emit_move_insn rather than creating SET rtx and emitting that
On Thu, Mar 19, 2015 at 2:45 PM, Kyrill Tkachov wrote: > As pointed out by James Greenhalgh offline the correct thing would have been > to do an > emit_move_insn to let the backend expanders do the right thing (especially > in the concerned > testcase gcc.c-torture/execute/pr65427.c that uses 256-bit vectors that arm > doesn't support > natively). This is supposed to be caught by want_to_gcse_p() via can_assign_to_reg_without_clobbers_p(). How does your expression get past that barrier? The gcc_unreachable() is there because all the code in gcse.c assumes it is OK to emit a SET-insn without going through emit_move_insn(). Ciao! Steven
Re: [PATCH] Run DCE after if conversion
On Tue, Mar 10, 2015 at 8:57 AM, Andreas Krebbel wrote: > > * gcc/ifcvt.c (if_convert): > ...yes...? > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index a3e3e5c..d2040af 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -4626,6 +4626,13 @@ if_convert (bool after_combine) >num_true_changes); > } > > + if (num_true_changes > 0) > +{ > + df_set_flags (DF_LR_RUN_DCE); > + df_mark_solutions_dirty (); > + df_analyze (); > +} > + > if (optimize == 1) > df_remove_problem (df_live); Tiny nail, huge hammer. This triggers a full re-scan of all insns and a re-calculation of all dataflow problems. The transformations in ifcvt are all simple enough that it should be possible to just clean up that redundant insn at the site where the code transformation is performed. Ciao! Steven
Re: [PATCH] Improve PR44563
On Mon, Mar 9, 2015 at 4:12 PM, Richard Biener wrote: > ! /* This is a really poor hash function, but it is what the current code > uses, > ! so I am reusing it to avoid an additional axis in testing. */ This is a bit silly as a comment, because after your patch the "current" code is the patched code. Better to reference the htab_hash_pointer function. But can't we add an inline version in hashtab.h instead? Ciao! Steven
Re: [PATCH, stage1] Move insns without introducing new temporaries in loop2_invariant
On Thu, Mar 5, 2015 at 10:53 AM, Thomas Preud'homme wrote: > diff --git a/gcc/dominance.c b/gcc/dominance.c > index 33d4ae4..09c8c90 100644 > --- a/gcc/dominance.c > +++ b/gcc/dominance.c > @@ -982,7 +982,7 @@ nearest_common_dominator_for_set (enum cdi_direction dir, > bitmap blocks) > > A_Dominated_by_B (node A, node B) > { > - return DFS_Number_In(A) >= DFS_Number_In(A) > + return DFS_Number_In(A) >= DFS_Number_In(B) > && DFS_Number_Out (A) <= DFS_Number_Out(B); > } */ This hunk is obvious enough ;-) > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index f79b497..ab2a45c 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c ... > + /* Check whether the set is always executed. We could omit this condition > if > + we know that the register is unused outside of the loop, but it does not > + seem worth finding out. */ > + may_exit = BITMAP_ALLOC (NULL); > + has_exit = BITMAP_ALLOC (NULL); > + always_executed = BITMAP_ALLOC (NULL); > + body = get_loop_body_in_dom_order (loop); > + find_exits (loop, body, may_exit, has_exit); > + compute_always_reached (loop, body, has_exit, always_executed); > + /* Find bit position for basic block bb. */ > + for (i = 0; i < loop->num_nodes && body[i] != bb; i++); > + if (!bitmap_bit_p (always_executed, i)) > +goto cleanup; It looks like this would run for all candidate loop invariants, right? If so, you're creating run time of O(n_invariants*n_bbs_in_loop), a potential compile time hog for large loops. But why compute this at all? Perhaps I'm missing something, but you already have inv->always_executed available, no? > + /* Check that all uses reached by the def in insn would still be reached > + it. */ > + dest_regno = REGNO (reg); > + for (use = DF_REG_USE_CHAIN (dest_regno); use; use = DF_REF_NEXT_REG (use)) > +{ > + rtx ref; Would be nice if we can start using rtx_insn for new code. You do so further up, I suggest you continue that good citizenship here :-) > + basic_block use_bb; > + > + ref = DF_REF_INSN (use); > + use_bb = BLOCK_FOR_INSN (ref); You can use DF_REF_BB. > + /* Ignore instruction considered for moving. */ > + if (ref == insn) > + continue; > + > + /* Don't consider uses outside loop. */ > + if (!flow_bb_inside_loop_p (loop, use_bb)) > + continue; > + > + /* Don't move if a use is not dominated by def in insn. */ > + if (use_bb == bb && DF_INSN_LUID (insn) > DF_INSN_LUID (ref)) > + goto cleanup; You're safer with ">=" even if you've already checked ref==insn. Ciao! Steven
Re: [PATCH] [RTL] Relax CSE check to set REG_EQUAL notes.
On Wed, Mar 4, 2015 at 12:09 PM, Alex Velenko wrote: > For example, in arm testcase pr43920-2.c, CSE previously decided not to put > an "obvious" note on insn 9, as set value was the same as note value. > At the same time, other insns set up as -1 were set up through a register > and did get a note: ...which is the point of the REG_EQUAL notes. In insn 8 there is a REG_EQUAL note to show that the value of r111 is known. In insn 9 the known value is, well, known from SET_SRC so there is no need for a REG_EQUAL note. Adding REG_EQUAL notes in such cases is just wasteful. > (insn 9 53 34 8 (set (reg:SI 110 [ D.4934 ]) > (const_int -1 [0x])) > /work/src/gcc/gcc/testsuite/gcc.target/arm/pr43920-2.c:21 613 > {*thumb2_movsi_vfp} > (nil)) > > (insn 8 45 50 6 (set (reg:SI 110 [ D.4934 ]) > (reg/v:SI 111 [ startD.4917 ])) > /work/src/gcc/gcc/testsuite/gcc.target/arm/pr43920-2.c:21 613 > {*thumb2_movsi_vfp} > (expr_list:REG_EQUAL (const_int -1 [0x]) > (nil))) > > (insn 6 49 54 7 (set (reg:SI 110 [ D.4934 ]) > (reg/v:SI 112 [ endD.4918 ])) > /work/src/gcc/gcc/testsuite/gcc.target/arm/pr43920-2.c:21 613 > {*thumb2_movsi_vfp} > (expr_list:REG_EQUAL (const_int -1 [0x]) > (nil))) > > Jump2 pass, optimizing common code, was looking at notes to reason about > register values and failing to recognize those insns to be equal. I suppose you are talking about the head/tail merging code? Can you please provide a test case for problem preferably filed in Bugzilla)? Ciao! Steven
Re: [patch, avr] Take 2: Fix PR64331: insn output and insn length computation rely on REG_DEAD notes.
On Sat, Feb 28, 2015 at 5:38 PM, Georg-Johann Lay wrote: > Am 02/26/2015 um 11:45 PM schrieb Steven Bosscher: >> >> On Thu, Feb 26, 2015 at 8:35 PM, Georg-Johann Lay wrote: >>> >>> Take #2 introduces a new, avr-specific rtl pass whose sole purpose is to >>> rectify notes. The pass is scheduled right before cfg does down (right >>> before .*free_cfg) so that cfg and hence df machinery is available. >>> >>> Regression tests look fine and for the test case the patches produce >>> correct >>> code and correct insn length. >> >> >> Sorry for party-pooping, but it seems to me that the real bug is that >> the target is lying to reload. >> >> IIUC the AVR port selects the insn alternative after register >> allocation (after reload). It bases its selection on REG_DEAD notes. >> In PR64331 an alternative is used that clobbers r20 that has a >> REG_DEAD note, but r20 is not actually dead because hardreg-cprop has >> propagated it forward without adjusting the note. > > > It's not actually about constraint alternatives. > > Let me give an example: Testing HI for 0. The usual sequence would be > > cc0 = reg.low == 0 > cc0 = cc0 && reg.high == 0 > > which costs 2 instructions. If reg is unused after, then ORing can be used > and cc0 will be set as a side effect. This costs 1 insn: > > cc0 = (reg.low |= reg.high) > > Using alternatives would double their number, i.e. 14 instead of 7 for > *cmphi. > > These constraint alternatives would have to express > > 1) reg-alloc, please, use alternative #1 (with clobber of reg) >if the register is unused after > > 2) reg-alloc, please, use alternative #2 (not clobbering reg) >if the register is used after > > If we assume for a moment that we have a *testhi insn and the old *tsthi had > just 1 alternative: > > > (define_insn "*tsthi" > [(set (cc0) > (compare > (match_operand:ALL2 0 "register_operand" "r") > (const_int 0)))] > ...) > > > Then the new one would be something like > > > (define_insn "*tsthi" > [(set (cc0) > (compare > (match_operand:ALL2 0 "register_operand" "r,r") > (const_int 0))) >(clobber (match_scratch:HI 1 "=0,X")] > ...) > > But how can I express 1) and 2) ? I don't think you can. Other ports express such transformations using define_peephole2 and peep2_reg_dead_p. > Currently, my preferred approach is a new drop-in replacement for the old > reg_unused_after which uses clobbers to decide whether or not op 0 is still > needed. That way, reg-alloc can work like before and there is no need to > implement dozens of new constraint alternatives across the md files. The problem with this approach is that it may break at random. There's just no guarantee that it will work, because you're relying on information that just isn't valid anymore. Unfortunately I don't know enough about CC0-targets to suggest an alternative. Ciao! Steven
Re: [patch, avr] Take 2: Fix PR64331: insn output and insn length computation rely on REG_DEAD notes.
On Thu, Feb 26, 2015 at 8:35 PM, Georg-Johann Lay wrote: > Take #2 introduces a new, avr-specific rtl pass whose sole purpose is to > rectify notes. The pass is scheduled right before cfg does down (right > before .*free_cfg) so that cfg and hence df machinery is available. > > Regression tests look fine and for the test case the patches produce correct > code and correct insn length. Sorry for party-pooping, but it seems to me that the real bug is that the target is lying to reload. IIUC the AVR port selects the insn alternative after register allocation (after reload). It bases its selection on REG_DEAD notes. In PR64331 an alternative is used that clobbers r20 that has a REG_DEAD note, but r20 is not actually dead because hardreg-cprop has propagated it forward without adjusting the note. The "normal" way of things is that the insn alternative is selected in reload (or in LRA) and that the clobbers are added as necessary. In PR64331, an alternative for insn r17 would be selected that has a CLOBBER for r20, prevent hardreg-cprop from propagating r20. Selecting insns based on REG-notes is dangerous business. Lying to reload and to post-RA passes is a mortal sin, the compiler will punish you. There is no guarantee that nothing will change between your new pass to recompute notes, and the final pass that emits the insns. It's not my port, for sure, but I would look for a real fix instead: Don't select insns to output based on unreliable information like REG-notes. Ciao! Steven
Re: [PATCH] gcc/reload.c: Initialize several arrays before use them in find_reloads()
On Mon, Feb 23, 2015 at 11:26 PM, Jeff Law wrote: > On 02/22/15 02:02, Chen Gang S wrote: >> >> It is for Bug65117, after this fix, ".i" file can be passed compiling. >> >>- 'this_alternative_win' is not initialized before use it: for the >> first looping 0, it initializes 'this_alternative_win[0]', but >> 'did_match' may use 'this_alternative_win[2]'. >> >>- 'this_alternative' may be not initialized before using: it >> initializes 'this_alternative[i]', but may use 'this_alternative[m]' >> (m > i). >> >>- After reading through the code, arrays 'this_alternative_match_win', >> 'this_alternative_offmemok', and 'this_alternative_earlyclobber' may >> be not initialized either, so initialize them too. >> >> This issue is found by cross compiling xtensa Linux kernel with the >> latest gcc5. And after this patch, it can cross compile xtensa Linux >> kernel with allmodconfig, successfully. >> >> >> 2015-02-22 Chen Gang >> >> * reload.c (find_reloads): Initialize several arrays before use >> them. > > > From the documentation for matching constraints: > > Moreover, the digit must be a > smaller number than the number of the operand that uses it in the > constraint. > > > If we look at the zero_cost_loop_{start,end} patterns we have: > > (if_then_else (ne (match_operand:SI 0 "register_operand" "2") > > and > > > (if_then_else (ne (match_operand:SI 0 "nonimmediate_operand" "2,2") > > Similarly for the loop_end pattern. > > > Which violate the rule for matching constraints. ...and should never have worked at all... > I'm confident that if the xtensa's patterns were fixed to abide by the rules > for matching constraints the problem in reload would not occur. Chen, perhaps a warming can be implemented for this in genrecog? Ciao! Steven
Re: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
On Mon, Feb 16, 2015 at 11:26 AM, Thomas Preud'homme wrote: > /* Subroutine of cprop_insn that tries to propagate constants into > @@ -1044,40 +1042,41 @@ cprop_insn (rtx_insn *insn) > - /* Constant propagation. */ > - if (cprop_constant_p (src)) > - { > - if (constprop_register (reg_used, src, insn)) > + /* Constant propagation. */ > + if (src_cst && cprop_constant_p (src_cst) > + && constprop_register (reg_used, src_cst, insn)) > { > changed_this_round = changed = 1; > global_const_prop_count++; The cprop_constant_p test is redundant, you only have non-NULL src_cst if it is a cprop_constant_p (as you test for it in find_avail_set()). > @@ -1087,18 +1086,16 @@ retry: >"GLOBAL CONST-PROP: Replacing reg %d in ", regno); > fprintf (dump_file, "insn %d with constant ", >INSN_UID (insn)); > - print_rtl (dump_file, src); > + print_rtl (dump_file, src_cst); > fprintf (dump_file, "\n"); > } > if (insn->deleted ()) > return 1; > } > - } > - else if (REG_P (src) > - && REGNO (src) >= FIRST_PSEUDO_REGISTER > - && REGNO (src) != regno) > - { > - if (try_replace_reg (reg_used, src, insn)) > + else if (src_reg && REG_P (src_reg) > + && REGNO (src_reg) >= FIRST_PSEUDO_REGISTER > + && REGNO (src_reg) != regno > + && try_replace_reg (reg_used, src_reg, insn)) Likewise for the REG_P and ">= FIRST_PSEUDO_REGISTER" tests here (with the equivalent and IMHO preferable HARD_REGISTER_P test in find_avail_set()). Looks good to me otherwise. Ciao! Steven
Re: [PATCH, GCC, stage1] Fallback to copy-prop if constant-prop not possible
On Mon, Feb 16, 2015 at 11:26 AM, Thomas Preud'homme wrote: > Hi, > > The RTL cprop pass in GCC operates by doing a local constant/copy propagation > first and then a global one. In the local one, if a constant cannot be > propagated (eg. due to constraints of the destination instruction) a copy > propagation is done instead. However, at the global level copy propagation is > only tried if no constant can be propagated, ie. if a constant can be > propagated but the constraints of the destination instruction forbids it, no > copy propagation will be tried. This patch fixes this issue. This solves the > redundant ldr problem in GCC32RM-439. > This would address https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34503#c4 I'll have a look at the patch tonight. Ciao! Seven
Re: [PATCH] Update BBs in cleanup_barriers pass (PR rtl-optimization/61058)
On Mon, Jan 26, 2015 at 10:11 AM, Jakub Jelinek wrote: > On Mon, Jan 26, 2015 at 10:06:05AM +0100, Eric Botcazou wrote: >> > While the cleanup_barriers runs after cleaning up BLOCK_FOR_INSNs, >> > some targets like i?86/x86_64 choose to populate it again during machine >> > reorg and some target don't free it at the end of machine reorg. >> > This patch updates cleanup_barrier pass, so that it adjusts basic block >> > boundaries and BLOCK_FOR_INSNs in that case, so that we don't crash during >> > final pass. >> >> This isn't a recent regression so what about fixing it more "properly"? For >> example, by calling free_bb_for_insn at the end of the machinre reorg passes >> which called compute_bb_for_insn at the beginning? Or do the affected ports >> need the BB info all the way down to final? > > Yes, they do, that is why it crashed during final. And they should not do so. It cannot be relied upon, in general. Even now, recomputing BLOCK_FOR_INSN only works because machine reorg is the first pass after free_cfg (so nothing has changed yet to the insns stream). Some ports (including x86_64/i386, ia64, and most non-sched_dbr ports) could simply do away with free_cfg and cleanup_barriers. But some ports put things into the insns stream after free_cfg that verify_flow_info chokes on, e.g. the fake insns for const pools for ARM (that causes bugs elsewhere also, see e.g. https://gcc.gnu.org/ml/gcc-patches/2011-07/msg02101.html). The current situation is confusing and bound to give bugs sooner or later... I had patches to have an "early" and "late" free_cfg pass -- I think I even posted them and I still believe that's the right short-term solution -- to make sure nobody can put something between free_cfg and a target with a machine_reorg that needs the CFG, or at least BLOCK_FOR_INSN. Oh well... Ciao! Steven
Re: Housekeeping work in backends.html
On Wed, Jan 7, 2015 at 9:42 AM, Eric Botcazou wrote: >> the attached patch removes obsolete ports (c4x, m68hc11 and ms1), toggles >> the 'p' letter and adjust accordingly (only avr, fr30, m68k, mcore, rs6000 >> and sh still use define_peephole) and removes trailing spaces. > > Same treatment for the 'd' letter, the ports that do not use DFA scheduler > descriptions are a clear minority (avr, cr16, cris, fr30, h8300, m32c, mmix, > msp430, pdp11, stormy16, vax). Applied. Perhaps 'd' should just go away completely. It was intended to distinguish between ports using the old scheduler description and ports using the DFA model. But support for the old scheduler description was removed some 10 years ago, and AFAIR the targets that don't use the DFA scheduler don't use the scheduler at all. Ciao! Steven
Re: [PATCH 1/2] PR debug/63239 Add DWARF representation for C++11 deleted member function.
On Mon, Oct 6, 2014 at 9:54 AM, Mark Wielaard wrote: > Just removing the # prefix (but keeping the space) from > scan-assembler-times should work for both our cases. I just don't know > why our .s outputs look different. Wild guess: different comment marker in the target's assembly language? Ciao! Steven
Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer
On Tue, Sep 30, 2014 at 6:51 PM, Richard Earnshaw wrote: > It's not just clobbers; it ignores patterns like > > (parallel > [(set (a) (...) > (set (b) (...)]) > [(reg_note (reg_unused(b))] > > Which is probably fine before register allocation but definitely > something you have to think about afterwards. Even before RA this isn't always fine. We have checks for !multiple_sets for this. Ciao! Steven
Re: [PATCH] Fix signed integer overflow in gcc/data-streamer.c
On Sun, Sep 28, 2014 at 2:22 PM, Markus Trippelsdorf wrote: > Running the testsuite with a -fsanitize=undefined instrumented compiler > shows: > % gcc -O2 -flto -fno-use-linker-plugin -flto-partition=none > testsuite/gcc.dg/torture/pr28045.c > gcc/data-streamer.c:113:45: runtime error: negation of -9223372036854775808 > cannot be represented in type 'long int'; cast to an unsigned type to negate > this value to itself > > The fix is obvious. > > Boostrapped and tested on x86_64-unknown-linux-gnu. > OK for trunk? > Thanks. > > 2014-09-28 Markus Trippelsdorf > > * data-streamer.c (bp_unpack_var_len_int): Avoid signed > integer overflow. > > diff --git a/gcc/data-streamer.c b/gcc/data-streamer.c > index 0e19c72162aa..0760ed590c22 100644 > --- a/gcc/data-streamer.c > +++ b/gcc/data-streamer.c > @@ -110,7 +110,7 @@ bp_unpack_var_len_int (struct bitpack_d *bp) >if ((half_byte & 0x8) == 0) > { > if ((shift < HOST_BITS_PER_WIDE_INT) && (half_byte & 0x4)) > - result |= - ((HOST_WIDE_INT)1 << shift); > + result |= - ((unsigned HOST_WIDE_INT)1 << shift); > > return result; > } Can you use HOST_WIDE_INT_1U for this? Ciao! Steven
Re: [PATCH] Put all MAINTAINERS email addresses into <...>
On Fri, Sep 26, 2014 at 3:09 PM, Segher Boessenkool wrote: > On Thu, Sep 25, 2014 at 11:10:00PM +0200, Jan-Benedict Glaw wrote: >> Resending this email. Seems some spam filter ate it due to the many >> email addresses... > > Will now all/most/many further patches to MAINTAINERS hit spam filters > as well? Let's hope not. But at least for me, all mail to people @ arm.com now bounce... Ciao! Steven
Re: [Patch 1/4] Hookize MOVE_BY_PIECES_P, remove most uses of MOVE_RATIO
On Thu, Sep 25, 2014 at 4:57 PM, James Greenhalgh wrote: > * doc/tm.texi.in (MOVE_BY_PIECES_P): Reduce documentation to a stub > describing that this macro is deprecated. Remove it entirely and poison it in system.h? It takes changes to only a few targets: mips, arc, s390, and sh. Thanks for hookizing this! Ciao! Steven
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 2:51 PM, Ilya Enkovich wrote: > 2014-09-24 16:47 GMT+04:00 Steven Bosscher : > It is not a control flow instruction. It copies value of instruction > pointer into a general purpose register. Therefore REG_LABEL_OPERAND > seems to be correct. OK - sorry for being a bit slow on the up-take, I got confused by the asm syntax :-) So I'm going to speculate a bit more... What you want to have is: foo: insns... L2: leaq L2(%rip), rXX What happens is that L2 is "deleted", which is to say converted to a NOTE_INSN_DELETED_LABEL. Then the notes are re-ordered (NOTE_INSN_DELETED_LABEL notes are not tied to anything in the insns stream and can end up anywhere) so you end up with something like, foo: L2: # (was deleted) insns... leaq L2(%rip),rXX I bet you'd find that in the failing test case the label is output to the assembly file but it's simply in the wrong place. For the large code model, we get away with it because the prologue is output late and the order of the insns is not adjusted (a few passes later, the CFG doesn't even exist anymore so you don't go through cfgcleanup). But if you emit the label early and let it go through the entire RTL pipeline then anything can happen. If the above makes sense, then you'll want to emit the label late, or not at all, to the insns stream. If you emit the label late into the insns stream, you'd rewrite the set_rip as a define_insn_and_split that emits the label as part of the last splitting pass. But there is no splitting pass late enough to guarantee that the label and insns won't get separated. If you don't emit the label to the insns stream, you would write ix86_output_set_rip() and call that from the define_insns for set_rip. You'd not emit the label in the expander. You'd create it and make it an operand, but not emit it. Your ix86_output_set_rip() would write the label and the set_rip instruction. This is probably the only way to make 100% sure that the label is always exactly at the set_rip instruction. Something like below (completely untested, etc...). Hope this helps, Ciao! Steven Index: config/i386/i386-protos.h === --- config/i386/i386-protos.h (revision 215483) +++ config/i386/i386-protos.h (working copy) @@ -303,6 +303,7 @@ extern enum attr_cpu ix86_schedule; #endif extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op); +extern const char * ix86_output_set_rip_insn (rtx *operands); #ifdef RTX_CODE /* Target data for multipass lookahead scheduling. Index: config/i386/i386.c === --- config/i386/i386.c (revision 215483) +++ config/i386/i386.c (working copy) @@ -11225,8 +11225,6 @@ ix86_expand_prologue (void) gcc_assert (Pmode == DImode); label = gen_label_rtx (); - emit_label (label); - LABEL_PRESERVE_P (label) = 1; tmp_reg = gen_rtx_REG (Pmode, R11_REG); gcc_assert (REGNO (pic_offset_table_rtx) != REGNO (tmp_reg)); insn = emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx, @@ -12034,8 +12032,6 @@ ix86_expand_split_stack_prologue (void) rtx x; label = gen_label_rtx (); - emit_label (label); - LABEL_PRESERVE_P (label) = 1; emit_insn (gen_set_rip_rex64 (reg10, label)); emit_insn (gen_set_got_offset_rex64 (reg11, label)); emit_insn (ix86_gen_add3 (reg10, reg10, reg11)); @@ -25016,6 +25012,17 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op return ""; } + +/* Output the assembly for a SET_RIP instruction. We do so with this output + function to ensure that the label and %rip load instruction are together. */ + +const char * +ix86_output_set_rip_insn (rtx *operands) +{ + output_asm_label (operands[1]); + output_asm_insn ("lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]}", operands); + return ""; +} /* Clear stack slot assignments remembered from previous functions. This is called from INIT_EXPANDERS once before RTL is emitted for each Index: config/i386/i386.md === --- config/i386/i386.md (revision 215483) +++ config/i386/i386.md (working copy) @@ -12010,7 +12010,7 @@ [(set (match_operand:DI 0 "register_operand" "=r") (unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_RIP))] "TARGET_64BIT" - "lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]}" + "* return ix86_output_set_rip_insn (operands);" [(set_attr "type" "lea") (set_attr "length_address" "4") (set_attr "mode" "DI")])
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 2:30 PM, Ilya Enkovich wrote: > I didn't generate references separately from label. Now I found an > old patch and a test where this problem appeared. In this patch I > moved set_rip generation currently performed in ix86_expand_prologue > into expand pass. And I got following code in expand dump for > testsuite/gcc.target/i386/pr55154.c test: > > (note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (note/s 2 7 3 2 "" NOTE_INSN_DELETED_LABEL 2) > (insn 3 2 4 2 (set (reg:DI 85) > (unspec:DI [ > (label_ref [2 deleted]) > ] UNSPEC_SET_RIP)) > /export/users/ienkovic/issues/4161/gcc/gcc/testsuite/gcc.target/i386/pr55154.c:9 > -1 > (insn_list:REG_LABEL_OPERAND 2 (nil))) > > There is a REG_LABEL_OPERAND generated but label is still removed. Because it should be a REG_LABEL_TARGET? AFAUI this is a contol flow insn so I'd expect it to be a jump_insn (and the note will be a TARGET note). But it's not a PC-set insn and a jump target the compiler will interpret as an infinite loop (if the insns are really in the order as above) which is clearly not what you want. So if you emit it as a jump_insn I'm not sure what will happen... Is it necessary to emit the label into a basic block? Ciao! Steven
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 11:57 AM, Ilya Enkovich wrote: > 2014-09-24 13:30 GMT+04:00 Steven Bosscher : >>> Description of LABEL_PRESERVE_P says label that should always be >>> considered to be needed. >> >> It's more specific than that, really: >> >> @item LABEL_PRESERVE_P (@var{x}) >> In a @code{code_label} or @code{note}, indicates that the label is >> referenced by >> code or data not visible to the RTL of a given function. > > I read another description: > /* 1 if RTX is a code_label that should always be considered to be needed. */ > #define LABEL_PRESERVE_P(RTX) \ > (RTL_FLAG_CHECK2 ("LABEL_PRESERVE_P", (RTX), CODE_LABEL, NOTE)->in_struct) Yes, from rtl.h. I'd recommend to always read the descriptions in doc/ (in this case doc/rtl.texi). The "documentation" in the header files is often not very comprehensive. >> The "not visible" part is important. If there are visible references >> to a label, then they should never be removed (obviously) and that >> should work through LABEL_NUSES. Unfortunately we are not very good at >> keeping LABEL_NUSES up-to-date (this is why all the >> rebuild_jump_labels() are still required). > > Does rebuild handle all kinds of instructions including those which use > UNSPEC? Yes. Patterns are walked (deep) and REG_LABEL notes are added for all labels encountered that are not already the JUMP_LABEL of INSN. If the label is reachable from XEXP(UNSPEC, 0) -- the 'E' operand -- then that label is visible. >> What appears to be the case here, is that you have a label between two >> basic blocks B1 and B2, and the label acts as a control flow barrier: >> B1 and B2 cannot be merged. Then this should be expressed in the CFG. >> Otherwise: What else prevents the merge_blocks CFG hooks from deleting >> the label? > > Label acts as a barrier here but it is a side effect. I don't care > about block merging. I just don't want label with usages to be > removed. Understood. Only, LABEL_PRESERVE_P is not the right means to achieve that. So let's get back to basics and see what the usages look like. AFAIU now, you emit the code label early, and add the references much later (in machine reorg?). Does your UNSPEC have the code_label as an operand? If so, what breaks if cfgcleanup removes the label? Is the insn no longer recognized? Or does the label not end up in the assembly output? Or ...? I can try to help figure out what breaks if you have a test case. FWIW, the LABEL_PRESERVE_P uses in config/i386/i386.c look suspect. It probably only works because those labels are added late, and the code paths that use (x86_64 large PIC code model) are not tested all that well... >>> That means even if we do not have any usages >>> we shouldn't remove it. >> >> Sorry, no. >> Even a LABEL_PRESERVE_P label can be deleted: It will be replaced by a >> NOTE_INSN_DELETED_LABEL. See cfgrtl.c:delete_insn(). > > According to description you quoted label marked by LABEL_PRESERVE_P > is used by some code or data. Let this use be not visible to the RTL > of a given function. It is still used, right? How can you remove it? The code_label rtx is removed, but the label itself is still output to the object file. The label number is retained in the CODE_LABEL_NUMBER of the NOTE_INSN_DELETED_LABEL. Look for how NOTE_INSN_DELETED_LABEL is handled in final.c. It's a hack IMHO, but that's how it has been since day 0 (see https://gcc.gnu.org/r104). Ciao! Steven
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 8:41 AM, Ilya Enkovich wrote: > 2014-09-23 20:06 GMT+04:00 Jeff Law: >> On 09/23/14 10:01, Steven Bosscher wrote: >>> Are you sure this patch is necessary, and is not just papering over >>> another problem? In the past, all cases I've seen where labels were >>> removed inadvertently were caused by incorrect reference counting or >>> missing REG_LABEL_* notes. > > Description of LABEL_PRESERVE_P says label that should always be > considered to be needed. It's more specific than that, really: @item LABEL_PRESERVE_P (@var{x}) In a @code{code_label} or @code{note}, indicates that the label is referenced by code or data not visible to the RTL of a given function. The "not visible" part is important. If there are visible references to a label, then they should never be removed (obviously) and that should work through LABEL_NUSES. Unfortunately we are not very good at keeping LABEL_NUSES up-to-date (this is why all the rebuild_jump_labels() are still required). What appears to be the case here, is that you have a label between two basic blocks B1 and B2, and the label acts as a control flow barrier: B1 and B2 cannot be merged. Then this should be expressed in the CFG. Otherwise: What else prevents the merge_blocks CFG hooks from deleting the label? > That means even if we do not have any usages > we shouldn't remove it. Sorry, no. Even a LABEL_PRESERVE_P label can be deleted: It will be replaced by a NOTE_INSN_DELETED_LABEL. See cfgrtl.c:delete_insn(). If you really want to prevent a label from being deleted, then LABEL_PRESERVE_P is not a sufficient condition. > Why can't we add some additional usages > later? If you add the usages later, then you're lying to the compiler ;-) >>> >>> Did the label use count drop to zero? Is there a REG_LABEL_TARGET note >>> for the label operand? > > In the current code of ix86_expand_prologue I don't see any notes > generation for set_rip_rex64 instruction which actually uses label. > But IMO this is another potential issue and we still shouldn't remove > labels with LABEL_PRESERVE_P. Notes are generated in jump.c:rebuild_jump_labels. They are automatically added when a label is not Ciao! Steven
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Fri, Sep 19, 2014 at 10:03 PM, Jeff Law wrote: > On 09/19/14 13:36, Ilya Enkovich wrote: >> >> Hi, >> >> During my work on enabling pseudo PIC register I've found that cfg cleaunp >> may remove lables with LABEL_PRESERVE_P set to 1. In my case I generated >> SET_RIP during expand pass and cfg cleanup removed label it used as an >> operand. Below is a patch that fixes it. It is not actually required for >> our latest PIC related patch but still seems to make sense. >> >> Bootstrapped and tested on linux-x86_64. >> >> Thanks, >> Ilya >> -- >> 2014-09-19 Ilya Enkovich >> >> * cfgcleanup.c (try_optimize_cfg): Do not remove label >> with LABEL_PRESERVE_P flag set. > > OK. Please install. > > Note for those not following the x86 32 bit PIC register discussion, I asked > Ilya to submit this separately. It was something an earlier version of his > patch triggered and it stood out as something that ought to be fixed > regardless of the final form of the PIC register changes that are in > progress. Jeff, Are you sure this patch is necessary, and is not just papering over another problem? In the past, all cases I've seen where labels were removed inadvertently were caused by incorrect reference counting or missing REG_LABEL_* notes. Did the label use count drop to zero? Is there a REG_LABEL_TARGET note for the label operand? Ciao! Steven
Re: ptx preliminary rtl patches [3/4]
On Thu, Sep 11, 2014 at 6:19 PM, Bernd Schmidt wrote: >>> What do you expect that function to do different? It returns the correct >>> value. >>> >> >> No different. Just that if you want to check whether DECL is a global >> variable then we have a predicate for it. So why use TREE_STATIC >> instead? >> >> In other words: Just trying to make/keep certain checks consistent. (A >> hopeless cause, but a noble one... ;-) > > > You're talking about a different patch here. This one is about > num_sign_bit_copies. Ah. *sigh* can't even keep two patches in my mind at any one time. The point about num_sign_bit_copies is that it doesn't really return the correct value IMHO, if there isn't really a correct value to speak of: What is the sign of TRUE or FALSE, the only two values a BImode value can take? A 1-bit precision integer can have value 0 or -1 and in that case num_sign_bit_copies should be 0. But for a BImode value, it seems to me that asking for the sign bit or sign bit copies is just wrong. Ciao! Steven
Re: ptx preliminary rtl patches [3/4]
On Thu, Sep 11, 2014 at 6:06 PM, Bernd Schmidt wrote: > On 09/11/2014 05:55 PM, Steven Bosscher wrote: >> >> On Thu, Sep 11, 2014 at 3:26 PM, Bernd Schmidt wrote: >>> >>> nvptx will be the first port to use BImode and have STORE_FLAG_VALUE==-1. >>> That has exposed a bug in combine where we can end up calling >>> num_sign_bit_copies for a BImode value. However, the return value is >>> always >>> 1 in that case, so it doesn't tell us anything and is going to be >>> misinterpreted by the caller. >>> >>> Bootstrapped and tested on x86_64-linux, together with the other patches. >>> Ok? >> >> >> This should be handled in num_sign_bit_copies itself, i.e. handle BImode >> there. > > > What do you expect that function to do different? It returns the correct > value. > No different. Just that if you want to check whether DECL is a global variable then we have a predicate for it. So why use TREE_STATIC instead? In other words: Just trying to make/keep certain checks consistent. (A hopeless cause, but a noble one... ;-) Ciao! Steven
Re: ptx preliminary rtl patches [4/4]
On Thu, Sep 11, 2014 at 3:27 PM, Bernd Schmidt wrote: > It turns out that we're calling eliminate_regs for global variables which > can't possibly have eliminable regs in their decl. At that point, > reg_eliminate can be NULL. This patch avoids unnecessary work, and allows us > to add an assert to eliminate_regs later. > > Bootstrapped and tested on x86_64-linux, together with the other patches. > Ok? Why not use is_global_var()? Ciao! Steven
Re: ptx preliminary rtl patches [3/4]
On Thu, Sep 11, 2014 at 3:26 PM, Bernd Schmidt wrote: > nvptx will be the first port to use BImode and have STORE_FLAG_VALUE==-1. > That has exposed a bug in combine where we can end up calling > num_sign_bit_copies for a BImode value. However, the return value is always > 1 in that case, so it doesn't tell us anything and is going to be > misinterpreted by the caller. > > Bootstrapped and tested on x86_64-linux, together with the other patches. > Ok? This should be handled in num_sign_bit_copies itself, i.e. handle BImode there. Ciao! Steven
Re: ptx preliminary rtl patches [2/4]
On Thu, Sep 11, 2014 at 3:25 PM, Bernd Schmidt wrote: > Bootstrapped and tested on x86_64-linux, together with the other patches. > Ok? This is OK. Ciao! Steven
Re: [PATCH] Force rtl templates to be inlined
On Tue, Sep 2, 2014 at 9:22 AM, Andrew Pinski wrote: > On Tue, Sep 2, 2014 at 12:20 AM, Andi Kleen wrote: >> >>> there have been bugs in the past in the area of always_inline too. >> >> You're arguing for my patch. It would find those bugs. > > > No I am arguing against it since the older versions of GCC we cannot change. Should such bugs turn up, we can account for them in ansidecl.h. I think Andi's patch should go in. Ciao! Steven
Re: [FORTRAN PATCH] Quash two -Wlogical-not-parentheses warnings
On Mon, Sep 1, 2014 at 3:23 PM, Marek Polacek wrote: > diff --git gcc/fortran/interface.c gcc/fortran/interface.c > index b210d18..68d8545 100644 > --- gcc/fortran/interface.c > +++ gcc/fortran/interface.c > @@ -2014,7 +2014,7 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual, >if (formal->ts.type == BT_CLASS && formal->attr.class_ok >&& actual->expr_type != EXPR_NULL >&& ((CLASS_DATA (formal)->attr.class_pointer > - && !formal->attr.intent == INTENT_IN) > + && (!formal->attr.intent) == INTENT_IN) >|| CLASS_DATA (formal)->attr.allocatable)) > { >if (actual->ts.type != BT_CLASS) This is certainly not OK, intent is a tri-state. > diff --git gcc/fortran/trans-expr.c gcc/fortran/trans-expr.c > index f2ed474..6592c7e 100644 > --- gcc/fortran/trans-expr.c > +++ gcc/fortran/trans-expr.c > @@ -4589,7 +4589,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, > && e->expr_type == EXPR_VARIABLE > && (!e->ref > || (e->ref->type == REF_ARRAY > - && !e->ref->u.ar.type != AR_FULL)) > + && (!e->ref->u.ar.type) != AR_FULL)) > && e->symtree->n.sym->attr.optional) > { > tmp = fold_build3_loc (input_location, COND_EXPR, Also not OK. You probably want to wrap the (in)equality tests in parenthesis. Ciao! Steven
Re: [patch] Why xstrdup cgraph node names for dumpfiles?
On Tue, Aug 26, 2014 at 10:52 PM, Uros Bizjak wrote: > Hello! > >> I noticed most of the cgraph and IPA files use xstrdup for cgraph node >> names when printing to dump_file. Very leaky... >> >> What is the reason for all those xstrdups? I couldn't think of any. > > Please see [1] and [2]. > > [1] https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01904.html > [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53136 Thanks. Well, that is surprising. The cxx_printable_name_internal uses a ring buffer for caching the names. Apparently that didn't work in your PR. The leaks get quite large over time, especially with a large code base and many small functions. I noticed this when compiling a large code base with dumping enabled and ran out of memory. I assume you also tried increasing the ring buffer size? Oh well, probably not very robust, and we do seem to leak a lot of strings in general... Ciao! Steven
[patch] Why xstrdup cgraph node names for dumpfiles?
Hello Martin, Honza, I noticed most of the cgraph and IPA files use xstrdup for cgraph node names when printing to dump_file. Very leaky... What is the reason for all those xstrdups? I couldn't think of any. Thoughts? Ciao! Steven * cgraph.c (cgraph_node::get_create): Don't xstrdup cgraph node names. (cgraph_edge::make_speculative): Likewise. (cgraph_edge::resolve_speculation): Likewise. (cgraph_edge::redirect_call_stmt_to_callee): Likewise. (cgraph_node::dump): Likewise. * cgraphclones.c (symbol_table::materialize_all_clones): Likewise. * ipa-cp.c (perhaps_add_new_callers): Likewise. * ipa-inline.c (report_inline_failed_reason): Likewise. (want_early_inline_function_p): Likewise. (edge_badness): Likewise. (update_edge_key): Likewise. (flatten_function): Likewise. (inline_always_inline_functions): Likewise. (early_inline_small_functions): Likewise. * ipa-profile.c (ipa_profile): Likewise. * ipa-prop.c (ipa_print_node_jump_functions): Likewise. (ipa_make_edge_direct_to_target): Likewise. (remove_described_reference): Likewise. (propagate_controlled_uses): Likewise. * ipa-utils.c (ipa_merge_profiles): Likewise. * tree-sra.c (convert_callers_for_node): Likewise. Index: cgraph.c === --- cgraph.c(revision 214545) +++ cgraph.c(working copy) @@ -489,11 +489,11 @@ cgraph_node::get_create (tree decl) if (dump_file) fprintf (dump_file, "Introduced new external node " "(%s/%i) and turned into root of the clone tree.\n", -xstrdup (node->name ()), node->order); +node->name (), node->order); } else if (dump_file) fprintf (dump_file, "Introduced new external node " -"(%s/%i).\n", xstrdup (node->name ()), +"(%s/%i).\n", node->name (), node->order); return node; } @@ -1036,8 +1036,8 @@ cgraph_edge::make_speculative (cgraph_node *n2, gc { fprintf (dump_file, "Indirect call -> speculative call" " %s/%i => %s/%i\n", - xstrdup (n->name ()), n->order, - xstrdup (n2->name ()), n2->order); + n->name (), n->order, + n2->name (), n2->order); } speculative = true; e2 = n->create_edge (n2, call_stmt, direct_count, direct_frequency); @@ -1155,16 +1155,16 @@ cgraph_edge::resolve_speculation (tree callee_decl { fprintf (dump_file, "Speculative indirect call %s/%i => %s/%i has " "turned out to have contradicting known target ", - xstrdup (edge->caller->name ()), edge->caller->order, - xstrdup (e2->callee->name ()), e2->callee->order); + edge->caller->name (), edge->caller->order, + e2->callee->name (), e2->callee->order); print_generic_expr (dump_file, callee_decl, 0); fprintf (dump_file, "\n"); } else { fprintf (dump_file, "Removing speculative call %s/%i => %s/%i\n", - xstrdup (edge->caller->name ()), edge->caller->order, - xstrdup (e2->callee->name ()), e2->callee->order); + edge->caller->name (), edge->caller->order, + e2->callee->name (), e2->callee->order); } } } @@ -1284,9 +1284,9 @@ cgraph_edge::redirect_call_stmt_to_callee (void) if (dump_file) fprintf (dump_file, "Not expanding speculative call of %s/%i -> %s/%i\n" "Type mismatch.\n", -xstrdup (e->caller->name ()), +e->caller->name (), e->caller->order, -xstrdup (e->callee->name ()), +e->callee->name (), e->callee->order); e = e->resolve_speculation (); /* We are producing the final function body and will throw away the @@ -1303,9 +1303,9 @@ cgraph_edge::redirect_call_stmt_to_callee (void) fprintf (dump_file, "Expanding speculative call of %s/%i -> %s/%i count:" "%"PRId64"\n", -xstrdup (e->caller->name ()), +e->caller->name (), e->caller->order, -xstrdup (e->callee->name ()), +e->callee->name (), e->callee->order, (int64_t)e->count); gcc_assert (e2->speculative); @@ -1353,8 +1353,8 @@ cgraph_edge::redirect_call_stmt_to_callee (void) if (symtab->dump_file) { fprintf (symtab->dump_file, "updating call of %s/%i -> %s/%i: ", - xstrdup (e->caller->name ()), e->caller->order, - xstrdup (e->callee->na
Re: [BUILDROBOT][PATCH] frv-linux fallout (was: [PATCH 009/236] Replace BB_HEAD et al macros with functions)
On Mon, Aug 25, 2014 at 9:29 PM, Mike Stump wrote: > On Aug 25, 2014, at 7:08 AM, David Malcolm wrote: >> It's too late now to switch to this approach, so in the meantime I've >> been working on ways to make my bootstraps as fast as possible. > > -j64 works wonders. :-) Though, it is annoying watching the build run at > 98% idle. And genautomata just sits there... waiting... waiting... wai.. Ciao! Steven
[patch] PR fortran/61669
Hello, This bug is an error recovery issue. A data declaration is parsed and accepted, and added to gfc_current_ns->data, but the statement is rejected. The rejected data decl is not rolled back, causing memory corruption later on. Proposed fix is to roll back DATA for rejected statements. Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk? Ciao! Steven fortran/ PR fortran/61669 * gfortran.h (struct gfc_namespace): Add OLD_DATA field. * decl.c (gfc_reject_data): New function. * parse.c *use_modules): Record roll-back point. (next_statement): Likewise. (reject_statement): Roll back to last accepted DATA. testsuite/ PR fortran/61669 * gfortran.dg/pr61669.f90: New test. Index: fortran/gfortran.h === --- fortran/gfortran.h (revision 214350) +++ fortran/gfortran.h (working copy) @@ -1625,7 +1625,7 @@ typedef struct gfc_namespace gfc_st_label *st_labels; /* This list holds information about all the data initializers in this namespace. */ - struct gfc_data *data; + struct gfc_data *data, *old_data; gfc_charlen *cl_list, *old_cl_list; @@ -2941,6 +2941,7 @@ void gfc_free_omp_namelist (gfc_omp_namelist *); void gfc_free_equiv (gfc_equiv *); void gfc_free_equiv_until (gfc_equiv *, gfc_equiv *); void gfc_free_data (gfc_data *); +void gfc_reject_data (gfc_namespace *); void gfc_free_case_list (gfc_case *); /* matchexp.c -- FIXME too? */ Index: fortran/decl.c === --- fortran/decl.c (revision 214350) +++ fortran/decl.c (working copy) @@ -178,7 +178,21 @@ gfc_free_data_all (gfc_namespace *ns) } } +/* Reject data parsed since the last restore point was marked. */ +void +gfc_reject_data (gfc_namespace *ns) +{ + gfc_data *d; + + while (ns->data && ns->data != ns->old_data) +{ + d = ns->data->next; + free (ns->data); + ns->data = d; +} +} + static match var_element (gfc_data_variable *); /* Match a list of variables terminated by an iterator and a right Index: fortran/parse.c === --- fortran/parse.c (revision 214350) +++ fortran/parse.c (working copy) @@ -118,6 +118,7 @@ use_modules (void) gfc_warning_check (); gfc_current_ns->old_cl_list = gfc_current_ns->cl_list; gfc_current_ns->old_equiv = gfc_current_ns->equiv; + gfc_current_ns->old_data = gfc_current_ns->data; last_was_use_stmt = false; } @@ -1097,6 +1098,7 @@ next_statement (void) gfc_current_ns->old_cl_list = gfc_current_ns->cl_list; gfc_current_ns->old_equiv = gfc_current_ns->equiv; + gfc_current_ns->old_data = gfc_current_ns->data; for (;;) { gfc_statement_label = NULL; @@ -2045,6 +2047,8 @@ reject_statement (void) gfc_free_equiv_until (gfc_current_ns->equiv, gfc_current_ns->old_equiv); gfc_current_ns->equiv = gfc_current_ns->old_equiv; + gfc_reject_data (gfc_current_ns); + gfc_new_block = NULL; gfc_undo_symbols (); gfc_clear_warning (); Index: testsuite/gfortran.dg/pr61669.f90 === --- testsuite/gfortran.dg/pr61669.f90 (revision 0) +++ testsuite/gfortran.dg/pr61669.f90 (working copy) @@ -0,0 +1,8 @@ +! { dg-do compile } + write (*,"(a)") char(12) + CHARACTER*80 A /"A"/ ! { dg-error "Unexpected data declaration statement" } + REAL*4 B ! { dg-error "Unexpected data declaration statement" } + write (*,"(a)") char(12) + DATA B / 0.02 / + END +
[patch] PR fortran/62135
Hello, Low-hanging fruit, almost embarassing to fix. But then again I caused this bug -- in 1999 or so :-) Will commit after testing. Ciao! Steven fortran/ * resolve.c (resolve_select): Fix list traversal in case the last element of the CASE list was dropped as unreachable code. testsuite/ * gfortran.dg/pr62135.f90: New test. Index: fortran/resolve.c === --- fortran/resolve.c (revision 214292) +++ fortran/resolve.c (working copy) @@ -7761,7 +7761,7 @@ resolve_select (gfc_code *code, bool select_type) /* Strip all other unreachable cases. */ if (body->ext.block.case_list) { - for (cp = body->ext.block.case_list; cp->next; cp = cp->next) + for (cp = body->ext.block.case_list; cp && cp->next; cp = cp->next) { if (cp->next->unreachable) { Index: testsuite/gfortran.dg/pr62135.f90 === --- testsuite/gfortran.dg/pr62135.f90 (revision 0) +++ testsuite/gfortran.dg/pr62135.f90 (working copy) @@ -0,0 +1,17 @@ +! { dg-do compile } +! { dg-options -Wsurprising } + + PROGRAM PR62135 + IMPLICIT NONE + CHARACTER*1 :: choice + choice = 'x' + SELECT CASE (choice) + ! This triggered an ICE: an unreachable case clause + ! as the last of a list. + CASE ('2':'7','9':'0') ! { dg-warning "can never be matched" } +WRITE(*,*) "barf" + CASE DEFAULT +CONTINUE + END SELECT + END PROGRAM PR62135 +
Re: [RFC, PATCH][LRA, MIPS] ICE: in decompose_normal_address, at rtlanal.c:5817
On Fri, Aug 15, 2014 at 5:45 PM, Robert Suchanek wrote: > gcc/ > * rtlanal.c (get_base_term): Accept HIGH as the base term. > > > diff --git gcc/rtlanal.c gcc/rtlanal.c > index 82cfc1bf..2bea2ca 100644 > --- gcc/rtlanal.c > +++ gcc/rtlanal.c > @@ -5624,6 +5624,7 @@ get_base_term (rtx *inner) > inner = strip_address_mutations (&XEXP (*inner, 0)); >if (REG_P (*inner) >|| MEM_P (*inner) > + || GET_CODE (*inner) == HIGH >|| GET_CODE (*inner) == SUBREG) > return inner; >return 0; This is not correct, BASE is a *variable* expression, HIGH is a *constant* expression. It's hard to say what the correct fix should be, but it sounds like the address you get after the substitutions should be simplified (folded). B.R., Steven
Re: [PATCH] Add target macro to override DWARF2 frame register size
On Fri, Aug 1, 2014 at 3:31 PM, Matthew Fortune wrote: > This patch adds a target macro Please don't add target macros. Add a hook if you must, but we're supposed to remove target macros, not add new ones :-) Ciao! Steven
Re: [PATCH] Fix INSN_TICK heuristic for SCHED_PRESSURE
On Mon, Jul 14, 2014 at 10:09 AM, Maxim Kuvyrkov wrote: > On Jul 14, 2014, at 8:05 PM, Steven Bosscher wrote: > >> On Mon, Jul 14, 2014 at 6:12 AM, Maxim Kuvyrkov wrote: >>> Hi, >>> >>> This patch fixes 2 of the [several] problems in rank_for_schedule >>> heuristics for SCHED_PRESSURE_MODEL. SCHED_PRESSURE_MODEL is used for >>> first scheduling pass in ARM backend by default. >>> >>> The first one is when INSN_TICK of both instructions are equal, and >>> rank_for_schedule returns a "tie" result, even though there are more >>> heuristics down the path to break the tie. >>> >>> The second one is to account for the fact that model_index() of two >>> instructions is meaningful only when both instructions are in the current >>> basic block. >>> >>> Bootstrapped and tested on {x86_64,arm,aarch64}-linux. >>> >>> OK to apply? >> >> s/INSN_BB/BLOCK_FOR_INSN/ > > That would be a mistake, see definition of INSN_BB in sched-int.h. Scheduler > uses its own basic block numbering. Eh, right. I seem to be confused here with older sched CFG oddities but this isn't one of those... >> >> OK with that change. > > I assume OK without the change? Yup. Ciao! Steven
Re: [PATCH] Fix INSN_TICK heuristic for SCHED_PRESSURE
On Mon, Jul 14, 2014 at 6:12 AM, Maxim Kuvyrkov wrote: > Hi, > > This patch fixes 2 of the [several] problems in rank_for_schedule heuristics > for SCHED_PRESSURE_MODEL. SCHED_PRESSURE_MODEL is used for first scheduling > pass in ARM backend by default. > > The first one is when INSN_TICK of both instructions are equal, and > rank_for_schedule returns a "tie" result, even though there are more > heuristics down the path to break the tie. > > The second one is to account for the fact that model_index() of two > instructions is meaningful only when both instructions are in the current > basic block. > > Bootstrapped and tested on {x86_64,arm,aarch64}-linux. > > OK to apply? s/INSN_BB/BLOCK_FOR_INSN/ OK with that change. Ciao! Steven
Re: Fix PR61772: ifcvt removing asm volatile gotos
On Fri, Jul 11, 2014 at 2:34 PM, Michael Matz wrote: > PR rtl-optimization/61772 > * ifcvt.c (dead_or_predicable): Check jump to be free of side > effects. This is OK. Ciao! Steven
Re: Instructions vs Expressions in the backend (was Re: RFA: Rework FOR_BB_INSNS iterators)
On Wed, Jun 25, 2014 at 10:46 PM, Jeff Law wrote: > On 06/25/14 02:54, Richard Sandiford wrote: >> >> SEQUENCE is just weird though :-) It would be good to have an alternative >> representation, but that'd be a lot of work on reorg. > > Yea. And I don't think anyone is keen on rewriting reorg. Rewriting/revamping reorg is not really the problem, IMHO. Last year I hacked a bit on a new delayed-branch scheduler, and I got results that were not too bad (especially considering my GCC time is only a few hours per week). The the real problem is designing that alternative representation of delay slots, and teaching the back ends about that. Just communicating a delayed-branch sequence to the back ends is pretty awful (a global variable in final.c) and a lot of back-end code has non-obvious assumptions about jumping across insns contained in SEQUENCEs. There's also one back end (mep?) that uses SEQUENCE for bundles (RTL abuse is not considered bad practice by all maintainers ;-). (I actually found SEQUENCE to be quite nice to work with when I allowed them to be the last insn in a basic block. One of my goals was to retain the CFG across dbr_sched, but that turned out to be blocked by other things than dbr_sched, like fake insns that some back ends emit between basic blocks, e.g. for constant pools). Having some kind of "insns container" like SEQUENCE would IMHO not be a bad thing, perhaps a necessity, and perhaps even an improvement (like for representing bundles), as long as we can assign sane semantics to it w.r.t. next/prev insn. SEQUENCE wasn't designed with its current application in mind... Ciao! Steven
Re: breakage with "[PATCH 1/6] Add FOR_EACH_INSN{_INFO}_{DEFS,USES,EQ_USES}"
On Mon, Jun 16, 2014 at 12:36 AM, Hans-Peter Nilsson wrote: > On Sun, 15 Jun 2014, Steven Bosscher wrote: >> On Sun, Jun 15, 2014 at 1:27 PM, Hans-Peter Nilsson wrote: >> > /tmp/hpautotest-gcc0/gcc/gcc/auto-inc-dec.c: In function 'void >> > merge_in_block(int, basic_block_def*)': >> > /tmp/hpautotest-gcc0/gcc/gcc/auto-inc-dec.c:1442: error: 'uid' >> > was not declared in this scope >> > make[2]: *** [auto-inc-dec.o] Error 1 >> > >> > brgds, H-P >> >> >> Bah, this is why I just *hate* all the gcc code that's only compiled >> if certain #defines are set... > > I couldn't agree more. Might not have been obvious when writing > the mosly-mechanical patch, still the auto-inc-dec.c name should > have been a red flag that a representative target should have > been tested (i.e. not x86_64 and i686). I agree, but I think you'd agree with me if I say that Richard S. is one of the few people who almost always goes beyond the normal amount of testing required for a patch. Breakage like this will just happen to us all, every once in a while, until we compile all middle-end code at least, regardless of #defines and whatnot (conditionally compiled code, from the top of my head: CC0, scheduler, dbrsched, auto-inc-dec, HAVE_conditional_move, etc...). Ciao! Steven
Re: breakage with "[PATCH 1/6] Add FOR_EACH_INSN{_INFO}_{DEFS,USES,EQ_USES}"
On Sun, Jun 15, 2014 at 1:27 PM, Hans-Peter Nilsson wrote: > On Sat, 14 Jun 2014, Richard Sandiford wrote: > >> To make the final representation change easier, this patch introduces >> macros for iterating over lists of defs, uses and eq_uses. At the >> moment there are three possible keys when accessing df_ref lists: >> the insn rtx (DF_INSN_*), the insn uid (DF_INSN_UID_*) and the >> df_insn_info (DF_INSN_INFO_*). I don't think it's worth adding >> iterators for uids though. Any code that's going to the trouble of >> caching the uid might as well go the whole hog and cache the underlying >> df_insn_info. >> >> After the feedback to the BB iterator patch: >> >> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00676.html >> >> I've kept the iterator variable definitions outside the FOR_* macros >> rather than make the FOR_* macros define the variables themselves. >> >> Richard >> >> >> gcc/ >> * df.h (DF_INSN_INFO_MWS, FOR_EACH_INSN_INFO_DEF): New macros. >> (FOR_EACH_INSN_INFO_USE, FOR_EACH_INSN_INFO_EQ_USE): Likewise. >> (FOR_EACH_INSN_DEF, FOR_EACH_INSN_USE, FOR_EACH_INSN_EQ_USE): Likewise. >> * auto-inc-dec.c (find_inc, merge_in_block): Use them. > > > One of these patches (in 211677:211684) broke cris-elf: > > g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall > \ > -Wwrite-strings -Wcast-qual -Wmissing-format-attribute > -Woverloaded-virtual -pedantic -Wno-long-long > -Wno-variadic-macr\ > os -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. > -I/tmp/hpautotest-gcc0/gcc/gcc -I/tmp/hpautotest-gcc0/g\ > cc/gcc/. -I/tmp/hpautotest-gcc0/gcc/gcc/../include > -I/tmp/hpautotest-gcc0/gcc/gcc/../libcpp/include > -I/tmp/hpautotest-g\ > cc0/cris-elf/gccobj/./gmp -I/tmp/hpautotest-gcc0/gcc/gmp > -I/tmp/hpautotest-gcc0/cris-elf/gccobj/./mpfr -I/tmp/hpautotes\ > t-gcc0/gcc/mpfr -I/tmp/hpautotest-gcc0/gcc/mpc/src > -I/tmp/hpautotest-gcc0/gcc/gcc/../libdecnumber > -I/tmp/hpautotest-gc\ > c0/gcc/gcc/../libdecnumber/dpd -I../libdecnumber > -I/tmp/hpautotest-gcc0/gcc/gcc/../libbacktrace-o > auto-inc-dec.o -M\ > T auto-inc-dec.o -MMD -MP -MF ./.deps/auto-inc-dec.TPo > /tmp/hpautotest-gcc0/gcc/gcc/auto-inc-dec.c > /tmp/hpautotest-gcc0/gcc/gcc/auto-inc-dec.c: In function 'void > merge_in_block(int, basic_block_def*)': > /tmp/hpautotest-gcc0/gcc/gcc/auto-inc-dec.c:1442: error: 'uid' > was not declared in this scope > make[2]: *** [auto-inc-dec.o] Error 1 > > brgds, H-P Bah, this is why I just *hate* all the gcc code that's only compiled if certain #defines are set... Can you please try: Index: auto-inc-dec.c === --- auto-inc-dec.c (revision 211685) +++ auto-inc-dec.c (working copy) @@ -1439,7 +1439,8 @@ merge_in_block (int max_reg, basic_block bb) } } else if (dump_file) - fprintf (dump_file, "skipping update of deleted insn %d\n", uid); + fprintf (dump_file, "skipping update of deleted insn %d\n", +INSN_UID (insn)); } /* If we were successful, try again. There may have been several Ciao! Steven
Re: [PATCH 0/6] Make df_ref representation more efficient
On Sat, Jun 14, 2014 at 9:36 PM, Richard Sandiford wrote: > Using a linked list gives a consistent 2% compile-time improvement for > fold-const.ii -O0 and ~1% for various -O2 compiles I tried. The df > routines do still show up high on the profile though. Can you explain a bit more about what shows up high? Ciao! Steven
Re: Remove some unnecessary null checks in df.h
On Sat, Jun 14, 2014 at 10:02 PM, Richard Sandiford wrote: > DF_REF_REG_USE_P and DF_MWS_REG_USE_P checked for null arguments > but the def equivalents didn't. There doesn't seem to be any > need for this difference. Right, looking at the users (one user for each) of these macros, I don't think REF ever can be NULL. > gcc/ > * df.h (DF_REF_REG_USE_P, DF_MWS_REG_USE_P): Remove null checks. OK. Ciao! Steven
Re: [PATCH 6/6] Use a linked list for insn defs and uses
On Sat, Jun 14, 2014 at 9:53 PM, Richard Sandiford wrote: > gcc/ > * df.h (df_mw_hardreg, df_base_ref): Add a link pointer. > (df_insn_info): Turn defs, uses, eq_uses and mw_hardregs into linked > lists. > (df_scan_bb_info): Likewise artificial_defs and artificial_uses. > (FOR_EACH_INSN_INFO_DEF, FOR_EACH_INSN_INFO_USE) > (FOR_EACH_INSN_INFO_EQ_USE, FOR_EACH_INSN_INFO_MW) > (FOR_EACH_ARTIFICIAL_USE, FOR_EACH_ARTIFICIAL_DEF) > (df_get_artificial_defs, df_get_artificial_uses) > (df_single_def, df_single_use): Update accordingly. > (df_refs_chain_dump): Take the first element in a linked list as > parameter, rather than a pointer to an array of pointers. > * df-core.c (df_refs_chain_dump, df_mws_dump): Likewise. > * df-problems.c (df_rd_bb_local_compute_process_def): Likewise. > (df_chain_create_bb_process_use): Likewise. > (df_md_bb_local_compute_process_def): Likewise. > * fwprop.c (process_defs, process_uses): Likewise. > (register_active_defs, update_uses): Likewise. > (forward_propagate_asm): Update for new df_ref linking. > * df-scan.c (df_scan_free_ref_vec, df_scan_free_mws_vec): Delete. > (df_null_ref_rec, df_null_mw_rec): Likewise. > (df_scan_free_internal): Don't free df_ref and df_mw_hardreg lists > explicitly. > (df_scan_free_bb_info): Remove check for null artificial_defs. > (df_install_ref_incremental): Adjust for new df_ref linking. > Use a single-element insertion rather than a full sort. > (df_ref_chain_delete_du_chain): Take the first element > in a linked list as parameter, rather than a pointer to an array of > pointers. > (df_ref_chain_delete, df_mw_hardreg_chain_delete): Likewise. > (df_add_refs_to_table, df_refs_verify, df_mws_verify): Likewise. > (df_insn_info_delete): Remove check for null defs and call to > df_scan_free_mws_vec. > (df_insn_rescan): Initialize df_ref and df_mw_hardreg lists to > null rather than df_null_*_rec. > (df_insn_rescan_debug_internal): Likewise, and update null > checks in the same way. Remove check for null defs. > (df_ref_change_reg_with_loc_1): Fix choice of list for defs. > Move a single element rather doing a full sort. > (df_mw_hardreg_chain_delete_eq_uses): Adjust for new df_mw_hardreg > linking. > (df_notes_rescan): Likewise. Use a merge rather than a full sort. > Initialize df_ref and df_mw_hardreg lists to null rather than > df_null_*_rec. > (df_ref_compare): Take df_refs as parameter, transferring the > old interface to... > (df_ref_ptr_compare): ...this new function. > (df_sort_and_compress_refs): Update accordingly. > (df_mw_compare): Take df_mw_hardregs as parameter, transferring the > old interface to... > (df_mw_ptr_compare): ...this new function. > (df_sort_and_compress_mws): Update accordingly. > (df_install_refs, df_install_mws): Return a linked list rather than > an array of pointers. > (df_refs_add_to_chains): Assert that old lists are empty rather > than freeing them. > (df_insn_refs_verify): Don't handle null defs speciailly. > * web.c (union_match_dups): Update for new df_ref linking. I would prefer a macro for base.next_loc (DF_REF_NEXT_LOC?). Other than that: OK. Ciao! Steven
Re: [PATCH 5/6] Remove dead code
On Sat, Jun 14, 2014 at 9:48 PM, Richard Sandiford wrote: > This patch removes some dead code that would otherwise need to be > changed in the final patch. These functions were intended to allow passes to update DF info manually. That never was a very practical idea, apparently. > gcc/ > * df.h (df_ref_create, df_ref_remove): Delete. > * df-scan.c (df_ref_create, df_ref_compress_rec): Likewise. > (df_ref_remove): Likewise. OK. Ciao! Steven
Re: [PATCH 4/6] Add df_single_{def,use} helper functions
On Sat, Jun 14, 2014 at 9:47 PM, Richard Sandiford wrote: > IRA wants to know whether a particular insn has a single def or use. > That seems worth putting in a utility function, again to hide the > underlying representation a bit. I would have sworn we already had functions for this, but apparently not... > * ira.c (find_moveable_pseudos): Use them. OK. Ciao! Steven
Re: [PATCH 3/6] Add FOR_EACH_INSN_INFO_MW
On Sat, Jun 14, 2014 at 9:45 PM, Richard Sandiford wrote: > gcc/ > * df.h (FOR_EACH_INSN_INFO_MW): New macro. > * df-problems.c (df_note_bb_compute): Use it. > * regstat.c (regstat_bb_compute_ri): Likewise. OK. Ciao! Steven
Re: [PATCH 2/6] FOR_EACH_ARTIFICIAL_{DEF,USE}
On Sat, Jun 14, 2014 at 9:44 PM, Richard Sandiford wrote: > gcc/ > * df.h (FOR_EACH_ARTIFICIAL_USE, FOR_EACH_ARTIFICIAL_DEF): New macros. > * cse.c (cse_extended_basic_block): Use them. > * dce.c (mark_artificial_use): Likewise. > * df-problems.c (df_rd_simulate_artificial_defs_at_top): Likewise. > (df_lr_bb_local_compute, df_live_bb_local_compute): Likewise. > (df_chain_remove_problem, df_chain_bb_dump): Likewise. > (df_word_lr_bb_local_compute, df_note_bb_compute): Likewise. > (df_simulate_initialize_backwards): Likewise. > (df_simulate_finalize_backwards): Likewise. > (df_simulate_initialize_forwards): Likewise. > (df_md_simulate_artificial_defs_at_top): Likewise. > * df-scan.c (df_reorganize_refs_by_reg_by_insn): Likewise. > * regrename.c (init_rename_info): Likewise. > * regstat.c (regstat_bb_compute_ri): Likewise. > (regstat_bb_compute_calls_crossed): Likewise. OK. Ciao! Steven
Re: [PATCH 1/6] Add FOR_EACH_INSN{_INFO}_{DEFS,USES,EQ_USES}
On Sat, Jun 14, 2014 at 9:43 PM, Richard Sandiford wrote: > gcc/ > * df.h (DF_INSN_INFO_MWS, FOR_EACH_INSN_INFO_DEF): New macros. > (FOR_EACH_INSN_INFO_USE, FOR_EACH_INSN_INFO_EQ_USE): Likewise. > (FOR_EACH_INSN_DEF, FOR_EACH_INSN_USE, FOR_EACH_INSN_EQ_USE): > Likewise. > * auto-inc-dec.c (find_inc, merge_in_block): Use them. > * combine.c (create_log_links): Likewise. > * compare-elim.c (find_flags_uses_in_insn): Likewise. > (try_eliminate_compare): Likewise. > * cprop.c (make_set_regs_unavailable, mark_oprs_set): Likewise. > * dce.c (deletable_insn_p, find_call_stack_args): Likewise. > (remove_reg_equal_equiv_notes_for_defs): Likewise. > (reset_unmarked_insns_debug_uses, mark_reg_dependencies): Likewise. > (word_dce_process_block, dce_process_block): Likewise. > * ddg.c (def_has_ccmode_p): Likewise. > * df-core.c (df_bb_regno_first_def_find): Likewise. > (df_bb_regno_last_def_find, df_find_def, df_find_use): Likewise. > * df-problems.c (df_rd_simulate_one_insn): Likewise. > (df_lr_bb_local_compute, df_live_bb_local_compute): Likewise. > (df_chain_remove_problem, df_chain_insn_top_dump): Likewise. > (df_chain_insn_bottom_dump, df_word_lr_bb_local_compute): Likewise. > (df_word_lr_simulate_defs, df_word_lr_simulate_uses): Likewise. > (df_remove_dead_eq_notes, df_note_bb_compute): Likewise. > (df_simulate_find_defs, df_simulate_find_uses): Likewise. > (df_simulate_find_noclobber_defs, df_simulate_defs): Likewise. > (df_simulate_uses, df_md_simulate_one_insn): Likewise. > * df-scan.c (df_reorganize_refs_by_reg_by_insn): Likewise. > * fwprop.c (local_ref_killed_between_p): Likewise. > (all_uses_available_at, free_load_extend): Likewise. > * gcse.c (update_bb_reg_pressure, calculate_bb_reg_pressure): > Likewise. > * hw-doloop.c (scan_loop): Likewise. > * ifcvt.c (dead_or_predicable): Likewise. > * init-regs.c (initialize_uninitialized_regs): Likewise. > * ira-lives.c (mark_hard_reg_early_clobbers): Likewise. > (process_bb_node_lives): Likewise. > * ira.c (compute_regs_asm_clobbered, build_insn_chain): Likewise. > (find_moveable_pseudos): Likewise. > * loop-invariant.c (check_dependencies, record_uses): Likewise. > * recog.c (peep2_find_free_register): Likewise. > * ree.c (get_defs): Likewise. > * regstat.c (regstat_bb_compute_ri): Likewise. > (regstat_bb_compute_calls_crossed): Likewise. > * sched-deps.c (find_inc, find_mem): Likewise. > * sel-sched-ir.c (maybe_downgrade_id_to_use): Likewise. > (maybe_downgrade_id_to_use, setup_id_reg_sets): Likewise. > * shrink-wrap.c (requires_stack_frame_p): Likewise. > (prepare_shrink_wrap): Likewise. > * store-motion.c (compute_store_table, build_store_vectors): Likewise. > * web.c (union_defs, pass_web::execute): Likewise. > * config/i386/i386.c (increase_distance, insn_defines_reg): Likewise. > (insn_uses_reg_mem, ix86_ok_to_clobber_flags): Likewise. OK. Nice :-) Ciao! Steven
Re: [patch i386]: Combine memory and indirect jump
On Wed, Jun 11, 2014 at 10:32 AM, Kai Tietz wrote: > this patch adds simple combining of indirect-jumps on memory-address. > This patch is pretty similar to sibcall-combing. > ChangeLog > > 2014-06-11 Kai Tietz > > * config/i386/i386.md (peehole2): To combine > indirect jump with memory. Likely fixes part of PR39284, xf. https://gcc.gnu.org/PR39284#c12 Ciao! Steven
Re: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
On Tue, Jun 10, 2014 at 11:23 AM, Zhenqiang Chen wrote: > * loop-invariant.c (struct invariant): Add a new member: eqno; > (find_identical_invariants): Update eqno; > (create_new_invariant): Init eqno; > (get_inv_cost): Compute comp_cost wiht eqno; > (gain_for_invariant): Take spill cost into account. Look OK except ... > @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv, > unsigned *regs_needed, > + IRA_LOOP_RESERVED_REGS > - ira_class_hard_regs_num[cl]; >if (size_cost > 0) > - return -1; > + { > + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl]; > + if (comp_cost <= spill_cost) > + return -1; > + > + return 2; > + } >else > size_cost = 0; > } ... why "return 2", instead of just falling through to "return comp_cost - size_cost;"? Ciao! Steven
Re: [PATCH, loop2_invariant, 1/2] Check only one register class
On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote: > Hi, > > For loop2-invariant pass, when flag_ira_loop_pressure is enabled, > function gain_for_invariant checks the pressures of all register > classes. This does not make sense since one invariant might impact > only one register class. > > The patch enhances functions get_inv_cost and gain_for_invariant to > check only the register pressure of the invariant if possible. This patch may work for targets with more-or-less orthogonal reg classes, but not if there is a lot of overlap between reg classes. So I don't think this approach is OK. Ciao! Steven
Re: [PATCH, loop2_invariant] Skip inv (marked as move) from depends_on
On Tue, Jun 10, 2014 at 11:32 AM, Zhenqiang Chen wrote: > > * loop-invariant.c (get_inv_cost): Skip invariants, which are marked > as "move", from depends_on. > This is OK. Ciao! Steven
Re: [PATCH, loop2_invariant] Pre-check invariants
On Tue, Jun 10, 2014 at 11:55 AM, Zhenqiang Chen wrote: > > * loop-invariant.c (find_invariant_insn): Skip invariants, which > can not make a valid insn during replacement in move_invariant_reg. > > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -881,6 +881,35 @@ find_invariant_insn (rtx insn, bool > always_reached, bool always_executed) >|| HARD_REGISTER_P (dest)) > simple = false; > > + /* Pre-check candidate to skip the one which can not make a valid insn > + during move_invariant_reg. */ > + if (flag_ira_loop_pressure && df_live && simple > + && REG_P (dest) && DF_REG_DEF_COUNT (REGNO (dest)) > 1) Why only do this with (flag_ira_loop_pressure && df_live)? If the invariant can't be moved, we should ignore it regardless of whether register pressure is taken into account. > +{ > + df_ref use; > + rtx ref; > + unsigned int i = REGNO (dest); > + struct df_insn_info *insn_info; > + df_ref *def_rec; > + > + for (use = DF_REG_USE_CHAIN (i); use; use = DF_REF_NEXT_REG (use)) > + { > + ref = DF_REF_INSN (use); > + insn_info = DF_INSN_INFO_GET (ref); > + > + for (def_rec = DF_INSN_INFO_DEFS (insn_info); *def_rec; def_rec++) > + if (DF_REF_REGNO (*def_rec) == i) > + { > + /* Multi definitions at this stage, most likely are due to > + instruction constrain, which requires both read and write > + on the same register. Since move_invariant_reg is not > + powerful enough to handle such cases, just ignore the INV > + and leave the chance to others. */ > + return; > + } > + } > +} > + >if (!may_assign_reg_p (SET_DEST (set)) >|| !check_maybe_invariant (SET_SRC (set))) > return; Can you put your new check between "may_assign_reg_p (dest)" and "check_maybe_invariant"? The may_assign_reg_p check is cheap and triggers quite often. Looks good to me otherwise. Ciao! Steven
Re: RFA: Rework FOR_BB_INSNS iterators
On Sat, Jun 7, 2014 at 7:54 PM, Richard Sandiford wrote: > The two parts of the loop condition are really handling two different > kinds of block: ones like entry and exit that are completely empty > and normal ones that have at least a block note. There's no real > need to check for null INSNs in normal blocks. Block notes should go away some day, they're just remains of a time when there was no actual CFG in the compiler. > Also, refetching NEXT_INSN (BB_END (BB)) for each iteration can be expensive. > If we're prepared to say that the loop body can't insert instructions > for another block immediately after BB_END, This can happen with "block_label()" if e.g. a new jump is inserted for one reason or another. Not very likely for passes working in cfglayout mode, but post-RA there may be places that need this (splitters, peepholes, machine dependent reorgs, etc.). So even if we're prepared to say what you suggest, I don't think you can easily enforce it. > It's easier to change these macros if they define the INSN variables > themselves. If you're going to hack these iterators anyway (much appreciated BTW), I suggest to make them similar to the gsi, loop, edge, and bitmap iterators: A new "insn_iterator" structure to hold the variables and static inline functions wrapped in the macros. This will also be helpful if (when) we ever manage to make the type for an insn a non-rtx... > +/* For iterating over insns in a basic block. The iterator allows the loop > + body to delete INSN. It also ignores any instructions that the body > + inserts between INSN and the following instruction. */ > +#define FOR_BB_INSNS(BB, INSN) \ > + for (rtx INSN = BB_HEAD (BB), INSN##_cond_ = INSN, INSN##_next_, \ > + INSN##_end_ = INSN ? NEXT_INSN (BB_END (BB)) : NULL_RTX; > \ > + INSN##_cond_ && (INSN##_next_ = NEXT_INSN (INSN), true); > \ > + INSN = INSN##_next_,\ > + INSN##_cond_ = (INSN != INSN##_end_ ? (rtx) 1 : NULL_RTX)) This just makes my eyes hurt... What about cases where a FOR_BB_INSNS is terminated before reaching the end of a basic block, and you need to know at what insn you stopped? Up to now, you could do: rtx insn; basic_block bb; FOR_BB_INSNS (bb, insn) { ... // do stuff if (something) break; } do_something_with (insn); Looks like this is no longer possible with the implementation of FOR_BB_INSNS of your patch. I would not approve this patch, but let's wait what others think of it. Ciao! Steven
Re: [PATCH] Trust TREE_ADDRESSABLE
On Sat, Jun 7, 2014 at 12:59 PM, Eric Botcazou wrote: >> >In Ada we don't mark (external) variables as addressable if we don't >> >see their address taken. >> >> You have to (now). The testing was of course to detect this... > > Well, you need to define what TREE_ADDRESSABLE means now, because according to > > /* In VAR_DECL, PARM_DECL and RESULT_DECL nodes, nonzero means address >of this is needed. So it cannot be in a register. > [...] > #define TREE_ADDRESSABLE(NODE) ((NODE)->base.addressable_flag) > > your change is clearly wrong and the Ada compiler clearly right. "Clearly"? An external variable is a VAR_DECL that cannot be in a register. It can be loaded into a register (or stored into), and for that its address is needed. So I would expect an external variable to be marked addressable by default. I was always surprised that this was not the case before Richi's change. > And auditing > the various front-ends might also be in order here if they really need to mark > every single external variable as addressable to be safe wrt aliasing. Right. And this should have been done (clearly ;-) ) before the patch was committed... Ciao! Steven
Re: [PATCH] Updates merged bb count
On Fri, May 30, 2014 at 11:43 PM, Dehao Chen wrote: > Index: gcc/testsuite/gcc.dg/tree-prof/merge_block.c > === > --- gcc/testsuite/gcc.dg/tree-prof/merge_block.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-prof/merge_block.c (revision 0) > @@ -0,0 +1,20 @@ > + > +/* { dg-options "-O2 -fno-ipa-pure-const > -fdump-tree-optimized-blocks-details -fno-early-inlining" } */ > +int a[8]; > +int t() > +{ > + int i; > + for (i = 0; i < 3; i++) > + if (a[i]) > + break; > + return i; > +} > +main () > +{ > + int i; > + for (i = 0; i < 1000; i++) > +t (); > + return 0; > +} > +/* { dg-final-use { scan-tree-dump-not "Invalid sum" "optimized"} } */ > +/* { dg-final-use { cleanup-tree-dump "optimized" } } */ I suppose you want to avoid having t() inlined into main()? If so, then I'd suggest adding __attribute__((__noinline__,__noclone__)) to "robustify" the test case. Ciao! Steven