On Sun, Jul 17, 2011 at 8:33 PM, Tom de Vries <vr...@codesourcery.com> wrote:
> Bootstrapped and reg-tested on x86_64. Ok for trunk (after ARM testing)? +static int +same_succ_equal (const void *ve1, const void *ve2) +{ ... + if (bitmap_bit_p (e1->bbs, ENTRY_BLOCK) + || bitmap_bit_p (e1->bbs, EXIT_BLOCK) + || bitmap_bit_p (e2->bbs, ENTRY_BLOCK) + || bitmap_bit_p (e2->bbs, EXIT_BLOCK)) + return 0; that's odd - what are these checks for? + if (dump_file) + { + fprintf (dump_file, "initial worklist:\n"); with dump_flags & TDF_DETAILS I'm now at merge_calls and wondering about alias info again. We are probably safe for the per-pointer information because we are not operating flow-sensitive for memory and for merging require value-equivalence for SSA names. For calls the same should be true - we are not flow- or context-sensitive, and even if we were context-sentitive we require equivalent arguments (for memory arguments we should be safe because of the non-flow-sensitivity). So, did you actually run into problems? If not then I suggest to remove merge_calls completely (and the related changes that it requires). +/* Create or update a vop phi in BB2. Use VUSE1 arguments for all the + REDIRECTED_EDGES, or if VUSE1 is NULL_TREE, use BB_VOP_AT_EXIT. If a new + phis is created, use the phi instead of VUSE2 in BB2. */ + +static void +update_vuses (tree vuse1, tree vuse2, basic_block bb2, + VEC (edge,heap) *redirected_edges) ... + if (vuse2 == NULL_TREE) + return; hm, that's the case when there is no VUSE that is dominated by BB2 (or is in BB2). Ok, might happen. + locus1 = gimple_location (SSA_NAME_DEF_STMT (arg)); + add_phi_arg (phi, arg, e, locus1); I don't think virtual operand PHIs should have locations, just use UNKNOWN_LOCATION here. + locus2 = gimple_location (def_stmt2); Likewise. + /* Create a phi, first with default argument vuse2 for all preds. */ + lhs = make_ssa_name (SSA_NAME_VAR (vuse2), NULL); + VN_INFO_GET (lhs); + phi = create_phi_node (lhs, bb2); + SSA_NAME_DEF_STMT (lhs) = phi; + FOR_EACH_EDGE (e, ei, bb2->preds) + add_phi_arg (phi, vuse2, e, locus2); + + /* Now overwrite the arguments associated with the redirected edges with + vuse1. */ + for (i = 0; i < EDGE_COUNT (redirected_edges); ++i) + { + e = VEC_index (edge, redirected_edges, i); + gcc_assert (PHI_ARG_DEF_FROM_EDGE (phi, e)); No need for this assert. + if (vuse1) + arg = vuse1; + else + arg = BB_VOP_AT_EXIT (e->src); + SET_PHI_ARG_DEF (phi, e->dest_idx, arg); + locus1 = gimple_location (SSA_NAME_DEF_STMT (arg)); See above. + gimple_phi_arg_set_location (phi, e->dest_idx, locus1); + } Can you maybe merge this with the update-existing-phi-case? They look all too similar. + /* Replace uses of vuse2 in bb2 with phi. */ + FOR_EACH_IMM_USE_STMT (stmt, iter, vuse2) + { + if (gimple_code (stmt) == GIMPLE_PHI) Does FOR_EACH_IMM_USE_ON_STMT really not work for PHIs? Other code doesn't seem to care. + { + edge e; + if (stmt == phi) + continue; + e = find_edge (bb2, gimple_bb (stmt)); + if (e == NULL) + continue; + use_p = PHI_ARG_DEF_PTR_FROM_EDGE (stmt, e); + SET_USE (use_p, lhs); + update_stmt (stmt); + } + else if (gimple_bb (stmt) == bb2) That check looks odd. A use can very well appear in a forwarder block. + { + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) + SET_USE (use_p, lhs); + update_stmt (stmt); + } + } +/* Scans the vdefs and vuses of the insn of BB, and returns the vop at entry in + VOP_AT_ENTRY, and the vop at exit in VOP_AT_EXIT. */ + +static void +insn_vops (basic_block bb, tree *vop_at_entry, tree *vop_at_exit) it's easier to start from the bb end and walk until you see the first vdef or vuse. Then you have *vop_at_exit. From there just walk the SSA_NAME_DEF_STMTs of the vuse until you hit one whose definition is not in BB - and you have *vop_at_entry. That way you can avoid scanning most of the stmts. The function also has an odd name ;) It should be something like vops_at_bb_entry_and_exit. +static void +vop_at_entry (basic_block bb1, basic_block bb2, tree *vop_at_entry1, + tree *vop_at_entry2) so you don't need the vop at exit at all? The function is a bit unclear to me given it does so much stuff other than just computing the BBs entry VOPs ... +static void +replace_block_by (basic_block bb1, basic_block bb2, bool update_vops) +{ can you add some comments before the different phases of update? I _think_ I understand what it does, but ... +/* Runs tail merge optimization. */ + +unsigned int +tail_merge_optimize (unsigned int todo) +{ + int nr_bbs_removed_total = 0; + int nr_bbs_removed; + bool loop_entered = false; + int iteration_nr = 0; + bool update_vops = ((todo & TODO_update_ssa_only_virtuals) == 0 + || !symbol_marked_for_renaming (gimple_vop (cfun))); you need to simplify this to bool update_vops = !symbol_marked_for_renaming (gimple_vop (cfun)); + if (nr_bbs_removed == 0) + break; + + free_dominance_info (CDI_DOMINATORS); we might want to limit the number of iterations we perform - especially as you are re-computing dominators on each iteration. What's the maximum number of iterations you see during a GCC bootstrap? + if (dump_file) + fprintf (dump_file, "htab collision / search: %f\n", + htab_collisions (same_succ_htab)); in general without dump_flags & TDF_DETAILS passes should print at most things when they did a transformation (some even don't do that). Please double-check all your dump-prints. + todo |= (TODO_verify_ssa | TODO_verify_stmts | TODO_verify_flow + | TODO_dump_func); should all be already set. @ -4945,6 +4944,9 @@ execute_pre (bool do_fre) scev_finalize (); fini_pre (do_fre); + todo |= tail_merge_optimize (todo); + free_scc_vn (); Please only run tail_merge_optimize once. As we are running through this code three times at -O2. I suggest to try it in the !do_fre case only which we only run once (as PRE). If that doesn't work out nicely we need to find other means (like introduce a pass_fre_and_tail_merge which passes down another flag and replace one FRE pass with that new combined pass). Index: gcc/tree-flow.h =================================================================== --- gcc/tree-flow.h (revision 175801) +++ gcc/tree-flow.h (working copy) @@ -806,6 +806,9 @@ bool multiplier_allowed_in_address_p (HO unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool); bool may_be_nonaddressable_p (tree expr); +/* In tree-ssa-tail-merge.c. */ +extern unsigned int tail_merge_optimize (unsigned int); Eh, tree-flow.h kitchen-sink ;) Please put it into tree-pass.h instead. That said - I'm reasonably happy with the pass now, but it's rather large (this review took 40min again ...) so I appreciate a second look from somebody else. Btw, can you expand a bit on the amount of testcases? Thanks, Richard. > Thanks, > - Tom > > 2011-07-17 Tom de Vries <t...@codesourcery.com> > > PR middle-end/43864 > * tree-ssa-tail-merge.c: New file. > (struct same_succ): Define. > (same_succ_t, const_same_succ_t): New typedef. > (struct bb_cluster): Define. > (bb_cluster_t, const_bb_cluster_t): New typedef. > (struct aux_bb_info): Define. > (BB_SIZE, BB_SAME_SUCC, BB_CLUSTER, BB_VOP_AT_EXIT): Define. > (gvn_uses_equal): New function. > (same_succ_print, same_succ_print_traverse, same_succ_hash) > (inverse_flags, same_succ_equal, same_succ_alloc, same_succ_delete) > (same_succ_reset): New function. > (same_succ_htab, same_succ_edge_flags) > (deleted_bbs, deleted_bb_preds): New var. > (debug_same_succ): New function. > (worklist): New var. > (print_worklist, add_to_worklist, find_same_succ_bb, find_same_succ) > (init_worklist, delete_worklist, delete_basic_block_same_succ) > (same_succ_flush_bbs, update_worklist): New function. > (print_cluster, debug_cluster, same_predecessors) > (add_bb_to_cluster, new_cluster, delete_cluster): New function. > (all_clusters): New var. > (alloc_cluster_vectors, reset_cluster_vectors, delete_cluster_vectors) > (merge_clusters, set_cluster): New function. > (gimple_equal_p, find_duplicate, same_phi_alternatives_1) > (same_phi_alternatives, bb_has_non_vop_phi, find_clusters_1) > (find_clusters): New function. > (merge_calls, update_vuses, vop_phi, insn_vops, vop_at_entry) > (replace_block_by): New function. > (update_bbs): New var. > (apply_clusters): New function. > (update_debug_stmt, update_debug_stmts): New function. > (tail_merge_optimize): New function. > tree-flow.h (tail_merge_optimize): Declare. > * tree-ssa-pre.c (execute_pre): Use tail_merge_optimize. > * Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o. > (tree-ssa-tail-merge.o): New rule. > * opts.c (default_options_table): Set OPT_ftree_tail_merge by default > at > OPT_LEVELS_2_PLUS. > * tree-ssa-sccvn.c (vn_valueize): Move to ... > * tree-ssa-sccvn.h (vn_valueize): Here. > * tree-ssa-alias.h (pt_solution_ior_into_shared): Declare. > * tree-ssa-structalias.c (find_what_var_points_to): Factor out and > use ... > (pt_solution_share): New function. > (pt_solution_unshare, pt_solution_ior_into_shared): New function. > (delete_points_to_sets): Nullify shared_bitmap_table after deletion. > * timevar.def (TV_TREE_TAIL_MERGE): New timevar. > * common.opt (ftree-tail-merge): New switch. > * params.def (PARAM_MAX_TAIL_MERGE_COMPARISONS): New parameter. > * doc/invoke.texi (Optimization Options, -O2): Add -ftree-tail-merge. > (-ftree-tail-merge, max-tail-merge-comparisons): New item. >