> Basic block reordering is disabled for -Os since GCC 4.7 because the pass
> would lead to a big code size regression.  But benchmark logs also show
> many regressions due to poor code layout compared with 4.6.
>
> The patch enables bbro for -Os.  When optimizing for size, it
> * avoids duplicating blocks.
> * keeps the original order if there is no chance to fall through.
> * ignores edge frequency and probability.
> * handles a predecessor first if its index is smaller, to break long traces.
> * only connects Trace n with Trace n + 1, to reduce long jumps.
>
> Here are the CSiBE code size benchmark results:
> * For ARM, code size is reduced by 0.21%.
> * For MIPS, code size is reduced by 0.25%.
> * For PPC, code size is reduced by 0.33%.
> * For X86, code size is reduced by 0.22%.
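[Editorial note: the "ignores edge frequency and probability" point above can be illustrated with a minimal, self-contained C sketch.  The struct and function names below are invented for the example; GCC's real types are `basic_block` and `edge`.  When optimizing for size, the candidate successor whose destination has the smaller block index wins, which keeps the layout close to the original block order.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented, simplified stand-ins for GCC's basic_block and edge types,
   just enough to illustrate the -Os selection rule described above.  */
struct sketch_bb { int index; };
struct sketch_edge { struct sketch_bb *src; struct sketch_bb *dest; };

/* When optimizing for size, probability and frequency are ignored:
   among candidate successor edges, the one whose destination has the
   smaller block index is considered better, which tends to preserve
   the original block order.  */
static bool
size_better_edge_p (const struct sketch_edge *e,
                    const struct sketch_edge *cur_best_edge)
{
  /* Any edge beats no edge at all.  */
  if (cur_best_edge == NULL)
    return true;
  /* Smaller destination index wins.  */
  return cur_best_edge->dest->index > e->dest->index;
}
```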
Interesting figures.  The patch looks good overall but, since the -Os path
deviates substantially from the original algorithm, it needs to be clearly
documented in the comment at the top of the file (before "Reference"), e.g.
"The above description is for the full algorithm, which is used when the
function is optimized for speed.  When the function is optimized for size,
in order to <...insert reasons here...>, the algorithm is modified as
follows: <...list modifications here...>"

More detailed comments:

   /* Edge that cannot be fallthru or improbable or infrequent
-      successor (i.e. it is unsuitable successor).  */
+      successor (i.e. it is unsuitable successor).
+      For size, ignore the frequency and probability.  */
   if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX)
-      || prob < branch_th || EDGE_FREQUENCY (e) < exec_th
-      || e->count < count_th)
+      || (prob < branch_th || EDGE_FREQUENCY (e) < exec_th
+          || e->count < count_th) && !for_size)
     continue;

...ignore the probability and frequency.

@@ -558,6 +564,14 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th,

       /* Add all non-selected successors to the heaps.  */
       FOR_EACH_EDGE (e, ei, bb->succs)
         {
+          /* Wait for the predecessors.  */
+          if ((e == best_edge) && for_size
+              && (EDGE_COUNT (best_edge->dest->succs) > 1
+                  || EDGE_COUNT (best_edge->dest->preds) > 1))
+            {
+              best_edge = NULL;
+            }
+
           if (e == best_edge
               || e->dest == EXIT_BLOCK_PTR
               || bb_visited_trace (e->dest))

I don't really understand what this means and why this is done here.  If you
don't want to add the best destination in some cases, why not do it just
before the loop and explicitly state the reason?  And you don't need
parentheses.

@@ -596,11 +610,12 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th,
         {
           /* When partitioning hot/cold basic blocks, make sure
              the cold blocks (and only the cold blocks) all get
-             pushed to the last round of trace collection.  */
+             pushed to the last round of trace collection. 
+             Do not push to next round when optimizing for size.

Trailing spaces in comment.

@@ -681,6 +696,8 @@ find_traces_1_round (int branch_th, int exec_th, gcov_type count_th,
          (i.e. 2 * B->frequency >= EDGE_FREQUENCY (AC) )
          Best ordering is then A B C.

+         For size, A B C is always the best order.
+
          This situation is created for example by:

          if (A) B;

When optimizing for size, A B C is always the best ordering.

@@ -864,6 +886,13 @@ better_edge_p (const_basic_block bb, const_edge e, int prob, int freq, int best_
   int diff_prob = best_prob / 10;
   int diff_freq = best_freq / 10;

+  if (optimize_function_for_size_p (cfun))
+    {
+      /* The smaller one is better to keep the original order.  */
+      return !cur_best_edge
+             || cur_best_edge->dest->index > e->dest->index;
+    }

Move the comment out of the block, add "When optimizing for size" and remove
the parentheses.

+/* Return true when the edge E is better than the temporary best edge
+   CUR_BEST_EDGE.  If SRC_INDEX_P is true, the function compares the src bb of
+   E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
+   BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
+   TRACES record the information about traces.
+   When optimizing for size, the edge with smaller index is better.
+   When optimizing for speed, the edge with bigger probability or longer trace
+   is better.  */
+
+static bool
+connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
+                       const_edge cur_best_edge, struct trace *traces)
+{
+  int e_index;
+  int b_index;
+
+  if (!cur_best_edge)
+    return true;
+
+  if (optimize_function_for_size_p (cfun))
+    {
+      e_index = src_index_p ? e->src->index : e->dest->index;
+      b_index = src_index_p ? cur_best_edge->src->index
+                            : cur_best_edge->dest->index;
+      /* The smaller one is better to keep the original order.  */
+      return b_index > e_index;
+    }
+  else if (src_index_p)
+    {
+      e_index = e->src->index;
+      return e->probability > cur_best_edge->probability
+             || (e->probability == cur_best_edge->probability
+                 && (traces[bbd[e_index].end_of_trace].length > best_len));
+    }
+  else
+    {
+      e_index = e->dest->index;
+      return e->probability > cur_best_edge->probability
+             || (e->probability == cur_best_edge->probability
+                 && (traces[bbd[e_index].start_of_trace].length > best_len));
+    }
+}

Remove the second "else" to have the same structure as in better_edge_p.

+      if (for_size)
+        {
+          if (!best)
+            break;

Add a small comment to explain the break here.

+          /* It is OK to connect block n with block n + 1 or a block
+             before n.  For others, only connect to the loop header.  */
+          if (best->dest->index > (traces[t].last->index + 1))
+            {
+              int count = EDGE_COUNT(best->dest->preds);

Missing space before parenthesis.

+              FOR_EACH_EDGE (e, ei, best->dest->preds)
+                if (e->flags & EDGE_DFS_BACK)
+                  count--;
+
+              /* If dest has multiple predecessors, skip it.  Expect
+                 block dest->index - 1 connect with it later.  */
+              if (count != 1)
+                break;
+            }

I'm not sure I understand the last sentence.  Could you rephrase it in plain
English?  "We expect that the previous block will connect with it later"?

+          /* Only connect Trace n with Trace n + 1.  It is conservative
+             to keep the order as close as the original order.  */
+          if (last_trace != bbd[best->dest->index].start_of_trace - 1)
+            break;

... the order as close as possible to the original order.

+      if (dump_file)
+        {
+          fprintf (dump_file, "Connection: %d %d\n",
+                   best->src->index, best->dest->index);
+        }

Superfluous parentheses.

@@ -1169,6 +1272,10 @@ copy_bb_p (const_basic_block bb, int code_may_grow)
   int max_size = uncond_jump_length;
   rtx insn;

+  /* Avoid duplicating blocks for size.  */
+  if (optimize_function_for_size_p (cfun))
+    return false;

...when optimizing for size.

Please adjust and repost.
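[Editorial note: one way to carry out the "remove the second else" suggestion is sketched below.  Since every branch returns, both `else`s can be dropped without changing behavior.  The types and the `bbd`/`traces` lookups are mocked with invented names (`mock_bb`, `mock_end_trace_len`, etc.) so the fragment is self-contained; the control flow otherwise mirrors the function in the patch.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-ins for GCC's types and tables, for illustration only.  */
struct mock_bb { int index; };
struct mock_edge { struct mock_bb *src; struct mock_bb *dest; int probability; };

static bool mock_for_size;           /* mocks optimize_function_for_size_p () */
static int mock_end_trace_len[16];   /* mocks traces[bbd[i].end_of_trace].length */
static int mock_start_trace_len[16]; /* mocks traces[bbd[i].start_of_trace].length */

/* Same logic as the patch's connect_better_edge_p, restructured without
   the trailing elses: each branch returns, so no else is needed.  */
static bool
mock_connect_better_edge_p (const struct mock_edge *e, bool src_index_p,
                            int best_len, const struct mock_edge *cur_best_edge)
{
  int e_index;
  int b_index;

  if (!cur_best_edge)
    return true;

  if (mock_for_size)
    {
      e_index = src_index_p ? e->src->index : e->dest->index;
      b_index = src_index_p ? cur_best_edge->src->index
                            : cur_best_edge->dest->index;
      /* When optimizing for size, the smaller index is better to keep
         the original order.  */
      return b_index > e_index;
    }

  if (src_index_p)
    {
      e_index = e->src->index;
      return e->probability > cur_best_edge->probability
             || (e->probability == cur_best_edge->probability
                 && mock_end_trace_len[e_index] > best_len);
    }

  e_index = e->dest->index;
  return e->probability > cur_best_edge->probability
         || (e->probability == cur_best_edge->probability
             && mock_start_trace_len[e_index] > best_len);
}
```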
Note that I just installed a patch that makes some cosmetic changes to the
file, so you might have a couple of minor conflicts.

-- 
Eric Botcazou