On Thu, Oct 4, 2012 at 2:59 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
> On Thu, Oct 4, 2012 at 1:30 PM, Richard Guenther
> <richard.guent...@gmail.com> wrote:
>> Isn't _REVERSE vs. non-_REVERSE still kind-of "random" order?
>
> Not at this stage. For cfglayout mode I would answer yes, but IRA/LRA
> operates in cfgrtl mode, so the sequence of insns and basic blocks
> must match. Therefore, if you walk the basic blocks in reverse, and
> the insns in each basic block in reverse, you effectively work on a,
> let's say, "reverse extended basic block" (??) in that your program
> points are sequential across fallthrough edges.
>
>> Thus, doesn't the above show there exists an optimal order for
>> processing which we could use?
>
> There may be a smarter order: you could even walk the blocks in that
> order if you knew a priori what path through the CFG minimizes the
> length of the live range chains. But to know that order, you have to
> build the chains first. So chicken-and-egg...
Richi, you had me inspired, and I remembered your (still disgusting ;-))
my_rev_post_order_compute hack.

A better order than walking the basic blocks in reverse is to walk them
in topological order on the reverse CFG. This compresses very long live
ranges into very short chains, because the program points become almost
sequential for them.

So here's one more patch:

@@ -994,8 +1092,16 @@ lra_create_live_ranges (bool all_p)
   curr_point = 0;
   point_freq_vec = VEC_alloc (int, heap, get_max_uid () * 2);
   lra_point_freq = VEC_address (int, point_freq_vec);
-  FOR_EACH_BB (bb)
-    process_bb_lives (bb);
+  int *post_order_rev_cfg = XNEWVEC (int, last_basic_block);
+  int n_blocks_inverted = inverted_post_order_compute (post_order_rev_cfg);
+  lra_assert (n_blocks_inverted == n_basic_blocks);
+  for (i = n_blocks_inverted - 1; i >= 0; --i)
+    {
+      bb = BASIC_BLOCK (post_order_rev_cfg[i]);
+      if (bb == EXIT_BLOCK_PTR || bb == ENTRY_BLOCK_PTR)
+	continue;
+      process_bb_lives (bb, curr_point);
+    }
+  // HACK13 must add free (post_order_rev_cfg); here
   lra_live_max_point = curr_point;
   create_start_finish_chains ();
   if (lra_dump_file != NULL)

Without this patch:

Compressing live ranges: from 700458 to 391665 - 55%, pre_count
40730653, post_count 34363983
max per-reg pre_count 12978 (228090, 2 defs, 2 uses) (reg/f:DI 228090 [ SR.25009 ])
max per-reg post_count 10967 (228090, 2 defs, 2 uses) (reg/f:DI 228090 [ SR.25009 ])

With this patch:

Compressing live ranges: from 700458 to 372585 - 53%, pre_count 283937,
post_count 271120
max per-reg pre_count 545 (230653, 542 defs, 542 uses) (reg/f:DI 230653 [ SR.13303 ])
max per-reg post_count 544 (230649, 542 defs, 542 uses) (reg/f:DI 230649 [ SR.13305 ])

(The per-reg counts are the lengths of the live range chains for the
mentioned regno.)

So the patch results in fewer program points, and way, waaaaay shorter
live range chains. Needless to say, compile time also benefits
significantly.
Time spent in create_start_finish_chains:
Without this patch: 19.75s (2%) usr
With this patch: 0.14s (0%)

Vlad, is this something you also already tried? If not, then I'm going
to bootstrap&test this (on top of my pending other patches on
lra-lives.c).

Ciao!
Steven