On Thu, Oct 4, 2012 at 2:59 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
> On Thu, Oct 4, 2012 at 1:30 PM, Richard Guenther
> <richard.guent...@gmail.com> wrote:
>> Isn't _REVERSE vs. non-_REVERSE still kind-of "random" order?
>
> Not at this stage. For cfglayout mode I would answer yes, but IRA/LRA
> operates in cfgrtl mode, so the sequence of insns and basic blocks
> must match. Therefore, if you walk the basic blocks in reverse, and
> the insns in each basic block in reverse, you effectively work on a,
> let's say, "reverse extended basic block" (??) in that your program
> points are sequential across fallthrough edges.
>
>> Thus, doesn't the above show there exists an optimal order for
>> processing which we could use?
>
> There may be a smarter order: you could even walk the blocks in that
> order if you knew a priori what path through the CFG minimizes the
> length of the live range chains. But to know that order, you have to
> build the chains first. So chicken-and-egg...
Richi, you had me inspired, and I remembered your (still disgusting ;-))
my_rev_post_order_compute hack.

A better order than walking the basic blocks in reverse is to walk them
in topological order on the reverse CFG. This compresses very long live
ranges into very short chains, because the program points become almost
sequential for them.

So here's one more patch:

@@ -994,8 +1092,16 @@ lra_create_live_ranges (bool all_p)
   curr_point = 0;
   point_freq_vec = VEC_alloc (int, heap, get_max_uid () * 2);
   lra_point_freq = VEC_address (int, point_freq_vec);
-  FOR_EACH_BB (bb)
-    process_bb_lives (bb);
+  int *post_order_rev_cfg = XNEWVEC (int, last_basic_block);
+  int n_blocks_inverted = inverted_post_order_compute (post_order_rev_cfg);
+  lra_assert (n_blocks_inverted == n_basic_blocks);
+  for (i = n_blocks_inverted - 1; i >= 0; --i)
+    {
+      bb = BASIC_BLOCK (post_order_rev_cfg[i]);
+      if (bb == EXIT_BLOCK_PTR || bb == ENTRY_BLOCK_PTR)
+	continue;
+      process_bb_lives (bb, curr_point);
+    }
+  // HACK13 must add free (post_order_rev_cfg); here
   lra_live_max_point = curr_point;
   create_start_finish_chains ();
   if (lra_dump_file != NULL)

Without this patch:

Compressing live ranges: from 700458 to 391665 - 55%, pre_count
40730653, post_count 34363983
max per-reg pre_count 12978 (228090, 2 defs, 2 uses) (reg/f:DI 228090 [ SR.25009 ])
max per-reg post_count 10967 (228090, 2 defs, 2 uses) (reg/f:DI 228090 [ SR.25009 ])

With this patch:

Compressing live ranges: from 700458 to 372585 - 53%, pre_count 283937,
post_count 271120
max per-reg pre_count 545 (230653, 542 defs, 542 uses) (reg/f:DI 230653 [ SR.13303 ])
max per-reg post_count 544 (230649, 542 defs, 542 uses) (reg/f:DI 230649 [ SR.13305 ])

(The per-reg counts are the lengths of the live range chains for the
mentioned regno.)

So the patch results in fewer program points, and way, waaaaay shorter
live range chains. Needless to say, compile time also benefits
significantly.
Time spent in create_start_finish_chains:
Without this patch: 19.75s (2%) usr
With this patch: 0.14s (0%)

Vlad, is this something you also already tried? If not, then I'm going
to bootstrap&test this (on top of my pending other patches on
lra-lives.c).

Ciao!
Steven