On 10/01/2012 09:03 AM, Steven Bosscher wrote:
On Sat, Sep 29, 2012 at 10:26 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
  LRA create live ranges  : 175.30 (15%) usr   2.14 (13%) sys 177.44 (15%) wall    2761 kB ( 0%) ggc
I've tried to split this up a bit more:

process_bb_lives ~50%
create_start_finish_chains ~25%
remove_some_program_points_and_update_live_ranges ~25%

The latter two have a common structure with loops that look like this:

   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
     {
       for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)

Perhaps it's possible to do some of the work of compress_live_ranges
during process_bb_lives, to create shorter live_ranges chains.
It is already done. Moreover, the program points are compressed to the minimum needed to guarantee correct conflict resolution. For example, if only 2 pseudos live in 1..10 and 2..14, they will actually both end up with the same range, e.g. 1..1.

Analogous live ranges are used in IRA as an intermediate step to build a conflict graph. Actually, the first approach was to use the IRA code to assign hard registers to pseudos (e.g. Jeff Law tried this approach), but it was rejected as requiring much more compilation time. In some ways, one can look at the assignment in LRA as a compromise between quality (which could be achieved through repeatedly building conflict graphs and using graph coloring) and compilation speed.
Also, maybe doing something else than a linked list of live_ranges
will help (unfortunately that's not a trivial change, it seems,
because the lra_live_range_t type is used everywhere and there are no
iterators or anything abstracted out like that -- just chain
walks...).

Still it does seem to me that a sorted VEC of lra_live_range objects
probably would speed things up. Question is of course how much... :-)

My experience shows that these lists usually have 1-2 elements. Although in this case, there are pseudos with a huge number of elements (hundreds). I tried -fweb for this test because it can decrease the number of elements, but GCC (I don't know which pass) scales even worse: after 20 min of waiting, when virtual memory reached 20GB, I stopped it.

I guess the same speed as reload, or close to it, could be achieved on this test by implementing other, simpler algorithms that generate worse code. I need some time to design and implement them.
