On 10/01/2012 09:03 AM, Steven Bosscher wrote:
On Sat, Sep 29, 2012 at 10:26 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
  LRA create live ranges  : 175.30 (15%) usr   2.14 (13%) sys 177.44 (15%) wall    2761 kB ( 0%) ggc
I've tried to split this up a bit more:

process_bb_lives ~50%
create_start_finish_chains ~25%
remove_some_program_points_and_update_live_ranges ~25%

The latter two have a common structure with loops that look like this:

   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
     {
       for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)

Perhaps it's possible to do some of the work of compress_live_ranges
during process_bb_lives, to create shorter live_ranges chains.
It is already done. Moreover, the program points are compressed to the minimum needed to guarantee correct conflict resolution. For example, if only 2 pseudos live in 1..10 and 2..14, they will actually both end up with the same range, e.g. 1..1.

Analogous live ranges are used in IRA as an intermediate step to build a conflict graph. Actually, the first approach was to use the IRA code to assign hard registers to pseudos (e.g. Jeff Law tried this approach), but it was rejected as requiring much more compilation time. In some ways, one can look at the assignment in LRA as a compromise between quality (which could be achieved through repeatedly building conflict graphs and using graph coloring) and compilation speed.
Also, maybe doing something else than a linked list of live_ranges
will help (unfortunately that's not a trivial change, it seems,
because the lra_live_range_t type is used everywhere and there are no
iterators or anything abstracted out like that -- just chain
walks...).

Still it does seem to me that a sorted VEC of lra_live_range objects
probably would speed things up. Question is of course how much... :-)

My experience shows that these lists usually have 1-2 elements. Although in this case, there are pseudos with a huge number of elements (hundreds). I tried -fweb for this test because it can decrease the number of elements, but GCC (I don't know which pass) scales even worse: after 20 min of waiting, when virtual memory reached 20GB, I stopped it.

I guess the same speed as reload, or close to it, could be achieved on this test by implementing other, simpler algorithms that generate worse code. I need some time to design and implement them.
