Ian Bolton wrote:
Hi Vladimir,

I have just begun working for Icera Inc, on a private port of GCC,
and my first task has been to investigate the performance impact of
using the Chaitin-Briggs algorithm of IRA vs the priority algorithm
that we currently have enabled in our production compiler (due to not
defining IRA_COVER_CLASSES).

After running various tests, I measured little performance changes
except for some test cases that have regressed quite considerably.
My suspicion was that these regressions were due to our architecture
having some instructions that can only use a subset of the register
bank, so I set about trying to understand IRA enough to confirm this.

That is quite possible. Chaitin-Briggs algorithm has some problems when there are instructions using specific register subsets. It is well known and some researchers tried to solve the problem. The most known article about this is "A generalized algorithm for graph-coloring register allocation":

http://130.203.133.121:8080/viewdoc/summary?doi=10.1.1.107.113

I am still working on this problem but I don't know when it will be finally in gcc.
I now believe I have now reached a point where I understand IRA
enough to start thinking about how I can improve these regressed
cases, but I thought it might be a good idea to formulate an email
with my current understanding of IRA in it, so you could correct any
errors before I continue.  (I figure this exchange between us will
also serve useful to other newbies in the field of GCC register
allocation!)

The biggest problem of GCC RA is still reload pass. It frequently changes decisions of IRA not in a good way.
Without further ado, here is my current understanding of IRA
(numbered for ease of reference).

1. You can enable full IRA, with Chaitin-Briggs coloring, by defining
IRA_COVER_CLASSES, but you can just as easily disable it again by
supplying -fira-algorithm=priority at the command line.  (The latter
is useful to know because it means you can compare two versions
without recompiling.)

Correct.
2. The priority algorithm allocates in an order determined by the
allocno_priority_compare_func and calls into assign_hard_reg in that
order to determine which allocno gets what register (or memory).

Correct.
3. The CB algorithm puts allocnos into colored and uncolored buckets,
according to conflicts between allocnos.  The more conflicts, the
more chance of being put in the uncolored bucket.  Being in the
uncolored bucket increases an allocno's chance of being spilled;
being in the colored bucket guarantees that it will get a hard
register.

I'd say if allocno goes to the coloring stack from the uncolored bucket, it will be probably spilled. Going to the coloring stack from coloring bucket means that the allocno most probably get a hard register. In theory it should be guaranteed but in practice it might not happen because of multi-registers allocnos, hard register conflicts with allocno, non-profitability of some hard registers (e.g. callee-saved vs callee-clobbered registers).

Being in colored or uncolored bucket is defined by trivial colorability criteria which is classically based on conflicts left in the graph for given allocno. The criteria can be more complicated and exact but it might slow down RA a lot.
4. When you run at -O0, the conflicts aspect of IRA is disabled
(ira_conflict_p is set to false), and this subsequently causes the
fast_allocation function to be used instead of the do_coloring IRA
function.  (This makes sense since you cannot separate into buckets
without inspecting conflicts.)

Correct. Building conflicts is an expansive operation. Therefore -O0 (or fast) RA is based on live ranges.
5. Assuming ira_conflicts_p is enabled and you are using the CB
algorithm, the two buckets are sorted using a
bucket_allocno_compare_func, and this ordering is important, since it
dictates the order each allocno will be pushed onto the stack and
hence the order it will be popped from it (and assign_hard_reg
called).

Correct. But I should say it is only used for currently colored allocnos. You can not reorder two allocnos when putting the first allocno on the stack (in order words, removing trivially colored allocno from the graph) results in making the second allocno trivially colored.
6. Allocnos from the colorable bucket always go on the stack first
(in the order they were sorted) and then uncolorable ones, in an
order based on which is the best candidate for spilling.  Each time
an allocno is added to the stack, conflicts may be reduced and there
is the possibility of moving one or more allocnos from the
uncolorable bucket to the colorable one (so they end up being added
to the stack sooner than the other uncolorable ones).  Near the end
of processing the uncolorable bucket, there comes an opportunity to
move all remaining allocnos to the colorable bucket (due to few
remaining conflicts) and these get put on the stack last, and hence
popped first, and so they all get registers.
Correct. It is how Chaitin-Briggs coloring works and such description can be found in any compiler book.
 These final uncolorables
will be those that had the highest spill cost and/or the most
remaining conflicts.

Sometimes such allocnos get hard register (it is implemented by optimistic coloring which is the biggest invention of Briggs). It is very hard to predict when it happens.
7. The spill cost and priority calculation for determining which
uncolorable to put on the stack first is important, for two main
reasons: firstly, whichever allocno is picked is a more likely
candidate for spilling; secondly, it will free other allocnos up to
be added to the nicer colored bucket, from which an allocno always
receives a hard register.

Correct with mentioned above problem that allocnos from colored bucket might not get a hard register.

Chaitin-Briggs algorithm is good at coloring but not good at spilling which should be used on register pressure criteria and this is very different from # conflicts. Priority coloring, imho, better dealing with spilling because its priority takes register pressure into account by using live range length. From well known RA algorithms, ELS (extended linear scan RA) probably does the best decision about spilling but it is not good about coloring (especially for code with complicated CFG).
If you could let me know if and where I've gone wrong with any of the
above, I would be extremely grateful.  Then, once we are on the same
page, I hope you could also make some suggestions as to how I might
help IRA work well with our instructions that can only use a subset
of the register bank.

Subsets are complicated issue. I've been working on different approaches to this problem for year without significant improvements. As I wrote I hope it finally will be in GCC in some time and necessity for IRA_COVER_CLASS will finally go away.
Best regards, Ian Bolton (Compiler Engineer, Icera Inc.)

Reply via email to