At 9:51 AM +0200 8/6/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
... extraordinarily large (60-80k lines) ...
... dying with an out-of-memory error.

How many symbols does the sub have? Please set a breakpoint at imcc/reg_alloc.c:468 and inspect:
n_symbols
(or just create a debug print statement there)


The interference_graph size is n_symbols * n_symbols * sizeof(a_pointer). This might already be too much.

I don't even get to this spot. (Though at some point we need to do something about some of this code, as I see a fair number of globals, which makes reentrancy an issue. But that's for later)


The code, when I try with -v, gives:

 debug = 0x0
 Reading forms/order.imc
 using optimization '0' (0)
 Starting parse...
 warning:imcc:build_reglist: probably too small HASH_SIZE (29451 symbols)
 error:imcc:make_life_range: Out of mem calloc-ing 12 bytes

And then dies.

2) There is a note in the source code that the interference graph could be done without the N x N graph array. Any hints welcome (Angel Faus!).

That'd probably be a good thing... :)

3) We don't rename registers (except when it comes to spilling). So when a symbol starts its life somewhere at the beginning of the subroutine and is used down to the end, it interfers with each other symbol in that range. One register is blocked permanently. When there are around 30 such long-ranged symbols, spilling starts, which means that the variable is stored into and fetched from a spill array.
This could be avoided, if the symbol is a lexical or a global variable. On each fetch of this variable it could get a fresh register. This currently depends on the source code:


   foo = global "foo"     # bad
   $P1234 = global "foo"  # good - if each fetch increments the $P...

Right, and this is good info to leave in the archive. I've long-since converted away from named .locals to $Pxxx temps, which made quite a difference -- probably made another 15-20% of my source compilable. Most of the temps are very short-lived, though I put in a caching scheme to keep from fetching things too often. (All my library subs have to be fetched, and I tried to cut down on the number of redundant fetches there)


4) Proper register allocation depends on a correct life analysis of the symbol, which depends basically on 2 things:

4.1) the usage of the register in the opcode. Have a look at ops/*.ops:

    op add(in PMC, in PMC)

The destination (left hand side) of add is a "in" register, the PMC has to exist beforehand as well as the source operand of course.

    op find_lex(out PMC, in STR)

The destination PMC register is filled with the address of the lexical. It gets a new pointer there, the usage is written as "out". This means for the register allocator that the life range of this register starts here. Whatever was in that register is "dead".

But we have opcodes that silently have such an influence on the registers life range. These are mainly stack pop opcodes. Some of these opcodes are handled manually inside imcc, but not all. This leads to wrong or too long life ranges and to spilling stress.

There was a TODO item (posted on the list) to mark up all opcodes in the source to carry along their register usage. This is still badly required.

Hrm. Put a notation scheme in place and I'll see about getting that done.

4.2) Basic blocks, life span of symbols inside these blocks and loop detection.

A basic block is a stream of opcodes that runs in one step without a branch and it isn't a branch target - no other opcode can jump inside such a basic block. If a symbol is used as "in" inside of such a block, it had to exist beforehand. Its life range is propagated down to the "previous" block. This is straight forward for branches but gets nasty with loops and subroutines.

The code in cfg.c is cluttered by exceptions that try to follow the control flow of stack-based subroutines ("bsr") and CPS subroutines. "invoke". This code needs cleaning up and I'm willing to eliminate support for old code that uses "bsr" and doesn't preserve registers. The main problem with "invoke" is that it doesn't carry a specific branch destination.

But anyway basic block detechtion should be ok. *But* its totally untested, which is bad. Any hints for testing it, patches, test frame work is really welcome.

Figure out what you want and I'll come up with more test cases than you'd ever want.
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk

Reply via email to