This is a summary of a private mail conversation between Leo and myself. <lie>No, it didn't start by me forgetting to fix Reply-To when trying to post follow-up on the list.</lie> ;)

Essentially we whipped up a GC scheme for collecting the register stacks that doesn't make call/cc-using code, well, unusably slow.

In addition to LT's original post on the register stack, here's how to allocate them and clean up after their use. LT, feel free to hit me with wet noodles if I forgot anything.

Terminology:
---

Register frame is an actual frame of 4x16 registers.

Register chunk is a flat chunk of memory containing multiple register frames. It has a water mark that points where a new frame should be pushed into the chunk. I'm using stack and chunk interchangably.

Frames are subject to DoD, and chunks are subject to GC. There are plenty of tricks that can prevent GC from happening in many cases (read on for details). DoD is necessary anyway (to retrieve the live PMCs pointed from the register frames).

A chunk is pinned if GC currently can't copy it over and kill it (read on for details).

Allocation:
---

Register stacks should be allocated in fairly big chunks. However, since there must be at least one active chunk per thread (and per coroutine), choosing anything actually huge will pose a scaling problem.

Frames are allocated from the current stack, simply by advancing the water mark of the currently active chunk. If this causes the water mark to overflow, a new chunk needs to be allocated.

Note that if a continuation has been captured and then invoked, the water mark will not necessarily point at the end of the current frame (since the invoked continuation may keep its registers deeper in the chunk)

Deallocation:
---

The stack frame can only be popped if the current continuation hasn't been captured (from the frame being popped). Here, pop means changing both frame pointer and the watermark. This ammounts to adding a flag to the current frame and bumping the flag if the return continuation gets moved into any directly accessible location. If the frame can't be popped, only the frame pointer should be changed to the caller's.

GC:
---

During DoD, the register frames should be marked (as parts of their chunks). Then the dead frames get dealt with in the following manner:

Remove the trailing dead frames from each chunk (by just lowering the water mark).

If after this the water mark remains high (e.g. past 50% of the chunk) but more than certain ammount of the chunk is salvagable as dead frames (50% seems like a good number again), the chunk should be copied, all the frame pointers fixed up, then the chunk gets killed. Essentially the chunks are treated as buffers. The watermark lowering won't help in cases when continuations get invoked in a round-robin manner (wish I could think of some simple Scheme example here that behaves this way), and start littering the chunk with interlaced dead frames.

Two caveats:

The frame pointer of the currently active frames (can be more than 1 due to threads) may be kept in a CPU register and can't be fixed-up. So the chunk containing currently active frame is pinned until it either overflows into another chunk or gets freed by popping.

Chunks can contain reverse pointers to the continuations that use its frames. When copying the frame, just go through these reverse pointers and fix the continuations they point to.

Performance:
---

This scheme requires some GC flags to each frame (as well as reverse pointers). Possibly also next pointers to the frames, if they are not cut to equal size.

Without continuations, this behaves like C function calling. Nothing will read return continuation and so the frames will simply get popped from the stack on return.

When continuations begin to prevent popping, the stack will start growing from the last captured continuation (even if its dead). Watermark will drop in GC if the GC happens to hit while the active frame is way down the stack (i.e. just between two function calls). Otherwise, GC will wait till the chunk overflows (so that the active frame is in a higher chunk) and then will copy the live frames to a newly allocated chunk, compacting several chunks together if possible. Copying can be skipped if the chunk is near-full of the live frames.


I think this about sums it up. Comments, corrections, wet noodles?

   Miro




Reply via email to