Oh. As to Bartok and stack scanning, I suspect that a segmented stack doesn't matter much. Let's assume a hybrid-RC collector for a minute, or a generational collector with generations G0 (oldest) ... G_now.
At any given moment, I'm scanning generations Gn ... G_now. There exists some relatively old stack frame such that everything in that stack frame and above refers to objects in generations G0.. Gn-1. There is no need to scan those frames, because I'm not interested in marking those generations. Similarly, in a hybrid-RC scheme, there is some stack frame such that everything in that stack frame and above has non-deferred roots. In both cases, we can treat these frames as "black" frames (as in tri-color black). When a thread halts for local root marking, all of it's stack frames are either black (per above) or white. In an incremental approach, we: 1. Immediately scan the currently active frame and install a return barrier (because we can't release the mutator to execute before we do). This makes the currently active frame black. 2. Incrementally mark stack frames from the mutator as the mutator returns. At each stage, the return barrier examines the next frame up. If that frame is white, it installs a return barrier. If black, it does not. 3. Potentially attempt to scan frames from the collector as well, offloading that work from the mutator. This requires a distinct return barrier to ensure that no concurrency hazards arise. So you end up with a stack that is black at the top, white in the middle, and black at the bottom. The upper (older) black/white boundary is demarcated by a pointer, and the lower one by a return barrier. The collector's goal is to move the return barrier progressively up the stack before the mutator trips over it until the entire stack is black. Now if the stack is contiguous, you can check approximately how much more work you have to do by a magnitude comparison on the black/white boundary pointer. If the stack is segmented, you can't. In the segment case, the "find parent frame" code may be slightly more complicated, but probably not (either way you consult the return pointer). In either scheme, the test that you are "done" is a constant compare, so the fact that stack segments can have non-monotonic start addresses isn't really an issue. But there is *another* for liking a segmented stack, having to do with thread-local data (e.g. the bump pointer). There are basically two possible designs for thread-local data: 1. Dedicate a register for the base pointer. This is how it's done on Pentium. It's why amd64 still supports the %fs and %gs segment registers. 2. Store the base pointer at the least address of the stack, and use a ld (%sp & constant-mask) -> %tlocal In the second scheme, you *really* want the stack to be 2^k bytes that is naturally aligned, and you want all stacks to use the same value of k. The reason is that this lets you find the bottom (least address) portion of the stack segment with a mask, and retrieve the thread-local data pointer[s] from there. You want *k* to be constant because you really want to compile the mask value into the code as a constant (retrieving thread-local state is a frequent operation). But if you do that, then the only way you can grow the stack is by chunking. The fixed-k constraint is why most platforms reserve a register. Chunking without an inline stack frame construction probe requires that you leave a read-only page in the stack chunk as a boundary marker (so that the initial writes to create a frame will fault). The problem with that is that it requires a larger per-thread VA region for the stack. Not a big deal in a 64-bit VA region, but a big deal for 32-bits. Thankfully 32 bit machines are going away, huh? Just like those 8 bit processors. You never see *those* anymore... Though oddly, 16-bit processors really *did* go away. shap
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
