Oh. As to Bartok and stack scanning, I suspect that a segmented stack
doesn't matter much. Let's assume a hybrid-RC collector for a minute, or a
generational collector with generations G0 (oldest) ... G_now.

At any given moment, I'm scanning generations Gn ... G_now. There exists
some relatively old stack frame such that everything in that stack frame
and above refers to objects in generations G0.. Gn-1. There is no need to
scan those frames, because I'm not interested in marking those generations.
Similarly, in a hybrid-RC scheme, there is some stack frame such that
everything in that stack frame and above has non-deferred roots. In both
cases, we can treat these frames as "black" frames (as in tri-color black).

When a thread halts for local root marking, all of it's stack frames are
either black (per above) or white. In an incremental approach, we:

   1. Immediately scan the currently active frame and install a return
   barrier (because we can't release the mutator to execute before we do).
   This makes the currently active frame black.
   2. Incrementally mark stack frames from the mutator as the mutator
   returns. At each stage, the return barrier examines the next frame up. If
   that frame is white, it installs a return barrier. If black, it does not.
   3. Potentially attempt to scan frames from the collector as well,
   offloading that work from the mutator. This requires a distinct return
   barrier to ensure that no concurrency hazards arise.

So you end up with a stack that is black at the top, white in the middle,
and black at the bottom. The upper (older) black/white boundary is
demarcated by a pointer, and the lower one by a return barrier. The
collector's goal is to move the return barrier progressively up the stack
before the mutator trips over it until the entire stack is black.

Now if the stack is contiguous, you can check approximately how much more
work you have to do by a magnitude comparison on the black/white boundary
pointer. If the stack is segmented, you can't. In the segment case, the
"find parent frame" code may be slightly more complicated, but probably not
(either way you consult the return pointer). In either scheme, the test
that you are "done" is a constant compare, so the fact that stack segments
can have non-monotonic start addresses isn't really an issue.


But there is *another* for liking a segmented stack, having to do with
thread-local data (e.g. the bump pointer).

There are basically two possible designs for thread-local data:

   1. Dedicate a register for the base pointer. This is how it's done on
   Pentium. It's why amd64 still supports the %fs and %gs segment registers.
   2. Store the base pointer at the least address of the stack, and use a
   ld (%sp & constant-mask) -> %tlocal

In the second scheme, you *really* want the stack to be 2^k bytes that is
naturally aligned, and you want all stacks to use the same value of k. The
reason is that this lets you find the bottom (least address) portion of the
stack segment with a mask, and retrieve the thread-local data pointer[s]
from there. You want *k* to be constant because you really want to compile
the mask value into the code as a constant (retrieving thread-local state
is a frequent operation). But if you do that, then the only way you can
grow the stack is by chunking.

The fixed-k constraint is why most platforms reserve a register.


Chunking without an inline stack frame construction probe requires that you
leave a read-only page in the stack chunk as a boundary marker (so that the
initial writes to create a frame will fault). The problem with that is that
it requires a larger per-thread VA region for the stack. Not a big deal in
a 64-bit VA region, but a big deal for 32-bits.

Thankfully 32 bit machines are going away, huh? Just like those 8 bit
processors. You never see *those* anymore...

Though oddly, 16-bit processors really *did* go away.


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev

Reply via email to