On Sun, Nov 10, 2013 at 7:29 PM, Ben Kloosterman <[email protected]> wrote:

> One (big?) bonus is you can walk the stack back and provide accurate
> debug in crashes. We don't need a frame pointer and can handle all sorts
> of dynamic types.
>

Yes, though this comment drew my attention to another problem with
stack-on-RC-immix: you need to bump the allocation count in the per-line
metadata (or set the object start bit). If we use the "object starts here"
bit idea instead of the count, that starts to be a fair number of
instructions.
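
To make the instruction-count concern a bit more concrete, here is a rough
C sketch of the two per-allocation metadata updates. Everything in it is an
assumption for illustration: the line size, the 8-byte granule, and the
names line_meta, note_alloc_count and note_alloc_start_bit are all
hypothetical, and the real sequence would depend on the compiler and ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All sizes here are illustrative assumptions, not RC-immix constants. */
enum {
    LINE_SHIFT    = 8,                  /* assumed 256-byte lines */
    LINE_SIZE     = 1 << LINE_SHIFT,
    NUM_LINES     = 16,                 /* tiny region for the sketch */
    GRANULE_SHIFT = 3,                  /* assumed 8-byte allocation granule */
    BITMAP_BYTES  = (NUM_LINES * LINE_SIZE) >> (GRANULE_SHIFT + 3)
};

typedef struct {
    uint8_t count[NUM_LINES];           /* variant A: objects starting in each line */
    uint8_t start_bits[BITMAP_BYTES];   /* variant B: one bit per granule */
} line_meta;

/* Variant A: roughly one shift, then a load/add/store on a byte. */
static void note_alloc_count(line_meta *m, size_t offset) {
    assert(offset < (size_t)NUM_LINES * LINE_SIZE);
    m->count[offset >> LINE_SHIFT]++;
}

/* Variant B: compute the granule index, split it into byte index and bit
 * index, build the mask, then load/or/store. Noticeably more work. */
static void note_alloc_start_bit(line_meta *m, size_t offset) {
    assert(offset < (size_t)NUM_LINES * LINE_SIZE);
    size_t granule = offset >> GRANULE_SHIFT;
    m->start_bits[granule >> 3] |= (uint8_t)(1u << (granule & 7));
}

/* Query used by a (hypothetical) scanner to test for an object start. */
static int is_object_start(const line_meta *m, size_t offset) {
    size_t granule = offset >> GRANULE_SHIFT;
    return (m->start_bits[granule >> 3] >> (granule & 7)) & 1;
}
```

The count variant touches one byte per line; the start-bit variant needs
the extra index and mask arithmetic on every allocation, which is the cost
being worried about above.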

I still consider the "object starts here bit" to be a hare-brained thought
until somebody actually looks at the code sequence involved.

Since I'm tossing out thoughts here:

I was contemplating whether "object free" procedure calls need an on-stack
object. The answer is "yes and no". We seem to have four options:

1. Stack is modeled as a dense linear sequence of bump-allocated objects.
Per-line metadata is *not* maintained (no counts, no object start bits).
The stack pointer is used to identify the end of the sequence.

2. Stack is modeled as a non-dense linear sequence of bump-allocated
objects. Non-reference data may intervene between these objects. The cost
of this is that per-line metadata must be maintained so that object start
positions can be located.

3. Variant on [2], except that we modify the "embedded frame object" layout
to include a pointer to the preceding frame. Note that, in contrast to the
saved SP, this pointer is write-only as far as the running program is
concerned; it is examined solely by the GC's stack walker. This option will
be significantly faster than maintaining metadata on the side.

4. Maintain a conventional stack map.
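
As a minimal sketch of option 3, assuming a fixed-size frame object purely
for brevity: each frame object stores a pointer to the preceding frame
object on entry, the mutator never reads it back (the saved value returned
by frame_enter stands in for the saved SP), and only the walker follows the
chain. The names frame_obj, frame_enter, frame_leave, walk_frames and
count_ref are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_REFS = 4 };  /* illustrative bound; real frames vary in size */

/* Hypothetical embedded frame object: a header followed by reference slots. */
typedef struct frame_obj {
    struct frame_obj *prev;   /* written on call, read only by the walker */
    size_t nrefs;             /* number of reference slots in this frame */
    void *refs[MAX_REFS];
} frame_obj;

/* Call path: one extra blind store of the previous frame pointer.
 * Returns the old top, standing in for the saved SP. */
static frame_obj *frame_enter(frame_obj **top, frame_obj *f, size_t nrefs) {
    assert(nrefs <= MAX_REFS);
    frame_obj *saved = *top;
    f->prev = saved;
    f->nrefs = nrefs;
    for (size_t i = 0; i < nrefs; i++)
        f->refs[i] = NULL;
    *top = f;
    return saved;
}

/* Return path: restores from the saved value, never from f->prev. */
static void frame_leave(frame_obj **top, frame_obj *saved) {
    *top = saved;
}

typedef void (*ref_visitor)(void **slot, void *ctx);

/* GC side: follow prev pointers and visit every non-null reference slot.
 * Returns the number of frames walked. */
static size_t walk_frames(frame_obj *top, ref_visitor visit, void *ctx) {
    size_t frames = 0;
    for (frame_obj *f = top; f != NULL; f = f->prev, frames++)
        for (size_t i = 0; i < f->nrefs; i++)
            if (f->refs[i] != NULL)
                visit(&f->refs[i], ctx);
    return frames;
}

/* Example visitor: counts live reference slots. */
static void count_ref(void **slot, void *ctx) {
    (void)slot;
    ++*(size_t *)ctx;
}
```

No side metadata is consulted at any point; the walker finds object starts
purely by chasing the chain.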


It's an interesting set of tradeoffs, and the performant answer may well
turn out to be architecture dependent. The reads on this data are rare, so
a higher-overhead mechanism (the stack map) that does not impose overhead
on calls is probably the right thing. On the other hand, the stack is
likely to be hot, the additional writes performed for the embedded frame
objects are "blind" writes (i.e. they have no subsequent data
dependencies), and the embedded-frame approach supports reuse of
mechanism. On a superscalar processor this could go
either way, and it would need to be measured. On an in-order processor I'd
bet that the stack map approach is better - especially if we have a decent
self-scaling hash table.
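
For comparison, a rough sketch of what the stack map side might look like,
assuming the map is keyed by call-site return address and that a frame's
reference slots fit in a 32-bit mask. Both are assumptions made for
illustration, as are the names stack_map, stack_map_put and stack_map_get.
The table is open-addressed and doubles whenever it would exceed half
full, which is one simple way to get the "self-scaling" behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uintptr_t ret_addr;    /* key: return address; 0 marks an empty slot */
    uint32_t  frame_size;  /* frame size in words */
    uint32_t  ref_mask;    /* bit i set => slot i holds a reference */
} map_entry;

typedef struct {
    map_entry *slots;
    size_t cap;            /* zero or a power of two */
    size_t used;
} stack_map;

/* Linear probe to the slot holding key, or to the first empty slot. */
static map_entry *probe(map_entry *slots, size_t cap, uintptr_t key) {
    assert(cap != 0 && (cap & (cap - 1)) == 0);
    size_t i = (size_t)((key >> 2) * (uintptr_t)0x9E3779B1u) & (cap - 1);
    while (slots[i].ret_addr != 0 && slots[i].ret_addr != key)
        i = (i + 1) & (cap - 1);
    return &slots[i];
}

/* Double the table and reinsert every live entry. */
static int stack_map_grow(stack_map *m) {
    size_t ncap = m->cap ? m->cap * 2 : 8;
    map_entry *ns = calloc(ncap, sizeof *ns);
    if (ns == NULL)
        return -1;
    for (size_t i = 0; i < m->cap; i++)
        if (m->slots[i].ret_addr != 0)
            *probe(ns, ncap, m->slots[i].ret_addr) = m->slots[i];
    free(m->slots);
    m->slots = ns;
    m->cap = ncap;
    return 0;
}

/* Built ahead of time (e.g. at load time); nothing here runs on a call. */
static int stack_map_put(stack_map *m, uintptr_t ret_addr,
                         uint32_t frame_size, uint32_t ref_mask) {
    if (ret_addr == 0)
        return -1;
    if ((m->used + 1) * 2 > m->cap && stack_map_grow(m) != 0)
        return -1;
    map_entry *e = probe(m->slots, m->cap, ret_addr);
    if (e->ret_addr == 0)
        m->used++;
    e->ret_addr = ret_addr;
    e->frame_size = frame_size;
    e->ref_mask = ref_mask;
    return 0;
}

/* The rare read path: one lookup per frame during a stack walk. */
static const map_entry *stack_map_get(const stack_map *m, uintptr_t ret_addr) {
    if (m->cap == 0 || ret_addr == 0)
        return NULL;
    const map_entry *e = probe(m->slots, m->cap, ret_addr);
    return e->ret_addr == ret_addr ? e : NULL;
}
```

All of the cost sits in stack_map_get, which only the stack walker pays.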


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
