On Sun, Nov 10, 2013 at 7:29 PM, Ben Kloosterman <[email protected]> wrote:
> One (big?) bonus is you can walk the stack back and provide accurate
> debug in crashes. We don't need a frame pointer and can handle all sorts
> of dynamic types.
>

Yes, though this comment drew my attention to another problem with stack-on-RC-immix: you need to bump the allocation count in the per-line metadata (or set the object start bit). If we use the "object starts here" bit idea instead of the count, that starts to be a fair number of instructions. I still consider the "object starts here" bit to be a hare-brained thought until somebody actually looks at the code sequence involved.

Since I'm tossing out thoughts here: I was contemplating whether "object free" procedure calls need an on-stack object. The answer is "yes and no". We seem to have four options:

1. Stack is modeled as a dense linear sequence of bump-allocated objects. Per-line metadata is *not* maintained (no counts, no object start bits). The stack pointer is used to identify the end of the sequence.

2. Stack is modeled as a non-dense linear sequence of bump-allocated objects. Non-reference data may intervene between these objects. The cost of this is that per-line metadata must be maintained so that object start positions can be located.

3. Variant on [2], except that we modify the "embedded frame object" layout to include a pointer to the preceding frame. Note that in contrast to the saved SP, this pointer is write-only as far as the running program is concerned; it is examined solely by the GC's stack walker. This option will be significantly faster than maintaining metadata on the side.

4. Maintain a conventional stack map.

It's an interesting set of tradeoffs, and the performant answer may well turn out to be architecture dependent. The reads on this data are rare, so a mechanism with higher lookup overhead (the stack map) that does not impose overhead on calls is probably the right thing. On the other hand, the stack is likely hot, the additional writes performed for the embedded object frames are "blind" writes (i.e. have no subsequent data dependencies), and that approach supports reuse of mechanism.

On a superscalar processor this could go either way, and it would need to be measured. On an in-order processor I'd bet that the stack map approach is better - especially if we have a decent self-scaling hash table.

shap
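
To make option 3 concrete, here is a minimal C sketch of an embedded frame object that carries a pointer to the preceding frame, plus a walker that follows the chain. Everything in it is an assumption made for illustration: the names (FrameObj, frame_enter, frame_leave, gc_walk_stack, top_frame), the fixed number of reference slots, and the use of a global to find the newest frame (a real runtime would more likely locate it from the SP). It is not the BitC or RC-immix implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_REFS = 4 };  /* hypothetical fixed slot count, for brevity */

/* Hypothetical embedded frame object for option 3. */
typedef struct FrameObj {
    struct FrameObj *prev;   /* preceding frame object; only the GC reads it */
    size_t nrefs;            /* number of reference slots in use */
    void *refs[MAX_REFS];    /* the frame's reference slots */
} FrameObj;

/* Newest frame object on this (single) stack; stands in for "find it via SP". */
static FrameObj *top_frame = NULL;

/* Call prologue: link the new frame object in front of the chain.
 * The store to f->prev is a blind write; the mutator never loads it.
 * Returns the previous top so the epilogue can restore it without
 * touching f->prev. */
static FrameObj *frame_enter(FrameObj *f, size_t nrefs) {
    assert(nrefs <= MAX_REFS);
    FrameObj *saved = top_frame;
    f->prev = saved;
    f->nrefs = nrefs;
    for (size_t i = 0; i < nrefs; i++)
        f->refs[i] = NULL;
    top_frame = f;
    return saved;
}

/* Call epilogue: unlink using the value saved by the prologue. */
static void frame_leave(FrameObj *saved) {
    top_frame = saved;
}

typedef void (*RootVisitor)(void **slot, void *ctx);

/* GC stack walker: follows prev pointers from newest to oldest frame and
 * reports every non-null reference slot. Returns the number of frames. */
static size_t gc_walk_stack(RootVisitor visit, void *ctx) {
    size_t frames = 0;
    for (FrameObj *f = top_frame; f != NULL; f = f->prev) {
        for (size_t i = 0; i < f->nrefs; i++)
            if (f->refs[i] != NULL)
                visit(&f->refs[i], ctx);
        frames++;
    }
    return frames;
}

/* Visitor that just counts the roots it is shown. */
static void count_root(void **slot, void *ctx) {
    (void)slot;
    ++*(size_t *)ctx;
}

/* Demo: two nested "calls", each holding one live reference.
 * Returns the number of roots the walker found. */
static size_t demo_nested_calls(void) {
    int a = 1, b = 2;
    FrameObj outer, inner;

    FrameObj *saved_outer = frame_enter(&outer, 2);
    outer.refs[0] = &a;              /* second slot stays NULL */
    FrameObj *saved_inner = frame_enter(&inner, 1);
    inner.refs[0] = &b;

    size_t roots = 0;
    size_t frames = gc_walk_stack(count_root, &roots);
    assert(frames == 2);

    frame_leave(saved_inner);
    frame_leave(saved_outer);
    return roots;
}
```

Calling demo_nested_calls() from a main function exercises the chain. The per-call cost in this sketch is the handful of stores in frame_enter; no per-line metadata is updated, and the walker needs neither counts nor object start bits to find each frame.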
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
