On Tue, Nov 12, 2013 at 11:03 PM, Jonathan S. Shapiro <[email protected]> wrote:
> On Sun, Nov 10, 2013 at 7:29 PM, Ben Kloosterman <[email protected]> wrote:
>
>> One ( big?) bonus is you can walk the stack back and provide accurate
>> debug in crashes . We dont need a frame pointer and can handle all sorts
>> of dynamic types.
>>
>
> Yes, though this comment drew my attention to another problem with
> stack-on-RC-immix: you need to bump the allocation count in the per-line
> metadata (or set the object start bit). If we use the "object starts here"
> bit idea instead of the count, that starts to be a fair number of
> instructions.
>
> I still consider the "object starts here bit" to be a hair-brained thought
> until somebody actually looks at the code sequence involved.
>

I'm looking over RC-Immix at the moment; playing around with no copying
looks interesting. I'll have a look at this when I hack the allocator. I'm
pretty sure you are right: they specifically mention the IBM mark-region
collector doing this, and say they don't need the overhead (they can find
it by a fast walk with a manageable worst case).

>
> Since I'm tossing out thoughts here:
>
> I was contemplating whether "object free" procedure calls need an on-stack
> object. The answer is "yes and no". We seem to have four options:
>
> 1. Stack is modeled as a dense linear sequence of bump-allocated objects.
> Per-line metadata is *not* maintained (no counts, no object start bits).
> The stack pointer is used to identify the end of the sequence.
>
> 2. Stack is modeled as a non-dense linear sequence of bump-allocated
> objects. Non-reference data may intervene between these objects. The cost
> of this is that per-line metadata must be maintained so that object start
> positions can be located.
>

I kind of favoured this at first, but if you have a 32-bit vtable these can
be put down in 1 word, which will not cost much more than setting a bit,
and a bit set will be an add for reference-using methods. It's important to
note we are talking about functions as objects here, as you say.

> 3.
> Variant on [2], except that we modify the "embedded frame object"
> layout to include a pointer to the preceding frame. Note that in contrast
> to the saved SP, this pointer is write-only; it is examined solely by the
> GC's stack walker. This option will be significantly faster than
> maintaining metadata on the side.
>

Tricky and elegant. Probably the strongest of these options. You're holding
the reference to the previous call-object; maybe a slight improvement is
this: you hold a pointer to the vtable word (which is the header) of the
last reference-holding call object. You keep bump allocating new frames,
but every time you hit a reference frame you write the offset into the high
32 bits of the vtable word (since the vtable is likely 32 bits). This way
reference-free call-objects have no header, but you know the location of
the next reference object. Obviously you have to handle end of block /
start of block (probably dummy calls that do nothing, with a header).

This is attractive because reference-free methods have no cost for the
stack map (apart from a conditional check on the method type). For a
reference-containing procedure you have: write the vtable word (cached
write), store last ref (reg-to-reg mov), a conditional check on the method
"type", and a cached store to the previous header. Still, it's pretty light
for a stack map.

The cost of this method is higher for 32-bit systems (it costs an
additional word and the register loss is higher; likely the register will
need to be reloaded from the allocator).

A bit-based stack map has no/low cost for no references but an increasing
cost as the number of references increases.

> 4. Maintain a conventional stack map.
>
>
> It's an interesting set of tradeoffs, and the performant answer may well
> turn out to be architecture dependent. The reads on this data are rare, so
> a higher-overhead mechanism (the stack map) that does not impose overhead
> on calls is probably the right thing.
> On the other hand, the stack is
> likely-hot, the additional writes performed for the embedded object frames
> are "blind" writes (i.e. have no subsequent data dependencies), and it
> supports reuse of mechanism. On a superscalar processor this could go
> either way, and it would need to be measured. On an in-order processor I'd
> bet that the stack map approach is better - especially if we have a decent
> self-scaling hash table.
>

One issue is that if you change the stack you may lose all the stack/GC
help the compiler gives you, but this is less of an issue for a custom JIT.

Ben
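
P.S. A rough sketch in C of the linked-header idea (option 3 with the
offset packed into the vtable word), just to make the layout concrete.
Everything named here is made up for illustration: SimStack,
push_plain_frame, push_ref_frame, walk_ref_frames, and the 64-bit header
layout with offsets counted in words. To keep the walk simple it stores a
backward offset in each new reference frame's own header, rather than
storing into the previous header, so it is a simplification and not the
exact sequence described above. Block boundaries and frame pops are not
handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header word: [ back offset : 32 | vtable id : 32 ].
   The back offset is in 64-bit words; 0 means "no older reference frame". */
#define STACK_WORDS 1024
#define NO_FRAME    SIZE_MAX

typedef struct {
    uint64_t words[STACK_WORDS];
    size_t   top;       /* bump pointer: index of the next free word */
    size_t   last_ref;  /* header index of the newest reference frame */
} SimStack;

static void stack_init(SimStack *s) {
    s->top = 0;
    s->last_ref = NO_FRAME;
}

/* Reference-free frame: plain bump allocation, no header is written. */
static size_t push_plain_frame(SimStack *s, size_t nwords) {
    assert(s->top + nwords <= STACK_WORDS);
    size_t at = s->top;
    s->top += nwords;
    return at;
}

/* Reference-holding frame: one header word followed by nwords of payload.
   The header links back to the previous reference frame's header. */
static size_t push_ref_frame(SimStack *s, uint32_t vtable_id, size_t nwords) {
    assert(s->top + 1 + nwords <= STACK_WORDS);
    size_t hdr = s->top;
    uint64_t back = (s->last_ref == NO_FRAME) ? 0
                                              : (uint64_t)(hdr - s->last_ref);
    s->words[hdr] = (back << 32) | vtable_id;
    s->last_ref = hdr;
    s->top = hdr + 1 + nwords;
    return hdr;
}

/* GC-side walk, newest reference frame first. Stores up to max vtable ids
   in ids[] and returns how many reference frames were visited. */
static size_t walk_ref_frames(const SimStack *s, uint32_t *ids, size_t max) {
    size_t n = 0;
    size_t hdr = s->last_ref;
    while (hdr != NO_FRAME && n < max) {
        uint64_t w = s->words[hdr];
        ids[n++] = (uint32_t)w;               /* low 32 bits: vtable id */
        uint32_t back = (uint32_t)(w >> 32);  /* high 32 bits: back offset */
        hdr = back ? hdr - back : NO_FRAME;
    }
    return n;
}
```

The point of the sketch is that push_plain_frame touches no metadata at
all, while push_ref_frame does one header store plus an update of the
last_ref value, and the walker never has to look at the reference-free
frames in between.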
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
