On Tue, Nov 12, 2013 at 11:03 PM, Jonathan S. Shapiro <[email protected]> wrote:

> On Sun, Nov 10, 2013 at 7:29 PM, Ben Kloosterman <[email protected]> wrote:
>
>> One (big?) bonus is that you can walk the stack back and provide accurate
>> debug information in crashes. We don't need a frame pointer and can handle
>> all sorts of dynamic types.
>>
>
> Yes, though this comment drew my attention to another problem with
> stack-on-RC-immix: you need to bump the allocation count in the per-line
> metadata (or set the object start bit). If we use the "object starts here"
> bit idea instead of the count, that starts to be a fair number of
> instructions.
>
> I still consider the "object starts here bit" to be a hare-brained thought
> until somebody actually looks at the code sequence involved.
>

I'm looking over RC-Immix at the moment; playing around with no copying looks
interesting. I will have a look at this when I hack on the allocator.

Pretty sure you are right: they specifically mention the IBM mark-region
collector doing this, and say they don't need the overhead (they can find
it by a fast walk with a manageable worst case).
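
To make the two per-line metadata variants from the quoted text concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the 256-byte line, 32 KB block, 8-byte granule, and the names `block_meta`, `note_alloc_count`, `note_alloc_start_bit` and `next_object_start` are hypothetical and not taken from RC-Immix or any BitC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes, chosen only for illustration. */
enum {
    LINE_BYTES  = 256,
    GRANULE     = 8,
    BLOCK_BYTES = 32768,
    LINES       = BLOCK_BYTES / LINE_BYTES,
    GRANULES    = BLOCK_BYTES / GRANULE
};

typedef struct {
    uint8_t  line_count[LINES];          /* variant A: objects starting in each line */
    uint64_t start_bits[GRANULES / 64];  /* variant B: one "object starts here" bit per granule */
} block_meta;

/* Variant A: bump the allocation count of the line holding `offset`. */
static inline void note_alloc_count(block_meta *m, size_t offset) {
    m->line_count[offset / LINE_BYTES]++;
}

/* Variant B: set the start bit for the granule holding `offset`.
 * Note the extra shift/mask work compared to variant A. */
static inline void note_alloc_start_bit(block_meta *m, size_t offset) {
    size_t g = offset / GRANULE;
    m->start_bits[g / 64] |= (uint64_t)1 << (g % 64);
}

/* Variant B lookup: first object start at or after `offset`,
 * or BLOCK_BYTES if there is none. */
static size_t next_object_start(const block_meta *m, size_t offset) {
    for (size_t g = (offset + GRANULE - 1) / GRANULE; g < GRANULES; g++) {
        if ((m->start_bits[g / 64] >> (g % 64)) & 1)
            return g * GRANULE;
    }
    return BLOCK_BYTES;
}
```

The count variant is a single byte increment per allocation, while the start-bit variant needs a divide, a shift and an OR, which is the instruction-count concern raised above. Whether either is needed at all depends on whether a walk can find the starts cheaply enough.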



>
> Since I'm tossing out thoughts here:
>
> I was contemplating whether "object free" procedure calls need an on-stack
> object. The answer is "yes and no". We seem to have four options:
>
> 1. Stack is modeled as a dense linear sequence of bump-allocated objects.
> Per-line metadata is *not* maintained (no counts, no object start bits).
> The stack pointer is used to identify the end of the sequence.
>
> 2. Stack is modeled as a non-dense linear sequence of bump-allocated
> objects. Non-reference data may intervene between these objects. The cost
> of this is that per-line metadata must be maintained so that object start
> positions can be located.
>
I kind of favoured this at first, but if you have a 32-bit vtable these can
be put down in one word, which will not cost much more than setting a bit,
and setting a bit will be an add for reference-using methods.
It's important to note that we are talking about functions as objects here,
as you say.


> 3. Variant on [2], except that we modify the "embedded frame object"
> layout to include a pointer to the preceding frame. Note that in contrast
> to the saved SP, this pointer is write-only; it is examined solely by the
> GC's stack walker. This option will be significantly faster than
> maintaining metadata on the side.
>

Tricky and elegant. Probably the strongest of these options. You're
holding the reference to the previous call-object; maybe a slight
improvement is this: you hold a pointer to the vtable word (which is the
header) of the last reference-holding call object. You keep bump
allocating new frames, but every time you hit a reference frame you write
the offset into the high 32 bits of the vtable word (since the vtable is
likely 32 bits). This way reference-free call-objects have no header, but
you know the location of the next reference object.

Obviously you have to handle end of block / start of block (probably dummy
calls that do nothing, with a header). This is attractive because
reference-free methods have no cost for the stack map (apart from a
conditional check on the method type). And for a reference-containing
procedure you have: write the vtable word (cached write), store last ref
(reg-to-reg mov), a conditional check on the method "type", and a cached
store to the previous header. Still, it's pretty light for a stack map.
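
A minimal sketch of the packed header idea, in C, under stated assumptions. The stack is simulated as an array of 64-bit words that grows upward, the vtable is a 32-bit id, and the names `sim_stack`, `push_frame` and `walk_ref_frames` are hypothetical. It also picks one reading of the scheme: the distance back to the previous reference-holding frame is stored in the new frame's own header, whereas a store into the previous header, as mentioned above, would link forward instead. Frame pop and block boundaries are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_WORDS = 1024 };            /* assumed size of the simulated stack */
#define NO_REF_FRAME ((size_t)-1)       /* "no reference frame yet" marker */

typedef struct {
    uint64_t words[STACK_WORDS];  /* simulated stack, bump-allocated upward */
    size_t   sp;                  /* index of the next free word */
    size_t   last_ref;            /* index of the newest header, or NO_REF_FRAME */
} sim_stack;

static void stack_init(sim_stack *s) {
    s->sp = 0;
    s->last_ref = NO_REF_FRAME;
}

/* Push a frame of n_words payload words. Only reference-holding frames
 * get a header: low 32 bits = vtable id, high 32 bits = distance in words
 * back to the previous header (0 means end of chain). */
static size_t push_frame(sim_stack *s, uint32_t vtable, size_t n_words, int has_refs) {
    size_t need = n_words + (has_refs ? 1 : 0);
    assert(s->sp + need <= STACK_WORDS);
    size_t base = s->sp;
    if (has_refs) {
        uint32_t back = (s->last_ref == NO_REF_FRAME)
                            ? 0
                            : (uint32_t)(base - s->last_ref);
        s->words[base] = ((uint64_t)back << 32) | vtable;
        s->last_ref = base;
    }
    s->sp = base + need;
    return base;
}

/* GC-side walk: visit headers newest first, skipping every
 * reference-free frame without touching it. Returns the count visited. */
static size_t walk_ref_frames(const sim_stack *s, uint32_t *vtables_out, size_t max) {
    size_t n = 0;
    size_t at = s->last_ref;
    while (at != NO_REF_FRAME && n < max) {
        uint64_t header = s->words[at];
        vtables_out[n++] = (uint32_t)header;        /* low half: vtable id */
        uint32_t back = (uint32_t)(header >> 32);   /* high half: link */
        at = back ? at - back : NO_REF_FRAME;
    }
    return n;
}
```

In this reading a reference-free call pays only the `has_refs` check, and a reference-holding call pays one header store plus the `last_ref` update, which roughly matches the cost list above.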

The cost of this method is higher for 32-bit systems (it costs an
additional word and the register loss is higher; likely the register will
need to be reloaded from the allocator).

A bit-based stack map has no/low cost for no references, but an increasing
cost as the number of references increases.
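
One way to picture that cost curve is a per-frame bitmap, sketched below in C. This is only an assumed layout for illustration: a single 64-bit mask per frame, bits maintained dynamically on each reference store, and the names `bitmap_frame`, `store_ref` and `visit_refs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: at most 64 slots per frame, one mask bit per slot. */
typedef struct {
    uint64_t ref_bits;    /* bit i set => slots[i] holds a reference */
    void    *slots[64];
} bitmap_frame;

/* Storing a reference costs one extra OR on top of the store itself,
 * so the overhead grows with the number of references in the frame. */
static void store_ref(bitmap_frame *f, unsigned i, void *p) {
    assert(i < 64);
    f->slots[i] = p;
    f->ref_bits |= (uint64_t)1 << i;
}

/* GC-side scan: one iteration per set bit, lowest slot first.
 * A frame with no references exits immediately. */
static size_t visit_refs(const bitmap_frame *f, void **out, size_t max) {
    size_t n = 0;
    uint64_t bits = f->ref_bits;
    while (bits && n < max) {
        unsigned i = 0;
        while (!((bits >> i) & 1))   /* portable find-lowest-set-bit */
            i++;
        out[n++] = f->slots[i];
        bits &= bits - 1;            /* clear the lowest set bit */
    }
    return n;
}
```

A frame with no references never touches `ref_bits` beyond its initial zero, while each additional reference adds one OR on the call path and one loop iteration in the scan.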




> 4. Maintain a conventional stack map.
>
>
> It's an interesting set of tradeoffs, and the performant answer may well
> turn out to be architecture dependent. The reads on this data are rare, so
> a higher-overhead mechanism (the stack map) that does not impose overhead
> on calls is probably the right thing. On the other hand, the stack is
> likely-hot, the additional writes performed for the embedded object frames
> are "blind" writes (i.e. have no subsequent data dependencies), and it
> supports reuse of mechanism. On a superscalar processor this could go
> either way, and it would need to be measured. On an in-order processor I'd
> bet that the stack map approach is better - especially if we have a decent
> self-scaling hash table.
>

One issue is that if you change the stack you may lose all the stack/GC help
the compiler gives you, but this is less of an issue for a custom JIT.

Ben
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
