Re: [bitc-dev] Non copying Rc-Immix was Putting the stack on an RC-Immix heap

Ben Kloosterman Mon, 18 Nov 2013 19:33:43 -0800

On Mon, Nov 18, 2013 at 11:41 PM, Jonathan S. Shapiro <[email protected]> wrote:


> On Mon, Nov 18, 2013 at 5:57 AM, Ben Kloosterman <[email protected]> wrote:
>
>> On Mon, Nov 18, 2013 at 12:37 AM, Jonathan S. Shapiro
>> <[email protected]> wrote:
>>
>>> I just realized that the immix papers don't mention one of the big
>>> advantages to bump allocations: when you are allocating multiple objects in
>>> sequence, the limit checks can be consolidated into a single check. It
>>> seems to me that this is not true for immix, because the occupancy count on
>>> each line makes the consolidation difficult. If things can't be
>>> consolidated, that's a pretty significant fast-path overhead. Does anybody
>>> know if they are consolidating successfully?
>>>
>>
>> I think there is a runtime constraint where the runtime presents one call
>> to alloc at a time.
>>
>
> Right. But if you know you are going to allocate an 8 word object followed
> by a 12 word object, you can just do a single 24 word (allowing for
> headers) allocation. Given the inline sequence you sent out, a conventional
> compiler would do this optimization more or less automatically, except that
> the procedure calls to the long path are an impediment.
>

In Jikes this is external C code that is not likely to be inlined, but that
shouldn't stop another runtime from doing it.

>
>
>> I see no reason why, if the runtime/compiler supports it, you can't do:
>>
>> var ptr = tryallocatecontiguous(size_of_multiple_objects);
>> if (ptr == NULL)
>>     // allocate each object individually
>> // construct the objects
>>
>> Though it does add to the inline path.
>>
>
> You wouldn't do it conditionally. You'd just do it. You'd probably end up
> allocating more chunks out of the "free" block, but that's not necessarily
> bad.
>

Yes, I missed that. That should work provided there is not too much
consolidation; otherwise you get big holes at the end of each block.
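A minimal C sketch of the consolidated allocation: an 8-word object followed by a 12-word object are carved out of one 24-word bump allocation, so the fast path performs a single limit check. It assumes two-word headers (so that 8 + 12 plus two 2-word headers gives the 24 words quoted above), a word-sized `uintptr_t`, and a bump allocator over one free run of lines. `BumpAllocator`, `alloc_slow`, and `alloc_pair` are hypothetical names, and the slow path here simply fails instead of moving to the next hole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over one free run of lines in an Immix block. */
typedef struct {
    uintptr_t cursor; /* next free byte */
    uintptr_t limit;  /* end of the current free run */
} BumpAllocator;

enum { WORD = sizeof(void *), HEADER_WORDS = 2 }; /* two-word headers: an assumption */

/* Hypothetical out-of-line slow path. A real runtime would move on to the
 * next hole or a fresh block; this sketch just reports failure. */
static void *alloc_slow(BumpAllocator *a, size_t bytes) {
    (void)a;
    (void)bytes;
    return NULL;
}

/* Fast path: one limit check, however many bytes are requested. */
static void *alloc_bytes(BumpAllocator *a, size_t bytes) {
    assert(a->cursor <= a->limit);
    uintptr_t start = a->cursor;
    if (bytes > a->limit - start)
        return alloc_slow(a, bytes);
    a->cursor = start + bytes;
    return (void *)start;
}

/* An 8-word object followed by a 12-word object, with a single check:
 * (2 + 8) + (2 + 12) = 24 words in total. */
static int alloc_pair(BumpAllocator *a, void **first, void **second) {
    const size_t first_bytes = (HEADER_WORDS + 8) * WORD;
    const size_t second_bytes = (HEADER_WORDS + 12) * WORD;
    char *base = alloc_bytes(a, first_bytes + second_bytes);
    if (base == NULL)
        return 0;
    *first = base;
    *second = base + first_bytes;
    return 1;
}
```

Because the combined request either fits as a whole or takes the slow path as a whole, a large consolidated request can skip a hole that the first object alone would have fit into; that is where the wasted space at the end of each block comes from.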


>
>> I think (I checked, but I'm not 100% sure) that the occupancy count is
>> done during the collect phase (when processing new roots).
>>
>
> (slaps head) yes, of course. And that would make all the difference. If
> the object counts are handled during nursery collection, dynamically sized
> blocks aren't a problem at all.
>

Except for a minor cost in managing them.
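To show why counting during collection keeps per-line work off the allocation fast path, here is a C sketch in which line occupancy counts are only updated when a surviving new object is processed. It assumes 32 KB blocks with 256-byte lines (the sizes from the Immix paper), one saturating byte per line, and that an object bumps the count of every line it overlaps. `BlockLines` and `count_object` are hypothetical names, not Jikes RVM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_BYTES_IN_LINE 8 /* 256-byte lines (assumed) */
#define LINES_PER_BLOCK 128 /* 32 KB block / 256-byte lines */

typedef struct {
    uintptr_t start;                     /* base address of the block */
    uint8_t live_count[LINES_PER_BLOCK]; /* objects overlapping each line */
} BlockLines;

/* Called for each surviving new object during collection, never on allocation. */
static void count_object(BlockLines *b, uintptr_t obj, size_t bytes) {
    assert(bytes > 0 && obj >= b->start);
    size_t first = (obj - b->start) >> LOG_BYTES_IN_LINE;
    size_t last = (obj + bytes - 1 - b->start) >> LOG_BYTES_IN_LINE;
    assert(last < LINES_PER_BLOCK);
    for (size_t line = first; line <= last; line++)
        if (b->live_count[line] < UINT8_MAX) /* saturate instead of wrapping */
            b->live_count[line]++;
}
```

Nothing on the allocation path reads or writes `live_count`, so consolidating several allocations into one bump is unaffected by line granularity.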

>
>> Note that the metadata for lines is located at the start of the Chunk, not
>> the block (I think all metadata except the object and its GC header is in
>> the chunk, and hence a block is just unmarked objects).
>>
>
> That's strange, if only because it contradicts the paper.
>

I'm pretty sure about this; I saw the line-count code look up the Chunk
table and store the count at an offset there. Note the history here:

Immix paper
Immix continued development as the production GC for Jikes
Gen Immix
RC Immix paper
Patch for Jikes.

It's kind of nice to have clean blocks and lines. Also, all the non-object-
header metadata is close together, which should be better for locality.
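Keeping line metadata at the start of the chunk rather than in each block comes down to address arithmetic. A minimal C sketch, assuming 256-byte lines and 32 KB blocks (Immix paper sizes), 4 MB chunks aligned to their size, and one metadata byte per line stored from the first byte of the chunk; the constants and function names are illustrative, not taken from the Jikes patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry. */
#define LOG_BYTES_IN_LINE  8  /* 256-byte lines */
#define LOG_BYTES_IN_BLOCK 15 /* 32 KB blocks */
#define LOG_BYTES_IN_CHUNK 22 /* 4 MB chunks, aligned to their size */

static_assert(LOG_BYTES_IN_LINE < LOG_BYTES_IN_BLOCK &&
              LOG_BYTES_IN_BLOCK < LOG_BYTES_IN_CHUNK,
              "lines nest inside blocks, blocks inside chunks");

#define CHUNK_MASK (((uintptr_t)1 << LOG_BYTES_IN_CHUNK) - 1)

/* Base of the chunk containing addr: clear the low bits. */
static uintptr_t chunk_start(uintptr_t addr) {
    return addr & ~CHUNK_MASK;
}

/* Which block of its chunk addr falls in. */
static uintptr_t block_index(uintptr_t addr) {
    return (addr & CHUNK_MASK) >> LOG_BYTES_IN_BLOCK;
}

/* Which line of its chunk addr falls in; also the index into the line table. */
static uintptr_t line_index(uintptr_t addr) {
    return (addr & CHUNK_MASK) >> LOG_BYTES_IN_LINE;
}

/* Address of the metadata byte for addr's line. The table holds
 * 2^(22-8) = 16384 bytes, so under this layout the first block of each
 * chunk would be set aside for it and never handed out for objects. */
static uint8_t *line_meta(uintptr_t addr) {
    return (uint8_t *)(chunk_start(addr) + line_index(addr));
}
```

All per-line state for a chunk then sits in one contiguous region, and the blocks themselves contain nothing but objects.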



>
>
>>  Regarding the stack, there is another issue that I hadn't considered:
>>> if we use an immix block of some form for the stack, we can't rely on the
>>> collector to zero it in the background. We would have needed to zero
>>> reference slots anyway, so it's not like this is a new cost, but the
>>> stack-of-objects approach may change how eagerly we need to do that.
>>> Actually, I don't think it does, but it needs looking at.
>>>
>>
>> I take it we can't rely on the collector to do this because collector
>> threads have an Immix block for a stack.
>>
>
> No no. We'll get an initially zero block just fine. The problem is that
> the stack pointer is going up and down within the block. As procedures
> return, they leave dirty lines "below" the stack. This happens *way* too
> rapidly for background zeroing to work.
>

I was thinking of just the initial clear when the block is received, since
at the moment blocks are received dirty.
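A C sketch of the split being discussed: a stack block is cleared once when it is taken from the pool (since blocks currently arrive dirty), and each frame's reference slots are then zeroed eagerly on push, because returning frames leave stale words behind the stack pointer far faster than background zeroing could clean them. It assumes a block of 4096 words (32 KB with 8-byte words), that each frame keeps its reference slots first, and that the collector learns which slots are references from per-method metadata, so non-reference slots may stay dirty. `StackBlock`, `push_frame`, and `pop_frame` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 4096 /* assumed: one 32 KB block of 8-byte words */

typedef struct {
    uintptr_t words[BLOCK_WORDS];
    size_t sp; /* index of the next free word; the stack grows upward */
} StackBlock;

/* Blocks come back from the pool dirty, so clear once on receipt. */
static void stack_block_init(StackBlock *b) {
    memset(b->words, 0, sizeof b->words);
    b->sp = 0;
}

/* Push a frame whose first nrefs words are reference slots, zeroing only
 * those; an earlier frame may have left stale pointers in this memory. */
static uintptr_t *push_frame(StackBlock *b, size_t nwords, size_t nrefs) {
    assert(nrefs <= nwords);
    if (nwords > BLOCK_WORDS - b->sp)
        return NULL; /* would need a new stack block */
    uintptr_t *frame = &b->words[b->sp];
    memset(frame, 0, nrefs * sizeof *frame);
    b->sp += nwords;
    return frame;
}

/* Popping only moves the stack pointer; the frame's words stay dirty. */
static void pop_frame(StackBlock *b, size_t nwords) {
    assert(nwords <= b->sp);
    b->sp -= nwords;
}
```

The cost of the eager zeroing is proportional to the number of reference slots per frame, which matches the point that reference slots needed zeroing anyway.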

>
> Maybe I haven't been clear enough. I'm not imagining that we allocate a
> frame as a nursery object. I'm imagining that we manage a *conventional* stack
> in an immix block, using the bump limit pointer to guard the end of the
> block so we know when to create a new stack block/segment. All I was doing
> was laying out the frames in a way that made the stack walkable by the
> conventional object mark/scan logic.
>

I understand, though it's a bit unconventional. I keep thinking I can look
up the type now, since I have a vtable to the method... :-) Also, the
bump-pointer check is not conventional; you can go either way: use the bump
pointer to check for the end of the stack, or use a guard page. (There's no
reason you can't have a guard page at the end of an Immix block, though it
does need the block to be made from 4K pages.)
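The bump-limit variant of the end-of-stack check could be sketched in C as below: each stack segment is one block, pushing a frame compares against the block's limit instead of relying on a guard page, and a frame that does not fit starts a new segment linked to the old one. This is a sketch under assumptions: `calloc` stands in for taking a block from the pool and clearing it, a segment holds 4096 words, frames never exceed one segment, and `Segment`, `push_frame`, and `pop_frame` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SEGMENT_WORDS 4096 /* assumed: one 32 KB block of 8-byte words */

typedef struct Segment {
    struct Segment *prev; /* older segment, for returns and stack walking */
    uintptr_t *sp;        /* next free word; the stack grows upward */
    uintptr_t *limit;     /* one past the last usable word */
    uintptr_t words[SEGMENT_WORDS];
} Segment;

/* Stand-in for "get a block from the pool, clear it, set it up as a stack". */
static Segment *segment_new(Segment *prev) {
    Segment *s = calloc(1, sizeof *s);
    if (s == NULL)
        return NULL;
    s->prev = prev;
    s->sp = s->words;
    s->limit = s->words + SEGMENT_WORDS;
    return s;
}

/* Push a frame of nwords. The comparison against limit plays the role a
 * guard page plays on a conventional stack. */
static uintptr_t *push_frame(Segment **top, size_t nwords) {
    Segment *s = *top;
    if (nwords > (size_t)(s->limit - s->sp)) {
        if (nwords > SEGMENT_WORDS)
            return NULL; /* oversized frames are out of scope here */
        s = segment_new(s);
        if (s == NULL)
            return NULL;
        *top = s;
    }
    uintptr_t *frame = s->sp;
    s->sp += nwords;
    return frame;
}

/* Pop the most recent frame; release the segment once it is empty. */
static void pop_frame(Segment **top, uintptr_t *frame) {
    Segment *s = *top;
    assert(frame >= s->words && frame < s->sp);
    s->sp = frame;
    if (s->sp == s->words && s->prev != NULL) {
        *top = s->prev;
        free(s);
    }
}
```

One general drawback of linked stack segments is that a call/return loop straddling a segment boundary allocates and frees a segment on every iteration; caching the most recently released segment is a common way to avoid that.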

I'm a bit concerned about the bump pointer. The paper I posted showed that
the stack-check cost was massive, and here we have a bump-pointer check
doing effectively the same thing (even though it is a super-cheap check).



>
>> One huge issue is block availability...
>>
>
> Right. Except now that I've said more clearly that I'm not allocating
> stack frames as heap objects, hopefully it's clearer why I don't think
> that's an issue.
>

It's a minor issue, as you need some code to get a block, clear it, and then
set it up as a stack. You're probably thinking of a separate pool that
fetches its own pages, but I don't think that's needed; just reserve blocks
in the existing pool.
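Reserving stack blocks inside the existing pool, rather than running a separate page-fetching pool, might look like the following C sketch. The reserve size, the intrusive free list, and all names are assumptions for illustration. Heap allocation is simply refused once only the reserve is left, which in a real collector would be the point to trigger a collection.

```c
#include <assert.h>
#include <stddef.h>

/* A free block; the free-list link lives in the block's own first word. */
typedef struct Block {
    struct Block *next;
} Block;

typedef struct {
    Block *free_list;
    size_t free_count;
    size_t stack_reserve; /* free blocks that only stack creation may take */
} BlockPool;

static void pool_release(BlockPool *p, Block *b) {
    b->next = p->free_list;
    p->free_list = b;
    p->free_count++;
}

static Block *pool_take(BlockPool *p) {
    Block *b = p->free_list;
    assert(b != NULL);
    p->free_list = b->next;
    p->free_count--;
    return b;
}

/* Heap allocation must leave the stack reserve untouched. */
static Block *pool_get_heap_block(BlockPool *p) {
    if (p->free_count <= p->stack_reserve)
        return NULL; /* caller would collect or grow the heap */
    return pool_take(p);
}

/* Stack creation may dip into the reserve. */
static Block *pool_get_stack_block(BlockPool *p) {
    if (p->free_count == 0)
        return NULL;
    return pool_take(p);
}
```

Stack blocks still need the clear-and-set-up step described above; the pool itself only needs to know the size of the reserve.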


>
>
>> Writing a good self-scaling hash table is not easy.
>>
>
> True. But in this case, doubling the hash table size each time is a good
> and simple approach.
>
>
>> I kind of like the idea of writing a reference to static metadata
>> regardless of the method, since:
>> - You want the writing to be as fast as possible, and the parsing is less
>> important.
>> - It takes constant time regardless of the number of variables.
>>
>
> Sure, but the stack map is effectively the same. It's using the same
> actual map data, but it's statically precomputed and doesn't require any
> writes at run time at all.
>

Yeah, I figured you can put the map data at a constant-word offset from the
address (say, as a header to the method), but since it's not the mutator
reading it, you may as well use a hash and not pollute the cache.
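The hash-based alternative, where the collector maps a frame's return address to its precomputed stack map instead of reading a header placed next to the method's code, could be sketched in C as an open-addressing table that doubles its size whenever it would become more than half full. The `StackMap` layout (a frame size plus a bitmask of reference slots), the multiplicative hash, the 1/2 load factor, and all names are assumptions. Only the collector touches this table, so it stays out of the mutator's cache footprint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stack map: bit i of ref_mask set => frame word i is a reference. */
typedef struct {
    uint32_t frame_words;
    uint32_t ref_mask;
} StackMap;

typedef struct {
    uintptr_t key; /* return address; 0 marks an empty slot */
    const StackMap *map;
} Entry;

typedef struct {
    Entry *slots;
    size_t cap; /* always a power of two */
    size_t count;
} MapTable;

/* Multiplicative hash, with the high half folded into the low bits. */
static size_t hash_addr(uintptr_t a) {
    uint64_t h = (uint64_t)a * 0x9E3779B97F4A7C15ull;
    return (size_t)(h ^ (h >> 32));
}

/* Insert or update without growing (linear probing); 1 if the key is new. */
static int put_slot(Entry *slots, size_t cap, uintptr_t key, const StackMap *map) {
    size_t i = hash_addr(key) & (cap - 1);
    while (slots[i].key != 0 && slots[i].key != key)
        i = (i + 1) & (cap - 1);
    int added = slots[i].key == 0;
    slots[i].key = key;
    slots[i].map = map;
    return added;
}

static int table_init(MapTable *t, size_t cap) {
    assert(cap >= 2 && (cap & (cap - 1)) == 0);
    t->slots = calloc(cap, sizeof *t->slots);
    t->cap = cap;
    t->count = 0;
    return t->slots != NULL;
}

/* Double the capacity and rehash every entry. */
static int table_grow(MapTable *t) {
    size_t ncap = t->cap * 2;
    Entry *ns = calloc(ncap, sizeof *ns);
    if (ns == NULL)
        return 0;
    for (size_t i = 0; i < t->cap; i++)
        if (t->slots[i].key != 0)
            put_slot(ns, ncap, t->slots[i].key, t->slots[i].map);
    free(t->slots);
    t->slots = ns;
    t->cap = ncap;
    return 1;
}

static int table_put(MapTable *t, uintptr_t return_addr, const StackMap *map) {
    assert(return_addr != 0);
    /* Keep the load factor <= 1/2, so an empty slot always ends a probe. */
    if (2 * (t->count + 1) > t->cap && !table_grow(t))
        return 0;
    t->count += (size_t)put_slot(t->slots, t->cap, return_addr, map);
    return 1;
}

static const StackMap *table_get(const MapTable *t, uintptr_t return_addr) {
    size_t i = hash_addr(return_addr) & (t->cap - 1);
    while (t->slots[i].key != 0) {
        if (t->slots[i].key == return_addr)
            return t->slots[i].map;
        i = (i + 1) & (t->cap - 1);
    }
    return NULL;
}
```

Doubling keeps the amortized insertion cost constant, and since the maps are static, the table would typically be filled once as methods are compiled and only read during stack walks.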

Ben

_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
