On Mon, Nov 18, 2013 at 5:57 AM, Ben Kloosterman <[email protected]> wrote:
> On Mon, Nov 18, 2013 at 12:37 AM, Jonathan S. Shapiro <[email protected]> wrote:
>
>> I just realized that the immix papers don't mention one of the big
>> advantages to bump allocations: when you are allocating multiple objects in
>> sequence, the limit checks can be consolidated into a single check. It
>> seems to me that this is not true for immix, because the occupancy count on
>> each line makes the consolidation difficult. If things can't be
>> consolidated, that's a pretty significant fast-path overhead. Does anybody
>> know if they are consolidating successfully?
>>
>
> I think there is a runtime constraint where the runtime presents 1 call to
> alloc at a time.
>

Right. But if you know you are going to allocate an 8 word object followed
by a 12 word object, you can just do a single 24 word (allowing for headers)
allocation. Given the inline sequence you sent out, a conventional compiler
would do this optimization more or less automatically, except that the
procedure calls to the long path are an impediment.

> I see no reason why, if the runtime/compiler supports it, you can't go:
>
> var ptr = tryallocatecontiguious(size_of_multiple_objects);
> if (ptr == NULL)
>     // allocate each object individually
> // construct the objects
>
> Though it does add to the inline path.
>

You wouldn't do it conditionally. You'd just do it. You'd probably end up
allocating more chunks out of the "free" block, but that's not necessarily
bad.

> I think (I checked, but I'm not 100% sure) that the occupancy count is done
> during the collect phase (when processing newroots).
>

(slaps head) yes, of course. And that would make all the difference. If the
object counts are handled during nursery collection, dynamically sized
blocks aren't a problem at all.
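
To make the consolidation point above concrete, here is a minimal sketch in
C. It is not the immix fast path or anything from the paper. The names
(bump_region, bump_alloc, bump_alloc_pair) and the two-word header are
assumptions, chosen only so that an 8 word plus a 12 word object comes to the
24 words mentioned above. It assumes, as discussed, that no per-line
occupancy bookkeeping happens at allocation time.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* Assumption: two header words per object, so 8 + 12 payload words = 24. */
enum { HEADER_WORDS = 2 };

/* Hypothetical bump region: cursor moves up toward limit. */
typedef struct {
    word_t *cursor;  /* next free word */
    word_t *limit;   /* one past the last usable word */
} bump_region;

/* Unconsolidated form: one limit check per object. */
static word_t *bump_alloc(bump_region *r, size_t payload_words) {
    size_t need = payload_words + HEADER_WORDS;
    if ((size_t)(r->limit - r->cursor) < need)
        return NULL;                  /* caller would take the long path */
    word_t *obj = r->cursor;
    r->cursor += need;
    obj[0] = payload_words;           /* hypothetical header: payload size */
    obj[1] = 0;                       /* hypothetical header: GC bits */
    return obj + HEADER_WORDS;
}

/* Consolidated form: a single limit check covers both objects. */
static int bump_alloc_pair(bump_region *r, size_t a_words, size_t b_words,
                           word_t **a, word_t **b) {
    size_t need = a_words + b_words + 2 * HEADER_WORDS;
    if ((size_t)(r->limit - r->cursor) < need)
        return 0;                     /* caller would take the long path */
    word_t *first = r->cursor;
    r->cursor += need;
    first[0] = a_words;
    first[1] = 0;
    *a = first + HEADER_WORDS;
    word_t *second = *a + a_words;    /* second object follows the first */
    second[0] = b_words;
    second[1] = 0;
    *b = second + HEADER_WORDS;
    return 1;
}
```

With a 32 word region, bump_alloc_pair(&r, 8, 12, &a, &b) advances the cursor
by 24 words after one comparison, where two bump_alloc calls would do two.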
> Note the metadata for lines is located at the start of the Chunk, not the
> block (I think all metadata except the object & object GC header is in the
> chunk, and hence a block is just unmarked objects).
>

That's strange, if only because it contradicts the paper.

>> Regarding the stack, there is another issue that I hadn't considered: if
>> we use an immix block of some form for the stack, we can't rely on the
>> collector to zero it in the background. We would have needed to zero
>> reference slots anyway, so it's not like this is a new cost, but the
>> stack-of-objects approach may change how eagerly we need to do that.
>> Actually, I don't think it does, but it needs looking at.
>>
>
> I take it we can't rely on the collector to do this because collector
> threads have an immix block for a stack.
>

No no. We'll get an initially zero block just fine. The problem is that the
stack pointer is going up and down within the block. As procedures return,
they leave dirty lines "below" the stack. This happens *way* too rapidly for
background zeroing to work.

Maybe I haven't been clear enough. I'm not imagining that we allocate a
frame as a nursery object. I'm imagining that we manage a *conventional*
stack in an immix block, using the bump limit pointer to guard the end of
the block so we know when to create a new stack block/segment. All I was
doing was laying out the frames in a way that made the stack walkable by the
conventional object mark/scan logic.

> One huge issue is block availability...
>

Right. Except now that I've said more clearly that I'm not allocating stack
frames as heap objects, hopefully it's clearer why I don't think that's an
issue.

> Normal Rc-Immix cycle is: allocate all new blocks, allocate all recycled
> blocks, run collect. We probably don't want a stack to use recycled blocks,
> so we need some sort of reserve here.
>

For what I have in mind, the stack *definitely* can't use recycled blocks.

> Writing a good self-scaling hash table is not easy.
>

True. But in this case, doubling the hash table size each time is a good and
simple approach.

> I kind of like the idea of writing a reference to static metadata
> regardless of the method, since:
> - You want the writing to be as fast as possible, and the parsing is less
>   important.
> - It has constant time regardless of the number of variables.
>

Sure, but the stack map is effectively the same. It's using the same actual
map data, but it's statically precomputed and doesn't require any writes at
run time at all.

shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
