On Thu, Oct 17, 2013 at 10:06 AM, Bennie Kloosteman <[email protected]> wrote:

> re S bits looks solid only issues ( and they both should not be) are we
> sure only the last block in a line straddles ? ( seems likely but you never
> know).

Yes. Lines are empty when allocated, and are filled by increasing addresses. The last object (not block) in a line can overflow into the next line. Once it does, the initial line is full and is marked S to indicate overflow (S for "spill"). We don't need to track this in both directions, because the marking algorithm is working from the object address rather than the line address.

> There may also be a higher cost for getting the line data...

Yes. But the S bit is never consulted by the mutator. It is set at allocation time by the bump allocator, and it is consumed by the background collector while discovering and zeroing blocks of contiguous free lines. The allocator has to run a "does it fit" check on the line in any case. The hidden cost here is the cache line reference to update the per-line S bit. But in a nursery allocator that line will be cache hot.

>> Note that an object with an N bit set resides in thread-local storage; no
>> cross-CPU contention exists for such objects.
>
> ???? So what happens when a reference to that object is passed to another
> thread.

It can't be. Objects in the nursery, by design and requirement, are thread local until they are "graduated" into the general heap. So the answer to your question is either that a collection is triggered or that the object is initially allocated in the general heap.

You definitely don't want to forward these objects, because you'd have to do a page invalidate on the nursery region until the forwarding completed. *Definitely* not a good idea!

This isn't that unusual - objects allocated from the large object store (LOS) are also allocated from the general heap. The reason general heap allocation is avoided is to reduce contention and reference counting. That doesn't have to be 100% effective to make a huge difference.
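
For readers following along, the allocation-side S-bit bookkeeping described earlier in this reply (set by the bump allocator, alongside the "does it fit" check) might look roughly like the sketch below. This is only an illustration under stated assumptions: `LINE_SIZE`, `NurseryBlock`, and `alloc` are hypothetical names, the 256-byte line size is a guess rather than anything fixed in this thread, and marking every line that an object runs out of (not just the first) is my reading of "spill", not a confirmed detail of the design.

```python
LINE_SIZE = 256  # bytes per line; assumed value, not taken from the thread


class NurseryBlock:
    """Hypothetical bump-allocated block that tracks a per-line S (spill) bit."""

    def __init__(self, num_lines):
        self.limit = num_lines * LINE_SIZE   # one past the last usable byte
        self.cursor = 0                      # next free byte offset
        self.spill = [False] * num_lines     # per-line S bit

    def alloc(self, size):
        """Bump-allocate `size` bytes; return the start offset, or None if it does not fit."""
        if size <= 0:
            raise ValueError("size must be positive")
        start = self.cursor
        end = start + size
        if end > self.limit:                 # the "does it fit" check
            return None
        first_line = start // LINE_SIZE
        last_line = (end - 1) // LINE_SIZE   # line holding the object's last byte
        # Assumption: every line the object overflows out of gets its S bit set.
        for line in range(first_line, last_line):
            self.spill[line] = True
        self.cursor = end
        return start
```

Note that an object ending exactly on a line boundary sets no S bit in this sketch, since nothing overflows into the following line.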

>> *New Object Allocation*
>>
>> - *Question:* Does nursery allocation always proceed from an
>> initially free block?
>
> It doesnt say so .. so it would follow normal allocation a free block and
> a recycled block ..

That's one possibility, but it would make the new object bump allocator more complex, which seems unfortunate. I tend to suspect that their strategy for nursery collection has room for improvement.

I think that a better question (on my part) would have been: does tracing the nursery always cause the surviving objects in the nursery to be copied into the general heap?

My *guess* is that surviving objects are unconditionally moved to the general heap. If so, *some* of those objects will be very young, which will lead to heap churn. If we believe the 10% survival heuristic, it might be worth looking at hybrid collection strategies in the nursery. So here's the thing:

- The most recent (youngest) 10% of objects in the nursery are contiguous, at the top of the allocated region.
- Run mark-copy to evacuate everything *below* that to the general heap.
- Now either run mark-compact on the 10% or just implement a cyclic nursery bump allocator.

The point is to give those infant objects time to die young. Note that we don't even need to mark-compact them. They're only taking up 10% of the nursery space in any case, and we can just ignore their internal fragmentation.

The counter-argument would be: you'll catch those objects when you run the decrements on the non-live reference from the nursery to the general heap, and you'll probably end up returning most of those lines to their containing block at that point. Except that if you implement a deferred dec-buf....

This is something that can probably be determined only by experiment.

>> - *Question*: Why isn't partial compaction being used here? I.e. a
>> scheme in which up to *N* lines will be moved to the front of the
>> nursery block, with the effect that the oldest lines trend toward the
>> bottom and can be generationally eliminated? Or is this really what is
>> meant by "compaction"?
>
> Sounds good and there may be an attractive scheme without any stackmap /
> relocation though obviously this will wasted more memory but they do
> recycle to lines in partial blocks .

If stuff is getting copied to the general heap, you need a stack map in any case. You said this later.

> Im not sure why they didnt check a more standard Nursery as discussed
> previously , they do note the better performance of the generational
> collector on smaller heaps (Stephen Blackburn was on both so he should
> know) . Probably added implimentation cost and they were already good
> enough.

Graduate students working on big things are often encouraged to stop when they have enough for a dissertation or a paper. Then they write something in the paper like "well, we wanted to see how the thing we were focusing on did without polluting the experiment with other techniques". Sound like anything you may have seen in the RC-immix paper?

From a research perspective, I actually think this was a very reasonable "line in the sand" for this particular work. That doesn't mean that *we* need to be constrained by it.

shap
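
As an illustration only, the hybrid nursery scheme proposed above (evacuate survivors below the youngest 10%, leave the youngest region in place) could be sketched as follows. Everything named here is hypothetical: `Obj`, `hybrid_nursery_collect`, and `keep_percent` are made up for the sketch, liveness is assumed to have been computed already by a trace, and the 10% is measured in bytes of the allocated region rather than in object count, which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    addr: int    # byte offset of the object within the nursery
    size: int    # object size in bytes
    live: bool   # result of an earlier trace (assumed given)


def hybrid_nursery_collect(objs, nursery_top, keep_percent=10):
    """Split nursery objects into (evacuated, retained).

    objs: objects in increasing address order, i.e. allocation order.
    nursery_top: current bump pointer (bytes allocated so far).
    """
    # Everything at or above `cutoff` is the youngest keep_percent of the region.
    cutoff = nursery_top - nursery_top * keep_percent // 100
    evacuated, retained = [], []
    for obj in objs:
        if obj.addr >= cutoff:
            retained.append(obj)     # infant: left in place, live or not
        elif obj.live:
            evacuated.append(obj)    # older survivor: copy to the general heap
        # dead objects below the cutoff are simply reclaimed
    return evacuated, retained
```

An object that starts below the cutoff but straddles it is treated as old in this sketch; whether that is the right call is exactly the kind of thing experiment would have to settle.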
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
