On Tue, Nov 19, 2013 at 8:55 PM, Ben Kloosterman <[email protected]> wrote:
> On Wed, Nov 20, 2013 at 1:27 AM, Jonathan S. Shapiro <[email protected]> wrote:
>
>> ...if we use a copying nursery (i.e. a hybrid approach), then the nursery
>> itself is *always* a free block (perhaps bigger than 32K, but still a
>> free block).
>
> Correct, and note that with such a copying nursery I see no reason to move
> objects or defragment...

This question about whether we need to relocate is important, because it
drives a *whole* lot of (eventual) work, and it constrains some aspects of
the source-level language design. I'd really like to try to get a better
handle on it.

It would be *really* nice to know how the fragmentation numbers work out
over time. We have a whole bunch of input that says only 10% of the nursery
generally survives. That number has regime shifts, but it seems to be a
sensible guideline. The survivors get copied into a new generation. Let's
call that G1.

I don't know of good demographics about survival in G1. I suspect it's
higher than 10%, but how much higher? Some of the objects in G1 will have
been young in the nursery. 90% of those will die. But how many is that?

We *do* have various data suggesting that, after GC, there exist G1 immix
blocks in which 90% of the objects have died but 10% remain. We don't know
the distribution of those pesky live objects within the block, but it
doesn't seem like an insurmountable number. I don't have any intuition about
how that number changes over multiple recyclings.

The problem is that a 90% free block is not a fully free block, especially
if it has bad fragmentation characteristics. My guess is that a lot of those
10%-occupied blocks arise from regime shifts. That is: the dead 90% seem
likely to be dying in bulk.

If blocks are 32Kb with 128b lines, then a 10%-occupied block has something
approximating 10 free regions which, on average, are roughly 10 lines (1280
bytes) long. I suspect that if we manage blocks in a *vaguely*
size-partitioned way, we can probably achieve an average block occupancy of
85% without relocating.
That part, at least, seems pretty reasonable to me. The problem is that we
still aren't going to get any fully free blocks, which means that mid-sized
allocations are going to tend to push us to grow the heap. It *does* seem
possible to me that a *rough* partitioning scheme in which mid-sized objects
go to "big blocks" might sufficiently resolve this, but I *really* hate to
concede a 15% heap inefficiency and an indefinitely growing heap at the
gate.

When I weigh this against the cost of building a whole optimizing compiler
that tracks enough information to do relocation *well*, it's undeniably a
hard choice, and there is a very strong temptation to toss relocation out
the window, at least initially.

But having said that, I'm also mindful of something that Steve got right at
NeXT: he never built a machine with a single-plane display. There are ways
to "cheat" in code with a black-and-white bitmap that you can't do with a
multi-plane bitmap, and he wanted to make sure that bad algorithms never got
into the code in the first place. I think we're in much the same position
with respect to relocating collection. My itchy expectation is that we *do*
eventually need to do a relocating collector, and consequently an optimizing
compiler with the right infrastructure. And I think that means we need a
relocating option *now* to keep ourselves honest. It doesn't have to be a
good relocating collector, and it doesn't have to be a good optimizer. It
*does* need to be enough to keep us honest on language and runtime design
and semantics.

In the end, I'm pretty sure that Ben is right, and that relocation is only
necessary in corner cases. For those "stuck" objects, I'd really like to
know what the in-degree is, and I'd like some sense of how read-hot the
containing blocks are once the other 90% of the objects are gone. That would
tell us whether a "mostly non-copying" design is feasible. At the moment I
don't feel like I have any basis for intuitions on that.
Unfortunately, the only way to get that data appears to be building an
implementation we can measure. As nice and as clean as Jikes is, it is built
to support a language with a qualitatively different set of objects and a
significantly different BCVM semantics available to the compiler. And we're
contemplating making further changes to the BCVM semantics. Do we know of
any comparative data on *Java* programs between the CLR and the JVM that
would tell us whether, for those programs, the behaviors of the respective
BCVM machines are comparable?

> There is a complication, however: right now there is no ref counting at
> all in RC-Immix if there is sufficient space. Ref counting only occurs
> during collections, and this sync is important. E.g.: allocate from free,
> collect (ref count and create holes), then repeat: allocate from free,
> allocate to holes, collect... I'm not sure if there are any holes created
> by processing a modbuf/newbuf and ref counts going to 0 outside of the
> collect phase. The complication is that freeing these objects can't use
> ref counts, else it would need to be synced with the ref count; hence it
> will just do a GC mark. And you probably don't want these synced with the
> ref count, as nursery sweeps are too frequent.

This got too compacted for me to follow. I'm not sure what "sync" you are
referring to. Could you expand on this, but could you please do it in a
separate message of its own? No need to change the subject line; I just feel
like the responses are getting too spread out.

In the end, I *think* you're saying that there is an overall heap threshold
size below which the right strategy is to grow the heap in preference to
doing *any* GC. I'm not clear whether, during this phase, you bother keeping
track of deferred references or not. Tracking them would require more space,
but it probably isn't initially necessary if you are prepared to rebuild the
counts during the first GC (which seems reasonable).
> Note if these counts are only done during a GC, then IMHO it's not really
> a ref-counting GC, but merely a standard GC where the ref counting
> replaces the mark and the card table.

Or, in an alternative view, it's a hybrid scheme. But yes, I agree.

>> We can very easily impose a rule in the nursery that maintains /start/
>> at a two-word boundary. In that case, for typically aligned allocations,
>> it reduces to...
>
> Agree. Was thinking the same thing. There is quite a lot of alignment
> code which I wondered about, but that's because the Jikes code is uniform
> (there are no platform-specific conditionals), so when running on a 386
> it would use align 4.

Yup. I think part of the confusion is that you are looking at the general
allocator implementation (the one used by the relocator) rather than the
initial inline allocator implementation. The initial allocator has a lot of
static type information that it can use to partially evaluate a bunch of
this code. The general allocator no longer has access to that, and so has to
actually do the work. :-)

>> And if we institute that alignment requirement on the cursor, then the
>> probe hoists to the top of the procedure, and we get...
>
> Whether you can do static object sizes depends on the integration between
> the runtime object creation, the allocator, the compiler/JIT, and the
> app. If it was all a single same-language blob, then sure.

Yes. For nursery allocations, at least, it seems very doable.

>> And, of course, a lot of this will peephole out if the compiler is
>> aggressive about it. This approach won't work for loops, but it still
>> helps.
>
> Aggressive loop unrolling will help, though.

Definitely. The issue is a loop that does allocations which outlive the
loop, where you don't know a (low) static bound on the number of loop
iterations. When those conditions hold, you can't preallocate a (possibly
conservative) frame size. It's similar to the conditions under which you
need a frame pointer vs. just the stack pointer.
If you prefer to think of it in C-like terms, think of it as similar to
using alloca() inside a loop in C code.

shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
