Hi,

Compiling the Rakudo setting creates a lot of objects (the parse tree, the PAST tree, the model and so forth). Not only that, but since it's building up a complete view of the thing it's compiling, those objects tend to stay around for a long time. Innevitably, that makes for a pretty large working set. Of course, there's lots of very short lived things (CallContext, for example) that come and go.

From the profiles I've been looking at, in situations where we make lots of objects but they are short lived, the GC seems to handle reasonably well. In theory, since it's generational, it'd continue to do so when we're doing things like compiling the setting, since the tree nodes we're keeping around survive a collection and end up in an older generation, and the regular collections just grab the short-lived things. However, the actual profile results are not so rosy.

Tonight I dug the profile data I was seeing, and found that gc_gms_is_pmc_ptr seems to be a hot path. A little analysis later, I think I know what's going on. When we do a GC run, we have to find all pointers to PMCs and STRINGs, and that includes scanning the stack. Since we can't be precise there, we have to validate the things we've seen really are string and PMC pointers. gc_gms_is_pmc_ptr is the function that does that. However, we'd not expect this to be so costly - the C stack isn't really going to be that deep. Yet it shows up high in the profile. What's going on?

The PMC headers are allocated from a pool. A pool is a linked list of arenas. And arena is a chunk of memory holding a bunch of PMC headers. Thus if we have a PMC pointer, it must have an address at a valid offset somewhere between the start address and the end address of one of the arenas in the pool. The way that we determine this is "the obvious": walk the linked list, and check if the pointer lies within the bounds of the arena, and at a valid offset.

Unfortunatley, I suspect this is actually quite expensive. An arena is 4096 bytes. On a 32 bit machine, given that a PMC is 16 bytes and there's an extra 4 bytes for a pointer used by the GC, we get about 200 PMCs in an arena. On a 64 bit machine I guess that drops off to about 100. So if we had, say, 100,000 PMCs around in memory, that's 500/1000 linked list follows to determine it's not a pointer, and average 250/500 if it is (maybe temporal locality biases the search one way or the other). And we probably have many more PMCs than that. Also note that we do this per "candidate" pointer that we see in the memory block.

That would be less than ideal, but there's another issue: we're likely jumping all over the place in memory as we visit these arenas. We're probably getting a ton of CPU cache misses as we do so, which is going to make walking the linked list more expensive than it may otherwise be. We go through this a bunch of times, and given the arenas probably line up well with pages, it's possible that we hit unfortunate overlap issues with the cache too (IIUC, set associative ones can do weird things in these cases).

One obvious fix would be to just keep an extent list: an array of (start,end) pairs, hanging off the pool header, which we can very cheaply scan through (since it's just a bunch of values laid out contiguously in memory). That way, we don't go chasing all over RAM, and clobber the cache (though since we're doing a GC mark we may well be about to anyway...)

It'd be interesting to know if anybody else sees things like this when profile Rakudo (nom branch) setting compilation, and of course a patch would be interested to see too (and, in theory, shouldn't be too hard).

Thanks,

Jonathan

_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Reply via email to