I want to give a really simple example of how relocating heaps change things.

In order to have a base for experiments, I've been working on a dumb, clean CLI implementation. The first thing you implement in this sort of effort is a dumper for PE files, mainly to make sure you understand the format properly. That needs to open a PE file, so you need to write some code to load the image. But you're going to use that later in LoadLibrary, so you want the PE image loader to load the image into the heap.

The question is: *which* heap? The C heap or the GC heap?

Putting it in the C heap has definite advantages. You don't have to worry about it getting collected or relocated, so you don't need to write any special guard code around it. You also don't need to deal with the fact that the PE image needs to be mmap'd with special permissions, and therefore isn't anything remotely like a "normal" large object in the view of the GC heap.

Those advantages are also disadvantages. If you *don't* somehow load the image into the GC heap, it becomes challenging to garbage collect it later (though that's challenging for other reasons).

Now if you *do* put it in the GC heap, you immediately hit a bunch of issues:

1. You need to create a root handle for it, since the C reference isn't visible to the GC.

2. Given the possibility of a relocating collector framework, you need to pin it whenever the C code is messing with it.

3. Because a PE image has segments, it is not loaded as a contiguous object. If you want it to logically live in the GC heap, you *may* need a model for non-contiguous GC heap objects. Not a very complete model, but at least a half-assed model.

4. For that matter, you need at least the beginnings (the allocating part) of a GC runtime in order to write the first bit of test code.

5. While loading all of this, you don't get to stick object headers in front of some parts. If you put any of those parts (e.g. the code segment) into the GC heap, you now have "alien stuff" in the GC heap, and need the ability to recognize that stuff for what it is.
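
To make issues 1 and 2 concrete, here is a minimal sketch in C of what a root handle plus pinning might look like. Everything in it is an assumption made for illustration: the names gc_handle, gc_root_alloc, gc_pin, gc_unpin, gc_try_relocate and gc_root_free are hypothetical, and the "collector" is a toy that relocates a single object by copying it. It is not how any particular CLI runtime does this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical root handle. C code holds the handle, never the raw
 * address, because the collector may rewrite `obj` when it moves
 * the object. */
typedef struct gc_handle {
    void  *obj;        /* current address of the object */
    size_t size;       /* object size in bytes */
    int    pin_count;  /* > 0: the collector must not move the object */
} gc_handle;

/* Allocate a zeroed object and a root handle for it (toy allocator). */
static gc_handle *gc_root_alloc(size_t size)
{
    gc_handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->obj = calloc(1, size ? size : 1);
    if (h->obj == NULL) {
        free(h);
        return NULL;
    }
    h->size = size;
    h->pin_count = 0;
    return h;
}

/* Pin the object; the returned raw pointer is valid until gc_unpin. */
static void *gc_pin(gc_handle *h)
{
    h->pin_count++;
    return h->obj;
}

static void gc_unpin(gc_handle *h)
{
    assert(h->pin_count > 0);
    h->pin_count--;
}

/* Toy stand-in for a relocating collection: copy the object to a new
 * address unless it is pinned. Returns 1 if the object moved, else 0. */
static int gc_try_relocate(gc_handle *h)
{
    if (h->pin_count > 0)
        return 0;
    void *moved = malloc(h->size ? h->size : 1);
    if (moved == NULL)
        return 0;
    memcpy(moved, h->obj, h->size);
    free(h->obj);
    h->obj = moved;    /* only the handle learns the new address */
    return 1;
}

/* Drop the root and release the object. */
static void gc_root_free(gc_handle *h)
{
    free(h->obj);
    free(h);
}
```

The intended pattern is that the image loader calls gc_pin(h) before it touches the image bytes and gc_unpin(h) when it is done, and never keeps the raw pointer across an unpin. Any raw pointer saved from before a relocation is stale afterwards, which is exactly the guard code you get to skip if the image lives in the C heap.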

And yet it would be *so* convenient if this stuff could just be GC'd...

And then, of course, LoadLibrary could conceivably be called for the same library by two client threads at the same time, so you don't get to ignore concurrency issues here even for a first implementation of PEdump. At which point you discover how awkward the pthreads logic for thread creation and management is.

What's interesting here, in part, is the degree to which you cannot hide *for even a moment* from the need to deal with things that cross over from the legacy heap into the GC heap and vice versa.

You could say "hey, I'll just write pedump in C# to begin with", but it turns out that the PE file structures are not a good match for what can be expressed in C#. Besides, this doesn't really solve anything, because ultimately the CLI implementation needs to load that first image in any case.

Heh. It's also interesting that a certain *other* CLI implementation doesn't do path handling right in this [deceptively complicated] bit of code.

shap
