I want to give a really simple example of how relocating heaps change
things.

In order to have a base for experiments, I've been working on a dumb, clean
CLI implementation. The first thing you implement in this sort of effort is
a dumper for PE files, mainly to make sure you understand the format
properly. That needs to open a PE file, so you need to write some code to
load the image.
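
To make "understand the format" concrete, here is a minimal sketch of the very first check a dumper makes. It relies only on the published PE layout: the "MZ" magic at offset 0, the 32-bit little-endian e_lfanew field at offset 0x3C, and the "PE\0\0" signature that e_lfanew points to. The function name pe_find_coff_header is my own invention for illustration, not something from an existing implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the offset of the COFF file header
 * (the byte just past "PE\0\0"), or -1 if buf does not look like
 * a PE image.
 */
long pe_find_coff_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 0x40)                        /* DOS header is 64 bytes */
        return -1;
    if (buf[0] != 'M' || buf[1] != 'Z')    /* DOS magic */
        return -1;

    uint32_t e_lfanew = read_le32(buf + 0x3C);

    /* The 4-byte signature must lie entirely inside the buffer. */
    if (e_lfanew > len || len - e_lfanew < 4)
        return -1;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return -1;

    return (long)e_lfanew + 4;
}
```

Note that this takes a buffer rather than a file name, which sidesteps the question of where the bytes live. That question is the rest of this note.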

But you're going to use that later in LoadLibrary, so you want the PE image
loader to load the image into the heap. The question is: *which* heap? The
C heap or the GC heap?

Putting it in the C heap has definite advantages. You don't have to worry
about it getting collected or relocated, so you don't need to write any
special guard code around it. You also don't need to deal with the fact
that the PE image needs to be mmap'd with special permissions, and
therefore isn't anything remotely like a "normal" large object in the view
of the GC heap.

Those advantages are also disadvantages. If you *don't* somehow load the
image into the GC heap, it becomes challenging to garbage collect it later
(though that's challenging for other reasons).

Now if you *do* put it in the GC heap, you immediately hit a bunch of
issues:

   1. You need to create a root handle for it, since the C reference isn't
   visible to the GC.
   2. Given the possibility of a relocating collector framework, you need
   to pin it whenever the C code is messing with it.
   3. Because a PE image has segments, it is not loaded as a contiguous
   object. If you want it to logically live in the GC heap, you *may* need
   a model for non-contiguous GC heap objects. Not a very complete model, but
   at least a half-assed model.
   4. For that matter, you need at least the beginnings (the allocating
   part) of a GC runtime in order to write the first bit of test code.
   5. While loading all of this, you don't get to stick object headers in
   front of some parts. If you put any of those parts (e.g. the code segment)
   into the GC heap, you now have "alien stuff" in the GC heap, and need the
   ability to recognize that stuff for what it is.
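
Items 1 and 2 can be sketched with a toy stand-in for the collector. Everything here is hypothetical (gc_handle, gc_pin, gc_try_relocate and friends are names I made up, and malloc is standing in for the GC heap). The only point is the discipline it forces on the C side: hold a handle rather than a raw pointer, and only touch the raw pointer between pin and unpin.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical root handle. The collector knows about the handle and
 * rewrites obj when it moves the object; C code keeps the handle.
 */
typedef struct gc_handle {
    void  *obj;        /* current address of the object */
    size_t size;       /* object size in bytes */
    int    pin_count;  /* > 0 means the object must not move */
} gc_handle;

gc_handle *gc_root_alloc(size_t size)
{
    gc_handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->obj = calloc(1, size);
    if (h->obj == NULL) {
        free(h);
        return NULL;
    }
    h->size = size;
    h->pin_count = 0;
    return h;
}

/* Pin the object and hand back a raw pointer valid until gc_unpin. */
void *gc_pin(gc_handle *h)
{
    h->pin_count++;
    return h->obj;
}

void gc_unpin(gc_handle *h)
{
    assert(h->pin_count > 0);
    h->pin_count--;
}

/*
 * Stand-in for a relocating collection: move the object unless it is
 * pinned. Returns 1 if the object moved, 0 otherwise.
 */
int gc_try_relocate(gc_handle *h)
{
    if (h->pin_count > 0)
        return 0;
    void *moved = malloc(h->size);
    if (moved == NULL)
        return 0;
    memcpy(moved, h->obj, h->size);
    free(h->obj);
    h->obj = moved;
    return 1;
}

void gc_root_free(gc_handle *h)
{
    free(h->obj);
    free(h);
}
```

Any raw pointer obtained from gc_pin is garbage once the object has been unpinned and relocated, which is exactly the guard code the C heap lets you skip. This toy says nothing about items 3 through 5.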

And yet it would be *so* convenient if this stuff could just be GC'd...

And then, of course, LoadLibrary could conceivably be called for the same
library by two client threads at the same time, so you don't get to ignore
concurrency issues here even for a first implementation of PEdump. At which
point you discover how awkward the pthreads logic for thread create and
management is.
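
For the concurrency point, the least clever thing that could work is a single mutex around a table of loaded images, so that two threads asking for the same library get the same entry. This is a sketch under loud assumptions: the image struct and load_library are hypothetical, the actual loading is elided, and comparing paths with strcmp is a simplification that ignores path normalization entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one loaded image. */
typedef struct image {
    char         *path;  /* path the image was loaded from */
    int           refs;  /* number of outstanding loads */
    struct image *next;
} image;

static image *loaded_images = NULL;
static pthread_mutex_t loaded_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Return the image for path, creating the entry on first use.
 * The lock is held for the whole operation, so concurrent callers
 * asking for the same path cannot both create an entry.
 */
image *load_library(const char *path)
{
    image *img;

    pthread_mutex_lock(&loaded_lock);

    for (img = loaded_images; img != NULL; img = img->next)
        if (strcmp(img->path, path) == 0)  /* simplification: no normalization */
            break;

    if (img == NULL) {
        img = malloc(sizeof *img);
        if (img != NULL) {
            img->path = malloc(strlen(path) + 1);
            if (img->path == NULL) {
                free(img);
                img = NULL;
            } else {
                strcpy(img->path, path);
                img->refs = 0;
                /* Reading and mapping the PE file would go here. */
                img->next = loaded_images;
                loaded_images = img;
            }
        }
    }

    if (img != NULL)
        img->refs++;

    pthread_mutex_unlock(&loaded_lock);
    return img;
}

/* Thread entry point so load_library can be driven from pthread_create. */
static void *load_thread(void *arg)
{
    return load_library(arg);
}
```

Holding one lock across the whole load serializes every LoadLibrary call, which is probably acceptable for a first cut and probably not acceptable later.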


What's interesting here, in part, is the degree to which you cannot hide *for
even a moment* from the need to deal with things that cross over from the
legacy heap into the GC heap and vice versa. You could say "hey, I'll just
write pedump in C# to begin with", but it turns out that the PE file
structures are not a good match for what can be expressed in C#. Besides,
this doesn't really solve anything, because ultimately the CLI
implementation needs to load that first image in any case.


Heh. It's also interesting that a certain *other* CLI implementation
doesn't do path handling right in this [deceptively complicated] bit of
code.


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
