On Sat, 19 Feb 2011 22:39:10 -0500, Nick Sabalausky <a@a.a> wrote:

"bearophile" <bearophileh...@lycos.com> wrote in message
news:ijpkh8$232r$1...@digitalmars.com...
dsimcha:

Yeh, I rebuilt the same model in my head over the past few hours (like
you, I had a good mental model of the GC at one point but have slowly
forgotten it).

For serious D programmers, having such a model in their head is so important
that I'd like a short page with such explanations & pseudocode on the D
site :-)


I seem to remember finding out at one point (the hard way) that arrays of
primitives are scanned for pointers unless you explicitly tell the GC not to
do so (for each array). A long time ago, this caused me major headaches when
I tried to write a batch processing tool for image files. I never actually
verified this, but my understanding was that the file and image data
contained false pointers that prevented the data from completed images from
being collected. So the program just happily kept munching away at more and
more memory (and presumably, more false pointers) with each image, until
after only a fairly small number of files it ground to a sputtering halt
under the weight of its own memory footprint.

This is not correct. Arrays of value elements (built-in types, or custom structs with no pointers in them; the compiler records this in the TypeInfo) are marked NO_SCAN when the array is constructed.
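
You can check this at runtime if you want. A minimal sketch (GC.getAttr, GC.addrOf and BlkAttr.NO_SCAN are in core.memory; the printed values are implementation details, what matters is nonzero vs. zero):

import core.memory : GC;
import std.stdio;

void main()
{
   auto ints = new int[1024];   // element type has no pointers -> NO_SCAN
   auto ptrs = new int*[1024];  // element type is a pointer -> scanned

   // GC.addrOf gives the base address of the block; a nonzero result here
   // means the NO_SCAN attribute is set on that block.
   writeln(GC.getAttr(GC.addrOf(ints.ptr)) & GC.BlkAttr.NO_SCAN);
   writeln(GC.getAttr(GC.addrOf(ptrs.ptr)) & GC.BlkAttr.NO_SCAN);
}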

Here are the two problems I know of:

1. You can have a struct with sparse pointers in it; an array of such structs would not be marked NO_SCAN. This means even the non-pointer parts are scanned as pointers. For example:

struct Bad
{
   void *pointer;    // the one real pointer
   int[1024] data;   // 4KB of plain ints, scanned anyway
}

The data member is scanned in addition to the pointer member. This can create false hits on the heap, keeping those blocks alive.

IMO, this problem is not very significant -- those types of aggregates are uncommon.
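
You can see where the decision comes from by looking at the element type's TypeInfo. A rough sketch (bit 1 of TypeInfo.flags means "may contain pointers"; the exact bit layout is an implementation detail, and Fine is just a made-up pointer-free counterpart):

import std.stdio;

struct Bad  { void *pointer; int[1024] data; }  // as above
struct Fine { int[1024] data; }                 // no pointers at all

void main()
{
   // bit 1 set -> the GC will scan memory holding this type
   writeln(typeid(Bad).flags & 1);   // 1: arrays of Bad are fully scanned
   writeln(typeid(Fine).flags & 1);  // 0: arrays of Fine get NO_SCAN
}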

2. A large value-element array is kept in memory because of false pointers. There are several kinds of memory that are *always* treated as containing pointers, so you always have potential false pointers lying around, even if you never allocate any heap data. For example, the stack is always assumed to contain pointers, as are global and TLS data. I think even the proposed "precise" scanning patches for D's GC in bugzilla do not address stack or global space.

This is a bigger problem. If you, for example, allocate a 200MB array in a 4GB address space, that's 1/20th of the total address space, so the chance that a false pointer somewhere in the stack, TLS, or globals "points" at that data is roughly 1 in 20.

If you don't explicitly delete such arrays and just stop pointing at them, that memory is now leaked. If you start allocating more of them, you are now even more likely to have false pointers.
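
For the image-processing case above, the deterministic workaround looks something like this (just a sketch; processOneImage is made up, GC.free and GC.addrOf are in core.memory):

import core.memory : GC;

// Hypothetical per-file processing function, only for illustration.
void processOneImage(string file)
{
   auto pixels = new ubyte[64 * 1024 * 1024];  // big, pointer-free buffer

   // ... decode the file into pixels, process, write the result ...

   // Free the block deterministically instead of waiting on a collection;
   // a false pointer in the stack/TLS/globals could otherwise pin it forever.
   GC.free(GC.addrOf(pixels.ptr));
   pixels = null;
}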

This second issue, I think, is much more likely to happen. I proposed at one point that we should have a library construct that aids in freeing such chunks of memory deterministically, to try and avoid this problem. I don't think it was well received, but I can't really remember.
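
Something along these lines, as a rough sketch only (not the actual proposal; ScopedArray and example are made-up names): a scoped array that frees its GC block in its destructor, so its lifetime no longer depends on the conservative scan.

import core.memory : GC;

struct ScopedArray(T)
{
   T[] data;

   this(size_t n) { data = new T[n]; }

   // Free deterministically when the owner goes out of scope.
   ~this()
   {
      if (data.ptr !is null)
      {
         GC.free(GC.addrOf(data.ptr));
         data = null;
      }
   }

   @disable this(this);  // single owner, so the block is freed exactly once
}

void example()
{
   auto pixels = ScopedArray!ubyte(64 * 1024 * 1024);
   // ... use pixels.data ...
}  // freed here, whether or not something falsely "points" at it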

-Steve
