I've spent some free brain cycles today thinking about this, and here's the
scheme I have in mind.  If anyone thinks this could be improved in a way that
would not have substantial ripple effects throughout the compiler/language
(because then it might never actually get implemented), let me know.

1.  GC.malloc's signature changes to GC.malloc(size_t size, uint ba = 0,
size_t* bitmask = null).  A null bitmask means use the old-school behavior
and either scan everything or don't scan anything, based on the NO_SCAN bit.
IMHO plain old conservative scanning must be supported to allow untyped
memory blocks to be allocated, because requiring every memory block to have
a type associated with it is an unacceptable limitation in a systems language.

For now, the only way to get precise scanning would be to call malloc
directly or use a template that does, along the lines of the sketch below.
Eventually new would either be fixed to instantiate the bitMask!(T) template
or eliminated entirely.  Since getting precise scanning by writing a template
that calls GC.malloc() is a lot easier than hacking the compiler, this may be
a catalyst for getting rid of new.
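To illustrate, here's a minimal sketch of what such a template might look
like.  preciseAlloc is just a made-up name, and it assumes the GC.malloc
overload proposed above plus the bitMask!(T) template described in point 4:

import core.memory : GC;

T* preciseAlloc(T)()
{
    // Hand the collector the static bitmask so that it scans only the
    // offsets that actually hold pointers (uses the proposed overload).
    auto p = cast(T*) GC.malloc(T.sizeof, 0, cast(size_t*) bitMask!T.ptr);
    *p = T.init;
    return p;
}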

2.  Some concern has been expressed in the past about the possibility of
using 4 bytes per block of overhead to store pointers to bitmasks.  IMHO this
concern is misplaced, because in any program that takes up enough memory for
space efficiency to matter, false pointers waste more space.  Furthermore, if
you're programming for a toaster oven, elevator controller, etc., you
probably aren't going to use the standard out-of-the-box GC anyhow.  However,
I've thought of ways to mitigate this and have come up with the following:

    A.  Store a pointer to the bitmask only if the NO_SCAN bit is not
        set.  This is a no-brainer and will prevent any overhead on,
        for example, arrays of floats.

    B.  Add another attributes bit to the GC called something like
        NEEDS_BITMASK.  This bit would be set iff an object mixes
        pointers and non-pointers.  If it's not set, no bitmask
        pointer would be stored.  However, the overhead of an
        extra bit may or may not be worth it.
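For concreteness, B might look roughly like the following.  FINALIZE, NO_SCAN
and NO_MOVE already exist in GC.BlkAttr; the value chosen here for
NEEDS_BITMASK is arbitrary and only for illustration:

enum BlkAttr : uint
{
    NONE     = 0b0000,
    FINALIZE = 0b0001,
    NO_SCAN  = 0b0010,
    NO_MOVE  = 0b0100,

    // Proposed:  set iff an object mixes pointers and non-pointers.
    // If it's clear, no bitmask pointer gets stored for the block.
    NEEDS_BITMASK = 0b1000,
}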

3.  The bitmask pointer would be stored at the end of every GC-allocated
block for which one is stored.  The reason for using the end of the block
instead of the beginning is just implementation simplicity:  that way,
finding the beginning of a block works the same whether or not there is a
bitmask pointer.
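In other words, given a block's start address and size, the collector would
locate the bitmask with something like this (a sketch only; bitmaskOf is a
made-up name):

size_t* bitmaskOf(void* blockStart, size_t blockSize)
{
    // The last word of the block holds the pointer to the static bitmask
    // (or null for blocks that are scanned conservatively).
    auto end = cast(ubyte*) blockStart + blockSize;
    return *cast(size_t**) (end - size_t.sizeof);
}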

4.  The bitmask would be a size_t[] created at compile time by a template
and stored in the static data segment.  Its layout would be [length of
array, T.sizeof, offsets that need to be scanned].  For example, if you have
something like:

struct Foo {
    uint bar;
    void* ptr;
}

On a 32-bit machine, bitMask!Foo would be [3, 8, 4].  On a 64-bit machine, it
would be [3, 16, 8].  The reason the size of the array is stored in the array
itself is so that we can get away with storing a single pointer in each
memory block instead of a pointer and a length.
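Here's a rough sketch of how the template might be written.  It only handles
plain pointer and class-reference fields; nested structs, arrays, unions,
etc. would need more work, and the scanOffsets helper is purely illustrative:

// CTFE helper:  collect the offsets of the fields that hold pointers.
size_t[] scanOffsets(T)()
{
    size_t[] offs;
    foreach (i, FieldType; typeof(T.tupleof))
    {
        static if (is(FieldType == U*, U) || is(FieldType == class))
            offs ~= T.tupleof[i].offsetof;
    }
    return offs;
}

template bitMask(T)
{
    // [length of array, T.sizeof, offsets that need to be scanned],
    // evaluated at compile time and placed in the static data segment.
    static immutable size_t[] bitMask =
        [scanOffsets!T().length + 2, T.sizeof] ~ scanOffsets!T();
}

// With the Foo above:  bitMask!Foo is [3, 16, 8] on a 64-bit machine.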

5.  To store information about pinning, we simply use the high-order bit of
each pointer offset:  1 means pinned, 0 means not pinned.  This means that,
for any type T, T.sizeof can't be bigger than size_t.max / 2.  I think this
is a fairly minor limitation.
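Decoding one offset entry would then be something like the following sketch,
assuming the single highest bit is the pin flag (names are illustrative):

enum size_t PIN_BIT = cast(size_t) 1 << (size_t.sizeof * 8 - 1);

// True if the entry's high bit is set, i.e. the referenced field is pinned.
bool isPinned(size_t entry) { return (entry & PIN_BIT) != 0; }

// The actual offset, with the pin flag masked off.
size_t offsetOf(size_t entry) { return entry & ~PIN_BIT; }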
