Re: [RFC]Proposal for better garbage collection

Dmitry Olshansky Wed, 22 Feb 2012 12:05:26 -0800

On 22.02.2012 22:56, Benjamin Thaut wrote:

As I'm not satisfied with the current GC D has and don't see the
situation improving in the future without significant changes to the
compiler I wrote the following document that points out all the possible
issues with garbage collection I could think of and possible solutions
for them. This document is just a draft and only a proposal, critics and
comments are welcome.


Kind Regards
Benjamin Thaut


1) Preface

All modern Languages with fast and efficient GCs have build in support
for garbage collection within the compiler / virtual machine. Currently
the D language does have as much GC support as C++ has but fully relies
on the GC with a lot of language features and the standard library. As a
result the GC is slow, non parallel and always needs to stop all threads
before collection. This document suggests to build in better support for
garbage collection into the language so that creating a fast and
efficient GC becomes possible. A list of features that would be required
is described in this document.

2) Tracking references on the stack:

The D compiler always needs to emit a full stack frame so that the GC
can walk up the stack at any time in the program.

I think walking up the stack to collect this info again and again (thestack has a lot of "heavy frames" on the bottom, right?) sounds like atremendously slow way of getting necessary memory ranges. I'm no expertthough.


The stack frame of

every function generated by the D compiler starts which a bitfield
(usually the size of a machine register) where each bit indicates that
these bytes are a pointer / reference. The bitfield needs to be large
enough to cover the whole stack frame of the function.

For example on x86: 11001...
1 = bytes 0 to 4 are a pointer
1 = bytes 4 to 8 are a pointer
00 = bytes 8 to 16 are not a pointer
1 = bytes 16 to 20 are a pointer

The last bit indicates whether the bitfield is continued in the
following bytes or not. 1 means continued 0 means finished.
Every scope generated by the D compiler would need additional code at
the start and end of the scope. When the scope is entered the bitfield
would be patched to represent the new variables inside the scope and
when the scope is left the bitfield is patched again to remove the
changes that were made on entering the scope.

Again I'm no expert, but what happens when GC starts collecting a threadstack amid this patching operation?

Every time a function gets called that did not get generated by the D
compiler ( C / C++ etc functions) the compiler generates a call into the
runtime and passes the current stack pointer and stack base pointer to it.

void _d_externalCallStart(void* stackptr, void* baseptr);

Every time such a function returns the compiler generates a call into
the the runtime too.

void _d_externalCallEnd(void* stackptr, void* baseptr);

Why would you need these? And if you start calling callback on everyoperation, you may just as well pass direct ranges of memory to GCwithout stack walk.+ you can call D function from C one that in turn calls D one, thinkextern(C) and callbacks.

Every time a functions that can get called from other languages
(extern(C) etc) are executed the end callback is inserted at the start
of the functions and the start callback is inserted at the end of the
function.
Using these callbacks the GC can mark certain parts of the stack as
"non-D" and ignore them when scanning for bit fields and
references/pointers. It can just skip parts of the stack that are
"non-D" and therefore does not need a full stack frame within these
"non-D" sections.
All these features are required so that the GC can precisely scan
pointers/references on the stack and change them as necessary.
Remaining issues: The D compiler can freely move around value types on
the stack. With such move operations it would be necessary to fix up all
the bit fields. I needs to be investigated if this is doable.

3) Tracking references on the heap

For every class / struct a mixin template which is defined inside the
runtime gets instantiated. This template can then use introspection to
generate the necessary information to allow the GC to scan the pointers
within that struct / class precisely.

4) Thread local / global memory

A callback into the runtime needs to happen in the following cases:
- a __gshared variable is assigned
- a reference / pointer is casted to immutable
- a reference / pointer is casted to shared

void _d_castToGlobalMem(void* ptr);

This can be used by the GC to keep thread local pools of memory and move
a memory block to a global memory pool as soon as it is needed there.

5) pointer / reference changed callback

Every time a pointer / reference is changed the D compiler emits a call
into the runtime and passes the new value of the reference / pointer
with it.


Bye-bye any speed of p++ ? I mean I'm horrified, and I bet I'm not alone.


void _d_pointerChanged(void *ptr);

This can be used when writing a generational GC to have separate pools
for young and old generations. Every time the young generation needs to
be collected it can be avoided to scan the old generations pool because
it is sufficient to only check the pointers that have changed since the
last time the young generation was collected. With the above mentioned
callback it is easily possible to track these references.

Remaining issues:
-If the GC interrupts a thread right before any of the above mentioned
callbacks happen it will cause a invalid state for the GC and the GC
might access invalid pointers. It has to be investigated if this leads
to invalid behavior. It can be fixed by not interrupting a thread but
pause it the next time it calls any of callbacks, or other functions
that can be interrupted by the GC. This in turn could cause a thread to
be non pausable because it is stuck in a polling loop. The compiler
could identify loops without any GC interruptible function and manually
insert one.
-When pointers/references are passed inside processor registers the GC
cannot know if these values are actually pointers/references or
represent other values. If threads are paused at defined points in the
code as mentioned before this issues would be fixed because the state at
these points is known and can be handled accordingly.

6) Different interface for the GC

The current interface to the GC would have to change because the "this
block of memory might contain a pointer" approach wouldn't work anymore.
For example a block of memory and a delegate which iterates over all
pointers within the memory block could be used for user allocated memory
blocks. There should be a separate allocator function provided by the GC
that allocates memory that does not get moved around so it can be used
to pass it to non garbage collected code.

7) Compiler Options

Each of the above mentioned groups of features should be exposed as
compiler options so that you can turn them on/off depending on which
type of GC you use. Default on/off states for these features are set
within a config file depending on which type of GC currently ships per
default with druntime.

Combinatorial explosion of sets of options that doesn't necessary allowa particular GC?


8) Conclusion

Garbage Collection brings a lot of advantages for the programmer using
the language but is not free and shouldn't be treated as free. Full
support for garbage collection is required to build a fast and efficient
GC. This additional support requires additional features within the
compiler but should result in a overall better performing language.


Well, that was critic ;)

--
Dmitry Olshansky

Re: [RFC]Proposal for better garbage collection

Reply via email to