GC experiments. Writing my own GC.

Adam Sakareassen via Digitalmars-d Mon, 12 May 2014 21:42:24 -0700

Hi all,

As a learning exercise I've just been doing some experimenting with rewriting the garbage collection code, and thought I might share some of the initial results. I only program as a hobby these days, and I'm certainly no expert, but I thought some people might find it interesting.

My interest started because I wrote an LR(1) file parser in D. I then multi-threaded the application so multiple files could be parsed simultaneously. (Disk IO was all on one thread.) To my surprise, the throughput dropped significantly. I could process the files a lot faster using only one thread. It turns out the delays were due to my liberal use of the “new” operator. Rather than block allocate the memory I thought I would just hack the GC. So I cleared out gc.d in the runtime and started again. The basic plan was to make it more multi-thread friendly. Already I have learnt quite a lot. There are a number of things I would do differently if I tried another rewrite. However, so far it has been a good learning experience.

Currently, allocations are all working, and the mark phase is running. It still will not sweep and free the memory.

All memory allocations are entirely lock free (using CAS instructions), so a pre-empted thread will never block another. For allocations of less than 128 bytes, each thread is allocated memory from its own memory pool to avoid false sharing in the CPU's cache. The collector component runs on a background thread using a mark and sweep algorithm which is basically the same as the existing algorithm. Currently the thread will wake up every 100ms and decide if a collection should be performed. An emergency collection will run in the foreground if a memory allocation fails during that period.
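To make the CAS idea concrete, here is a minimal sketch of a lock-free bump allocator. It is not the code from the rewritten gc.d: BumpPool, allocate and the 16-byte rounding are assumptions made up for this illustration. Only atomicLoad and cas come from core.atomic.

```d
import core.atomic : atomicLoad, cas;

// Hypothetical fixed-size pool; the names are illustrative only.
struct BumpPool
{
    ubyte[] memory;        // backing storage for the pool
    shared size_t offset;  // next free byte, only advanced with CAS

    // Returns a block of at least `size` bytes, or null if the pool is full.
    void* allocate(size_t size)
    {
        // Round up so that block offsets stay multiples of 16.
        size_t need = (size + 15) & ~cast(size_t) 15;
        while (true)
        {
            size_t old = atomicLoad(offset);
            if (old + need > memory.length)
                return null;
            // Only one thread can move offset from old to old + need.
            // A thread that loses the race retries with the fresh value,
            // so nobody ever waits on a lock held by a pre-empted thread.
            if (cas(&offset, old, old + need))
                return memory.ptr + old;
        }
    }
}

unittest
{
    auto pool = BumpPool(new ubyte[](64));
    auto a = cast(ubyte*) pool.allocate(10);  // rounded up to 16 bytes
    auto b = cast(ubyte*) pool.allocate(16);
    assert(a !is null && b - a == 16);
    assert(pool.allocate(40) is null);        // only 32 bytes remain
}
```

Module-level variables in D are thread-local by default, so declaring one such pool at module scope would already give each thread its own instance, which is one possible way to get per-thread pools for small allocations.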

The mark phase needs to stop the world. The sweeping portion of the collection will run in the background. This is similar to the current implementation in that the world is restarted after the mark phase; however, the thread doing the collection will not allocate the requested memory to the calling thread until after the sweep has completed. This means that single-threaded applications always wait for the full garbage collection cycle.
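The split between the two phases can be illustrated with a toy mark and sweep over an array of objects. This is only a sketch under stated assumptions: the heap is a plain array, references are array indices, and Obj, mark and sweep are hypothetical names, not part of the runtime.

```d
// Toy heap: objects are slots in an array, references are slot indices.
struct Obj
{
    size_t[] refs;  // indices of the objects this one points to
    bool marked;    // set during the mark phase
    bool free;      // set once the sweep has reclaimed the slot
}

// Mark phase: flag everything reachable from the roots.
// In the design described above, this is the part that needs the world stopped.
void mark(Obj[] heap, const size_t[] roots)
{
    size_t[] stack = roots.dup;
    while (stack.length)
    {
        immutable i = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        if (heap[i].marked)
            continue;
        heap[i].marked = true;
        stack ~= heap[i].refs;
    }
}

// Sweep phase: reclaim unmarked slots and clear marks for the next cycle.
// Unmarked objects are unreachable, so this can run while other threads work.
// Returns the number of slots reclaimed.
size_t sweep(Obj[] heap)
{
    size_t reclaimed;
    foreach (ref o; heap)
    {
        if (o.free)
            continue;
        if (o.marked)
            o.marked = false;
        else
        {
            o.free = true;
            o.refs = null;
            ++reclaimed;
        }
    }
    return reclaimed;
}

unittest
{
    // 0 -> 1 -> 2 is reachable from the root; 3 and 4 form an unreachable cycle.
    auto heap = [Obj([1]), Obj([2]), Obj(), Obj([4]), Obj([3])];
    size_t[] roots = [0];
    mark(heap, roots);
    assert(sweep(heap) == 2);
    assert(heap[3].free && heap[4].free && !heap[0].free);
}
```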

So far allocation speed seems to have improved. I can't test collection speed as it's not complete. As a test I wrote a simple function that allocates a linked list of 2 million items. This function is then spawned on 20 threads. The test script is shown below. Timing for allocation (with the GC disabled, using DMD 2.065) is as follows.


Existing GC code:  15700ms (average)
My GC code:          500ms (average)

When performing the same number of allocations on a single thread, the new code is still slightly faster than the old.

What this demonstrates is that the locking mechanisms in the current GC code are a huge overhead for multi-threaded applications that perform a lot of memory allocations (i.e. use the “new” operator or dynamic arrays).

It would be nice to see the default GC and memory allocator improved. There is certainly room for improvement on the allocator end which may mask some of the performance issues associated with garbage collection.

In the future I think D needs to look at making collection precise. It would not be too hard to adjust the mark and sweep GC to be nearly precise. The language needs to support precise GC before things like moving garbage collection become feasible.
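The difference between conservative and precise scanning can be shown with a toy model. Everything here is an assumption for illustration: the heap is just an address range (heapStart to heapEnd), the scanned memory is an array of words, and the pointer layout is a hand-written bool array standing in for compiler-provided type information.

```d
// Hypothetical heap address range for this toy model.
enum size_t heapStart = 0x1000;
enum size_t heapEnd = 0x2000;

// Conservative scan: any word that looks like a heap address
// is treated as a reference, whatever its real type is.
size_t[] conservativeRefs(const size_t[] words)
{
    size_t[] found;
    foreach (w; words)
        if (w >= heapStart && w < heapEnd)
            found ~= w;
    return found;
}

// Precise scan: only words that the layout flags as pointers are considered.
size_t[] preciseRefs(const size_t[] words, const bool[] isPointer)
{
    size_t[] found;
    foreach (i, w; words)
        if (isPointer[i] && w >= heapStart && w < heapEnd)
            found ~= w;
    return found;
}

unittest
{
    // Word 0 is a real pointer. Word 1 is an integer that happens to
    // fall inside the heap range. Word 2 is an ordinary small integer.
    size_t[] words = [0x1010, 0x1800, 42];
    bool[] layout = [true, false, false];
    size_t[] conservative = [0x1010, 0x1800];
    size_t[] precise = [0x1010];
    assert(conservativeRefs(words) == conservative);
    assert(preciseRefs(words, layout) == precise);
}
```

The false reference found by the conservative scan is why moving is a problem: a moving collector would have to rewrite that word with the object's new address, and it cannot safely do that to something that might be a plain integer.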

Anyway, I just thought I'd share the results of my experimenting. I would be happy to make the code available in a few weeks' time. Perhaps someone might find it useful. I need to get it finished and tested first. :-)


Cheers!
Adam

------
//Test script that generated these results:
import std.stdio;
import std.datetime;
import std.concurrency;
import core.atomic;
import core.memory;

class LinkedList{
        long value =0;
        LinkedList next;
}

//Number of worker threads still running. Only accessed through core.atomic.
shared int threadCount = 0;

void main(){
        core.memory.GC.disable();
        //Clock.currSystemTick matches DMD 2.065; later Phobos releases replaced it with MonoTime.
        auto start = Clock.currSystemTick();

        foreach(i; 0 .. 20){
                //Count the thread before it starts so the counter never lags behind.
                atomicOp!"+="(threadCount, 1);
                spawn(&doSomething, thisTid);
        }

        //Busy-wait until every worker has finished.
        while(atomicLoad(threadCount) >0){}

        auto ln = Clock.currSystemTick() - start;
        writeln(ln.msecs, "ms");
}

void doSomething(Tid tid){
        auto top = new LinkedList;
        auto recent = top;

        //Create the linked list
        foreach(i; 1 .. 2_000_000){
                auto newList = new LinkedList;
                newList.value = i;
                recent.next = newList;
                recent = newList;
        }

        //Sum the values.  (Just spends some time walking the memory).
        recent = top;
        long total=0;
        while(recent !is null){
                total += recent.value;
                recent = recent.next;
        }
        writeln("Total : ", total );
        //Atomic decrement: a plain threadCount-- can lose updates across threads.
        atomicOp!"-="(threadCount, 1);
}
