Among the new things in Python 2.7 (currently in alpha) there is an improved GC:
http://bugs.python.org/issue4074
http://bugs.python.org/issue4688

There they show a small Python program named "tuple_gc_hell.py"; this is a reduced version:

    import time, gc

    def list_of_tuples(n):
        m = n // 10
        l = []
        prev_time = time.time()
        for i in xrange(n):
            if i % m == 0:
                cur_time = time.time()
                print i, round(cur_time - prev_time, 2)
                prev_time = cur_time
            l.append((i % 2, i % 12))

    #gc.disable()
    #import psyco; psyco.full()
    list_of_tuples(10 * 1000 * 1000)

Its output:

    0 0.0
    1000000 1.2
    2000000 2.58
    3000000 3.98
    4000000 4.89
    5000000 6.64
    6000000 8.05
    7000000 9.44
    8000000 9.84
    9000000 12.2

Total running time: 73.3 s

Disabling the GC and using the Psyco JIT compiler:

    0 0.0
    1000000 0.17
    2000000 0.19
    3000000 0.17
    4000000 0.19
    5000000 0.16
    6000000 0.17
    7000000 0.17
    8000000 0.17
    9000000 0.19

Total running time: 2.67 s

Despite D being a very different language, with a very different GC (CPython mainly uses reference counting, plus a cyclic collector on top of it), I have tried a similar D program (compiled with DMD v2.029):

    import std.c.time: clock; // where's CLOCKS_PER_SEC?
    import std.stdio: writeln;
    //import std.gc: disable;
    //import core.memory: disable; // where's the disable?

    void array_of_pairs(int n) {
        class S {
            int x, y;
            this(int x, int y) {
                this.x = x;
                this.y = y;
            }
        }

        int m = n / 10;
        S[] l;
        auto prev_time = clock();
        for (int i; i < n; i++) {
            if (i % m == 0) {
                auto cur_time = clock();
                writeln(i, " ", (cur_time - prev_time) / 1000.0);
                prev_time = cur_time;
            }
            l ~= new S(i % 2, i % 12);
        }
    }

    void main() {
        //disable();
        array_of_pairs(10_000_000);
    }

Its output is not good:

    0 0
    1000000 1.594
    2000000 3.563
    3000000 6.156
    4000000 8.5
    5000000 9.469
    6000000 10.047
    7000000 13.453
    8000000 18.703
    9000000 24.203

Total running time: 122.3 s

In DMD 2.029 array appends are very slow, so in a later version I have created the array up-front. I have also seen that moving the class out of the array_of_pairs function speeds up the program significantly.
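For reference, the optimization in the two Python issues linked above works by "untracking" tuples whose elements can never be part of a reference cycle, so later GC passes skip them entirely. This can be observed directly with gc.is_tracked(), which was added along with those patches; a minimal sketch, using Python 3 syntax rather than the Python 2 of the benchmark:

```python
import gc

# Build the tuple at runtime so we get a fresh object
# (a literal like (1, 2) may be a shared compile-time constant).
t = tuple([1, 2])
print(gc.is_tracked(t))  # True: new tuples start out tracked by the cyclic GC

gc.collect()             # during a collection the GC inspects t, sees only
                         # atomic values inside, and untracks it
print(gc.is_tracked(t))  # False: later collections no longer traverse it
```

This is why the loop of tuple_gc_hell.py stops slowing down in Python 2.7: the millions of (int, int) tuples drop out of the collector's working set after the first collection that examines them.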
To improve the situation even more I have used a struct instead of a class:

    import std.c.time: clock;
    import std.stdio: writeln;

    struct S {
        int x, y;
        this(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    void array_of_pairs(int n) {
        int m = n / 10;
        auto l = new S*[n];
        auto prev_time = clock();
        for (int i; i < n; i++) {
            if (i % m == 0) {
                auto cur_time = clock();
                writeln(i, " ", (cur_time - prev_time) / 1000.0);
                prev_time = cur_time;
            }
            l[i] = new S(i % 2, i % 12);
        }
    }

    void main() {
        //disable();
        array_of_pairs(10_000_000);
    }

Its output:

    0 0
    1000000 0.266
    2000000 0.625
    3000000 0.547
    4000000 0.641
    5000000 0.765
    6000000 0.891
    7000000 1.031
    8000000 1.172
    9000000 1.328

Total running time: 9.46 s

Now it's much faster than Python (with the GC enabled and no JIT), but the GC clearly still has trouble. Just to be sure it's not a D2 problem I have done the same test with DMD v1.042 and Phobos 1:

    import std.c.time: clock, CLOCKS_PER_SEC;
    import std.stdio: writefln;
    import std.gc: disable;

    struct S {
        int x, y;
    }

    void list_of_tuples(int n) {
        int m = n / 10;
        auto l = new S*[n];
        auto prev_time = clock();
        for (int i; i < n; i++) {
            if (i % m == 0) {
                auto cur_time = clock();
                writefln(i, " ", (cur_time - prev_time) / cast(float)CLOCKS_PER_SEC);
                prev_time = cur_time;
            }
            l[i] = new S;
            l[i].x = i % 2;
            l[i].y = i % 12;
        }
    }

    void main() {
        disable();
        list_of_tuples(10_000_000);
    }

Its output:

    0 0
    1000000 0.297
    2000000 0.344
    3000000 0.359
    4000000 0.391
    5000000 0.421
    6000000 0.438
    7000000 0.484
    8000000 0.5
    9000000 0.532

Total running time: 4.96 s

About twice as fast. Now that Phobos 2 has Tuple people will use it, and many such tuples will not contain references, so I think an optimization similar to the one done for Python 2.7 may become useful.

Bye,
bearophile
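P.S. A quick way to see what "tuples that don't contain references" means on the CPython side: after a collection, only tuples that hold other containers remain tracked by the cyclic GC. A small sketch (Python 3 syntax; the variable names are just illustrative):

```python
import gc

plain = tuple([1, 2.5, "abc"])  # only atomic values: int, float, str
mixed = ([1], 2)                # holds a list, which is itself a container

gc.collect()  # give the collector a chance to examine both tuples

print(gc.is_tracked(plain))  # False: it can never close a reference cycle
print(gc.is_tracked(mixed))  # True: the inner list could, so it stays tracked
```

The first case corresponds to something like a Tuple of plain value types in Phobos 2, the second to a Tuple holding class references.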