Re: Windows multi-threading performance issues on multi-core systems only

Jacob Carlborg Wed, 16 Dec 2009 04:12:58 -0800

On 12/16/09 03:44, Michel Fortin wrote:

On 2009-12-15 19:49:43 -0500, dsimcha <dsim...@yahoo.com> said:

== Quote from Simen kjaeraas (simen.kja...@gmail.com)'s article

Tested this on a Core 2 Duo, same options. OS is Windows 7, 64bit. It
scales roughly inverse linearly with number of threads:
163ms for 1,
364ms for 2,
886ms for 4
This is quite different from your numbers, though.


Yea, forgot to mention my numbers were on Win XP. Maybe Windows 7 critical
sections are better implemented or something. Can a few other people with a
variety of OSes run this benchmark and post their numbers?


Core 2 Duo / Mac OS X 10.6 / 4 threads:

Crystal:~ mifo$ ./test
Set affinity, then press enter.

Bus error

Runs for about 18 seconds, then crashes. At first glance, it looks as if
the Thread class is broken and for some reason I get a null dereference
when a thread finishes. Great!

Anyway, I've done some sampling on the program while it runs. Soon after
the program starts, each of the worker threads spends about 85% of its time
inside _d_monitorenter and 11% in _d_monitorleave; shortly before the
program finishes, this becomes 88% and 7% respectively.

The funny thing is that if I just bypass the GC like this:

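import core.stdc.stdlib : realloc;

void doAppending() {
    uint* arr = null;
    foreach (i; 0 .. 1_000_000) {
        // Grow the C-heap buffer by one element; the GC is never involved.
        arr = cast(uint*) realloc(arr, uint.sizeof * (i + 1));
        arr[i] = i;
    }
    // leak arr
}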

it finishes (I mean crashes) in less than half a second. So it looks
like realloc does a much better job of locking its data structures than
the GC.
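
For comparison, here is a rough sketch of what the GC-backed side of this test might look like. The original benchmark source is not quoted in this message, so everything here is an assumption: the name doGcAppending, the thread count, the element count, and the use of the current Phobos std.datetime.stopwatch module (which is newer than the dmd 2.0xx releases discussed in this thread). The only point is that every `~=` may have to go through the GC, which is where the time in _d_monitorenter would come from.

```d
import core.thread : Thread;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical GC-backed counterpart of the realloc loop:
// appends n values to a GC-allocated array, one element at a time.
uint[] doGcAppending(size_t n) {
    uint[] arr;
    foreach (i; 0 .. n)
        arr ~= cast(uint) i; // may call into the GC to grow the array
    return arr;
}

void main() {
    enum nThreads = 4;          // assumed; matches the 4-thread run above
    enum perThread = 1_000_000; // assumed; matches the realloc loop above

    auto sw = StopWatch(AutoStart.yes);

    // Start all workers first, then wait for each of them to finish.
    Thread[] threads;
    foreach (t; 0 .. nThreads) {
        auto th = new Thread({ doGcAppending(perThread); });
        th.start();
        threads ~= th;
    }
    foreach (th; threads)
        th.join();

    writefln("%s threads: %s ms", nThreads, sw.peek.total!"msecs");
}
```

Running this with nThreads set to 1, 2 and 4 on the same machine should give numbers that can be set against the realloc version, although absolute timings will depend heavily on the compiler and runtime version.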


It runs fine on Mac OS X 10.5 with dmd 2.037.
