A few days ago, I suggested that the GC should perhaps be using spinlocks, given how little time a typical allocation takes compared to a context switch. I've built a version of the D 2.21 druntime GC that uses spinlocks instead of synchronized, and wrote the following simple benchmark to generate a ton of contention for the GC:

import core.thread, core.memory, std.perf, std.stdio, std.c.time, std.c.stdio;

void main() {
    readln;  //Allow for affinity to be changed.
    GC.disable;

    auto T = new Thread(&foo);
    T.start;
    scope auto pc = new PerformanceCounter;
    pc.start;
    foo();
    T.join;
    pc.stop;

    writeln(pc.milliseconds);
}

void foo() {
    foreach(i; 0..10_000_000) {
        auto foo = GC.malloc(8);
        GC.free(foo);
    }
}

Here are the times:

Using both of my CPU cores, meaning serious contention, in milliseconds:

Spinlock:      10006
Synchronized:  28563

The synchronized version uses ~25-30% CPU, because of OS rescheduling, while the spinlock version uses 100%.

Setting the affinity to only one CPU to simulate a single-CPU environment:

Spinlock:      4356
Synchronized:  4758

Replacing one thread's foo() by a dummy function so that the lock is never even contested:

Spinlock:      1876
Synchronized:  2589

I will acknowledge that this is an extremely simple benchmark, but I think it's reasonably representative of a severely contested memory allocation lock. The spinlock I used was the simplest possible atomic CAS lock, nothing fancy.
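
For anyone unfamiliar with the idea, here is a minimal sketch of what a "simplest possible atomic CAS lock" can look like. This is not the code from the patched druntime. It assumes the cas and atomicStore functions from a current core.atomic, which may not match what D 2.21 shipped, and the names SpinLock, gcLock, counter and worker are made up for illustration.

```d
import core.atomic : atomicStore, cas;
import core.thread : Thread;

// Hypothetical spinlock type, for illustration only.
struct SpinLock
{
    private shared bool locked;

    void lock() shared
    {
        // Busy-wait until this thread is the one that flips the
        // flag from false to true. No yielding, no backoff.
        while (!cas(&locked, false, true)) {}
    }

    void unlock() shared
    {
        // Release the lock by clearing the flag atomically.
        atomicStore(locked, false);
    }
}

shared SpinLock gcLock;    // stand-in for the allocator lock (assumed name)
__gshared size_t counter;  // plain global, protected only by gcLock

void worker()
{
    foreach (i; 0 .. 100_000)
    {
        gcLock.lock();
        ++counter;         // critical section, kept very short on purpose
        gcLock.unlock();
    }
}

void main()
{
    auto t = new Thread(&worker);
    t.start();
    worker();
    t.join();

    // Both threads incremented under the lock, so no updates are lost.
    assert(counter == 200_000);
}
```

A lock like this never gives up the CPU while waiting, which would be consistent with the 100% CPU usage in the two-core run above. It only makes sense when the critical section is very short, as it is for a small allocation.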