On Thu, 18 Feb 2016 13:00:12 +0000, Witek wrote:
> So, the question is, why is D / DMD allocator so slow under heavy
> multithreading?

It's a global GC, possibly with a little per-thread pool.

As part of the abortive Amber language project, I was looking into ways to craft per-thread GC. You need to tell the runtime whether a variable is marked as shared or __gshared and that's pretty much sufficient -- you can only refer to unshared variables from one thread, which means you can do a local collection stopping only one thread's execution. You can have one heap for each thread and one cross-thread heap.

This work hasn't happened in D yet.

I would like to look into D's GC and parallelism more. I've started on mark/sweep parallelism but haven't made any worthwhile progress. I'll take this as my next task. It's more awkward because it requires changes to the runtime interface, which means modifying both DMD and the runtime.
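
To make the shared / unshared distinction above concrete, here is a minimal sketch of the three storage classes involved. It does not implement per-thread GC; it only shows that a plain module-level variable gets one copy per thread, while shared and __gshared data is visible across threads. The names `threadLocalCounter`, `sharedCounter`, `globalCounter` and `runWorkerOnce` are made up for illustration.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;
import std.stdio : writeln;

int threadLocalCounter;        // thread-local by default: one copy per thread
shared int sharedCounter;      // one copy for all threads, tracked in the type
__gshared int globalCounter;   // one copy for all threads, NOT tracked in the type

// Runs in the spawned thread.
void worker()
{
    threadLocalCounter += 1;          // changes only the worker's own copy
    atomicOp!"+="(sharedCounter, 1);  // cross-thread data needs atomics or locks
}

// Hypothetical helper: start one worker thread and wait for it to finish.
void runWorkerOnce()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();
}

void main()
{
    runWorkerOnce();
    // The main thread's copy was never touched by the worker.
    assert(threadLocalCounter == 0);
    // The shared variable reflects the worker's increment.
    assert(atomicLoad(sharedCounter) == 1);
    writeln("thread-local: ", threadLocalCounter,
            ", shared: ", atomicLoad(sharedCounter));
}
```

Only the unshared variable is guaranteed to be reachable from a single thread, which is the property a per-thread collector would rely on. Since __gshared does not show up in the type, the runtime would have to be told about it separately.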