On Thursday, 18 February 2016 at 13:00:12 UTC, Witek wrote:
> So, the question is, why is the D / DMD allocator so slow under heavy multithreading? The working set is pretty small (a few megabytes at most), so I do not think this is an issue with GC scanning itself. Can I plug in tcmalloc / jemalloc as the underlying allocator instead of glibc's? Or does the D runtime use mmap/sbrk/etc. directly?
>
> Thanks.

As others have noted, this is due to heavy contention on the GC's global lock.
There is a pending PR (https://github.com/D-Programming-Language/druntime/pull/1447) to replace the recursive mutex with a spinlock, which should improve the numbers a bit but doesn't solve the problem. Since version 2.070, we also suspend threads in parallel (https://github.com/D-Programming-Language/druntime/pull/1110), which greatly reduces pause times with many threads.
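For readers wondering why a spinlock helps only a little, here is a minimal test-and-set spinlock in C11. It is a sketch of the general technique, not the code from the PR; `spin_lock`, `gc_lock`, `fake_heap_allocations` and `run_contended_counter` are hypothetical names. A spinlock skips the owner and recursion-count bookkeeping a recursive mutex needs, so each acquire is cheaper. But every "allocation" still goes through one shared lock, so threads keep serializing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical test-and-set spinlock; illustrates the technique, not druntime's code. */
static void spin_lock(atomic_flag *l) {
    /* Keep trying until this thread is the one that sets the flag. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire)) {
        /* busy-wait; real implementations usually add a pause/yield here */
    }
}

static void spin_unlock(atomic_flag *l) {
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-in for the GC's global state: one lock guards everything. */
static atomic_flag gc_lock = ATOMIC_FLAG_INIT;
static long fake_heap_allocations = 0;

enum { MAX_THREADS = 16 };

static void *worker(void *arg) {
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        spin_lock(&gc_lock);            /* every "allocation" serializes here */
        fake_heap_allocations++;
        spin_unlock(&gc_lock);
    }
    return NULL;
}

/* Runs nthreads workers that each do iters locked increments; returns the total. */
long run_contended_counter(int nthreads, long iters) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    fake_heap_allocations = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &iters);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return fake_heap_allocations;
}
```

The count comes out right because the lock is correct, but the threads never make progress in parallel inside the critical section. That is the part only a per-thread cache addresses.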

The real fix (thread-local allocator caches) has a very high priority in our backlog (https://trello.com/c/K7HrSnwo/28-thread-cache-for-gc), but isn't fully implemented yet. You can see the latest state at https://github.com/MartinNowak/druntime/commits/gcCache. I still need to add a queue on each thread cache to synchronize metadata.
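The general idea behind a thread cache can be sketched in C11. Each thread keeps a small private free list of fixed-size blocks and takes the shared lock only to refill or drain a whole batch. Most allocations and frees therefore never touch the lock. This is a simplified illustration under stated assumptions: fixed 64-byte blocks, backing memory from `malloc`, no garbage collection, and no metadata queue. All names are hypothetical, and it is not the code from the gcCache branch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size block allocator with per-thread caches (not druntime code). */
enum { BLOCK_SIZE = 64, CACHE_MAX = 32, BATCH = 16 };

typedef struct block { struct block *next; } block_t;

/* Global pool: shared by all threads, guarded by a single lock. */
static block_t *global_free = NULL;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static long global_lock_acquisitions = 0;  /* counts trips to the shared lock */

/* Per-thread cache: only its owning thread touches it, so no locking.
 * Blocks still cached when a thread exits are simply leaked in this sketch. */
static _Thread_local block_t *local_free = NULL;
static _Thread_local int local_count = 0;

/* Slow path: move up to BATCH blocks into the local cache in ONE lock trip. */
static void refill_cache(void) {
    pthread_mutex_lock(&global_lock);
    global_lock_acquisitions++;
    for (int i = 0; i < BATCH; i++) {
        block_t *b = global_free;
        if (b) global_free = b->next;
        else if (!(b = malloc(BLOCK_SIZE))) break;  /* pool empty: get fresh memory */
        b->next = local_free;
        local_free = b;
        local_count++;
    }
    pthread_mutex_unlock(&global_lock);
}

void *cached_alloc(void) {
    if (!local_free) refill_cache();
    block_t *b = local_free;
    if (!b) return NULL;                /* out of memory */
    local_free = b->next;               /* fast path: no lock */
    local_count--;
    return b;
}

void cached_free(void *p) {
    if (!p) return;
    block_t *b = p;
    b->next = local_free;               /* fast path: no lock */
    local_free = b;
    if (++local_count > CACHE_MAX) {    /* cache too large: hand a batch back */
        pthread_mutex_lock(&global_lock);
        global_lock_acquisitions++;
        for (int i = 0; i < BATCH; i++) {
            block_t *out = local_free;
            assert(out != NULL);        /* holds because CACHE_MAX >= BATCH */
            local_free = out->next;
            out->next = global_free;
            global_free = out;
            local_count--;
        }
        pthread_mutex_unlock(&global_lock);
    }
}
```

In a single thread, 100 allocations followed by 100 frees take the shared lock 12 times rather than 200. The hard part this sketch leaves out is keeping the GC's shared metadata consistent while blocks sit in private caches.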

So for the time being, use at least 2.070.0 and try to replace GC allocations with malloc.
