On 02/10/12 14:54, Oliver Plow wrote:
>>> I wonder how much it helps to just optimize the GC a little. How much
>>> does the performance gap close when you use DMD 2.058 beta instead of
>>> 2.057? This upcoming release has several new garbage collector
>>> optimizations. If the GC is the bottleneck, then it's not surprising
>
> Is there a way to "turn off" the GC, e.g. a compiler switch to set the
> heap size to a large number so that the GC is likely not to kick in? I
> searched through this page:
> http://www.d-programming-language.org/dmd-windows.html#switches
> but couldn't find anything helpful. Then you could measure the thing
> with the GC "turned off" to see whether the GC is the problem or not.
Calling GC.disable() at runtime will delay collections until they're
actually needed, but won't disable the GC completely. Having a std noop
GC stub selected by a compiler switch would be nice, but you can get the
same effect today by giving the linker an object file that provides the
necessary stubs. (A minimal example of measuring with collections
deferred follows the patch below.)

For this test case, something like the patch below improves things
significantly; for more gains, std.concurrency would need more invasive
changes. Note it's just a proof of concept, meant to measure the current
std.concurrency efficiency vs other approaches. The "freelist" arrays
are never freed and leak; a complete implementation would free them when
the link is shut down (a sketch of that follows the patch too). Ignore
the synchronize() calls - "synchronized" isn't properly lowered by the
compiler, so I had to resort to this after switching the locking
primitives; it should work equally well with the standard
"synchronized".

The original test case from this thread achieves ~4M msg/sec with this
change (the numbers aren't stable, but mostly stay in the 3.5..4.0M
range; 4.5M+ happens sometimes). Memory usage also decreases noticeably.

artur

--- std/concurrency.d
+++ std/concurrency.d
@@ -1387,7 +1396,7 @@ private
             m_last = n;
         Node* todelete = n.next;
         n.next = n.next.next;
-        //delete todelete;
+        delete todelete;
         m_count--;
     }
@@ -1430,6 +1439,56 @@ private
         {
             val = v;
         }
+        import core.memory;
+        import core.exception;
+        new(size_t size) {
+            void* p;
+            if (afreelist.length)
+                p = afreelist[--afreelist.length];
+            else if (gfreelist.length) {
+                {
+                    scope lock = synchronize(fl);
+                    if (gfreelist.length) {
+                        afreelist = cast(Node*[])gfreelist;
+                        gfreelist.length=0;
+                    }
+                }
+                if (afreelist.length)
+                    p = afreelist[--afreelist.length];
+            }
+
+            if (p)
+                return p;
+
+            p = std.c.stdlib.malloc(size);
+            if (!p)
+                throw new OutOfMemoryError();
+            GC.addRange(p, size);
+            return p;
+        }
+        delete(void* p) {
+            if (!p)
+                return;
+            pfreelist ~= cast(Node*)p;
+            if (pfreelist.length>=8)
+            {
+                {
+                    scope lock = synchronize(fl);
+                    gfreelist ~= cast(shared Node*[])pfreelist;
+                }
+                pfreelist.length=0;
+                pfreelist.assumeSafeAppend();
+            }
+            // At some point all free nodes need to be freed, using:
+            //GC.removeRange(p);
+            //std.c.stdlib.free(p);
+        }
+        static Node*[] afreelist;
+        static ubyte[56] d1;   // padding - keeps the freelists in separate cache lines
+        static Node*[] pfreelist;
+        static ubyte[56] d2;   // ditto
+        shared static Node*[] gfreelist;
+        shared static Mutex fl;
 }
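
About the leak mentioned above: a complete version would hand the
pooled nodes back when the link is shut down, along the lines of the
GC.removeRange/free comment in the patch. A rough sketch of what that
could look like (a hypothetical releaseFreeLists() helper, not part of
the patch; it assumes the same afreelist/pfreelist/gfreelist/fl
members):

    // Hypothetical cleanup - run when the link is shut down.
    static void releaseFreeLists()
    {
        void release(Node*[] list)
        {
            foreach (p; list)
            {
                GC.removeRange(p);     // the GC no longer needs to scan the node
                std.c.stdlib.free(p);  // return the malloc'd memory
            }
        }
        release(afreelist);
        afreelist.length = 0;
        release(pfreelist);
        pfreelist.length = 0;
        {
            // gfreelist is shared between threads, so take the lock first.
            scope lock = synchronize(fl);
            release(cast(Node*[])gfreelist);
            gfreelist.length = 0;
        }
    }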
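
And to answer the quoted question directly: there's no DMD switch for
this, but you can defer collections from code and see how the numbers
change. A minimal sketch, assuming a runBenchmark() stand-in for the
message-passing test (GC.disable()/GC.enable() are the real core.memory
calls):

    import core.memory : GC;
    import std.datetime : StopWatch;
    import std.stdio : writefln;

    void runBenchmark() { /* stand-in for the actual message-passing test */ }

    void main()
    {
        GC.disable();             // defer collections; allocations still go through the GC
        scope(exit) GC.enable();

        StopWatch sw;
        sw.start();
        runBenchmark();
        sw.stop();
        writefln("%s ms", sw.peek().msecs);
    }

If the timings barely change with collections deferred, the GC isn't
the bottleneck; if they improve a lot, it is.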