On Feb 9, 2012, at 10:14 AM, Marco Leise wrote:

> On 09.02.2012, 17:22, dsimcha <dsim...@yahoo.com> wrote:
>
>> I wonder how much it helps to just optimize the GC a little. How much does
>> the performance gap close when you use DMD 2.058 beta instead of 2.057?
>> This upcoming release has several new garbage collector optimizations. If
>> the GC is the bottleneck, then it's not surprising that anything that relies
>> heavily on it is slow, because D's GC is still fairly naive.
>
> I did some OProfile-ing. The full report is attached, but for simplicity it
> is without a call graph this time. Here is an excerpt:
>
> CPU: Core 2, speed 2001 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (Unhalted core cycles) count 100000
>
> samples   %        linenr info   symbol name
> 13838     18.8416  gcx.d:426     void* gc.gcx.GC.malloc(ulong, uint, ulong*)
>  4465      6.0795  gcx.d:2454    ulong gc.gcx.Gcx.fullcollect(void*)
One random thing that just occurred to me… if the standard receive pattern is:

receive((int x) { … });

There's a good chance that a stack frame is being dynamically allocated for the delegate when it's passed to receive (since I don't believe there's any way to declare the parameters to receive as "scope"). I'll have to check this, and maybe consider changing receive to use alias template parameters instead of normal function parameters.
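To make the difference concrete, here's a rough sketch of the three signatures I have in mind: a plain delegate parameter, a "scope" delegate parameter, and an alias template parameter. The handle/handleScope/handleAlias names are just placeholders for illustration, not std.concurrency's actual API:

import std.stdio;

// Plain delegate parameter: a literal that captures locals may force the
// caller's frame onto the GC heap, because the compiler can't prove the
// delegate doesn't escape the call.
void handle(void delegate(int) op)
{
    op(42);  // pretend a matching message arrived
}

// 'scope' promises the delegate won't outlive the call, which lets the
// compiler keep the closure on the caller's stack.
void handleScope(scope void delegate(int) op)
{
    op(42);
}

// Alias template parameter: the handler is bound at compile time, so no
// delegate needs to be passed as a runtime argument at all.
void handleAlias(alias op)()
{
    op(42);
}

void main()
{
    int seen;

    handle((int x) { seen = x; });          // likely heap-allocates a closure
    handleScope((int x) { seen = x; });     // allocation can be avoided
    handleAlias!((int x) { seen = x; })();  // no runtime delegate argument

    writeln(seen);
}

The alias version is closest to what I'm suggesting, with the usual trade-off that every distinct handler produces a separate template instantiation.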