On 02/09/2012 08:27 PM, Sean Kelly wrote:
On Feb 9, 2012, at 10:14 AM, Marco Leise wrote:

On 09.02.2012 at 17:22, dsimcha<dsim...@yahoo.com> wrote:

I wonder how much it helps to just optimize the GC a little.  How much does the 
performance gap close when you use DMD 2.058 beta instead of 2.057?  This 
upcoming release has several new garbage collector optimizations.  If the GC is 
the bottleneck, then it's not surprising that anything that relies heavily on 
it is slow because D's GC is still fairly naive.

I did some OProfile-ing. The full report is attached, but for simplicity it is 
without call graph this time. Here is an excerpt:

CPU: Core 2, speed 2001 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        linenr info                 symbol name
13838    18.8416  gcx.d:426                   void* gc.gcx.GC.malloc(ulong, uint, ulong*)
4465      6.0795  gcx.d:2454                  ulong gc.gcx.Gcx.fullcollect(void*)

One random thing that just occurred to me… if the standard receive pattern is:

receive((int x) { … });

There's a good chance that a stack frame is being dynamically allocated for the delegate 
when it's passed to receive (since I don't believe there's any way to declare the 
parameters to receive as "scope").  I'll have to check this, and maybe consider 
changing receive to use alias template parameters instead of normal function parameters?
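
For illustration, here is a minimal sketch of what the alias-parameter idea could look like. It is not std.concurrency code: dispatch and demoAlias are hypothetical names, and dispatch only stands in for receive by calling the handler once. Whether this actually avoids the allocation in a given compiler version would still need to be measured.

```d
import std.stdio;

// Hypothetical stand-in for receive (an assumption, not the real API).
// The handler is an alias template parameter, so it is bound at compile
// time instead of being passed as a delegate value at run time.
void dispatch(alias handler)(int message)
{
    handler(message);
}

// Runs two handlers that both capture the local variable `total`.
int demoAlias()
{
    int total = 0;
    dispatch!((int x) { total += x; })(5);      // total becomes 5
    dispatch!((int x) { total += x * 2; })(10); // total becomes 25
    return total;
}

void main()
{
    writeln(demoAlias());
}
```

The trade-off is that each distinct handler produces a separate template instantiation, so code size grows with the number of call sites.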

You can mark an entire tuple as scope without trouble:

void foo(T,S...)(T arg1, scope S args) {...}

Does this improve the run time?
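
A small sketch of the scope-tuple signature above in use, under the assumption that the handlers are delegate literals capturing a local. dispatchAll and demoScope are hypothetical names, and dispatchAll is only a stand-in for receive. Because the handlers are declared scope, the compiler is allowed to keep the captured frame on the stack; this sketch does not prove that it does.

```d
import std.stdio;

// Hypothetical stand-in for receive (an assumption, not the real API).
// The whole handler tuple is marked scope, as in the signature above.
void dispatchAll(T, S...)(T message, scope S handlers)
{
    // foreach over a parameter tuple is unrolled at compile time.
    foreach (handler; handlers)
        handler(message);
}

// Passes two delegate literals that both capture the local `total`.
int demoScope()
{
    int total = 0;
    dispatchAll(21, (int x) { total += x; }, (int x) { total += x; });
    return total;
}

void main()
{
    writeln(demoScope());
}
```

Profiling this against a version without scope, for example with the OProfile setup used earlier in the thread, would show whether the time spent in gc.gcx.GC.malloc changes.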
