On 02/09/2012 08:27 PM, Sean Kelly wrote:
On Feb 9, 2012, at 10:14 AM, Marco Leise wrote:
On 09.02.2012 at 17:22, dsimcha <dsim...@yahoo.com> wrote:
I wonder how much it helps to just optimize the GC a little. How much does the
performance gap close when you use DMD 2.058 beta instead of 2.057? This
upcoming release has several new garbage collector optimizations. If the GC is
the bottleneck, then it's not surprising that anything that relies heavily on
it is slow because D's GC is still fairly naive.
I did some OProfile-ing. The full report is attached, but for simplicity it is
without call graph this time. Here is an excerpt:
CPU: Core 2, speed 2001 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        linenr info  symbol name
13838    18.8416  gcx.d:426    void* gc.gcx.GC.malloc(ulong, uint, ulong*)
4465      6.0795  gcx.d:2454   ulong gc.gcx.Gcx.fullcollect(void*)
One random thing that just occurred to me… if the standard receive pattern is:
receive((int x) { … });
There's a good chance that a stack frame is being dynamically allocated for the delegate
when it's passed to receive (since I don't believe there's any way to declare the
parameters to receive as "scope"). I'll have to check this, and maybe consider
changing receive to use alias template parameters instead of normal function parameters.
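For illustration only, here is a rough sketch of what passing a handler as an alias template parameter could look like. dispatchAlias is a hypothetical stand-in and not the actual std.concurrency.receive. Whether this avoids the allocation in practice would still have to be measured.

```d
import std.stdio;

// Hypothetical helper, not part of std.concurrency: the handler is a
// compile-time alias parameter instead of a run-time delegate argument.
void dispatchAlias(alias handler, T)(T value)
{
    handler(value);
}

void main()
{
    int total = 0;
    // The literal reads and writes a local of main through the alias.
    dispatchAlias!((int x) { total += x; })(5);
    writeln(total); // prints 5
}
```

The trade-off is that every distinct handler produces its own template instantiation, so the call site changes from receive(handler) to something like receive!(handler)().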
You can mark an entire tuple as scope without trouble:
void foo(T,S...)(T arg1, scope S args) {...}
Does this improve the run time?
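A minimal sketch of the scope-tuple signature applied to a receive-like function, under the assumption that the handlers are only called and never stored. dispatch is a hypothetical name used for illustration and is not the real receive implementation.

```d
import std.stdio;

// Hypothetical receive-like function: every handler in the variadic
// tuple is marked scope, promising that it does not escape this call.
void dispatch(T, Ops...)(T value, scope Ops ops)
{
    // foreach over a parameter tuple is unrolled at compile time.
    foreach (op; ops)
    {
        // Only invoke handlers that accept the value's type.
        static if (is(typeof(op(value))))
            op(value);
    }
}

void main()
{
    int total = 0;
    // The delegate literal captures the local variable total.
    dispatch(21, (int x) { total += x * 2; });
    writeln(total); // prints 42
}
```

Because the parameters are scope, the compiler is free to keep the caller's frame on the stack rather than allocating a closure on the GC heap. Profiling before and after would show whether that accounts for a meaningful share of the GC.malloc samples.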