On Sunday, 29 July 2012 at 14:43:09 UTC, Dmitry Olshansky wrote:
On 29-Jul-12 18:17, Andrei Alexandrescu wrote:
On 7/29/12 8:17 AM, Gor Gyolchanyan wrote:
std.variant is so incredibly slow! It's practically unusable for anything that requires even a tiny bit of performance.

You do realize you actually benchmark against a function that does nothing, right? Clearly there are ways in which we can improve std.variant to the point initialization costs assignment of two words, but this benchmark doesn't help. (Incidentally I just prepared a class at C++ and Beyond on benchmarking, and this benchmark makes a lot of the mistakes described therein...)


Andrei


This should be more relevant then:

//fib.d
import std.datetime, std.stdio, std.variant;

auto fib(Int)()
{
        Int a = 1, b = 1;
        for(size_t i=0; i<100; i++){
                Int c = a + b;
                a = b;
                b = c;
        }
        return a;       
}

void main()
{
        writeln(benchmark!(fib!int, fib!long, fib!Variant)(10_000));
}


dmd -O -inline -release fib.d

Output:

[TickDuration(197), TickDuration(276), TickDuration(93370107)]

I'm horrified. Who was working on std.variant enhancements? Please chime in.

I thought these results were a bit strange, so I converted them to seconds. This gave me:

[3.73e-06, 3.721e-06, 2.97281]

One million inner loop iterations in under 4 microseconds? My processor's frequency isn't measured in THz, so something strange must be going on here. In order to find out what it was, I changed the code to this:

    writeln(benchmark!(fib!int, fib!long)(1000_000_000)[]
        .map!"a.nsecs() * 1.0e-9");

and used a profiler on it. The relevant part of the output is:

    0.00 :        445969:       test   %r12d,%r12d
    0.00 :        44596c:       je     445975 <_D3std8date
   46.67 :        44596e:       inc    %ebx
    0.00 :        445970:       cmp    %r12d,%ebx
    0.00 :        445973:       jb     44596e <_D3std8date
    0.00 :        445975:       lea    -0x18(%rbp),%rdi
    0.00 :        445979:       callq  45a048 <_D3std8date
    0.00 :        44597e:       mov    %rax,0x0(%r13)
    0.00 :        445982:       lea    -0x18(%rbp),%rdi
    0.00 :        445986:       callq  459fb4 <_D3std8date
    0.00 :        44598b:       xor    %ebx,%ebx
    0.00 :        44598d:       test   %r12d,%r12d
    0.00 :        445990:       je     445999 <_D3std8date
   53.33 :        445992:       inc    %ebx
    0.00 :        445994:       cmp    %r12d,%ebx
    0.00 :        445997:       jb     445992 <_D3std8date


As you can see, most of the time is spent in two loops with empty bodies: the optimizer has removed the body of fib entirely for int and long, so your code is benchmarking Variant against nothing, too. Adding asm{ nop; } to fib changes the output to this:

[0.00437154, 0.00444938, 3.03917]

Which is still a huge difference.
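
Another way to keep the optimizer from throwing the loop away, instead of the nop, is to give the result somewhere to go. The following is only a minimal sketch and I have not measured it: the names fibSink and sink are made up for illustration, and it assumes a newer Phobos where benchmark lives in std.datetime.stopwatch and returns Duration values rather than TickDuration. A sufficiently clever compiler could still fold the int and long loops into constants, so checking the generated code remains a good idea.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.variant : Variant;

// Hypothetical sink (not part of the original benchmark): storing each
// result in a global gives the loop an observable effect.
__gshared long sink;

void fibSink(Int)()
{
    Int a = 1, b = 1;
    foreach (i; 0 .. 100)
    {
        Int c = a + b;   // wraps around for int and long; only the work matters here
        a = b;
        b = c;
    }
    // Consume the result so the computation cannot be treated as dead code.
    static if (is(Int == Variant))
        sink += a.coerce!long;
    else
        sink += a;
}

void main()
{
    auto times = benchmark!(fibSink!int, fibSink!long, fibSink!Variant)(10_000);
    foreach (t; times)
        writeln(t.total!"nsecs" * 1.0e-9, " s");
    writeln("sink = ", sink);   // printing the sink keeps the stores live
}
```

The extra addition into sink is paid by all three variants, so it should not change the comparison much, but it does mean the Variant case also pays for one coerce call per run.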
