On Sunday, 29 July 2012 at 14:43:09 UTC, Dmitry Olshansky wrote:
On 29-Jul-12 18:17, Andrei Alexandrescu wrote:
On 7/29/12 8:17 AM, Gor Gyolchanyan wrote:
std.variant is so incredibly slow! It's practically unusable for
anything that requires even a tiny bit of performance.
You do realize you actually benchmark against a function that does
nothing, right? Clearly there are ways in which we can improve
std.variant to the point where initialization costs no more than the
assignment of two words, but this benchmark doesn't help.
(Incidentally, I just prepared a class at C++ and Beyond on
benchmarking, and this benchmark makes a lot of the mistakes
described therein...)
Andrei
This should be more relevant then:
//fib.d
import std.datetime, std.stdio, std.variant;

auto fib(Int)()
{
    Int a = 1, b = 1;
    for(size_t i=0; i<100; i++){
        Int c = a + b;
        a = b;
        b = c;
    }
    return a;
}

void main()
{
    writeln(benchmark!(fib!int, fib!long, fib!Variant)(10_000));
}
dmd -O -inline -release fib.d
Output:
[TickDuration(197), TickDuration(276), TickDuration(93370107)]
I'm horrified. Who was working on std.variant enhancements?
Please chime in.
I thought these results were a bit strange, so I converted them
to seconds. This gave me:
[3.73e-06, 3.721e-06, 2.97281]
One million inner loop iterations in under 4 microseconds? My
processor's frequency isn't measured in THz, so something strange
must be going on here. In order to find out what it was, I
changed the code to this:
    writeln(benchmark!(fib!int, fib!long)(1_000_000_000)[]
        .map!"a.nsecs() * 1.0e-9");
and used a profiler on it. The relevant part of the output is:
0.00 : 445969: test %r12d,%r12d
0.00 : 44596c: je 445975 <_D3std8date
46.67 : 44596e: inc %ebx
0.00 : 445970: cmp %r12d,%ebx
0.00 : 445973: jb 44596e <_D3std8date
0.00 : 445975: lea -0x18(%rbp),%rdi
0.00 : 445979: callq 45a048 <_D3std8date
0.00 : 44597e: mov %rax,0x0(%r13)
0.00 : 445982: lea -0x18(%rbp),%rdi
0.00 : 445986: callq 459fb4 <_D3std8date
0.00 : 44598b: xor %ebx,%ebx
0.00 : 44598d: test %r12d,%r12d
0.00 : 445990: je 445999 <_D3std8date
53.33 : 445992: inc %ebx
0.00 : 445994: cmp %r12d,%ebx
0.00 : 445997: jb 445992 <_D3std8date
As you can see, most of the time is spent in two loops with empty
bodies: the calls to fib!int and fib!long have apparently been
optimized away entirely, so your code is benchmarking Variant
against nothing, too.
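The sanity check that prompted the profiling can be written out
explicitly. This is only a minimal sketch: it uses the call count and
loop length from fib.d and the 3.73e-06 s figure reported above, and
impliedRate is a hypothetical helper name, not something from Phobos.

```d
import std.stdio;

// Loop iterations per second implied by a timing.
// Pure arithmetic; nothing is measured here.
double impliedRate(double calls, double iterationsPerCall, double seconds)
{
    return calls * iterationsPerCall / seconds;
}

void main()
{
    // 10_000 calls of fib!int, 100 loop iterations per call,
    // 3.73e-06 seconds reported for the whole run.
    immutable rate = impliedRate(10_000, 100, 3.73e-06);
    writefln("%.3g iterations per second", rate);
}
```

That works out to a few hundred billion loop iterations per second,
and each iteration needs several instructions, so the loop cannot
really have been executed.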
Adding asm{ nop; } to fib changes the output to this:
[0.00437154, 0.00444938, 3.03917]
Which is still a huge difference.
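For anyone who wants to reproduce this, here is a sketch of what the
modified benchmark might look like. Where exactly the nop goes is not
stated above, so placing it after the loop is an assumption. The
sketch also assumes DMD on x86 or x86-64 and the benchmark function
from the std.datetime of that time; newer Phobos releases moved
benchmark to std.datetime.stopwatch.

```d
//fib_nop.d
import std.datetime, std.stdio, std.variant;

auto fib(Int)()
{
    Int a = 1, b = 1;
    for (size_t i = 0; i < 100; i++)
    {
        Int c = a + b;
        a = b;
        b = c;
    }
    // Assumed placement of the nop. The inline asm block is meant to
    // keep the compiler from discarding the whole call as dead code.
    asm { nop; }
    return a;
}

void main()
{
    auto results = benchmark!(fib!int, fib!long, fib!Variant)(10_000);
    // Print each timing in seconds, as in the converted output above.
    foreach (t; results)
        writeln(t.nsecs * 1.0e-9);
}
```

Built with the same flags as before (dmd -O -inline -release), this
should time the actual loop for int and long instead of an empty one,
although the exact numbers will depend on the machine and compiler
version.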