On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:
I've written this simple piece of code:
[CODE]
import std.algorithm;
import std.array;
import std.conv;
import std.datetime;
import std.parallelism;
import std.range;
import std.stdio;
import core.atomic;
import core.thread;
void main()
{
    immutable long n = 11;
    long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];
    long j = 0;
    long l = z.length;
    StopWatch w; w.reset(); w.start;
    do
    {
        if (z[0] == 0) break; else
        {
            long m = 0;
            for (long i = 0; i < z.length; i += 2)
            {
                m += (z[i]*10 + z[i+1]);
            }
            if (m % n == 0) j++;
        }
    }
    while (nextPermutation(z));
    w.stop();
    j.writeln;
    w.peek().msecs.writeln;
}
[/CODE]
I compiled it with "dmd -m64 -release -inline <filename>.d".
Output is 58679513, and it takes around 52 seconds to run.
Then I replicated this code in C++ with MSVS2015:
[CODE]
#include <algorithm>
#include <iostream>
#include <ctime>
using namespace std;
const long N = 11;
long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 };
long L = sizeof(Z) / sizeof(Z[0]);
int main(void)
{
    long j = 0;
    clock_t t = clock();
    do {
        if (Z[0] == 0) break;
        long m = 0;
        for (long i = 0; i < L; i += 2) {
            m += (Z[i] * 10 + Z[i + 1]);
        }
        if (m % N == 0) {
            j++;
        }
    } while (next_permutation(Z, Z+L));
    t = clock() - t;
    cout << j << endl;
    cout << (float(t)) / CLOCKS_PER_SEC << endl;
    return 0;
}
[/CODE]
Compile mode is also Release/x64. I ran the program and was
surprised, not to say stunned. The output is also
58679513, but it finishes in under 6 seconds!
Care to explain? I haven't found a solution to this :(
On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The
slowdown in D seems to be due to parts of nextPermutation not
ending up inlined.
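To make the inlining point concrete, here is a rough sketch of the classic lexicographic "next permutation" step written by hand in C++. It is only an illustration of the kind of work that sits in the hot loop, not the Phobos or standard library source, and the name next_perm is made up for this example. The loop body in the benchmark is tiny, so when a routine like this is not inlined, the call overhead can plausibly become a visible share of each iteration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hand-written lexicographic "next permutation" step.
// Illustrative only: next_perm is a hypothetical name, and this is not
// the code used by std.algorithm.nextPermutation or std::next_permutation.
bool next_perm(std::vector<long>& a)
{
    if (a.size() < 2) return false;

    // Walk left from the end until a[i - 1] < a[i]; the suffix starting
    // at i is then non-increasing.
    std::size_t i = a.size() - 1;
    while (i > 0 && a[i - 1] >= a[i]) --i;

    if (i == 0) {
        // The whole range is non-increasing, so this was the last
        // permutation: wrap around to the first one and report false.
        std::reverse(a.begin(), a.end());
        return false;
    }

    // Find the rightmost element that is larger than the pivot a[i - 1].
    std::size_t j = a.size() - 1;
    while (a[j] <= a[i - 1]) --j;

    // Swap it with the pivot, then put the suffix back in ascending order.
    std::swap(a[i - 1], a[j]);
    std::reverse(a.begin() + i, a.end());
    return true;
}
```

Each call does a short backwards scan, one swap and one reverse, which is cheap enough that whether the compiler inlines it can matter for a loop that runs tens of millions of times.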
Be careful with benchmarks like this: you are giving the compiler
a lot more information than it usually has in any real-world case
(here it knows the exact values of all the input
data/parameters!).
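One way to hand the compiler less information is to pass the data and the modulus in at run time instead of hard-coding them. Below is a minimal sketch of that idea, assuming the digits arrive as a string (for example from the command line). The names parse_digits and count_divisible are made up for this example, and the counting loop is just the one from the C++ program above moved into a function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn a digit string such as "11223344556600"
// into a vector of digits, ignoring any non-digit characters.
std::vector<long> parse_digits(const std::string& s)
{
    std::vector<long> z;
    for (char c : s)
        if (c >= '0' && c <= '9') z.push_back(c - '0');
    return z;
}

// Same counting loop as the benchmark, but the data and the modulus are
// ordinary run-time parameters rather than constants visible to the
// optimizer. Returns 0 for input the pairwise loop cannot handle.
long count_divisible(std::vector<long> z, long n)
{
    if (z.empty() || z.size() % 2 != 0 || n == 0) return 0;

    long j = 0;
    do {
        if (z[0] == 0) break;
        long m = 0;
        // Sum the two-digit numbers formed by consecutive pairs.
        for (std::size_t i = 0; i < z.size(); i += 2)
            m += z[i] * 10 + z[i + 1];
        if (m % n == 0) ++j;
    } while (std::next_permutation(z.begin(), z.end()));
    return j;
}
```

From main you would call something like count_divisible(parse_digits(argv[1]), std::stol(argv[2])) and time only that call. Whether this changes the numbers depends on the compiler, so treat it as a way to make the comparison fairer rather than a guaranteed difference.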