You can use -profile to see what is causing it.
  Num          Tree        Func        Per
  Calls        Time        Time        Call

23000000   550799875   550243765        23   pure nothrow @nogc @safe double
    std.algorithm.iteration.sumPairwise!(double,
        std.experimental.ndslice.slice.Slice!(1uL,
        std.range.iota!(double, double, double).iota(double, double,
        double).Result).Slice)
    .sumPairwise(std.experimental.ndslice.slice.Slice!(1uL,
        std.range.iota!(double, double, double).iota(double, double,
        double).Result).Slice)
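The name in that trace points at the likely cause: as I understand it, std.algorithm's sum dispatches to pairwise summation (the sumPairwise frames above) for random-access floating-point ranges, trading speed for accuracy. A minimal sketch contrasting the two accumulation orders (the range here is illustrative, not the benchmark's):

```d
import std.algorithm.iteration : reduce, sum;
import std.range : iota;
import std.stdio : writeln;

void main() {
    // Illustrative range, similar in spirit to one row of f below.
    auto r = iota(0.0, 1.0, 1.0 / 20);

    // sum: pairwise summation (the sumPairwise frames in the trace),
    // chosen for accuracy on random-access floating-point ranges.
    double pairwise = r.sum;

    // Seeded reduce: plain left-to-right accumulation, the same order
    // as the hand-written loops in sumtest2/sumtest3.
    double naive = reduce!((a, b) => a + b)(0.0, r);

    // Both compute the same mathematical sum; for long, ill-conditioned
    // inputs the results can differ in the last few bits.
    writeln(pairwise, " ", naive);
}
```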
On 21.2.2016 at 15:32, dextorious via Digitalmars-d-learn wrote:
I've been vaguely aware of D for many years, but the recent addition
of std.experimental.ndslice finally inspired me to give it a try,
since my main expertise lies in the domain of scientific computing and
I primarily use Python/Julia/C++, where multidimensional arrays can be
handled with a great deal of expressiveness and flexibility. Before
writing anything serious, I wanted to get a sense for the kind of code
I would have to write to get the best performance for numerical
calculations, so I wrote a trivial summation benchmark. The following
code gave me slightly surprising results:
import std.stdio;
import std.array : array;
import std.algorithm;
import std.datetime;
import std.range;
import std.experimental.ndslice;

void main() {
    int N = 1000;
    int Q = 20;
    int times = 1_000;
    double[] res1 = uninitializedArray!(double[])(N);
    double[] res2 = uninitializedArray!(double[])(N);
    double[] res3 = uninitializedArray!(double[])(N);
    auto f = iota(0.0, 1.0, 1.0 / Q / N).sliced(N, Q);
    StopWatch sw;
    double t0, t1, t2;

    sw.start();
    foreach (unused; 0 .. times) {
        for (int i = 0; i < N; ++i) {
            res1[i] = sumtest1(f[i]);
        }
    }
    sw.stop();
    t0 = sw.peek().msecs;

    sw.reset();
    sw.start();
    foreach (unused; 0 .. times) {
        for (int i = 0; i < N; ++i) {
            res2[i] = sumtest2(f[i]);
        }
    }
    sw.stop();
    t1 = sw.peek().msecs;

    sw.reset();
    sw.start();
    foreach (unused; 0 .. times) {
        sumtest3(f, res3, N, Q);
    }
    sw.stop();  // was missing before the third timing was read
    t2 = sw.peek().msecs;

    writeln(t0, " ms");
    writeln(t1, " ms");
    writeln(t2, " ms");
    assert(res1 == res2);
    assert(res2 == res3);
}

auto sumtest1(Range)(Range range) @safe pure nothrow @nogc {
    return range.sum;
}

auto sumtest2(Range)(Range f) @safe pure nothrow @nogc {
    double retval = 0.0;
    foreach (double f_; f) {
        retval += f_;
    }
    return retval;
}

void sumtest3(Range)(Range f, double[] retval, int N, int Q)
        @safe pure nothrow @nogc {
    for (int i = 0; i < N; ++i) {
        retval[i] = f[i, 0];  // seed with the first element; retval was uninitialized
        for (int j = 1; j < Q; ++j) {
            retval[i] += f[i, j];
        }
    }
}
When I compiled it using dmd -release -inline -O -noboundscheck
../src/main.d, I got the following timings:
1268 ms
312 ms
271 ms
I had heard while reading up on the language that explicit loops are
generally frowned upon in D and supposedly unnecessary for performance
reasons. Nevertheless, the two explicit-loop functions gave me a speedup
of more than 4x. Furthermore, the difference between sumtest2 and
sumtest3 seems to indicate that function calls carry significant
overhead. I also tried f.reduce!((a, b) => a + b) instead of f.sum in
sumtest1, but that yielded even worse performance. I have not tried the
GDC/LDC compilers yet, since they do not seem to be up to date on the
standard library and, last I checked, do not include the ndslice
package.
Now, given that my experience writing D amounts to literally a few
hours: is there anything I did blatantly wrong? Did I miss any
optimizations? Most importantly, can the elegant operator-chaining style
generally be made as fast as the explicit loops we've all been writing
for decades?
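One hedged sketch of a chained spelling that keeps the naive left-to-right accumulation order of the explicit loops, assuming a Phobos recent enough to have std.algorithm.iteration.fold (the UFCS-friendly seeded form of reduce); sumtest4 is a hypothetical name, and whether it matches the loops' speed here would need measuring:

```d
import std.algorithm.iteration : fold;
import std.range : iota;
import std.stdio : writeln;

auto sumtest4(Range)(Range range) {
    // fold is the seeded, UFCS-chainable counterpart of reduce:
    // a plain left-to-right accumulation, like the explicit loops.
    return range.fold!((a, b) => a + b)(0.0);
}

void main() {
    auto r = iota(0.0, 1.0, 0.25); // 0.0, 0.25, 0.5, 0.75
    writeln(sumtest4(r));          // exact: quarters are representable
}
```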