On Thursday, 19 May 2016 at 12:04:31 UTC, Andrei Alexandrescu
wrote:
On 5/19/16 4:12 AM, Jens Müller wrote:
What test data did you use?
An instance for benchmarking is generated as follows, given nnz,
the combined number of non-zero entries in the input vectors a and b:
---
auto lengthA = uniform!"[]"(0, nnz, gen);
auto lengthB = nnz - lengthA;
auto a = iota(0, nnz).randomSample(lengthA, gen)
    .map!(i => Pair(i, 10)).array();
auto b = iota(0, nnz).randomSample(lengthB, gen)
    .map!(i => Pair(i, 10)).array();
---
So I take a random sample of (0, ..., nnz) for each input.
Any better idea? I've seen that people generate vectors with
larger gaps.
10%-20% win on dot product is significant because for many
algorithms dot product is a kernel and dominates everything
else. For those any win goes straight to the bottom line.
Sure. Still, I wasn't sure whether I got the idea from your talk,
so maybe there is/was more to it.
The baseline (dot1 in the graphs) is the straightforward version
---
size_t i, j = 0;
double sum = 0;
while (i < a.length && j < b.length)
{
    if (a[i].index < b[j].index) i++;
    else if (a[i].index > b[j].index) j++;
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        i++;
        j++;
    }
}
return sum;
---
That does redundant checking. There's a better baseline:
---
if (a.length == 0 || b.length == 0)
    return 0;
const amax = a.length - 1, bmax = b.length - 1;
size_t i, j = 0;
double sum = 0;
for (;;)
{
    if (a[i].index < b[j].index) {
        if (i++ == amax) break;
    }
    else if (a[i].index > b[j].index) {
    bumpJ:
        if (j++ == bmax) break;
    }
    else
    {
        assert(a[i].index == b[j].index);
        sum += a[i].value * b[j].value;
        if (i++ == amax) break;
        goto bumpJ;
    }
}
return sum;
---
I'll check that out.
Then if you add the sentinel you only need the bounds tests in
the third case.
I'll post the sentinel code later. Probably there is something to
improve there as well.
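Roughly, the idea as I understand it (a sketch, not the measured
code; Pair is the index/value struct from the benchmark above):
append a sentinel pair whose index compares greater than every real
index, so the two comparison branches can advance without bounds
checks and only the equal-index branch has to detect termination.

---
double dotSentinel(Pair[] a, Pair[] b)
{
    if (a.length == 0 || b.length == 0)
        return 0;
    // Sentinels: size_t.max is larger than any real index,
    // so both scans stop here at the same time.
    a ~= Pair(size_t.max, 0);
    b ~= Pair(size_t.max, 0);
    size_t i, j = 0;
    double sum = 0;
    for (;;)
    {
        if (a[i].index < b[j].index) i++;       // no bounds check needed
        else if (a[i].index > b[j].index) j++;  // no bounds check needed
        else
        {
            if (a[i].index == size_t.max) break; // both hit the sentinel
            sum += a[i].value * b[j].value;
            i++;
            j++;
        }
    }
    return sum;
}
---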
BTW the effects vary greatly for different compilers. For example,
with dmd the optimized version is slowest; the baseline is best.
Weird. With gdc the optimized version is best, and gdc's code is
always faster than dmd's. With ldc it's really strange: slower than
dmd. I assume I'm doing something wrong here.
Used compiler flags
dmd v2.071.0
-wi -dw -g -O -inline -release -noboundscheck
gdc (crosstool-NG 203be35 - 20160205-2.066.1-e95a735b97) 5.2.0
-Wall -g -O3 -fomit-frame-pointer -finline-functions -frelease
-fno-bounds-check -ffast-math
ldc (0.15.2-beta2) based on DMD v2.066.1 and LLVM 3.6.1
-wi -dw -g -O3 -enable-inlining -release -boundscheck=off
Am I missing some flags?
These look reasonable.
But ldc looks so bad.
Any comments from ldc users or developers? I see this in many other
measurements as well, and I would love to have another compiler
producing code as efficient as gdc's.
For example, what is the equivalent of gdc's -ffast-math in ldc?
I uploaded my plots.
- running time
https://www.scribd.com/doc/312951947/Running-Time
- speed up https://www.scribd.com/doc/312951964/Speedup
What is dot2? Could you please redo the experiments with the
modified code as well?
dot2 is an optimization for jumping over gaps more quickly,
replacing the first two if statements with while statements.
My benchmark data has no large gaps, though, and interestingly
dot2 makes things worse there rather than just breaking even.
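For reference, dot2 looks roughly like this (a sketch of the idea,
not the exact benchmarked code): the inner while loops skip runs of
smaller indices in one go instead of re-entering the outer loop for
every step.

---
size_t i, j = 0;
double sum = 0;
while (i < a.length && j < b.length)
{
    // Skip over a gap in a, then over a gap in b.
    while (i < a.length && a[i].index < b[j].index) i++;
    if (i == a.length) break;
    while (j < b.length && b[j].index < a[i].index) j++;
    if (j == b.length) break;
    // The second while may have overshot, so equality is not guaranteed.
    if (a[i].index == b[j].index)
    {
        sum += a[i].value * b[j].value;
        i++;
        j++;
    }
}
return sum;
---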
Jens