Hiho,

wow, first of all: this community is awesome - so many kind and interesting answers. =)

With your help I was able to achieve a performance boost for several different operations.

Some benchmarks:

Allocation of five 10,000 x 10,000 matrices in a row:
Before: ~8.2 seconds
After: ~2.3 seconds (with minimallyInitializedArray!)

Multiplication of two 1000x1000 matrices:
Before: ~14.8 seconds
After: ~4.3 seconds
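To sketch what the allocation change looks like (the function name here is my own, hypothetical wrapper - only minimallyInitializedArray from std.array is the real thing):

```d
import std.array : minimallyInitializedArray;

// Sketch of the faster allocation: minimallyInitializedArray skips the
// default initialization to double.init that `new double[](n)` performs,
// which is what made the repeated large allocations so much cheaper.
double[] allocateMatrixData(size_t rows, size_t cols)
{
    return minimallyInitializedArray!(double[])(rows * cols);
}
```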

However, I think there is still a lot of potential to optimize this code further. Let me tell you which changes I have mainly made to the code so far ...

Matrix is still a class, but I changed it to a final class, preventing its methods from being virtual. Dimension is now a final struct (I don't know whether 'final' affects structs in any way, though ...). This mainly gave the multiplication a huge performance boost.
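For illustration, a minimal sketch of what I mean (field and method names here are simplified stand-ins for my actual code):

```d
// Marking the class final means its methods cannot be overridden,
// so calls to them need not go through the vtable and can be inlined.
final class Matrix(T)
{
    private T[] data;
    private size_t nRows, nCols;

    this(size_t rows, size_t cols)
    {
        nRows = rows;
        nCols = cols;
        data = new T[](rows * cols); // default-initialized to T.init
    }

    // Non-virtual thanks to `final`, so the hot loop can inline this:
    T opIndex(size_t row, size_t col) const
    {
        return data[row * nCols + col];
    }
}
```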

When I converted Matrix from a class to a struct, the multiplication time even dropped from ~4.3 seconds to about 3.6 seconds. However, I am currently not sure whether I want matrices to be structs (value types).

Besides that, I tried to add nothrow and pure as attributes to every possible method. However, as it turned out, I was barely able to add the pure attribute anywhere, because each candidate always called some impure method (as stated by the compiler). This actually made sense most of the time, and I think the only pure methods now are the constructor methods of the Dimension struct.
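Roughly what those pure constructor methods look like (again a simplified sketch of my Dimension struct, not the exact code):

```d
// The constructor and offset calculation only touch the struct's own
// fields, so they can be marked pure and nothrow.
struct Dimension
{
    size_t rows, cols;

    this(size_t rows, size_t cols) pure nothrow
    {
        this.rows = rows;
        this.cols = cols;
    }

    // Row-major offset into the flat data array.
    size_t offset(size_t row, size_t col) const pure nothrow
    {
        return row * cols + col;
    }
}
```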

What may still speed up things?

In my tests it turned out that the simple code:

auto m1 = Matrix!double.random(1000, 1000, -10, 10, 0.25);
auto m2 = Matrix!double.random(1000, 1000, -10, 10, 0.25);
auto m3 = m1 * m2;

called the normal copy constructor. This is sad, as it would be a huge performance boost if it made use of move semantics. (In the C++ matrix code this scenario would actually call the move assignment operator for matrix m3, which is much, much faster than copying.)

But I haven't figured out yet how to use move semantics in D with class objects. Or is that only possible with struct value types?
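As far as I can tell (please correct me if this is wrong), D already moves struct rvalues automatically - the postblit only runs on actual copies - while classes are reference types anyway. A tiny sketch with made-up names to show what I mean:

```d
// Sketch: this(this) is the postblit, which runs on copies of the
// struct. A struct returned from a function is moved out, so no
// postblit fires and no deep copy of the data happens.
struct Vec
{
    double[] data;
    static int copies; // counts deep copies, for demonstration only

    this(this)
    {
        data = data.dup; // deep copy on explicit copies
        ++copies;
    }
}

Vec makeVec()
{
    Vec v;
    v.data = [1.0, 2.0, 3.0];
    return v; // moved to the caller, not copied
}
```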

I have also tried the LDC compiler. However, it gave me some strange bugs. For example, it didn't like the following code:

ref Matrix transpose() const {
        return new Matrix(this).transposeAssign();
}

And it forced me to change it to:

ref Matrix transpose() const {
        auto m = new Matrix(this);
        m.transposeAssign();
        return m;
}

Which is kind of ugly ...
I hope this gets fixed soon, as I assume it is correct D code, given that DMD is able to compile it correctly.

Some of you suggested that transposing the matrix before the multiplication takes place must be much slower than multiplying without the transposition. In my former Java and C++ programs I already tested which strategy is faster, and it turned out that cache efficiency DOES matter, of course. There are also papers explaining why a transposed matrix multiplication can be faster than one without transposing.

In the end this is just a test suite, and there is already an even faster approach to matrix multiplication which runs in O(n^2.8) instead of the O(n^3) of my current simple solution. And with the power of OpenCL (or similar) one could even lift the matrix multiplication onto the GPU and boom.^^ But the current task is to find generally optimized code for the D language - this thread has already helped me a lot!
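To make clear what I mean by the transposed multiplication, here is a sketch (hypothetical helper name, assuming square matrices stored as row-major double[][], with the right operand already transposed):

```d
// Cache-friendly multiplication: because bT is the transpose of b,
// the inner loop scans both a[i] and bT[j] sequentially in memory
// instead of striding down b's columns.
double[][] multiplyTransposed(const double[][] a, const double[][] bT)
{
    auto n = a.length; // assume square n x n matrices
    auto result = new double[][](n, n);
    foreach (i; 0 .. n)
        foreach (j; 0 .. n)
        {
            double sum = 0;
            foreach (k; 0 .. n)
                sum += a[i][k] * bT[j][k]; // row-wise access in both
            result[i][j] = sum;
        }
    return result;
}
```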

I wasn't aware that D is actually capable of lazy evaluation beyond the pseudo-lazy evaluation of if conditions, which is kind of cool. However, I still have to read up on it in order to maybe find a use for it in this code.
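From what I've read so far, the simplest form is the `lazy` storage class on parameters - a small sketch (function name made up by me):

```d
import std.stdio : writeln;

// A `lazy` parameter defers evaluating the argument expression until
// the parameter is actually used inside the function, so an expensive
// message expression costs nothing when the condition is false.
void logIf(bool condition, lazy string message)
{
    if (condition)
        writeln(message); // the argument expression runs only here
}
```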

SIMD instructions also sound extremely cool, but a little bit too complex given that I am fairly new to D and still learning the basics of the language.

Finally, I wanted to mention something I found very irritating. Bearophile correctly stated that my pre-conditions for the index operators were all wrong and corrected my code with some smooth additions:

T opIndex(in size_t row, in size_t col) const nothrow
in {
    assert(row < nRows);
    assert(col < nCols);
} body {
    return data[dim.offset(row, col)];
}

The in and body statements are cool, as far as I understand what they are for. However, in the benchmarks they had a clear and noticeable negative impact on the matrix multiplication, which rose to ~8 seconds with his code compared to ~4 seconds with mine. When I left out the in and body blocks and used only one normal block as follows, the task stayed at the 4-second duration. So I think the compiler won't optimize code inside an 'in' block - is that right?

T opIndex(in size_t row, in size_t col) const nothrow {
    assert(row < nRows);
    assert(col < nCols);
    return data[dim.offset(row, col)];
}

Thanks again for all your helpful comments, and thanks in advance - I am eagerly looking forward to your future comments! =)

Robin
