
Thank you for your code improvements and suggestions.

I really like the foreach loop in D, as well as its slight (but measurable) performance boost over conventional for loops. =)

Another success of the changes is that I managed to further improve the matrix multiplication performance: multiplying two 1000x1000 matrices now takes 1.9 seconds instead of 3.6 seconds, which is already very close to Java and C++ at about 1.3 - 1.5 seconds.

The key to victory was pointer arithmetic, as I noticed that I had used it in the C++ implementation, too. xD
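For readers following along, here is a minimal sketch of what pointer arithmetic in the inner loop can look like. It is not the Matrix code from this thread: the functions dot and multiply are hypothetical, and the sketch assumes square n x n matrices stored row-major in flat arrays, with the right-hand operand already transposed so that both pointers walk over contiguous memory.

```d
// Hypothetical helper (not part of the Matrix struct in this thread):
// dot product of two contiguous runs of n doubles, using raw pointers
// instead of indexed (and possibly bounds-checked) array access.
double dot(const(double)* a, const(double)* b, size_t n) pure nothrow @nogc {
    double sum = 0.0;
    foreach (immutable k; 0 .. n) {
        sum += *a++ * *b++;   // read both elements, then advance both pointers
    }
    return sum;
}

// Assumption: a and bT are n x n, row-major, and bT is the transpose of
// the right-hand matrix, so row j of bT is column j of the original.
double[] multiply(const double[] a, const double[] bT, size_t n) pure nothrow {
    auto c = new double[n * n];
    foreach (immutable i; 0 .. n) {
        foreach (immutable j; 0 .. n) {
            c[i * n + j] = dot(a.ptr + i * n, bT.ptr + j * n, n);
        }
    }
    return c;
}
```

Calling multiply([1.0, 2, 3, 4], [5.0, 7, 6, 8], 2) multiplies [[1, 2], [3, 4]] by [[5, 6], [7, 8]], the second argument being the transposed form of the latter.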

The toString implementation has improved its performance slightly due to the changes you have mentioned above: 1.37 secs -> 1.29 secs

I have also converted all operator overloads to the "new style" - I just didn't know about that "new style" until now - thanks!

I will just post the whole code again so that you can see what I have changed.

Keep in mind that I am still using DMD as my compiler, so performance may still improve once I use another compiler!

All in all, I am very happy with the code analysis and the resulting improvements! However, there were some strange things that confuse me a lot ...

void allocationTest() {
        writeln("allocationTest ...");
        auto m1 = Matrix!double(10000, 10000);
        { auto m2 = Matrix!double(10000, 10000); }
        { auto m2 = Matrix!double(10000, 10000); }
        { auto m2 = Matrix!double(10000, 10000); }
        //{ auto m2 = Matrix!double(10000, 10000); }
}

This is the most confusing code snippet. I have just changed the allocation of all m1 and m2 from new Matrix!double (on the heap) to Matrix!double (on the stack), and the performance dropped significantly - the benchmarked time rose from 2.3 seconds to over 25 seconds!! Now look at the code above. When I leave it as it is, the code needs about 2.9 seconds of runtime; however, when I enable the line that is currently commented out, the code takes 14 to 25 seconds longer! mind blown ... 0.o This is extremely confusing, as I allocate these matrices on the stack, and since each one is allocated within its own scoped block, it should release its memory immediately, so that memory is never consumed for more than 2 matrices at the same time. As far as I have tested, this just wasn't the case.
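One possible explanation, assuming the Matrix struct allocates its data array with new (the struct itself is not shown at this point): only the struct's few fields live on the stack, while each 10000 x 10000 array of doubles (800 MB) lives on the GC heap. Leaving a scoped block runs the struct's destructor, if it has one, but it does not free GC memory; that only happens during a later collection. The sketch below shows deterministic release with malloc/free instead. Buffer and its live counter are hypothetical names used purely for illustration.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical stand-in for a matrix's storage: memory is obtained with
// malloc and released in the destructor, i.e. exactly at scope exit.
struct Buffer {
    private double* ptr;
    private size_t len;
    static int live;          // number of buffers currently holding memory

    this(size_t n) {
        ptr = cast(double*) malloc(n * double.sizeof);
        if (ptr is null) assert(0, "malloc failed");
        len = n;
        ++live;
    }

    @disable this(this);      // forbid copies, so memory is never freed twice

    ~this() {
        if (ptr !is null) {
            free(ptr);
            ptr = null;
            --live;
        }
    }
}

// Number of live buffers inside a scoped block; once the block has been
// left, Buffer.live is back to its previous value.
int liveInsideScope() {
    int inside;
    {
        auto b = Buffer(1000);
        inside = Buffer.live;
    }                         // b's destructor runs here
    return inside;
}
```

With this layout at most one such buffer per scoped block is alive at a time, which is the behaviour I expected from the snippet above. Whether it explains the timings I measured is a guess that would need profiling.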

Another strange thing is that the new opEquals implementation:

        bool opEquals(const ref Matrix other) const pure nothrow {
                if (this.dim != other.dim) {
                        return false;
                }
                foreach (immutable i; 0 .. this.dim.size) {
                        if (this.data[i] != other.data[i]) return false;
                }
                return true;
        }

is actually about 20% faster than the one you suggested, which consists of the single line "return (this.dim == other.dim && this.data[] == other.data[]);".

The last thing I haven't quite understood is that I tried to replace

auto t = Matrix(other).transposeAssign();

in the matrix multiplication algorithm with its shorter and clearer form

auto t = other.transpose(); // sorry for the nasty '()', but I like them! :/

This, however, gave me wonderful segmentation faults at runtime when using the matrix multiplication ...

And here is the complete and improved code:

Thanks in advance for helping me! =)


