Robin:
the existence of move semantics in C++ is one of the
coolest features since C++11, which improved and simplified
code enormously in many cases for value types, just like structs
in D.
I guess Andrei doesn't agree with you (and move semantics in
C++11 is quite hard to understand).
I also gave scoped imports a try and hoped that they would
reduce the size of my executable and perhaps increase the
performance of my program, none of which was true -> confused.
Instead I now have more lines of code and cannot see at a glance
what dependencies the module itself has. So what is the
point of scoped imports?
Scoped imports in general can't increase performance. Their main
point is to avoid importing modules that are needed only by
templated code. So if you don't instantiate the template, the
linker has less work to do and the binary is usually smaller (no
moduleinfo, etc).
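A minimal sketch of what I mean, assuming a hypothetical helper named cellToString (it is not from your code). The import sits inside the template body, so it is only processed for the instantiations you actually use:

```d
// Hypothetical helper, for illustration only.
// The import below is scoped to the template body. Template bodies are
// analyzed only when instantiated, so a module that never calls
// cellToString does not pull in std.conv through this function.
string cellToString(T)(T cell)
{
    import std.conv : to; // scoped import, visible only inside this function

    return to!string(cell);
}
```

Calling cellToString(42) instantiates the template and brings in std.conv at that point. A module-level import would be processed whether or not anything uses it.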
Another weird thing is that the result ~= text(tabStr, this[r,
c]) in the toString method is much slower than the two
following lines of code:
result ~= tabStr;
result ~= to!string(this[r, c]);
Does anybody have an answer to this?
It doesn't look all that weird. In the first case you are
allocating and creating larger temporary strings. But I don't
think matrix printing is a bottleneck in a program.
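To make the difference concrete, here is a small sketch. The function names and the row/tabStr parameters are my own assumptions, not your actual toString. The first version builds a combined temporary string per element with text() before appending it. The second puts the pieces into an Appender, which is one common way to cut down on reallocation while building a string:

```d
import std.array : appender;
import std.conv : text, to;

// Hypothetical variant 1: text() first builds one temporary string holding
// both tabStr and the converted value, and that temporary is then appended.
string rowWithText(const double[] row, string tabStr)
{
    string result;
    foreach (x; row)
        result ~= text(tabStr, x);
    return result;
}

// Hypothetical variant 2: the separator and the converted value are put
// into a growing buffer one after the other, with no combined temporary.
string rowWithAppender(const double[] row, string tabStr)
{
    auto buf = appender!string();
    foreach (x; row)
    {
        buf.put(tabStr);
        buf.put(to!string(x));
    }
    return buf.data;
}
```

Both functions produce the same text. Which one is faster on your machine is something to measure rather than assume.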
- Then I have finally found out the optimizing commands for the
DMD
This is a small but common problem. Perhaps worth fixing.
There are still many ways to further improve the performance.
For example by using LDC
Latest stable and unstable versions of LDC2, try it:
https://github.com/ldc-developers/ldc/releases/tag/v0.12.1
https://github.com/ldc-developers/ldc/releases/tag/v0.13.0-alpha1
on certain hardware, parallelism and perhaps by implementing
COW with no GC dependencies. And of course I may miss many
other possible optimization features of D.
Matrix multiplication can be improved a lot by tiling the matrix
(or better, by using a cache oblivious algorithm), using SSE/AVX2,
using multiple cores, etc. As a starting point you can try to use
std.parallelism. It could speed up your code on 4 cores with a
very limited amount of added code.
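A rough sketch of the std.parallelism idea, assuming the matrices are stored as non-empty rectangular double[][] arrays (your Matrix struct surely stores things differently, and the name multiplyParallel is made up). Only the outer loop over rows is parallelized, and each task writes to its own row of the result, so no locking is needed:

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical function: multiplies an n x m matrix by an m x p matrix.
// Assumes both inputs are non-empty and rectangular.
double[][] multiplyParallel(const double[][] a, const double[][] b)
{
    const size_t inner = b.length;
    const size_t cols = b[0].length;
    auto c = new double[][](a.length, cols);

    // Rows of the result are independent, so they can be computed by
    // different worker threads of the default task pool.
    foreach (i; parallel(iota(a.length)))
    {
        auto row = c[i];
        row[] = 0.0; // new double arrays start as NaN in D

        // i-k-j loop order walks b and the result row sequentially.
        foreach (k; 0 .. inner)
        {
            const double aik = a[i][k];
            foreach (j; 0 .. cols)
                row[j] += aik * b[k][j];
        }
    }
    return c;
}
```

This is only the multi-core part. Tiling and SIMD are separate improvements on top of it, and the actual gain depends on the matrix size and the machine.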
Bye,
bearophile