Re: Optimize my code =)

bearophile Tue, 18 Feb 2014 16:06:07 -0800

Robin:

the existence of move semantics in C++ and is one of the coolest features since C++11 which increased and simplified codes in many cases enormously for value types just as structs in D.

I guess Andrei doesn't agree with you (and move semantics in C++11 is quite hard to understand).

I also gave scoped imports a try and hoped that they were able to reduce my executable file and perhaps increase the performance of my program, none of which was true -> confused. Instead I now have more lines of code and do not see instantly what dependencies the module as itself has. So what is the point in scoped imports?

Scoped imports in general can't increase performance. Their main point is to avoid importing modules that are needed only by templated code. So if you don't instantiate the template, the linker works less and the binary is usually smaller (no moduleinfo, etc).
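To make that concrete, here is a minimal sketch of a scoped import inside templated code. The function name describe is hypothetical and not taken from Robin's program; it only illustrates where the import goes.

```d
// Hypothetical helper, for illustration only.
string describe(T)(T value)
{
    // Scoped import: std.conv is visible only inside this template,
    // and the body is analyzed only when describe is instantiated.
    import std.conv : to;
    return "value: " ~ value.to!string;
}

void main()
{
    // The import of std.stdio is also local to main.
    import std.stdio : writeln;
    writeln(describe(42));
}
```

If nothing in the program ever calls describe, the import inside it is never needed. That is the case scoped imports are meant for; they are not a speed optimization.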

Another weird thing is that the result ~= text(tabStr, this[r, c]) in the toString method is much slower than the two following lines of code:
result ~= tabStr;
result ~= to!string(this[r, c]);

Does anybody have an answer to this?

It doesn't look too weird. In the first case you are allocating and creating larger strings. But I don't think matrix printing is a bottleneck in a program.
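If printing ever does matter, one possible way to cut down on the intermediate strings is to collect the pieces in an Appender from std.array. This is only a sketch: rowToString and its parameters are hypothetical names, not the toString from Robin's matrix type, and I have not measured it against the code above.

```d
import std.array : appender;
import std.conv : to;

// Hypothetical helper: formats one row, prefixing each element with tabStr.
string rowToString(T)(const T[] row, string tabStr = "\t")
{
    auto result = appender!string();
    foreach (x; row)
    {
        // Append the separator directly; no combined temporary string
        // like the one text(tabStr, x) would build.
        result.put(tabStr);
        result.put(x.to!string);
    }
    return result.data;
}
```

The element conversion still allocates a small string for each value, so this removes only part of the allocation work.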

- Then I have finally found out the optimizing commands for the DMD


This is a small but common problem. Perhaps worth fixing.

There are still many ways to further improve the performance. For example by using LDC


Latest stable and unstable versions of LDC2, try it:
https://github.com/ldc-developers/ldc/releases/tag/v0.12.1
https://github.com/ldc-developers/ldc/releases/tag/v0.13.0-alpha1

on certain hardwares, parallelism and perhaps by implementing COW with no GC dependencies. And of course I may miss many other possible optimization features of D.

Matrix multiplication can be improved a lot by tiling the matrix (or better, using a cache-oblivious algorithm), using SSE/AVX2, using multiple cores, etc. As a starting point you can try std.parallelism. It could speed up your code on 4 cores with a very limited amount of added code.
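As a rough sketch of the std.parallelism idea, assuming a matrix stored as a flat row-major double[] (the function multiply and its parameters are hypothetical, not the layout of Robin's Matrix type):

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical row-major multiply: a is n x m, b is m x p, result is n x p.
double[] multiply(const double[] a, const double[] b,
                  size_t n, size_t m, size_t p)
{
    auto c = new double[n * p];
    c[] = 0.0; // double.init is NaN, so zero the result explicitly

    // Each result row depends only on one row of a and on all of b,
    // so rows can be handed out to the worker threads of the default
    // task pool without any locking.
    foreach (i; parallel(iota(n)))
    {
        foreach (k; 0 .. m)
        {
            immutable aik = a[i * m + k];
            foreach (j; 0 .. p)
                c[i * p + j] += aik * b[k * p + j];
        }
    }
    return c;
}
```

Compared with a serial version, the only change is wrapping the outer loop range in parallel(). The i-k-j loop order keeps the inner loop walking b and c contiguously, which is a much simpler thing than tiling but goes in the same direction.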


Bye,
bearophile
