Re: Simple performance question from a newcomer

dextorious via Digitalmars-d-learn Mon, 22 Feb 2016 07:46:54 -0800

First of all, I am pleasantly surprised by the rapid influx of helpful responses. The community here seems quite wonderful. In the interests of not cluttering the thread too much, since the advice given here has many commonalities, I will only try to respond once to each type of suggestion.


On Sunday, 21 February 2016 at 16:29:26 UTC, ZombineDev wrote:

The problem is not with ranges, but with the particular algorithm used for summing. If you look at the docs (http://dlang.org/phobos-prerelease/std_algorithm_iteration.html#.sum) you'll see that if the range has random access, `sum` will use the pair-wise algorithm. About the second and third tests, the problem is with DMD, which should not be used when measuring performance (but only for development, because it has fast compile times).
...
According to `dub --verbose`, my command-line was roughly this:
ldc2 -ofapp -release -O5 -singleobj -w source/app.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/internal.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/iteration.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/package.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/selection.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/slice.d

It appears that I cannot use the GDC compiler for this particular problem because it uses a comparatively older version of the DMD frontend (I understand Mir requires >=2.068), but I did manage to get LDC working on my system after a bit of work. Since I've been using dub to manage my project, I used the default "release" build type. I also tried compiling manually with LDC, using the -O5 switch you mentioned. These are the results (I increased the iteration count to lessen the noise; the array is now 10000x20, and each function is run a thousand times):

            DMD        LDC (dub)   LDC (-release -enable-inlining -O5 -w -singleobj)

sumtest1:   12067 ms   6899 ms     1940 ms
sumtest2:    3076 ms   1349 ms      452 ms
sumtest3:    2526 ms    847 ms      434 ms
sumtest4:    5614 ms   1481 ms      452 ms

The sumtest1, 2 and 3 functions are as given in the first post; sumtest4 uses the range.reduce!((a, b) => a + b) approach to enforce naive summation. Much to my satisfaction, the range.reduce version is now exactly as quick as the traditional loop, and while function inlining isn't quite perfect, the 4% performance penalty incurred by the 10_000 function calls (or whatever inlined form the function finally takes) is quite acceptable.
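To illustrate the difference between the two summation approaches, here is a minimal sketch. It assumes a plain double[] rather than the Mir slice used in the actual benchmark, and the names naiveSum and pairwiseSum are hypothetical, chosen only for this example; it is not the benchmark code itself.

```d
import std.algorithm.iteration : reduce, sum;
import std.stdio : writeln;

// Hypothetical helper: plain left-to-right summation, in the spirit of sumtest4.
// Note that reduce without a seed throws on an empty range.
double naiveSum(double[] data)
{
    return data.reduce!((a, b) => a + b);
}

// Hypothetical helper: Phobos `sum`, which per its documentation uses
// pairwise summation for random-access ranges of floating-point values.
double pairwiseSum(double[] data)
{
    return data.sum;
}

void main()
{
    auto data = [1.0, 2.0, 3.0, 4.0];
    writeln(naiveSum(data), " ", pairwiseSum(data));
}
```

Both calls give the same result on small, well-behaved input like this; the difference lies in accumulated rounding error and speed on large arrays, which is what the timings above measure.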

I do have to wonder, however, about the default settings of dub in this case. Having gone through its documentation, I might still not have guessed to try the compiler options you provided, thereby losing out on a 2-3x performance improvement. What build options did you use in your dub.json that it managed to translate to the correct compiler switches?
