First of all, I am pleasantly surprised by the rapid influx of helpful responses. The community here seems quite wonderful. In the interests of not cluttering the thread too much, since the advice given here has many commonalities, I will only try to respond once to each type of suggestion.

On Sunday, 21 February 2016 at 16:29:26 UTC, ZombineDev wrote:
The problem is not with ranges, but with the particular algorithm used for summing. If you look at the docs (http://dlang.org/phobos-prerelease/std_algorithm_iteration.html#.sum) you'll see that if the range has random access, `sum` will use the pair-wise algorithm. About the second and third tests, the problem is with DMD, which should not be used when measuring performance (but only for development, because it has fast compile times).
...
According to `dub --verbose`, my command-line was roughly this:
ldc2 -ofapp -release -O5 -singleobj -w source/app.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/internal.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/iteration.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/package.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/selection.d
../../../../.dub/packages/mir-0.10.1-alpha/source/mir/ndslice/slice.d

It appears that I cannot use the GDC compiler for this particular problem because it uses a comparatively older version of the DMD frontend (I understand Mir requires >=2.068), but I did manage to get LDC working on my system after a bit of work. Since I've been using dub to manage my project, I used the default "release" build type. I also tried compiling manually with LDC, using the -O5 switch you mentioned. These are the results. To lessen the noise I increased the iteration count: the array is now 10000x20, and each function is run a thousand times.

           DMD        LDC (dub)   LDC (-release -enable-inlining -O5 -w -singleobj)
sumtest1:  12067 ms   6899 ms     1940 ms
sumtest2:   3076 ms   1349 ms      452 ms
sumtest3:   2526 ms    847 ms      434 ms
sumtest4:   5614 ms   1481 ms      452 ms

The sumtest1, 2 and 3 functions are as given in the first post; sumtest4 uses the range.reduce!((a, b) => a + b) approach to enforce naive summation. Much to my satisfaction, the range.reduce version is now exactly as quick as the traditional loop. Function inlining isn't quite perfect, but the roughly 4% performance penalty (452 ms against 434 ms) incurred by the 10_000 function calls (or whatever inlined form the function finally takes) is quite acceptable.
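For readers without the first post at hand, here is a minimal sketch of the difference between the two summation styles. It is not the benchmark code itself: the array `row` and its values are made up for illustration, and the real tests ran over a 10000x20 array.

```d
import std.algorithm.iteration : reduce, sum;
import std.stdio : writeln;

void main()
{
    // Hypothetical stand-in data, not the array used in the benchmark.
    double[] row = [1.0, 2.0, 3.0, 4.0];

    // `sum` on a random-access range of floating-point values uses
    // pairwise summation, as the Phobos docs quoted above describe.
    auto pairwise = row.sum;

    // `reduce` with an explicit lambda accumulates left to right,
    // i.e. naive summation; with no seed it starts from the first
    // element, so the range must not be empty.
    auto naive = row.reduce!((a, b) => a + b);

    writeln(pairwise, " ", naive);
}
```

For small, exactly representable values like these, both calls should give the same total. The two approaches differ in rounding behaviour and speed on large inputs, not in the mathematical result.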

I do have to wonder, however, about the default settings of dub in this case. Having gone through its documentation, I might still not have guessed to try the compiler options you provided, thereby losing out on a 2-3x performance improvement. What build options did you use in your dub.json that it managed to translate to the correct compiler switches?
