If you do want to test the differences between the range approach and the loop approach, something like:
auto sumtest4(Range)(Range range) @safe pure {
        return range.reduce!((a, b) => a + b);
}
is a fairer comparison. With dmd, I get results within 15% of sumtest2 using this. I think the results would be identical with ldc, but the version in Homebrew is too old to compile this.
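To make the comparison concrete, here is a minimal sketch of the two approaches side by side. sumLoop is a hypothetical stand-in for the hand-written loop (the original sumtest2 is not shown in this post, so its exact form is an assumption), and sumReduce mirrors sumtest4 with the attributes left to inference.

```d
import std.algorithm.iteration : reduce;

// Hypothetical stand-in for the hand-written loop version (assumed shape of sumtest2).
double sumLoop(const(double)[] data) @safe pure nothrow @nogc {
    double total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// Range-based version, same idea as sumtest4: plain left-to-right addition.
// Note: reduce without a seed throws on an empty range.
auto sumReduce(Range)(Range range) {
    return range.reduce!((a, b) => a + b);
}

void main() {
    import std.stdio : writeln;

    double[] data = [1.0, 2.0, 3.0, 4.0];
    // Both versions add the elements in the same order.
    assert(sumLoop(data) == sumReduce(data));
    writeln(sumReduce(data));
}
```

Since both versions do the same additions in the same order, any timing difference should come down to how well the compiler inlines the lambda, not to the algorithm.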

Using LDC with the mir version of ndslice so it compiles, and the following code:
foreach (unused; 0..times) {
        for (int i=0; i<N; ++i) {
                res4[i] = sumtest4(f[i]);
        }
}
t3 = sw.peek().msecs;


auto sumtest4(Range)(Range range) {
        return range.reduce!((a, b) => a + b);
}

I get:
145 ms
19 ms
19 ms
19 ms

So, with LDC, there is no performance hit from using the range version. The only performance hit comes when .sum uses a different algorithm to get a more accurate result. Also, the LDC version appears to be roughly 5x faster than the DMD version.
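For anyone curious about the accuracy side, here is a rough sketch that compares a naive reduce with std.algorithm's sum on the same data. As far as I know, Phobos documents sum as using pairwise summation for floating-point random-access ranges with length and slicing (and Kahan summation otherwise), but treat that as my reading of the docs. The names naiveSum and librarySum are made up for this example, and the array size and values are arbitrary.

```d
import std.algorithm.iteration : reduce, sum;

// Plain left-to-right addition, same as the reduce-based sumtest4.
double naiveSum(const(double)[] data) {
    return data.reduce!((a, b) => a + b);
}

// Phobos sum, which may use a more accurate (and slower) summation scheme.
double librarySum(const(double)[] data) {
    return data.sum;
}

void main() {
    import std.stdio : writefln;

    // One million copies of 0.1; the exact sum would be 100000.
    auto data = new double[](1_000_000);
    data[] = 0.1;

    writefln("reduce: %.10f", naiveSum(data));
    writefln("sum:    %.10f", librarySum(data));
}
```

The two printed values will typically differ in the trailing digits, which is the accuracy that .sum is paying for with the extra time.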

