P.S. After I posted the code, I took a closer look at the disassembly and found that gdc wasn't generating the best code for the parallel foreach loop body. I haven't fully traced the cause yet, but I did find a simple optimization (arguably a micro-optimization): updating the subtotal inside the inner loop is a bit inefficient, because the subtotal is outside the loop body and, due to the way std.parallelism is implemented, is passed by reference to the loop body. This caused gdc not to enregister the subtotal, so incrementing it requires a cache roundtrip at best, a full RAM roundtrip at worst.
So I introduced a local count variable for incrementing, and only assign to the subtotal array at the end of the block: -------snip------- // real 0m0.240s // user 0m1.045s // sys 0m0.059s size_t lineCount4(string filename) { import std.algorithm.comparison : min; import std.algorithm.iteration : sum; import std.mmfile; import std.parallelism; auto f = new MmFile(filename); auto data = cast(ubyte[]) f[]; size_t[] subtotals; enum blockSize = 16384; if (data.length == 0) return 0; subtotals.length = 1 + (data.length - 1) / blockSize; foreach (j, ref subtotal; parallel(subtotals)) { size_t end = min(blockSize*(j+1), data.length); size_t c = 0; foreach (i; blockSize*j .. end) { if (data[i] == '\n') c++; } subtotal = c; } return subtotals.sum; } -------snip------- A look at the disassembly showed that gdc has enregistered the subtotal, and thereby able to eliminate a branch from the inner loop using a sete instruction and an add for the if statement. The measured performance is only marginally better, though, and doesn't change the overall performance in a significant way. But I thought this might be a useful tip for people who want to squeeze out the last drop of juice from their CPUs. ;-) Next, I'll have to look into why gdc didn't unroll the inner loop, like I thought it would. But I'm out of time now, so that will have to wait for another day. T -- The peace of mind---from knowing that viruses which exploit Microsoft system vulnerabilities cannot touch Linux---is priceless. -- Frustrated system administrator.