Andrei Alexandrescu:
> My guess is that if you turn that off, the differences won't be as large
> (or even detectable for certain ranges of N).
Array bounds aren't checked: the code is compiled with -O -release -inline. Do you see any array bounds checks in the asm code at the bottom of my post?

> Probably blocking will bring even more mileage (but again that depends
> on N).

Yes, blocking may help, and using SSE instructions may help some more. The end result may be a hundred or more times faster than the naive D code :-)

Bye,
bearophile
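For reference, here is a minimal sketch of what blocking (loop tiling) means for the matrix product, written in D. It is not the code discussed in this thread. The function names `mulNaive` and `mulBlocked`, the row-major flat-array layout, the i-k-j loop order and the block size of 8 are all hypothetical choices made for illustration. The blocked version computes the same result, but it walks over bs x bs tiles so that the parts of `b` and `c` touched by the inner loops can stay in cache. The best block size depends on N and on the cache sizes.

```d
import std.algorithm : min;
import std.stdio : writeln;

/// Hypothetical naive product c = a * b (n x n, row-major), i-k-j loop order.
void mulNaive(const double[] a, const double[] b, double[] c, size_t n)
{
    c[] = 0.0;
    foreach (i; 0 .. n)
        foreach (k; 0 .. n)
        {
            immutable aik = a[i * n + k];
            foreach (j; 0 .. n)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/// Hypothetical blocked (tiled) product: same result as mulNaive, but the
/// loops work on bs x bs tiles so the inner loops reuse cached data.
void mulBlocked(const double[] a, const double[] b, double[] c, size_t n, size_t bs)
{
    c[] = 0.0;
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
            {
                // Clamp tile edges when n is not a multiple of bs.
                immutable iEnd = min(ii + bs, n);
                immutable kEnd = min(kk + bs, n);
                immutable jEnd = min(jj + bs, n);
                foreach (i; ii .. iEnd)
                    foreach (k; kk .. kEnd)
                    {
                        immutable aik = a[i * n + k];
                        foreach (j; jj .. jEnd)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}

void main()
{
    enum n = 37; // deliberately not a multiple of the block size
    auto a = new double[n * n];
    auto b = new double[n * n];
    foreach (i; 0 .. n * n)
    {
        a[i] = i % 7;
        b[i] = i % 5;
    }
    auto c1 = new double[n * n];
    auto c2 = new double[n * n];
    mulNaive(a, b, c1, n);
    mulBlocked(a, b, c2, n, 8); // block size 8 is an arbitrary example value
    assert(c1 == c2);
    writeln("ok");
}
```

Because the k index is still visited in increasing order for every (i, j), both versions add the terms in the same order and give identical results; only the memory access pattern changes.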