On 6/4/18 1:40 PM, Dennis wrote:
On Monday, 4 June 2018 at 15:43:20 UTC, Steven Schveighoffer wrote:
Note that it won't necessarily be as efficient, but it's likely
to be close.
I've compared the range versions with a for-loop. For ints and longs,
or for large strides, the times are roughly equal, but for bytes with
small strides the range versions can be up to twice as slow.
https://run.dlang.io/is/BoTflQ
50 MB array, type = byte, compiler = LDC -O4 -release:

             stride = 3   stride = 13
For-loop     18 ms        7.3 ms
fill(0)      33 ms        7.5 ms
each!        33 ms        7.8 ms
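For readers without the run.dlang.io link at hand, the three variants being timed are presumably along these lines. This is a minimal sketch, assuming the range versions use std.range.stride together with std.algorithm's fill and each; the helper names zeroLoop, zeroFill and zeroEach are hypothetical, and the actual benchmark code at the link may differ.

```d
import std.algorithm.iteration : each;
import std.algorithm.mutation : fill;
import std.range : stride;

// Hypothetical helpers; the benchmark at run.dlang.io may be written differently.

// Baseline: zero every `step`-th element with a plain for-loop.
void zeroLoop(T)(T[] arr, size_t step)
{
    assert(step > 0);
    for (size_t i = 0; i < arr.length; i += step)
        arr[i] = 0;
}

// Range version 1: a strided view of the array, filled via std.algorithm.fill.
void zeroFill(T)(T[] arr, size_t step)
{
    assert(step > 0);
    // Cast so the fill value has the element type
    // (an int variable would not implicitly convert to byte).
    arr.stride(step).fill(cast(T) 0);
}

// Range version 2: each! calls the lambda once per element of the strided
// range; taking the element by `ref` lets it write through to the array.
void zeroEach(T)(T[] arr, size_t step)
{
    assert(step > 0);
    arr.stride(step).each!((ref x) { x = 0; });
}

unittest
{
    // All three variants should produce identical results.
    byte[] a = new byte[](10);
    a[] = 1;
    byte[] b = a.dup, c = a.dup;
    zeroLoop(a, 3);
    zeroFill(b, 3);
    zeroEach(c, 3);
    byte[] expected = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0];
    assert(a == expected && b == expected && c == expected);
}
```

All three do the same work; the difference measured above comes from how well the compiler optimizes the range wrappers compared with the hand-written loop.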
Interesting!
BTW, do you have cross-module inlining enabled? If it wasn't on before,
I wonder whether turning it on makes a difference. (I'm speaking somewhat
from ignorance: I've heard people mention this limitation, but I'm not
sure exactly when it's enabled.)
-Steve