On 6/4/18 1:40 PM, Dennis wrote:
On Monday, 4 June 2018 at 15:43:20 UTC, Steven Schveighoffer wrote:
Note, it's not going to necessarily be as efficient, but it's likely to be close.

I've compared the range versions with a for-loop. For ints and longs, or for large strides, the times are roughly equal, but for bytes with small strides the range versions can be up to twice as slow.
https://run.dlang.io/is/BoTflQ
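
For readers who can't open the link, here is a minimal sketch of the three variants being compared. It is not the code behind the link above: the function names (zeroWithLoop, zeroWithFill, zeroWithEach) are made up for illustration, and it assumes the benchmark simply zeroes every stride-th element of an array.

```d
import std.algorithm.iteration : each;
import std.algorithm.mutation : fill;
import std.range : stride;

// Hypothetical helper: plain index loop that zeroes every `step`-th element.
void zeroWithLoop(T)(T[] arr, size_t step)
{
    for (size_t i = 0; i < arr.length; i += step)
        arr[i] = 0;
}

// Hypothetical helper: range version, fill() over a strided view of the array.
// The cast keeps the fill value the same type as the elements (e.g. byte).
void zeroWithFill(T)(T[] arr, size_t step)
{
    arr.stride(step).fill(cast(T) 0);
}

// Hypothetical helper: range version, each! with a ref parameter so the
// assignment writes through to the underlying array.
void zeroWithEach(T)(T[] arr, size_t step)
{
    arr.stride(step).each!((ref x) { x = 0; });
}
```

All three should leave the array in the same state, so any timing difference comes from how well the compiler optimizes the range wrappers, not from the amount of work done.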

50 MB array, type = byte, stride = 3, compiler = LDC -O4 -release
For-loop  18 ms
Fill(0)   33 ms
each!     33 ms

With stride = 13:
For-loop  7.3 ms
Fill(0)   7.5 ms
each!     7.8 ms

Interesting!

BTW, do you have cross-module inlining on? I wonder whether it would make a difference if you didn't have it on before. (I'm speaking somewhat from ignorance: I've heard people talk about this limitation, but I'm not sure exactly when it's enabled.)

-Steve
