On Sunday, 13 August 2017 at 08:43:29 UTC, amfvcg wrote:
On Sunday, 13 August 2017 at 08:33:53 UTC, Petar Kirov
[ZombineDev] wrote:
With Daniel's latest version (
http://forum.dlang.org/post/mailman.5963.1502612885.31550.digitalmars-d-le...@puremagic.com
)
$ ldc2 -O3 --release sum_subranges2.d
$ ./sum_subranges2
210 ms, 838 μs, and 8 hnsecs
50000000
Great!!! And that's what I was hoping for.
So the conclusion is:
use the latest ldc that's out there.
Thank you Petar, thank you Daniel. (I cannot change the subject
to SOLVED, can I?)
Btw, the idiomatic version of this D sample looks just how I
imagined it should!
There's one especially interesting result:
This instantiation:
sum_subranges(std.range.iota!(int, int).iota(int, int).Result,
uint)
of the following function:
auto sum_subranges(T)(T input, uint range)
{
    import std.range : chunks;
    import std.algorithm : map, sum;
    // Split the input into chunks of `range` elements and sum each chunk.
    return input.chunks(range).map!(sum);
}
gets optimized with LDC to:
push rax
test edi, edi
je .LBB2_2
mov edx, edi
mov rax, rsi
pop rcx
ret
.LBB2_2:
lea rsi, [rip + .L.str.3]
lea rcx, [rip + .L.str]
mov edi, 45
mov edx, 89
mov r8d, 6779
call _d_assert_msg@PLT
I.e. the compiler turned an O(n) algorithm into an O(1) one, which is
quite neat. It is also quite surprising to me that even dmd seems to
have managed a similar optimization:
sum_subranges(std.range.iota!(int, int).iota(int, int).Result,
uint):
push rbp
mov rbp,rsp
sub rsp,0x30
mov DWORD PTR [rbp-0x8],edi
mov r9d,DWORD PTR [rbp-0x8]
test r9,r9
jne 41
mov r8d,0x1b67
mov ecx,0x0
mov eax,0x61
mov rdx,rax
mov QWORD PTR [rbp-0x28],rdx
mov edx,0x0
mov edi,0x2d
mov rsi,rdx
mov rdx,QWORD PTR [rbp-0x28]
call 41
41: mov QWORD PTR [rbp-0x20],rsi
mov QWORD PTR [rbp-0x18],r9
mov rdx,QWORD PTR [rbp-0x18]
mov rax,QWORD PTR [rbp-0x20]
mov rsp,rbp
pop rbp
ret
Moral of the story: templates + ranges are an awesome combination.
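
For anyone who wants to try the function quoted above on its own, here is a minimal, self-contained sketch. The input (iota(0, 10)) and the chunk size (4) are made up for illustration and are not from the benchmark; sum and equal are imported explicitly so the snippet compiles by itself. Note that chunks and map are both lazy in Phobos, so calling sum_subranges only builds a range object and the per-chunk sums are computed while the result is iterated, which may be part of why the generated code for the function itself is so short.

```d
import std.algorithm : equal, map, sum;
import std.range : chunks, iota;

// Same shape as the function quoted above.
auto sum_subranges(T)(T input, uint range)
{
    return input.chunks(range).map!(sum);
}

void main()
{
    // Hypothetical small input: 0, 1, ..., 9 split into chunks of 4 elements.
    auto sums = iota(0, 10).sum_subranges(4);

    // The sums are computed here, while `equal` walks the lazy range:
    // 0+1+2+3 = 6, 4+5+6+7 = 22, 8+9 = 17.
    assert(sums.equal([6, 22, 17]));
}
```

Compile it the same way as the benchmark (for example with ldc2 -O3 --release) and compare the output of the two compilers if you are curious.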