On Sunday, 13 August 2017 at 08:43:29 UTC, amfvcg wrote:
On Sunday, 13 August 2017 at 08:33:53 UTC, Petar Kirov [ZombineDev] wrote:

With Daniel's latest version (http://forum.dlang.org/post/mailman.5963.1502612885.31550.digitalmars-d-le...@puremagic.com):

$ ldc2 -O3 --release sum_subranges2.d
$ ./sum_subranges2
210 ms, 838 μs, and 8 hnsecs
50000000

Great!!! And that's what I was hoping for.

So the conclusion is: use the latest LDC that's out there.

Thank you Petar, thank you Daniel. (I cannot change the subject to SOLVED, can I?)

Btw., the idiomatic version of this D sample looks just how I imagined it should!

There's one especially interesting result:

This instantiation:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result, uint)

of the following function:

auto sum_subranges(T)(T input, uint range)
{
    import std.range : chunks, ElementType, array;
    import std.algorithm : map, sum;
    return input.chunks(range).map!(sum);
}

gets optimized with LDC to:
  push rax
  test edi, edi
  je .LBB2_2
  mov edx, edi
  mov rax, rsi
  pop rcx
  ret
.LBB2_2:
  lea rsi, [rip + .L.str.3]
  lea rcx, [rip + .L.str]
  mov edi, 45
  mov edx, 89
  mov r8d, 6779
  call _d_assert_msg@PLT

I.e. the compiler turned an O(n) algorithm into an O(1) one, which is quite neat. I was also surprised to see that even dmd appears to manage a similar optimization:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result, uint):
    push   rbp
    mov    rbp,rsp
    sub    rsp,0x30
    mov    DWORD PTR [rbp-0x8],edi
    mov    r9d,DWORD PTR [rbp-0x8]
    test   r9,r9
    jne    41
    mov    r8d,0x1b67
    mov    ecx,0x0
    mov    eax,0x61
    mov    rdx,rax
    mov    QWORD PTR [rbp-0x28],rdx
    mov    edx,0x0
    mov    edi,0x2d
    mov    rsi,rdx
    mov    rdx,QWORD PTR [rbp-0x28]
    call   41
41: mov    QWORD PTR [rbp-0x20],rsi
    mov    QWORD PTR [rbp-0x18],r9
    mov    rdx,QWORD PTR [rbp-0x18]
    mov    rax,QWORD PTR [rbp-0x20]
    mov    rsp,rbp
    pop    rbp
    ret
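
For anyone who wants to poke at this, here is a minimal sketch of how the function can be exercised. The input `iota(0, 10)` and the chunk size `3` are my own illustrative values, not the ones from the benchmark. Keep in mind that `chunks` and `map` are lazy in Phobos: calling `sum_subranges` only constructs the range, and the sums are computed when the result is iterated. That may account for part of why both listings above are so short.

```d
import std.algorithm : equal, map, sum;
import std.range : chunks, iota;

// Same function as above, with `sum` imported at module level.
auto sum_subranges(T)(T input, uint range)
{
    return input.chunks(range).map!(sum);
}

void main()
{
    import std.stdio : writeln;

    // Illustrative input (an assumption, not the benchmark's data):
    // 0..9 split into chunks of 3 gives [0,1,2], [3,4,5], [6,7,8], [9].
    auto sums = sum_subranges(iota(0, 10), 3);

    // Nothing has been summed yet; iterating the range does the work.
    assert(sums.equal([3, 12, 21, 9]));
    writeln(sums); // prints [3, 12, 21, 9]
}
```

To see the cost of the actual summation rather than of building the range, the timed region has to include whatever consumes the result (for example `.array` or a `foreach`).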

Moral of the story: templates + ranges are an awesome combination.
