Re: D outperformed by C++, what am I doing wrong?

amfvcg via Digitalmars-d-learn Sun, 13 Aug 2017 02:21:11 -0700

On Sunday, 13 August 2017 at 09:08:14 UTC, Petar Kirov[ZombineDev] wrote:


There's one especially interesting result:


This instantiation:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result,uint)


of the following function:

auto sum_subranges(T)(T input, uint range)
{
    import std.range : chunks, ElementType, array;
    import std.algorithm : map, sum; // `sum` is needed by map!(sum) below
    return input.chunks(range).map!(sum);
}

gets optimized with LDC to:
  push rax
  test edi, edi
  je .LBB2_2
  mov edx, edi
  mov rax, rsi
  pop rcx
  ret
.LBB2_2:
  lea rsi, [rip + .L.str.3]
  lea rcx, [rip + .L.str]
  mov edi, 45
  mov edx, 89
  mov r8d, 6779
  call _d_assert_msg@PLT

I.e. the compiler turned an O(n) algorithm into an O(1) one, which is quite neat. It is also quite surprising to me that it looks like even dmd managed to do a similar optimization:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result,uint):

    push   rbp
    mov    rbp,rsp
    sub    rsp,0x30
    mov    DWORD PTR [rbp-0x8],edi
    mov    r9d,DWORD PTR [rbp-0x8]
    test   r9,r9
    jne    41
    mov    r8d,0x1b67
    mov    ecx,0x0
    mov    eax,0x61
    mov    rdx,rax
    mov    QWORD PTR [rbp-0x28],rdx
    mov    edx,0x0
    mov    edi,0x2d
    mov    rsi,rdx
    mov    rdx,QWORD PTR [rbp-0x28]
    call   41
41: mov    QWORD PTR [rbp-0x20],rsi
    mov    QWORD PTR [rbp-0x18],r9
    mov    rdx,QWORD PTR [rbp-0x18]
    mov    rax,QWORD PTR [rbp-0x20]
    mov    rsp,rbp
    pop    rbp
    ret

Moral of the story: templates + ranges are an awesome combination.

Change the parameter for this array size to be taken from stdin, and I assume that these optimizations will go away.
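
A minimal sketch of that experiment, assuming the element count arrives as a single integer on stdin. The helper name totalOfChunks and the chunk size of 2 are made up for illustration; they are not from the original benchmark. With the length known only at run time, the compiler can no longer fold the whole computation into a constant, though it may still optimize the per-chunk sums:

```d
import std.algorithm : map, sum;
import std.conv : to;
import std.range : chunks, iota;
import std.stdio : readln, writeln;
import std.string : strip;

// Same function as in the quoted post, with `sum` imported at module level.
auto sum_subranges(T)(T input, uint range)
{
    return input.chunks(range).map!(sum);
}

// Hypothetical helper: adds up all per-chunk sums, which forces the
// lazy range returned by sum_subranges to actually be evaluated.
int totalOfChunks(int n)
{
    return iota(0, n).sum_subranges(2).sum;
}

void main()
{
    // Assumption: one integer on stdin, e.g. `echo 1000000 | ./app`.
    auto n = readln().strip.to!int;
    writeln(totalOfChunks(n));
}
```

Comparing the disassembly of totalOfChunks against the listings above should show whether the optimization survives a run-time size.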
