Ivan Kazmenko:

I am studying the difference between the x86 code generated by DMD and by C/C++ compilers on Windows (simply put: why exactly, and by what margin, DMD-compiled D code is often slower than the GCC-compiled C/C++ equivalent).

Now, I have this simple D program:

-----
immutable int MAX_N = 1_000_000;
void main () {
    int [MAX_N] a;
    foreach (i; 0..MAX_N)
        a[i] = i;
}
-----

(I know there's iota in std.range, and it turns out to be even slower - but that's a high level function, and I'm trying to understand the lower-level details now.)

The assembly (dmd -O -release -inline -noboundscheck, then obj2asm) contains the following piece corresponding to the loop:

-----
L2C:            mov     -03D0900h[EDX*4][EBP],EDX
                mov     ECX,EDX
                inc     EDX
                cmp     EDX,0F4240h
                jb      L2C
-----

ldc2 optimizes the useless loop away:


__Dmain:
        xorl    %eax, %eax
        ret


If I modify the code to return some value from an int main, so the array is actually used:
    return a[7];


ldc2 gives the loop code:

LBB0_1:
        movl    %eax, 12(%esp,%eax,4)
        incl    %eax
        cmpl    $1000000, %eax
        jne     LBB0_1


If I use iota, ldc2 compiles the loop to exactly the same asm:

foreach (i; MAX_N.iota)
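
For reference, a minimal sketch of the complete program with both changes applied (iota in the foreach, and a returned element so the loop is not dead code). The helper name fillAndPick is only illustrative and is not in the original code:

```d
import std.range : iota;

immutable int MAX_N = 1_000_000;

// Hypothetical helper: fills a stack array through an iota range and
// returns one element, so the optimizer cannot discard the loop.
int fillAndPick () {
    int [MAX_N] a;               // about 4 MB on the stack, as in the original
    foreach (i; MAX_N.iota)      // same as iota (0, MAX_N)
        a[i] = i;
    return a[7];
}

int main () {
    // The process exit status carries a[7], mirroring the change above.
    return fillAndPick ();
}
```

Whether the iota version really reduces to the same loop depends on the compiler and flags, so it is worth checking the generated assembly for your own build.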

Bye,
bearophile
