Ivan Kazmenko:
I am studying the differences between the x86 code generated by DMD
and by C/C++ compilers on Windows (simply put: why exactly, and by
what margin, DMD-compiled D code is often slower than the
GCC-compiled C/C++ equivalent).
Now, I have this simple D program:
-----
immutable int MAX_N = 1_000_000;
void main () {
int [MAX_N] a;
foreach (i; 0..MAX_N)
a[i] = i;
}
-----
(I know there's iota in std.range, and it turns out to be even
slower - but that's a high level function, and I'm trying to
understand the lower-level details now.)
The assembly (dmd -O -release -inline -noboundscheck, then
obj2asm) contains the following piece corresponding to the loop:
-----
L2C: mov -03D0900h[EDX*4][EBP],EDX
mov ECX,EDX
inc EDX
cmp EDX,0F4240h
jb L2C
-----
ldc2 optimizes the useless loop away:
__Dmain:
xorl %eax, %eax
ret
If I change main to return int and return some value from it:
return a[7];
ldc2 gives the loop code:
LBB0_1:
movl %eax, 12(%esp,%eax,4)
incl %eax
cmpl $1000000, %eax
jne LBB0_1
If I use iota, ldc2 compiles the loop to exactly the same asm:
foreach (i; MAX_N.iota)
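For reference, a complete version of the modified program might look roughly like the sketch below. The helper name fillAndPeek is hypothetical and only there so the loop can be called on its own. The asm for this exact layout has not been checked, and it assumes the stack is large enough for the 4 MB array.

```d
import std.range : iota;

immutable int MAX_N = 1_000_000;

// Hypothetical helper name; the body is the original loop written with iota.
int fillAndPeek () {
    int [MAX_N] a;            // about 4 MB on the stack
    foreach (i; MAX_N.iota)   // write 0..MAX_N here for the plain-loop variant
        a[i] = i;
    return a[7];              // the result now depends on the array contents
}

int main () {
    return fillAndPeek ();    // the process exit code is a[7], that is 7
}
```

Swapping MAX_N.iota for 0..MAX_N gives the plain-loop variant, so the two can be compared by compiling each and looking at the disassembly.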
Bye,
bearophile