On 12 March 2012 19:03, Iain Buclaw <ibuc...@ubuntu.com> wrote:

> On 12 March 2012 00:44, Manu <turkey...@gmail.com> wrote:
> > On 12 March 2012 00:58, Robert Jacques <sandf...@jhu.edu> wrote:
> >>
> >> That's an argument for using the right register for the job. And we
> >> can / will be doing this on x86-64, as other compilers have already
> >> done. Manu was arguing that MRV were somehow special and had mystical
> >> optimization potential. That's simply not true.
> >
> >
> > Here's some tests for you:
> >
> > // first test that the argument registers allocate as expected...
> > int gprtest(int x, int y, int z)
> > {
> >     return x+y+z;
> > }
> >
> > Perfect, ints pass in register sequence, return in r0, no memory access
> >     add r0, r0, r1
> >     add r0, r0, r2
> >     bx lr
> >
> > float fptest(float x, float y, float z)
> > {
> >     return x+y+z;
> > }
> >
> > Same for floats
> >     fadds s0, s0, s1
> >     fadds s0, s0, s2
> >     bx lr
> >
> >
> > // Some MRV tests...
> > auto mrv1(int x, int z)
> > {
> >     return Tuple!(int, int)(x, z);
> > }
> >
> > Simple case, 2 ints
> > FAIL, stores the 2 arguments it received in regs straight to output
> > struct pointer supplied
> >     stmia r0, {r1, r2}
> >     bx lr
> >
> >
> > auto mrv2(int x, float y, byte z)
> > {
> >     return Tuple!(int, float, byte)(x, y, z);
> > }
> >
> > Different typed things
> > EPIC FAIL
> >     stmfd sp!, {r4, r5}
> >     mov ip, #0
> >     sub sp, sp, #24
> >     mov r4, r2
> >     str ip, [sp, #12]
> >     str ip, [sp, #20]
> >     ldr r2, .L27
> >     add ip, sp, #24
> >     mov r3, r0
> >     mov r5, r1
> >     str r2, [sp, #16] @ float
> >     ldmdb ip, {r0, r1, r2}
> >     stmia r3, {r0, r1, r2}
> >     fsts s0, [r3, #4]
> >     stmia sp, {r0, r1, r2}
> >     str r5, [r3, #0]
> >     strb r4, [r3, #8]
> >     mov r0, r3
> >     add sp, sp, #24
> >     ldmfd sp!, {r4, r5}
> >     bx lr
> >
> >
> > auto range(int *p)
> > {
> >     return p[0..1];
> > }
> >
> > Range
> > SURPRISE FAIL, even a range is returned as a struct! O_O
> >     mov r2, #1
> >     str r2, [r0, #0]
> >     str r1, [r0, #4]
> >     bx lr
> >
> >
> > So the D ABI is a complete shambles on ARM!
> > Unsurprisingly, it all just follows the return struct by-val ABI, which
> > is to write it to the stack unconditionally. And sadly, it even thinks
> > the internal types like range+delegate are just a struct by-val, and
> > completely ruins those!
> >
> > Let's try again with x86...
> >
> >
> > auto mrv1(int x, int z)
> > {
> >     return Tuple!(int, int)(x, z);
> > }
> >
> > Returns in eax/edx as expected
> >     movl 4(%esp), %eax
> >     movl 8(%esp), %edx
> >
> >
> > auto mrv2(int x, float y, int z)
> > {
> >     return Tuple!(int, float, int)(x, y, z);
> > }
> >
> > FAIL! All written to a struct rather than returning in eax,edx,st0 ..
> > This is C ABI baggage, D can do better.
> >     movl 4(%esp), %eax
> >     movl 8(%esp), %edx
> >     movl %edx, (%eax)
> >     movl 12(%esp), %edx
> >     movl %edx, 4(%eax)
> >     movl 16(%esp), %edx
> >     movl %edx, 8(%eax)
> >     ret $4
> >
> >
> > auto range(int *p)
> > {
> >     return p[0..1];
> > }
> >
> > Obviously, the small struct optimisation allows this to work properly
> >     movl $1, %eax
> >     movl 4(%esp), %edx
> >     ret
> >
> >
> > All that said, x86 isn't a good test case, since all args are ALWAYS
> > passed on the stack. x64 would be a much better test since it actually
> > has arg registers, but I'm on windows, so no x64 for me...
>
>
> What compiler flags are you using here? For x86, I would have thought
> that small structs (< 8 bytes) would be passed back in registers...
> only speculating though - will need to see what codegen is being built
> from the D code provided to be sure.
>
-S -O2 -msse2

And as expected, the 8-byte structs in my examples above were returned packed in registers. That's a traditional x86 ABI hack which conveniently allows delegates+ranges to work well on x86, but as you can see, they're properly broken on other architectures.
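
For reference, a quick way to see which of the return types above fall under that 8-byte limit is to check their sizes at compile time. This is only a minimal sketch, not part of the tests above: the alias names Mrv1, Mrv2 and Slice are made up for illustration, and the slice only comes out at 8 bytes on a 32-bit target where size_t is 4 bytes.

```d
import std.stdio : writefln;
import std.typecons : Tuple;

// Hypothetical aliases for the return types of the x86 tests quoted above.
alias Mrv1  = Tuple!(int, int);        // mrv1: came back in eax/edx
alias Mrv2  = Tuple!(int, float, int); // mrv2: written through a hidden pointer
alias Slice = int[];                   // range: pointer + length pair

// Two ints pack into exactly 8 bytes.
static assert(Mrv1.sizeof == 8);
// Three 4-byte fields make 12 bytes, which is over the 8-byte limit.
static assert(Mrv2.sizeof == 12);
// A slice is a length plus a pointer, so 8 bytes when size_t is 4 bytes.
static assert(Slice.sizeof == 2 * size_t.sizeof);

void main()
{
    writefln("Tuple!(int, int):        %s bytes", Mrv1.sizeof);
    writefln("Tuple!(int, float, int): %s bytes", Mrv2.sizeof);
    writefln("int[]:                   %s bytes", Slice.sizeof);
}
```

The sizes alone match what the x86 output showed: the two 8-byte types came back in eax/edx, and the 12-byte one did not.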