On 12 March 2012 17:49, Iain Buclaw <ibuc...@ubuntu.com> wrote:
> On 12 March 2012 17:22, Manu <turkey...@gmail.com> wrote:
>> On 12 March 2012 19:03, Iain Buclaw <ibuc...@ubuntu.com> wrote:
>>>
>>> On 12 March 2012 00:44, Manu <turkey...@gmail.com> wrote:
>>> > On 12 March 2012 00:58, Robert Jacques <sandf...@jhu.edu> wrote:
>>> >>
>>> >> That's an argument for using the right register for the job. And we
>>> >> can / will be doing this on x86-64, as other compilers have already
>>> >> done. Manu was arguing that MRV were somehow special and had mystical
>>> >> optimization potential. That's simply not true.
>>> >
>>> >
>>> > Here's some tests for you:
>>> >
>>> > // first test that the argument registers allocate as expected...
>>> > int gprtest(int x, int y, int z)
>>> > {
>>> >     return x+y+z;
>>> > }
>>> >
>>> > Perfect, ints pass in register sequence, return in r0, no memory access
>>> >     add r0, r0, r1
>>> >     add r0, r0, r2
>>> >     bx lr
>>> >
>>> > float fptest(float x, float y, float z)
>>> > {
>>> >     return x+y+z;
>>> > }
>>> >
>>> > Same for floats
>>> >     fadds s0, s0, s1
>>> >     fadds s0, s0, s2
>>> >     bx lr
>>> >
>>> >
>>> > // Some MRV tests...
>>> > auto mrv1(int x, int z)
>>> > {
>>> >     return Tuple!(int, int)(x, z);
>>> > }
>>> >
>>> > Simple case, 2 ints
>>> > FAIL, stores the 2 arguments it received in regs straight to output
>>> > struct pointer supplied
>>> >     stmia r0, {r1, r2}
>>> >     bx lr
>>> >
>>> >
>>> > auto mrv2(int x, float y, byte z)
>>> > {
>>> >     return Tuple!(int, float, byte)(x, y, z);
>>> > }
>>> >
>>> > Different typed things
>>> > EPIC FAIL
>>> >     stmfd sp!, {r4, r5}
>>> >     mov ip, #0
>>> >     sub sp, sp, #24
>>> >     mov r4, r2
>>> >     str ip, [sp, #12]
>>> >     str ip, [sp, #20]
>>> >     ldr r2, .L27
>>> >     add ip, sp, #24
>>> >     mov r3, r0
>>> >     mov r5, r1
>>> >     str r2, [sp, #16] @ float
>>> >     ldmdb ip, {r0, r1, r2}
>>> >     stmia r3, {r0, r1, r2}
>>> >     fsts s0, [r3, #4]
>>> >     stmia sp, {r0, r1, r2}
>>> >     str r5, [r3, #0]
>>> >     strb r4, [r3, #8]
>>> >     mov r0, r3
>>> >     add sp, sp, #24
>>> >     ldmfd sp!, {r4, r5}
>>> >     bx lr
>>> >
>>> >
>>> > auto range(int *p)
>>> > {
>>> >     return p[0..1];
>>> > }
>>> >
>>> > Range
>>> > SURPRISE FAIL, even a range is returned as a struct! O_O
>>> >     mov r2, #1
>>> >     str r2, [r0, #0]
>>> >     str r1, [r0, #4]
>>> >     bx lr
>>> >
>>> >
>>> > So the D ABI is a complete shambles on ARM!
>>> > Unsurprisingly, it all just follows the return struct by-val ABI,
>>> > which is to write it to the stack unconditionally. And sadly, it even
>>> > thinks the internal types like range+delegate are just a struct
>>> > by-val, and completely ruins those!
>>> >
>>> > Let's try again with x86...
>>> >
>>> >
>>> > auto mrv1(int x, int z)
>>> > {
>>> >     return Tuple!(int, int)(x, z);
>>> > }
>>> >
>>> > Returns in eax/edx as expected
>>> >     movl 4(%esp), %eax
>>> >     movl 8(%esp), %edx
>>> >
>>> >
>>> > auto mrv2(int x, float y, int z)
>>> > {
>>> >     return Tuple!(int, float, int)(x, y, z);
>>> > }
>>> >
>>> > FAIL! All written to a struct rather than returning in eax,edx,st0 ..
>>> > This is C ABI baggage, D can do better.
>>> >     movl 4(%esp), %eax
>>> >     movl 8(%esp), %edx
>>> >     movl %edx, (%eax)
>>> >     movl 12(%esp), %edx
>>> >     movl %edx, 4(%eax)
>>> >     movl 16(%esp), %edx
>>> >     movl %edx, 8(%eax)
>>> >     ret $4
>>> >
>>> >
>>> > auto range(int *p)
>>> > {
>>> >     return p[0..1];
>>> > }
>>> >
>>> > Obviously, the small struct optimisation allows this to work properly
>>> >     movl $1, %eax
>>> >     movl 4(%esp), %edx
>>> >     ret
>>> >
>>> >
>>> > All that said, x86 isn't a good test case, since all args are ALWAYS
>>> > passed on the stack. x64 would be a much better test since it actually
>>> > has arg registers, but I'm on windows, so no x64 for me...
>>>
>>>
>>> What compiler flags are you using here? For x86, I would have thought
>>> that small structs (< 8 bytes) would be passed back in registers...
>>> only speculating though - will need to see what codegen is being built
>>> from the D code provided to be sure.
>>
>>
>> -S -O2 -msse2
>> And as expected, 8byte structs were returned packed in registers from my
>> examples above. That's a traditional x86 ABI hack which conveniently allows
>> delegates+ranges to work well on x86, but as you can see, they're proper
>> broken on other architectures.
>
> OK, -msse2 is not an ARM target option. :~)
>
>
> Looking around, the "Procedure Call Standard for the ARM Architecture"
> specifically says (section 5.4: Result Return):
>
> "A Composite Type not larger than 4 bytes is returned in R0."
>
> "A Composite Type larger than 4 bytes ... is stored in memory at an
> address passed as an extra argument when the function was called ..."
>
>
>
> Feel free to correct me if that document is slightly out of date.
>
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';
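To make the rule quoted above concrete, here is a rough sketch that applies its size test to the types from Manu's examples. The helper `returnedInR0` and the struct `Small` are hypothetical names made up for illustration; they are not part of druntime, Phobos or GDC. It assumes `T.sizeof` is the size the ABI sees, and it only models the two quoted sentences, not whatever the elided "..." covers.

```d
import std.typecons : Tuple;

// Hypothetical helper: true when the quoted base rule ("a Composite Type
// not larger than 4 bytes is returned in R0") would apply to T.
// Assumption: T.sizeof matches the size the ABI uses for the composite.
enum bool returnedInR0(T) = T.sizeof <= 4;

// Hypothetical 4-byte composite, small enough for the R0 case.
struct Small { short a; short b; }

static assert( returnedInR0!Small);

// The types from the tests above are all larger than 4 bytes, so under
// the quoted rule they go through memory at a caller-supplied address.
static assert(!returnedInR0!(Tuple!(int, int)));
static assert(!returnedInR0!(Tuple!(int, float, byte)));
static assert(!returnedInR0!(int[]));

void main() {}
```

If that reading of the standard is right, the ARM output for mrv1, mrv2 and range above is what the C calling convention asks for, and doing better would need a D-specific convention.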
Link: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';