https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100219
Bug ID: 100219
Summary: Arm/Cortex-M: Suboptimal code returning unaligned struct with non-empty stack frame
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---

Consider the program below, which deals with functions returning a struct of two members, either using a literal value or by forwarding the return value from another function.

When the struct has no explicit alignment, this results in suboptimal code that breaks the struct (stored in a single register) apart into its members and reassembles them into a single register again, where it could just have done absolutely nothing. Giving the struct some alignment somehow prevents this problem from occurring.

Consider this program:

    $ cat Foo.c
    struct Result { char a, b; }
    #if defined(ALIGN)
    __attribute__((aligned(ALIGN)))
    #endif
    ;

    struct Result other(const int*);

    struct Result func1() {
      int x;
      return other(&x);
    }

    struct Result func2() {
      struct Result y = {0x12, 0x34};
      return y;
    }

    struct Result func3() {
      return other(0);
    }

Which produces the following code:

    $ arm-linux-gnueabi-gcc-10 --version
    arm-linux-gnueabi-gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0

    $ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c && objdump -d Foo.o

    00000000 <func1>:
       0:   b500            push    {lr}
       2:   b083            sub     sp, #12
       4:   a801            add     r0, sp, #4
       6:   f7ff fffe       bl      0 <other>
       a:   4603            mov     r3, r0
       c:   b2da            uxtb    r2, r3
       e:   2000            movs    r0, #0
      10:   f362 0007       bfi     r0, r2, #0, #8
      14:   f3c3 2307       ubfx    r3, r3, #8, #8
      18:   f363 200f       bfi     r0, r3, #8, #8
      1c:   b003            add     sp, #12
      1e:   f85d fb04       ldr.w   pc, [sp], #4
      22:   bf00            nop

    00000024 <func2>:
      24:   f243 4312       movw    r3, #13330      ; 0x3412
      28:   f003 0212       and.w   r2, r3, #18
      2c:   2000            movs    r0, #0
      2e:   f362 0007       bfi     r0, r2, #0, #8
      32:   0a1b            lsrs    r3, r3, #8
      34:   b082            sub     sp, #8
      36:   f363 200f       bfi     r0, r3, #8, #8
      3a:   b002            add     sp, #8
      3c:   4770            bx      lr
      3e:   bf00            nop

    00000040 <func3>:
      40:   b082            sub     sp, #8
      42:   2000            movs    r0, #0
      44:   b002            add     sp, #8
      46:   f7ff bffe       b.w     0 <other>
      4a:   bf00            nop

Especially note func2, which correctly builds the struct using a single word literal, and then proceeds to break it apart and rebuild it.

Note that I added -fno-stack-protector to make the generated code more concise, but the problem occurs even without this option.

Somehow, the alignment influences this, since adding some alignment makes the problem disappear:

    $ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c -DALIGN=2 && objdump -d Foo.o

    Foo.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <func1>:
       0:   b500            push    {lr}
       2:   b083            sub     sp, #12
       4:   a801            add     r0, sp, #4
       6:   f7ff fffe       bl      0 <other>
       a:   b003            add     sp, #12
       c:   f85d fb04       ldr.w   pc, [sp], #4

    00000010 <func2>:
      10:   f243 4012       movw    r0, #13330      ; 0x3412
      14:   4770            bx      lr
      16:   bf00            nop

    00000018 <func3>:
      18:   2000            movs    r0, #0
      1a:   f7ff bffe       b.w     0 <other>
      1e:   bf00            nop

Other things I've observed:
 - When using ALIGN=2 or ALIGN=4, the problem disappears as shown above. ALIGN=1 is equivalent to no alignment. Using ALIGN=8 also makes the problem disappear, but it seems this causes the return value to be passed in memory, rather than in r0 directly.
 - Using -mcpu=arm8, or arm7tdmi, or some other arm cpus I tried, the problem disappears. With all cortex variants I tried the problem stays, though sometimes it seems slightly less severe.
 - I could not reproduce this on x86_64.
 - Using a struct with just 1 char, the problem disappears.
 - Using a struct with 4 chars, the problem stays (and becomes more pronounced because there's more work to rebuild the struct).
 - Using a struct with 2 shorts, the problem disappears for func2, but stays for func1.
 - Writing something equivalent in C++, the problem also appears (I originally saw this problem in C++ and then tried reproducing in C).
 - When running with -Os, the problem disappears for func2 but stays for func1.
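To make the ALIGN observations easier to compare, here is a minimal sketch that reports the size and alignment of the struct under each setting. The type names and the helper returned_in_r0() are hypothetical and exist only for illustration. The helper encodes an assumption that is not verified in this report: that the AAPCS32 returns a composite in r0 only when it is no larger than 4 bytes, which would explain why ALIGN=8 (size 8) goes through memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type names; each mirrors struct Result with a different ALIGN. */
struct result_plain { char a, b; };
struct result_a1 { char a, b; } __attribute__((aligned(1)));
struct result_a2 { char a, b; } __attribute__((aligned(2)));
struct result_a4 { char a, b; } __attribute__((aligned(4)));
struct result_a8 { char a, b; } __attribute__((aligned(8)));

/* ASSUMPTION (not confirmed in this report): under the AAPCS32 a composite
   is returned in r0 only if its size does not exceed 4 bytes. */
int returned_in_r0(size_t size)
{
    return size <= 4;
}

/* ALIGN=1 should leave the layout identical to the plain struct, and
   larger alignments should pad the size up to a multiple of ALIGN. */
void check_layouts(void)
{
    assert(sizeof(struct result_a1) == sizeof(struct result_plain));
    assert(_Alignof(struct result_a1) == _Alignof(struct result_plain));
    assert(sizeof(struct result_a4) % 4 == 0);
    assert(sizeof(struct result_a8) % 8 == 0);
}

/* Print size, alignment and the assumed return location for each variant. */
void print_layouts(void)
{
    printf("plain:   size=%zu align=%zu r0=%d\n", sizeof(struct result_plain),
           _Alignof(struct result_plain), returned_in_r0(sizeof(struct result_plain)));
    printf("ALIGN=1: size=%zu align=%zu r0=%d\n", sizeof(struct result_a1),
           _Alignof(struct result_a1), returned_in_r0(sizeof(struct result_a1)));
    printf("ALIGN=2: size=%zu align=%zu r0=%d\n", sizeof(struct result_a2),
           _Alignof(struct result_a2), returned_in_r0(sizeof(struct result_a2)));
    printf("ALIGN=4: size=%zu align=%zu r0=%d\n", sizeof(struct result_a4),
           _Alignof(struct result_a4), returned_in_r0(sizeof(struct result_a4)));
    printf("ALIGN=8: size=%zu align=%zu r0=%d\n", sizeof(struct result_a8),
           _Alignof(struct result_a8), returned_in_r0(sizeof(struct result_a8)));
}
```

Calling check_layouts() and print_layouts() from a main() is enough to try this. With GCC I would expect the size to stay at 2 bytes for the plain, ALIGN=1 and ALIGN=2 variants and to grow to 4 and 8 bytes for ALIGN=4 and ALIGN=8, so only the last one would fall outside the assumed 4-byte limit.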
Also note that in almost all cases (except with ALIGN=4 and no stack variables), the stack frame size seems to be 8 bytes bigger than I'd expect, and in some cases there are some pointless add/sub instructions messing with the stack for no reason that is apparent to me.
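The claim that the unaligned func1 and func2 sequences could have done nothing can be checked with a small C model of the instructions involved. This is only a sketch: bfi() and ubfx() are simplified, hypothetical models of the Thumb-2 instructions (they assume a width below 32), and func1_repack() and func2_repack() transcribe the register operations from the first disassembly, leaving out the stack adjustments.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of BFI: insert the low `width` bits of src into dst
   at bit position `lsb`. Assumes width < 32. */
uint32_t bfi(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}

/* Simplified model of UBFX: extract `width` bits starting at `lsb`,
   zero-extended. Assumes width < 32. */
uint32_t ubfx(uint32_t src, unsigned lsb, unsigned width)
{
    return (src >> lsb) & ((1u << width) - 1u);
}

/* Offsets a..18 of the unaligned func1: repack the r0 returned by other(). */
uint32_t func1_repack(uint32_t r0)
{
    uint32_t r3 = r0;            /* mov  r3, r0          */
    uint32_t r2 = r3 & 0xffu;    /* uxtb r2, r3          */
    r0 = 0;                      /* movs r0, #0          */
    r0 = bfi(r0, r2, 0, 8);      /* bfi  r0, r2, #0, #8  */
    r3 = ubfx(r3, 8, 8);         /* ubfx r3, r3, #8, #8  */
    r0 = bfi(r0, r3, 8, 8);      /* bfi  r0, r3, #8, #8  */
    return r0;
}

/* Offsets 24..36 of the unaligned func2, without the sub/add on sp. */
uint32_t func2_repack(void)
{
    uint32_t r3 = 0x3412u;       /* movw  r3, #13330     */
    uint32_t r2 = r3 & 18u;      /* and.w r2, r3, #18    */
    uint32_t r0 = 0;             /* movs  r0, #0         */
    r0 = bfi(r0, r2, 0, 8);      /* bfi   r0, r2, #0, #8 */
    r3 >>= 8;                    /* lsrs  r3, r3, #8     */
    r0 = bfi(r0, r3, 8, 8);      /* bfi   r0, r3, #8, #8 */
    return r0;
}

/* Returns 1 if func1_repack() leaves every possible 16-bit struct value
   unchanged, 0 otherwise. */
int repack_is_identity(void)
{
    for (uint32_t v = 0; v <= 0xffffu; v++)
        if (func1_repack(v) != v)
            return 0;
    return 1;
}
```

In this model the only effect of the func1 sequence is to clear bits 16..31 of r0, which lie outside the two-byte struct, and func2_repack() ends up with the same 0x3412 that the ALIGN=2 build loads with a single movw.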