https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100219

            Bug ID: 100219
           Summary: Arm/Cortex-M: Suboptimal code returning unaligned
                    struct with non-empty stack frame
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

Consider the program below, which deals with functions returning a struct of
two members, either using a literal value or by forwarding the return value
from another function. When the struct has no explicit alignment, this results
in suboptimal code that breaks the struct (stored in a single register) apart
into its members and reassembles them into a single register again, where it
could just have done absolutely nothing. Giving the struct some alignment
somehow prevents this problem from occurring.

Consider this program:

    $ cat Foo.c
    struct Result { char a, b; }
    #if defined(ALIGN)
    __attribute__((aligned(ALIGN)))
    #endif
    ;

    struct Result other(const int*);

    struct Result func1() {
      int x;
      return other(&x);
    }

    struct Result func2() {
      struct Result y = {0x12, 0x34};
      return y;
    }

    struct Result func3() {
      return other(0);
    }

Which produces the following code:

    $ arm-linux-gnueabi-gcc-10 --version
    arm-linux-gnueabi-gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
    $ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c && objdump -d Foo.o

    00000000 <func1>:
       0:   b500            push    {lr}
       2:   b083            sub     sp, #12
       4:   a801            add     r0, sp, #4
       6:   f7ff fffe       bl      0 <other>
       a:   4603            mov     r3, r0
       c:   b2da            uxtb    r2, r3
       e:   2000            movs    r0, #0
      10:   f362 0007       bfi     r0, r2, #0, #8
      14:   f3c3 2307       ubfx    r3, r3, #8, #8
      18:   f363 200f       bfi     r0, r3, #8, #8
      1c:   b003            add     sp, #12
      1e:   f85d fb04       ldr.w   pc, [sp], #4
      22:   bf00            nop

    00000024 <func2>:
      24:   f243 4312       movw    r3, #13330      ; 0x3412
      28:   f003 0212       and.w   r2, r3, #18
      2c:   2000            movs    r0, #0
      2e:   f362 0007       bfi     r0, r2, #0, #8
      32:   0a1b            lsrs    r3, r3, #8
      34:   b082            sub     sp, #8
      36:   f363 200f       bfi     r0, r3, #8, #8
      3a:   b002            add     sp, #8
      3c:   4770            bx      lr
      3e:   bf00            nop

    00000040 <func3>:
      40:   b082            sub     sp, #8
      42:   2000            movs    r0, #0
      44:   b002            add     sp, #8
      46:   f7ff bffe       b.w     0 <other>
      4a:   bf00            nop


Especially note func2, which correctly builds the struct using a single word
literal, and then continues to break it apart and rebuild it.
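
To make the redundancy concrete, here is a rough C model of the func1 and
func2 instruction sequences above. The helper names (bfi, ubfx,
func1_tail_model, func2_model, check_models) are made up for this sketch, and
the helpers only model the bit-field behaviour of the instructions for widths
below 32, nothing else. Under those assumptions, both sequences reduce to an
identity on the low 16 bits of the register:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "bfi rd, rn, #lsb, #width": copy the low `width` bits of rn
   into rd starting at bit `lsb`. Assumes width < 32. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (rd & ~mask) | ((rn << lsb) & mask);
}

/* Model of "ubfx rd, rn, #lsb, #width": extract an unsigned bit-field.
   Assumes width < 32. */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width) {
    return (rn >> lsb) & ((1u << width) - 1u);
}

/* Instructions at offsets 0xa..0x18 of func1, applied to the value that
   other() left in r0. */
static uint32_t func1_tail_model(uint32_t r0) {
    uint32_t r3 = r0;            /* mov  r3, r0         */
    uint32_t r2 = r3 & 0xffu;    /* uxtb r2, r3         */
    r0 = 0;                      /* movs r0, #0         */
    r0 = bfi(r0, r2, 0, 8);      /* bfi  r0, r2, #0, #8 */
    r3 = ubfx(r3, 8, 8);         /* ubfx r3, r3, #8, #8 */
    r0 = bfi(r0, r3, 8, 8);      /* bfi  r0, r3, #8, #8 */
    return r0;
}

/* Register-only part of func2 (the sp adjustments are left out). */
static uint32_t func2_model(void) {
    uint32_t r3 = 0x3412u;       /* movw  r3, #13330    */
    uint32_t r2 = r3 & 18u;      /* and.w r2, r3, #18   */
    uint32_t r0 = 0;             /* movs  r0, #0        */
    r0 = bfi(r0, r2, 0, 8);      /* bfi   r0, r2, #0, #8 */
    r3 >>= 8;                    /* lsrs  r3, r3, #8    */
    r0 = bfi(r0, r3, 8, 8);      /* bfi   r0, r3, #8, #8 */
    return r0;
}

/* Both models give back the 16-bit struct image they started from. */
static void check_models(void) {
    assert(func2_model() == 0x3412u);
    assert(func1_tail_model(0x00003412u) == 0x3412u);
}
```

The only effect the model shows beyond a plain copy is that bits 16..31 of r0
end up cleared. As far as I understand the AAPCS, bits of r0 outside the
returned struct are unspecified, so that clearing should not be needed either.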

Note that I added -fno-stack-protector to make the generated code more concise,
but the problem occurs even without this option.

Somehow, the alignment influences this, since adding some alignment makes the
problem disappear:

    $ arm-linux-gnueabi-gcc-10 -fno-stack-protector -mcpu=cortex-m4 -c -O3 ~/Foo.c -DALIGN=2 && objdump -d Foo.o

    Foo.o:     file format elf32-littlearm


    Disassembly of section .text:

    00000000 <func1>:
       0:   b500            push    {lr}
       2:   b083            sub     sp, #12
       4:   a801            add     r0, sp, #4
       6:   f7ff fffe       bl      0 <other>
       a:   b003            add     sp, #12
       c:   f85d fb04       ldr.w   pc, [sp], #4

    00000010 <func2>:
      10:   f243 4012       movw    r0, #13330      ; 0x3412
      14:   4770            bx      lr
      16:   bf00            nop

    00000018 <func3>:
      18:   2000            movs    r0, #0
      1a:   f7ff bffe       b.w     0 <other>
      1e:   bf00            nop


Other things I've observed:
 - When using ALIGN=2 or ALIGN=4, the problem disappears as shown above.
ALIGN=1 is equivalent to no alignment. Using ALIGN=8 also makes the problem
disappear, but it seems this causes the return value to be passed in memory,
rather than in r0 directly.
 - Using -mcpu=arm8, or arm7tdmi, or some other arm cpus I tried, the problem
disappears. With all cortex variants I tried the problem stays, though
sometimes it seems slightly less severe.
 - I could not reproduce this on x86_64.
 - Using a struct with just 1 char, the problem disappears.
 - Using a struct with 4 chars, the problem stays (and becomes more pronounced
because there's more work to rebuild the struct).
 - Using a struct with 2 shorts, the problem disappears for func2, but stays
for func1.
 - Writing something equivalent in C++, the problem also appears (I originally
saw this problem in C++ and then tried reproducing in C).
 - When running with -Os, the problem disappears for func2 but stays for func1.
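
A possible explanation for the ALIGN=8 observation, as a sketch: the aligned
attribute also pads the struct size up to a multiple of the alignment, and the
AAPCS only returns composite types of at most 4 bytes in r0. The type names
and the fits_in_r0 helper below are hypothetical, and the helper is a
simplified paraphrase of that rule, not the compiler's actual logic:

```c
#include <assert.h>
#include <stddef.h>

/* Same members as struct Result, with the alignments tried above. */
struct Plain  { char a, b; };
struct Align2 { char a, b; } __attribute__((aligned(2)));
struct Align4 { char a, b; } __attribute__((aligned(4)));
struct Align8 { char a, b; } __attribute__((aligned(8)));

/* Simplified AAPCS rule: a composite type no larger than 4 bytes is
   returned in r0, anything larger is returned through memory. */
static int fits_in_r0(size_t size) {
    return size <= 4;
}

/* The size is rounded up to the alignment, so only Align8 exceeds 4 bytes. */
static void check_sizes(void) {
    assert(sizeof(struct Plain)  == 2 && _Alignof(struct Plain)  == 1);
    assert(sizeof(struct Align2) == 2 && _Alignof(struct Align2) == 2);
    assert(sizeof(struct Align4) == 4 && _Alignof(struct Align4) == 4);
    assert(sizeof(struct Align8) == 8 && _Alignof(struct Align8) == 8);
}
```

If that reading is right, ALIGN=8 does not really avoid the problem; it just
moves the struct out of the "returned in r0" case altogether.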

Also note that in almost all cases (except with ALIGN=4 and no stack
variables), the stack frame size seems to be 8 bytes bigger than I'd expect,
and in some cases there are some pointless add/sub instructions adjusting
the stack pointer for no reason that is apparent to me.
