[Bug c/112573] New: Suboptimal code generation with `-fdata-sections` on aarch64

jwerner at chromium dot org via Gcc-bugs Thu, 16 Nov 2023 13:30:14 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573


            Bug ID: 112573
           Summary: Suboptimal code generation with `-fdata-sections` on
                    aarch64
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jwerner at chromium dot org
  Target Milestone: ---
            Target: aarch64

I noticed some odd code generation behavior on aarch64 that appears to be a
regression introduced in GCC 12 or 13. Consider the following minimal test case:

char a[32];

void f(int x, int y)
{
        a[y + 3] = (char)x;
        a[y + 2] = (char)(x >> 8);
        a[y + 1] = (char)(x >> 16);
        a[y + 0] = (char)(x >> 24);
}

void h(int x, int y)
{
        *((a + y) + 3) = (char)x;
        *((a + y) + 2) = (char)(x >> 8);
        *((a + y) + 1) = (char)(x >> 16);
        *((a + y) + 0) = (char)(x >> 24);
}

These two functions should, of course, be 100% equivalent. Strangely enough
they still don't give the same code when compiling with -Os, but the difference
is minimal:

0000000000000000 <f> (File Offset: 0x40):
   0:   11000c23        add     w3, w1, #0x3
   4:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        4: R_AARCH64_ADR_PREL_PG_HI21   .bss
   8:   91000042        add     x2, x2, #0x0
                        8: R_AARCH64_ADD_ABS_LO12_NC    .bss
   c:   13087c04        asr     w4, w0, #8
  10:   3823c840        strb    w0, [x2, w3, sxtw]
  14:   11000823        add     w3, w1, #0x2
  18:   3823c844        strb    w4, [x2, w3, sxtw]
  1c:   11000423        add     w3, w1, #0x1
  20:   13107c04        asr     w4, w0, #16
  24:   13187c00        asr     w0, w0, #24
  28:   3823c844        strb    w4, [x2, w3, sxtw]
  2c:   3821c840        strb    w0, [x2, w1, sxtw]
  30:   d65f03c0        ret

0000000000000034 <h> (File Offset: 0x74):
  34:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        34: R_AARCH64_ADR_PREL_PG_HI21  .bss
  38:   91000042        add     x2, x2, #0x0
                        38: R_AARCH64_ADD_ABS_LO12_NC   .bss
  3c:   8b21c043        add     x3, x2, w1, sxtw
  40:   13087c04        asr     w4, w0, #8
  44:   39000864        strb    w4, [x3, #2]
  48:   13107c04        asr     w4, w0, #16
  4c:   39000464        strb    w4, [x3, #1]
  50:   39000c60        strb    w0, [x3, #3]
  54:   13187c00        asr     w0, w0, #24
  58:   3821c840        strb    w0, [x2, w1, sxtw]
  5c:   d65f03c0        ret
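
The equivalence claimed above can be checked at run time with a small harness.
This is only a sketch: f() and h() are copied from the test case, while
same_result() is a hypothetical helper that is not part of the original report.
It assumes 0 <= y <= 28 so that all four stores stay inside a[].

```c
#include <string.h>

char a[32];

/* Array-index form, copied from the report. */
void f(int x, int y)
{
        a[y + 3] = (char)x;
        a[y + 2] = (char)(x >> 8);
        a[y + 1] = (char)(x >> 16);
        a[y + 0] = (char)(x >> 24);
}

/* Pointer-arithmetic form, copied from the report. */
void h(int x, int y)
{
        *((a + y) + 3) = (char)x;
        *((a + y) + 2) = (char)(x >> 8);
        *((a + y) + 1) = (char)(x >> 16);
        *((a + y) + 0) = (char)(x >> 24);
}

/* Hypothetical helper: returns 1 if f() and h() leave identical bytes in
 * a[] for the given arguments, 0 otherwise. Caller must keep y in 0..28. */
int same_result(int x, int y)
{
        char after_f[sizeof a];

        memset(a, 0, sizeof a);
        f(x, y);
        memcpy(after_f, a, sizeof a);

        memset(a, 0, sizeof a);
        h(x, y);
        return memcmp(after_f, a, sizeof a) == 0;
}
```

A harness like this only confirms that both variants store the same bytes; the
difference discussed in this report is purely in the generated instructions.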

However, when I add the -fdata-sections flag, the code for f() remains the
same, but the code for h() becomes this weird thing:

0000000000000034 <h> (File Offset: 0x74):
  34:   93407c21        sxtw    x1, w1
  38:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        38: R_AARCH64_ADR_PREL_PG_HI21  a+0x3
  3c:   91000042        add     x2, x2, #0x0
                        3c: R_AARCH64_ADD_ABS_LO12_NC   a+0x3
  40:   13087c03        asr     w3, w0, #8
  44:   38216840        strb    w0, [x2, x1]
  48:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        48: R_AARCH64_ADR_PREL_PG_HI21  a+0x2
  4c:   91000042        add     x2, x2, #0x0
                        4c: R_AARCH64_ADD_ABS_LO12_NC   a+0x2
  50:   38216843        strb    w3, [x2, x1]
  54:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        54: R_AARCH64_ADR_PREL_PG_HI21  a+0x1
  58:   91000042        add     x2, x2, #0x0
                        58: R_AARCH64_ADD_ABS_LO12_NC   a+0x1
  5c:   13107c03        asr     w3, w0, #16
  60:   13187c00        asr     w0, w0, #24
  64:   38216843        strb    w3, [x2, x1]
  68:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        68: R_AARCH64_ADR_PREL_PG_HI21  a
  6c:   91000042        add     x2, x2, #0x0
                        6c: R_AARCH64_ADD_ABS_LO12_NC   a
  70:   38216840        strb    w0, [x2, x1]
  74:   d65f03c0        ret

There should be absolutely no reason to reload the address of `a` from scratch
for each access. `-fdata-sections` also shouldn't have any influence at all on
what the code inside a function looks like, as far as I'm aware.

When I try this on GCC 11.2.0 instead, the output for h() with -fdata-sections
is the same short code as without the flag.

Also, there is of course the question of why GCC doesn't just reduce this code
to the `rev` instruction (it doesn't seem to be able to figure that out on
either GCC 11 or GCC 13, but it would be able to do it if the `y` parameter
were a constant instead).
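
For reference, the four byte stores amount to writing `x` into a[y] .. a[y + 3]
in big-endian byte order. Below is a minimal sketch of a hand-written
equivalent, assuming GCC's `__builtin_bswap32` builtin and the predefined
`__BYTE_ORDER__` macro. The name f_bswap() is hypothetical and not from the
test case, and whether the compiler turns it into `rev` plus a single store is
an expectation, not something verified here.

```c
#include <stdint.h>
#include <string.h>

char a[32];

/* Original pattern from the report: four byte stores, most significant
 * byte of x at the lowest address. */
void f(int x, int y)
{
        a[y + 3] = (char)x;
        a[y + 2] = (char)(x >> 8);
        a[y + 1] = (char)(x >> 16);
        a[y + 0] = (char)(x >> 24);
}

/* Hypothetical equivalent: one byte swap (only needed on little-endian
 * targets such as typical aarch64 configurations) and one 4-byte store.
 * memcpy() is used because a + y need not be 4-byte aligned. */
void f_bswap(int x, int y)
{
        uint32_t v = (uint32_t)x;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        v = __builtin_bswap32(v);
#endif
        memcpy(&a[y], &v, sizeof v);
}
```

Both functions should leave the same bytes in a[] for any y in 0..28, so
f_bswap() can serve as a reference for the code one would hope to get from f().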
