https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112573
            Bug ID: 112573
           Summary: Suboptimal code generation with `-fdata-sections` on aarch64
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jwerner at chromium dot org
  Target Milestone: ---
            Target: aarch64

I noticed some weird code generation behavior on aarch64 that seems to be a new regression in GCC 13 or 12. Consider the following minimal test case:

char a[32];

void f(int x, int y)
{
        a[y + 3] = (char)x;
        a[y + 2] = (char)(x >> 8);
        a[y + 1] = (char)(x >> 16);
        a[y + 0] = (char)(x >> 24);
}

void h(int x, int y)
{
        *((a + y) + 3) = (char)x;
        *((a + y) + 2) = (char)(x >> 8);
        *((a + y) + 1) = (char)(x >> 16);
        *((a + y) + 0) = (char)(x >> 24);
}

These two functions should, of course, be 100% equivalent. Strangely enough, they still don't produce the same code when compiled with -Os, but the difference is minimal:

0000000000000000 <f> (File Offset: 0x40):
   0:   11000c23        add     w3, w1, #0x3
   4:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        4: R_AARCH64_ADR_PREL_PG_HI21   .bss
   8:   91000042        add     x2, x2, #0x0
                        8: R_AARCH64_ADD_ABS_LO12_NC    .bss
   c:   13087c04        asr     w4, w0, #8
  10:   3823c840        strb    w0, [x2, w3, sxtw]
  14:   11000823        add     w3, w1, #0x2
  18:   3823c844        strb    w4, [x2, w3, sxtw]
  1c:   11000423        add     w3, w1, #0x1
  20:   13107c04        asr     w4, w0, #16
  24:   13187c00        asr     w0, w0, #24
  28:   3823c844        strb    w4, [x2, w3, sxtw]
  2c:   3821c840        strb    w0, [x2, w1, sxtw]
  30:   d65f03c0        ret

0000000000000034 <h> (File Offset: 0x74):
  34:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        34: R_AARCH64_ADR_PREL_PG_HI21  .bss
  38:   91000042        add     x2, x2, #0x0
                        38: R_AARCH64_ADD_ABS_LO12_NC   .bss
  3c:   8b21c043        add     x3, x2, w1, sxtw
  40:   13087c04        asr     w4, w0, #8
  44:   39000864        strb    w4, [x3, #2]
  48:   13107c04        asr     w4, w0, #16
  4c:   39000464        strb    w4, [x3, #1]
  50:   39000c60        strb    w0, [x3, #3]
  54:   13187c00        asr     w0, w0, #24
  58:   3821c840        strb    w0, [x2, w1, sxtw]
  5c:   d65f03c0        ret

However, when I add the -fdata-sections flag, the code for f() remains the same, but the code for h() becomes this weird thing:

0000000000000034 <h> (File Offset: 0x74):
  34:   93407c21        sxtw    x1, w1
  38:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        38: R_AARCH64_ADR_PREL_PG_HI21  a+0x3
  3c:   91000042        add     x2, x2, #0x0
                        3c: R_AARCH64_ADD_ABS_LO12_NC   a+0x3
  40:   13087c03        asr     w3, w0, #8
  44:   38216840        strb    w0, [x2, x1]
  48:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        48: R_AARCH64_ADR_PREL_PG_HI21  a+0x2
  4c:   91000042        add     x2, x2, #0x0
                        4c: R_AARCH64_ADD_ABS_LO12_NC   a+0x2
  50:   38216843        strb    w3, [x2, x1]
  54:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        54: R_AARCH64_ADR_PREL_PG_HI21  a+0x1
  58:   91000042        add     x2, x2, #0x0
                        58: R_AARCH64_ADD_ABS_LO12_NC   a+0x1
  5c:   13107c03        asr     w3, w0, #16
  60:   13187c00        asr     w0, w0, #24
  64:   38216843        strb    w3, [x2, x1]
  68:   90000002        adrp    x2, 0 <f> (File Offset: 0x40)
                        68: R_AARCH64_ADR_PREL_PG_HI21  a
  6c:   91000042        add     x2, x2, #0x0
                        6c: R_AARCH64_ADD_ABS_LO12_NC   a
  70:   38216840        strb    w0, [x2, x1]
  74:   d65f03c0        ret

There should be absolutely no reason to reload the address of `a` from scratch for each access. `-fdata-sections` also shouldn't have any influence at all on what the code inside a function looks like, as far as I'm aware. When I try this on GCC 11.2.0 instead, the output for h() with -fdata-sections is the same short code that it is without the flag.

Also, there is of course the question of why GCC doesn't just reduce this code to the `rev` instruction. It doesn't seem to be able to figure that out on either GCC 11 or GCC 13, but it would be able to do it if the `y` parameter were a constant instead.
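For anyone who wants to confirm the equivalence claim independently of the generated assembly, here is a rough sketch that runs both functions and compares the bytes they write. It also compares against a single byte-swapped 4-byte store, which is roughly what a `rev` followed by one `str` would do. The helpers `store_be32_ref()` and `all_agree()` are hypothetical names added for illustration and are not part of the original test case; the sketch assumes GCC (for `__builtin_bswap32` and the `__BYTE_ORDER__` macro) and GCC's arithmetic right shift of negative `int` values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

char a[32];

/* Array-index form, as in the test case. */
void f(int x, int y)
{
        a[y + 3] = (char)x;
        a[y + 2] = (char)(x >> 8);
        a[y + 1] = (char)(x >> 16);
        a[y + 0] = (char)(x >> 24);
}

/* Pointer-arithmetic form, as in the test case. */
void h(int x, int y)
{
        *((a + y) + 3) = (char)x;
        *((a + y) + 2) = (char)(x >> 8);
        *((a + y) + 1) = (char)(x >> 16);
        *((a + y) + 0) = (char)(x >> 24);
}

/* Hypothetical reference: one big-endian 32-bit store at a + y.
 * On a little-endian target this is a byte swap plus a 4-byte store. */
void store_be32_ref(int x, int y)
{
        uint32_t v = (uint32_t)x;
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        v = __builtin_bswap32(v);
#endif
        memcpy(a + y, &v, sizeof v);
}

/* Hypothetical helper: returns 1 if all three variants leave `a`
 * with identical contents for the given arguments, 0 otherwise. */
int all_agree(int x, int y)
{
        char via_f[sizeof a], via_h[sizeof a];

        /* All four bytes must land inside the array. */
        assert(y >= 0 && y <= (int)sizeof a - 4);

        memset(a, 0, sizeof a);
        f(x, y);
        memcpy(via_f, a, sizeof a);

        memset(a, 0, sizeof a);
        h(x, y);
        memcpy(via_h, a, sizeof a);

        memset(a, 0, sizeof a);
        store_be32_ref(x, y);

        return memcmp(via_f, via_h, sizeof a) == 0 &&
               memcmp(via_f, a, sizeof a) == 0;
}
```

Calling `all_agree()` from a small `main()` with a few values of `x` and `y` (for example `all_agree(0x11223344, 4)`) should return 1 regardless of whether `-fdata-sections` is passed, since the flag is only expected to affect section placement, not the bytes stored.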