https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
Bug ID: 102162
Summary: Byte-wise access optimized away at -O1 and above
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: danglin at gcc dot gnu.org
CC: helge.deller at sap dot com
Target Milestone: ---
Host: hppa*-*-linux*
Target: hppa*-*-linux*
Build: hppa*-*-linux*

Created attachment 51394
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51394&action=edit
Test case

The packed attribute is used in Linux v5.14 to request byte-wise access to
unaligned data. This is important on hppa as loads and stores require strict
alignment.

The attached test program is miscompiled at -O1 and above. The byte-wise
accesses are optimized to a single ldw instruction during RTL expansion:

        .LEVEL 2.0w
        .text
        .align 8
        .globl test
        .type   test, @function
test:
        .PROC
        .CALLINFO FRAME=0,NO_CALLS
        .ENTRY
        addil LT'output_len,%r27
        ldd RT'output_len(%r1),%r28
        ldw 0(%r28),%r28
        bve (%r2)
        extrd,s %r28,63,32,%r28
        .EXIT
        .PROCEND
        .size   test, .-test
        .globl output_len
        .section        .bss
        .type   output_len, @object
        .size   output_len, 4
        .align 1
output_len:
        .block 4
        .ident  "GCC: (GNU) 10.3.0"

This faults when output_len is not aligned on a word boundary.
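For readers without the attachment, the packed idiom described above can be sketched roughly as follows. This is an illustration only, not the attached test case: the helper names get_unaligned_u32 and load_u32_memcpy, the check function, and the buffer contents are all assumptions made for the example. The packed attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

/* Hypothetical helper: the packed attribute gives the anonymous struct an
   alignment of 1, so the compiler should not assume that p is word-aligned
   when it reads the member x. */
static inline u32 get_unaligned_u32(const void *p)
{
    const struct { u32 x; } __attribute__((packed)) *pptr = p;
    return pptr->x;
}

/* Reference load (also hypothetical): memcpy makes no alignment assumption. */
static inline u32 load_u32_memcpy(const void *p)
{
    u32 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Reads the same misaligned address both ways and compares the results. */
static void check_unaligned_reads(void)
{
    unsigned char buf[8] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77};
    assert(get_unaligned_u32(buf + 1) == load_u32_memcpy(buf + 1));
    assert(get_unaligned_u32(buf + 3) == load_u32_memcpy(buf + 3));
}
```

On a target that tolerates unaligned loads both reads succeed either way, so a check like this only becomes meaningful on a strict-alignment target such as hppa.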
Not sure, but problem may be the test-unaligned.c.027t.einline pass:

;; Function get_unaligned_le32 (get_unaligned_le32, funcdef_no=0, decl_uid=1506, cgraph_uid=1, symbol_order=1)

Iterations: 0
get_unaligned_le32 (const void * p)
{
  const struct { u32 x; } * __pptr;
  u32 _4;

  <bb 2> :
  __pptr_2 = p_1(D);
  _4 = __pptr_2->x;
  return _4;

}


;; Function test (test, funcdef_no=1, decl_uid=1512, cgraph_uid=2, symbol_order=2)

Iterations: 1
Symbols to be put in SSA form
{ D.1520 D.1524 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 5
Number of blocks to update: 4 ( 80%)

Merging blocks 2 and 4
Merging blocks 2 and 3
test ()
{
  u32 D.1524;
  unsigned int _1;
  unsigned int _3;
  int _4;

  <bb 2> :
  _3 = MEM[(const struct *)&output_len].x;
  _5 = _3;
  _1 = _5;
  _4 = (int) _1;
  return _4;

}

Ultimately, the MEM gets expanded to the ldw.
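For comparison, a byte-wise access written out by hand might look like the sketch below: four single-byte loads combined with shifts, none of which needs word alignment. The function names and test values are hypothetical, and the little-endian byte order is an assumption taken from the le32 in the helper's name; the dump itself shows only a plain member read. This is not GCC output.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical hand-written byte-wise load: each access touches a single
   byte, so the address p may have any alignment. */
static inline u32 load_le32_bytewise(const void *p)
{
    const unsigned char *b = p;
    return (u32)b[0]
         | ((u32)b[1] << 8)
         | ((u32)b[2] << 16)
         | ((u32)b[3] << 24);
}

/* Reads a value stored at an odd offset inside a byte buffer. */
static void check_bytewise_load(void)
{
    unsigned char buf[5] = {0xff, 0x78, 0x56, 0x34, 0x12};
    assert(load_le32_bytewise(buf + 1) == 0x12345678u);
}
```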