[Bug tree-optimization/102162] New: Byte-wise access optimized away at -O1 and above

danglin at gcc dot gnu.org via Gcc-bugs Wed, 01 Sep 2021 09:52:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162


            Bug ID: 102162
           Summary: Byte-wise access optimized away at -O1 and above
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: danglin at gcc dot gnu.org
                CC: helge.deller at sap dot com
  Target Milestone: ---
              Host: hppa*-*-linux*
            Target: hppa*-*-linux*
             Build: hppa*-*-linux*

Created attachment 51394
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51394&action=edit
Test case

The packed attribute is used in Linux v5.14 to request byte-wise access
to unaligned data.  This is important on hppa as loads and stores require
strict alignment.
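For readers without the attachment, the idiom in question can be sketched roughly as follows. This is a minimal illustration, not the attached test case: the names `u32_packed`, `load_u32_packed`, and `load_u32_memcpy` are hypothetical, and the load is in native byte order with no endian conversion. The packed wrapper struct has alignment 1, so the compiler is not supposed to assume the pointer is word-aligned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel-style helper.  The packed
   attribute gives the struct an alignment of 1, so an access through
   a pointer to it must not assume 4-byte alignment. */
struct u32_packed {
    uint32_t x;
} __attribute__((packed));

/* Load a 32-bit value from a possibly unaligned address
   (native byte order, no endian conversion). */
static inline uint32_t load_u32_packed(const void *p)
{
    const struct u32_packed *pp = (const struct u32_packed *)p;
    return pp->x;
}

/* Reference implementation for comparison: memcpy makes no
   alignment assumption about the source pointer. */
static inline uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On a strict-alignment target, both helpers would be expected to compile to byte loads (or an equivalent safe sequence) when the alignment of the pointer is unknown.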

The attached test program is miscompiled at -O1 and above.  The byte-wise
accesses are optimized to a single ldw instruction during RTL expansion:

        .LEVEL 2.0w
        .text
        .align 8
.globl test
        .type   test, @function
test:
        .PROC
        .CALLINFO FRAME=0,NO_CALLS
        .ENTRY
        addil LT'output_len,%r27
        ldd RT'output_len(%r1),%r28
        ldw 0(%r28),%r28
        bve (%r2)
        extrd,s %r28,63,32,%r28
        .EXIT
        .PROCEND
        .size   test, .-test
.globl output_len
        .section        .bss
        .type   output_len, @object
        .size   output_len, 4
        .align 1
output_len:
        .block 4
        .ident  "GCC: (GNU) 10.3.0"

This faults when output_len is not aligned on a word boundary.
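The faulting condition can be stated as a small check. This is only a sketch; `is_word_aligned` is a hypothetical helper, and the 4-byte figure assumes the 32-bit word size that ldw operates on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p is aligned on a 4-byte (word)
   boundary, which is what a single ldw requires. */
static inline bool is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

Since output_len is emitted with .align 1, nothing guarantees that this check holds for its address.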

I'm not sure, but the problem may be in the einline pass. Here is the
test-unaligned.c.027t.einline dump:

;; Function get_unaligned_le32 (get_unaligned_le32, funcdef_no=0, decl_uid=1506, cgraph_uid=1, symbol_order=1)

Iterations: 0
get_unaligned_le32 (const void * p)
{
  const struct
  {
    u32 x;
  } * __pptr;
  u32 _4;

  <bb 2> :
  __pptr_2 = p_1(D);
  _4 = __pptr_2->x;
  return _4;

}



;; Function test (test, funcdef_no=1, decl_uid=1512, cgraph_uid=2, symbol_order=2)

Iterations: 1

Symbols to be put in SSA form
{ D.1520 D.1524 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 5
Number of blocks to update: 4 ( 80%)


Merging blocks 2 and 4
Merging blocks 2 and 3
test ()
{
  u32 D.1524;
  unsigned int _1;
  unsigned int _3;
  int _4;

  <bb 2> :
  _3 = MEM[(const struct  *)&output_len].x;
  _5 = _3;
  _1 = _5;
  _4 = (int) _1;
  return _4;

}

Ultimately, the MEM gets expanded to the ldw.
