https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145

--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:ba730fd10934e4ca004251aa3748bf9da4d35e62

commit r15-2696-gba730fd10934e4ca004251aa3748bf9da4d35e62
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Fri Aug 2 15:58:31 2024 +0100

    Make may_trap_p_1 return false for constant pool references [PR116145]

    The testcase contains the constant:

      arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));

    which was initially hoisted by hand, but which gimple optimisers later
    propagated to each use (as expected).  The constant was then expanded
    as a load-and-duplicate from the constant pool.  Normally that load
    should then be hoisted back out of the loop, but may_trap_or_fault_p
    stopped that from happening in this case.

    The code responsible was:

          if (/* MEM_NOTRAP_P only relates to the actual position of the memory
                 reference; moving it out of context such as when moving code
                 when optimizing, might cause its address to become invalid. 
*/
              code_changed
              || !MEM_NOTRAP_P (x))
            {
              poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
              return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
                                            GET_MODE (x), code_changed);
            }

    where code_changed is true.  (Arguably it doesn't need to be true in
    this case, if we inserted invariants on the preheader edge, but it
    would still need to be true for conditionally executed loads.)

    Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
    would recognise that the address refers to the constant pool.
    However, the SVE load-and-replicate instructions have a limited
    offset range, so it isn't possible for them to have a LO_SUM address.
    All we have is a plain pseudo base register.

    MEM_READONLY_P is defined as:

      /* 1 if RTX is a mem that is statically allocated in read-only memory. 
*/
      #define MEM_READONLY_P(RTX) \
        (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)

    and so I think it should be safe to move memory references if both
    MEM_READONLY_P and MEM_NOTRAP_P are true.

    The testcase isn't a minimal reproducer, but I think it's good
    to have a realistic full routine in the testsuite.

    gcc/
            PR rtl-optimization/116145
            * rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code
            movement if MEM_READONLY_P is also true.

    gcc/testsuite/
            PR rtl-optimization/116145
            * gcc.target/aarch64/sve/acle/general/pr116145.c: New test.

Reply via email to