https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
   Last reconfirmed|                            |2021-08-30
             Target|                            |arm
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
One common source of missed optimizations is gimple_fold_builtin_memory_op,
which has

      /* If we can perform the copy efficiently with first doing all loads
         and then all stores inline it that way.  Currently efficiently
         means that we can load all the memory into a single integer
         register which is what MOVE_MAX gives us.  */
      src_align = get_pointer_alignment (src);
      dest_align = get_pointer_alignment (dest);
      if (tree_fits_uhwi_p (len)
          && compare_tree_int (len, MOVE_MAX) <= 0
...
                  /* If the destination pointer is not aligned we must be able
                     to emit an unaligned store.  */
                  && (dest_align >= GET_MODE_ALIGNMENT (mode)
                      || !targetm.slow_unaligned_access (mode, dest_align)
                      || (optab_handler (movmisalign_optab, mode)
                          != CODE_FOR_nothing)))

where the MOVE_MAX limit (4 on this target) likely applies.  Since we would
actually need to perform two loads, the code does what is intended (but
that's of course "bad" for 64-bit copies on 32-bit archs and likewise for
128-bit copies on 64-bit archs).
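
For illustration, a minimal sketch (my own hypothetical example, not the
testcase from this bug) of a copy that exceeds MOVE_MAX on a 32-bit arm
target and therefore is not folded to loads-plus-stores by
gimple_fold_builtin_memory_op:

    /* Hypothetical example: an 8-byte copy on a 32-bit target where
       MOVE_MAX is 4.  The check compare_tree_int (len, MOVE_MAX) <= 0
       fails for len == 8, so the memcpy is left for RTL expansion
       instead of being folded at the GIMPLE level.  */
    #include <string.h>

    unsigned long long
    load_u64 (const void *p)
    {
      unsigned long long v;
      memcpy (&v, p, sizeof v);   /* len == 8 > MOVE_MAX (4) on arm */
      return v;
    }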

It's usually too late for RTL memcpy expansion to fully elide stack storage.
