https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72863

--- Comment #3 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
This is a phase ordering issue involving the expanders for the built-ins.  In
vsx.md:

;; Explicit  load/store expanders for the builtin functions
(define_expand "vsx_load_<mode>"
  [(set (match_operand:VSX_M 0 "vsx_register_operand" "")
        (match_operand:VSX_M 1 "memory_operand" ""))]
  "VECTOR_MEM_VSX_P (<MODE>mode)"
  "")

(define_expand "vsx_store_<mode>"
  [(set (match_operand:VSX_M 0 "memory_operand" "")
        (match_operand:VSX_M 1 "vsx_register_operand" ""))]
  "VECTOR_MEM_VSX_P (<MODE>mode)"
  "")

These expanders emit a plain move, so the expansion into the permuting load/store plus swap is delayed until the next split phase rather than happening right at expand time.  Since the swap optimization runs immediately following expand, the swaps show up too late for it to see them.

A normal assignment, on the other hand, goes through the mov expander in
vector.md, which takes us here:

  if (!BYTES_BIG_ENDIAN
      && VECTOR_MEM_VSX_P (<MODE>mode)
      && !TARGET_P9_VECTOR
      && !gpr_or_gpr_p (operands[0], operands[1])
      && (memory_operand (operands[0], <MODE>mode)
          ^ memory_operand (operands[1], <MODE>mode)))
    {
      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
      DONE;
    }

thus generating the permuting load/store and the accompanying register permute already at expand time.

We should be able to add similar logic to the intrinsic expanders in order to
get the swaps to show up in time to be optimized.
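
As a rough illustration (an untested sketch, not a committed fix; it simply reuses the condition and the rs6000_emit_le_vsx_move call quoted above from vector.md), the load expander might gain a preparation statement along these lines:

(define_expand "vsx_load_<mode>"
  [(set (match_operand:VSX_M 0 "vsx_register_operand" "")
        (match_operand:VSX_M 1 "memory_operand" ""))]
  "VECTOR_MEM_VSX_P (<MODE>mode)"
{
  /* Emit the little-endian permuting load and the accompanying swap
     here, at expand time, so the swap optimization pass can see them.
     Operand 1 is always a memory operand and operand 0 a VSX register,
     so the memory/register and gpr_or_gpr_p checks from the vector.md
     mov expander should not be needed here.  */
  if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
    {
      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
      DONE;
    }
})

The vsx_store_<mode> expander would presumably want the same preparation statement, with the memory operand as the destination.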
