The first form (move/move/bstrins) has a lower latency, due to the special handling of "move" in LA464 and LA664, despite being longer.
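For reference, the two sequences being compared (quoted from the comment this patch updates): the longer but lower-latency form

    move		t0, a0
    move		a0, a1
    bstrins.d	a0, t0, 42, 0
    ret

and the smaller form the peephole produces, now only when optimizing for size:

    srai.d		t0, a1, 43
    bstrins.d	a0, t0, 63, 43
    ret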
gcc/ChangeLog:

	* config/loongarch/loongarch.md (define_peephole2): Require
	optimize_insn_for_size_p () for move/move/bstrins => srai/bstrins
	transform.
---
Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 25c1d323ba0..e4434c3bd4e 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1617,20 +1617,23 @@ (define_insn_and_split "*bstrins_<mode>_for_ior_mask"
 })
 
 ;; We always avoid the shift operation in bstrins_<mode>_for_ior_mask
-;; if possible, but the result may be sub-optimal when one of the masks
+;; if possible, but the result may be larger when one of the masks
 ;; is (1 << N) - 1 and one of the src register is the dest register.
 ;; For example:
 ;;     move		t0, a0
 ;;     move		a0, a1
 ;;     bstrins.d	a0, t0, 42, 0
 ;;     ret
-;; using a shift operation would be better:
+;; using a shift operation would be smaller:
 ;;     srai.d		t0, a1, 43
 ;;     bstrins.d	a0, t0, 63, 43
 ;;     ret
 ;; unfortunately we cannot figure it out in split1: before reload we cannot
 ;; know if the dest register is one of the src register.  Fix it up in
 ;; peephole2.
+;;
+;; Note that the first form has a lower latency so this should only be
+;; done when optimizing for size.
 (define_peephole2
   [(set (match_operand:GPR 0 "register_operand")
	(match_operand:GPR 1 "register_operand"))
@@ -1639,7 +1642,7 @@ (define_peephole2
 			   (match_operand:SI 3 "const_int_operand")
 			   (const_int 0))
 	  (match_dup 0))]
-  "peep2_reg_dead_p (3, operands[0])"
+  "peep2_reg_dead_p (3, operands[0]) && optimize_insn_for_size_p ()"
   [(const_int 0)]
 {
   int len = GET_MODE_BITSIZE (<MODE>mode) - INTVAL (operands[3]);
-- 
2.45.2