https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533
--- Comment #7 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #6)
> To help diagnose the problem, I came up with this simple patch:

Thanks for looking into it!

> which then helps reveal that on sh3-linux-gnu with -O1 we see:

I think this will also happen on all sh-elf sub-targets, not only on
sh3-linux, if that helps at all.

> propagating insn 6 into insn 12, replacing:
> (set (reg:SI 174 [ _1 ])
>     (sign_extend:SI (reg:QI 169 [ *a_7(D) ])))
> successfully matched this instruction to *extendqisi2_compact_snd:
> (set (reg:SI 174 [ _1 ])
>     (sign_extend:SI (mem:QI (reg/v/f:SI 168 [ aD.1817 ]) [0 *a_7(D)+0 S1 A8])))
> change is profitable (cost 4 -> cost 1)
>
> which confirms Andrew's and Oleg's analyses above; the sh_rtx_costs function
> is a little odd...  Reading from memory is four times faster than using a
> pseudo!?
> I'm investigating a "costs" patch for the SH backend.

It looks like the sh_rtx_costs function assumes that the costs of the whole
RTX are summed up outside, in the caller.

In the SIGN_EXTEND and ZERO_EXTEND cases of sh_rtx_costs, the result of
'sh_address_cost' is returned directly for the MEM_P case.  It should probably
have gone through COSTS_N_INSNS to put it on the same scale as the
arith_reg_operand case.