https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #7 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #6)
> To help diagnose the problem, I came up with this simple patch:

Thanks for looking into it!

> which then helps reveal that on sh3-linux-gnu with -O1 we see:

I think this will also happen on all sh-elf sub-targets, not only on
sh3-linux, in case that helps.

> propagating insn 6 into insn 12, replacing:
> (set (reg:SI 174 [ _1 ])
>     (sign_extend:SI (reg:QI 169 [ *a_7(D) ])))
> successfully matched this instruction to *extendqisi2_compact_snd:
> (set (reg:SI 174 [ _1 ])
>     (sign_extend:SI (mem:QI (reg/v/f:SI 168 [ aD.1817 ]) [0 *a_7(D)+0 S1
> A8])))
> change is profitable (cost 4 -> cost 1)
> 
> which confirms Andrew's and Oleg's analyses above; the sh_rtx_costs function
> is a little odd... Reading from memory is four times faster than using a
> pseudo!?
> I'm investigating a "costs" patch for the SH backend.

It looks like the sh_rtx_costs function assumes that the costs of the whole
RTX are summed up outside, in the caller.

In the SIGN_EXTEND / ZERO_EXTEND case of sh_rtx_costs, the value of
'sh_address_cost' is returned directly for the MEM_P case. It should probably
have gone through COSTS_N_INSNS to put it on the same scale as the
arith_reg_operand case.
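
To illustrate the scale mismatch, here is a minimal stand-alone sketch. It is
not the actual sh.cc code. COSTS_N_INSNS is copied from its definition in
rtl.h; the toy_* functions are hypothetical names, and the address cost of 1
and the register cost of COSTS_N_INSNS (1) are assumptions chosen to match the
"cost 4 -> cost 1" line in the dump above.

```c
/* Same definition as in GCC's rtl.h: one insn is worth 4 cost units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for sh_address_cost: a simple register-indirect
   address, assumed here to cost 1 unscaled unit.  */
static int
toy_address_cost (void)
{
  return 1;
}

/* Shape of the suspected problem: the register operand is scaled with
   COSTS_N_INSNS, the memory operand is returned as a raw address cost.  */
static int
toy_extend_cost_unscaled (int src_is_mem)
{
  if (src_is_mem)
    return toy_address_cost ();   /* raw units */
  return COSTS_N_INSNS (1);       /* insn units */
}

/* One possible adjustment (an assumption, not a tested patch): scale the
   address cost as well, so that both paths use insn units.  */
static int
toy_extend_cost_scaled (int src_is_mem)
{
  if (src_is_mem)
    return COSTS_N_INSNS (toy_address_cost ());
  return COSTS_N_INSNS (1);
}
```

With these assumptions the unscaled variant rates the extend-from-memory
form below the extend-from-register form, which is why the propagation
looks profitable. The scaled variant rates both forms the same.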
