[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

roger at nextmovesoftware dot com via Gcc-bugs Mon, 22 Jan 2024 17:23:09 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533


--- Comment #10 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Oleg.  Great question.  The "speed" parameter passed to rtx_costs, and
address_cost indicates whether the middle-end is optimizing for peformance, and
interested in the nummber of cycles taken by each instruction, or optimizing
for size, and interested in the number of bytes used to encode the instruction.
 Previously, this speed parameter was ignored by the SH backend, so the costs
were the same independent of the objective function.

In my proposed patch, the address cost (1) when optimizing for size attempts to
return the additional size of an instruction based on the addressing mode.  For
register, and reg+reg addressing modes there is no size increase (overhead),
and for adressing modes with displacements, and displacements to address
pointers, there is a cost.  (2) when optimizing for speed, address cost remains
between 0 and 3, and is used to prioritize between (equivalent numbers of)
instructions.  Normally, rtx_costs are defined in terms of COST_N_INSNS, which
multiplies by 4.  Hence on many platforms a single instruction that references
memory may be encoded as COSTS_N_INSNS(1)+1 (or a more complex addressing mode
as COSTS_N_INSNS(1)+2) to show that this is disfavored to a single instruction
that doesn't reference memory, COSTS_N_INSNS(1)+0.

This is the fix for this particular regression; SIGN_EXTEND of a register now
costs COSTS_N_INSNS(1), and SIGN_EXTEND of a MEM now costs COSTS_N_INSNS(1)+1.

A useful way to debug rtx_costs is to use the -dP command line option, and then
look at the [c=X, l=Y] annotations in the assembly language file.  One way to
check/confirm that these are sensible is that ideally they should be correlated
when optimizing for size (with -Os or -Oz).

I've found an interesting table of SH cycle counts (for different CPUs) at
http://www.shared-ptr.com/sh_insns.html and these could be used to improve
sh_rtx_costs further.  For example, SH currently reports multiplications as
a single cycle operation, which doesn't match the hardware specs, and prevents
GCC from using synth_mult to produce faster (or shorter) sequences using shifts
and additions.  Likewise, sh_rtx_costs doesn't distinguish the machine mode,
so the costs of SImode multiplications are the same as DImode multiplications.

In comment #5 you mention GCC's defaults; it turns out that for rtx_costs the
default values that would be provided by the middle-end, may be more accurate
than the values (currently) specified by the backend.

I hope this answers your question.

[Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267

Reply via email to