On 11/23/23 11:34, Maciej W. Rozycki wrote:
On Sun, 19 Nov 2023, Jeff Law wrote:
As I suspect you know a big part of the problem here is that BRANCH_COST and
rtx_cost don't have any common scale and thus trying to compare BRANCH_COST to
RTX_COST doesn't have well defined meaning.
We do have preexisting places using COSTS_N_INSNS (BRANCH_COST ())
though, as documented in ifcvt.cc:
??? Actually, instead of the branch instruction costs we might want
to use COSTS_N_INSNS (BRANCH_COST ()) as in other places. */
so it seems the right direction, and given that we expose this measure to
the user (and at the very least GCC developers implementing new tuning
microarchitectures) I think it's the only sane way to do branch costing:
define the measure in terms of how many ordinary ALU instructions a branch
is statistically equivalent to.
Oh, it probably is the only sane way at this point. To do anything more
sane we'd have to disassociate the BRANCH_COST uses during tree/gimple
from those in RTL-land.
FWIW, I was looking at a regression with our internal tests after your
changes. It was quite nice to see how well twiddling -mbranch-cost
correlated to how many instructions we would allow in a conditional move
sequence.
The downside is it highlighted the gimple vs RTL use issue. I'm
confident that we would like to see a higher branch cost in the RTL
phases for our uarch, but I'm much less comfortable with how that's
going to change the decisions made in trees/gimple. We'll have to
investigate that at some depth.
WRT the extraneous zero-extension. Isn't that arguably a bug in the scc
expander for risc-v? Fixing that isn't a prerequisite here, but it probably
worth a bit of someone's time.
I've looked at it already and it's the middle end that ends up with the
zero-extension, specifically `convert_move' invoked from `emit_cstore'
down the call to `noce_try_store_flag_mask', to widen the output from
`cstoredi4', so I don't think we can do anything in the backend to prevent
it from happening. And neither I think we can do anything useful about
`cstoredi4' having a SImode output, as it's a pattern matched by name
rather than RTX, so we can't provide variants having a SImode and a DImode
output each both at a time, as that would cause a name clash.
We're actually tracking some of these extraneous extensions. Do you
happen to know if the zero-extended object happens to be (subreg:SI
(reg:DI)) kind of construct? That's the kind of thing we're chasing
down right now from various points. Vineet has already fixed one class
of them. Jivan and I are looking at others.
jeff