On 11/23/23 11:34, Maciej W. Rozycki wrote:
On Sun, 19 Nov 2023, Jeff Law wrote:

As I suspect you know a big part of the problem here is that BRANCH_COST and
rtx_cost don't have any common scale and thus trying to compare BRANCH_COST to
RTX_COST doesn't have well defined meaning.

  We do have preexisting places using COSTS_N_INSNS (BRANCH_COST ())
though, as documented in ifcvt.cc:

      ??? Actually, instead of the branch instruction costs we might want
      to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */

so it seems the right direction, and given that we expose this measure to
the user (and at the very least GCC developers implementing new tuning
microarchitectures) I think it's the only sane way to do branch costing:
define the measure in terms of how many ordinary ALU instructions a branch
is statistically equivalent to.
Oh, it probably is the only sane way at this point. To do anything more sane we'd have to disassociate the BRANCH_COST uses during tree/gimple from those in RTL-land.

FWIW, I was looking at a regression with our internal tests after your changes. It was quite nice to see how well twiddling -mbranch-cost correlated to how many instructions we would allow in a conditional move sequence.

The downside is it highlighted the gimple vs RTL use issue. I'm confident that we would like to see a higher branch cost in the RTL phases for our uarch, but I'm much less comfortable with how that's going to change the decisions made in trees/gimple. We'll have to investigate that at some depth.




WRT the extraneous zero-extension.  Isn't that arguably a bug in the scc
expander for risc-v?  Fixing that isn't a prerequisite here, but it probably
worth a bit of someone's time.

  I've looked at it already and it's the middle end that ends up with the
zero-extension, specifically `convert_move' invoked from `emit_cstore'
down the call to `noce_try_store_flag_mask', to widen the output from
`cstoredi4', so I don't think we can do anything in the backend to prevent
it from happening.  And neither I think we can do anything useful about
`cstoredi4' having a SImode output, as it's a pattern matched by name
rather than RTX, so we can't provide variants having a SImode and a DImode
output each both at a time, as that would cause a name clash.
We're actually tracking some of these extraneous extensions. Do you happen to know if the zero-extended object happens to be (subreg:SI (reg:DI)) kind of construct? That's the kind of thing we're chasing down right now from various points. Vineet has already fixed one class of them. Jivan and I are looking at others.

jeff

Reply via email to