On Mon, May 25, 2026 at 8:55 PM Richard Henderson <[email protected]> wrote: > > On 5/25/26 08:27, Matt Turner wrote: > > The gUSA pattern matcher rejected `add #imm, Rn` whenever any prior > > `mov Rm, Rn` appeared (mv_src >= 0), forcing a fallback to > > cpu_exec_step_atomic for sequences like: > > > > mov.l @r2, r3 ; load > > mov r3, r7 ; save old value (mv_src == ld_dst) > > add #1, r7 ; increment copy > > mov.l r7, @r2 ; store > > > > When mv_src == ld_dst the move merely copies the loaded value to > > preserve it -- exactly the situation already accepted for the > > `add Rm, Rn` form. The immediate form can be handled identically with > > tcg_gen_atomic_fetch_add_i32 + tcg_gen_add_i32, so translate it inline > > instead of taking the slower single-step atomic fallback. > > --- > > target/sh4/translate.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git ./target/sh4/translate.c ./target/sh4/translate.c > > index 5adf650744..d38a6bd352 100644 > > --- ./target/sh4/translate.c > > +++ ./target/sh4/translate.c > > @@ -1974,7 +1974,7 @@ static void decode_gusa(DisasContext *ctx, > > CPUSH4State *env) > > break; > > > > case 0x7000 ... 0x700f: /* add #imm,Rn */ > > - if (op_dst != B11_8 || mv_src >= 0) { > > + if (op_dst != B11_8 || (mv_src >= 0 && mv_src != ld_dst)) { > > goto fail; > > Hmm, but then you don't further constrain B11_8. > > For the given example, afaics, B11_8 == op_dst, so we're failing only because > mv_src >= 0. > Do we actually need the mv_src check? I'm guessing not.
I think you're exactly right that `mv_src >= 0` doesn't do anything by itself. With this patch, the `mv_src >= 0` check is needed to prevent a spurious failure for: mov.l @r2, r3 ; ld_dst=3, op_dst=3 (set to ld_dst by default) add #1, r3 ; B11_8=3 mov.l r3, @r2 With just mv_src != ld_dst (no mv_src >= 0 guard): - mv_src = -1, ld_dst = 3 - -1 != 3 -> true -> spurious fail
