On Wed, Feb 4, 2026, 10:06 Richard Henderson <[email protected]> wrote:
> On 2/4/26 18:05, Paolo Bonzini wrote:
> > On 2/4/26 06:24, Richard Henderson wrote:
> >> Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.
> >>
> >> Suggested-by: Paolo Bonzini <[email protected]>
> >> Signed-off-by: Richard Henderson <[email protected]>
> >> ---
> >>   tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
> >>   1 file changed, 31 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/tcg/optimize.c b/tcg/optimize.c
> >> index e6a16921c9..2944c5a748 100644
> >> --- a/tcg/optimize.c
> >> +++ b/tcg/optimize.c
> >> @@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
> >>          goto done;
> >>      }
> >>
> >> -    /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
> >> +    /* Lower invalid deposit into zero. */
> >>      if (!valid) {
> >> -        if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
> >> -            !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> >> +        if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> >> +            /* EXTRACT (at 0) + SHL */
> >> +            op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
> >> +            op2->args[0] = ret;
> >> +            op2->args[1] = arg2;
> >> +            op2->args[2] = 0;
> >> +            op2->args[3] = len;
> >> +        } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
> >> +            /* SHL + EXTRACT (at 0) */
> >>              op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
> >>              op2->args[0] = ret;
> >>              op2->args[1] = arg2;
> >> @@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
> >>              op->args[2] = 0;
> >>              op->args[3] = ofs + len;
> >>              goto done;
> >> +        } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
> >> +            /* AND + SHL */
> >
> > Even if these extracts are valid, can they really be cheaper than an AND with immediate
> > argument, or back to back shifts?
>
> This is primarily for x86.
>
> (1) movz is 2 operand, so that may avoid clobbering an input,
> (2) movz is 3-4 bytes whereas and r/i32 is 6-7 bytes.
>
> Because of these, there's a comment somewhere that says we'll prefer extract over and
> (perhaps in tcg_gen_andi_* or fold_and). IIRC this also happens to simplify ppc and s390x
> insn selection (and vs rotate and mask). AFAIK, no other hosts are penalized.
>

I think it would be better to pick a canonical form for AND with 2^n-1 and
handle the conversion to extract (like PPC rotates or movz) in the backend.
Picking AND as the canonical form also makes the macros for extract validity
simpler; adding an extra constraint for an immediate of the form 2^n-1 is
easier, and it generalizes to other PPC rotate-and-mask cases.

Paolo

>
> r~
>
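For reference, the expansions discussed in this thread (EXTRACT + SHL, SHL + EXTRACT, AND + SHL, SHL + SHR) all compute the same value: the low `len` bits of `arg` deposited at bit `ofs` into zero. They differ only in which host instructions they need. The sketch below models each one as plain 64-bit C arithmetic so the equivalence can be checked. All function names (`deposit_zero`, `extract0`, `extract_shl`, `shl_extract`, `and_shl`, `shl_shr`, `lowerings_agree`) are hypothetical illustrations, not QEMU APIs. The sketch assumes `1 <= len` and `ofs + len <= 64`, and it deliberately says nothing about which expansion is cheaper on a given host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C models of the 64-bit TCG ops involved; not QEMU code.
 * All assume 1 <= len and ofs + len <= 64. */

/* Reference semantics, bit by bit: deposit the low `len` bits of `arg`
 * at bit `ofs` of a zero value. */
static uint64_t deposit_zero(uint64_t arg, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    uint64_t r = 0;
    for (unsigned i = 0; i < len; i++) {
        r |= ((arg >> i) & 1) << (ofs + i);
    }
    return r;
}

/* extract(x, 0, len): keep the low `len` bits, zero the rest. */
static uint64_t extract0(uint64_t x, unsigned len)
{
    return len == 64 ? x : x & ((UINT64_C(1) << len) - 1);
}

/* EXTRACT (at 0) + SHL */
static uint64_t extract_shl(uint64_t arg, unsigned ofs, unsigned len)
{
    return extract0(arg, len) << ofs;
}

/* SHL + EXTRACT (at 0) of the wider field ofs + len */
static uint64_t shl_extract(uint64_t arg, unsigned ofs, unsigned len)
{
    return extract0(arg << ofs, ofs + len);
}

/* AND + SHL: requires len_mask = 2^len - 1 to be a usable AND immediate */
static uint64_t and_shl(uint64_t arg, unsigned ofs, unsigned len)
{
    uint64_t len_mask = len == 64 ? UINT64_MAX : (UINT64_C(1) << len) - 1;
    return (arg & len_mask) << ofs;
}

/* SHL + SHR: move the field to the top bits, then logical-shift it down */
static uint64_t shl_shr(uint64_t arg, unsigned ofs, unsigned len)
{
    return (arg << (64 - len)) >> (64 - len - ofs);
}

/* True if every expansion matches the reference semantics. */
static bool lowerings_agree(uint64_t arg, unsigned ofs, unsigned len)
{
    uint64_t want = deposit_zero(arg, ofs, len);
    return extract_shl(arg, ofs, len) == want &&
           shl_extract(arg, ofs, len) == want &&
           and_shl(arg, ofs, len) == want &&
           shl_shr(arg, ofs, len) == want;
}
```

For example, `deposit_zero(0xff, 4, 4)` yields `0xf0`, and `lowerings_agree` should hold for any `arg` and any in-range `ofs`/`len`. What the model leaves out is exactly the point under debate: on x86, `movz` is a 2-operand, shorter encoding than an AND with a 32-bit immediate.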

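Paolo's alternative, keeping AND with a 2^n-1 immediate as the canonical form and letting each backend recognize it, comes down to an immediate-matching predicate. A minimal sketch, assuming hypothetical helpers (`is_low_mask`, `is_contiguous_mask`, `select_x86_and`) rather than QEMU's actual constraint machinery: `is_low_mask` is the 2^n-1 test, `is_contiguous_mask` is the broader single-run shape that PPC rotate-and-mask instructions can encode (wrap-around masks ignored), and the x86 selector only illustrates choosing `movz` (or the zero-extending 32-bit `mov`) over an AND with immediate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not QEMU code: one way a backend constraint could
 * recognize "AND with 2^n - 1" when AND is the canonical form. */

/* True if m == 2^n - 1 for some n in [1, 64], i.e. only low bits are set. */
static bool is_low_mask(uint64_t m)
{
    return m != 0 && (m & (m + 1)) == 0;
}

/* True if the set bits of m form one contiguous run, e.g. 0x0ff0.
 * Adding the lowest set bit clears the lowest run via carry; any bit of m
 * still set afterwards belongs to a second run. Wrap-around masks are not
 * handled. */
static bool is_contiguous_mask(uint64_t m)
{
    uint64_t lowest = m & -m;   /* unsigned negation is well defined */
    return m != 0 && ((m + lowest) & m) == 0;
}

/* Illustrative x86-64 choice for a 64-bit "and dst, src, imm". */
enum x86_and_choice {
    X86_MOVZB,     /* movzbl: zero-extend the low 8 bits */
    X86_MOVZW,     /* movzwl: zero-extend the low 16 bits */
    X86_MOVL,      /* movl: a 32-bit mov zero-extends to 64 bits */
    X86_AND_IMM,   /* plain and with immediate (encodability not checked) */
};

static enum x86_and_choice select_x86_and(uint64_t imm)
{
    if (is_low_mask(imm)) {
        switch (imm) {
        case 0xff:
            return X86_MOVZB;
        case 0xffff:
            return X86_MOVZW;
        case 0xffffffff:
            return X86_MOVL;
        default:
            break;
        }
    }
    return X86_AND_IMM;
}
```

Under this scheme the middle end would only emit AND for such masks, and the choice between `movz`, rotate-and-mask, or a plain AND would live entirely in each backend's instruction selection, instead of in extract-validity macros consulted by the optimizer.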