On Wed, 4 Feb 2026, 10:06 Richard Henderson <[email protected]>
wrote:

> On 2/4/26 18:05, Paolo Bonzini wrote:
> > On 2/4/26 06:24, Richard Henderson wrote:
> >> Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.
> >>
> >> Suggested-by: Paolo Bonzini <[email protected]>
> >> Signed-off-by: Richard Henderson <[email protected]>
> >> ---
> >>   tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
> >>   1 file changed, 31 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/tcg/optimize.c b/tcg/optimize.c
> >> index e6a16921c9..2944c5a748 100644
> >> --- a/tcg/optimize.c
> >> +++ b/tcg/optimize.c
> >> @@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
> >>               goto done;
> >>           }
> >> -        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
> >> +        /* Lower invalid deposit into zero. */
> >>           if (!valid) {
> >> -            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
> >> -                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> >> +            if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> >> +                /* EXTRACT (at 0) + SHL */
> >> +                op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
> >> +                op2->args[0] = ret;
> >> +                op2->args[1] = arg2;
> >> +                op2->args[2] = 0;
> >> +                op2->args[3] = len;
> >> +            } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
> >> +                /* SHL + EXTRACT (at 0) */
> >>                   op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
> >>                   op2->args[0] = ret;
> >>                   op2->args[1] = arg2;
> >> @@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
> >>                   op->args[2] = 0;
> >>                   op->args[3] = ofs + len;
> >>                   goto done;
> >> +            } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
> >> +                /* AND + SHL */
> >
> > Even if these extracts are valid, can they really be cheaper than an AND
> > with immediate argument, or back to back shifts?
>
> This is primarily for x86.
>
> (1) movz is 2 operand, so that may avoid clobbering an input,
> (2) movz is 3-4 byte whereas and r/i32 is 6-7 byte.
>
> Because of these, there's a comment somewhere that says we'll prefer
> extract over and (perhaps in tcg_gen_andi_* or fold_and).  IIRC this
> also happens to simplify ppc and s390x insn selection (and vs rotate
> and mask).  AFAIK, no other hosts are penalized.
>

I think it would be better to pick a canonical form for AND with 2^n-1 and
handle conversion to extract (like PPC rotates or movz) in the backend.

Picking AND as the canonical form also makes the macros for extract
validity simpler; adding an extra constraint for immediate 2^n-1 is
easier, and it generalizes to other PPC rotate-and-mask cases.
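
To illustrate what such a constraint would have to test, here is a rough
standalone sketch. It is not QEMU code: is_low_mask, low_mask_len and
deposit_z are hypothetical names made up for this example, and a real
backend would express the check through its constraint machinery rather
than free functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if mask == 2^n - 1 for some n in 1..64. */
static bool is_low_mask(uint64_t mask)
{
    /* A run of low set bits plus one carries out of the run entirely. */
    return mask != 0 && (mask & (mask + 1)) == 0;
}

/* Hypothetical helper: length n of the mask, i.e. the extract length. */
static unsigned low_mask_len(uint64_t mask)
{
    unsigned n = 0;

    assert(is_low_mask(mask));
    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}

/*
 * Deposit-into-zero in the canonical AND + SHL form.
 * Assumes 0 < len <= 64 and ofs < 64.
 */
static uint64_t deposit_z(uint64_t x, unsigned ofs, unsigned len)
{
    uint64_t len_mask = len == 64 ? UINT64_MAX : (UINT64_C(1) << len) - 1;

    return (x & len_mask) << ofs;
}
```

With something like this, a backend that sees an AND whose immediate
satisfies is_low_mask could emit movz (or a rotate and mask) using
low_mask_len as the length, while the optimizer only ever produces AND.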

Paolo

>
>
>
> r~
>
>
