On 2/4/26 18:05, Paolo Bonzini wrote:
On 2/4/26 06:24, Richard Henderson wrote:
Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
---
tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
1 file changed, 31 insertions(+), 9 deletions(-)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index e6a16921c9..2944c5a748 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
goto done;
}
- /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
+ /* Lower invalid deposit into zero. */
if (!valid) {
- if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
- !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+ if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+ /* EXTRACT (at 0) + SHL */
+ op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
+ op2->args[0] = ret;
+ op2->args[1] = arg2;
+ op2->args[2] = 0;
+ op2->args[3] = len;
+ } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
+ /* SHL + EXTRACT (at 0) */
op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
op2->args[0] = ret;
op2->args[1] = arg2;
@@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
op->args[2] = 0;
op->args[3] = ofs + len;
goto done;
+ } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
+ /* AND + SHL */
Even if these extracts are valid, can they really be cheaper than an AND with an immediate
argument, or back-to-back shifts?
This is primarily for x86.
(1) movz is a 2-operand insn, so it may avoid clobbering an input,
(2) movz is 3-4 bytes, whereas and r/i32 is 6-7 bytes.
Because of these, there's a comment somewhere that says we'll prefer extract over and
(perhaps in tcg_gen_andi_* or fold_and). IIRC this also happens to simplify ppc and s390x
insn selection (and vs rotate and mask). AFAIK, no other hosts are penalized.
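
For reference, here is a minimal standalone sketch of why the expansions are interchangeable for a deposit into zero, so the choice can be made purely on host cost. This is not QEMU code: the helper names are hypothetical, a 64-bit type is assumed, and the SHL + SHR form is taken from the commit message rather than from the quoted hunk. Each function computes the low len bits of x placed at bit position ofs, with everything else zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned extract of len bits starting at bit pos. */
static uint64_t extract64_sketch(uint64_t x, unsigned pos, unsigned len)
{
    assert(len >= 1 && pos + len <= 64);
    return (x >> pos) & (~0ULL >> (64 - len));
}

/* EXTRACT (at 0) + SHL */
static uint64_t dep_z_extract_shl(uint64_t x, unsigned ofs, unsigned len)
{
    return extract64_sketch(x, 0, len) << ofs;
}

/* SHL + EXTRACT (at 0) */
static uint64_t dep_z_shl_extract(uint64_t x, unsigned ofs, unsigned len)
{
    return extract64_sketch(x << ofs, 0, ofs + len);
}

/* AND + SHL, with len_mask being the low len bits set */
static uint64_t dep_z_and_shl(uint64_t x, unsigned ofs, unsigned len)
{
    uint64_t len_mask = ~0ULL >> (64 - len);
    return (x & len_mask) << ofs;
}

/* SHL + SHR: push the field to the top, then shift it back down to ofs */
static uint64_t dep_z_shl_shr(uint64_t x, unsigned ofs, unsigned len)
{
    return (x << (64 - len)) >> (64 - len - ofs);
}

/* Count the (ofs, len) pairs for which the four expansions disagree. */
static int dep_z_count_mismatches(uint64_t x)
{
    int bad = 0;

    for (unsigned len = 1; len <= 64; len++) {
        for (unsigned ofs = 0; ofs + len <= 64; ofs++) {
            uint64_t want = dep_z_and_shl(x, ofs, len);

            bad += dep_z_extract_shl(x, ofs, len) != want;
            bad += dep_z_shl_extract(x, ofs, len) != want;
            bad += dep_z_shl_shr(x, ofs, len) != want;
        }
    }
    return bad;
}
```

Calling dep_z_count_mismatches() on any input should return 0, since all four forms are the same function of (x, ofs, len). What differs between them is only which host insns they map to.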
r~