On 2/4/26 18:05, Paolo Bonzini wrote:
On 2/4/26 06:24, Richard Henderson wrote:
Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Richard Henderson <[email protected]>
---
  tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
  1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e6a16921c9..2944c5a748 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
              goto done;
          }
-        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
+        /* Lower invalid deposit into zero. */
          if (!valid) {
-            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
-                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+            if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+                /* EXTRACT (at 0) + SHL */
+                op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = 0;
+                op2->args[3] = len;
+            } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
+                /* SHL + EXTRACT (at 0) */
                  op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
                  op2->args[0] = ret;
                  op2->args[1] = arg2;
@@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
                  op->args[2] = 0;
                  op->args[3] = ofs + len;
                  goto done;
+            } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
+                /* AND + SHL */

Even if these extracts are valid, can they really be cheaper than an AND with immediate argument, or back to back shifts?

This is primarily for x86.

(1) movz is 2 operand, so that may avoid clobbering an input,
(2) movz is 3-4 bytes whereas and r/i32 is 6-7 bytes.

Because of these, there's a comment somewhere that says we'll prefer extract over and (perhaps in tcg_gen_andi_* or fold_and). IIRC this also happens to simplify ppc and s390x insn selection (and vs rotate and mask). AFAIK, no other hosts are penalized.
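
For reference, the lowerings under discussion all compute the same "deposit into zero" value, so the choice between them is purely about host cost. A minimal sketch in plain C, assuming a 32-bit type with 0 < len and ofs + len <= 32; the helper names are hypothetical and are not the TCG API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low LEN bits set (1 <= len <= 32). */
static uint32_t low_mask(unsigned len)
{
    return len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;
}

/* Hypothetical stand-in for a zero-extending extract of LEN bits at OFS. */
static uint32_t extract_sketch(uint32_t x, unsigned ofs, unsigned len)
{
    return (x >> ofs) & low_mask(len);
}

/* AND + SHL: mask the field with an immediate, then move it into place. */
static uint32_t dep_z_and_shl(uint32_t x, unsigned ofs, unsigned len)
{
    return (x & low_mask(len)) << ofs;
}

/* EXTRACT (at 0) + SHL: e.g. a zero-extending move on x86, then a shift. */
static uint32_t dep_z_extract_shl(uint32_t x, unsigned ofs, unsigned len)
{
    return extract_sketch(x, 0, len) << ofs;
}

/* SHL + EXTRACT (at 0): shift first, then keep the low ofs + len bits. */
static uint32_t dep_z_shl_extract(uint32_t x, unsigned ofs, unsigned len)
{
    return extract_sketch(x << ofs, 0, ofs + len);
}

/* SHL + SHR: back to back shifts, no immediate mask needed. */
static uint32_t dep_z_shl_shr(uint32_t x, unsigned ofs, unsigned len)
{
    return (x << (32 - len)) >> (32 - len - ofs);
}
```

With x = 0xABCD1234, ofs = 8, len = 8, each of the four functions returns 0x3400. Which form is cheapest depends on whether the host has a short zero-extend encoding for that len, or can encode the mask as an immediate.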



r~
