On Fri, Dec 04, 2020 at 01:35:55PM -0600, Richard Henderson wrote:

Thank you, Richard, for your answer. I don't want to start a debate or
defend the way I did things initially. I really just want to clarify
these internals, and I hope it will benefit other QEMU enthusiasts.

> You can't just inject a call anywhere you like.  If you add it at
> the IR level, then the rest of the compiler will see it and work
> properly.  If you add the call in the middle of another operation,
> the compiler doesn't get to see it and Bad Things Happen.

I do understand that, but isn't that, surprisingly, what is done in the
QEMU slow path? I mean, the call to the helper is not generated at the
IR level; it is injected through a 'jne' right in the middle of the
instructions currently being generated, plus code added at the end of
the TB.

What's the difference between the way it is currently done for the
slow path and something like:

static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
{ [...]
    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
                     label_ptr, offsetof(CPUTLBEntry, addr_write));

    /* TLB Hit.  */
    tcg_out_qemu_st_filter(s, opc, addrlo, addrhi, datalo, datahi);
    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);

    /* Record the current context of a store into ldst label */
    add_qemu_ldst_label(s, false, is64, oi, datalo, datahi, addrlo, addrhi,
                        s->code_ptr, label_ptr);
}

Where:
static void tcg_out_qemu_st_filter(TCGContext *s, MemOp opc,
                                   TCGReg addrlo, TCGReg addrhi,
                                   TCGReg datalo, TCGReg datahi)
{
  MemOp s_bits = opc & MO_SIZE;

  tcg_out_push(s, TCG_REG_L1); // used later on by tcg_out_qemu_st_direct

  tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
              tcg_target_call_iarg_regs[0], addrlo);

  tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
              tcg_target_call_iarg_regs[1], datalo);

  tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], opc);

  tcg_out_call(s, (void*)filter_store_memop);

  tcg_out_pop(s, TCG_REG_L1);
}

Does the ldst_label mechanism, which generates the slow path code at
the end of the TB, change anything? There is still an injected 'jne' in
tcg_out_tlb_load() which redirects to the slow path code, wherever it
is located, just as I do in place for tcg_out_qemu_st_filter.
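
To frame the comparison, here is a toy model of the deferred slow path
as I understand it: the inline code emits a forward branch with an
unresolved target, a label records which branch to patch and where to
resume, and the out-of-line code is appended at the end of the TB. All
names (ToyContext, toy_out_qemu_st, toy_finalize, the OP_* values) are
hypothetical and do not exist in QEMU, and the real backend differs in
detail (for example in how the slow path returns to the inline code).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: none of these names exist in QEMU. */
enum { OP_JNE = 1, OP_FAST_STORE, OP_SLOW_CALL, OP_JMP, OP_END };

typedef struct {
    uint8_t op;
    int target;        /* branch target index, -1 while unresolved */
} ToyInsn;

typedef struct {
    int branch_idx;    /* index of the 'jne' to patch later */
    int raddr;         /* index to resume at after the slow path */
} ToyLabel;

typedef struct {
    ToyInsn code[64];
    int n_code;
    ToyLabel labels[8];
    int n_labels;
} ToyContext;

/* Append one toy instruction and return its index. */
static int toy_emit(ToyContext *s, uint8_t op, int target)
{
    assert(s->n_code < 64);
    s->code[s->n_code].op = op;
    s->code[s->n_code].target = target;
    return s->n_code++;
}

/* Inline part: forward branch on TLB miss, then the fast-path store. */
static void toy_out_qemu_st(ToyContext *s)
{
    int branch = toy_emit(s, OP_JNE, -1);   /* target not known yet */
    toy_emit(s, OP_FAST_STORE, -1);

    /* Record which branch to patch and where to come back to. */
    assert(s->n_labels < 8);
    s->labels[s->n_labels].branch_idx = branch;
    s->labels[s->n_labels].raddr = s->n_code;
    s->n_labels++;
}

/* End of TB: emit each slow path out of line and patch its branch. */
static void toy_finalize(ToyContext *s)
{
    toy_emit(s, OP_END, -1);
    for (int i = 0; i < s->n_labels; i++) {
        ToyLabel *l = &s->labels[i];
        s->code[l->branch_idx].target = s->n_code;
        toy_emit(s, OP_SLOW_CALL, -1);
        toy_emit(s, OP_JMP, l->raddr);
    }
}
```

In this model the helper call sits outside the straight-line code and
is reached only through the patched branch, whereas my filter places
the call directly on the fast path.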

TCG is certainly blind at some point, but it works for the slow path,
so it should work for the filter too. The TCG qemu_st_i32 op is
defined as:

DEF(qemu_st_i32, 0, TLADDR_ARGS + 1, 1,
    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)

And as you stated, tcg_reg_alloc_op() has already taken care of the
call-clobbered registers. So we should be safe calling a helper from
tcg_out_qemu_st(), and arguably that is why you can do so for the slow
path?


> > I noticed that 'esp' is not shifted down before stacking up the
> > args, which might corrupt last stacked words.
> 
> No, we generate code for a constant esp, as if by gcc's
> -mno-push-args option. We have reserved TCG_STATIC_CALL_ARGS_SIZE
> bytes of stack for the arguments (which is actually larger than
> necessary for any of the tcg targets).

Since this reservation is made only in the TB prologue, do you mean
that TCG will never generate the equivalent of a push *followed* by a
memory store/load? In other words, will the host esp never point to a
word that was just pushed by the translation of a TCG op?
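
To make sure I read the constant-esp scheme correctly, here is a toy
model of it: the prologue reserves an outgoing argument area once,
arguments are then stored at fixed offsets from the stack pointer, and
a push is the only thing that moves it. The names (ToyStack,
toy_prologue, toy_store_arg, toy_push) and the sizes are hypothetical
and are not the values QEMU uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; sizes are arbitrary illustrative values. */
#define TOY_STATIC_CALL_ARGS_SIZE 32
#define TOY_STACK_SIZE 256

typedef struct {
    uint8_t mem[TOY_STACK_SIZE];
    size_t sp;   /* offset of the stack pointer within mem */
} ToyStack;

/* Prologue: reserve the outgoing argument area once. */
static void toy_prologue(ToyStack *st)
{
    memset(st->mem, 0, sizeof(st->mem));
    st->sp = TOY_STACK_SIZE - TOY_STATIC_CALL_ARGS_SIZE;
}

/* Store an outgoing argument at sp + offset; sp itself is untouched. */
static void toy_store_arg(ToyStack *st, size_t offset, uint32_t val)
{
    assert(offset + sizeof(val) <= TOY_STATIC_CALL_ARGS_SIZE);
    memcpy(&st->mem[st->sp + offset], &val, sizeof(val));
}

/* Read back the word currently found at sp + offset. */
static uint32_t toy_load_arg(const ToyStack *st, size_t offset)
{
    uint32_t val;
    assert(offset + sizeof(val) <= TOY_STATIC_CALL_ARGS_SIZE);
    memcpy(&val, &st->mem[st->sp + offset], sizeof(val));
    return val;
}

/* A push moves sp, so every sp-relative slot shifts by one word. */
static void toy_push(ToyStack *st, uint32_t val)
{
    assert(st->sp >= sizeof(val));
    st->sp -= sizeof(val);
    memcpy(&st->mem[st->sp], &val, sizeof(val));
}
```

In this model, storing arguments never changes sp, while a single push
makes offset 0 refer to the pushed word instead of the first argument.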
