Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode

Stuart Brady Mon, 19 Sep 2011 14:45:11 -0700

On Sat, Sep 17, 2011 at 10:00:31PM +0200, Stefan Weil wrote:

> +#if MAX_OPC_PARAM_IARGS != 4
> +# error Fix needed, number of supported input arguments changed!
> +#endif
> +#if TCG_TARGET_REG_BITS == 32
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#else
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#endif


[...]

> +        case INDEX_op_call:
> +            t0 = tci_read_ri(&tb_ptr);
> +#if TCG_TARGET_REG_BITS == 32
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3),
> +                                        tci_read_reg(TCG_REG_R5),
> +                                        tci_read_reg(TCG_REG_R6),
> +                                        tci_read_reg(TCG_REG_R7),
> +                                        tci_read_reg(TCG_REG_R8));
> +            tci_write_reg(TCG_REG_R0, u64);
> +            tci_write_reg(TCG_REG_R1, u64 >> 32);
> +#else
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3));
> +            tci_write_reg(TCG_REG_R0, u64);
> +#endif
> +            break;

Unfortunately, this won't work on all architectures.

C99 6.5.2.2 states:

   9. If the function is defined with a type that is not compatible with
      the type (of the expression) pointed to by the expression that
      denotes the called function, the behavior is undefined.

We could perhaps get away with this on certain architectures (and on
those architectures, doing it this way might be the most efficient
option), although I'm not sure which those architectures are.

The real problem relates to the alignment of parameters in registers
when passing 64-bit ints as arguments on 32-bit architectures.

Some ABIs have the situation where:

    void foo(uint32_t a, uint64_t b);

results in:

    register  contents
          p0  a
          p1  [padding]
          p2  b & 0xffffffff
          p3  b >> 32

An ABI may require this regardless of whether 64-bit integers are
aligned to 64-bit boundaries in memory.  The ordering of the upper and
lower 32 bits of a 64-bit parameter may have nothing at all to do with
the architecture's endianness.  The alignment rules when passing
arguments via registers might not even be consistent with those when
passing via the stack.
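
To make the register layout above concrete, here is a rough sketch of
how 32-bit parameter slots might be assigned with and without the
"align 64-bit arguments to an even slot" rule.  The function
assign_slots() and the MAX_SLOTS limit are hypothetical and model no
particular ABI; real ABIs differ in many more details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit on the number of 32-bit parameter slots. */
enum { MAX_SLOTS = 8 };

/*
 * arg_is_64[i] is true when argument i is a 64-bit integer.
 * first_slot[i] receives the index of the first 32-bit slot used by
 * argument i.  When align_pairs is true, a 64-bit argument must start
 * in an even-numbered slot, so an odd slot is skipped as padding.
 * Returns the total number of slots consumed, padding included.
 */
static int assign_slots(const bool *arg_is_64, size_t nargs,
                        bool align_pairs, int *first_slot)
{
    int slot = 0;

    for (size_t i = 0; i < nargs; i++) {
        if (arg_is_64[i] && align_pairs && (slot & 1)) {
            slot++;                 /* padding slot, contents unused */
        }
        first_slot[i] = slot;
        slot += arg_is_64[i] ? 2 : 1;
    }
    assert(slot <= MAX_SLOTS);
    return slot;
}
```

For foo(uint32_t a, uint64_t b), the aligned variant places b in slots
2 and 3 (four slots in total), whereas the unaligned variant places it
in slots 1 and 2 (three slots in total).  A caller that guesses the
wrong rule hands the helper garbage.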

In QEMU, tcg_gen_callN() handles alignment of registers for the
architectures that currently have TCG backends.  If any new backend
were to require features not already supported by tcg_gen_callN(),
then those features would simply have to be added.

When using TCI, we could define TCG_TARGET_CALL_ALIGN_ARGS (and be
careful to handle the REGPARM case under x86), and simply rely on
tcg_gen_callN(), but this isn't guaranteed to work for all ABIs.

Since TCI is intended to be portable, I feel that we should provide
a means of calling helper functions that doesn't rely upon any
ABI-specific definitions, at least as a fallback.  It would probably
make sense to get the generic code working first, and then think about
optimising for specific ABIs later, IMO.

So, this leaves the question of how to do this in a generic manner.
Do we:

 1) Include a pointer to a wrapper function in the bytecode, which
    would call the helper with the correct type.  Each wrapper could
    just read from and write to the TCI registers itself without
    accepting/returning values, or the values of the TCI registers
    could be passed in as arguments to each of the wrapper functions.

 2) Encode the type of the function into the bytecode, such that a
    huge switch() statement can be used to cast the function pointer
    to the appropriate type, allowing the helper to be invoked in a
    defined manner.  My guess is that this would be slower when
    executing bytecode than 1), although it would be quicker for
    compilation of the bytecode.

 3) Modify the helpers themselves to accept uint32_t arguments when
    using TCI.  This would require quite a lot of work but would
    likely yield the best performance.  However, it would prevent
    us from ever being able to choose between architecture-specific
    backends and TCI using a command line option.

 4) Go with some other option that I've not considered?
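
As an illustration of option 2), a minimal sketch follows.  Everything
in it is hypothetical: the helper_sig tags, the generic_fn type, the
call_helper() function, the two example helpers, and the assumption
that a 64-bit argument occupies two consecutive 32-bit TCI registers
with the low half first.  The point is only that each case casts the
pointer back to the helper's real type before calling it, which C
permits.

```c
#include <stdint.h>

/* Hypothetical signature tags that would be encoded in the bytecode. */
typedef enum {
    SIG_I32_I32,        /* uint32_t f(uint32_t)           */
    SIG_I64_I32_I64     /* uint64_t f(uint32_t, uint64_t) */
} helper_sig;

/* Any function pointer may be stored in another function pointer type
 * as long as it is converted back before the call. */
typedef void (*generic_fn)(void);

/* Example helpers, stand-ins for real helper functions. */
static uint32_t example_inc(uint32_t a) { return a + 1; }
static uint64_t example_add(uint32_t a, uint64_t b) { return a + b; }

/* regs[] models the 32-bit TCI registers holding the arguments. */
static uint64_t call_helper(generic_fn fn, helper_sig sig,
                            const uint32_t *regs)
{
    switch (sig) {
    case SIG_I32_I32:
        return ((uint32_t (*)(uint32_t))fn)(regs[0]);
    case SIG_I64_I32_I64: {
        /* Assumed layout: low half in regs[1], high half in regs[2]. */
        uint64_t b = (uint64_t)regs[1] | ((uint64_t)regs[2] << 32);
        return ((uint64_t (*)(uint32_t, uint64_t))fn)(regs[0], b);
    }
    }
    return 0;   /* unknown signature tag */
}
```

A real implementation would need one case per distinct helper
signature, which is where the "huge switch() statement" comes from.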

To me, option 1) seems like the simplest, although the macros needed
to do this are likely to be a little hairy...
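
For what it's worth, here is a rough sketch of what the wrappers in
option 1) might look like, using the variant where each wrapper reads
from and writes to the TCI registers itself.  All names here
(tci_regs, tci_wrapper, the DEF_WRAPPER_* macros, tci_call and the
example helpers) are made up for illustration and are not taken from
the patch.  The register layout for 64-bit values is likewise an
assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for the TCI register file on a 32-bit host. */
static uint32_t tci_regs[8];

/* Every wrapper has the same type, so the interpreter can call it
 * without knowing anything about the helper's signature. */
typedef void (*tci_wrapper)(void);

/* Assumed layout: low half in register lo, high half in lo + 1. */
static uint64_t reg_pair(int lo)
{
    return tci_regs[lo] | ((uint64_t)tci_regs[lo + 1] << 32);
}

/* Wrapper for helpers of type uint32_t f(uint32_t). */
#define DEF_WRAPPER_I32_I32(helper)                         \
    static void wrap_##helper(void)                         \
    {                                                       \
        tci_regs[0] = helper(tci_regs[0]);                  \
    }

/* Wrapper for helpers of type uint64_t f(uint32_t, uint64_t). */
#define DEF_WRAPPER_I64_I32_I64(helper)                     \
    static void wrap_##helper(void)                         \
    {                                                       \
        uint64_t r = helper(tci_regs[0], reg_pair(1));      \
        tci_regs[0] = (uint32_t)r;                          \
        tci_regs[1] = (uint32_t)(r >> 32);                  \
    }

/* Example helpers, stand-ins for real helper functions. */
static uint32_t example_inc(uint32_t a) { return a + 1; }
static uint64_t example_add(uint32_t a, uint64_t b) { return a + b; }

DEF_WRAPPER_I32_I32(example_inc)
DEF_WRAPPER_I64_I32_I64(example_add)

/* What the INDEX_op_call case would reduce to. */
static void tci_call(tci_wrapper w)
{
    w();
}
```

Since the compiler sees the helper's true prototype inside each
wrapper, argument alignment and return-value handling are left
entirely to the host ABI.  One macro per distinct signature is still
required, which is the hairy part.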

I'm also concerned that we should not clobber R1 when storing a
32-bit return value in R0 on 32-bit architectures.
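
A minimal sketch of the behaviour I have in mind is below.  It assumes
the interpreter somehow knows whether the helper returns a 64-bit
value (the ret_is_64 flag), which the patch as posted does not encode.
store_helper_result() and the register indices are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices for a 32-bit host. */
enum { R0 = 0, R1 = 1, NUM_REGS = 8 };

/*
 * Store a helper's return value in the TCI registers.  R1 is written
 * only when the helper really returns 64 bits, so a 32-bit result
 * leaves whatever R1 held before the call intact.
 */
static void store_helper_result(uint32_t regs[NUM_REGS], uint64_t ret,
                                bool ret_is_64)
{
    regs[R0] = (uint32_t)ret;
    if (ret_is_64) {
        regs[R1] = (uint32_t)(ret >> 32);
    }
}
```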

Cheers,
-- 
Stuart Brady
