On Sat, Sep 17, 2011 at 10:00:31PM +0200, Stefan Weil wrote:
> +#if MAX_OPC_PARAM_IARGS != 4
> +# error Fix needed, number of supported input arguments changed!
> +#endif
> +#if TCG_TARGET_REG_BITS == 32
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#else
> +typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
> +                                    tcg_target_ulong, tcg_target_ulong);
> +#endif
[...]
> +        case INDEX_op_call:
> +            t0 = tci_read_ri(&tb_ptr);
> +#if TCG_TARGET_REG_BITS == 32
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3),
> +                                        tci_read_reg(TCG_REG_R5),
> +                                        tci_read_reg(TCG_REG_R6),
> +                                        tci_read_reg(TCG_REG_R7),
> +                                        tci_read_reg(TCG_REG_R8));
> +            tci_write_reg(TCG_REG_R0, u64);
> +            tci_write_reg(TCG_REG_R1, u64 >> 32);
> +#else
> +            u64 = ((helper_function)t0)(tci_read_reg(TCG_REG_R0),
> +                                        tci_read_reg(TCG_REG_R1),
> +                                        tci_read_reg(TCG_REG_R2),
> +                                        tci_read_reg(TCG_REG_R3));
> +            tci_write_reg(TCG_REG_R0, u64);
> +#endif
> +            break;

Unfortunately, this won't work on all architectures.

C99 6.5.2.2 states:

  9. If the function is defined with a type that is not compatible with
     the type (of the expression) pointed to by the expression that
     denotes the called function, the behavior is undefined.

We could perhaps get away with this on certain architectures (and on
those architectures, doing it this way might be the most efficient
option), although I'm not sure which architectures those are.

The real problem relates to the alignment of parameters in registers
when passing 64-bit ints as arguments on 32-bit architectures.

Some ABIs have the situation where:

  void foo(uint32_t a, uint64_t b);

results in:

  register  contents
  p0        a
  p1        [padding]
  p2        b & 0xffffffff
  p3        b >> 32

An ABI may require this whether or not 64-bit integers are normally
aligned to 64-bit addresses in memory. The ordering of the upper and
lower 32 bits of a 64-bit parameter may have nothing at all to do with
the architecture's endianness. The alignment rules when passing
arguments via registers might not even be consistent with those when
passing via the stack.

In QEMU, tcg_gen_callN() handles alignment of registers for the
architectures that currently have TCG backends.
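
To make the padding rule concrete, here is a small model of it. This is
only an illustration, not QEMU code: assign_arg_regs() and NUM_ARG_REGS
are made-up names, and the rule shown (start every 64-bit argument in an
even-numbered 32-bit register) is just the one from the foo() example
above. Real ABIs differ in the details.

```c
#include <assert.h>

/* Assumption for this sketch: four 32-bit argument registers, p0..p3. */
#define NUM_ARG_REGS 4

/*
 * Hypothetical model of an "align 64-bit arguments to even registers"
 * rule.  sizes[i] is the size of argument i in bytes (4 or 8).
 * first_reg[i] receives the index of the first register used for
 * argument i, or -1 if the argument no longer fits in registers.
 */
static void assign_arg_regs(const int *sizes, int nargs, int *first_reg)
{
    int next = 0;   /* next free register index */

    for (int i = 0; i < nargs; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        int nregs = sizes[i] / 4;

        if (nregs == 2 && (next & 1)) {
            next++; /* skip one register: this is the [padding] slot */
        }
        if (next + nregs <= NUM_ARG_REGS) {
            first_reg[i] = next;
            next += nregs;
        } else {
            first_reg[i] = -1;      /* would be passed on the stack */
            next = NUM_ARG_REGS;    /* so would all later arguments */
        }
    }
}
```

For foo(uint32_t a, uint64_t b), i.e. sizes { 4, 8 }, this puts a in p0
and b in p2/p3, leaving p1 unused, which matches the table above. A
caller that simply fills p0, p1, p2, ... in order gets this case wrong.
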
If any new backend were to require features not already supported by
tcg_gen_callN(), then those features would simply have to be added.

When using TCI, we could define TCG_TARGET_CALL_ALIGN_ARGS (and be
careful to handle the REGPARM case under x86), and simply rely on
tcg_gen_callN(), but this isn't guaranteed to work for all ABIs.

Since TCI is intended to be portable, I feel that we should provide a
means of calling helper functions that doesn't rely upon any
ABI-specific definitions, at least as a fallback. It would probably
make sense to get the generic code working first, and then think about
optimising for specific ABIs later, IMO.

So, this leaves the question of how to do this in a generic manner.

Do we:

1) Include a pointer to a wrapper function in the bytecode, which would
   call the helper with the correct type. Each wrapper could just read
   from and write to the TCI registers itself without accepting/returning
   values, or the values of the TCI registers could be passed in as
   arguments to each of the wrapper functions.

2) Encode the type of the function into the bytecode, such that a huge
   switch() statement can be used to cast the function pointer to the
   appropriate type, allowing the helper to be invoked in a defined
   manner. My guess is that this would be slower when executing
   bytecode than 1), although it would be quicker for compilation of the
   bytecode.

3) Modify the helpers themselves to accept uint32_t arguments when using
   TCI. This would require quite a lot of work but would likely yield
   the best performance. However, it would prevent us from ever being
   able to choose between architecture-specific backends and TCI using a
   command line option.

4) Go with some other option that I've not considered?

To me, option 1) seems like the simplest, although the macros needed to
do this are likely to be a little hairy...

I'm also concerned that we should not clobber R1 when storing a 32-bit
return value in R0 on 32-bit architectures.

Cheers,
-- 
Stuart Brady
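
P.S. A rough sketch of what option 1) might look like on a 32-bit host,
to show the idea rather than propose actual code. Everything here is
hypothetical and not taken from the patch: the tci_regs array, the
TCI_WRAP_* macros, tci_do_call() and both example helpers. The packing
of a 64-bit argument into two consecutive registers, low word first, is
likewise just an assumed bytecode convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TCI register file (32-bit host). */
#define TCI_NB_REGS 16
static uint32_t tci_regs[TCI_NB_REGS];

/* Two made-up helpers, declared with their natural C prototypes. */
static uint64_t helper_example64(uint32_t a, uint64_t b)
{
    return b + a;
}

static uint32_t helper_example32(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* Every wrapper has this one type, so the interpreter never has to
 * call a helper through an incompatible function pointer type. */
typedef void (*tci_call_wrapper)(uint32_t *regs);

/* Wrapper for helpers of type uint64_t (uint32_t, uint64_t).
 * Assumed convention: arguments occupy consecutive registers with no
 * padding, low word first; the result goes to R0 (low) and R1 (high). */
#define TCI_WRAP_I64_I32_I64(name)                              \
    static void tci_wrap_##name(uint32_t *regs)                 \
    {                                                           \
        uint64_t arg1 = regs[1] | ((uint64_t)regs[2] << 32);    \
        uint64_t ret = name(regs[0], arg1);                     \
        regs[0] = (uint32_t)ret;                                \
        regs[1] = (uint32_t)(ret >> 32);                        \
    }

/* Wrapper for helpers of type uint32_t (uint32_t, uint32_t).
 * Only R0 is written, so R1 is not clobbered. */
#define TCI_WRAP_I32_I32_I32(name)                              \
    static void tci_wrap_##name(uint32_t *regs)                 \
    {                                                           \
        regs[0] = name(regs[0], regs[1]);                       \
    }

TCI_WRAP_I64_I32_I64(helper_example64)
TCI_WRAP_I32_I32_I32(helper_example32)

/* What the INDEX_op_call case could reduce to: the bytecode carries
 * the address of the wrapper rather than that of the helper. */
static void tci_do_call(tci_call_wrapper wrapper)
{
    assert(wrapper != 0);
    wrapper(tci_regs);
}
```

The compiler then generates the correct call sequence for each helper,
whatever the host ABI does with 64-bit arguments, at the cost of one
extra indirect call and one wrapper per helper signature.
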