Hello everyone, this is a proposal in the libtcg direction, i.e., enabling building a single QEMU executable featuring multiple *targets* (i.e., TCG frontends). This follows what we discussed in recent KVM calls and originally at the KVM Forum Emulation BoF.
Note that our commitment to this project is possible thanks to our collaboration with the Qualcomm Innovation Center! # Grand goal The following is the core point where translation of input code to host code takes place: TranslationBlock *tb_gen_code(CPUState *cpu, target_ulong pc, target_ulong cs_base, uint32_t flags, int cflags) { // ... gen_intermediate_code(cpu, tb, max_insns, pc, host_pc); // ... gen_code_size = tcg_gen_code(tcg_ctx, tb, pc); // ... } https://github.com/qemu/qemu/blob/4e4fa6c/accel/tcg/translate-all.c#L816 We want: * `gen_intermediate_code` to be target-specific (provided by a shared library) * `tb_gen_code` and `tcg_gen_code` to be target-agnostic Result: the frontend just emits instructions into the `TranslationBlock`, not much more. But first we need to make sure that QEMU is neatly divided into components that can make compile-time assumptions about the target and components that don't, which is what this proposal is about. This will be a large chunk of work affecting many parts of the code and requiring some infrastructural changes. # Strategy The strategy to make this happen is primarily to have an out-of-tree branch where the development of libtcg takes place. The main goal is to be able to limit usage of `NEED_CPU_H` and some other macros only to certain places (mainly `target/*` and `{bsd,linux}-user/*`). Initially we will focus on getting a `--target-list=x86_64-linux-user` configuration to build with reduced usage of `NEED_CPU_H`. This is just for simplicity, we intend to tackle all other configurations step-by-step. At first, there will be lots of hacks to make it build and QEMU won't be operational, but building it will enable us to identify all the core issues. Once all the core issues have been identified, we intend to prepare fixes for each one of them and frequently send them upstream in a form palatable for master. For instance, we expect to have a patch to move from `TARGET_TB_PCREL` to `TranslationBlock.pc_rel`. This way, we hope to be able to tackle the project in small bites and, at a certain point, send a patch upstream where `NEED_CPU_H` is actually disabled in many many places. Given the amount of work, doing everything in a single shot does not seem feasible or beneficial. # Core issues identifed so far In the following we report some of the core issues we identified, why they are problematic and what we intend to do to tackle them. Feedback on each and every point is very welcome! :) ## `NEED_CPU_H` `NEED_CPU_H` is a macro to decide whether a certain translation unit needs `target/$ARCH/cpu.h` or not. In several headers, this macro is used to guard target-specific code. Problem: we need to build translation units out of the `target/` and `*-user/` directories *without* this macro being defined. This also means that many files will be compiled only once and not once per target! Solution: Fix all the issues caused by undefining it (main ones are reported below). Sometimes a solution is to take headers that are currently target-specific headers and introduce `#ifdef NEED_CPU_H` in certain parts so that they become target-agnostic. At a certain point it might make sense to count how many lines are guarded by such macro and consider splitting headers into target-agnostic and target-specific ones. Note: currently, we plan to get rid of all usages of `TARGET_*` macros in code that we want to become target agnostic but *not* of `CONFIG_USER_ONLY` and `CONFIG_SOFTMMU`. At least initially, our goal is just to decouple translation units out of `target/`/`*-user/` from target-specific assumptions. ## `target_ulong` `target_ulong` is `uint32_t` in 32-bit targets and `uint64_t` in 64-bit targets. Problem: This is used in many many places to represent addresses in code that could become target-independent. Proposed solution: we can convert it to: typedef uint64_t target_address; The problem with this is that, if arithmetic operations are performed on it, we might get undesired results: // Was: char load_data(target_ulong address) char load_data(target_address address) { char *base_address = get_base_address(); // On a 32-bits target this would overflow, it doesn't with // uint64_t target_address real_address = address + 1; return *(base_address + real_address); } load_data(UINT32_MAX); This is might be a source of subtle bugs. We're using a rough `clang-query` invocation to see who performs arithmetic on `target_ulong`: $ cd build $ cat compile_commands.json \ | jq -r '.[] | .["file"]' \ | grep -v 'target/' \ | grep -v 'linux-user/' \ | xargs -P$(nproc) -n1 \ clang-query \ --extra-arg='-Wno-error' \ --extra-arg='-w' \ -p . \ -c 'match binaryOperator(anyOf(hasOperatorName("-"), hasOperatorName("+")), hasDescendant(expr(hasType(typedefDecl(anyOf(hasName("target_ulong"), hasName("abi_ulong"))))))).bind("x")' \ | grep -v '0 matches.' \ | grep 'binds here' \ | sort -u \ | wc -l 34 Not too bad. An alternative would be to do some "data hiding" and have: struct target_address { uint64_t address; }; And then have `increment_address(target_address *, CPUState *)` in order to perform safe arithmetic depending on the current target. However this seems a bit excessive, we could rely on simply letting the developer mask addresses in cases of arithmetic in target-independent code. ## `abi_ulong` Similar to `target_ulong`, but with alignment info. The role of `abi_ulong` is not entirely clear to me. Some feedback on this would be welcome. ## `TCGv` `TCGv` is a macro for `TCGv_i32` for 32-bit targets and `TCGv_i64` for 64-bit targets. Problem: it makes `tcg-op.h` and, more importantly, `tcg-op.c` target-specific. Solution: transform current functions using them into target-specific wrappers that dispatch to target-agnostic functions that accept `TCGv_dyn` instead of `TCGv`: typedef struct { union { TCGv_i32 i32; TCGv_i64 i64; }; bool is_64; } TCGv_dyn; In the semi-last KVM call, it has been suggested that instead of having a discriminated `union` we could stuff the address size in `MemOp`, which is a common argument to functions taking a `TCGv`. However, that would either require one of the following: * still have `TCG_dyn` as a union, use `MemOp` instead of the `is_64` field; however I think the discriminated union is more suitable and it'd be a bit of an abuse to use `MemOp` since currently it describe *how* to perform the load but it says nothing about the size of the address; * make `TCGv` become opaque (`typedef struct TCGv_d *TCGv`) and put a cast in `tcg-op.h` code, we'd lose type safety here; * use `_Generic` but that, in turn, would require either moving all functions taking a `TCGv` to the header and turn them into macros *or* turn `tcg-op.c` into a template file where each function taking a `TCGv` is instantiated twice with different suffixes depending on the type of the argument; The changeset for the `TCGv_dyn` solution, on the other hand, is rather small: https://github.com/revng/qemu-hexagon/commit/f4098d84c241b86418e401396851d12f2fdbcf7a https://github.com/revng/qemu-hexagon/commit/251ef7631640e645a03b0ffa72fa96f3eb686129 ## `TARGET_` macros These are macros that provide target-specific information. Problem: they need to be abandoned in translation units that need to become target agnostic. Solution: promote them to fields of a `struct`. Current ideas: TARGET_TB_PCREL -> TranslationBlock.pc_rel TARGET_PAGE_BITS -> TranslationBlock.page_bits TARGET_PAGE_MASK -> TranslationBlock.page_mask TARGET_PAGE_ALIGN -> CPUArchState.page_align -> DisasContextBase.page_align TARGET_LONG_BITS -> TCGContext.long_bits TARGET_PAGE_SIZE -> ??? TCG_OVERSIZED_GUEST -> ??? TARGET_FMT_lx -> ??? CPU_RESOLVING_TYPE -> ??? Sometimes, this is trivial, other times this means adding arguments to functions that previously could rely on direct macro use. ## `CPUState` `CPUState` is a target-agnostic `struct` representing common information of `ArchCPU`. Problem: given an (opaque) pointer to `CPU${Arch}State`, target-agnostic code often wants to reach `CPUState`, but this requires knowledge of the layout of `ArchCPU`, which is target-dependent. Solution: have a target-specific function to obtain the pointer to `CPUState` given `CPU${Arch}State`. Where this function would go, and how it would be retrieved, needs more consideration. I imagine a table of target-specific function pointers. ## `CPUNegativeOffsetState` `CPUNegativeOffsetState` is a `struct` placed in `ArchCPU` before `CPU${Arch}State`. struct ArchCPU { /*< private >*/ CPUState parent_obj; /*< public >*/ CPUNegativeOffsetState neg; CPUAlphaState env; /* This alarm doesn't exist in real hardware; we wish it did. */ QEMUTimer *alarm_timer; }; Problem: used in several parts of the code that need to become target-agnostic. Solution: make it target-agnostic and push it into `CPUState`, which is the place for target-agnostic stuff. I'm not sure why this isn't already the case. I'm probably missing something here, so feedback is welcome. # What's left out? Quite a few minor things but.. you tell me! There's probably *a lot* more stuff when we dig into system mode emulation, but I'd postpone investigating those issues until we have user mode in a decent shape. Apart from this, let me know if there's something else we're missing. # Current status We currently have a branch where we can build (but not link nor run) a `x86_64-linux-user` configuration where `NEED_CPU_H` is defined only for translation units in `target/` and `linux-user/`. It's the result of a (painful) journey. You can try it here: git clone https://github.com/revng/qemu-hexagon.git cd qemu-hexagon git checkout feature/multi-tcg mkdir build cd build ../configure --target-list=x86_64-linux-user ninja -k0 | grep FAILED # It should build but fail linking qemu-x86_64 This process required commenting a lot of code, it's not supposed to work at all and it's quite ugly. However, it has been very useful since it enabled us to use the compiler to discover what is problematic. Sorry for the wall of text. Enjoy the last few days of 2022! -- Alessandro Di Federico rev.ng Labs