tcg: Expand on multi-threaded TCG

Paolo Bonzini Thu, 28 May 2026 02:05:25 -0700

Before looking at the specifics, I appreciate you being bold and experimenting with how to improve our documentation. I can see that this is largely unedited LLM output, and honestly, that is actually a good thing for this experiment.

On the other hand, it exposes where the tool falls short, and highlights very clearly the risks of accepting AI-generated content too leisurely.


On 5/28/26 10:20, Philippe Mathieu-Daudé wrote:

Significantly expands the TCG documentation to provide more
comprehensive overview of its internal architecture.

Use more rST anchors to improve cross-referencing across the
documentation.

Clarify front-end / optimization / back-end phases.

Detail a bit memory consistency barriers under MTTCG mode.

Add the following new sections:

  - Register Allocation and Liveness analysis
  - Overviews of the Vector/SIMD internal strategy
  - Deterministic Execution (icount)
  - TCG Plugins
  - Instruction Decoding with decodetree

This commit message is not really up to the standards. It is purely a "what" which can be obtained just by glancing at the section headers.

It should explain the purpose of tcg.rst and why these new sections were singled out.

+The translation process occurs in several distinct passes:
+
+1. **Front-end**: Guest instructions are parsed (often using the
+   `decodetree <Instruction Decoding (decodetree)_>`_ tool) and converted
+   into target-independent TCG Intermediate Representation (IR) opcodes.
+2. **Optimization**: TCG performs passes such as constant folding, liveness
+   analysis, and dead code elimination on the IR.
+3. **Back-end**: The optimized IR is converted by a host-specific code
+   generator into native instructions for the host CPU.

The sections below should be sorted according to these sections, when applicable.


Register allocation also fits somewhere, probably in "back-end".

There should also be another sentence for the TCG run-time (accel/tcg).

+Register Allocation and Liveness
+--------------------------------
+
+During the translation phase, guest instructions are converted into TCG IR
+using an **unlimited number of temporaries (TEMPs)**.
+This allows guest translators to express logic without being constrained
+by the finite register set of the host CPU.
+
+To resolve these TEMPs into physical registers, TCG performs two passes:
+
+1. **Liveness Analysis**: This pass determines the "live range" of each
+   temporary within a basic block. By identifying when a variable
+   becomes "dead" (i.e., its value is no longer needed), TCG can suppress
+   redundant moves and remove instructions that compute unused results.
+2. **Register Allocation**: The Global Register Allocator maps live TEMPs
+   to host physical registers. Fixed globals, such as the pointer
+   to the CPU architecture state (``cpu_env``), are often permanently
+   held in host registers to minimize memory traffic during execution.
+
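To make the liveness pass described in the quoted text concrete, here is a minimal sketch of a backward walk over one basic block that drops ops whose results are never read. The `Op` tuple, the `eliminate_dead_ops` helper and the `live_out` set are illustrative assumptions, not TCG's actual data structures, and the sketch ignores ops with side effects.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str    # mnemonic, for illustration only
    out: str     # temporary written by the op
    ins: tuple   # temporaries read by the op


def eliminate_dead_ops(ops, live_out):
    """Walk the block backwards and keep only ops whose output is live.

    Assumption: every op has exactly one output and no side effects.
    """
    live = set(live_out)
    kept = []
    for op in reversed(ops):
        if op.out not in live:
            continue              # result is never read: drop the op
        live.discard(op.out)      # the value is defined here
        live.update(op.ins)       # its inputs become live above this op
        kept.append(op)
    kept.reverse()
    return kept


block = [
    Op("mov", "t0", ("r1",)),
    Op("add", "t1", ("t0", "r2")),
    Op("mul", "t2", ("t0", "t0")),   # t2 is never read afterwards
    Op("mov", "r3", ("t1",)),
]
result = eliminate_dead_ops(block, live_out={"r3"})
```

With only `r3` live at the end of the block, the `mul` that writes `t2` is removed and the other three ops are kept.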
+Vector/SIMD Internal Strategy
+-----------------------------
+
+TCG supports SIMD operations through a set of generic vector instructions
+(e.g., ``add_vec``, ``shli_vec``) parameterized by vector length and element
+size. The length is specified as a ``TCGType`` (V64, V128, or V256), and the
+element size is given in log2 8-bit units.
+
+The internal strategy relies on the backend mapping these generic opcodes
+to native host SIMD instructions, such as x86 AVX or ARM NEON. If the host
+backend does not support a specific vector operation  or length, TCG's
+expansion layer automatically decomposes the opcode into smaller supported
+vector sizes or standard integer operations.
+
+Deterministic Execution (icount)
+--------------------------------
+
+The :ref:`icount` mechanism provides deterministic execution by ensuring
+that each Translation Block executes a fixed number of instructions.

Hallucination (to put it kindly). It ensures that QEMU_CLOCK_VIRTUAL is a multiple of the number of instructions executed.
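As a rough illustration of that statement, the sketch below derives a virtual clock from the instruction count alone. The function names and the power-of-two `shift` scaling are assumptions made for the example; a real implementation may apply additional offsets or adjustments that are ignored here.

```python
def virtual_clock_ns(insns_executed: int, shift: int) -> int:
    """Virtual time as a fixed multiple (2**shift ns) of executed instructions.

    Assumption: no bias or warp adjustment is applied.
    """
    return insns_executed << shift


def run(tb_sizes, shift):
    """Execute hypothetical translation blocks of the given sizes."""
    executed = 0
    for n in tb_sizes:
        executed += n
    return virtual_clock_ns(executed, shift)


# The clock depends only on the total count, not on how the
# instructions happen to be grouped into translation blocks.
split_run = run([3, 5, 2], shift=3)
single_run = run([10], shift=3)
```

Both runs execute ten instructions, so both end at the same virtual time regardless of block sizes, which is the property record/replay relies on.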

This
+is essential for features like record/replay and deterministic virtual time,
+where instruction counts serve as the system clock.
+
+Instrumentation and Plugins
+---------------------------
+
+:ref:`TCG Plugins` provide a mechanism for runtime instrumentation. Opcodes
+like ``plugin_cb`` and ``plugin_mem_cb`` are inserted during translation to
+trigger callbacks in external modules, allowing analysis of instruction
+execution or memory access.
+
+Instruction Decoding (decodetree)
+---------------------------------
+
+The first step of the translation process is converting a raw bitstream of
+guest instructions into a structured format that the translator can process.

Is this true? Maybe "extracting operands from the raw bitstream of guest instructions, for easier processing in the translator"?

+QEMU simplifies this using the ``decodetree.py`` script, which generates C
+code decoders from a domain-specific language defined in ``.decode`` files.
+
+The decodetree tool allows developers to define instruction **patterns**
+based on a bitmask and fixed bits. When a match is found, the generated
+decoder automatically  extracts defined **fields** (such as registers or
+immediates) and passes  them to a manually written translation function.
+
+This declarative approach drastically reduces the amount of error-prone
+manual bit-shifting and nested "if-else" logic required in guest translators.

I would just say "``decodetree`` simplifies writing and maintaining the front-end compared to manual decoding". Maybe it's worth adding something like "Note however that it is mostly applicable to processors whose instruction encoding is fixed length, or mostly fixed length.".
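For readers unfamiliar with the idea, the following sketch shows what "match on a mask and fixed bits, then extract fields" amounts to for a fixed-length 32-bit encoding. The encoding, the `PATTERNS` table and the helper names are all made up for illustration; this is neither the ``.decode`` syntax nor the code that ``decodetree.py`` generates.

```python
def extract(insn: int, pos: int, length: int) -> int:
    """Return the unsigned bit-field insn[pos + length - 1 : pos]."""
    return (insn >> pos) & ((1 << length) - 1)


# Hypothetical 32-bit encoding: opcode in bits 31:26, operands below it.
# Each entry: (name, fixed mask, fixed bits, {field: (pos, length)}).
PATTERNS = [
    ("add", 0xFC000000, 0x04000000,
     {"rd": (21, 5), "rs1": (16, 5), "rs2": (11, 5)}),
    ("addi", 0xFC000000, 0x08000000,
     {"rd": (21, 5), "rs1": (16, 5), "imm": (0, 16)}),
]


def decode(insn: int):
    """Return (pattern name, extracted fields), or None if nothing matches."""
    for name, mask, bits, fields in PATTERNS:
        if (insn & mask) == bits:
            args = {f: extract(insn, pos, length)
                    for f, (pos, length) in fields.items()}
            return name, args
    return None


# Encode "addi rd=3, rs1=7, imm=42" by hand and decode it again.
word = 0x08000000 | (3 << 21) | (7 << 16) | 42
decoded = decode(word)
```

The extracted fields would then be handed to a hand-written translation function for the matched pattern. Variable-length encodings do not fit this table-driven shape as neatly, which is the caveat suggested above.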

+For detailled implementation see :ref:`decodetree`.


"detailed".

Honestly, I'm not impressed by the quality of the output. There's no organization, just a bunch of new sections in no order (decodetree comes last). They might be good enough for a glossary, but for developer documentation it would just add structural debt(*). At the very least all the "----"-level sections should be split into front-end, optimization, back-end and run-time.

Again, this is not about you---I hope you knew that this wasn't going to be included as is. :) Submitting this without manual editing shows the baseline capabilities of the LLM and highlights the importance of human steering.


Paolo

(*) I have just made this term up, but I think it should be a thing - we have a lot of it already in QEMU docs

