Hi Ruslan,

On 3/18/26 3:46 AM, Ruslan Ruslichenko wrote:
From: Ruslan Ruslichenko <[email protected]>

This patch series is submitted as an RFC to gather early feedback on a Fault 
Injection (FI) framework built on top of the QEMU TCG plugin subsystem.

Motivation

Testing guest operating systems, hypervisors (like Xen), and low-level drivers 
against unexpected hardware failures can be difficult.
This series provides an interface to inject faults dynamically without altering 
QEMU's core emulation source code for every test case.

Architecture & Key Features

The series introduces the core API extensions and implements a fault injection 
plugin (contrib/plugins/fault_injection.c) targeting AArch64.
The plugin can be controlled statically via XML configurations on boot, or 
dynamically at runtime via a UNIX socket (enabling integration with automated 
testing frameworks via Python or GDB).

New Plugin API Capabilities:

MMIO Interception: Allows plugins to hook into 
memory_region_dispatch_read/write to modify hardware register reads or drop 
writes.
Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks 
to be scheduled based on guest virtual time.
TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force 
re-translation when applying dynamic PC-based hooks.
Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on 
the primary INTC and inject CPU exceptions (e.g., SErrors).
Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) 
can expose specific fault handlers (like CMDQ errors) to be triggered 
externally by plugins.

Patch Summary
Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
Patch 2-3 (plugins/api): Exposes virtual clock timers and TB cache flushing to 
the public plugin API.
Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception 
routing, and the Custom Fault registry.
Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch 
path.
Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to 
enable direct hardware IRQ injection.
Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to 
demonstrate how device models can expose specific errors (like CMDQ faults) to 
plugins.
Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using 
the new APIs.
Patch 9 (docs): Adds documentation and usage examples for the plugin.

Request for Comments & Feedback

Any suggestions on improvements, potential edge cases, or issues with the 
current design are highly welcome.

Ruslan Ruslichenko (9):
   target/arm: Add API for dynamic exception injection
   plugins/api: Expose virtual clock timers to plugins
   plugins: Expose Transaction Block cache flush API to plugins
   plugins: Introduce fault injection API and core subsystem
   system/memory: Add plugin callbacks to intercept MMIO accesses
   hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
   hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
   contrib/plugins: Add fault injection plugin
   docs: Add description of fault-injection plugin and subsystem

  contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
  contrib/plugins/meson.build       |   1 +
  docs/fault-injection.txt          | 111 +++++
  hw/arm/smmuv3.c                   |  54 +++
  hw/intc/arm_gic.c                 |  28 ++
  hw/intc/arm_gicv3.c               |  28 ++
  include/plugins/qemu-plugin.h     |  28 ++
  include/qemu/plugin.h             |  39 ++
  plugins/api.c                     |  62 +++
  plugins/core.c                    |  11 +
  plugins/fault.c                   | 116 +++++
  plugins/meson.build               |   1 +
  plugins/plugin.h                  |   2 +
  system/memory.c                   |   8 +
  target/arm/cpu.h                  |   4 +
  target/arm/helper.c               |  55 +++
  16 files changed, 1320 insertions(+)
  create mode 100644 contrib/plugins/fault_injection.c
  create mode 100644 docs/fault-injection.txt
  create mode 100644 plugins/fault.c


First, thanks for posting your series!

About the general approach.
As you noticed, this exposes a lot of QEMU internals, which is something we tend to avoid. It is also very architecture-specific, which is another pattern we try to avoid.

For some of your needs (especially IRQ injection and timer injection), did you consider writing a custom ad-hoc device and timer to generate those? Nothing prevents you from writing a plugin that communicates with this specific device (through a socket, for instance) to request specific injections. I feel that it would scale better than exposing all of this through the QEMU plugin API.
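To make the device-based idea a bit more concrete, here is a minimal sketch of what the plugin-to-device channel could look like: a fixed-size request, encoded little-endian, written by the plugin and read by the device. Everything in it (the `fi_request` layout, the `FI_REQ_*` kinds, the `fi_send`/`fi_recv` helpers) is hypothetical and made up for illustration; none of it exists in QEMU or in your series.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical request kinds; none of these names exist in QEMU. */
enum fi_req_kind {
    FI_REQ_RAISE_IRQ = 1,
    FI_REQ_PULSE_IRQ = 2,
    FI_REQ_ARM_TIMER = 3,
};

/* Hypothetical injection request exchanged between plugin and device. */
struct fi_request {
    uint32_t kind;      /* one of enum fi_req_kind */
    uint32_t irq;       /* interrupt line, used by the IRQ requests */
    uint64_t delay_ns;  /* virtual-time delay, used by FI_REQ_ARM_TIMER */
};

#define FI_WIRE_SIZE 16 /* 4 + 4 + 8 bytes, little-endian on the wire */

static void put_le(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint64_t get_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Serialize a request into FI_WIRE_SIZE bytes. */
void fi_encode(const struct fi_request *req, uint8_t *buf)
{
    assert(req && buf);
    put_le(buf, req->kind, 4);
    put_le(buf + 4, req->irq, 4);
    put_le(buf + 8, req->delay_ns, 8);
}

/* Deserialize a request; returns 0 on success, -1 on an unknown kind. */
int fi_decode(const uint8_t *buf, struct fi_request *req)
{
    assert(req && buf);
    req->kind = (uint32_t)get_le(buf, 4);
    req->irq = (uint32_t)get_le(buf + 4, 4);
    req->delay_ns = get_le(buf + 8, 8);
    return (req->kind >= FI_REQ_RAISE_IRQ &&
            req->kind <= FI_REQ_ARM_TIMER) ? 0 : -1;
}

/* Plugin side: send one request; returns 0 on success, -1 on error. */
int fi_send(int fd, const struct fi_request *req)
{
    uint8_t buf[FI_WIRE_SIZE];
    fi_encode(req, buf);
    return write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) ? 0 : -1;
}

/* Device side: read exactly one request, tolerating short reads. */
int fi_recv(int fd, struct fi_request *req)
{
    uint8_t buf[FI_WIRE_SIZE];
    size_t got = 0;
    while (got < sizeof(buf)) {
        ssize_t n = read(fd, buf + got, sizeof(buf) - got);
        if (n <= 0) {
            return -1; /* EOF or error (EINTR is not retried here) */
        }
        got += (size_t)n;
    }
    return fi_decode(buf, req);
}
```

The helpers only need a connected stream file descriptor, so the same code would work over a UNIX socket or a pipe. On the device side, a decoded request would then be turned into an IRQ or timer action inside the device model itself, which keeps those internals out of the plugin API.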

For SMMU, this is trickier. Tao recently added (6ce361b02c82) an IOMMU test device, used with qtest to unit test the SMMU implementation. We could maybe leverage that on a full machine, together with the communication method mentioned above, to generate specific operations at runtime, all triggered via a plugin.

Exposing qemu_plugin_flush_tb_cache hints that we are missing something on the QEMU side. It is better to fix that than to expose this very internal function. The associated TRIGGER_ON_PC is very similar to the existing inline operations. They could be enhanced to support writing to a given register; all the building blocks are there. TRIGGER_ON_SYSREG is a bit more complex, but we might also enhance inline operations to support hooks on specific register writes.

For MMIO override, the current approach you have is good, and it's definitely something we could integrate.
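For readers following along, the general shape of such an override hook can be sketched as below. This is a generic illustration of the pattern, not the code from patch 5: the `mmio_override_fn` type, `mmio_set_override()`, the dispatch functions and the tiny register file standing in for a device are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical override callback. Returning true means "override":
 * for a read, *value replaces what the device returned;
 * for a write, the write is dropped before reaching the device.
 */
typedef bool (*mmio_override_fn)(uint64_t addr, unsigned size, bool is_write,
                                 uint64_t *value, void *opaque);

static mmio_override_fn override_cb;
static void *override_opaque;

/* Install (or clear, with fn == NULL) the single override callback. */
void mmio_set_override(mmio_override_fn fn, void *opaque)
{
    override_cb = fn;
    override_opaque = opaque;
}

/* Stand-in for a device model: four 64-bit registers. */
static uint64_t regs[4];

static uint64_t device_read(uint64_t addr)
{
    return regs[(addr / 8) % 4];
}

static void device_write(uint64_t addr, uint64_t value)
{
    regs[(addr / 8) % 4] = value;
}

/* Read path: consult the device, then let the hook replace the result. */
uint64_t mmio_dispatch_read(uint64_t addr, unsigned size)
{
    uint64_t from_device = device_read(addr);
    uint64_t value = from_device;

    if (override_cb &&
        override_cb(addr, size, false, &value, override_opaque)) {
        return value;
    }
    return from_device;
}

/* Write path: the hook may drop the write entirely. */
void mmio_dispatch_write(uint64_t addr, unsigned size, uint64_t value)
{
    uint64_t v = value;

    if (override_cb &&
        override_cb(addr, size, true, &v, override_opaque)) {
        return; /* dropped */
    }
    device_write(addr, value);
}

/* Example fault: register at 0x8 reads as a fixed pattern, writes are lost. */
bool stuck_register_fault(uint64_t addr, unsigned size, bool is_write,
                          uint64_t *value, void *opaque)
{
    (void)size;
    (void)opaque;
    if (addr != 0x8) {
        return false; /* leave all other accesses untouched */
    }
    if (!is_write) {
        *value = 0xdeadbeef;
    }
    return true;
}
```

With `stuck_register_fault` installed, reads of 0x8 return 0xdeadbeef and writes to it never reach the register file, while every other address behaves normally. Clearing the callback restores the original behaviour.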

What are your thoughts about this? (Especially the device-based approach, in case you tried it first.)

Regards,
Pierrick
