Hi Narayana,
I applied this patch series and tested it. QEMU crashed right after I triggered error injection from the guest.
Here are the steps I followed:

1) Started the guest
2) Attached an NVMe backplane device to the guest.
3) Triggered error injection from the guest console for the attached NVMe device.

The guest console printed the following message and then the guest crashed:

Injecting an ioa-bus-error...

Here are the QEMU logs after the crash:


2026-06-08T06:25:16.420015Z qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2026-06-08T06:25:16.495864Z RTAS: Read 236 bytes from device-tree
2026-06-08T06:28:43.769276Z qemu-system-ppc64: 0213:60:00.0 BAR 0: failed to create dma-buf: PCI BAR IOMMU mappings may fail: Invalid argument
2026-06-08T13:21:00.218278Z RTAS: Read 236 bytes from device-tree
2026-06-08T13:21:05.658991Z qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2026-06-08T13:21:05.794879Z RTAS: Read 236 bytes from device-tree
2026-06-08T13:41:59.736480Z RTAS: Read 236 bytes from device-tree
2026-06-08T13:42:13.712991Z qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2026-06-08T13:42:13.848406Z RTAS: Read 236 bytes from device-tree
2026-06-08T14:24:42.012714Z qemu-system-ppc64: 0213:60:00.0 BAR 0: failed to create dma-buf: PCI BAR IOMMU mappings may fail: Invalid argument
2026-06-08 14:34:09.570+0000: shutting down, reason=crashed


Thank you,
Anushree Mathur

On 20/05/26 3:24 PM, Narayana Murty N wrote:
This patch series implements comprehensive RTAS-based error injection
support for VFIO EEH (Enhanced Error Handling) on PowerPC sPAPR platforms.
The implementation enables guest-initiated PCI error injection for improved
testing and diagnostics of EEH recovery mechanisms.

Background
----------
EEH is a critical feature on PowerPC platforms that provides error detection,
isolation, and recovery for PCI devices. Testing EEH recovery paths requires
the ability to inject various types of errors into the system. While physical
hardware supports error injection through firmware interfaces, QEMU's VFIO
implementation previously lacked this capability.

This series bridges that gap by implementing the IBM RTAS error injection
interface, allowing guests to inject PCI errors through the same firmware
calls used on physical hardware. This enables comprehensive testing of device
drivers' EEH recovery code paths in virtualized environments.

Implementation Overview
-----------------------
The patch series introduces three new RTAS calls:
   - ibm,open-errinjct:  Opens an error injection session
   - ibm,errinjct:       Injects a specific error type
   - ibm,close-errinjct: Closes the error injection session
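
To illustrate how the three calls relate, here is a minimal, standalone sketch
of the open/inject/close lifecycle. It is not code from this series: the
struct, function names, and status values are hypothetical, and it assumes
that only one session may be open at a time and that the token returned by
open must accompany each later call.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical status codes for this sketch; not the PAPR-defined values. */
enum { SKETCH_OK = 0, SKETCH_BUSY = -1, SKETCH_BAD_TOKEN = -2 };

/* Hypothetical per-machine session state (assumption: one session at a time). */
typedef struct {
    bool open;           /* true while a session is active */
    uint32_t token;      /* token handed out by the last successful open */
    uint32_t next_token; /* counter used to generate fresh tokens */
} ErrinjctSession;

/* Models ibm,open-errinjct: hand out a token unless a session is already open. */
static int sketch_open(ErrinjctSession *s, uint32_t *token_out)
{
    if (s->open) {
        return SKETCH_BUSY;
    }
    s->open = true;
    s->token = ++s->next_token;
    *token_out = s->token;
    return SKETCH_OK;
}

/* Models ibm,errinjct: accept the request only with the current session token. */
static int sketch_inject(const ErrinjctSession *s, uint32_t token,
                         uint32_t err_type)
{
    if (!s->open || token != s->token) {
        return SKETCH_BAD_TOKEN;
    }
    /* A real handler would forward err_type to the backend here. */
    printf("injecting error type %" PRIu32 "\n", err_type);
    return SKETCH_OK;
}

/* Models ibm,close-errinjct: end the session so the token stops working. */
static int sketch_close(ErrinjctSession *s, uint32_t token)
{
    if (!s->open || token != s->token) {
        return SKETCH_BAD_TOKEN;
    }
    s->open = false;
    return SKETCH_OK;
}
```

In this sketch an inject request is rejected both before open and after close,
which is the ordering the three calls are meant to enforce.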

The implementation supports multiple error types including:
   - IOA bus errors (32-bit and 64-bit addressing)
   - Memory/IO/Config space load/store errors
   - DMA read/write errors
   - Cache and TLB corruption scenarios
   - Special recovery events

Tested on pseries and powernv hosts, in KVM guests, using the errinjct tool.

Patch Organization
------------------
Patch 1: Adds the VFIO backend for error injection
Patch 2: Implements the ibm,errinjct RTAS call handler
Patch 3: Adds session management (open/close) RTAS calls
Patch 4: Advertises capabilities via device tree properties
Patch 5: Moves EEH-specific code and stubs into new files
Patch 6: Updates MAINTAINERS file

Changelog:
----------
v3:
   - Fixed the build failure reported at
     https://github.com/p-b-o/qemu-ci/actions/runs/26094993976
   - Also fixed a gitlab CI breakage in patch 2 (qemu_log_mask LOG_UNIMP)
v2: Addressed refactor suggestions from Cedric, Pierrick
v1: https://lore.kernel.org/all/[email protected]/


Narayana Murty N (6):
   ppc/spapr: Add VFIO EEH error injection backend
   ppc/spapr: Add ibm,errinjct RTAS call handler
   ppc/spapr: Add support for 'ibm, open-errinjct' and 'ibm,
     close-errinjct'
   ppc/spapr: Advertise RTAS error injection call support via FDT
     property
   ppc/spapr: Split VFIO code and refactor EEH interface
   MAINTAINERS: Add entry for sPAPR PCI VFIO EEH support

  MAINTAINERS                  |   6 +
  hw/ppc/Kconfig               |   2 +-
  hw/ppc/meson.build           |   1 +
  hw/ppc/spapr.c               | 104 +++++++++++
  hw/ppc/spapr_pci.c           | 219 ++++++++++++++++++++++
  hw/ppc/spapr_pci_vfio.c      | 314 +------------------------------
  hw/ppc/spapr_pci_vfio_eeh.c  | 346 +++++++++++++++++++++++++++++++++++
  include/hw/pci-host/spapr.h  |  37 +---
  include/hw/ppc/spapr.h       |  57 +++++-
  include/hw/ppc/spapr_vfio.h  |  28 +++
  stubs/meson.build            |   1 +
  stubs/spapr_phb_vfio-stubs.c |  52 ++++++
  12 files changed, 816 insertions(+), 351 deletions(-)
  create mode 100644 hw/ppc/spapr_pci_vfio_eeh.c
  create mode 100644 include/hw/ppc/spapr_vfio.h
  create mode 100644 stubs/spapr_phb_vfio-stubs.c


