On 08/06/26 11:01 PM, Anushree Mathur wrote:
Hi Narayana,
I applied this patch series and saw a QEMU crash right after triggering error injection in the guest.
Here is my full analysis:

1) Started the guest
2) Attached an NVME backplane device to the guest.
3) Triggered the error injection on guest console for the attached NVME device.

The console printed the following message, and then the guest crashed:

Injecting an ioa-bus-error...

The following are the QEMU logs after the crash:


2026-06-08T06:25:16.420015Z qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2026-06-08T06:25:16.495864Z RTAS: Read 236 bytes from device-tree
2026-06-08T06:28:43.769276Z qemu-system-ppc64: 0213:60:00.0 BAR 0: failed to create dma-buf: PCI BAR IOMMU mappings may fail: Invalid argument
2026-06-08T13:21:00.218278Z RTAS: Read 236 bytes from device-tree
2026-06-08T13:21:05.658991Z qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2026-06-08T13:21:05.794879Z RTAS: Read 236 bytes from device-tree
2026-06-08T13:41:59.736480Z RTAS: Read 236 bytes from device-tree
2026-06-08T13:42:13.712991Z qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
Falling back to kernel-irqchip=off
2026-06-08T13:42:13.848406Z RTAS: Read 236 bytes from device-tree
2026-06-08T14:24:42.012714Z qemu-system-ppc64: 0213:60:00.0 BAR 0: failed to create dma-buf: PCI BAR IOMMU mappings may fail: Invalid argument
2026-06-08 14:34:09.570+0000: shutting down, reason=crashed

Hi Anushree,

Thank you very much for the detailed crash report and thorough testing!
This is extremely valuable feedback that helped identify a critical bug in
the error injection implementation.

I'll prepare a v4 of the patch series with this fix. Could you please test it
again once I send it out? I'd particularly appreciate testing with:
- Multiple PHBs in the system
- Devices attached to different PHBs
- Both IOA bus errors and other error types (if possible)

Thanks again for the excellent bug report!

Best regards,
Narayana Murty N

Thank you,
Anushree Mathur

On 20/05/26 3:24 PM, Narayana Murty N wrote:
This patch series implements comprehensive RTAS-based error injection
support for VFIO EEH (Enhanced Error Handling) on PowerPC sPAPR
platforms. The implementation enables guest-initiated PCI error
injection for improved testing and diagnostics of EEH recovery
mechanisms.

Background
----------
EEH is a critical feature on PowerPC platforms that provides error
detection, isolation, and recovery for PCI devices. Testing EEH recovery
paths requires the ability to inject various types of errors into the
system. While physical hardware supports error injection through
firmware interfaces, QEMU's VFIO implementation previously lacked this
capability.

This series bridges that gap by implementing the IBM RTAS error
injection interface, allowing guests to inject PCI errors through the
same firmware calls used on physical hardware. This enables
comprehensive testing of device drivers' EEH recovery code paths in
virtualized environments.

Implementation Overview
-----------------------
The patch series introduces three new RTAS calls:
   - ibm,open-errinjct:  Opens an error injection session
   - ibm,errinjct:       Injects a specific error type
   - ibm,close-errinjct: Closes the error injection session
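
The open/inject/close sequence above can be pictured as a small session
state machine. The following is only a minimal sketch under stated
assumptions, not the code from this series: the type ErrinjctSession, the
sketch_* functions, and the SKETCH_* status values are hypothetical names
invented for illustration, and the single-session, token-checked model is
an assumption about how such an interface is typically guarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not QEMU's or PAPR's values. */
enum { SKETCH_OK = 0, SKETCH_BUSY = -1, SKETCH_BAD_TOKEN = -2 };

/* Hypothetical session state: at most one open session (an assumption). */
typedef struct {
    bool open;           /* true while a session is active */
    uint32_t token;      /* token handed out by the last successful open */
    uint32_t next_token; /* counter used to generate fresh tokens */
} ErrinjctSession;

/* Models "open": refuse a second session, otherwise hand out a new token. */
static int sketch_open(ErrinjctSession *s, uint32_t *token_out)
{
    if (s->open) {
        return SKETCH_BUSY;
    }
    s->open = true;
    s->token = ++s->next_token;
    *token_out = s->token;
    return SKETCH_OK;
}

/* Models "inject": only accepted with the token of the open session.
 * err_type is carried along but not interpreted in this sketch. */
static int sketch_inject(const ErrinjctSession *s, uint32_t token,
                         uint32_t err_type)
{
    (void)err_type;
    if (!s->open || token != s->token) {
        return SKETCH_BAD_TOKEN;
    }
    return SKETCH_OK;
}

/* Models "close": ends the session so a later open can succeed. */
static int sketch_close(ErrinjctSession *s, uint32_t token)
{
    if (!s->open || token != s->token) {
        return SKETCH_BAD_TOKEN;
    }
    s->open = false;
    return SKETCH_OK;
}
```

In this model, an inject request is rejected unless it carries the token
of the currently open session, and a stale token stops working once the
session is closed.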

The implementation supports multiple error types including:
   - IOA bus errors (32-bit and 64-bit addressing)
   - Memory/IO/Config space load/store errors
   - DMA read/write errors
   - Cache and TLB corruption scenarios
   - Special recovery events

Tested on pseries and powernv hosts with a KVM guest, using the errinjct tool.

Patch Organization
------------------
Patch 1: Adds the VFIO backend for error injection
Patch 2: Implements the ibm,errinjct RTAS call handler
Patch 3: Adds session management (open/close) RTAS calls
Patch 4: Advertises capabilities via device tree properties
Patch 5: Moves EEH-specific code and stubs into new files
Patch 6: Updates MAINTAINERS file

Changelog:
----------
v3:
   - Fixed the build failure reported at https://github.com/p-b-o/qemu-ci/actions/runs/26094993976
   - Also fixed a gitlab CI breakage in patch 2 (qemu_log_mask LOG_UNIMP)
v2: Addressed refactor suggestions from Cedric, Pierrick
v1: https://lore.kernel.org/all/[email protected]/


Narayana Murty N (6):
   ppc/spapr: Add VFIO EEH error injection backend
   ppc/spapr: Add ibm,errinjct RTAS call handler
   ppc/spapr: Add support for 'ibm, open-errinjct' and 'ibm,
     close-errinjct'
   ppc/spapr: Advertise RTAS error injection call support via FDT
     property
   ppc/spapr: Split VFIO code and refactor EEH interface
   MAINTAINERS: Add entry for sPAPR PCI VFIO EEH support

  MAINTAINERS                  |   6 +
  hw/ppc/Kconfig               |   2 +-
  hw/ppc/meson.build           |   1 +
  hw/ppc/spapr.c               | 104 +++++++++++
  hw/ppc/spapr_pci.c           | 219 ++++++++++++++++++++++
  hw/ppc/spapr_pci_vfio.c      | 314 +------------------------------
  hw/ppc/spapr_pci_vfio_eeh.c  | 346 +++++++++++++++++++++++++++++++++++
  include/hw/pci-host/spapr.h  |  37 +---
  include/hw/ppc/spapr.h       |  57 +++++-
  include/hw/ppc/spapr_vfio.h  |  28 +++
  stubs/meson.build            |   1 +
  stubs/spapr_phb_vfio-stubs.c |  52 ++++++
  12 files changed, 816 insertions(+), 351 deletions(-)
  create mode 100644 hw/ppc/spapr_pci_vfio_eeh.c
  create mode 100644 include/hw/ppc/spapr_vfio.h
  create mode 100644 stubs/spapr_phb_vfio-stubs.c

