Hi Marc, Oliver, and other upstream friends,

Could you help review this patch series? I would really appreciate any
comments and feedback.

[Sorry for resending; the previous message was sent as HTML.]

On Thu, Jul 31, 2025 at 1:58 PM Jiaqi Yan <[email protected]> wrote:
>
> Problem
> =======
>
> When host APEI is unable to claim a synchronous external abort (SEA)
> during guest abort, today KVM directly injects an asynchronous SError
> into the VCPU then resumes it. The injected SError usually results in
> an unpleasant guest kernel panic.
>
> One of the major sources of guest SEA is a VCPU consuming a recoverable
> uncorrected memory error (UER), which is not uncommon at all in modern
> datacenter servers with large amounts of physical memory. Although SError
> and guest panic are sufficient to stop the propagation of corrupted
> memory, there is room to recover from a UER in a more graceful manner.
>
> Proposed Solution
> =================
>
> The idea is, we can replay the SEA to the faulting VCPU. If the memory
> error consumption or the fault that causes the SEA is not from the guest
> kernel, the blast radius can be limited to the poison-consuming guest
> process, while the VM can keep running.
>
> In addition, instead of doing this under the hood without involving
> userspace, there are benefits to redirecting the SEA to the VMM:
>
> - VM customers care about the disruptions caused by memory errors, and
>   the VMM usually has the responsibility to start the process of
>   notifying the customers of memory error events in their VMs. For
>   example, one cloud provider emits a critical log in its observability
>   UI [1], and provides a playbook for customers on how to mitigate
>   disruptions to their workloads.
>
> - VMM can prevent future memory error consumption by unmapping the
>   poisoned pages from the stage-2 page table with KVM userfault [2], or
>   by splitting the memslot that contains the poisoned pages.
>
> - VMM can keep track of SEA events in the VM. When the VMM thinks the
>   status of the host or the VM is bad enough, e.g. the number of
>   distinct SEAs exceeds a threshold, it can restart the VM on another
>   healthy host.
>
> - Behavior parity with the x86 architecture. When a machine check
>   exception (MCE) is caused by a VCPU, the kernel or KVM sends SIGBUS to
>   userspace so that the VMM can either recover from the MCE or terminate
>   itself along with the VM. The prior RFC proposed to implement SIGBUS
>   on arm64 as well, but Marc preferred a KVM exit over a signal [3].
>   However, implementation aside, returning SEA to the VMM is on par with
>   returning MCE to the VMM.
>
> Once the SEA is redirected to the VMM, among other actions, the VMM is
> encouraged to inject external aborts into the faulting VCPU.
>
> New UAPIs
> =========
>
> This patchset introduces the following userspace-visible changes to
> empower the VMM to control what happens for SEA on guest memory:
>
> - KVM_CAP_ARM_SEA_TO_USER. While taking an SEA, if userspace has enabled
>   this new capability at VM creation, and the SEA is not on
>   kernel-allocated memory, instead of injecting SError, return
>   KVM_EXIT_ARM_SEA to userspace.
>
> - KVM_EXIT_ARM_SEA. This is the VM exit reason the VMM gets. The details
>   about the SEA are provided in arm_sea as much as possible, including
>   the sanitized ESR value at EL2, and the faulting guest virtual and
>   physical addresses if available.
>
> * From v2 [4]:
>   - Rebased on "[PATCH] KVM: arm64: nv: Handle SEAs due to VNCR
>     redirection" [5] and kvmarm/next commit 7b8346bd9fce ("KVM: arm64:
>     Don't attempt vLPI mappings when vPE allocation is disabled")
>   - Took the host_owns_sea implementation from Oliver [6, 7].
>   - Excluded the guest SEA injection patches.
>   - Updated selftest.
>
> * From v1 [8]:
>   - Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid
>     dereferencing NULL ITE pointer").
>   - Sanitize ESR_EL2 before reporting it to userspace.
>   - Do not do KVM_EXIT_ARM_SEA when the SEA is caused by memory
>     allocated to the stage-2 translation table.
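
To make the routing described under "New UAPIs" easier to review, here is a
minimal standalone sketch of the decision as I read it from the text above.
It is not the code in arch/arm64/kvm/mmu.c: the enum, the struct, and
route_guest_sea() are hypothetical names invented for illustration, and the
three booleans stand in for state the kernel derives elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for this sketch; they are not kernel identifiers. */
enum sea_action {
	SEA_HANDLED_BY_HOST,	/* host APEI claimed the abort */
	SEA_INJECT_SERROR,	/* existing behaviour: async SError into the VCPU */
	SEA_EXIT_TO_USER,	/* return KVM_EXIT_ARM_SEA to the VMM */
};

/* Hypothetical inputs to the decision, named after the cover letter. */
struct sea_ctx {
	bool apei_claimed;	/* host APEI was able to claim the SEA */
	bool cap_sea_to_user;	/* VMM enabled KVM_CAP_ARM_SEA_TO_USER */
	bool host_owns_sea;	/* SEA hit kernel-allocated memory (e.g. stage-2 tables) */
};

/* Decide what happens to a guest SEA, following the prose description. */
static enum sea_action route_guest_sea(const struct sea_ctx *ctx)
{
	assert(ctx);

	if (ctx->apei_claimed)
		return SEA_HANDLED_BY_HOST;

	/* Only exit to userspace when opted in and the host does not own it. */
	if (ctx->cap_sea_to_user && !ctx->host_owns_sea)
		return SEA_EXIT_TO_USER;

	return SEA_INJECT_SERROR;
}
```

The point of the ordering is that a VMM that never enables the capability
sees no behaviour change, and an SEA on host-owned memory is never exposed
to userspace even when the capability is enabled.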
>
> [1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
> [2] https://lore.kernel.org/kvm/[email protected]
> [3] https://lore.kernel.org/kvm/[email protected]
> [4] https://lore.kernel.org/kvm/[email protected]/
> [5] https://lore.kernel.org/kvmarm/[email protected]/
> [6] https://lore.kernel.org/kvm/[email protected]/#t
> [7] https://lore.kernel.org/kvm/[email protected]/
> [8] https://lore.kernel.org/kvm/[email protected]
>
> Jiaqi Yan (3):
>   KVM: arm64: VM exit to userspace to handle SEA
>   KVM: selftests: Test for KVM_EXIT_ARM_SEA
>   Documentation: kvm: new UAPI for handling SEA
>
>  Documentation/virt/kvm/api.rst                |  61 ++++
>  arch/arm64/include/asm/kvm_host.h             |   2 +
>  arch/arm64/kvm/arm.c                          |   5 +
>  arch/arm64/kvm/mmu.c                          |  68 +++-
>  include/uapi/linux/kvm.h                      |  10 +
>  tools/arch/arm64/include/asm/esr.h            |   2 +
>  tools/testing/selftests/kvm/Makefile.kvm      |   1 +
>  .../testing/selftests/kvm/arm64/sea_to_user.c | 327 ++++++++++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c    |   1 +
>  9 files changed, 476 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
>
> --
> 2.50.1.565.gc32cd1483b-goog
>
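
On the VMM side, the bookkeeping mentioned in the cover letter (tracking
SEA events and restarting the VM elsewhere once the number of distinct SEAs
exceeds a threshold) could look roughly like the sketch below. Everything
here is an assumption for illustration and is not part of the series: the
sea_tracker type, its functions, the 4 KiB page size, the fixed capacity,
and the choice to define "distinct" as "distinct guest physical page".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEA_PAGE_SHIFT	12	/* assumption: 4 KiB guest pages */
#define SEA_MAX_TRACKED	64	/* hypothetical capacity of the tracker */

/* Hypothetical per-VM record of guest physical pages that reported an SEA. */
struct sea_tracker {
	uint64_t pfns[SEA_MAX_TRACKED];
	size_t count;		/* number of distinct pages recorded */
	size_t threshold;	/* policy knob chosen by the VMM */
};

static void sea_tracker_init(struct sea_tracker *t, size_t threshold)
{
	assert(t);
	t->count = 0;
	t->threshold = threshold;
}

/*
 * Record an SEA at guest physical address gpa.
 * Returns true if the page was not already recorded.
 */
static bool sea_tracker_record(struct sea_tracker *t, uint64_t gpa)
{
	uint64_t pfn = gpa >> SEA_PAGE_SHIFT;

	for (size_t i = 0; i < t->count; i++)
		if (t->pfns[i] == pfn)
			return false;

	/* When the table is full the page is not stored; see should_migrate. */
	if (t->count < SEA_MAX_TRACKED)
		t->pfns[t->count++] = pfn;
	return true;
}

/* A full table is treated as "bad enough" regardless of the threshold. */
static bool sea_tracker_should_migrate(const struct sea_tracker *t)
{
	return t->count > t->threshold || t->count == SEA_MAX_TRACKED;
}
```

A VMM would call sea_tracker_record() with the faulting guest physical
address whenever one is reported as valid in a KVM_EXIT_ARM_SEA exit, and
consult sea_tracker_should_migrate() before resuming the VCPU.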
