Amit Machhiwal <[email protected]> writes:

> On IBM POWER systems, newer processor generations can operate in
> compatibility modes corresponding to earlier generations. This becomes
> relevant for nested virtualization, where nested KVM guests may need to
> run with a specific processor compatibility level.
>
> Currently, when running a nested KVM guest (L2) inside a Power11 pSeries
> logical partition (L1) booted in Power10 compatibility mode, the guest
> fails to boot while setting 'arch_compat'. This happens because the CPU
> class is derived from the hardware PVR (via mfspr()), which reflects the
> physical processor generation (Power11), rather than the effective
> compatibility mode (Power10).
>
> As a result, userspace may request a Power11 arch_compat for the L2
> guest. However, the L1 partition, running in Power10 compatibility, has
> only negotiated support up to Power10 with the Power Hypervisor (L0).
> When H_GUEST_SET_STATE is invoked with a Power11 Logical PVR, the
> hypervisor rejects the request, leading to a late guest boot failure:
>
>   KVM-NESTEDv2: couldn't set guest wide elements
>   [..KVM reg dump..]
>
> This situation should be detected earlier and rejected by KVM. Without
> proper validation, if userspace ignores the error, the guest may continue
> to boot in Power11 raw mode on a Power10 compatibility host, which should
> not be allowed.
>
> Introduce a validation mechanism that detects unsupported arch_compat
> values early in the guest initialization path. When an unsupported
> arch_compat is requested (e.g., Power11 on a Power10 compatibility mode
> host), kvmppc_set_arch_compat() uses cpu_has_feature(CPU_FTR_P11_PVR) to
> detect the mismatch and sets arch_compat to PVR_ARCH_INVALID. This
> triggers kvmppc_sanity_check() to mark the vCPU as invalid by setting
> vcpu->arch.sane to false. On the next vCPU run, kvmppc_vcpu_run_hv()
> checks this flag and returns -EINVAL, preventing the guest from running
> with an invalid processor compatibility configuration.
>
> With this, when a Power11 arch_compat is requested on a Power10
> compatibility mode host, the guest fails early during boot with:
>
>   error: kvm run failed Invalid argument
>
> This provides a much clearer failure mode compared to the previous
> behavior where the guest could boot in Power11 raw mode (if userspace
> ignored the error) or fail late during H_GUEST_SET_STATE.
>
> Suggested-by: Vaibhav Jain <[email protected]>
> Reviewed-by: Vaibhav Jain <[email protected]>
> Cc: [email protected] # v6.13+
> Signed-off-by: Amit Machhiwal <[email protected]>
> ---
> Changes in v3:
> * Fixed null pointer dereference in kvmppc_sanity_check(): added check for
>   vcpu->arch.vcore before accessing arch_compat, as vcore is NULL for Book3S
>   PR and BookE guests (only Book3S HV uses vcore) [Reported by Sashiko AI]
> * Added Reviewed-by tag from Vaibhav
>
> Changes in v2:
> * Fixed issue where v1 allowed guest to boot in Power11 raw mode when
>   userspace ignored the error, by adding validation in kvmppc_sanity_check()
>   to ensure early failure during vCPU run [Found the issue after posting v1,
>   also reported by Gautam.]

It would be nice to post, with this v3, the matrix test results that
Gautam posted earlier. I guess you meant that you have already tested all
of those combinations; if so, please state that explicitly in the
changelog.

> * Introduced PVR_ARCH_INVALID constant for marking invalid arch_compat
> * Dropped all Reviewed-by and Tested-by tags due to code changes; requesting
>   fresh reviews
> * v1:
>   https://lore.kernel.org/all/[email protected]/
>
> Changes in v1:
> * Moved this patch out of the v3 series [1] as discussed here [2]
> * Addressed below review comments from Ritesh:
>   - Based the PVR validation on cpu features
>   - Fixed hcall name typo
>   - Stable backport
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
> ---
>  arch/powerpc/include/asm/reg.h |  1 +
>  arch/powerpc/kvm/book3s_hv.c   | 15 ++++++++++++++-
>  arch/powerpc/kvm/powerpc.c     |  4 ++++
>  3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 3449dd2b577d..7472b9522f71 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1356,6 +1356,7 @@
>  #define PVR_ARCH_300      0x0f000005
>  #define PVR_ARCH_31       0x0f000006
>  #define PVR_ARCH_31_P11   0x0f000007
> +#define PVR_ARCH_INVALID  0xffffffff

The logical processor version is defined as part of the PAPR spec. We
should ensure that this invalid PVR value is also documented there.

If you have already taken care of that, then please confirm and feel free
to add:

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
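
For reference, the flow described in the commit message can be modelled
as a small standalone userspace sketch. This is only an illustration of
the described behaviour, not the kernel code: the model_* names, the
structs and the host_has_p11_pvr flag (standing in for
cpu_has_feature(CPU_FTR_P11_PVR)) are all hypothetical, and returning
-EINVAL from the set path is an assumption based on the "if userspace
ignores the error" wording. Only the PVR_ARCH_* values come from the
quoted reg.h hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in the quoted reg.h hunk. */
#define PVR_ARCH_31       0x0f000006u
#define PVR_ARCH_31_P11   0x0f000007u
#define PVR_ARCH_INVALID  0xffffffffu

/* Hypothetical stand-ins for kernel state; not the real structures. */
struct model_vcore {
	uint32_t arch_compat;
};

struct model_vcpu {
	struct model_vcore *vcore;	/* NULL models Book3S PR / BookE */
	bool sane;
};

/* Assumption: stands in for cpu_has_feature(CPU_FTR_P11_PVR) on L1. */
static bool host_has_p11_pvr;

/* Mark the vCPU insane only if a vcore exists and holds the invalid PVR. */
static void model_sanity_check(struct model_vcpu *vcpu)
{
	vcpu->sane = true;
	if (vcpu->vcore && vcpu->vcore->arch_compat == PVR_ARCH_INVALID)
		vcpu->sane = false;
}

/* Reject a Power11 request when the host did not negotiate Power11. */
static int model_set_arch_compat(struct model_vcpu *vcpu, uint32_t arch_compat)
{
	int ret = 0;

	if (!vcpu->vcore)
		return -EINVAL;

	if (arch_compat == PVR_ARCH_31_P11 && !host_has_p11_pvr) {
		arch_compat = PVR_ARCH_INVALID;
		ret = -EINVAL;		/* assumed error code */
	}

	vcpu->vcore->arch_compat = arch_compat;
	model_sanity_check(vcpu);
	return ret;
}

/* The run path refuses an insane vCPU even if the earlier error was ignored. */
static int model_vcpu_run(const struct model_vcpu *vcpu)
{
	return vcpu->sane ? 0 : -EINVAL;
}

/* Walk through the Power10-compat-host case from the commit message. */
static void model_selftest(void)
{
	struct model_vcore vc = { 0 };
	struct model_vcpu vcpu = { .vcore = &vc, .sane = true };

	host_has_p11_pvr = false;	/* L1 booted in Power10 compat mode */
	assert(model_set_arch_compat(&vcpu, PVR_ARCH_31_P11) == -EINVAL);
	assert(vc.arch_compat == PVR_ARCH_INVALID);
	assert(model_vcpu_run(&vcpu) == -EINVAL);

	/* A supported level clears the invalid state again. */
	assert(model_set_arch_compat(&vcpu, PVR_ARCH_31) == 0);
	assert(model_vcpu_run(&vcpu) == 0);
}
```

Calling model_selftest() exercises both the rejected Power11 request and
the accepted Power10 one. The NULL check in model_sanity_check() mirrors
the v3 changelog note that vcore is NULL for Book3S PR and BookE guests.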
