On Wed, Jun 10, 2026 at 02:41:58PM -0500, Michael Roth wrote:
> On Wed, Jun 10, 2026 at 05:00:11PM +0530, Naveen N Rao wrote:
> > [+Sean]
> >
> > Hi Mike,
> >
> > On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote:
> > > On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote:
> > > > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
> > > > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
> > > > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
> > > > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
> > > > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
> > > > type SEV-ES guests. As a result of this, it is no longer possible to
> > > > start SEV-ES guests with any SEV feature enabled (in particular,
> > > > debug-swap) with pflash devices.
> > > >
> > > > This is an issue since SEV-ES guests have historically used pflash
> > > > devices for OVMF. Guests on older KVM+QEMU are able to enable debug-swap
> > > > while using pflash devices, so work around the KVM limitation by
> > > > switching to using a VMA-based write protection. This allows
> > > > well-behaved SEV-ES guests to continue to work with pflash devices and
> > > > enable debug-swap. Mis-behaving guests trying to write to the protected
> > > > OVMF area will be killed.
> > >
> > > Based on Sean's description, a write access to a read-only memslot would
> > > cause the vCPU to permanently spin on #NPFs if trying to write to it as
> > > MMIO due to #VC handler not triggering, and that's why we don't support
> > > read-only memslots. But since SEV-ES was previously working with pflash,
> > > it seems like it does not rely on this functionality...
> >
> > Right, normal well-behaved SEV-ES/SNP guests work just fine as they
> > don't write to any of the read-only areas.
> >
> > >
> > > So if OVMF isn't writing to write-protected memory, then it wouldn't be
> > > triggering the MMIO emulation path in the first place. And if we don't
> > > care about enabling the emulation path in this case... then I'm not sure
> > > the original reasons for not allowing it for SEV-ES/SNP are applicable.
> >
> > Guest (not just OVMF) could try and write to the read-only area
> > triggering this issue. A simple write to 0xc0000 from within the guest
> > triggers this.
>
> Is that still true even with this patch?
>
> commit 0f4a1e80989aca185d955fcd791d7750082044a2
> Author: Kevin Loughlin
> Date: Wed Mar 13 12:15:46 2024 +0000
>
> x86/sev: Skip ROM range scans and validation for SEV-SNP guests
>
> SEV-SNP requires encrypted memory to be validated before access.
> Because the ROM memory range is not part of the e820 table, it is not
> pre-validated by the BIOS. Therefore, if a SEV-SNP guest kernel wishes
> to access this range, the guest must first validate the range.
>
> The current SEV-SNP code does indeed scan the ROM range during early
> boot and thus attempts to validate the ROM range in probe_roms().
> However, this behavior is neither sufficient nor necessary for the
> following reasons:
>
> ...

Yes, that was mostly a change for SEV-SNP guests, and it only stops the
kernel itself from accessing those regions. Userspace is still free to
access them through /dev/mem.
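
For reference, a minimal sketch of such a userspace write, assuming it is run
inside the guest against /dev/mem. The helper name poke_phys() and the
LEGACY_ROM_BASE macro are made up for illustration, and whether /dev/mem
permits the mapping depends on the guest kernel configuration:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Guest physical address mentioned earlier in this thread. */
#define LEGACY_ROM_BASE 0xc0000

/*
 * Hypothetical helper: write one byte at physical offset `phys` of the
 * memory device at `path` (e.g. "/dev/mem" inside the guest).
 * Returns 0 on success, -1 on error.
 */
int poke_phys(const char *path, off_t phys, uint8_t val)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* mmap() offsets must be page-aligned, so map the containing page. */
    off_t base = phys & ~((off_t)page - 1);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, base);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    /* The store below is the write that lands in the read-only area. */
    ((volatile uint8_t *)map)[phys - base] = val;

    munmap(map, (size_t)page);
    return 0;
}
```

Calling poke_phys("/dev/mem", LEGACY_ROM_BASE, 0) as root in the guest would
be one way to issue the write discussed above.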
>
> but in that case, those private accesses didn't work because they were
> accessing legacy MMIO regions as private/encrypted even though none of the
> option ROMs were loaded into memory as encrypted, so they're basically just
> garbage/legacy regions we try to completely ignore on the guest-side now and
> any lingering cases should probably get the same treatment.
>
> It would be nice to still be able to catch write accesses....but I think we
> still could (with the kernel changes discussed in my reply to Tom) if we
> really wanted that. But is that really a hard requirement? Personally, the
> -bios vs pflash argument thing makes this feel justified since -bios also
> lets the writes through silently, but maybe we can do better with kernel
> changes.

Indeed.

>
> >
> > >
> > > It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE,
> > > whereas we literally just need an actual slot that's permanently mapped
> > > in the NPT without write access.
> > >
> > > Is that an accurate summary of the situation?
> >
> > Yes, that sounds right to me.
> >
> > >
> > > If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures
> > > this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT.
> >
> > That would certainly work.
> >
> > > Or, for
> > > a QEMU-specific workaround, just have a pflash implementation that doesn't
> > > rely on KVM_MEM_READONLY for cases like this where we don't need MMIO
> > > emulation.
> >
> > Not sure I follow that... are you suggesting that pflash use regular RW
> > memslots and just let the write through?
>
> Yes, isn't that basically what we're getting with -bios? At least this
> way we don't have the awkwardness of needing to randomly switch from -bios
> to pflash based on what SEV features the user selects, which is pretty
> bad.
>
> But that was more of a last resort, maybe we haven't yet bottomed out on
> whether we could do things a bit more nicely with some kernel help as
> discussed elsewhere in this thread.

Yes, letting the writes through is simple enough as a last resort. The
VMA-based protection I have implemented here is the other option if we
want to be able to prevent writes without KVM's help (but it will likely
need more work overall).


Thanks,
Naveen