[+Sean]
Hi Mike,
On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote:
> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote:
> > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
> > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
> > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
> > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
> > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
> > type SEV-ES guests. As a result of this, it is no longer possible to
> > start SEV-ES guests with any SEV feature enabled (in particular,
> > debug-swap) with pflash devices.
> >
> > This is an issue since SEV-ES guests have historically used pflash
> > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap
> > while using pflash devices, so work around the KVM limitation by
> > switching to using a VMA-based write protection. This allows
> > well-behaved SEV-ES guests to continue to work with pflash devices and
> > enable debug-swap. Mis-behaving guests trying to write to the protected
> > OVMF area will be killed.
>
> Based on Sean's description, a write access to a read-only memslot would
> cause the vCPU to permanently spin on #NPFs if trying to write to it as
> MMIO due to #VC handler not triggering, and that's why we don't support
> read-only memslots. But since SEV-ES was previously working with pflash,
> it seems like it does not rely on this functionality...
Right, normal well-behaved SEV-ES/SNP guests work just fine as they
don't write to any of the read-only areas.
>
> So if OVMF isn't writing to write-protected memory, then it wouldn't be
> triggering the MMIO emulation path in the first place. And if we don't
> care about enabling the emulation path in this case... then I'm not sure
> the original reasons for not allowing it for SEV-ES/SNP are applicable.
Guest (not just OVMF) could try and write to the read-only area
triggering this issue. A simple write to 0xc0000 from within the guest
triggers this.
>
> It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE,
> whereas we literally just need as actually slot that's permanently mapped
> in the NPT without write access.
>
> Is that an accurate summary of the situation?
Yes, that sounds right to me.
>
> If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures
> this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT.
That would certainly work.
> Or, for
> a QEMU-specific workaround, just have a pflash implementation that doesn't
> rely on KVM_MEM_READONLY for cases like this where we don't need MMIO
> emulation.
Not sure I follow that... are you suggesting that pflash use regular RW
memslots and just let the write through?
Thanks,
Naveen
> There's actually another case in hw/nvram/nrf51_nvm.c where this
> would be applicable. I guess it could be done automatically for the
> confidential VM case to retain cmdline compatibility...though you're
> wanting to add the debugswap feature anyway so not sure how important that
> aspect is.
>
> Thanks,
>
> Mike
>
> >
> > Enable VMA protection and set the memory to be RO when adding the KVM
> > memory slot. Because pflash devices support command-mode, change VMA
> > protection to RW when tearing down the KVM memory slot. KVM
> > SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the
> > protection when calling that.
> >
> > Print a warning when switching to VMA-based protection so that it is
> > clear that KVM itself isn't supporting readonly memory, and that a
> > workaround is in place. Users can plan on switching to using '-bios'.
> >
> > Finally, drop the check rejecting SEV-ES guests with SEV features so
> > that debug-swap can be enabled.
> >
> >
> > Signed-off-by: Naveen N Rao (AMD) <[email protected]>
> > ---
> > Background discussion on this issue:
> > http://lore.kernel.org/r/[email protected]
> >
> > This series depends on VMSA features support:
> > http://lore.kernel.org/r/[email protected]
> >
> >
> > - Naveen
> >
> >
> > include/system/kvm.h | 5 ++++
> > include/system/kvm_int.h | 1 +
> > accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++-
> > hw/i386/pc_sysfw.c | 19 +++++++++------
> > target/i386/sev.c | 21 +++++++++++-----
> > 5 files changed, 84 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/system/kvm.h b/include/system/kvm.h
> > index 5fa33eddda38..585058bd6f1c 100644
> > --- a/include/system/kvm.h
> > +++ b/include/system/kvm.h
> > @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void);
> >
> > void kvm_mark_guest_state_protected(void);
> >
> > +void kvm_enable_ro_mem_vma_protection(void);
> > +
> > /**
> > * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
> > * reported for the VM.
> > @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start,
> > uint64_t size);
> >
> > int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
> >
> > +void kvm_set_memory_readonly(void *addr, size_t len);
> > +void kvm_set_memory_readwrite(void *addr, size_t len);
> > +
> > /* argument to vmfd change notifier */
> > typedef struct VmfdChangeNotifier {
> > int vmfd;
> > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
> > index 0876aac938d3..0e083a56ce2a 100644
> > --- a/include/system/kvm_int.h
> > +++ b/include/system/kvm_int.h
> > @@ -123,6 +123,7 @@ struct KVMState
> > OnOffAuto kernel_irqchip_split;
> > bool sync_mmu;
> > bool guest_state_protected;
> > + bool guest_wants_ro_mem_vma_protection;
> > uint64_t manual_dirty_log_protect;
> > /*
> > * Older POSIX says that ioctl numbers are signed int, but in
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 96f90ebb240f..4208df5b25ac 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start,
> > uint64_t size)
> > return kvm_set_memory_attributes(start, size, 0);
> > }
> >
> > +static void kvm_set_memory_flags(void *addr, size_t len, int flags)
> > +{
> > + if (mprotect(addr, len, flags)) {
> > + error_report("failed to apply memory protection "
> > + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'",
> > + (hwaddr)addr, len, strerror(errno));
> > + exit(1);
> > + }
> > +}
> > +
> > +void kvm_set_memory_readonly(void *addr, size_t len)
> > +{
> > + if (kvm_state->guest_wants_ro_mem_vma_protection) {
> > + kvm_set_memory_flags(addr, len, PROT_READ);
> > + }
> > +}
> > +
> > +void kvm_set_memory_readwrite(void *addr, size_t len)
> > +{
> > + if (kvm_state->guest_wants_ro_mem_vma_protection) {
> > + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE);
> > + }
> > +}
> > +
> > +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr)
> > +{
> > + if (!memory_region_is_ram(mr) &&
> > + (mr->readonly || mr->rom_device) &&
> > + !kvm_readonly_mem_allowed &&
> > + kvm_state->guest_wants_ro_mem_vma_protection) {
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > /* Called with KVMMemoryListener.slots_lock held */
> > static void kvm_set_phys_mem(KVMMemoryListener *kml,
> > MemoryRegionSection *section, bool add)
> > @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> > void *ram;
> >
> > if (!memory_region_is_ram(mr)) {
> > - if (writable || !kvm_readonly_mem_allowed) {
> > + if (writable || (!kvm_readonly_mem_allowed &&
> > + !kvm_state->guest_wants_ro_mem_vma_protection)) {
> > return;
> > } else if (!mr->romd_mode) {
> > /* If the memory device is not in romd_mode, then we actually
> > want
> > @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> > kvm_slot_sync_dirty_pages(mem);
> > }
> >
> > + if (kvm_mem_wants_vma_protection(mr)) {
> > + kvm_set_memory_readwrite(mem->ram, mem->memory_size);
> > + }
> > +
> > /* unregister the slot */
> > g_free(mem->dirty_bmap);
> > mem->dirty_bmap = NULL;
> > @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> > }
> > }
> >
> > + if (kvm_mem_wants_vma_protection(mr)) {
> > + kvm_set_memory_readonly(mem->ram, mem->memory_size);
> > + }
> > +
> > start_addr += slot_size;
> > ram_start_offset += slot_size;
> > ram += slot_size;
> > @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void)
> > kvm_state->guest_state_protected = true;
> > }
> >
> > +void kvm_enable_ro_mem_vma_protection(void)
> > +{
> > + kvm_state->guest_wants_ro_mem_vma_protection = true;
> > +}
> > +
> > int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> > {
> > int fd;
> > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> > index 1a41a5972bd0..9590458e00c5 100644
> > --- a/hw/i386/pc_sysfw.c
> > +++ b/hw/i386/pc_sysfw.c
> > @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms,
> > }
> > } else {
> > if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
> > - /*
> > - * Older KVM cannot execute from device memory. So, flash
> > - * memory cannot be used unless the readonly memory kvm
> > - * capability is present.
> > - */
> > - error_report("pflash with kvm requires KVM readonly memory
> > support");
> > - exit(1);
> > + if (sev_es_enabled() && !sev_snp_enabled()) {
> > + warn_report("pflash not supported with SEV-ES guests, "
> > + "attempting VMA based protection");
> > + } else {
> > + /*
> > + * Older KVM cannot execute from device memory. So, flash
> > + * memory cannot be used unless the readonly memory kvm
> > + * capability is present.
> > + */
> > + error_report("pflash with kvm requires KVM readonly memory
> > support");
> > + exit(1);
> > + }
> > }
> >
> > pc_system_flash_map(pcms, rom_memory);
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index f04ae4e91f3e..82cf2c562729 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState
> > *sev_common, uint64_t sev_features,
> > return -1;
> > }
> > } else {
> > - if (sev_features && sev_es_enabled()) {
> > - error_setg(errp,
> > - "%s: SEV features are not supported with SEV-ES at
> > this time",
> > - __func__);
> > - return -1;
> > - }
> > if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) {
> > error_setg(errp,
> > "%s: SEV_SNP is not enabled but is enabled in VMSA
> > sev_features",
> > @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport
> > *cgs, Error **errp)
> > return -1;
> > }
> >
> > + /*
> > + * Use VMA-based protection for SEV-ES guests that enable any
> > + * SEV feature, since KVM does not advertise readonly memory
> > + * support for non-default type SEV guests.
> > + */
> > + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) {
> > + kvm_enable_ro_mem_vma_protection();
> > + }
> > +
> > if (!cgs->ready) {
> > /*
> > * SEV uses these notifiers to register/pin pages prior to guest
> > use,
> > @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t
> > len, Error **errp)
> > if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
> > int ret;
> >
> > + /*
> > + * KVM requires these pages to be RW, so remove VMA RO protection
> > + * for the duration of SEV_LAUNCH_UPDATE if using SEV features.
> > + */
> > + kvm_set_memory_readwrite(ptr, len);
> > ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp);
> > + kvm_set_memory_readonly(ptr, len);
> > if (ret < 0) {
> > return ret;
> > }
> >
> > base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7
> > prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8
> > prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c
> > prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784
> > prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49
> > prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3
> > prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b
> > prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab
> > prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368
> > prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd
> > --
> > 2.54.0
> >
> >