[+Sean]

Hi Mike,

On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote:
> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote:
> > KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
> > SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
> > ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
> > stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
> > (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
> > type SEV-ES guests. As a result of this, it is no longer possible to
> > start SEV-ES guests with any SEV feature enabled (in particular,
> > debug-swap) with pflash devices.
> > 
> > This is an issue since SEV-ES guests have historically used pflash
> > devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap
> > while using pflash devices, so work around the KVM limitation by
> > switching to using a VMA-based write protection. This allows
> > well-behaved SEV-ES guests to continue to work with pflash devices and
> > enable debug-swap. Mis-behaving guests trying to write to the protected
> > OVMF area will be killed.
> 
> Based on Sean's description, a write access to a read-only memslot would
> cause the vCPU to permanently spin on #NPFs if trying to write to it as
> MMIO due to #VC handler not triggering, and that's why we don't support
> read-only memslots. But since SEV-ES was previously working with pflash,
> it seems like it does not rely on this functionality...

Right, normal well-behaved SEV-ES/SNP guests work just fine as they 
don't write to any of the read-only areas.

> 
> So if OVMF isn't writing to write-protected memory, then it wouldn't be
> triggering the MMIO emulation path in the first place. And if we don't
> care about enabling the emulation path in this case... then I'm not sure
> the original reasons for not allowing it for SEV-ES/SNP are applicable.

Guest (not just OVMF) could try and write to the read-only area 
triggering this issue. A simple write to 0xc0000 from within the guest 
triggers this.

> 
> It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE,
> whereas we literally just need as actually slot that's permanently mapped
> in the NPT without write access.
> 
> Is that an accurate summary of the situation?

Yes, that sounds right to me.

> 
> If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures
> this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT.

That would certainly work.

> Or, for
> a QEMU-specific workaround, just have a pflash implementation that doesn't
> rely on KVM_MEM_READONLY for cases like this where we don't need MMIO
> emulation.

Not sure I follow that... are you suggesting that pflash use regular RW 
memslots and just let the write through?


Thanks,
Naveen

> There's actually another case in hw/nvram/nrf51_nvm.c where this
> would be applicable. I guess it could be done automatically for the
> confidential VM case to retain cmdline compatibility...though you're
> wanting to add the debugswap feature anyway so not sure how important that
> aspect is.
> 
> Thanks,
> 
> Mike
>
> > 
> > Enable VMA protection and set the memory to be RO when adding the KVM
> > memory slot. Because pflash devices support command-mode, change VMA
> > protection to RW when tearing down the KVM memory slot. KVM
> > SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the
> > protection when calling that.
> > 
> > Print a warning when switching to VMA-based protection so that it is
> > clear that KVM itself isn't supporting readonly memory, and that a
> > workaround is in place. Users can plan on switching to using '-bios'.
> > 
> > Finally, drop the check rejecting SEV-ES guests with SEV features so 
> > that debug-swap can be enabled.
> > 
> > 
> > Signed-off-by: Naveen N Rao (AMD) <[email protected]>
> > ---
> > Background discussion on this issue:
> > http://lore.kernel.org/r/[email protected]
> > 
> > This series depends on VMSA features support:
> > http://lore.kernel.org/r/[email protected]
> > 
> > 
> > - Naveen
> > 
> > 
> >  include/system/kvm.h     |  5 ++++
> >  include/system/kvm_int.h |  1 +
> >  accel/kvm/kvm-all.c      | 52 +++++++++++++++++++++++++++++++++++++++-
> >  hw/i386/pc_sysfw.c       | 19 +++++++++------
> >  target/i386/sev.c        | 21 +++++++++++-----
> >  5 files changed, 84 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/system/kvm.h b/include/system/kvm.h
> > index 5fa33eddda38..585058bd6f1c 100644
> > --- a/include/system/kvm.h
> > +++ b/include/system/kvm.h
> > @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void);
> >  
> >  void kvm_mark_guest_state_protected(void);
> >  
> > +void kvm_enable_ro_mem_vma_protection(void);
> > +
> >  /**
> >   * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
> >   * reported for the VM.
> > @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, 
> > uint64_t size);
> >  
> >  int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
> >  
> > +void kvm_set_memory_readonly(void *addr, size_t len);
> > +void kvm_set_memory_readwrite(void *addr, size_t len);
> > +
> >  /* argument to vmfd change notifier */
> >  typedef struct VmfdChangeNotifier {
> >      int vmfd;
> > diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
> > index 0876aac938d3..0e083a56ce2a 100644
> > --- a/include/system/kvm_int.h
> > +++ b/include/system/kvm_int.h
> > @@ -123,6 +123,7 @@ struct KVMState
> >      OnOffAuto kernel_irqchip_split;
> >      bool sync_mmu;
> >      bool guest_state_protected;
> > +    bool guest_wants_ro_mem_vma_protection;
> >      uint64_t manual_dirty_log_protect;
> >      /*
> >       * Older POSIX says that ioctl numbers are signed int, but in
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 96f90ebb240f..4208df5b25ac 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, 
> > uint64_t size)
> >      return kvm_set_memory_attributes(start, size, 0);
> >  }
> >  
> > +static void kvm_set_memory_flags(void *addr, size_t len, int flags)
> > +{
> > +    if (mprotect(addr, len, flags)) {
> > +        error_report("failed to apply memory protection "
> > +                     "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'",
> > +                     (hwaddr)addr, len, strerror(errno));
> > +        exit(1);
> > +    }
> > +}
> > +
> > +void kvm_set_memory_readonly(void *addr, size_t len)
> > +{
> > +    if (kvm_state->guest_wants_ro_mem_vma_protection) {
> > +        kvm_set_memory_flags(addr, len, PROT_READ);
> > +    }
> > +}
> > +
> > +void kvm_set_memory_readwrite(void *addr, size_t len)
> > +{
> > +    if (kvm_state->guest_wants_ro_mem_vma_protection) {
> > +        kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE);
> > +    }
> > +}
> > +
> > +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr)
> > +{
> > +    if (!memory_region_is_ram(mr) &&
> > +        (mr->readonly || mr->rom_device) &&
> > +        !kvm_readonly_mem_allowed &&
> > +        kvm_state->guest_wants_ro_mem_vma_protection) {
> > +            return true;
> > +    }
> > +
> > +    return false;
> > +}
> > +
> >  /* Called with KVMMemoryListener.slots_lock held */
> >  static void kvm_set_phys_mem(KVMMemoryListener *kml,
> >                               MemoryRegionSection *section, bool add)
> > @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> >      void *ram;
> >  
> >      if (!memory_region_is_ram(mr)) {
> > -        if (writable || !kvm_readonly_mem_allowed) {
> > +        if (writable || (!kvm_readonly_mem_allowed &&
> > +                         !kvm_state->guest_wants_ro_mem_vma_protection)) {
> >              return;
> >          } else if (!mr->romd_mode) {
> >              /* If the memory device is not in romd_mode, then we actually 
> > want
> > @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> >                  kvm_slot_sync_dirty_pages(mem);
> >              }
> >  
> > +            if (kvm_mem_wants_vma_protection(mr)) {
> > +                kvm_set_memory_readwrite(mem->ram, mem->memory_size);
> > +            }
> > +
> >              /* unregister the slot */
> >              g_free(mem->dirty_bmap);
> >              mem->dirty_bmap = NULL;
> > @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
> >              }
> >          }
> >  
> > +        if (kvm_mem_wants_vma_protection(mr)) {
> > +            kvm_set_memory_readonly(mem->ram, mem->memory_size);
> > +        }
> > +
> >          start_addr += slot_size;
> >          ram_start_offset += slot_size;
> >          ram += slot_size;
> > @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void)
> >      kvm_state->guest_state_protected = true;
> >  }
> >  
> > +void kvm_enable_ro_mem_vma_protection(void)
> > +{
> > +    kvm_state->guest_wants_ro_mem_vma_protection = true;
> > +}
> > +
> >  int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
> >  {
> >      int fd;
> > diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> > index 1a41a5972bd0..9590458e00c5 100644
> > --- a/hw/i386/pc_sysfw.c
> > +++ b/hw/i386/pc_sysfw.c
> > @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms,
> >          }
> >      } else {
> >          if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
> > -            /*
> > -             * Older KVM cannot execute from device memory. So, flash
> > -             * memory cannot be used unless the readonly memory kvm
> > -             * capability is present.
> > -             */
> > -            error_report("pflash with kvm requires KVM readonly memory 
> > support");
> > -            exit(1);
> > +            if (sev_es_enabled() && !sev_snp_enabled()) {
> > +                warn_report("pflash not supported with SEV-ES guests, "
> > +                            "attempting VMA based protection");
> > +            } else {
> > +                /*
> > +                 * Older KVM cannot execute from device memory. So, flash
> > +                 * memory cannot be used unless the readonly memory kvm
> > +                 * capability is present.
> > +                 */
> > +                error_report("pflash with kvm requires KVM readonly memory 
> > support");
> > +                exit(1);
> > +            }
> >          }
> >  
> >          pc_system_flash_map(pcms, rom_memory);
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index f04ae4e91f3e..82cf2c562729 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState 
> > *sev_common, uint64_t sev_features,
> >              return -1;
> >          }
> >      } else {
> > -        if (sev_features && sev_es_enabled()) {
> > -            error_setg(errp,
> > -                       "%s: SEV features are not supported with SEV-ES at 
> > this time",
> > -                       __func__);
> > -            return -1;
> > -        }
> >          if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) {
> >              error_setg(errp,
> >                         "%s: SEV_SNP is not enabled but is enabled in VMSA 
> > sev_features",
> > @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport 
> > *cgs, Error **errp)
> >          return -1;
> >      }
> >  
> > +    /*
> > +     * Use VMA-based protection for SEV-ES guests that enable any
> > +     * SEV feature, since KVM does not advertise readonly memory
> > +     * support for non-default type SEV guests.
> > +     */
> > +    if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) {
> > +        kvm_enable_ro_mem_vma_protection();
> > +    }
> > +
> >      if (!cgs->ready) {
> >          /*
> >           * SEV uses these notifiers to register/pin pages prior to guest 
> > use,
> > @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t 
> > len, Error **errp)
> >      if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
> >          int ret;
> >  
> > +        /*
> > +         * KVM requires these pages to be RW, so remove VMA RO protection
> > +         * for the duration of SEV_LAUNCH_UPDATE if using SEV features.
> > +         */
> > +        kvm_set_memory_readwrite(ptr, len);
> >          ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp);
> > +        kvm_set_memory_readonly(ptr, len);
> >          if (ret < 0) {
> >              return ret;
> >          }
> > 
> > base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7
> > prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8
> > prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c
> > prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784
> > prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49
> > prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3
> > prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b
> > prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab
> > prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368
> > prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd
> > -- 
> > 2.54.0
> > 
> > 

Reply via email to