On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote:
> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
> type SEV-ES guests. As a result of this, it is no longer possible to
> start SEV-ES guests with any SEV feature enabled (in particular,
> debug-swap) with pflash devices.
> 
> This is an issue since SEV-ES guests have historically used pflash
> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap
> while using pflash devices, so work around the KVM limitation by
> switching to using a VMA-based write protection. This allows
> well-behaved SEV-ES guests to continue to work with pflash devices and
> enable debug-swap. Mis-behaving guests trying to write to the protected
> OVMF area will be killed.

Based on Sean's description, a write access to a read-only memslot would
cause the vCPU to permanently spin on #NPFs if trying to write to it as
MMIO due to #VC handler not triggering, and that's why we don't support
read-only memslots. But since SEV-ES was previously working with pflash,
it seems like it does not rely on this functionality...

So if OVMF isn't writing to write-protected memory, then it wouldn't be
triggering the MMIO emulation path in the first place. And if we don't
care about enabling the emulation path in this case... then I'm not sure
the original reasons for not allowing it for SEV-ES/SNP are applicable.

It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE,
whereas we literally just need as actually slot that's permanently mapped
in the NPT without write access.

Is that an accurate summary of the situation?

If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures
this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT. Or, for
a QEMU-specific workaround, just have a pflash implementation that doesn't
rely on KVM_MEM_READONLY for cases like this where we don't need MMIO
emulation.  There's actually another case in hw/nvram/nrf51_nvm.c where this
would be applicable. I guess it could be done automatically for the
confidential VM case to retain cmdline compatibility...though you're
wanting to add the debugswap feature anyway so not sure how important that
aspect is.

Thanks,

Mike

> 
> Enable VMA protection and set the memory to be RO when adding the KVM
> memory slot. Because pflash devices support command-mode, change VMA
> protection to RW when tearing down the KVM memory slot. KVM
> SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the
> protection when calling that.
> 
> Print a warning when switching to VMA-based protection so that it is
> clear that KVM itself isn't supporting readonly memory, and that a
> workaround is in place. Users can plan on switching to using '-bios'.
> 
> Finally, drop the check rejecting SEV-ES guests with SEV features so 
> that debug-swap can be enabled.
> 
> 
> Signed-off-by: Naveen N Rao (AMD) <[email protected]>
> ---
> Background discussion on this issue:
> http://lore.kernel.org/r/[email protected]
> 
> This series depends on VMSA features support:
> http://lore.kernel.org/r/[email protected]
> 
> 
> - Naveen
> 
> 
>  include/system/kvm.h     |  5 ++++
>  include/system/kvm_int.h |  1 +
>  accel/kvm/kvm-all.c      | 52 +++++++++++++++++++++++++++++++++++++++-
>  hw/i386/pc_sysfw.c       | 19 +++++++++------
>  target/i386/sev.c        | 21 +++++++++++-----
>  5 files changed, 84 insertions(+), 14 deletions(-)
> 
> diff --git a/include/system/kvm.h b/include/system/kvm.h
> index 5fa33eddda38..585058bd6f1c 100644
> --- a/include/system/kvm.h
> +++ b/include/system/kvm.h
> @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void);
>  
>  void kvm_mark_guest_state_protected(void);
>  
> +void kvm_enable_ro_mem_vma_protection(void);
> +
>  /**
>   * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
>   * reported for the VM.
> @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, 
> uint64_t size);
>  
>  int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
>  
> +void kvm_set_memory_readonly(void *addr, size_t len);
> +void kvm_set_memory_readwrite(void *addr, size_t len);
> +
>  /* argument to vmfd change notifier */
>  typedef struct VmfdChangeNotifier {
>      int vmfd;
> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
> index 0876aac938d3..0e083a56ce2a 100644
> --- a/include/system/kvm_int.h
> +++ b/include/system/kvm_int.h
> @@ -123,6 +123,7 @@ struct KVMState
>      OnOffAuto kernel_irqchip_split;
>      bool sync_mmu;
>      bool guest_state_protected;
> +    bool guest_wants_ro_mem_vma_protection;
>      uint64_t manual_dirty_log_protect;
>      /*
>       * Older POSIX says that ioctl numbers are signed int, but in
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 96f90ebb240f..4208df5b25ac 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, 
> uint64_t size)
>      return kvm_set_memory_attributes(start, size, 0);
>  }
>  
> +static void kvm_set_memory_flags(void *addr, size_t len, int flags)
> +{
> +    if (mprotect(addr, len, flags)) {
> +        error_report("failed to apply memory protection "
> +                     "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'",
> +                     (hwaddr)addr, len, strerror(errno));
> +        exit(1);
> +    }
> +}
> +
> +void kvm_set_memory_readonly(void *addr, size_t len)
> +{
> +    if (kvm_state->guest_wants_ro_mem_vma_protection) {
> +        kvm_set_memory_flags(addr, len, PROT_READ);
> +    }
> +}
> +
> +void kvm_set_memory_readwrite(void *addr, size_t len)
> +{
> +    if (kvm_state->guest_wants_ro_mem_vma_protection) {
> +        kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE);
> +    }
> +}
> +
> +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr)
> +{
> +    if (!memory_region_is_ram(mr) &&
> +        (mr->readonly || mr->rom_device) &&
> +        !kvm_readonly_mem_allowed &&
> +        kvm_state->guest_wants_ro_mem_vma_protection) {
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
>  /* Called with KVMMemoryListener.slots_lock held */
>  static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                               MemoryRegionSection *section, bool add)
> @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      void *ram;
>  
>      if (!memory_region_is_ram(mr)) {
> -        if (writable || !kvm_readonly_mem_allowed) {
> +        if (writable || (!kvm_readonly_mem_allowed &&
> +                         !kvm_state->guest_wants_ro_mem_vma_protection)) {
>              return;
>          } else if (!mr->romd_mode) {
>              /* If the memory device is not in romd_mode, then we actually 
> want
> @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                  kvm_slot_sync_dirty_pages(mem);
>              }
>  
> +            if (kvm_mem_wants_vma_protection(mr)) {
> +                kvm_set_memory_readwrite(mem->ram, mem->memory_size);
> +            }
> +
>              /* unregister the slot */
>              g_free(mem->dirty_bmap);
>              mem->dirty_bmap = NULL;
> @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>              }
>          }
>  
> +        if (kvm_mem_wants_vma_protection(mr)) {
> +            kvm_set_memory_readonly(mem->ram, mem->memory_size);
> +        }
> +
>          start_addr += slot_size;
>          ram_start_offset += slot_size;
>          ram += slot_size;
> @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void)
>      kvm_state->guest_state_protected = true;
>  }
>  
> +void kvm_enable_ro_mem_vma_protection(void)
> +{
> +    kvm_state->guest_wants_ro_mem_vma_protection = true;
> +}
> +
>  int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
>  {
>      int fd;
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index 1a41a5972bd0..9590458e00c5 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms,
>          }
>      } else {
>          if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
> -            /*
> -             * Older KVM cannot execute from device memory. So, flash
> -             * memory cannot be used unless the readonly memory kvm
> -             * capability is present.
> -             */
> -            error_report("pflash with kvm requires KVM readonly memory 
> support");
> -            exit(1);
> +            if (sev_es_enabled() && !sev_snp_enabled()) {
> +                warn_report("pflash not supported with SEV-ES guests, "
> +                            "attempting VMA based protection");
> +            } else {
> +                /*
> +                 * Older KVM cannot execute from device memory. So, flash
> +                 * memory cannot be used unless the readonly memory kvm
> +                 * capability is present.
> +                 */
> +                error_report("pflash with kvm requires KVM readonly memory 
> support");
> +                exit(1);
> +            }
>          }
>  
>          pc_system_flash_map(pcms, rom_memory);
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index f04ae4e91f3e..82cf2c562729 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState 
> *sev_common, uint64_t sev_features,
>              return -1;
>          }
>      } else {
> -        if (sev_features && sev_es_enabled()) {
> -            error_setg(errp,
> -                       "%s: SEV features are not supported with SEV-ES at 
> this time",
> -                       __func__);
> -            return -1;
> -        }
>          if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) {
>              error_setg(errp,
>                         "%s: SEV_SNP is not enabled but is enabled in VMSA 
> sev_features",
> @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport *cgs, 
> Error **errp)
>          return -1;
>      }
>  
> +    /*
> +     * Use VMA-based protection for SEV-ES guests that enable any
> +     * SEV feature, since KVM does not advertise readonly memory
> +     * support for non-default type SEV guests.
> +     */
> +    if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) {
> +        kvm_enable_ro_mem_vma_protection();
> +    }
> +
>      if (!cgs->ready) {
>          /*
>           * SEV uses these notifiers to register/pin pages prior to guest use,
> @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t 
> len, Error **errp)
>      if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
>          int ret;
>  
> +        /*
> +         * KVM requires these pages to be RW, so remove VMA RO protection
> +         * for the duration of SEV_LAUNCH_UPDATE if using SEV features.
> +         */
> +        kvm_set_memory_readwrite(ptr, len);
>          ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp);
> +        kvm_set_memory_readonly(ptr, len);
>          if (ret < 0) {
>              return ret;
>          }
> 
> base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7
> prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8
> prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c
> prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784
> prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49
> prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3
> prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b
> prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab
> prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368
> prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd
> -- 
> 2.54.0
> 
> 

Reply via email to