On 6/10/26 06:30, Naveen N Rao wrote:
> [+Sean]
> 
> Hi Mike,
> 
> On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote:
>> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote:
>>> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
>>> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
>>> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
>>> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
>>> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
>>> type SEV-ES guests. As a result of this, it is no longer possible to
>>> start SEV-ES guests with any SEV feature enabled (in particular,
>>> debug-swap) with pflash devices.
>>>
>>> This is an issue since SEV-ES guests have historically used pflash
>>> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap
>>> while using pflash devices, so work around the KVM limitation by
>>> switching to using a VMA-based write protection. This allows
>>> well-behaved SEV-ES guests to continue to work with pflash devices and
>>> enable debug-swap. Mis-behaving guests trying to write to the protected
>>> OVMF area will be killed.
>>
>> Based on Sean's description, a write access to a read-only memslot would
>> cause the vCPU to permanently spin on #NPFs if trying to write to it as
>> MMIO due to #VC handler not triggering, and that's why we don't support
>> read-only memslots. But since SEV-ES was previously working with pflash,
>> it seems like it does not rely on this functionality...
> 
> Right, normal well-behaved SEV-ES/SNP guests work just fine as they 
> don't write to any of the read-only areas.

Yes they do. There is specific support to make a direct GHCB MMIO
request because of the lack of the #VC exception (see
OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c).

Thanks,
Tom

> 
>>
>> So if OVMF isn't writing to write-protected memory, then it wouldn't be
>> triggering the MMIO emulation path in the first place. And if we don't
>> care about enabling the emulation path in this case... then I'm not sure
>> the original reasons for not allowing it for SEV-ES/SNP are applicable.
> 
> Guest (not just OVMF) could try and write to the read-only area 
> triggering this issue. A simple write to 0xc0000 from within the guest 
> triggers this.
> 
>>
>> It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE,
>> whereas we literally just need as actually slot that's permanently mapped
>> in the NPT without write access.
>>
>> Is that an accurate summary of the situation?
> 
> Yes, that sounds right to me.
> 
>>
>> If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures
>> this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT.
> 
> That would certainly work.
> 
>> Or, for
>> a QEMU-specific workaround, just have a pflash implementation that doesn't
>> rely on KVM_MEM_READONLY for cases like this where we don't need MMIO
>> emulation.
> 
> Not sure I follow that... are you suggesting that pflash use regular RW 
> memslots and just let the write through?
> 
> 
> Thanks,
> Naveen
> 
>> There's actually another case in hw/nvram/nrf51_nvm.c where this
>> would be applicable. I guess it could be done automatically for the
>> confidential VM case to retain cmdline compatibility...though you're
>> wanting to add the debugswap feature anyway so not sure how important that
>> aspect is.
>>
>> Thanks,
>>
>> Mike
>>
>>>
>>> Enable VMA protection and set the memory to be RO when adding the KVM
>>> memory slot. Because pflash devices support command-mode, change VMA
>>> protection to RW when tearing down the KVM memory slot. KVM
>>> SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the
>>> protection when calling that.
>>>
>>> Print a warning when switching to VMA-based protection so that it is
>>> clear that KVM itself isn't supporting readonly memory, and that a
>>> workaround is in place. Users can plan on switching to using '-bios'.
>>>
>>> Finally, drop the check rejecting SEV-ES guests with SEV features so 
>>> that debug-swap can be enabled.
>>>
>>>
>>> Signed-off-by: Naveen N Rao (AMD) <[email protected]>
>>> ---
>>> Background discussion on this issue:
>>> http://lore.kernel.org/r/[email protected]
>>>
>>> This series depends on VMSA features support:
>>> http://lore.kernel.org/r/[email protected]
>>>
>>>
>>> - Naveen
>>>
>>>
>>>  include/system/kvm.h     |  5 ++++
>>>  include/system/kvm_int.h |  1 +
>>>  accel/kvm/kvm-all.c      | 52 +++++++++++++++++++++++++++++++++++++++-
>>>  hw/i386/pc_sysfw.c       | 19 +++++++++------
>>>  target/i386/sev.c        | 21 +++++++++++-----
>>>  5 files changed, 84 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>>> index 5fa33eddda38..585058bd6f1c 100644
>>> --- a/include/system/kvm.h
>>> +++ b/include/system/kvm.h
>>> @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void);
>>>  
>>>  void kvm_mark_guest_state_protected(void);
>>>  
>>> +void kvm_enable_ro_mem_vma_protection(void);
>>> +
>>>  /**
>>>   * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
>>>   * reported for the VM.
>>> @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start, 
>>> uint64_t size);
>>>  
>>>  int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
>>>  
>>> +void kvm_set_memory_readonly(void *addr, size_t len);
>>> +void kvm_set_memory_readwrite(void *addr, size_t len);
>>> +
>>>  /* argument to vmfd change notifier */
>>>  typedef struct VmfdChangeNotifier {
>>>      int vmfd;
>>> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
>>> index 0876aac938d3..0e083a56ce2a 100644
>>> --- a/include/system/kvm_int.h
>>> +++ b/include/system/kvm_int.h
>>> @@ -123,6 +123,7 @@ struct KVMState
>>>      OnOffAuto kernel_irqchip_split;
>>>      bool sync_mmu;
>>>      bool guest_state_protected;
>>> +    bool guest_wants_ro_mem_vma_protection;
>>>      uint64_t manual_dirty_log_protect;
>>>      /*
>>>       * Older POSIX says that ioctl numbers are signed int, but in
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index 96f90ebb240f..4208df5b25ac 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start, 
>>> uint64_t size)
>>>      return kvm_set_memory_attributes(start, size, 0);
>>>  }
>>>  
>>> +static void kvm_set_memory_flags(void *addr, size_t len, int flags)
>>> +{
>>> +    if (mprotect(addr, len, flags)) {
>>> +        error_report("failed to apply memory protection "
>>> +                     "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'",
>>> +                     (hwaddr)addr, len, strerror(errno));
>>> +        exit(1);
>>> +    }
>>> +}
>>> +
>>> +void kvm_set_memory_readonly(void *addr, size_t len)
>>> +{
>>> +    if (kvm_state->guest_wants_ro_mem_vma_protection) {
>>> +        kvm_set_memory_flags(addr, len, PROT_READ);
>>> +    }
>>> +}
>>> +
>>> +void kvm_set_memory_readwrite(void *addr, size_t len)
>>> +{
>>> +    if (kvm_state->guest_wants_ro_mem_vma_protection) {
>>> +        kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE);
>>> +    }
>>> +}
>>> +
>>> +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr)
>>> +{
>>> +    if (!memory_region_is_ram(mr) &&
>>> +        (mr->readonly || mr->rom_device) &&
>>> +        !kvm_readonly_mem_allowed &&
>>> +        kvm_state->guest_wants_ro_mem_vma_protection) {
>>> +            return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>>  /* Called with KVMMemoryListener.slots_lock held */
>>>  static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>>                               MemoryRegionSection *section, bool add)
>>> @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>>      void *ram;
>>>  
>>>      if (!memory_region_is_ram(mr)) {
>>> -        if (writable || !kvm_readonly_mem_allowed) {
>>> +        if (writable || (!kvm_readonly_mem_allowed &&
>>> +                         !kvm_state->guest_wants_ro_mem_vma_protection)) {
>>>              return;
>>>          } else if (!mr->romd_mode) {
>>>              /* If the memory device is not in romd_mode, then we actually 
>>> want
>>> @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>>                  kvm_slot_sync_dirty_pages(mem);
>>>              }
>>>  
>>> +            if (kvm_mem_wants_vma_protection(mr)) {
>>> +                kvm_set_memory_readwrite(mem->ram, mem->memory_size);
>>> +            }
>>> +
>>>              /* unregister the slot */
>>>              g_free(mem->dirty_bmap);
>>>              mem->dirty_bmap = NULL;
>>> @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>>              }
>>>          }
>>>  
>>> +        if (kvm_mem_wants_vma_protection(mr)) {
>>> +            kvm_set_memory_readonly(mem->ram, mem->memory_size);
>>> +        }
>>> +
>>>          start_addr += slot_size;
>>>          ram_start_offset += slot_size;
>>>          ram += slot_size;
>>> @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void)
>>>      kvm_state->guest_state_protected = true;
>>>  }
>>>  
>>> +void kvm_enable_ro_mem_vma_protection(void)
>>> +{
>>> +    kvm_state->guest_wants_ro_mem_vma_protection = true;
>>> +}
>>> +
>>>  int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
>>>  {
>>>      int fd;
>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>>> index 1a41a5972bd0..9590458e00c5 100644
>>> --- a/hw/i386/pc_sysfw.c
>>> +++ b/hw/i386/pc_sysfw.c
>>> @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms,
>>>          }
>>>      } else {
>>>          if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
>>> -            /*
>>> -             * Older KVM cannot execute from device memory. So, flash
>>> -             * memory cannot be used unless the readonly memory kvm
>>> -             * capability is present.
>>> -             */
>>> -            error_report("pflash with kvm requires KVM readonly memory 
>>> support");
>>> -            exit(1);
>>> +            if (sev_es_enabled() && !sev_snp_enabled()) {
>>> +                warn_report("pflash not supported with SEV-ES guests, "
>>> +                            "attempting VMA based protection");
>>> +            } else {
>>> +                /*
>>> +                 * Older KVM cannot execute from device memory. So, flash
>>> +                 * memory cannot be used unless the readonly memory kvm
>>> +                 * capability is present.
>>> +                 */
>>> +                error_report("pflash with kvm requires KVM readonly memory 
>>> support");
>>> +                exit(1);
>>> +            }
>>>          }
>>>  
>>>          pc_system_flash_map(pcms, rom_memory);
>>> diff --git a/target/i386/sev.c b/target/i386/sev.c
>>> index f04ae4e91f3e..82cf2c562729 100644
>>> --- a/target/i386/sev.c
>>> +++ b/target/i386/sev.c
>>> @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState 
>>> *sev_common, uint64_t sev_features,
>>>              return -1;
>>>          }
>>>      } else {
>>> -        if (sev_features && sev_es_enabled()) {
>>> -            error_setg(errp,
>>> -                       "%s: SEV features are not supported with SEV-ES at 
>>> this time",
>>> -                       __func__);
>>> -            return -1;
>>> -        }
>>>          if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) {
>>>              error_setg(errp,
>>>                         "%s: SEV_SNP is not enabled but is enabled in VMSA 
>>> sev_features",
>>> @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport 
>>> *cgs, Error **errp)
>>>          return -1;
>>>      }
>>>  
>>> +    /*
>>> +     * Use VMA-based protection for SEV-ES guests that enable any
>>> +     * SEV feature, since KVM does not advertise readonly memory
>>> +     * support for non-default type SEV guests.
>>> +     */
>>> +    if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) {
>>> +        kvm_enable_ro_mem_vma_protection();
>>> +    }
>>> +
>>>      if (!cgs->ready) {
>>>          /*
>>>           * SEV uses these notifiers to register/pin pages prior to guest 
>>> use,
>>> @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t 
>>> len, Error **errp)
>>>      if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
>>>          int ret;
>>>  
>>> +        /*
>>> +         * KVM requires these pages to be RW, so remove VMA RO protection
>>> +         * for the duration of SEV_LAUNCH_UPDATE if using SEV features.
>>> +         */
>>> +        kvm_set_memory_readwrite(ptr, len);
>>>          ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp);
>>> +        kvm_set_memory_readonly(ptr, len);
>>>          if (ret < 0) {
>>>              return ret;
>>>          }
>>>
>>> base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7
>>> prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8
>>> prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c
>>> prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784
>>> prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49
>>> prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3
>>> prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b
>>> prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab
>>> prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368
>>> prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd
>>> -- 
>>> 2.54.0
>>>
>>>


Reply via email to