On 6/10/26 06:30, Naveen N Rao wrote:
> [+Sean]
>
> Hi Mike,
>
> On Tue, Jun 09, 2026 at 07:35:46PM -0500, Michael Roth wrote:
>> On Tue, Jun 02, 2026 at 12:42:13PM +0530, Naveen N Rao (AMD) wrote:
>>> KVM commit 66155de93bcf ("KVM: x86: Disallow read-only memslots for
>>> SEV-ES and SEV-SNP (and TDX)"), and the subsequent commit d30d9ee94cc0
>>> ("KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM")
>>> stopped advertising KVM_CAP_READONLY_MEM support for encrypted guests
>>> (KVM_X86_SEV_ES_VM and KVM_X86_SNP_VM), but not for KVM_X86_DEFAULT_VM
>>> type SEV-ES guests. As a result of this, it is no longer possible to
>>> start SEV-ES guests with any SEV feature enabled (in particular,
>>> debug-swap) with pflash devices.
>>>
>>> This is an issue since SEV-ES guests have historically used pflash
>>> devices for OVMF. Guests on older KVM+Qemu are able to enable debug-swap
>>> while using pflash devices, so work around the KVM limitation by
>>> switching to using a VMA-based write protection. This allows
>>> well-behaved SEV-ES guests to continue to work with pflash devices and
>>> enable debug-swap. Mis-behaving guests trying to write to the protected
>>> OVMF area will be killed.
>>
>> Based on Sean's description, a write access to a read-only memslot would
>> cause the vCPU to permanently spin on #NPFs if trying to write to it as
>> MMIO due to #VC handler not triggering, and that's why we don't support
>> read-only memslots. But since SEV-ES was previously working with pflash,
>> it seems like it does not rely on this functionality...
>
> Right, normal well-behaved SEV-ES/SNP guests work just fine as they
> don't write to any of the read-only areas.
Yes they do. There is specific support to make a direct GHCB MMIO
request because of the lack of the #VC exception (see
OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c).
Thanks,
Tom
>
>>
>> So if OVMF isn't writing to write-protected memory, then it wouldn't be
>> triggering the MMIO emulation path in the first place. And if we don't
>> care about enabling the emulation path in this case... then I'm not sure
>> the original reasons for not allowing it for SEV-ES/SNP are applicable.
>
> Guest (not just OVMF) could try and write to the read-only area
> triggering this issue. A simple write to 0xc0000 from within the guest
> triggers this.
>
>>
>> It feels like KVM_CAP_READONLY_MEM is more like KVM_CAP_EMULATE_ON_WRITE,
>> whereas we literally just need as actually slot that's permanently mapped
>> in the NPT without write access.
>>
>> Is that an accurate summary of the situation?
>
> Yes, that sounds right to me.
>
>>
>> If so, maybe we can introduce a KVM_CAP_READONLY_NO_MMIO that captures
>> this and simply errors out if it hits the KVM_PFN_ERR_RO_FAULT.
>
> That would certainly work.
>
>> Or, for
>> a QEMU-specific workaround, just have a pflash implementation that doesn't
>> rely on KVM_MEM_READONLY for cases like this where we don't need MMIO
>> emulation.
>
> Not sure I follow that... are you suggesting that pflash use regular RW
> memslots and just let the write through?
>
>
> Thanks,
> Naveen
>
>> There's actually another case in hw/nvram/nrf51_nvm.c where this
>> would be applicable. I guess it could be done automatically for the
>> confidential VM case to retain cmdline compatibility...though you're
>> wanting to add the debugswap feature anyway so not sure how important that
>> aspect is.
>>
>> Thanks,
>>
>> Mike
>>
>>>
>>> Enable VMA protection and set the memory to be RO when adding the KVM
>>> memory slot. Because pflash devices support command-mode, change VMA
>>> protection to RW when tearing down the KVM memory slot. KVM
>>> SEV_LAUNCH_UPDATE also requires memory to be RW, so disable the
>>> protection when calling that.
>>>
>>> Print a warning when switching to VMA-based protection so that it is
>>> clear that KVM itself isn't supporting readonly memory, and that a
>>> workaround is in place. Users can plan on switching to using '-bios'.
>>>
>>> Finally, drop the check rejecting SEV-ES guests with SEV features so
>>> that debug-swap can be enabled.
>>>
>>>
>>> Signed-off-by: Naveen N Rao (AMD) <[email protected]>
>>> ---
>>> Background discussion on this issue:
>>> http://lore.kernel.org/r/[email protected]
>>>
>>> This series depends on VMSA features support:
>>> http://lore.kernel.org/r/[email protected]
>>>
>>>
>>> - Naveen
>>>
>>>
>>> include/system/kvm.h | 5 ++++
>>> include/system/kvm_int.h | 1 +
>>> accel/kvm/kvm-all.c | 52 +++++++++++++++++++++++++++++++++++++++-
>>> hw/i386/pc_sysfw.c | 19 +++++++++------
>>> target/i386/sev.c | 21 +++++++++++-----
>>> 5 files changed, 84 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/system/kvm.h b/include/system/kvm.h
>>> index 5fa33eddda38..585058bd6f1c 100644
>>> --- a/include/system/kvm.h
>>> +++ b/include/system/kvm.h
>>> @@ -555,6 +555,8 @@ uint32_t kvm_dirty_ring_size(void);
>>>
>>> void kvm_mark_guest_state_protected(void);
>>>
>>> +void kvm_enable_ro_mem_vma_protection(void);
>>> +
>>> /**
>>> * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
>>> * reported for the VM.
>>> @@ -568,6 +570,9 @@ int kvm_set_memory_attributes_shared(hwaddr start,
>>> uint64_t size);
>>>
>>> int kvm_convert_memory(hwaddr start, hwaddr size, bool to_private);
>>>
>>> +void kvm_set_memory_readonly(void *addr, size_t len);
>>> +void kvm_set_memory_readwrite(void *addr, size_t len);
>>> +
>>> /* argument to vmfd change notifier */
>>> typedef struct VmfdChangeNotifier {
>>> int vmfd;
>>> diff --git a/include/system/kvm_int.h b/include/system/kvm_int.h
>>> index 0876aac938d3..0e083a56ce2a 100644
>>> --- a/include/system/kvm_int.h
>>> +++ b/include/system/kvm_int.h
>>> @@ -123,6 +123,7 @@ struct KVMState
>>> OnOffAuto kernel_irqchip_split;
>>> bool sync_mmu;
>>> bool guest_state_protected;
>>> + bool guest_wants_ro_mem_vma_protection;
>>> uint64_t manual_dirty_log_protect;
>>> /*
>>> * Older POSIX says that ioctl numbers are signed int, but in
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index 96f90ebb240f..4208df5b25ac 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -1629,6 +1629,42 @@ int kvm_set_memory_attributes_shared(hwaddr start,
>>> uint64_t size)
>>> return kvm_set_memory_attributes(start, size, 0);
>>> }
>>>
>>> +static void kvm_set_memory_flags(void *addr, size_t len, int flags)
>>> +{
>>> + if (mprotect(addr, len, flags)) {
>>> + error_report("failed to apply memory protection "
>>> + "(0x%" HWADDR_PRIx "+0x%" PRIx64 ") error '%s'",
>>> + (hwaddr)addr, len, strerror(errno));
>>> + exit(1);
>>> + }
>>> +}
>>> +
>>> +void kvm_set_memory_readonly(void *addr, size_t len)
>>> +{
>>> + if (kvm_state->guest_wants_ro_mem_vma_protection) {
>>> + kvm_set_memory_flags(addr, len, PROT_READ);
>>> + }
>>> +}
>>> +
>>> +void kvm_set_memory_readwrite(void *addr, size_t len)
>>> +{
>>> + if (kvm_state->guest_wants_ro_mem_vma_protection) {
>>> + kvm_set_memory_flags(addr, len, PROT_READ | PROT_WRITE);
>>> + }
>>> +}
>>> +
>>> +static bool kvm_mem_wants_vma_protection(MemoryRegion *mr)
>>> +{
>>> + if (!memory_region_is_ram(mr) &&
>>> + (mr->readonly || mr->rom_device) &&
>>> + !kvm_readonly_mem_allowed &&
>>> + kvm_state->guest_wants_ro_mem_vma_protection) {
>>> + return true;
>>> + }
>>> +
>>> + return false;
>>> +}
>>> +
>>> /* Called with KVMMemoryListener.slots_lock held */
>>> static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>> MemoryRegionSection *section, bool add)
>>> @@ -1642,7 +1678,8 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>> void *ram;
>>>
>>> if (!memory_region_is_ram(mr)) {
>>> - if (writable || !kvm_readonly_mem_allowed) {
>>> + if (writable || (!kvm_readonly_mem_allowed &&
>>> + !kvm_state->guest_wants_ro_mem_vma_protection)) {
>>> return;
>>> } else if (!mr->romd_mode) {
>>> /* If the memory device is not in romd_mode, then we actually
>>> want
>>> @@ -1697,6 +1734,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>> kvm_slot_sync_dirty_pages(mem);
>>> }
>>>
>>> + if (kvm_mem_wants_vma_protection(mr)) {
>>> + kvm_set_memory_readwrite(mem->ram, mem->memory_size);
>>> + }
>>> +
>>> /* unregister the slot */
>>> g_free(mem->dirty_bmap);
>>> mem->dirty_bmap = NULL;
>>> @@ -1746,6 +1787,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>>> }
>>> }
>>>
>>> + if (kvm_mem_wants_vma_protection(mr)) {
>>> + kvm_set_memory_readonly(mem->ram, mem->memory_size);
>>> + }
>>> +
>>> start_addr += slot_size;
>>> ram_start_offset += slot_size;
>>> ram += slot_size;
>>> @@ -4771,6 +4816,11 @@ void kvm_mark_guest_state_protected(void)
>>> kvm_state->guest_state_protected = true;
>>> }
>>>
>>> +void kvm_enable_ro_mem_vma_protection(void)
>>> +{
>>> + kvm_state->guest_wants_ro_mem_vma_protection = true;
>>> +}
>>> +
>>> int kvm_create_guest_memfd(uint64_t size, uint64_t flags, Error **errp)
>>> {
>>> int fd;
>>> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
>>> index 1a41a5972bd0..9590458e00c5 100644
>>> --- a/hw/i386/pc_sysfw.c
>>> +++ b/hw/i386/pc_sysfw.c
>>> @@ -254,13 +254,18 @@ void pc_system_firmware_init(PCMachineState *pcms,
>>> }
>>> } else {
>>> if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
>>> - /*
>>> - * Older KVM cannot execute from device memory. So, flash
>>> - * memory cannot be used unless the readonly memory kvm
>>> - * capability is present.
>>> - */
>>> - error_report("pflash with kvm requires KVM readonly memory
>>> support");
>>> - exit(1);
>>> + if (sev_es_enabled() && !sev_snp_enabled()) {
>>> + warn_report("pflash not supported with SEV-ES guests, "
>>> + "attempting VMA based protection");
>>> + } else {
>>> + /*
>>> + * Older KVM cannot execute from device memory. So, flash
>>> + * memory cannot be used unless the readonly memory kvm
>>> + * capability is present.
>>> + */
>>> + error_report("pflash with kvm requires KVM readonly memory
>>> support");
>>> + exit(1);
>>> + }
>>> }
>>>
>>> pc_system_flash_map(pcms, rom_memory);
>>> diff --git a/target/i386/sev.c b/target/i386/sev.c
>>> index f04ae4e91f3e..82cf2c562729 100644
>>> --- a/target/i386/sev.c
>>> +++ b/target/i386/sev.c
>>> @@ -550,12 +550,6 @@ static int check_sev_features(SevCommonState
>>> *sev_common, uint64_t sev_features,
>>> return -1;
>>> }
>>> } else {
>>> - if (sev_features && sev_es_enabled()) {
>>> - error_setg(errp,
>>> - "%s: SEV features are not supported with SEV-ES at
>>> this time",
>>> - __func__);
>>> - return -1;
>>> - }
>>> if (sev_features & SVM_SEV_FEAT_SNP_ACTIVE) {
>>> error_setg(errp,
>>> "%s: SEV_SNP is not enabled but is enabled in VMSA
>>> sev_features",
>>> @@ -2024,6 +2018,15 @@ static int sev_kvm_init(ConfidentialGuestSupport
>>> *cgs, Error **errp)
>>> return -1;
>>> }
>>>
>>> + /*
>>> + * Use VMA-based protection for SEV-ES guests that enable any
>>> + * SEV feature, since KVM does not advertise readonly memory
>>> + * support for non-default type SEV guests.
>>> + */
>>> + if (sev_es_enabled() && SEV_COMMON(cgs)->sev_features) {
>>> + kvm_enable_ro_mem_vma_protection();
>>> + }
>>> +
>>> if (!cgs->ready) {
>>> /*
>>> * SEV uses these notifiers to register/pin pages prior to guest
>>> use,
>>> @@ -2111,7 +2114,13 @@ sev_encrypt_flash(hwaddr gpa, uint8_t *ptr, uint64_t
>>> len, Error **errp)
>>> if (sev_check_state(sev_common, SEV_STATE_LAUNCH_UPDATE)) {
>>> int ret;
>>>
>>> + /*
>>> + * KVM requires these pages to be RW, so remove VMA RO protection
>>> + * for the duration of SEV_LAUNCH_UPDATE if using SEV features.
>>> + */
>>> + kvm_set_memory_readwrite(ptr, len);
>>> ret = klass->launch_update_data(sev_common, gpa, ptr, len, errp);
>>> + kvm_set_memory_readonly(ptr, len);
>>> if (ret < 0) {
>>> return ret;
>>> }
>>>
>>> base-commit: 5611a9268dae7b7ff99d478ed134052a9fc7e9f7
>>> prerequisite-patch-id: dc27ad6297d47d063b04fa797c1b8203ee97d9c8
>>> prerequisite-patch-id: 603eff49233c4b0483e7c405754b95aa455dd38c
>>> prerequisite-patch-id: d4085e72ecfb0fbf977f7358d1edd29951b93784
>>> prerequisite-patch-id: f6f201825b1e56f89a87a26bc457b3e6018aee49
>>> prerequisite-patch-id: 08d79cec4c3f117178be8f6c866ff1be08e971f3
>>> prerequisite-patch-id: c112bccab9ab9cee2d0227516fc857590b99a75b
>>> prerequisite-patch-id: f42fd829d0ea4909537e722477057a4013a247ab
>>> prerequisite-patch-id: 67136368ed1d2fa0ae55fee4368a7bd1fe394368
>>> prerequisite-patch-id: 08349bb1e0e11ee1518a9041771302c97866b5cd
>>> --
>>> 2.54.0
>>>
>>>