On 4/1/26 13:12, Marc-André Lureau wrote: > In Confidential Computing (CoCo) environments such as Intel TDX or AMD > SEV-SNP, hotplugged memory must be explicitly "accepted" (transitioned to > a private/encrypted state) before it can be safely used by the guest. > Conversely, before returning memory to the hypervisor during an unplug > operation, it must be converted back to a shared/decrypted state. > > Attempting to handle memory acceptance automatically using generic > architecture-level memory hotplug notifiers (e.g., MEM_GOING_ONLINE) > is not viable for devices like virtio-mem: > > 1. Granularity Mismatch: virtio-mem can dynamically hot(un)plug memory > at a subblock granularity (e.g., 2MB chunks within a 128MB memory > block). Generic memory notifiers operate on the entire memory block. > 2. Lifecycle Control: Memory must be explicitly accepted *before* it is > handed to the core memory management subsystem (the buddy allocator), > and it must be decrypted *before* being handed back to the device. > 3. State Tracking (Offline -> Re-online): If memory is offlined and > re-onlined without proper state transitions, TDX will panic on > attempting to accept an already-accepted page > (TDX_EPT_ENTRY_STATE_INCORRECT). > > To address this, this patch implements explicit CoCo memory conversions > directly within the virtio-mem driver using set_memory_encrypted() and > set_memory_decrypted(): > > - During hotplug, explicitly accepts only the physically plugged subblocks > right before fake-onlining them into the buddy allocator. > - During unplug, memory is explicitly transitioned to the shared state > before being handed back to the host. If the unplug operation fails, > the driver attempts to re-accept (encrypt) the memory. If this > re-acceptance fails, the memory is intentionally leaked to prevent > confidentiality breaches or fatal hypervisor faults. > > This was discovered while testing virtio-mem resize with TDX guests. > The associated QEMU virtio-mem + TDX patch series is under review at: > https://patchew.org/QEMU/[email protected]/ > > Note that QEMU punches the guest_memfd on KVM_HC_MAP_GPA_RANGE, when the > guest memory is decrypted. There is thus no need to discard the guest_memfd > in the virtio-mem device. > > This patch is a follow-up and supersedes "[PATCH 0/2] x86/tdx: Fix > memory hotplug in TDX guests".
Hi! Two things: 1) How do we know which state hotplugged memory is when we get it, and which state hotunplugged memory must have before returning it? I assume that other hypervisors (pKVM?) might have different demands in the future. Should that be specified in the spec? 2) Should we indicate through a device feature flag that the driver actually supports CoCo VMs? > > Assisted-by: Claude:claude-opus-4-6 > Reported-by: Chenyi Qiang <[email protected]> > Signed-off-by: Marc-André Lureau <[email protected]> > --- > drivers/virtio/virtio_mem.c | 183 > +++++++++++++++++++++++++++++++++++++++++--- > 1 file changed, 173 insertions(+), 10 deletions(-) > > diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c > index 48051e9e98abf..518bc0aae5304 100644 > --- a/drivers/virtio/virtio_mem.c > +++ b/drivers/virtio/virtio_mem.c > @@ -23,6 +23,8 @@ > #include <linux/log2.h> > #include <linux/vmalloc.h> > #include <linux/suspend.h> > +#include <linux/set_memory.h> > +#include <linux/cc_platform.h> > > #include <acpi/acpi_numa.h> > > @@ -864,19 +866,90 @@ static bool virtio_mem_contains_range(struct virtio_mem > *vm, uint64_t start, > return start >= vm->addr && start + size <= vm->addr + vm->region_size; > } > > +/* > + * In CoCo (TDX, SEV-SNP) environments, hotplugged memory must be explicitly > + * accepted before use (private/encrypted), and converted to shared/decrypted > + * before returning to the hypervisor on unplug. > + */ > +static int virtio_mem_coco_set_encrypted(uint64_t addr, uint64_t size) > +{ Where is the "acceptance" happening? Do we have to use the term "acceptance" in this file at all when all we're doing is calling set_memory_encrypted()? > + return set_memory_encrypted((unsigned long)__va(addr), PFN_DOWN(size)); > +} > + > +static int virtio_mem_coco_set_decrypted(uint64_t addr, uint64_t size) > +{ > + return set_memory_decrypted((unsigned long)__va(addr), PFN_DOWN(size)); > +} > + > +/* > + * Convert all plugged subblocks of a memory block back to shared/decrypted. > + * Used to undo set_encrypted on failure or cancel. > + */ > +static void virtio_mem_sbm_coco_set_decrypted(struct virtio_mem *vm, > + unsigned long mb_id) > +{ > + int sb_id, count; > + > + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) > + return; > + > + for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id += count) { > + count = 1; > + if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1)) > + continue; > + while (sb_id + count < vm->sbm.sbs_per_mb && > + virtio_mem_sbm_test_sb_plugged(vm, mb_id, > + sb_id + count, 1)) > + count++; > + virtio_mem_coco_set_decrypted( > + virtio_mem_mb_id_to_phys(mb_id) + > + sb_id * vm->sbm.sb_size, > + (uint64_t)count * vm->sbm.sb_size); > + } > +} > + > static int virtio_mem_sbm_notify_going_online(struct virtio_mem *vm, > unsigned long mb_id) > { > + int sb_id, count, rc; > + > switch (virtio_mem_sbm_get_mb_state(vm, mb_id)) { > case VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL: > case VIRTIO_MEM_SBM_MB_OFFLINE: > - return NOTIFY_OK; > - default: > break; > + default: > + dev_warn_ratelimited(&vm->vdev->dev, > + "memory block onlining denied\n"); > + return NOTIFY_BAD; > + } > + > + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) > + return NOTIFY_OK; > + > + /* > + * In CoCo environments, explicitly accept plugged subblocks before > + * they get onlined and handed to the buddy. Only accept at subblock > + * granularity -- unplugged subblocks have no backing in the secure > + * page tables (SEPT) and accepting them would fail. > + */ > + for (sb_id = 0; sb_id < vm->sbm.sbs_per_mb; sb_id += count) { > + count = 1; > + if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, sb_id, 1)) > + continue; > + /* Coalesce consecutive plugged subblocks */ > + while (sb_id + count < vm->sbm.sbs_per_mb && > + virtio_mem_sbm_test_sb_plugged(vm, mb_id, > + sb_id + count, 1)) > + count++; > + rc = virtio_mem_coco_set_encrypted( > + virtio_mem_mb_id_to_phys(mb_id) + > + sb_id * vm->sbm.sb_size, > + (uint64_t)count * vm->sbm.sb_size); > + if (rc) > + return NOTIFY_BAD; > } This looks extremely similar to the reverse of virtio_mem_sbm_coco_set_decrypted() and should be similarly factored out in a helper. > - dev_warn_ratelimited(&vm->vdev->dev, > - "memory block onlining denied\n"); > - return NOTIFY_BAD; > + > + return NOTIFY_OK; > } > > static void virtio_mem_sbm_notify_offline(struct virtio_mem *vm, > @@ -1055,8 +1128,16 @@ static int virtio_mem_memory_notifier_cb(struct > notifier_block *nb, > break; > } > vm->hotplug_active = true; > - if (vm->in_sbm) > + if (vm->in_sbm) { > rc = virtio_mem_sbm_notify_going_online(vm, id); > + } else { > + /* > + * For BBM, accept the whole memory block range. Unlike > + * SBM, the entire big block is always plugged. > + */ > + if (virtio_mem_coco_set_encrypted(start, size)) > + rc = NOTIFY_BAD; > + } > break; > case MEM_OFFLINE: > if (vm->in_sbm) > @@ -1106,6 +1187,14 @@ static int virtio_mem_memory_notifier_cb(struct > notifier_block *nb, > case MEM_CANCEL_ONLINE: > if (!vm->hotplug_active) > break; > + /* > + * Undo CoCo acceptance done in MEM_GOING_ONLINE. Pages were > + * made private but never onlined -- convert back to shared. > + */ I think that comment can be mostly dropped. The code is pretty clear. > + if (vm->in_sbm) > + virtio_mem_sbm_coco_set_decrypted(vm, id); > + else > + virtio_mem_coco_set_decrypted(start, size); > vm->hotplug_active = false; > mutex_unlock(&vm->hotplug_mutex); > break; > @@ -1583,12 +1672,17 @@ static int virtio_mem_bbm_plug_bb(struct virtio_mem > *vm, unsigned long bb_id) > * memory block. Will fail if any subblock cannot get unplugged (instead of > * skipping it). > * > + * If @coco_shared is true, convert each subblock range to shared/decrypted > + * before unplugging. This is required for offline blocks that have a direct > + * map but must not be used for blocks in PLUGGED state (no direct map). > + * > * Will not modify the state of the memory block. > * > * Note: can fail after some subblocks were unplugged. > */ > static int virtio_mem_sbm_unplug_any_sb_raw(struct virtio_mem *vm, > - unsigned long mb_id, uint64_t > *nb_sb) > + unsigned long mb_id, uint64_t > *nb_sb, > + bool coco_shared) > { > int sb_id, count; > int rc; > @@ -1609,6 +1703,15 @@ static int virtio_mem_sbm_unplug_any_sb_raw(struct > virtio_mem *vm, > sb_id--; > } > > + if (coco_shared) { > + rc = virtio_mem_coco_set_decrypted( > + virtio_mem_mb_id_to_phys(mb_id) + > + sb_id * vm->sbm.sb_size, > + (uint64_t)count * vm->sbm.sb_size); There has to be some way to make this readable. Like local variables or different helpers. Same applies to similar code below. > + if (rc) > + return rc; > + } > + > rc = virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count); > if (rc) > return rc; > @@ -1630,7 +1733,11 @@ static int virtio_mem_sbm_unplug_mb(struct virtio_mem > *vm, unsigned long mb_id) > { > uint64_t nb_sb = vm->sbm.sbs_per_mb; > > - return virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, &nb_sb); > + /* > + * Called for PLUGGED blocks (add_memory failed) -- no direct map > + * exists, so CoCo conversion is not possible. > + */ Isn't it rather "nothing to undo"? > + return virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, &nb_sb, false); > } > > /* > @@ -1744,6 +1851,17 @@ static int virtio_mem_sbm_plug_any_sb(struct > virtio_mem *vm, > if (old_state == VIRTIO_MEM_SBM_MB_OFFLINE_PARTIAL) > continue; > > + /* Accept memory for CoCo before fake-onlining into buddy */ > + rc = virtio_mem_coco_set_encrypted( > + virtio_mem_mb_id_to_phys(mb_id) + > + sb_id * vm->sbm.sb_size, > + (uint64_t)count * vm->sbm.sb_size); > + if (rc) { > + virtio_mem_sbm_unplug_sb(vm, mb_id, sb_id, count); > + *nb_sb += count; > + return rc; > + } > + > /* fake-online the pages if the memory block is online */ > pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) + > sb_id * vm->sbm.sb_size); > @@ -1941,7 +2059,8 @@ static int virtio_mem_sbm_unplug_any_sb_offline(struct > virtio_mem *vm, > { > int rc; > > - rc = virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, nb_sb); > + /* Offline blocks have a direct map -- convert for CoCo before unplug */ > + rc = virtio_mem_sbm_unplug_any_sb_raw(vm, mb_id, nb_sb, true); > > /* some subblocks might have been unplugged even on failure */ > if (!virtio_mem_sbm_test_sb_plugged(vm, mb_id, 0, vm->sbm.sbs_per_mb)) > @@ -1989,10 +2108,32 @@ static int virtio_mem_sbm_unplug_sb_online(struct > virtio_mem *vm, > if (rc) > return rc; > > + /* Convert private→shared for CoCo before handing back to device */ > + rc = virtio_mem_coco_set_decrypted( > + virtio_mem_mb_id_to_phys(mb_id) + > + sb_id * vm->sbm.sb_size, > + (uint64_t)count * vm->sbm.sb_size); God this is ugly. -- Cheers, David

