On Fri, Apr 24, 2026 at 12:08:45PM -0700, Ackerley Tng wrote: > Michael Roth <[email protected]> writes: > > Thank you for your patches! > > > > > [...snip...] > > > >> > >> I also did some minor updates (prefixed with a "[squash]" tag) to advertise > >> the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by > > > > Though I'm not sure how we deal with it if SNP/TDX at some point become > > capable of using the PRESERVED flag *after* populate... but maybe that's > > too unlikely to worry about? If we wanted to address it though, we could > > have both PRESERVED and PRESERVED_BEFORE_LAUNCH so they can be > > enumerated separately from the start. > > > > Not sure how likely it is, but if SNP and TDX can honor PRESERVE > semantics after populate, I think we could implement support under a new > flag like CIPHER.
That works, but it still makes things *slightly* awkward due to special-casing the PRESERVE semantics for 1 guest type vs. another. For instance, for the QEMU patches what I'd like to do is be able to issue conversions with the PRESERVE flag set, and it makes sense that when the function that handles that gets called (in this case, guest_memfd_set_memory_attributes_fd()), it asserts that QEMU checked for the corresponding capability/flag (via KVM_CAP_MEMORY_ATTRIBUTES2_FLAGS) earlier on during QEMU init time and stored that into kvm_supported_memory_attributes2_flag to check against later in cases where we try to use PRESERVE: https://github.com/AMDESE/qemu/blob/1308d4bcd8dd8acd151243e96ff0be4098389b93/accel/kvm/kvm-all.c#L1655 but that check is sort of incomplete: for use-cases like pKVM it makes sense to allow conversion+PRESERVE at all times, but for SNP/TDX we'd ideally have an additional check like: static int guest_memfd_set_memory_attributes_fd(..., attrs, flags) { struct kvm_memory_attributes2 attrs; ... assert(kvm_supported_memory_attributes & attrs); assert(kvm_supported_memory_attributes2_flags & flags); /* SNP and TDX only support PRESERVE prior to launch) */ if (vm_type_is_snp_or_tdx() && (flags & PRESERVE)) assert(vcpus_not_yet_started_and_LAUNCH_FINISH_etc_not_yet_called); ... } maybe that's fine, but there's just some asymmetry in the fact that the capabilities checks are essentially self-documenting/self-enforcing in this regard, except for PRESERVE+SNP/TDX where userspace needs to sprinkle some additional vm_type-specific logic. So that's why I was sort of thinking PRESERVE might deserve a PRESERVE_BEFORE_LAUNCH or something so the kernel can enumerate the support rather than userspace needing to read into the documentation / implementation to implement finer-grained checks/etc. > > CIPHER can then be used to mean "do the encryption or decryption", and > for platforms not supporting encryption, they'd stick with PRESERVE? It's all theoretical, but I'd imagine that *if* SNP/TDX ever added support for allowing userspace to write encrypted memory into a guest range, it would be something like: - write plain-text data to corresponding GPA - issue shared-to-private conversion with CIPHER (or PRESERVE) bit set, which will then be augmented to make some sort of firmware call to encrypt the memory with guest key and make it visible to guest So, at a high-level, if host writes 0xbeef to shared memory, then upon completion of the converison ioctl, 0xbeef then be visible to the guest via encrypted/private memory, which seems to match the semantics of PRESERVE perfectly: +``KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE`` + + On conversion, KVM guarantees memory contents will be preserved with + respect to the last written unencrypted value. As a concrete + example, if the host writes ``0xbeef`` to shared memory and converts + the memory to private, the guest will also read ``0xbeef``, even if + the in-memory data is encrypted as part of the conversion. And vice + versa, if the guest writes ``0xbeef`` to private memory and then + converts the memory to shared, the host (and guest) will read + ``0xbeef`` (if the memory is accessible). So it seems like a waste to have to add a CIPHER that would essentially have the same semantics (even though the architectural implementation details are different), and then complicate the documentation of PRESERVE to have vm-specific behaviors that userspace needs to account for, rather than leaving PRESERVE as-is and adding a PRESERVE_BEFORE_LAUNCH that neatly encapsulates the specialness of the SNP/TDX flow without complicating the more generally-defined flags like PRESERVE that could in theory become usable by all variants in the future. But then that sort of has me wondering if PRESERVE should be PRESERVE_AFTER_LAUNCH (to cover cases where maybe SNP/TDX gain this ability in the future, but can only use it post-launch, where other architectures might allow it both before/after.) ...but that seems to be getting into a more general issue of how we determine what is available before-launch/after-launch, so maybe instead of introducing new flags, we just have 2 capabilities that returns what flags are available at each particular phase, e.g. KVM_CAP_MEMORY_ATTRIBUTES2_PRE_LAUNCH_FLAGS KVM_CAP_MEMORY_ATTRIBUTES2_POST_LAUNCH_FLAGS And then userspace can easily probe/lock down what it expects to be available prior to launching the guest vs after without any vm-specific logic, and we can continue to just stick with PRESERVE/ZERO. (and if we do end up need to further distinguish pre vs. post-launch behavior, this gives us some flexibility to handle that as well). > > Should we redefine the semantics of PRESERVE to be "ensure that memory > contents don't change while guest_memfd tracking is being updated" and > avoid making a commitment on how the guest should read the memory? > > The above update would be aligned with ZERO not being allowed for > conversions to private (because KVM/guest_memfd does not make guarantees > about the contract between the host and guest. It's a little awkward here too, because PRESERVE is sort of making a contract between host/guest: we are providing trusted data to the guest. However, there's an attestation proces that allows the guest to enforce that for pre-launch and ensure we are following through on that contract. PRESERVE for post-launch...I'm not sure how that could be enforced, but I think this is further reason to allow pre vs. post launch flags to be enumerated separately. Thanks, Mike > > This way, all of those (ZERO, PRESERVE) will focus on KVM's interface > with the host. > > This lines up for SW_PROTECTED_VMs too, since reading memory that didn't > change in the guest is the contract between SW_PROTECTED_VMs and the > host. > > >> userspace for SNP/TDX in the kvm_gmem_populate() path as agreed upon > >> during PUCK. > >> > >> > >> [...snip...] > >>
