Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2

Michael Roth Wed, 29 Apr 2026 16:22:29 -0700

On Fri, Apr 24, 2026 at 12:08:45PM -0700, Ackerley Tng wrote:
> Michael Roth <[email protected]> writes:
> 
> Thank you for your patches!
> 
> >
> > [...snip...]
> >
> >>
> >> I also did some minor updates (prefixed with a "[squash]" tag) to advertise
> >> the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVED flag so it can be used by
> >
> > Though I'm not sure how we deal with it if SNP/TDX at some point become
> > capable of using the PRESERVED flag *after* populate... but maybe that's
> > too unlikely to worry about? If we wanted to address it though, we could
> > have both PRESERVED and PRESERVED_BEFORE_LAUNCH so they can be
> > enumerated separately from the start.
> >
> 
> Not sure how likely it is, but if SNP and TDX can honor PRESERVE
> semantics after populate, I think we could implement support under a new
> flag like CIPHER.


That works, but it still makes things *slightly* awkward due to special-casing
the PRESERVE semantics for 1 guest type vs. another.

For instance, for the QEMU patches what I'd like to do is be able to
issue conversions with the PRESERVE flag set, and it makes sense that
when the function that handles that gets called (in this case,
guest_memfd_set_memory_attributes_fd()), it asserts that QEMU
checked for the corresponding capability/flag (via
KVM_CAP_MEMORY_ATTRIBUTES2_FLAGS) earlier on during QEMU init time
and stored that into kvm_supported_memory_attributes2_flag to check
against later in cases where we try to use PRESERVE:

  
https://github.com/AMDESE/qemu/blob/1308d4bcd8dd8acd151243e96ff0be4098389b93/accel/kvm/kvm-all.c#L1655

but that check is sort of incomplete: for use-cases like pKVM it makes
sense to allow conversion+PRESERVE at all times, but for SNP/TDX we'd
ideally have an additional check like:

  static int guest_memfd_set_memory_attributes_fd(..., attrs, flags)
  {
      struct kvm_memory_attributes2 attrs;
      ...

      assert(kvm_supported_memory_attributes & attrs);
      assert(kvm_supported_memory_attributes2_flags & flags);

      /* SNP and TDX only support PRESERVE prior to launch) */
      if (vm_type_is_snp_or_tdx() && (flags & PRESERVE))
          assert(vcpus_not_yet_started_and_LAUNCH_FINISH_etc_not_yet_called);
  
      ...
  }

maybe that's fine, but there's just some asymmetry in the fact that
the capabilities checks are essentially self-documenting/self-enforcing
in this regard, except for PRESERVE+SNP/TDX where userspace needs to
sprinkle some additional vm_type-specific logic.

So that's why I was sort of thinking PRESERVE might deserve a
PRESERVE_BEFORE_LAUNCH or something so the kernel can enumerate the
support rather than userspace needing to read into the documentation /
implementation to implement finer-grained checks/etc.

> 
> CIPHER can then be used to mean "do the encryption or decryption", and
> for platforms not supporting encryption, they'd stick with PRESERVE?

It's all theoretical, but I'd imagine that *if* SNP/TDX ever added
support for allowing userspace to write encrypted memory into a guest
range, it would be something like:

  - write plain-text data to corresponding GPA
  - issue shared-to-private conversion with CIPHER (or PRESERVE) bit
    set, which will then be augmented to make some sort of firmware
    call to encrypt the memory with guest key and make it visible to
    guest

So, at a high-level, if host writes 0xbeef to shared memory, then upon
completion of the converison ioctl, 0xbeef then be visible to the guest
via encrypted/private memory, which seems to match the semantics of
PRESERVE perfectly:

+``KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE``
+
+  On conversion, KVM guarantees memory contents will be preserved with
+  respect to the last written unencrypted value.  As a concrete
+  example, if the host writes ``0xbeef`` to shared memory and converts 
+  the memory to private, the guest will also read ``0xbeef``, even if
+  the in-memory data is encrypted as part of the conversion.  And vice
+  versa, if the guest writes ``0xbeef`` to private memory and then
+  converts the memory to shared, the host (and guest) will read
+  ``0xbeef`` (if the memory is accessible).

So it seems like a waste to have to add a CIPHER that would essentially
have the same semantics (even though the architectural implementation
details are different), and then complicate the documentation of
PRESERVE to have vm-specific behaviors that userspace needs to account
for, rather than leaving PRESERVE as-is and adding a
PRESERVE_BEFORE_LAUNCH that neatly encapsulates the specialness of the
SNP/TDX flow without complicating the more generally-defined flags like
PRESERVE that could in theory become usable by all variants in the future.

But then that sort of has me wondering if PRESERVE should be
PRESERVE_AFTER_LAUNCH (to cover cases where maybe SNP/TDX gain this
ability in the future, but can only use it post-launch, where other
architectures might allow it both before/after.)

...but that seems to be getting into a more general issue of how we
determine what is available before-launch/after-launch, so maybe instead
of introducing new flags, we just have 2 capabilities that returns what
flags are available at each particular phase, e.g.

  KVM_CAP_MEMORY_ATTRIBUTES2_PRE_LAUNCH_FLAGS
  KVM_CAP_MEMORY_ATTRIBUTES2_POST_LAUNCH_FLAGS

And then userspace can easily probe/lock down what it expects to be
available prior to launching the guest vs after without any vm-specific
logic, and we can continue to just stick with PRESERVE/ZERO. (and if
we do end up need to further distinguish pre vs. post-launch behavior,
this gives us some flexibility to handle that as well).

> 
> Should we redefine the semantics of PRESERVE to be "ensure that memory
> contents don't change while guest_memfd tracking is being updated" and
> avoid making a commitment on how the guest should read the memory?
> 
> The above update would be aligned with ZERO not being allowed for
> conversions to private (because KVM/guest_memfd does not make guarantees
> about the contract between the host and guest.

It's a little awkward here too, because PRESERVE is sort of making a
contract between host/guest: we are providing trusted data to the guest.
However, there's an attestation proces that allows the guest to enforce
that for pre-launch and ensure we are following through on that
contract. PRESERVE for post-launch...I'm not sure how that could be
enforced, but I think this is further reason to allow pre vs. post
launch flags to be enumerated separately.

Thanks,

Mike

> 
> This way, all of those (ZERO, PRESERVE) will focus on KVM's interface
> with the host.
> 
> This lines up for SW_PROTECTED_VMs too, since reading memory that didn't
> change in the guest is the contract between SW_PROTECTED_VMs and the
> host.
> 
> >> userspace for SNP/TDX in the kvm_gmem_populate() path as agreed upon
> >> during PUCK.
> >>
> >>
> >> [...snip...]
> >>

Re: [PATCH RFC v4 10/44] KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES2

Reply via email to