On Tue, Jun 25, 2024 at 10:51:43AM +0100, Daniel P. Berrangé wrote:
> On Mon, Jun 24, 2024 at 08:19:19PM -0500, Michael Roth wrote:
> > On Fri, Jun 14, 2024 at 11:39:24AM +0100, Daniel P. Berrangé wrote:
> > > The KVM_SEV_INIT2 ioctl was only introduced in Linux 6.10, which will
> > > only have been released for a bit over a month when QEMU 9.1 is
> > > released.
> > > 
> > > The SEV(-ES) support in QEMU has been present since 2.12 dating back
> > > to 2018. With this in mind, the overwhelming majority of users of
> > > SEV(-ES) are unlikely to be running Linux >= 6.10 any time in the
> > > foreseeable future.
> > > 
> > > IOW, defaulting new QEMU to 'legacy-vm-type=false' means latest QEMU
> > > machine types will be broken out of the box for most SEV(-ES) users.
> > > Even if the kernel is new enough, it also affects the guest measurement,
> > > which means that their existing tools for validating measurements will
> > > also be broken by the new default.
> > > 
> > > This is not a sensible default choice at this point in time. Revert to
> > > the historical behaviour which is compatible with what most users are
> > > currently running.
> > 
> > Part of the reason for the change is that SEV-ES measurements are
> > already affected by some shortcomings of the legacy KVM_SEV_ES_INIT
> > API. Namely, if the kvm_amd.debug-swap module param is used to enable
> > that SEV-ES feature, then that feature will get enabled on the KVM side
> > and change the initial guest measurement (due to the VMSA_FEATURES field
> > of the vCPU's VMSA changing), and userspace has no way to control that
> > on a per-VM basis, so the measurement for any particular invocation will
> > vary depending on the system configuration and kernel level.
> 
> The debug-swap feature was disabled by default. So that could
> just be a docs problem: if you want to use that feature, then
> you must set the legacy-vm-type=false property. IOW, an opt-in
> to incompatible behaviour.

Unfortunately, debug-swap defaulted to true for KVM_SEV*_INIT guests, so
the ship has sailed on preparing users for the change in advance. Instead,
over time, users of legacy guests will gradually see the measurement
change when they upgrade to new kernels, and will then need to either
adjust their measurement calculation or disable debug-swap via module
parameters.

However, debug-swap is also fairly recent, so there's a fair chance that
users hitting the above issue will have the option of switching over to
KVM_SEV_INIT2, where it's not much additional work to update
measurements, and in turn they'll benefit from better control over what
ends up in the VMSA. If they eventually plan to switch to SNP, these
steps will bring them closer to that end as well, since there's a lot of
common handling/infrastructure in that regard.

> 
> 
> > I think that's why users of newer QEMU machine types are highly
> > encouraged to switch to the new KVM_SEV_INIT2 interface. I do see this
> > causing issues for older QEMU machine types that previously relied on
> > the legacy interface, since we do want to avoid the measurement changing
> > for an existing guest that was previously working on an older kernel,
> > which is why this flag defaults to true for pre-9.1 machine types.
> 
> This justification misunderstands how machine types are actually
> used in practice though. There is *zero* correlation between use
> of the new machine types and availability of the new kernel
> interface.
> 
> 99% of QEMU usage will just ask for the unversioned "q35"
> / "pc" machines. They will be expanded to the very latest machine
> type version, either internally by QEMU, or by libvirt prior to
> launching the VM.

In my experience, that's how many VMs start off, until they start
breaking on newer kernels/QEMUs and everyone scrambles to revert to the
old behavior. Quite often that ends up involving just tacking on an
explicit machine type to maintain migration/behavioral compatibility with
what QEMU originally defaulted to when the VM was created.

But when first creating the VM, there is less expectation about what
should/shouldn't work. If users see failures because KVM_SEV_INIT2 isn't
available, it seems worthwhile to make them decide whether to upgrade the
kernel or to adopt the legacy behavior and be stuck with a reduced feature
set for the life of the guest. "Just works" is nice, but "just working"
in the case of KVM_SEV*_INIT comes with potential headaches down the
road, and ideally users would be aware of what they are signing up for.

If failing is too heavy-handed, maybe QEMU could print some type of
warning any time KVM_SEV*_INIT is used? Then, if we decide down the road
to finally default to KVM_SEV_INIT2, there's a better chance that users
will have taken the hint and already made the transition.

I'll also defer to the maintainers on this point though since there are
clearly merits to both approaches.

> 
> Either way, you can expect essentially everything to be running on
> the latest machine type versions, regardless of kernel version.
> 
> So making the latest machine types dependent on a kernel version
> that is brand new is just not a sensible default. Latest QEMU
> machine types need to work on kernel releases years old, without
> expecting the user to set magic flags to avoid incompatibility.
> 
> > I was actually planning to go the other direction on this because
> > currently for 9.1+, QEMU will try to use KVM_SEV_INIT2 if
> > KVM_CAP_VM_TYPES advertises its availability, but otherwise fall back to
> > the above KVM_SEV_ES_INIT interface and potentially inherit the issues
> > noted above. So I was planning on getting rid of the fallback, and
> > basically only allowing legacy KVM_SEV_ES_INIT for 9.1+ if the user
> > manually sets sev_guest->legacy_vm_type via cmdline.
> 
> Dynamic detection of SEV_INIT vs SEV_INIT2 is a bad idea, as that
> breaks migration when someone moves from a host with a new kernel
> to one with an older kernel while keeping the QEMU machine type
> unchanged. The behaviour of what kernel feature to use must be
> controllable with an explicit choice.

Totally agreed on this point. I've sent a patch that does this, and
adopted the QAPI wording you used in this patch so there is less churn
if they are both applied:

  https://lore.kernel.org/kvm/20240704000019.3928862-1-michael.r...@amd.com/T/#u

Thanks,

Mike

> 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
> 
