On Friday, July 5, 2024 9:34 PM, Peter Xu wrote: > On Fri, Jul 05, 2024 at 10:22:23AM +0000, Wang, Wei W wrote: > > On Thursday, July 4, 2024 11:59 PM, Peter Xu wrote: > > > On Thu, Jul 04, 2024 at 03:10:27PM +0000, Wang, Wei W wrote: > > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index > > > > > > 4c2e6f3a71..7db4fe4ead 100644 > > > > > > --- a/target/i386/cpu.c > > > > > > +++ b/target/i386/cpu.c > > > > > > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = { > > > > > > DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU, > > > > > > hyperv_ver_id_sn, 0), > > > > > > > > > > > > DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true), > > > > > > - DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false), > > > > > > + DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true), > > > > > > > > > > I assume in many cases people can still properly migrate when > > > > > the hosts are similar or identical, so maybe we at least want > > > > > the old machine types keep working (by introducing a machine compat > property)? > > > > > > > > You meant keeping "enforce_cpuid=false" for old machine types > > > > (e.g. before > > > 9.1)? > > > > This will make them non-migratable with this patch, but they were > > > > migratable (by > > > > default) as "migratable" wasn't enforced by "enforce_cpuid". > > > > Should we keep them being migratable by default (e.g. > enforce_cpuid=true) as well? > > > > > > Ah, this is trickier than I thought.. > > > > > > The issue is if we make them silently switch to enforce_cpuid=true > > > on old machines, there's chance they start to fail boot, am I right? > > > > Right for newly launched guests, regardless of whether they are new or > > old machine types, they will fail to boot when the host cannot afford > > the features for the configured vCPU models. This is expected, and > > actually part of the intentions of this patch. > > > > When there is a need to boot a guest with reduced features, users need > > to explicitly add "enforce_cpuid=false", which marks the new booted > > guest as non-migratable, or a _better_ way, to identify the > > unsupported features from the host first, and then get it booted with > > "-cpu CpuModel,-A,-B", this can make it migratable with those known > > reduced features, and the destination guest is required to use the > > same QEMU commands (as usual) to reduce the same set of features as the > source and get a enforced check by "enforce_cpuid". > > > > For live update of QEMU for existing running guests (as you mentioned > > below), the impact is only on the running guests that have had > > features reduced from vCPU models (at the time of their original > > launch). For this case, the recommended way to update them to the new > > QEMU is also to explicitly identify the reduced features and update them > with "-cpu CpuModel,-A,-B". > > > > The rationale behind this is that the features reduced from the guest > > needs to be explicitly determined and controllable. In terms of live > > migration, the destination is ensured to have the same set of reduced > > features as the source side. > > > > > > > > if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) { > > > error_setg(&local_err, > > > accel_uses_host_cpuid() ? > > > "Host doesn't support requested features" : > > > "TCG doesn't support requested features"); > > > goto out; > > > } > > > > > > I suppose we still need to keep all the old worlds running all fine > > > without breaking them when people do an QEMU upgrade. It needs to > > > work both on booting fine, and on allowing to migrate. > > > > > > So maybe we actually need two things? > > > > > > - One patch introduce forbit_migration_if_cpuid_mismatch property, > when > > > set, block migration if not enforced, otherwise it should still allow > > > migration even if enforce_cpud=fales. It should default to on, but > > > off > > > on old machines. > > > > > > - One patch change default value of enforce_cpuid to on, but turn it off > > > on old machines. > > > > > > Does that look right? > > > > I think this can work. Not sure what you would think about the above > explanations. > > If agree, then probably we don’t need to add the extra complexity. > > > > Also, the above two things seem to impede the upgrade for guests with > > older machine types to incorporate this enforcement. I think the > > primary goal of live updating to a newer QEMU version is to benefit from the > enhancements offered by the new QEMU. > > So it seems more beneficial to bring old guests under such > > enforcements, given that this doesn't break functionalities that the > > guest is running. The only requirement for this is to upgrade using > > more explicit QEMU commands (i.e., -cpu CpuModel,-A,-B) when needed. > > What you said makes sense. It's just that the concern still exists, and I'm > not > sure whether that'll be too much to ask for a customer. > > Also, see this commit: > > commit 15e41345906d29a319cc9cdf566347bf79134d24 > Author: Eduardo Habkost <ehabk...@redhat.com> > Date: Wed Aug 26 13:25:44 2015 -0300 > > target-i386: Enable "check" mode by default > > Current default behavior of QEMU is to silently disable features that > are not supported by the host when a CPU model is requested in the > command-line. This means that in addition to risking breaking guest ABI > by default, we are silent about it. > > I would like to enable "enforce" by default, but this can easily break > existing production systems because of the way libvirt makes assumptions > about CPU models today (this will change in the future, once QEMU > provide a proper interface for checking if a CPU model is runnable). > > But there's no reason we should be silent about it. So, change > target-i386 to enable "check" mode by default so at least we have some > warning printed to stderr (and hopefully logged somewhere) when QEMU > disables a feature that is not supported by the host system. > > Reviewed-by: Igor Mammedov <imamm...@redhat.com> > Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> > > I don't think I know what's the "libvirt assumptions" mentioned, and how it > changed as of today. I had a vague memory Libvirt constantly set off on some > of the relevant flags last time Jiri explained some cpuid issues to me; maybe > it's > "check" not "enforce"? It would be great if Jiri or Dan can comment here.
I had a check in libvirt and didn't find "enforce" is set to false anywhere (except only in some tests, which I think should be fine). Seems we haven't got responses from more folks here :) Is it appropriate to open a discussion thread in de...@lists.libvirt.org to confirm?