On Thursday, July 4, 2024 11:59 PM, Peter Xu wrote: > On Thu, Jul 04, 2024 at 03:10:27PM +0000, Wang, Wei W wrote: > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index > > > > 4c2e6f3a71..7db4fe4ead 100644 > > > > --- a/target/i386/cpu.c > > > > +++ b/target/i386/cpu.c > > > > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = { > > > > DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU, > > > > hyperv_ver_id_sn, 0), > > > > > > > > DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true), > > > > - DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false), > > > > + DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true), > > > > > > I assume in many cases people can still properly migrate when the > > > hosts are similar or identical, so maybe we at least want the old > > > machine types keep working (by introducing a machine compat property)? > > > > You meant keeping "enforce_cpuid=false" for old machine types (e.g. before > 9.1)? > > This will make them non-migratable with this patch, but they were > > migratable (by > > default) as "migratable" wasn't enforced by "enforce_cpuid". Should we > > keep them being migratable by default (e.g. enforce_cpuid=true) as well? > > Ah, this is trickier than I thought.. > > The issue is if we make them silently switch to enforce_cpuid=true on old > machines, there's chance they start to fail boot, am I right?
Right for newly launched guests, regardless of whether they are new or old machine types, they will fail to boot when the host cannot afford the features for the configured vCPU models. This is expected, and actually part of the intentions of this patch. When there is a need to boot a guest with reduced features, users need to explicitly add "enforce_cpuid=false", which marks the new booted guest as non-migratable, or a _better_ way, to identify the unsupported features from the host first, and then get it booted with "-cpu CpuModel,-A,-B", this can make it migratable with those known reduced features, and the destination guest is required to use the same QEMU commands (as usual) to reduce the same set of features as the source and get a enforced check by "enforce_cpuid". For live update of QEMU for existing running guests (as you mentioned below), the impact is only on the running guests that have had features reduced from vCPU models (at the time of their original launch). For this case, the recommended way to update them to the new QEMU is also to explicitly identify the reduced features and update them with "-cpu CpuModel,-A,-B". The rationale behind this is that the features reduced from the guest needs to be explicitly determined and controllable. In terms of live migration, the destination is ensured to have the same set of reduced features as the source side. > > if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) { > error_setg(&local_err, > accel_uses_host_cpuid() ? > "Host doesn't support requested features" : > "TCG doesn't support requested features"); > goto out; > } > > I suppose we still need to keep all the old worlds running all fine without > breaking them when people do an QEMU upgrade. It needs to work both on > booting fine, and on allowing to migrate. > > So maybe we actually need two things? > > - One patch introduce forbit_migration_if_cpuid_mismatch property, when > set, block migration if not enforced, otherwise it should still allow > migration even if enforce_cpud=fales. It should default to on, but off > on old machines. > > - One patch change default value of enforce_cpuid to on, but turn it off > on old machines. > > Does that look right? I think this can work. Not sure what you would think about the above explanations. If agree, then probably we don’t need to add the extra complexity. Also, the above two things seem to impede the upgrade for guests with older machine types to incorporate this enforcement. I think the primary goal of live updating to a newer QEMU version is to benefit from the enhancements offered by the new QEMU. So it seems more beneficial to bring old guests under such enforcements, given that this doesn't break functionalities that the guest is running. The only requirement for this is to upgrade using more explicit QEMU commands (i.e., -cpu CpuModel,-A,-B) when needed.