On Thursday, July 4, 2024 2:04 AM, Peter Xu wrote:
> On Wed, Jul 03, 2024 at 10:49:12PM +0800, Wei Wang wrote:
> > When enforce_cpuid is set to false, the guest is launched with a
> > filtered set of features, meaning that unsupported features by the
> > host are removed from the guest's vCPU model. This could cause issues for
> live migration.
> > For example, a guest on the source is running with features A and B.
> > If the destination host does not support feature B, the stub guest can
> > still be launched on the destination with feature A only if
> enforce_cpuid=false.
> > Live migration can start in this case, though it may fail later when
> > the states of feature B are put to the destination side. This failure
> > occurs in the late stage (i.e., stop&copy phase) of the migration
> > flow, where the source guest has already been paused. Tests show that
> > in such cases the source guest does not recover, and the destination
> > is unable to resume to run.
> >
> > Make "enfore_cpuid=true" a hard requirement for a guest to be
> > migratable, and change the default value of "enforce_cpuid" to true,
> > making the guest vCPUs migratable by default. If the destination stub
> > guest has inconsistent CPUIDs (i.e., destination host cannot support
> > the features defined by the guest's vCPU model), it fails to boot
> > (with enfore_cpuid=true by default), thereby preventing migration from
> > occuring. If enfore_cpuid=false is explicitly added for the guest, the
> > guest is deemed as non-migratable (via the migration blocker), so the
> > above issue won't occur as the guest won't be migrated.
> >
> > Tested-by: Lei Wang <lei4.w...@intel.com>
> > Signed-off-by: Wei Wang <wei.w.w...@intel.com>
> 
> [Copy Jiri and Dan for libvirt-side implications]

Thanks!

> 
> > ---
> >  target/i386/cpu.c     |  2 +-
> >  target/i386/kvm/kvm.c | 25 +++++++++++++++----------
> >  2 files changed, 16 insertions(+), 11 deletions(-)
> >
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index
> > 4c2e6f3a71..7db4fe4ead 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = {
> >      DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU,
> > hyperv_ver_id_sn, 0),
> >
> >      DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> > -    DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> > +    DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true),
> 
> I assume in many cases people can still properly migrate when the hosts are
> similar or identical, so maybe we at least want the old machine types keep
> working (by introducing a machine compat property)?

You meant keeping "enforce_cpuid=false" for old machine types (e.g. before 9.1)?
This will make them non-migratable with this patch, but they were migratable (by
default) as "migratable" wasn't enforced by "enforce_cpuid". Should we keep them
being migratable by default (e.g. enforce_cpuid=true) as well?

> 
> >      DEFINE_PROP_BOOL("x-force-features", X86CPU, force_features, false),
> >      DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> >      DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0), diff --git
> > a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index
> > dd8b0f3313..aee717c1cf 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -1741,7 +1741,7 @@ static int hyperv_init_vcpu(X86CPU *cpu)
> >      return 0;
> >  }
> >
> > -static Error *invtsc_mig_blocker;
> > +static Error *cpu_mig_blocker;
> >
> >  #define KVM_MAX_CPUID_ENTRIES  100
> >
> > @@ -2012,6 +2012,15 @@ full:
> >      abort();
> >  }
> >
> > +static bool kvm_vcpu_need_block_migration(X86CPU *cpu) {
> > +    CPUX86State *env = &cpu->env;
> > +
> > +    return !cpu->enforce_cpuid ||
> > +           (!env->user_tsc_khz && (env->features[FEAT_8000_0007_EDX] &
> > +                                   CPUID_APM_INVTSC)); }
> 
> Nit: maybe it's nice this returns a "const char*" with detailed reasons to be 
> put
> into the error_setg(), so it dumps the same as before for the invtsc blocker.

Sounds good. I'll check how to incorporate such info into the logs.

Reply via email to