On Friday, July 5, 2024 9:34 PM, Peter Xu wrote:
> On Fri, Jul 05, 2024 at 10:22:23AM +0000, Wang, Wei W wrote:
> > On Thursday, July 4, 2024 11:59 PM, Peter Xu wrote:
> > > On Thu, Jul 04, 2024 at 03:10:27PM +0000, Wang, Wei W wrote:
> > > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index
> > > > > > 4c2e6f3a71..7db4fe4ead 100644
> > > > > > --- a/target/i386/cpu.c
> > > > > > +++ b/target/i386/cpu.c
> > > > > > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = {
> > > > > >      DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU,
> > > > > > hyperv_ver_id_sn, 0),
> > > > > >
> > > > > >      DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
> > > > > > -    DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> > > > > > +    DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true),
> > > > >
> > > > > I assume in many cases people can still properly migrate when
> > > > > the hosts are similar or identical, so maybe we at least want
> > > > > the old machine types keep working (by introducing a machine compat
> property)?
> > > >
> > > > You meant keeping "enforce_cpuid=false" for old machine types
> > > > (e.g. before
> > > 9.1)?
> > > > This will make them non-migratable with this patch, but they were
> > > > migratable (by
> > > > default) as "migratable" wasn't enforced by "enforce_cpuid".
> > > > Should we keep them being migratable by default (e.g.
> enforce_cpuid=true) as well?
> > >
> > > Ah, this is trickier than I thought..
> > >
> > > The issue is if we make them silently switch to enforce_cpuid=true
> > > on old machines, there's chance they start to fail boot, am I right?
> >
> > Right for newly launched guests, regardless of whether they are new or
> > old machine types, they will fail to boot when the host cannot afford
> > the features for the configured vCPU models. This is expected, and
> > actually part of the intentions of this patch.
> >
> > When there is a need to boot a guest with reduced features, users need
> > to explicitly add "enforce_cpuid=false", which marks the new booted
> > guest as non-migratable, or a _better_ way, to identify the
> > unsupported features from the host first, and then get it booted with
> > "-cpu CpuModel,-A,-B", this can make it migratable with those known
> > reduced features, and the destination guest is required to use the
> > same QEMU commands (as usual) to reduce the same set of features as the
> source and get a enforced check by "enforce_cpuid".
> >
> > For live update of QEMU for existing running guests (as you mentioned
> > below), the impact is only on the running guests that have had
> > features reduced from vCPU models (at the time of their original
> > launch). For this case, the recommended way to update them to the new
> > QEMU is also to explicitly identify the reduced features and update them
> with "-cpu CpuModel,-A,-B".
> >
> > The rationale behind this is that the features reduced from the guest
> > needs to be explicitly determined and controllable. In terms of live
> > migration, the destination is ensured to have the same set of reduced
> > features as the source side.
> >
> > >
> > >     if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
> > >         error_setg(&local_err,
> > >                    accel_uses_host_cpuid() ?
> > >                        "Host doesn't support requested features" :
> > >                        "TCG doesn't support requested features");
> > >         goto out;
> > >     }
> > >
> > > I suppose we still need to keep all the old worlds running all fine
> > > without breaking them when people do an QEMU upgrade.  It needs to
> > > work both on booting fine, and on allowing to migrate.
> > >
> > > So maybe we actually need two things?
> > >
> > >   - One patch introduce forbit_migration_if_cpuid_mismatch property,
> when
> > >     set, block migration if not enforced, otherwise it should still allow
> > >     migration even if enforce_cpud=fales.  It should default to on, but 
> > > off
> > >     on old machines.
> > >
> > >   - One patch change default value of enforce_cpuid to on, but turn it off
> > >     on old machines.
> > >
> > > Does that look right?
> >
> > I think this can work. Not sure what you would think about the above
> explanations.
> > If agree, then probably we don’t need to add the extra complexity.
> >
> > Also, the above two things seem to impede the upgrade for guests with
> > older machine types to incorporate this enforcement. I think the
> > primary goal of live updating to a newer QEMU version is to benefit from the
> enhancements offered by the new QEMU.
> > So it seems more beneficial to bring old guests under such
> > enforcements, given that this doesn't break functionalities that the
> > guest is running. The only requirement for this is to upgrade using
> > more explicit QEMU commands (i.e., -cpu CpuModel,-A,-B) when needed.
> 
> What you said makes sense.  It's just that the concern still exists, and I'm 
> not
> sure whether that'll be too much to ask for a customer.
> 
> Also, see this commit:
> 
> commit 15e41345906d29a319cc9cdf566347bf79134d24
> Author: Eduardo Habkost <ehabk...@redhat.com>
> Date:   Wed Aug 26 13:25:44 2015 -0300
> 
>     target-i386: Enable "check" mode by default
> 
>     Current default behavior of QEMU is to silently disable features that
>     are not supported by the host when a CPU model is requested in the
>     command-line. This means that in addition to risking breaking guest ABI
>     by default, we are silent about it.
> 
>     I would like to enable "enforce" by default, but this can easily break
>     existing production systems because of the way libvirt makes assumptions
>     about CPU models today (this will change in the future, once QEMU
>     provide a proper interface for checking if a CPU model is runnable).
> 
>     But there's no reason we should be silent about it. So, change
>     target-i386 to enable "check" mode by default so at least we have some
>     warning printed to stderr (and hopefully logged somewhere) when QEMU
>     disables a feature that is not supported by the host system.
> 
>     Reviewed-by: Igor Mammedov <imamm...@redhat.com>
>     Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
> 
> I don't think I know what's the "libvirt assumptions" mentioned, and how it
> changed as of today.  I had a vague memory Libvirt constantly set off on some
> of the relevant flags last time Jiri explained some cpuid issues to me; maybe 
> it's
> "check" not "enforce"? It would be great if Jiri or Dan can comment here.

I had a check in libvirt and didn't find "enforce" is set to false anywhere 
(except
only in some tests, which I think should be fine).
Seems we haven't got responses from more folks here :)
Is it appropriate to open a discussion thread in de...@lists.libvirt.org to 
confirm?

Reply via email to