On Thu, Jul 11, 2024 at 01:48:59PM +0000, Wang, Wei W wrote:
> On Thursday, July 11, 2024 8:25 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 11, 2024 at 12:10:34PM +0000, Wang, Wei W wrote:
> > > On Thursday, July 11, 2024 7:48 PM, Daniel P. Berrangé wrote:
> > > > On Wed, Jul 03, 2024 at 10:49:12PM +0800, Wei Wang wrote:
> > > > > When enforce_cpuid is set to false, the guest is launched with a
> > > > > filtered set of features, meaning that features unsupported by the
> > > > > host are removed from the guest's vCPU model. This could cause
> > > > > issues for live migration.
> > > > > For example, a guest on the source is running with features A and B.
> > > > > If the destination host does not support feature B, the stub guest
> > > > > can still be launched on the destination with feature A only if
> > > > > enforce_cpuid=false.
> > > > > Live migration can start in this case, though it may fail later
> > > > > when the states of feature B are put to the destination side. This
> > > > > failure occurs in the late stage (i.e., stop&copy phase) of the
> > > > > migration flow, where the source guest has already been paused.
> > > > > Tests show that in such cases the source guest does not recover,
> > > > > and the destination is unable to resume to run.
> > > > >
> > > > > Make "enforce_cpuid=true" a hard requirement for a guest to be
> > > > > migratable, and change the default value of "enforce_cpuid" to
> > > > > true, making the guest vCPUs migratable by default. If the
> > > > > destination stub guest has inconsistent CPUIDs (i.e., destination
> > > > > host cannot support the features defined by the guest's vCPU
> > > > > model), it fails to boot (with enforce_cpuid=true by default),
> > > > > thereby preventing migration from occurring. If enforce_cpuid=false
> > > > > is explicitly added for the guest, the guest is deemed
> > > > > non-migratable (via the migration blocker), so the above issue won't
> > > > > occur as the guest won't be migrated.
> > > >
> > > > Blocking migration when enforce=false is making an assumption that
> > > > users of that setting are inherently broken. This is NOT the case if
> > > > the user/app has already validated compatibility in some manner
> > > > outside QEMU. Blocking migration in this case will break valid working
> > > > use cases.
> > >
> > > It's just an enforcement to ensure a safe migration. Without this,
> > > the current QEMU code is making an assumption that users have
> > > always validated compatibility in a good manner outside QEMU, which
> > > is risky to some degree?
> > 
> > QEMU configurations must never be assumed to be migratable by default.
> > There is a huge set of things that a user must do with QEMU configuration to
> > guarantee migratability beyond CPU features. All aspects of guest HW device
> > topology must be set explicitly.
> 
> What if the source and destination are required to use exactly the same QEMU
> commands? Does this meet the feature and topology requirements as you
> mentioned above?

That is insufficient as it does not take account of device hotplug.

> > > > IMHO this patch doesn't need to exist. If users of QEMU want strong
> > > > protection they can already opt-in to that with enforce=true.
> > >
> > > AFAIK, many users are not aware of this, and also we couldn't assume
> > > everybody knows it. That's why we want to add the enforcement.
> > 
> > Users who directly launch QEMU are expected to know about QEMU config
> > details for migration. If they don't, then they ought to be using a
> > higher-level tool like libvirt, which ensures the configuration is
> > migration compatible.
> 
> Could you explain how libvirt provides a more reliable assurance of migration
> compatibility in its configuration compared to using raw QEMU commands?
> Per my understanding, libvirt configs mostly map to the QEMU commands.

Libvirt records the full details of the guest configuration required to
reproduce the exact same guest ABI, even across device hotplug. This is
why libvirt QEMU command lines are absolutely enormous compared to
minimalist command lines that users usually run directly.
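
For anyone following the thread, the difference between the filtering
behaviour (enforce=false) and the enforcing behaviour (enforce=true)
discussed above can be sketched roughly as below. This is a toy model in
Python, not QEMU code: the function name resolve_features and the feature
names "A" and "B" are hypothetical, chosen only to mirror the example in
the patch description.

```python
def resolve_features(requested, host_supported, enforce):
    """Toy model of CPU feature checking at guest start-up (not QEMU code)."""
    missing = set(requested) - set(host_supported)
    if missing and enforce:
        # Enforcing: refuse to start rather than silently change the guest ABI.
        raise RuntimeError("host lacks features: " + ", ".join(sorted(missing)))
    # Filtering: features the host cannot provide are silently dropped.
    return set(requested) - missing


# Source host supports both features; destination host supports only A.
source = resolve_features({"A", "B"}, {"A", "B"}, enforce=False)
dest = resolve_features({"A", "B"}, {"A"}, enforce=False)

# The destination starts with a different feature set than the source,
# so a mismatch exists before migration even begins.
print(sorted(source), sorted(dest))
```

With enforce=True the second call raises instead of returning, which is
the early failure the patch description is after. Whether that check is
the default or an opt-in is the policy question in this thread; the sketch
takes no position on it.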

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

