On Wed, Sep 13, 2023 at 08:33:01AM +0200, Philippe Mathieu-Daudé wrote:
> On 18/1/22 09:49, Thomas Huth wrote:
> > On 17/01/2022 21.12, Daniel P. Berrangé wrote:
> > > On Mon, Jan 17, 2022 at 08:16:39PM +0100, Thomas Huth wrote:
> > > > The list of machine types grows larger and larger each release ... and
> > > > it is unlikely that many people still use the very old ones for live
> > > > migration. QEMU v1.7 has been released more than 8 years ago, so most
> > > > people should have updated their machines to a newer version in those
> > > > 8 years at least once. Thus let's mark the very old 1.x machine types
> > > > as deprecated now.
> > > 
> > > What criteria did you use for picking v1.7 as the end point ?
> > 
> > I picked everything starting with a "1." this time ;-)
> > 
> > No, honestly, since we don't have a deprecation policy in place yet,
> > there was no real good criteria around this time. For the machine types
> > < 1.3 there was a bug with migration, so these machine types could not
> > be used for reliable migration anymore anyway. But for the newer machine
> > types, we likely have to decide by other means indeed.
> > 
> > > I'm fine with the idea of aging out machine types, but I'd like us
> > > to explain the criteria we use for this, so that we can set clear
> > > expectations for users. I'm not a fan of adhoc decisions that have
> > > different impact every time we randomly decide to apply them.
> > > 
> > > A simple rule could be time based - eg we could say
> > > 
> > >    "we'll keep machine type versions for 5 years or 15 releases."
> > > 
> > > one factor is how long our downstream consumers have been keeping
> > > machines around for.
> > > 
> > > In RHEL-9 for example, the oldest machine is "pc-i440fx-rhel7.6.0"
> > > which IIUC is derived from QEMU 2.12.0. RHEL-9 is likely to rebase
> > > QEMU quite a few times over the coming years, so that 2.12.0 version
> > > sets an example baseline for how long machines might need to live for.
> > > That's 4 years this April, and could potentially be 6-7 years by the
> > > time RHEL-9 stops rebasing QEMU.
> > 
> > Yeah, 5 years still seemed a little bit short to me, that's one of the
> > reasons why I did not add more machine types in my patch here. I think
> > with 7 or 8 years, we should be on the safe side.
> > 
> > Any other opinions? And if we agree on an amount of years, where should
> > we document this? At the top of docs/about/deprecated.rst?
> 
> I suppose x86 being the oldest, x86 maintainers have to comment, but
> 5 years should be enough from sysadmins to migrate their VMs, isn't it?
> (No need to migrate from 1 -> 8, they can do 1 -> 3 -> 5 -> 8, right?)

You can't change guest hardware during migration. So whether you go direct
from 1 -> 8, or go from 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8, you're
going to have the same guest hardware before and after every step.


If someone is using upstream QEMU, I'm sceptical they will successfully
live migrate over QEMU versions spanning 5+ years. While we make a pretty
decent effort at ensuring back compat, and fixing problems, we've made
a number of mistakes over the years that were only caught in RHEL
downstream testing.

If someone is using RHEL QEMU (or another vendor who's putting in a lot
of effort at live migration testing), then I can see them spanning over
5 years for a VM deployment. Of course they *should* have VM reboots over
that timeframe to deploy new kernels for example, so they will have had
opportunities to update the machine type, but it does not mean they have
actually done so.

The pc-i440fx-rhel7.6.0 machine type I mentioned earlier in the thread is
a bit of an unusual case, as that has lasted longer than intended (RHEL-7,
RHEL-8, and RHEL-9). Normally our downstream policy is for machine types
to last 2 major RHEL releases, so you can deploy on N and later upgrade
the VM to N+1 without a reboot for re-configuration.

Now in the case of RHEL we don't use upstream QEMU machine types, so we
don't actually care when QEMU deprecates and deletes old machine types.

What matters is whether there are any internal tunable knobs that were
used in the pc_compat_*_fn() functions that get deleted as a result
of their usage going away.  For example our rhel7.6.0 machine type uses

    m->async_pf_vmexit_disable = true;
    m->smbus_no_migration_support = true;
    m->deprecation_reason = rhel_old_machine_deprecation;
    pcmc->pvh_enabled = false;
    pcmc->default_cpu_version = CPU_VERSION_LEGACY;
    pcmc->kvmclock_create_always = false;
    pcmc->pci_root_uid = 1;
    pcmc->legacy_no_rng_seed = true;
    pcmc->enforce_amd_1tb_hole = false;

Deleting any of those upstream is what would impact us downstream.


Personally I think we could make a case for QEMU upstream only
preserving machine types for 5 years, on the basis that people
interested in longer lifetimes should be using a vendor supported
QEMU. The upstream community just doesn't have the resources to do the
level of testing needed to provide such long life guarantees on
migration compat.
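
To make the time-based rule concrete, here is a minimal sketch of how
such a retention check could be expressed. Nothing here is QEMU code:
the struct, the function names and the month-granularity comparison are
all hypothetical, and the release months are either taken from this
thread (2.12.0 in April 2018) or assumed (1.7 in November 2013).

```c
#include <stdbool.h>

/* Hypothetical record, not a QEMU data structure: a versioned machine
 * type and the month of the QEMU release that introduced it. */
struct machine_release {
    const char *name;
    int year;
    int month; /* 1..12 */
};

/* Whole months elapsed between the release and the given date. */
static int age_in_months(const struct machine_release *m,
                         int now_year, int now_month)
{
    return (now_year - m->year) * 12 + (now_month - m->month);
}

/* True once the machine type is older than the retention window
 * (keep_years is the policy value under discussion, e.g. 5 or 8). */
static bool past_retention(const struct machine_release *m,
                           int now_year, int now_month, int keep_years)
{
    return age_in_months(m, now_year, now_month) > keep_years * 12;
}
```

Evaluated for September 2023, a 5 year window would already cover both
a 1.7 and a 2.12 machine type, while an 8 year window would cover 1.7
but not 2.12. That gap is exactly what the choice of number decides.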

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

