* Paolo Bonzini (pbonz...@redhat.com) wrote:
> Il 28/05/2014 14:04, Dr. David Alan Gilbert ha scritto:
> >* Paolo Bonzini (pbonz...@redhat.com) wrote:
> >>Il 28/05/2014 13:20, Dr. David Alan Gilbert (git) ha scritto:
> >>>There aren't any uses of the migration version in this patch set, however
> >>>uses I can think of include:
> >>>   a) Generating an old format for a particular device when sent to a
> >>>      particular version
> >>>   b) Fudging a register value for a particular version
> >>>   c) Fudging around more general problems (e.g. the 1.6.x short PCI
> >>>      naming)
> >>
> >>http://lists.gnu.org/archive/html/qemu-stable/2013-10/msg00019.html
> >
> >Yep, I'm aware of that one, it only solves some of the directions.
> 
> Ok, I wanted to make sure it's the same issue.
> 
> >>I don't think this is necessary; not yet at least.
> >
> >Yep, most things seem to be fudgable; however I'd point out the
> >'not yet at least' in your response; the intention of this patch is to
> >already have the information/infrastructure in place so that if we do
> >find a nasty case then it's easier to fix.
> >
> >It's also more likely in the case where people want to backward migrate
> >across larger version gaps.
> 
> I think something as large as RHEL7->RHEL6 is the largest that anyone sane
> would try.  That's already too large probably.
> 
> I said "not yet at least" in the sense that I've not been convinced yet.  If
> it's useful, it's certainly better to get it in now.
> 
> >>Of the above three cases, (a) and (b) can be handled by machine types.
> >
> >I don't see how this is generally the case without having to boot the source
> >QEMU with a machine type dependent on which destination you're going to go
> >to, that you don't yet know.
> 
> You already need to boot the source QEMU with a machine type that _exists_
> on the destination that you're going to go to, for example with the least
> common denominator of your cluster.

Yes, agreed.   

> This is already the case, with or
> without this additional knob, and this is why I don't like adding it: you
> have a version-based knob already, which is the machine type.

I see that the problem is that the machine type is being used to express two
only loosely related versions:
    a) The version of the emulated machine as seen by the guest
    b) The structure of the migration data

(b) has changed over time in ways that don't directly relate to the machine
type (new major versions of vmstates, etc.); while a lot of thought is put into
moving forward correctly, not much is put into going backwards.

(a) In principle is strictly tied to a machine type, but in practice isn't -
people find screwups that appear minor (maybe an apparently minor bug) that
they don't realise change the guest's view of the world significantly; you
then end up with two versions of QEMU with the same machine type but subtly
different behaviour, or your alternatives are to invent a new machine type
or maintain bug compatibility.

I think providing the destination version lets you break the link between
these two a bit, and lets you fix the type of things in (a) more cleanly.
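
As a rough sketch of what I mean (nothing in this patch set does this yet, and
every name below is hypothetical rather than an existing QEMU interface), a
device's save path could key a fixup off the destination version instead of
the machine type. The example uses the 1.6.x short PCI naming case, and it
assumes the version is packed as major/minor/micro with 0 meaning "unknown":

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: (major << 16) | (minor << 8) | micro. */
#define MIG_VERSION(maj, min, mic) \
    (((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(mic))

/* Hypothetical sentinel: the destination did not tell us its version. */
#define MIG_VERSION_UNKNOWN 0u

/*
 * Hypothetical helper: decide whether the outgoing stream should use the
 * short PCI root bus name.  Only destinations in the 1.6.x series get the
 * short form; an unknown destination gets the normal (long) form.
 */
static bool use_short_root_bus_name(uint32_t dest_version)
{
    if (dest_version == MIG_VERSION_UNKNOWN) {
        return false;
    }
    return dest_version >= MIG_VERSION(1, 6, 0) &&
           dest_version < MIG_VERSION(1, 7, 0);
}
```

The point is only that the check lives in one place and is independent of the
-M value the source was booted with.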

> >>(c) should have been caught by testing, and is not entirely solved by this
> >>approach.  It would fix 2.0->1.6 migration, but not 1.6->2.0 migration with
> >>-M pc-1.5.  In other words, it would not fix the case that we care about in
> >>upstream.
> >
> >I think 1.6->2.0 -M pc-1.5 is fixable by the appropriate flag being passed to
> >the 2.0 at startup to tell it to use/not-use the short-bus; not pretty
> >though.
> 
> Yes, I think it can be already fixed with "-global
> i440FX-pcihost.short_root_bus=on".
> 
> >>and don't solve the fundamental problem of migration receiving
> >>insufficient testing with upstream QEMU versions.
> >
> >Indeed; and it's not meant to - in no way is this an excuse for inadequate
> >testing - however, we're all human, and bugs happen, this is just providing
> >a way to get out of some of the mess afterwards.
> 
> But there is only one way to avoid the mess, and it's to test things.
> Because once you've released a buggy version, migration _from_ that version
> is f***ed up and going to involve hacks no matter what.

What about migration _to_ that version - at the moment we have nowhere to
put the hacks that might be needed to dig yourself out of that hole.


> Downstreams that care should and do their own testing, with their own
> matrix.  If upstream cares, we need to test things at least across adjacent
> versions (1.5 <-> 1.6 <-> 1.7).  If we do this, and if the assumption is
> valid that migration compatibility is essentially transitive, it solves the
> problem much better than code can do.

My point is that testing helps to avoid it, but however much testing you
do you're always going to miss something, and I'd just like to have
the tools there sitting to help.  It would be wonderful if we never have
to use this in anger (and I'd expect any reviewer of a use to make damn sure
it really was the best solution); but it's good to have it there.

Dave

> 
> Paolo
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
