This is a very lengthy discussion, so apologies if I repeat something already said in one of the various threads. I've read many of these discussions and I'm still somewhat confused about what this is all about...
On Wed, Nov 25, 2009 at 04:09:55PM +0200, Michael S. Tsirkin wrote:
> We were discussing features that are (mostly) not user-visible.
> It is clear that if you have a user-visible change you have
> a different machine, so you can not migrate.
>
> Now if you fix a bug by changing savevm format, without user visible
> changes you *also* can not migrate, but this does not make it into
> feature or make it a good fit for machine description.

There clearly already has to be a separation of the machine definition, otherwise forward migration to a new qemu version couldn't be guaranteed in the first place!

To migrate back, all we need is the ability of the new version of qemu to write savevm in the old format, negotiated as max(oldformats[] ∩ newformats[]), i.e. the highest format both versions support. The new qemu already has to be able to "read" the old savevm format, but so far it wasn't required to write it: writing the old format is the only new requirement. The machine definition is the old one, because it comes from an old qemu, and the new qemu has to handle it if forward migration was possible in the first place.

Clearly migration won't be safe across the cluster until all host nodes are upgraded, so I think the high-level GUI should print a warning when it notices a migration from the new savevm format to the old one. (Obviously only the savevm format can change here: the machine definition doesn't change if migration is possible at all, and if it did change, migration should just return an error!)

Then, orthogonally (a totally different problem), we need to ensure all VMs are started with the same guest-visible machine definition (that should hold even if the savevm format doesn't change). That can be done with -M, if that's the desired API and we are upgrading qemu significantly in that update; if we didn't change the qemu drivers significantly, no -M parameter is needed. And if we upgrade the machine definition, migration will simply stop working, and that's a feature, not a bug.
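A minimal sketch of that negotiation, in Python for brevity (qemu itself is C). It assumes, hypothetically, that savevm formats are identified by plain integer versions, that the source advertises the versions it can write and the destination the versions it can read; the function names are made up for illustration and are not qemu's migration code.

```python
from typing import Iterable, Optional


def negotiate_savevm_format(source_writes: Iterable[int],
                            dest_reads: Iterable[int]) -> Optional[int]:
    """Return the highest savevm format version both sides support.

    This is max(source_writes ∩ dest_reads). None means there is no
    common format, so the migration must fail with an error.
    """
    common = set(source_writes) & set(dest_reads)
    return max(common) if common else None


def is_downgrade(source_writes: Iterable[int], negotiated: int) -> bool:
    """True if the source must write an older format than its newest one."""
    return negotiated < max(source_writes)


if __name__ == "__main__":
    new_qemu = [1, 2, 3]  # hypothetical: formats the new qemu can write
    old_qemu = [1, 2]     # hypothetical: formats the old qemu can read
    fmt = negotiate_savevm_format(new_qemu, old_qemu)  # negotiates 2
    if fmt is None:
        print("error: no common savevm format, refusing to migrate")
    elif is_downgrade(new_qemu, fmt):
        print(f"warning: savevm format downgrade to version {fmt}")
```

The point is that the result is the maximum of the intersection, not of the union: the old qemu can only read formats it already knows about, so the new qemu has to step down to one of those.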
Now, how fine-grained we want the savevm format versioning to be (per device, for example) and how complex we want the negotiation protocol to be (to make it more extensible in the future) are all implementation details. In very short, all we can reasonably be discussing here is giving the new qemu the ability to write the older (buggy) savevm format to allow backwards migration, and negotiating the highest common savevm format for a backwards migration at the start of the connection. There should be a warning that the savevm format is being downgraded during migration, so the user knows he's risking instability, and he should confirm once negotiation is complete and the downgrade has been detected.

After that we can still migrate (with a warning) from fixed pvclock to broken pvclock (the latter will remain potentially unstable, which is why a warning is required in my view), and users won't be forced to upgrade all hosts at once to keep migrating across the whole cluster.
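If the versioning were made per device, the same idea would apply device by device, together with the user-confirmation step. A minimal sketch, again in Python and under the same hypothetical integer-version assumption; `negotiate_per_device`, the `confirm_downgrade` callback (standing in for whatever the high-level GUI would do) and `MigrationError` are illustrative names only.

```python
from typing import Callable, Dict, List


class MigrationError(Exception):
    """Raised when negotiation fails or the user rejects a downgrade."""


def negotiate_per_device(
    source: Dict[str, List[int]],
    dest: Dict[str, List[int]],
    confirm_downgrade: Callable[[List[str]], bool],
) -> Dict[str, int]:
    """Pick, per device, the highest savevm version both sides support.

    `source` maps device name -> versions the source can write;
    `dest` maps device name -> versions the destination can read.
    If any device has to be written in an older format than the
    source's newest, the user is asked to confirm before proceeding.
    """
    chosen: Dict[str, int] = {}
    downgraded: List[str] = []
    for device, writable in source.items():
        common = set(writable) & set(dest.get(device, ()))
        if not common:
            # Unknown device or no shared version: a machine definition
            # mismatch, so refuse instead of silently downgrading.
            raise MigrationError(f"no common savevm format for {device!r}")
        chosen[device] = max(common)
        if chosen[device] < max(writable):
            downgraded.append(device)
    # Only bother the user when something is actually being downgraded.
    if downgraded and not confirm_downgrade(downgraded):
        raise MigrationError("savevm format downgrade rejected by user")
    return chosen
```

Confirmation happens only after every device has been negotiated, so the user sees the complete list of downgraded devices (say, only the one carrying pvclock state) rather than being asked once per device.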