Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Mon, 01 Aug 2011 08:53:28 -0500
Anthony Liguori anth...@codemonkey.ws wrote:

> On 08/01/2011 02:54 AM, Christoph Hellwig wrote:
> > On Sun, Jul 31, 2011 at 07:15:21PM -0500, Anthony Liguori wrote:
> > > I think we've set the bar too low historically for introducing new interfaces. I think Avi's new memory API is a good example of how we should approach these things--do the vast majority of the thankless work up front before initial merge.
> >
> > Yes, that seems to work a bit better. So how will we sort out and finalize the vmstate bits,
>
> http://wiki.qemu.org/Features/Migration/Next is what I think we need to do next for migration.
>
> In terms of VMState, I think we can leave it in its current state for now. If there is a desire to keep converting devices, that would be fine. Because I think the next thing to do in terms of changing device serialization is to make serialization a proper virtual method of the base object class. I think devices that use composition should also serialize their children as part of their serialization. I think that falls under the banner of updating the object model.
>
> > QMP, and make sure we have one sort of error reporting?
>
> I've updated the QMP merge plan on the wiki:
>
> http://wiki.qemu.org/Features/QAPI#Merge_Plan

Something that delays a full QMP conversion is designing the new interfaces (sometimes internal ones too). I feel that we're striving for perfection. While it's obvious that we need good interfaces, we have tons of commands and properly designing each of them will take ages.

> We've merged phase one, and phase two shouldn't be that hard to merge as the code is already written. It's just a matter of rebasing and incorporating in an incremental fashion.
>
> Phase two eliminates qerror_report() in favor of passing Error **s. It's very invasive, which is why we decided to merge in two phases.
>
> Regards,
>
> Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Sun, Jul 31, 2011 at 07:15:21PM -0500, Anthony Liguori wrote:
> I think we've set the bar too low historically for introducing new interfaces. I think Avi's new memory API is a good example of how we should approach these things--do the vast majority of the thankless work up front before initial merge.

Yes, that seems to work a bit better. So how will we sort out and finalize the vmstate bits, QMP, and make sure we have one sort of error reporting?

For vmstate I'd agree with Dor in principle and just drop support for old-style load/save functions after converting everything that matters to vmstate.
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 08/01/2011 02:54 AM, Christoph Hellwig wrote:
> On Sun, Jul 31, 2011 at 07:15:21PM -0500, Anthony Liguori wrote:
> > I think we've set the bar too low historically for introducing new interfaces. I think Avi's new memory API is a good example of how we should approach these things--do the vast majority of the thankless work up front before initial merge.
>
> Yes, that seems to work a bit better. So how will we sort out and finalize the vmstate bits,

http://wiki.qemu.org/Features/Migration/Next is what I think we need to do next for migration.

In terms of VMState, I think we can leave it in its current state for now. If there is a desire to keep converting devices, that would be fine. Because I think the next thing to do in terms of changing device serialization is to make serialization a proper virtual method of the base object class. I think devices that use composition should also serialize their children as part of their serialization. I think that falls under the banner of updating the object model.

> QMP, and make sure we have one sort of error reporting?

I've updated the QMP merge plan on the wiki:

http://wiki.qemu.org/Features/QAPI#Merge_Plan

We've merged phase one, and phase two shouldn't be that hard to merge as the code is already written. It's just a matter of rebasing and incorporating in an incremental fashion.

Phase two eliminates qerror_report() in favor of passing Error **s. It's very invasive, which is why we decided to merge in two phases.

Regards,

Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 31 July 2011 11:48, Dor Laor dl...@redhat.com wrote:
> ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?

No, definitely not. I think most people using non-x86 architectures don't use the vmsave/vmload/migration features at all, but would be annoyed if the perfectly functional device models they were using got deleted...

-- PMM
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 02:37 PM, Peter Maydell wrote:
> On 31 July 2011 11:48, Dor Laor dl...@redhat.com wrote:
> > ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?
>
> No, definitely not. I think most people using non-x86 architectures don't use the vmsave/vmload/migration features at all, but would be annoyed if the perfectly functional device models they were using got deleted...

I didn't mean to erase the entire device, just the code for save/load which, as you say, might not be used at all.
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Sun, Jul 31, 2011 at 02:45:07PM +0300, Dor Laor wrote:
> > No, definitely not. I think most people using non-x86 architectures don't use the vmsave/vmload/migration features at all, but would be annoyed if the perfectly functional device models they were using got deleted...
>
> I didn't mean to erase the entire device, just the code for save/load which, as you say, might not be used at all.

Like the one in virtio?
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 09:46 PM, Christoph Hellwig wrote:
> On Sun, Jul 31, 2011 at 02:45:07PM +0300, Dor Laor wrote:
> > > No, definitely not. I think most people using non-x86 architectures don't use the vmsave/vmload/migration features at all, but would be annoyed if the perfectly functional device models they were using got deleted...
> >
> > I didn't mean to erase the entire device, just the code for save/load which, as you say, might not be used at all.
>
> Like the one in virtio?

/me caught off guard. I wonder why it wasn't converted to VMSTATE before? virtio is one of the key devices, not just some random forgotten one that might not care about migration.

It's worth using this discussion to decide whether vmstate is significant enough. From my brief browsing it looks like vmstate helps to reduce some plain errors caused by coding save and load twice, eases the field encoding, and handles subsections (which IMHO is the most important). It's true that we need to introduce capabilities to the live migration protocol and some other goodies, but we might be able to do that with the existing method of gradually enhancing VMSTATE into whatever form it may take.
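Dor's point that VMState removes the errors of coding save and load twice can be shown with a toy table-driven serializer: one declarative description drives both directions, so they cannot drift apart. This is a minimal sketch, not QEMU's VMStateDescription API; FieldDesc, ToyDevice, save_fields() and load_fields() are hypothetical names, and fields are copied in host byte order for brevity, whereas a real migration stream has to fix the byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified descriptor in the spirit of VMStateField:
 * one table is the single source of truth for both save and load. */
typedef struct FieldDesc {
    const char *name;   /* field name, useful for debugging or dumps */
    size_t offset;      /* offset of the field inside the device struct */
    size_t size;        /* size of the field in bytes */
} FieldDesc;

typedef struct ToyDevice {
    uint8_t status;
    uint8_t isr;
    uint32_t features;
} ToyDevice;

static const FieldDesc toy_fields[] = {
    { "status",   offsetof(ToyDevice, status),   sizeof(uint8_t)  },
    { "isr",      offsetof(ToyDevice, isr),      sizeof(uint8_t)  },
    { "features", offsetof(ToyDevice, features), sizeof(uint32_t) },
};
#define TOY_NFIELDS (sizeof(toy_fields) / sizeof(toy_fields[0]))

/* Generic save: walks the table and appends each field to buf (host byte
 * order). Returns the number of bytes written. */
static size_t save_fields(const FieldDesc *f, size_t n, const void *dev,
                          uint8_t *buf)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(buf + pos, (const uint8_t *)dev + f[i].offset, f[i].size);
        pos += f[i].size;
    }
    return pos;
}

/* Generic load: walks the same table, so it cannot get out of step with
 * save_fields(). Returns the number of bytes consumed. */
static size_t load_fields(const FieldDesc *f, size_t n, void *dev,
                          const uint8_t *buf)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy((uint8_t *)dev + f[i].offset, buf + pos, f[i].size);
        pos += f[i].size;
    }
    return pos;
}
```

Adding a field means adding one table entry; there is no second hand-written function in which to forget it.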
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 05:48 AM, Dor Laor wrote:
> On 07/30/2011 01:28 AM, Anthony Liguori wrote:
> > No, not at all. Just that converting everything to VMState isn't a prerequisite for building a more robust migration protocol.
>
> The main thing is to prioritize the problems we're facing.
>
> - Live migration protocol:
>   - VMState conversion is not complete

But this is not a problem because it doesn't gate anything. That's my point.

>   - Live migration is not flexible enough (even with subsections)

To make it more flexible, we need to be able to marshal to an internal data structure that we can transform in more flexible ways.

>   - Simplify destination cmdline for machine creation

This needs qdev fixing.

> - Qdev
>   - conversion is not complete
>   - Machine + devices description are complex and have hidden glue

This is a hard problem.

> - Qapi
>   - Needs merging

We merged the first part (which includes the new QMP server). The work is done for converting the actual QMP commands.

> - QOB
>   - Only the beginning
>
> So overall there are many parallel projects, probably more than the above. The RightThink(tm) would be to pick the ones that we can converge on and not try to handle all in parallel.
>
> There are problems we can live with. Engineering-wise it might not be a beauty, but they can wait (for instance the dark magic to create the machines). There are some that prevent adding new features or make the code hard to support without them.
>
> Cheers,
> Dor
>
> ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?

No. VMState is a solution looking for a problem. Many important device models are still not converted and ultimately, it doesn't solve the problem we're really trying to solve.

Regards,

Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 11:43 PM, Anthony Liguori wrote:
> On 07/31/2011 05:48 AM, Dor Laor wrote:
> > On 07/30/2011 01:28 AM, Anthony Liguori wrote:
> > > No, not at all. Just that converting everything to VMState isn't a prerequisite for building a more robust migration protocol.
> >
> > The main thing is to prioritize the problems we're facing.
> >
> > - Live migration protocol:
> >   - VMState conversion is not complete
>
> But this is not a problem because it doesn't gate anything. That's my point.

The VMState might be an exception, but in general we have too many unfinished businesses going on.

> >   - Live migration is not flexible enough (even with subsections)
>
> To make it more flexible, we need to be able to marshal to an internal data structure that we can transform in more flexible ways.
>
> >   - Simplify destination cmdline for machine creation
>
> This needs qdev fixing.
>
> > - Qdev
> >   - conversion is not complete
> >   - Machine + devices description are complex and have hidden glue
>
> This is a hard problem.
>
> > - Qapi
> >   - Needs merging
>
> We merged the first part (which includes the new QMP server). The work is done for converting the actual QMP commands.
>
> > - QOB
> >   - Only the beginning
> >
> > So overall there are many parallel projects, probably more than the above. The RightThink(tm) would be to pick the ones that we can converge on and not try to handle all in parallel.
> >
> > There are problems we can live with. Engineering-wise it might not be a beauty, but they can wait (for instance the dark magic to create the machines). There are some that prevent adding new features or make the code hard to support without them.
> >
> > Cheers,
> > Dor
> >
> > ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?
>
> No. VMState is a solution looking for a problem. Many important device

The initial target solved some rare bugs that tend not to bite us with virtio. Along the way, it got enhanced with subsections, which was a major improvement.

> models are still not converted and ultimately, it doesn't solve the problem we're really trying to solve.

From the start I supported Michael Tsirkin's idea for an ASN.1 protocol. The question is how visitors and the ability to translate from one representation to another will help us. I do see value in it but I don't think it is that important. If we have one real device serialization method that is flexible enough, we can stick with it without translation. If we define qdev serialization into vmstate/asn.1/json/other and add some capability negotiation and various other goodies, it should be enough.

btw: separating the live migration protocol from the machine state is even more important if we take a gradual approach.
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 03:57 PM, Dor Laor wrote:
> On 07/31/2011 11:43 PM, Anthony Liguori wrote:
> > > ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?
> >
> > No. VMState is a solution looking for a problem. Many important device
>
> The initial target solved some rare bugs that tend not to bite us with virtio. Along the way, it got enhanced with subsections, which was a major improvement.

I should have qualified my statement. VMState did solve many real problems. I meant that at this point in time, we've gotten pretty much what we can out of it.

> > models are still not converted and ultimately, it doesn't solve the problem we're really trying to solve.
>
> From the start I supported Michael Tsirkin's idea for an ASN.1 protocol. The question is how visitors and the ability to translate from one representation to another will help us.

Because with Visitors you can do:

Devices -> internal QObject representation -> ASN.1 -> wire -> ASN.1 -> internal QObject representation -> Device

While it's in an internal representation, we can make large changes like translating entire device state structures to new formats, splitting one device into two, etc. It's sort of the ultimate mechanism to make compatibility changes. If you just go Devices -> ASN.1, you miss out on that.

BTW, another really useful thing that Visitor would enable is the ability to read an individual device to a QObject and implement the equivalent of 'show devicename', which dumps the state of arbitrary devices via QMP. This could be very useful for debugging.

> I do see value in it but I don't think it is that important. If we have one real device serialization method that is flexible enough, we can stick with it without translation. If we define qdev serialization into vmstate/asn.1/json/other and add some capability negotiation and various other goodies, it should be enough.
>
> btw: separating the live migration protocol from the machine state is even more important if we take a gradual approach.

Yeah, I think the critical technical requirement to achieve this is that the devices need to generate their own serialization format, and then another layer translates that to the live migration protocol format.

Regards,

Anthony Liguori
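The layering Anthony describes, where each device only describes its own state and a separate layer produces the wire format, can be sketched in a few lines. This is an illustration under stated assumptions: the flat Ir list stands in for a QObject tree, the "name=value" text encoding is invented, and ToyRtc, toy_rtc_to_ir() and ir_encode_text() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical intermediate representation standing in for a QObject tree:
 * a flat list of named integer fields. */
#define IR_MAX 8
typedef struct IrField { char name[32]; int64_t value; } IrField;
typedef struct Ir { IrField f[IR_MAX]; int n; } Ir;

static void ir_add(Ir *ir, const char *name, int64_t value)
{
    assert(ir->n < IR_MAX);
    snprintf(ir->f[ir->n].name, sizeof(ir->f[ir->n].name), "%s", name);
    ir->f[ir->n].value = value;
    ir->n++;
}

/* Layer 1: the device describes its own state in its own terms and knows
 * nothing about the wire format. */
typedef struct ToyRtc { uint8_t cmos_index; uint32_t period; } ToyRtc;

static void toy_rtc_to_ir(const ToyRtc *s, Ir *ir)
{
    ir_add(ir, "cmos_index", s->cmos_index);
    ir_add(ir, "period", s->period);
}

/* Layer 2: a separate encoder turns the IR into one particular wire format
 * (here "name=value\n" text). Swapping in another encoder does not touch
 * device code. Returns the length written, or 0 if out is too small. */
static size_t ir_encode_text(const Ir *ir, char *out, size_t len)
{
    size_t pos = 0;
    for (int i = 0; i < ir->n; i++) {
        int w = snprintf(out + pos, len - pos, "%s=%lld\n",
                         ir->f[i].name, (long long)ir->f[i].value);
        if (w < 0 || (size_t)w >= len - pos) {
            return 0;
        }
        pos += (size_t)w;
    }
    return pos;
}
```

Keeping the encoder out of the device means a new protocol format only needs a new ir_encode_*() function, not changes to every device model.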
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 08/01/2011 12:03 AM, Anthony Liguori wrote:
> On 07/31/2011 03:57 PM, Dor Laor wrote:
> > On 07/31/2011 11:43 PM, Anthony Liguori wrote:
> > > > ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?
> > >
> > > No. VMState is a solution looking for a problem. Many important device
> >
> > The initial target solved some rare bugs that tend not to bite us with virtio. Along the way, it got enhanced with subsections, which was a major improvement.
>
> I should have qualified my statement. VMState did solve many real problems. I meant that at this point in time, we've gotten pretty much what we can out of it.
>
> > > models are still not converted and ultimately, it doesn't solve the problem we're really trying to solve.
> >
> > From the start I supported Michael Tsirkin's idea for an ASN.1 protocol. The question is how visitors and the ability to translate from one representation to another will help us.
>
> Because with Visitors you can do:
>
> Devices -> internal QObject representation -> ASN.1 -> wire -> ASN.1 -> internal QObject representation -> Device

I admit that QObject sounds more appealing than VMState; we can convert everything into it. I'm not sure what the difference is between a visitor and the load/save functions, potentially with enhanced parameters like name, which can be part of QObject anyway.

> While it's in an internal representation, we can make large changes like translating entire device state structures to new formats, splitting one device into two, etc. It's sort of the ultimate mechanism to make compatibility changes. If you just go Devices -> ASN.1, you miss out on that.

What's important in ASN.1 is not the data representation itself but the ability to have a flexible protocol. We can have that with VMState and QObject as well. I do admit that QObject+ASN.1 will ease the way to make it right, so you convinced me :). I still don't see how using ASN.1 will easily join/split several devices into a few, and some other magic. Not that it is not possible, but it is way too hard.

The main 'real' problems you're trying to solve are migration from one release to the other, while most of our problems were forgotten fields here and there (floppy/ide/rtl/kvmclock/etc). I doubt that live migration within the same release worked upstream for a random git head. Verifying save(i) == load(i)+save(i+1) is simple, but no one is executing it.

Looks like we might be ready to go with your suggestion; I'm just worried that there are too many other non-migration open issues. If the above work doesn't get completed, we're better off with the current machine type + VMState + subsections. If it all gets completed, we're better off with your suggestion.

> BTW, another really useful thing that Visitor would enable is the ability to read an individual device to a QObject and implement the equivalent of 'show devicename', which dumps the state of arbitrary devices via QMP. This could be very useful for debugging.
>
> > I do see value in it but I don't think it is that important. If we have one real device serialization method that is flexible enough, we can stick with it without translation. If we define qdev serialization into vmstate/asn.1/json/other and add some capability negotiation and various other goodies, it should be enough.
> >
> > btw: separating the live migration protocol from the machine state is even more important if we take a gradual approach.
>
> Yeah, I think the critical technical requirement to achieve this is that the devices need to generate their own serialization format, and then another layer translates that to the live migration protocol format.
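Dor's check, save(i) == load(i)+save(i+1), is easy to automate once save and load are plain functions. Below is a minimal C sketch of such a round-trip harness; roundtrip_ok(), ToyFdc, the save/load signatures and the deliberately buggy loader are all hypothetical. It saves with one release's code, loads and re-saves with another's (or with the same release for a same-version check), and requires the two byte streams to be identical. A loader that forgets a field, the class of bug Dor lists for floppy/ide/rtl/kvmclock, makes the streams differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signatures: save writes state into buf and returns the byte
 * count (0 on error); load reads state from buf and returns 0 on success. */
typedef size_t (*SaveFn)(const void *state, uint8_t *buf, size_t len);
typedef int (*LoadFn)(void *state, const uint8_t *buf, size_t len);

/* Save with the "old" code, load and re-save with the "new" code, and
 * require byte-identical streams. */
static bool roundtrip_ok(SaveFn save_old, LoadFn load_new, SaveFn save_new,
                         const void *state, void *scratch)
{
    uint8_t a[256], b[256];
    size_t na = save_old(state, a, sizeof(a));
    if (load_new(scratch, a, na) != 0) {
        return false;
    }
    size_t nb = save_new(scratch, b, sizeof(b));
    return na == nb && memcmp(a, b, na) == 0;
}

/* Toy device used to exercise the harness. */
typedef struct ToyFdc { uint8_t dor; uint8_t msr; } ToyFdc;

static size_t fdc_save(const void *s, uint8_t *buf, size_t len)
{
    const ToyFdc *f = s;
    if (len < 2) {
        return 0;
    }
    buf[0] = f->dor;
    buf[1] = f->msr;
    return 2;
}

static int fdc_load(void *s, const uint8_t *buf, size_t len)
{
    ToyFdc *f = s;
    if (len != 2) {
        return -1;
    }
    f->dor = buf[0];
    f->msr = buf[1];
    return 0;
}

/* Deliberately buggy loader: it never restores msr. */
static int fdc_load_forgetful(void *s, const uint8_t *buf, size_t len)
{
    ToyFdc *f = s;
    if (len != 2) {
        return -1;
    }
    f->dor = buf[0];
    return 0;
}
```

The scratch state has to start from values that differ from the source (for example, reset values); otherwise a forgotten field can slip through unnoticed.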
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 04:25 PM, Dor Laor wrote:
> On 08/01/2011 12:03 AM, Anthony Liguori wrote:
> > On 07/31/2011 03:57 PM, Dor Laor wrote:
> > > On 07/31/2011 11:43 PM, Anthony Liguori wrote:
> > > > > ps: how hard is it to finish the vmstate conversion? Can't we just assume non-converted code is not functional and just remove all of it?
> > > >
> > > > No. VMState is a solution looking for a problem. Many important device
> > >
> > > The initial target solved some rare bugs that tend not to bite us with virtio. Along the way, it got enhanced with subsections, which was a major improvement.
> >
> > I should have qualified my statement. VMState did solve many real problems. I meant that at this point in time, we've gotten pretty much what we can out of it.
> >
> > > > models are still not converted and ultimately, it doesn't solve the problem we're really trying to solve.
> > >
> > > From the start I supported Michael Tsirkin's idea for an ASN.1 protocol. The question is how visitors and the ability to translate from one representation to another will help us.
> >
> > Because with Visitors you can do:
> >
> > Devices -> internal QObject representation -> ASN.1 -> wire -> ASN.1 -> internal QObject representation -> Device
>
> I admit that QObject sounds more appealing than VMState; we can convert everything into it. I'm not sure what the difference is between a visitor and the load/save functions, potentially with enhanced parameters like name, which can be part of QObject anyway.

VMStateInfo contains:

    struct VMStateInfo {
        const char *name;
        int  (*get)(QEMUFile *f, void *pv, size_t size);
        void (*put)(QEMUFile *f, void *pv, size_t size);
    };

It needs to change to:

    struct VMStateInfo {
        const char *name;
        void (*visit)(Visitor *v, const char *name, void *pv, size_t size,
                      Error **errp);
    };

For each VMStateInfo, like vmstate_info_bool, we go from:

    static int get_bool(QEMUFile *f, void *pv, size_t size)
    {
        bool *v = pv;
        *v = qemu_get_byte(f);
        return 0;
    }

    static void put_bool(QEMUFile *f, void *pv, size_t size)
    {
        bool *v = pv;
        qemu_put_byte(f, *v);
    }

To:

    static void visit_bool(Visitor *v, const char *name, void *pv,
                           size_t size, Error **errp)
    {
        bool *b = pv;   /* must not reuse 'v', which is the Visitor */
        visit_type_bool(v, name, b, errp);
    }

For non-converted devices, like virtio, we change:

    int virtio_load(VirtIODevice *vdev, QEMUFile *f)
    {
        int num, i, ret;
        uint32_t features;
        uint32_t supported_features =
            vdev->binding->get_features(vdev->binding_opaque);

        if (vdev->binding->load_config) {
            ret = vdev->binding->load_config(vdev->binding_opaque, f);
            if (ret)
                return ret;
        }

        qemu_get_8s(f, &vdev->status);
        qemu_get_8s(f, &vdev->isr);
        ...

To:

    void visit_type_virtio(Visitor *v, VirtIODevice *vdev, const char *name,
                           Error **errp)
    {
        int num, i, ret;
        uint32_t features;
        uint32_t supported_features =
            vdev->binding->get_features(vdev->binding_opaque);

        if (vdev->binding->load_config) {
            /* Not yet converted: still takes a QEMUFile 'f' and returns
             * a value from a void function. */
            ret = vdev->binding->load_config(vdev->binding_opaque, f);
            if (ret)
                return ret;
        }

        visit_start_struct(v, "VirtIODevice", name, errp);
        visit_type_u8(v, "status", &vdev->status);
        visit_type_u8(v, "isr", &vdev->isr);
        ...

You'll notice it's almost entirely mechanical. It can probably be done with a few seds and an afternoon's worth of grunt work. I'm resisting the urge to do this myself because it's a good intro task and we've got a number of folks looking for those.

> > While it's in an internal representation, we can make large changes like translating entire device state structures to new formats, splitting one device into two, etc. It's sort of the ultimate mechanism to make compatibility changes. If you just go Devices -> ASN.1, you miss out on that.
>
> What's important in ASN.1 is not the data representation itself but the ability to have a flexible protocol. We can have that with VMState and QObject as well. I do admit that QObject+ASN.1 will ease the way to make it right, so you convinced me :). I still don't see how using ASN.1 will easily join/split several devices into a few, and some other magic. Not that it is not possible, but it is way too hard.

ASN.1 doesn't do it, but having an object representation that we can manipulate will. Think of it like a compiler optimization phase: you write a visitor that can identify a node and transform it into a different set of nodes.

> The main 'real' problems you're trying to solve are migration from one release to the other, while most of our problems were forgotten fields here and there (floppy/ide/rtl/kvmclock/etc). I doubt that live migration within the same release worked upstream for a random git head. Verifying save(i) == load(i)+save(i+1) is simple, but no one is executing it.

Because it's not easily automated. I know it's preaching to the choir, but we need better unit tests. We're getting there though; we now have a handful of tests in the tree, with hopefully more growing now that we're embracing glib.

> Looks
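The conversion Anthony describes, replacing separate get/put callbacks with a single visit callback, can be reduced to a self-contained toy. This is a sketch only, not QEMU's qapi Visitor interface: ToyVisitor, out_u8(), in_u8(), ToyVirtio and visit_toy_virtio() are hypothetical names. The device's visit function is written once, and whether it saves or loads depends only on which visitor is passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy visitor: the concrete visitor decides whether a field
 * is written to or read from the byte stream. */
typedef struct ToyVisitor ToyVisitor;
struct ToyVisitor {
    void (*type_u8)(ToyVisitor *v, const char *name, uint8_t *val);
    uint8_t *buf;   /* backing byte stream */
    size_t pos;     /* current offset in buf */
};

/* Output visitor: copies the field into the stream (the old "put" side). */
static void out_u8(ToyVisitor *v, const char *name, uint8_t *val)
{
    (void)name;   /* a raw byte stream does not record names */
    v->buf[v->pos++] = *val;
}

/* Input visitor: fills the field from the stream (the old "get" side). */
static void in_u8(ToyVisitor *v, const char *name, uint8_t *val)
{
    (void)name;
    *val = v->buf[v->pos++];
}

typedef struct ToyVirtio {
    uint8_t status;
    uint8_t isr;
} ToyVirtio;

/* Written once: saves or loads depending on the visitor passed in. */
static void visit_toy_virtio(ToyVisitor *v, ToyVirtio *dev)
{
    v->type_u8(v, "status", &dev->status);
    v->type_u8(v, "isr", &dev->isr);
}
```

A third visitor whose type_u8 prints each name and value would give a debug dump from the same device function without touching it.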
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Sun, Jul 31, 2011 at 11:43:08PM +0300, Dor Laor wrote:
> /me caught off guard. I wonder why it wasn't converted to VMSTATE before? virtio is one of the key devices, not just some random forgotten one that might not care about migration.

It just shows the extent of incomplete transitions in qemu. Given how much burden incomplete transitions place on software projects, we should try to minimize them in qemu. That is, if people add a new API we need a clear roadmap for when it's going to be finished, and, more importantly, what the consequences of not finishing it are, instead of leaving it half done.

I think the way the Linux kernel handles API transitions is something qemu could borrow from. For most of them it's simply expected that all users of an API are converted to the new equivalent, maybe in a simplistic and dumb way, but at least it is a transition. Combined with a deprecation schedule for unused drivers, that seems to do wonders, although of course even the Linux kernel is slacking in some areas.
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/31/2011 06:10 PM, Christoph Hellwig wrote:
> On Sun, Jul 31, 2011 at 11:43:08PM +0300, Dor Laor wrote:
> > /me caught off guard. I wonder why it wasn't converted to VMSTATE before? virtio is one of the key devices, not just some random forgotten one that might not care about migration.
>
> It just shows the extent of incomplete transitions in qemu. Given how much burden incomplete transitions place on software projects, we should try to minimize them in qemu. That is, if people add a new API we need a clear roadmap for when it's going to be finished, and, more importantly, what the consequences of not finishing it are, instead of leaving it half done.
>
> I think the way the Linux kernel handles API transitions is something qemu could borrow from. For most of them it's simply expected that all users of an API are converted to the new equivalent, maybe in a simplistic and dumb way, but at least it is a transition. Combined with a deprecation schedule for unused drivers, that seems to do wonders, although of course even the Linux kernel is slacking in some areas.

One of the things I think the kernel is good at is making relatively large changes outside of the tree and then merging them in a way that makes sense, when it makes sense.

I think we've set the bar too low historically for introducing new interfaces. I think Avi's new memory API is a good example of how we should approach these things--do the vast majority of the thankless work up front before initial merge.

Besides making sure we don't have incomplete interfaces, it also helps validate the interface before committing to it.

Regards,

Anthony Liguori
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/25/2011 04:10 PM, Paolo Bonzini wrote:
> On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini pbonz...@redhat.com wrote:
> > I have now tested this series (exactly as sent) both by manually examining the differences between the two formats on the same guest state, and by a mix of saves/restores (new on new, 0.14 on new pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL). It always does what is expected.
> >
> > Michael Tsirkin objected that the format should be passed as a parameter in the migrate command. I kind of agree; however, since this is a real bug you would need to bump the default for new machine types, and this default would still go in the QEMUMachine struct like I am doing. So I consider the two settings to be orthogonal.
> >
> > Also, the alternative requires changes to the whole management stack, and if the default is not changed it imposes a broken format unless you update the management tools. Clearly much less bang for the buck.
>
> I think this is ready to go into 0.15. The bug happens when migrating to 0.14 a pc-0.14 machine that was created with QEMU 0.15 and has a floppy. The media changed subsection is almost always included, and this causes problems when migrating to 0.14, which didn't have any subsection for the floppy device.
>
> While QEMU support for migration to old versions admittedly depends on luck, this isn't true of certain downstreams :) which would like to have an unambiguous migration format.

I really hate the idea of changing the migration format moments before the release.

Since subsections are optional, can't we take the offending subsections, remove them, bump the section version numbers and make the fields required? That fixes this issue temporarily without changing the format, and we can change the format for 1.0.

Regards,

Anthony Liguori

> Paolo
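Anthony's alternative (drop the optional subsections, bump the section version and make the fields required) can be illustrated with a versioned field table. This is a minimal C sketch with hypothetical names (VField, ToyFloppy, save_section(), load_section()), not QEMU's VMState code: each field records the section version that introduced it, the saver writes the version first, and a loader that only knows older versions refuses the section outright instead of misparsing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical versioned field table (only uint8_t fields, for brevity). */
typedef struct VField {
    size_t offset;    /* offset of the uint8_t field in the device struct */
    int version_id;   /* first section version that contains the field */
} VField;

typedef struct ToyFloppy { uint8_t drive; uint8_t media_changed; } ToyFloppy;

static const VField floppy_fields[] = {
    { offsetof(ToyFloppy, drive), 1 },
    { offsetof(ToyFloppy, media_changed), 2 },   /* required from v2 on */
};
#define FLOPPY_NFIELDS (sizeof(floppy_fields) / sizeof(floppy_fields[0]))

/* Writes the section version, then every field that exists at `version`.
 * Returns the number of bytes written. */
static size_t save_section(const void *dev, int version, uint8_t *buf)
{
    size_t pos = 0;
    buf[pos++] = (uint8_t)version;
    for (size_t i = 0; i < FLOPPY_NFIELDS; i++) {
        if (floppy_fields[i].version_id <= version) {
            buf[pos++] = ((const uint8_t *)dev)[floppy_fields[i].offset];
        }
    }
    return pos;
}

/* Returns -1 if the stream is newer than this build understands; fields
 * that did not exist in an older stream keep their current values. */
static int load_section(void *dev, int max_supported, const uint8_t *buf)
{
    size_t pos = 0;
    int version = buf[pos++];
    if (version > max_supported) {
        return -1;
    }
    for (size_t i = 0; i < FLOPPY_NFIELDS; i++) {
        if (floppy_fields[i].version_id <= version) {
            ((uint8_t *)dev)[floppy_fields[i].offset] = buf[pos++];
        }
    }
    return 0;
}
```

Note the trade-off in this sketch: a v2 stream is rejected by a v1-only loader, so a source talking to an older destination would have to save at version 1, which silently drops media_changed.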
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 26.07.2011 14:37, Anthony Liguori wrote:
> On 07/26/2011 07:07 AM, Juan Quintela wrote:
> > Anthony Liguori anth...@codemonkey.ws wrote:
> > > == What we need ==
> > >
> > > We need to decompose migration into three different problems:
> > >
> > > 1) serializing device state
> > > 2) transforming the device model in order to satisfy forwards and backwards compatibility
> > > 3) encoding the serialized device model on the wire.
> >
> > I will change this to:
> >
> > - We need to be able to enable/disable features of a device, a.k.a. make -M pc-0.14 work with devices with the same features as 0.14. Notice that this is _independent_ of migration. In theory, we already have this with qdev flags.
> >
> > - Be able to describe the different features/versions. This is not the difficult part; it can be subsections, optional fields, whatever. The difficult part is _knowing_ which fields need to be in each version. That again depends on the device, not on migration.
> >
> > - Be able to do forward/backward compatibility (and without communication between both sides it is basically impossible).
>
> Hrm, I'm not sure I agree with these conclusions.
>
> Management tools should do their best job to create two compatible device models.
>
> Given two compatible device models, there *may* be differences in the structure of the device models since we evolve things over time. We may rename a field, change the type, etc. To support this, we can use filters on both the send and receive end to do our best to massage the device model into something compatible.
>
> But creating two compatible device models is not the job of the migration protocol. It's the job of management tools.

I'm not sure if I agree with this. Let's forget about management tools for a moment, and just think of a qemu instance with a given set of command line options describing its devices. Then you start another instance with different options and -incoming and start a migration. The result will be something, but definitely not a successfully migrated VM (even though it might look like one at first).

This is why I believe that the information about which devices to use actually belongs in the migration data. There's no way to make use of it with different options.

> > > 5) Once we're here, we can implement the next 5-year format. That could be ASN.1 and be bidirectional or whatever makes the most sense. We could support 50 formats if we wanted to. As long as the transport is distinct from the serialization and compat routines, it really doesn't matter.
> >
> > This means finishing the VMState support; once there, the only thing that needs to change is to copy the savevm and change the visitors to whatever else we need/want.
>
> There's no need to finish VMState to convert to visitors. It's just
>
>     sed -e 's:qemu_put_be32:visit_type_int32:g'

Actually I think the real question is whether we want to have VMState or not. If we do (and I think it's a good thing), then yes, we need to finish it. If not, then we should revert the parts that are already there. We shouldn't end up in an inconsistent state where half of qemu is converted and we don't feel a need to do anything about the other half.

Kevin
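Juan's first point, quoted in Kevin's message, is that an older machine type such as pc-0.14 should expose only the device features the matching release had, decided independently of migration. A minimal C sketch of a per-machine compatibility table follows. The machine, driver and property names and the helpers compat_bool() and toy_fdc_init() are hypothetical, chosen only to echo the floppy example in this thread, and do not reflect QEMU's actual compat property definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-machine-type compatibility table: older machine types
 * switch off device features the matching older release lacked. */
typedef struct CompatProp {
    const char *machine;
    const char *driver;
    const char *property;
    bool value;
} CompatProp;

static const CompatProp compat_props[] = {
    { "pc-0.14", "toy-fdc", "migrate-media-changed", false },
    { "pc-0.14", "toy-nic", "extra-stats",           false },
};

typedef struct ToyFdcFeatures {
    bool migrate_media_changed;   /* on by default for newer machine types */
} ToyFdcFeatures;

/* Effective value of a boolean property for a machine type: the compat
 * override if one exists, otherwise the device default. */
static bool compat_bool(const char *machine, const char *driver,
                        const char *property, bool dflt)
{
    for (size_t i = 0; i < sizeof(compat_props) / sizeof(compat_props[0]); i++) {
        const CompatProp *p = &compat_props[i];
        if (strcmp(p->machine, machine) == 0 &&
            strcmp(p->driver, driver) == 0 &&
            strcmp(p->property, property) == 0) {
            return p->value;
        }
    }
    return dflt;
}

/* Decided at device creation time, before any migration code runs. */
static void toy_fdc_init(ToyFdcFeatures *f, const char *machine)
{
    f->migrate_media_changed =
        compat_bool(machine, "toy-fdc", "migrate-media-changed", true);
}
```

With this arrangement the migration code only sees features that are actually enabled, so keeping a pc-0.14 guest compatible becomes a property of how the device was created rather than of the wire format.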
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/29/2011 09:03 AM, Kevin Wolf wrote: Am 26.07.2011 14:37, schrieb Anthony Liguori: Hrm, I'm not sure I agree with these conclusions. Management tools should do their best job to create two compatible device models. Given two compatible device models, there *may* be differences in the structure of the device models since we evolve things over time. We may rename a field, change the type, etc. To support this, we can use filters on both the source and destination end to do our best to massage the device model into something compatible. But creating two compatible device models is not the job of the migration protocol. It's the job of management tools. I'm not sure if I agree with this. Let's forget about management tools for a moment, and just think of a qemu instance with a given set of command-line options describing its devices. Then you start another instance with different options and -incoming and start a migration. The result will be something, but definitely not a successfully migrated VM (even though it might look like one at first). This is why I believe that the information about which devices to use actually belongs in the migration data. There's no way to make use of it with different options. I agree with you actually. Right now, it's the management tools' job. The complexity is daunting. Recreating the same object model, particularly after hotplug, is difficult and in many cases impossible. I think the primary problem is that there are so many ways to create things. -M pc creates a bunch of stuff that there's no way to create individually. The stuff it creates can sort of be manipulated by using -global but not on a per-device basis. Much of it isn't even addressable. Creating backends is a totally different mechanism and each backend has different mechanisms to enumerate and create. The result is that introspecting what's there and recreating it is insanely complex today. That's the motivation behind QOM. plug_list lists *everything*. 
All objects, whether they are created as part of the PIIX3 or whether it's a backing file, can be directly addressed and manipulated. If you look at qsh, there's an import command. The export command is trivial and I don't remember if I've already added it. But the point is that you should be able to 'qsh export' everything and then 'qsh import' everything to create the exact same device model in another QEMU instance. And yeah, this should end up becoming part of the migration protocol. 5) Once we're here, we can implement the next 5-year format. That could be ASN.1 and be bidirectional or whatever makes the most sense. We could support 50 formats if we wanted to. As long as the transport is distinct from the serialization and compat routines, it really doesn't matter. This means finishing the VMState support; once there, the only thing that needs to change is to copy the savevm code and change the visitors to whatever else we need/want. There's no need to finish VMState to convert to visitors. It's just sed -e 's:qemu_put_be32:visit_type_int32:g' Actually I think the real question is whether we want to have VMState or not. VMState doesn't give me what I want by itself. I want to be able to marshal the device tree to an in-memory representation that can be manipulated. One approach to doing that is first completing VMState, and then writing something that can walk the VMState descriptions. The VMState descriptions are fairly complicated but it's doable. Another approach, which I'm arguing is much simpler, is to keep the imperative nature of our current serialization and use visitors. There may be other advantages of a declarative description of VMState that would justify completing the conversions. But I don't think we need it to start improving the migration protocol. Regards, Anthony Liguori
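The claim that the conversion is "just sed" rests on each qemu_put_be32() call mapping one-to-one onto a visitor call. A minimal sketch of what that buys, assuming a toy visitor: ToyVisitor, toy_visit_int32(), FooState and foo_visit() are hypothetical names for illustration only, not QEMU's QAPI Visitor API. The same foo_visit() routine either writes big-endian bytes (as qemu_put_be32() would) or reads them back, depending only on which visitor it is handed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy visitor; NOT QEMU's QAPI Visitor API. */
typedef struct ToyVisitor ToyVisitor;
struct ToyVisitor {
    /* Visit one named 32-bit field; the direction depends on the implementation. */
    void (*type_int32)(ToyVisitor *v, int32_t *value, const char *name);
    uint8_t *buf; /* byte stream being written or read */
    size_t pos;   /* current offset into buf */
};

/* Output visitor: writes the field big-endian, like qemu_put_be32(). */
static void out_int32(ToyVisitor *v, int32_t *value, const char *name)
{
    uint32_t u = (uint32_t)*value;
    (void)name; /* a JSON or QObject visitor would use the field name */
    v->buf[v->pos++] = (uint8_t)(u >> 24);
    v->buf[v->pos++] = (uint8_t)(u >> 16);
    v->buf[v->pos++] = (uint8_t)(u >> 8);
    v->buf[v->pos++] = (uint8_t)u;
}

/* Input visitor: reads the same big-endian bytes back into the field. */
static void in_int32(ToyVisitor *v, int32_t *value, const char *name)
{
    uint32_t u = 0;
    (void)name;
    for (int i = 0; i < 4; i++) {
        u = (u << 8) | v->buf[v->pos++];
    }
    *value = (int32_t)u;
}

static void toy_visit_int32(ToyVisitor *v, int32_t *value, const char *name)
{
    v->type_int32(v, value, name);
}

/* Hypothetical device state. */
typedef struct FooState {
    int32_t guest_connected;
    int32_t irq_level;
} FooState;

/* One routine serves both save and load. Before the mechanical conversion
 * this would have been: qemu_put_be32(f, s->guest_connected); and so on. */
static void foo_visit(ToyVisitor *v, FooState *s)
{
    toy_visit_int32(v, &s->guest_connected, "guest_connected");
    toy_visit_int32(v, &s->irq_level, "irq_level");
}
```

A JSON or QObject visitor would slot in the same way, using the name argument that the byte-stream visitors ignore; that is what would let device code be serialized to QEMUFile, QObjects or JSON without being touched again.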
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/29/2011 03:14 PM, Anthony Liguori wrote: I really hate the idea of changing the migration format moments before the release. So do I, but that's life. Since subsections are optional, can't we take the offending subsections, remove them, bump the section version numbers and make the fields required? The bug happens when you migrate from 0.15 to 0.14, and 0.14 didn't have any subsection for that device. This happens in pretty much all cases that were added in 0.15. It quickly makes a bigger patch than this one, and actually one that's harder to review. At least with this one, if things go wrong they can only go _royally_ wrong, and any serious automated test would catch it. Paolo
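The ambiguity this series fixes (a destination without subsections for a struct reading a 0x05 subsection marker as its own parent data) and the effect of a sentinel can be shown with a toy decoder. This is a sketch under stated assumptions: the one-byte fields, the one-byte subsection payload and the 0xFF sentinel value are all hypothetical and do not reproduce QEMU's actual wire format; only the 0x05 marker byte comes from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* 0x05 is the subsection marker byte discussed in the thread; the
 * end-of-struct sentinel value 0xFF is a hypothetical choice for this sketch. */
enum { TOY_SUBSECTION = 0x05, TOY_END_OF_STRUCT = 0xFF };

/* Hypothetical state: x belongs to an embedded struct, extra to one of its
 * optional subsections, y to the parent that follows it in the stream. */
typedef struct ToyState {
    uint8_t x;
    uint8_t extra;
    uint8_t y;
} ToyState;

/* Legacy-style load on a destination with no subsections defined for the
 * embedded struct: it never looks for a marker, so whatever byte follows x
 * is taken as the parent's field y. Returns the number of bytes consumed. */
static size_t load_legacy(const uint8_t *buf, ToyState *s)
{
    size_t pos = 0;
    s->x = buf[pos++];
    s->y = buf[pos++]; /* silently swallows a subsection marker */
    return pos;
}

/* Sentinel-style load: after the embedded struct only a subsection marker or
 * the sentinel may follow, so parent data is never mistaken for either.
 * Returns bytes consumed, or 0 on a truncated stream or unknown subsection. */
static size_t load_with_sentinel(const uint8_t *buf, size_t len, ToyState *s,
                                 int knows_subsection)
{
    size_t pos = 0;
    if (len < 1) {
        return 0;
    }
    s->x = buf[pos++];
    for (;;) {
        if (pos >= len) {
            return 0;
        }
        uint8_t tag = buf[pos++];
        if (tag == TOY_END_OF_STRUCT) {
            break;
        }
        if (tag != TOY_SUBSECTION || !knows_subsection) {
            return 0; /* fail the migration instead of misreading it */
        }
        if (pos >= len) {
            return 0;
        }
        s->extra = buf[pos++];
    }
    if (pos >= len) {
        return 0;
    }
    s->y = buf[pos++];
    return pos;
}
```

With the sentinel, an old destination that meets an unknown subsection fails loudly instead of loading half a device, which matches the "things can only go royally wrong" property argued for above. As in the actual series, such a format change would only be enabled for new machine types.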
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
Am 29.07.2011 16:28, schrieb Anthony Liguori: On 07/29/2011 09:03 AM, Kevin Wolf wrote: Am 26.07.2011 14:37, schrieb Anthony Liguori: Hrm, I'm not sure I agree with these conclusions. Management tools should do their best job to create two compatible device models. Given two compatible device models, there *may* be differences in the structure of the device models since we evolve things over time. We may rename a field, change the type, etc. To support this, we can use filters on both the source and destination end to do our best to massage the device model into something compatible. But creating two compatible device models is not the job of the migration protocol. It's the job of management tools. I'm not sure if I agree with this. Let's forget about management tools for a moment, and just think of a qemu instance with a given set of command-line options describing its devices. Then you start another instance with different options and -incoming and start a migration. The result will be something, but definitely not a successfully migrated VM (even though it might look like one at first). This is why I believe that the information about which devices to use actually belongs in the migration data. There's no way to make use of it with different options. I agree with you actually. Right now, it's the management tools' job. The complexity is daunting. Recreating the same object model, particularly after hotplug, is difficult and in many cases impossible. I think the primary problem is that there are so many ways to create things. -M pc creates a bunch of stuff that there's no way to create individually. The stuff it creates can sort of be manipulated by using -global but not on a per-device basis. Much of it isn't even addressable. Creating backends is a totally different mechanism and each backend has different mechanisms to enumerate and create. 
And backends are actually something totally different: They are the part that you can't migrate automatically, but that you must create on the destination host like we do today. The paths to images etc. could be completely different from the source host. The one change for backends is that if we migrate a device in a way so that it can say "I need the block backend with the ID 'foo'", then we can at least make sure that the backend actually exists and is usable. The result is that introspecting what's there and recreating it is insanely complex today. That's the motivation behind QOM. plug_list lists *everything*. All objects, whether they are created as part of the PIIX3 or whether it's a backing file, can be directly addressed and manipulated. If you look at qsh, there's an import command. The export command is trivial and I don't remember if I've already added it. But the point is that you should be able to 'qsh export' everything and then 'qsh import' everything to create the exact same device model in another QEMU instance. And yeah, this should end up becoming part of the migration protocol. If all you're saying is that we can't get it tomorrow, that's fine with me. Good to know that we agree on the goal anyway. 5) Once we're here, we can implement the next 5-year format. That could be ASN.1 and be bidirectional or whatever makes the most sense. We could support 50 formats if we wanted to. As long as the transport is distinct from the serialization and compat routines, it really doesn't matter. This means finishing the VMState support; once there, the only thing that needs to change is to copy the savevm code and change the visitors to whatever else we need/want. There's no need to finish VMState to convert to visitors. It's just sed -e 's:qemu_put_be32:visit_type_int32:g' Actually I think the real question is whether we want to have VMState or not. VMState doesn't give me what I want by itself. 
I want to be able to marshal the device tree to an in-memory representation that can be manipulated. One approach to doing that is first completing VMState, and then writing something that can walk the VMState descriptions. The VMState descriptions are fairly complicated but it's doable. Another approach, which I'm arguing is much simpler, is to keep the imperative nature of our current serialization and use visitors. There may be other advantages of a declarative description of VMState that would justify completing the conversions. But I don't think we need it to start improving the migration protocol. Yeah, I somehow read it as there's no reason to continue with converting to VMState, which isn't what you were saying. Kevin
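Once device state is marshalled to an in-memory representation, the compatibility step proposed in this thread (a function like e1000_migration_compatibility() that rewrites one version's state into another's) becomes ordinary data manipulation. A minimal sketch, assuming a hypothetical name/value representation: DevState, find_field(), floppy_v2_to_v1() and the "media_changed" field name are illustrative, not QEMU code. The filter drops a field the older version does not know, but refuses when that field holds non-default, guest-visible state.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8

/* Hypothetical in-memory form of one device's state: named integer fields. */
typedef struct Field {
    const char *name;
    long value;
} Field;

typedef struct DevState {
    int version;
    size_t nfields;
    Field fields[MAX_FIELDS];
} DevState;

static Field *find_field(DevState *s, const char *name)
{
    for (size_t i = 0; i < s->nfields; i++) {
        if (strcmp(s->fields[i].name, name) == 0) {
            return &s->fields[i];
        }
    }
    return NULL;
}

/* Remove one field, shifting the later ones down to keep the array packed. */
static void remove_field(DevState *s, Field *f)
{
    size_t idx = (size_t)(f - s->fields);
    memmove(f, f + 1, (s->nfields - idx - 1) * sizeof(*f));
    s->nfields--;
}

/* Hypothetical v2 -> v1 filter: v1 has no "media_changed" field, so drop it,
 * but only while it holds its default (0). Returns 0 on success, -1 if the
 * state cannot be expressed in v1 without losing guest-visible information. */
static int floppy_v2_to_v1(DevState *s)
{
    if (s->version != 2) {
        return -1;
    }
    Field *f = find_field(s, "media_changed");
    if (f != NULL) {
        if (f->value != 0) {
            return -1;
        }
        remove_field(s, f);
    }
    s->version = 1;
    return 0;
}
```

Refusing instead of silently dropping is one way to address the concern, raised in this thread, that a device launched with a feature cannot in general be migrated as if it lacked that feature.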
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/29/2011 10:18 AM, Kevin Wolf wrote: Am 29.07.2011 16:28, schrieb Anthony Liguori: The one change for backends is that if we migrate a device in a way so that it can say "I need the block backend with the ID 'foo'", then we can at least make sure that the backend actually exists and is usable. Yup. So with QOM, this could work in a couple of ways. You could dump the full graph including the backends, and then recreate it but not realize any objects. This would give you a chance to make changes to things like the block device filenames. It could be as simple as just changing the filename of a device, or deleting a complex block device chain (from backing files) and replacing it with something totally different. I think the common case is that the backends are much the same so I think an interface centered around recreating the backends verbatim but allowing tweaks would probably be the friendliest. We could also require that the backends are created before we migrate the device model. In QOM, while you would be allowed to create a virtio-blk device, when you tried to set the drive property to 'foo', you'd get an error unless the 'foo' backend existed and was of the appropriate type. Since it's pretty easy to enumerate the required backends, it's really not so bad for the management tools to do this work. My only concern is that this all has to happen in the migration downtime window in order for hotplug to work robustly. The result is that introspecting what's there and recreating it is insanely complex today. That's the motivation behind QOM. plug_list lists *everything*. All objects, whether they are created as part of the PIIX3 or whether it's a backing file, can be directly addressed and manipulated. If you look at qsh, there's an import command. The export command is trivial and I don't remember if I've already added it. 
But the point is that you should be able to 'qsh export' everything and then 'qsh import' everything to create the exact same device model in another QEMU instance. And yeah, this should end up becoming part of the migration protocol. If all you're saying is that we can't get it tomorrow, that's fine with me. Good to know that we agree on the goal anyway. Yup :-) 5) Once we're here, we can implement the next 5-year format. That could be ASN.1 and be bidirectional or whatever makes the most sense. We could support 50 formats if we wanted to. As long as the transport is distinct from the serialization and compat routines, it really doesn't matter. This means finishing the VMState support; once there, the only thing that needs to change is to copy the savevm code and change the visitors to whatever else we need/want. There's no need to finish VMState to convert to visitors. It's just sed -e 's:qemu_put_be32:visit_type_int32:g' Actually I think the real question is whether we want to have VMState or not. VMState doesn't give me what I want by itself. I want to be able to marshal the device tree to an in-memory representation that can be manipulated. One approach to doing that is first completing VMState, and then writing something that can walk the VMState descriptions. The VMState descriptions are fairly complicated but it's doable. Another approach, which I'm arguing is much simpler, is to keep the imperative nature of our current serialization and use visitors. There may be other advantages of a declarative description of VMState that would justify completing the conversions. But I don't think we need it to start improving the migration protocol. Yeah, I somehow read it as there's no reason to continue with converting to VMState, which isn't what you were saying. No, not at all. Just that converting everything to VMState isn't a prerequisite for building a more robust migration protocol. Regards, Anthony Liguori Kevin
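The rule described above (setting a virtio-blk device's drive property to 'foo' fails unless a backend with that ID exists and has the right type) can be sketched as a registry lookup. This is an illustration under assumptions: Backend, BackendType, VirtioBlkDev, find_backend() and set_drive_property() are hypothetical names, not QOM's API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend registry; these names are illustrative, not QOM's API. */
typedef enum { BACKEND_BLOCK, BACKEND_CHAR } BackendType;

typedef struct Backend {
    const char *id;
    BackendType type;
} Backend;

typedef struct VirtioBlkDev {
    const Backend *drive; /* NULL until the "drive" property is set */
} VirtioBlkDev;

static const Backend *find_backend(const Backend *reg, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(reg[i].id, id) == 0) {
            return &reg[i];
        }
    }
    return NULL;
}

/* Setting the property fails unless a backend with that ID exists and has
 * the right type. Returns 0 on success, -1 if missing, -2 on a type mismatch. */
static int set_drive_property(VirtioBlkDev *dev, const Backend *reg, size_t n,
                              const char *id)
{
    const Backend *b = find_backend(reg, n, id);
    if (b == NULL) {
        return -1;
    }
    if (b->type != BACKEND_BLOCK) {
        return -2;
    }
    dev->drive = b;
    return 0;
}
```

Under this model a management tool would create the backends on the destination first (for example "foo" as a block backend), then let the device model arrive; a missing or mistyped backend is reported at property-set time rather than after the guest resumes.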
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote: We also need a way to future proof ourselves. == What we can do == 1) Add migration capabilities to future proof ourselves. I think the simplest way this would work is to have a 'query-migration-capabilities' command that returned a bitmask of supported migration features. I think we also introduce a 'set-migration-capabilities' command that can mask some of the supported features. A management tool would query the migration features on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination. Lack of support for these commands indicates a mask of zero which is the protocol we offer today. This sounds like a very good idea to me. 5) We could support 50 formats if we wanted to. As long as the transport is distinct from the serialization and compat routines, it really doesn't matter. Let's not get too carried away :-) Even just dealing with the different ways libvirt can invoke and manage the migration process gives me ~100 test scenarios to run through for each release of libvirt. The fewer QEMU testing combinations we need to worry about the better, because it quickly explodes with migration as you throw different versions into the mix. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
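The proposed negotiation (query a capability bitmask on both sides, intersect, set the result on both, and treat a missing query command as mask zero) reduces to a few bit operations. A minimal sketch: the MIG_CAP_* bit names and the helper functions are hypothetical; the thread proposes only the 'query-migration-capabilities' and 'set-migration-capabilities' commands, not any specific bits.

```c
#include <stdint.h>

/* Hypothetical capability bits; no specific bits are defined in the proposal. */
enum {
    MIG_CAP_SUBSECTION_SENTINEL = 1u << 0,
    MIG_CAP_SECTION_SIZES       = 1u << 1,
    MIG_CAP_JSON_DEVICE_STATE   = 1u << 2,
};

/* A peer that lacks the query command is treated as mask 0, i.e. the
 * protocol offered today. */
static uint32_t query_caps(int has_query_command, uint32_t advertised)
{
    return has_query_command ? advertised : 0;
}

/* The management tool intersects both masks and sets the result on both sides. */
static uint32_t negotiate_caps(uint32_t src, uint32_t dst)
{
    return src & dst;
}

/* Bits one side offers that the other lacks; useful for per-bit error messages. */
static uint32_t missing_caps(uint32_t wanted, uint32_t offered)
{
    return wanted & ~offered;
}
```

Because each bit has a fixed meaning, missing_caps() is what would let a management tool report exactly which feature prevents a given source/destination pairing, which keeps the number of combinations that actually need testing bounded by the bits in use.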
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote: On 07/25/2011 04:10 PM, Paolo Bonzini wrote: On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini pbonz...@redhat.com wrote: With the current migration format, VMS_STRUCTs with subsections are ambiguous. The protocol cannot tell whether a 0x5 byte after the VMS_STRUCT is a subsection or part of the parent data stream. In the past QEMU assumed it was always a part of a subsection; after commit eb60260 (savevm: fix corruption in vmstate_subsection_load(), 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections defined. Unfortunately, this means that if a destination has no subsections defined for the struct, it will happily read subsection data into its own fields. And if you are lucky enough to stumble on a zero byte at the right time, it will be interpreted as QEMU_VM_EOF and migration will be interrupted with half-loaded state. There is no way out of this except defining an incompatible migration protocol. Not-so-long-term we should really try to define one that is not a joke, but the bug is serious so we need a solution for 0.15. A sentinel at the end of embedded structs does remove the ambiguity. Of course, this can be restricted to new machine models, and this is what the patch series does. (And note that only patch 3 is specific to the short-term solution; everything else is entirely generic.) Untested beyond compilation. I have now tested this series (exactly as sent) both by examining manually the differences between the two formats on the same guest state, and by a mix of saves/restores (new on new, 0.14 on new pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL). It always does what is expected. Michael Tsirkin objected that the format should be passed as a parameter in the migrate command. I kind of agree; however, since this is a real bug you would need to bump the default for new machine types, and this default would still go in the QEMUMachine struct like I am doing. 
So I consider the two settings to be orthogonal. Also, the alternative requires changes to the whole management stack and if the default is not changed it imposes a broken format unless you update the management tools. Clearly much less bang for the buck. I think this is ready to go into 0.15. I'll take a look for 0.15. The bug happens when migrating to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a floppy. The media changed subsection is almost always included, and this causes problems when migrating to 0.14 which didn't have any subsection for the floppy device. While QEMU support for migration to old version admittedly depends on luck, this isn't true of certain downstreams :) which would like to have an unambiguous migration format. So this got me thinking about where we're at with migration and where we need to go. I actually think there might be a reasonable path forward if we attack the problem differently than we have so far. == Today == Today we only support generating the latest serialization of devices. To increase the probability of the latest version working on older versions of QEMU, we strategically omit fields that we know can safely be omitted with older versions (subsections). More than likely, migrating new to old won't work. Migrating old to new is more likely to work. We version each section in order to be able to identify when we're dealing with old. But all of this logic lives in one of two forms. Either as a savevm/loadvm callback that takes a QEMUFile and writes byte serialization to the stream in an open way (usually big endian) or encoded declaratively in a VMState section. == What we need == We need to decompose migration into three different problems: 1) serializing device state 2) transforming the device model in order to satisfy forwards and backwards compatibility 3) encoding the serialized device model on the wire. We also need a way to future proof ourselves. 
== What we can do == 1) Add migration capabilities to future proof ourselves. I think the simplest way this would work is to have a 'query-migration-capabilities' command that returned a bitmask of supported migration features. I think we also introduce a 'set-migration-capabilities' command that can mask some of the supported features. A management tool would query the migration features on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination. Lack of support for these commands indicates a mask of zero which is the protocol we offer today. When the management tool drives negotiation it is possible to do nice error reporting (each capability bit has a meaning and detailed incompatibility errors can be generated). However, doing so imposes extra work on management tools: they need to understand and drive negotiation. If QEMU adds a new
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
Anthony Liguori anth...@codemonkey.ws wrote: On 07/25/2011 04:10 PM, Paolo Bonzini wrote: == Today == Today we only support generating the latest serialization of devices. To increase the probability of the latest version working on older versions of QEMU, we strategically omit fields that we know can safely be omitted with older versions (subsections). More than likely, migrating new to old won't work. Migrating old to new is more likely to work. We version each section in order to be able to identify when we're dealing with old. But all of this logic lives in one of two forms. Either as a savevm/loadvm callback that takes a QEMUFile and writes byte serialization to the stream in an open way (usually big endian) or encoded declaratively in a VMState section. We have a very poor way to try to load a device without some features; support is very bad. == What we need == We need to decompose migration into three different problems: 1) serializing device state 2) transforming the device model in order to satisfy forwards and backwards compatibility 3) encoding the serialized device model on the wire. I would change this to: - We need to be able to enable/disable features of a device, a.k.a. make -M pc-0.14 work with devices with the same features as 0.14. Notice that this is _independent_ of migration. - Be able to describe the different features/versions. This is not the difficult part; it can be subsections, optional fields, whatever. The difficult part is _knowing_ what fields need to be in each version. That again depends on the device, not migration. - Be able to do forward/backward compatibility (without communication between both sides this is basically impossible). - Send things on the wire (really this is the easy part; we can play with it touching only migration functions). We also need a way to future proof ourselves. We have been very bad at this. Automatic checking is the only way that I can think of. 
== What we can do == 1) Add migration capabilities to future proof ourselves. I think the simplest way this would work is to have a 'query-migration-capabilities' command that returned a bitmask of supported migration features. I think we also introduce a 'set-migration-capabilities' command that can mask some of the supported features. We have two things here: device level and protocol level. Device level: it is very late to set anything. Protocol level: we can set things here, but notice that only a few things can be set here. A management tool would query the migration features on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination. Lack of support for these commands indicates a mask of zero which is the protocol we offer today. 2) Switch to a visitor model to serialize device state. This involves converting any occurrence of: qemu_put_be32(f, port->guest_connected); To: visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err); VMSTATE_INT32(guest_connected, FooState) can be made to do this at any point. It's 100% mechanical and makes absolutely no logic change. It works equally well with legacy and VMstate migration handlers. 3) Add a Visitor class that operates on QEMUFile. At this state, we can migrate to data structures. That means we can migrate to QEMUFile, QObjects, or JSON. We could change the protocol at this stage to something that was still binary but had section sizes and things of that nature. That was the whole point of vmstate. But we shouldn't stop here. 4) Compatibility logic should be extracted from the savevm functions and VMstate functions into separate functions that take a data structure. 
Basically, we want to have something roughly equivalent to: QObject *e1000_migration_compatibility(QObject *src, int src_version, int dst_version); We can have lots of helpers that reuse the VMstate declarative stuff to do this but this should be registered independently of the main serialization handler. This moves us to a model where we always generate the latest serialization format, and then have specific ways to convert to older mechanisms. It allows us to do very big backwards compatibility steps like convert the state of one device into two separate devices (because we're just dealing with in-memory data structures). Paint me skeptical about this. I don't think this is going to work, because those functions will rot very fast. It's this step that lets us truly support compatibility with migration. The good news is, it doesn't have to be all or nothing. Since we always already generate the latest serialization format, the existing code only deals with migrating older versions to the latest which is something that isn't all that important. So if we did this in 1.0, we could have a single function that converted the 1.0 device model to 1.1 and vice versa, and we'd be fine. We wouldn't have
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/26/2011 07:07 AM, Juan Quintela wrote: Anthony Liguori anth...@codemonkey.ws wrote: == What we need == We need to decompose migration into three different problems: 1) serializing device state 2) transforming the device model in order to satisfy forwards and backwards compatibility 3) encoding the serialized device model on the wire. I would change this to: - We need to be able to enable/disable features of a device, a.k.a. make -M pc-0.14 work with devices with the same features as 0.14. Notice that this is _independent_ of migration. In theory, we already have this with qdev flags. - Be able to describe the different features/versions. This is not the difficult part; it can be subsections, optional fields, whatever. The difficult part is _knowing_ what fields need to be in each version. That again depends on the device, not migration. - Be able to do forward/backward compatibility (without communication between both sides this is basically impossible). Hrm, I'm not sure I agree with these conclusions. Management tools should do their best job to create two compatible device models. Given two compatible device models, there *may* be differences in the structure of the device models since we evolve things over time. We may rename a field, change the type, etc. To support this, we can use filters on both the source and destination end to do our best to massage the device model into something compatible. But creating two compatible device models is not the job of the migration protocol. It's the job of management tools. - Send things on the wire (really this is the easy part; we can play with it touching only migration functions). We also need a way to future proof ourselves. We have been very bad at this. Automatic checking is the only way that I can think of. I don't know what you mean by automatic checking. == What we can do == 1) Add migration capabilities to future proof ourselves. 
I think the simplest way this would work is to have a 'query-migration-capabilities' command that returned a bitmask of supported migration features. I think we also introduce a 'set-migration-capabilities' command that can mask some of the supported features. We have two things here: device level and protocol level. Device level: it is very late to set anything. Protocol level: we can set things here, but notice that only a few things can be set here. Once we have a protocol level feature bit, we can add device level feature bits as a new feature. A management tool would query the migration features on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination. Lack of support for these commands indicates a mask of zero which is the protocol we offer today. 2) Switch to a visitor model to serialize device state. This involves converting any occurrence of: qemu_put_be32(f, port->guest_connected); To: visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err); VMSTATE_INT32(guest_connected, FooState) can be made to do this at any point. It's 100% mechanical and makes absolutely no logic change. It works equally well with legacy and VMstate migration handlers. 3) Add a Visitor class that operates on QEMUFile. At this state, we can migrate to data structures. That means we can migrate to QEMUFile, QObjects, or JSON. We could change the protocol at this stage to something that was still binary but had section sizes and things of that nature. That was the whole point of vmstate. The problem with vmstate is that it's an all or nothing thing and the conversion isn't programmatic. Since visiting and qemu_put match 1-1, we can do the conversion all-at-once with some sed magic. So if we did this in 1.0, we could have a single function that converted the 1.0 device model to 1.1 and vice versa, and we'd be fine. We wouldn't have to touch 200 devices to do this. I still think this is wrong. 
We are launching a device with feature foo, and at migration time, we want to migrate without feature foo. This is not going to work in the general case. But launching the device _without_ feature foo will always work. Don't confuse migration with creating compatible device models. We're never going to support migrating from a system with an e1000 to a system with virtio :-) Notice the things that can be optional: - features that are not used. We update the device to have more features, but the OS driver only uses the features of the old version. With subsection tests, we can fix this one. - values that are only needed sometimes. The PIO subsection comes to mind; it is only needed when we are in the middle of a PIO operation. - values that rarely change from their defaults. This is the MMIO address problem with rtl8139. If we plug/unplug the card, we will get a different address, so we need to change it. - values that depend on other features (change default
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Tue, Jul 26, 2011 at 10:48 AM, Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote: On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote: On 07/25/2011 04:10 PM, Paolo Bonzini wrote: On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini pbonz...@redhat.com wrote: With the current migration format, VMS_STRUCTs with subsections are ambiguous. The protocol cannot tell whether a 0x5 byte after the VMS_STRUCT is a subsection or part of the parent data stream. In the past QEMU assumed it was always a part of a subsection; after commit eb60260 (savevm: fix corruption in vmstate_subsection_load(), 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections defined. Unfortunately, this means that if a destination has no subsections defined for the struct, it will happily read subsection data into its own fields. And if you are lucky enough to stumble on a zero byte at the right time, it will be interpreted as QEMU_VM_EOF and migration will be interrupted with half-loaded state. There is no way out of this except defining an incompatible migration protocol. Not-so-long-term we should really try to define one that is not a joke, but the bug is serious so we need a solution for 0.15. A sentinel at the end of embedded structs does remove the ambiguity. Of course, this can be restricted to new machine models, and this is what the patch series does. (And note that only patch 3 is specific to the short-term solution; everything else is entirely generic.) Untested beyond compilation. I have now tested this series (exactly as sent) both by examining manually the differences between the two formats on the same guest state, and by a mix of saves/restores (new on new, 0.14 on new pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL). It always does what is expected. Michael Tsirkin objected that the format should be passed as a parameter in the migrate command. 
I kind of agree, however since this is a real bug you would need to bump the default for new machine types, and this default would still go in the QEMUMachine struct like I am doing. So I consider the two settings to be orthogonal. Also, the alternative requires changes to the whole management stack and if the default is not changed it imposes a broken format unless you update the management tools. Clearly much less bang for the buck. I think this is ready to go into 0.15. I'll take a look for 0.15. The bug happens when migrating to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a floppy. The media changed subsection is almost always included, and this causes problems when migrating to 0.14 which didn't have any subsection for the floppy device. While QEMU support for migration to old version admittedly depends on luck, this isn't true of certain downstreams :) which would like to have an unambiguous migration format. So this got me thinking about where we're at with migration and where we need to go. I actually think there might be a reasonable path forward if we attack the problem differently than we have so far. == Today == Today we only support generating the latest serialization of devices. To increase the probability of the latest version working on older versions of QEMU, we strategically omit fields that we know can safely be omitted with older versions (subsections). More than likely, migrating new to old won't work. Migrating old to new is more likely to work. We version each section in order to be able to identify when we're dealing with old. But all of this logic lives in one of two forms. Either as a savevm/loadvm callback that takes a QEMUFile and writes byte serialization to the stream in an open way (usually big endian) or encoded declaratively in a VMState section. 
== What we need == We need to decompose migration into three different problems: 1) serializing device state 2) transforming the device model in order to satisfy forwards and backwards compatibility 3) encoding the serialized device model on the wire. We also need a way to future proof ourselves. == What we can do == 1) Add migration capabilities to future proof ourselves. I think the simplest way this would work is to have a 'query-migration-capabilities' command that returned a bitmask of supported migration features. I think we also introduce a 'set-migration-capabilities' command that can mask some of the supported features. A management tool would query the migration features on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination. Lack of support for these commands indicates a mask of zero which is the protocol we offer today. When the management tool drives negotiation it is possible to do nice error reporting (each capability bit has a meaning and detailed incompatibility errors can be generated). However, doing so imposes extra work on management
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
Anthony Liguori anth...@codemonkey.ws wrote: On 07/26/2011 07:07 AM, Juan Quintela wrote: I will change this to: - We need to be able to enable/disable features of a device. A.K.A. make -M pc-0.14 work with devices with the same features as 0.14. Notice that this is _independent_ of migration. In theory, we already have this with qdev flags. Theory. We are not there at all :-( But anyway, that is not _migration_, it is qdev. - Be able to describe those different features/versions. This is not the difficult part; it can be subsections, optional fields, whatever. The difficult part is _knowing_ which fields need to be in each version. That again depends on the device, not on migration. - Be able to do forward/backward compatibility (and without communication between both sides, that is basically impossible). Hrm, I'm not sure I agree with these conclusions. Management tools should do their best job to create two compatible device models. How? The only part that can have enough information is the new part (either source or destination). And we are being very careful about not allowing any communication/setting of what is on the other side. Given two compatible device models, there *may* be differences in the structure of the device models since we evolve things over time. We may rename a field, change the type, etc. To support this, we can use filters on both the sending and the receiving end to do our best to massage the device model into something compatible. But creating two compatible device models is not the job of the migration protocol. It's the job of management tools. Agreed here. - Send things on the wire (really this is the easy part, we can play with it touching only migration functions). We also need a way to future proof ourselves. We have been very bad at this. Automatic checking is the only way that I can think of. I don't know what you mean by automatic checking. We should have unit tests to check that (at least) the obvious migration cases work.
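Here is a sketch of the kind of "obvious migration works" unit test discussed here. It saves a toy device state into an in-memory buffer, loads it into a fresh object, and requires identical state with the whole stream consumed. `MemFile`, `PortState` and the helpers are hypothetical stand-ins, not QEMU's `QEMUFile` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a QEMUFile. */
typedef struct {
    uint8_t buf[256];
    size_t len;   /* bytes written */
    size_t pos;   /* read cursor */
} MemFile;

static void put_be32(MemFile *f, uint32_t v)
{
    assert(f->len + 4 <= sizeof(f->buf));
    for (int shift = 24; shift >= 0; shift -= 8) {
        f->buf[f->len++] = (uint8_t)(v >> shift);
    }
}

static uint32_t get_be32(MemFile *f)
{
    uint32_t v = 0;
    assert(f->pos + 4 <= f->len);
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | f->buf[f->pos++];
    }
    return v;
}

/* Toy device state; field names are illustrative only. */
typedef struct {
    uint32_t guest_connected;
    uint32_t throttled;
} PortState;

static void port_save(MemFile *f, const PortState *s)
{
    put_be32(f, s->guest_connected);
    put_be32(f, s->throttled);
}

static void port_load(MemFile *f, PortState *s)
{
    s->guest_connected = get_be32(f);
    s->throttled = get_be32(f);
}

/* Save, load into a fresh object, and require identical state with
 * every byte of the stream consumed. */
static int port_roundtrip_ok(const PortState *in)
{
    MemFile f = { .len = 0, .pos = 0 };
    PortState out;

    memset(&out, 0, sizeof(out));
    port_save(&f, in);
    port_load(&f, &out);
    return f.pos == f.len &&
           out.guest_connected == in->guest_connected &&
           out.throttled == in->throttled;
}
```

Requiring `pos == len` also catches a loader that silently leaves trailing bytes unread. Per the rest of this thread, that is exactly how subsection bugs tend to show up.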
We have two things here: device level and protocol level. Device level: very late to set anything. Protocol level: we can set things here, but notice that only a few things can be set here. Once we have a protocol level feature bit, we can add device level feature bits as a new feature. This doesn't help: migration time is very late to configure a device. We need to configure it at creation time. It makes no sense to try to migrate device foo with 4 bars and at migration time try to push it into only 2 bars. Having it created with 2 bars in the 1st place is the only sane solution. It's 100% mechanical and makes absolutely no logic change. It works equally well with legacy and VMState migration handlers. 3) Add a Visitor class that operates on QEMUFile. At this stage, we can migrate to data structures. That means we can migrate to QEMUFile, QObjects, or JSON. We could change the protocol at this stage to something that was still binary but had section sizes and things of that nature. That was the whole point of vmstate. The problem with vmstate is that it's an all or nothing thing and the conversion isn't programmatic. This is the whole point. We are being declarative, and we create a mechanism for visiting all nodes. What we do in each node is not VMState business. VMState only defines the nodes, and which ones belong to each version. Since visiting and qemu_put match 1-1, we can do the conversion all at once with some sed magic. So if we did this in 1.0, we could have a single function that converted the 1.0 device model to 1.1 and vice versa, and we'd be fine. We wouldn't have to touch 200 devices to do this. I still think this is wrong. We are launching a device with feature foo, and at migration time, we want to migrate without feature foo. This is not going to work in the general case. But launching the device _without_ feature foo will always work. Don't confuse migration with creating compatible device models.
We're never going to support migrating from a system with an e1000 to a system with virtio :-) I am not confusing it. From virtio_serial_bus.c, you can see that version 3 sends much, much more information than version 2. Migrating from v3 to v2 is impossible; we need to start the device with the features that it had in v2. Notice that this is not related to vmstate; it is related to _how_ the device works/features are implemented. Notice the things that can be optional:
- features that are not used. We update the device to have more features, but the OS driver only uses the features of the old version. With subsection tests, we can fix this one.
- values that are only needed sometimes. The PIO subsection comes to mind; it is only needed when we are in the middle of a PIO operation.
- values that rarely change from their defaults. This is the mmio addresses problem with rtl8139. If we
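The two kinds of optional state described here can be sketched together in C:

- a guest-visible feature that the machine type fixes at creation time;
- transient state that is only worth sending while it matters (the PIO case).

This is a hypothetical toy. It assumes a one-byte subsection id in place of the name string and version that real QEMU subsections carry. Only the `QEMU_VM_SUBSECTION` value is taken from QEMU; `ToyDev`, `toydev_create()` and the rest are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QEMU_VM_SUBSECTION 0x05   /* same value as in QEMU's savevm.c */

/* Hypothetical one-byte subsection ids; real subsections carry a name
 * string and a version instead. */
enum { SUB_FOO = 1, SUB_PIO = 2 };

typedef struct {
    bool foo_enabled;       /* guest-visible feature, fixed at creation */
    uint32_t foo_state;
    uint32_t status;
    bool pio_in_progress;   /* transient: only true mid-transfer */
    uint32_t pio_offset;
} ToyDev;

/* Creation time: the machine type decides the feature set. */
static ToyDev toydev_create(const char *machine)
{
    ToyDev d = { .foo_enabled = true };
    if (strcmp(machine, "pc-0.14") == 0) {
        d.foo_enabled = false;    /* 0.14 never had feature foo */
    }
    return d;
}

typedef struct {
    uint8_t buf[64];
    size_t len;
} Out;

static void put_u8(Out *o, uint8_t v)
{
    assert(o->len < sizeof(o->buf));
    o->buf[o->len++] = v;
}

static void put_be32(Out *o, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        put_u8(o, (uint8_t)(v >> shift));
    }
}

static void put_subsection(Out *o, uint8_t id, uint32_t payload)
{
    put_u8(o, QEMU_VM_SUBSECTION);
    put_u8(o, id);
    put_be32(o, payload);
}

/* Save time: each optional block has its own "needed" test. */
static void toydev_save(Out *o, const ToyDev *d)
{
    put_be32(o, d->status);
    if (d->foo_enabled) {           /* feature not configured: never sent */
        put_subsection(o, SUB_FOO, d->foo_state);
    }
    if (d->pio_in_progress) {       /* only needed in the middle of PIO */
        put_subsection(o, SUB_PIO, d->pio_offset);
    }
}
```

A pc-0.14 device never produces the feature subsection, so the stream stays readable by 0.14. That holds only because the decision was made when the device was created, not at migration time.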
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/26/2011 03:13 PM, Juan Quintela wrote: Anthony Liguori anth...@codemonkey.ws wrote: On 07/26/2011 07:07 AM, Juan Quintela wrote: - Be able to describe those different features/versions. This is not the difficult part; it can be subsections, optional fields, whatever. The difficult part is _knowing_ which fields need to be in each version. That again depends on the device, not on migration. - Be able to do forward/backward compatibility (and without communication between both sides, that is basically impossible). Hrm, I'm not sure I agree with these conclusions. Management tools should do their best job to create two compatible device models. How? The only part that can have enough information is the new part (either source or destination). And we are being very careful about not allowing any communication/setting of what is on the other side. I'll explain below. - Send things on the wire (really this is the easy part, we can play with it touching only migration functions). We also need a way to future proof ourselves. We have been very bad at this. Automatic checking is the only way that I can think of. I don't know what you mean by automatic checking. We should have unit tests to check that (at least) the obvious migration cases work. Oh, 100% agree. In fact, I've posted patches :) But I wasn't happy with the level of completeness of those tests and want to write better tests, which is part of my motivation in visiting this topic. We have two things here: device level and protocol level. Device level: very late to set anything. Protocol level: we can set things here, but notice that only a few things can be set here. Once we have a protocol level feature bit, we can add device level feature bits as a new feature. This doesn't help: migration time is very late to configure a device. We need to configure it at creation time. It makes no sense to try to migrate device foo with 4 bars and at migration time try to push it into only 2 bars.
Having it created with 2 bars in the 1st place is the only sane solution. I misunderstood what you were suggesting. Guest-visible device features must be configured at creation time. I'm in full agreement. It's 100% mechanical and makes absolutely no logic change. It works equally well with legacy and VMState migration handlers. 3) Add a Visitor class that operates on QEMUFile. At this stage, we can migrate to data structures. That means we can migrate to QEMUFile, QObjects, or JSON. We could change the protocol at this stage to something that was still binary but had section sizes and things of that nature. That was the whole point of vmstate. The problem with vmstate is that it's an all or nothing thing and the conversion isn't programmatic. This is the whole point. We are being declarative, and we create a mechanism for visiting all nodes. What we do in each node is not VMState business. VMState only defines the nodes, and which ones belong to each version. Right. Thinking more after the call, I think this may be a better way to explain what I'm proposing. With VMState, we provide a declarative description of each device's state. Because it's declarative, some things end up being tough to describe, like variable sized arrays and complex data structures. You've worked through a lot of these, but this is fundamentally what makes this approach difficult to complete. At the end of VMState conversion, we have a declaration of how to read the current state of the device tree. We can write a function that takes all of the VMState descriptions and builds something from those descriptions. But right now, what we actually have is a routine that takes a VMState data description and then calls a marshalling function. In essence, the data description gets interpreted into an imperative serialization mechanism.
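The point about a declarative description being interpreted into an imperative serialization can be illustrated with a much-simplified field table in the spirit of VMState. The real `VMStateField` has many more types and flags. `FieldDesc`, `SerialState` and `save_by_table()` here are hypothetical. The table only says what the state is, and one generic routine turns it into big-endian bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A much-simplified, hypothetical analogue of a VMStateField table:
 * only fixed-size unsigned integers, written big-endian. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;            /* 1, 2 or 4 bytes */
} FieldDesc;

typedef struct {
    uint32_t status;
    uint16_t irq_mask;
    uint8_t enabled;
} SerialState;

/* The declarative part: what the state is, not how to write it. */
static const FieldDesc serial_fields[] = {
    { "status",   offsetof(SerialState, status),   sizeof(uint32_t) },
    { "irq_mask", offsetof(SerialState, irq_mask), sizeof(uint16_t) },
    { "enabled",  offsetof(SerialState, enabled),  sizeof(uint8_t) },
};

/* The interpreter: walks the table and performs the imperative
 * marshalling. Returns the number of bytes written to out. */
static size_t save_by_table(const void *opaque, const FieldDesc *fields,
                            size_t nfields, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nfields; i++) {
        const uint8_t *p = (const uint8_t *)opaque + fields[i].offset;
        uint32_t v = 0;
        if (fields[i].size == 1) {
            v = *p;
        } else if (fields[i].size == 2) {
            uint16_t t;
            memcpy(&t, p, sizeof(t));
            v = t;
        } else {
            assert(fields[i].size == 4);
            memcpy(&v, p, sizeof(v));
        }
        for (size_t b = fields[i].size; b-- > 0;) {
            out[n++] = (uint8_t)(v >> (8 * b));   /* most significant first */
        }
    }
    return n;
}
```

Anything the table cannot express (variable-sized arrays, linked structures) needs new table machinery. That is the completeness problem described in this message.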
I'm suggesting that instead of trying to eliminate the imperativeness (which will be hard since we have a lot of hooks in various places), we should embrace the imperativeness. Instead of marshalling to a QEMUFile, we marshal to a Visitor, Visitor being an abstraction that can marshal to arbitrary formats/objects. So we never actually walk the VMState tables to do anything. The unconverted, purely imperative routines we just convert to marshal to a Visitor instead of a QEMUFile. What this gives us is a way to achieve the same level of abstraction that VMState would give us for almost no work at all. That fundamentally lets us take the next step in fixing migration. Device with some features -> migration -> device with other features, and it works. This means that migration does magic, and this is never going to work. Until now, this kind of worked because we only supported migration from old -> new, or the same version. Migration from old -> new can never have new features. But for new -> old to work, we need a way to disable the new features.
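Here is a minimal sketch of the Visitor idea. It assumes a visitor is simply a struct of per-type callbacks; QEMU's actual visitor interface is richer and also threads an `Error **`. The device routine `port_visit()` is written once. A binary visitor turns it into the big-endian bytes a QEMUFile would receive, and a text visitor turns the same call into a JSON-like dump. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal Visitor: one callback per primitive type. */
typedef struct Visitor Visitor;
struct Visitor {
    void (*type_u32)(Visitor *v, const char *name, uint32_t *obj);
};

/* Backend 1: big-endian bytes, as a QEMUFile would receive them. */
typedef struct {
    Visitor base;           /* first member, so Visitor * can be cast back */
    uint8_t buf[64];
    size_t len;
} BinVisitor;

static void bin_u32(Visitor *v, const char *name, uint32_t *obj)
{
    BinVisitor *b = (BinVisitor *)v;
    (void)name;             /* the binary stream carries no field names */
    assert(b->len + 4 <= sizeof(b->buf));
    for (int shift = 24; shift >= 0; shift -= 8) {
        b->buf[b->len++] = (uint8_t)(*obj >> shift);
    }
}

/* Backend 2: a JSON-like text rendering of the same fields. */
typedef struct {
    Visitor base;
    char text[128];
} TextVisitor;

static void text_u32(Visitor *v, const char *name, uint32_t *obj)
{
    TextVisitor *t = (TextVisitor *)v;
    size_t used = strlen(t->text);
    snprintf(t->text + used, sizeof(t->text) - used, "%s\"%s\": %u",
             used ? ", " : "", name, (unsigned)*obj);
}

/* Device code is written once, against the abstract Visitor. */
typedef struct {
    uint32_t guest_connected;
    uint32_t host_connected;
} Port;

static void port_visit(Visitor *v, Port *p)
{
    v->type_u32(v, "guest_connected", &p->guest_connected);
    v->type_u32(v, "host_connected", &p->host_connected);
}
```

Adding a new output format means writing one new backend, with no change to any device routine. That is the "same level of abstraction for almost no work" argument in code form.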
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 26 July 2011 22:46, Anthony Liguori anth...@codemonkey.ws wrote: [This is a bit random-sniping at minor points because I'm still thinking about the big-picture bits] So we never actually walk the VMState tables to do anything. The unconverted, purely imperative routines we just convert to marshal to a Visitor instead of a QEMUFile. What this gives us is a way to achieve the same level of abstraction that VMState would give us for almost no work at all. That fundamentally lets us take the next step in fixing migration. IME the problem with migration is not devices which implement old-style imperative save/load routines but all the devices which (silently) implement no kind of save/load support at all... With an improved qdev (which I think is QOM, but for now, just ignore that), we would be able to do the following: 1) create a device that creates other devices as children of itself *without* those children being under a bus hierarchy. This is really important, yes. In fact in some ways the logical partitioning of a system doesn't inherently follow any kind of bus. So a Beagle board is a top-level thing which contains (among other things) an OMAP3 SOC, and some external-to-the-SOC devices like a NAND chip. The OMAP3 contains a CPU and a pile of devices including the GPMC, which is the memory controller for the NAND chip. So the logical structure of the system is:

    beagleboard (the machine in qemu terms)
      - omap3
          - cortex-a8 (cpu)
          - omap_gpmc
          - omap_uart
          - etc
      - NAND flash
      - etc

even though the bus topology is more like:

    cortex-a8
      - omap_uart
      - other system-bus devices
      - omap_gpmc
          - NAND flash
          - other devices hanging off the GPMC

(and the interrupt topology is different again, ditto the clock tree). When you're trying to put together a machine then I think the logical structure is more useful than the memory bus or interrupt tree.
2) eliminate the notion of machines altogether, and instead replace machines with a chipset, soc device, or whatever is the logical device that basically equates to what the machine logic does today. This doesn't make any sense, though. A machine isn't a chipset or a SOC. It's a set of devices (including a CPU) wired up and configured in a particular way. A Beagle and an Overo are definitely different machines (which appear differently to guests in some ways) even though they share the same OMAP3 SOC. The pc machine code is basically the i440fx. You could take everything that it does, call it an i440fx object, and make machine properties properties of the i440fx. That makes what we think of as machine creation identical to device creation. I don't really know enough about PC hardware but I can't help thinking that doing this is basically putting things into the qemu i440fx object which aren't actually in the h/w i440fx. (Is the CPU really part of the chipset, just for example? RAM?) A random other point I'll throw in: along with composition (this device is really the result of wiring up and configuring these other devices like this), you also want to be able to have a device 'hide' and/or make read-only the properties of its subdevices, e.g. where My-SOC-USB implements USB by composing usb-ohci and usb-ehci but hardwires various things the generic OHCI/EHCI leave configurable. Also the machine model will want to hide things for similar reasons. -- PMM
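PMM's composition point, including a parent that hardwires and locks a child's property, can be sketched as a tiny object tree in C. This is a hypothetical model, not QOM. The following are all assumptions made for illustration:

- the `Device` and `Prop` types;
- the `nports` property name;
- the fixed-size child array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object model: a device owns child devices directly
 * (no bus) and can pin a child's property so users cannot change it. */
typedef struct {
    const char *name;
    int value;
    bool read_only;
} Prop;

typedef struct Device {
    const char *type;
    Prop props[4];
    size_t nprops;
    struct Device *children[4];
    size_t nchildren;
} Device;

static Prop *find_prop(Device *d, const char *name)
{
    for (size_t i = 0; i < d->nprops; i++) {
        if (strcmp(d->props[i].name, name) == 0) {
            return &d->props[i];
        }
    }
    return NULL;
}

/* Composition: the parent creates/owns the child, no bus involved. */
static void add_child(Device *parent, Device *child)
{
    assert(parent->nchildren < 4);
    parent->children[parent->nchildren++] = child;
}

/* What a user (or the command line) may do. */
static bool set_prop(Device *d, const char *name, int value)
{
    Prop *p = find_prop(d, name);
    if (!p || p->read_only) {
        return false;
    }
    p->value = value;
    return true;
}

/* What the composing parent does: hardwire the value and lock it. */
static void pin_child_prop(Device *parent, size_t child,
                           const char *name, int value)
{
    Prop *p = find_prop(parent->children[child], name);
    assert(p != NULL);
    p->value = value;
    p->read_only = true;
}
```

The logical tree (`my-soc-usb` owning `usb-ohci`) is independent of any bus topology, which matches the Beagle/OMAP3 example in this message.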
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini pbonz...@redhat.com wrote: With the current migration format, VMS_STRUCTs with subsections are ambiguous. The protocol cannot tell whether a 0x5 byte after the VMS_STRUCT is a subsection or part of the parent data stream. In the past QEMU assumed it was always a part of a subsection; after commit eb60260 (savevm: fix corruption in vmstate_subsection_load(), 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections defined. Unfortunately, this means that if a destination has no subsections defined for the struct, it will happily read subsection data into its own fields. And if you are lucky enough to stumble on a zero byte at the right time, it will be interpreted as QEMU_VM_EOF and migration will be interrupted with half-loaded state. There is no way out of this except defining an incompatible migration protocol. Not-so-long-term we should really try to define one that is not a joke, but the bug is serious so we need a solution for 0.15. A sentinel at the end of embedded structs does remove the ambiguity. Of course, this can be restricted to new machine models, and this is what the patch series does. (And note that only patch 3 is specific to the short-term solution, everything else is entirely generic). Untested beyond compilation. I have now tested this series (exactly as sent) both by examining manually the differences between the two formats on the same guest state, and by a mix of saves/restores (new on new, 0.14 on new pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL). It always does what is expected. Michael Tsirkin objected that the format should be passed as a parameter in the migrate command. I kind of agree, however since this is a real bug you would need to bump the default for new machine types, and this default would still go in the QEMUMachine struct like I am doing. So I consider the two settings to be orthogonal. 
Also, the alternative requires changes to the whole management stack and if the default is not changed it imposes a broken format unless you update the management tools. Clearly much less bang for the buck. I think this is ready to go into 0.15. The bug happens when a pc-0.14 machine that has a floppy, created with QEMU 0.15, is migrated to 0.14. The media changed subsection is almost always included, and this causes problems when migrating to 0.14, which didn't have any subsection for the floppy device. While QEMU support for migration to old versions admittedly depends on luck, this isn't true of certain downstreams :) which would like to have an unambiguous migration format. Paolo
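The ambiguity itself can be reproduced with a toy byte-level loader. This sketch assumes a simplified layout: one field in the embedded struct, a one-byte subsection payload, one parent field after it, then the top-level marker. Real subsections also carry a name and a version. Only the `QEMU_VM_EOF` and `QEMU_VM_SUBSECTION` values are QEMU's; `Loaded` and `load_parent()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QEMU_VM_EOF        0x00   /* values as in QEMU's savevm.c */
#define QEMU_VM_SUBSECTION 0x05

/* Result of loading a toy parent struct { embedded { a } ; b }. */
typedef struct {
    int a, sub_payload, b;   /* -1: not read, or the stream ran out */
    bool hit_eof;            /* top-level loop saw QEMU_VM_EOF */
    size_t consumed;         /* bytes read when loading stopped */
} Loaded;

static int next_byte(const uint8_t *s, size_t len, size_t *i)
{
    return *i < len ? s[(*i)++] : -1;
}

/* Mimics the post-eb60260 rule: 0x05 after the embedded struct is
 * treated as a subsection only if the destination defines one. */
static Loaded load_parent(const uint8_t *s, size_t len, bool dst_has_subsection)
{
    Loaded r = { -1, -1, -1, false, 0 };
    size_t i = 0;

    r.a = next_byte(s, len, &i);                      /* embedded struct field */
    if (dst_has_subsection && i < len && s[i] == QEMU_VM_SUBSECTION) {
        i++;                                          /* guess: a subsection */
        r.sub_payload = next_byte(s, len, &i);
    }
    r.b = next_byte(s, len, &i);                      /* parent's next field */
    r.hit_eof = next_byte(s, len, &i) == QEMU_VM_EOF; /* top-level marker */
    r.consumed = i;
    return r;
}
```

Three cases show the problem:

- **New stream, old destination.** The destination stores the 0x05 marker as `b`, then treats the zero payload byte as `QEMU_VM_EOF`. It stops with bytes still unread: the half-loaded state described in the patch.
- **New stream, new destination.** The stream loads correctly.
- **Old stream, new destination.** If the old stream's `b` happens to be 0x05, a destination that defines the subsection misreads it the other way.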
Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
On 07/25/2011 04:10 PM, Paolo Bonzini wrote: On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini pbonz...@redhat.com wrote: With the current migration format, VMS_STRUCTs with subsections are ambiguous. The protocol cannot tell whether a 0x5 byte after the VMS_STRUCT is a subsection or part of the parent data stream. In the past QEMU assumed it was always a part of a subsection; after commit eb60260 (savevm: fix corruption in vmstate_subsection_load(), 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections defined. Unfortunately, this means that if a destination has no subsections defined for the struct, it will happily read subsection data into its own fields. And if you are lucky enough to stumble on a zero byte at the right time, it will be interpreted as QEMU_VM_EOF and migration will be interrupted with half-loaded state. There is no way out of this except defining an incompatible migration protocol. Not-so-long-term we should really try to define one that is not a joke, but the bug is serious so we need a solution for 0.15. A sentinel at the end of embedded structs does remove the ambiguity. Of course, this can be restricted to new machine models, and this is what the patch series does. (And note that only patch 3 is specific to the short-term solution, everything else is entirely generic). Untested beyond compilation. I have now tested this series (exactly as sent) both by examining manually the differences between the two formats on the same guest state, and by a mix of saves/restores (new on new, 0.14 on new pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL). It always does what is expected. Michael Tsirkin objected that the format should be passed as a parameter in the migrate command. I kind of agree, however since this is a real bug you would need to bump the default for new machine types, and this default would still go in the QEMUMachine struct like I am doing. So I consider the two settings to be orthogonal.
Also, the alternative requires changes to the whole management stack and if the default is not changed it imposes a broken format unless you update the management tools. Clearly much less bang for the buck. I think this is ready to go into 0.15. I'll take a look for 0.15. The bug happens when a pc-0.14 machine that has a floppy, created with QEMU 0.15, is migrated to 0.14. The media changed subsection is almost always included, and this causes problems when migrating to 0.14, which didn't have any subsection for the floppy device. While QEMU support for migration to old versions admittedly depends on luck, this isn't true of certain downstreams :) which would like to have an unambiguous migration format. So this got me thinking about where we're at with migration and where we need to go. I actually think there might be a reasonable path forward if we attack the problem differently than we have so far.

== Today ==

Today we only support generating the latest serialization of devices. To increase the probability of the latest version working on older versions of QEMU, we strategically omit fields that we know can safely be omitted with older versions (subsections). More than likely, migrating new to old won't work. Migrating old to new is more likely to work. We version each section in order to be able to identify when we're dealing with old. But all of this logic lives in one of two forms: either as a savevm/loadvm callback that takes a QEMUFile and writes byte serialization to the stream in an open way (usually big endian), or encoded declaratively in a VMState section.

== What we need ==

We need to decompose migration into three different problems:

1) serializing device state
2) transforming the device model in order to satisfy forwards and backwards compatibility
3) encoding the serialized device model on the wire.

We also need a way to future proof ourselves.

== What we can do ==

1) Add migration capabilities to future proof ourselves.
I think the simplest way this would work is to have a 'query-migration-capabilities' command that returned a bitmask of supported migration features. I think we also introduce a 'set-migration-capabilities' command that can mask some of the supported features. A management tool would query the migration features on the source and destination, take the intersection of the two masks, and set that mask on both the source and destination. Lack of support for these commands indicates a mask of zero, which is the protocol we offer today.

2) Switch to a visitor model to serialize device state. This involves converting any occurrence of:

    qemu_put_be32(f, port->guest_connected);

To:

    visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err);

It's 100% mechanical and makes absolutely no logic change. It works equally well with legacy and VMState migration handlers.

3) Add a Visitor class that operates on QEMUFile. At this stage, we can migrate to data
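The conversion in step 2 can be checked in miniature. A visitor backed by a byte buffer must emit exactly the bytes the legacy `qemu_put_be32()` path did. Because the visit call takes a pointer, the same routine can also load. `ByteFile`, `ToyVisitor`, `toy_visit_u32()` and the one-field `Port` are hypothetical simplifications, not QEMU's `QEMUFile` or visitor API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a QEMUFile: a buffer with write/read cursors. */
typedef struct {
    uint8_t buf[64];
    size_t len, pos;
} ByteFile;

static void put_be32(ByteFile *f, uint32_t v)
{
    assert(f->len + 4 <= sizeof(f->buf));
    for (int s = 24; s >= 0; s -= 8) {
        f->buf[f->len++] = (uint8_t)(v >> s);
    }
}

static uint32_t get_be32(ByteFile *f)
{
    uint32_t v = 0;
    assert(f->pos + 4 <= f->len);
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | f->buf[f->pos++];
    }
    return v;
}

/* Hypothetical visitor over a ByteFile, usable in either direction. */
typedef struct {
    ByteFile *f;
    int loading;            /* 0: output visitor (save), 1: input (load) */
} ToyVisitor;

/* Taking a pointer is what lets one call serve both save and load. */
static void toy_visit_u32(ToyVisitor *v, const char *name, uint32_t *obj)
{
    (void)name;             /* unused by the binary format */
    if (v->loading) {
        *obj = get_be32(v->f);
    } else {
        put_be32(v->f, *obj);
    }
}

typedef struct {
    uint32_t guest_connected;
} Port;

/* Before: a legacy save handler (loading needs a second function). */
static void port_save_legacy(ByteFile *f, const Port *port)
{
    put_be32(f, port->guest_connected);
}

/* After the mechanical rewrite: one routine for both directions. */
static void port_visit(ToyVisitor *v, Port *port)
{
    toy_visit_u32(v, "guest_connected", &port->guest_connected);
}
```

Byte-for-byte equality between the two save paths is what "absolutely no logic change" means on the wire. A test like this is one cheap way to check each converted device.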
[Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
With the current migration format, VMS_STRUCTs with subsections are ambiguous. The protocol cannot tell whether a 0x5 byte after the VMS_STRUCT is a subsection or part of the parent data stream. In the past QEMU assumed it was always a part of a subsection; after commit eb60260 (savevm: fix corruption in vmstate_subsection_load(), 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections defined. Unfortunately, this means that if a destination has no subsections defined for the struct, it will happily read subsection data into its own fields. And if you are lucky enough to stumble on a zero byte at the right time, it will be interpreted as QEMU_VM_EOF and migration will be interrupted with half-loaded state. There is no way out of this except defining an incompatible migration protocol. Not-so-long-term we should really try to define one that is not a joke, but the bug is serious so we need a solution for 0.15. A sentinel at the end of embedded structs does remove the ambiguity. Of course, this can be restricted to new machine models, and this is what the patch series does. (And note that only patch 3 is specific to the short-term solution, everything else is entirely generic). Untested beyond compilation.

Paolo Bonzini (4):
  add support for machine models to specify their migration format
  add pc-0.14 machine
  savevm: define new unambiguous migration format
  Revert "savevm: fix corruption in vmstate_subsection_load()."

 cpu-common.h  |  3 ---
 qemu-common.h |  2 ++
 hw/boards.h   |  1 +
 hw/pc_piix.c  | 16 +++-
 savevm.c      | 44 +---
 5 files changed, 43 insertions(+), 23 deletions(-)

-- 
1.7.5.2
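Below is one way the sentinel idea could look on the wire. It assumes a hypothetical `STRUCT_END` byte written after every embedded struct and after any of its subsections. The byte value and layout are illustrative, not necessarily what patch 3 emits. Because the loader always reads a marker at that position, two failure modes go away:

- a parent field that happens to equal 0x05 is no longer mistaken for a subsection;
- a destination that does not know a subsection fails cleanly instead of loading garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QEMU_VM_SUBSECTION 0x05   /* as in QEMU's savevm.c */
#define STRUCT_END         0x00   /* hypothetical sentinel; any value other
                                     than 0x05 works at a marker position */

typedef struct {
    uint8_t a, sub_payload, b;    /* embedded field, subsection, parent field */
} Parent;

/* Writer: every embedded struct ends with STRUCT_END, whether or not
 * subsections were emitted. Returns bytes written. */
static size_t save_parent(uint8_t *out, const Parent *p, bool send_sub)
{
    size_t n = 0;
    out[n++] = p->a;
    if (send_sub) {
        out[n++] = QEMU_VM_SUBSECTION;
        out[n++] = p->sub_payload;
    }
    out[n++] = STRUCT_END;
    out[n++] = p->b;
    return n;
}

/* Reader: after the struct's fields the next byte is always a marker,
 * so 0x05 can no longer be confused with data. */
static bool load_parent(const uint8_t *s, size_t len, bool knows_sub, Parent *p)
{
    size_t i = 0;
    if (len < 3) {
        return false;
    }
    p->a = s[i++];
    while (i < len && s[i] == QEMU_VM_SUBSECTION) {
        if (!knows_sub || i + 1 >= len) {
            return false;         /* unknown subsection: clean failure */
        }
        p->sub_payload = s[i + 1];
        i += 2;
    }
    if (i >= len || s[i++] != STRUCT_END) {
        return false;
    }
    if (i >= len) {
        return false;
    }
    p->b = s[i++];
    return i == len;
}
```

As the cover letter notes, a change like this alters the stream for every embedded struct. That is why it can only be enabled for new machine models.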