Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 09:54 AM, Anthony Liguori wrote:
> On 10/11/2011 08:27 AM, Juan Quintela wrote:
>
> I've been thinking about it this morning. I think it's solvable.
>
> We need to be able to save off the qdev construction properties right before init. This is just a matter of storing a list of strings.
>
> Then we need a qdev_torture function that will save the device state (will require a dummy QEMUFile that saves to memory). We then need to invoke destruction w/o actually freeing the memory of the device. We should then zero out the device memory.
>
> We then need to run through qdev creation, setting properties based on the saved construction properties. Then we should init and invoke the device's reset function.
>
> Finally we can pass the dummy QEMUFile to the device's load function (or vmstate).

If you want, I have a 'dummy QEMUFile' implementation...

Stefan
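The sequence Anthony describes above can be sketched as a round-trip check. This is only an illustration in Python rather than QEMU C: `ToyDevice`, its fields, and `torture` are hypothetical names, and an `io.BytesIO` buffer stands in for the in-memory "dummy QEMUFile". Nothing here is a QEMU API.

```python
import io
import struct


class ToyDevice:
    """Hypothetical stand-in for a qdev device; not a QEMU API."""

    def __init__(self, props):
        # Construction properties are kept as plain strings.
        self.props = dict(props)
        self.reset()

    def reset(self):
        self.counter = 0
        self.irq_level = 0

    def save(self, f):
        # Serialize runtime state in a fixed big-endian layout (4 + 1 bytes).
        f.write(struct.pack(">IB", self.counter, self.irq_level))

    def load(self, f):
        self.counter, self.irq_level = struct.unpack(">IB", f.read(5))

    def state(self):
        return (self.counter, self.irq_level)


def torture(dev):
    """Save, tear down, rebuild from saved properties, reset, reload."""
    saved_props = dict(dev.props)  # 1. save construction properties
    buf = io.BytesIO()             # 2. in-memory "dummy QEMUFile"
    dev.save(buf)
    before = dev.state()
    dev.__dict__.clear()           # 3. wipe the object without freeing it
    dev.__init__(saved_props)      # 4. re-create from the saved properties
    dev.reset()                    # 5. explicit reset after init
    buf.seek(0)
    dev.load(buf)                  # 6. feed the saved state back in
    return before == dev.state()
```

A device whose `save`/`load` pair forgets a field would come back with that field at its reset value, so `torture` would return `False` for it. That is the class of bug such a test is meant to catch.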
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 04:34 PM, Anthony Liguori wrote:
> On 10/11/2011 09:01 AM, Avi Kivity wrote:
>> On 10/11/2011 03:57 PM, Anthony Liguori wrote:
>>>> What I'm trying to avoid is making choices today that close the door on better fixes in the future.
>>>
>>> I think Juan made a really good point in his earlier post. We need to
>>> focus on better testing for migration. With a solid migration torture
>>> test, we can probably eliminate much of the problems we're facing
>>> today.
>>
>> Agree, fingerprinting vmstate should help a lot. Actually I don't think
>> the visitor is strictly required, the fingerprinter can just walk
>> vmstate structs.
>
> You mean generating a schema?

Dumping the vmstate descriptions in a canonical format, and having a tool that verifies that version A is compatible with version B.

> I was talking about an active migration torture test.

Those are good, but inherently limited.

-- 
error compiling committee.c: too many arguments to function
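As a rough illustration of the fingerprinting idea Avi describes, the sketch below dumps a device description to a canonical JSON string, hashes it, and checks whether a stream written by one version could be read by another. The description layout (`name`, `version_id`, `minimum_version_id`, and per-field `since`) is an assumption made for this example and is not QEMU's actual VMStateDescription structure; `canonical`, `fingerprint`, and `can_load` are hypothetical helpers.

```python
import hashlib
import json


def canonical(desc):
    """Serialize a description deterministically (sorted keys, no spaces)."""
    return json.dumps(desc, sort_keys=True, separators=(",", ":"))


def fingerprint(desc):
    """Hash of the canonical form; changes whenever the layout changes."""
    return hashlib.sha256(canonical(desc).encode("utf-8")).hexdigest()


def can_load(src, dst):
    """Return True if a stream written by `src` is loadable by `dst`."""
    if src["name"] != dst["name"]:
        return False
    v = src["version_id"]
    # The receiver must accept the sender's version number.
    if not dst["minimum_version_id"] <= v <= dst["version_id"]:
        return False
    # For that version, both sides must agree on the field list and order.
    sent = [f for f in src["fields"] if f["since"] <= v]
    expected = [f for f in dst["fields"] if f["since"] <= v]
    return sent == expected


divider = {"name": "divider", "type": "uint16", "since": 1}
rbr = {"name": "rbr", "type": "uint8", "since": 1}
fcr = {"name": "fcr", "type": "uint8", "since": 3}

old = {"name": "serial", "version_id": 2, "minimum_version_id": 1,
       "fields": [divider, rbr]}
new = {"name": "serial", "version_id": 3, "minimum_version_id": 1,
       "fields": [divider, rbr, fcr]}
```

With these two descriptions, `can_load(old, new)` holds and `can_load(new, old)` does not, which is the forward-only behaviour plain version numbers give. A static check like this only sees declared layouts, so it complements an active torture test rather than replacing it.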
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 09:01 AM, Avi Kivity wrote: On 10/11/2011 03:57 PM, Anthony Liguori wrote: What I'm trying to avoid is making choices today that close the door on better fixes in the future. I think Juan made a really good point in his earlier post. We need to focus on better testing for migration. With a solid migration torture test, we can probably eliminate much of the problems we're facing today. Agree, fingerprinting vmstate should help a lot. Actually I don't think the visitor is strictly required, the fingerprinter can just walk vmstate structs. You mean generating a schema? I was talking about an active migration torture test. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 03:57 PM, Anthony Liguori wrote: >> What I'm trying to avoid is making choices today that close the door on >> better fixes in the future. > > > I think Juan made a really good point in his earlier post. We need to > focus on better testing for migration. With a solid migration torture > test, we can probably eliminate much of the problems we're facing today. Agree, fingerprinting vmstate should help a lot. Actually I don't think the visitor is strictly required, the fingerprinter can just walk vmstate structs. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 08:47 AM, Avi Kivity wrote: On 10/11/2011 03:27 PM, Anthony Liguori wrote: 5) Implement subsections through the wire as top-level sections (as originally intended). Keep existing subsections with (1). That was (3). Yes, sorry. btw, it's reasonable to require that backwards migration is only to a fully updated stable release, so we can do 5) too, or backport 1). But given the choice of a nasty silent failure to an not-quite-up-to-date stable release or failing migration to a fully up-to-date stable release, I think it's better that we err on the side of caution. We're erring on the side of no migration, it seems. Not being able to migrate because of a recoverable failure is annoying. Having a silent failure that possible results in corruption is unacceptable. What I'm trying to avoid is making choices today that close the door on better fixes in the future. I think Juan made a really good point in his earlier post. We need to focus on better testing for migration. With a solid migration torture test, we can probably eliminate much of the problems we're facing today. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 08:27 AM, Juan Quintela wrote: Avi Kivity wrote: On 10/10/2011 01:35 PM, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. Subsections, version numbers, migration to older releases. Subsections --- - Current subsections are a mess (TM). The idea was to only have them at the very end of sections. So it was clear that it was a section (start with QEMU_VM_SECTION_START), or a subsection of this section, (start with QEMU_VM_SUBSECTION). As you can see, there is no possible ambiguity. Guess what happened? We needed subsections in the middle of the struct, where we can't warantee what cames after (that can be QEMU_VM_SUBSECTION). My last migration "subsection detection fix" fixes this in the majority of the cases, but you can probably do a case by hand where it happens. Back to the beggining, Avi wanted/wants that subsections are just normal sections with a "funny" name ("section_name/subsection_name"), requiring FIFO ordering or something like that. So far, so good, but we still have the problem that: a- we need to assure that ordering is right (do-able) b- we need to assure that "post-load" functions are done in the right order (also do-able) c- we need to be able from toplevel where we only have pointers to the general state to find the correct "substruct" pointer that this subsection refers to. This is kind of complicated :-( My sugestion/plan: - integrate my migration detection fix on upstream + stable - port all current subsections to avi approach to see about how feasible is. If IDE subsections can be made to work, everything else should be doable. Version numbers --- What to do here? Basically we have been able to integrate all changes so far using subsections (some of them in a non-trivial way, thought). Last one is the change proposed on wavcapture, I stated some ideas, but got no answer from the author. Basically he did an incompatible change on the driver, and I can't see a trivial way to make it compatible. 
Chanels used to be either output/input, and now they need to be both, so he duplicated the channels. Migration to older releases --- Our test framework for that is inexistent. That is the more important issue for this to work. Problem is that nobody really knows how to do it. I've been thinking about it this morning. I think it's solvable. We need to be able to save off the qdev construction properties right before init. This is just a matter of storing a list of strings. Then we need a qdev_torture function that will save the device state (will require a dummy QEMUFile that saves to memory). We then need to invoke destruction w/o actually freeing the memory of the device. We should then zero out the device memory. We then need to run through qdev creation, setting properties based on the saved construction properties. Then we should init and invoke the device's reset function. Finally we can pass the dummy QEMUFile to the device's load function (or vmstate). I'll take a look at implementing this today. I think it'll be a bit hairy but it looks doable to me. Regards, Anthony Liguori One of the ideas is to run machine, stop, save everything, reload, and continue. Or doing it in a loop for each device, but so far, they haven't moved for the "design" phase (for lack of a better word to describe "something that is on someone head and needs to be done"). Once here, more migration issues - VMState finish: Still on ToDo list, once my two series on the list is integrated, I expect to work on virtio + other cpus. No way this is going to be done for the 15th, perhaps one week after that. - migration thread: another thing that I am going to look at, in paraller with previous stuff. Patches on RHEL not in qemu.git --- - qcow2 consistence for migration: we need to reload qcow2 headers after migration, should be an easy case of split open in open + reload. We have decided that we only support migration with cache=none, so part of the series is not needed. 
- Huge memory machines: Last time I proposed the series, Anthony agreed with everything except the last patch (that was a bandaid, I agree). Added with the migration thread descrived before, we should be done on that department. Changing the protocol? - Except if someone appears and found an use for the new protocol, I will stay away for changing it. Things that need to be done once that we change the protocol in an incopmatible way: - send command line arguments through the migration channel, at least put support for it there. Needs qdev/QOM or whatever changes first. - put sections size/end markers. - fix the arrays mess. Basically we need to send things like: total size of array (think malloc) number of elements used (how many we sent) start: (w
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 03:27 PM, Anthony Liguori wrote: >> 5) Implement subsections through the wire as top-level sections (as >> originally intended). Keep existing subsections with (1). > > > That was (3). > Yes, sorry. >> btw, it's reasonable to require that backwards migration is only to a >> fully updated stable release, so we can do 5) too, or backport 1). > > But given the choice of a nasty silent failure to an > not-quite-up-to-date stable release or failing migration to a fully > up-to-date stable release, I think it's better that we err on the side > of caution. We're erring on the side of no migration, it seems. > Not being able to migrate because of a recoverable failure is > annoying. Having a silent failure that possible results in corruption > is unacceptable. What I'm trying to avoid is making choices today that close the door on better fixes in the future. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for October 11th
Avi Kivity wrote:
> On 10/10/2011 01:35 PM, Juan Quintela wrote:
>> Hi
>>
>> Please send in any agenda items you are interested in covering.
>
> Subsections, version numbers, migration to older releases.

Subsections
-----------

- Current subsections are a mess (TM). The idea was to only have them at the very end of sections, so it was clear whether something was a section (starts with QEMU_VM_SECTION_START) or a subsection of this section (starts with QEMU_VM_SUBSECTION). As you can see, there is no possible ambiguity.

Guess what happened? We needed subsections in the middle of the struct, where we can't guarantee what comes after (it can be QEMU_VM_SUBSECTION). My last migration "subsection detection fix" fixes this in the majority of cases, but you can probably construct a case by hand where it happens.

Back to the beginning: Avi wanted/wants subsections to be just normal sections with a "funny" name ("section_name/subsection_name"), requiring FIFO ordering or something like that. So far, so good, but we still have the following problems:

a- we need to ensure that ordering is right (doable)
b- we need to ensure that "post-load" functions run in the right order (also doable)
c- from the top level, where we only have pointers to the general state, we need to be able to find the correct "substruct" pointer that this subsection refers to. This is kind of complicated :-(

My suggestion/plan:
- integrate my migration detection fix in upstream + stable
- port all current subsections to Avi's approach to see how feasible it is. If IDE subsections can be made to work, everything else should be doable.

Version numbers
---------------

What to do here? Basically we have been able to integrate all changes so far using subsections (some of them in a non-trivial way, though). The last one is the change proposed for wavcapture; I stated some ideas, but got no answer from the author. Basically he made an incompatible change to the driver, and I can't see a trivial way to make it compatible. Channels used to be either output or input, and now they need to be both, so he duplicated the channels.

Migration to older releases
---------------------------

Our test framework for that is nonexistent. That is the most important issue for this to work. The problem is that nobody really knows how to do it. One of the ideas is to run a machine, stop, save everything, reload, and continue. Or do it in a loop for each device. But so far these ideas haven't moved beyond the "design" phase (for lack of a better word to describe "something that is in someone's head and needs to be done").

Once here, more migration issues

- VMState finish: still on the ToDo list. Once my two series on the list are integrated, I expect to work on virtio + other cpus. No way this is going to be done by the 15th; perhaps one week after that.
- migration thread: another thing that I am going to look at, in parallel with the previous stuff.

Patches on RHEL not in qemu.git
-------------------------------

- qcow2 consistency for migration: we need to reload qcow2 headers after migration; it should be an easy case of splitting open into open + reload. We have decided that we only support migration with cache=none, so part of the series is not needed.
- Huge memory machines: last time I proposed the series, Anthony agreed with everything except the last patch (that was a band-aid, I agree). Together with the migration thread described before, we should be done in that department.

Changing the protocol?

- Unless someone appears and finds a use for the new protocol, I will stay away from changing it.

Things that need to be done once we change the protocol in an incompatible way:
- send command line arguments through the migration channel, or at least put support for it there. Needs qdev/QOM or whatever changes first.
- put section size/end markers.
- fix the arrays mess. Basically we need to send things like:
  - total size of array (think malloc)
  - number of elements used (how many we sent)
  - start (we don't always send data from the start)
  - circular buffers: at the moment, we use some arrays as "circular buffers", and we just send to the beginning. This is from memory; going through all the array users will make things clearer.
  - index of array: we use every type for indexes (int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t). We should just use one type for indexes, and make all our arrays simpler.
- bitmaps: we need a type to send bitmaps, period.
- remove all the warts, kept for backward compatibility, that we no longer need.
- cpus: especially x86_*. Our format support for x86 is a mess, things like:
  - how to store doubles (at least 4 formats)
  - a generic way of sending a list of MSRs is needed. We are going to need more MSRs in the future, and we are having a new subsection/version for each new MSR. To make things
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 08:21 AM, Avi Kivity wrote: On 10/11/2011 03:01 PM, Anthony Liguori wrote: On 10/11/2011 06:48 AM, Avi Kivity wrote: On 10/10/2011 01:35 PM, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. Subsections, version numbers, migration to older releases. Problem with subsections: The encoding of a subsection within an embedded structure is ambiguous because the subsection occurs at the end of the structure. QEMU may mistakenly parse what follows the structure as the end of subsection deliminator. Possible solutions: 1) Juan has a series that adds heuristics to better match the EOS deliminator. While not 100% perfect, it should handle practically all possible cases. The main issue is that it's not present in older QEMUs which means migrating a subsection within a structure to an old QEMU that doesn't have this heuristic could fail. Ways to mitigate: force all devices with subsections to bump their version number. Wave our hands around and claim that the new version requires the subsection heuristics to be present. 2) Add Paolo's protocol change. This will cause a migration flag day. Since we want to switch to ASN.1 too, we'll have another flag day for the next release too. 3) Change subsection protocol more dramatically than Paolo's change (make subsections stand alone sections). Not clear how much effort this is. 4) Avoid subsections until we introduce a new wire protocol based on ASN.1 that can better handle concepts like subsections. This misses some opportunity for backwards compatibility in the short term but avoids repeated flag days. 5) Implement subsections through the wire as top-level sections (as originally intended). Keep existing subsections with (1). That was (3). btw, it's reasonable to require that backwards migration is only to a fully updated stable release, so we can do 5) too, or backport 1). 
But given the choice of a nasty silent failure to a not-quite-up-to-date stable release or failing migration to a fully up-to-date stable release, I think it's better that we err on the side of caution. Not being able to migrate because of a recoverable failure is annoying. Having a silent failure that possibly results in corruption is unacceptable. Regards, Anthony Liguori
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 03:01 PM, Anthony Liguori wrote: > On 10/11/2011 06:48 AM, Avi Kivity wrote: >> On 10/10/2011 01:35 PM, Juan Quintela wrote: >>> Hi >>> >>> Please send in any agenda items you are interested in covering. >>> >> >> Subsections, version numbers, migration to older releases. > > Problem with subsections: > > The encoding of a subsection within an embedded structure is ambiguous > because the subsection occurs at the end of the structure. QEMU may > mistakenly parse what follows the structure as the end of subsection > deliminator. > > Possible solutions: > > 1) Juan has a series that adds heuristics to better match the EOS > deliminator. While not 100% perfect, it should handle practically all > possible cases. > > The main issue is that it's not present in older QEMUs which means > migrating a subsection within a structure to an old QEMU that doesn't > have this heuristic could fail. > > Ways to mitigate: force all devices with subsections to bump their > version number. Wave our hands around and claim that the new version > requires the subsection heuristics to be present. > > 2) Add Paolo's protocol change. This will cause a migration flag > day. Since we want to switch to ASN.1 too, we'll have another flag > day for the next release too. > > 3) Change subsection protocol more dramatically than Paolo's change > (make subsections stand alone sections). Not clear how much effort > this is. > > 4) Avoid subsections until we introduce a new wire protocol based on > ASN.1 that can better handle concepts like subsections. This misses > some opportunity for backwards compatibility in the short term but > avoids repeated flag days. > 5) Implement subsections through the wire as top-level sections (as originally intended). Keep existing subsections with (1). btw, it's reasonable to require that backwards migration is only to a fully updated stable release, so we can do 5) too, or backport 1). -- error compiling committee.c: too many arguments to function
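One way to picture option 5, with subsections sent as top-level sections under a "section_name/subsection_name" name, is the toy model below. It is a sketch under stated assumptions: devices are plain nested dicts, and `flatten` and `load` are hypothetical helpers, not QEMU functions. The parent is emitted before its subsections (the FIFO ordering requirement), and the loader resolves the path in the name back to the nested "substruct".

```python
def flatten(name, node):
    """Yield (section_name, fields) pairs, parent first, then subsections."""
    yield name, node["fields"]
    for sub_name, sub in node.get("subsections", {}).items():
        yield from flatten(f"{name}/{sub_name}", sub)


def load(sections, state):
    """Apply sections in arrival order, resolving names to nested dicts."""
    for name, fields in sections:
        target = state
        # Skip the device name itself; the rest of the path picks the substruct.
        for part in name.split("/")[1:]:
            target = target.setdefault(part, {})
        target.update(fields)
    return state


device = {
    "fields": {"status": 0x50},
    "subsections": {"pio": {"fields": {"offset": 12}}},
}
sections = list(flatten("ide", device))
restored = load(sections, {})
```

In this toy the name-to-substruct lookup is trivial because the state is a dict. In C, where the top level only holds a pointer to the general state, that lookup is the hard part.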
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 06:48 AM, Avi Kivity wrote:
> On 10/10/2011 01:35 PM, Juan Quintela wrote:
>> Hi
>>
>> Please send in any agenda items you are interested in covering.
>
> Subsections, version numbers, migration to older releases.

Problem with subsections:

The encoding of a subsection within an embedded structure is ambiguous because the subsection occurs at the end of the structure. QEMU may mistakenly parse what follows the structure as the end of subsection delimiter.

Possible solutions:

1) Juan has a series that adds heuristics to better match the EOS delimiter. While not 100% perfect, it should handle practically all possible cases.

The main issue is that it's not present in older QEMUs, which means migrating a subsection within a structure to an old QEMU that doesn't have this heuristic could fail.

Ways to mitigate: force all devices with subsections to bump their version number. Wave our hands around and claim that the new version requires the subsection heuristics to be present.

2) Add Paolo's protocol change. This will cause a migration flag day. Since we want to switch to ASN.1 too, we'll have another flag day for the next release too.

3) Change subsection protocol more dramatically than Paolo's change (make subsections stand-alone sections). Not clear how much effort this is.

4) Avoid subsections until we introduce a new wire protocol based on ASN.1 that can better handle concepts like subsections. This misses some opportunity for backwards compatibility in the short term but avoids repeated flag days.

Regards,

Anthony Liguori
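The ambiguity can be reproduced with a toy byte stream. The sketch below is not QEMU's parser: the marker value 0x05, the length-prefixed name layout, and both function names are assumptions for illustration. It contrasts a one-byte peek with a heuristic that also checks for a plausible "parent/" name, loosely in the spirit of option 1.

```python
SUBSECTION = 0x05  # assumed marker value standing in for QEMU_VM_SUBSECTION


def naive_has_subsection(stream, pos):
    """Peek one byte: treat a marker byte as the start of a subsection."""
    return pos < len(stream) and stream[pos] == SUBSECTION


def heuristic_has_subsection(stream, pos, parent):
    """Also require a length-prefixed name that begins with "<parent>/"."""
    if not naive_has_subsection(stream, pos):
        return False
    if pos + 1 >= len(stream):
        return False
    name_len = stream[pos + 1]
    name = stream[pos + 2:pos + 2 + name_len]
    return len(name) == name_len and name.startswith(parent.encode() + b"/")


# Embedded struct (one byte, 0x10) followed by ordinary parent data that
# happens to start with the value 5.
plain = bytes([0x10, 0x05, 0x03]) + b"ABC"

# Same struct followed by a real subsection named "ide/drive".
real = bytes([0x10, 0x05, 9]) + b"ide/drive"
```

Here the one-byte peek misreads `plain` as the start of a subsection while the heuristic rejects it, and both accept `real`. Parent data that happens to spell out a valid name would still fool the heuristic, which is why it is described as not 100% perfect.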
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/11/2011 06:36 AM, Paolo Bonzini wrote:
> On 10/10/2011 01:35 PM, Juan Quintela wrote:
>> Hi
>>
>> Please send in any agenda items you are interested in covering.
>
> Planning the feature freeze:
> - what is left to merge?
> - test day?

Great topic. Just a reminder, we're looking at release dates of:

| 2011-10-15 | Soft freeze      |
| 2011-11-01 | Hard master      |
| 2011-11-07 | Tag qemu-1.0-rc1 |
| 2011-11-14 | Tag qemu-1.0-rc2 |
| 2011-11-21 | Tag qemu-1.0-rc3 |
| 2011-11-28 | Tag qemu-1.0-rc4 |
| 2011-12-01 | Tag qemu-1.0     |

Soft Freeze FAQ:

== What is the soft feature freeze? ==

The soft feature freeze is the beginning of the stabilization phase of QEMU's development process. By the date of the soft feature freeze, any major feature should have some code posted to the qemu-devel mailing list if it's targeting a given release.

== What should I do by the soft feature freeze? ==

For any major feature that you're targeting to the next release, you should:

# Make sure that you've posted a patch series to qemu-devel
# Write a Feature page on the qemu.org wiki describing the feature and the motivation
# On the release planning wiki page, link to your feature wiki page.

== Will my patches be rejected if I don't post before the soft feature freeze? ==

That's ultimately up to the subsystem maintainer. It's a value call based on the relative importance of the feature versus the disruptiveness of the feature. It's always best to avoid this problem in the first place and release early, release often[http://en.wikipedia.org/wiki/Release_early,_release_often].

Regards,

Anthony Liguori

> Paolo
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/10/2011 01:35 PM, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. Subsections, version numbers, migration to older releases. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] KVM call agenda for October 11th
On 10/10/2011 01:35 PM, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. Planning the feature freeze: - what is left to merge? - test day? Paolo
[Qemu-devel] KVM call agenda for October 11th
Hi Please send in any agenda items you are interested in covering. Thanks, Juan.