----- Original Message -----
> From: "Eric Blake" <ebl...@redhat.com>
> To: "Paolo Bonzini" <pbonz...@redhat.com>
> Cc: kw...@redhat.com, fsimo...@redhat.com, qemu-devel@nongnu.org,
>     stefa...@linux.vnet.ibm.com, lcapitul...@redhat.com
> Sent: Tuesday, March 13, 2012 9:48:10 PM
> Subject: Re: [Qemu-devel] [PATCH v4 10/10] Add the drive-reopen command
>
> On 03/06/2012 10:56 AM, Paolo Bonzini wrote:
> > From: Federico Simoncelli <fsimo...@redhat.com>
> >
> > Signed-off-by: Federico Simoncelli <fsimo...@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>
> > ##
> > +# @drive-reopen
> > +#
> > +# Assigns a new image file to a device.
> > +#
> > +# @device: the name of the device for which we are changing the image file.
> > +#
> > +# @new-image-file: the target of the new image. If the file doesn't exist
> > +#                  the command will fail.
> > +#
> > +# @format: #optional the format of the new image, default is 'qcow2'.
> > +#
> > +# Returns: nothing on success
> > +#          If @device is not a valid block device, DeviceNotFound
> > +#          If @new-image-file can't be opened, OpenFileFailed
> > +#          If @format is invalid, InvalidBlockFormat
> > +#
> > +# Since 1.1
> > +##
> > +{ 'command': 'drive-reopen',
> > +  'data': { 'device': 'str', 'new-image-file': 'str', '*format': 'str' } }
>
> I still think we need a 'drive-reopen' action included in 'transaction',
> as an 11/10 on this series. For disk migration, it is true that you can
> migrate one disk at a time, and therefore only need to reopen one disk
> at a time, to get the guarantee that for a single disk image, the current
> state of that image will be guaranteed to be consistent using only one
> storage domain.
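For concreteness, this is roughly what a wire-level invocation of the command proposed by the schema above would look like. It is only a sketch: the device name and image path are made up, and the message has not been tested against any QEMU build of this series.

```python
import json

# Build a QMP 'drive-reopen' invocation matching the proposed QAPI schema.
# 'format' is optional (the '*' prefix in the schema); per the documentation
# above it defaults to 'qcow2' when omitted.
cmd = {
    "execute": "drive-reopen",
    "arguments": {
        "device": "virtio0",                           # hypothetical device name
        "new-image-file": "/images/disk-copy.qcow2",   # hypothetical path
        "format": "qcow2",
    },
}

# A QMP client would send this as one line of JSON over the monitor socket.
print(json.dumps(cmd, sort_keys=True))
```

Note that per the schema, @new-image-file must already exist; the command opens it, it does not create it.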
I'm not sure whether this was already addressed on this mailing list, but the
main problem is that, as a general rule, a qcow file cannot be opened in r/w
mode twice. I believe the only exception to that was live migration, and it
generated several issues. That said, reopen is really hard to implement as a
transaction without breaking that rule. For example, in the blkmirror case
you'd need to open the destination image in r/w mode while the mirroring is
still in action (i.e. while the same image is already open in r/w mode).

There are several possible solutions here, but they are either really hard to
implement or not definitive. For example:

* We could implement the reopen command separately for each special case
  (e.g. blkmirror, reopening the same image, etc.), reusing the same bs that
  we already have. The downside is that the command would then be coupled to
  all these special cases.

* We could use the transaction API without actually making the operation
  transactional (if we fail in the middle we can't roll back). The only
  advantage of this is that we'd provide a consistent API to libvirt and
  postpone the problem to the future. I strongly discourage this anyway: it's
  completely unsafe and it breaks the transaction semantics. Moreover, it's a
  solution that relies too much on the hope of finding something appropriate
  in the future.

* We could leave it as it is: a distinct command that is not part of the
  transaction and that closes the old image before opening the new one.

> But since the API allows the creation of two mirrors in one command, I'm
> worried that someone will try to start a mirror on two disks at once, but
> then be stuck doing two separate 'drive-reopen' commands. If the first
> succeeds but the second fails, you have now stranded the qemu process
> across two storage domains, which is exactly what we were trying to avoid
> in the first place by inventing transactions.
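For illustration, the grouped form being asked for would look roughly like the following. This is purely hypothetical: no 'drive-reopen' transaction action exists in this series (that is the point under discussion), and the device names and paths are invented.

```python
import json

# Hypothetical QMP 'transaction' grouping two reopen actions, so that either
# both disks move to the new storage domain or neither does. The action type
# "drive-reopen" is NOT implemented in this series; this only illustrates the
# atomicity that a transactional variant would provide.
txn = {
    "execute": "transaction",
    "arguments": {
        "actions": [
            {"type": "drive-reopen",
             "data": {"device": "virtio0",
                      "new-image-file": "/new-domain/disk1.qcow2"}},
            {"type": "drive-reopen",
             "data": {"device": "virtio1",
                      "new-image-file": "/new-domain/disk2.qcow2"}},
        ],
    },
}

print(json.dumps(txn))
```

As the discussion above explains, implementing this atomically is hard precisely because rollback would require the old and new images to be open in r/w mode at the same time.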
> That is, even if all disks are individually consistent in a single domain,
> the act of migrating then reopening one disk at a time means you will have
> a window where disk 1 and disk 2 are opened on different storage domains.

This is not completely correct. The main intent was to avoid spreading one
image chain across two storage domains (which would make it incomplete if
one of them were missing). In the next oVirt release a VM can have different
disks on different storage domains, so this wouldn't be a special case but
just a normal situation.

-- 
Federico