On 04/13/2012 10:23 AM, Paolo Bonzini wrote:
> Management needs a way for QEMU to confirm that no I/O has been sent to the
> target and not to the source.  To provide this guarantee we rely on a file
> in local persistent storage.  QEMU receives a file descriptor via SCM_RIGHTS
> and writes a single byte to it.  If it fails, it will fail the drive-reopen
> command too and management knows that no I/O request has been issued to the
> new destination.  Likewise, if management finds the file to have nonzero
> size it knows that the target is valid and that indeed I/O requests could
> have been submitted to it.
So if I understand correctly, the idea is that if just libvirtd goes down after issuing 'drive-reopen' but before getting a response, then I will miss both the return value and any event associated with drive-reopen (whether that event is success or failure), and when libvirtd restarts, 'query-block-jobs' no longer lists the job, so I don't know whether the pivot happened.  However, if qemu stayed alive during that time, I can at least use 'query-block' to see what qemu thinks is open, and deduce the outcome from that.

But if the world conspires against me - libvirtd going down, then qemu completing the reopen, then the guest VM halting itself so that the qemu process goes away, all before libvirtd restarts - then I'm stuck figuring out whether qemu finished the job (so that when I restart the guest, I want to pivot to the new filename) or failed the job (so that when I restart the guest, I want to revert to the source).  To handle this, I now have to create a new file on disk (not a pipe), pass in the fd in advance, and then call drive-reopen, as well as record that filename as the location I will check when trying to re-establish connections with the guest after libvirtd restarts.

I'm not quite sure how to expose this to upper-layer management applications when they are using libvirt transient guests, but that's not qemu's problem.  [In particular, it sounds like the sort of thing that I can't cram into virDomainBlockRebase for RHEL 6.3 based on libvirt 0.9.10, but would have to consider for a more powerful virDomainBlockCopy for libvirt 0.9.12 or later.]

Overall, the idea sounds workable, and does seem to offer an extra measure of protection for recovery to uncorrupted data after a worst-case drive-reopen failure.
> +++ b/hmp.c
> @@ -744,7 +744,7 @@ void hmp_drive_reopen(Monitor *mon, const QDict *qdict)
>      const char *format = qdict_get_try_str(qdict, "format");
>      Error *errp = NULL;
>
> -    qmp_drive_reopen(device, filename, !!format, format, &errp);
> +    qmp_drive_reopen(device, filename, !!format, format, false, NULL, &errp);
>      hmp_handle_error(mon, &errp);
>  }
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 0bf3a25..2e5a925 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1228,6 +1228,13 @@
>  #
>  # @format: #optional the format of the new image, default is 'qcow2'.
>  #
> +# @witness: A file descriptor name that was passed via getfd.  QEMU will write

Mark this #optional.

> +#           a single byte to this file descriptor before completing the command
> +#           successfully.  If the byte is not written to the file, it is
> +#           guaranteed that the guest has not issued any I/O to the new image.
> +#           Failure to write the byte is fatal just like failure to open the new
> +#           image, and will cause the guest to revert to the currently open file.

Still seems like something that could fit if we get 'drive-reopen'
shoehorned into 'transaction' in the future.

Question - I know that 'drive-reopen' forces a block_job_cancel_sync()
call before closing the source; how long can that take?  After all, we
recently made block-job-cancel asynchronous (with block_job_cancel_sync
a wrapper around the asynchronous version that waits for things to
settle).  That means a call to 'drive-reopen' could take a very long
time between my initially sending the monitor command and my finally
getting a response of success or failure; and while the response will be
accurate, the whole intent of this patch is that libvirt might not be
around to get the response, so we want something a bit more persistent.
Does this mean that if we add 'drive-reopen' to 'transaction', that
transaction will be forced to wait for block_job_cancel_sync?
And while it is waiting, are we locked out from all other monitor
commands?  Does this argue that 'transaction' needs to gain an
asynchronous mode of operation, where you request a transaction with an
immediate return, and can then issue other monitor commands while
waiting for acknowledgment of whether the transaction could be acted on?
But these questions start to sound like they are geared more toward qemu
1.2, when we spend more time thinking about adding asynchronous commands
and properly managing which commands are long-running vs. asynchronous.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org