On 04/13/2012 10:23 AM, Paolo Bonzini wrote:
> Management needs a way for QEMU to confirm that no I/O has been sent to the
> target and not to the source.  To provide this guarantee we rely on a file
> in local persistent storage.  QEMU receives a file descriptor via SCM_RIGHTS
> and writes a single byte to it.  If it fails, it will fail the drive-reopen
> command too and management knows that no I/O request has been issued to the
> new destination.  Likewise, if management finds the file to have nonzero
> size it knows that the target is valid and that indeed I/O requests could
> have been submitted to it.
So if I understand correctly, the idea is that if just libvirtd goes down after issuing 'drive-reopen' but before getting a response, then I will miss both the return value and any event associated with drive-reopen (whether that event is success or failure), and when libvirtd restarts, 'query-block-jobs' no longer lists the job, so I don't know whether the pivot happened.  However, if qemu stayed alive during that time, I can at least use 'query-block' to see what qemu thinks is open, and deduce the outcome from that.

But if the world conspires against me - libvirtd going down, then qemu completing the reopen, then the guest VM halting itself so that the qemu process goes away, all before libvirtd restarts - then I'm stuck figuring out whether qemu finished the job (so that when I restart the guest, I want to pivot to the new filename) or failed the job (so that when I restart the guest, I want to revert to the source).  To handle this, I now have to create a new file on disk (not a pipe), pass in the fd in advance, and then call drive-reopen, as well as record that filename as the location I will check when trying to re-establish connections with the guest after libvirtd restarts.

I'm not quite sure how to expose this to upper-layer management applications when they are using libvirt transient guests, but that's not qemu's problem.  [In particular, it sounds like the sort of thing that I can't cram into virDomainBlockRebase for RHEL 6.3 based on libvirt 0.9.10, but would have to consider for a more powerful virDomainBlockCopy for libvirt 0.9.12 or later.]

Overall, the idea sounds workable, and does seem to offer an extra measure of protection for recovery to uncorrupted data after a worst-case drive-reopen failure.
> +++ b/hmp.c
> @@ -744,7 +744,7 @@ void hmp_drive_reopen(Monitor *mon, const QDict *qdict)
>      const char *format = qdict_get_try_str(qdict, "format");
>      Error *errp = NULL;
>
> -    qmp_drive_reopen(device, filename, !!format, format, &errp);
> +    qmp_drive_reopen(device, filename, !!format, format, false, NULL, &errp);
>      hmp_handle_error(mon, &errp);
>  }
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 0bf3a25..2e5a925 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1228,6 +1228,13 @@
>  #
>  # @format: #optional the format of the new image, default is 'qcow2'.
>  #
> +# @witness: A file descriptor name that was passed via getfd.  QEMU will write

Mark this #optional.

> +#           a single byte to this file descriptor before completing the command
> +#           successfully.  If the byte is not written to the file, it is
> +#           guaranteed that the guest has not issued any I/O to the new image.
> +#           Failure to write the byte is fatal just like failure to open the new
> +#           image, and will cause the guest to revert to the currently open file.

Still seems like something that could fit if we get 'drive-reopen'
shoehorned into 'transaction' in the future.

Question - I know that 'drive-reopen' forces a block_job_cancel_sync()
call before closing the source; how long can that take?  After all, we
recently made block-job-cancel asynchronous (with block_job_cancel_sync
a wrapper around the asynchronous version that waits for things to
settle).  That means a call to 'drive-reopen' could take a very long
time between my initially sending the monitor command and my finally
getting a response of success or failure; and while the response will be
accurate, the whole intent of this patch is that libvirt might not be
around to get the response, so we want something a bit more persistent.
Does this mean that if we add 'drive-reopen' to 'transaction', that
transaction will be forced to wait for block_job_cancel_sync?
And while it is waiting, are we locked out from all other monitor
commands?  Does this argue that 'transaction' needs to gain an
asynchronous mode of operation, where you request a transaction with an
immediate return, and can then issue other monitor commands while
waiting for acknowledgment of whether the transaction could be acted on?
But these questions start to sound like they are geared more toward qemu
1.2, when we spend more time thinking about adding asynchronous commands
and properly managing which commands are long-running vs. asynchronous.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org