Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 17:44, Anthony Liguori ha scritto: > It also gets very challenging if some options are backported and others > aren't. It gets challenging anyway with backports. If qmp_block_stream_v1_2 knows of defaults and doesn't send them on the wire, it will work if you only rely on the subset of options that was backported. If it doesn't, downstreams will have to use vendor namespaces. > I guess let me ask the following question: > > Is the problem that should have an 'options' parameter to this command? > Do we anticipate future addition of arguments? First of all, streaming works nicely as it is. It is nice to make things work similarly for streaming and mirroring, but not paramount. Regarding the latter question: perhaps, but then where do you draw the line? Is it okay to add new kinds of data to unions? (Besides, unions are quite heavyweight to use in the JSON, and if I wanted to have an "options" array of unions I would like to put the speed in there too. So the boat has sailed). > If we can at least minimize the C API breakages, I'd be happy. My opinion is this: your goal makes sense, but it is not 100% practical yet because: 1) we're still far enough from being feature complete (in comparison to a certain proprietary product that does sport some extra bullets in its marketing brochures). 2) we don't have a C API, and we don't know how it would look like (e.g. would it use gio? or libev? or...). It's good to be in the right mindset from the beginning, but minimizing changes to hmp.c is by itself not a particularly useful goal. Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On 05/21/2012 09:47 AM, Paolo Bonzini wrote: Il 21/05/2012 16:40, Anthony Liguori ha scritto: On 05/21/2012 09:26 AM, Paolo Bonzini wrote: Il 21/05/2012 16:19, Anthony Liguori ha scritto: I'm not against it in principle, just in practice. Today, checking whether a command exists is: commands = qmp.query_commands() if 'block-stream' in commands: # has block-stream I have a hard time envisioning how schema introspection can be reasonably implemented in a client. schema = qmp.query_command_schema('block-stream') What would schema return? Did you mean: if schema['arguments'].has_key('on_error'): Yes, something like that. What about adding a parameter to a structure? schema = qmp.query_type('foo') if schema['data'].has_key('xyz') This may end up being workable. I hadn't thought of having granular QMP schema querying functions. BTW, the other problem with adding arguments like this is that it makes a stable C API impossible. Adding fields to structs is not a problem as long as "libqmp" takes care of all allocations on part of its client. Yes, the previous version I wrote handled this padding structures appropriately. Adding parameters to commands requires some smartness, but there are ways to do it: 1) add the first version number to the schema, generate versioned entry points qmp_block_stream_v1_1 qmp_block_stream_v1_2 etc. Provide multiple headers libqmp-1.1.h, libqmp-1.2.h etc. that take care of #define'ing qmp_block_stream to one of them This is pretty nasty :-/ It also gets very challenging if some options are backported and others aren't. I guess let me ask the following question: Is the problem that should have an 'options' parameter to this command? Do we anticipate future addition of arguments? If we can at least minimize the C API breakages, I'd be happy. Regards, Anthony Liguori 2) Same as (1) but use qmp_block_stream for the oldest version having the command. 3) have fun with the preprocessor (macro with variable arguments, sizeof, designated initializers, whatever) and emulate keyword arguments for the C API. Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 15:07, Kevin Wolf ha scritto: > Am 21.05.2012 13:02, schrieb Paolo Bonzini: >> Il 21/05/2012 12:32, Kevin Wolf ha scritto: >>> Am 21.05.2012 12:02, schrieb Paolo Bonzini: Il 21/05/2012 11:29, Kevin Wolf ha scritto: >>> If source/target is really the distinction we want to have, should the >>> available options be specific to the job type, so that you could have >>> src_error and dst_error for mirroring? >> >> Yes, that would make sense. > > Of course, at the same time it also makes the implementation a bit more > complicated. Not much, I'd have rerror/werror already split if I followed my spec. >>> Management doesn't necessarily exist. >> >> Even a human sitting at a console is management. (Though I don't plan >> HMP to expose rerror/werror; so you can assume in some sense that >> management exists). > > But it's management that cares about good defaults. :-) > > Why not expose the options in HMP? Honestly, because it's a pain. HMP doesn't have options with arguments, and I don't really care if HMP users curse at me because they end out-of-disk-space and their migration is dropped. If you're using HMP, more than likely: 1) you can just stop the VM and copy the storage manually; 2) even if you can't, if you get an EIO you can just stop the VM and figure the root cause. As far as storage migration is concerned (and quite a few other scenarios) HMP is a nice debugging tool, nothing more. Management may want to keep the VM stopped even for an error on the target, as long as mirroring has finished the initial synchronization step. The VM can perform large amounts of I/O while the job is paused, and then completing the job can take a large amount of time. >>> >>> If management wants to limit the impact of this, it can decide to >>> explicitly stop the VM when it receives the error event. >> >> That can be too late. >> >> Eric, is it a problem for libvirt if a pause or target error during >> mirroring causes the job to exit steady state? That means that after a >> target error the offset can go back from 100% to <100%. > > "too late" in what respect? Too late to properly show the error... though actually there is no guarantee that the VM will not clobber the error even with vm_stop, so it doesn't make a big difference in that, too. >> So for now I'll keep my proposed extension of query-block-jobs; later it >> can be modified so that the target will have a name if you started the >> mirroring with blockdev_mirror instead of drive_mirror. > > You mean the same QMP field is a string when the block device was added > with blockdev_add and a dict when it was added with drive_add? > Maintaining this sounds like a nightmare to me. No, the same QMP field is a dict. With blockdev_add it will have a name, and it will just be a shortcut to info block/info blockstat. With drive_add it will have no name. Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 16:40, Anthony Liguori ha scritto: > On 05/21/2012 09:26 AM, Paolo Bonzini wrote: >> Il 21/05/2012 16:19, Anthony Liguori ha scritto: >>> >>> I'm not against it in principle, just in practice. Today, checking >>> whether a command exists is: >>> >>> commands = qmp.query_commands() >>> >>> if 'block-stream' in commands: >>> # has block-stream >>> >>> I have a hard time envisioning how schema introspection can be >>> reasonably implemented in a client. >> >> schema = qmp.query_command_schema('block-stream') > > What would schema return? > > Did you mean: > > if schema['arguments'].has_key('on_error'): Yes, something like that. > What about adding a parameter to a structure? schema = qmp.query_type('foo') if schema['data'].has_key('xyz') > BTW, the other problem with adding arguments like this is that it makes > a stable C API impossible. Adding fields to structs is not a problem as long as "libqmp" takes care of all allocations on part of its client. Adding parameters to commands requires some smartness, but there are ways to do it: 1) add the first version number to the schema, generate versioned entry points qmp_block_stream_v1_1 qmp_block_stream_v1_2 etc. Provide multiple headers libqmp-1.1.h, libqmp-1.2.h etc. that take care of #define'ing qmp_block_stream to one of them 2) Same as (1) but use qmp_block_stream for the oldest version having the command. 3) have fun with the preprocessor (macro with variable arguments, sizeof, designated initializers, whatever) and emulate keyword arguments for the C API. Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On 05/21/2012 09:26 AM, Paolo Bonzini wrote: Il 21/05/2012 16:19, Anthony Liguori ha scritto: I'm not against it in principle, just in practice. Today, checking whether a command exists is: commands = qmp.query_commands() if 'block-stream' in commands: # has block-stream I have a hard time envisioning how schema introspection can be reasonably implemented in a client. schema = qmp.query_command_schema('block-stream') What would schema return? Did you mean: if schema['arguments'].has_key('on_error'): What about adding a parameter to a structure? BTW, the other problem with adding arguments like this is that it makes a stable C API impossible. Regards, Anthony Liguori if 'on-error' in schema: # has on-error Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 15:59, Luiz Capitulino ha scritto: > I understand your reasoning, and since the beginning I thought this was > something useful to do, but we've already settled for not doing this. > > I also think that we shouldn't have exceptions, as in practice this means > we're extending commands anyway. So either, we do it or we don't. Ok, I don't really care... streaming works decently even without this extension, it is only needed for mirroring (and only 1 patch to leave out of the series). Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 16:19, Anthony Liguori ha scritto: >> > > I'm not against it in principle, just in practice. Today, checking > whether a command exists is: > > commands = qmp.query_commands() > > if 'block-stream' in commands: > # has block-stream > > I have a hard time envisioning how schema introspection can be > reasonably implemented in a client. schema = qmp.query_command_schema('block-stream') if 'on-error' in schema: # has on-error Paolo
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On 05/21/2012 09:16 AM, Luiz Capitulino wrote: On Mon, 21 May 2012 09:10:40 -0500 Anthony Liguori wrote: On 05/21/2012 08:59 AM, Luiz Capitulino wrote: On Fri, 18 May 2012 19:08:42 +0200 Paolo Bonzini wrote: Modified QMP commands = As we have discussed on the ML, we're not going to extend QMP commands. I understand your reasoning, and since the beginning I thought this was something useful to do, but we've already settled for not doing this. I also think that we shouldn't have exceptions, as in practice this means we're extending commands anyway. So either, we do it or we don't. Well, I think we should ask ourselves the following question: How would a client figure out if the new options are available? This is the primary reason for not extending existing commands. Yes, I know. But if Paolo implements schema introspection, would you agree on extending commands. You seemed to be against even if we had schema introspection. I'm not against it in principle, just in practice. Today, checking whether a command exists is: commands = qmp.query_commands() if 'block-stream' in commands: # has block-stream I have a hard time envisioning how schema introspection can be reasonably implemented in a client. If we really feel it's important to extend commands, I'd prefer something like per-command capabilities. Regards, Anthony Liguori
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On 05/21/2012 09:09 AM, Kevin Wolf wrote: Am 21.05.2012 15:59, schrieb Luiz Capitulino: On Fri, 18 May 2012 19:08:42 +0200 Paolo Bonzini wrote: Modified QMP commands = As we have discussed on the ML, we're not going to extend QMP commands. I understand your reasoning, and since the beginning I thought this was something useful to do, but we've already settled for not doing this. I also think that we shouldn't have exceptions, as in practice this means we're extending commands anyway. So either, we do it or we don't. What's the reason for disallowing command extensions? I'd find it rather silly to maintain ten different functions that all do basically the same, just that some of them have more optional parameters than others... How does a client figure out if the command supports the new options? And start writing the Python code for it before answering... Even if we exposed the schema via QMP, checking via schema interpretation would be a nightmare. Regards, Anthony Liguori Kevin
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Am 21.05.2012 16:10, schrieb Anthony Liguori: > On 05/21/2012 08:59 AM, Luiz Capitulino wrote: >> On Fri, 18 May 2012 19:08:42 +0200 >> Paolo Bonzini wrote: >> >>> Modified QMP commands >>> = >> >> As we have discussed on the ML, we're not going to extend QMP commands. >> >> I understand your reasoning, and since the beginning I thought this was >> something useful to do, but we've already settled for not doing this. >> >> I also think that we shouldn't have exceptions, as in practice this means >> we're extending commands anyway. So either, we do it or we don't. > > Well, I think we should ask ourselves the following question: > > How would a client figure out if the new options are available? > > This is the primary reason for not extending existing commands. I agree that trying out if an optional argument is accepted or not isn't a nice solution, even though it works and is probably better than adding block-stream-2, ..., block-stream-7. Didn't you say a while ago that adding a command that would return the JSON schema wouldn't be all that hard? Maybe it's the right time to implement it for 1.2. Kevin
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On Mon, 21 May 2012 16:09:28 +0200 Kevin Wolf wrote: > Am 21.05.2012 15:59, schrieb Luiz Capitulino: > > On Fri, 18 May 2012 19:08:42 +0200 > > Paolo Bonzini wrote: > > > >> Modified QMP commands > >> = > > > > As we have discussed on the ML, we're not going to extend QMP commands. > > > > I understand your reasoning, and since the beginning I thought this was > > something useful to do, but we've already settled for not doing this. > > > > I also think that we shouldn't have exceptions, as in practice this means > > we're extending commands anyway. So either, we do it or we don't. > > What's the reason for disallowing command extensions? I'd find it rather > silly to maintain ten different functions that all do basically the > same, just that some of them have more optional parameters than others... See the subthread started by Anthony.
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On Mon, 21 May 2012 09:10:40 -0500 Anthony Liguori wrote: > On 05/21/2012 08:59 AM, Luiz Capitulino wrote: > > On Fri, 18 May 2012 19:08:42 +0200 > > Paolo Bonzini wrote: > > > >> Modified QMP commands > >> = > > > > As we have discussed on the ML, we're not going to extend QMP commands. > > > > I understand your reasoning, and since the beginning I thought this was > > something useful to do, but we've already settled for not doing this. > > > > I also think that we shouldn't have exceptions, as in practice this means > > we're extending commands anyway. So either, we do it or we don't. > > Well, I think we should ask ourselves the following question: > > How would a client figure out if the new options are available? > > This is the primary reason for not extending existing commands. Yes, I know. But if Paolo implements schema introspection, would you agree on extending commands. You seemed to be against even if we had schema introspection.
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On 05/21/2012 08:59 AM, Luiz Capitulino wrote: On Fri, 18 May 2012 19:08:42 +0200 Paolo Bonzini wrote: Modified QMP commands = As we have discussed on the ML, we're not going to extend QMP commands. I understand your reasoning, and since the beginning I thought this was something useful to do, but we've already settled for not doing this. I also think that we shouldn't have exceptions, as in practice this means we're extending commands anyway. So either, we do it or we don't. Well, I think we should ask ourselves the following question: How would a client figure out if the new options are available? This is the primary reason for not extending existing commands. Regards, Anthony Liguori
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Am 21.05.2012 15:59, schrieb Luiz Capitulino: > On Fri, 18 May 2012 19:08:42 +0200 > Paolo Bonzini wrote: > >> Modified QMP commands >> = > > As we have discussed on the ML, we're not going to extend QMP commands. > > I understand your reasoning, and since the beginning I thought this was > something useful to do, but we've already settled for not doing this. > > I also think that we shouldn't have exceptions, as in practice this means > we're extending commands anyway. So either, we do it or we don't. What's the reason for disallowing command extensions? I'd find it rather silly to maintain ten different functions that all do basically the same, just that some of them have more optional parameters than others... Kevin
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On Fri, 18 May 2012 19:08:42 +0200 Paolo Bonzini wrote: > Modified QMP commands > = As we have discussed on the ML, we're not going to extend QMP commands. I understand your reasoning, and since the beginning I thought this was something useful to do, but we've already settled for not doing this. I also think that we shouldn't have exceptions, as in practice this means we're extending commands anyway. So either, we do it or we don't.
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
On 05/21/2012 05:02 AM, Paolo Bonzini wrote: > Eric, is it a problem for libvirt if a pause or target error during > mirroring causes the job to exit steady state? That means that after a > target error the offset can go back from 100% to <100%. Libvirt would really like to have events present. If you emit an event on every change from <100% to 100%, and a counterpart event on every revert from 100% back to <100%, then that will help tremendously. For RHEL 6.3, libvirt requires 100% before attempting a drive-reopen. But since this proposal says qemu 1.2 won't even have a drive-reopen, but instead have a new block-job-complete, then libvirt should be okay if block-job-complete gracefully fails any time things are less than 100% (remember, drive-reopen has an attempted effect even if things are less than 100%, which is why libvirt had to pre-filter for that situation, but then again, RHEL never downgraded). In other words, I think downgrading out of steady state is not a problem, as long as events exist to help management track that, and as long as the command to pivot to the destination gracefully requires steady state. > > On the other hand, in other cases it can be desirable (qemu -S, run > streaming before the VM starts). Yes, libvirt would really like a way to be able to start a VM with mirroring active, in order to make it possible to support drive copy on persistent domains. -- Eric Blake ebl...@redhat.com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Am 21.05.2012 13:02, schrieb Paolo Bonzini: > Il 21/05/2012 12:32, Kevin Wolf ha scritto: >> Am 21.05.2012 12:02, schrieb Paolo Bonzini: >>> Il 21/05/2012 11:29, Kevin Wolf ha scritto: > * block-stream: I propose adding two options to the existing > block-stream command. If this is rejected, only mirroring will be able > to use rerror/werror. > > The new options are of course rerror/werror. They are enum options, > with the following possible values: Do we really need separate werror/rerror? For guest operations they really exist only for historical reasons: werror was there first, and when we wanted the same functionality, it seemed odd to overload werror to include reads as well. For block jobs, where there is no such option yet, we could go with a single error option, unless there is a use case for separate werror/rerror options. >>> >>> For mirroring rerror=source and werror=target. I'm not sure there is an >>> actual usecase, but at least it is more interesting than for devices... >> >> Hm. What if we add an active mirror? Then we can get some kind of COW, >> and rerror can happen on the target as well. > > Errors during the read part of COW are always reported as werror. Good point. Thinking a bit more about it, with an active mirror (i.e. a filter block driver) things become a bit less clear anyway. The filter would have to be linked to the job somehow. Another interesting question is if we'll want to restrict ourselves to one job at a time forever. But when we stop doing it, we'll need new APIs anyway. >> If source/target is really the distinction we want to have, should the >> available options be specific to the job type, so that you could have >> src_error and dst_error for mirroring? > > Yes, that would make sense. Of course, at the same time it also makes the implementation a bit more complicated. > 'stop': The VM *and* the job will be paused---the VM is stopped even if > the block device has neither rerror=stop nor werror={stop,enospc}. The > error is recorded in the block device's iostatus (which can be examined > with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a > job. > > Rationale: stopping all I/O seems to be the best choice in order > to limit the number of errors received. However, due to backwards- > compatibility with QEMU 1.1 we cannot pause the job when guest- > initiated I/O causes an error. We could do that if the block > device has rerror=stop/werror={stop,enospc}, but it seems more > complicated to just never do it. I don't agree with stopping the VM. Consider a case where the target is somewhere on the network and you lose the connection, but the primary image is local on the hard disk. You don't want to stop the VM just because continuing with the copy isn't possible for the moment. >>> >>> I think this is something that management should resolve. >> >> Management doesn't necessarily exist. > > Even a human sitting at a console is management. (Though I don't plan > HMP to expose rerror/werror; so you can assume in some sense that > management exists). But it's management that cares about good defaults. :-) Why not expose the options in HMP? >>> For an error on the source, stopping the VM makes sense. I don't >>> think management cares about what caused an I/O error on a device. >>> Does it matter if streaming was active or rather the guest was >>> executing "dd if=/dev/sda of=/dev/null". >> >> Yes, there's a big difference: If it was a job, the guest can keep >> running without any problems. If it was a guest operation, we would have >> to return an error to the guest, which may offline the disk in response. > > Ok, this makes sense. > >>> Management may want to keep the VM stopped even for an error on the >>> target, as long as mirroring has finished the initial synchronization >>> step. The VM can perform large amounts of I/O while the job is paused, >>> and then completing the job can take a large amount of time. >> >> If management wants to limit the impact of this, it can decide to >> explicitly stop the VM when it receives the error event. > > That can be too late. > > Eric, is it a problem for libvirt if a pause or target error during > mirroring causes the job to exit steady state? That means that after a > target error the offset can go back from 100% to <100%. "too late" in what respect? With the passive mirror, we already have a window in which data is on the source, but not copied to the target. Does it make a big difference if it is a few bytes more or less? If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going on at all. Do we really keep running the jobs in 1.1? If so, this is a bug and should be fixed before the release. >>> >>> Yes, we do. Do you think it's a problem for migration (thinking more >>> about it: ouch, yes, it s
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
This makes sense given the generic nature of block jobs. If mirroring was only for live migration, for example, then we could avoid all this by choosing a single policy. As a generic operation it's nice to have control over error policy. Stefan
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 12:32, Kevin Wolf ha scritto: > Am 21.05.2012 12:02, schrieb Paolo Bonzini: >> Il 21/05/2012 11:29, Kevin Wolf ha scritto: * block-stream: I propose adding two options to the existing block-stream command. If this is rejected, only mirroring will be able to use rerror/werror. The new options are of course rerror/werror. They are enum options, with the following possible values: >>> >>> Do we really need separate werror/rerror? For guest operations they >>> really exist only for historical reasons: werror was there first, and >>> when we wanted the same functionality, it seemed odd to overload werror >>> to include reads as well. >>> >>> For block jobs, where there is no such option yet, we could go with a >>> single error option, unless there is a use case for separate >>> werror/rerror options. >> >> For mirroring rerror=source and werror=target. I'm not sure there is an >> actual usecase, but at least it is more interesting than for devices... > > Hm. What if we add an active mirror? Then we can get some kind of COW, > and rerror can happen on the target as well. Errors during the read part of COW are always reported as werror. > If source/target is really the distinction we want to have, should the > available options be specific to the job type, so that you could have > src_error and dst_error for mirroring? Yes, that would make sense. 'stop': The VM *and* the job will be paused---the VM is stopped even if the block device has neither rerror=stop nor werror={stop,enospc}. The error is recorded in the block device's iostatus (which can be examined with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a job. Rationale: stopping all I/O seems to be the best choice in order to limit the number of errors received. However, due to backwards- compatibility with QEMU 1.1 we cannot pause the job when guest- initiated I/O causes an error. We could do that if the block device has rerror=stop/werror={stop,enospc}, but it seems more complicated to just never do it. >>> >>> I don't agree with stopping the VM. Consider a case where the target is >>> somewhere on the network and you lose the connection, but the primary >>> image is local on the hard disk. You don't want to stop the VM just >>> because continuing with the copy isn't possible for the moment. >> >> I think this is something that management should resolve. > > Management doesn't necessarily exist. Even a human sitting at a console is management. (Though I don't plan HMP to expose rerror/werror; so you can assume in some sense that management exists). >> For an error on the source, stopping the VM makes sense. I don't >> think management cares about what caused an I/O error on a device. >> Does it matter if streaming was active or rather the guest was >> executing "dd if=/dev/sda of=/dev/null". > > Yes, there's a big difference: If it was a job, the guest can keep > running without any problems. If it was a guest operation, we would have > to return an error to the guest, which may offline the disk in response. Ok, this makes sense. >> Management may want to keep the VM stopped even for an error on the >> target, as long as mirroring has finished the initial synchronization >> step. The VM can perform large amounts of I/O while the job is paused, >> and then completing the job can take a large amount of time. > > If management wants to limit the impact of this, it can decide to > explicitly stop the VM when it receives the error event. That can be too late. Eric, is it a problem for libvirt if a pause or target error during mirroring causes the job to exit steady state? That means that after a target error the offset can go back from 100% to <100%. >>> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going >>> on at all. Do we really keep running the jobs in 1.1? If so, this is a >>> bug and should be fixed before the release. >> >> Yes, we do. Do you think it's a problem for migration (thinking more >> about it: ouch, yes, it should be)? > > I'm pretty sure that it is a problem for migration. And it's likely a > problem in more cases. On the other hand, in other cases it can be desirable (qemu -S, run streaming before the VM starts). >> I'd rather make the extension of query-block-jobs more generic, with a >> list "devices" instead of a member "target", and making up the device >> name in the implementation (so you have "device": "target" for mirroring). > > Well, my idea for blockdev was something like (represented in a -drive > syntax because I don't know what it will look like): > > (qemu) blockdev_add file=foo.img,id=src > (qemu) device_add virtio-blk-pci,drive=src > ... > (qemu) blockdev_add file=bar.img,id=dst > (qemu) blockdev_mirror foo bar > > Once QOM reaches the block layer, I guess we'll want to make all > BlockDriverStates user visible anyway. I don't disagree, b
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Am 21.05.2012 12:02, schrieb Paolo Bonzini: > Il 21/05/2012 11:29, Kevin Wolf ha scritto: >>> * block-stream: I propose adding two options to the existing >>> block-stream command. If this is rejected, only mirroring will be able >>> to use rerror/werror. >>> >>> The new options are of course rerror/werror. They are enum options, >>> with the following possible values: >> >> Do we really need separate werror/rerror? For guest operations they >> really exist only for historical reasons: werror was there first, and >> when we wanted the same functionality, it seemed odd to overload werror >> to include reads as well. >> >> For block jobs, where there is no such option yet, we could go with a >> single error option, unless there is a use case for separate >> werror/rerror options. > > For mirroring rerror=source and werror=target. I'm not sure there is an > actual usecase, but at least it is more interesting than for devices... Hm. What if we add an active mirror? Then we can get some kind of COW, and rerror can happen on the target as well. If source/target is really the distinction we want to have, should the available options be specific to the job type, so that you could have src_error and dst_error for mirroring? >> Just like with guest operations it's a mostly useless mode, do we really >> need this option? > > Perhaps we should remove it for guest operations as well; certainly it > makes more sense (if any) for jobs than for guest operations. The guest operation one I used once for debugging/reproducing a specific error condition, and I'm not aware of other users. Maybe it can be useful for jobs in similar contexts. >>> 'stop': The VM *and* the job will be paused---the VM is stopped even if >>> the block device has neither rerror=stop nor werror={stop,enospc}. The >>> error is recorded in the block device's iostatus (which can be examined >>> with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a >>> job. >>> >>> Rationale: stopping all I/O seems to be the best choice in order >>> to limit the number of errors received. However, due to backwards- >>> compatibility with QEMU 1.1 we cannot pause the job when guest- >>> initiated I/O causes an error. We could do that if the block >>> device has rerror=stop/werror={stop,enospc}, but it seems more >>> complicated to just never do it. >> >> I don't agree with stopping the VM. Consider a case where the target is >> somewhere on the network and you lose the connection, but the primary >> image is local on the hard disk. You don't want to stop the VM just >> because continuing with the copy isn't possible for the moment. > > I think this is something that management should resolve. Management doesn't necessarily exist. > For an error > on the source, stopping the VM makes sense. I don't think management > cares about what caused an I/O error on a device. Does it matter if > streaming was active or rather the guest was executing "dd if=/dev/sda > of=/dev/null". Yes, there's a big difference: If it was a job, the guest can keep running without any problems. If it was a guest operation, we would have to return an error to the guest, which may offline the disk in response. As long as the VM can keep running, we should let it run. > Management may want to keep the VM stopped even for an error on the > target, as long as mirroring has finished the initial synchronization > step. The VM can perform large amounts of I/O while the job is paused, > and then completing the job can take a large amount of time. If management wants to limit the impact of this, it can decide to explicitly stop the VM when it receives the error event. >> Of course, this means that you can't reuse the block device's io_status, >> but you need a separate job_iostatus. > > For mirroring, source and target are separate devices and have separate > iostatuses anyway. > >> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going >> on at all. Do we really keep running the jobs in 1.1? If so, this is a >> bug and should be fixed before the release. > > Yes, we do. Do you think it's a problem for migration (thinking more > about it: ouch, yes, it should be)? I'm pretty sure that it is a problem for migration. And it's likely a problem in more cases. > We have no pause/resume infrastructure, so we could simply force > synchronous cancellation at the end (before vm_stop_force_state). > Stefan, do you have any free cycle for this? Not very nice, but considering that we're past -rc2 this might be the only thing we can do now. >>> * query-block-jobs: The returned JSON object will grow an additional >>> member, "target". The target field is a dictionary with two fields, >>> "info" and "stats" (resembling the output of query-block and >>> query-blockstat but for the mirroring target). Member "device" of the >>> BlockInfo structure will be made optional. >>> >>> Rationale: this allows libvirt to observe the high watermark of qcow2
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Il 21/05/2012 11:29, Kevin Wolf ha scritto: >> * block-stream: I propose adding two options to the existing >> block-stream command. If this is rejected, only mirroring will be able >> to use rerror/werror. >> >> The new options are of course rerror/werror. They are enum options, >> with the following possible values: > > Do we really need separate werror/rerror? For guest operations they > really exist only for historical reasons: werror was there first, and > when we wanted the same functionality, it seemed odd to overload werror > to include reads as well. > > For block jobs, where there is no such option yet, we could go with a > single error option, unless there is a use case for separate > werror/rerror options. For mirroring rerror=source and werror=target. I'm not sure there is an actual usecase, but at least it is more interesting than for devices... >> 'report': The behavior is the same as in 1.1. An I/O error, >> respectively during a read or a write, will complete the job immediately >> with an error code. >> >> 'ignore': An I/O error, respectively during a read or a write, will be >> ignored. For streaming, the job will complete with an error and the >> backing file will be left in place. For mirroring, the sector will be >> marked again as dirty and re-examined later. > > This is not really 'ignore' as used for guest operations. There it means > "no matter what the return value is, the operation has succeeded". For > streaming it would mean that it just goes on with the next cluster (and > if we don't cut the backing file link at the end, it would at least not > corrupt anything). Yes, for streaming it would mean that it just goes on with the next cluster and then report an error at the end. > Just like with guest operations it's a mostly useless mode, do we really > need this option? Perhaps we should remove it for guest operations as well; certainly it makes more sense (if any) for jobs than for guest operations. >> 'stop': The VM *and* the job will be paused---the VM is stopped even if >> the block device has neither rerror=stop nor werror={stop,enospc}. The >> error is recorded in the block device's iostatus (which can be examined >> with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a >> job. >> >> Rationale: stopping all I/O seems to be the best choice in order >> to limit the number of errors received. However, due to backwards- >> compatibility with QEMU 1.1 we cannot pause the job when guest- >> initiated I/O causes an error. We could do that if the block >> device has rerror=stop/werror={stop,enospc}, but it seems more >> complicated to just never do it. > > I don't agree with stopping the VM. Consider a case where the target is > somewhere on the network and you lose the connection, but the primary > image is local on the hard disk. You don't want to stop the VM just > because continuing with the copy isn't possible for the moment. I think this is something that management should resolve. For an error on the source, stopping the VM makes sense. I don't think management cares about what caused an I/O error on a device. Does it matter if streaming was active or rather the guest was executing "dd if=/dev/sda of=/dev/null". Management may want to keep the VM stopped even for an error on the target, as long as mirroring has finished the initial synchronization step. The VM can perform large amounts of I/O while the job is paused, and then completing the job can take a large amount of time. > Of course, this means that you can't reuse the block device's io_status, > but you need a separate job_iostatus. For mirroring, source and target are separate devices and have separate iostatuses anyway. > If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going > on at all. Do we really keep running the jobs in 1.1? If so, this is a > bug and should be fixed before the release. Yes, we do. Do you think it's a problem for migration (thinking more about it: ouch, yes, it should be)? We have no pause/resume infrastructure, so we could simply force synchronous cancellation at the end (before vm_stop_force_state). Stefan, do you have any free cycle for this? >> * query-block-jobs: The returned JSON object will grow an additional >> member, "target". The target field is a dictionary with two fields, >> "info" and "stats" (resembling the output of query-block and >> query-blockstat but for the mirroring target). Member "device" of the >> BlockInfo structure will be made optional. >> >> Rationale: this allows libvirt to observe the high watermark of qcow2 >> mirroring targets, and avoids putting a bad iostatus on a working >> migration source. > > The mirroring target should be present in query-block instead. It is a > user-visible BlockDriverState It is not user visible, and making it user visible adds a lot more things to worry about (e.g. making sure you cannot use it in a device_add). It reminds me of Xen's renami
Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Am 18.05.2012 19:08, schrieb Paolo Bonzini: > Hi all, > > the current block job API is designed for streaming; one property of > streaming is that in case of an error it can be restarted from the point > where it was left. > > In QEMU 1.2 I would like to add an implementation of mirroring (live > block copy) based on the block job API and on dirty-block tracking. > Unlike streaming, this operation is not restartable, because canceling > the job turns off dirty-block tracking. > > To avoid this problem, my proposal is to add to block jobs options > similar to rerror/werror. There are a few more details required to get > this to work, and the purpose of this email is to summarize these details. I generally like the proposal. For details, I'll comment inline. > * block-stream: I propose adding two options to the existing > block-stream command. If this is rejected, only mirroring will be able > to use rerror/werror. > > The new options are of course rerror/werror. They are enum options, > with the following possible values: Do we really need separate werror/rerror? For guest operations they really exist only for historical reasons: werror was there first, and when we wanted the same functionality, it seemed odd to overload werror to include reads as well. For block jobs, where there is no such option yet, we could go with a single error option, unless there is a use case for separate werror/rerror options. > 'report': The behavior is the same as in 1.1. An I/O error, > respectively during a read or a write, will complete the job immediately > with an error code. > > 'ignore': An I/O error, respectively during a read or a write, will be > ignored. For streaming, the job will complete with an error and the > backing file will be left in place. For mirroring, the sector will be > marked again as dirty and re-examined later. This is not really 'ignore' as used for guest operations. There it means "no matter what the return value is, the operation has succeeded". For streaming it would mean that it just goes on with the next cluster (and if we don't cut the backing file link at the end, it would at least not corrupt anything). Just like with guest operations it's a mostly useless mode, do we really need this option? > 'stop': The VM *and* the job will be paused---the VM is stopped even if > the block device has neither rerror=stop nor werror={stop,enospc}. The > error is recorded in the block device's iostatus (which can be examined > with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a > job. > > Rationale: stopping all I/O seems to be the best choice in order > to limit the number of errors received. However, due to backwards- > compatibility with QEMU 1.1 we cannot pause the job when guest- > initiated I/O causes an error. We could do that if the block > device has rerror=stop/werror={stop,enospc}, but it seems more > complicated to just never do it. I don't agree with stopping the VM. Consider a case where the target is somewhere on the network and you lose the connection, but the primary image is local on the hard disk. You don't want to stop the VM just because continuing with the copy isn't possible for the moment. Of course, this means that you can't reuse the block device's io_status, but you need a separate job_iostatus. If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going on at all. Do we really keep running the jobs in 1.1? If so, this is a bug and should be fixed before the release. > * query-block-jobs: The returned JSON object will grow an additional > member, "target". The target field is a dictionary with two fields, > "info" and "stats" (resembling the output of query-block and > query-blockstat but for the mirroring target). Member "device" of the > BlockInfo structure will be made optional. > > Rationale: this allows libvirt to observe the high watermark of qcow2 > mirroring targets, and avoids putting a bad iostatus on a working > migration source. The mirroring target should be present in query-block instead. It is a user-visible BlockDriverState, so let's treat it like one. We just need to give it a name. > * cont: even though cont does _not_ restart the block job that reported > an error, the iostatus is reset for all block devices that are attached > to a block job (like the mirroring target). > > Rationale: cont anyway resets the iostatus for the streaming target > or mirroring source, because there is a single iostatus for the > device and the job. It is simpler to do the same also for the > mirroring target. See above, better have a separate iostatus for the job. > > * block-job-resume also resets the iostatus on the mirroring target. > > * block-job-complete: new command specific to mirroring (switches the > device to the target), not related to the rest of the proposal. What semantics will block-job-cancel have then for mirroring? Will it be incompatible with RHEL 6? Kevin
[Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Hi all, the current block job API is designed for streaming; one property of streaming is that in case of an error it can be restarted from the point where it was left. In QEMU 1.2 I would like to add an implementation of mirroring (live block copy) based on the block job API and on dirty-block tracking. Unlike streaming, this operation is not restartable, because canceling the job turns off dirty-block tracking. To avoid this problem, my proposal is to add to block jobs options similar to rerror/werror. There are a few more details required to get this to work, and the purpose of this email is to summarize these details. New QMP commands The following QMP commands are added. * block-job-pause: Takes a block device (drive), pauses an active background block operation on that device. This command returns immediately after marking the active background block operation for pausing. It is an error to call this command if no operation is in progress. The operation will pause as soon as possible (it won't pause if the job is being cancelled). No event is emitted when the operation is actually paused. Cancelling a paused job automatically resumes it. * block-job-resume: Takes a block device (drive), resume a paused background block operation on that device. This command returns immediately after resuming a paused background block operation. It is an error to call this command if no operation is in progress. A successful block-job-resume operation also resets the iostatus on the device that is passed. Rationale: because block job errors can occur even while the VM is stopped, rerror=stop/werror=stop cannot reuse the "cont" monitor command to restart a block job. So block-job-resume is required anyway. Adding block-job-pause makes it simpler to test the new feature. Modified QMP commands = * block-stream: I propose adding two options to the existing block-stream command. If this is rejected, only mirroring will be able to use rerror/werror. The new options are of course rerror/werror. They are enum options, with the following possible values: 'report': The behavior is the same as in 1.1. An I/O error, respectively during a read or a write, will complete the job immediately with an error code. 'ignore': An I/O error, respectively during a read or a write, will be ignored. For streaming, the job will complete with an error and the backing file will be left in place. For mirroring, the sector will be marked again as dirty and re-examined later. 'stop': The VM *and* the job will be paused---the VM is stopped even if the block device has neither rerror=stop nor werror={stop,enospc}. The error is recorded in the block device's iostatus (which can be examined with query-block). However, a BLOCK_IO_ERROR event will _never_ pause a job. Rationale: stopping all I/O seems to be the best choice in order to limit the number of errors received. However, due to backwards- compatibility with QEMU 1.1 we cannot pause the job when guest- initiated I/O causes an error. We could do that if the block device has rerror=stop/werror={stop,enospc}, but it seems more complicated to just never do it. 'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others. In all cases, even for 'report', the I/O error is reported as a QMP event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR. It is possible that while stopping the VM a BLOCK_IO_ERROR event will be reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa. This is not really avoidable since stopping the VM completes all pending I/O requests. In fact, it is already possible now that a series of BLOCK_IO_ERROR events are reported with rerror=stop. After cancelling a job, the job implementation MAY choose to treat stop and enospc values as report, i.e. complete the job immediately with an error code, as long as block_job_is_cancelled(job) returns true when the completion callback is called. Open problems: There could be unrecoverable errors in which the job will be completed as if rerror/werror were set to report (example: error while switching backing files). Does it make sense to fire an event before the point in time where such errors can happen? Other points specific to mirroring == * query-block-jobs: The returned JSON object will grow an additional member, "target". The target field is a dictionary with two fields, "info" and "stats" (resembling the output of query-block and query-blockstat but for the mirroring target). Member "device" of the BlockInfo structure will be made optional. Rationale: this allows libvirt to observe the high watermark of qcow2 mirroring targets, and avoids putting a bad iostatus on a working migration source. * cont: even though cont does _not_ restart the block job that reported an error, the iostatus is reset for all block devices that are attached to a block job (like the