Re: [Qemu-devel] [PATCH] bdrv_flush error handling
On Thu, Feb 21, 2008 at 11:28:23AM -0600, [EMAIL PROTECTED] wrote: > On Thu, Feb 21, 2008 at 05:24:10PM +, Daniel P. Berrange wrote: > > On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote: > > > > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on > > > > > error (the return value from write) whereas the other block read/write > > > > > methods return errno values. This is a mistake, surely ? -1 would be > > > > > -EPERM. If any of the callers did anything with these return values > > > > > you'd get incorrect error indications. > > > > > > > > > > > > > > > Finally, it would perhaps be best if the block device emulators wrote > > > > > to the qemu console to complain if they give write errors. Otherwise > > > > > the errno value and other important information will be lost, which > > > > > makes debugging hard. > > > > > > > > If by 'qemu console' you mean stderr, then fine, but please don't > > > > spew log messages to the monitor console, because that'll make it > > > > very hard to interact with reliably from management tools. > > > > > > Would it make sense to have a log messages screen associated with > > > the monitor (like Ctrl-Alt-7) to deal with those sorts of things? > > > > Why invent a new special QEMU log screen, when stderr works just fine. If an > > app wants to capture log messages they just capture stderr and persist it. > > We already capture stderr for exactly this reason in libvirt when managing > > QEMU instances. > > Please excuse my intrusion in this thread, but I'm a user of the new > ncurses user interface. when ssh'd in, running qemu, I don't believe > having messages pop out of stderr and over the current screen contents is > the appropriate behavior, as it sounds to me like it would cause redraw > defects in the normal text console (via ncurses) So just redirect stderr to a logfile, or /dev/null, etc, etc Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
On Thu, Feb 21, 2008 at 05:24:10PM +, Daniel P. Berrange wrote: > On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote: > > > > "Daniel P. Berrange" <[EMAIL PROTECTED]> wrote: > > > On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote: > > > Content-Description: message body text > > > > bdrv_flush is declared to return void, but this is wrong because it > > > > means that the implementations have nowhere to report their errors. > > > > Indeed, the implementations generally ignore errors. > > > > > > > > This patch corrects this by making it return int (implicitly, either 0 > > > > or -errno, as for other similar functions). All of the > > > > implementations and callers are adjusted too. > > > > > > > > > > > > While looking at this I was surprised to see this: > > > > > > > > static void scsi_write_complete(void * opaque, int ret) > > > > ... > > > > if (ret) { > > > > fprintf(stderr, "scsi-disc: IO write error\n"); > > > > exit(1); > > > > } > > > > > > > > Surely that is overkill ? > > > > > > Yes, any i/o errors I'd expect to be propagated back up to the guest > > > OS as the most appropriate IDE / SCSI error code. > > > > > > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on > > > > error (the return value from write) whereas the other block read/write > > > > methods return errno values. This is a mistake, surely ? -1 would be > > > > -EPERM. If any of the callers did anything with these return values > > > > you'd get incorrect error indications. > > > > > > > > > > > > Finally, it would perhaps be best if the block device emulators wrote > > > > to the qemu console to complain if they give write errors. Otherwise > > > > the errno value and other important information will be lost, which > > > > makes debugging hard. > > > > > > If by 'qemu console' you mean stderr, then fine, but please don't > > > spew log messages to the monitor console, because that'll make it > > > very hard to interact with reliably from management tools. > > > > Would it make sense to have a log messages screen associated with > > the monitor (like Ctrl-Alt-7) to deal with those sorts of things? > > Why invent a new special QEMU log screen, when stderr works just fine. If an > app wants to capture log messages they just capture stderr and persist it. > We already capture stderr for exactly this reason in libvirt when managing > QEMU instances. > > Dan, > -- > |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| > |=- Perl modules: http://search.cpan.org/~danberr/ -=| > |=- Projects: http://freshmeat.net/~danielpb/ -=| > |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| > Please excuse my intrusion in this thread, but I'm a user of the new ncurses user interface. when ssh'd in, running qemu, I don't believe having messages pop out of stderr and over the current screen contents is the appropriate behavior, as it sounds to me like it would cause redraw defects in the normal text console (via ncurses) Julia Longtin <[EMAIL PROTECTED]>
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote: > > "Daniel P. Berrange" <[EMAIL PROTECTED]> wrote: > > On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote: > > Content-Description: message body text > > > bdrv_flush is declared to return void, but this is wrong because it > > > means that the implementations have nowhere to report their errors. > > > Indeed, the implementations generally ignore errors. > > > > > > This patch corrects this by making it return int (implicitly, either 0 > > > or -errno, as for other similar functions). All of the > > > implementations and callers are adjusted too. > > > > > > > > > While looking at this I was surprised to see this: > > > > > > static void scsi_write_complete(void * opaque, int ret) > > > ... > > > if (ret) { > > > fprintf(stderr, "scsi-disc: IO write error\n"); > > > exit(1); > > > } > > > > > > Surely that is overkill ? > > > > Yes, any i/o errors I'd expect to be propagated back up to the guest > > OS as the most appropriate IDE / SCSI error code. > > > > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on > > > error (the return value from write) whereas the other block read/write > > > methods return errno values. This is a mistake, surely ? -1 would be > > > -EPERM. If any of the callers did anything with these return values > > > you'd get incorrect error indications. > > > > > > > > > Finally, it would perhaps be best if the block device emulators wrote > > > to the qemu console to complain if they give write errors. Otherwise > > > the errno value and other important information will be lost, which > > > makes debugging hard. > > > > If by 'qemu console' you mean stderr, then fine, but please don't > > spew log messages to the monitor console, because that'll make it > > very hard to interact with reliably from management tools. > > Would it make sense to have a log messages screen associated with > the monitor (like Ctrl-Alt-7) to deal with those sorts of things? Why invent a new special QEMU log screen, when stderr works just fine. If an app wants to capture log messages they just capture stderr and persist it. We already capture stderr for exactly this reason in libvirt when managing QEMU instances. Dan, -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
"Daniel P. Berrange" <[EMAIL PROTECTED]> wrote: > On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote: > Content-Description: message body text > > bdrv_flush is declared to return void, but this is wrong because it > > means that the implementations have nowhere to report their errors. > > Indeed, the implementations generally ignore errors. > > > > This patch corrects this by making it return int (implicitly, either 0 > > or -errno, as for other similar functions). All of the > > implementations and callers are adjusted too. > > > > > > While looking at this I was surprised to see this: > > > > static void scsi_write_complete(void * opaque, int ret) > > ... > > if (ret) { > > fprintf(stderr, "scsi-disc: IO write error\n"); > > exit(1); > > } > > > > Surely that is overkill ? > > Yes, any i/o errors I'd expect to be propagated back up to the guest > OS as the most appropriate IDE / SCSI error code. > > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on > > error (the return value from write) whereas the other block read/write > > methods return errno values. This is a mistake, surely ? -1 would be > > -EPERM. If any of the callers did anything with these return values > > you'd get incorrect error indications. > > > > > > Finally, it would perhaps be best if the block device emulators wrote > > to the qemu console to complain if they give write errors. Otherwise > > the errno value and other important information will be lost, which > > makes debugging hard. > > If by 'qemu console' you mean stderr, then fine, but please don't > spew log messages to the monitor console, because that'll make it > very hard to interact with reliably from management tools. Would it make sense to have a log messages screen associated with the monitor (like Ctrl-Alt-7) to deal with those sorts of things? Ben
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Sometimes the guest intentionally seeks the error. Example? TrueCrypt 5.0 supports encryption of the full system disk. To get the real size of the disk, the truecrypt driver queries the number of blocks of the disk, but then tries to read after the last block, until it gets an error. Qemu hangs in this operation. So, for me, the blockdriver has to give the errors back to the guest (in this case, reading behind eof). In no cases, qemu should die because of that. Tom PS: I had to patch truecrypt because of this qemu feature.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
On Wednesday 20 February 2008 01:01:33 pm Anthony Liguori wrote: > Daniel P. Berrange wrote: > > On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote: > >> Ian Jackson wrote: > >>> Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): > >>>> Disk full is a fundamentally unfriendly situation to be in. There is > >>>> no good answer. Reporting errors back to the host has its own set of > >>>> problems. Many guest OS effectively just lock up when this occurs. > >>> > >>> I think that's fine, surely ? A locked up guest isn't very nice but > >>> it's better than a guest shot dead. > >> > >> Well, a guest which receives an IDE write error might do things like > >> mark parts of the device bad, to avoiding writing to those parts. If > >> the guest is running software RAID, for example, it will radically > >> change its behaviour in response to those errors. > >> > >> Sometimes that's what you want, but sometimes it is really unwanted. > >> If the host runs out of disk space, ideally you might want to suspend > >> the guest until you can free up host disk space (or move to another > >> host), then resume the guest, perhaps manually. > >> > >> Is savevm-upon-disk-full a realistic prospect? > > > > In the 'out of disk space' scenario you wouldn't need to save the guest - > > merely stop its CPU. This gives the host admin the opportunity to hot-add > > storage to the host & then resume execution, or to kill the VM, or to > > free enough space to save the VM to disk / live migrate it to another > > host. > > I agree. Stopping the CPUs and spitting out a big fat warning message > would be the best thing to do. For the average user, this would give > the opportunity to free up some space if possible. agreed. > > Regards, > > Anthony Liguori > > > Shooting it dead on any I/O error doesn't give the host admin any choices > > at all > > > > Dan.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Daniel P. Berrange wrote: On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote: Ian Jackson wrote: Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): Disk full is a fundamentally unfriendly situation to be in. There is no good answer. Reporting errors back to the host has its own set of problems. Many guest OS effectively just lock up when this occurs. I think that's fine, surely ? A locked up guest isn't very nice but it's better than a guest shot dead. Well, a guest which receives an IDE write error might do things like mark parts of the device bad, to avoiding writing to those parts. If the guest is running software RAID, for example, it will radically change its behaviour in response to those errors. Sometimes that's what you want, but sometimes it is really unwanted. If the host runs out of disk space, ideally you might want to suspend the guest until you can free up host disk space (or move to another host), then resume the guest, perhaps manually. Is savevm-upon-disk-full a realistic prospect? In the 'out of disk space' scenario you wouldn't need to save the guest - merely stop its CPU. This gives the host admin the opportunity to hot-add storage to the host & then resume execution, or to kill the VM, or to free enough space to save the VM to disk / live migrate it to another host. I agree. Stopping the CPUs and spitting out a big fat warning message would be the best thing to do. For the average user, this would give the opportunity to free up some space if possible. Regards, Anthony Liguori Shooting it dead on any I/O error doesn't give the host admin any choices at all Dan.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote: > Ian Jackson wrote: > > Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): > > > Disk full is a fundamentally unfriendly situation to be in. There is no > > > good > > > answer. Reporting errors back to the host has its own set of problems. > > > Many > > > guest OS effectively just lock up when this occurs. > > > > I think that's fine, surely ? A locked up guest isn't very nice but > > it's better than a guest shot dead. > > Well, a guest which receives an IDE write error might do things like > mark parts of the device bad, to avoiding writing to those parts. If > the guest is running software RAID, for example, it will radically > change its behaviour in response to those errors. > > Sometimes that's what you want, but sometimes it is really unwanted. > If the host runs out of disk space, ideally you might want to suspend > the guest until you can free up host disk space (or move to another > host), then resume the guest, perhaps manually. > > Is savevm-upon-disk-full a realistic prospect? In the 'out of disk space' scenario you wouldn't need to save the guest - merely stop its CPU. This gives the host admin the opportunity to hot-add storage to the host & then resume execution, or to kill the VM, or to free enough space to save the VM to disk / live migrate it to another host. Shooting it dead on any I/O error doesn't give the host admin any choices at all Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Paul Brook wrote: > > Is savevm-upon-disk-full a realistic prospect? > > Not particularly useful. If you're run out of disk space, chances are that > savevm will also fail. I'm imagining (a) that the savevm space is preallocated, or (b) is on a different disk. -- Jamie
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
> Is savevm-upon-disk-full a realistic prospect? Not particularly useful. If you're run out of disk space, chances are that savevm will also fail. Paul
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Ian Jackson wrote: > Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): > > Disk full is a fundamentally unfriendly situation to be in. There is no > > good > > answer. Reporting errors back to the host has its own set of problems. Many > > guest OS effectively just lock up when this occurs. > > I think that's fine, surely ? A locked up guest isn't very nice but > it's better than a guest shot dead. Well, a guest which receives an IDE write error might do things like mark parts of the device bad, to avoiding writing to those parts. If the guest is running software RAID, for example, it will radically change its behaviour in response to those errors. Sometimes that's what you want, but sometimes it is really unwanted. If the host runs out of disk space, ideally you might want to suspend the guest until you can free up host disk space (or move to another host), then resume the guest, perhaps manually. Is savevm-upon-disk-full a realistic prospect? -- Jamie
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): > Disk full is a fundamentally unfriendly situation to be in. There is no good > answer. Reporting errors back to the host has its own set of problems. Many > guest OS effectively just lock up when this occurs. I think that's fine, surely ? A locked up guest isn't very nice but it's better than a guest shot dead. Ian.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
> Write errors for non-raw formats can easily be caused by a disk full > situation on the host. Killing the guest hard is unfriendly for that > situation. Disk full is a fundamentally unfriendly situation to be in. There is no good answer. Reporting errors back to the host has its own set of problems. Many guest OS effectively just lock up when this occurs. Paul
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Avi Kivity writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): > For non-raw formats, you can pass through errors on data, but it is > impossible to recover on metadata errors, so dying on I/O error should > be fine. You mean metadata write errors I assume. I don't see why a metadata read error is any worse than any other read error. The guest will often prefer to be told that the volume was broken and then to be denied the ability to continue accessing it. Who knows what the guest thinks of the device ? Perhaps it's an unimportant peripheral, or simulated network block device, used only for data interchange. Write errors for non-raw formats can easily be caused by a disk full situation on the host. Killing the guest hard is unfriendly for that situation. Ian.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Paul Brook wrote: Finally, it would perhaps be best if the block device emulators wrote to the qemu console to complain if they give write errors. Otherwise the errno value and other important information will be lost, which makes debugging hard. If by 'qemu console' you mean stderr, then fine, but please don't spew log messages to the monitor console, because that'll make it very hard to interact with reliably from management tools. Actually I think a better default would be for qemu to die right there and then. If the host is getting IO errors then something is badly wrong, and you're probably screwed anyway. If you're passing through a real disk or volume, this denies the guest the possibility of recover y (for example, if it is running a RAID setup over multiple volumes, or if you are testing the guest's ability to recover from errors). For non-raw formats, you can pass through errors on data, but it is impossible to recover on metadata errors, so dying on I/O error should be fine. -- Any sufficiently difficult bug is indistinguishable from a feature.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
> > Finally, it would perhaps be best if the block device emulators wrote > > to the qemu console to complain if they give write errors. Otherwise > > the errno value and other important information will be lost, which > > makes debugging hard. > > If by 'qemu console' you mean stderr, then fine, but please don't > spew log messages to the monitor console, because that'll make it > very hard to interact with reliably from management tools. Actually I think a better default would be for qemu to die right there and then. If the host is getting IO errors then something is badly wrong, and you're probably screwed anyway. Paul
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
Daniel P. Berrange writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"): > On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote: > > Finally, it would perhaps be best if the block device emulators wrote > > to the qemu console to complain if they give write errors. Otherwise > > the errno value and other important information will be lost, which > > makes debugging hard. > > If by 'qemu console' you mean stderr, then fine, but please don't > spew log messages to the monitor console, because that'll make it > very hard to interact with reliably from management tools. Is it permitted, then, to generally print to qemu's stderr from inside a device model ? This seems to be done quite a bit during initialisation, and quite a bit in various rather less common kinds of device, but not routinely in the main emulations. Perhaps it would be better to invent a `logprintf' function to make it easier to redirect these kind of messages later if that turns out to be desirable ? Ian.
Re: [Qemu-devel] [PATCH] bdrv_flush error handling
On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote: Content-Description: message body text > bdrv_flush is declared to return void, but this is wrong because it > means that the implementations have nowhere to report their errors. > Indeed, the implementations generally ignore errors. > > This patch corrects this by making it return int (implicitly, either 0 > or -errno, as for other similar functions). All of the > implementations and callers are adjusted too. > > > While looking at this I was surprised to see this: > > static void scsi_write_complete(void * opaque, int ret) > ... > if (ret) { > fprintf(stderr, "scsi-disc: IO write error\n"); > exit(1); > } > > Surely that is overkill ? Yes, any i/o errors I'd expect to be propagated back up to the guest OS as the most appropriate IDE / SCSI error code. > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on > error (the return value from write) whereas the other block read/write > methods return errno values. This is a mistake, surely ? -1 would be > -EPERM. If any of the callers did anything with these return values > you'd get incorrect error indications. > > > Finally, it would perhaps be best if the block device emulators wrote > to the qemu console to complain if they give write errors. Otherwise > the errno value and other important information will be lost, which > makes debugging hard. If by 'qemu console' you mean stderr, then fine, but please don't spew log messages to the monitor console, because that'll make it very hard to interact with reliably from management tools. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|