Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread Daniel P. Berrange
On Thu, Feb 21, 2008 at 11:28:23AM -0600, [EMAIL PROTECTED] wrote:
> On Thu, Feb 21, 2008 at 05:24:10PM +, Daniel P. Berrange wrote:
> > On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote:
> > > > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
> > > > > error (the return value from write) whereas the other block read/write
> > > > > methods return errno values.  This is a mistake, surely ?  -1 would be
> > > > > -EPERM.  If any of the callers did anything with these return values
> > > > > you'd get incorrect error indications.
> > > > > 
> > > > > 
> > > > > Finally, it would perhaps be best if the block device emulators wrote
> > > > > to the qemu console to complain if they give write errors.  Otherwise
> > > > > the errno value and other important information will be lost, which
> > > > > makes debugging hard.
> > > > 
> > > > If by 'qemu console' you mean stderr, then fine, but please don't
> > > > spew log messages to the  monitor console, because that'll make it
> > > > very hard to interact with reliably from management tools. 
> > > 
> > > Would it make sense to have a log messages screen associated with
> > > the monitor (like Ctrl-Alt-7) to deal with those sorts of things?
> > 
> > Why invent a new special QEMU log screen, when stderr works just fine. If an
> > app wants to capture log messages they just capture stderr and persist it.
> > We already capture stderr for exactly this reason in libvirt when managing
> > QEMU instances.
> 
> Please excuse my intrusion in this thread, but I'm a user of the new
> ncurses user interface. when ssh'd in, running qemu, I don't believe 
> having messages pop out of stderr and over the current screen contents is 
> the appropriate behavior, as it sounds to me like it would cause redraw 
> defects in the normal text console (via ncurses)

So just redirect stderr to a logfile, or /dev/null, etc, etc

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread risc
On Thu, Feb 21, 2008 at 05:24:10PM +, Daniel P. Berrange wrote:
> On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote:
> > 
> >  "Daniel P. Berrange" <[EMAIL PROTECTED]> wrote: 
> > > On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
> > > Content-Description: message body text
> > > > bdrv_flush is declared to return void, but this is wrong because it
> > > > means that the implementations have nowhere to report their errors.
> > > > Indeed, the implementations generally ignore errors.
> > > > 
> > > > This patch corrects this by making it return int (implicitly, either 0
> > > > or -errno, as for other similar functions).  All of the
> > > > implementations and callers are adjusted too.
> > > > 
> > > > 
> > > > While looking at this I was surprised to see this:
> > > > 
> > > > static void scsi_write_complete(void * opaque, int ret)
> > > > ...
> > > > if (ret) {
> > > > fprintf(stderr, "scsi-disc: IO write error\n");
> > > > exit(1);
> > > > }
> > > > 
> > > > Surely that is overkill ?
> > > 
> > > Yes, any i/o errors I'd expect to be propagated back up to the guest
> > > OS as the most appropriate IDE / SCSI error code.
> > > 
> > > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
> > > > error (the return value from write) whereas the other block read/write
> > > > methods return errno values.  This is a mistake, surely ?  -1 would be
> > > > -EPERM.  If any of the callers did anything with these return values
> > > > you'd get incorrect error indications.
> > > > 
> > > > 
> > > > Finally, it would perhaps be best if the block device emulators wrote
> > > > to the qemu console to complain if they give write errors.  Otherwise
> > > > the errno value and other important information will be lost, which
> > > > makes debugging hard.
> > > 
> > > If by 'qemu console' you mean stderr, then fine, but please don't
> > > spew log messages to the  monitor console, because that'll make it
> > > very hard to interact with reliably from management tools. 
> > 
> > Would it make sense to have a log messages screen associated with
> > the monitor (like Ctrl-Alt-7) to deal with those sorts of things?
> 
> Why invent a new special QEMU log screen, when stderr works just fine. If an
> app wants to capture log messages they just capture stderr and persist it.
> We already capture stderr for exactly this reason in libvirt when managing
> QEMU instances.
> 
> Dan,
> -- 
> |=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
> |=-   Perl modules: http://search.cpan.org/~danberr/  -=|
> |=-   Projects: http://freshmeat.net/~danielpb/   -=|
> |=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
> 

Please excuse my intrusion in this thread, but I'm a user of the new
ncurses user interface. when ssh'd in, running qemu, I don't believe 
having messages pop out of stderr and over the current screen contents is 
the appropriate behavior, as it sounds to me like it would cause redraw 
defects in the normal text console (via ncurses)

Julia Longtin <[EMAIL PROTECTED]>





Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread Daniel P. Berrange
On Thu, Feb 21, 2008 at 12:19:22PM -0500, Ben Taylor wrote:
> 
>  "Daniel P. Berrange" <[EMAIL PROTECTED]> wrote: 
> > On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
> > Content-Description: message body text
> > > bdrv_flush is declared to return void, but this is wrong because it
> > > means that the implementations have nowhere to report their errors.
> > > Indeed, the implementations generally ignore errors.
> > > 
> > > This patch corrects this by making it return int (implicitly, either 0
> > > or -errno, as for other similar functions).  All of the
> > > implementations and callers are adjusted too.
> > > 
> > > 
> > > While looking at this I was surprised to see this:
> > > 
> > > static void scsi_write_complete(void * opaque, int ret)
> > > ...
> > >   if (ret) {
> > >   fprintf(stderr, "scsi-disc: IO write error\n");
> > >   exit(1);
> > >   }
> > > 
> > > Surely that is overkill ?
> > 
> > Yes, any i/o errors I'd expect to be propagated back up to the guest
> > OS as the most appropriate IDE / SCSI error code.
> > 
> > > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
> > > error (the return value from write) whereas the other block read/write
> > > methods return errno values.  This is a mistake, surely ?  -1 would be
> > > -EPERM.  If any of the callers did anything with these return values
> > > you'd get incorrect error indications.
> > > 
> > > 
> > > Finally, it would perhaps be best if the block device emulators wrote
> > > to the qemu console to complain if they give write errors.  Otherwise
> > > the errno value and other important information will be lost, which
> > > makes debugging hard.
> > 
> > If by 'qemu console' you mean stderr, then fine, but please don't
> > spew log messages to the  monitor console, because that'll make it
> > very hard to interact with reliably from management tools. 
> 
> Would it make sense to have a log messages screen associated with
> the monitor (like Ctrl-Alt-7) to deal with those sorts of things?

Why invent a new special QEMU log screen, when stderr works just fine. If an
app wants to capture log messages they just capture stderr and persist it.
We already capture stderr for exactly this reason in libvirt when managing
QEMU instances.

Dan,
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-21 Thread Ben Taylor

 "Daniel P. Berrange" <[EMAIL PROTECTED]> wrote: 
> On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
> Content-Description: message body text
> > bdrv_flush is declared to return void, but this is wrong because it
> > means that the implementations have nowhere to report their errors.
> > Indeed, the implementations generally ignore errors.
> > 
> > This patch corrects this by making it return int (implicitly, either 0
> > or -errno, as for other similar functions).  All of the
> > implementations and callers are adjusted too.
> > 
> > 
> > While looking at this I was surprised to see this:
> > 
> > static void scsi_write_complete(void * opaque, int ret)
> > ...
> > if (ret) {
> > fprintf(stderr, "scsi-disc: IO write error\n");
> > exit(1);
> > }
> > 
> > Surely that is overkill ?
> 
> Yes, any i/o errors I'd expect to be propagated back up to the guest
> OS as the most appropriate IDE / SCSI error code.
> 
> > Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
> > error (the return value from write) whereas the other block read/write
> > methods return errno values.  This is a mistake, surely ?  -1 would be
> > -EPERM.  If any of the callers did anything with these return values
> > you'd get incorrect error indications.
> > 
> > 
> > Finally, it would perhaps be best if the block device emulators wrote
> > to the qemu console to complain if they give write errors.  Otherwise
> > the errno value and other important information will be lost, which
> > makes debugging hard.
> 
> If by 'qemu console' you mean stderr, then fine, but please don't
> spew log messages to the  monitor console, because that'll make it
> very hard to interact with reliably from management tools. 

Would it make sense to have a log messages screen associated with
the monitor (like Ctrl-Alt-7) to deal with those sorts of things?

Ben




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Thomas Irlet
Sometimes the guest intentionally seeks the error. Example?

TrueCrypt 5.0 supports encryption of the full system disk. To get the real
size of the disk, the truecrypt driver queries the number of blocks of the
disk, but then tries to read after the last block, until it gets an error.
Qemu hangs in this operation. So, for me, the blockdriver has to give the
errors back to the guest (in this case, reading behind eof). In no cases,
qemu should die because of that.


Tom

PS: I had to patch truecrypt because of this qemu feature.


Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Rick Vernam
On Wednesday 20 February 2008 01:01:33 pm Anthony Liguori wrote:
> Daniel P. Berrange wrote:
> > On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote:
> >> Ian Jackson wrote:
> >>> Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error 
handling"):
> >>>> Disk full is a fundamentally unfriendly situation to be in. There is
> >>>> no good answer. Reporting errors back to the host has its own set of
> >>>> problems. Many guest OS effectively just lock up when this occurs.
> >>>
> >>> I think that's fine, surely ?  A locked up guest isn't very nice but
> >>> it's better than a guest shot dead.
> >>
> >> Well, a guest which receives an IDE write error might do things like
> >> mark parts of the device bad, to avoiding writing to those parts.  If
> >> the guest is running software RAID, for example, it will radically
> >> change its behaviour in response to those errors.
> >>
> >> Sometimes that's what you want, but sometimes it is really unwanted.
> >> If the host runs out of disk space, ideally you might want to suspend
> >> the guest until you can free up host disk space (or move to another
> >> host), then resume the guest, perhaps manually.
> >>
> >> Is savevm-upon-disk-full a realistic prospect?
> >
> > In the 'out of disk space' scenario you wouldn't need to save the guest -
> > merely stop its CPU. This gives the host admin the opportunity to hot-add
> > storage to the host & then resume execution, or to kill the VM, or to
> > free enough space to save the VM to disk / live migrate it to another
> > host.
>
> I agree.  Stopping the CPUs and spitting out a big fat warning message
> would be the best thing to do.  For the average user, this would give
> the opportunity to free up some space if possible.

agreed.

>
> Regards,
>
> Anthony Liguori
>
> > Shooting it dead on any I/O error doesn't give the host admin any choices
> > at all
> >
> > Dan.






Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Anthony Liguori

Daniel P. Berrange wrote:

On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote:
  

Ian Jackson wrote:


Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"):
  
Disk full is a fundamentally unfriendly situation to be in. There is no good 
answer. Reporting errors back to the host has its own set of problems. Many 
guest OS effectively just lock up when this occurs.


I think that's fine, surely ?  A locked up guest isn't very nice but
it's better than a guest shot dead.
  

Well, a guest which receives an IDE write error might do things like
mark parts of the device bad, to avoiding writing to those parts.  If
the guest is running software RAID, for example, it will radically
change its behaviour in response to those errors.

Sometimes that's what you want, but sometimes it is really unwanted.
If the host runs out of disk space, ideally you might want to suspend
the guest until you can free up host disk space (or move to another
host), then resume the guest, perhaps manually.

Is savevm-upon-disk-full a realistic prospect?



In the 'out of disk space' scenario you wouldn't need to save the guest - merely
stop its CPU. This gives the host admin the opportunity to hot-add storage
to the host & then resume execution, or to kill the VM, or to free enough
space to save the VM to disk / live migrate it to another host. 
  


I agree.  Stopping the CPUs and spitting out a big fat warning message 
would be the best thing to do.  For the average user, this would give 
the opportunity to free up some space if possible.


Regards,

Anthony Liguori


Shooting it dead on any I/O error doesn't give the host admin any choices
at all

Dan.
  






Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Daniel P. Berrange
On Wed, Feb 20, 2008 at 04:31:56PM +, Jamie Lokier wrote:
> Ian Jackson wrote:
> > Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"):
> > > Disk full is a fundamentally unfriendly situation to be in. There is no 
> > > good 
> > > answer. Reporting errors back to the host has its own set of problems. 
> > > Many 
> > > guest OS effectively just lock up when this occurs.
> > 
> > I think that's fine, surely ?  A locked up guest isn't very nice but
> > it's better than a guest shot dead.
> 
> Well, a guest which receives an IDE write error might do things like
> mark parts of the device bad, to avoiding writing to those parts.  If
> the guest is running software RAID, for example, it will radically
> change its behaviour in response to those errors.
> 
> Sometimes that's what you want, but sometimes it is really unwanted.
> If the host runs out of disk space, ideally you might want to suspend
> the guest until you can free up host disk space (or move to another
> host), then resume the guest, perhaps manually.
> 
> Is savevm-upon-disk-full a realistic prospect?

In the 'out of disk space' scenario you wouldn't need to save the guest - merely
stop its CPU. This gives the host admin the opportunity to hot-add storage
to the host & then resume execution, or to kill the VM, or to free enough
space to save the VM to disk / live migrate it to another host. 

Shooting it dead on any I/O error doesn't give the host admin any choices
at all

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Jamie Lokier
Paul Brook wrote:
> > Is savevm-upon-disk-full a realistic prospect?
> 
> Not particularly useful. If you're run out of disk space, chances are that 
> savevm will also fail.

I'm imagining (a) that the savevm space is preallocated, or (b) is on
a different disk.

-- Jamie




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Paul Brook
> Is savevm-upon-disk-full a realistic prospect?

Not particularly useful. If you're run out of disk space, chances are that 
savevm will also fail.

Paul




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Jamie Lokier
Ian Jackson wrote:
> Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"):
> > Disk full is a fundamentally unfriendly situation to be in. There is no 
> > good 
> > answer. Reporting errors back to the host has its own set of problems. Many 
> > guest OS effectively just lock up when this occurs.
> 
> I think that's fine, surely ?  A locked up guest isn't very nice but
> it's better than a guest shot dead.

Well, a guest which receives an IDE write error might do things like
mark parts of the device bad, to avoiding writing to those parts.  If
the guest is running software RAID, for example, it will radically
change its behaviour in response to those errors.

Sometimes that's what you want, but sometimes it is really unwanted.
If the host runs out of disk space, ideally you might want to suspend
the guest until you can free up host disk space (or move to another
host), then resume the guest, perhaps manually.

Is savevm-upon-disk-full a realistic prospect?

-- Jamie




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Ian Jackson
Paul Brook writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"):
> Disk full is a fundamentally unfriendly situation to be in. There is no good 
> answer. Reporting errors back to the host has its own set of problems. Many 
> guest OS effectively just lock up when this occurs.

I think that's fine, surely ?  A locked up guest isn't very nice but
it's better than a guest shot dead.

Ian.




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Paul Brook
> Write errors for non-raw formats can easily be caused by a disk full
> situation on the host.  Killing the guest hard is unfriendly for that
> situation.

Disk full is a fundamentally unfriendly situation to be in. There is no good 
answer. Reporting errors back to the host has its own set of problems. Many 
guest OS effectively just lock up when this occurs.

Paul




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Ian Jackson
Avi Kivity writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error handling"):
> For non-raw formats, you can pass through errors on data, but it is 
> impossible to recover on metadata errors, so dying on I/O error should 
> be fine.

You mean metadata write errors I assume.  I don't see why a metadata
read error is any worse than any other read error.

The guest will often prefer to be told that the volume was broken and
then to be denied the ability to continue accessing it.  Who knows
what the guest thinks of the device ?  Perhaps it's an unimportant
peripheral, or simulated network block device, used only for data
interchange.

Write errors for non-raw formats can easily be caused by a disk full
situation on the host.  Killing the guest hard is unfriendly for that
situation.

Ian.




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Avi Kivity

Paul Brook wrote:

Finally, it would perhaps be best if the block device emulators wrote
to the qemu console to complain if they give write errors.  Otherwise
the errno value and other important information will be lost, which
makes debugging hard.
  

If by 'qemu console' you mean stderr, then fine, but please don't
spew log messages to the  monitor console, because that'll make it
very hard to interact with reliably from management tools.



Actually I think a better default would be for qemu to die right there and 
then.  If the host is getting IO errors then something is badly wrong, and 
you're probably screwed anyway.
  


If you're passing through a real disk or volume, this denies the guest 
the possibility of recover y (for example, if it is running a RAID setup 
over multiple volumes, or if you are testing the guest's ability to 
recover from errors).


For non-raw formats, you can pass through errors on data, but it is 
impossible to recover on metadata errors, so dying on I/O error should 
be fine.


--
Any sufficiently difficult bug is indistinguishable from a feature.





Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Paul Brook
> > Finally, it would perhaps be best if the block device emulators wrote
> > to the qemu console to complain if they give write errors.  Otherwise
> > the errno value and other important information will be lost, which
> > makes debugging hard.
>
> If by 'qemu console' you mean stderr, then fine, but please don't
> spew log messages to the  monitor console, because that'll make it
> very hard to interact with reliably from management tools.

Actually I think a better default would be for qemu to die right there and 
then.  If the host is getting IO errors then something is badly wrong, and 
you're probably screwed anyway.

Paul




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Ian Jackson
Daniel P. Berrange writes ("Re: [Qemu-devel] [PATCH] bdrv_flush error 
handling"):
> On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
> > Finally, it would perhaps be best if the block device emulators wrote
> > to the qemu console to complain if they give write errors.  Otherwise
> > the errno value and other important information will be lost, which
> > makes debugging hard.
> 
> If by 'qemu console' you mean stderr, then fine, but please don't
> spew log messages to the  monitor console, because that'll make it
> very hard to interact with reliably from management tools. 

Is it permitted, then, to generally print to qemu's stderr from inside
a device model ?  This seems to be done quite a bit during
initialisation, and quite a bit in various rather less common kinds of
device, but not routinely in the main emulations.

Perhaps it would be better to invent a `logprintf' function to make it
easier to redirect these kind of messages later if that turns out to
be desirable ?

Ian.




Re: [Qemu-devel] [PATCH] bdrv_flush error handling

2008-02-20 Thread Daniel P. Berrange
On Wed, Feb 20, 2008 at 03:53:46PM +, Ian Jackson wrote:
Content-Description: message body text
> bdrv_flush is declared to return void, but this is wrong because it
> means that the implementations have nowhere to report their errors.
> Indeed, the implementations generally ignore errors.
> 
> This patch corrects this by making it return int (implicitly, either 0
> or -errno, as for other similar functions).  All of the
> implementations and callers are adjusted too.
> 
> 
> While looking at this I was surprised to see this:
> 
> static void scsi_write_complete(void * opaque, int ret)
> ...
>   if (ret) {
>   fprintf(stderr, "scsi-disc: IO write error\n");
>   exit(1);
>   }
> 
> Surely that is overkill ?

Yes, any i/o errors I'd expect to be propagated back up to the guest
OS as the most appropriate IDE / SCSI error code.

> Also, in block-raw-posix.c, raw_pwrite et al seem to return -1 on
> error (the return value from write) whereas the other block read/write
> methods return errno values.  This is a mistake, surely ?  -1 would be
> -EPERM.  If any of the callers did anything with these return values
> you'd get incorrect error indications.
> 
> 
> Finally, it would perhaps be best if the block device emulators wrote
> to the qemu console to complain if they give write errors.  Otherwise
> the errno value and other important information will be lost, which
> makes debugging hard.

If by 'qemu console' you mean stderr, then fine, but please don't
spew log messages to the  monitor console, because that'll make it
very hard to interact with reliably from management tools. 

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-   Perl modules: http://search.cpan.org/~danberr/  -=|
|=-   Projects: http://freshmeat.net/~danielpb/   -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|