Am 23.09.2011 10:55, schrieb Markus Armbruster:
> Kevin Wolf <kw...@redhat.com> writes:
> 
>> Am 19.09.2011 16:09, schrieb Luiz Capitulino:
>>> On Mon, 19 Sep 2011 15:40:06 +0200
>>> Kevin Wolf <kw...@redhat.com> wrote:
>>>
>>>> Am 01.09.2011 20:37, schrieb Luiz Capitulino:
>>>>> This series adds support to the block layer to keep track of devices'
>>>>> I/O status. That information is also made available in QMP and HMP.
>>>>>
>>>>> The goal here is to allow management applications that miss the
>>>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device 
>>>>> has
>>>>> caused the VM to stop and which device caused it.
>>>>>
>>>>> Please, note that this series depends on the following series:
>>>>>
>>>>>  o [PATCH v3 0/8]: Introduce the RunState type
>>>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
>>>>>
>>>>> And to be able to properly test it you'll also need:
>>>>>
>>>>>  o [PATCH 0/3] qcow2/coroutine fixes
>>>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
>>>>>
>>>>> Here's an HMP example:
>>>>>
>>>>>   (qemu) info status 
>>>>>   VM status: paused (io-error)
>>>>>   (qemu) info block
>>>>>   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 
>>>>> encrypted=0
>>>>>   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest 
>>>>> ro=0 drv=qcow2 encrypted=0
>>>>>   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
>>>>>   floppy0: removable=1 locked=0 [not inserted]
>>>>>   sd0: removable=1 locked=0 [not inserted]
>>>>>
>>>>> The "info status" command shows that the VM is stopped due to an I/O 
>>>>> error.
>>>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' 
>>>>> device
>>>>> caused the error, which turns out to be due to no space.
>>>>
>>>> Looks like I didn't reply here yet?
>>>
>>> No, you didn't.
>>>
>>>> I still don't quite like that the devices are involved, but their part
>>>> is minimal and it makes the implementation much easier, so for me that's
>>>> acceptable.
>>>
>>> Suggestions on better ways of implementing this are welcome! :)
>>
>> I don't have one. :-)
>>
>> Surely it would be possible to do everything in block.c, but I see that
>> this would make things much more complicated (would need to get an AIO
>> callback added to the chain that can save the error code). It's not
>> worth the trouble.
> 
> The block layer necessarily detects errors in a huge number of places.
> 
> It also has a rich and complex interface to device models, with error
> reporting in many places.
> 
> virtio-blk, ide and scsi-disk each have a single error handler to handle
> block layer errors.
> 
> For better or worse, these are the only block layer error "chokepoints"
> we have now, and therefore the only easy places to set BDS I/O status.
> But they're still wrong.
> 
> The block layer shouldn't require device models to tell it what it
> already knows.
> 
> Now, I don't want to block your series just because it adds this wart.
> But the wart should be documented in the code.
> 
> Also document that any device model implementing werror=stop|enospc or
> rerror=stop must update I/O status.  Extra points if you find a way to
> enforce it instead.
> 
> Apropos enforcing.  Currently, -drive accepts any werror and rerror
> action with if={ide,virtio,scsi,none}.  We rely on device models not
> implementing an action to check and fail during initialization.
> scsi-generic and the floppy devices do.  All the other qdevified devices
> don't, and that's broken.
> 
> Are werror and rerror really host state?

If you want to convince me of the opposite, show me the real device that
implements them.

In fact, for the implementation of rerror/werror the very same thing is
true as you say about I/O status updates. Other than being easy to
implement, there is really no reason to do it in the devices. The block
layer could just as well stop the VM and queue the requests for
resubmission when the VM is restarted. Of course, you can't keep
migration compatibility when doing such a fundamental change to the
design, but if you take the host state thing serious, this is what you
should do.

Kevin

Reply via email to