On 1/18/21 8:59 AM, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berra...@redhat.com) wrote:
>> On Thu, Jan 14, 2021 at 04:27:28PM -0800, Dongli Zhang wrote:
>>> The virtio device/driver (e.g., vhost-scsi and indeed any device including
>>> e1000e) may hang due to the loss of an IRQ or the loss of a doorbell
>>> register kick, e.g.,
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>>
>>> In the above link, virtio-net was in trouble because the 'kick' was not
>>> taking effect (missed).
>>>
>>> This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
>>> narrow down whether the issue is due to a lost irq/kick. So far the new
>>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>>> e1000e or vhost-scsi) may implement it (e.g., via eventfd, MSI-X or legacy
>>> IRQ).
>>>
>>> The 'call' lets the admin inject an irq on purpose for a specific device
>>> (e.g., vhost-scsi) from QEMU/host to the VM, while the 'kick' lets the admin
>>> kick the doorbell on purpose at the QEMU/host side for a specific device.
>>
>> I'm really not convinced that we want to give admins the direct ability to
>> poke at internals of devices in a running QEMU. It feels like there is way
>> too much potential for the admin to make a situation far worse by doing
>> the wrong thing here,
> 
> We already do have commands to write to an iport, and to inject MCEs for
> example; is this that much different?
> 
>> and people dealing with support tickets will have
>> no idea that the admin has been poking internals of the device and broken
>> it by doing something wrong.
> 
> You could add a one-time log entry to say that this mischievous command
> had been used.
> 
>> You pointed to a bug where this could conceivably be useful, but
>> that's a one-time issue and should not be a common occurrence that justifies
>> making an official public API to poke at devices forevermore IMHO.
> 
> I think where it might be practically useful is if you were debugging a
> customer's hung VM and needed to find a way to get it to move again.
> That's something I'm not familiar with on the virtio side;
> mst - is this useful from a virtio side?

BTW, the Linux kernel blk-mq has a similar idea/interface. Running the command
below will 'run' the block IO queue on purpose.

echo "kick" > /sys/kernel/debug/block/sda/state

It is helpful for diagnosis if we suspect the IO stall is due to an unknown race
in which a 'run' of the queue was missed.
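
As a rough sketch only, the same write can be wrapped in a small helper that
checks the file is writable first. The function name kick_queue and the default
device sda are just for illustration; it assumes debugfs is mounted at
/sys/kernel/debug, the kernel was built with blk-mq debugfs support, and the
caller is root.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): write "kick" to a
# blk-mq debugfs state file. The default path assumes the disk is sda and
# that debugfs is mounted at /sys/kernel/debug.
kick_queue() {
    state_file="${1:-/sys/kernel/debug/block/sda/state}"

    # Refuse early if the file is missing or not writable by this user.
    if [ ! -w "$state_file" ]; then
        echo "cannot write $state_file (debugfs mounted? running as root?)" >&2
        return 1
    fi

    # Same effect as: echo "kick" > /sys/kernel/debug/block/sda/state
    printf 'kick\n' > "$state_file"
}
```

Calling kick_queue with no argument targets sda; passing another state file
path, e.g. kick_queue /sys/kernel/debug/block/sdb/state, targets that disk
instead.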

Dongli Zhang

> 
> Dave
> 
>> Regards,
>> Daniel
>> -- 
