A virtio device/driver (e.g., vhost-scsi, and indeed any device,
including e1000e) may hang due to the loss of an IRQ or the loss of a
doorbell register kick, e.g.,

https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html

In the above link, virtio-net was in trouble because the 'kick' did not
take effect (it was missed).

This RFC adds a new debug interface 'DeviceEvent' to DeviceClass to help
narrow down whether an issue is due to a lost irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
e1000e or vhost-scsi) may implement the interface (e.g., via eventfd,
MSI-X or legacy IRQ).

The 'call' event lets the admin inject an irq on purpose for a specific
device (e.g., vhost-scsi) from QEMU/host into the VM, while the 'kick'
event lets the admin kick the doorbell on purpose at the QEMU/host side
for a specific device.

This interface can also be used as a workaround if a call/kick is lost
due to an issue in the virtualization software (e.g., kernel or QEMU).

Below is from a live crash analysis. Initially, queue=3 has count=30 in
its 'kick' eventfd_ctx. Suppose there is data in the vring avail ring
while nothing is available in the used ring. We suspect this is because
vhost-scsi was not notified by the VM. In order to narrow down and
analyze the issue, we use live crash to dump the current eventfd counter
for queue=3.

crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 30, -----> eventfd is 30 !!!
  flags = 526336,
  id = 26
}

Now we kick the doorbell for vhost-scsi queue=3 on purpose, for
diagnosis, with this interface.

{ "execute": "x-debug-device-event",
  "arguments": { "dev": "/machine/peripheral/vscsi0",
                 "event": "kick", "queue": 3 } }

The counter increased to 31. If the hang is resolved by this kick, it
indicates that a software bug caused the original 'kick' to be lost.

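For anyone who wants to script the kick instead of typing the QMP
command by hand, here is a minimal sketch in Python. It is only an
illustration and not part of the patch. The helper names
(build_device_event, qmp_send) and the socket path in the usage note are
hypothetical. It assumes that QEMU exposes a QMP monitor on a UNIX
socket and that the proposed x-debug-device-event command accepts
exactly the arguments shown above.

```python
import json
import socket


def build_device_event(dev, event, queue):
    """Build the QMP command for the proposed x-debug-device-event.

    Hypothetical helper: the argument names mirror the example in this
    cover letter; only 'call' and 'kick' are handled by the RFC so far.
    """
    if event not in ("call", "kick"):
        raise ValueError("event must be 'call' or 'kick'")
    return {
        "execute": "x-debug-device-event",
        "arguments": {"dev": dev, "event": event, "queue": queue},
    }


def qmp_send(sock_path, command):
    """Send one command over a QMP UNIX socket and return its reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        chan = sock.makefile("rw", encoding="utf-8")
        json.loads(chan.readline())  # consume the QMP greeting banner
        reply = None
        # QMP requires capability negotiation before any other command.
        for msg in ({"execute": "qmp_capabilities"}, command):
            chan.write(json.dumps(msg) + "\n")
            chan.flush()
            reply = json.loads(chan.readline())
            # Skip asynchronous events that may arrive before the reply.
            while "event" in reply:
                reply = json.loads(chan.readline())
        return reply
```

With a hypothetical socket at /tmp/qmp.sock, the kick above would be
qmp_send("/tmp/qmp.sock", build_device_event("/machine/peripheral/vscsi0", "kick", 3)).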
crash> eventfd_ctx ffffa10392537ac0
struct eventfd_ctx {
  kref = {
    refcount = {
      refs = {
        counter = 4
      }
    }
  },
  wqh = {
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              val = {
                counter = 0
              },
              {
                locked = 0 '\000',
                pending = 0 '\000'
              },
              {
                locked_pending = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    head = {
      next = 0xffffa104ae40d360,
      prev = 0xffffa104ae40d360
    }
  },
  count = 31, -----> eventfd incremented to 31 !!!
  flags = 526336,
  id = 26
}

Only the interface for vhost-scsi is implemented since this is an RFC.
I will implement it for other types (e.g., eventfd or MSI-X) if the RFC
is reasonable.

Thank you very much!

Dongli Zhang