Hi all,

we've run into a really awkward customer situation where the guest would hang forever because an SG_IO ioctl on the host never returned. Looking into it, we found that qemu submits direct I/O requests with an _infinite_ timeout (well, actually UINT_MAX, which due to a kernel bug gets translated into (ULONG)-2, resulting in a timeout of 4.2 years :-). This particular I/O then ran into a timeout on the wire due to a flaky connection. This resulted in the 'normal' block-level timeout on the host being disabled, and the SCSI stack never sending any aborts, as the block layer was still waiting for the I/O timeout to expire.
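
To make the failure mode concrete, here is a minimal sketch of how an SG_IO request header could be prepared with a bounded timeout instead of UINT_MAX. This is not the code from the series: the helper name prep_sg_io() and the 30-second fallback are assumptions made up for illustration. Only the sg_io_hdr_t fields and SG_DXFER_FROM_DEV come from <scsi/sg.h>, where the 'timeout' field is in milliseconds and UINT_MAX means "no timeout".

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <scsi/sg.h>

/* Assumed fallback for this sketch only; not a value taken from the series. */
#define SKETCH_DEFAULT_TIMEOUT_MS 30000u

/*
 * Hypothetical helper: fill in an SG_IO header for a data-in command.
 * A requested timeout of 0 or UINT_MAX is replaced by the fallback above,
 * so the request can never be submitted with an unbounded timeout.
 */
static void prep_sg_io(sg_io_hdr_t *hdr,
                       unsigned char *cdb, unsigned char cdb_len,
                       void *buf, unsigned int buf_len,
                       unsigned char *sense, unsigned char sense_len,
                       unsigned int timeout_ms)
{
    assert(hdr != NULL && cdb != NULL);

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';                  /* mandatory for SG_IO */
    hdr->dxfer_direction = SG_DXFER_FROM_DEV; /* device -> host */
    hdr->cmdp = cdb;
    hdr->cmd_len = cdb_len;
    hdr->dxferp = buf;
    hdr->dxfer_len = buf_len;
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;

    /* sg_io_hdr_t.timeout is in milliseconds; UINT_MAX means no timeout. */
    if (timeout_ms == 0 || timeout_ms == UINT_MAX) {
        timeout_ms = SKETCH_DEFAULT_TIMEOUT_MS;
    }
    hdr->timeout = timeout_ms;
}
```

The prepared header would then be handed to ioctl(fd, SG_IO, &hdr) on an open device; that call is left out here because it needs real hardware.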

Unfortunately I didn't find a way to create a stand-alone patch; the fix I'm proposing relies on fixes for both qemu running on the host and the kernel running in the guest.

The proposed fix consists of several parts:
- Make the standard device timeout user-settable via a 'timeout'
  attribute to 'scsi-disk' and 'scsi-generic'.
- Add a kernel patch to implement a eh_timeout_handler() for
  virtio_scsi(); this patch just checks if the command is still
  pending and resets the timer if so.
- Add a request timeout to allow drivers to modify the timeout
  on a per-request basis.
- Implement a new VIRTIO_SCSI_F_TIMEOUT feature allowing virtio-scsi
  to pass in a timeout via the otherwise unused 'crn' field.
- Add a kernel patch to implement the VIRTIO_SCSI_F_TIMEOUT feature
  so that the timeout is added per virtio request.

With that, virtio-scsi on the guest can pass the timeout it uses to qemu on the host side, which can then use this timeout to issue I/O requests to the host. The host can then properly abort a command if the timeout is hit, and the aborted command will be returned to the guest. The guest itself doesn't need to (and, in fact, in most cases can't) abort any commands anymore, so it just needs to reset the I/O timer until the requests are returned.

However, as this is quite an elaborate construct, I'd like to get some feedback on it.

Hannes Reinecke (4):
  scsi: make default command timeout user-settable
  scsi: use host default timeouts for SCSI commands
  scsi: per-request timeouts
  virtio: implement VIRTIO_SCSI_F_TIMEOUT feature

 hw/scsi/scsi-bus.c                           |  1 +
 hw/scsi/scsi-disk.c                          | 16 ++++++++++++----
 hw/scsi/scsi-generic.c                       | 11 +++++++++--
 hw/scsi/virtio-scsi.c                        | 16 ++++++++++++++++
 include/hw/scsi/scsi.h                       |  2 ++
 include/standard-headers/linux/virtio_scsi.h |  1 +
 6 files changed, 41 insertions(+), 6 deletions(-)

-- 
2.12.0