On Fri, Aug 09, 2013 at 03:05:22PM +0100, Andrei Mikhailovsky wrote:
> I can confirm that I am having similar issues with ubuntu vm guests using fio
> with bs=4k direct=1 numjobs=4 iodepth=16. Occasionally i see hang tasks,
> occasionally guest vm stops responding without leaving anything in the logs
> and sometimes i see kernel panic on the console. I typically leave the
> runtime of the fio test for 60 minutes and it tends to stop responding after
> about 10-30 mins.
>
> I am on ubuntu 12.04 with 3.5 kernel backport and using ceph 0.61.7 with qemu
> 1.5.0 and libvirt 1.0.2

Josh,
In addition to the Ceph logs you can also use QEMU tracing with the
following events enabled:
virtio_blk_handle_write
virtio_blk_handle_read
virtio_blk_rw_complete

See docs/tracing.txt for details on usage.

Inspecting the trace output will let you observe the I/O request
submission/completion from the virtio-blk device perspective. You'll be
able to see whether requests are never being completed in some cases.

This bug seems like a corner case or race condition since most requests
seem to complete just fine. The problem is that eventually the
virtio-blk device becomes unusable when it runs out of descriptors (it
has 128). And before that limit is reached the guest may become
unusable due to the hung I/O requests.

Stefan
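
P.S. To spot requests that never complete, something along these lines may
help. It is only a rough sketch: it assumes the trace has already been
turned into text with one event per line, and that each line carries the
event name followed by the request pointer as "req=0x..." or "req 0x...".
The exact line layout depends on the trace backend, so adjust the regex to
match your output. The function name find_pending and the QUEUE_SIZE
constant are made up for illustration; 128 is the descriptor count
mentioned above.

```python
import re
import sys

SUBMIT_EVENTS = {"virtio_blk_handle_write", "virtio_blk_handle_read"}
QUEUE_SIZE = 128  # virtio-blk descriptor count mentioned in this thread

# Assumed line shape: "<event name> ... req=0x1234 ..." or "... req 0x1234 ...".
LINE_RE = re.compile(
    r"\b(virtio_blk_handle_write|virtio_blk_handle_read|virtio_blk_rw_complete)\b"
    r".*?\breq[= ](0x[0-9a-fA-F]+)"
)


def find_pending(lines):
    """Return {req pointer: (line number, event)} for submitted requests
    that have no later virtio_blk_rw_complete in the trace."""
    pending = {}
    for lineno, line in enumerate(lines, 1):
        m = LINE_RE.search(line)
        if not m:
            continue  # unrelated event or unrecognized layout
        event, req = m.group(1), m.group(2).lower()
        if event in SUBMIT_EVENTS:
            pending[req] = (lineno, event)
        else:
            # Completion: the pointer may be reused by a later request,
            # so drop it from the pending set.
            pending.pop(req, None)
    return pending


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        pending = find_pending(f)
    for req, (lineno, event) in sorted(pending.items(), key=lambda kv: kv[1][0]):
        print(f"line {lineno}: {event} req={req} never completed")
    print(f"{len(pending)} pending request(s), queue size {QUEUE_SIZE}")
```

Requests still in flight when the trace was cut off will show up as pending
too, so the interesting ones are those submitted long before the end of the
trace.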