I'm running with both of them applied on top of 2.0 in production and have
had no problems. I'm using NFS exclusively, with cache=none.
So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no
extra patches applied or reverted, on a VM that is running fio, correct?
Yes, exactly. An iSCSI-based setup can be deployed in a few minutes given a
prepared image, and I have a one-hundred-percent hit rate for the original
issue with it.
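For reference, the reproduction boils down to running fio with direct I/O inside the guest and live-migrating the VM while it runs. The sketch below is an approximation: the fio job parameters, device name and libvirt domain/host names are assumptions, not the exact setup from this report (direct=1 and libaio match the do_blockdev_direct_IO / io_submit path in the blocked-task trace below).

```shell
# Hypothetical reproduction sketch; names and parameters are examples.

# 1. Inside the guest: direct, async I/O against the virtio data disk.
fio_cmd="fio --name=migrate-test --filename=/dev/vdb \
  --rw=randrw --bs=4k --iodepth=16 --direct=1 \
  --ioengine=libaio --runtime=600 --time_based"

# 2. On the source host: live-migrate the domain while fio is running.
migrate_cmd="virsh migrate --live --verbose guest-domain \
  qemu+ssh://dest-host/system"

# 3. After migration: check whether guest I/O has stalled
#    (fio throughput drops to zero and never recovers).
echo "$fio_cmd"
echo "$migrate_cmd"
```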
I've reproduced your IO hang with 2.0 and both
9b1786829aefb83f37a8f3135e3ea91c56001b56 and
a096b3a6732f846ec57dc28b47ee9435aa0609bf applied.
Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 indeed fixes the
problem (but reintroduces the block-migration hang). It seems like a qemu
bug rather than a guest problem, as the no-kvmclock parameter makes no
difference. IO just stops and all qemu IO threads die off. Almost as if it
forgets to migrate them :-)
I'm attaching backtraces from the guest kernel and from qemu, plus the
qemu command line.
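For anyone wanting to collect the same data, the backtraces below were gathered roughly like this (a sketch; on the guest the SysRq 'w' trigger produces the "Show Blocked State" dump, and on the host gdb is attached to the running qemu process, whose PID is of course setup-specific):

```shell
# Sketch of how such backtraces are typically collected.

# Guest kernel: dump blocked (D-state) tasks via SysRq; the result
# appears in dmesg as "SysRq : Show Blocked State".
guest_cmd="echo w > /proc/sysrq-trigger"

# Host: attach gdb to the running qemu and dump all thread backtraces.
host_cmd="gdb -p \$(pidof qemu-system-x86_64) \
  -ex 'thread apply all bt' -ex detach -ex quit"

echo "$guest_cmd"
echo "$host_cmd"
```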
Going to compile 2.1-rc.
--
mg
[ 254.634525] SysRq : Show Blocked State
[ 254.635041] task PC stack pid father
[ 254.635304] kworker/0:2 D ffff88013fc145c0 0 83 2 0x00000000
[ 254.635304] Workqueue: xfs-log/vdb xfs_log_worker [xfs]
[  254.635304]  ffff880136bdfa58 0000000000000046 ffff880136bdffd8 00000000000145c0
[  254.635304]  ffff880136bdffd8 00000000000145c0 ffff880136ad8000 ffff88013fc14e88
[  254.635304]  ffff880037bd4380 ffff880037bc5068 ffff880037bd43b0 ffff880037bd4380
[ 254.635304] Call Trace:
[ 254.635304] [<ffffffff815e797d>] io_schedule+0x9d/0x140
[ 254.635304] [<ffffffff812921d5>] get_request+0x1b5/0x790
[ 254.635304] [<ffffffff81086ab0>] ? wake_up_bit+0x30/0x30
[ 254.635304] [<ffffffff81294236>] blk_queue_bio+0x96/0x390
[ 254.635304] [<ffffffff812904e2>] generic_make_request+0xe2/0x130
[ 254.635304] [<ffffffff812905a1>] submit_bio+0x71/0x150
[ 254.635304] [<ffffffff811e72c8>] ? bio_alloc_bioset+0x1e8/0x2e0
[ 254.635304] [<ffffffffa03310bb>] _xfs_buf_ioapply+0x2bb/0x3d0 [xfs]
[ 254.635304] [<ffffffffa038d3ef>] ? xlog_bdstrat+0x1f/0x50 [xfs]
[ 254.635304] [<ffffffffa03328e6>] xfs_buf_iorequest+0x46/0xa0 [xfs]
[ 254.635304] [<ffffffffa038d3ef>] xlog_bdstrat+0x1f/0x50 [xfs]
[ 254.635304] [<ffffffffa038f135>] xlog_sync+0x265/0x450 [xfs]
[ 254.635304] [<ffffffffa038f3b2>] xlog_state_release_iclog+0x92/0xb0 [xfs]
[ 254.635304] [<ffffffffa039016a>] _xfs_log_force+0x15a/0x290 [xfs]
[ 254.635304] [<ffffffff810115d6>] ? __switch_to+0x136/0x490
[ 254.635304] [<ffffffffa03902c6>] xfs_log_force+0x26/0x80 [xfs]
[ 254.635304] [<ffffffffa0390344>] xfs_log_worker+0x24/0x50 [xfs]
[ 254.635304] [<ffffffff8107e02b>] process_one_work+0x17b/0x460
[ 254.635304] [<ffffffff8107edfb>] worker_thread+0x11b/0x400
[ 254.635304] [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
[ 254.635304] [<ffffffff81085aef>] kthread+0xcf/0xe0
[ 254.635304] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[ 254.635304] [<ffffffff815f24ec>] ret_from_fork+0x7c/0xb0
[ 254.635304] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[ 254.635304] fio D ffff88013fc145c0 0 772 770 0x00000000
[  254.635304]  ffff8800bba4b8c8 0000000000000082 ffff8800bba4bfd8 00000000000145c0
[  254.635304]  ffff8800bba4bfd8 00000000000145c0 ffff8801376ff1c0 ffff88013fc14e88
[  254.635304]  ffff880037bd4380 ffff880037baba90 ffff880037bd43b0 ffff880037bd4380
[ 254.635304] Call Trace:
[ 254.635304] [<ffffffff815e797d>] io_schedule+0x9d/0x140
[ 254.635304] [<ffffffff812921d5>] get_request+0x1b5/0x790
[ 254.635304] [<ffffffff81086ab0>] ? wake_up_bit+0x30/0x30
[ 254.635304] [<ffffffff81294236>] blk_queue_bio+0x96/0x390
[ 254.635304] [<ffffffff812904e2>] generic_make_request+0xe2/0x130
[ 254.635304] [<ffffffff812905a1>] submit_bio+0x71/0x150
[ 254.635304] [<ffffffff811ed26c>] do_blockdev_direct_IO+0x14bc/0x2620
[ 254.635304] [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [<ffffffff811ee425>] __blockdev_direct_IO+0x55/0x60
[ 254.635304] [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [<ffffffffa032aaec>] xfs_vm_direct_IO+0x15c/0x180 [xfs]
[ 254.635304] [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[ 254.635304] [<ffffffff81143563>] generic_file_aio_read+0x6d3/0x750
[ 254.635304] [<ffffffff810b69c8>] ? ktime_get_ts+0x48/0xe0
[ 254.635304] [<ffffffff811030cf>] ? delayacct_end+0x8f/0xb0
[ 254.635304] [<ffffffff815e6a32>] ? down_read+0x12/0x30
[ 254.635304] [<ffffffffa0337224>] xfs_file_aio_read+0x154/0x2e0 [xfs]
[ 254.635304] [<ffffffffa03370d0>] ? xfs_file_splice_read+0x140/0x140 [xfs]
[ 254.635304] [<ffffffff811fd6a8>] do_io_submit+0x3b8/0x840
[ 254.635304] [<ffffffff811fdb40>] SyS_io_submit+0x10/0x20
[ 254.635304] [<ffffffff815f2599>] system_call_fastpath+0x16/0x1b
Thread 3 (Thread 0x7f4250f50700 (LWP 11955)):
#0  0x00007f4253d1a897 in ioctl () from /lib64/libc.so.6
#1  0x00007f4257f8adf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f4258e2aa90, type=type@entry=44672)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2  0x00007f4257f8af35 in kvm_cpu_exec (cpu=cpu@entry=0x7f4258e2aa90)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3  0x00007f4257f3071c in qemu_kvm_cpu_thread_fn (arg=0x7f4258e2aa90)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4  0x00007f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4253d22dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f424b5ff700 (LWP 11957)):
#0  0x00007f4253fecd0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f425802c019 in qemu_cond_wait (cond=cond@entry=0x7f4258f0cfc0, mutex=mutex@entry=0x7f4258f0cff0)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2  0x00007f4257f2070b in vnc_worker_thread_loop (queue=queue@entry=0x7f4258f0cfc0)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3  0x00007f4257f20ae0 in vnc_worker_thread (arg=0x7f4258f0cfc0)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4  0x00007f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4253d22dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f4257cc6900 (LWP 11952)):
#0  0x00007f4253d19286 in ppoll () from /lib64/libc.so.6
#1  0x00007f4257eecd79 in ppoll (__ss=0x0, __timeout=0x7ffffc03af40, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=883000000)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:316
#3  0x00007f4257eb02d4 in os_host_main_loop_wait (timeout=883000000)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:229
#4  main_loop_wait (nonblocking=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:484
#5  0x00007f4257d7c05e in main_loop ()
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:2051
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:4507
/usr/bin/qemu-system-x86_64 -machine accel=kvm \
  -name 21eae881-5e6f-4d13-9b7d-0b8279aed737 -S \
  -machine pc-i440fx-2.0,accel=kvm,usb=off -cpu SandyBridge,+kvmclock \
  -m 4096 -realtime mlock=on -smp 4,sockets=2,cores=10,threads=1 \
  -uuid 21eae881-5e6f-4d13-9b7d-0b8279aed737 \
  -smbios type=0,vendor=HAL 9000 -smbios type=1,manufacturer=testcloud \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc,clock=vm,driftfix=slew -no-hpet \
  -global kvm-pit.lost_tick_policy=discard -no-shutdown \
  -boot order=dc,menu=on,strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
  -drive file=/mnt/nfs/volumes/e919ceff-8344-4de5-82da-db49a20c4c87/active.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=threads,bps_rd=68157440,bps_wr=68157440,iops_rd=325,iops_wr=325 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 \
  -drive file=/mnt/nfs/volumes/f2fb6c59-2960-4976-aaa1-6154f55f6a66/active.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none,aio=threads,bps_rd=68157440,bps_wr=68157440,iops_rd=325,iops_wr=325 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
  -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw \
  -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:07:6f:fb,bus=pci.0,addr=0x3 \
  -netdev tap,fd=25,id=hostnet1,vhost=on,vhostfd=26 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:39:21:d3,bus=pci.0,addr=0x4 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.agent,server,nowait \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
  -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.testcloud.agent,server,nowait \
  -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.testcloud.guest_agent.1 \
  -device usb-tablet,id=input0 -vnc 0.0.0.0:1,password \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -incoming tcp:0.0.0.0:49152 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \
  -sandbox on -device pvpanic