I'm using both of them applied on top of 2.0 in production and have no
problems with them. I'm using NFS exclusively with cache=none.

So, I shall test vm-migration and drive-migration with 2.1.0-rc2, with no
extra patches applied or reverted, on a VM that is running fio, am I correct?


Yes, exactly. An iSCSI-based setup can take some minutes to deploy, given a
prepared image, and I have a one hundred percent hit rate for the
original issue with it.
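(For anyone reproducing: a fio job along these lines matches the direct libaio read pattern visible in the guest backtrace below. This is an illustrative job file only, not the exact one used here; paths, sizes and depths are placeholders.)

```ini
; illustrative fio job -- filename, runtime and iodepth are placeholders
[global]
ioengine=libaio     ; async IO via io_submit(), as seen in the fio stack trace
direct=1            ; O_DIRECT, bypassing the guest page cache
bs=4k
iodepth=32
time_based=1
runtime=600

[vdb-randrw]
filename=/mnt/test/fio.dat   ; a file on the XFS filesystem on vdb
size=1g
rw=randrw
```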

I've reproduced your IO hang on 2.0 with both 9b1786829aefb83f37a8f3135e3ea91c56001b56 and a096b3a6732f846ec57dc28b47ee9435aa0609bf applied.

Reverting 9b1786829aefb83f37a8f3135e3ea91c56001b56 does indeed fix the problem (but reintroduces the block-migration hang). It looks like a qemu bug rather than a guest problem, as the no-kvmclock parameter makes no difference. IO just stops, and all qemu IO threads die off. It's almost as if qemu forgets to migrate them :-)

I'm attaching backtraces from the guest kernel and from qemu, plus the qemu command line.

Going to compile 2.1-rc.
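(For completeness: I trigger the migration through libvirt, but over the bare QMP monitor socket it boils down to roughly the sketch below. The socket path and destination URI are placeholders, and error handling is omitted.)

```python
import json
import socket

def qmp_command(sock, cmd, **args):
    """Send one QMP command and return the decoded reply dict."""
    msg = {"execute": cmd}
    if args:
        msg["arguments"] = args
    sock.sendall(json.dumps(msg).encode() + b"\n")
    # Read one newline-terminated JSON reply back.
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    return json.loads(buf)

def qmp_migrate(monitor_path, dest_uri):
    """Connect to a QMP monitor socket and kick off a live migration."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(monitor_path)
    # QEMU sends a greeting banner first; consume it.
    greeting = b""
    while not greeting.endswith(b"\n"):
        greeting += sock.recv(4096)
    qmp_command(sock, "qmp_capabilities")   # leave capabilities negotiation mode
    reply = qmp_command(sock, "migrate", uri=dest_uri)
    sock.close()
    return reply
```

Usage would be e.g. `qmp_migrate("/var/lib/libvirt/qemu/<domain>.monitor", "tcp:desthost:49152")`, matching the `-incoming tcp:0.0.0.0:49152` on the destination side.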

--
mg
[  254.634525] SysRq : Show Blocked State
[  254.635041]   task                        PC stack   pid father
[  254.635304] kworker/0:2     D ffff88013fc145c0     0    83      2 0x00000000
[  254.635304] Workqueue: xfs-log/vdb xfs_log_worker [xfs]
[  254.635304]  ffff880136bdfa58 0000000000000046 ffff880136bdffd8 00000000000145c0
[  254.635304]  ffff880136bdffd8 00000000000145c0 ffff880136ad8000 ffff88013fc14e88
[  254.635304]  ffff880037bd4380 ffff880037bc5068 ffff880037bd43b0 ffff880037bd4380
[  254.635304] Call Trace:
[  254.635304]  [<ffffffff815e797d>] io_schedule+0x9d/0x140
[  254.635304]  [<ffffffff812921d5>] get_request+0x1b5/0x790
[  254.635304]  [<ffffffff81086ab0>] ? wake_up_bit+0x30/0x30
[  254.635304]  [<ffffffff81294236>] blk_queue_bio+0x96/0x390
[  254.635304]  [<ffffffff812904e2>] generic_make_request+0xe2/0x130
[  254.635304]  [<ffffffff812905a1>] submit_bio+0x71/0x150
[  254.635304]  [<ffffffff811e72c8>] ? bio_alloc_bioset+0x1e8/0x2e0
[  254.635304]  [<ffffffffa03310bb>] _xfs_buf_ioapply+0x2bb/0x3d0 [xfs]
[  254.635304]  [<ffffffffa038d3ef>] ? xlog_bdstrat+0x1f/0x50 [xfs]
[  254.635304]  [<ffffffffa03328e6>] xfs_buf_iorequest+0x46/0xa0 [xfs]
[  254.635304]  [<ffffffffa038d3ef>] xlog_bdstrat+0x1f/0x50 [xfs]
[  254.635304]  [<ffffffffa038f135>] xlog_sync+0x265/0x450 [xfs]
[  254.635304]  [<ffffffffa038f3b2>] xlog_state_release_iclog+0x92/0xb0 [xfs]
[  254.635304]  [<ffffffffa039016a>] _xfs_log_force+0x15a/0x290 [xfs]
[  254.635304]  [<ffffffff810115d6>] ? __switch_to+0x136/0x490
[  254.635304]  [<ffffffffa03902c6>] xfs_log_force+0x26/0x80 [xfs]
[  254.635304]  [<ffffffffa0390344>] xfs_log_worker+0x24/0x50 [xfs]
[  254.635304]  [<ffffffff8107e02b>] process_one_work+0x17b/0x460
[  254.635304]  [<ffffffff8107edfb>] worker_thread+0x11b/0x400
[  254.635304]  [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
[  254.635304]  [<ffffffff81085aef>] kthread+0xcf/0xe0
[  254.635304]  [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[  254.635304]  [<ffffffff815f24ec>] ret_from_fork+0x7c/0xb0
[  254.635304]  [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
[  254.635304] fio             D ffff88013fc145c0     0   772    770 0x00000000
[  254.635304]  ffff8800bba4b8c8 0000000000000082 ffff8800bba4bfd8 00000000000145c0
[  254.635304]  ffff8800bba4bfd8 00000000000145c0 ffff8801376ff1c0 ffff88013fc14e88
[  254.635304]  ffff880037bd4380 ffff880037baba90 ffff880037bd43b0 ffff880037bd4380
[  254.635304] Call Trace:
[  254.635304]  [<ffffffff815e797d>] io_schedule+0x9d/0x140
[  254.635304]  [<ffffffff812921d5>] get_request+0x1b5/0x790
[  254.635304]  [<ffffffff81086ab0>] ? wake_up_bit+0x30/0x30
[  254.635304]  [<ffffffff81294236>] blk_queue_bio+0x96/0x390
[  254.635304]  [<ffffffff812904e2>] generic_make_request+0xe2/0x130
[  254.635304]  [<ffffffff812905a1>] submit_bio+0x71/0x150
[  254.635304]  [<ffffffff811ed26c>] do_blockdev_direct_IO+0x14bc/0x2620
[  254.635304]  [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[  254.635304]  [<ffffffff811ee425>] __blockdev_direct_IO+0x55/0x60
[  254.635304]  [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[  254.635304]  [<ffffffffa032aaec>] xfs_vm_direct_IO+0x15c/0x180 [xfs]
[  254.635304]  [<ffffffffa032bc30>] ? xfs_get_blocks+0x20/0x20 [xfs]
[  254.635304]  [<ffffffff81143563>] generic_file_aio_read+0x6d3/0x750
[  254.635304]  [<ffffffff810b69c8>] ? ktime_get_ts+0x48/0xe0
[  254.635304]  [<ffffffff811030cf>] ? delayacct_end+0x8f/0xb0
[  254.635304]  [<ffffffff815e6a32>] ? down_read+0x12/0x30
[  254.635304]  [<ffffffffa0337224>] xfs_file_aio_read+0x154/0x2e0 [xfs]
[  254.635304]  [<ffffffffa03370d0>] ? xfs_file_splice_read+0x140/0x140 [xfs]
[  254.635304]  [<ffffffff811fd6a8>] do_io_submit+0x3b8/0x840
[  254.635304]  [<ffffffff811fdb40>] SyS_io_submit+0x10/0x20
[  254.635304]  [<ffffffff815f2599>] system_call_fastpath+0x16/0x1b

Thread 3 (Thread 0x7f4250f50700 (LWP 11955)):
#0  0x00007f4253d1a897 in ioctl () from /lib64/libc.so.6
#1  0x00007f4257f8adf9 in kvm_vcpu_ioctl (cpu=cpu@entry=0x7f4258e2aa90, type=type@entry=44672) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1796
#2  0x00007f4257f8af35 in kvm_cpu_exec (cpu=cpu@entry=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/kvm-all.c:1681
#3  0x00007f4257f3071c in qemu_kvm_cpu_thread_fn (arg=0x7f4258e2aa90) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/cpus.c:873
#4  0x00007f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4253d22dad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f424b5ff700 (LWP 11957)):
#0  0x00007f4253fecd0c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f425802c019 in qemu_cond_wait (cond=cond@entry=0x7f4258f0cfc0, mutex=mutex@entry=0x7f4258f0cff0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/util/qemu-thread-posix.c:135
#2  0x00007f4257f2070b in vnc_worker_thread_loop (queue=queue@entry=0x7f4258f0cfc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:222
#3  0x00007f4257f20ae0 in vnc_worker_thread (arg=0x7f4258f0cfc0) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/ui/vnc-jobs.c:323
#4  0x00007f4253fe8f3a in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4253d22dad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f4257cc6900 (LWP 11952)):
#0  0x00007f4253d19286 in ppoll () from /lib64/libc.so.6
#1  0x00007f4257eecd79 in ppoll (__ss=0x0, __timeout=0x7ffffc03af40, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=883000000) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/qemu-timer.c:316
#3  0x00007f4257eb02d4 in os_host_main_loop_wait (timeout=883000000) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:229
#4  main_loop_wait (nonblocking=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/main-loop.c:484
#5  0x00007f4257d7c05e in main_loop () at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:2051
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/vl.c:4507

/usr/bin/qemu-system-x86_64 \
  -machine accel=kvm \
  -name 21eae881-5e6f-4d13-9b7d-0b8279aed737 \
  -S \
  -machine pc-i440fx-2.0,accel=kvm,usb=off \
  -cpu SandyBridge,+kvmclock \
  -m 4096 \
  -realtime mlock=on \
  -smp 4,sockets=2,cores=10,threads=1 \
  -uuid 21eae881-5e6f-4d13-9b7d-0b8279aed737 \
  -smbios "type=0,vendor=HAL 9000" \
  -smbios type=1,manufacturer=testcloud \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc,clock=vm,driftfix=slew \
  -no-hpet \
  -global kvm-pit.lost_tick_policy=discard \
  -no-shutdown \
  -boot order=dc,menu=on,strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
  -drive file=/mnt/nfs/volumes/e919ceff-8344-4de5-82da-db49a20c4c87/active.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=threads,bps_rd=68157440,bps_wr=68157440,iops_rd=325,iops_wr=325 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 \
  -drive file=/mnt/nfs/volumes/f2fb6c59-2960-4976-aaa1-6154f55f6a66/active.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none,aio=threads,bps_rd=68157440,bps_wr=68157440,iops_rd=325,iops_wr=325 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 \
  -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw \
  -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:07:6f:fb,bus=pci.0,addr=0x3 \
  -netdev tap,fd=25,id=hostnet1,vhost=on,vhostfd=26 \
  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:39:21:d3,bus=pci.0,addr=0x4 \
  -chardev pty,id=charserial0 \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.agent,server,nowait \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
  -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/21eae881-5e6f-4d13-9b7d-0b8279aed737.testcloud.agent,server,nowait \
  -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.testcloud.guest_agent.1 \
  -device usb-tablet,id=input0 \
  -vnc 0.0.0.0:1,password \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -incoming tcp:0.0.0.0:49152 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \
  -sandbox on \
  -device pvpanic
