[Expired for linux (Ubuntu Yakkety) because there has been no activity for 60 days.]
** Changed in: linux (Ubuntu Yakkety)
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1596941

Title:
  KVM deadlock on KVM guest migration with latest QEMU (mitaka) from
  Xenial (or Mitaka Ubuntu Cloud Archive)

Status in linux package in Ubuntu:
  Expired
Status in linux source package in Trusty:
  Expired
Status in linux source package in Vivid:
  Expired
Status in linux source package in Wily:
  Expired
Status in linux source package in Xenial:
  Expired
Status in linux source package in Yakkety:
  Expired

Bug description:
  It was brought to my attention that qemu-kvm live migration (with full
  storage copy) on Trusty + Mitaka Ubuntu Cloud Archive was broken.
  While investigating, I ran into the following situation:

  crash> sys
        KERNEL: /usr/lib/debug/boot/vmlinux-3.13.0-86-generic
      DUMPFILE: ./201606241546/dump.201606241546  [PARTIAL DUMP]
          CPUS: 4
          DATE: Fri Jun 24 15:46:39 2016
        UPTIME: 00:06:00
  LOAD AVERAGE: 1.00, 0.60, 0.26
         TASKS: 146
      NODENAME: vmqemulivefail1
       RELEASE: 3.13.0-86-generic
       VERSION: #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016
       MACHINE: x86_64  (2494 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

  The full backtrace doesn't contain anything useful, since I've
  configured kernel.softlockup_panic.
  From the scheduled-out tasks (and from kern.log) I was able to see
  that, on more than one occasion, the qemu process was possibly
  deadlocked while handling asynchronous page faults:

  ## kernel 3.13

  # dump 1

  PID: 1604   TASK: ffff8800374be000  CPU: 3   COMMAND: "qemu-system-x86"
   #0 [ffff8800ba115e28] __schedule at ffffffff8172e379
   #1 [ffff8800ba115e90] schedule at ffffffff8172e859
   #2 [ffff8800ba115ea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff8800ba115f38] do_async_page_fault at ffffffff81736090
   #4 [ffff8800ba115f50] async_page_fault at ffffffff81732cd8
      RIP: 00007fb4eff0a4b3  RSP: 00007fb4713facb0  RFLAGS: 00010206
      RAX: 00007fb4cb9cf000  RBX: 00007fb4f166d8f0  RCX: 0000000000000010
      RDX: 0000000000001fff  RSI: 00007fb4cb9deff8  RDI: 4000000000000000
      RBP: 0000000000000000   R8: 0000000000000000   R9: 00000002601b0000
      R10: 00fffffffffffe00  R11: 0000000000001fff  R12: 0000000000000008
      R13: 00007fb4713fad84  R14: 00007fb4f1665290  R15: 00007fb4713fad88
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  # dump 2

  PID: 1735   TASK: ffff8800b9bcb000  CPU: 2   COMMAND: "qemu-system-x86"
   #0 [ffff8802333c9e28] __schedule at ffffffff8172e379
   #1 [ffff8802333c9e90] schedule at ffffffff8172e859
   #2 [ffff8802333c9ea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff8802333c9f38] do_async_page_fault at ffffffff81736090
   #4 [ffff8802333c9f50] async_page_fault at ffffffff81732cd8
      RIP: 00007f631399d3b0  RSP: 00007f62912c7990  RFLAGS: 00010206
      RAX: 0000000000000000  RBX: 00007f6315f9e370  RCX: 00007f62ca714000
      RDX: 0000000032914020  RSI: 0000000000001000  RDI: 00007f62ca714000
      RBP: 00007f6315c66e40   R8: 00007f62912c7a40   R9: 00007f6315f9e3e0
      R10: 0000000000000000  R11: 0000000032914020  R12: 0000000032914020
      R13: 0000000000032914  R14: 00000000ffffffff  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  # dump 3

  PID: 1617   TASK: ffff880232834800  CPU: 3   COMMAND: "qemu-system-x86"
   #0 [ffff880232a6de28] __schedule at ffffffff8172e379
   #1 [ffff880232a6de90] schedule at ffffffff8172e859
   #2 [ffff880232a6dea0] kvm_async_pf_task_wait at ffffffff8105060f
   #3 [ffff880232a6df38] do_async_page_fault at ffffffff81736090
   #4 [ffff880232a6df50] async_page_fault at ffffffff81732cd8
      RIP: 00007f8c39e8b3b0  RSP: 00007f8bb80c9990  RFLAGS: 00010206
      RAX: 0000000000000000  RBX: 00007f8c3aeba370  RCX: 00007f8bdea18000
      RDX: 0000000022c18020  RSI: 0000000000001000  RDI: 00007f8bdea18000
      RBP: 00007f8c3ab82e40   R8: 00007f8bb80c9a40   R9: 00007f8c3aeba498
      R10: 0000000000000000  R11: 0000000022c18020  R12: 0000000022c18020
      R13: 0000000000022c18  R14: 00000000ffffffff  R15: 0000000000000000
      ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

  ## kernel 4.4

  # kern.log

  [ 360.282132] INFO: task qemu-system-x86:1592 blocked for more than 120 seconds.
  [ 360.282984] Not tainted 4.4.0-27-generic #46~14.04.1-Ubuntu
  [ 360.283581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 360.284439] qemu-system-x86 D ffff8800bb833e90 0 1592 1 0x00000000
  [ 360.284443] ffff8800bb833e90 ffff88023151c4c0 ffff8802345eb700 ffff8800bb834000
  [ 360.284444] 0000000000000010 ffffffff81efe6d0 000055ac8fa05520 00007f88fc7f7d88
  [ 360.284445] ffff8800bb833ea8 ffffffff817ed5f5 ffff8800bb833ef0 ffff8800bb833f38
  [ 360.284447] Call Trace:
  [ 360.284472] [<ffffffff817ed5f5>] schedule+0x35/0x80
  [ 360.284481] [<ffffffff81060a93>] kvm_async_pf_task_wait+0x1a3/0x1f0
  [ 360.284487] [<ffffffff810bdc60>] ? prepare_to_wait_event+0xf0/0xf0
  [ 360.284494] [<ffffffff811fe600>] ? do_sendfile+0x360/0x380
  [ 360.284495] [<ffffffff81060c55>] do_async_page_fault+0x75/0x80
  [ 360.284498] [<ffffffff817f2fe8>] async_page_fault+0x28/0x30
  [ 360.284500] Sending NMI to all CPUs:

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1596941/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp