Upstream discussion about this bug: https://lkml.org/lkml/2015/2/11/247
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1413540

Title:
  soft lockup issues with nested KVM VMs running tempest

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]
  Users of nested KVM for testing OpenStack hit soft lockups such as the following:

  PID: 22262  TASK: ffff8804274bb000  CPU: 1  COMMAND: "qemu-system-x86"
   #0 [ffff88043fd03d18] machine_kexec at ffffffff8104ac02
   #1 [ffff88043fd03d68] crash_kexec at ffffffff810e7203
   #2 [ffff88043fd03e30] panic at ffffffff81719ff4
   #3 [ffff88043fd03ea8] watchdog_timer_fn at ffffffff8110d7c5
   #4 [ffff88043fd03ed8] __run_hrtimer at ffffffff8108e787
   #5 [ffff88043fd03f18] hrtimer_interrupt at ffffffff8108ef4f
   #6 [ffff88043fd03f80] local_apic_timer_interrupt at ffffffff81043537
   #7 [ffff88043fd03f98] smp_apic_timer_interrupt at ffffffff81733d4f
   #8 [ffff88043fd03fb0] apic_timer_interrupt at ffffffff817326dd
  --- <IRQ stack> ---
   #9 [ffff880426f0d958] apic_timer_interrupt at ffffffff817326dd
      [exception RIP: generic_exec_single+130]
      RIP: ffffffff810dbe62  RSP: ffff880426f0da00  RFLAGS: 00000202
      RAX: 0000000000000002  RBX: ffff880426f0d9d0  RCX: 0000000000000001
      RDX: ffffffff8180ad60  RSI: 0000000000000000  RDI: 0000000000000286
      RBP: ffff880426f0da30   R8: ffffffff8180ad48   R9: ffff88042713bc68
      R10: 00007fe7d1f2dbd0  R11: 0000000000000206  R12: ffff8804274bb000
      R13: 0000000000000000  R14: ffff880407670280  R15: 0000000000000000
      ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
  #10 [ffff880426f0da38] smp_call_function_single at ffffffff810dbf75
  #11 [ffff880426f0dab0] smp_call_function_many at ffffffff810dc3a6
  #12 [ffff880426f0db10] native_flush_tlb_others at ffffffff8105c8f7
  #13 [ffff880426f0db38] flush_tlb_mm_range at ffffffff8105c9cb
  #14 [ffff880426f0db68] pmdp_splitting_flush at ffffffff8105b80d
  #15 [ffff880426f0db88] __split_huge_page at ffffffff811ac90b
  #16 [ffff880426f0dc20] split_huge_page_to_list at ffffffff811acfb8
  #17 [ffff880426f0dc48] __split_huge_page_pmd at ffffffff811ad956
  #18 [ffff880426f0dcc8] unmap_page_range at ffffffff8117728d
  #19 [ffff880426f0dda0] unmap_single_vma at ffffffff81177341
  #20 [ffff880426f0ddd8] zap_page_range at ffffffff811784cd
  #21 [ffff880426f0de90] sys_madvise at ffffffff81174fbf
  #22 [ffff880426f0df80] system_call_fastpath at ffffffff8173196d
      RIP: 00007fe7ca2cc647  RSP: 00007fe7be9febf0  RFLAGS: 00000293
      RAX: 000000000000001c  RBX: ffffffff8173196d  RCX: ffffffffffffffff
      RDX: 0000000000000004  RSI: 00000000007fb000  RDI: 00007fe7be1ff000
      RBP: 0000000000000000   R8: 0000000000000000   R9: 00007fe7d1cd2738
      R10: 00007fe7d1f2dbd0  R11: 0000000000000206  R12: 00007fe7be9ff700
      R13: 00007fe7be9ff9c0  R14: 0000000000000000  R15: 0000000000000000
      ORIG_RAX: 000000000000001c  CS: 0033  SS: 002b

  [Test Case]
  - Deploy OpenStack on OpenStack
  - Run tempest on the L1 cloud
  - Check the kernel log of the L1 nova-compute nodes

  (This may not necessarily be related to nested KVM.)

  Potentially related: https://lkml.org/lkml/2014/11/14/656

  --
  Original Description:

  Installing qemu-kvm on a VM enables KSM. I have encountered this problem in trusty:

  $ lsb_release -a
  Distributor ID: Ubuntu
  Description:    Ubuntu 14.04.1 LTS
  Release:        14.04
  Codename:       trusty

  $ uname -a
  Linux juju-gema-machine-2 3.13.0-40-generic #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

  The way to see the behaviour:

  1) $ more /sys/kernel/mm/ksm/run
     0
  2) $ sudo apt-get install qemu-kvm
  3) $ more /sys/kernel/mm/ksm/run
     1

  To see the soft lockups, deploy a cloud in a virtualised environment such as
  ctsstack and run tempest on it at least twice. The compute nodes of the
  virtualised deployment will eventually stop responding with:

  [24096.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24124.072003] BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:24791]
  [24152.072002] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24180.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24208.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24236.072004] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]
  [24264.072003] BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-x86:24791]

  I am not sure whether the problem is that we are enabling KSM on a VM or that
  nested KSM is not behaving properly. Either way, I can easily reproduce it;
  please contact me if you need further details.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413540/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
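The report relies on two manual checks. The first is the KSM run control in /sys/kernel/mm/ksm/run. The second is the "BUG: soft lockup" lines in the kernel log of the L1 compute nodes. Below is a minimal shell sketch of both checks, with these assumptions:

- ksm_run_state and count_soft_lockups are hypothetical helper names.
- The meanings of the run values 0, 1 and 2 come from the kernel's KSM documentation, not from this bug.
- The script only reads state. It does not change any KSM setting.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for the two checks described in the bug
# report. They are not part of any Ubuntu, OpenStack or kernel tooling.

# Print the KSM run value and its meaning (per the kernel KSM docs).
# $1: optional path to the run control; defaults to the real sysfs file,
#     but can point at a copy so the function can be exercised anywhere.
ksm_run_state() {
    path=${1:-/sys/kernel/mm/ksm/run}
    if [ ! -r "$path" ]; then
        echo "cannot read $path (kernel without KSM support?)" >&2
        return 1
    fi
    value=$(cat "$path")
    case "$value" in
        0) meaning="ksmd stopped, already-merged pages kept" ;;
        1) meaning="ksmd running" ;;
        2) meaning="ksmd stopped, all merged pages unmerged" ;;
        *) meaning="unknown value" ;;
    esac
    echo "$value: $meaning"
}

# Count "BUG: soft lockup" messages per trailing "[task:pid]" tag.
# Reads the file named in $1, or standard input when no file is given,
# so it can be used as: dmesg | count_soft_lockups
count_soft_lockups() {
    cat "${1:--}" \
        | grep 'BUG: soft lockup' \
        | sed -n 's/.*\(\[[^]]*:[0-9]*\]\)[[:space:]]*$/\1/p' \
        | sort | uniq -c
}
```

Run against the seven log lines quoted in the report, count_soft_lockups would print a single [qemu-system-x86:24791] entry with a count of 7. The ksm_run_state function is the scripted form of steps 1) and 3) of the reproduction. Running it before and after installing qemu-kvm should show the value change from 0 to 1.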