Re: [Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
On 11/09/2014 19:03, Chris Webb wrote:
> Paolo Bonzini pbonz...@redhat.com wrote:
> > This is a hypercall that should have kicked VCPU 3 (see rcx).
> > Can you please apply this patch and gather a trace of the host
> > (using trace-cmd -e kvm qemu-kvm arguments)?
>
> Sure, no problem. I've built the trace-cmd tool against udis86 (I hope)
> and have put the resulting trace.dat at http://cdw.me.uk/tmp/trace.dat
>
> This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was
> having trouble persuading the -smp 4 qemu to crash as reliably under
> tracing. (Something timing related?) Otherwise the qemu-system-x86
> command line is exactly as before.

Do you by chance have CONFIG_DEBUG_RODATA set? In that case, the fix is
simply not to set it.

Paolo

> The guest kernel crash message which corresponds to this trace was:
>
> divide error: [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 618 Comm: mkdir Not tainted 3.16.2-guest #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> task: 88007c997080 ti: 88007c614000 task.ti: 88007c614000
> RIP: 0010:[81037fe2] [81037fe2] kvm_unlock_kick+0x72/0x80
> RSP: 0018:88007c617d40 EFLAGS: 00010046
> RAX: 0005 RBX: RCX: 0001
> RDX: 0001 RSI: 88007fd11c40 RDI:
> RBP: 88007fd11c40 R08: 81b98940 R09: 0001
> R10: R11: 0007 R12: 00f6
> R13: 0001 R14: 0001 R15: 00011c40
> FS: 7f43eb1ed700() GS:88007fc0() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 7f43eace0a30 CR3: 01a12000 CR4: 000406f0
> Stack:
>  88007c994380 88007c9949aa 0046 81689715 810f3174 0001 ea0001f16320 ea0001f17860 88007c99e1e8 88007c997080 0001
> Call Trace:
>  [81689715] ? _raw_spin_unlock+0x45/0x70
>  [810f3174] ? try_to_wake_up+0x2a4/0x330
>  [81101e2c] ? __wake_up_common+0x4c/0x80
>  [81102418] ? __wake_up_sync_key+0x38/0x60
>  [810d873a] ? do_notify_parent+0x19a/0x280
>  [810f4d56] ? sched_move_task+0xb6/0x190
>  [810cb4fc] ? do_exit+0xa1c/0xab0
>  [810cc344] ? do_group_exit+0x34/0xb0
>  [810cc3cb] ? SyS_exit_group+0xb/0x10
>  [8168a16d] ? system_call_fastpath+0x1a/0x1f
> Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00
> RIP [81037fe2] kvm_unlock_kick+0x72/0x80
> RSP 88007c617d40
> ---[ end trace bf5a4445f9decdbb ]---
> Fixing recursive fault but reboot is needed!
> BUG: scheduling while atomic: mkdir/618/0x0006
> Modules linked in:
> CPU: 0 PID: 618 Comm: mkdir Tainted: G D 3.16.2-guest #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>  c022d302 81684029 810ee956 81686266 00011c40 88007c617fd8 00011c40 88007c997080 0006 0046
> Call Trace:
>  [81684029] ? dump_stack+0x49/0x6a
>  [810ee956] ? __schedule_bug+0x46/0x60
>  [81686266] ? __schedule+0x5a6/0x7c0
>  [816828cd] ? printk+0x59/0x75
>  [810cb33b] ? do_exit+0x85b/0xab0
>  [816828cd] ? printk+0x59/0x75
>  [8100614a] ? oops_end+0x7a/0x100
>  [810033e5] ? do_error_trap+0x85/0x110
>  [81037fe2] ? kvm_unlock_kick+0x72/0x80
>  [8114a358] ? __alloc_pages_nodemask+0x108/0xa60
>  [8168b57e] ? divide_error+0x1e/0x30
>  [81037fe2] ? kvm_unlock_kick+0x72/0x80
>  [81689715] ? _raw_spin_unlock+0x45/0x70
>  [810f3174] ? try_to_wake_up+0x2a4/0x330
>  [81101e2c] ? __wake_up_common+0x4c/0x80
>  [81102418] ? __wake_up_sync_key+0x38/0x60
>  [810d873a] ? do_notify_parent+0x19a/0x280
>  [810f4d56] ? sched_move_task+0xb6/0x190
>  [810cb4fc] ? do_exit+0xa1c/0xab0
>  [810cc344] ? do_group_exit+0x34/0xb0
>  [810cc3cb] ? SyS_exit_group+0xb/0x10
>  [8168a16d] ? system_call_fastpath+0x1a/0x1f
>
> Best wishes,
>
> Chris.
Re: [Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
Paolo Bonzini pbonz...@redhat.com wrote:
> On 11/09/2014 19:03, Chris Webb wrote:
> > Paolo Bonzini pbonz...@redhat.com wrote:
> > > This is a hypercall that should have kicked VCPU 3 (see rcx).
> > > Can you please apply this patch and gather a trace of the host
> > > (using trace-cmd -e kvm qemu-kvm arguments)?
> >
> > Sure, no problem. I've built the trace-cmd tool against udis86 (I hope)
> > and have put the resulting trace.dat at http://cdw.me.uk/tmp/trace.dat
> >
> > This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was
> > having trouble persuading the -smp 4 qemu to crash as reliably under
> > tracing. (Something timing related?) Otherwise the qemu-system-x86
> > command line is exactly as before.
>
> Do you by chance have CONFIG_DEBUG_RODATA set? In that case, the fix is
> simply not to set it.

Absolutely right: my host and guest kernels do have CONFIG_DEBUG_RODATA set!
Your patch to use alternatives for VMCALL vs VMMCALL definitely fixed the
divide-by-zero crashes I saw.

Given that I can easily use either (or both) of these solutions, is it more
efficient to turn off CONFIG_DEBUG_RODATA in the guest kernel so kvm can fix
up the instructions in-place, or is using alternatives for VMCALL/VMMCALL as
implemented by your patch just as good?

Best wishes,

Chris.
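[Editorial illustration, not part of the original message.] The question of whether CONFIG_DEBUG_RODATA is set can be answered from the text of a kernel config file, where an enabled boolean option appears as a line of the form CONFIG_NAME=y and a disabled one as "# CONFIG_NAME is not set". The following is a minimal sketch under that assumption; config_enabled is a hypothetical helper name, not a kernel or QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `option` (for example "CONFIG_DEBUG_RODATA") is set to
 * "y" somewhere in `config_text`, the contents of a kernel .config file.
 * Hypothetical helper for illustration only.
 */
bool config_enabled(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);

    size_t optlen = strlen(option);
    const char *line = config_text;

    while (line != NULL && *line != '\0') {
        /* An enabled option starts the line and is followed by "=y". */
        if (strncmp(line, option, optlen) == 0 &&
            strncmp(line + optlen, "=y", 2) == 0) {
            char end = line[optlen + 2];
            if (end == '\n' || end == '\r' || end == '\0')
                return true;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return false;
}
```

Calling config_enabled(text, "CONFIG_DEBUG_RODATA") on the contents of the guest config would report whether the option discussed above is on. Lines such as "# CONFIG_DEBUG_RODATA is not set" begin with '#', so they do not match.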
Re: [Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
On 22/09/2014 21:08, Chris Webb wrote:
> > Do you by chance have CONFIG_DEBUG_RODATA set? In that case, the fix is
> > simply not to set it.
>
> Absolutely right: my host and guest kernels do have CONFIG_DEBUG_RODATA set!
> Your patch to use alternatives for VMCALL vs VMMCALL definitely fixed the
> divide-by-zero crashes I saw.
>
> Given that I can easily use either (or both) of these solutions, is it more
> efficient to turn off CONFIG_DEBUG_RODATA in the guest kernel so kvm can fix
> up the instructions in-place, or is using alternatives for VMCALL/VMMCALL as
> implemented by your patch just as good?

I posted a patch to use alternatives if CONFIG_DEBUG_RODATA is enabled, but
the bug is in KVM, which explicitly documents that you can use either VMCALL
or VMMCALL.

I'll also look into fixing KVM, but the patch is still useful because a) KVM
would not patch a read-only text segment, so there would be a small
performance benefit; b) you cannot control already deployed hypervisors.

However, since there is a workaround, I won't push it into 3.17 so late in
the cycle. Also, there's a chance that it is NACKed since it touches non-KVM
files.

Paolo
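[Editorial illustration, not part of the original message.] The VMCALL/VMMCALL distinction discussed above is visible in the "Code:" bytes of the oops. VMCALL (Intel VT-x) is encoded as 0f 01 c1 and VMMCALL (AMD SVM) as 0f 01 d9, and the guest code in this thread contains 0f 01 c1 even though the host is an AMD Opteron. The sketch below searches a byte sequence for either encoding; find_hypercall and the enum names are hypothetical, and a plain byte search like this is not a disassembler, so it can match inside an unrelated instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Instruction encodings as given in the Intel and AMD architecture manuals. */
static const unsigned char VMCALL_BYTES[3]  = { 0x0f, 0x01, 0xc1 }; /* Intel VT-x */
static const unsigned char VMMCALL_BYTES[3] = { 0x0f, 0x01, 0xd9 }; /* AMD SVM */

enum hypercall_insn { HC_NONE = 0, HC_VMCALL, HC_VMMCALL };

/*
 * Return the kind of the first hypercall instruction found in `code`.
 * If `offset` is not NULL it receives the index of the match.
 * Hypothetical helper for illustration only.
 */
enum hypercall_insn find_hypercall(const unsigned char *code, size_t len,
                                   size_t *offset)
{
    assert(code != NULL);

    for (size_t i = 0; i + 3 <= len; i++) {
        enum hypercall_insn kind = HC_NONE;

        if (memcmp(code + i, VMCALL_BYTES, 3) == 0)
            kind = HC_VMCALL;
        else if (memcmp(code + i, VMMCALL_BYTES, 3) == 0)
            kind = HC_VMMCALL;

        if (kind != HC_NONE) {
            if (offset != NULL)
                *offset = i;
            return kind;
        }
    }
    return HC_NONE;
}

/* Part of the "Code:" line from the oops in this thread, starting at "31 db". */
const unsigned char oops_code[] = {
    0x31, 0xdb, 0x0f, 0xb7, 0x0c, 0x08, 0xb8, 0x05, 0x00, 0x00, 0x00,
    0x0f, 0x01, 0xc1, 0x0f, 0x1f, 0x00, 0x5b, 0x5d, 0x41, 0x5c, 0xc3
};
```

Running find_hypercall over oops_code reports HC_VMCALL, with the match immediately after "b8 05 00 00 00" (mov $5, %eax), which is consistent with RAX: 0005 in the register dump.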
Re: [Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
On 08/09/2014 15:28, Chris Webb wrote:
> divide error: [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 743 Comm: syslogd Not tainted 3.16.2-guest #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> task: 88007c972580 ti: 88007cb7c000 task.ti: 88007cb7c000
> RIP: 0010:[81037fe2] [81037fe2] kvm_unlock_kick+0x72/0x80
> RSP: :88007fc03ec8 EFLAGS: 00010046
> RAX: 0005 RBX: RCX: 0003
> RDX: 0003 RSI: 81a466a0 RDI:
> RBP: 81a466a0 R08: 81b98940 R09: 0246
> R10: 0400 R11: R12: 00ea
> R13: 0009 R14: 0002 R15: 88007fc0d300
> FS: 7f2a6473e700() GS:88007fc0() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 004a8240 CR3: 7ac75000 CR4: 000406f0
> Stack:
>  81a46400 0246 0001 8168979d 0282 81110d97 0007 88007cb7ffd8 88007c972580 4b0782e8 0002 81a0b0c8
> Call Trace:
>  IRQ
>  [8168979d] ? _raw_spin_unlock_irqrestore+0x5d/0x80
>  [81110d97] ? rcu_process_callbacks+0x337/0x4f0
>  [810cde2d] ? __do_softirq+0xfd/0x210
>  [810ce06e] ? irq_exit+0x7e/0xa0
>  [8103063b] ? smp_apic_timer_interrupt+0x3b/0x50
>  [8168b04d] ? apic_timer_interrupt+0x6d/0x80
>  EOI
>  [8114180b] ? filemap_map_pages+0x17b/0x240
>  [811418c0] ? filemap_map_pages+0x230/0x240
>  [811679e2] ? do_read_fault.isra.70+0x2a2/0x320
>  [811696cc] ? handle_mm_fault+0x37c/0xd00
>  [8103bb45] ? __do_page_fault+0x185/0x4c0
>  [8168b958] ? async_page_fault+0x28/0x30
>  [813b9610] ? __put_user_4+0x20/0x30
>  [8168b958] ? async_page_fault+0x28/0x30
> Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00

Hi Chris, sorry for not following up on your previous patch.

This is a hypercall that should have kicked VCPU 3 (see rcx).

Can you please apply this patch and gather a trace of the host (using
trace-cmd -e kvm qemu-kvm arguments)?
Thanks,

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fb919c574e23..25ed29f68419 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -709,6 +709,8 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
         int result = 0;
         struct kvm_vcpu *vcpu = apic->vcpu;

+        trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
+                                  trig_mode, vector, false);
         switch (delivery_mode) {
         case APIC_DM_LOWEST:
                 vcpu->arch.apic_arb_prio++;
@@ -730,8 +732,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
                         kvm_make_request(KVM_REQ_EVENT, vcpu);
                         kvm_vcpu_kick(vcpu);
                 }
-                trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
-                                          trig_mode, vector, false);
                 break;

         case APIC_DM_REMRD:

Paolo
Re: [Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
Paolo Bonzini pbonz...@redhat.com wrote:
> This is a hypercall that should have kicked VCPU 3 (see rcx).
> Can you please apply this patch and gather a trace of the host
> (using trace-cmd -e kvm qemu-kvm arguments)?

Sure, no problem. I've built the trace-cmd tool against udis86 (I hope) and
have put the resulting trace.dat at http://cdw.me.uk/tmp/trace.dat

This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was having
trouble persuading the -smp 4 qemu to crash as reliably under tracing.
(Something timing related?) Otherwise the qemu-system-x86 command line is
exactly as before.

The guest kernel crash message which corresponds to this trace was:

divide error: [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 618 Comm: mkdir Not tainted 3.16.2-guest #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
task: 88007c997080 ti: 88007c614000 task.ti: 88007c614000
RIP: 0010:[81037fe2] [81037fe2] kvm_unlock_kick+0x72/0x80
RSP: 0018:88007c617d40 EFLAGS: 00010046
RAX: 0005 RBX: RCX: 0001
RDX: 0001 RSI: 88007fd11c40 RDI:
RBP: 88007fd11c40 R08: 81b98940 R09: 0001
R10: R11: 0007 R12: 00f6
R13: 0001 R14: 0001 R15: 00011c40
FS: 7f43eb1ed700() GS:88007fc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 7f43eace0a30 CR3: 01a12000 CR4: 000406f0
Stack:
 88007c994380 88007c9949aa 0046 81689715 810f3174 0001 ea0001f16320 ea0001f17860 88007c99e1e8 88007c997080 0001
Call Trace:
 [81689715] ? _raw_spin_unlock+0x45/0x70
 [810f3174] ? try_to_wake_up+0x2a4/0x330
 [81101e2c] ? __wake_up_common+0x4c/0x80
 [81102418] ? __wake_up_sync_key+0x38/0x60
 [810d873a] ? do_notify_parent+0x19a/0x280
 [810f4d56] ? sched_move_task+0xb6/0x190
 [810cb4fc] ? do_exit+0xa1c/0xab0
 [810cc344] ? do_group_exit+0x34/0xb0
 [810cc3cb] ? SyS_exit_group+0xb/0x10
 [8168a16d] ? system_call_fastpath+0x1a/0x1f
Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00
RIP [81037fe2] kvm_unlock_kick+0x72/0x80
RSP 88007c617d40
---[ end trace bf5a4445f9decdbb ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: mkdir/618/0x0006
Modules linked in:
CPU: 0 PID: 618 Comm: mkdir Tainted: G D 3.16.2-guest #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
 c022d302 81684029 810ee956 81686266 00011c40 88007c617fd8 00011c40 88007c997080 0006 0046
Call Trace:
 [81684029] ? dump_stack+0x49/0x6a
 [810ee956] ? __schedule_bug+0x46/0x60
 [81686266] ? __schedule+0x5a6/0x7c0
 [816828cd] ? printk+0x59/0x75
 [810cb33b] ? do_exit+0x85b/0xab0
 [816828cd] ? printk+0x59/0x75
 [8100614a] ? oops_end+0x7a/0x100
 [810033e5] ? do_error_trap+0x85/0x110
 [81037fe2] ? kvm_unlock_kick+0x72/0x80
 [8114a358] ? __alloc_pages_nodemask+0x108/0xa60
 [8168b57e] ? divide_error+0x1e/0x30
 [81037fe2] ? kvm_unlock_kick+0x72/0x80
 [81689715] ? _raw_spin_unlock+0x45/0x70
 [810f3174] ? try_to_wake_up+0x2a4/0x330
 [81101e2c] ? __wake_up_common+0x4c/0x80
 [81102418] ? __wake_up_sync_key+0x38/0x60
 [810d873a] ? do_notify_parent+0x19a/0x280
 [810f4d56] ? sched_move_task+0xb6/0x190
 [810cb4fc] ? do_exit+0xa1c/0xab0
 [810cc344] ? do_group_exit+0x34/0xb0
 [810cc3cb] ? SyS_exit_group+0xb/0x10
 [8168a16d] ? system_call_fastpath+0x1a/0x1f

Best wishes,

Chris.
[Qemu-devel] [BUG] Guest kernel divide error in kvm_unlock_kick
I've reported this bug before, which reliably crashes a guest kernel shortly
after boot, but have just reconfirmed that it is still present with Linux
3.16.2 guest and host kernels and Qemu 2.1.

Running a 3.16.2 x86-64 SMP guest kernel on qemu-2.1, with kvm enabled and
-cpu host on a 3.16.2 AMD Opteron host, I'm seeing a reliable kernel panic
from the guest shortly after boot. I think it is happening in
kvm_unlock_kick() in the paravirt_ops code:

divide error: [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 743 Comm: syslogd Not tainted 3.16.2-guest #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
task: 88007c972580 ti: 88007cb7c000 task.ti: 88007cb7c000
RIP: 0010:[81037fe2] [81037fe2] kvm_unlock_kick+0x72/0x80
RSP: :88007fc03ec8 EFLAGS: 00010046
RAX: 0005 RBX: RCX: 0003
RDX: 0003 RSI: 81a466a0 RDI:
RBP: 81a466a0 R08: 81b98940 R09: 0246
R10: 0400 R11: R12: 00ea
R13: 0009 R14: 0002 R15: 88007fc0d300
FS: 7f2a6473e700() GS:88007fc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 004a8240 CR3: 7ac75000 CR4: 000406f0
Stack:
 81a46400 0246 0001 8168979d 0282 81110d97 0007 88007cb7ffd8 88007c972580 4b0782e8 0002 81a0b0c8
Call Trace:
 IRQ
 [8168979d] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [81110d97] ? rcu_process_callbacks+0x337/0x4f0
 [810cde2d] ? __do_softirq+0xfd/0x210
 [810ce06e] ? irq_exit+0x7e/0xa0
 [8103063b] ? smp_apic_timer_interrupt+0x3b/0x50
 [8168b04d] ? apic_timer_interrupt+0x6d/0x80
 EOI
 [8114180b] ? filemap_map_pages+0x17b/0x240
 [811418c0] ? filemap_map_pages+0x230/0x240
 [811679e2] ? do_read_fault.isra.70+0x2a2/0x320
 [811696cc] ? handle_mm_fault+0x37c/0xd00
 [8103bb45] ? __do_page_fault+0x185/0x4c0
 [8168b958] ? async_page_fault+0x28/0x30
 [813b9610] ? __put_user_4+0x20/0x30
 [8168b958] ? async_page_fault+0x28/0x30
Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00
RIP [81037fe2] kvm_unlock_kick+0x72/0x80
RSP 88007fc03ec8
---[ end trace be08885ac2c94c6a ]---
Kernel panic - not syncing: Fatal exception in interrupt

My host kernel config is http://cdw.me.uk/tmp/host-config.txt and the guest
config is http://cdw.me.uk/tmp/guest-config.txt with qemu command line:

  qemu-system-x86 -enable-kvm -cpu host -machine q35 -m 2048 -name $1 \
    -smp sockets=1,cores=4 -pidfile /run/$1.pid -runas nobody \
    -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \
    -append "console=ttyS0 root=/dev/vda" \
    -drive file=/dev/guest/$1,cache=none,format=raw,if=virtio \
    -device virtio-rng-pci \
    -device virtio-net-pci,netdev=nic,mac=$(< /sys/class/net/$1/address) \
    -netdev tap,id=nic,fd=3 3<>/dev/tap$(< /sys/class/net/$1/ifindex)

I can stop this crash by disabling CONFIG_PARAVIRT_SPINLOCKS in my guest
kernel, running with -cpu qemu64 instead of -cpu host, or running with -smp 1
instead of -smp 4. (Removing/changing the -machine q35 makes no difference.)

/proc/cpuinfo on the host has 8 of these:

processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD Opteron(tm) Processor 6328
stepping: 0
microcode : 0x600081c
cpu MHz : 3200.000
cache size : 2048 KB
physical id : 0
siblings: 8
core id : 0
cpu cores : 4
apicid : 32
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips: 6399.70
TLB size: 1536 4K pages
clflush size: 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

and on the guest, has 4 of these:

processor : 0
vendor_id : AuthenticAMD
cpu family :