[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #10 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Well, thanks very much guys. I'll cautiously say that a patched guest kernel seems to resolve it :-)

The bug seemed to appear even when the host is untainted; the serial log says

[ 375.989736] divide error: [#1] SMP

That's a Kubuntu partition that manages without the proprietary drivers, while Fedora won't seem to give graphics properly with nouveau.

Not entirely satisfactory of course, given that I've now had to modify a virtual machine that it was important to avoid modifying for testing purposes; although definitely better than crashing :-)

Do we keep the bug report open so that the host kvm handling bug gets fixed too? Is this the right place to report that?

--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #8 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 166461
  --> https://bugzilla.kernel.org/attachment.cgi?id=166461&action=edit
dmesg
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #9 from Mark kernelbugzilla.org.mark...@dfgh.net ---
I'll try both of your suggestions, thanks
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

Alan a...@lxorguk.ukuu.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |a...@lxorguk.ukuu.org.uk

--- Comment #6 from Alan a...@lxorguk.ukuu.org.uk ---
Can you reproduce it without the Nvidia blob loaded on the host?
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

Paolo Bonzini bonz...@gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonz...@gnu.org

--- Comment #7 from Paolo Bonzini bonz...@gnu.org ---
Your guest is probably missing commit c1118b3602c2329671ad5ec8bdf8e374323d6343.
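One way to test the suggestion in comment #7 is to ask git whether the commit is contained in the tree the guest kernel was built from. The sketch below assumes a local clone of that kernel tree is available (the path passed as `repo_dir` is hypothetical) and uses `git merge-base --is-ancestor`. Distribution kernels often carry backports under different hashes, so a negative answer here does not prove the fix is absent.

```python
import subprocess
from typing import List

# Commit named in comment #7 of the bug report.
FIX_COMMIT = "c1118b3602c2329671ad5ec8bdf8e374323d6343"


def ancestor_check_cmd(commit: str, ref: str = "HEAD") -> List[str]:
    """Build the git command that tests whether `commit` is contained in `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def tree_has_commit(repo_dir: str, commit: str = FIX_COMMIT, ref: str = "HEAD") -> bool:
    """Return True if `commit` is an ancestor of `ref` in the clone at `repo_dir`.

    git exits with status 0 when the commit is an ancestor and 1 when it is
    not; any other status is an error (for example an unknown commit).
    """
    result = subprocess.run(ancestor_check_cmd(commit, ref), cwd=repo_dir)
    if result.returncode in (0, 1):
        return result.returncode == 0
    raise RuntimeError(f"git failed with status {result.returncode}")
```

Calling `tree_has_commit("/path/to/guest-kernel-tree")` would then answer the question for whatever is checked out there.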
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #4 from Mark kernelbugzilla.org.mark...@dfgh.net ---
I should just add that the 'code' segment in the kvm dump is identical every time. I'd be happy to try to track down what is causing it, provided someone could give me some pointers on debugging that kind of thing in more detail.

Code=00 01 48 c7 c0 8a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 0f 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9 f0 0f b0 0d fb 26 e6 00 40
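As one possible starting point for the debugging pointers requested in comment #4, the following sketch searches the dumped code bytes for the two hypercall instruction encodings: `0f 01 c1` (VMCALL, the Intel encoding) and `0f 01 d9` (VMMCALL, the AMD encoding). It is a plain byte search rather than a disassembler, so a hit could in principle land inside another instruction. Whether the VMCALL it finds is what matters on this AMD host is an assumption, not something the report establishes.

```python
from typing import List, Tuple

# Instruction encodings as documented in the Intel and AMD architecture manuals.
HYPERCALL_OPCODES = {
    bytes.fromhex("0f01c1"): "VMCALL (Intel VT-x encoding)",
    bytes.fromhex("0f01d9"): "VMMCALL (AMD SVM encoding)",
}

# The Code= bytes quoted in comment #4.
CODE_DUMP = (
    "00 01 48 c7 c0 8a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 "
    "0f 01 c1 0f 1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 "
    "c9 f0 0f b0 0d fb 26 e6 00 40"
)


def find_hypercalls(code_dump: str) -> List[Tuple[int, str]]:
    """Return (byte offset, description) for every hypercall opcode found."""
    raw = bytes.fromhex(code_dump)  # fromhex ignores the spaces between bytes
    hits = []
    for opcode, name in HYPERCALL_OPCODES.items():
        start = raw.find(opcode)
        while start != -1:
            hits.append((start, name))
            start = raw.find(opcode, start + 1)
    return sorted(hits)
```

On the dump above, `find_hypercalls(CODE_DUMP)` reports a single VMCALL at byte offset 20, directly after `b8 05 00 00 00`, which loads the constant 5 into EAX.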
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #5 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 166021
  --> https://bugzilla.kernel.org/attachment.cgi?id=166021&action=edit
result of guest lsmod; identical for mono-cpu / multi-cpu
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #1 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 165181
  --> https://bugzilla.kernel.org/attachment.cgi?id=165181&action=edit
serial log during bug; when smp > 1
[Bug 92291] New: kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

            Bug ID: 92291
           Summary: kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.18.3-201.fc21.x86_64 [host], 3.13.0-39-generic [ubuntu guest]
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_...@kernel-bugs.osdl.org
          Reporter: kernelbugzilla.org.mark...@dfgh.net
        Regression: No

Created attachment 165171
  --> https://bugzilla.kernel.org/attachment.cgi?id=165171&action=edit
kvm register dumps

Overview
========
Whenever I launch a kvm guest in qemu-system-x86_64 with -smp > 1 [-cpu host -enable-kvm of course], the guest kernel crashes at some stage. I've seen the kernel crash before completing boot-up, giving the recognizable kernel crash trace, as well as during 'normal' operation, when the guest simply 'freezes'. The host kernel gives me an oops as well, although abrt won't let me report it now as I've got Nvidia proprietary drivers 'tainting' the kernel.

The bug seems non-specific to guest OS, though mainly it's been *buntu guests, with various different kernels including lowlatency. I even think I've seen the bug with a Windows guest.

I've tried building qemu from git sources; no particularly noticeable difference.

Steps to reproduce
==================
1) Start linux guest on a multicore system [AMD FX8* series CPU?] with
   $ qemu-system-x86_64 -m 2G -cpu host -smp 4 -enable-kvm -hda [image_file] -vga vmware
2) Wait until the guest kernel crashes; virtually every time in less than 10 minutes.

Actual results
==============
Guest kernel crashes; host kernel gives an oops too, presumably as a result of kvm passthrough.

Expected results
================
Guest virtual machine should work normally, then keep working normally.

It would be good to try to pipe dmesg from the guest to the host serial connection, in case that would assist?
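For the closing question about getting guest kernel messages onto a host-side serial connection, a minimal sketch follows. It reuses the command line from the steps to reproduce and adds QEMU's `-serial file:` option so that whatever the guest writes to its first serial port ends up in a host file. The log file name and the helper name `qemu_argv` are made up for illustration, and the guest still has to be told to log to the serial port (for example with `console=ttyS0` on its kernel command line), which is assumed here rather than shown.

```python
import shlex
from typing import List


def qemu_argv(image_file: str, smp: int = 4,
              serial_log: str = "guest-serial.log") -> List[str]:
    """Command line from the report, plus a serial port backed by a host file."""
    return [
        "qemu-system-x86_64",
        "-m", "2G",
        "-cpu", "host",
        "-smp", str(smp),
        "-enable-kvm",
        "-hda", image_file,
        "-vga", "vmware",
        # Everything the guest writes to ttyS0 is appended to this host file.
        "-serial", f"file:{serial_log}",
    ]


def as_shell_command(argv: List[str]) -> str:
    """Render the argument list as a copy-and-paste shell command."""
    return shlex.join(argv)
```

`as_shell_command(qemu_argv("guest.img"))` gives a single line that can be pasted into a shell; `guest.img` stands in for the `[image_file]` placeholder of the report.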
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #3 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 165201
  --> https://bugzilla.kernel.org/attachment.cgi?id=165201&action=edit
host cpuinfo
[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #2 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 165191
  --> https://bugzilla.kernel.org/attachment.cgi?id=165191&action=edit
serial log when no smp
[ kvm-Bugs-2441883 ] KVM guest crashes when using linux-md software RAID5
Bugs item #2441883, was opened at 2008-12-17 19:22
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2441883&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Johannes Truschnigg (c0l0)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM guest crashes when using linux-md software RAID5

Initial Comment:
CPU: Intel Core 2 Quad Q6600 (4 cores)
Distro, kernel: Gentoo GNU/Linux ~amd64, Kernel 2.6.27.9
Bitness, compiler: x86_64, GCC 4.3.2
KVM versions: kvm-79, kvm-81

Trying to assemble a (software) RAID5-array under GNU/Linux, guest kernel version 2.6.24, segfaults kvm the second the md-driver finishes initially syncing the array's members. When trying to boot with the same configuration again, KVM crashes the moment the bootloader is supposed to take over.

I've attached my test-case, which is also available here:
http://johannes.truschnigg.info/tmp/kvm-79_segfault_crashatstart.tar.bz2

PLEASE NOTE that the extracted files consume around 21G, due to the zero-filled image files used as array components.

The included shellscript, `start.sh`, needs to be adapted to find ubuntu-8.04.1-desktop-amd64.iso on your machine - an image which is available here:
http://releases.ubuntu.com/hardy/ubuntu-8.04.1-desktop-amd64.iso

I've hit this problem for the first time with KVM-79, but it's still not fixed for me with KVM-81.

I'm happy to provide additional information upon request.

--

Comment By: SourceForge Robot (sf-robot)
Date: 2009-02-02 02:34

Message:
This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

--

Comment By: Avi Kivity (avik)
Date: 2008-12-24 10:43

Message:
Please attach a stacktrace from the crash and any messages in the host kernel log.
Re: KVM guest crashes
On Mon, Jan 26, 2009 at 04:53:21PM +0100, Alexander Graf wrote:
>> There are a number of problems that can result in this error, and the problems are possibly different between the in-kernel PIT and userspace PIT emulation (note it also happens with in-kernel PIT, just much more rarely now). You can use the no_timer_check kernel option to bypass it.
>
> Hm - that option disables the whole check, making it always fail. I haven't seen any way to actually disable the check, telling Linux things are OK :-(.

Hum, the option makes timer_irq_works always return true. Works for me with in-kernel PIT.

What you see with apic=debug no_timer_check ?
Re: KVM guest crashes
Marcelo Tosatti wrote:
> On Mon, Jan 26, 2009 at 04:53:21PM +0100, Alexander Graf wrote:
>>> There are a number of problems that can result in this error, and the problems are possibly different between the in-kernel PIT and userspace PIT emulation (note it also happens with in-kernel PIT, just much more rarely now). You can use the no_timer_check kernel option to bypass it.
>>
>> Hm - that option disables the whole check, making it always fail. I haven't seen any way to actually disable the check, telling Linux things are OK :-(.
>
> Hum, the option makes timer_irq_works always return true. Works for me with in-kernel PIT.
>
> What you see with apic=debug no_timer_check ?

It does work with noapic for me, but that means I'm using the old PIC (which isn't necessarily bad, right?). So I can at least work around the issue for us now. It still needs to be fixed nevertheless.

with apic=debug no_apic_timer 2.6.27 does:

Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
. (found apic 0 pin 0) ...
... works.

while 2.6.25 does:

..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the 'noapic' kernel parameter
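When many guests are booted in a stress loop, the two outcomes quoted above can be told apart mechanically from the serial log. The sketch below is only an illustration built on the exact message strings from this thread; the function names and the three labels it returns are made up, and other kernel versions may word the messages differently. It also shows one way to append `no_timer_check` to a kernel command line string without duplicating it.

```python
TIMER_BUG = "MP-BIOS bug: 8254 timer not connected to IO-APIC"


def classify_timer_check(serial_log: str) -> str:
    """Classify how the guest's IO-APIC timer check ended.

    Returns "ok" if the MP-BIOS message never appeared, "panic" if the
    kernel panicked after it, "recovered" if the fallback reported that
    the timer works, and "unknown" otherwise.
    """
    if TIMER_BUG not in serial_log:
        return "ok"
    tail = serial_log.split(TIMER_BUG, 1)[1]
    if "Kernel panic" in tail:
        return "panic"
    if "works." in tail:
        return "recovered"
    return "unknown"


def with_kernel_option(cmdline: str, option: str = "no_timer_check") -> str:
    """Append a kernel command line option unless it is already present."""
    if option in cmdline.split():
        return cmdline
    return f"{cmdline} {option}".strip()
```

For example, `with_kernel_option("quiet console=ttyS0")` yields `quiet console=ttyS0 no_timer_check`, which could then be passed to QEMU via `-append`.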
Re: KVM guest crashes
On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote:
>> rarely now). You can use the no_timer_check kernel option to bypass it.
>
> Ok :-). Thanks. The logic in the kernel for this is really stupid (basing timing on clock speed). What about disabling the check if we detect KVM?

Yes, this is an option. We've talked about it before, but no patch was merged. The RHEL5.3 kernel skips those checks when it detects VMWare or KVM hypervisors.

We should understand what is happening to fix the fullvirt/old guest case. For the in-kernel PIT, I believe there is a bug somewhere, either in PIT itself or in the interaction with IOAPIC (failure to inject interrupts for some reason). I started debugging it by constantly reboot'ing an SMP guest but my testbox died. Hope to get back to it soon.

>> Regarding the corruption problem, I have a few questions:
>> - It is SMP specific (ie both kernel/userspace irqchip fail).
>> - which means UP guests are stable with both kernel/user irqchip.
>
> I have not been able to reproduce any of my issues with UP. I have to admit that I only tried UP with in-kernel irqchip.

OK.

>> The Stuck ?? messages seem to be coming from smpboot.c. So for some reason vcpu's are being reset. Don't seem to be a triple fault because in that case all vcpu's would be reset (so yes, the vcpu was really on BIOS code).
>
> Hm. I know that OSX turns off CPUs it doesn't need as an alternative to deep-sleep. Does Linux do that too?

Not that I know of, unless you offline CPU's manually, which does not seem to be the case.

>> Suggest the following:
>> - Confirm the problem happens with root on ext3 filesystem (can't you mount the CIFS and copy the data over to a local guest disk to simulate similar load?).
>
> I had Stuck ?? messages without networking, but if it helps I can try that too. In the project we're using this for we do things over cifs, so that's why I built the test case around it.

OK. Just trying to decrease the variables involved. I'll setup a machine to run a similar load next week.

>> - Check that the kernel text is not corrupted. Save the good kernel text with QEMU's pmemsave or memsave (you can see start/end in the symbols _text/_etext, /proc/kallsyms) after booting. After you see the crash, save the bad kernel text, compare. This can give additional clues (or not).
>
> Good idea - I'll try.
>
>> Also, you mentioned other reports previously, can you point to them, please?
>
> Yes, will do later. I gotta run now! Thanks for the reply - it's good to know this isn't getting ignored :-).

Have a good weekend.
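The kernel text check suggested above can be sketched in three small steps: read the `_text` and `_etext` addresses from the guest's `/proc/kallsyms`, turn them into a QEMU monitor `memsave` command (address, size, file name), and compare the dump taken after boot with the one taken after the crash. The helper names and the file names are hypothetical; the dumps themselves still have to be produced from the QEMU monitor.

```python
from typing import Dict, List, Tuple


def text_range(kallsyms: str) -> Tuple[int, int]:
    """Return (start address, size in bytes) of the kernel text.

    `kallsyms` is the content of the guest's /proc/kallsyms, where each
    line has the form "<hex address> <type> <symbol>".
    """
    addrs: Dict[str, int] = {}
    for line in kallsyms.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] in ("_text", "_etext"):
            addrs[parts[2]] = int(parts[0], 16)
    start, end = addrs["_text"], addrs["_etext"]
    return start, end - start


def memsave_command(start: int, size: int, filename: str) -> str:
    """Build the monitor command that dumps `size` bytes of guest virtual memory."""
    return f"memsave {start:#x} {size} {filename}"


def diff_ranges(good: bytes, bad: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for each run of bytes that differs between dumps."""
    ranges = []
    run_start = None
    for i, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            ranges.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, min(len(good), len(bad)) - run_start))
    return ranges
```

Adding an offset reported by `diff_ranges` to the `_text` address gives the guest virtual address of the changed bytes, which can then be looked up in `/proc/kallsyms` to see which function they fall in.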
Re: KVM guest crashes
On 24.01.2009, at 14:06, Marcelo Tosatti wrote:
> On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote:
>>> rarely now). You can use the no_timer_check kernel option to bypass it.
>>
>> Ok :-). Thanks. The logic in the kernel for this is really stupid (basing timing on clock speed). What about disabling the check if we detect KVM?
>
> Yes, this is an option. We've talked about it before, but no patch was merged. The RHEL5.3 kernel skips those checks when it detects VMWare or KVM hypervisors.

That sounds clever. But I doubt I'll get anything as intrusive into the SLES11 kernel at this point in time :-(.

> We should understand what is happening to fix the fullvirt/old guest case. For the in-kernel PIT, I believe there is a bug somewhere, either in PIT itself or in the interaction with IOAPIC (failure to inject interrupts for some reason). I started debugging it by constantly reboot'ing an SMP guest but my testbox died. Hope to get back to it soon.

Hm. If I ever get tracing working again, I can try to create one too :-).

>>> The Stuck ?? messages seem to be coming from smpboot.c. So for some reason vcpu's are being reset. Don't seem to be a triple fault because in that case all vcpu's would be reset (so yes, the vcpu was really on BIOS code).
>>
>> Hm. I know that OSX turns off CPUs it doesn't need as an alternative to deep-sleep. Does Linux do that too?
>
> Not that I know of, unless you offline CPU's manually, which does not seem to be the case.

Nope, I don't hotplug anything (though the acpihp module is loaded).

>>> Suggest the following:
>>> - Confirm the problem happens with root on ext3 filesystem (can't you mount the CIFS and copy the data over to a local guest disk to simulate similar load?).
>>
>> I had Stuck ?? messages without networking, but if it helps I can try that too. In the project we're using this for we do things over cifs, so that's why I built the test case around it.
>
> OK. Just trying to decrease the variables involved. I'll setup a machine to run a similar load next week.

Sounds good :-). I put all the files I tested with online with a link in the first mail of this thread. So feel free to take that as an inspiration. For non-network testing I simply put -net none there, but still had the initrd boot and kill the machine.

>>> Also, you mentioned other reports previously, can you point to them, please?
>>
>> Yes, will do later. I gotta run now! Thanks for the reply - it's good to know this isn't getting ignored :-).
>
> Have a good weekend.

Same to you. I was running for a first-aid course though, not the weekend :-).

I was mainly talking here about the thread Guest Hang Bugs. Though with 2.6.25 guests I did get BUG: soft lockup - CPU#x stuck for ns! messages instead of the Stuck ?? FWIW.

Originally I created the whole test case to debug this exact bug we encountered as well:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/21828/

Alex
Re: KVM guest crashes
Alexander Graf wrote:
Alexander Graf wrote:
Alexander Graf wrote:

[...]

Also after two days of permanent stress testing I also got the Intel machine w/ current git down:

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:conta...@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x7b24
Stuck ??

No backtrace here though. That's all I got from the serial console.

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:conta...@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x7b24
Stuck ??

[...]

In order to provide you with more dumps that might point to some direction (I'm still lost on figuring where to look), here's another AMD NPT guest crash with current git. It somehow looks as if the guest pagetable is corrupted.

+ sudo -u contain3 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain3 cifspass=contain3 root=cifs://contain3:conta...@172.16.3.1/contain3 realroot=//172.16.3.1/users/contain3 ip=172.16.3.2:172.16.3.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:3 -net tap,ifname=tap3,script=/bin/true -m 2000 -nographic -smp 8 -no-kvm-irqchip /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x7b24
pci :00:01.0: PIIX3: Enabling Passive Release
IP-Config: Device `eth0' not found.
doing fast boot
Creating device nodes with udev
Boot logging started on /dev/ttyS0(/dev/console) at Thu Jan 22 23:05:55 2009
[NETWORK] using static config based on ip=172.16.3.2:172.16.3.1::255.255.255.0::eth0:none
Trying manual resume from /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1
resume device /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1 not found (ignoring)
Trying manual resume from /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1
resume device /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1 not found (ignoring)
node name not found
Mounting root //172.16.3.1/contain3
RTNETLINK answers: File exists
1: lo: LOOPBACK,UP,LOWER_UP mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 52:54:00:12:34:03 brd ff:ff:ff:ff:ff:ff
    inet 172.16.3.2 peer 172.16.3.1/24 scope global eth0
BUG: unable to handle kernel paging request at 00100100
IP: [8036a603] strnlen+0x10/0x19
PGD 7c596067 PUD 7c9ed067 PMD 0
Oops: [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 7
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 782, comm: halt Tainted: G S2.6.27.7-9-default #1
RIP: 0010:[8036a603] [8036a603] strnlen+0x10/0x19
RSP: 0018:88007c46da70 EFLAGS: 00010082
RAX: 00100100 RBX: RCX:
RDX: 00100100 RSI: fffe RDI: 00100100
RBP: 80ae0fad R08: R09:
R10: 000a R11: R12: 00100100
R13: R14: 80ae13a0 R15:
FS: 7f0b2aee06f0() GS:88007a57bf40() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 00100100 CR3: 7c4e5000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process halt (pid: 782, threadinfo 88007c46c000, task 88007c17e0c0)
Stack: 8036b39d 88007c46ddb8 80ae0fad 805d7e29 8036b6f6 7f0b2ace27e0 88007c595ab0 88007c0624a8 0400 80ae0fa0
Call Trace:
 [8036b39d] string+0x34/0x91
 [8036b6f6] vsnprintf+0x2fc/0x574
 [8036ba56]
Re: KVM guest crashes
Hi Alexander,

On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote:
> Following the discussion on IRC, I tried -no-kvm-irqchip and found some virtual machines broken after 1 day of stress testing again:
>
> + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain2 cifspass=contain2 root=cifs://contain2:conta...@172.16.2.1/contain2 realroot=//172.16.2.1/users/contain2 ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net tap,ifname=tap2,script=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null
> qemu: loading initrd (0x1daf359 bytes) at 0x7b24
> Stuck ??
> Stuck ??
> BUG: unable to handle kernel NULL pointer dereference at
> IP: [802b539a] kfree+0x18b/0x26e
> PGD 0
> Oops: [1] SMP
> last sysfs file:
> CPU 2
> Modules linked in:
> Supported: Yes
> Pid: 0, comm: swapper Tainted: G S2.6.27.7-9-default #1
> RIP: 0010:[802b539a] [802b539a] kfree+0x18b/0x26e
> RSP: 0018:88007a493e90 EFLAGS: 00010046
> RAX: 0002 RBX: 8800010397f0 RCX: 88007a480778
> RDX: e200 RSI: 8800010397f0 RDI: 88007a5ae140
> RBP: R08: 8800010395d0 R09: 88007a493eb8
> R10: 80a59980 R11: 8021c5d9 R12: 0001
> R13: 88007ac04080 R14: 10200042 R15: 88007a5ae140
> FS: () GS:88007a461f40() knlGS:
> CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: CR3: 00201000 CR4: 06e0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process swapper (pid: 0, threadinfo 88007a48a000, task 88007a488280)
> Stack: 8023df9c 8073a108 0286 8024a1eb 80259d80 8800010397f0 0001 000a 10200042 0010 802831d0
> Call Trace:
>  [802831d0] __rcu_process_callbacks+0x189/0x203
>  [80283271] rcu_process_callbacks+0x27/0x47
>  [802464ed] __do_softirq+0x84/0x115
>  [8020dc9c] call_softirq+0x1c/0x28
>  [8020f067] do_softirq+0x3c/0x81
>  [80246204] irq_exit+0x3f/0x83
>  [8021ce5f] smp_apic_timer_interrupt+0x95/0xae
>  [8020d4a3] apic_timer_interrupt+0x83/0x90
>  [80221f1d] native_safe_halt+0x2/0x3
>  [80213465] default_idle+0x38/0x54
>  [8020b34a] cpu_idle+0xa9/0xf1
> Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 8b 55 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00
> RIP [802b539a] kfree+0x18b/0x26e
> RSP 88007a493e90
> CR2:
> ---[ end trace 4eaa2a86a8e2da22 ]---
>
> Also after two days of permanent stress testing I also got the Intel machine w/ current git down:
>
> + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:conta...@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
> qemu: loading initrd (0x1daf359 bytes) at 0x7b24
> Stuck ??
>
> No backtrace here though. That's all I got from the serial console.
>
> The only issues I had with the UP guests so far was this:
>
> + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain6 cifspass=contain6 root=cifs://contain6:conta...@172.16.6.1/contain6 realroot=//172.16.6.1/users/contain6 ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null
> qemu: loading initrd (0x1daf359 bytes) at 0x7b24
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
>
> which can be annoying at times too. Can't we just detect that it's the detection and give the guest its interrupts? Or should the PIT reinjection thing help here?

There are a number of problems that can result in this error, and the problems are possibly different between the in-kernel PIT and userspace PIT emulation (note it also happens with in-kernel PIT, just much more rarely now). You can use the no_timer_check kernel option to bypass
Re: KVM guest crashes
Alexander Graf wrote:

Alexander Graf wrote: [...]

Also after two days of permanent stress testing I also got the Intel machine w/ current git down:

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:conta...@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x7b24
Stuck ??

No backtrace here though. That's all I got from the serial console.

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:conta...@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x7b24
Stuck ??

(qemu) info cpus
* CPU #0: pc=0x80221f1d thread_id=15211
CPU #1: pc=0x80221f1d thread_id=15212
CPU #2: pc=0x80221f1d thread_id=15213
CPU #3: pc=0x80221f1d thread_id=15214
CPU #4: pc=0x8049f7d0 thread_id=15215
CPU #5: pc=0x80221f1d thread_id=15216
CPU #6: pc=0x80221f1d thread_id=15217
CPU #7: pc=0x0009f02c thread_id=15218
(qemu) cpu 7
(qemu) info registers
EAX=0c06 EBX=05b8 ECX= EDX=
ESI= EDI= EBP= ESP=
EIP=002c EFL=00033002 [---] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES = f300
CS =9f00 0009f000 f300
SS = f300
DS = f300
FS = f300
GS = f300
LDT= 8200
TR = fffbd000 2088 8b00
GDT=
IDT=
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
FCW=037f FSW= [ST=0] FTW=00 MXCSR=
FPR0= FPR1= FPR2= FPR3=
FPR4= FPR5= FPR6= FPR7=
XMM00= XMM01= XMM02= XMM03=
XMM04= XMM05= XMM06= XMM07=

Is that guest really seriously in BIOS code? After booting Linux?

(qemu) x /2i $pc-1
0x0009f02b: hlt
0x0009f02c: jmp 0x9f02b

Where is this? Looks like panic code to me.

0x0009f000: cli
0x0009f001: xor %ax,%ax
0x0009f003: mov %ax,%ds
0x0009f005: mov $0x510,%ebx
0x0009f00b: addr32 mov (%ebx),%ecx
0x0009f00f: test %ecx,%ecx
0x0009f012: je 0x9f026
0x0009f014: addr32 mov 0x4(%ebx),%eax
0x0009f019: addr32 mov 0x8(%ebx),%edx
0x0009f01e: wrmsr
0x0009f020: add $0xc,%ebx
0x0009f024: jmp 0x9f00b
0x0009f026: lock incw 1856
0x0009f02b: hlt
0x0009f02c: jmp 0x9f02b

Looks a lot like this:

smp_ap_boot_code_start:
  cli
  xor %ax, %ax
  mov %ax, %ds
  mov $SMP_MSR_ADDR, %ebx
11:
  mov 0(%ebx), %ecx
  test %ecx, %ecx
  jz 12f
  mov 4(%ebx), %eax
  mov 8(%ebx), %edx
  wrmsr
  add $12, %ebx
  jmp 11b
12:
  lock incw smp_cpus
1:
  hlt
  jmp 1b

But that code shouldn't run after Linux booted, right? And without at least a Power Off message I'd expect Linux to still be up.

The only thing the host's dmesg was saying is this:

Ignoring delivery mode 3

(repeated often)

Alex
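
For anyone collecting many of these monitor dumps, here is a minimal sketch of how the check done by eye above (one vCPU sitting at a pc below 1 MiB while the rest are in kernel text) could be automated. The function name `cpus_below_1mb` and the 1 MiB threshold are assumptions made for illustration, not part of QEMU; the regular expression only assumes the `CPU #N: pc=0x...` layout seen in the `info cpus` output quoted in this message.

```python
import re

# Matches the "CPU #N: pc=0x..." fields of the monitor output quoted above.
CPU_LINE = re.compile(r"CPU #(\d+): pc=0x([0-9a-fA-F]+)")

# Assumption for this sketch: a pc below 1 MiB means the vCPU is running
# low-memory (BIOS / real-mode style) code rather than the Linux kernel.
LOW_MEMORY_LIMIT = 0x100000


def cpus_below_1mb(info_cpus_text):
    """Return {cpu_index: pc} for every vCPU whose pc is below 1 MiB."""
    suspects = {}
    for index, pc_hex in CPU_LINE.findall(info_cpus_text):
        pc = int(pc_hex, 16)
        if pc < LOW_MEMORY_LIMIT:
            suspects[int(index)] = pc
    return suspects


sample = (
    "* CPU #0: pc=0x80221f1d thread_id=15211 "
    "CPU #4: pc=0x8049f7d0 thread_id=15215 "
    "CPU #7: pc=0x0009f02c thread_id=15218"
)
print({cpu: hex(pc) for cpu, pc in cpus_below_1mb(sample).items()})
```

On the three sample entries taken from the dump this flags only CPU 7, which matches the vCPU inspected by hand with `cpu 7` and `x /2i $pc-1`.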
Re: KVM guest crashes
Avi Kivity wrote:

Alexander Graf wrote:

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1 (2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have some time on your hands, please give me a hand debugging this! If I'd had to guess, I'd say it's either an APIC issue and/or guest memory corruption.

I'd guess memory corruption. Does running a uniprocessor guest help? What about a uniprocessor guest pinned to one host core?

So last night I started several guests with -smp 8 but without network to see if IO load is causing the problems. All VMs are down, but one panic log is rather new:

Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
BUG: unable to handle kernel NULL pointer dereference at
IP: [80237454] cpu_attach_domain+0x84/0x207
PGD 0
Oops: [1] SMP
last sysfs file:
CPU 1
Modules linked in:
Supported: Yes
Pid: 1, comm: swapper Tainted: G S 2.6.27.11-1-default #1
RIP: 0010:[80237454] [80237454] cpu_attach_domain+0x84/0x207
RSP: 0018:88007a419c50 EFLAGS: 00010202
RAX: RBX: 880001077a60 RCX: 88007a419c40
RDX: 044d RSI: 0200 RDI:
RBP: 88007a419c90 R08: R09: 0200
R10: 0008 R11: 00018600 R12: 8800010778d0
R13: 880001077a78 R14: 8800010775b0 R15: 88000107f700
FS: () GS:88007afeb540() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: CR3: 00201000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 88007a418000, task 88007a406040)
Stack: 00047a4616c0 88007a548000 002f044d 0004 80a275b0 88007a460e00 88007a45c140 88007a419ec0 80238190 88007a419dc0 88007a419e00
Call Trace:
[80238190] __build_sched_domains+0xbb9/0xbf5
[80981ae4] sched_init_smp+0xa9/0x1d8
[8096b850] kernel_init+0x74/0xea
[8020cf79] child_rip+0xa/0x11
Code: 00 4c 89 ef 89 45 d4 8b 83 88 00 00 00 89 45 d0 e8 d1 05 13 00 ff c8 74 5d 8b 93 88 00 00 00 f7 c2 8f 02 00 00 74 0d 48 8b 43 10 48 3b 00 0f 85 24 01 00 00 80 e2 70 0f 85 1b 01 00 00 eb 37 48
RIP [80237454] cpu_attach_domain+0x84/0x207
RSP 88007a419c50
CR2:
---[ end trace 4eaa2a86a8e2da22 ]---
Kernel panic - not syncing: Attempted to kill init!

From what I've seen it's always related to IPIs, but that's just a guess. I'll start UP testing now.

Alex
Re: KVM guest crashes
Alexander Graf wrote:

Avi Kivity wrote:

Alexander Graf wrote:

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1 (2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have some time on your hands, please give me a hand debugging this! If I'd had to guess, I'd say it's either an APIC issue and/or guest memory corruption.

I'd guess memory corruption. Does running a uniprocessor guest help? What about a uniprocessor guest pinned to one host core?

So last night I started several guests with -smp 8 but without network to see if IO load is causing the problems. All VMs are down, but one panic log is rather new:

Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
BUG: unable to handle kernel NULL pointer dereference at
IP: [80237454] cpu_attach_domain+0x84/0x207

This is right on startup, if I read things right.

I suggest checking if you have the latest BIOS update applied. I've had bad experiences with un-updated processors.

--
error compiling committee.c: too many arguments to function
Re: KVM guest crashes
Avi Kivity wrote:

I suggest checking if you have the latest BIOS update applied. I've had bad experiences with un-updated processors.

FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on an 2x4 Barcelona host, happily make -j16ing an allmodconfig kernel.

--
error compiling committee.c: too many arguments to function
Re: KVM guest crashes
Avi Kivity wrote:

Avi Kivity wrote:

I suggest checking if you have the latest BIOS update applied. I've had bad experiences with un-updated processors.

FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on an 2x4 Barcelona host, happily make -j16ing an allmodconfig kernel.

Strange. I started the tests again with an updated BIOS now, installing an Intel machine to test on in parallel.

old:
# ./rdmsr /dev/cpu/0/msr $(( 0x008b ))
0x165

new:
# ./rdmsr /dev/cpu/0/msr $(( 0x008b ))
0x183

But I already got one guest crashing:

int3: [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 2
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1
RIP: 0010:[80a500f1] [80a500f1] per_cpu__cpu_state+0x1/0x4
RSP: 0018:88007a493fa8 EFLAGS: 0083
RAX: 806f5fa0 RBX: 80a500f0 RCX:
RDX: 880001033200 RSI: RDI: ff5fc0b0
RBP: 88007a48beb0 R08: R09: 880001039638
R10: R11: 8021c5d9 R12:
R13: R14: R15:
FS: 7fe3252e4950() GS:88007a461f40() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0062d000 CR3: 7c10a000 CR4: 06e0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 88007a48a000, task 88007a488280)
Stack: 88007a48beb0 8020ca2e 88007a48beb0 EOI 007dd83ce327 0086 8800010396d0 02625a00 0002 0001eadc 007dd83ce327 0292 0292
Call Trace:
Inexact backtrace:
IRQ [8020ca2e] ? ret_from_intr+0x0/0x29
EOI [804a6992] ? notifier_call_chain+0x29/0x4c
[80213465] ? default_idle+0x38/0x54
[8020b34a] ? cpu_idle+0xa9/0xf1
Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
RIP [80a500f1] per_cpu__cpu_state+0x1/0x4
RSP 88007a493fa8
---[ end trace 17313f34f216af07 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
[ cut here ]
WARNING: at kernel/smp.c:331 smp_call_function_mask+0x38/0x1f2()
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G SD 2.6.27.7-9-default #1
Call Trace:
[8020e42e] show_trace_log_lvl+0x41/0x58
[804a1e97] dump_stack+0x69/0x6f
[80240eb2] warn_on_slowpath+0x51/0x77
[80261fef] smp_call_function_mask+0x38/0x1f2
[802621d2] smp_call_function+0x29/0x2e
[8021ba16] native_smp_send_stop+0x1a/0x3f
[804a1f59] panic+0xbc/0x170
[802449e2] do_exit+0x6b/0x334
[804a4b9b] oops_begin+0x0/0x9e
[804a524a] do_int3+0x7d/0xa1
[804a46e6] int3+0xb6/0xf0
[80a500f1] per_cpu__cpu_state+0x1/0x4
DWARF2 unwinder stuck at per_cpu__cpu_state+0x1/0x4
Leftover inexact backtrace:
IRQ [8020ca2e] ret_from_intr+0x0/0x29
EOI [804a6992] notifier_call_chain+0x29/0x4c
[80213465] default_idle+0x38/0x54
[8020b34a] cpu_idle+0xa9/0xf1
---[ end trace 17313f34f216af07 ]---

The UP guests seemed to work fine - will start them again now.

Alex
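
The old/new comparison in this message reads register 0x8b through the msr device with a local `./rdmsr` helper. As a rough sketch of what such a helper does, assuming the Linux msr driver convention that the register number is used as the file offset and each read returns 8 bytes, the same value could be fetched like this. The names `read_msr` and `MSR_PATCH_LEVEL` are made up for this example, and treating 0x8b as the microcode patch level register is an assumption based on how it is used in the message.

```python
import os
import struct

# Register queried in the message above; assumed here to be the microcode
# patch level MSR, which is why it changes after a BIOS update.
MSR_PATCH_LEVEL = 0x8B


def read_msr(msr_path, register):
    """Read one 64-bit MSR from an msr device node such as /dev/cpu/0/msr.

    The register number is passed as the file offset and exactly
    8 bytes (little-endian) are expected back.
    """
    fd = os.open(msr_path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError("short read from %s at offset %#x" % (msr_path, register))
    return struct.unpack("<Q", raw)[0]
```

On a real host this needs root and the msr module loaded, for example `hex(read_msr("/dev/cpu/0/msr", MSR_PATCH_LEVEL))`. Looping over every `/dev/cpu/N/msr` node would also show whether all cores picked up the same patch level.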
KVM guest crashes
Hi list,

recently I've been hitting some KVM bugs others seem to have reported as well, including

- CIFS timeouts
- Stuck ?? errors
- Random segmentation faults in the guest

so I figured, I'll put together a stress test that can be used to reproduce these issues. This is done by using a CIFS mount on the host and unpacking data from that mount to the mount. I have been able to bring kvm down to its knees a lot just by doing this. Simply run the test in an endless-loop. FWIW enabling NPT helps triggering the issue.

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1 (2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have some time on your hands, please give me a hand debugging this! If I'd had to guess, I'd say it's either an APIC issue and/or guest memory corruption.

Alex
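
For readers without access to the tarball, the unpack-in-a-loop idea described in this message could look roughly like the sketch below. This is not the contents of kvm-tests.tar.bz2; the function name `unpack_loop`, the `run-N` directory naming and the `iterations` parameter are all hypothetical, and both paths are assumed to live on the CIFS mount inside the guest.

```python
import itertools
import shutil
import tarfile
from pathlib import Path


def unpack_loop(archive, workdir, iterations=None):
    """Unpack `archive` below `workdir` over and over, deleting each copy.

    iterations=None loops until interrupted, mirroring the "endless-loop"
    wording above. Returns the number of completed runs.
    """
    archive = Path(archive)
    workdir = Path(workdir)
    runs = itertools.count() if iterations is None else range(iterations)
    completed = 0
    for run in runs:
        target = workdir / ("run-%d" % run)
        target.mkdir(parents=True)
        # Read from the mount and write back to the same mount.
        with tarfile.open(archive) as tar:
            tar.extractall(target)
        shutil.rmtree(target)
        completed += 1
    return completed
```

A bounded run such as `unpack_loop("/mnt/cifs/data.tar", "/mnt/cifs/scratch", iterations=100)` (paths invented for the example) is easier to script around than a truly endless loop, since the guest console can be checked between batches.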
Re: KVM guest crashes
Alexander Graf wrote:

Hi list,

recently I've been hitting some KVM bugs others seem to have reported as well, including

- CIFS timeouts
- Stuck ?? errors
- Random segmentation faults in the guest

so I figured, I'll put together a stress test that can be used to reproduce these issues. This is done by using a CIFS mount on the host and unpacking data from that mount to the mount. I have been able to bring kvm down to its knees a lot just by doing this. Simply run the test in an endless-loop. FWIW enabling NPT helps triggering the issue.

Are the problems specific to AMD?

What does "helps triggering" mean - does it happen with NPT disabled?

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1 (2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have some time on your hands, please give me a hand debugging this! If I'd had to guess, I'd say it's either an APIC issue and/or guest memory corruption.

I'd guess memory corruption. Does running a uniprocessor guest help? What about a uniprocessor guest pinned to one host core?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: KVM guest crashes
On 20.01.2009, at 21:07, Avi Kivity a...@redhat.com wrote:

Alexander Graf wrote:

Hi list,

recently I've been hitting some KVM bugs others seem to have reported as well, including

- CIFS timeouts
- Stuck ?? errors
- Random segmentation faults in the guest

so I figured, I'll put together a stress test that can be used to reproduce these issues. This is done by using a CIFS mount on the host and unpacking data from that mount to the mount. I have been able to bring kvm down to its knees a lot just by doing this. Simply run the test in an endless-loop. FWIW enabling NPT helps triggering the issue.

Are the problems specific to AMD?

I don't know, as all machines I tried it on were AMD so far. But judging from user reports on the ml, it happens on Intel too.

What does "helps triggering" mean - does it happen with NPT disabled?

It seems like the chances for breakage are higher with NPT enabled. I do see them without as well though.

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1 (2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have some time on your hands, please give me a hand debugging this! If I'd had to guess, I'd say it's either an APIC issue and/or guest memory corruption.

I'd guess memory corruption. Does running a uniprocessor guest help? What about a uniprocessor guest pinned to one host core?

I'll try to start tests tomorrow.

Alex

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[ kvm-Bugs-2441883 ] KVM guest crashes when using linux-md software RAID5
Bugs item #2441883, was opened at 2008-12-17 20:22
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2441883&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Johannes Truschnigg (c0l0)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM guest crashes when using linux-md software RAID5

Initial Comment:
CPU: Intel Core 2 Quad Q6600 (4 cores)
Distro, kernel: Gentoo GNU/Linux ~amd64, Kernel 2.6.27.9
Bitness, compiler: x86_64, GCC 4.3.2
KVM versions: kvm-79, kvm-81

Trying to assemble a (software) RAID5-array under GNU/Linux, guest kernel version 2.6.24, segfaults kvm the second the md-driver finishes initially syncing the array's members. When trying to boot with the same configuration again, KVM crashes the moment the bootloader is supposed to take over.

I've attached my test-case, which is also available here:
http://johannes.truschnigg.info/tmp/kvm-79_segfault_crashatstart.tar.bz2

PLEASE NOTE that the extracted files consume around 21G, due to the zero-filled image files used as array components. The included shellscript, `start.sh`, needs to be adapted to find ubuntu-8.04.1-desktop-amd64.iso on your machine - an image which is available here:
http://releases.ubuntu.com/hardy/ubuntu-8.04.1-desktop-amd64.iso

I've hit this problem for the first time with KVM-79, but it's still not fixed for me with KVM-81. I'm happy to provide additional information upon request.