[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-12 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #10 from Mark kernelbugzilla.org.mark...@dfgh.net ---
well thanks very much guys, I'll cautiously say that a patched guest kernel
seems to resolve it :-)

the bug seems to appear even when the host is untainted; the serial log says
[  375.989736] divide error:  [#1] SMP 

that untainted host is a Kubuntu partition, which manages without the
proprietary drivers, whereas Fedora doesn't seem to give graphics properly
with nouveau;

this is not entirely satisfactory of course, given that I've now had to modify
a virtual machine that it was important to avoid modifying for testing
purposes, although it is definitely better than crashing :-)

do we keep the bug report open so that the host kvm handling bug gets fixed
too? is this the right place to report that?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #8 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 166461
  -- https://bugzilla.kernel.org/attachment.cgi?id=166461&action=edit
dmesg



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #9 from Mark kernelbugzilla.org.mark...@dfgh.net ---
I'll try both of your suggestions, thanks



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

Alan a...@lxorguk.ukuu.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |a...@lxorguk.ukuu.org.uk

--- Comment #6 from Alan a...@lxorguk.ukuu.org.uk ---
Can you reproduce it without the Nvidia blob loaded on the host ?



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

Paolo Bonzini bonz...@gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonz...@gnu.org

--- Comment #7 from Paolo Bonzini bonz...@gnu.org ---
Your guest is probably missing commit c1118b3602c2329671ad5ec8bdf8e374323d6343.
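
One way to check whether a kernel source tree already contains that commit is to ask git whether the hash is an ancestor of the checked-out ref. The following is only a sketch: the helper names and the repository path are hypothetical, and the answer is only meaningful for trees that share history with mainline (a distribution kernel that backported the change would carry it under a different hash).

```python
import subprocess

# Hash quoted in comment #7.
FIX_COMMIT = "c1118b3602c2329671ad5ec8bdf8e374323d6343"


def ancestor_check_argv(commit, ref="HEAD"):
    """Build the git command that tests whether `commit` is an ancestor of `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def tree_contains_commit(repo_dir, commit=FIX_COMMIT, ref="HEAD"):
    """Return True if `ref` in `repo_dir` (a hypothetical checkout) contains `commit`.

    `git merge-base --is-ancestor` exits with 0 when the commit is an
    ancestor, with 1 when it is not, and with another status on errors
    such as an unknown hash.
    """
    result = subprocess.run(
        ancestor_check_argv(commit, ref),
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode in (0, 1):
        return result.returncode == 0
    raise RuntimeError("git could not resolve %s or %s" % (commit, ref))
```

For example, `tree_contains_commit("/path/to/guest-kernel-tree")` would be run against whatever tree the guest kernel was built from; that path is a placeholder.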



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #4 from Mark kernelbugzilla.org.mark...@dfgh.net ---
I should just add that the 'Code' segment in the kvm dump is identical every
time. I'd be happy to try to track down what is causing it, provided someone
could give me some pointers on debugging that kind of thing in more detail.

Code=00 01 48 c7 c0 8a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 0f 01 c1 0f
1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9 f0 0f b0 0d fb 26 e6 00 40
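
As a starting point for looking at such a dump, the hex bytes can be turned into raw bytes and searched for known instruction encodings. The sketch below is an illustration, not an established debugging procedure: the helper names are made up here, and the only outside facts it relies on are the documented encodings of VMCALL (0f 01 c1) and VMMCALL (0f 01 d9).

```python
import re

# Opcode encodings taken from the x86 instruction set references.
VMCALL = bytes.fromhex("0f01c1")   # Intel VT-x hypercall instruction
VMMCALL = bytes.fromhex("0f01d9")  # AMD-V hypercall instruction


def parse_code_dump(text):
    """Turn a 'Code=..' or 'Code: ..' hex dump into a bytes object."""
    body = re.sub(r"^\s*Code\s*[:=]\s*", "", text.strip())
    # Oops dumps may wrap the faulting byte in <..>; drop the wrapper.
    tokens = [tok.strip("<>") for tok in body.split()]
    return bytes(int(tok, 16) for tok in tokens if tok)


def find_all(haystack, needle):
    """Return every offset at which `needle` occurs in `haystack`."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)
    return offsets


# The dump from this comment, copied verbatim.
DUMP = """Code=00 01 48 c7 c0 8a b0 00 00 31 db 0f b7 0c 01 b8 05 00 00 00 0f 01 c1 0f
1f 44 00 00 5b 41 5c 41 5d 41 5e 5d c3 89 f0 31 c9 f0 0f b0 0d fb 26 e6 00 40"""

code = parse_code_dump(DUMP)
print("VMCALL offsets:", find_all(code, VMCALL))
print("VMMCALL offsets:", find_all(code, VMMCALL))
```

The raw bytes can also be written to a file and passed to a disassembler for a full listing. Whatever the search finds only describes the instruction stream; it does not by itself establish the cause of the crash.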



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-02-06 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #5 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 166021
  -- https://bugzilla.kernel.org/attachment.cgi?id=166021&action=edit
result of guest lsmod; identical for mono-cpu / multi-cpu



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-01-29 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #1 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 165181
  -- https://bugzilla.kernel.org/attachment.cgi?id=165181&action=edit
serial log during bug; when smp > 1



[Bug 92291] New: kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-01-29 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

            Bug ID: 92291
           Summary: kvm/guest crashes when smp > 1 with AMD FX8300; with
                    host kernel oops from abrt as well
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.18.3-201.fc21.x86_64 [host], 3.13.0-39-generic
                    [ubuntu guest]
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: kvm
          Assignee: virtualization_...@kernel-bugs.osdl.org
          Reporter: kernelbugzilla.org.mark...@dfgh.net
        Regression: No

Created attachment 165171
  -- https://bugzilla.kernel.org/attachment.cgi?id=165171&action=edit
kvm register dumps

Overview

Whenever I launch a kvm guest in qemu-system-x86_64 with -smp > 1 [-cpu host
-enable-kvm of course], the guest kernel crashes at some stage; I've seen the
kernel crash before completing boot-up, giving the recognizable kernel crash
trace, as well as during 'normal' operation when the guest simply 'freezes'.
The host kernel gives me an oops as well, although abrt won't let me report it
now as I've got Nvidia proprietary drivers 'tainting' the kernel.
The bug seems non-specific to guest OS, though mainly it's been *buntu guests,
with various different kernels including lowlatency; I even think I've seen the
bug with a Windows guest. I've tried building qemu from git sources, with no
particularly noticeable difference.

Steps to reproduce
==

1) Start linux guest on a multicore system [AMD FX8* series CPU?] with
$ qemu-system-x86_64 -m 2G -cpu host -smp 4 -enable-kvm -hda [image_file] -vga vmware

2) wait until the guest kernel crashes; virtually every time in less than 10
minutes
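
To compare the single-CPU and multi-CPU cases with otherwise identical options, the command from step 1 can be generated from one place. This is a small sketch rather than part of the original report: the function name and the `guest.img` file name are placeholders, and only the options already shown in step 1 are used.

```python
import shlex


def build_qemu_argv(image_file, smp=4, memory="2G"):
    """Assemble the reproduction command from step 1 for a given vCPU count."""
    return [
        "qemu-system-x86_64",
        "-m", memory,
        "-cpu", "host",
        "-smp", str(smp),
        "-enable-kvm",
        "-hda", image_file,
        "-vga", "vmware",
    ]


# Print one command line per vCPU count so the two runs differ only in -smp.
for vcpus in (1, 4):
    argv = build_qemu_argv("guest.img", smp=vcpus)
    print(" ".join(shlex.quote(arg) for arg in argv))
```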

Actual results
==
guest kernel crashes, host kernel gives an oops too, presumably as a result of
kvm passthrough

Expected results

guest virtual machine should work normally then keep working normally

it would be good to try to pipe dmesg from the guest to the host serial
connection, in case that would assist?
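
If the guest console is captured to a file on the host (for instance with a guest `console=ttyS0` kernel parameter and QEMU's `-serial file:...` option, both assumptions about the setup rather than something this report used), the log can be scanned for crash signatures. The pattern list below is a guess assembled from messages quoted in this archive, and the helper name is hypothetical.

```python
import re

# Signatures taken from messages quoted in this archive; extend as needed.
CRASH_PATTERNS = [
    re.compile(r"divide error"),
    re.compile(r"BUG: "),
    re.compile(r"\bOops\b"),
    re.compile(r"Kernel panic"),
    re.compile(r"Stuck \?\?"),
]


def find_crash_lines(log_text):
    """Return (line_number, line) pairs for lines matching a crash signature."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CRASH_PATTERNS):
            hits.append((number, line.strip()))
    return hits


sample = "booting\n[  375.989736] divide error:  [#1] SMP \nstill running\n"
for number, line in find_crash_lines(sample):
    print(number, line)
```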



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-01-29 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #3 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 165201
  -- https://bugzilla.kernel.org/attachment.cgi?id=165201&action=edit
host cpuinfo



[Bug 92291] kvm/guest crashes when smp > 1 with AMD FX8300; with host kernel oops from abrt as well

2015-01-29 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92291

--- Comment #2 from Mark kernelbugzilla.org.mark...@dfgh.net ---
Created attachment 165191
  -- https://bugzilla.kernel.org/attachment.cgi?id=165191&action=edit
serial log when no smp



[ kvm-Bugs-2441883 ] KVM guest crashes when using linux-md software RAID5

2009-02-01 Thread SourceForge.net
Bugs item #2441883, was opened at 2008-12-17 19:22
Message generated for change (Settings changed) made by sf-robot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2441883&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Johannes Truschnigg (c0l0)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM guest crashes when using linux-md software RAID5

Initial Comment:
CPU: Intel Core 2 Quad Q6600 (4 cores)
Distro, kernel: Gentoo GNU/Linux ~amd64, Kernel 2.6.27.9
Bitness, compiler: x86_64, GCC 4.3.2
KVM versions: kvm-79, kvm-81

Trying to assemble a (software) RAID5-array under GNU/Linux, guest kernel 
version 2.6.24, segfaults kvm the second the md-driver finishes initially 
syncing the array's members. When trying to boot with the same configuration 
again, KVM crashes the moment the bootloader is supposed to take over.

I've attached my test-case, which is also available here: 
http://johannes.truschnigg.info/tmp/kvm-79_segfault_crashatstart.tar.bz2

PLEASE NOTE that the extracted files consume around 21G, due to the zero-filled 
image files used as array components. The included shellscript, `start.sh`, 
needs to be adapted to find ubuntu-8.04.1-desktop-amd64.iso on your machine - 
an image which is available here: 
http://releases.ubuntu.com/hardy/ubuntu-8.04.1-desktop-amd64.iso

I've hit this problem for the first time with KVM-79, but it's still not fixed 
for me with KVM-81.

I'm happy to provide additional information upon request.

--

Comment By: SourceForge Robot (sf-robot)
Date: 2009-02-02 02:34

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

--

Comment By: Avi Kivity (avik)
Date: 2008-12-24 10:43

Message:
Please attach a stacktrace from the crash and any messages in the host
kernel log.

--



Re: KVM guest crashes

2009-01-26 Thread Alexander Graf
Marcelo Tosatti wrote:
 Hi Alexander,

 On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote:

   
 Following the discussion on IRC, I tried -no-kvm-irqchip and found some
 virtual machines broken after 1 day of stress testing again:

 + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel
 -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm
 cifsuser=contain2 cifspass=contain2 root=cifs://contain2:conta...@172.1
 6.2.1/contain2 realroot=//172.16.2.1/users/contain2
 ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net
 tap,ifname=tap2,sc
 ript=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??
 Stuck ??
 BUG: unable to handle kernel NULL pointer dereference at 
 IP: [802b539a] kfree+0x18b/0x26e
 PGD 0
 Oops:  [1] SMP
 last sysfs file:
 CPU 2
 Modules linked in:
 Supported: Yes
 Pid: 0, comm: swapper Tainted: G S2.6.27.7-9-default #1
 RIP: 0010:[802b539a]  [802b539a] kfree+0x18b/0x26e
 RSP: 0018:88007a493e90  EFLAGS: 00010046
 RAX: 0002 RBX: 8800010397f0 RCX: 88007a480778
 RDX: e200 RSI: 8800010397f0 RDI: 88007a5ae140
 RBP:  R08: 8800010395d0 R09: 88007a493eb8
 R10: 80a59980 R11: 8021c5d9 R12: 0001
 R13: 88007ac04080 R14: 10200042 R15: 88007a5ae140
 FS:  () GS:88007a461f40() knlGS:
 CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
 CR2:  CR3: 00201000 CR4: 06e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process swapper (pid: 0, threadinfo 88007a48a000, task 88007a488280)
 Stack:  8023df9c 8073a108 0286 8024a1eb
  80259d80 8800010397f0  0001
  000a 10200042 0010 802831d0
 Call Trace:
  [802831d0] __rcu_process_callbacks+0x189/0x203
  [80283271] rcu_process_callbacks+0x27/0x47
  [802464ed] __do_softirq+0x84/0x115
  [8020dc9c] call_softirq+0x1c/0x28
  [8020f067] do_softirq+0x3c/0x81
  [80246204] irq_exit+0x3f/0x83
  [8021ce5f] smp_apic_timer_interrupt+0x95/0xae
  [8020d4a3] apic_timer_interrupt+0x83/0x90
  [80221f1d] native_safe_halt+0x2/0x3
  [80213465] default_idle+0x38/0x54
  [8020b34a] cpu_idle+0xa9/0xf1


 Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08
 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 8b 55
 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00
 RIP  [802b539a] kfree+0x18b/0x26e
  RSP 88007a493e90
 CR2: 
 ---[ end trace 4eaa2a86a8e2da22 ]---


 Also after two days of permanent stress testing I also got the Intel
 machine w/ current git down:

 + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
 -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
 root=cifs://contain1:conta...@172.16.1.1/contain1
 realroot=//172.16.1.1/users/contain1
 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
 tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??

 No backtrace here though. That's all I got from the serial console.

 The only issues I had with the UP guests so far was this:

 + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel
 virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain6 cifspass=contain6
 root=cifs://contain6:conta...@172.16.6.1/contain6
 realroot=//172.16.6.1/users/contain6
 ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net
 tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with
 apic=debug and send a report.  Then try booting with the 'noapic' option.

 which can be annoying at times too. Can't we just detect that it's the
 detection and give the guest its interrupts? Or should the PIT
 reinjection thing help here?
 

 There are a number of problems that can result in this error, and the
 problems are possibly different between the in-kernel PIT and userspace
 PIT emulation (note it also happens with in-kernel PIT, just much more
 rarely now). You can use the 

Re: KVM guest crashes

2009-01-26 Thread Marcelo Tosatti
On Mon, Jan 26, 2009 at 04:53:21PM +0100, Alexander Graf wrote:
  There are a number of problems that can result in this error, and the
  problems are possibly different between the in-kernel PIT and userspace
  PIT emulation (note it also happens with in-kernel PIT, just much more
  rarely now). You can use the no_timer_check kernel option to bypass it.

 
 Hm - that option disables the whole check, making it always fail. I
 haven't seen any way to actually disable the check, telling Linux things
 are OK :-(.

Hum, the option makes timer_irq_works always return true. Works for me
with in-kernel PIT.

What do you see with apic=debug no_timer_check ?



Re: KVM guest crashes

2009-01-26 Thread Alexander Graf
Marcelo Tosatti wrote:
 On Mon, Jan 26, 2009 at 04:53:21PM +0100, Alexander Graf wrote:
   
 There are a number of problems that can result in this error, and the
 problems are possibly different between the in-kernel PIT and userspace
 PIT emulation (note it also happens with in-kernel PIT, just much more
 rarely now). You can use the no_timer_check kernel option to bypass it.
   
   
 Hm - that option disables the whole check, making it always fail. I
 haven't seen any way to actually disable the check, telling Linux things
 are OK :-(.
 

 Hum, the option makes timer_irq_works always return true. Works for me
 with in-kernel PIT.

 What you see with apic=debug no_timer_check ?
   


It does work with noapic for me, but that means I'm using the old PIC
(which isn't necessarily bad, right?). So I can at least work around the
issue for us now. It still needs to be fixed nevertheless.

with apic=debug no_apic_timer 2.6.27 does:

Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
. (found apic 0 pin 0) ...
... works.


while 2.6.25 does:

..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the
'noapic' kernel parameter



Re: KVM guest crashes

2009-01-24 Thread Marcelo Tosatti
On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote:
 rarely now). You can use the no_timer_check kernel option to bypass  
 it.

 Ok :-). Thanks. The logic in the kernel for this is really stupid  
 (basing timing on clock speed). What about disabling the check if we  
 detect KVM?

Yes, this is an option. We've talked about it before, but no patch was
merged. The RHEL5.3 kernel skips those checks when it detects VMWare 
or KVM hypervisors.

We should understand what is happening to fix the fullvirt/old guest
case. For the in-kernel PIT, I believe there is a bug somewhere, either
in PIT itself or in the interaction with IOAPIC (failure to inject
interrupts for some reason). I started debugging it by constantly
reboot'ing an SMP guest but my testbox died. Hope to get back to it
soon.

 Regarding the corruption problem, I have a few questions:

 - It is SMP specific (ie both kernel/userspace irqchip fail).
  - which means UP guests are stable with both kernel/user
irqchip.

 I have not been able to reproduce any of my issues with UP. I have to  
 admit that I only tried UP with in-kernel irqchip.

OK.

 The Stuck ?? messages seem to be coming from smpboot.c. So for some
 reason vcpu's are being reset. Don't seem to be a triple fault because
 in that case all vcpu's would be reset (so yes, the vcpu was really on
 BIOS code).

 Hm. I know that OSX turns off CPUs it doesn't need as an alternative to 
 deep-sleep. Does Linux do that too?

Not that I know of, unless you offline CPU's manually, which does not
seem to be the case.

 Suggest the following:
 - Confirm the problem happens with root on ext3 filesystem (can't you
  mount the CIFS and copy the data over to a local guest disk to
  simulate similar load?).

 I had Stuck ?? messages without networking, but if it helps I can try  
 that too. In the project we're using this for we do things over cifs, so 
 that's why I built the test case around it.

OK. Just trying to decrease the variables involved. I'll setup a machine
to run a similar load next week.

 - Check that the kernel text is not corrupted. Save the good kernel
  text with QEMU's pmemsave or memsave (you can see start/end in
  the symbols _text/_etext, /proc/kallsyms) after booting. After you
  see the crash, save the bad kernel text, compare. This can give
  additional clues (or not).
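
The comparison suggested above could be sketched as follows, assuming the two dumps were written with the QEMU monitor's memsave or pmemsave command over the same address range. The function names are hypothetical, and the kallsyms parsing assumes the usual "address type name" line layout (on some systems the addresses are only visible to root).

```python
def text_range_from_kallsyms(kallsyms_text):
    """Return the (_text, _etext) addresses found in /proc/kallsyms content."""
    wanted = {}
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("_text", "_etext"):
            wanted[fields[2]] = int(fields[0], 16)
    return wanted["_text"], wanted["_etext"]


def differing_ranges(good, bad):
    """Return (offset, length) runs where two equally sized dumps differ."""
    if len(good) != len(bad):
        raise ValueError("dumps must have the same length")
    runs = []
    start = None
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b and start is None:
            start = offset                        # a differing run begins
        elif a == b and start is not None:
            runs.append((start, offset - start))  # the run just ended
            start = None
    if start is not None:
        runs.append((start, len(good) - start))   # run reaches the end
    return runs


# Tiny stand-ins for the real dumps, just to show the output shape.
print(differing_ranges(b"abcdef", b"abXYeZ"))
```

With real dumps, the two files would be read in binary mode and each reported offset added to the `_text` address to locate the modified kernel text.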

 Good idea - I'll try.

 Also, you mentioned other reports previously, can you point to them,
 please?

 Yes, will do later. I gotta run now! Thanks for the reply - it's good to 
 know this isn't getting ignored :-).

Have a good weekend.



Re: KVM guest crashes

2009-01-24 Thread Alexander Graf


On 24.01.2009, at 14:06, Marcelo Tosatti wrote:

> On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote:
>>> rarely now). You can use the no_timer_check kernel option to bypass
>>> it.
>>
>> Ok :-). Thanks. The logic in the kernel for this is really stupid
>> (basing timing on clock speed). What about disabling the check if we
>> detect KVM?
>
> Yes, this is an option. We've talked about it before, but no patch was
> merged. The RHEL5.3 kernel skips those checks when it detects VMWare
> or KVM hypervisors.

That sounds clever. But I doubt I'll get anything as intrusive into
the SLES11 kernel at this point in time :-(.

> We should understand what is happening to fix the fullvirt/old guest
> case. For the in-kernel PIT, I believe there is a bug somewhere, either
> in PIT itself or in the interaction with IOAPIC (failure to inject
> interrupts for some reason). I started debugging it by constantly
> reboot'ing an SMP guest but my testbox died. Hope to get back to it
> soon.

Hm. If I ever get tracing working again, I can try to create one
too :-).

>>> The Stuck ?? messages seem to be coming from smpboot.c. So for some
>>> reason vcpu's are being reset. Don't seem to be a triple fault because
>>> in that case all vcpu's would be reset (so yes, the vcpu was really on
>>> BIOS code).
>>
>> Hm. I know that OSX turns off CPUs it doesn't need as an alternative to
>> deep-sleep. Does Linux do that too?
>
> Not that I know of, unless you offline CPU's manually, which does not
> seem to be the case.

Nope, I don't hotplug anything (though the acpihp module is loaded).

>>> Suggest the following:
>>> - Confirm the problem happens with root on ext3 filesystem (can't you
>>> mount the CIFS and copy the data over to a local guest disk to
>>> simulate similar load?).
>>
>> I had Stuck ?? messages without networking, but if it helps I can try
>> that too. In the project we're using this for we do things over cifs, so
>> that's why I built the test case around it.
>
> OK. Just trying to decrease the variables involved. I'll setup a machine
> to run a similar load next week.

Sounds good :-). I put all the files I tested with online with a link
in the first mail of this thread. So feel free to take that as an
inspiration. For non-network testing I simply put -net none there, but
still had the initrd boot and kill the machine.

>>> Also, you mentioned other reports previously, can you point to them,
>>> please?
>>
>> Yes, will do later. I gotta run now! Thanks for the reply - it's good to
>> know this isn't getting ignored :-).
>
> Have a good weekend.

Same to you. I was running for a first-aid course though, not the
weekend :-).

I was mainly talking here about the thread Guest Hang Bugs. Though
with 2.6.25 guests I did get BUG: soft lockup - CPU#x stuck for ns!
messages instead of the Stuck ?? FWIW.
Originally I created the whole test case to debug this exact bug we
encountered as well: http://article.gmane.org/gmane.comp.emulators.kvm.devel/21828/


Alex


Re: KVM guest crashes

2009-01-23 Thread Alexander Graf
Alexander Graf wrote:
 Alexander Graf wrote:
   
 Alexander Graf wrote:

 [...]
   
 
 Also after two days of permanent stress testing I also got the Intel
 machine w/ current git down:

 + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
 -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
 root=cifs://contain1:conta...@172.16.1.1/contain1
 realroot=//172.16.1.1/users/contain1
 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
 tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??

 No backtrace here though. That's all I got from the serial console.
   
 
   
 + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
 -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
 root=cifs://contain1:conta...@172.16.1.1/contain1
 realroot=//172.16.1.1/users/contain1
 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
 tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??
 
[...]

In order to provide you with more dumps that might point to some
direction (I'm still lost on figuring where to look), here's another AMD
NPT guest crash with current git. It somehow looks as if the guest
pagetable is corrupted.

+ sudo -u contain3 env -i /usr/local/bin/qemu-system-x86_64 -localtime
-kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
clocksource=acpi_pm cifsuser=con
tain3 cifspass=contain3
root=cifs://contain3:conta...@172.16.3.1/contain3
realroot=//172.16.3.1/users/contain3
ip=172.16.3.2:172.16.3.1::255.255.255.0::eth0:none console=tty
S0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:3
-net tap,ifname=tap3,script=/bin/true -m 2000 -nographic -smp 8
-no-kvm-irqchip /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x7b24
pci :00:01.0: PIIX3: Enabling Passive Release
IP-Config: Device `eth0' not found.
doing fast boot
Creating device nodes with udev
^MBoot logging started on /dev/ttyS0(/dev/console) at Thu Jan 22
23:05:55 2009^M
[NETWORK] using static config based on
ip=172.16.3.2:172.16.3.1::255.255.255.0::eth0:none^M
Trying manual resume from /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1^M
resume device /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1 not found
(ignoring)^M
Trying manual resume from /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1^M
resume device /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1 not found
(ignoring)^M
node name not found^M
Mounting root //172.16.3.1/contain3^M
RTNETLINK answers: File exists^M
1: lo: LOOPBACK,UP,LOWER_UP mtu 16436 qdisc noqueue state UNKNOWN ^M
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00^M
inet 127.0.0.1/8 scope host lo^M
2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast
state UNKNOWN qlen 1000^M
link/ether 52:54:00:12:34:03 brd ff:ff:ff:ff:ff:ff^M
inet 172.16.3.2 peer 172.16.3.1/24 scope global eth0^M
BUG: unable to handle kernel paging request at 00100100
IP: [8036a603] strnlen+0x10/0x19
PGD 7c596067 PUD 7c9ed067 PMD 0
Oops:  [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 7
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci
virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core
ata_generic sata_nv libata scsi_mod
dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 782, comm: halt Tainted: G S2.6.27.7-9-default #1
RIP: 0010:[8036a603]  [8036a603] strnlen+0x10/0x19
RSP: 0018:88007c46da70  EFLAGS: 00010082
RAX: 00100100 RBX:  RCX: 
RDX: 00100100 RSI: fffe RDI: 00100100
RBP: 80ae0fad R08:  R09: 
R10: 000a R11:  R12: 00100100
R13:  R14: 80ae13a0 R15: 
FS:  7f0b2aee06f0() GS:88007a57bf40() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00100100 CR3: 7c4e5000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process halt (pid: 782, threadinfo 88007c46c000, task 88007c17e0c0)
Stack:  8036b39d 88007c46ddb8 80ae0fad 805d7e29
   8036b6f6 7f0b2ace27e0
 88007c595ab0 88007c0624a8 0400 80ae0fa0
Call Trace:
 [8036b39d] string+0x34/0x91
 [8036b6f6] vsnprintf+0x2fc/0x574
 [8036ba56] 

Re: KVM guest crashes

2009-01-23 Thread Marcelo Tosatti
Hi Alexander,

On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote:

 Following the discussion on IRC, I tried -no-kvm-irqchip and found some
 virtual machines broken after 1 day of stress testing again:
 
 + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel
 -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm
 cifsuser=contain2 cifspass=contain2 root=cifs://contain2:conta...@172.1
 6.2.1/contain2 realroot=//172.16.2.1/users/contain2
 ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net
 tap,ifname=tap2,sc
 ript=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??
 Stuck ??
 BUG: unable to handle kernel NULL pointer dereference at 
 IP: [802b539a] kfree+0x18b/0x26e
 PGD 0
 Oops:  [1] SMP
 last sysfs file:
 CPU 2
 Modules linked in:
 Supported: Yes
 Pid: 0, comm: swapper Tainted: G S2.6.27.7-9-default #1
 RIP: 0010:[802b539a]  [802b539a] kfree+0x18b/0x26e
 RSP: 0018:88007a493e90  EFLAGS: 00010046
 RAX: 0002 RBX: 8800010397f0 RCX: 88007a480778
 RDX: e200 RSI: 8800010397f0 RDI: 88007a5ae140
 RBP:  R08: 8800010395d0 R09: 88007a493eb8
 R10: 80a59980 R11: 8021c5d9 R12: 0001
 R13: 88007ac04080 R14: 10200042 R15: 88007a5ae140
 FS:  () GS:88007a461f40() knlGS:
 CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
 CR2:  CR3: 00201000 CR4: 06e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process swapper (pid: 0, threadinfo 88007a48a000, task 88007a488280)
 Stack:  8023df9c 8073a108 0286 8024a1eb
  80259d80 8800010397f0  0001
  000a 10200042 0010 802831d0
 Call Trace:
  [802831d0] __rcu_process_callbacks+0x189/0x203
  [80283271] rcu_process_callbacks+0x27/0x47
  [802464ed] __do_softirq+0x84/0x115
  [8020dc9c] call_softirq+0x1c/0x28
  [8020f067] do_softirq+0x3c/0x81
  [80246204] irq_exit+0x3f/0x83
  [8021ce5f] smp_apic_timer_interrupt+0x95/0xae
  [8020d4a3] apic_timer_interrupt+0x83/0x90
  [80221f1d] native_safe_halt+0x2/0x3
  [80213465] default_idle+0x38/0x54
  [8020b34a] cpu_idle+0xa9/0xf1
 
 
 Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08
 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 8b 55
 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00
 RIP  [802b539a] kfree+0x18b/0x26e
  RSP 88007a493e90
 CR2: 
 ---[ end trace 4eaa2a86a8e2da22 ]---
 
 
 After two days of permanent stress testing I also got the Intel
 machine w/ current git down:
 
 + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
 -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
 root=cifs://contain1:conta...@172.16.1.1/contain1
 realroot=//172.16.1.1/users/contain1
 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
 tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??
 
 No backtrace here though. That's all I got from the serial console.
 
 The only issue I had with the UP guests so far was this:
 
 + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel
 virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain6 cifspass=contain6
 root=cifs://contain6:conta...@172.16.6.1/contain6
 realroot=//172.16.6.1/users/contain6
 ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net
 tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with
 apic=debug and send a report.  Then try booting with the 'noapic' option.
 
 which can be annoying at times too. Can't we just detect that the guest
 is running its timer check and give it its interrupts? Or should the PIT
 reinjection thing help here?

There are a number of problems that can result in this error, and the
problems are possibly different between the in-kernel PIT and userspace
PIT emulation (note it also happens with in-kernel PIT, just much more
rarely now). You can use the no_timer_check kernel option to bypass 

Re: KVM guest crashes

2009-01-22 Thread Alexander Graf
Alexander Graf wrote:
 Alexander Graf wrote:

 [...]
   
 After two days of permanent stress testing I also got the Intel
 machine w/ current git down:

 + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
 -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
 root=cifs://contain1:conta...@172.16.1.1/contain1
 realroot=//172.16.1.1/users/contain1
 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
 tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??

 No backtrace here though. That's all I got from the serial console.
   
 

 + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
 -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
 clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
 root=cifs://contain1:conta...@172.16.1.1/contain1
 realroot=//172.16.1.1/users/contain1
 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
 tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
 qemu: loading initrd (0x1daf359 bytes) at 0x7b24
 Stuck ??

 (qemu) info cpus
 * CPU #0: pc=0x80221f1d thread_id=15211
   CPU #1: pc=0x80221f1d thread_id=15212
   CPU #2: pc=0x80221f1d thread_id=15213
   CPU #3: pc=0x80221f1d thread_id=15214
   CPU #4: pc=0x8049f7d0 thread_id=15215
   CPU #5: pc=0x80221f1d thread_id=15216
   CPU #6: pc=0x80221f1d thread_id=15217
   CPU #7: pc=0x0009f02c thread_id=15218
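With eight vCPUs it is easy to miss the ones that are somewhere unusual. The following is a minimal sketch, assuming the "info cpus" text format shown above; the function name outlier_cpus is made up for illustration and is not part of QEMU. It reports every vCPU whose pc differs from the most common one. In this dump the common value, 0x80221f1d, matches the native_safe_halt+0x2 frame in the idle backtraces earlier in the thread.

```python
import re
from collections import Counter

# Matches lines such as "* CPU #0: pc=0x80221f1d thread_id=15211".
CPU_LINE = re.compile(r"CPU #(\d+): pc=0x([0-9a-fA-F]+)")


def outlier_cpus(info_cpus_text):
    """Return {cpu: pc} for vCPUs whose pc differs from the most common pc."""
    pcs = {int(m.group(1)): int(m.group(2), 16)
           for m in CPU_LINE.finditer(info_cpus_text)}
    if not pcs:
        return {}
    common_pc, _ = Counter(pcs.values()).most_common(1)[0]
    return {cpu: pc for cpu, pc in pcs.items() if pc != common_pc}


# The monitor output quoted above, pasted verbatim.
SAMPLE = """\
* CPU #0: pc=0x80221f1d thread_id=15211
  CPU #1: pc=0x80221f1d thread_id=15212
  CPU #2: pc=0x80221f1d thread_id=15213
  CPU #3: pc=0x80221f1d thread_id=15214
  CPU #4: pc=0x8049f7d0 thread_id=15215
  CPU #5: pc=0x80221f1d thread_id=15216
  CPU #6: pc=0x80221f1d thread_id=15217
  CPU #7: pc=0x0009f02c thread_id=15218
"""

for cpu, pc in sorted(outlier_cpus(SAMPLE).items()):
    print("CPU #%d is at pc=0x%x" % (cpu, pc))
```

For this dump the sketch singles out CPU #4 and CPU #7; CPU #7 is the one examined with "cpu 7" and "info registers".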

 (qemu) cpu 7
 (qemu) info registers
 EAX=0c06 EBX=05b8 ECX= EDX=
 ESI= EDI= EBP= ESP=
 EIP=002c EFL=00033002 [---] CPL=3 II=0 A20=1 SMM=0 HLT=0
 ES =   f300
 CS =9f00 0009f000  f300
 SS =   f300
 DS =   f300
 FS =   f300
 GS =   f300
 LDT=   8200
 TR = fffbd000 2088 8b00
 GDT=  
 IDT=  
 CR0=6010 CR2= CR3= CR4=
 DR0= DR1= DR2= DR3=
 DR6=0ff0 DR7=0400
 FCW=037f FSW= [ST=0] FTW=00 MXCSR=
 FPR0=  FPR1= 
 FPR2=  FPR3= 
 FPR4=  FPR5= 
 FPR6=  FPR7= 
 XMM00=
 XMM01=
 XMM02=
 XMM03=
 XMM04=
 XMM05=
 XMM06=
 XMM07=

 Is that guest really seriously in BIOS code? After booting Linux?

 (qemu) x /2i $pc-1
 0x0009f02b:  hlt
 0x0009f02c:  jmp    0x9f02b

 Where is this? Looks like panic code to me.
   
0x0009f000:  cli
0x0009f001:  xor    %ax,%ax
0x0009f003:  mov    %ax,%ds
0x0009f005:  mov    $0x510,%ebx
0x0009f00b:  addr32 mov (%ebx),%ecx
0x0009f00f:  test   %ecx,%ecx
0x0009f012:  je     0x9f026
0x0009f014:  addr32 mov 0x4(%ebx),%eax
0x0009f019:  addr32 mov 0x8(%ebx),%edx
0x0009f01e:  wrmsr
0x0009f020:  add    $0xc,%ebx
0x0009f024:  jmp    0x9f00b
0x0009f026:  lock incw 1856
0x0009f02b:  hlt
0x0009f02c:  jmp    0x9f02b

Looks a lot like this:

smp_ap_boot_code_start:
  cli
  xor %ax, %ax
  mov %ax, %ds

  mov $SMP_MSR_ADDR, %ebx
11:
  mov 0(%ebx), %ecx
  test %ecx, %ecx
  jz 12f
  mov 4(%ebx), %eax
  mov 8(%ebx), %edx
  wrmsr
  add $12, %ebx
  jmp 11b
12:

  lock incw smp_cpus
1:
  hlt
  jmp 1b


But that code shouldn't run after Linux booted, right? And without at
least a Power Off message I'd expect Linux to still be up.
The only thing the host's dmesg was saying is this:

Ignoring delivery mode 3 (repeated often)

Alex
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM guest crashes

2009-01-21 Thread Alexander Graf
Avi Kivity wrote:
 Alexander Graf wrote:
 The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1
 (2.6.27) kernels.

 Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
 And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

 I'm somewhat lost on the reason for these failures, so if you do have
 some time on your hands, please give me a hand debugging this! If I'd
 had to guess, I'd say it's either an APIC issue and/or guest memory
 corruption.
   

 I'd guess memory corruption.

 Does running a uniprocessor guest help?  What about a uniprocessor
 guest pinned to one host core?

So last night I started several guests with -smp 8 but without network
to see if IO load is causing the problems. All VMs are down, but one
panic log is rather new:

Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
BUG: unable to handle kernel NULL pointer dereference at 
IP: [80237454] cpu_attach_domain+0x84/0x207
PGD 0
Oops:  [1] SMP
last sysfs file:
CPU 1
Modules linked in:
Supported: Yes
Pid: 1, comm: swapper Tainted: G S2.6.27.11-1-default #1
RIP: 0010:[80237454]  [80237454] cpu_attach_domain+0x84/0x207
RSP: 0018:88007a419c50  EFLAGS: 00010202
RAX:  RBX: 880001077a60 RCX: 88007a419c40
RDX: 044d RSI: 0200 RDI: 
RBP: 88007a419c90 R08:  R09: 0200
R10: 0008 R11: 00018600 R12: 8800010778d0
R13: 880001077a78 R14: 8800010775b0 R15: 88000107f700
FS:  () GS:88007afeb540() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2:  CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 88007a418000, task 88007a406040)
Stack:  00047a4616c0 88007a548000 002f044d 0004
 80a275b0  88007a460e00 88007a45c140
 88007a419ec0 80238190 88007a419dc0 88007a419e00
Call Trace:
 [80238190] __build_sched_domains+0xbb9/0xbf5
 [80981ae4] sched_init_smp+0xa9/0x1d8
 [8096b850] kernel_init+0x74/0xea
 [8020cf79] child_rip+0xa/0x11


Code: 00 4c 89 ef 89 45 d4 8b 83 88 00 00 00 89 45 d0 e8 d1 05 13 00 ff
c8 74 5d 8b 93 88 00 00 00 f7 c2 8f 02 00 00 74 0d 48 8b 43 10 48 3b
00 0f 85 24 01 00 00 80 e2 70 0f 85 1b 01 00 00 eb 37 48
RIP  [80237454] cpu_attach_domain+0x84/0x207
 RSP 88007a419c50
CR2: 
---[ end trace 4eaa2a86a8e2da22 ]---
Kernel panic - not syncing: Attempted to kill init!


From what I've seen it's always related to IPIs, but that's just a
guess. I'll start UP testing now.

Alex


Re: KVM guest crashes

2009-01-21 Thread Avi Kivity

Alexander Graf wrote:
 Avi Kivity wrote:
 Alexander Graf wrote:
 The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1
 (2.6.27) kernels.

 Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
 And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

 I'm somewhat lost on the reason for these failures, so if you do have
 some time on your hands, please give me a hand debugging this! If I'd
 had to guess, I'd say it's either an APIC issue and/or guest memory
 corruption.

 I'd guess memory corruption.

 Does running a uniprocessor guest help?  What about a uniprocessor
 guest pinned to one host core?

 So last night I started several guests with -smp 8 but without network
 to see if IO load is causing the problems. All VMs are down, but one
 panic log is rather new:

 Stuck ??
 Stuck ??
 Stuck ??
 Stuck ??
 Stuck ??
 Stuck ??
 BUG: unable to handle kernel NULL pointer dereference at 
 IP: [80237454] cpu_attach_domain+0x84/0x207

This is right on startup, if I read things right.

I suggest checking if you have the latest BIOS update applied.  I've had 
bad experiences with un-updated processors.


--
error compiling committee.c: too many arguments to function



Re: KVM guest crashes

2009-01-21 Thread Avi Kivity

Avi Kivity wrote:
 I suggest checking if you have the latest BIOS update applied.  I've
 had bad experiences with un-updated processors.

FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on a 2x4
Barcelona host, happily make -j16ing an allmodconfig kernel.


--
error compiling committee.c: too many arguments to function



Re: KVM guest crashes

2009-01-21 Thread Alexander Graf
Avi Kivity wrote:
 Avi Kivity wrote:

 I suggest checking if you have the latest BIOS update applied.  I've
 had bad experiences with un-updated processors.


 FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on a 2x4
 Barcelona host, happily make -j16ing an allmodconfig kernel.

Strange. I have now restarted the tests with an updated BIOS and am
installing an Intel machine to test on in parallel.

old:

# ./rdmsr /dev/cpu/0/msr $(( 0x008b ))
0x165

new:

# ./rdmsr /dev/cpu/0/msr $(( 0x008b ))
0x183
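For anyone without the rdmsr helper at hand, the same check can be sketched in Python. This is only a sketch under assumptions: it needs the msr kernel module loaded and root privileges, and the names read_msr, decode_msr and MSR_PATCH_LEVEL are made up for illustration. The msr character device returns 8 bytes when read at a file offset equal to the register number, and 0x8b is the register queried above.

```python
import os
import struct

# Register number used in the rdmsr calls above (name chosen for this sketch).
MSR_PATCH_LEVEL = 0x8B


def decode_msr(raw):
    """Decode the 8 bytes returned by the msr device into an integer."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(raw))
    return struct.unpack("<Q", raw)[0]


def read_msr(cpu, index):
    """Read one MSR on the given host CPU through /dev/cpu/<cpu>/msr."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        # The file offset selects the register; each read is 8 bytes wide.
        return decode_msr(os.pread(fd, 8, index))
    finally:
        os.close(fd)
```

Calling hex(read_msr(0, MSR_PATCH_LEVEL)) before and after a BIOS update gives the same kind of comparison as the old/new values above. Looping over all host CPUs would also show whether every core carries the same revision.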


But I already got one guest crashing:

int3:  [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 2
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci
virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core
ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys
 hwmon
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G S2.6.27.7-9-default #1
RIP: 0010:[80a500f1]  [80a500f1] per_cpu__cpu_state+0x1/0x4
RSP: 0018:88007a493fa8  EFLAGS: 0083
RAX: 806f5fa0 RBX: 80a500f0 RCX: 
RDX: 880001033200 RSI:  RDI: ff5fc0b0
RBP: 88007a48beb0 R08:  R09: 880001039638
R10:  R11: 8021c5d9 R12: 
R13:  R14:  R15: 
FS:  7fe3252e4950() GS:88007a461f40() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 0062d000 CR3: 7c10a000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 88007a48a000, task 88007a488280)
Stack:  88007a48beb0 8020ca2e 88007a48beb0 EOI 
007dd83ce327
 0086 8800010396d0 02625a00 0002
 0001eadc 007dd83ce327 0292 0292
Call Trace:
Inexact backtrace:

 IRQ  [8020ca2e] ? ret_from_intr+0x0/0x29
 EOI  [804a6992] ? notifier_call_chain+0x29/0x4c
 [80213465] ? default_idle+0x38/0x54
 [8020b34a] ? cpu_idle+0xa9/0xf1


Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
RIP  [80a500f1] per_cpu__cpu_state+0x1/0x4
 RSP 88007a493fa8
---[ end trace 17313f34f216af07 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
[ cut here ]
WARNING: at kernel/smp.c:331 smp_call_function_mask+0x38/0x1f2()
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci
virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core
ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys
 hwmon
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G SD   2.6.27.7-9-default #1

Call Trace:
 [8020e42e] show_trace_log_lvl+0x41/0x58
 [804a1e97] dump_stack+0x69/0x6f
 [80240eb2] warn_on_slowpath+0x51/0x77
 [80261fef] smp_call_function_mask+0x38/0x1f2
 [802621d2] smp_call_function+0x29/0x2e
 [8021ba16] native_smp_send_stop+0x1a/0x3f
 [804a1f59] panic+0xbc/0x170
 [802449e2] do_exit+0x6b/0x334
 [804a4b9b] oops_begin+0x0/0x9e
 [804a524a] do_int3+0x7d/0xa1
 [804a46e6] int3+0xb6/0xf0
 [80a500f1] per_cpu__cpu_state+0x1/0x4
DWARF2 unwinder stuck at per_cpu__cpu_state+0x1/0x4

Leftover inexact backtrace:

 IRQ  [8020ca2e] ret_from_intr+0x0/0x29
 EOI  [804a6992] notifier_call_chain+0x29/0x4c
 [80213465] default_idle+0x38/0x54
 [8020b34a] cpu_idle+0xa9/0xf1

---[ end trace 17313f34f216af07 ]---


The UP guests seemed to work fine - will start them again now.

Alex


KVM guest crashes

2009-01-20 Thread Alexander Graf
Hi list,

recently I've been hitting some KVM bugs others seem to have reported as
well, including

- CIFS timeouts
- Stuck ?? errors
- Random segmentation faults in the guest

so I figured, I'll put together a stress test that can be used to
reproduce these issues. This is done by using a CIFS mount on the host
and unpacking data from that mount to the mount. I have been able to
bring kvm down to its knees a lot just by doing this.
Simply run the test in an endless-loop. FWIW enabling NPT helps
triggering the issue.
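The general shape of such a load can be sketched as follows. This is not the content of kvm-tests.tar.bz2, which is not reproduced here; it is a hypothetical stand-in that repeatedly unpacks an archive stored on a mount into a scratch directory on the same mount. The function name, the scratch directory name and the paths in the usage note are all assumptions.

```python
import os
import shutil
import tarfile


def unpack_loop(archive, mount_dir, iterations):
    """Unpack `archive` into a scratch directory under `mount_dir`, repeatedly.

    Both paths are meant to live on the same CIFS mount, so every read and
    every write goes through the guest's network path. Returns the number
    of completed iterations.
    """
    scratch = os.path.join(mount_dir, "unpack-scratch")
    # Newer Python versions accept an extraction filter; older ones do not.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    done = 0
    for _ in range(iterations):
        # Start each round from an empty directory so the writes are repeated.
        shutil.rmtree(scratch, ignore_errors=True)
        os.makedirs(scratch)
        with tarfile.open(archive) as tar:
            tar.extractall(scratch, **extra)
        done += 1
    shutil.rmtree(scratch, ignore_errors=True)
    return done
```

Inside a guest this might be called as unpack_loop("/mnt/cifs/data.tar", "/mnt/cifs", 1000), with both paths hypothetical; a very large iteration count approximates the endless loop described above.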

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1
(2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have
some time on your hands, please give me a hand debugging this! If I'd
had to guess, I'd say it's either an APIC issue and/or guest memory
corruption.

Alex


Re: KVM guest crashes

2009-01-20 Thread Avi Kivity

Alexander Graf wrote:
 Hi list,

 recently I've been hitting some KVM bugs others seem to have reported as
 well, including

 - CIFS timeouts
 - Stuck ?? errors
 - Random segmentation faults in the guest

 so I figured, I'll put together a stress test that can be used to
 reproduce these issues. This is done by using a CIFS mount on the host
 and unpacking data from that mount to the mount. I have been able to
 bring kvm down to its knees a lot just by doing this.
 Simply run the test in an endless-loop. FWIW enabling NPT helps
 triggering the issue.

Are the problems specific to AMD?  What does "helps triggering" mean -
does it happen with NPT disabled?

 The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1
 (2.6.27) kernels.

 Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
 And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

 I'm somewhat lost on the reason for these failures, so if you do have
 some time on your hands, please give me a hand debugging this! If I'd
 had to guess, I'd say it's either an APIC issue and/or guest memory
 corruption.

I'd guess memory corruption.

Does running a uniprocessor guest help?  What about a uniprocessor guest 
pinned to one host core?
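Pinning can be done with taskset, as in the "taskset -c 6 ... qemu-kvm" command line elsewhere in this thread, or from a small launcher. The following is a minimal sketch assuming a Linux host; pin_to_core is a made-up helper name, and the qemu-kvm arguments in the usage note are placeholders.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict `pid` (0 means the calling process) to a single host core.

    Processes and threads started afterwards inherit the affinity mask,
    so a guest launched from here stays on that core.
    Returns the resulting affinity set.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)
```

A launcher would call pin_to_core(6) and then something like os.execvp("qemu-kvm", ["qemu-kvm", "-m", "2000", "-nographic"]). Leaving out -smp gives a guest with a single vCPU.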


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: KVM guest crashes

2009-01-20 Thread Alexander Graf





On 20.01.2009, at 21:07, Avi Kivity a...@redhat.com wrote:
 Alexander Graf wrote:
 Hi list,

 recently I've been hitting some KVM bugs others seem to have reported as
 well, including

 - CIFS timeouts
 - Stuck ?? errors
 - Random segmentation faults in the guest

 so I figured, I'll put together a stress test that can be used to
 reproduce these issues. This is done by using a CIFS mount on the host
 and unpacking data from that mount to the mount. I have been able to
 bring kvm down to its knees a lot just by doing this.
 Simply run the test in an endless-loop. FWIW enabling NPT helps
 triggering the issue.

 Are the problems specific to AMD?

I don't know, as all machines I tried it on were AMD so far. But
judging from user reports on the ml, it happens on Intel too.

 What does "helps triggering" mean - does it happen with NPT disabled?

It seems like the chances for breakage are higher with NPT enabled. I
do see them without as well though.

 The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1
 (2.6.27) kernels.

 Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
 And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

 I'm somewhat lost on the reason for these failures, so if you do have
 some time on your hands, please give me a hand debugging this! If I'd
 had to guess, I'd say it's either an APIC issue and/or guest memory
 corruption.

 I'd guess memory corruption.

 Does running a uniprocessor guest help?  What about a uniprocessor
 guest pinned to one host core?

I'll try to start tests tomorrow.

Alex

 --
 Do not meddle in the internals of kernels, for they are subtle and
 quick to panic.



[ kvm-Bugs-2441883 ] KVM guest crashes when using linux-md software RAID5

2008-12-17 Thread SourceForge.net
Bugs item #2441883, was opened at 2008-12-17 20:22
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2441883&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Johannes Truschnigg (c0l0)
Assigned to: Nobody/Anonymous (nobody)
Summary: KVM guest crashes when using linux-md software RAID5

Initial Comment:
CPU: Intel Core 2 Quad Q6600 (4 cores)
Distro, kernel: Gentoo GNU/Linux ~amd64, Kernel 2.6.27.9
Bitness, compiler: x86_64, GCC 4.3.2
KVM versions: kvm-79, kvm-81

Trying to assemble a (software) RAID5-array under GNU/Linux, guest kernel 
version 2.6.24, segfaults kvm the second the md-driver finishes initially 
syncing the array's members. When trying to boot with the same configuration 
again, KVM crashes the moment the bootloader is supposed to take over.

I've attached my test-case, which is also available here: 
http://johannes.truschnigg.info/tmp/kvm-79_segfault_crashatstart.tar.bz2

PLEASE NOTE that the extracted files consume around 21G, due to the zero-filled 
image files used as array components. The included shellscript, `start.sh`, 
needs to be adapted to find ubuntu-8.04.1-desktop-amd64.iso on your machine - 
an image which is available here: 
http://releases.ubuntu.com/hardy/ubuntu-8.04.1-desktop-amd64.iso

I've hit this problem for the first time with KVM-79, but it's still not fixed 
for me with KVM-81.

I'm happy to provide additional information upon request.

--
