Re: KVM Guest mmap.c bug

2010-03-09 Thread Bruno Cesar Ribas
On Mon, Mar 08, 2010 at 03:49:01PM +0100, Andrea Arcangeli wrote:
> On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote:
> > It looks unrelated to kvm, though of course random memory corruption
> > cannot be ruled out.
> >
> > Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)?
> >
> > Andrea, any idea?
>
> Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma,
> even though vma is the one with the smallest vm_end for which the comparison
> vma->vm_start < vma->vm_end is true (the next vma is null and the
> prev will have vma->vm_start == prev->vm_end, not <).
>
> The bug check looks right; it doesn't seem to be a false positive, and this
> bugcheck indicates that the vma rbtree is memory-corrupted somehow.
>
> So yes, fiddling with npt on and off sounds like a good start. If it's a bug

I can confirm it happens with npt on and off.

And it also happens on a Nehalem Xeon (it just happened).

> in shadow paging, it's unlikely that the exact same bug materializes both
> with npt and without. If the crash happens with npt on and off, then
> maybe it's not hypervisor related. It could also be bad RAM if it only

I doubt it is bad RAM! This machine has been working (without KVM) for almost
2 years, and MCE does not report any problems on the host machine.

And it happens on two identical machines (Opteron) and now on the new (5 days
old) Intel Nehalem Xeon.

All guests are running the same kernel. It happens with a kernel compiled by
me and with the one from Debian sid, both 2.6.32.9, and with the previous
kernels I tried (2.6.31.12 and 2.6.27.45).

> happens on a single host and all other hosts are fine with the same binary
> guest/host kernels (an rbtree walk might stress the memory bus more than
> other operations). That said, vm_next being null (and if it's null,
> the vm_next pointer likely has no RAM bitflip) is a bit weird and not a
> common scenario, and this page fault seems to be triggered by a procfs
> copy_user call, which is non-standard, so maybe this is a guest bug. It
> would be interesting to know what the vm_start address is; at the end
> there are the stack, vdso and vsyscall areas.

I'll make it print vm_start for next reboot.

-- 
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM Guest mmap.c bug

2010-03-08 Thread Avi Kivity

On 03/02/2010 10:25 PM, BRUNO CESAR RIBAS wrote:
> Hi,
>
> I run a bunch of virtual servers using KVM, and I get an mmap.c bug on the
> guest machine. The virtual machines are desktop servers for thin clients.
>
> My host is running a 2.6.33 kernel and has 32GB of RAM, on an Opteron with
> AMD-V.
>
> The guest is running 2.6.27.45 (I also tried 2.6.31.12, 2.6.32.9 and
> 2.6.33); guests are using 4GB, 10GB or 20GB of RAM.
>
> My qemu-kvm version is 0.12.3
>
> All guests are using NFSROOT as the ROOT FS and virtio as the network
> driver.
>
> I run the guest with:
> kvm -cpu kvm64 -smp 4 -vnc :101 -daemonize -name ${NOME} -localtime -m $RAM \
>   -net nic,macaddr=$VLAN0,model=virtio,vlan=0 -net tap,vlan=0,ifname=${NOME}0 \
>   -net nic,macaddr=$VLAN121,model=virtio,vlan=121 -net tap,vlan=121,ifname=${NOME}121 \
>   -net nic,macaddr=$VLAN112,model=virtio,vlan=112 -net tap,vlan=112,ifname=${NOME}112 \
>   -kernel /root/vmlinuz-2.6.27.45-amd64-aufs-guest \
>   -append "root=/dev/nfs rw ip=dhcp nfsroot=$5 init=/sbin/boot.sh"
>
> I have a machine running an identical kernel (without the virtio stuff) on a
> dedicated machine (as it does not have AMD-V), and it stays up for days and
> even months. But when running a guest machine with qemu-kvm I get a bug
> message and lots of processes in D state, and I can't run 'ps aux' or look
> inside /proc and /sys without losing my shell (it hangs).
>
> On the console I get the following message, repeated for different
> processors, different PIDs and different mmap.c lines (line 486 appears too).
>
> [ cut here ]
> kernel BUG at mm/mmap.c:869!
> invalid opcode:  [1] SMP
> CPU 2
> Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267
>  #2
> RIP: 0010:[8027b2e1]  [8027b2e1] find_mergeable_ano
> f1/0x200
> RSP: :8804d933fb38  EFLAGS: 00010283
> RAX: 8804cb44b9a8 RBX: 8804cb44b978 RCX: 8804fe6d3088
> RDX: f4803000 RSI: 8804fe6d3088 RDI: 88049fa56138
> RBP: 88049fa56138 R08: 8804d933e000 R09:
> R10:  R11:  R12: 00100073
> R13: 00100073 R14: f4803000 R15: 806ce6c0
> FS:  () GS:88051cc7d440(0063) knlGS:f41
> CS:  0010 DS: 002b ES: 002b CR0: 8005003b
> CR2: f4803000 CR3: 0004a7d39000 CR4: 06a0
> DR0:  DR1:  DR2:
> DR3:  DR6: 0ff0 DR7: 0400
> Process nautilus (pid: 31334, threadinfo 8804d933e000, task 880
> )
> Stack:  8052e62d   88049fa5
>  88051a5aac40 80280382 8804cb41b790 880498919018
>   88049f8dad20 3000 802770aa
> Call Trace:
>  [8052e62d] ? _spin_lock_irq+0xd/0x10
>  [80280382] ? anon_vma_prepare+0x52/0x100
>  [802770aa] ? handle_mm_fault+0x65a/0x900
>  [802de6d8] ? proc_alloc_inode+0x58/0x90
>  [8052e545] ? __down_read+0x85/0xbc
>  [80223331] ? do_page_fault+0x2a1/0xab0
>  [803d6899] ? vsnprintf+0x4d9/0x750
>  [8029d7a1] ? do_lookup+0x81/0x240
>  [8027265d] ? zone_statistics+0x7d/0x80
>  [8052ea3a] ? error_exit+0x0/0x70
>  [803d706d] ? copy_user_generic_string+0x2d/0x40
>  [802e35ec] ? proc_file_read+0x12c/0x2e0
>  [802e34c0] ? proc_file_read+0x0/0x2e0
>  [802dec1a] ? proc_reg_read+0x8a/0xe0
>  [80295995] ? vfs_read+0xb5/0x160
>  [80295b2e] ? sys_read+0x4e/0x90
>  [80227004] ? ia32_sysret+0x0/0x5
>
> Code: 29 d0 48 c1 e8 0c 48 01 f8 48 3b 83 88 00 00 00 0f 85 5b fe ff ff
>  78 e9 c5 fe ff ff 0f 1f 00 31 f6 31 db e9 a9 fe ff ff 0f 0b eb fe 66
>  1f 84 00 00 00 00 00 48 83 ec 08 48 8b
> RIP  [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
>  RSP 8804d933fb38
> ---[ end trace e5ca25224cd7d1d4 ]---
>
> Does anyone have a suggestion? Where to look? What else should I trace?

   


It looks unrelated to kvm, though of course random memory corruption 
cannot be ruled out.


Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)?
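
For scripting the check above across several hosts, here is a minimal sketch that reads the same sysfs file. The function name `npt_enabled` is made up for illustration, and accepting both `1`/`0` and `Y`/`N` is an assumption so that it copes with either way a kernel may print the parameter.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the message above; it only exists when kvm_amd is loaded.
NPT_PARAM = Path("/sys/module/kvm_amd/parameters/npt")


def npt_enabled(param: Path = NPT_PARAM) -> Optional[bool]:
    """Return True/False for the npt setting, or None if it cannot be determined."""
    try:
        raw = param.read_text().strip()
    except OSError:
        return None  # module not loaded, non-AMD host, or no permission
    if raw in ("1", "Y", "y"):
        return True
    if raw in ("0", "N", "n"):
        return False
    return None  # unexpected content: do not guess


if __name__ == "__main__":
    labels = {True: "npt on", False: "npt off", None: "npt state unknown"}
    print(labels[npt_enabled()])
```

A `None` result is kept separate from `False` on purpose, so that a host without kvm_amd is not reported as "npt off".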

Andrea, any idea?

--
error compiling committee.c: too many arguments to function



Re: KVM Guest mmap.c bug

2010-03-08 Thread Andrea Arcangeli
On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote:
 It looks unrelated to kvm, though of course random memory corruption 
 cannot be ruled out.
 
 Is npt enabled on the host (cat /sys/module/kvm_amd/parameters/npt)?
 
 Andrea, any idea?

Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma,
even though vma is the one with the smallest vm_end for which the comparison
vma->vm_start < vma->vm_end is true (the next vma is null and the
prev will have vma->vm_start == prev->vm_end, not <).

The bug check looks right; it doesn't seem to be a false positive, and this
bugcheck indicates that the vma rbtree is memory-corrupted somehow.
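
To make the property behind that check concrete, here is a small userspace model. It is only a sketch: the kernel walks an rbtree, whereas this uses a sorted Python list, and `Vma`, `find_vma` and `lookup_is_consistent` are illustrative names with made-up addresses, not kernel code.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Vma:
    vm_start: int
    vm_end: int  # exclusive, as in the kernel


def find_vma(vmas: List[Vma], addr: int) -> Optional[Vma]:
    """Return the first VMA with addr < vm_end, or None.

    Assumes vmas is sorted by address and non-overlapping, which is the
    ordering a healthy lookup structure guarantees.
    """
    ends = [v.vm_end for v in vmas]
    i = bisect_right(ends, addr)  # first index whose vm_end is > addr
    return vmas[i] if i < len(vmas) else None


def lookup_is_consistent(vmas: List[Vma], vma: Vma) -> bool:
    """Looking up a VMA's own vm_start must find that same VMA."""
    return find_vma(vmas, vma.vm_start) is vma


if __name__ == "__main__":
    healthy = [Vma(0x1000, 0x3000), Vma(0x3000, 0x5000), Vma(0x7000, 0x9000)]
    print(all(lookup_is_consistent(healthy, v) for v in healthy))

    # A bogus vm_end on the middle entry breaks the ordering, so the
    # lookup for the last VMA lands on the wrong entry.
    corrupted = [Vma(0x1000, 0x3000), Vma(0x3000, 0x9000), Vma(0x5000, 0x7000)]
    print(lookup_is_consistent(corrupted, corrupted[2]))
```

In the healthy list the predecessor has vm_end equal to the looked-up vm_start, so it is skipped (== rather than <) and the lookup returns the VMA itself; once the ordering is damaged that no longer holds.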

So yes, fiddling with npt on and off sounds like a good start. If it's a bug
in shadow paging, it's unlikely that the exact same bug materializes both
with npt and without. If the crash happens with npt on and off, then
maybe it's not hypervisor related. It could also be bad RAM if it only
happens on a single host and all other hosts are fine with the same binary
guest/host kernels (an rbtree walk might stress the memory bus more than
other operations). That said, vm_next being null (and if it's null,
the vm_next pointer likely has no RAM bitflip) is a bit weird and not a
common scenario, and this page fault seems to be triggered by a procfs
copy_user call, which is non-standard, so maybe this is a guest bug. It
would be interesting to know what the vm_start address is; at the end
there are the stack, vdso and vsyscall areas.
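
Once a vm_start value is available, one way to see which area it falls in is to compare it with the /proc/<pid>/maps output of a healthy process running the same binary. The sketch below parses that format; `Mapping`, `parse_maps` and `mapping_for` are illustrative names, and the sample text is invented for the example, not taken from the affected guest.

```python
import re
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int  # exclusive
    perms: str
    name: str  # pathname or pseudo-name such as [stack]; may be empty


# Fields: start-end perms offset dev inode [pathname]
_MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps text into mappings sorted by start address."""
    out = []
    for line in text.splitlines():
        m = _MAPS_LINE.match(line)
        if m:
            out.append(Mapping(int(m.group(1), 16), int(m.group(2), 16),
                               m.group(3), m.group(4).strip()))
    return sorted(out)


def mapping_for(maps: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None."""
    for m in maps:
        if m.start <= addr < m.end:
            return m
    return None


# Invented sample, for illustration only.
SAMPLE = """\
08048000-08056000 r-xp 00000000 08:01 1234 /usr/bin/example
f7f00000-f7f21000 rw-p 00000000 00:00 0
ffd00000-ffd21000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
"""

if __name__ == "__main__":
    maps = parse_maps(SAMPLE)
    for m in maps[-2:]:  # the mappings at the end of the address space
        print(f"{m.start:08x}-{m.end:08x} {m.perms} {m.name}")
```

Passing a logged vm_start to `mapping_for` then tells whether it belongs to one of the special areas at the top of the address space or to an ordinary anonymous mapping.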


KVM Guest mmap.c bug

2010-03-02 Thread BRUNO CESAR RIBAS
Hi,

I run a bunch of virtual servers using KVM, and I get an mmap.c bug on the
guest machine. The virtual machines are desktop servers for thin clients.

My host is running a 2.6.33 kernel and has 32GB of RAM, on an Opteron with
AMD-V.

The guest is running 2.6.27.45 (I also tried 2.6.31.12, 2.6.32.9 and
2.6.33); guests are using 4GB, 10GB or 20GB of RAM.

My qemu-kvm version is 0.12.3

All guests are using NFSROOT as the ROOT FS and virtio as the network
driver.

I run the guest with:
kvm -cpu kvm64 -smp 4 -vnc :101 -daemonize -name ${NOME} -localtime -m $RAM \
  -net nic,macaddr=$VLAN0,model=virtio,vlan=0 -net tap,vlan=0,ifname=${NOME}0 \
  -net nic,macaddr=$VLAN121,model=virtio,vlan=121 -net tap,vlan=121,ifname=${NOME}121 \
  -net nic,macaddr=$VLAN112,model=virtio,vlan=112 -net tap,vlan=112,ifname=${NOME}112 \
  -kernel /root/vmlinuz-2.6.27.45-amd64-aufs-guest \
  -append "root=/dev/nfs rw ip=dhcp nfsroot=$5 init=/sbin/boot.sh"


I have a machine running an identical kernel (without the virtio stuff) on a
dedicated machine (as it does not have AMD-V), and it stays up for days and
even months. But when running a guest machine with qemu-kvm I get a bug
message and lots of processes in D state, and I can't run 'ps aux' or look
inside /proc and /sys without losing my shell (it hangs).

On the console I get the following message, repeated for different
processors, different PIDs and different mmap.c lines (line 486 appears too).

[ cut here ]
kernel BUG at mm/mmap.c:869!
invalid opcode:  [1] SMP 
CPU 2 
Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267
 #2
RIP: 0010:[8027b2e1]  [8027b2e1] find_mergeable_ano
f1/0x200
RSP: :8804d933fb38  EFLAGS: 00010283
RAX: 8804cb44b9a8 RBX: 8804cb44b978 RCX: 8804fe6d3088
RDX: f4803000 RSI: 8804fe6d3088 RDI: 88049fa56138
RBP: 88049fa56138 R08: 8804d933e000 R09: 
R10:  R11:  R12: 00100073
R13: 00100073 R14: f4803000 R15: 806ce6c0
FS:  () GS:88051cc7d440(0063) knlGS:f41
CS:  0010 DS: 002b ES: 002b CR0: 8005003b
CR2: f4803000 CR3: 0004a7d39000 CR4: 06a0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process nautilus (pid: 31334, threadinfo 8804d933e000, task 880
)
Stack:  8052e62d   88049fa5
 88051a5aac40 80280382 8804cb41b790 880498919018
  88049f8dad20 3000 802770aa
Call Trace:
 [8052e62d] ? _spin_lock_irq+0xd/0x10
 [80280382] ? anon_vma_prepare+0x52/0x100
 [802770aa] ? handle_mm_fault+0x65a/0x900
 [802de6d8] ? proc_alloc_inode+0x58/0x90
 [8052e545] ? __down_read+0x85/0xbc
 [80223331] ? do_page_fault+0x2a1/0xab0
 [803d6899] ? vsnprintf+0x4d9/0x750
 [8029d7a1] ? do_lookup+0x81/0x240
 [8027265d] ? zone_statistics+0x7d/0x80
 [8052ea3a] ? error_exit+0x0/0x70
 [803d706d] ? copy_user_generic_string+0x2d/0x40
 [802e35ec] ? proc_file_read+0x12c/0x2e0
 [802e34c0] ? proc_file_read+0x0/0x2e0
 [802dec1a] ? proc_reg_read+0x8a/0xe0
 [80295995] ? vfs_read+0xb5/0x160
 [80295b2e] ? sys_read+0x4e/0x90
 [80227004] ? ia32_sysret+0x0/0x5


Code: 29 d0 48 c1 e8 0c 48 01 f8 48 3b 83 88 00 00 00 0f 85 5b fe ff ff
 78 e9 c5 fe ff ff 0f 1f 00 31 f6 31 db e9 a9 fe ff ff 0f 0b eb fe 66
 1f 84 00 00 00 00 00 48 83 ec 08 48 8b 
RIP  [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
 RSP 8804d933fb38
---[ end trace e5ca25224cd7d1d4 ]---
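
Since the report repeats with different CPUs, PIDs and mmap.c lines, it may help to tabulate the occurrences from a saved console log. This is a rough sketch: the regular expressions only assume the line shapes visible in the trace above, and `parse_reports` and `count_sites` are illustrative names.

```python
import re
from collections import Counter
from typing import Dict, List

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
CPU_RE = re.compile(r"^CPU (\d+)", re.MULTILINE)
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")


def parse_reports(log: str) -> List[Dict]:
    """Split a console log at each 'kernel BUG at' line and extract key fields."""
    starts = [m.start() for m in BUG_RE.finditer(log)]
    reports = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(log)
        chunk = log[begin:end]
        bug = BUG_RE.match(chunk)  # chunk starts at the BUG line
        cpu = CPU_RE.search(chunk)
        pid = PID_RE.search(chunk)
        reports.append({
            "file": bug.group(1),
            "line": int(bug.group(2)),
            "cpu": int(cpu.group(1)) if cpu else None,
            "pid": int(pid.group(1)) if pid else None,
            "comm": pid.group(2) if pid else None,
        })
    return reports


def count_sites(reports: List[Dict]) -> Counter:
    """Count how often each (file, line) BUG site appears."""
    return Counter((r["file"], r["line"]) for r in reports)


# First lines of the report posted above.
SAMPLE = """\
kernel BUG at mm/mmap.c:869!
invalid opcode:  [1] SMP
CPU 2
Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267
"""

if __name__ == "__main__":
    for report in parse_reports(SAMPLE):
        print(report)
```

Feeding the whole console log through `count_sites` shows at a glance whether line 869 dominates or whether the failures are spread over several checks.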


Does anyone have a suggestion? Where to look? What else should I trace?

Thanks in advance,
-- 
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br