Re: KVM Guest mmap.c bug
On Mon, Mar 08, 2010 at 03:49:01PM +0100, Andrea Arcangeli wrote:
> On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote:
> > It looks unrelated to kvm, though of course random memory corruption
> > cannot be ruled out. Is npt enabled on the host (cat
> > /sys/module/kvm_amd/parameters/npt)?
> >
> > Andrea, any idea?
>
> Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma even
> though vma is the one with the smallest vm_end where the comparison
> "vma->vm_start < vma->vm_end" is true (the next vma is null and the
> prev will have vma->vm_start == prev->vm_end, not <). The bug check
> looks right, it doesn't seem to be a false positive, and it indicates
> that the vma rbtree is memory-corrupted somehow.
>
> So yes, fiddling with npt on and off sounds like a good start: if it's
> a bug

I can confirm it happens with npt on and off. And it also happens on a
Nehalem Xeon (it just happened).

> in shadow paging, it's unlikely that the exact same bug materializes
> both with and without npt. If the crash happens with npt on and off,
> then maybe it's not hypervisor related. It could also be bad RAM if it
> only

I doubt it is bad RAM! This machine has been working (without KVM) for
almost 2 years and MCE does not report any problems on the host machine.
And it happens on two identical machines (Opteron) and now on the new
(5 days old) Intel Nehalem Xeon. All guests are running the same kernel.
It happens with a kernel compiled by me and with the one from Debian sid,
both 2.6.32.9, and with the previous kernels I tried (2.6.31.12 and
2.6.27.45).

> happens on a single host and all other hosts are fine with the same
> binary guest/host kernels (an rbtree walk might stress the memory bus
> more than other operations).
>
> That said, vm_next being null (and if it's null, the vm_next pointer
> likely has no RAM bitflip) is a bit weird and not a common scenario,
> and this page fault seems to be triggered by a procfs copy_user call,
> which is non-standard, so maybe this is a guest bug. It would be
> interesting to know what the vm_start address is; at the end there are
> the stack, vdso and vsyscall areas.

I'll make it print vm_start for next reboot.

--
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
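Once a vm_start (or the faulting address, f4803000 in CR2 of the posted oops) is known, it can be matched against the process's memory map to see whether it falls near the stack, vdso or vsyscall areas. The following is only a sketch: it assumes the usual /proc/<pid>/maps text layout (start-end, perms, offset, dev, inode, optional name), the SAMPLE text is made up for illustration, and reading /proc on an affected guest may hang as described in the report. The helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    name: str


# Made-up sample in /proc/<pid>/maps layout; not taken from the affected guest.
SAMPLE = """\
08048000-08056000 r-xp 00000000 08:01 1234 /usr/bin/example
f4803000-f4804000 rw-p 00000000 00:00 0
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
"""


def parse_maps(text: str) -> List[Mapping]:
    """Parse maps-style text into Mapping tuples, in file order."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or short lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], name))
    return mappings


def mapping_for(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping whose [start, end) range contains addr, if any."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


def ordering_violations(mappings: List[Mapping]) -> List[Tuple[Mapping, Mapping]]:
    """Return adjacent pairs that overlap or are out of address order."""
    return [(a, b) for a, b in zip(mappings, mappings[1:]) if a.end > b.start]
```

For example, `mapping_for(parse_maps(SAMPLE), 0xF4803000)` returns the anonymous rw-p mapping in the sample, and `ordering_violations` returns an empty list when the map is sorted and non-overlapping.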
Re: KVM Guest mmap.c bug
On 03/02/2010 10:25 PM, BRUNO CESAR RIBAS wrote:
> Hi,
>
> I run a bunch of virtual servers using KVM, and I get a mmap.c bug on
> the guest machine. The virtual machines are desktop servers for Thin
> Clients.
>
> My host is running a 2.6.33 kernel and has 32GB of RAM, Opteron with
> amd-v. The guest is running 2.6.27.45 (tried 2.6.31.12, 2.6.32.9,
> 2.6.33); some guests are using 10GB, 4GB or 20GB of RAM.
>
> My qemu-kvm version is 0.12.3.
>
> All guests are using NFSROOT as the ROOT FS and virtio as the network
> driver.
>
> I run the guest with:
>
> kvm -cpu kvm64 -smp 4 -vnc :101 -daemonize -name ${NOME} -localtime -m $RAM \
>   -net nic,macaddr=$VLAN0,model=virtio,vlan=0 -net tap,vlan=0,ifname=${NOME}0 \
>   -net nic,macaddr=$VLAN121,model=virtio,vlan=121 -net tap,vlan=121,ifname=${NOME}121 \
>   -net nic,macaddr=$VLAN112,model=virtio,vlan=112 -net tap,vlan=112,ifname=${NOME}112 \
>   -kernel /root/vmlinuz-2.6.27.45-amd64-aufs-guest \
>   -append "root=/dev/nfs rw ip=dhcp nfsroot=$5 init=/sbin/boot.sh"
>
> I have a dedicated machine (it does not have amd-v) running an
> identical kernel (without the virtio stuff), and it stays up for days
> and even months.
>
> But when running a guest machine with qemu-kvm I get some bug messages
> and lots of processes in D state, and I can't 'ps aux' or look inside
> /proc and /sys without losing my shell (it hangs).
>
> On the console I get the following message, repeated for different
> processors, different Pids and different mmap.c lines (line 486 appears
> too).
>
> [ cut here ]
> kernel BUG at mm/mmap.c:869!
> invalid opcode: [1] SMP
> CPU 2
> Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267 #2
> RIP: 0010:[8027b2e1] [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
> RSP: :8804d933fb38 EFLAGS: 00010283
> RAX: 8804cb44b9a8 RBX: 8804cb44b978 RCX: 8804fe6d3088
> RDX: f4803000 RSI: 8804fe6d3088 RDI: 88049fa56138
> RBP: 88049fa56138 R08: 8804d933e000 R09:
> R10: R11: R12: 00100073
> R13: 00100073 R14: f4803000 R15: 806ce6c0
> FS: () GS:88051cc7d440(0063) knlGS:f41
> CS: 0010 DS: 002b ES: 002b CR0: 8005003b
> CR2: f4803000 CR3: 0004a7d39000 CR4: 06a0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process nautilus (pid: 31334, threadinfo 8804d933e000, task 880 )
> Stack: 8052e62d 88049fa5 88051a5aac40 80280382 8804cb41b790 880498919018 88049f8dad20 3000 802770aa
> Call Trace:
>  [8052e62d] ? _spin_lock_irq+0xd/0x10
>  [80280382] ? anon_vma_prepare+0x52/0x100
>  [802770aa] ? handle_mm_fault+0x65a/0x900
>  [802de6d8] ? proc_alloc_inode+0x58/0x90
>  [8052e545] ? __down_read+0x85/0xbc
>  [80223331] ? do_page_fault+0x2a1/0xab0
>  [803d6899] ? vsnprintf+0x4d9/0x750
>  [8029d7a1] ? do_lookup+0x81/0x240
>  [8027265d] ? zone_statistics+0x7d/0x80
>  [8052ea3a] ? error_exit+0x0/0x70
>  [803d706d] ? copy_user_generic_string+0x2d/0x40
>  [802e35ec] ? proc_file_read+0x12c/0x2e0
>  [802e34c0] ? proc_file_read+0x0/0x2e0
>  [802dec1a] ? proc_reg_read+0x8a/0xe0
>  [80295995] ? vfs_read+0xb5/0x160
>  [80295b2e] ? sys_read+0x4e/0x90
>  [80227004] ? ia32_sysret+0x0/0x5
> Code: 29 d0 48 c1 e8 0c 48 01 f8 48 3b 83 88 00 00 00 0f 85 5b fe ff ff 78 e9 c5 fe ff ff 0f 1f 00 31 f6 31 db e9 a9 fe ff ff 0f 0b eb fe 66 1f 84 00 00 00 00 00 48 83 ec 08 48 8b
> RIP [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
>  RSP 8804d933fb38
> ---[ end trace e5ca25224cd7d1d4 ]---
>
> Does anyone have a suggestion? Where to look? What else should I trace?

It looks unrelated to kvm, though of course random memory corruption
cannot be ruled out. Is npt enabled on the host (cat
/sys/module/kvm_amd/parameters/npt)?

Andrea, any idea?

--
error compiling committee.c: too many arguments to function
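For anyone who wants to script the npt check across several hosts instead of running cat by hand, a small sketch follows. The sysfs path is the one quoted in the message; the accepted value spellings (1/0 and Y/N) are an assumption about how the module parameter is printed on a given kernel, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Path taken from the message above.
NPT_PARAM = Path("/sys/module/kvm_amd/parameters/npt")


def parse_module_flag(text: str) -> Optional[bool]:
    """Interpret a module parameter value; the spellings are an assumption."""
    value = text.strip().lower()
    if value in {"1", "y"}:
        return True
    if value in {"0", "n"}:
        return False
    return None  # unrecognized value


def npt_enabled(path: Path = NPT_PARAM) -> Optional[bool]:
    """Return True/False for npt, or None if the parameter cannot be read."""
    try:
        return parse_module_flag(path.read_text())
    except OSError:
        return None  # e.g. kvm_amd not loaded, or not an AMD host
```

A `None` result should be read as "unknown" rather than "disabled", since the file is absent whenever kvm_amd is not loaded.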
Re: KVM Guest mmap.c bug
On Mon, Mar 08, 2010 at 03:32:19PM +0200, Avi Kivity wrote:
> It looks unrelated to kvm, though of course random memory corruption
> cannot be ruled out. Is npt enabled on the host (cat
> /sys/module/kvm_amd/parameters/npt)?
>
> Andrea, any idea?

Basically find_vma(vma->vm_mm, vma->vm_start) doesn't return vma even
though vma is the one with the smallest vm_end where the comparison
"vma->vm_start < vma->vm_end" is true (the next vma is null and the
prev will have vma->vm_start == prev->vm_end, not <). The bug check
looks right, it doesn't seem to be a false positive, and it indicates
that the vma rbtree is memory-corrupted somehow.

So yes, fiddling with npt on and off sounds like a good start: if it's
a bug in shadow paging, it's unlikely that the exact same bug
materializes both with and without npt. If the crash happens with npt
on and off, then maybe it's not hypervisor related. It could also be
bad RAM if it only happens on a single host and all other hosts are
fine with the same binary guest/host kernels (an rbtree walk might
stress the memory bus more than other operations).

That said, vm_next being null (and if it's null, the vm_next pointer
likely has no RAM bitflip) is a bit weird and not a common scenario,
and this page fault seems to be triggered by a procfs copy_user call,
which is non-standard, so maybe this is a guest bug. It would be
interesting to know what the vm_start address is; at the end there are
the stack, vdso and vsyscall areas.
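To make the invariant behind that bug check concrete, here is a simplified user-space model. It is not the kernel code: the VMAs live in a sorted Python list rather than an rbtree, `Vma` is a hypothetical stand-in for vm_area_struct, and `lookup_is_consistent` only mirrors the condition described above (looking up vma->vm_start must hand back vma itself).

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Vma(NamedTuple):
    """Simplified stand-in for vm_area_struct, covering [vm_start, vm_end)."""
    vm_start: int
    vm_end: int


def find_vma(vmas: Sequence[Vma], addr: int) -> Optional[Vma]:
    """Return the first VMA with addr < vm_end, or None if there is none.

    Assumes `vmas` is sorted by vm_end, which holds for a sorted,
    non-overlapping address space.
    """
    ends = [v.vm_end for v in vmas]
    i = bisect_right(ends, addr)  # first index whose vm_end is > addr
    return vmas[i] if i < len(vmas) else None


def lookup_is_consistent(vmas: Sequence[Vma], vma: Vma) -> bool:
    """Model of the check that fired: find_vma(vm_start) must return vma."""
    return find_vma(vmas, vma.vm_start) is vma


# Healthy layout: prev.vm_end == vma.vm_start, so the lookup skips prev.
healthy = [Vma(0x1000, 0x2000), Vma(0x2000, 0x3000)]

# Corrupted layout: prev.vm_end now reaches into vma, so the lookup of
# vma.vm_start stops at prev and the consistency check fails.
corrupt = [Vma(0x1000, 0x2800), Vma(0x2000, 0x3000)]
```

In this model the check passes for every entry of `healthy` and fails for the second entry of `corrupt`, which is the kind of inconsistency a corrupted tree would produce.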
KVM Guest mmap.c bug
Hi,

I run a bunch of virtual servers using KVM, and I get a mmap.c bug on
the guest machine. The virtual machines are desktop servers for Thin
Clients.

My host is running a 2.6.33 kernel and has 32GB of RAM, Opteron with
amd-v. The guest is running 2.6.27.45 (tried 2.6.31.12, 2.6.32.9,
2.6.33); some guests are using 10GB, 4GB or 20GB of RAM.

My qemu-kvm version is 0.12.3.

All guests are using NFSROOT as the ROOT FS and virtio as the network
driver.

I run the guest with:

kvm -cpu kvm64 -smp 4 -vnc :101 -daemonize -name ${NOME} -localtime -m $RAM \
  -net nic,macaddr=$VLAN0,model=virtio,vlan=0 -net tap,vlan=0,ifname=${NOME}0 \
  -net nic,macaddr=$VLAN121,model=virtio,vlan=121 -net tap,vlan=121,ifname=${NOME}121 \
  -net nic,macaddr=$VLAN112,model=virtio,vlan=112 -net tap,vlan=112,ifname=${NOME}112 \
  -kernel /root/vmlinuz-2.6.27.45-amd64-aufs-guest \
  -append "root=/dev/nfs rw ip=dhcp nfsroot=$5 init=/sbin/boot.sh"

I have a dedicated machine (it does not have amd-v) running an
identical kernel (without the virtio stuff), and it stays up for days
and even months.

But when running a guest machine with qemu-kvm I get some bug messages
and lots of processes in D state, and I can't 'ps aux' or look inside
/proc and /sys without losing my shell (it hangs).

On the console I get the following message, repeated for different
processors, different Pids and different mmap.c lines (line 486 appears
too).

[ cut here ]
kernel BUG at mm/mmap.c:869!
invalid opcode: [1] SMP
CPU 2
Pid: 31334, comm: nautilus Not tainted 2.6.27.45-amd64-aufs-guest-00267 #2
RIP: 0010:[8027b2e1] [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
RSP: :8804d933fb38 EFLAGS: 00010283
RAX: 8804cb44b9a8 RBX: 8804cb44b978 RCX: 8804fe6d3088
RDX: f4803000 RSI: 8804fe6d3088 RDI: 88049fa56138
RBP: 88049fa56138 R08: 8804d933e000 R09:
R10: R11: R12: 00100073
R13: 00100073 R14: f4803000 R15: 806ce6c0
FS: () GS:88051cc7d440(0063) knlGS:f41
CS: 0010 DS: 002b ES: 002b CR0: 8005003b
CR2: f4803000 CR3: 0004a7d39000 CR4: 06a0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process nautilus (pid: 31334, threadinfo 8804d933e000, task 880 )
Stack: 8052e62d 88049fa5 88051a5aac40 80280382 8804cb41b790 880498919018 88049f8dad20 3000 802770aa
Call Trace:
 [8052e62d] ? _spin_lock_irq+0xd/0x10
 [80280382] ? anon_vma_prepare+0x52/0x100
 [802770aa] ? handle_mm_fault+0x65a/0x900
 [802de6d8] ? proc_alloc_inode+0x58/0x90
 [8052e545] ? __down_read+0x85/0xbc
 [80223331] ? do_page_fault+0x2a1/0xab0
 [803d6899] ? vsnprintf+0x4d9/0x750
 [8029d7a1] ? do_lookup+0x81/0x240
 [8027265d] ? zone_statistics+0x7d/0x80
 [8052ea3a] ? error_exit+0x0/0x70
 [803d706d] ? copy_user_generic_string+0x2d/0x40
 [802e35ec] ? proc_file_read+0x12c/0x2e0
 [802e34c0] ? proc_file_read+0x0/0x2e0
 [802dec1a] ? proc_reg_read+0x8a/0xe0
 [80295995] ? vfs_read+0xb5/0x160
 [80295b2e] ? sys_read+0x4e/0x90
 [80227004] ? ia32_sysret+0x0/0x5
Code: 29 d0 48 c1 e8 0c 48 01 f8 48 3b 83 88 00 00 00 0f 85 5b fe ff ff 78 e9 c5 fe ff ff 0f 1f 00 31 f6 31 db e9 a9 fe ff ff 0f 0b eb fe 66 1f 84 00 00 00 00 00 48 83 ec 08 48 8b
RIP [8027b2e1] find_mergeable_anon_vma+0x1f1/0x200
 RSP 8804d933fb38
---[ end trace e5ca25224cd7d1d4 ]---

Does anyone have a suggestion? Where to look? What else should I trace?

Thanks in advance,

--
Bruno Ribas - ri...@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br
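Since the report says the message repeats with different CPUs, Pids and mmap.c lines, one way to summarize a captured console log is to tally the BUG locations. This is a minimal sketch that assumes the log contains lines in the same "kernel BUG at file:line!" form shown above; the function name is hypothetical, and how the console output gets captured to a file is left out.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as "kernel BUG at mm/mmap.c:869!"
BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")


def count_bug_sites(lines: Iterable[str]) -> Counter:
    """Count how often each (source file, line number) BUG site appears."""
    sites: Counter = Counter()
    for line in lines:
        for path, lineno in BUG_RE.findall(line):
            sites[(path, int(lineno))] += 1
    return sites
```

Calling `count_bug_sites(open(path))` on a saved console log gives a per-site count, which makes it easier to see whether mm/mmap.c:869 and mm/mmap.c:486 are the only sites being hit.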