Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/05/2012 07:12 AM, Peter Lieven wrote:
On 07/03/12 15:13, Avi Kivity wrote:
On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

A bisect could tell us where the problem is. To avoid bisecting all of linux, try

  git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

here we go:

commit ca7d58f375c650cf36900cb1da1ca2cc99b13393
Author: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
Date:   Wed Jul 13 14:31:08 2011 +0800

    KVM: x86: fix broken read emulation spans a page boundary

Ah, i will try to reproduce it and fix it. Thanks for your work.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 06/28/2012 05:11 PM, Peter Lieven wrote:

that here is basically what's going on:

qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)

There are two mmio emulations after the userspace exit; this is caused by an mmio read access which spans two pages. But it should be fixed by:

commit f78146b0f9230765c6315b2e14f56112513389ad
Author: Avi Kivity <a...@redhat.com>
Date:   Wed Apr 18 19:22:47 2012 +0300

    KVM: Fix page-crossing MMIO

    MMIO that are split across a page boundary are currently broken - the
    code does not expect to be aborted by the exit to userspace for the
    first MMIO fragment.

    This patch fixes the problem by generalizing the current code for
    handling 16-byte MMIOs to handle a number of fragments, and changes
    the MMIO code to create those fragments.

    Signed-off-by: Avi Kivity <a...@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Could you please pull the code from:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
and trace it again?
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 05.07.2012 10:51, Xiao Guangrong wrote:
On 06/28/2012 05:11 PM, Peter Lieven wrote:

that here is basically what's going on:

qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)

There are two mmio emulations after the userspace exit; this is caused by an mmio read access which spans two pages. But it should be fixed by:

commit f78146b0f9230765c6315b2e14f56112513389ad
Author: Avi Kivity <a...@redhat.com>
Date:   Wed Apr 18 19:22:47 2012 +0300

    KVM: Fix page-crossing MMIO

    MMIO that are split across a page boundary are currently broken - the
    code does not expect to be aborted by the exit to userspace for the
    first MMIO fragment.

    This patch fixes the problem by generalizing the current code for
    handling 16-byte MMIOs to handle a number of fragments, and changes
    the MMIO code to create those fragments.

    Signed-off-by: Avi Kivity <a...@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Could you please pull the code from:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git
and trace it again?

Thank you very much, this fixes the issue I have seen.

Thanks, Peter
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/03/12 15:25, Avi Kivity wrote:
On 07/03/2012 04:15 PM, Peter Lieven wrote:
On 03.07.2012 15:13, Avi Kivity wrote:
On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

A bisect could tell us where the problem is. To avoid bisecting all of linux, try

  git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

would it also be ok to bisect kvm-kmod?

Yes, but note that kvm-kmod is spread across two repositories which are not often tested out of sync, so you may get build failures.

ok, i just started with this with a 3.0 (good) and 3.2 (bad) vanilla kernel. i can confirm the bug and i am now starting to bisect. it will take a while with my equipment; if anyone has a powerful testbed to run this i would greatly appreciate help.

if anyone wants to reproduce:
a) v3.2 from git.kernel.org
b) qemu-kvm 1.0.1 from sourceforge
c) ubuntu 64-bit 12.04 server cd
d) empty (e.g. all zero) hard disk image

cmdline:
./qemu-system-x86_64 -m 512 -cdrom /home/lieven/Downloads/ubuntu-12.04-server-amd64.iso -hda /dev/hd1/vmtest -vnc :1 -monitor stdio -boot dc

then choose boot from first harddisk and try to quit the qemu monitor with 'quit'. -> hypervisor hangs.

peter
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/03/12 15:13, Avi Kivity wrote:
On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

A bisect could tell us where the problem is. To avoid bisecting all of linux, try

  git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

here we go:

commit ca7d58f375c650cf36900cb1da1ca2cc99b13393
Author: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
Date:   Wed Jul 13 14:31:08 2011 +0800

    KVM: x86: fix broken read emulation spans a page boundary
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

Thanks, Peter

On 02.07.2012 17:05, Avi Kivity wrote:
On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know what's that here in handle_mmio?

/* hack: Red Hat 7.1 generates these weird accesses. */
if ((addr > 0xa0000 - 4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
    return 0;

Just what it says. There is a 4-byte access to address 0x9ffff. The first byte lies in RAM, the next three bytes are in mmio. qemu is geared to power-of-two accesses even though x86 can generate accesses to any number of bytes between 1 and 8. It appears that this has happened with your guest. It's not impossible that it's genuine.
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

A bisect could tell us where the problem is. To avoid bisecting all of linux, try

  git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

--
error compiling committee.c: too many arguments to function
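The bisect workflow Avi suggests, spelled out as a full session (a sketch only; note that modern `git bisect` wants an explicit `start` and a `--` before the pathspec that limits the search to commits touching the KVM directories - the good/bad marks at each step come from rerunning the reproduction in this thread):

```shell
# start a bisect between v3.0 (good) and v3.2 (bad), considering only
# commits that touch the KVM code
git bisect start v3.2 v3.0 -- virt/kvm arch/x86/kvm

# at each step: rebuild the kernel/module, run the reproduction, then
# mark the result so git can pick the next commit to test
git bisect good    # or: git bisect bad

# when done, git prints "<sha> is the first bad commit"; clean up with:
git bisect reset
```

Limiting by pathspec cuts the number of test builds dramatically, since only a small fraction of commits between v3.0 and v3.2 touch those directories.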
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 03.07.2012 15:13, Avi Kivity wrote:
On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

A bisect could tell us where the problem is. To avoid bisecting all of linux, try

  git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

would it also be ok to bisect kvm-kmod?

thanks, peter
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 07/03/2012 04:15 PM, Peter Lieven wrote:
On 03.07.2012 15:13, Avi Kivity wrote:
On 07/03/2012 04:01 PM, Peter Lieven wrote:

Further output from my testing.

Working:
- Linux 2.6.38 with included kvm module
- Linux 3.0.0 with included kvm module

Not-Working:
- Linux 3.2.0 with included kvm module
- Linux 2.6.28 with kvm-kmod 3.4
- Linux 3.0.0 with kvm-kmod 3.4
- Linux 3.2.0 with kvm-kmod 3.4

I can trigger the race with any of qemu-kvm 0.12.5, 1.0 or 1.0.1. It might be that the code was introduced somewhere between 3.0.0 and 3.2.0 in the kvm kernel module and that the flaw is not in qemu-kvm. Any hints?

A bisect could tell us where the problem is. To avoid bisecting all of linux, try

  git bisect v3.2 v3.0 virt/kvm arch/x86/kvm

would it also be ok to bisect kvm-kmod?

Yes, but note that kvm-kmod is spread across two repositories which are not often tested out of sync, so you may get build failures.

--
error compiling committee.c: too many arguments to function
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know what's that here in handle_mmio?

/* hack: Red Hat 7.1 generates these weird accesses. */
if ((addr > 0xa0000 - 4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
    return 0;

Just what it says. There is a 4-byte access to address 0x9ffff. The first byte lies in RAM, the next three bytes are in mmio. qemu is geared to power-of-two accesses even though x86 can generate accesses to any number of bytes between 1 and 8. It appears that this has happened with your guest. It's not impossible that it's genuine.

--
error compiling committee.c: too many arguments to function
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 02.07.2012 17:05, Avi Kivity wrote:
On 06/28/2012 12:38 PM, Peter Lieven wrote:

does anyone know what's that here in handle_mmio?

/* hack: Red Hat 7.1 generates these weird accesses. */
if ((addr > 0xa0000 - 4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
    return 0;

Just what it says. There is a 4-byte access to address 0x9ffff. The first byte lies in RAM, the next three bytes are in mmio. qemu is geared to power-of-two accesses even though x86 can generate accesses to any number of bytes between 1 and 8.

I just stumbled across the word hack in the comment. When the race occurs the CPU is basically reading from 0xa0000 in an endless loop.

It appears that this has happened with your guest. It's not impossible that it's genuine.

I had a lot to do the last days, but I updated our build environment to Ubuntu LTS 12.04 64-bit Server, which is based on Linux 3.2.0. I still see the issue. If I use the kvm module provided with the kernel, it works correctly. If I use kvm-kmod-3.4 with qemu-kvm-1.0.1 (both from sourceforge), I can reproduce the race condition. I will keep you posted when I have more evidence.

Thanks, Peter
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 27.06.2012 18:54, Jan Kiszka wrote:
On 2012-06-27 17:39, Peter Lieven wrote:

Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanilla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works; if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig too deep - see below.

Then, bisecting the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...).
that here is basically what's going on:

qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit complicated, but maybe this is already sufficient. otherwise i will of course gather this info as well.

thanks peter

Jan

[1] http://www.linux-kvm.org/page/Tracing

---

Hi, we recently came across multiple VMs racing and stopping working. It seems to happen when the system is at 100% cpu.
One way to reproduce this is: qemu-kvm-1.0.1 with vnc-thread enabled.

cmdline (or similar):
/usr/bin/qemu-kvm-1.0.1 -net tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 02-debug-race -boot order=dc,menu=off -cdrom /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile /var/run/qemu/vm-221.pid -mem-prealloc -cpu host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus

it is important that the attached virtio image contains only zeroes. if the system boots from cd, select boot from first harddisk. the hypervisor then hangs at 100% cpu and neither monitor nor qmp are responsive anymore. i have also seen customers reporting this when a VM is shut down. if this is connected to the threaded vnc server it might be important to be connected at this time. debug backtrace attached.

Thanks, Peter

--

(gdb) file /usr/bin/qemu-kvm-1.0.1
Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
(gdb) attach 5145
Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7f54d08b9700 (LWP 5253)]
[New Thread 0x7f5552757700 (LWP 5152)]
[New Thread 0x7f5552f58700 (LWP 5151)]
0x7f5553c6b5a3 in select () from /lib/libc.so.6
(gdb) info threads
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 2012-06-28 11:11, Peter Lieven wrote:
On 27.06.2012 18:54, Jan Kiszka wrote:
On 2012-06-27 17:39, Peter Lieven wrote:

Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanilla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works; if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig too deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 work on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a real 3.x kernel to exclude that risk first of all.

Then, bisecting the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...).
that here is basically what's going on:

qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit complicated, but maybe this is already sufficient. otherwise i will of course gather this info as well.

That's only tracing KVM events, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started. However, let's focus on bisecting at kernel and qemu-kvm level first.
Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 28.06.2012 11:21, Jan Kiszka wrote:
On 2012-06-28 11:11, Peter Lieven wrote:
On 27.06.2012 18:54, Jan Kiszka wrote:
On 2012-06-27 17:39, Peter Lieven wrote:

Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanilla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works; if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig too deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 work on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a real 3.x kernel to exclude that risk first of all.

Then, bisecting the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...).
that here is basically what's going on:

qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit complicated, but maybe this is already sufficient. otherwise i will of course gather this info as well.

That's only tracing KVM events, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing garbage when this above starts. its basically booting from an empty harddisk.
if i understand correctly, qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just too short. if i understand further correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spent in this state might be rather short. my concern is not that the machine hangs, just that the hypervisor is unresponsive and it is impossible to reset or quit gracefully. the only way to get the hypervisor ended is via SIGKILL.

thanks peter
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
does anyone know what's that here in handle_mmio?

/* hack: Red Hat 7.1 generates these weird accesses. */
if ((addr > 0xa0000 - 4 && addr <= 0xa0000) && kvm_run->mmio.len == 3)
    return 0;

thanks, peter

On 28.06.2012 11:31, Peter Lieven wrote:
On 28.06.2012 11:21, Jan Kiszka wrote:
On 2012-06-28 11:11, Peter Lieven wrote:
On 27.06.2012 18:54, Jan Kiszka wrote:
On 2012-06-27 17:39, Peter Lieven wrote:

Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well?

sorry, i should have been more specific. maybe I also misunderstood sth. I was believing that kvm-kmod-3.0 is basically what is in vanilla kernel 3.0. If I use the ubuntu kernel from ubuntu oneiric (3.0.0) it works; if I use a self-compiled kvm-kmod-3.3/3.4 with that kernel it doesn't. however, maybe we don't have to dig too deep - see below.

kvm-kmod wraps and patches things to make the kvm code from 3.3/3.4 work on an older kernel. This step may introduce bugs of its own. Therefore my suggestion to use a real 3.x kernel to exclude that risk first of all.

Then, bisecting the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...).
that here is basically what's going on:

qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff
qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0
qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)

its doing that forever. this is tracing the kvm module. doing the qemu-system-x86_64 trace is a bit complicated, but maybe this is already sufficient. otherwise i will of course gather this info as well.

That's only tracing KVM events, and it's tracing when things went wrong already. We may need a full trace (-e all) specifically for the period when this pattern above started.

i will do that. maybe i should explain that the vcpu is executing garbage when this above starts. its basically booting from an empty harddisk.
if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just too short. if i understand further correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spent in this state might be rather short.

my concern is not that the machine hangs, just that the hypervisor is unresponsive and it is impossible to reset or quit gracefully. the only way to end the hypervisor is via SIGKILL.

thanks
peter
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 2012-06-28 11:31, Peter Lieven wrote:
[...]
> if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just too short. if i understand further correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spent in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler should provide the required time to the iothread.

> my concern is not that the machine hangs, just that the hypervisor is unresponsive and it is impossible to reset or quit gracefully. the only way to end the hypervisor is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the vm via the monitor etc. If not, that's a bug.

Jan
-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 28.06.2012 11:39, Jan Kiszka wrote:
[...]
>> if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just too short. if i understand further correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spent in this state might be rather short.
>
> Unless you played with priorities and affinities, the Linux scheduler should provide the required time to the iothread.

I have a 1.1GB (85MB compressed) trace file. If you have time to look at it I could drop it somewhere. We currently run all VMs with nice 1 because we observed that this improves the controllability of the node in case all VMs have excessive CPU load. Running the VM un-niced does not change the behaviour, unfortunately.

Peter

>> my concern is not that the machine hangs, just that the hypervisor is unresponsive and it is impossible to reset or quit gracefully. the only way to end the hypervisor is via SIGKILL.
>
> Right. Even if the guest runs wild, you must be able to control the vm via the monitor etc. If not, that's a bug.
>
> Jan
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 28.06.2012 11:39, Jan Kiszka wrote:
[...]
>> if i understand correctly qemu-kvm loops in kvm_cpu_exec(CPUState *env); maybe the time to handle the monitor/qmp connection is just too short. if i understand further correctly, it can only handle monitor connections while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0); or am i wrong here? the time spent in this state might be rather short.
>
> Unless you played with priorities and affinities, the Linux scheduler should provide the required time to the iothread.
>
>> my concern is not that the machine hangs, just that the hypervisor is unresponsive and it is impossible to reset or quit gracefully. the only way to end the hypervisor is via SIGKILL.
>
> Right. Even if the guest runs wild, you must be able to control the vm via the monitor etc. If not, that's a bug.

what i observed just now is that the monitor is working up to the point where i try to quit the hypervisor or try to reset the cpu. so we were looking at a completely wrong place... it seems from this short excerpt that the deadlock appears not on execution but when the vcpus shall be paused.

Program received signal SIGINT, Interrupt.
0x7fc8ec36785c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) thread apply all bt
Thread 4 (Thread
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 2012-06-27 17:39, Peter Lieven wrote:
> Hi all, i debugged this further and found out that kvm-kmod-3.0 is working with qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What is working as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0). Has anyone a clue which new KVM feature could cause this if a vcpu is in an infinite loop?

Before accusing kvm-kmod ;), can you check if the effect is visible with an original Linux 3.3.x or 3.4.x kernel as well? Then, bisecting the change in qemu-kvm that apparently resolved the issue would be interesting. If we have to dig deeper, tracing [1] the lockup would likely be helpful (all events of the qemu process, not just KVM related ones: trace-cmd record -e all qemu-system-x86_64 ...).

Jan

[1] http://www.linux-kvm.org/page/Tracing

---

Hi,

we recently came across multiple VMs hitting a race and stopping to work. It seems to happen when the system is at 100% cpu. One way to reproduce this is: qemu-kvm-1.0.1 with vnc-thread enabled

cmdline (or similar):

/usr/bin/qemu-kvm-1.0.1 -net tap,vlan=141,script=no,downscript=no,ifname=tap15,vnet_hdr -net nic,vlan=141,model=virtio,macaddr=52:54:00:ff:00:f7 -drive format=host_device,file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-efdf4e007-16700198c7f4fead-02-debug-race-hd01,if=virtio,cache=none,aio=native -m 2048 -smp 2,sockets=1,cores=2,threads=1 -monitor tcp:0:4026,server,nowait -vnc :26 -qmp tcp:0:3026,server,nowait -name 02-debug-race -boot order=dc,menu=off -cdrom /home/kvm/cdrom//root/ubuntu-12.04-server-amd64.iso -k de -pidfile /var/run/qemu/vm-221.pid -mem-prealloc -cpu host,+x2apic,model_id=Intel(R) Xeon(R) CPU L5640 @ 2.27GHz,-tsc -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus

it is important that the attached virtio image contains only zeroes. if the system boots from cd, select boot from first harddisk. the hypervisor then hangs at 100% cpu and neither monitor nor qmp are responsive anymore.
i have also seen customers reporting this when a VM is shut down. if this is connected to the threaded vnc server it might be important to be connected at this time. debug backtrace attached.

Thanks,
Peter

--

(gdb) file /usr/bin/qemu-kvm-1.0.1
Reading symbols from /usr/bin/qemu-kvm-1.0.1...done.
(gdb) attach 5145
Attaching to program: /usr/bin/qemu-kvm-1.0.1, process 5145
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7f54d08b9700 (LWP 5253)]
[New Thread 0x7f5552757700 (LWP 5152)]
[New Thread 0x7f5552f58700 (LWP 5151)]
0x7f5553c6b5a3 in select () from /lib/libc.so.6
(gdb) info threads
  4 Thread 0x7f5552f58700 (LWP 5151)  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
  3 Thread 0x7f5552757700 (LWP 5152)  0x7f5553c6a747 in ioctl () from /lib/libc.so.6
  2 Thread 0x7f54d08b9700 (LWP 5253)  0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
* 1 Thread 0x7f50d700 (LWP 5145)  0x7f5553c6b5a3 in select () from /lib/libc.so.6
(gdb) thread apply all bt

Thread 4 (Thread 0x7f5552f58700 (LWP 5151)):
#0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6
#1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f5557652f10, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
#2 0x7f72728a in kvm_cpu_exec (env=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987
#3 0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f5557652f10) at /usr/src/qemu-kvm-1.0.1/cpus.c:740
#4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
#5 0x7f5553c72cdd in clone () from /lib/libc.so.6
#6 0x in ?? ()

Thread 3 (Thread 0x7f5552757700 (LWP 5152)):
#0 0x7f5553c6a747 in ioctl () from /lib/libc.so.6
#1 0x7f727830 in kvm_vcpu_ioctl (env=0x7f555766ae60, type=44672) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:1101
#2 0x7f72728a in kvm_cpu_exec (env=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/kvm-all.c:987
#3 0x7f6f5c08 in qemu_kvm_cpu_thread_fn (arg=0x7f555766ae60) at /usr/src/qemu-kvm-1.0.1/cpus.c:740
#4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
#5 0x7f5553c72cdd in clone () from /lib/libc.so.6
#6 0x in ?? ()

Thread 2 (Thread 0x7f54d08b9700 (LWP 5253)):
#0 0x7f5553f1a85c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x7f679f5d in qemu_cond_wait (cond=0x7f5557ede1e0, mutex=0x7f5557ede210) at qemu-thread-posix.c:113
#2 0x7f6b06a1 in vnc_worker_thread_loop (queue=0x7f5557ede1e0) at ui/vnc-jobs-async.c:222
#3 0x7f6b0b7f in vnc_worker_thread (arg=0x7f5557ede1e0) at ui/vnc-jobs-async.c:318
#4 0x7f5553f159ca in start_thread () from /lib/libpthread.so.0
#5 0x7f5553c72cdd in clone () from /lib/libc.so.6
#6 0x in ?? ()

Thread 1 (Thread