I sent the email below to smartos-discuss a week ago but haven't gotten a response yet. Since I suspect this issue applies to KVM on Illumos in general, I'm sending a slightly modified copy of this email to this list as well. Please let me know if there's any additional information I can provide, or if there are any diagnostics I should run (e.g. DTrace scripts) next time this problem happens.
Last week I ran into an issue with two of my KVM VMs running on SmartOS 20120629T002039Z temporarily becoming unresponsive. I believe the same problem happened again later that week, although I haven't been able to confirm it. One of the VMs is running Debian Linux Wheezy, and the other Windows Server 2008 R2. The Debian VM became unresponsive while performing apt-get dist-upgrade. While the VMs were unresponsive, executing "vmadm info" would just hang, and each qemu process consumed 100% of one CPU. Eventually both VMs came back up (not at the same time) without requiring a reboot, at which point the load went back to normal. At one point, after the Linux VM had come back up, running vmadm info on the Windows VM would actually return a response, but it would say: "Unable to get VM info for <uuid>: Unable to get info from vmadmd, query returned 500." Eventually this error message went away. I've had VMs running on this machine for a few weeks now and, to my knowledge, hadn't run into this problem before.
/var/adm/messages showed some messages during the time that the VMs froze:

2012-07-16T16:14:54.776339+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:14:54.776362+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0x50866b
2012-07-16T16:14:54.776374+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:14:54.776643+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0xfe75e1c0
2012-07-16T16:14:54.776656+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:14:54.776669+00:00 virt1 kvm: [ID 291337 kern.info] vcpu 1 received sipi with vector # 10
2012-07-16T16:14:54.776673+00:00 virt1 kvm: [ID 420667 kern.info] kvm_lapic_reset: vcpu=ffffff04eb294000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
2012-07-16T16:25:38.233767+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0xff29b633
2012-07-16T16:25:38.233795+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:25:38.233862+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0x50866b
2012-07-16T16:25:38.233876+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:25:38.234174+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0xfe75ea30
2012-07-16T16:25:38.234191+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x3 data 0
2012-07-16T16:25:38.262789+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0x50866b
2012-07-16T16:25:38.262824+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:25:38.262874+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0x50866b
2012-07-16T16:25:38.262902+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x0 data 0
2012-07-16T16:25:38.263185+00:00 virt1 kvm: [ID 713435 kern.info] unhandled rdmsr: 0xfe75ea30
2012-07-16T16:25:38.263203+00:00 virt1 kvm: [ID 391722 kern.info] unhandled wrmsr: 0x3 data 0
2012-07-16T16:31:45+00:00 virt1 stop-F[3132]: [ID 702911 local0.error] 45.608 connect ECONNREFUSED

The server has a Xeon E3-1230 CPU on a SuperMicro X9SCM-F motherboard.

Alex
