WinPE blue screen in guest w/HyperV enlightenment
I've been trying to get WinPE to boot correctly as a KVM guest. I've found that WinPE > 4.0 (aka Server 2012) will boot fine in KVM, but older versions will not. I only see this issue if the Hyper-V enlightenments are enabled. A screenshot of the crash can be found at: https://dl.dropboxusercontent.com/u/2078961/winpe_kvm.png

Host: CentOS 6 x64
3.10.9-1.el6.x86_64
qemu 1.6.0

/usr/libexec/qemu-kvm -name SRVID4538 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192 -smp 4,sockets=2,cores=16,threads=2 -uuid 4807c195-f10c-404d-b2da-de1b726c19e5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=//var/lib/libvirt/qemu/SRVID4538.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc,driftfix=slew -no-hpet -boot c -usb -drive file=/dev/vmimages/SRVID4538,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc 127.0.0.1:4538,password -k en-us -vga cirrus -device pci-assign,host=06:10.0,id=hostdev0,bus=pci.0,addr=0x3,rombar=1,romfile=/usr/share/gpxe/80861520.rom -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

If I change '-cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000' to '-cpu host', WinPE boots up fine. Is this a bug in KVM? I'm not really sure what other information would be helpful here.
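A minimal sketch of how the offending flag could be bisected, assuming the same WinPE image is available as an ISO; the ISO path, memory size, SMP layout, and VNC display below are placeholders rather than values from the setup above:

# Boot the same WinPE ISO three times, enabling one Hyper-V flag per run,
# to narrow down which enlightenment triggers the blue screen.
# Each run blocks until that guest is powered off.
for flags in hv_relaxed hv_vapic hv_spinlocks=0x1000; do
    echo "=== testing -cpu host,$flags ==="
    /usr/libexec/qemu-kvm -machine pc-i440fx-1.6,accel=kvm,usb=off \
        -cpu host,$flags -m 2048 -smp 4 \
        -drive file=/path/to/winpe.iso,media=cdrom,readonly=on \
        -boot d -vga cirrus -vnc 127.0.0.1:99
done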
Further Windows performance optimizations?
Are there any optimizations that I can do for EOI/APIC for a Windows 2008R2 guest? I'm seeing a significant amount of kernel CPU usage from kvm_ioapic_update_eoi. I can't seem to find any information on further optimizations for this.

Sample of trace output: https://gist.github.com/devicenull/d1a918879d38955053dd/raw/3aed63b8e60e98c3e7fe21a42ca123d8bf309e0c/trace

Host setup:
3.10.9-1.el6.x86_64 #1 SMP Tue Aug 27 15:27:08 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux, with this patchset applied: http://www.spinics.net/lists/kvm/msg91214.html
CentOS 6
qemu 1.6.0 (also patched with the above enlightenment)
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores including hyperthread cores)
24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
20 GB memory
VirtIO disk drivers
SR-IOV for network (with Intel I350 network chipset)

/usr/libexec/qemu-kvm -name VMID109 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 20480 -smp 24,sockets=1,cores=12,threads=2 -uuid 6a7517f5-3b1c-43c2-aa71-96b143356b3d -no-user-config -nodefaults -chardev socket,id=charmonitor,path=//var/lib/libvirt/qemu/VMID109.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc,driftfix=slew -no-hpet -boot c -usb -drive file=/dev/vmimages/VMID109,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc 127.0.0.1:109 -k en-us -vga cirrus -device pci-assign,host=02:10.0,id=hostdev0,bus=pci.0,addr=0x3,rombar=1,romfile=/usr/share/gpxe/80861520.rom -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

I removed a bunch of empty entries from below:

# perf stat -e 'kvm:*' -a sleep 1m

 Performance counter stats for 'sleep 1m':

         9,707,680  kvm:kvm_entry              [100.00%]
             8,199  kvm:kvm_hv_hypercall       [100.00%]
           188,418  kvm:kvm_pio                [100.00%]
                 6  kvm:kvm_cpuid              [100.00%]
         3,983,787  kvm:kvm_apic               [100.00%]
         9,715,744  kvm:kvm_exit               [100.00%]
         4,028,354  kvm:kvm_inj_virq           [100.00%]
         3,245,823  kvm:kvm_msr                [100.00%]
           185,573  kvm:kvm_pic_set_irq        [100.00%]
           741,665  kvm:kvm_apic_ipi           [100.00%]
         2,518,242  kvm:kvm_apic_accept_irq    [100.00%]
         2,506,003  kvm:kvm_eoi                [100.00%]
           125,532  kvm:kvm_emulate_insn       [100.00%]
           187,912  kvm:kvm_userspace_exit     [100.00%]
           309,091  kvm:kvm_set_irq            [100.00%]
           186,014  kvm:kvm_ioapic_set_irq     [100.00%]
           124,458  kvm:kvm_msi_set_irq        [100.00%]
         1,475,484  kvm:kvm_ack_irq            [100.00%]
         1,295,360  kvm:kvm_fpu                [100.00%]

      60.001063613 seconds time elapsed

perf top -G output:

-  25.65%  [kernel]  [k] _raw_spin_lock
   - _raw_spin_lock
      - 98.63% kvm_ioapic_update_eoi
           kvm_ioapic_send_eoi
           apic_set_eoi
           apic_reg_write
           kvm_hv_vapic_msr_write
           set_msr_hyperv
           kvm_set_msr_common
           vmx_set_msr
           handle_wrmsr
           vmx_handle_exit
           vcpu_enter_guest
           __vcpu_run
           kvm_arch_vcpu_ioctl_run
           kvm_vcpu_ioctl
           do_vfs_ioctl
           SyS_ioctl
           system_call_fastpath
         + __GI___ioctl
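For reference, a collection recipe along these lines should reproduce the numbers above; the exact event selection and durations are my own choice rather than anything prescribed, and "trace" is just an assumed name for the saved trace file:

# System-wide KVM tracepoint counts over one minute (matches the perf stat output above)
perf stat -e 'kvm:*' -a sleep 1m

# Call-graph samples covering the _raw_spin_lock / kvm_ioapic_update_eoi path
perf record -a -g -- sleep 30
perf report

# Raw trace of just the EOI-related events, if the full kvm:* firehose is too noisy
trace-cmd record -e kvm:kvm_eoi -e kvm:kvm_apic -e kvm:kvm_msr sleep 10
trace-cmd report > trace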
Re: Windows Server 2008R2 KVM guest performance issues
On 8/27/2013 11:09 AM, Paolo Bonzini wrote:
> On 27/08/2013 16:44, Brian Rak wrote:
>>> On 26/08/2013 21:15, Brian Rak wrote:
>>>> Samples: 62M of event 'cycles', Event count (approx.): 642019289177
>>>>  64.69%  [kernel]             [k] _raw_spin_lock
>>>>   2.59%  qemu-system-x86_64   [.] 0x001e688d
>>>>   1.90%  [kernel]             [k] native_write_msr_safe
>>>>   0.84%  [kvm]                [k] vcpu_enter_guest
>>>>   0.80%  [kernel]             [k] __schedule
>>>>   0.77%  [kvm_intel]          [k] vmx_vcpu_run
>>>>   0.68%  [kernel]             [k] effective_load
>>>>   0.65%  [kernel]             [k] update_cfs_shares
>>>>   0.62%  [kernel]             [k] _raw_spin_lock_irq
>>>>   0.61%  [kernel]             [k] native_read_msr_safe
>>>>   0.56%  [kernel]             [k] enqueue_entity
>>>
>>> Can you capture the call graphs, too (perf record -g)?
>>
>> Sure. I'm not entirely certain how to use perf effectively. I've used
>> `perf record`, then manually expanded the call stacks in `perf report`.
>> If this isn't what you wanted, please let me know.
>>
>> https://gist.github.com/devicenull/7961f23e6756b647a86a/raw/a04718db2c26b31e50fb7f521d47d911610383d8/gistfile1.txt
>
> This is actually quite useful!
>
> -  41.41%  qemu-system-x86  [kernel.kallsyms]  0x815ef6d5  k [k] _raw_spin_lock
>    - _raw_spin_lock
>       - 48.06% futex_wait_setup
>            futex_wait
>            do_futex
>            SyS_futex
>            system_call_fastpath
>          - __lll_lock_wait
>               99.32% 0x1010002
>       - 44.71% futex_wake
>            do_futex
>            SyS_futex
>            system_call_fastpath
>          - __lll_unlock_wake
>               99.33% 0x1010002
>
> This could be multiple VCPUs competing on QEMU's "big lock" because the
> pmtimer is being read by different VCPUs at the same time.  This can be
> fixed, and probably will be in 1.7 or 1.8.

I've successfully applied the patch set, and have seen significant performance increases. Kernel CPU usage is no longer half of all CPU usage, and my insn_emulation counts are down to ~2000/s rather than 20,000/s.

I did end up having to patch qemu in a terrible way in order to get this working. I've just enabled the TSC optimizations whenever hv_vapic is enabled. This is far from the best way of doing it, but I'm not really a C developer and we'll always want the TSC optimizations on our Windows VMs. In case anyone wants to do the same, it's a pretty simple patch:

*** clean/qemu-1.6.0/target-i386/kvm.c  2013-08-15 15:56:23.0 -0400
--- qemu-1.6.0/target-i386/kvm.c        2013-08-27 11:08:21.388841555 -0400
*************** int kvm_arch_init_vcpu(CPUState *cs)
*** 477,482 ****
--- 477,484 ----
          if (hyperv_vapic_recommended()) {
              c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
              c->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
+             c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
+             c->eax |= 0x200;
          }

          c = &cpuid_data.entries[cpuid_i++];

It also seems that if you have useplatformclock=yes in the guest, it will not use the enlightened TSC. `bcdedit /set useplatformclock no` and a reboot will correct that.

Are there any sort of guidelines for what I should be seeing from kvm_stat? This is pretty much average for me now:

exits                1362839114   195453
fpu_reload                11016    34100
halt_exits            187767718    33222
halt_wakeup           198400078    35628
host_state_reload     222907845    36212
insn_emulation         22108942     2091
io_exits               32094455     3132
irq_exits              88852031    15855
irq_injections        332358611    60694
irq_window             61495812    12125

(all the other counters do not change frequently)

The only real way I know to judge things is based on the performance of the guest. Are there any sort of thresholds for these numbers that would indicate a problem?
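One follow-up note on the hack above, hedged because I haven't verified it on this exact host: newer QEMU releases (1.7 and later, as far as I can tell) are expected to expose the reference time counter as a proper hv_time cpu flag, which should make the recompile unnecessary; the host kernel still needs the reference-counter support from the patchset linked earlier. Something along these lines, assuming your QEMU build actually recognizes the flag:

/usr/libexec/qemu-kvm ... \
    -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000,hv_time \
    ...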
Re: Windows Server 2008R2 KVM guest performance issues
On 8/27/2013 3:18 AM, Paolo Bonzini wrote:
> On 26/08/2013 21:15, Brian Rak wrote:
>> Samples: 62M of event 'cycles', Event count (approx.): 642019289177
>>  64.69%  [kernel]             [k] _raw_spin_lock
>>   2.59%  qemu-system-x86_64   [.] 0x001e688d
>>   1.90%  [kernel]             [k] native_write_msr_safe
>>   0.84%  [kvm]                [k] vcpu_enter_guest
>>   0.80%  [kernel]             [k] __schedule
>>   0.77%  [kvm_intel]          [k] vmx_vcpu_run
>>   0.68%  [kernel]             [k] effective_load
>>   0.65%  [kernel]             [k] update_cfs_shares
>>   0.62%  [kernel]             [k] _raw_spin_lock_irq
>>   0.61%  [kernel]             [k] native_read_msr_safe
>>   0.56%  [kernel]             [k] enqueue_entity
>
> Can you capture the call graphs, too (perf record -g)?

Sure. I'm not entirely certain how to use perf effectively. I've used `perf record`, then manually expanded the call stacks in `perf report`. If this isn't what you wanted, please let me know.

https://gist.github.com/devicenull/7961f23e6756b647a86a/raw/a04718db2c26b31e50fb7f521d47d911610383d8/gistfile1.txt

>> I've captured 20,000 lines of kvm trace output. This can be found
>> https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace
>
> The guest is doing quite a lot of exits per second, mostly to (a) access
> the ACPI timer (b) service NMIs.  In fact, every NMI is reading the
> timer too and causing an exit to QEMU.  So it is also possible that you
> have to debug this inside the guest, to see if these exits are expected
> or not.

Do you have any suggestions for how I would do this? Given that the guest is Windows, I'm not certain how I could even begin to debug this.

Also, for that patch set I found, do I also need a patch for qemu to actually enable the new enlightenment? I haven't been able to find anything for qemu that matches that patch. I did find http://www.mail-archive.com/kvm@vger.kernel.org/msg82495.html , but that's from significantly before the patchset, so I can't tell if that's still related.
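Before trying to debug inside the guest, a crude host-side summary of the captured trace at least shows the split between the timer reads and the NMI-labelled exits Paolo mentions. This just pattern-matches the kvm_exit and kvm_pio lines from the gist linked above, assuming the trace was saved to a local file named "trace":

# Tally VM exit reasons in the saved trace
grep -o 'reason [A-Z_]*' trace | sort | uniq -c | sort -rn

# Break down the I/O port reads by port
grep -o 'pio_read at 0x[0-9a-f]*' trace | sort | uniq -c | sort -rn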
Re: Windows Server 2008R2 KVM guest performance issues
On 8/27/2013 3:38 AM, Gleb Natapov wrote:
> On Tue, Aug 27, 2013 at 09:18:00AM +0200, Paolo Bonzini wrote:
>>> I've captured 20,000 lines of kvm trace output. This can be found
>>> https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace
>>
>> The guest is doing quite a lot of exits per second, mostly to (a) access
>> the ACPI timer
>
> I see a lot of PM timer access, not ACPI timer. The solution for that is
> the patchset Brian linked.
>
>> (b) service NMIs.  In fact, every NMI is reading the timer too and
>> causing an exit to QEMU.
>
> Do you mean "kvm_exit: reason EXCEPTION_NMI rip 0xf800016dcf84 info 0 8307"?
> Those are not NMIs (a single NMI would kill Windows); they are #NM
> exceptions. Brian, does your workload use floating point calculations?

Yes, our workload uses floating point heavily. I'd also strongly suspect it's doing various things with timers quite frequently. (This is all third party software, so I don't have the source to examine to determine exactly what it's doing.)
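To put a rough number on the floating point angle, the rate of FPU state reloads the host is doing can be sampled directly from the kvm:kvm_fpu tracepoint; this is just a sanity check one could run under the usual workload, not something suggested upthread:

# FPU load/unload events and total exits over ten seconds, system wide
perf stat -e kvm:kvm_fpu -e kvm:kvm_exit -a sleep 10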
Re: Windows Server 2008R2 KVM guest performance issues
On 8/26/2013 3:15 PM, Brian Rak wrote:
> I've been trying to track down the cause of some serious performance
> issues with a Windows 2008R2 KVM guest. So far, I've been unable to
> determine what exactly is causing the issue.
>
> When the guest is under load, I see very high kernel CPU usage, as well
> as terrible guest performance. The workload on the guest is
> approximately 1/4 of what we'd run unvirtualized on the same hardware.
> Even at that level, we max out every vCPU in the guest. While the guest
> runs, I see very high kernel CPU usage (based on `htop` output).
>
> Host setup:
> Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
> CentOS 6
> qemu 1.6.0
> 2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores including hyperthread cores)
> 24GB memory
> swap file is enabled, but unused
>
> Guest setup:
> Windows Server 2008R2 (64 bit)
> 24 vCPUs
> 16 GB memory
> VirtIO disk and network drivers installed
>
> /qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 24,sockets=1,cores=12,threads=2 -uuid 90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc,driftfix=slew -no-hpet -boot c -usb -drive file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 -vnc 127.0.0.1:100 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
>
> The beginning of `perf top` output:
>
> Samples: 62M of event 'cycles', Event count (approx.): 642019289177
>  64.69%  [kernel]             [k] _raw_spin_lock
>   2.59%  qemu-system-x86_64   [.] 0x001e688d
>   1.90%  [kernel]             [k] native_write_msr_safe
>   0.84%  [kvm]                [k] vcpu_enter_guest
>   0.80%  [kernel]             [k] __schedule
>   0.77%  [kvm_intel]          [k] vmx_vcpu_run
>   0.68%  [kernel]             [k] effective_load
>   0.65%  [kernel]             [k] update_cfs_shares
>   0.62%  [kernel]             [k] _raw_spin_lock_irq
>   0.61%  [kernel]             [k] native_read_msr_safe
>   0.56%  [kernel]             [k] enqueue_entity
>
> I've captured 20,000 lines of kvm trace output. This can be found
> https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace
>
> So far, I've tried the following with very little effect:
> * Disable HPET on the guest
> * Enable hv_relaxed, hv_vapic, hv_spinlocks
> * Enable SR-IOV
> * Pin vCPUs to physical CPUs
> * Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
> * bcdedit /set useplatformclock yes and no
>
> Any suggestions as to what I can do to get better performance out of
> this guest? Or reasons why I'm seeing such high kernel CPU usage with it?

I've done some additional research on this, and I believe that 'kvm_pio: pio_read at 0xb008 size 4 count 1' is related to Windows trying to read the PM timer. This timer appears to use the TSC in some cases (I think).
I found this patchset: http://www.spinics.net/lists/kvm/msg91214.html , which doesn't appear to be applied yet. Does it seem reasonable that this patchset would eliminate the need for Windows to read from the PM timer continuously?
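For what it's worth, 0xb008 is the ACPI PM timer register on the emulated PIIX4 (PM base 0xb000 plus offset 8), so the volume of those reads is easy to quantify from the captured trace. A check like this, run before and after applying the patchset, would show whether the guest actually stops polling the port; "trace" is just an assumed name for a local copy of the trace from the gist above:

# Count PM timer reads in the captured window
grep -c 'pio_read at 0xb008' trace

# Rough per-task breakdown (the first column of ftrace/trace-cmd output is the task name)
grep 'pio_read at 0xb008' trace | awk '{print $1}' | sort | uniq -c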
Windows Server 2008R2 KVM guest performance issues
I've been trying to track down the cause of some serious performance issues with a Windows 2008R2 KVM guest. So far, I've been unable to determine what exactly is causing the issue.

When the guest is under load, I see very high kernel CPU usage, as well as terrible guest performance. The workload on the guest is approximately 1/4 of what we'd run unvirtualized on the same hardware. Even at that level, we max out every vCPU in the guest. While the guest runs, I see very high kernel CPU usage (based on `htop` output).

Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores including hyperthread cores)
24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed

/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 24,sockets=1,cores=12,threads=2 -uuid 90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc,driftfix=slew -no-hpet -boot c -usb -drive file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 -vnc 127.0.0.1:100 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]             [k] _raw_spin_lock
  2.59%  qemu-system-x86_64   [.] 0x001e688d
  1.90%  [kernel]             [k] native_write_msr_safe
  0.84%  [kvm]                [k] vcpu_enter_guest
  0.80%  [kernel]             [k] __schedule
  0.77%  [kvm_intel]          [k] vmx_vcpu_run
  0.68%  [kernel]             [k] effective_load
  0.65%  [kernel]             [k] update_cfs_shares
  0.62%  [kernel]             [k] _raw_spin_lock_irq
  0.61%  [kernel]             [k] native_read_msr_safe
  0.56%  [kernel]             [k] enqueue_entity

I've captured 20,000 lines of kvm trace output. This can be found at https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace

So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no

Any suggestions as to what I can do to get better performance out of this guest? Or reasons why I'm seeing such high kernel CPU usage with it?
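One clarification on the pinning item in the list above: with libvirt, "pin vCPUs to physical CPUs" amounts to something like the sketch below. The domain name and the simple 1:1 mapping are illustrative only; on a hyperthreaded host like this one, the sensible mapping also depends on which host CPUs are siblings.

# Pin each of the 24 vCPUs of VMID100 to one host CPU thread
for i in $(seq 0 23); do
    virsh vcpupin VMID100 $i $i
done

# Verify the resulting affinity
virsh vcpuinfo VMID100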