WinPE blue screen in guest w/HyperV enlightenment

2013-10-10 Thread Brian Rak
I've been trying to get WinPE to boot correctly as a KVM guest.  I've
found that WinPE 4.0 (aka Server 2012) and newer boots fine in KVM, but
older versions will not.  I only see this issue if the Hyper-V
enlightenments are enabled.


A screenshot of the crash can be found at: 
https://dl.dropboxusercontent.com/u/2078961/winpe_kvm.png


Host:
CentOS 6 x64
3.10.9-1.el6.x86_64
qemu 1.6.0
/usr/libexec/qemu-kvm -name SRVID4538 -S -machine 
pc-i440fx-1.6,accel=kvm,usb=off -cpu 
host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 8192 -smp 
4,sockets=2,cores=16,threads=2 -uuid 
4807c195-f10c-404d-b2da-de1b726c19e5 -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=//var/lib/libvirt/qemu/SRVID4538.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc 
base=utc,driftfix=slew -no-hpet -boot c -usb -drive 
file=/dev/vmimages/SRVID4538,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native 
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc 
127.0.0.1:4538,password -k en-us -vga cirrus -device 
pci-assign,host=06:10.0,id=hostdev0,bus=pci.0,addr=0x3,rombar=1,romfile=/usr/share/gpxe/80861520.rom 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


If I change '-cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000' to '-cpu 
host', WinPE boots up fine.
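(A minimal sketch of how this could be narrowed down further, assuming the same
qemu-kvm 1.6 binary; the WinPE ISO path and VNC display below are placeholders:)

/usr/libexec/qemu-kvm -machine pc-i440fx-1.6,accel=kvm -m 2048 \
    -cpu host,hv_relaxed -cdrom /path/to/winpe3.iso -boot d -vnc 127.0.0.1:99
# repeat with -cpu host,hv_vapic and -cpu host,hv_spinlocks=0x1000 to see
# which enlightenment (or combination) triggers the blue screen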


Is this a bug in KVM?  I'm not really sure what other information would 
be helpful here.




Further Windows performance optimizations?

2013-09-06 Thread Brian Rak
Are there any optimizations that I can do for EOI/APIC for a Windows 
2008R2 guest?  I'm seeing a significant amount of kernel CPU usage from 
kvm_ioapic_update_eoi.  I can't seem to find any information on further 
optimizations for this.
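(A minimal sketch for quantifying just the EOI/APIC traffic, using the standard
KVM tracepoints that also appear in the full perf stat output below; the
10-second window is arbitrary:)

perf stat -e kvm:kvm_eoi,kvm:kvm_apic,kvm:kvm_apic_accept_irq,kvm:kvm_ioapic_set_irq -a sleep 10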


Sample of trace output: 
https://gist.github.com/devicenull/d1a918879d38955053dd/raw/3aed63b8e60e98c3e7fe21a42ca123d8bf309e0c/trace


Host setup:
3.10.9-1.el6.x86_64 #1 SMP Tue Aug 27 15:27:08 EDT 2013 x86_64 x86_64 
x86_64 GNU/Linux with this patchset applied: 
http://www.spinics.net/lists/kvm/msg91214.html

CentOS 6
qemu 1.6.0 (also patched with the above enlightenment)
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores 
including hyperthread cores)

24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
20 GB memory
VirtIO disk drivers
SR-IOV for network (with Intel I350 network chipset)
/usr/libexec/qemu-kvm -name VMID109 -S -machine 
pc-i440fx-1.6,accel=kvm,usb=off -cpu 
host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 20480 -smp 
24,sockets=1,cores=12,threads=2 -uuid 
6a7517f5-3b1c-43c2-aa71-96b143356b3d -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=//var/lib/libvirt/qemu/VMID109.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc 
base=utc,driftfix=slew -no-hpet -boot c -usb -drive 
file=/dev/vmimages/VMID109,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native 
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -vnc 
127.0.0.1:109 -k en-us -vga cirrus -device 
pci-assign,host=02:10.0,id=hostdev0,bus=pci.0,addr=0x3,rombar=1,romfile=/usr/share/gpxe/80861520.rom 
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5



I removed a bunch of empty entries from below:
# perf stat -e 'kvm:*' -a sleep 1m

 Performance counter stats for 'sleep 1m':

         9,707,680      kvm:kvm_entry                                [100.00%]
             8,199      kvm:kvm_hv_hypercall                         [100.00%]
           188,418      kvm:kvm_pio                                  [100.00%]
                 6      kvm:kvm_cpuid                                [100.00%]
         3,983,787      kvm:kvm_apic                                 [100.00%]
         9,715,744      kvm:kvm_exit                                 [100.00%]
         4,028,354      kvm:kvm_inj_virq                             [100.00%]
         3,245,823      kvm:kvm_msr                                  [100.00%]
           185,573      kvm:kvm_pic_set_irq                          [100.00%]
           741,665      kvm:kvm_apic_ipi                             [100.00%]
         2,518,242      kvm:kvm_apic_accept_irq                      [100.00%]
         2,506,003      kvm:kvm_eoi                                  [100.00%]
           125,532      kvm:kvm_emulate_insn                         [100.00%]
           187,912      kvm:kvm_userspace_exit                       [100.00%]
           309,091      kvm:kvm_set_irq                              [100.00%]
           186,014      kvm:kvm_ioapic_set_irq                       [100.00%]
           124,458      kvm:kvm_msi_set_irq                          [100.00%]
         1,475,484      kvm:kvm_ack_irq                              [100.00%]
         1,295,360      kvm:kvm_fpu                                  [100.00%]

      60.001063613 seconds time elapsed

perf top -G output:

-  25.65%  [kernel]  [k] _raw_spin_lock
   - _raw_spin_lock
      - 98.63% kvm_ioapic_update_eoi
           kvm_ioapic_send_eoi
           apic_set_eoi
           apic_reg_write
           kvm_hv_vapic_msr_write
           set_msr_hyperv
           kvm_set_msr_common
           vmx_set_msr
           handle_wrmsr
           vmx_handle_exit
           vcpu_enter_guest
           __vcpu_run
           kvm_arch_vcpu_ioctl_run
           kvm_vcpu_ioctl
           do_vfs_ioctl
           SyS_ioctl
           system_call_fastpath
        + __GI___ioctl
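(For reference, a minimal sketch of capturing the same call chains offline
instead of interactively in perf top; the 30-second window and output file
name are arbitrary:)

perf record -g -a -- sleep 30
perf report --stdio > callgraph.txt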



Re: Windows Server 2008R2 KVM guest performance issues

2013-08-27 Thread Brian Rak


On 8/27/2013 11:09 AM, Paolo Bonzini wrote:

On 27/08/2013 16:44, Brian Rak wrote:

On 26/08/2013 21:15, Brian Rak wrote:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

Can you capture the call graphs, too (perf record -g)?

Sure.  I'm not entirely certain how to use perf effectively.  I've used
`perf record`, then manually expanded the call stacks in `perf report`.
If this isn't what you wanted, please let me know.

https://gist.github.com/devicenull/7961f23e6756b647a86a/raw/a04718db2c26b31e50fb7f521d47d911610383d8/gistfile1.txt


This is actually quite useful!

-  41.41%  qemu-system-x86  [kernel.kallsyms]   
  0x815ef6d5 k [k] _raw_spin_lock
- _raw_spin_lock
   - 48.06% futex_wait_setup
futex_wait
do_futex
SyS_futex
system_call_fastpath
  - __lll_lock_wait
   99.32% 0x1010002
   - 44.71% futex_wake
do_futex
SyS_futex
system_call_fastpath
  - __lll_unlock_wake
   99.33% 0x1010002

This could be multiple VCPUs competing on QEMU's "big lock" because the pmtimer
is being read by different VCPUs at the same time.  This can be fixed, and
probably will in 1.7 or 1.8.
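(A minimal sketch of confirming this from the host: profile only the QEMU
process and look for the futex path.  The pgrep pattern is an assumption based
on the command line quoted elsewhere in this thread, and expects a single match:)

perf record -g -p "$(pgrep -of 'qemu-system-x86_64 -name VMID100')" -- sleep 30
perf report --stdio | grep -B2 -A10 futex_wait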



I've successfully applied the patch set, and have seen significant
performance increases.  Kernel CPU usage is no longer half of all CPU
usage, and my insn_emulation counts are down to ~2,000/s rather than
20,000/s.


I did end up having to patch qemu in a terrible way in order to get this 
working. I've just enabled the TSC optimizations whenever hv_vapic is 
enabled.  This is far from the best way of doing it, but I'm not really 
a C developer and we'll always want the TSC optimizations on our Windows
VMs.  In case anyone wants to do the same, it's a pretty simple patch:


*** clean/qemu-1.6.0/target-i386/kvm.c  2013-08-15 15:56:23.0 -0400
--- qemu-1.6.0/target-i386/kvm.c        2013-08-27 11:08:21.388841555 -0400
*************** int kvm_arch_init_vcpu(CPUState *cs)
*** 477,482 ****
--- 477,484 ----
          if (hyperv_vapic_recommended()) {
              c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
              c->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
+             c->eax |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
+             c->eax |= 0x200;
          }

      c = &cpuid_data.entries[cpuid_i++];

It also seems that if you have useplatformclock=yes in the guest, it
will not use the enlightened TSC.  `bcdedit /set useplatformclock no`
and a reboot will correct that.


Are there any sort of guidelines for what I should be seeing from 
kvm_stat?  This is pretty much average for me now:


 exits                  1362839114   195453
 fpu_reload                  11016    34100
 halt_exits              187767718    33222
 halt_wakeup             198400078    35628
 host_state_reload       222907845    36212
 insn_emulation           22108942     2091
 io_exits                 32094455     3132
 irq_exits                88852031    15855
 irq_injections          332358611    60694
 irq_window               61495812    12125

(all the other ones do not change frequently)

The only real way I know to judge things is based on the performance of 
the guest.  Are there any sort of thresholds for these numbers that 
would indicate a problem?
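(For what it's worth, these counters come straight from KVM's debugfs, so rough
per-second rates can be sampled directly; a minimal sketch, assuming debugfs is
mounted at /sys/kernel/debug and noting that the counters are global to the
host, not per-VM:)

#!/bin/bash
# Sample a few KVM debugfs counters and print 10-second rates.
counters="exits insn_emulation io_exits irq_injections halt_exits"
declare -A before
for c in $counters; do before[$c]=$(cat /sys/kernel/debug/kvm/$c); done
sleep 10
for c in $counters; do
    now=$(cat /sys/kernel/debug/kvm/$c)
    echo "$c: $(( (now - before[$c]) / 10 ))/s"
done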






Re: Windows Server 2008R2 KVM guest performance issues

2013-08-27 Thread Brian Rak


On 8/27/2013 3:18 AM, Paolo Bonzini wrote:

On 26/08/2013 21:15, Brian Rak wrote:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

Can you capture the call graphs, too (perf record -g)?


Sure.  I'm not entirely certain how to use perf effectively.  I've used
`perf record`, then manually expanded the call stacks in `perf report`.
If this isn't what you wanted, please let me know.


https://gist.github.com/devicenull/7961f23e6756b647a86a/raw/a04718db2c26b31e50fb7f521d47d911610383d8/gistfile1.txt


I've captured 20,000 lines of kvm trace output.  This can be found
https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace

The guest is doing quite a lot of exits per second, mostly to (a) access
the ACPI timer (b) service NMIs.  In fact, every NMI is reading the
timer too and causing an exit to QEMU.

So it is also possible that you have to debug this inside the guest, to
see if these exits are expected or not.

Do you have any suggestions for how I would do this?  Given that the
guest is Windows, I'm not certain how I could even begin to debug this.
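(One host-side starting point, sketched on the assumption that the 3.10-era
perf includes the `perf kvm stat` subcommand: break the guest's exits down by
exit reason and see whether anything beyond the timer/#NM traffic stands out.
The pgrep pattern is an assumption based on the command line quoted elsewhere
in this thread:)

perf kvm stat record -p "$(pgrep -of 'qemu-system-x86_64 -name VMID100')" -- sleep 30
perf kvm stat report --event=vmexit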



Also, for that patch set I found, do I also need a patch for qemu to 
actually enable the new enlightenment? I haven't been able to find 
anything for qemu that matches that patch.  I did find 
http://www.mail-archive.com/kvm@vger.kernel.org/msg82495.html , but 
that's from significantly before the patchset, so I can't tell if that's 
still related.



Re: Windows Server 2008R2 KVM guest performance issues

2013-08-27 Thread Brian Rak


On 8/27/2013 3:38 AM, Gleb Natapov wrote:

On Tue, Aug 27, 2013 at 09:18:00AM +0200, Paolo Bonzini wrote:

I've captured 20,000 lines of kvm trace output.  This can be found
https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace

The guest is doing quite a lot of exits per second, mostly to (a) access
the ACPI timer

I see a lot of PM timer accesses, not ACPI timer accesses. The solution for that is
the patchset Brian linked.


 (b) service NMIs.  In fact, every NMI is reading the
timer too and causing an exit to QEMU.


Do you mean "kvm_exit: reason EXCEPTION_NMI rip 0xf800016dcf84 info
0 8307"? Those are not NMIs (a single NMI would kill Windows); they are #NM
exceptions. Brian, does your workload use floating point calculations?


Yes, our workload uses floating point heavily.  I'd also strongly 
suspect it's doing various things with timers quite frequently. (This is 
all third party software, so I don't have the source to examine to 
determine exactly what it's doing).
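(If it helps, a rough way to see how much of the exit volume tracks the
lazy-FPU path is to count the standard kvm_fpu tracepoint alongside total
exits; a minimal sketch over an arbitrary 10-second window:)

perf stat -e kvm:kvm_exit,kvm:kvm_fpu -a sleep 10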



Re: Windows Server 2008R2 KVM guest performance issues

2013-08-26 Thread Brian Rak

On 8/26/2013 3:15 PM, Brian Rak wrote:
I've been trying to track down the cause of some serious performance 
issues with a Windows 2008R2 KVM guest.  So far, I've been unable to 
determine what exactly is causing the issue.


When the guest is under load, I see very high kernel CPU usage, as 
well as terrible guest performance.  The workload on the guest is 
approximately 1/4 of what we'd run unvirtualized on the same 
hardware.  Even at that level, we max out every vCPU in the guest. 
While the guest runs, I see very high kernel CPU usage (based on 
`htop` output).



Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 
2013 x86_64 x86_64 x86_64 GNU/Linux

CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 
cores including hyperthread cores)

24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed
/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine 
pc-i440fx-1.6,accel=kvm,usb=off -cpu 
host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 
24,sockets=1,cores=12,threads=2 -uuid 
90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc 
base=utc,driftfix=slew -no-hpet -boot c -usb -drive 
file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native 
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 
-netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 
-vnc 127.0.0.1:100 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

I've captured 20,000 lines of kvm trace output.  This can be found 
https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace 



So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no


Any suggestions as to what I can do to get better performance out of
this guest?  Or reasons why I'm seeing such high kernel CPU usage with it?



I've done some additional research on this, and I believe that 'kvm_pio:
pio_read at 0xb008 size 4 count 1' is related to Windows trying to read
the PM timer.  This timer appears to use the TSC in some cases (I
think).  I found this patchset:
http://www.spinics.net/lists/kvm/msg91214.html which doesn't appear to
be applied yet.  Does it seem reasonable that this patchset would
eliminate the need for Windows to read from the PM timer continuously?
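(A rough way to confirm how much of the captured trace is PM-timer traffic,
assuming the gist above has been saved locally as 'trace':)

grep -c 'pio_read at 0xb008' trace   # PM-timer reads
wc -l trace                          # total trace lines for comparison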



Windows Server 2008R2 KVM guest performance issues

2013-08-26 Thread Brian Rak
I've been trying to track down the cause of some serious performance 
issues with a Windows 2008R2 KVM guest.  So far, I've been unable to 
determine what exactly is causing the issue.


When the guest is under load, I see very high kernel CPU usage, as well 
as terrible guest performance.  The workload on the guest is 
approximately 1/4 of what we'd run unvirtualized on the same hardware.  
Even at that level, we max out every vCPU in the guest. While the guest 
runs, I see very high kernel CPU usage (based on `htop` output).



Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 
2013 x86_64 x86_64 x86_64 GNU/Linux

CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores 
including hyperthread cores)

24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed
/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine 
pc-i440fx-1.6,accel=kvm,usb=off -cpu 
host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 
24,sockets=1,cores=12,threads=2 -uuid 
90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=readline -rtc 
base=utc,driftfix=slew -no-hpet -boot c -usb -drive 
file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native 
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 
-netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 
-vnc 127.0.0.1:100 -k en-us -vga cirrus -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

I've captured 20,000 lines of kvm trace output.  This can be found 
https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace


So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no


Any suggestions as to what I can do to get better performance out of this
guest?  Or reasons why I'm seeing such high kernel CPU usage with it?
