[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755 --- Comment #29 from Rosen 2012-02-15 07:49:35 --- (In reply to comment #28) > (In reply to comment #27) > > and there soon will be video capture with 'perf top' > > > > http://vbox7.com/play:199e9ede30 > > Run it while the guest is also running. Good morning! The video will be available at http://vbox7.com/play:7128f03f1f in a few moments. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in kvm on i386
The Buildbot has detected a new failure on builder i386 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/i386/builds/454 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_master' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 04:13:42PM +0400, Cyrill Gorcunov wrote: > On Tue, Feb 14, 2012 at 01:10:59PM +0200, Pekka Enberg wrote: > > On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai wrote: > > > Since on X86, bios is always at the end of the address space, so I > > > have some thought about how to implement the seabios support for kvm > > > tool. > > > > > > 1. using kvm__register_mem to map the end of address space to the > > > guest then copy the code of seabios to this mem region. Just emulating > > > the bios chip. > > I think this is what should be done. > > > > > > > 2. leave the bios code alone and don't touch the guest's address > > > space. If the guest accesses the address belonging to the bios, it > > > will be an IO request and we can emulate the IO access to the bios > > > chip. > > > > > > Any ideas about this? > > > > The latter solution doesn't make any sense to me. Cyrill, do we really > > need to put the BIOS at the end of the address space? Don't we have > > unused space below 1 MB? > > I don't remember for sure how SeaBIOS works actually. What I remember > is that it acquires all hw environment might have. So without real look > into seabios code I fear I can't answer. But reserving end of 4G address > space for bios copy sounds reasonable if we are going to behave as real > hardware. Maybe we could poke someone from KVM camp for a hint? SeaBIOS has two ways to be deployed. The first is to copy the image to the top of the first 1MB (eg, 0xe0000-0xfffff) and jump to 0xf000:0xfff0 in 16bit mode. The second way is to use the SeaBIOS elf and deploy it into memory (according to the elf memory map) and jump to SeaBIOS in 32bit mode (according to the elf entry point). SeaBIOS doesn't really need to be in the top 4G of ram. SeaBIOS does expect to have normal PC hardware devices (eg, a PIC), though many hardware devices can be compiled out via its kconfig interface. 
The more interesting challenge will likely be in communicating critical pieces of information (eg, total memory size) into SeaBIOS. The SeaBIOS mailing list (seab...@seabios.org) is probably a better location for technical seabios questions. -Kevin
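The first deployment method described above comes down to a little address arithmetic. A minimal sketch in C, assuming a 128 KiB image (the image size is an assumption, and the helper names are hypothetical; they are not part of SeaBIOS or the kvm tool):

```c
#include <assert.h>
#include <stdint.h>

#define ONE_MB 0x100000u

/* Physical address that a real-mode segment:offset pair refers to. */
uint32_t real_mode_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/*
 * Guest-physical address at which an image of image_size bytes has to
 * start so that its last byte is the last byte below 1MB.
 */
uint32_t bios_load_addr(uint32_t image_size)
{
    assert(image_size > 0 && image_size <= ONE_MB);
    return ONE_MB - image_size;
}
```

With these helpers, the 16bit entry point 0xf000:0xfff0 resolves to physical address 0xffff0, i.e. 16 bytes below the 1MB boundary, and a 128 KiB image would be copied to 0xe0000.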
Re: AESNI and guest hosts
Thanks for the reply. I was thinking AESNI would be supported the same way SSE/MMX and other CPU flags are supported. Is this a QEMU or a KVM issue? On Wed, Feb 15, 2012 at 7:18 AM, Brian Jackson wrote: > On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote: >> Sorry for being a noob here, Any clues with this?, anyone ... >> >> On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown wrote: >> > Host/KVM server is running linux 3.2.4 (Debian wheezy), and guest >> > kernel is running 3.2.5. The cpu is an E3-1230, but for some reason >> > its not able to supply the guest with aesni. Is there a config option >> > or is there something we're missing? > > > > I don't think it's supported to pass that functionality to the guest. > > > >> > x86_64 >> > Westmere >> > Intel >> > >> > Guest: >> > [root@fanboy:~]# cat /proc/cpuinfo >> > processor : 0 >> > vendor_id : GenuineIntel >> > cpu family : 6 >> > model : 2 >> > model name : QEMU Virtual CPU version 1.0 >> > stepping : 3 >> > microcode : 0x1 >> > cpu MHz : 3192.748 >> > cache size : 4096 KB >> > fdiv_bug : no >> > hlt_bug : no >> > f00f_bug : no >> > coma_bug : no >> > fpu : yes >> > fpu_exception : yes >> > cpuid level : 4 >> > wp : yes >> > flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca >> > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt >> > hypervisor lahf_lm >> > bogomips : 6385.49 >> > clflush size : 64 >> > cache_alignment : 64 >> > address sizes : 40 bits physical, 48 bits virtual >> > power management: >> > >> > processor : 1 >> > vendor_id : GenuineIntel >> > cpu family : 6 >> > model : 2 >> > model name : QEMU Virtual CPU version 1.0 >> > stepping : 3 >> > microcode : 0x1 >> > cpu MHz : 3192.748 >> > cache size : 4096 KB >> > fdiv_bug : no >> > hlt_bug : no >> > f00f_bug : no >> > coma_bug : no >> > fpu : yes >> > fpu_exception : yes >> > cpuid level : 4 >> > wp : yes >> > flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca >> > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt >> > hypervisor lahf_lm >> > bogomips : 6385.49 >> > clflush size : 64 >> > cache_alignment : 64 >> > address sizes : 40 bits physical, 48 bits virtual >> >> > power management:
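On Linux, AES-NI shows up in /proc/cpuinfo as the "aes" flag, so the symptom in the quoted guest output can be checked mechanically. A small sketch in C that looks for a whole-token match in a flags line (the helper name is hypothetical; the flags string is copied from the guest output quoted above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` appears as a whole space-separated token in `flags`. */
bool cpu_has_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    assert(len > 0);
    while ((p = strstr(p, flag)) != NULL) {
        bool starts_token = (p == flags) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');

        if (starts_token && ends_token)
            return true;
        p++; /* substring hit inside a longer token, keep searching */
    }
    return false;
}

/* Flags line of the guest from the report above. */
const char guest_flags[] =
    "fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush "
    "mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt hypervisor lahf_lm";
```

Matching whole tokens matters here: a plain substring search would, for example, report "pse3" as present because of "pse36". For this guest, `cpu_has_flag(guest_flags, "aes")` is false while `cpu_has_flag(guest_flags, "sse2")` is true.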
Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Tue, 14 Feb 2012, Marcelo Tosatti wrote: > On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote: > > On Tue, 14 Feb 2012, Marcelo Tosatti wrote: > > > > > On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote: > > > > On Wed, 08 Feb 2012, Eric B Munson wrote: > > > > > > > > > > > > > > When a guest kernel is stopped by the host hypervisor it can look > > > > > like a soft > > > > > lockup to the guest kernel. This false warning can mask later soft > > > > > lockup > > > > > warnings which may be real. This patch series adds a method for a > > > > > host > > > > > hypervisor to communicate to a guest kernel that it is being stopped. > > > > > The > > > > > final patch in the series has the watchdog check this flag when it > > > > > goes to > > > > > issue a soft lockup warning and skip the warning if the guest knows > > > > > it was > > > > > stopped. > > > > > > > > > > It was attempted to solve this in Qemu, but the side effects of > > > > > saving and > > > > > restoring the clock and tsc for each vcpu put the wall clock of the > > > > > guest behind > > > > > by the amount of time of the pause. This forces a guest to have ntp > > > > > running > > > > > in order to keep the wall clock accurate. > > > > > > > > Avi, > > > > > > > > Is this set fit for merging or is there something else you want changed? > > > > > > Eric, > > > > > > On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked: > > > > > > How is the stub getting included for other architectures again? > > > > > > > Marcelo, > > > > Sorry, I put out V13 to answer that. There is a stub in asm-generic that > > was > > lost in the V11-V12 rebase. This stub has be included in the V13 set. > > > > Eric > > Eric, > > I know the stub has been included in the series. But i am asking how > it is #include'ed for other architectures? (can't see that). Marcelo, kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h. 
The check_and_clear function is defined in arch include/asm/kvm_para.h or in asm-generic/kvm_para.h for any arch lacking the specific header in their asm include dir. If I have misunderstood how these headers work, please let me know and I will fix it. Eric
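The intended behaviour of the stub can be sketched in plain C. This is a user-space model only, under the assumption that the generic fallback simply reports "not paused"; all names below are hypothetical and are not the identifiers used in the patch series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*check_paused_fn)(void);

/* Hypothetical flag the host would set before it stops the guest. */
static bool guest_paused;

void host_marks_guest_paused(void)
{
    guest_paused = true;
}

/* Arch-specific variant: report the flag and clear it in one step. */
bool arch_check_and_clear_guest_paused(void)
{
    bool was_paused = guest_paused;

    guest_paused = false;
    return was_paused;
}

/* Generic stub for architectures without the feature: never paused. */
bool generic_check_and_clear_guest_paused(void)
{
    return false;
}

/* Watchdog side: skip the warning if the guest knows it was stopped. */
bool should_warn_soft_lockup(bool stall_detected, check_paused_fn check_and_clear)
{
    assert(check_and_clear != NULL);
    if (!stall_detected)
        return false;
    return !check_and_clear();
}
```

In the real series the choice between the two variants is made at build time by which kvm_para.h header is found, not through a function pointer; the pointer is used here only so both paths can be exercised in one program.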
Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
On Tue, Feb 14, 2012 at 07:53:56PM +0100, Andrea Arcangeli wrote: > On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote: > > The problem the patch is fixing is not related to page freeing, but > > rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d > > Can't find the commit on kvm.git. Sorry, we got kvm.git out of sync. But you can see an equivalent below. > > > (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"): > > > > > > During protecting pages for dirty logging, other threads may also try > > to protect a page in mmu_sync_children() or kvm_mmu_get_page(). > > > > In such a case, because get_dirty_log releases mmu_lock before flushing > > TLB's, the following race condition can happen: > > > > A (get_dirty_log) B (another thread) > > > > lock(mmu_lock) > > clear pte.w > > unlock(mmu_lock) > > lock(mmu_lock) > > pte.w is already cleared > > unlock(mmu_lock) > > skip TLB flush > > Not sure which tree it is, but in kvm and upstream I see an > unconditional tlb flush here, not skip (both > kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I > assume this has been updated in your tree to eb conditional. if (!direct) { if (rmap_write_protect(vcpu->kvm, gfn)) kvm_flush_remote_tlbs(vcpu->kvm); > Also note kvm_mmu_rmap_write_protect, flushes outside of the mmu_lock > in the kvm_mmu_rmap_write_protect case (like in quoted description), > so two write_protect_slot in parallel against each other may not be > ok, but that may be enforced by design if qemu won't ever call that > ioctl from two different userland threads (it doesn't sounds security > related so it should be ok to enforce its safety by userland design). Yes, here is the fix: http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=02b48d00d7f1853bdf8a06da19ca5413ebe334c6 This is an equivalent of 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d. > > > return > > ... 
> > TLB flush > > > > Though thread B assumes the page has already been protected when it > > returns, the remaining TLB entry will break that assumption. > > Now I get the question of why not running the TLB flush inside the > mmu_lock only if the spte was writable :). > > kvm_mmu_get_page as long as it only runs in the context of a kvm page > fault is ok, because the page fault would be inhibited by the mmu > notifier invalidates, so maybe it's safe. Ah, perhaps, but this was not taken into account before. Can you confirm this is the case so we can revert the invalidate_page patch? > mmu_sync_children seems to have a problem instead, in your tree > get_dirty_log also has an issue if it has been updated to skip the > flush on readonly sptes, like I guess. > > Interesting how the spte is already non present, the page is just > being freed shortly later, but yet we still need to trigger write > faults synchronously and prevent other CPUs in guest mode to further > modify the page to avoid losing dirty bits updates or updates on > pagetables that maps pagetables in the not NPT/EPT case. If the page > was really only going to be freed it would be ok if the other cpus > would still write to it for a little longer until the page was freed. > > Like I wrote in previous email, I was thinking if we'd change the mmu > notifier methods to do an unconditional flush, then every other flush > could also run outside of the mmu_lock. But then I didn't think enough > about this to be sure. My guess is we could move all flushes outside > the mmu_lock if we stop flushling the tlb conditonally to the current > spte values. It'd clearly be slower for an UP guest though :). Large > SMP guests might benefit, if that is feasible at all... It depends how > problematic the mmu_lock is on the large SMP guests and how much we're > saving by doing conditional TLB flushes. 
Also, these flushes should not need to be inside mmu_lock in the EPT/NPT case (since there is no write protection there). But it would be awkward to differentiate the unlock position based on EPT/NPT.
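The race quoted in this message can be replayed deterministically in a single-threaded model. The sketch below is only an illustration under simplifying assumptions (one pte, one stale TLB entry, lock and unlock reduced to comments); the struct and function names are hypothetical, with write_protect() playing the role that rmap_write_protect() has in the snippet quoted above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mmu_model {
    bool pte_writable;       /* pte.w */
    bool tlb_stale_writable; /* a remote TLB still caches the writable entry */
};

void flush_remote_tlbs(struct mmu_model *m)
{
    assert(m != NULL);
    m->tlb_stale_writable = false;
}

/* Clear pte.w; return true if it was set, i.e. a TLB flush is needed. */
bool write_protect(struct mmu_model *m)
{
    bool was_writable;

    assert(m != NULL);
    was_writable = m->pte_writable;
    m->pte_writable = false;
    return was_writable;
}

/*
 * Thread B: write-protect and flush only if pte.w was still set.
 * Returns true if the page is really protected when B returns.
 */
bool thread_b_protect(struct mmu_model *m)
{
    /* lock(mmu_lock) */
    if (write_protect(m))
        flush_remote_tlbs(m);
    /* unlock(mmu_lock) */
    return !m->tlb_stale_writable;
}

/* A flushes after dropping the lock; B runs in the window in between. */
bool racy_interleaving(void)
{
    struct mmu_model m = { true, true };
    bool a_needs_flush = write_protect(&m); /* A: lock, clear pte.w, unlock */
    bool b_safe = thread_b_protect(&m);     /* B: sees pte.w clear, skips flush */

    if (a_needs_flush)
        flush_remote_tlbs(&m);              /* A: flush arrives too late for B */
    return b_safe;
}

/* A flushes before dropping the lock, so B cannot run in between. */
bool fixed_interleaving(void)
{
    struct mmu_model m = { true, true };

    if (write_protect(&m))                  /* A: lock, clear pte.w ... */
        flush_remote_tlbs(&m);              /* ... flush, then unlock */
    return thread_b_protect(&m);
}
```

In this model racy_interleaving() returns false (B returns while a writable TLB entry can still exist) and fixed_interleaving() returns true.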
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On 2/14/2012 11:05 AM, Stephen Hemminger wrote: > On Tue, 14 Feb 2012 10:57:04 -0800 > John Fastabend wrote: > >> On 2/14/2012 5:18 AM, jamal wrote: >>> On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote: >>> The use case here is multiple VFs but the same solution should work with multiple PFs as well. FDB controls should be independent of how the ports are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc. >>> >>> Makes sense. >>> With events and ADD/DEL/GET FDB controls we can solve both cases. This also solves Roopa's case with macvlan where she wants to add additional addresses to macvlan ports. >>> >>> Not familiar with that issue - I'll prowl the list. >> >> Roopa was likely on the right track here, >> >> http://patchwork.ozlabs.org/patch/123064/ >> >> But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX >> netlink messages. And if possible drive this without extending ndo_ops. >> >> An ideal user space interaction IMHO would look like, >> >> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10 >> [root@jf-dev1-dcblab iproute2]# ./br/br fdb >> port mac addr flags >> veth2 36:a6:35:9b:96:c4 local >> veth4 aa:54:b0:7b:42:ef local >> veth0 2a:e8:5c:95:6c:1b local >> veth6 6e:26:d5:43:a3:36 local >> veth0 f2:c1:39:76:6a:fb >> veth8 4e:35:16:af:87:13 local >> veth10 52:e5:62:7b:57:88 static >> veth10 aa:a9:35:21:15:c4 local >> [root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88 >> RTNETLINK answers: Invalid argument > > I am going to put bridge (nameclash with br) tool into iproute2 (soon). I've been using it on my dev box for a while now and it works well for me.
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Tue, 14 Feb 2012 10:57:04 -0800 John Fastabend wrote: > On 2/14/2012 5:18 AM, jamal wrote: > > On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote: > > > >> The use case here is multiple VFs but the same solution should work with > >> multiple PFs as well. FDB controls should be independent of how the ports > >> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc. > > > > Makes sense. > > > >> With events and ADD/DEL/GET FDB controls we can solve both cases. This also > >> solves Roopa's case with macvlan where she wants to add additional > >> addresses > >> to macvlan ports. > > > > Not familiar with that issue - I'll prowl the list. > > Roopa was likely on the right track here, > > http://patchwork.ozlabs.org/patch/123064/ > > But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX > netlink messages. And if possible drive this without extending ndo_ops. > > An ideal user space interaction IMHO would look like, > > [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10 > [root@jf-dev1-dcblab iproute2]# ./br/br fdb > port mac addr flags > veth2 36:a6:35:9b:96:c4 local > veth4 aa:54:b0:7b:42:ef local > veth0 2a:e8:5c:95:6c:1b local > veth6 6e:26:d5:43:a3:36 local > veth0 f2:c1:39:76:6a:fb > veth8 4e:35:16:af:87:13 local > veth10 52:e5:62:7b:57:88 static > veth10 aa:a9:35:21:15:c4 local > [root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88 > RTNETLINK answers: Invalid argument I am going to put the bridge tool (name clash with br) into iproute2 soon.
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On 2/14/2012 5:18 AM, jamal wrote: > On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote: > >> The use case here is multiple VFs but the same solution should work with >> multiple PFs as well. FDB controls should be independent of how the ports >> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc. > > Makes sense. > >> With events and ADD/DEL/GET FDB controls we can solve both cases. This also >> solves Roopa's case with macvlan where she wants to add additional addresses >> to macvlan ports. > > Not familiar with that issue - I'll prowl the list. Roopa was likely on the right track here, http://patchwork.ozlabs.org/patch/123064/ But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX netlink messages. And if possible drive this without extending ndo_ops. An ideal user space interaction IMHO would look like,

[root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
[root@jf-dev1-dcblab iproute2]# ./br/br fdb
port    mac addr            flags
veth2   36:a6:35:9b:96:c4   local
veth4   aa:54:b0:7b:42:ef   local
veth0   2a:e8:5c:95:6c:1b   local
veth6   6e:26:d5:43:a3:36   local
veth0   f2:c1:39:76:6a:fb
veth8   4e:35:16:af:87:13   local
veth10  52:e5:62:7b:57:88   static
veth10  aa:a9:35:21:15:c4   local
[root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
RTNETLINK answers: Invalid argument

Using Stephen's br tool. The first command adds an FDB entry to the SW bridge, and if the same tool could be used to add entries to the embedded bridge I think that would be the best case. So no RTNETLINK error on the second cmd. Then embedded FDB entries could be dumped this way also, so I get a complete view of my FDB setup across multiple sw bridges and embedded bridges. I don't think br is part of iproute2 yet; I just pulled it out of some RFC, but it works reasonably well and is intuitive enough. > >> Yes it should flood here, unless its acting as a 802.1Qbg VEB or VEPA. > > Ok. So there is a toggle somewhere which controls how flooding should > happen. > Yes. 
The hardware has a bit to support this which is currently not exposed to user space. That's a case where we have 'yet another knob' that needs a clean solution. This causes real bugs today when users try to use the macvlan devices in VEPA mode on top of SR-IOV. By the way, these modes are all part of the 802.1Qbg spec, which people actually want to use with Linux, so a good clean solution is probably needed. >> >> Maybe not. But the kernel already has the needed signals with one extra >> hook we can save running a daemon in user space. Maybe that's not a great >> argument to add kernel code though. > > You make a reasonable argument to have it in the kernel but i think we > win more if we separate the control. So while i empathize, I am hoping > that you'd go with the path that is hard to travel ;-> > >> The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered in the >> br_netlink_init() path. > > Hrm - hadn't paid attention to that before. Nasty. > The bridge seems to be hard-coding policy on station movement, no? > This is a good example of the qualms i have on adding things to the > kernel;-> > I may not want to auto update a MAC address moving ports as part of > some policy i have. I can go and add YAK (Yet Another Knob) - but where > is the line drawn? > I have no problem with drawing the line here and trying to implement something over PF_BRIDGE:RTM_xxx nlmsgs. I'll work with Roopa and see if we can come up with something in the next couple days. w.r.t. VEPA/VEB and flooding behavior we could probably have a bit to indicate if the port is a flooding port or not. Then users could build any sort of forwarding table they wanted OR we could just drive it through a notifier (ndo_ops?) in the macvlan path which does VEPA today. OK, I'll try to write some actual code now that can be critiqued. 
> cheers, > jamal
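The "flooding port" bit suggested above could behave roughly like the following forwarding decision. This is a hypothetical user-space sketch, not bridge or driver code: the struct layout, the function name and the bitmask return value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fdb_entry {
    uint8_t mac[6];
    unsigned port;   /* port the address was added on */
};

struct port {
    bool flood;      /* hypothetical per-port "flooding port" bit */
};

/*
 * Return a bitmask of egress ports for a frame with destination `dst`
 * received on `in_port`. A known destination goes to its FDB port only;
 * an unknown one is flooded to every other port that has the flood bit.
 */
uint32_t egress_mask(const struct fdb_entry *fdb, size_t nentries,
                     const struct port *ports, size_t nports,
                     const uint8_t dst[6], unsigned in_port)
{
    uint32_t mask = 0;
    size_t i;

    assert(nports <= 32);
    for (i = 0; i < nentries; i++) {
        if (memcmp(fdb[i].mac, dst, 6) == 0)
            return fdb[i].port == in_port ? 0 : (uint32_t)1 << fdb[i].port;
    }
    for (i = 0; i < nports; i++) {
        if (i != in_port && ports[i].flood)
            mask |= (uint32_t)1 << i;
    }
    return mask;
}
```

A port with the bit cleared would then only ever see frames for addresses explicitly added to the table, which is the kind of user-built forwarding table the message has in mind.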
Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote: > The problem the patch is fixing is not related to page freeing, but > rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d Can't find the commit on kvm.git. > (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"): > > > During protecting pages for dirty logging, other threads may also try > to protect a page in mmu_sync_children() or kvm_mmu_get_page(). > > In such a case, because get_dirty_log releases mmu_lock before flushing > TLB's, the following race condition can happen: > > A (get_dirty_log) B (another thread) > > lock(mmu_lock) > clear pte.w > unlock(mmu_lock) > lock(mmu_lock) > pte.w is already cleared > unlock(mmu_lock) > skip TLB flush Not sure which tree it is, but in kvm and upstream I see an unconditional tlb flush here, not skip (both kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I assume this has been updated in your tree to be conditional. Also note that kvm_mmu_rmap_write_protect flushes outside of the mmu_lock (like in the quoted description), so two write_protect_slot calls running in parallel against each other may not be ok, but that may be enforced by design if qemu won't ever call that ioctl from two different userland threads (it doesn't sound security related, so it should be ok to enforce its safety by userland design). > return > ... > TLB flush > > Though thread B assumes the page has already been protected when it > returns, the remaining TLB entry will break that assumption. Now I get the question of why not run the TLB flush inside the mmu_lock only if the spte was writable :). kvm_mmu_get_page, as long as it only runs in the context of a kvm page fault, is ok, because the page fault would be inhibited by the mmu notifier invalidates, so maybe it's safe. 
mmu_sync_children seems to have a problem instead, and in your tree get_dirty_log also has an issue if it has been updated to skip the flush on readonly sptes, as I guess it has. Interesting how the spte is already non present and the page is just being freed shortly later, yet we still need to trigger write faults synchronously and prevent other CPUs in guest mode from further modifying the page, to avoid losing dirty bit updates or updates on pagetables that map pagetables in the non-NPT/EPT case. If the page was really only going to be freed, it would be ok if the other cpus would still write to it for a little longer until the page was freed. Like I wrote in my previous email, I was thinking that if we changed the mmu notifier methods to do an unconditional flush, then every other flush could also run outside of the mmu_lock. But I didn't think enough about this to be sure. My guess is we could move all flushes outside the mmu_lock if we stop flushing the tlb conditionally on the current spte values. It'd clearly be slower for a UP guest though :). Large SMP guests might benefit, if that is feasible at all... It depends how problematic the mmu_lock is on large SMP guests and how much we're saving by doing conditional TLB flushes.
Re: AESNI and guest hosts
On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote: > Sorry for being a noob here, Any clues with this?, anyone ... > > On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown wrote: > > Host/KVM server is running linux 3.2.4 (Debian wheezy), and guest > > kernel is running 3.2.5. The cpu is an E3-1230, but for some reason > > its not able to supply the guest with aesni. Is there a config option > > or is there something we're missing? I don't think it's supported to pass that functionality to the guest. > > x86_64 > > Westmere > > Intel > > > > Guest: > > [root@fanboy:~]# cat /proc/cpuinfo > > processor : 0 > > vendor_id : GenuineIntel > > cpu family : 6 > > model : 2 > > model name : QEMU Virtual CPU version 1.0 > > stepping: 3 > > microcode : 0x1 > > cpu MHz : 3192.748 > > cache size : 4096 KB > > fdiv_bug: no > > hlt_bug : no > > f00f_bug: no > > coma_bug: no > > fpu : yes > > fpu_exception : yes > > cpuid level : 4 > > wp : yes > > flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca > > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt > > hypervisor lahf_lm > > bogomips: 6385.49 > > clflush size: 64 > > cache_alignment : 64 > > address sizes : 40 bits physical, 48 bits virtual > > power management: > > > > processor : 1 > > vendor_id : GenuineIntel > > cpu family : 6 > > model : 2 > > model name : QEMU Virtual CPU version 1.0 > > stepping: 3 > > microcode : 0x1 > > cpu MHz : 3192.748 > > cache size : 4096 KB > > fdiv_bug: no > > hlt_bug : no > > f00f_bug: no > > coma_bug: no > > fpu : yes > > fpu_exception : yes > > cpuid level : 4 > > wp : yes > > flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca > > cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt > > hypervisor lahf_lm > > bogomips: 6385.49 > > clflush size: 64 > > cache_alignment : 64 > > address sizes : 40 bits physical, 48 bits virtual > > > power management:
Re: [VT-d reboot problems] Re: [PATCH] x86 / reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot
On Tue, Jan 31, 2012 at 1:15 PM, Ingo Molnar wrote: > > (added KVM folks to the Cc:) > > * Bastien ROUCARIES wrote: > >> Ping^2 >> >> Bastien >> On Mon, Jan 23, 2012 at 11:28 AM, Bastien ROUCARIES >> wrote: >> > On Mon, Jan 16, 2012 at 8:21 PM, H. Peter Anvin wrote: >> >> On 01/16/2012 03:27 AM, Bastien ROUCARIES wrote: >> >> Does it work if you disable VT-d in the firmware? If so, then adding it >> to the reboot method blacklist is the wrong fix - we need to figure out >> why VT-d interferes with Dell's reboot code. >> >>> >> >>> Yes it work >> >>> >> >> >> >> This is particularly so since we are very close to having a full Dell >> >> model catalogue in the kernel... >> > >> > Ping ? Do you need some dump ? testing ? > > So disabling VT-d in the BIOS fixes the reboot problem and > Matthew Garrett suggests we should figure out why and how VT-d > on this Dell box interferes with the reboot method. > > Thanks, > > Ingo
Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
On Tue, Feb 14, 2012 at 06:10:44PM +0100, Andrea Arcangeli wrote: > On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote: > > Other threads may process the same page in that small window and skip > > TLB flush and then return before these functions do flush. > > It's correct to flush the shadow MMU TLB without the mmu_lock only in > the context of mmu notifier methods. So the below while won't hurt, > it's performance regression and shouldn't be applied (and > it obfuscates the code by not being strict anymore). > > To the contrary every other place that does a shadow/secondary MMU smp > tlb flush _must_ happen inside the mmu_lock, otherwise the > serialization isn't correct anymore against the very below mmu_lock in > the below quoted patch taken by > kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start. > > The explanation is in commit 4539b35881ae9664b0e2953438dd83f5ee02c0b4. > > I'll try to explain it more clearly: the moment you drop mmu_lock, > pages can be freed. So if you invalidate a spte in any place inside > the KVM code (except the mmu notifier methods where a reference of the > page is implicitly hold by the caller and so the page can't go away > under a mmu notifier method by design), then the below > kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start > won't get their need_tlb_flush set anymore, and they won't run the tlb > flush before freeing the page. > > So every other place (except mmu notifier) must flush the secondary > MMU smp tlb _before_ releasing the mmu_lock. > > Only mmu notifier is safe to flush the secondary MMU TLB _after_ > releasing the mmu_lock. > > If we changed the mmu notifier methods to unconditionally flush the > shadow TLB (regardless if a spte was present or not), we might not > need to hold the mmu_lock in every tlb flush outside the context of > the mmu notifier methods. 
But then the mmu notifier methods would > become more expensive, I didn't evaluate fully what would be the side > effects of such a change. Also note, only the > kvm_mmu_notifier_invalidate_page and > kvm_mmu_notifier_invalidate_range_start would need to do that, because > they're the only two where the page reference gets dropped. > > Even shorter: because in the mmu notifier an implicit reference on the > page exists and is held by the caller, they can flush outside the > mmu_lock. Every other place in KVM only holds an implicit valid > reference on the page only as long as you hold the mmu_lock, or while > a spte is still established. > > Well it's not easy logic so it's not surprising it wasn't totally > clear. > > It's probably not heavily documented, and the fact you changed it > is still good so we refresh our minds on the exact rules of mmu > notifier locking, thanks! The problem the patch is fixing is not related to page freeing, but rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d (replace "A (get_dirty_log)" with "mmu_notifier_invalidate_page"): During protecting pages for dirty logging, other threads may also try to protect a page in mmu_sync_children() or kvm_mmu_get_page(). In such a case, because get_dirty_log releases mmu_lock before flushing TLB's, the following race condition can happen:

    A (get_dirty_log)      B (another thread)

    lock(mmu_lock)
    clear pte.w
    unlock(mmu_lock)
                           lock(mmu_lock)
                           pte.w is already cleared
                           unlock(mmu_lock)
                           skip TLB flush
                           return
    ...
    TLB flush

Though thread B assumes the page has already been protected when it returns, the remaining TLB entry will break that assumption. 
> > Andrea > > > > > Signed-off-by: Takuya Yoshikawa > > --- > > virt/kvm/kvm_main.c | 19 ++- > > 1 files changed, 10 insertions(+), 9 deletions(-) > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 470e305..2b4bc77 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_page(struct > > mmu_notifier *mn, > > */ > > idx = srcu_read_lock(&kvm->srcu); > > spin_lock(&kvm->mmu_lock); > > + > > kvm->mmu_notifier_seq++; > > need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty; > > - spin_unlock(&kvm->mmu_lock); > > - srcu_read_unlock(&kvm->srcu, idx); > > - > > /* we've to flush the tlb before the pages can be freed */ > > if (need_tlb_flush) > > kvm_flush_remote_tlbs(kvm); > > > > + spin_unlock(&kvm->mmu_lock); > > + srcu_read_unlock(&kvm->srcu, idx); > > } > > > > static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > > @@ -335,12 +335,12 @@ static void > > kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > > for (; start < end; start += PAGE_SIZE) > > need_tlb_flush |= kvm_unmap_hva(kvm, start); > > need_tlb_flush |= kvm->tlbs_dirty; > > - spi
Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
On Fri, Feb 10, 2012 at 03:52:49PM +0800, Xiao Guangrong wrote: > On 02/10/2012 02:28 PM, Takuya Yoshikawa wrote: > > > Other threads may process the same page in that small window and skip > > TLB flush and then return before these functions do flush. > > > > > It is possible that flush tlb in mmu lock only when writeable > spte is invalided? Sometimes, kvm_flush_remote_tlbs need > long time to wait. Read-only isn't enough to defer the flush until after mmu_lock is released: if you flush under the lock only for writable sptes, the guest may read random data and crash. In this case, however, the mmu_notifier methods (and only they) are perfectly safe to flush the shadow MMU TLB after the mmu_lock is released, because the page reference is guaranteed to be held by the caller. That is not the case for any other place where a spte gets dropped in KVM: all other places dropping sptes can only rely on the mmu notifier blocking on the mmu_lock to guarantee that the page is not freed under them. So in every other place the shadow MMU TLB flush must happen before releasing the mmu_lock, so that the mmu_notifier waits and prevents the page from being freed until all other CPUs running in guest mode have stopped accessing it.
Re: [PATCH 2/2] KVM: MMU: Flush TLBs only once in invlpg() before releasing mmu_lock
On Tue, Feb 14, 2012 at 01:56:17PM +0900, Takuya Yoshikawa wrote: > (2012/02/14 13:36), Takuya Yoshikawa wrote: > > > BTW, do you think that "kvm_mmu_flush_tlb()" should be moved inside of the > > mmu_lock critical section? > > > > Ah, forget about this. Trivially no. Yes, the reason is that it's the local flush, and guest mode isn't running while we're running that function, so it's ok to run it later. About the other change you made in this patch 2/2: I can't find the code you're patching in the 3.2 upstream source. When I added the tlb flush to invlpg, I immediately used a cumulative need_flush at the end (before releasing mmu_lock of course): if (need_flush) kvm_flush_remote_tlbs(vcpu->kvm);
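For readers following along, the "cumulative need_flush" pattern mentioned above can be sketched roughly as below. This is only a single-threaded illustration, not the upstream invlpg code: mock_lock(), mock_unlock(), mock_flush_remote_tlbs(), drop_spte() and invalidate_sptes() are all hypothetical stand-ins, and the "lock" merely records whether the flush happened inside the critical section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-ins for kvm->mmu_lock and the
 * remote TLB flush; they only record what happened. */
static bool lock_held;
static int flushes_under_lock;
static int flushes_outside_lock;

static void mock_lock(void)   { lock_held = true; }
static void mock_unlock(void) { lock_held = false; }

static void mock_flush_remote_tlbs(void)
{
    if (lock_held)
        flushes_under_lock++;
    else
        flushes_outside_lock++;
}

/* Pretend to drop one shadow pte; returns true if it was present. */
static bool drop_spte(bool *spte)
{
    bool was_present = *spte;

    *spte = false;
    return was_present;
}

/* Drop n sptes, accumulate need_flush, and flush at most once,
 * before the lock is released. Returns the number of flushes that
 * were issued under the lock so far. */
static int invalidate_sptes(bool *sptes, size_t n)
{
    bool need_flush = false;
    size_t i;

    mock_lock();
    for (i = 0; i < n; i++)
        need_flush |= drop_spte(&sptes[i]);
    if (need_flush)
        mock_flush_remote_tlbs();   /* still inside the critical section */
    mock_unlock();

    return flushes_under_lock;
}
```

Calling invalidate_sptes() on an array with two present entries issues exactly one flush, and a second call on the now-empty array issues none.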
Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote: > Other threads may process the same page in that small window and skip > TLB flush and then return before these functions do flush. It's correct to flush the shadow MMU TLB without the mmu_lock only in the context of mmu notifier methods. So the below, while it won't hurt, is a performance regression and shouldn't be applied (and it obfuscates the code by not being strict anymore). On the contrary, every other place that does a shadow/secondary MMU smp tlb flush _must_ do it inside the mmu_lock, otherwise the serialization isn't correct anymore against the mmu_lock taken by kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start in the patch quoted below. The explanation is in commit 4539b35881ae9664b0e2953438dd83f5ee02c0b4. I'll try to explain it more clearly: the moment you drop mmu_lock, pages can be freed. So if you invalidate a spte in any place inside the KVM code (except the mmu notifier methods, where a reference on the page is implicitly held by the caller and so the page can't go away under a mmu notifier method by design), then the below kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start won't get their need_tlb_flush set anymore, and they won't run the tlb flush before freeing the page. So every other place (except mmu notifier) must flush the secondary MMU smp tlb _before_ releasing the mmu_lock. Only mmu notifier is safe to flush the secondary MMU TLB _after_ releasing the mmu_lock. If we changed the mmu notifier methods to unconditionally flush the shadow TLB (regardless of whether a spte was present or not), we might not need to hold the mmu_lock in every tlb flush outside the context of the mmu notifier methods. But then the mmu notifier methods would become more expensive, and I didn't fully evaluate what the side effects of such a change would be. 
Also note that only kvm_mmu_notifier_invalidate_page and kvm_mmu_notifier_invalidate_range_start would need to do that, because they're the only two where the page reference gets dropped. Even shorter: because in the mmu notifier methods an implicit reference on the page exists and is held by the caller, they can flush outside the mmu_lock. Every other place in KVM holds an implicit valid reference on the page only as long as it holds the mmu_lock, or while a spte is still established. Well, it's not easy logic, so it's not surprising it wasn't totally clear. It's probably not heavily documented, and the fact that you changed it is still good, since it lets us refresh our minds on the exact rules of mmu notifier locking. Thanks! Andrea > > Signed-off-by: Takuya Yoshikawa > --- > virt/kvm/kvm_main.c | 19 ++- > 1 files changed, 10 insertions(+), 9 deletions(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 470e305..2b4bc77 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_page(struct > mmu_notifier *mn, >*/ > idx = srcu_read_lock(&kvm->srcu); > spin_lock(&kvm->mmu_lock); > + > kvm->mmu_notifier_seq++; > need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty; > - spin_unlock(&kvm->mmu_lock); > - srcu_read_unlock(&kvm->srcu, idx); > - > /* we've to flush the tlb before the pages can be freed */ > if (need_tlb_flush) > kvm_flush_remote_tlbs(kvm); > > + spin_unlock(&kvm->mmu_lock); > + srcu_read_unlock(&kvm->srcu, idx); > } > > static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > @@ -335,12 +335,12 @@ static void > kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > for (; start < end; start += PAGE_SIZE) > need_tlb_flush |= kvm_unmap_hva(kvm, start); > need_tlb_flush |= kvm->tlbs_dirty; > - spin_unlock(&kvm->mmu_lock); > - srcu_read_unlock(&kvm->srcu, idx); > - > /* we've to flush the tlb before the pages can be freed */ > if (need_tlb_flush) > 
kvm_flush_remote_tlbs(kvm); > + > + spin_unlock(&kvm->mmu_lock); > + srcu_read_unlock(&kvm->srcu, idx); > } > > static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, > @@ -378,13 +378,14 @@ static int kvm_mmu_notifier_clear_flush_young(struct > mmu_notifier *mn, > > idx = srcu_read_lock(&kvm->srcu); > spin_lock(&kvm->mmu_lock); > - young = kvm_age_hva(kvm, address); > - spin_unlock(&kvm->mmu_lock); > - srcu_read_unlock(&kvm->srcu, idx); > > + young = kvm_age_hva(kvm, address); > if (young) > kvm_flush_remote_tlbs(kvm); > > + spin_unlock(&kvm->mmu_lock); > + srcu_read_unlock(&kvm->srcu, idx); > + > return young; > } > > -- > 1.7.5.4 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.
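The locking rule argued for in this message (non-notifier paths must flush before dropping mmu_lock, because a notifier that finds no spte will not flush before the page is freed) can be illustrated with a small deterministic model. This is only a sketch under simplifying assumptions: there are no real locks or threads, the interleaving is hard-coded, and struct model, flush(), notifier_invalidate_page() and drop_spte_path() are hypothetical names, not KVM functions.

```c
#include <stdbool.h>

struct model {
    bool spte_present;   /* shadow pte still maps the page */
    bool tlb_stale;      /* some CPU may still cache the translation */
    bool page_freed;     /* host has freed the page */
    bool use_after_free; /* page freed while a TLB entry could reach it */
};

static void flush(struct model *m)
{
    m->tlb_stale = false;
}

/* Models an mmu notifier invalidation: it flushes only if it found a
 * spte (need_tlb_flush), after which the host frees the page. */
static void notifier_invalidate_page(struct model *m)
{
    bool need_tlb_flush = m->spte_present;

    m->spte_present = false;
    if (need_tlb_flush)
        flush(m);
    m->page_freed = true;
    if (m->tlb_stale)
        m->use_after_free = true;
}

/* Models any other KVM path that drops a spte. Returns true if the
 * page ended up freed while a stale TLB entry still existed. */
static bool drop_spte_path(bool flush_before_unlock)
{
    struct model m = { .spte_present = true };

    /* lock(mmu_lock) */
    m.spte_present = false;
    m.tlb_stale = true;
    if (flush_before_unlock)
        flush(&m);
    /* unlock(mmu_lock): from here on a notifier may run */

    notifier_invalidate_page(&m);

    if (!flush_before_unlock)
        flush(&m);       /* too late, the page is already gone */

    return m.use_after_free;
}
```

In this model drop_spte_path(true) reports no problem, while drop_spte_path(false) reports a page freed with a stale translation still around, which is the window the explanation above is about.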
[PATCH] virt: Fix migration bg command
In migration tests, the command we were using as a 'watchdog' command was tcpdump, but without specifying which interface it should listen to. As this may fail depending on the interface ordering, let's change the command to listen on all interfaces, which is safer. Signed-off-by: Eduardo Habkost --- client/virt/subtests.cfg.sample |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample index b08a5c4..56043e0 100644 --- a/client/virt/subtests.cfg.sample +++ b/client/virt/subtests.cfg.sample @@ -350,7 +350,7 @@ variants: - migrate: install setup image_copy unattended_install.cdrom type = migration migration_test_command = help -migration_bg_command = "cd /tmp; nohup tcpdump -q -t ip host localhost" +migration_bg_command = "cd /tmp; nohup tcpdump -q -i any -t ip host localhost" migration_bg_check_command = pgrep tcpdump migration_bg_kill_command = pkill tcpdump kill_vm_on_error = yes -- 1.7.7.6
Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote: > On Tue, 14 Feb 2012, Marcelo Tosatti wrote: > > > On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote: > > > On Wed, 08 Feb 2012, Eric B Munson wrote: > > > > > > > > > > > When a guest kernel is stopped by the host hypervisor it can look like > > > > a soft > > > > lockup to the guest kernel. This false warning can mask later soft > > > > lockup > > > > warnings which may be real. This patch series adds a method for a host > > > > hypervisor to communicate to a guest kernel that it is being stopped. > > > > The > > > > final patch in the series has the watchdog check this flag when it goes > > > > to > > > > issue a soft lockup warning and skip the warning if the guest knows it > > > > was > > > > stopped. > > > > > > > > It was attempted to solve this in Qemu, but the side effects of saving > > > > and > > > > restoring the clock and tsc for each vcpu put the wall clock of the > > > > guest behind > > > > by the amount of time of the pause. This forces a guest to have ntp > > > > running > > > > in order to keep the wall clock accurate. > > > > > > Avi, > > > > > > Is this set fit for merging or is there something else you want changed? > > > > Eric, > > > > On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked: > > > > How is the stub getting included for other architectures again? > > > > Marcelo, > > Sorry, I put out V13 to answer that. There is a stub in asm-generic that was > lost in the V11-V12 rebase. This stub has be included in the V13 set. > > Eric Eric, I know the stub has been included in the series. But i am asking how it is #include'ed for other architectures? (can't see that). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Tue, 14 Feb 2012, Marcelo Tosatti wrote: > On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote: > > On Wed, 08 Feb 2012, Eric B Munson wrote: > > > > > > > > When a guest kernel is stopped by the host hypervisor it can look like a > > > soft > > > lockup to the guest kernel. This false warning can mask later soft lockup > > > warnings which may be real. This patch series adds a method for a host > > > hypervisor to communicate to a guest kernel that it is being stopped. The > > > final patch in the series has the watchdog check this flag when it goes to > > > issue a soft lockup warning and skip the warning if the guest knows it was > > > stopped. > > > > > > It was attempted to solve this in Qemu, but the side effects of saving and > > > restoring the clock and tsc for each vcpu put the wall clock of the guest > > > behind > > > by the amount of time of the pause. This forces a guest to have ntp > > > running > > > in order to keep the wall clock accurate. > > > > Avi, > > > > Is this set fit for merging or is there something else you want changed? > > Eric, > > On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked: > > How is the stub getting included for other architectures again? > Marcelo, Sorry, I put out V13 to answer that. There is a stub in asm-generic that was lost in the V11-V12 rebase. This stub has been included in the V13 set. Eric
Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote: > On Wed, 08 Feb 2012, Eric B Munson wrote: > > > > > When a guest kernel is stopped by the host hypervisor it can look like a > > soft > > lockup to the guest kernel. This false warning can mask later soft lockup > > warnings which may be real. This patch series adds a method for a host > > hypervisor to communicate to a guest kernel that it is being stopped. The > > final patch in the series has the watchdog check this flag when it goes to > > issue a soft lockup warning and skip the warning if the guest knows it was > > stopped. > > > > It was attempted to solve this in Qemu, but the side effects of saving and > > restoring the clock and tsc for each vcpu put the wall clock of the guest > > behind > > by the amount of time of the pause. This forces a guest to have ntp running > > in order to keep the wall clock accurate. > > Avi, > > Is this set fit for merging or is there something else you want changed? Eric, On Message-ID: <20120210160536.ga23...@amt.cnet>, i asked: How is the stub getting included for other architectures again? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 5:38 PM, Cyrill Gorcunov wrote: >> > Ideally we should get rid of our minibios completely and only have >> > seabios here instead. >> >> No, no, they should co-exist. There's absolutely no reason to force >> people to use a BIOS to boot Linux. > > I meant run-time (ie in memory). I didn't mean substitude our minibios, > but rather have an ability to either run with compiled-in bios or with > seabios instead. Sure.
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 05:35:47PM +0200, Pekka Enberg wrote: > On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote: > >> And will seabios replace the present bios implement or co-exsit? > > On Tue, Feb 14, 2012 at 3:32 PM, Cyrill Gorcunov wrote: > > Ideally we should get rid of our minibios completely and only have > > seabios here instead. > > No, no, they should co-exist. There's absolutely no reason to force > people to use a BIOS to boot Linux. > I meant at run-time (i.e. in memory). I didn't mean to substitute our minibios, but rather to have the ability to run either with the compiled-in bios or with seabios instead. Cyrill
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote: >> And will seabios replace the present bios implement or co-exsit? On Tue, Feb 14, 2012 at 3:32 PM, Cyrill Gorcunov wrote: > Ideally we should get rid of our minibios completely and only have > seabios here instead. No, no, they should co-exist. There's absolutely no reason to force people to use a BIOS to boot Linux.
Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host
On Wed, 08 Feb 2012, Eric B Munson wrote: > > When a guest kernel is stopped by the host hypervisor it can look like a soft > lockup to the guest kernel. This false warning can mask later soft lockup > warnings which may be real. This patch series adds a method for a host > hypervisor to communicate to a guest kernel that it is being stopped. The > final patch in the series has the watchdog check this flag when it goes to > issue a soft lockup warning and skip the warning if the guest knows it was > stopped. > > It was attempted to solve this in Qemu, but the side effects of saving and > restoring the clock and tsc for each vcpu put the wall clock of the guest > behind > by the amount of time of the pause. This forces a guest to have ntp running > in order to keep the wall clock accurate. Avi, Is this set fit for merging or is there something else you want changed? Eric > > Cc: mi...@redhat.com > Cc: h...@zytor.com > Cc: ry...@linux.vnet.ibm.com > Cc: aligu...@us.ibm.com > Cc: mtosa...@redhat.com > Cc: kvm@vger.kernel.org > Cc: linux-a...@vger.kernel.org > Cc: x...@kernel.org > Cc: linux-ker...@vger.kernel.org > > Eric B Munson (4): > Add flag to indicate that a vm was stopped by the host > Add functions to check if the host has stopped the vm > Add ioctl for KVM_KVMCLOCK_CTRL > Add check for suspended vm in softlockup detector > > Documentation/virtual/kvm/api.txt | 13 + > arch/ia64/include/asm/kvm_para.h|5 + > arch/powerpc/include/asm/kvm_para.h |5 + > arch/s390/include/asm/kvm_para.h|5 + > arch/x86/include/asm/kvm_para.h |8 > arch/x86/include/asm/pvclock-abi.h |1 + > arch/x86/kernel/kvmclock.c | 21 + > arch/x86/kvm/x86.c | 22 ++ > include/asm-generic/kvm_para.h | 14 ++ > include/linux/kvm.h |3 +++ > kernel/watchdog.c | 12 > 11 files changed, 109 insertions(+), 0 deletions(-) > create mode 100644 include/asm-generic/kvm_para.h > > -- > 1.7.5.4 > signature.asc Description: Digital signature
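As a rough illustration of the mechanism the cover letter describes (the host marks the guest as stopped, and the watchdog skips the soft lockup warning when it sees that mark), here is a minimal sketch. All names are hypothetical and do not come from the patches; in particular, clearing the flag on check is an assumption made here so that a later, real lockup would still be reported.

```c
#include <stdbool.h>

/* Hypothetical flag the host would set when it stops the guest. */
static bool guest_paused_flag;
/* Number of soft lockup warnings actually issued. */
static int warnings;

/* Stand-in for the host side marking the guest as stopped. */
static void host_stops_guest(void)
{
    guest_paused_flag = true;
}

/* Returns true if the guest was stopped since the last check and
 * clears the flag (assumption, see the note above). */
static bool check_and_clear_guest_paused(void)
{
    bool was_paused = guest_paused_flag;

    guest_paused_flag = false;
    return was_paused;
}

/* Called when the watchdog thinks the CPU was stuck. Returns the
 * total number of warnings issued so far. */
static int watchdog_lockup_detected(void)
{
    if (check_and_clear_guest_paused())
        return warnings;    /* false positive: the host paused us */
    return ++warnings;      /* looks like a real soft lockup */
}
```

With this shape, a detection right after host_stops_guest() is swallowed once, and the next detection warns again.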
Re: level in kvm_mmu_page_role
On 02/13/2012 11:30 PM, Sanidhya Kashyap wrote: > I have been going through the kvm code but didn't get the significance > of level in kvm_mmu_page_role. So, it would be nice if anyone can > explain it what is its use? > > It's the page table level. Level 1 contains page table entries pointing to 4k pages. Level 2 contains page directory entries pointing to level 1 page tables, or pointers to 2M pages, and so forth. -- error compiling committee.c: too many arguments to function
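To make the "and so forth" concrete, here is a small sketch of how the level relates to the amount of memory one entry covers. It assumes the x86-64 long mode (or PAE) layout with 4k base pages and 9 index bits per level; the helper names are made up for illustration and are not KVM functions.

```c
#include <stdint.h>

#define PAGE_SHIFT  12  /* 4k base pages */
#define LEVEL_BITS  9   /* 512 entries per table (assumed layout) */

/* Bytes covered by a single entry at the given level (1 = lowest). */
static uint64_t bytes_mapped_per_entry(int level)
{
    return 1ULL << (PAGE_SHIFT + (level - 1) * LEVEL_BITS);
}

/* Index into the table at the given level for an address. */
static unsigned int index_at_level(uint64_t addr, int level)
{
    unsigned int shift = PAGE_SHIFT + (level - 1) * LEVEL_BITS;

    return (unsigned int)((addr >> shift) & ((1u << LEVEL_BITS) - 1));
}
```

Under these assumptions a level 1 entry covers 4k, a level 2 entry covers 2M (hence the 2M large pages at that level), and a level 3 entry covers 1G.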
[PATCH v3 6/9] kvmvapic: Introduce TPR access optimization for Windows guests
This enables acceleration for MMIO-based TPR registers accesses of 32-bit Windows guest systems. It is mostly useful with KVM enabled, either on older Intel CPUs (without flexpriority feature, can also be manually disabled for testing) or any current AMD processor. The approach introduced here is derived from the original version of qemu-kvm. It was refactored, documented, and extended by support for user space APIC emulation, both with and without KVM acceleration. The VMState format was kept compatible, so was the ABI to the option ROM that implements the guest-side para-virtualized driver service. This enables seamless migration from qemu-kvm to upstream or, one day, between KVM and TCG mode. The basic concept goes like this: - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel irqchip) a vmcall hypercall is registered - VAPIC option ROM is loaded into guest - option ROM activates TPR MMIO access reporting via port 0x7e - TPR accesses are trapped and patched in the guest to call into option ROM instead, VAPIC support is enabled - option ROM TPR helpers track state in memory and invoke hypercall to poll for pending IRQs if required Signed-off-by: Jan Kiszka --- Makefile.target|3 +- hw/apic.c | 126 - hw/apic_common.c | 64 - hw/apic_internal.h | 27 ++ hw/kvm/apic.c | 32 ++ hw/kvmvapic.c | 803 6 files changed, 1041 insertions(+), 14 deletions(-) create mode 100644 hw/kvmvapic.c diff --git a/Makefile.target b/Makefile.target index 68481a3..ec7eff8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -230,7 +230,8 @@ obj-y += device-hotplug.o # Hardware support obj-i386-y += mc146818rtc.o pc.o -obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o +obj-i386-y += apic_common.o apic.o kvmvapic.o +obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += debugcon.o multiboot.o diff --git a/hw/apic.c b/hw/apic.c index 086c544..2ebf3ca 100644 --- 
a/hw/apic.c +++ b/hw/apic.c @@ -35,6 +35,10 @@ #define MSI_ADDR_DEST_ID_SHIFT 12 #defineMSI_ADDR_DEST_ID_MASK 0x000 +#define SYNC_FROM_VAPIC 0x1 +#define SYNC_TO_VAPIC 0x2 +#define SYNC_ISR_IRR_TO_VAPIC 0x4 + static APICCommonState *local_apics[MAX_APICS + 1]; static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode); @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index) return !!(tab[i] & mask); } +/* return -1 if no bit is set */ +static int get_highest_priority_int(uint32_t *tab) +{ +int i; +for (i = 7; i >= 0; i--) { +if (tab[i] != 0) { +return i * 32 + fls_bit(tab[i]); +} +} +return -1; +} + +static void apic_sync_vapic(APICCommonState *s, int sync_type) +{ +VAPICState vapic_state; +size_t length; +off_t start; +int vector; + +if (!s->vapic_paddr) { +return; +} +if (sync_type & SYNC_FROM_VAPIC) { +cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state, + sizeof(vapic_state), 0); +s->tpr = vapic_state.tpr; +} +if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) { +start = offsetof(VAPICState, isr); +length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr); + +if (sync_type & SYNC_TO_VAPIC) { +assert(qemu_cpu_is_self(s->cpu_env)); + +vapic_state.tpr = s->tpr; +vapic_state.enabled = 1; +start = 0; +length = sizeof(VAPICState); +} + +vector = get_highest_priority_int(s->isr); +if (vector < 0) { +vector = 0; +} +vapic_state.isr = vector & 0xf0; + +vapic_state.zero = 0; + +vector = get_highest_priority_int(s->irr); +if (vector < 0) { +vector = 0; +} +vapic_state.irr = vector & 0xff; + +cpu_physical_memory_write_rom(s->vapic_paddr + start, + ((void *)&vapic_state) + start, length); +} +} + +static void apic_vapic_base_update(APICCommonState *s) +{ +apic_sync_vapic(s, SYNC_TO_VAPIC); +} + static void apic_local_deliver(APICCommonState *s, int vector) { uint32_t lvt = s->lvt[vector]; @@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val) static void apic_set_tpr(APICCommonState *s, 
uint8_t val) { -s->tpr = (val & 0x0f) << 4; -apic_update_irq(s); +/* Updates from cr8 are ignored while the VAPIC is active */ +if (!s->vapic_paddr) { +s->tpr = val << 4; +apic_update_irq(s); +} } -/* return -1 if no bit is set */ -static int get_highest_priority_int(uint32_t *tab) +static uint8_t apic_get_tpr(APICCommonState *s) { -int i; -f
[PATCH v3 0/9] uq/master: TPR access optimization for Windows guests
v3 comes with the following changes: - clear TPR access report on system reset (in case we load a guest without the option ROM) - addressed review comments on details in kvmvapic.c - streamlined 16-bit VAPIC port handling - included cleanup for useless next_cpu casts in cpus.c (to avoid conflicts on merge) The series is also available at git://git.kiszka.org/qemu-kvm.git queues/kvm-tpr Please review/apply. CC: Paolo Bonzini Jan Kiszka (9): kvm: Set cpu_single_env only once Remove useless casts from cpu iterators Allow to use pause_all_vcpus from VCPU context target-i386: Add infrastructure for reporting TPR MMIO accesses kvmvapic: Add option ROM kvmvapic: Introduce TPR access optimization for Windows guests kvmvapic: Simplify mp/up_set_tpr optionsrom: Reserve space for checksum kvmvapic: Use optionrom helpers .gitignore|1 + Makefile |2 +- Makefile.target |3 +- cpu-all.h |3 +- cpus.c| 21 +- hw/apic.c | 126 ++- hw/apic.h |2 + hw/apic_common.c | 68 - hw/apic_internal.h| 27 ++ hw/kvm/apic.c | 32 ++ hw/kvmvapic.c | 803 + kvm-all.c |5 - pc-bios/optionrom/Makefile|2 +- pc-bios/optionrom/kvmvapic.S | 335 + pc-bios/optionrom/optionrom.h |3 +- target-i386/cpu.h | 11 + target-i386/helper.c | 19 + target-i386/kvm.c | 24 ++- 18 files changed, 1458 insertions(+), 29 deletions(-) create mode 100644 hw/kvmvapic.c create mode 100644 pc-bios/optionrom/kvmvapic.S -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 8/9] optionsrom: Reserve space for checksum
Always add a byte before the final 512-bytes alignment to reserve the space for the ROM checksum. Signed-off-by: Jan Kiszka --- pc-bios/optionrom/optionrom.h |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h index aa783de..3daf7da 100644 --- a/pc-bios/optionrom/optionrom.h +++ b/pc-bios/optionrom/optionrom.h @@ -124,7 +124,8 @@ movw%ax, %ds; #define OPTION_ROM_END \ -.align 512, 0; \ + .byte 0; \ + .align 512, 0; \ _end: #define BOOT_ROM_END \ -- 1.7.3.4
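A short sketch of why the extra byte matters: without it, a ROM whose code ends exactly on a 512-byte boundary would have no free byte left for the checksum. The code below assumes the usual legacy option ROM rule that all bytes of the image sum to zero modulo 256, and it assumes the checksum is stored in the last byte of the padded image; the function names are invented for the example and this is not the signing tool used by the build.

```c
#include <stddef.h>
#include <stdint.h>

#define ROM_BLOCK 512

/* Image size after ".byte 0; .align 512": one reserved byte is added
 * before rounding up, so at least one spare byte always exists. */
static size_t padded_rom_size(size_t code_len)
{
    return (code_len + 1 + ROM_BLOCK - 1) / ROM_BLOCK * ROM_BLOCK;
}

/* Store in the last byte the value that makes the byte sum of the
 * whole image zero modulo 256 (placement is an assumption). */
static uint8_t sign_rom(uint8_t *rom, size_t size)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < size; i++)
        sum = (uint8_t)(sum + rom[i]);
    rom[size - 1] = (uint8_t)(0u - sum);
    return rom[size - 1];
}

/* Byte sum of the whole image; 0 means the checksum is valid. */
static uint8_t rom_sum(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < size; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum;
}
```

Note that padded_rom_size(512) yields 1024 rather than 512, which is exactly the case the reserved byte is there to handle.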
[PATCH v3 3/9] Allow to use pause_all_vcpus from VCPU context
In order to perform critical manipulations on the VM state in the context of a VCPU, specifically code patching, stopping and resuming of all VCPUs may be necessary. resume_all_vcpus is already compatible, now enable pause_all_vcpus for this use case by stopping the calling context before starting to wait for the whole gang. CC: Paolo Bonzini Signed-off-by: Jan Kiszka --- cpus.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/cpus.c b/cpus.c index 4e65894..290daa8 100644 --- a/cpus.c +++ b/cpus.c @@ -870,6 +870,18 @@ void pause_all_vcpus(void) penv = penv->next_cpu; } +if (!qemu_thread_is_self(&io_thread)) { +cpu_stop_current(); +if (!kvm_enabled()) { +while (penv) { +penv->stop = 0; +penv->stopped = 1; +penv = penv->next_cpu; +} +return; +} +} + while (!all_vcpus_paused()) { qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex); penv = first_cpu; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/9] target-i386: Add infrastructure for reporting TPR MMIO accesses
This will allow the APIC core to file a TPR access report. Depending on the accelerator and kernel irqchip mode, it will either be delivered right away or queued for later reporting. In TCG mode, we can restart the triggering instruction and can therefore forward the event directly. KVM does not allow us to restart, so we postpone the delivery of events recorded in the user space APIC until the current instruction is completed. Note that KVM without in-kernel irqchip will report the address after the instruction that triggered a write access. In contrast, read accesses will return the precise information. Signed-off-by: Jan Kiszka --- cpu-all.h|3 ++- hw/apic.h|2 ++ hw/apic_common.c |4 target-i386/cpu.h| 11 +++ target-i386/helper.c | 19 +++ target-i386/kvm.c| 24 ++-- 6 files changed, 60 insertions(+), 3 deletions(-) diff --git a/cpu-all.h b/cpu-all.h index e2c3c49..80e6d42 100644 --- a/cpu-all.h +++ b/cpu-all.h @@ -375,8 +375,9 @@ DECLARE_TLS(CPUState *,cpu_single_env); #define CPU_INTERRUPT_TGT_INT_0 0x0100 #define CPU_INTERRUPT_TGT_INT_1 0x0400 #define CPU_INTERRUPT_TGT_INT_2 0x0800 +#define CPU_INTERRUPT_TGT_INT_3 0x2000 -/* First unused bit: 0x2000. */ +/* First unused bit: 0x4000. */ /* The set of all bits that should be masked when single-stepping. */ #define CPU_INTERRUPT_SSTEP_MASK \ diff --git a/hw/apic.h b/hw/apic.h index a62d83b..45598bd 100644 --- a/hw/apic.h +++ b/hw/apic.h @@ -18,6 +18,8 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val); uint8_t cpu_get_apic_tpr(DeviceState *s); void apic_init_reset(DeviceState *s); void apic_sipi(DeviceState *s); +void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, + int access); /* pc.c */ int cpu_is_bsp(CPUState *env); diff --git a/hw/apic_common.c b/hw/apic_common.c index 8373d79..588531b 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -68,6 +68,10 @@ uint8_t cpu_get_apic_tpr(DeviceState *d) return s ? 
s->tpr >> 4 : 0; } +void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access) +{ +} + void apic_report_irq_delivered(int delivered) { apic_irq_delivered += delivered; diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 37dde79..c2e9ca3 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -482,6 +482,7 @@ #define CPU_INTERRUPT_VIRQ CPU_INTERRUPT_TGT_INT_0 #define CPU_INTERRUPT_INIT CPU_INTERRUPT_TGT_INT_1 #define CPU_INTERRUPT_SIPI CPU_INTERRUPT_TGT_INT_2 +#define CPU_INTERRUPT_TPR CPU_INTERRUPT_TGT_INT_3 enum { @@ -772,6 +773,9 @@ typedef struct CPUX86State { XMMReg ymmh_regs[CPU_NB_REGS]; uint64_t xcr0; + +target_ulong tpr_access_ip; +int tpr_access_type; } CPUX86State; CPUX86State *cpu_x86_init(const char *cpu_model); @@ -1064,4 +1068,11 @@ void svm_check_intercept(CPUState *env1, uint32_t type); uint32_t cpu_cc_compute_all(CPUState *env1, int op); +typedef enum TPRAccess { +TPR_ACCESS_READ, +TPR_ACCESS_WRITE, +} TPRAccess; + +void cpu_report_tpr_access(CPUState *env, TPRAccess access); + #endif /* CPU_I386_H */ diff --git a/target-i386/helper.c b/target-i386/helper.c index 2586aff..79aeb8f 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -1189,6 +1189,25 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank, } } } + +void cpu_report_tpr_access(CPUState *env, TPRAccess access) +{ +TranslationBlock *tb; + +if (kvm_enabled()) { +cpu_synchronize_state(env); + +env->tpr_access_ip = env->eip; +env->tpr_access_type = access; + +cpu_interrupt(env, CPU_INTERRUPT_TPR); +} else { +tb = tb_find_pc(env->mem_io_pc); +cpu_restore_state(tb, env, env->mem_io_pc); + +apic_handle_tpr_access_report(env->apic_state, env->eip, access); +} +} #endif /* !CONFIG_USER_ONLY */ static void mce_init(CPUX86State *cenv) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 981192d..fa77f9d 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1635,8 +1635,10 @@ void kvm_arch_pre_run(CPUState *env, struct kvm_run *run) } if 
(!kvm_irqchip_in_kernel()) { -/* Force the VCPU out of its inner loop to process the INIT request */ -if (env->interrupt_request & CPU_INTERRUPT_INIT) { +/* Force the VCPU out of its inner loop to process any INIT requests + * or pending TPR access reports. */ +if (env->interrupt_request & +(CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) { env->exit_request = 1; } @@ -1730,6 +1732,11 @@ int kvm_arch_process_async_events(CPUState *env) kvm_cpu_synchronize_state(env); do_cpu_sipi(env); } +if (env->interrupt_request & CPU_INTERRUPT_TPR) { +env->interrupt_request &= ~CPU_INTERRUPT_TPR; +apic_handle_tpr_access_
[PATCH v3 9/9] kvmvapic: Use optionrom helpers
Use OPTION_ROM_START/END from the common header file, add comment to init code. Signed-off-by: Jan Kiszka --- pc-bios/optionrom/kvmvapic.S | 18 -- 1 files changed, 8 insertions(+), 10 deletions(-) diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S index 856c1e5..aa17a40 100644 --- a/pc-bios/optionrom/kvmvapic.S +++ b/pc-bios/optionrom/kvmvapic.S @@ -9,12 +9,10 @@ # option) any later version. See the COPYING file in the top-level directory. # - .text 0 - .code16 -.global _start -_start: - .short 0xaa55 - .byte (_end - _start) / 512 +#include "optionrom.h" + +OPTION_ROM_START + # clear vapic area: firmware load using rep insb may cause # stale tpr/isr/irr data to corrupt the vapic area. push %es @@ -26,8 +24,11 @@ _start: cld rep stosw pop %es + + # announce presence to the hypervisor mov $vapic_base, %ax out %ax, $0x7e + lret .code32 @@ -331,7 +332,4 @@ up_set_tpr_poll_irq: vapic: . = . + vapic_size -.byte 0 # reserve space for signature -.align 512, 0 - -_end: +OPTION_ROM_END -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/9] kvm: Set cpu_single_env only once
As we have thread-local cpu_single_env now and KVM uses exactly one
thread per VCPU, we can drop the cpu_single_env updates from the loop
and initialize this variable only once during setup.

Signed-off-by: Jan Kiszka
---
 cpus.c    |    1 +
 kvm-all.c |    5 -----
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index f45a438..d0c8340 100644
--- a/cpus.c
+++ b/cpus.c
@@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(env->thread);
     env->thread_id = qemu_get_thread_id();
+    cpu_single_env = env;
 
     r = kvm_init_vcpu(env);
     if (r < 0) {
diff --git a/kvm-all.c b/kvm-all.c
index c4babda..e2cbc03 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
         return EXCP_HLT;
     }
 
-    cpu_single_env = env;
-
     do {
         if (env->kvm_vcpu_dirty) {
             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
@@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
              */
             qemu_cpu_kick_self();
         }
-        cpu_single_env = NULL;
         qemu_mutex_unlock_iothread();
 
         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 
         qemu_mutex_lock_iothread();
-        cpu_single_env = env;
         kvm_arch_post_run(env, run);
 
         kvm_flush_coalesced_mmio_buffer();
@@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
     }
 
     env->exit_request = 0;
-    cpu_single_env = NULL;
     return ret;
 }
 
--
1.7.3.4
[PATCH v3 2/9] Remove useless casts from cpu iterators
CPUState::next_cpu is already CPUState *.

Signed-off-by: Jan Kiszka
---
 cpus.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index d0c8340..4e65894 100644
--- a/cpus.c
+++ b/cpus.c
@@ -853,7 +853,7 @@ static int all_vcpus_paused(void)
         if (!penv->stopped) {
             return 0;
         }
-        penv = (CPUState *)penv->next_cpu;
+        penv = penv->next_cpu;
     }
 
     return 1;
@@ -867,7 +867,7 @@ void pause_all_vcpus(void)
     while (penv) {
         penv->stop = 1;
         qemu_cpu_kick(penv);
-        penv = (CPUState *)penv->next_cpu;
+        penv = penv->next_cpu;
     }
 
     while (!all_vcpus_paused()) {
@@ -875,7 +875,7 @@ void pause_all_vcpus(void)
         penv = first_cpu;
         while (penv) {
             qemu_cpu_kick(penv);
-            penv = (CPUState *)penv->next_cpu;
+            penv = penv->next_cpu;
         }
     }
 }
@@ -889,7 +889,7 @@ void resume_all_vcpus(void)
         penv->stop = 0;
         penv->stopped = 0;
         qemu_cpu_kick(penv);
-        penv = (CPUState *)penv->next_cpu;
+        penv = penv->next_cpu;
     }
 }
 
--
1.7.3.4
[PATCH v3 7/9] kvmvapic: Simplify mp/up_set_tpr
The CH register is only written, never read. So we can remove these
operations and, in case of up_set_tpr, also the ECX push/pop.

Signed-off-by: Jan Kiszka
---
 pc-bios/optionrom/kvmvapic.S |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index e1d8f18..856c1e5 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -202,7 +202,6 @@ mp_isr_is_bigger:
 	mov %bh, %bl
 mp_tpr_is_bigger:
 	/* %bl = ppr */
-	mov %bl, %ch /* ch = ppr */
 	rol $8, %ebx
 	/* now: %bl = irr, %bh = ppr */
 	cmp %bh, %bl
@@ -276,7 +275,6 @@ up_set_tpr_eax:
 up_set_tpr:
 	pushf
 	push %eax
-	push %ecx
 	push %ebx
 	reenable_vtpr
 
@@ -284,7 +282,7 @@ up_set_tpr_failed:
 	mov vapic, %eax ; fixup
 
 	mov %eax, %ebx
-	mov 20(%esp), %bl
+	mov 16(%esp), %bl
 
 	/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
 
@@ -298,7 +296,6 @@ up_isr_is_bigger:
 	mov %bh, %bl
 up_tpr_is_bigger:
 	/* %bl = ppr */
-	mov %bl, %ch /* ch = ppr */
 	rol $8, %ebx
 	/* now: %bl = irr, %bh = ppr */
 	cmp %bh, %bl
@@ -306,7 +303,6 @@ up_tpr_is_bigger:
 
 up_set_tpr_out:
 	pop %ebx
-	pop %ecx
 	pop %eax
 	popf
 	ret $4
--
1.7.3.4
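The change from 20(%esp) to 16(%esp) in up_set_tpr follows from the stack layout: each of pushf, push %eax, push %ecx and push %ebx takes one 32-bit slot, and the return address pushed by the call takes one more, so the stack argument sits five slots above %esp before the patch and four slots after it. The arithmetic can be written down as a small sketch; the helper name is made up for illustration and assumes 32-bit code where every push is 4 bytes.

```c
#include <stddef.h>

/* Size of one 32-bit stack slot, in bytes. */
enum { SLOT_BYTES = 4 };

/*
 * Offset from %esp to the first stack argument, given how many 32-bit
 * values the callee has pushed since entry.  The return address pushed
 * by `call` occupies one extra slot between the saved values and the
 * argument.  Hypothetical helper, for illustration only.
 */
static size_t stack_arg_offset(size_t saved_slots)
{
    return (saved_slots + 1) * SLOT_BYTES;
}
```

With four saved slots (flags, %eax, %ecx, %ebx) this gives 20, and with %ecx no longer saved it gives 16, matching the hunk in the patch.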
[PATCH v3 5/9] kvmvapic: Add option ROM
This imports and builds the original VAPIC option ROM of qemu-kvm. Its interaction with QEMU is described in the commit that introduces the corresponding device model. Signed-off-by: Jan Kiszka --- .gitignore |1 + Makefile |2 +- pc-bios/optionrom/Makefile |2 +- pc-bios/optionrom/kvmvapic.S | 341 ++ 4 files changed, 344 insertions(+), 2 deletions(-) create mode 100644 pc-bios/optionrom/kvmvapic.S diff --git a/.gitignore b/.gitignore index f5aab2c..d3b78c3 100644 --- a/.gitignore +++ b/.gitignore @@ -75,6 +75,7 @@ pc-bios/vgabios-pq/status pc-bios/optionrom/linuxboot.bin pc-bios/optionrom/multiboot.bin pc-bios/optionrom/multiboot.raw +pc-bios/optionrom/kvmvapic.bin .stgit-* cscope.* tags diff --git a/Makefile b/Makefile index 47acf3d..c2ef135 100644 --- a/Makefile +++ b/Makefile @@ -255,7 +255,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \ pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \ bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \ mpc8544ds.dtb \ -multiboot.bin linuxboot.bin \ +multiboot.bin linuxboot.bin kvmvapic.bin \ s390-zipl.rom \ spapr-rtas.bin slof.bin \ palcode-clipper diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile index 2caf7e6..f6b4027 100644 --- a/pc-bios/optionrom/Makefile +++ b/pc-bios/optionrom/Makefile @@ -14,7 +14,7 @@ CFLAGS += -I$(SRC_PATH) CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector) QEMU_CFLAGS = $(CFLAGS) -build-all: multiboot.bin linuxboot.bin +build-all: multiboot.bin linuxboot.bin kvmvapic.bin # suppress auto-removal of intermediate files .SECONDARY: diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S new file mode 100644 index 000..e1d8f18 --- /dev/null +++ b/pc-bios/optionrom/kvmvapic.S @@ -0,0 +1,341 @@ +# +# Local APIC acceleration for Windows XP and related guests +# +# Copyright 2011 Red Hat, Inc. and/or its affiliates +# +# Author: Avi Kivity +# +# This work is licensed under the terms of the GNU GPL, version 2, or (at your +# option) any later version. 
See the COPYING file in the top-level directory. +# + + .text 0 + .code16 +.global _start +_start: + .short 0xaa55 + .byte (_end - _start) / 512 + # clear vapic area: firmware load using rep insb may cause + # stale tpr/isr/irr data to corrupt the vapic area. + push %es + push %cs + pop %es + xor %ax, %ax + mov $vapic_size/2, %cx + lea vapic, %di + cld + rep stosw + pop %es + mov $vapic_base, %ax + out %ax, $0x7e + lret + + .code32 +vapic_size = 2*4096 + +.macro fixup delta=-4 +777: + .text 1 + .long 777b + \delta - vapic_base + .text 0 +.endm + +.macro reenable_vtpr + out %al, $0x7e +.endm + +.text 1 + fixup_start = . +.text 0 + +.align 16 + +vapic_base: + .ascii "kvm aPiC" + + /* relocation data */ + .long vapic_base; fixup + .long fixup_start ; fixup + .long fixup_end ; fixup + + .long vapic ; fixup + .long vapic_size +vcpu_shift: + .long 0 +real_tpr: + .long 0 + .long up_set_tpr; fixup + .long up_set_tpr_eax; fixup + .long up_get_tpr_eax; fixup + .long up_get_tpr_ecx; fixup + .long up_get_tpr_edx; fixup + .long up_get_tpr_ebx; fixup + .long 0 /* esp. won't work. */ + .long up_get_tpr_ebp; fixup + .long up_get_tpr_esi; fixup + .long up_get_tpr_edi; fixup + .long up_get_tpr_stack ; fixup + .long mp_set_tpr; fixup + .long mp_set_tpr_eax; fixup + .long mp_get_tpr_eax; fixup + .long mp_get_tpr_ecx; fixup + .long mp_get_tpr_edx; fixup + .long mp_get_tpr_ebx; fixup + .long 0 /* esp. won't work. 
*/ + .long mp_get_tpr_ebp; fixup + .long mp_get_tpr_esi; fixup + .long mp_get_tpr_edi; fixup + .long mp_get_tpr_stack ; fixup + +.macro kvm_hypercall + .byte 0x0f, 0x01, 0xc1 +.endm + +kvm_hypercall_vapic_poll_irq = 1 + +pcr_cpu = 0x51 + +.align 64 + +mp_get_tpr_eax: + pushf + cli + reenable_vtpr + push %ecx + + fs/movzbl pcr_cpu, %eax + + mov vcpu_shift, %ecx; fixup + shl %cl, %eax + testb $1, vapic+4(%eax) ; fixup delta=-5 + jz mp_get_tpr_bad + movzbl vapic(%eax), %eax ; fixup + +mp_get_tpr_out: + pop %ecx + popf + ret + +mp_get_tpr_bad: + mov real_tpr, %eax ; fixup + mov (%eax), %eax + jmp mp_get_tpr_out + +mp_get_tpr_ebx: + mov %eax, %ebx + call mp_get_tpr_eax + xchg %eax, %ebx + ret + +mp_get_tpr_ecx: + mov %eax, %ecx + call mp_get_tpr_eax + xchg %eax, %ecx + ret + +mp_get_tpr_edx: + mov %eax, %edx + call mp_get_tpr_eax +
Re: The way of mapping BIOS into the guest's address space
On 02/14/2012 07:03 PM, Yang Bai wrote:
> Hi all,
>
> Since on X86, bios is always at the end of the address space, so I
> have some thought about how to implement the seabios support for kvm
> tool.
>
> 1. using kvm__register_mem to map the end of address space to the
> guest then copy the code of seabios to this mem region. Just emulating
> the bios chip.
>
> 2. leave the bios code alone and don't touch the guest's address
> space. If the guest accesses the address belonging to the bios, it
> will be an IO request and we can emulate the IO access to the bios
> chip.
>
> Any ideas about this?

Can I ask what the purpose of mapping the BIOS code into the guest is? Any usage?
Shouldn't the BIOS's behavior be emulated by the hypervisor?

Thanks.
-cody

> And question: How could I set the first instruction address after we
> issue the vmlaunch instruction?
>
> Thanks,
> Yang
[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane
https://bugzilla.kernel.org/show_bug.cgi?id=42755

--- Comment #28 from Avi Kivity 2012-02-14 14:47:38 ---
(In reply to comment #27)
> and there soon will be video capture with 'perf top'
>
> http://vbox7.com/play:199e9ede30

Run it while the guest is also running.
Re: KVM call agenda for Tuesday 14
Juan Quintela wrote:
> Hi
>
> Please send in any agenda items you are interested in covering.

As there are no topics, the call is cancelled.

Happy hacking, Juan.

PS. You should use the extra time to draw a qemu mascot O:-)

> Cheers,
>
> Juan.
Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub
On Tue, Feb 14, 2012 at 01:47:59PM +, Paul Brook wrote:
> > > > Now an OS can have a standard driver and use it
> > > > to activate hotplug functionality. This is OS hotplug (OSHP).
> > >
> > > So presumably this will work on targets that don't have ACPI?
> > > Assuming a competent guest OS of course. Have you tested this?
> >
> > This being the qemu side of things? I run Linux
> > and verified that it calls OSHP and afterwards,
> > runs the native driver and handles hotplug/unplug
> > without invoking ACPI at all.
>
> I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets
> (i.e anything other than x86 PC). From your description it sounds like it
> *should* work.
>
> > It seems that at least the SHPC driver in linux
> > doesn't work if you don't have an acpi table
> > with the OSHP method - not many people run with acpi=off
> > nowadays, so it's probably just a bug.
> > I'll check how hard it is to fix this.
>
> Targets other than x86 don't have ACPI to start with.
>
> Paul

So

#ifdef CONFIG_ACPI
#include
static inline int get_hp_hw_control_from_firmware(struct pci_dev *dev)
{
	u32 flags = OSC_SHPC_NATIVE_HP_CONTROL;
	return acpi_get_hp_hw_control_from_firmware(dev, flags);
}
#else
#define get_hp_hw_control_from_firmware(dev) (0)
#endif

So if you build your guest without acpi, things should work fine.

--
MMST
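The snippet quoted in this message relies on a common compile-time stub pattern: when the firmware interface is configured out, the query collapses to a constant. The stand-alone sketch below shows the same pattern outside the kernel. Every name in it is hypothetical (HAVE_FIRMWARE stands in for CONFIG_ACPI), and it assumes, as the quoted stub suggests, that a result of 0 means the native driver may take control.

```c
/*
 * Stand-alone illustration of the stub pattern.  All names here are
 * hypothetical; HAVE_FIRMWARE plays the role of CONFIG_ACPI.
 */
#ifdef HAVE_FIRMWARE
int firmware_grant_hotplug_control(int dev_id);  /* provided elsewhere */
static inline int get_hotplug_control(int dev_id)
{
    return firmware_grant_hotplug_control(dev_id);
}
#else
/* No firmware to ask: report success (0) unconditionally. */
#define get_hotplug_control(dev_id) (0)
#endif

/* Returns 1 if the native hotplug driver may bind to the device. */
static int native_driver_may_bind(int dev_id)
{
    (void)dev_id;  /* unused when the stub macro is active */
    return get_hotplug_control(dev_id) == 0;
}
```

Built without HAVE_FIRMWARE, native_driver_may_bind() always returns 1, which is the situation described for a guest kernel built without ACPI.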
Re: Win 2000 driver for -vga std ?
On 02/14/12 07:25, Michael Tokarev wrote:
> On 14.02.2012 05:42, Reeted wrote:
>> Hello, subject says it all
>> The driver for windows 2000 for the -vga std should be the Anapa VBE Vesa
>> VBEMP if I understand correctly but I cannot on earth find this executable
>> http://navozhdeniye.narod.ru/vbemp.htm
>> all links for download all over the world are dangling!
>> Anybody has conserved this very important driver?
>
> This "adapter" works in all versions of windows with a built-in
> vesa driver just fine, no replacement is necessary or desired.
> The only problem is that some versions of windows consider that
> driver to be "problematic" somehow and mark the corresponding
> device with yellow exclamation sign. Go ask M$ about this.

I don't think so...
It detects new hardware (I am virtualizing an existing machine), asks me where to look for a driver, I make it go looking into the Win2000 installation CD and online at Windows Update but it says it cannot find a driver for such video adapter. It asks me if I want to disable the device or be prompted again for installation at the next boot.
And it keeps running at 16 colors (4 bit depth) 800x600, with very poor performance when moving windows around.
Re: performance trouble
----- Original Message -----
From: "Avi Kivity"
To: "David Cure"
Cc: kvm@vger.kernel.org, "Vadim Rozenfeld"
Sent: Tuesday, February 14, 2012 3:32:16 PM
Subject: Re: performance trouble

On 02/10/2012 12:09 PM, David Cure wrote:
> hello,
>
> On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:
> >
> > Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.
>
> I made the trace: started just before the slow function launch
> and stopped just after. I start only one VM with 2 vcpus/16G RAM and only one
> user connected to the VM to launch the test.
>
> The trace file is too big to post here, I gzip it and the file
> is available here: http://www.roullier.net/report.txt.gz
>
> I hope you can find something strange.
>

It's reading the HPET like crazy. There are also tons of interrupts.
Please use the windows performance tools to see which devices trigger
these interrupts.

[VR] +1
Try Microsoft Windows Performance Toolkit from Windows SDK
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3138
It's really good.

The HPET issue will be fixed by the hyper-V enlightenments, but these
will take some time to cook.

You can also try vhost-net to improve networking latency.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub
> > > Now an OS can have a standard driver and use it > > > to activate hotplug functionality. This is OS hotplug (OSHP). > > > > So presumably this will work on targets that don't have ACPI? > > Assuming a competent guest OS of course. Have you tested this? > > This being the qemu side of things? I run Linux > and verified that it calls OSHP and afterwards, > runs the native driver and handles hotplug/unplug > without invoking ACPI at all. I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets (i.e anything other than x86 PC). From your description it sounds like it *should* work. > It seems that at least the SHPC driver in linux > doesn't work if you don't have an acpi table > with the OSHP method - not many people run with acpi=off > nowdays, so it's probably just a bug. > I'll check how hard it is to fix this. Targets other than x86 don't have ACPI to start with. Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
On Tue, Feb 14, 2012 at 03:32:16PM +0200, Avi Kivity wrote:
> On 02/10/2012 12:09 PM, David Cure wrote:
> > hello,
> >
> > On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:
> > >
> > > Please post a trace as documented in
> > > http://www.linux-kvm.org/page/Tracing.
> >
> > I made the trace: started just before the slow function launch
> > and stopped just after. I start only one VM with 2 vcpus/16G RAM and only one
> > user connected to the VM to launch the test.
> >
> > The trace file is too big to post here, I gzip it and the file
> > is available here: http://www.roullier.net/report.txt.gz
> >
> > I hope you can find something strange.
> >
>
> It's reading the HPET like crazy. There are also tons of interrupts.
> Please use the windows performance tools to see which devices trigger
> these interrupts.
>
> The HPET issue will be fixed by the hyper-V enlightenments, but these
> will take some time to cook.
>
Try adding -no-hpet to the qemu command line and see if it helps.

> You can also try vhost-net to improve networking latency.
>
> --
> error compiling committee.c: too many arguments to function

--
	Gleb.
Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub
On Tue, Feb 14, 2012 at 12:49:08PM +, Paul Brook wrote:
> > > In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
> > > for an additional Ack.
> >
> > No problem, I'll get an Ack :)
> > Meanwhile - here's a summary, as far as I understand it.
> >
> > Originally PCI SIG only defined the electrical
> > and mechanical requirements from hotplug, no standard
> > software interface. So it needed ACPI to drive device-specific registers
> > to actually do hotplug.
> > At some point PCISIG defined standard interfaces
> > for PCI hotplug. There are two of them: standard
> > hot plug controller (SHPC) for PCI and PCIE hotplug
> > for Express.
> >
> > Now an OS can have a standard driver and use it
> > to activate hotplug functionality. This is OS hotplug (OSHP).
>
> So presumably this will work on targets that don't have ACPI?
> Assuming a competent guest OS of course. Have you tested this?
>
> Paul

This being the qemu side of things? I run Linux
and verified that it calls OSHP and afterwards,
runs the native driver and handles hotplug/unplug
without invoking ACPI at all.

It seems that at least the SHPC driver in linux
doesn't work if you don't have an acpi table
with the OSHP method - not many people run with acpi=off
nowadays, so it's probably just a bug.
I'll check how hard it is to fix this.

--
MST
Re: performance trouble
On 02/10/2012 12:09 PM, David Cure wrote:
> hello,
>
> On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:
> >
> > Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.
>
> I made the trace: started just before the slow function launch
> and stopped just after. I start only one VM with 2 vcpus/16G RAM and only one
> user connected to the VM to launch the test.
>
> The trace file is too big to post here, I gzip it and the file
> is available here: http://www.roullier.net/report.txt.gz
>
> I hope you can find something strange.
>

It's reading the HPET like crazy. There are also tons of interrupts.
Please use the windows performance tools to see which devices trigger
these interrupts.

The HPET issue will be fixed by the hyper-V enlightenments, but these
will take some time to cook.

You can also try vhost-net to improve networking latency.

--
error compiling committee.c: too many arguments to function
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
> And will seabios replace the present bios implementation or co-exist?

Ideally we should get rid of our minibios completely and only have
seabios here instead.

	Cyrill
Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub
> > In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking > > for an additional Ack. > > No problem, I'll get an Ack :) > Meanwhile - here's a summary, as far as I understand it. > > Originally PCI SIG only defined the electrical > and mechanical requirements from hotplug, no standard > software interface. So it needed ACPI to drive device-specific registers > to actually do hotplug. > At some point PCISIG defined standard interfaces > for PCI hotplug. There are two of them: standard > hot plug controller (SHPC) for PCI and PCIE hotplug > for Express. > > Now an OS can have a standard driver and use it > to activate hotplug functionality. This is OS hotplug (OSHP). So presumably this will work on targets that don't have ACPI? Assuming a competent guest OS of course. Have you tested this? Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware
On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:

> The use case here is multiple VFs but the same solution should work with
> multiple PFs as well. FDB controls should be independent of how the ports
> are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.

Makes sense.

> With events and ADD/DEL/GET FDB controls we can solve both cases. This also
> solves Roopa's case with macvlan where he wants to add additional addresses
> to macvlan ports.

Not familiar with that issue - I'll prowl the list.

> Yes it should flood here, unless its acting as a 802.1Qbg VEB or VEPA.

Ok. So there is a toggle somewhere which controls how flooding should happen.

>
> Maybe not. But the kernel already has the needed signals with one extra
> hook we can save running a daemon in user space. Maybe that's not a great
> argument to add kernel code though.

You make a reasonable argument to have it in the kernel, but I think we
win more if we separate the control. So while I empathize, I am hoping
that you'd go with the path that is hard to travel ;->

> The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered in the
> br_netlink_init() path.

Hrm - hadn't paid attention to that before. Nasty. The bridge seems to be
hard-coding policy on station movement, no? This is a good example of the
qualms I have on adding things to the kernel ;-> I may not want to auto
update a MAC address moving ports as part of some policy I have. I can go
and add YAK (Yet Another Knob) - but where is the line drawn?

cheers,
jamal
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 01:10:59PM +0200, Pekka Enberg wrote:
> On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai wrote:
> > Since on X86, bios is always at the end of the address space, so I
> > have some thought about how to implement the seabios support for kvm
> > tool.
> >
> > 1. using kvm__register_mem to map the end of address space to the
> > guest then copy the code of seabios to this mem region. Just emulating
> > the bios chip.

I think this is what should be done.

> >
> > 2. leave the bios code alone and don't touch the guest's address
> > space. If the guest accesses the address belonging to the bios, it
> > will be an IO request and we can emulate the IO access to the bios
> > chip.
> >
> > Any ideas about this?
>
> The latter solution doesn't make any sense to me. Cyrill, do we really
> need to put the BIOS at the end of the address space? Don't we have
> unused space below 1 MB?

I don't remember for sure how SeaBIOS actually works. What I remember
is that it acquires all hw environment might have. So without a real look
into the seabios code I fear I can't answer. But reserving the end of the
4G address space for a bios copy sounds reasonable if we are going to
behave as real hardware. Maybe we could poke someone from the KVM camp
for a hint?

	Cyrill
Re: The way of mapping BIOS into the guest's address space
On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai wrote:
> Since on X86, bios is always at the end of the address space, so I
> have some thought about how to implement the seabios support for kvm
> tool.
>
> 1. using kvm__register_mem to map the end of address space to the
> guest then copy the code of seabios to this mem region. Just emulating
> the bios chip.
>
> 2. leave the bios code alone and don't touch the guest's address
> space. If the guest accesses the address belonging to the bios, it
> will be an IO request and we can emulate the IO access to the bios
> chip.
>
> Any ideas about this?

The latter solution doesn't make any sense to me. Cyrill, do we really
need to put the BIOS at the end of the address space? Don't we have
unused space below 1 MB?

> And question: How could I set the first instruction address after we
> issue the vmlaunch instruction?

You need to set ->boot_ip and friends. See
tools/kvm/x86/kvm.c::load_bzimage() for an example.

			Pekka
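For the question about the first instruction address: in real mode the CPU fetches from segment * 16 + offset, so choosing an initial selector and IP fixes the linear address where execution starts. The sketch below only shows that arithmetic, plus the load address that makes a BIOS image end exactly at the 1 MB boundary. The struct and function names are hypothetical illustrations, not the kvm tool's actual fields; 0xf000:0xfff0 is the conventional x86 reset vector.

```c
#include <stdint.h>

/* Hypothetical container for the guest's initial real-mode state. */
struct boot_state {
    uint16_t selector;  /* initial CS */
    uint16_t ip;        /* initial IP */
};

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t selector, uint16_t ip)
{
    return ((uint32_t)selector << 4) + ip;
}

/* Load address that makes an image of `size` bytes end exactly at 1 MB. */
static uint32_t bios_load_base(uint32_t size)
{
    return 0x100000u - size;
}
```

For example, a selector of 0xf000 with an IP of 0xfff0 gives linear address 0xffff0, which lies inside a 128 KB image loaded at 0xe0000.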
The way of mapping BIOS into the guest's address space
Hi all,

Since on X86, bios is always at the end of the address space, so I
have some thought about how to implement the seabios support for kvm
tool.

1. using kvm__register_mem to map the end of address space to the
guest then copy the code of seabios to this mem region. Just emulating
the bios chip.

2. leave the bios code alone and don't touch the guest's address
space. If the guest accesses the address belonging to the bios, it
will be an IO request and we can emulate the IO access to the bios
chip.

Any ideas about this?

And question: How could I set the first instruction address after we
issue the vmlaunch instruction?

Thanks,
Yang
Re: Pe: [PATCH v5 1/3] virtio-scsi: first version
On 02/14/2012 02:11 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 14, 2012 at 11:49:55AM +1100, ronnie sahlberg wrote:
>> By just exposing this device to the kernel, the kernel keeps
>> sending, or if not the kernel maybe some other process trying to poll the
>> status? every few seconds :
>>
>> PREVENT_ALLOW_MEDIUM_REMOVAL prevent removal
>> PREVENT_ALLOW_MEDIUM_REMOVAL to immediately change it back to allow removal again
>> TEST_UNIT_READY
>>
>> After I run this
>> mount /dev/sdd1 /mnt
>>
>> The kernel sends a single
>> PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
>> then every few seconds a
>> TEST_UNIT_READY
>
> Sorry to interrupt you again guys, but: the discussion started
> with virtio-blk hotplug and now we're talking about SCSI commands?

Sure somebody switched topic at some point :) and anyway this is
irrelevant to what virtio-blk can/cannot do.

BTW, for virtio-scsi the spec provides a way to do hotplug and
hotunplug without any polling, though it's not implemented yet in the
driver.

Paolo
Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()
On 02/10/2012 07:16 PM, Marcelo Tosatti wrote:
> On Thu, Feb 09, 2012 at 04:25:36PM +0200, Avi Kivity wrote:
> > On 02/08/2012 08:45 PM, Marcelo Tosatti wrote:
> > > > BTW do we really need fast slot creation/destruction?
> > >
> > > At the moment yes. Boot a RHEL/Fedora installation disk (or any other
> > > guest which uses SYSLINUX splash screen) and you will see.
> >
> > Another workload that suffers is Windows XP clearing the screen during boot.
> >
> > > That
> > > particular case is a limitation of cirrus in QEMU, ideally it should be
> > > optimized there.
> >
> > Why do you say that?
>
> There is no fundamental need to create/destroy the 0xa VGA memory
> slot repeatedly.

If the guest writes to it, then the need exists.

> But you are right that the aim should be decent performance
> nevertheless.

--
error compiling committee.c: too many arguments to function
Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()
On 02/10/2012 03:25 PM, Takuya Yoshikawa wrote:
> Avi Kivity wrote:
>
> > > 2. When we create(and shift?) a memory slot, we call
> > > kvm_arch_flush_shadow()
> > > to clear all mmio sptes, again not restricted to that slot.
> > >
> > > 	/*
> > > 	 * If the new memory slot is created, we need to clear all
> > > 	 * mmio sptes.
> > > 	 */
> > > 	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> > > 		kvm_arch_flush_shadow(kvm);
> >
> > This is pretty rare outside the previous scenario (memory/pci hotplug).
>
> Is this condition correct?
>
> When npages != 0 and old.npages == 0, the slot is being newly created, do we
> really need to flush shadow pages?
>
> This should be
> 	if (npages && old.npages && (old.base_gfn != base_gfn))
>

Your condition is more correct, but in practice there's no difference.
If old.npages == 0, then old.base_gfn will be 0, and the condition will
fail, except for the first slot created (when the shadow cache is empty
anyway).

--
error compiling committee.c: too many arguments to function
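The two conditions discussed in this message can be put side by side as plain predicates. The sketch below is only an illustration of the two expressions as quoted; the struct and function names are invented, the new base frame number is passed in directly rather than derived from guest_phys_addr, and nothing here models the shadow page cache itself.

```c
#include <stdint.h>

/* Minimal stand-in for the memory slot fields used in the check. */
struct slot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Condition as quoted from the existing code. */
static int flush_current(const struct slot *old, uint64_t base_gfn,
                         uint64_t npages)
{
    return npages && old->base_gfn != base_gfn;
}

/* Condition as proposed in the reply: flush only when an existing slot moves. */
static int flush_proposed(const struct slot *old, uint64_t base_gfn,
                          uint64_t npages)
{
    return npages && old->npages && old->base_gfn != base_gfn;
}
```

The two predicates agree when an existing slot is moved or deleted. They can only differ when old->npages is 0, that is, when a slot is being created.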
Re: AESNI and guest hosts
Sorry for being a noob here. Any clues with this, anyone?

On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown wrote:
> Host/KVM server is running linux 3.2.4 (Debian wheezy), and guest
> kernel is running 3.2.5. The cpu is an E3-1230, but for some reason
> it's not able to supply the guest with aesni. Is there a config option
> or is there something we're missing?
>
> x86_64
> Westmere
> Intel
>
> Guest:
> [root@fanboy:~]# cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 2
> model name : QEMU Virtual CPU version 1.0
> stepping : 3
> microcode : 0x1
> cpu MHz : 3192.748
> cache size : 4096 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 4
> wp : yes
> flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
> cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
> hypervisor lahf_lm
> bogomips : 6385.49
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management:
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 6
> model : 2
> model name : QEMU Virtual CPU version 1.0
> stepping : 3
> microcode : 0x1
> cpu MHz : 3192.748
> cache size : 4096 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 4
> wp : yes
> flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
> cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
> hypervisor lahf_lm
> bogomips : 6385.49
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management:
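Whether a guest sees AES-NI can be read off the flags line of /proc/cpuinfo, where Linux reports the feature as the token "aes". As a small illustration, the sketch below checks a flags string for a whole-word token. The function name is made up for this example, and the caller is assumed to have already extracted the text after "flags :".

```c
#include <string.h>

/*
 * Returns 1 if `name` appears as a whole, space-separated token in a
 * /proc/cpuinfo style flags string, 0 otherwise.  Hypothetical helper.
 */
static int cpu_flags_contain(const char *flags, const char *name)
{
    size_t n = strlen(name);
    const char *p = flags;

    if (n == 0) {
        return 0;
    }
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is delimited on both sides. */
        int starts = (p == flags) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends) {
            return 1;
        }
        p++;
    }
    return 0;
}
```

Applied to the guest flags quoted in this message, a lookup for "aes" finds nothing, while "sse" and "popcnt" are present.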
Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
On Tue, Feb 14, 2012 at 09:55:46AM +0100, Jan Kiszka wrote:
> On 2012-02-14 08:54, Gleb Natapov wrote:
> > On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
> >>>> Unfortunately, this is only an internal structure, not officially
> >>>> documented by MS. However, all supported OS versions are legacy by
> >>>> now, no longer changing its structure.
> >>>
> >>> This and a note about the supported OS versions could be added as comment.
> >>
> >> OK.
> >>
> >> For the folks that developed it in qemu-kvm: This targets Windows XP,
> >> Vista and Server 2003, all 32-bit, right?
> >>
> > Not Vista. Not sure about Server 2003.
>
> I think I saw some 2003 reference in the qemu-kvm git logs.
>
Very likely. AFAIK it uses the same kernel as XP.

--
	Gleb.
Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
On 2012-02-14 08:54, Gleb Natapov wrote:
> On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
>>>> Unfortunately, this is only an internal structure, not officially
>>>> documented by MS. However, all supported OS versions are legacy by
>>>> now, no longer changing its structure.
>>>
>>> This and a note about the supported OS versions could be added as comment.
>>
>> OK.
>>
>> For the folks that developed it in qemu-kvm: This targets Windows XP,
>> Vista and Server 2003, all 32-bit, right?
>>
> Not Vista. Not sure about Server 2003.

I think I saw some 2003 reference in the qemu-kvm git logs.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: Kernel mode VGAs?
On 2012-02-14 08:12, Gerhard Wiesinger wrote:
> Hello,
>
> Current QEMU-KVM VGA implementation has the following problem with
> legacy OS (e.g. DOS with INT10h calls): Performance is low on accessing
> A000:0 page and doing bank switching at the 64k page.

Do we already understand the mode and access patterns here? Which VGA
adapter? Cirrus, standard, or any? What is the concrete test case (one
that won't require me digging for MS-DOS floppy disks in my basement)?

>
> Would a kernel mode VGA solve these problems?
> How complicated is it?
> Is it possible to have only some parts in kernel mode?
> Any further ideas or suggestions?

Provided we take heavy exits so far, in-kernel acceleration may reduce
the exit overhead by a factor of, hmm, maybe 3-4. Better is to avoid
exits completely, i.e. switch the region to RAM mode. But that depends
on the graphics mode, and I'm afraid we have already covered all which
can be mapped like this.

In any case, before discussing solutions, we need to analyze the
problem.

Jan
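To make the bank-switching pattern from the original report concrete, here is a rough sketch of the arithmetic a guest performs when a framebuffer larger than 64 KiB is reached through the single window at A000:0. It assumes one window with 64 KiB bank granularity, which is a simplification; actual adapters may use a different granularity, and all names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VGA_WINDOW_SIZE 0x10000u /* 64 KiB window at A000:0 (assumed granularity) */

struct vga_bank_pos {
    uint32_t bank;   /* bank number that must be selected */
    uint32_t offset; /* offset inside the 64 KiB window */
};

/* Split a linear framebuffer offset into bank number and window offset. */
struct vga_bank_pos vga_locate(uint32_t linear_offset)
{
    struct vga_bank_pos pos;
    pos.bank = linear_offset / VGA_WINDOW_SIZE;
    pos.offset = linear_offset % VGA_WINDOW_SIZE;
    assert(pos.offset < VGA_WINDOW_SIZE);
    return pos;
}

/* Number of bank changes needed for one sequential pass over
 * `length` bytes starting at `start`. */
uint32_t vga_bank_switches(uint32_t start, uint32_t length)
{
    if (length == 0)
        return 0;
    uint64_t last = (uint64_t)start + length - 1; /* avoid 32-bit overflow */
    return (uint32_t)(last / VGA_WINDOW_SIZE) - start / VGA_WINDOW_SIZE;
}
```

Under these assumptions, one sequential pass over a 640x480 frame at 8 bits per pixel (307200 bytes) needs 4 bank changes. Access patterns that are not sequential can need far more, which is one reason the reply asks for the concrete mode and access pattern before discussing solutions.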