Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held
Marcelo Tosatti wrote: > On Mon, Apr 19, 2010 at 01:08:29PM +0300, Avi Kivity wrote: >> On 04/19/2010 12:58 PM, Lai Jiangshan wrote: >>> Applied the patch I just sent and let CONFIG_PROVE_RCU=y, >>> we can got the following dmesg. And we found that it is >>> because some codes in KVM dereferences srcu-protected pointer without >>> srcu_read_lock() held or update-side lock held. >>> >>> It is not hard to fix, the problem is that: >>> Where is the most proper place to put a srcu_read_lock()? >>> >>> I can not determine the answer, so I report this bug >>> instead of fixing it. >>> >> I think the else branch in complete_pio() should work. Marcelo? >> >> Longer term I'd like to see the lock taken at the high levels >> (ioctls, in virt/kvm) and dropped only for guest entry and when we >> explicitly sleep (hlt emulation). >> >> Note: complete_pio() is gone in the current code. > > Yes, this was fixed by 7fb2ea1e6. > > Applied the patch I sent yesterday and let CONFIG_PROVE_RCU=y I can get the following dmesg. Under very simple test, these is no complaint from PROVE_RCU after this patch applied. More test or reviewing of code are need in future. -- Subject: [PATCH] kvm: add missing srcu_read_lock() I got this dmesg due to srcu_read_lock() is missing in kvm_mmu_notifier_release(). === [ INFO: suspicious rcu_dereference_check() usage. ] --- arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/3100: #0: (rcu_read_lock){.+.+..}, at: [] __mmu_notifier_release+0x38/0xdf #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_mmu_zap_all+0x21/0x5e [kvm] stack backtrace: Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2 Call Trace: [] lockdep_rcu_dereference+0xaa/0xb3 [] unalias_gfn+0x56/0xab [kvm] [] gfn_to_memslot+0x16/0x25 [kvm] [] gfn_to_rmap+0x17/0x6e [kvm] [] rmap_remove+0xa0/0x19d [kvm] [] kvm_mmu_zap_page+0x109/0x34d [kvm] [] kvm_mmu_zap_all+0x35/0x5e [kvm] [] kvm_arch_flush_shadow+0x16/0x22 [kvm] [] kvm_mmu_notifier_release+0x15/0x17 [kvm] [] __mmu_notifier_release+0x88/0xdf [] ? __mmu_notifier_release+0x38/0xdf [] ? exit_mm+0xe0/0x115 [] exit_mmap+0x2c/0x17e [] mmput+0x2d/0xd4 [] exit_mm+0x108/0x115 [...] Signed-off-by: Lai Jiangshan --- diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a5dfea1..a6d639d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -341,7 +341,11 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm) { struct kvm *kvm = mmu_notifier_to_kvm(mn); + int idx; + + idx = srcu_read_lock(&kvm->srcu); kvm_arch_flush_shadow(kvm); + srcu_read_unlock(&kvm->srcu, idx); } static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
On Tue, 2010-04-20 at 08:09 +0200, Ingo Molnar wrote: > * Ingo Molnar wrote: > > > > > * Zhang, Yanmin wrote: > > > > > unsigned long perf_misc_flags(struct pt_regs *regs) > > > { > > > int misc = 0; > > > + > > > if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { > > > + if (perf_guest_cbs->is_user_mode()) > > > + misc |= PERF_RECORD_MISC_GUEST_USER; > > > + else > > > + misc |= PERF_RECORD_MISC_GUEST_KERNEL; > > > + } else if (user_mode(regs)) > > > + misc |= PERF_RECORD_MISC_USER; > > > + else > > > + misc |= PERF_RECORD_MISC_KERNEL; > > > + > > > > We try to use balanced curly braces. I.e.: > > > > if (x) { > > boo(); > > } else { > > if (y) > > foo(); > > else > > bar(); > > } > > > > And avoid unbalanced ones: > > > > if (x) { > > boo(); > > } else > > if (y) > > foo(); > > else > > bar(); > > Note, i fixed this in the patch and applied it to perf/core. (the invalid-C > problem was causing build failures) Thanks for your teaching. Originally, I used {} in the 2nd half, but deleted it. Yanmin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
* Ingo Molnar wrote: > > * Zhang, Yanmin wrote: > > > unsigned long perf_misc_flags(struct pt_regs *regs) > > { > > int misc = 0; > > + > > if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { > > + if (perf_guest_cbs->is_user_mode()) > > + misc |= PERF_RECORD_MISC_GUEST_USER; > > + else > > + misc |= PERF_RECORD_MISC_GUEST_KERNEL; > > + } else if (user_mode(regs)) > > + misc |= PERF_RECORD_MISC_USER; > > + else > > + misc |= PERF_RECORD_MISC_KERNEL; > > + > > We try to use balanced curly braces. I.e.: > > if (x) { > boo(); > } else { > if (y) > foo(); > else > bar(); > } > > And avoid unbalanced ones: > > if (x) { > boo(); > } else > if (y) > foo(); > else > bar(); Note, i fixed this in the patch and applied it to perf/core. (the invalid-C problem was causing build failures) Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
* Zhang, Yanmin wrote: > unsigned long perf_misc_flags(struct pt_regs *regs) > { > int misc = 0; > + > if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { > + if (perf_guest_cbs->is_user_mode()) > + misc |= PERF_RECORD_MISC_GUEST_USER; > + else > + misc |= PERF_RECORD_MISC_GUEST_KERNEL; > + } else if (user_mode(regs)) > + misc |= PERF_RECORD_MISC_USER; > + else > + misc |= PERF_RECORD_MISC_KERNEL; > + We try to use balanced curly braces. I.e.: if (x) { boo(); } else { if (y) foo(); else bar(); } And avoid unbalanced ones: if (x) { boo(); } else if (y) foo(); else bar(); Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM test: Silence screendump thread by default
On 04/16/2010 09:12 PM, Lucas Meneghel Rodrigues wrote: > The VM screendump thread recently introduced generates > a lot of output on debug logs. Such output is not needed > most of the time (we are interested to see if a screenshot > production attempt failed though) and distracts the user > from other more important info. > > So let's add an additional parameter on send_monitor_cmd > that specifies if we actually want the monitor command logged, > defaulting it to True so the rest of the callers won't have > any change on their behavior. The screendump thread will > call send_monitor_cmd with verbose=False unless in the > configuration file one sets screendump_verbose=yes (defaults > to no on the sample confifg file). > > Signed-off-by: Lucas Meneghel Rodrigues > --- > client/tests/kvm/kvm_preprocessing.py | 10 +- > client/tests/kvm/kvm_vm.py |7 +-- > client/tests/kvm/tests_base.cfg.sample |1 + > 3 files changed, 15 insertions(+), 3 deletions(-) > > diff --git a/client/tests/kvm/kvm_preprocessing.py > b/client/tests/kvm/kvm_preprocessing.py > index 50db65c..4b9290c 100644 > --- a/client/tests/kvm/kvm_preprocessing.py > +++ b/client/tests/kvm/kvm_preprocessing.py > @@ -401,6 +401,10 @@ def _take_screendumps(test, params, env): > kvm_utils.generate_random_string(6)) > delay = float(params.get("screendump_delay", 5)) > quality = int(params.get("screendump_quality", 30)) > +if params.get("screendump_verbose") == 'yes': > +screendump_verbose = True > +else: > +screendump_verbose = False Why not: screendump_verbose = params.get("screendump_verbose") == "yes" > cache = {} > > @@ -408,7 +412,11 @@ def _take_screendumps(test, params, env): > for vm in kvm_utils.env_get_all_vms(env): > if vm.is_dead(): > continue > -vm.send_monitor_cmd("screendump %s" % temp_filename) > +if screendump_verbose: > +vm.send_monitor_cmd("screendump %s" % temp_filename) > +else: > +vm.send_monitor_cmd("screendump %s" % temp_filename, > +verbose=False) Also, why not: vm.send_monitor_cmd("screendump %s" % temp_filename, verbose=screendump_verbose) > if not os.path.exists(temp_filename): > logging.warn("VM '%s' failed to produce a screendump", > vm.name) > continue > diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py > index 047505a..244355e 100755 > --- a/client/tests/kvm/kvm_vm.py > +++ b/client/tests/kvm/kvm_vm.py > @@ -498,7 +498,7 @@ class VM: > lockfile.close() > > > -def send_monitor_cmd(self, command, block=True, timeout=20.0): > +def send_monitor_cmd(self, command, block=True, timeout=20.0, > verbose=True): > """ > Send command to the QEMU monitor. > > @@ -541,8 +541,11 @@ class VM: > time.sleep(0.01) > return (False, o) > > +# In certain conditions printing this debug output might be too much > +# Just print it if verbose is enabled (True by default) > +if verbose: > +logging.debug("Sending monitor command: %s" % command) > # Connect to monitor > -logging.debug("Sending monitor command: %s" % command) > try: > s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) > s.setblocking(False) > diff --git a/client/tests/kvm/tests_base.cfg.sample > b/client/tests/kvm/tests_base.cfg.sample > index 4baa1dc..e74c7cb 100644 > --- a/client/tests/kvm/tests_base.cfg.sample > +++ b/client/tests/kvm/tests_base.cfg.sample > @@ -23,6 +23,7 @@ keep_screendumps_on_error = yes > screendump_delay = 5 > screendump_quality = 30 > screendump_temp_dir = /dev/shm > +screendump_verbose = no > > # Some default VM params > qemu_binary = qemu -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
On Tue, Apr 20, 2010 at 12:54 AM, jvrao wrote: > Mohammed Gamal wrote: >> On Tue, Apr 13, 2010 at 9:08 PM, jvrao wrote: >>> jvrao wrote: Alexander Graf wrote: > On 12.04.2010, at 13:58, Jamie Lokier wrote: > >> Mohammed Gamal wrote: >>> On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier >>> wrote: Javier Guerra Giraldez wrote: > On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal > wrote: >> On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier >> wrote: >>> To throw a spanner in, the most widely supported filesystem across >>> operating systems is probably NFS, version 2 :-) >> Remember that Windows usage on a VM is not some rare use case, and >> it'd be a little bit of a pain from a user's perspective to have to >> install a third party NFS client for every VM they use. Having >> something supported on the VM out of the box is a better option IMO. > i don't think virtio-CIFS has any more support out of the box (on any > system) than virtio-9P. It doesn't, but at least network-CIFS tends to work ok and is the method of choice for Windows VMs - when you can setup Samba on the host (which as previously noted you cannot always do non-disruptively with current Sambas). -- Jamie >>> I think having support for both 9p and CIFS would be the best option. >>> In that case the user will have the option to use either one, >>> depending on how their guests support these filesystems. In that case >>> I'd prefer to work on CIFS support while the 9p effort can still go >>> on. I don't think both efforts are mutually exclusive. >>> >>> What do the rest of you guys think? >> I only noted NFS because most old OSes do not support CIFS or 9P - >> especially all the old unixes. >> >> I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?) >> even support current CIFS. They need extra server settings to work >> - such as setting passwords on the server to non-encrypted and other >> quirks. >> >> Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to >> properly see symlinks and hard links. >> >> So there is no really nice out of the box file service which works >> easily with all guest OSes. >> >> I'm guessing that out of all the filesystems, CIFS is the most widely >> supported in recent OSes (released in the last 10 years). But I'm not >> really sure what the state of CIFS is for non-Windows, non-Linux, >> non-BSD guests. > So what? If you want to have direct host fs access, install guest > drivers. If you can't, set up networking and use CIFS or NFS or whatever. > >> I'm not sure why 9P is being pursued. Does anything much support it, >> or do all OSes except quite recent Linux need a custom driver for 9P? >> Even Linux only got the first commit in the kernel 5 years ago, so >> probably it was only about 3 years ago that it will have begun >> appearing in stable distros, if at all. Filesystem passthrough to >> Linux guests installed in the last couple of years is a useful >> feature, and I know that for many people that is their only use of >> KVM, but compared with CIFS' broad support it seems like quite a >> narrow goal. > The goal is to have something simple and fast. We can fine-tune 9P to > align with the Linux VFS structures, making it really little overhead > (and little headache too). For Windows guests, nothing prevents us to > expose yet another 9P flavor. That again would be aligned well with > Windows's VFS and be slim and fast there. > > The biggest problem I see with CIFS is that it's a huge beast. There are > a lot of corner cases where it just doesn't fit in. See my previous mail > for more details. > As Alex mentioned, 9P is chosen for its mere simplicity and easy adaptability. NFS and CIFS does not give that flexibility. As we mentioned in the patch series, we are already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU knowledge to share physical resources like memory/cache between the host and the guest. I think looking into the windows side of 9P client would be great option too. The current patch on the mailing list supports .U (unix) protocol and will be introducing .L (Linux) variant as we move forward. >>> Hi Mohammed, >>> Please let us know once you decide on where your interest lies. >>> Will be glad to have you on VirtFS (9P) though. :) >>> >>> >>> - JV >>> >> >> It seems the community is more keen on getting 9P support merged than >> getting CIFS supported, and they have made good points to support >> their argument. I'm not sure whether work on this project could fit in >> as a GSoC project and if th
Re: [PATCH V3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
On Monday 19 April 2010 16:25:17 Avi Kivity wrote: > On 04/17/2010 09:12 PM, Avi Kivity wrote: > > I think you were right the first time around. > > Re-reading again (esp. the part about treatment of indirect NMI > vmexits), I think this was wrong, and that the code is correct. I am > now thoroughly confused. > My fault... To my understanding now, "If an event causes a VM exit directly, it does not update architectural state as it would have if it had it not caused the VM exit:", means: in NMI case, NMI would involve the NMI handler, and change the "architectural state" to NMI block. In VMX non-root mode, the behavior of calling NMI handler changed(determine by some VMCS fields), but not the affection to the "architectural state". So the NMI block state would remain the same. -- regards Yang, Sheng -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: MMU: Drop cr4.pge from shadow page role
On Mon, Apr 19, 2010 at 05:31:49PM +0300, Avi Kivity wrote: > Since commit bf47a760f66ad, we no longer handle ptes with the global bit > set specially, so there is no reason to distinguish between shadow pages > created with cr4.gpe set and clear. > > Such tracking is expensive when the guest toggles cr4.pge, so drop it. > > Signed-off-by: Avi Kivity ACK -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2976863 ] 32PAE Windows guest blue screen when booting with apci on
Bugs item #2976863, was opened at 2010-03-26 14:44 Message generated for change (Comment added) made by haoxudong You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2976863&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Xudong Hao (haoxudong) Assigned to: Nobody/Anonymous (nobody) Summary: 32PAE Windows guest blue screen when booting with apci on Initial Comment: Environment: Host OS (ia32/ia32e/IA64):32e and pae Guest OS (ia32/ia32e/IA64): ia32pae Guest OS Type (Linux/Windows): windows(xp, vista, and Win7) kvm.git Commit: b8ed93d4.. qemu-kvm Commit: aa4df134.. Host Kernel Version:2.6.34-rc2 Hardware: Stoakely Bug detailed description: -- when we boot a 32pae Windows guest with acpi on, the guest will become bule screen of death. Reproduce steps: 1. boot into host, loal kvm and kvm_intel module 2. use following command to boot a guest: qemu-system-x86_64 -m 256 -net nic,macaddr=00:16:3e:27:b6:8a,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/Windows.img 3. the guest will become bule screen of death. -- >Comment By: Xudong Hao (haoxudong) Date: 2010-04-20 10:35 Message: Now BSOD issue got out on kvm commit: bc8a97a218d0d2367d125db3e11ec45bb0fa0a3f-a1b705841caa33fca0b95928505c01d5105b706f. However, boot a 32-bit XP with acpi on will cost about 9 minutes, did you meet similar issue on your side? -- Comment By: Marcelo Tosatti (mtosatti) Date: 2010-04-15 04:51 Message: 1) 32-bit WinXP qemu-system-x86_64 -drive file=/root/winxp-32-good.qcow2,cache=writeback -m 2000 -vnc :1 -usbdevice tablet -net nic,model=e1000 -net tap,script=/root/ifup 2) acpi on Control panes shows "ACPI multiprocessor PC" 3) shadow mode [r...@localhost ~]# cat /sys/module/kvm_intel/parameters/ept N host: Linux localhost.localdomain 2.6.34-rc3 #101 SMP Mon Apr 12 18:18:39 BRT 2010 i686 i686 i386 GNU/Linux Can you send a screenshot of the hang? -- Comment By: Xudong Hao (haoxudong) Date: 2010-04-13 13:45 Message: Motsatti, I checked again, guest still booting stop at the loading scroll bar. Let find what difference between ours. Can you check if your reproduce steps: 1) 32-bit WinXP 1) acpi on 2) shadow mode -- Comment By: Marcelo Tosatti (mtosatti) Date: 2010-04-13 06:28 Message: Xudong, Tested WinXP.32 and WinVista.32 on 32-bit host, both work fine. -- Comment By: Xudong Hao (haoxudong) Date: 2010-04-12 09:56 Message: Latest commit: 069c71e7a5e5f968099ca551f68e5dfe5a3f71bc qemu: 5c781061a45abe1855ad0a95d834336da574703c Windows guest did not blue screen with latest KVM commit, but guest hang at the loading scroll bar. This issue only happen on shadow mode. Windows guest can boot up with EPT mode. -- Comment By: Marcelo Tosatti (mtosatti) Date: 2010-04-06 02:50 Message: Xudong, Can you reproduce the problem with latest git repositories? The BSOD screenshot is truncated, the error number is not present. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2976863&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.
Michael, >> What we have not done yet: >> packet split support >> >What does this mean, exactly? We can support 1500MTU, but for jumbo frame, since vhost driver before don't support mergeable buffer, we cannot try it for multiple sg. >>>I do not see why, vhost currently supports 64K buffers with indirect >>>descriptors. >>> >> The receive_skb() in guest virtio-net driver will merge the multiple sg to >> skb frags, how >>can indirect descriptors to that? >See add_recvbuf_big. I don't mean this, it's for buffer submission. I mean when packet is received, in receive_buf(), mergeable buffer knows which pages received can be hooked in skb frags, it's receive_mergeable() which do this. When a NIC driver supports packet split mode, then each ring descriptor contains a skb and a page. When packet is received, if the status is not EOP, then hook the page of the next descriptor to the prev skb. We don't how many frags belongs to one skb. So when guest submit buffers, it should submit multiple pages, and when receive, the guest should know which pages are belongs to one skb and hook them together. I think receive_mergeable() can do this, but I don't see how big->packets handle this. May I miss something here? Thanks Xiaohui -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/1 V2 RESEND] Add new VM Exit codes
sorry the last one was word wrapped. now i fixed it. hi, this patch, * adds two more EXIT_REASONS 33 and 34 in vmx.h * also adds them to exit reason strings. Signed-off-by: Manish Regmi diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index fb9a080..9bf84db 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -251,6 +251,8 @@ enum vmcs_field { #define EXIT_REASON_IO_INSTRUCTION 30 #define EXIT_REASON_MSR_READ31 #define EXIT_REASON_MSR_WRITE 32 +#define EXIT_REASON_INVALID_GUEST_STATE 33 +#define EXIT_REASON_MSR_LOADING 34 #define EXIT_REASON_MWAIT_INSTRUCTION 36 #define EXIT_REASON_MONITOR_INSTRUCTION 39 #define EXIT_REASON_PAUSE_INSTRUCTION 40 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0b896ac..a5c53ad 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4061,6 +4061,8 @@ static const struct trace_print_flags vmx_exit_reasons_str[] = { _ER(IO_INSTRUCTION), _ER(MSR_READ), _ER(MSR_WRITE), +_ER(INVALID_GUEST_STATE), +_ER(MSR_LOADING), _ER(MWAIT_INSTRUCTION), _ER(MONITOR_INSTRUCTION), _ER(PAUSE_INSTRUCTION), regards, Manish Regmi -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1 V2] clear leftmost bit when exit failure is vm entry type
hi, When the vm exit reason is VM Entry failures it has leftmost bit set. This patch - clears the leftmost bit when copying to vmx->exit_reason. This will make the checks like if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) valid in vmx_complete_interrupts. Signed-off-by: Manish Regmi diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0b896ac..e0ca917 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3642,7 +3642,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); - vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); + vmx->exit_reason = vmcs_read32(VM_EXIT_REASON) & ~VMX_EXIT_REASONS_FAILED_VMENTRY; /* Handle machine checks before interrupts are enabled */ if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) regards Manish Regmi -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
On Mon, 2010-04-19 at 20:11 +0200, Ingo Molnar wrote: > FYI, i found a few problems that need fixing: > > > + unsigned long ip; > > + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) > > missing newline. > > > + int misc = 0; > > + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { > > ditto. > > > + PERF_RECORD_MISC_GUEST_KERNEL; > > + } else > > + misc |= user_mode(regs) ? PERF_RECORD_MISC_USER : > > + PERF_RECORD_MISC_KERNEL; > > - unbalanced curly braces > - missing curly brace for multi-line statement. > - unnecessary line-break due to col80 warning from checkpatch > > > +extern struct perf_guest_info_callbacks *perf_guest_cbs; > > +extern int perf_register_guest_info_callbacks( > > + struct perf_guest_info_callbacks *); > > +extern int perf_unregister_guest_info_callbacks( > > + struct perf_guest_info_callbacks *); > > - unnecessary line-break due to col80 warning from checkpatch > > > +static inline int perf_register_guest_info_callbacks > > +(struct perf_guest_info_callbacks *) {return 0; } > > +static inline int perf_unregister_guest_info_callbacks > > +(struct perf_guest_info_callbacks *) {return 0; } > > - invalid C: function parameter needs name even if unused > - missing space after opening curly brace > > Please provide delta fixes. Here is the fix on the top of the prior 3 patches of V5. From: Zhang, Yanmin Fix some programming style issues on the top of perf kvm enhancement V5. Signed-off-by: Zhang Yanmin --- diff -Nraup linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0417_perfkvmstyle/arch/x86/kernel/cpu/perf_event.c --- linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c 2010-04-19 09:53:59.689452915 +0800 +++ linux-2.6_tip0417_perfkvmstyle/arch/x86/kernel/cpu/perf_event.c 2010-04-20 10:48:18.500999849 +0800 @@ -1752,23 +1752,29 @@ void perf_arch_fetch_caller_regs(struct unsigned long perf_instruction_pointer(struct pt_regs *regs) { unsigned long ip; + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) ip = perf_guest_cbs->get_guest_ip(); else ip = instruction_pointer(regs); + return ip; } unsigned long perf_misc_flags(struct pt_regs *regs) { int misc = 0; + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { - misc |= perf_guest_cbs->is_user_mode() ? - PERF_RECORD_MISC_GUEST_USER : - PERF_RECORD_MISC_GUEST_KERNEL; - } else - misc |= user_mode(regs) ? PERF_RECORD_MISC_USER : - PERF_RECORD_MISC_KERNEL; + if (perf_guest_cbs->is_user_mode()) + misc |= PERF_RECORD_MISC_GUEST_USER; + else + misc |= PERF_RECORD_MISC_GUEST_KERNEL; + } else if (user_mode(regs)) + misc |= PERF_RECORD_MISC_USER; + else + misc |= PERF_RECORD_MISC_KERNEL; + if (regs->flags & PERF_EFLAGS_EXACT) misc |= PERF_RECORD_MISC_EXACT; diff -Nraup linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c linux-2.6_tip0417_perfkvmstyle/arch/x86/kvm/x86.c --- linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c2010-04-19 09:53:59.691378953 +0800 +++ linux-2.6_tip0417_perfkvmstyle/arch/x86/kvm/x86.c 2010-04-20 10:11:40.507545564 +0800 @@ -3776,16 +3776,20 @@ static int kvm_is_in_guest(void) static int kvm_is_user_mode(void) { int user_mode = 3; + if (percpu_read(current_vcpu)) user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu)); + return user_mode != 0; } static unsigned long kvm_get_guest_ip(void) { unsigned long ip = 0; + if (percpu_read(current_vcpu)) ip = kvm_rip_read(percpu_read(current_vcpu)); + return ip; } diff -Nraup linux-2.6_tip0417_perfkvm/include/linux/perf_event.h linux-2.6_tip0417_perfkvmstyle/include/linux/perf_event.h --- linux-2.6_tip0417_perfkvm/include/linux/perf_event.h2010-04-19 09:53:59.691378953 +0800 +++ linux-2.6_tip0417_perfkvmstyle/include/linux/perf_event.h 2010-04-20 10:08:03.531551890 +0800 @@ -941,10 +941,8 @@ static inline void perf_event_mmap(struc } extern struct perf_guest_info_callbacks *perf_guest_cbs; -extern int perf_register_guest_info_callbacks( - struct perf_guest_info_callbacks *); -extern int perf_unregister_guest_info_callbacks( - struct perf_guest_info_callbacks *); +extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); +extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks); extern void perf_event_comm(struct task_struct *tsk); extern void perf_event_fork(struct task_struct *tsk); @@ -1016,9 +1014,9 @@ static inline void perf_bp_event(struct perf_event *event, void *data){ } static i
Re: [PATCH] kvm: use the correct RCU API
Paul E. McKenney wrote: > On Mon, Apr 19, 2010 at 12:49:04PM +0300, Avi Kivity wrote: >> On 04/19/2010 12:41 PM, Lai Jiangshan wrote: >>> The RCU/SRCU API have already changed for proving RCU usage. >>> >>> I got the following dmesg when PROVE_RCU=y because we used incorrect API. >>> This patch coverts rcu_deference() to srcu_dereference() or family API. >>> >>> === >>> [ INFO: suspicious rcu_dereference_check() usage. ] >>> --- >>> arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! >>> >>> other info that might help us debug this: >>> >>> >>> rcu_scheduler_active = 1, debug_locks = 0 >>> 2 locks held by qemu-system-x86/8550: >>> #0: (&kvm->slots_lock){+.+.+.}, at: [] >>> kvm_set_memory_region+0x29/0x50 [kvm] >>> #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] >>> kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] >>> >>> stack backtrace: >>> Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 >>> #27 >>> Call Trace: >>> [] lockdep_rcu_dereference+0xaa/0xb3 >>> [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] >>> [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] >>> [] __kvm_set_memory_region+0x636/0x6e2 [kvm] >>> [] kvm_set_memory_region+0x37/0x50 [kvm] >>> [] vmx_set_tss_addr+0x46/0x5a [kvm_intel] >>> [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] >>> [] ? unlock_page+0x27/0x2c >>> [] ? __do_fault+0x3a9/0x3e1 >>> [] kvm_vm_ioctl+0x364/0x38d [kvm] >>> [] ? up_read+0x23/0x3d >>> [] vfs_ioctl+0x32/0xa6 >>> [] do_vfs_ioctl+0x495/0x4db >>> [] ? fget_light+0xc2/0x241 >>> [] ? do_sys_open+0x104/0x116 >>> [] ? retint_swapgs+0xe/0x13 >>> [] sys_ioctl+0x47/0x6a >>> [] system_call_fastpath+0x16/0x1b >>> >>> >>> >>> +static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) >>> +{ >>> + return rcu_dereference_check(kvm->memslots, >>> + srcu_read_lock_held(&kvm->srcu) >>> + || lockdep_is_held(&kvm->slots_lock)); >>> +} >>> + >> >> This open-codes srcu_dereference(). I guess we need an >> srcu_dereference_check(). Paul? > rcu_dereference_check() is useful when rcu_dereference(), rcu_dereference_bh(), rcu_dereference_sched() and srcu_dereference() are not appropriate. I think we don't need srcu_dereference_check() nor rcu_dereference_bh_check() nor rcu_dereference_sched_check(). > One is coming in Arnd's sparse-based patchset. It is probably best > to open-code this in the meantime and clean up later, but I will > double-check with Arnd. > >> btw, perhaps it is possible not to call rcu_dereference from the >> write paths. > > There is an rcu_dereference_protected() on its way to mainline to handle > the case where the reference is always protected by a lock. Why not > just access it directly? Because if you do that, the sparse-based checks > will yell at you. > > There is also an rcu_access_pointer() on its way to mainline for cases > where you only want to test the pointer itself, not dereference it. > > Thanx, Paul > I reviewed the code, the functions can be called from the srcu-read-site or update-site, rcu_dereference_check() can simplify the code. If we use rcu_dereference_protected(), we may need duplicate the functions. I think there is very small overhead of using rcu_dereference(), so we can call it from write paths. Thanks, Lai -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, Apr 19, 2010 at 03:25:43PM -0300, Glauber Costa wrote: > On Mon, Apr 19, 2010 at 09:19:38AM -0700, Jeremy Fitzhardinge wrote: > > On 04/19/2010 07:26 AM, Glauber Costa wrote: > > >> Is the problem that the tscs are starting out of sync, or that they're > > >> drifting relative to each other over time? Do the problems become worse > > >> the longer the uptime? How large are the offsets we're talking about > > >> here? > > >> > > > The offsets usually seem pretty small, under a microsecond. So I don't > > > think > > > it has anything to do with tscs starting out of sync. Specially because > > > the > > > delta-based calculation has the exact purpose of handling that case. > > > > > > > So you think they're drifting out of sync from an initially synced > > state? If so, what would bound the drift? > I think delta calculation introduces errors. Yes. > Marcelo can probably confirm it, but he has a nehalem with an appearently > very good tsc source. Even this machine warps. > > It stops warping if we only write pvclock data structure once and forget it, > (which only updated tsc_timestamp once), according to him. Yes. So its not as if the guest visible TSCs go out of sync (they don't on this machine Glauber mentioned, or even on a multi-core Core 2 Duo), but the delta calculation is very hard (if not impossible) to get right. The timewarps i've seen were in the 0-200ns range, and very rare (once every 10 minutes or so). > Obviously, we can't do that everywhere. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/1 V2] Add new VM Exit codes
hi, this patch, * adds two more EXIT_REASONS 33 and 34 in vmx.h * also adds them to exit reason strings. Signed-off-by: Manish Regmi diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index fb9a080..9bf84db 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -251,6 +251,8 @@ enum vmcs_field { #define EXIT_REASON_IO_INSTRUCTION 30 #define EXIT_REASON_MSR_READ31 #define EXIT_REASON_MSR_WRITE 32 +#define EXIT_REASON_INVALID_GUEST_STATE 33 +#define EXIT_REASON_MSR_LOADING 34 #define EXIT_REASON_MWAIT_INSTRUCTION 36 #define EXIT_REASON_MONITOR_INSTRUCTION 39 #define EXIT_REASON_PAUSE_INSTRUCTION 40 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0b896ac..a5c53ad 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4061,6 +4061,8 @@ static const struct trace_print_flags vmx_exit_reasons_str[] = { _ER(IO_INSTRUCTION), _ER(MSR_READ), _ER(MSR_WRITE), +_ER(INVALID_GUEST_STATE), +_ER(MSR_LOADING), _ER(MWAIT_INSTRUCTION), _ER(MONITOR_INSTRUCTION), _ER(PAUSE_INSTRUCTION), regards, Manish Regmi -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held
On Mon, Apr 19, 2010 at 01:08:29PM +0300, Avi Kivity wrote: > On 04/19/2010 12:58 PM, Lai Jiangshan wrote: > >Applied the patch I just sent and let CONFIG_PROVE_RCU=y, > >we can got the following dmesg. And we found that it is > >because some codes in KVM dereferences srcu-protected pointer without > >srcu_read_lock() held or update-side lock held. > > > >It is not hard to fix, the problem is that: > >Where is the most proper place to put a srcu_read_lock()? > > > >I can not determine the answer, so I report this bug > >instead of fixing it. > > > > I think the else branch in complete_pio() should work. Marcelo? > > Longer term I'd like to see the lock taken at the high levels > (ioctls, in virt/kvm) and dropped only for guest entry and when we > explicitly sleep (hlt emulation). > > Note: complete_pio() is gone in the current code. Yes, this was fixed by 7fb2ea1e6. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH
Michael S. Tsirkin wrote: > I took a stub at documenting CMD and FLUSH request types in virtio > block. Christoph, could you look over this please? > > I note that the interface seems full of warts to me, > this might be a first step to cleaning them. > > One issue I struggled with especially is how type > field mixes bits and non-bit values. I ended up > simply defining all legal values, so that we have > CMD = 2, CMD_OUT = 3 and so on. > > I also avoided instroducing inhdr/outhdr structures > that virtio blk driver in linux uses, I was concerned > that nesting tables will confuse the reader. > > Comments welcome. > > Signed-off-by: Michael S. Tsirkin > > -- > > diff --git a/virtio-spec.lyx b/virtio-spec.lyx > index d16104a..ed35893 100644 > --- a/virtio-spec.lyx > +++ b/virtio-spec.lyx > @@ -67,7 +67,11 @@ IBM Corporation > \end_layout > > \begin_layout Standard > + > +\change_deleted 0 1266531118 > FIXME: virtio block scsi passthrough section > +\change_unchanged > + > \end_layout > > \begin_layout Standard > @@ -4376,7 +4380,7 @@ struct virtio_net_ctrl_mac { > The device can filter incoming packets by any number of destination MAC > addresses. > \begin_inset Foot > -status open > +status collapsed > > \begin_layout Plain Layout > Since there are no guarentees, it can use a hash filter orsilently switch > @@ -4549,6 +4553,22 @@ blk_size > \end_inset > > . > +\change_inserted 0 1266444580 > + > +\end_layout > + > +\begin_layout Description > + > +\change_inserted 0 1266471229 > +VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands. > +\end_layout > + > +\begin_layout Description > + > +\change_inserted 0 1266444605 > +VIRTIO_BLK_F_FLUSH (9) Cache flush command support. > +\change_unchanged > + > \end_layout > > \begin_layout Description > @@ -4700,17 +4720,25 @@ struct virtio_blk_req { > > \begin_layout Plain Layout > > +\change_deleted 0 1266472188 > + > #define VIRTIO_BLK_T_IN 0 > \end_layout > > \begin_layout Plain Layout > > +\change_deleted 0 1266472188 > + > #define VIRTIO_BLK_T_OUT 1 > \end_layout > > \begin_layout Plain Layout > > +\change_deleted 0 1266472188 > + > #define VIRTIO_BLK_T_BARRIER 0x8000 > +\change_unchanged > + > \end_layout > > \begin_layout Plain Layout > @@ -4735,11 +4763,15 @@ struct virtio_blk_req { > > \begin_layout Plain Layout > > +\change_deleted 0 1266472204 > + > #define VIRTIO_BLK_S_OK0 > \end_layout > > \begin_layout Plain Layout > > +\change_deleted 0 1266472204 > + > #define VIRTIO_BLK_S_IOERR 1 > \end_layout > > @@ -4759,32 +4791,481 @@ struct virtio_blk_req { > \end_layout > > \begin_layout Standard > -The type of the request is either a read (VIRTIO_BLK_T_IN) or a write > (VIRTIO_BL > -K_T_OUT); the high bit indicates that this request acts as a barrier and > - that all preceeding requests must be complete before this one, and all > - following requests must not be started until this is complete. > + > +\change_inserted 0 1266472490 > +If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet > + command requests, each of these requests is of form: > +\begin_inset listings > +inline false > +status open > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472395 > + > +struct virtio_scsi_pc_req { > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472375 > + > + u32 type; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472375 > + > + u32 ioprio; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266474298 > + > + u64 sector; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266474308 > + > +char cmd[]; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266505809 > + > + char data[][512]; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266505825 > + > +#define SCSI_SENSE_BUFFERSIZE 96 > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266505848 > + > +u8 sense[SCSI_SENSE_BUFFERSIZE]; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472969 > + > +u32 errors; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472979 > + > +u32 data_len; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472984 > + > +u32 sense_len; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472987 > + > +u32 residual; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472375 > + > + u8 status; > +\end_layout > + > +\begin_layout Plain Layout > + > +\change_inserted 0 1266472375 > + > +}; > +\end_layout > + > +\end_inset > + > + > +\change_unchanged > + > \end_layout > > \begin_layout Standard > -The ioprio field is a hint about the relative priorities of requests to > - the
Re: [PATCH] kvm x86 mmu: simplify kvm_mmu_unlink_parents()
On Sat, Apr 17, 2010 at 04:50:13PM +0800, Lai Jiangshan wrote: > > mmu_page_remove_parent_pte() does much maintenance works, > but kvm_mmu_unlink_parents() unlink all parents, so > such maintenance works are not need. > > This patch simplifies the works of kvm_mmu_unlink_parents() > by unlinking parents without so many maintenance works. > > Signed-off-by: Lai Jiangshan > --- > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 90f666e..71faa04 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -1453,22 +1453,33 @@ static void kvm_mmu_reset_last_pte_updated(struct kvm > *kvm) > > static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) > { > - u64 *parent_pte; > + if (!sp->multimapped) { > + if (!sp->parent_pte) > + return; > > - while (sp->multimapped || sp->parent_pte) { > - if (!sp->multimapped) > - parent_pte = sp->parent_pte; > - else { > - struct kvm_pte_chain *chain; > + __set_spte(sp->parent_pte, shadow_trap_nonpresent_pte); > + sp->parent_pte = NULL; > + return; > + } > > - chain = container_of(sp->parent_ptes.first, > - struct kvm_pte_chain, link); > - parent_pte = chain->parent_ptes[0]; > + while (!hlist_empty(&sp->parent_ptes)) { > + struct kvm_pte_chain *chain; > + u64 *parent_pte; > + int i; > + > + chain = hlist_entry(sp->parent_ptes.first, > + struct kvm_pte_chain, link); > + for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) { > + parent_pte = chain->parent_ptes[i]; > + if (!parent_pte) > + break; > + __set_spte(parent_pte, shadow_trap_nonpresent_pte); > } > - BUG_ON(!parent_pte); > - kvm_mmu_put_page(sp, parent_pte); > - __set_spte(parent_pte, shadow_trap_nonpresent_pte); > + hlist_del(&chain->link); > + mmu_free_pte_chain(chain); > } > + sp->multimapped = 0; > + sp->parent_pte = NULL; > } > > static int mmu_zap_unsync_children(struct kvm *kvm, > Personally i don't see why this is any better than the previous code. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/9] KVM: PPC: Convert u64 -> ulong
There are some pieces in the code that I overlooked that still use u64s instead of longs. This slows down 32 bit hosts unnecessarily, so let's just move them to ulong. Signed-off-by: Alexander Graf --- v1 -> v2: - don't touch vsid - that stays u64! --- arch/powerpc/include/asm/kvm_book3s.h |4 ++-- arch/powerpc/include/asm/kvm_host.h |6 +++--- arch/powerpc/kvm/book3s.c |6 +++--- arch/powerpc/kvm/book3s_32_mmu.c |6 +++--- arch/powerpc/kvm/book3s_32_mmu_host.c |8 +++- arch/powerpc/kvm/book3s_64_mmu.c |4 ++-- arch/powerpc/kvm/book3s_64_mmu_host.c |6 +++--- 7 files changed, 19 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 9517b8d..5d3bd0c 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -107,9 +107,9 @@ struct kvmppc_vcpu_book3s { #define VSID_BAT 0x7fb0ULL #define VSID_PR0x8000ULL -extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask); +extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask); extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask); -extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end); +extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end); extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr); extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu); extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 5a83995..0c9ad86 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -124,9 +124,9 @@ struct kvm_arch { }; struct kvmppc_pte { - u64 eaddr; + ulong eaddr; u64 vpage; - u64 raddr; + ulong raddr; bool may_read : 1; bool may_write : 1; bool may_execute: 1; @@ -145,7 +145,7 @@ struct kvmppc_mmu { int (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data); void (*reset_msr)(struct kvm_vcpu *vcpu); void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large); - int (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid); + int (*esid_to_vsid)(struct kvm_vcpu *vcpu, ulong esid, u64 *vsid); u64 (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data); bool (*is_dcbz32)(struct kvm_vcpu *vcpu); }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 5805f99..a7de709 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -812,12 +812,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, * so we can't use the NX bit inside the guest. Let's cross our fingers, * that no guest that needs the dcbz hack does NX. */ - kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFULL); + kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFUL); r = RESUME_GUEST; } else { vcpu->arch.msr |= to_svcpu(vcpu)->shadow_srr1 & 0x5800; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); - kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFULL); + kvmppc_mmu_pte_flush(vcpu, kvmppc_get_pc(vcpu), ~0xFFFUL); r = RESUME_GUEST; } break; @@ -843,7 +843,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, vcpu->arch.dear = dar; to_book3s(vcpu)->dsisr = to_svcpu(vcpu)->fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); - kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL); + kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFUL); r = RESUME_GUEST; } break; diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 48efb37..33186b7 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -60,7 +60,7 @@ static inline bool check_debug_ip(struct kvm_vcpu *vcpu) static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data); -static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid, +static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, u64 *vsid); static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s
[PATCH 7/9] KVM: PPC: Fix Book3S_64 Host MMU debug output
We have some debug output in Book3S_64. Some of that was invalid though, partially not even compiling because it accessed incorrect variables. So let's fix that up, making debugging more fun again. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_mmu_host.c | 20 1 files changed, 12 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c index 5545c45..e4b5744 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_host.c +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c @@ -48,8 +48,8 @@ static void invalidate_pte(struct hpte_cache *pte) { - dprintk_mmu("KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", - i, pte->pte.eaddr, pte->pte.vpage, pte->host_va); + dprintk_mmu("KVM: Flushing SPT: 0x%lx (0x%llx) -> 0x%llx\n", + pte->pte.eaddr, pte->pte.vpage, pte->host_va); ppc_md.hpte_invalidate(pte->slot, pte->host_va, MMU_PAGE_4K, MMU_SEGSIZE_256M, @@ -66,7 +66,7 @@ void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask) { int i; - dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%llx & 0x%llx\n", + dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%lx & 0x%lx\n", vcpu->arch.hpte_cache_offset, guest_ea, ea_mask); BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM); @@ -114,8 +114,8 @@ void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, ulong pa_start, ulong pa_end) { int i; - dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%llx & 0x%llx\n", - vcpu->arch.hpte_cache_offset, guest_pa, pa_mask); + dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%lx & 0x%lx\n", + vcpu->arch.hpte_cache_offset, pa_start, pa_end); BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM); for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) { @@ -186,7 +186,7 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid) sid_map_mask = kvmppc_sid_hash(vcpu, gvsid); map = &to_book3s(vcpu)->sid_map[sid_map_mask]; if (map->guest_vsid == gvsid) { - dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n", + dprintk_slb("SLB: Searching: 0x%llx -> 0x%llx\n", gvsid, map->host_vsid); return map; } @@ -198,7 +198,8 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid) return map; } - dprintk_slb("SLB: Searching 0x%llx -> not found\n", gvsid); + dprintk_slb("SLB: Searching %d/%d: 0x%llx -> not found\n", + sid_map_mask, SID_MAP_MASK - sid_map_mask, gvsid); return NULL; } @@ -275,7 +276,7 @@ map_again: int hpte_id = kvmppc_mmu_hpte_cache_next(vcpu); struct hpte_cache *pte = &vcpu->arch.hpte_cache[hpte_id]; - dprintk_mmu("KVM: %c%c Map 0x%llx: [%lx] 0x%lx (0x%llx) -> %lx\n", + dprintk_mmu("KVM: %c%c Map 0x%lx: [%lx] 0x%lx (0x%llx) -> %lx\n", ((rflags & HPTE_R_PP) == 3) ? '-' : 'w', (rflags & HPTE_R_N) ? '-' : 'x', orig_pte->eaddr, hpteg, va, orig_pte->vpage, hpaddr); @@ -331,6 +332,9 @@ static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid) map->guest_vsid = gvsid; map->valid = true; + dprintk_slb("SLB: New mapping at %d: 0x%llx -> 0x%llx\n", + sid_map_mask, gvsid, map->host_vsid); + return map; } -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/9] KVM: PPC: Be more informative on BUG
We have a condition in the ppc64 host mmu code that should never occur. Unfortunately, it just did happen to me and I was rather puzzled on why, because BUG_ON doesn't tell me anything useful. So let's add some more debug output in case this goes wrong. Also change BUG to WARN, since I don't want to reboot every time I mess something up. Signed-off-by: Alexander Graf --- v1 -> v2: - make WARN bail out --- arch/powerpc/kvm/book3s_64_mmu_host.c | 10 -- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c index 41af12f..5545c45 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_host.c +++ b/arch/powerpc/kvm/book3s_64_mmu_host.c @@ -231,10 +231,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid); map = find_sid_vsid(vcpu, vsid); if (!map) { - kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr); + ret = kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr); + WARN_ON(ret < 0); map = find_sid_vsid(vcpu, vsid); } - BUG_ON(!map); + if (!map) { + printk(KERN_ERR "KVM: Segment map for 0x%llx (0x%lx) failed\n", + vsid, orig_pte->eaddr); + WARN_ON(true); + return -EINVAL; + } vsid = map->host_vsid; va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M); -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 9/9] KVM: PPC: Enable native paired singles
When we're on a paired single capable host, we can just always enable paired singles and expose them to the guest directly. This approach breaks when multiple VMs run and access PS concurrently, but this should suffice until we get a proper framework for it in Linux. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_asm.h |1 + arch/powerpc/kvm/book3s.c | 19 +++ arch/powerpc/kvm/book3s_emulate.c |5 - 3 files changed, 24 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 7238c04..c5ea4cd 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -89,6 +89,7 @@ #define BOOK3S_HFLAG_DCBZ320x1 #define BOOK3S_HFLAG_SLB 0x2 #define BOOK3S_HFLAG_PAIRED_SINGLE 0x4 +#define BOOK3S_HFLAG_NATIVE_PS 0x8 #define RESUME_FLAG_NV (1<<0) /* Reload guest nonvolatile state? */ #define RESUME_FLAG_HOST(1<<1) /* Resume host? */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 91dc42d..cfc3051 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -344,6 +344,8 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) { + u32 host_pvr; + vcpu->arch.hflags &= ~BOOK3S_HFLAG_SLB; vcpu->arch.pvr = pvr; #ifdef CONFIG_PPC_BOOK3S_64 @@ -375,6 +377,23 @@ void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) /* 32 bit Book3S always has 32 byte dcbz */ vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32; #endif + + /* On some CPUs we can execute paired single operations natively */ + asm ( "mfpvr %0" : "=r"(host_pvr)); + switch (host_pvr) { + case 0x00080200:/* lonestar 2.0 */ + case 0x00088202:/* lonestar 2.2 */ + case 0x7100:/* gekko 1.0 */ + case 0x00080100:/* gekko 2.0 */ + case 0x00083203:/* gekko 2.3a */ + case 0x00083213:/* gekko 2.3b */ + case 0x00083204:/* gekko 2.4 */ + case 0x00083214:/* gekko 2.4e (8SE) - retail HW2 */ + case 0x00087200:/* broadway */ + vcpu->arch.hflags |= BOOK3S_HFLAG_NATIVE_PS; + /* Enable HID2.PSE - in case we need it later */ + mtspr(SPRN_HID2_GEKKO, mfspr(SPRN_HID2_GEKKO) | (1 << 29)); + } } /* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 3f7afb5..c85f906 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -365,7 +365,10 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) case 0x00083213:/* gekko 2.3b */ case 0x00083204:/* gekko 2.4 */ case 0x00083214:/* gekko 2.4e (8SE) - retail HW2 */ - if (spr_val & (1 << 29)) { /* HID2.PSE */ + case 0x00087200:/* broadway */ + if (vcpu->arch.hflags & BOOK3S_HFLAG_NATIVE_PS) { + /* Native paired singles */ + } else if (spr_val & (1 << 29)) { /* HID2.PSE */ vcpu->arch.hflags |= BOOK3S_HFLAG_PAIRED_SINGLE; kvmppc_giveup_ext(vcpu, MSR_FP); } else { -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 8/9] KVM: PPC: Find HTAB ourselves
For KVM we need to find the location of the HTAB. We can either rely on internal data structures of the kernel or ask the hardware. Ben issued complaints about the internal data structure method, so let's switch it to our own inquiry of the HTAB. Now we're fully independend :-). CC: Benjamin Herrenschmidt Signed-off-by: Alexander Graf --- arch/powerpc/kernel/ppc_ksyms.c |5 - arch/powerpc/kvm/book3s_32_mmu_host.c | 21 + 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c index 2b7c43f..bc9f39d 100644 --- a/arch/powerpc/kernel/ppc_ksyms.c +++ b/arch/powerpc/kernel/ppc_ksyms.c @@ -178,11 +178,6 @@ EXPORT_SYMBOL(switch_mmu_context); extern long mol_trampoline; EXPORT_SYMBOL(mol_trampoline); /* For MOL */ EXPORT_SYMBOL(flush_hash_pages); /* For MOL */ - -extern struct hash_pte *Hash; -extern unsigned long _SDR1; -EXPORT_SYMBOL_GPL(Hash); /* For KVM */ -EXPORT_SYMBOL_GPL(_SDR1); /* For KVM */ #ifdef CONFIG_SMP extern int mmu_hash_lock; EXPORT_SYMBOL(mmu_hash_lock); /* For MOL */ diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 2bb67e6..0bb6600 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -54,6 +54,9 @@ #error Only 32 bit pages are supported for now #endif +static ulong htab; +static u32 htabmask; + static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { volatile u32 *pteg; @@ -217,14 +220,11 @@ static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid) return NULL; } -extern struct hash_pte *Hash; -extern unsigned long _SDR1; - static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr, bool primary) { - u32 page, hash, htabmask; - ulong pteg = (ulong)Hash; + u32 page, hash; + ulong pteg = htab; page = (eaddr & ~ESID_MASK) >> 12; @@ -232,13 +232,12 @@ static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr, if (!primary) hash = ~hash; - htabmask = ((_SDR1 & 0x1FF) << 16) | 0xFFC0; hash &= htabmask; pteg |= hash; - dprintk_mmu("htab: %p | hash: %x | htabmask: %x | pteg: %lx\n", - Hash, hash, htabmask, pteg); + dprintk_mmu("htab: %lx | hash: %x | htabmask: %x | pteg: %lx\n", + htab, hash, htabmask, pteg); return (u32*)pteg; } @@ -453,6 +452,7 @@ int kvmppc_mmu_init(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu3s = to_book3s(vcpu); int err; + ulong sdr1; err = __init_new_context(); if (err < 0) @@ -474,5 +474,10 @@ int kvmppc_mmu_init(struct kvm_vcpu *vcpu) vcpu3s->vsid_next = vcpu3s->vsid_first; + /* Remember where the HTAB is */ + asm ( "mfsdr1 %0" : "=r"(sdr1) ); + htabmask = ((sdr1 & 0x1FF) << 16) | 0xFFC0; + htab = (ulong)__va(sdr1 & 0x); + return 0; } -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/9] KVM: PPC: Improve split mode
When in split mode, instruction relocation and data relocation are not equal. So far we implemented this mode by reserving a special pseudo-VSID for the two cases and flushing all PTEs when going into split mode, which is slow. Unfortunately 32bit Linux and Mac OS X use split mode extensively. So to not slow down things too much, I came up with a different idea: Mark the split mode with a bit in the VSID and then treat it like any other segment. This means we can just flush the shadow segment cache, but keep the PTEs intact. I verified that this works with ppc32 Linux and Mac OS X 10.4 guests and does speed them up. Signed-off-by: Alexander Graf --- arch/powerpc/include/asm/kvm_book3s.h |9 - arch/powerpc/kvm/book3s.c | 28 ++-- arch/powerpc/kvm/book3s_32_mmu.c | 21 + arch/powerpc/kvm/book3s_64_mmu.c | 27 +++ 4 files changed, 46 insertions(+), 39 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5d3bd0c..6f74d93 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -100,11 +100,10 @@ struct kvmppc_vcpu_book3s { #define CONTEXT_GUEST 1 #define CONTEXT_GUEST_END 2 -#define VSID_REAL_DR 0x7ff0ULL -#define VSID_REAL_IR 0x7fe0ULL -#define VSID_SPLIT_MASK0x7fe0ULL -#define VSID_REAL 0x7fc0ULL -#define VSID_BAT 0x7fb0ULL +#define VSID_REAL 0x1fc0ULL +#define VSID_BAT 0x1fb0ULL +#define VSID_REAL_DR 0x2000ULL +#define VSID_REAL_IR 0x4000ULL #define VSID_PR0x8000ULL extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong ea, ulong ea_mask); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index a03163b..91dc42d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -147,16 +147,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) } } - if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) || - (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) { - bool dr = (vcpu->arch.msr & MSR_DR) ? true : false; - bool ir = (vcpu->arch.msr & MSR_IR) ? true : false; - - /* Flush split mode PTEs */ - if (dr != ir) - kvmppc_mmu_pte_vflush(vcpu, VSID_SPLIT_MASK, - VSID_SPLIT_MASK); - + if ((vcpu->arch.msr & (MSR_PR|MSR_IR|MSR_DR)) != + (old_msr & (MSR_PR|MSR_IR|MSR_DR))) { kvmppc_mmu_flush_segments(vcpu); kvmppc_mmu_map_segment(vcpu, kvmppc_get_pc(vcpu)); } @@ -534,6 +526,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, bool is_mmio = false; bool dr = (vcpu->arch.msr & MSR_DR) ? true : false; bool ir = (vcpu->arch.msr & MSR_IR) ? true : false; + u64 vsid; relocated = data ? dr : ir; @@ -551,13 +544,20 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) { case 0: - pte.vpage |= VSID_REAL; + pte.vpage |= ((u64)VSID_REAL << (SID_SHIFT - 12)); break; case MSR_DR: - pte.vpage |= VSID_REAL_DR; - break; case MSR_IR: - pte.vpage |= VSID_REAL_IR; + vcpu->arch.mmu.esid_to_vsid(vcpu, eaddr >> SID_SHIFT, &vsid); + + if ((vcpu->arch.msr & (MSR_DR|MSR_IR)) == MSR_DR) + pte.vpage |= ((u64)VSID_REAL_DR << (SID_SHIFT - 12)); + else + pte.vpage |= ((u64)VSID_REAL_IR << (SID_SHIFT - 12)); + pte.vpage |= vsid; + + if (vsid == -1) + page_found = -EINVAL; break; } diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 33186b7..0b10503 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -330,30 +330,35 @@ static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool lar static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, u64 *vsid) { + ulong ea = esid << SID_SHIFT; + struct kvmppc_sr *sr; + u64 gvsid = esid; + + if (vcpu->arch.msr & (MSR_DR|MSR_IR)) { + sr = find_sr(to_book3s(vcpu), ea); + if (sr->valid) + gvsid = sr->vsid; + } + /* In case we only have one of MSR_IR or MSR_DR set, let's put that in the real-mode context (and hope RM doesn't access high memory) */ switch (vcpu
[PATCH 0/9] Post-PPC32 series v2
While working with the PPC32 host target we finally have I stumbled over several things. Thanks to the now possible performance measurements I also tracked down split mode as one of the major slowdowns to KVM. What's left now that slows us down is the normal flushing code that needs to move to a table based lookup and instruction emulation. On PPC32 guests we waste about 70% of our time on emulating mfmsr, mtmsr, mfsprg, mtsprg and friends. Either way - this patch series deprecates the former performance counter and u64 patch. Avi / Marcelo, please apply the former series and this series. Ignore the two patches in between. v1 -> v2: - add paired single patch - move WARN bailing to the correct patch Alexander Graf (9): KVM: PPC: Convert u64 -> ulong KVM: PPC: Make Performance Counters work KVM: PPC: Improve split mode KVM: PPC: Make Alignment interrupts work again KVM: PPC: Be more informative on BUG KVM: PPC: Set VSID_PR also for Book3S_64 KVM: PPC: Fix Book3S_64 Host MMU debug output KVM: PPC: Find HTAB ourselves KVM: PPC: Enable native paired singles arch/powerpc/include/asm/kvm_asm.h|1 + arch/powerpc/include/asm/kvm_book3s.h | 13 +++ arch/powerpc/include/asm/kvm_host.h |6 ++-- arch/powerpc/kernel/ppc_ksyms.c |5 --- arch/powerpc/kvm/book3s.c | 56 +++-- arch/powerpc/kvm/book3s_32_mmu.c | 27 +-- arch/powerpc/kvm/book3s_32_mmu_host.c | 29 +--- arch/powerpc/kvm/book3s_64_mmu.c | 34 arch/powerpc/kvm/book3s_64_mmu_host.c | 36 +--- arch/powerpc/kvm/book3s_emulate.c |5 ++- arch/powerpc/kvm/book3s_interrupts.S |2 + arch/powerpc/kvm/book3s_segment.S |2 + 12 files changed, 132 insertions(+), 84 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/9] KVM: PPC: Set VSID_PR also for Book3S_64
Book3S_64 didn't set VSID_PR when we're in PR=1. This lead to pretty bad behavior when searching for the shadow segment, as part of the code relied on VSID_PR being set. This patch fixes booting Book3S_64 guests. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_64_mmu.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 612de6e..4025ea2 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -473,6 +473,9 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, break; } + if (vcpu->arch.msr & MSR_PR) + *vsid |= VSID_PR; + return 0; } -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/9] KVM: PPC: Make Alignment interrupts work again
In the process of merging Book3S_32 and 64 I somehow ended up having the alignment interrupt handler take last_inst, but the fetching code not fetching it. So we ended up with stale last_inst values. Let's just enable last_inst fetching for alignment interrupts too. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_segment.S |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index 778e3fc..ede47fd 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -196,6 +196,8 @@ kvmppc_handler_trampoline_exit: beq ld_last_inst cmpwi r12, BOOK3S_INTERRUPT_PROGRAM beq ld_last_inst + cmpwi r12, BOOK3S_INTERRUPT_ALIGNMENT + beq-ld_last_inst b no_ld_last_inst -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/9] KVM: PPC: Make Performance Counters work
When we get a performance counter interrupt we need to route it on to the Linux handler after we got out of the guest context. We also need to tell our handling code that this particular interrupt doesn't need treatment. So let's add those two bits in, making perf work while having a KVM guest running. Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s.c|3 +++ arch/powerpc/kvm/book3s_interrupts.S |2 ++ 2 files changed, 5 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index a7de709..a03163b 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -872,6 +872,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, vcpu->stat.ext_intr_exits++; r = RESUME_GUEST; break; + case BOOK3S_INTERRUPT_PERFMON: + r = RESUME_GUEST; + break; case BOOK3S_INTERRUPT_PROGRAM: { enum emulation_result er; diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index f5b3358..e486193 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -228,6 +228,8 @@ no_dcbz32_off: beq call_linux_handler cmpwi r12, BOOK3S_INTERRUPT_DECREMENTER beq call_linux_handler + cmpwi r12, BOOK3S_INTERRUPT_PERFMON + beq call_linux_handler /* Back to EE=1 */ mtmsr r6 -- 1.6.0.2 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for Apr 20
On Monday 19 April 2010 18:30:44 Chris Wright wrote: > Please send in any agenda items you are interested in covering. 0.12.4? > > thanks, > -chris > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: use the correct RCU API
On Mon, Apr 19, 2010 at 12:49:04PM +0300, Avi Kivity wrote: > On 04/19/2010 12:41 PM, Lai Jiangshan wrote: > >The RCU/SRCU API have already changed for proving RCU usage. > > > >I got the following dmesg when PROVE_RCU=y because we used incorrect API. > >This patch coverts rcu_deference() to srcu_dereference() or family API. > > > >=== > >[ INFO: suspicious rcu_dereference_check() usage. ] > >--- > >arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! > > > >other info that might help us debug this: > > > > > >rcu_scheduler_active = 1, debug_locks = 0 > >2 locks held by qemu-system-x86/8550: > > #0: (&kvm->slots_lock){+.+.+.}, at: [] > > kvm_set_memory_region+0x29/0x50 [kvm] > > #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] > > kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] > > > >stack backtrace: > >Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 > >#27 > >Call Trace: > > [] lockdep_rcu_dereference+0xaa/0xb3 > > [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] > > [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] > > [] __kvm_set_memory_region+0x636/0x6e2 [kvm] > > [] kvm_set_memory_region+0x37/0x50 [kvm] > > [] vmx_set_tss_addr+0x46/0x5a [kvm_intel] > > [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] > > [] ? unlock_page+0x27/0x2c > > [] ? __do_fault+0x3a9/0x3e1 > > [] kvm_vm_ioctl+0x364/0x38d [kvm] > > [] ? up_read+0x23/0x3d > > [] vfs_ioctl+0x32/0xa6 > > [] do_vfs_ioctl+0x495/0x4db > > [] ? fget_light+0xc2/0x241 > > [] ? do_sys_open+0x104/0x116 > > [] ? retint_swapgs+0xe/0x13 > > [] sys_ioctl+0x47/0x6a > > [] system_call_fastpath+0x16/0x1b > > > > > > > >+static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) > >+{ > >+return rcu_dereference_check(kvm->memslots, > >+srcu_read_lock_held(&kvm->srcu) > >+|| lockdep_is_held(&kvm->slots_lock)); > >+} > >+ > > > This open-codes srcu_dereference(). I guess we need an > srcu_dereference_check(). Paul? One is coming in Arnd's sparse-based patchset. It is probably best to open-code this in the meantime and clean up later, but I will double-check with Arnd. > btw, perhaps it is possible not to call rcu_dereference from the > write paths. There is an rcu_dereference_protected() on its way to mainline to handle the case where the reference is always protected by a lock. Why not just access it directly? Because if you do that, the sparse-based checks will yell at you. There is also an rcu_access_pointer() on its way to mainline for cases where you only want to test the pointer itself, not dereference it. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM call agenda for Apr 20
Please send in any agenda items you are interested in covering. thanks, -chris -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.
Mohammed Gamal wrote: > On Tue, Apr 13, 2010 at 9:08 PM, jvrao wrote: >> jvrao wrote: >>> Alexander Graf wrote: On 12.04.2010, at 13:58, Jamie Lokier wrote: > Mohammed Gamal wrote: >> On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier >> wrote: >>> Javier Guerra Giraldez wrote: On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal wrote: > On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier > wrote: >> To throw a spanner in, the most widely supported filesystem across >> operating systems is probably NFS, version 2 :-) > Remember that Windows usage on a VM is not some rare use case, and > it'd be a little bit of a pain from a user's perspective to have to > install a third party NFS client for every VM they use. Having > something supported on the VM out of the box is a better option IMO. i don't think virtio-CIFS has any more support out of the box (on any system) than virtio-9P. >>> It doesn't, but at least network-CIFS tends to work ok and is the >>> method of choice for Windows VMs - when you can setup Samba on the >>> host (which as previously noted you cannot always do non-disruptively >>> with current Sambas). >>> >>> -- Jamie >>> >> I think having support for both 9p and CIFS would be the best option. >> In that case the user will have the option to use either one, >> depending on how their guests support these filesystems. In that case >> I'd prefer to work on CIFS support while the 9p effort can still go >> on. I don't think both efforts are mutually exclusive. >> >> What do the rest of you guys think? > I only noted NFS because most old OSes do not support CIFS or 9P - > especially all the old unixes. > > I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?) > even support current CIFS. They need extra server settings to work > - such as setting passwords on the server to non-encrypted and other > quirks. > > Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to > properly see symlinks and hard links. > > So there is no really nice out of the box file service which works > easily with all guest OSes. > > I'm guessing that out of all the filesystems, CIFS is the most widely > supported in recent OSes (released in the last 10 years). But I'm not > really sure what the state of CIFS is for non-Windows, non-Linux, > non-BSD guests. So what? If you want to have direct host fs access, install guest drivers. If you can't, set up networking and use CIFS or NFS or whatever. > I'm not sure why 9P is being pursued. Does anything much support it, > or do all OSes except quite recent Linux need a custom driver for 9P? > Even Linux only got the first commit in the kernel 5 years ago, so > probably it was only about 3 years ago that it will have begun > appearing in stable distros, if at all. Filesystem passthrough to > Linux guests installed in the last couple of years is a useful > feature, and I know that for many people that is their only use of > KVM, but compared with CIFS' broad support it seems like quite a > narrow goal. The goal is to have something simple and fast. We can fine-tune 9P to align with the Linux VFS structures, making it really little overhead (and little headache too). For Windows guests, nothing prevents us to expose yet another 9P flavor. That again would be aligned well with Windows's VFS and be slim and fast there. The biggest problem I see with CIFS is that it's a huge beast. There are a lot of corner cases where it just doesn't fit in. See my previous mail for more details. >>> As Alex mentioned, 9P is chosen for its mere simplicity and easy >>> adaptability. >>> NFS and CIFS does not give that flexibility. As we mentioned in the patch >>> series, we are >>> already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU >>> knowledge >>> to share physical resources like memory/cache between the host and the >>> guest. >>> >>> I think looking into the windows side of 9P client would be great option >>> too. >>> The current patch on the mailing list supports .U (unix) protocol and will >>> be introducing >>> .L (Linux) variant as we move forward. >> Hi Mohammed, >> Please let us know once you decide on where your interest lies. >> Will be glad to have you on VirtFS (9P) though. :) >> >> >> - JV >> > > It seems the community is more keen on getting 9P support merged than > getting CIFS supported, and they have made good points to support > their argument. I'm not sure whether work on this project could fit in > as a GSoC project and if there is much remaining work that could make > it fit in that direction. But I'd be glad to volunteer anyway :) I was thinking over the wk-end what fit
Re: [PATCH 0/1] trace all instructions whose emulation failed
On Mon, Apr 19, 2010 at 4:20 AM, Avi Kivity wrote: >> } >> >> > > It's better not to trace #UD triggered emulations, since we except these to > fail, for example if the guest executes the UD2 instruction. > ya. that sounds more logical. Thanks for explaining. --- regards Manish Regmi http://manish-cs.blogspot.com http://ext2read.sf.net -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] Add mergeable RX bufs support to vhost
This patch adds the mergeable RX buffers feature to vhost. Signed-off-by: David L Stevens diff -ruNp net-next-p0/drivers/vhost/net.c net-next-v4/drivers/vhost/net.c --- net-next-p0/drivers/vhost/net.c 2010-03-22 12:04:38.0 -0700 +++ net-next-v4/drivers/vhost/net.c 2010-04-19 14:23:38.0 -0700 @@ -108,7 +108,7 @@ static void handle_tx(struct vhost_net * }; size_t len, total_len = 0; int err, wmem; - size_t hdr_size; + size_t vhost_hlen; struct socket *sock = rcu_dereference(vq->private_data); if (!sock) return; @@ -127,13 +127,13 @@ static void handle_tx(struct vhost_net * if (wmem < sock->sk->sk_sndbuf / 2) tx_poll_stop(net); - hdr_size = vq->hdr_size; + vhost_hlen = vq->vhost_hlen; for (;;) { - head = vhost_get_vq_desc(&net->dev, vq, vq->iov, -ARRAY_SIZE(vq->iov), -&out, &in, -NULL, NULL); + head = vhost_get_desc(&net->dev, vq, vq->iov, + ARRAY_SIZE(vq->iov), + &out, &in, + NULL, NULL); /* Nothing new? Wait for eventfd to tell us they refilled. */ if (head == vq->num) { wmem = atomic_read(&sock->sk->sk_wmem_alloc); @@ -154,20 +154,20 @@ static void handle_tx(struct vhost_net * break; } /* Skip header. TODO: support TSO. */ - s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out); + s = move_iovec_hdr(vq->iov, vq->hdr, vhost_hlen, out); msg.msg_iovlen = out; len = iov_length(vq->iov, out); /* Sanity check */ if (!len) { vq_err(vq, "Unexpected header len for TX: " "%zd expected %zd\n", - iov_length(vq->hdr, s), hdr_size); + iov_length(vq->hdr, s), vhost_hlen); break; } /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(NULL, sock, &msg, len); if (unlikely(err < 0)) { - vhost_discard_vq_desc(vq); + vhost_discard_desc(vq, 1); tx_poll_start(net, sock); break; } @@ -186,12 +186,25 @@ static void handle_tx(struct vhost_net * unuse_mm(net->dev.mm); } +static int vhost_head_len(struct vhost_virtqueue *vq, struct sock *sk) +{ + struct sk_buff *head; + int len = 0; + + lock_sock(sk); + head = skb_peek(&sk->sk_receive_queue); + if (head) + len = head->len + vq->sock_hlen; + release_sock(sk); + return len; +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. */ static void handle_rx(struct vhost_net *net) { struct vhost_virtqueue *vq = &net->dev.vqs[VHOST_NET_VQ_RX]; - unsigned head, out, in, log, s; + unsigned in, log, s; struct vhost_log *vq_log; struct msghdr msg = { .msg_name = NULL, @@ -202,14 +215,14 @@ static void handle_rx(struct vhost_net * .msg_flags = MSG_DONTWAIT, }; - struct virtio_net_hdr hdr = { - .flags = 0, - .gso_type = VIRTIO_NET_HDR_GSO_NONE + struct virtio_net_hdr_mrg_rxbuf hdr = { + .hdr.flags = 0, + .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE }; size_t len, total_len = 0; - int err; - size_t hdr_size; + int err, headcount, datalen; + size_t vhost_hlen; struct socket *sock = rcu_dereference(vq->private_data); if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue)) return; @@ -217,18 +230,18 @@ static void handle_rx(struct vhost_net * use_mm(net->dev.mm); mutex_lock(&vq->mutex); vhost_disable_notify(vq); - hdr_size = vq->hdr_size; + vhost_hlen = vq->vhost_hlen; vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ? vq->log : NULL; - for (;;) { - head = vhost_get_vq_desc(&net->dev, vq, vq->iov, -ARRAY_SIZE(vq->iov), -&out, &in, -vq_log, &log); + while ((datalen = vhost_head_len(vq, sock->sk))) { + headcount = vhost_get_desc_n(vq, vq->heads, datalen+vhost_hlen, +&in, vq_log, &log); + if (headcount < 0) + break;
[PATCH V3] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes
These are changes to uio_pci_generic.c to allow better use of the driver by non-privileged processes. 1. Add back old code which allowed interrupt re-enablement through uio fd. 2. Translate PCI bards to uio mmap regions, to allow mmap through uio fd. 3. Allow devices which support MSI or MSI-X, but not IRQ. Signed-off-by: Tom Lyon --- checkpatch.pl is happy with this one. --- linux-2.6.33/drivers/uio/uio_pci_generic.c 2010-02-24 10:52:17.0 -0800 +++ mylinux-2.6.33/drivers/uio/uio_pci_generic.c2010-04-19 14:57:21.0 -0700 @@ -14,9 +14,10 @@ * # ls -l /sys/bus/pci/devices/:00:19.0/driver * .../:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic * - * Driver won't bind to devices which do not support the Interrupt Disable Bit - * in the command register. All devices compliant to PCI 2.3 (circa 2002) and - * all compliant PCI Express devices should support this bit. + * Driver won't bind to devices which do not support MSI, MSI-x, or the + * Interrupt Disable Bit in the command register. All devices compliant + * to PCI 2.3 (circa 2002) and all compliant PCI Express devices should + * support one of these. */ #include @@ -27,7 +28,7 @@ #define DRIVER_VERSION "0.01.0" #define DRIVER_AUTHOR "Michael S. Tsirkin " -#define DRIVER_DESC"Generic UIO driver for PCI 2.3 devices" +#define DRIVER_DESC"Generic UIO driver for PCIe & PCI 2.3 devices" struct uio_pci_generic_dev { struct uio_info info; @@ -41,6 +42,39 @@ to_uio_pci_generic_dev(struct uio_info * return container_of(info, struct uio_pci_generic_dev, info); } +/* Read/modify/write command register to disable interrupts. + * Note: we could cache the value and optimize the read if there was a way to + * get notified of user changes to command register through sysfs. + * */ +static void irqtoggle(struct uio_pci_generic_dev *gdev, int irq_on) +{ + struct pci_dev *pdev = gdev->pdev; + unsigned long flags; + u16 orig, new; + + spin_lock_irqsave(&gdev->lock, flags); + pci_block_user_cfg_access(pdev); + pci_read_config_word(pdev, PCI_COMMAND, &orig); + new = irq_on ? (orig & ~PCI_COMMAND_INTX_DISABLE) +: (orig | PCI_COMMAND_INTX_DISABLE); + if (new != orig) + pci_write_config_word(gdev->pdev, PCI_COMMAND, new); + pci_unblock_user_cfg_access(pdev); + spin_unlock_irqrestore(&gdev->lock, flags); +} + +/* irqcontrol is use by userspace to enable/disable interrupts. */ +/* A privileged app can write the PCI_COMMAND register directly, + * but we need this for normal apps + */ +static int irqcontrol(struct uio_info *info, s32 irq_on) +{ + struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info); + + irqtoggle(gdev, irq_on); + return 0; +} + /* Interrupt handler. Read/modify/write the command register to disable * the interrupt. */ static irqreturn_t irqhandler(int irq, struct uio_info *info) @@ -89,7 +123,7 @@ done: /* Verify that the device supports Interrupt Disable bit in command register, * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly * in PCI 2.2. */ -static int __devinit verify_pci_2_3(struct pci_dev *pdev) +static int verify_pci_2_3(struct pci_dev *pdev) { u16 orig, new; int err = 0; @@ -121,17 +155,52 @@ err: return err; } -static int __devinit probe(struct pci_dev *pdev, +/* we could've used the generic pci sysfs stuff for mmap, + * but this way we can allow non-privileged users as long + * as /dev/uio* has the right permissions + */ +static void uio_do_maps(struct uio_pci_generic_dev *gdev) +{ + struct pci_dev *pdev = gdev->pdev; + struct uio_info *info = &gdev->info; + int i, j; + char *name; + + for (i = 0, j = 0; i < PCI_STD_RESOURCE_END && j < MAX_UIO_MAPS; i++) { + if (pci_resource_flags(pdev, i) & IORESOURCE_MEM) { + name = kmalloc(8, GFP_KERNEL); + if (name == NULL) + break; + sprintf(name, "membar%d", i); + info->mem[j].name = name; + info->mem[j].addr = pci_resource_start(pdev, i); + info->mem[j].size = pci_resource_len(pdev, i); + info->mem[j].memtype = UIO_MEM_PHYS; + j++; + } + } + for (i = 0, j = 0; i < PCI_STD_RESOURCE_END && + j < MAX_UIO_PORT_REGIONS; i++) { + if (pci_resource_flags(pdev, i) & IORESOURCE_IO) { + name = kmalloc(8, GFP_KERNEL); + if (name == NULL) + break; + sprintf(name, "iobar%d", i); + info->port[j].name = name; + info->port[j].start = pci_resource_start(pdev, i); + info->port[j].siz
Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Fri, Feb 19, 2010 at 12:22:20AM +0200, Michael S. Tsirkin wrote: > I took a stub at documenting CMD and FLUSH request types in virtio > block. Any comments? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
About cpu_set, CPU hotplug and related subjects
Hi folks, I've implemented a functional test for the cpu_set feature some time ago. I was going through the patch queue and realized the patch needed some respin, so I did it. I decided to try and see what is the state of the feature, considering last time I tried it was not working (segfaulting). Now, with the latest qemu-kvm things are on a much better shape, cpu_set is not giving me segfaults, and qemu monitor reports the new cpus added. 10:21:20 INFO | Adding 1 CPUs to guest 10:21:20 DEBUG| Sending monitor command: cpu_set 0 online 10:21:20 DEBUG| Sending monitor command: cpu_set 1 online 10:21:20 DEBUG| Sending monitor command: info cpus 10:21:20 DEBUG| Output of info cpus: * CPU #0: pc=0x8102e23d thread_id=11035 CPU #1: pc=0x thread_id=11065 The address of the CPU #1 seems a little strange. The guest OS is completely unaware of any changes to the number of CPUs though. After doing some reading it seems to me that the reason why that is not happening is because SeaBIOS still doesn't have code to support CPU hot plugging as BochsBIOS did. I also looked after some documentation about how that feature is supposed to work, and didn't find it. So, assuming my understanding about the current status is correct: 1) Is anybody planning on adding the necessary support to SeaBIOS anytime soon? 2) How the whole functionality is supposed to work? My mental model goes like this: * cpu_set [total number of CPUs you want the system to have] online * Guest should notice new CPUS added to the system and they appear under /sys/devices/system/cpu/cpu*, initially offlined * The online status of each CPU is in /sys/devices/system/cpu/cpu*/online, and onlining it is just a matter of writing 1 to this file * It is not possible to downgrade the number of CPUs of the system during the lifetime of the VM process Thanks for your attention, Lucas -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
QEMU-KVM and video performance
Hello, Finally I got QEMU-KVM to work but video performance under DOS is very low (QEMU 0.12.3 stable and QEMU GIT master branch is fast, QEMU KVM is slow) I'm measuring 2 performance critical video performance parameters: 1.) INT 10h, function AX=4F05h (set same window/set window/get window) 2.) Memory performance to segment page A000h So BIOS performance (which might be port performance to VGA index/value port) is about factor 5 slower, memory performance is about factor 100 slower. QEMU 0.12.3 and QEMU GIT performance is the same (in the measurement tolerance) and listed only once, QEMU KVM is much more slower (details see below). Test programs can be provided, source code will be release soon. Any ideas why KVM is so slow? Any ideas for improvement? Thnx. Ciao, Gerhard -- http://www.wiesinger.com/ == QEMU 0.12.3 and QEMU GIT == INT10PER performance information V1.0, (c) 2010 by Gerhard Wiesinger VESA set window: seconds=2, operations=100, ops/s=50.0 VESA set window (nc): seconds=2, operations=100, ops/s=50.0 VESA get window: seconds=2, operations=100, ops/s=50.0 MEMPERF video performance V1.0, (c) 2010 by Gerhard Wiesinger BYTE performance, time=10s, bytes=930611200, rate=88.750 MB/s WORD performance, time=10s, bytes=766771200, rate=73.125 MB/s DWORD performance, time=10s, bytes=812646400, rate=77.500 MB/s QWORD performance, time=10s, bytes=806092800, rate=76.875 MB/s == == QEMU-KVM == INT10PER performance information V1.0, (c) 2010 by Gerhard Wiesinger VESA set window: seconds=9, operations=100, ops/s=11.1 VESA set window (nc): seconds=9, operations=100, ops/s=11.1 VESA get window: seconds=5, operations=100, ops/s=20.0 MEMPERF video performance V1.0, (c) 2010 by Gerhard Wiesinger BYTE performance, time=13s, bytes=13107200, rate=0.962 MB/s WORD performance, time=13s, bytes=13107200, rate=0.962 MB/s DWORD performance, time=12s, bytes=13107200, rate=1.042 MB/s QWORD performance, time=13s, bytes=13107200, rate=0.962 MB/s == -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] Autotest: Unattended_install testcase always fail with rhel3.9-32 guest
On Sun, 2010-04-18 at 21:32 -0600, David S. Ahern wrote: > On 04/18/2010 12:26 PM, Lucas Meneghel Rodrigues wrote: > > On Sat, 2010-04-17 at 22:55 -0600, David S. Ahern wrote: > >> > >> On 04/17/2010 10:09 PM, Amos Kong wrote: > >>> %post --interpreter /usr/bin/python > >>> import socket, os > >>> os.system('dhclient') > >>> os.system('chkconfig sshd on') > >>> os.system('iptables -F') > >>> os.system('echo 0 > /selinux/enforce') > >>> server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) > >>> server.bind(('', 12323)) > >>> server.listen(1) > >>> (client, addr) = server.accept() > >>> client.send("done") > >>> client.close() > >> > >> So, effectively after the install completes use dhclient to configure a > >> network address, start a server on a known port and when a client > >> connects send the message "done". I would expect that to work just fine. > > > > Me too, it has been working for RHEL 4.X, 5.X 32/64 bit and 3.X 64 bit. > > The problem has been effectively 3.9 32 bit. > > I fired up a 3.9 guest with your ks.cfg. The problem is due to the > limited functionality in the RHEL3 BOOT kernel for i386. Specifically, > dhclient is failing at: > > setsockopt(6, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\6\10\240Y\n\10", 8) = > -1 ENOPROTOOPT > > So dhclient client is out. But you can still configure and use > networking via ifconfig if static addressing is an option for you. I was > able to use that command to configure eth0 and push an strace output > file for dhclient. > > Also, a couple of comments on this use case: > - SELinux is not applicable > - 32 GB of RAM is way beyond what the RHEL3 i386 can detect and use > - 12 vcpus seems high as well. Problem resolved :) Yay! http://autotest.kernel.org/changeset/4432 Thank you very much David, -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM test: Fix RHEL 3.9 32 bit installation
Fix a long standing bug where the unattended installs for RHEL 3.9 32 bit were failing due to some limitations on the boot install kernel for this specific version. Instead of using dhclient, resort to static IP configuration and bring joy to the land. This modification to the kickstart file does not regress 3.9 64 bit. Special thanks go to David S. Ahern for the time spent on resolving this problem. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/unattended/RHEL-3-series.ks |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks b/client/tests/kvm/unattended/RHEL-3-series.ks index ad748cb..884b386 100644 --- a/client/tests/kvm/unattended/RHEL-3-series.ks +++ b/client/tests/kvm/unattended/RHEL-3-series.ks @@ -24,10 +24,9 @@ skipx %post --interpreter /usr/bin/python import socket, os -os.system('dhclient') +os.system('/sbin/ifconfig eth0 10.0.2.15 netmask 255.255.255.0 up') +os.system('/sbin/route add default gw 10.0.2.2') os.system('chkconfig sshd on') -os.system('iptables -F') -os.system('echo 0 > /selinux/enforce') server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.bind(('', 12323)) server.listen(1) -- 1.6.6.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Forum 2010: Call for Papers
= CALL FOR PAPERS KVM Forum 2010 = DESCRIPTION The KVM Forum is back! After a break last year we're proud to present this year's gathering around KVM again. The idea is to have everyone involved with KVM development come together to talk about the future and current state of KVM, teaching everyone some pieces of the puzzle they might be missing without. So if you're a KVM developer, mark the dates in your calendar! If possible, also submit a talk -- we're interested in a wide variety of KVM topics, so don't hesitate to propose a talk on your work. If you're not a KVM developer, please read on nevertheless (or jump to END USER COLLABORATION). DATES / LOCATION Conference: August 9 - 10, 2010 Location: Renaissance Boston Waterfront in Boston, MA Abstracts due: May 14th, 2010 Notification: May 28th, 2010 Yes, we're colocated with LinuxCon. Tickets for the KVM Forum also count for LinuxCon. http://events.linuxfoundation.org/component/registrationpro/?func=details&did=34 PROCESS At first check if it's before May 14th. If you're past that date, you're out of luck. Now try to think hard and come up with a great idea that you could talk about. Once you have that set, we need you to write up a short abstract (~150 words) on it. In your submission please note how long your talk will take. Slots vary in length up to one hour. Also include in your proposal the proposal type -- one of: technical talk, breakout session, or end-user talk. Add that information to the abstract and submit it at the following URL: http://events.linuxfoundation.org/cfp/cfp-add Now, wait until May 24th. You will receive a notification on whether your talk was accepted or not. SCOPE OF TALKS We have a list of suggested presentation topics below. These suggestions are just for guidance, please feel free to submit a proposal on any of these or related topics. In general, the more it's about backend infrastructure, the better. KVM - Scaling and performance - Nested virtualization - I/O improvements - Driver domains - Time keeping - Memory management (page sharing, swapping, huge pages, etc) - Fault tolerance - VEPA, vswitch Embedded KVM - KVM on ARM, PPC, MIPS, ...? - Real-time requirements host/guest - Device pass-through w/o iommu - Custom device/platform models QEMU - Device model improvements - New devices - Security model - Scaling and performance - Desktop virtualization - Increasing robustness - Management interfaces - QMP protocol and implementation - Live migration Virtio - Speeding up existing devices - Vhost - Alternatives - Using virtio in non-kvm environments - Virtio on non-Linux Management infrastructure - Libvirt - Kvm autotest - Easy networking - Qemud BREAKOUT SESSION We will reserve some time each day to break out for working sessions. These sessions will be less formal than a presentation and more focused on developing a solution to some real development issue. If you are interested in getting developers together to hack on some code, submit your proposal and just make it clear it's a breakout session proposal. END-USER COLLABORATION One of the big challenges as developers is to know what, where and how people actually use our software. To solve this issue at least a little, there will be a few slots reserved for end users talking about their deployments, problems and achievements. So if you have a KVM based deployment running in production or are about to roll out one, please also submit a talk (see PROCESS), and simply mark it asn an end-user collaboration proposal. We would love to have an open discussion of fields where KVM/Qemu can still improve and you would have the unique chance to steer that process! Keep in mind that most of the Forum will be focused on development though, so we suggest you also come with a good portion of technical interest :-). And of course, no product marketing please! The purpose is to engage with KVM developers. LIGHTNING TALKS In addition to submitted talks we will also have some room for lightning talks. So if you have something you think might be done until the KVM Forum, but you're not sure you could fill 15 minutes with it. Or if you don't know if you'll make it until there, just keep in mind that you will still get the chance to talk about it. Lightning talk submissions and scheduling will be handled on-site at KVM Forum. Thank you for your interest in KVM. We're looking forward to your submissions and seeing you at the KVM Forum 2010 in August! Now, start thinking about that talk you want to give. Thanks, your KVM Forum 2010 Program Commitee Alexander Graf, Novell Anthony Liguori, IBM Avi Kivity, Red Hat Chris Wright, Red Hat Dor Laor
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 12:54 AM, Avi Kivity wrote: On 04/19/2010 01:51 PM, Peter Zijlstra wrote: Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and synced between cores. Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)? Not sure, IIRC we clear that when the TSC sync test fails, eg when we mark the tsc clocksource unusable. Worrying. By the time we detect this the guest may already have gotten confused by clocks going backwards. Upstream, we are marking the TSC unstable preemptively when hardware which will eventually sync test is detected, so this should be fine. Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, Apr 19, 2010 at 09:19:38AM -0700, Jeremy Fitzhardinge wrote: > On 04/19/2010 07:26 AM, Glauber Costa wrote: > >> Is the problem that the tscs are starting out of sync, or that they're > >> drifting relative to each other over time? Do the problems become worse > >> the longer the uptime? How large are the offsets we're talking about here? > >> > > The offsets usually seem pretty small, under a microsecond. So I don't think > > it has anything to do with tscs starting out of sync. Specially because the > > delta-based calculation has the exact purpose of handling that case. > > > > So you think they're drifting out of sync from an initially synced > state? If so, what would bound the drift? I think delta calculation introduces errors. Marcelo can probably confirm it, but he has a nehalem with an appearently very good tsc source. Even this machine warps. It stops warping if we only write pvclock data structure once and forget it, (which only updated tsc_timestamp once), according to him. Obviously, we can't do that everywhere. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
FYI, i found a few problems that need fixing: > + unsigned long ip; > + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) missing newline. > + int misc = 0; > + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { ditto. > + PERF_RECORD_MISC_GUEST_KERNEL; > + } else > + misc |= user_mode(regs) ? PERF_RECORD_MISC_USER : > + PERF_RECORD_MISC_KERNEL; - unbalanced curly braces - missing curly brace for multi-line statement. - unnecessary line-break due to col80 warning from checkpatch > +extern struct perf_guest_info_callbacks *perf_guest_cbs; > +extern int perf_register_guest_info_callbacks( > + struct perf_guest_info_callbacks *); > +extern int perf_unregister_guest_info_callbacks( > + struct perf_guest_info_callbacks *); - unnecessary line-break due to col80 warning from checkpatch > +static inline int perf_register_guest_info_callbacks > +(struct perf_guest_info_callbacks *) {return 0; } > +static inline int perf_unregister_guest_info_callbacks > +(struct perf_guest_info_callbacks *) {return 0; } - invalid C: function parameter needs name even if unused - missing space after opening curly brace Please provide delta fixes. Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 07:26 AM, Glauber Costa wrote: >> Is the problem that the tscs are starting out of sync, or that they're >> drifting relative to each other over time? Do the problems become worse >> the longer the uptime? How large are the offsets we're talking about here? >> > The offsets usually seem pretty small, under a microsecond. So I don't think > it has anything to do with tscs starting out of sync. Specially because the > delta-based calculation has the exact purpose of handling that case. > So you think they're drifting out of sync from an initially synced state? If so, what would bound the drift? J -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 07:46 AM, Peter Zijlstra wrote: > What avi says! :-) > > On a 32bit machine a 64bit read are two 32bit reads, so > > last = last_value; > > becomes: > > last.high = last_value.high; > last.low = last_vlue.low; > > (or the reverse of course) > > Now imagine a write getting interleaved with that ;-) > You could explicitly do: do { h = last.high; barrier(); l = last.low; barrier(); } while (last.high != h); This works because we expect last to be always increasing, so the only worry is low wrapping and incrementing high, and is more efficient than making the read fully atomic (the write is still cmpxchg64). But it's pretty ugly to open code just for 32b architectures; its something that might be useful to turn into a general abstraction (monotonic_read_64 FTW!). I already have code like this in the Xen time code, so I could make immediate use of it. J -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 07:33 AM, Avi Kivity wrote: > On 04/19/2010 05:21 PM, Glauber Costa wrote: >> >>> Oh yes, just trying to avoid a patch with both atomic64_read() and >>> ACCESS_ONCE(). >>> >> you're mixing the private version of the patch you saw with this one. >> there isn't any atomic reads in here. I'll use a barrier then >> > > This patch writes last_value atomically, but reads it non-atomically. > A barrier is insufficient. Well, on a 32b system, you can explicitly order the updates of low and high, then do a high-low-checkhigh read. That would be much more efficient than atomic64. If we really care about 32b. J -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_i386_out_of_tree
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/299 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Adam Litke ,Alexander Graf ,Amos Kong ,Anthony Liguori ,Arnaud Patard (Rtp) ,Aurelien Jarno ,Avi Kivity ,Bjørn Mork ,Blue Swirl ,Chris Webb ,Christoph Hellwig ,Dmitry Ilyevsky ,Edgar E. Iglesias ,Gerd Hoffmann ,Jan Kiszka ,Johan Bengtsson ,Juan Quintela ,Juergen Lock ,Kevin Wolf ,Lars Munch ,Marcelo Tosatti ,Markus Armbruster ,Max Reitz ,Michael Casadevall ,Michael S. Tsirkin ,Michael Tokarev ,Naphtali Sprei ,Paolo Bonzini ,Paul Bolle ,Paul Brook ,Rabin Vincent ,Richard Henderson ,Riku Voipio ,Rob Landley ,Ryota Ozaki ,Shahar Havivi ,Stefan Weil ,TeLeMan ,malc BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] export new cpuid KVM_CAP
On Sat, Apr 17, 2010 at 09:58:26PM +0300, Avi Kivity wrote: > On 04/15/2010 09:37 PM, Glauber Costa wrote: > >Since we're changing the msrs kvmclock uses, we have to communicate > >that to the guest, through cpuid. We can add a new KVM_CAP to the > >hypervisor, and then patch userspace to recognize it. > > > >And if we ever add a new cpuid bit in the future, we have to do that again, > >which create some complexity and delay in feature adoption. > > > >Instead, what I'm proposing in this patch is a new capability, called > >KVM_CAP_X86_CPUID_FEATURE_LIST, that returns the current feature list > >currently supported by the hypervisor. If we ever want to add or remove > >some feature, we only need to tweak into the HV, leaving userspace untouched. > > > > Hm. We need to update userspace anyway, since we don't like turning > features on unconditionally (it breaks live migration into an older > kernel). Right now, we don't have any mechanism to disable, say, kvmclock cpuid bit at userspace. But let's suppose we have: What's the difference between disabling it in the way it is now, and disabling it with the method I am proposing? All this ioctl say is: "Those are the current supported stuff in this HV". It does not mandate userspace to expose all of this to the guest. It just saves us from the job of creating yet another CAP for every bit we plan on including. If we want to be conservative, we can keep everything but the things we know already disabled, in userspace. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_x86_64_out_of_tree
The Buildbot has detected a new failure of default_x86_64_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/300 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch master] HEAD Blamelist: Adam Litke ,Alexander Graf ,Amos Kong ,Anthony Liguori ,Arnaud Patard (Rtp) ,Aurelien Jarno ,Avi Kivity ,Bjørn Mork ,Blue Swirl ,Chris Webb ,Christoph Hellwig ,Dmitry Ilyevsky ,Edgar E. Iglesias ,Gerd Hoffmann ,Jan Kiszka ,Johan Bengtsson ,Juan Quintela ,Juergen Lock ,Kevin Wolf ,Lars Munch ,Marcelo Tosatti ,Markus Armbruster ,Max Reitz ,Michael Casadevall ,Michael S. Tsirkin ,Michael Tokarev ,Naphtali Sprei ,Paolo Bonzini ,Paul Bolle ,Paul Brook ,Rabin Vincent ,Richard Henderson ,Riku Voipio ,Rob Landley ,Ryota Ozaki ,Shahar Havivi ,Stefan Weil ,TeLeMan ,malc BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_i386_debian_5_0
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/361 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch master] HEAD Blamelist: Adam Litke ,Alexander Graf ,Amos Kong ,Anthony Liguori ,Arnaud Patard (Rtp) ,Aurelien Jarno ,Avi Kivity ,Bjørn Mork ,Blue Swirl ,Chris Webb ,Christoph Hellwig ,Dmitry Ilyevsky ,Edgar E. Iglesias ,Gerd Hoffmann ,Jan Kiszka ,Johan Bengtsson ,Juan Quintela ,Juergen Lock ,Kevin Wolf ,Lars Munch ,Marcelo Tosatti ,Markus Armbruster ,Max Reitz ,Michael Casadevall ,Michael S. Tsirkin ,Michael Tokarev ,Naphtali Sprei ,Paolo Bonzini ,Paul Bolle ,Paul Brook ,Rabin Vincent ,Richard Henderson ,Riku Voipio ,Rob Landley ,Ryota Ozaki ,Shahar Havivi ,Stefan Weil ,TeLeMan ,malc BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 17:33 +0300, Avi Kivity wrote: > On 04/19/2010 05:21 PM, Glauber Costa wrote: > > > >> Oh yes, just trying to avoid a patch with both atomic64_read() and > >> ACCESS_ONCE(). > >> > > you're mixing the private version of the patch you saw with this one. > > there isn't any atomic reads in here. I'll use a barrier then > > > > This patch writes last_value atomically, but reads it non-atomically. A > barrier is insufficient. What avi says! :-) On a 32bit machine a 64bit read are two 32bit reads, so last = last_value; becomes: last.high = last_value.high; last.low = last_vlue.low; (or the reverse of course) Now imagine a write getting interleaved with that ;-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 05:32 PM, Glauber Costa wrote: Right, another option is to put the initial read outside of the loop, that way you'll have the best of all cases, a single LOCK'ed op in the loop, and only a single LOCK'ed op for the fast path on sensible architectures ;-) last = atomic64_read(&last_value); isn't a barrier enough here? No. On i386, the statement last = last_value; will be split by the compiler into two 32-bit loads. If a write (atomic, using cmpxchg) on another cpu happens between those two loads, then the variable last will have a corrupted value. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 05:21 PM, Glauber Costa wrote: Oh yes, just trying to avoid a patch with both atomic64_read() and ACCESS_ONCE(). you're mixing the private version of the patch you saw with this one. there isn't any atomic reads in here. I'll use a barrier then This patch writes last_value atomically, but reads it non-atomically. A barrier is insufficient. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, Apr 19, 2010 at 01:19:43PM +0200, Peter Zijlstra wrote: > On Mon, 2010-04-19 at 14:13 +0300, Avi Kivity wrote: > > On 04/19/2010 01:56 PM, Peter Zijlstra wrote: > > > > > > > > >>> Right, do bear in mind that the x86 implementation of atomic64_read() is > > >>> terrifyingly expensive, it is better to not do that read and simply use > > >>> the result of the cmpxchg. > > >>> > > >>> > > >>> > > >> atomic64_read() _is_ cmpxchg64b. Are you thinking of some clever > > >> implementation for smp i386? > > >> > > > > > > No, what I was suggesting was to rewrite that loop no to need the > > > initial read but use the cmpxchg result of the previous iteration. > > > > > > Something like: > > > > > >u64 last = 0; > > > > > >/* more stuff */ > > > > > >do { > > > if (ret< last) > > >return last; > > > last = cmpxchg64(&last_value, last, ret); > > >} while (last != ret); > > > > > > That only has a single cmpxchg8 in there per loop instead of two > > > (avoiding the atomic64_read() one). > > > > > > > Still have two cmpxchgs in the common case. The first iteration will > > fail, fetching last_value, the second will work. > > > > It will be better when we have contention, though, so it's worthwhile. > > Right, another option is to put the initial read outside of the loop, > that way you'll have the best of all cases, a single LOCK'ed op in the > loop, and only a single LOCK'ed op for the fast path on sensible > architectures ;-) > > last = atomic64_read(&last_value); isn't a barrier enough here? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: MMU: Drop cr4.pge from shadow page role
Since commit bf47a760f66ad, we no longer handle ptes with the global bit set specially, so there is no reason to distinguish between shadow pages created with cr4.gpe set and clear. Such tracking is expensive when the guest toggles cr4.pge, so drop it. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_host.h |1 - arch/x86/kvm/mmutrace.h |3 +-- arch/x86/kvm/x86.c |1 - 3 files changed, 1 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3c31c5a..d47d087 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -178,7 +178,6 @@ union kvm_mmu_page_role { unsigned direct:1; unsigned access:3; unsigned invalid:1; - unsigned cr4_pge:1; unsigned nxe:1; }; }; diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 3851f1f..bc4f7f0 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -28,7 +28,7 @@ \ role.word = __entry->role; \ \ - trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s %spge" \ + trace_seq_printf(p, "sp gfn %llx %u%s q%u%s %s%s" \ " %snxe root %u %s%c", \ __entry->gfn, role.level, \ role.cr4_pae ? " pae" : "",\ @@ -36,7 +36,6 @@ role.direct ? " direct" : "", \ access_str[role.access], \ role.invalid ? " invalid" : "",\ -role.cr4_pge ? "" : "!", \ role.nxe ? "" : "!", \ __entry->root_count, \ __entry->unsync ? "unsync" : "sync", 0); \ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 58a96e6..d7d58ce 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -488,7 +488,6 @@ void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) } kvm_x86_ops->set_cr4(vcpu, cr4); vcpu->arch.cr4 = cr4; - vcpu->arch.mmu.base_role.cr4_pge = (cr4 & X86_CR4_PGE) && !tdp_enabled; kvm_mmu_reset_context(vcpu); } EXPORT_SYMBOL_GPL(kvm_set_cr4); -- 1.6.6.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Fri, Apr 16, 2010 at 01:36:34PM -0700, Jeremy Fitzhardinge wrote: > On 04/15/2010 11:37 AM, Glauber Costa wrote: > > In recent stress tests, it was found that pvclock-based systems > > could seriously warp in smp systems. Using ingo's time-warp-test.c, > > I could trigger a scenario as bad as 1.5mi warps a minute in some systems. > > > > Is that "1.5 million"? Yes it is. But as I said, this seem to be a very deep worst case scenario. Most of boxes are not even close to being that bad. > > > (to be fair, it wasn't that bad in most of them). Investigating further, I > > found out that such warps were caused by the very offset-based calculation > > pvclock is based on. > > > > Is the problem that the tscs are starting out of sync, or that they're > drifting relative to each other over time? Do the problems become worse > the longer the uptime? How large are the offsets we're talking about here? The offsets usually seem pretty small, under a microsecond. So I don't think it has anything to do with tscs starting out of sync. Specially because the delta-based calculation has the exact purpose of handling that case. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, Apr 19, 2010 at 02:10:54PM +0300, Avi Kivity wrote: > On 04/19/2010 02:05 PM, Peter Zijlstra wrote: > >> > >>>ACCESS_ONCE() is your friend. > >>> > >>I think it's implied with atomic64_read(). > >Yes it would be. I was merely trying to point out that > > > > last = ACCESS_ONCE(last_value); > > > >Is a narrower way of writing: > > > > last = last_value; > > barrier(); > > > >In that it need not clobber all memory locations and makes it instantly > >clear what we want the barrier for. > > Oh yes, just trying to avoid a patch with both atomic64_read() and > ACCESS_ONCE(). you're mixing the private version of the patch you saw with this one. there isn't any atomic reads in here. I'll use a barrier then -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2989366 ] huge memory leak (qemu-kvm 0.12.3)
Bugs item #2989366, was opened at 2010-04-19 15:47 Message generated for change (Tracker Item Submitted) made by tgr1 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: tgr (tgr1) Assigned to: Nobody/Anonymous (nobody) Summary: huge memory leak (qemu-kvm 0.12.3) Initial Comment: CPU: Xeon E5450 KVM: qemu-kvm 0.12.3 Host: Debian Squeeze amd64, kernel 2.6.32 Guest: Debian Lenny amd64, kernel 2.6.26 qemu command line below (no helpers like libvirt used) whether the problem goes away if using the -no-kvm-irqchip or -no-kvm-pit switch: seems unrelated whether the problem also appears with the -no-kvm switch: load profile of the guest makes running without KVM infeasible The problem: after 4 days, the kvm process of a guest run with -mem 3072 takes 12 GB VSZ / 9 GB RSS - grows until OOM ("kvm invoked oom-killer" in dmesg). At that point the guest shows more than 2 GB of free memory. qemu command line: kvm -smp 4 -m 3072 -net nic,vlan=0,model=virtio,name=eth0,macaddr=x:x:x:x:x:x -net tap,vlan=0,ifname=tap6 -net nic,vlan=1,model=virtio,name=eth1,macaddr=x:x:x:x:x:x -net tap,vlan=1,ifname=tap7 -drive if=virtio,file=/dev/x/x-rootfs,index=0 -drive if=virtio,file=/dev/x/x-swap,index=1 -name x -nographic -kernel /boot/vmlinuz-2.6.26-2-amd64 -initrd /boot/initrd.img-2.6.26-2-amd64 -append root=/dev/vda console=ttyS0 -echr 2 -serial mon:telnet:localhost:12344,server,nowait -monitor unix:/var/run/kvm-x,server,nowait -daemonize -watchdog i6300esb -watchdog-action reset (the monitor on both telnet and unix socket is intentional, the one on the socket is only used for sending system_reset and system_powerdown qemu commands). -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2989366&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix GFP flags passed from the virtio balloon driver
The virtio balloon driver can dig into the reservation pools of the OS to satisfy a balloon request. This is not advisable and other balloon drivers (drivers/xen/balloon.c) avoid this as well. The patch also adds changes to avoid printing a warning if allocation fails, since we retry after sometime anyway. To: kvm Cc: Rusty Russell Signed-off-by: Balbir Singh --- drivers/virtio/virtio_balloon.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 369f2ee..f8ffe8c 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -102,7 +102,8 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) num = min(num, ARRAY_SIZE(vb->pfns)); for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) { - struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY); + struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY | + __GFP_NOMEMALLOC | __GFP_NOWARN); if (!page) { if (printk_ratelimit()) dev_printk(KERN_INFO, &vb->vdev->dev, -- 1.6.6.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM test: Add cpu_set subtest
Tests the ability of adding virtual cpus on the fly to qemu using the monitor command cpu_set, then after everything is OK, run the cpu_hotplug testsuite on the guest through autotest. Updates: The cpu_set feature has been worked out and is in better shape, however it is my understanding that SeaBIOS code has to be extended to support cpu_hotplug, like bochs bios did. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/tests/cpu_set.py | 95 client/tests/kvm/tests_base.cfg.sample |7 ++ 2 files changed, 102 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/tests/cpu_set.py diff --git a/client/tests/kvm/tests/cpu_set.py b/client/tests/kvm/tests/cpu_set.py new file mode 100644 index 000..e79fc95 --- /dev/null +++ b/client/tests/kvm/tests/cpu_set.py @@ -0,0 +1,95 @@ +import os, logging, re +from autotest_lib.client.common_lib import error +import kvm_test_utils + +def run_cpu_set(test, params, env): +""" +Runs CPU set test: + +1) Pick up a living guest +2) Send the monitor command cpu_set [n cpus] +3) Verify if guest has the additional CPU showing up under +/sys/devices/system/cpu +4) Try to bring it online by writing 1 to the 'online' file inside that dir +5) Run the CPU Hotplug test suite shipped with autotest inside guest + +@param test: kvm test object. +@param params: Dictionary with test parameters. +@param env: Dictionary with the test environment. +""" +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm")) +session = kvm_test_utils.wait_for_login(vm) + +n_cpus_add = int(params.get("n_cpus_add", 1)) +current_cpus = int(params.get("smp", 1)) +total_cpus = current_cpus + n_cpus_add +# Let's define some boundaries for this test +if total_cpus > 100: +raise error.TestError("Unsupported number of total CPUs: %s" % + total_cpus) + +dmesg_before = session.get_command_output("dmesg -c") + +logging.info("Adding %d CPUs to guest", n_cpus_add) +for i in range(total_cpus): +status, output = vm.send_monitor_cmd("cpu_set %s online" % i) +if status != 0: +raise error.TestFail("Failed to add CPU %s to guest: %s" % + (i, output)) + +status, output = vm.send_monitor_cmd("info cpus") +logging.debug("Output of info cpus:\n%s", output) +cpu_regexp = re.compile("CPU #(\d+)") +total_cpus_monitor = len(cpu_regexp.findall(output)) +if total_cpus_monitor != total_cpus: +raise error.TestFail("Monitor reports %s CPUs, when VM should have %s" % + (total_cpus_monitor, total_cpus)) + +dmesg_after = session.get_command_output("dmesg -c") +logging.debug("Guest dmesg output after CPU add:\n%s" % dmesg_after) + +# Verify whether the new cpus are showing up on /sys +n_cmd = 'find /sys/devices/system/cpu/cpu[0-99] -maxdepth 0 -type d | wc -l' +output = session.get_command_output(n_cmd) +try: +cpus_after_addition = int(output) +except ValueError: +logging.error("Verify command output: %s", output) +raise error.TestFail("Unable to get CPU count after CPU addition") + +if cpus_after_addition != total_cpus: +raise error.TestFail("%s CPUs are showing up under " + "/sys/devices/system/cpu, was expecting %s" % + (cpus_after_addition, total_cpus)) + +r_cmd = 'find /sys/devices/system/cpu/cpu[0-99]/online -maxdepth 0 -type f' +online_files = session.get_command_output(r_cmd).split().sort() + +if not online_files: +raise error.TestFail("Could not find CPUs that can be " + "enabled/disabled on guest") + +for online_file in online_files: +cpu_regexp = re.compile("cpu(\d+)", re.IGNORECASE) +cpu_id = cpu_regexp.findall(online_file)[0] +try: +check_online_status = int(session.get_command_output("cat %s" % + online_file)) +except ValueError: +raise error.TestFail("Unable to get online status from CPU %s" % + cpu_id) +if check_online_status == 0: +logging.debug("CPU %s offline, bringing it online" % cpu_id) +bring_online_status = session.get_command_status("echo 1 > %s" % + online_file) +if bring_online_status != 0: +raise error.TestError("Error bringing guest CPU %s online" % + cpu_id) + +# Now that all CPUs were onlined, let's execute the +# autotest CPU Hotplug test +control_path = os.path.join(test.bindir, "autotest_control", +"cpu_hotplug.control") +kvm_test_utils.run_autotest(vm, session, control_path,
Re: does qmp supports usb_add?
On Mon, Apr 19, 2010 at 08:55:37PM +0800, chunhui zhao wrote: > Thanks Dan. > However, as I checked this command "device_add", I did not found QMP > support it. > I use the command: > {"QMP": {"query-command"}} > In the return list, I did not found the "device_add". The list is as below: > - > {"return": [{"name": "quit"}, {"name": "eject"}, {"name": "change"}, > {"name": "stop"}, {"name": "cont"}, {"name": "system_reset"}, {"name": > "system_powerdown"}, {"name": "memsave"}, {"name": "pmemsave"}, > {"name": "migrate"}, {"name": "migrate_cancel"}, {"name": "pci_add"}, > {"name": "pci_del"}, {"name": "balloon"}, {"name": "getfd"}, {"name": > "closefd"}, {"name": "block_passwd"}, {"name": "query-version"}, > {"name": "query-commands"}, {"name": "query-chardev"}, {"name": > "query-block"}, {"name": "query-blockstats"}, {"name": "query-cpus"}, > {"name": "query-hpet"}, {"name": "query-kvm"}, {"name": > "query-status"}, {"name": "query-mice"}, {"name": "query-vnc"}, > {"name": "query-name"}, {"name": "query-uuid"}, {"name": > "query-migrate"}, {"name": "query-balloon"}]} > - > > And on the page : http://www.linux-kvm.org/page/MonitorProtocol. I > notice that : > - > TODO > High Priority > do_device_add()/do_device_del() conversions (markus) > do_netdev_add()/do_netdev_del() conversions (markus) > do_blockdev_add()/do_blockdev_del() conversions (markus) > Events Grouping (luiz) > Make qmp-shell work again (luiz) > Self-description & High-level protocol documentation > High-level internal documentation > Better QObjects and QMP debug support > - > > Cause the do_device_add is in the to_do list, does this mean it is not > finished yet? > > BTW: my qemu version is 0.12.3 which I believe is the latest version. It is not practical to use QMP with any 0.12.x version of QEMU. The 0.13.x release series will be the first where it is complete enough to use. Current GIT code is getting reasonably complete - waiting for netdev_add/del to be merged, and then blockdev_add/del will complete the functionality for device hotplug use cases. Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: does qmp supports usb_add?
Thanks Dan. However, as I checked this command "device_add", I did not found QMP support it. I use the command: {"QMP": {"query-command"}} In the return list, I did not found the "device_add". The list is as below: - {"return": [{"name": "quit"}, {"name": "eject"}, {"name": "change"}, {"name": "stop"}, {"name": "cont"}, {"name": "system_reset"}, {"name": "system_powerdown"}, {"name": "memsave"}, {"name": "pmemsave"}, {"name": "migrate"}, {"name": "migrate_cancel"}, {"name": "pci_add"}, {"name": "pci_del"}, {"name": "balloon"}, {"name": "getfd"}, {"name": "closefd"}, {"name": "block_passwd"}, {"name": "query-version"}, {"name": "query-commands"}, {"name": "query-chardev"}, {"name": "query-block"}, {"name": "query-blockstats"}, {"name": "query-cpus"}, {"name": "query-hpet"}, {"name": "query-kvm"}, {"name": "query-status"}, {"name": "query-mice"}, {"name": "query-vnc"}, {"name": "query-name"}, {"name": "query-uuid"}, {"name": "query-migrate"}, {"name": "query-balloon"}]} - And on the page : http://www.linux-kvm.org/page/MonitorProtocol. I notice that : - TODO High Priority do_device_add()/do_device_del() conversions (markus) do_netdev_add()/do_netdev_del() conversions (markus) do_blockdev_add()/do_blockdev_del() conversions (markus) Events Grouping (luiz) Make qmp-shell work again (luiz) Self-description & High-level protocol documentation High-level internal documentation Better QObjects and QMP debug support - Cause the do_device_add is in the to_do list, does this mean it is not finished yet? BTW: my qemu version is 0.12.3 which I believe is the latest version. My linux kernel is 2.6.33 Good Luck! On Fri, Apr 16, 2010 at 9:28 PM, Daniel P. Berrange wrote: > On Fri, Apr 16, 2010 at 09:18:12PM +0800, chunhui zhao wrote: >> Hi guys. >> >> As the kvm does not support auto connection of USB device, unless >> someone inputs the command "usb_add" in the monitor. I wonder if >> there is a better way, say, like some linux, auto mount the USB device >> without any command input. And I know there is also qmp(qemu monitor >> protocol) to use to communicate with the monitor. So I wanna write a >> program which will capture the hotplug signal and communicate with the >> monitor to auto mount the USB device. >> >> The problem is, the qmp does not support the command "usb_add". >> (http://www.linux-kvm.org/page/MonitorProtocol) >> Is there any method to solve this? > > usb_add/del & pci_add/del are all deprecated and not going to be ported > to QMP. Instead use the device_add command eg in plain monitor format: > > device_add usb-host,hostbus=BUS,hostaddr=ADDR,vendorid=VID,productid=PRID > > Or QMP format > > { "execute": "device_add", "arguments": { "driver": "usb-host", > "hostbus": "BUS", > "hostaddr": "ADDR", > "vendorid": "VID", > "productid": "PRID" }} > > This accepts identical param names as the -device command line arg which is > documented in docs/qdev-device-use.txt > > Regards, > Daniel > -- > |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| > |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| > |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Timedrift in KVM guests after livemigration.
On 04/19/2010 12:29 PM, Gleb Natapov wrote: On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote: Den 18.04.2010 11:56, skrev Gleb Natapov: That's two different things here: The issue that Espen is reporting is that the hosts have different frequency and guests that relay on the tsc as a source clock will notice that post migration. The is indeed a problem that -tdf does not solve. -tdf only adds compensation for the RTC clock emulation. It's -rtc-td-hack. -tdf does pit compensation, but since usually kernel pit is used it does nothing. So this "hack" will not solve our problem? As I also stated, in the past the kvmclock MSRs were not sync upon live migration and it was fixed in 1a03675db146dfc760b3b48b3448075189f142cc , better check with the code. If your guest uses RTC for time keeping it may help. Otherwise it does nothing. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 02:19 PM, Peter Zijlstra wrote: Still have two cmpxchgs in the common case. The first iteration will fail, fetching last_value, the second will work. It will be better when we have contention, though, so it's worthwhile. Right, another option is to put the initial read outside of the loop, that way you'll have the best of all cases, a single LOCK'ed op in the loop, and only a single LOCK'ed op for the fast path on sensible architectures ;-) last = atomic64_read(&last_value); do { if (ret< last) return last; last = atomic64_cmpxchg(&last_value, last, ret); } while (unlikely(last != ret)); Or so. Yes, looks optimal when !NONSTOP_TSC. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:59 PM, Peter Zijlstra wrote: So what do we need? test for both TSC_RELIABLE and NONSTOP_TSC? IMO TSC_RELIABLE should imply NONSTOP_TSC. Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP&& CONSTANT does not make RELIABLE. The manual says: 16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.8007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource. and this maps to NONSTOP, so I think we're fine. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 14:13 +0300, Avi Kivity wrote: > On 04/19/2010 01:56 PM, Peter Zijlstra wrote: > > > > > >>> Right, do bear in mind that the x86 implementation of atomic64_read() is > >>> terrifyingly expensive, it is better to not do that read and simply use > >>> the result of the cmpxchg. > >>> > >>> > >>> > >> atomic64_read() _is_ cmpxchg64b. Are you thinking of some clever > >> implementation for smp i386? > >> > > > > No, what I was suggesting was to rewrite that loop no to need the > > initial read but use the cmpxchg result of the previous iteration. > > > > Something like: > > > >u64 last = 0; > > > >/* more stuff */ > > > >do { > > if (ret< last) > >return last; > > last = cmpxchg64(&last_value, last, ret); > >} while (last != ret); > > > > That only has a single cmpxchg8 in there per loop instead of two > > (avoiding the atomic64_read() one). > > > > Still have two cmpxchgs in the common case. The first iteration will > fail, fetching last_value, the second will work. > > It will be better when we have contention, though, so it's worthwhile. Right, another option is to put the initial read outside of the loop, that way you'll have the best of all cases, a single LOCK'ed op in the loop, and only a single LOCK'ed op for the fast path on sensible architectures ;-) last = atomic64_read(&last_value); do { if (ret < last) return last; last = atomic64_cmpxchg(&last_value, last, ret); } while (unlikely(last != ret)); Or so. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:56 PM, Peter Zijlstra wrote: Right, do bear in mind that the x86 implementation of atomic64_read() is terrifyingly expensive, it is better to not do that read and simply use the result of the cmpxchg. atomic64_read() _is_ cmpxchg64b. Are you thinking of some clever implementation for smp i386? No, what I was suggesting was to rewrite that loop no to need the initial read but use the cmpxchg result of the previous iteration. Something like: u64 last = 0; /* more stuff */ do { if (ret< last) return last; last = cmpxchg64(&last_value, last, ret); } while (last != ret); That only has a single cmpxchg8 in there per loop instead of two (avoiding the atomic64_read() one). Still have two cmpxchgs in the common case. The first iteration will fail, fetching last_value, the second will work. It will be better when we have contention, though, so it's worthwhile. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 02:05 PM, Peter Zijlstra wrote: ACCESS_ONCE() is your friend. I think it's implied with atomic64_read(). Yes it would be. I was merely trying to point out that last = ACCESS_ONCE(last_value); Is a narrower way of writing: last = last_value; barrier(); In that it need not clobber all memory locations and makes it instantly clear what we want the barrier for. Oh yes, just trying to avoid a patch with both atomic64_read() and ACCESS_ONCE(). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM Test: Add KVM unit test (kvmctl) v2
From: sshang The test use kvm test harness kvmctl load binary test case file to test various functions of the kvm kernel module. This test is for unit testing on older KVM branches, after some consideration we decided to keep the modules for upstream and branches that use kvmctl separated. Changes from v1: * Fixed some bugs * Renamed the test file * Made code shorter Signed-off-by: Shuxi Shang --- client/tests/kvm/tests/unit_test_kvmctl.py | 28 client/tests/kvm/tests_base.cfg.sample | 32 2 files changed, 60 insertions(+), 0 deletions(-) create mode 100644 client/tests/kvm/tests/unit_test_kvmctl.py diff --git a/client/tests/kvm/tests/unit_test_kvmctl.py b/client/tests/kvm/tests/unit_test_kvmctl.py new file mode 100644 index 000..a5b88b6 --- /dev/null +++ b/client/tests/kvm/tests/unit_test_kvmctl.py @@ -0,0 +1,28 @@ +import os +from autotest_lib.client.bin import utils +from autotest_lib.client.common_lib import error + + +def run_unit_test_kvmctl(test, params, env): +""" +This is kvm userspace unit test, use kvm test harness kvmctl load binary +test case file to test various functions of the kvm kernel module. +The output of all unit tests can be found in the test result dir. + +@param test: KVM test object. +@param params: Dictionary with the test parameters. +@param env: Dictionary with test environment. +""" +case = params.get("case") +srcdir = params.get("srcdir", test.srcdir) +unit_dir = os.path.join(srcdir, "kvm_userspace", "kvm", "user") +os.chdir(unit_dir) + +cmd = "./kvmctl test/x86/bootstrap test/x86/%s.flat" % case +try: +results = utils.system_output(cmd) +except error.CmdError, e: +raise error.TestFail("Unit test %s failed" % case) + +result_file = os.path.join(test.resultsdir, case) +utils.open_write_close(result_file, results) diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample index e73ba44..9f82ffb 100644 --- a/client/tests/kvm/tests_base.cfg.sample +++ b/client/tests/kvm/tests_base.cfg.sample @@ -306,6 +306,38 @@ variants: - ksm_parallel: ksm_mode = "parallel" +# This unit test module is for older branches of KVM that use the +# kvmctl test harness (such as the code shipped with RHEL 5.x) +- unit_test_kvmctl: +type = unit_test +vms = '' +profilers = '' +variants: +- access: +case = access +- apic: +case = apic +- emulator: +case = emulator +- hypercall: +case = hypercall +- msr: +case = msr +- port80: +case = port80 +- realmode: +case = realmode +- sieve: +case = sieve +- smptest: +case = smptest +- tsc: +case = tsc +- stringio: +case = stringio +- vmexit: +case = vmexit + - qemu_img: type = qemu_img vms = '' -- 1.6.6.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 13:50 +0300, Avi Kivity wrote: > On 04/19/2010 01:39 PM, Peter Zijlstra wrote: > > On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote: > > > >>> + do { > >>> + last = last_value; > >>> > >>> > >> Does this need a barrier() to prevent the compiler from re-reading > >> last_value for the subsequent lines? Otherwise "(ret< last)" and > >> "return last" could execute with different values for "last". > > > ACCESS_ONCE() is your friend. > > > > I think it's implied with atomic64_read(). Yes it would be. I was merely trying to point out that last = ACCESS_ONCE(last_value); Is a narrower way of writing: last = last_value; barrier(); In that it need not clobber all memory locations and makes it instantly clear what we want the barrier for. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 13:53 +0300, Avi Kivity wrote: > On 04/19/2010 01:49 PM, Peter Zijlstra wrote: > > > >> Right, so on x86 we have: > >> > >> X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency > >> independent, not that it doesn't stop in C states and similar fun stuff. > >> > >> X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant > >> and synced between cores. > >> > > Fun, we also have: > > > > X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C > > states. > > > > All of them? I though tsc stops in some mwait deep REM sleep thing. The idea is that TSC will not stop ever (except s2r like stuff), not sure about the actual implementation of the NONSTOP_TSC bit though. > So what do we need? test for both TSC_RELIABLE and NONSTOP_TSC? IMO > TSC_RELIABLE should imply NONSTOP_TSC. Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP && CONSTANT does not make RELIABLE. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 13:47 +0300, Avi Kivity wrote: > On 04/19/2010 01:43 PM, Peter Zijlstra wrote: > > > >>> > + > cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) > { > struct pvclock_shadow_time shadow; > unsigned version; > cycle_t ret, offset; > +u64 last; > > > +do { > +last = last_value; > > >>> Otherwise, this assignment can see a partial update. > >>> > >> On a 32-bit guest, that is. > >> > > Right, do bear in mind that the x86 implementation of atomic64_read() is > > terrifyingly expensive, it is better to not do that read and simply use > > the result of the cmpxchg. > > > > > > atomic64_read() _is_ cmpxchg64b. Are you thinking of some clever > implementation for smp i386? No, what I was suggesting was to rewrite that loop no to need the initial read but use the cmpxchg result of the previous iteration. Something like: u64 last = 0; /* more stuff */ do { if (ret < last) return last; last = cmpxchg64(&last_value, last, ret); } while (last != ret); That only has a single cmpxchg8 in there per loop instead of two (avoiding the atomic64_read() one). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Make Performance Counters work
On 19.04.2010, at 12:34, Avi Kivity wrote: > On 04/17/2010 01:22 AM, Alexander Graf wrote: >> When we get a performance counter interrupt we need to route it on to the >> Linux handler after we got out of the guest context. We also need to tell >> our handling code that this particular interrupt doesn't need treatment. >> >> So let's add those two bits in, making perf work while having a KVM guest >> running. >> >> > > Doesn't apply - does it depend on another patchset? This patch is deprecated by the "Post-PPC32" patchset. I only noted that in the cover letter of the Post-PPC32 patchset, not in a reply to this patch. Sorry. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:51 PM, Peter Zijlstra wrote: Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and synced between cores. Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)? Not sure, IIRC we clear that when the TSC sync test fails, eg when we mark the tsc clocksource unusable. Worrying. By the time we detect this the guest may already have gotten confused by clocks going backwards. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:49 PM, Peter Zijlstra wrote: Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and synced between cores. Fun, we also have: X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C states. All of them? I though tsc stops in some mwait deep REM sleep thing. So what do we need? test for both TSC_RELIABLE and NONSTOP_TSC? IMO TSC_RELIABLE should imply NONSTOP_TSC. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 13:49 +0300, Avi Kivity wrote: > On 04/19/2010 01:46 PM, Peter Zijlstra wrote: > > On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote: > > > >>> After this patch is applied, I don't see a single warp in time during 5 > >>> days > >>> of execution, in any of the machines I saw them before. > >>> > >>> > >>> > >> Please define a cpuid bit that makes this optional. When we eventually > >> enable it in the future, it will allow a wider range of guests to enjoy it. > >> > > Right, so on x86 we have: > > > > X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency > > independent, not that it doesn't stop in C states and similar fun stuff. > > > > X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant > > and synced between cores. > > > > > > Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)? Not sure, IIRC we clear that when the TSC sync test fails, eg when we mark the tsc clocksource unusable. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:39 PM, Peter Zijlstra wrote: On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote: + do { + last = last_value; Does this need a barrier() to prevent the compiler from re-reading last_value for the subsequent lines? Otherwise "(ret< last)" and "return last" could execute with different values for "last". ACCESS_ONCE() is your friend. I think it's implied with atomic64_read(). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Mon, 2010-04-19 at 12:46 +0200, Peter Zijlstra wrote: > On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote: > > > After this patch is applied, I don't see a single warp in time during 5 > > > days > > > of execution, in any of the machines I saw them before. > > > > > > > > > > Please define a cpuid bit that makes this optional. When we eventually > > enable it in the future, it will allow a wider range of guests to enjoy it. > > Right, so on x86 we have: > > X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency > independent, not that it doesn't stop in C states and similar fun stuff. > > X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant > and synced between cores. Fun, we also have: X86_FEATURE_NONSTOP_TSC, which states the thing doesn't stop in C states. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:46 PM, Peter Zijlstra wrote: On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote: After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Please define a cpuid bit that makes this optional. When we eventually enable it in the future, it will allow a wider range of guests to enjoy it. Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and synced between cores. Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On 04/19/2010 01:43 PM, Peter Zijlstra wrote: + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { struct pvclock_shadow_time shadow; unsigned version; cycle_t ret, offset; +u64 last; +do { +last = last_value; Otherwise, this assignment can see a partial update. On a 32-bit guest, that is. Right, do bear in mind that the x86 implementation of atomic64_read() is terrifyingly expensive, it is better to not do that read and simply use the result of the cmpxchg. atomic64_read() _is_ cmpxchg64b. Are you thinking of some clever implementation for smp i386? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote: > > After this patch is applied, I don't see a single warp in time during 5 days > > of execution, in any of the machines I saw them before. > > > > > > Please define a cpuid bit that makes this optional. When we eventually > enable it in the future, it will allow a wider range of guests to enjoy it. Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and synced between cores. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Sat, 2010-04-17 at 21:49 +0300, Avi Kivity wrote: > On 04/17/2010 09:48 PM, Avi Kivity wrote: > > > >> > >> +static u64 last_value = 0; > > > > Needs to be atomic64_t. > > > >> + > >> cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) > >> { > >> struct pvclock_shadow_time shadow; > >> unsigned version; > >> cycle_t ret, offset; > >> +u64 last; > >> > >> > >> +do { > >> +last = last_value; > > > > Otherwise, this assignment can see a partial update. > > On a 32-bit guest, that is. Right, do bear in mind that the x86 implementation of atomic64_read() is terrifyingly expensive, it is better to not do that read and simply use the result of the cmpxchg. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] Add a global synchronization point for pvclock
On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote: > > + do { > > + last = last_value; > > > Does this need a barrier() to prevent the compiler from re-reading > last_value for the subsequent lines? Otherwise "(ret < last)" and > "return last" could execute with different values for "last". > ACCESS_ONCE() is your friend. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Make Performance Counters work
On 04/17/2010 01:22 AM, Alexander Graf wrote: When we get a performance counter interrupt we need to route it on to the Linux handler after we got out of the guest context. We also need to tell our handling code that this particular interrupt doesn't need treatment. So let's add those two bits in, making perf work while having a KVM guest running. Doesn't apply - does it depend on another patchset? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.
On Mon, Apr 19, 2010 at 06:05:17PM +0800, Xin, Xiaohui wrote: > > Michael, > > >>> The idea is simple, just to pin the guest VM user space and then > > >>> let host NIC driver has the chance to directly DMA to it. > > >>> The patches are based on vhost-net backend driver. We add a device > > >>> which provides proto_ops as sendmsg/recvmsg to vhost-net to > > >>> send/recv directly to/from the NIC driver. KVM guest who use the > > >>> vhost-net backend may bind any ethX interface in the host side to > > >>> get copyless data transfer thru guest virtio-net frontend. > > >>> > > >>> The scenario is like this: > > >>> > > >>> The guest virtio-net driver submits multiple requests thru vhost-net > > >>> backend driver to the kernel. And the requests are queued and then > > >>> completed after corresponding actions in h/w are done. > > >>> > > >>> For read, user space buffers are dispensed to NIC driver for rx when > > >>> a page constructor API is invoked. Means NICs can allocate user buffers > > >>> from a page constructor. We add a hook in netif_receive_skb() function > > >>> to intercept the incoming packets, and notify the zero-copy device. > > >>> > > >>> For write, the zero-copy deivce may allocates a new host skb and puts > > >>> payload on the skb_shinfo(skb)->frags, and copied the header to > > >>> skb->data. > > >>> The request remains pending until the skb is transmitted by h/w. > > >>> > > >>> Here, we have ever considered 2 ways to utilize the page constructor > > >>> API to dispense the user buffers. > > >>> > > >>> One:Modify __alloc_skb() function a bit, it can only allocate a > > >>> structure of sk_buff, and the data pointer is pointing to a > > >>> user buffer which is coming from a page constructor API. > > >>> Then the shinfo of the skb is also from guest. > > >>> When packet is received from hardware, the skb->data is filled > > >>> directly by h/w. What we have done is in this way. > > >>> > > >>> Pros: We can avoid any copy here. > > >>> Cons: Guest virtio-net driver needs to allocate skb as almost > > >>> the same method with the host NIC drivers, say the size > > >>> of netdev_alloc_skb() and the same reserved space in the > > >>> head of skb. Many NIC drivers are the same with guest > > >>> and > > >>> ok for this. But some lastest NIC drivers reserves > > >>> special > > >>> room in skb head. To deal with it, we suggest to provide > > >>> a method in guest virtio-net driver to ask for parameter > > >>> we interest from the NIC driver when we know which > > >>> device > > >>> we have bind to do zero-copy. Then we ask guest to do > > >>> so. > > >>> Is that reasonable? > > >>Unfortunately, this would break compatibility with existing virtio. > > >>This also complicates migration. > >> You mean any modification to the guest virtio-net driver will break the > >> compatibility? We tried to enlarge the virtio_net_config to contains the > >> 2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe() > >> will check the feature flag, and get the parameters, then virtio-net > >> driver use > >> it to allocate buffers. How about this? > > >This means that we can't, for example, live-migrate between different systems > >without flushing outstanding buffers. > > Ok. What we have thought about now is to do something with skb_reserve(). > If the device is binded by mp, then skb_reserve() will do nothing with it. > > > >>What is the room in skb head used for? > > >I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes > > >compared to > >> NET_IP_ALIGN. > > >Looking at code, this seems to do with alignment - could just be > >a performance optimization. > > > >>> Two:Modify driver to get user buffer allocated from a page > > >>> constructor > > >>> API(to substitute alloc_page()), the user buffer are used as > > >>> payload > > >>> buffers and filled by h/w directly when packet is received. > > >>> Driver > > >>> should associate the pages with skb (skb_shinfo(skb)->frags). > > >>> For > > >>> the head buffer side, let host allocates skb, and h/w fills it. > > >>> After that, the data filled in host skb header will be copied > > >>> into > > >>> guest header buffer which is submitted together with the > > >>> payload buffer. > > >>> > > >>> Pros: We could less care the way how guest or host allocates > > >>> their > > >>> buffers. > > >>> Cons: We still need a bit copy here for the skb header. > > >>> > > >>> We are not sure which way is the better here. > > >>The obvious question would be whether you see any speed difference > > >>with the two approaches. If no, then the second approach would be > > >>better. > > > >> I remember the second a
Re: [PATCH] KVM-Test: Add KVM unit test (kvmctl)
Lucas Meneghel Rodrigues wrote: > On Mon, 2010-04-19 at 10:50 +0200, Jan Kiszka wrote: >> Lucas Meneghel Rodrigues wrote: >>> From: sshang >>> >>> The test use kvm test harness kvmctl load binary test case file >>> to test various functions of the kvm kernel module. >> I thought that tool is deprecated (or even broken), and I was about to >> suggest dropping it from the tree in favor of a simple script like this: >> >> $QEMU -chardev stdio,id=cons -device testdev,chardev=cons -vga none \ >> -vnc none -kernel $TESTCASE > > The purpose of the this specific test module is to do the unit testing > in older branches of the code (such as RHEL 5.x branch). For upstream we > have another module that will use the -kernel option being worked. I > plan on calling it unit_test_kernel. Ah, see. Thanks for the clarification. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM-Test: Add KVM unit test (kvmctl)
On Mon, 2010-04-19 at 10:50 +0200, Jan Kiszka wrote: > Lucas Meneghel Rodrigues wrote: > > From: sshang > > > > The test use kvm test harness kvmctl load binary test case file > > to test various functions of the kvm kernel module. > > I thought that tool is deprecated (or even broken), and I was about to > suggest dropping it from the tree in favor of a simple script like this: > > $QEMU -chardev stdio,id=cons -device testdev,chardev=cons -vga none \ > -vnc none -kernel $TESTCASE The purpose of the this specific test module is to do the unit testing in older branches of the code (such as RHEL 5.x branch). For upstream we have another module that will use the -kernel option being worked. I plan on calling it unit_test_kernel. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held
On 04/19/2010 12:58 PM, Lai Jiangshan wrote: Applied the patch I just sent and let CONFIG_PROVE_RCU=y, we can got the following dmesg. And we found that it is because some codes in KVM dereferences srcu-protected pointer without srcu_read_lock() held or update-side lock held. It is not hard to fix, the problem is that: Where is the most proper place to put a srcu_read_lock()? I can not determine the answer, so I report this bug instead of fixing it. I think the else branch in complete_pio() should work. Marcelo? Longer term I'd like to see the lock taken at the high levels (ioctls, in virt/kvm) and dropped only for guest entry and when we explicitly sleep (hlt emulation). Note: complete_pio() is gone in the current code. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC][PATCH v2 0/3] Provide a zero-copy method on KVM virtio-net.
> Michael, > >>> The idea is simple, just to pin the guest VM user space and then > >>> let host NIC driver has the chance to directly DMA to it. > >>> The patches are based on vhost-net backend driver. We add a device > >>> which provides proto_ops as sendmsg/recvmsg to vhost-net to > >>> send/recv directly to/from the NIC driver. KVM guest who use the > >>> vhost-net backend may bind any ethX interface in the host side to > >>> get copyless data transfer thru guest virtio-net frontend. > >>> > >>> The scenario is like this: > >>> > >>> The guest virtio-net driver submits multiple requests thru vhost-net > >>> backend driver to the kernel. And the requests are queued and then > >>> completed after corresponding actions in h/w are done. > >>> > >>> For read, user space buffers are dispensed to NIC driver for rx when > >>> a page constructor API is invoked. Means NICs can allocate user buffers > >>> from a page constructor. We add a hook in netif_receive_skb() function > >>> to intercept the incoming packets, and notify the zero-copy device. > >>> > >>> For write, the zero-copy deivce may allocates a new host skb and puts > >>> payload on the skb_shinfo(skb)->frags, and copied the header to skb->data. > >>> The request remains pending until the skb is transmitted by h/w. > >>> > >>> Here, we have ever considered 2 ways to utilize the page constructor > >>> API to dispense the user buffers. > >>> > >>> One: Modify __alloc_skb() function a bit, it can only allocate a > >>> structure of sk_buff, and the data pointer is pointing to a > >>> user buffer which is coming from a page constructor API. > >>> Then the shinfo of the skb is also from guest. > >>> When packet is received from hardware, the skb->data is filled > >>> directly by h/w. What we have done is in this way. > >>> > >>> Pros: We can avoid any copy here. > >>> Cons: Guest virtio-net driver needs to allocate skb as almost > >>> the same method with the host NIC drivers, say the size > >>> of netdev_alloc_skb() and the same reserved space in the > >>> head of skb. Many NIC drivers are the same with guest and > >>> ok for this. But some lastest NIC drivers reserves special > >>> room in skb head. To deal with it, we suggest to provide > >>> a method in guest virtio-net driver to ask for parameter > >>> we interest from the NIC driver when we know which device > >>> we have bind to do zero-copy. Then we ask guest to do so. > >>> Is that reasonable? > >>Unfortunately, this would break compatibility with existing virtio. > >>This also complicates migration. >> You mean any modification to the guest virtio-net driver will break the >> compatibility? We tried to enlarge the virtio_net_config to contains the >> 2 parameter, and add one VIRTIO_NET_F_PASSTHRU flag, virtionet_probe() >> will check the feature flag, and get the parameters, then virtio-net driver >> use >> it to allocate buffers. How about this? >This means that we can't, for example, live-migrate between different systems >without flushing outstanding buffers. Ok. What we have thought about now is to do something with skb_reserve(). If the device is binded by mp, then skb_reserve() will do nothing with it. > >>What is the room in skb head used for? > >I'm not sure, but the latest ixgbe driver does this, it reserves 32 bytes > >compared to >> NET_IP_ALIGN. >Looking at code, this seems to do with alignment - could just be >a performance optimization. > >>> Two: Modify driver to get user buffer allocated from a page > >>> constructor > >>> API(to substitute alloc_page()), the user buffer are used as payload > >>> buffers and filled by h/w directly when packet is received. Driver > >>> should associate the pages with skb (skb_shinfo(skb)->frags). For > >>> the head buffer side, let host allocates skb, and h/w fills it. > >>> After that, the data filled in host skb header will be copied into > >>> guest header buffer which is submitted together with the payload buffer. > >>> > >>> Pros: We could less care the way how guest or host allocates their > >>> buffers. > >>> Cons: We still need a bit copy here for the skb header. > >>> > >>> We are not sure which way is the better here. > >>The obvious question would be whether you see any speed difference > >>with the two approaches. If no, then the second approach would be > >>better. > >> I remember the second approach is a bit slower in 1500MTU. >> But we did not tested too much. >Well, that's an important datapoint. By the way, you'll need >header copy to activate LRO in host, so that's a good >reason to go with option 2 as well. > >>> This is the first thing we want > >>> to get comments from the community. We wish the modification to the > >>> network > >>> part will be generic which not used by vhost-net backend only, but a user > >>> application may use it as well when the
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
On 04/19/2010 12:22 PM, Zhang, Yanmin wrote: I don't want to merge tip/master... does tip/perf/core contain the needed updates? I think so. A moment ago, I checked out to b5a80b7e9 of tip/perf/core. All 3 patches could be applied cleanly and compilation is ok. A quick testing shows tip/perf/core kernel does work. Thanks. I applied all three patches to the 'perf' branch in kvm.git and merged it to master. Ingo, please pull git://git.kernel.org/pub/scm/virt/kvm/kvm.git perf into tip's perf/core to receive those three patches. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held
Applied the patch I just sent and let CONFIG_PROVE_RCU=y, we can got the following dmesg. And we found that it is because some codes in KVM dereferences srcu-protected pointer without srcu_read_lock() held or update-side lock held. It is not hard to fix, the problem is that: Where is the most proper place to put a srcu_read_lock()? I can not determine the answer, so I report this bug instead of fixing it. Thanks. Lai. Reported-by: Lai Jiangshan === [ INFO: suspicious rcu_dereference_check() usage. ] --- arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/3027: #0: (&vcpu->mutex){+.+.+.}, at: [] vcpu_load+0x1a/0x66 [kvm] stack backtrace: Pid: 3027, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1-dirty #28 Call Trace: [] lockdep_rcu_dereference+0xaa/0xb3 [] unalias_gfn_instantiation+0x56/0xaf [kvm] [] gfn_to_hva+0x14/0x4c [kvm] [] kvm_write_guest_page+0x2a/0x7f [kvm] [] kvm_write_guest+0x41/0x83 [kvm] [] kvm_write_guest_virt+0x78/0xa1 [kvm] [] pio_copy_data+0x46/0x75 [kvm] [] ? sub_preempt_count+0x9/0x83 [] complete_pio+0x91/0x1b9 [kvm] [] kvm_arch_vcpu_ioctl_run+0x93/0xd2b [kvm] [] ? kvm_arch_vcpu_ioctl_run+0x8e5/0xd2b [kvm] [] ? __lock_acquire+0x7b4/0x16d5 [] kvm_vcpu_ioctl+0x103/0x97b [kvm] [] ? kvm_vm_ioctl+0x364/0x38d [kvm] [] ? fget_light+0xf1/0x241 [] vfs_ioctl+0x32/0xa6 [] do_vfs_ioctl+0x495/0x4db [] ? fget_light+0x231/0x241 [] ? fget_light+0xf1/0x241 [] sys_ioctl+0x47/0x6a [] system_call_fastpath+0x16/0x1b -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: use the correct RCU API
On 04/19/2010 12:41 PM, Lai Jiangshan wrote: The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. === [ INFO: suspicious rcu_dereference_check() usage. ] --- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [] lockdep_rcu_dereference+0xaa/0xb3 [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [] __kvm_set_memory_region+0x636/0x6e2 [kvm] [] kvm_set_memory_region+0x37/0x50 [kvm] [] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [] ? unlock_page+0x27/0x2c [] ? __do_fault+0x3a9/0x3e1 [] kvm_vm_ioctl+0x364/0x38d [kvm] [] ? up_read+0x23/0x3d [] vfs_ioctl+0x32/0xa6 [] do_vfs_ioctl+0x495/0x4db [] ? fget_light+0xc2/0x241 [] ? do_sys_open+0x104/0x116 [] ? retint_swapgs+0xe/0x13 [] sys_ioctl+0x47/0x6a [] system_call_fastpath+0x16/0x1b +static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) +{ + return rcu_dereference_check(kvm->memslots, + srcu_read_lock_held(&kvm->srcu) + || lockdep_is_held(&kvm->slots_lock)); +} + This open-codes srcu_dereference(). I guess we need an srcu_dereference_check(). Paul? btw, perhaps it is possible not to call rcu_dereference from the write paths. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: use the correct RCU API
The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. === [ INFO: suspicious rcu_dereference_check() usage. ] --- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [] lockdep_rcu_dereference+0xaa/0xb3 [] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [] __kvm_set_memory_region+0x636/0x6e2 [kvm] [] kvm_set_memory_region+0x37/0x50 [kvm] [] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [] ? unlock_page+0x27/0x2c [] ? __do_fault+0x3a9/0x3e1 [] kvm_vm_ioctl+0x364/0x38d [kvm] [] ? up_read+0x23/0x3d [] vfs_ioctl+0x32/0xa6 [] do_vfs_ioctl+0x495/0x4db [] ? fget_light+0xc2/0x241 [] ? do_sys_open+0x104/0x116 [] ? retint_swapgs+0xe/0x13 [] sys_ioctl+0x47/0x6a [] system_call_fastpath+0x16/0x1b Signed-off-by: Lai Jiangshan --- diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 73c5c2b..52b8ece 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1379,7 +1379,7 @@ static void kvm_release_vm_pages(struct kvm *kvm) int i, j; unsigned long base_gfn; - slots = rcu_dereference(kvm->memslots); + slots = kvm_memslots(kvm); for (i = 0; i < slots->nmemslots; i++) { memslot = &slots->memslots[i]; base_gfn = memslot->base_gfn; diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 60f09ab..cfa9d17 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -72,7 +72,7 @@ static inline void kvm_s390_vcpu_set_mem(struct kvm_vcpu *vcpu) struct kvm_memslots *memslots; idx = srcu_read_lock(&vcpu->kvm->srcu); - memslots = rcu_dereference(vcpu->kvm->memslots); + memslots = kvm_memslots(vcpu->kvm); mem = &memslots->memslots[0]; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 71faa04..bb61881 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -792,7 +792,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, int retval = 0; struct kvm_memslots *slots; - slots = rcu_dereference(kvm->memslots); + slots = kvm_memslots(kvm); for (i = 0; i < slots->nmemslots; i++) { struct kvm_memory_slot *memslot = &slots->memslots[i]; @@ -3017,7 +3017,8 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm) unsigned int nr_pages = 0; struct kvm_memslots *slots; - slots = rcu_dereference(kvm->memslots); + slots = kvm_memslots(kvm); + for (i = 0; i < slots->nmemslots; i++) nr_pages += slots->memslots[i].npages; @@ -3292,7 +3293,7 @@ static int count_rmaps(struct kvm_vcpu *vcpu) int i, j, k, idx; idx = srcu_read_lock(&kvm->srcu); - slots = rcu_dereference(kvm->memslots); + slots = kvm_memslots(kvm); for (i = 0; i < KVM_MEMORY_SLOTS; ++i) { struct kvm_memory_slot *m = &slots->memslots[i]; struct kvm_rmap_desc *d; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 077cac5..725e7b6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1514,7 +1514,7 @@ static gva_t rmode_tss_base(struct kvm *kvm) struct kvm_memslots *slots; gfn_t base_gfn; - slots = rcu_dereference(kvm->memslots); + slots = kvm_memslots(kvm); base_gfn = kvm->memslots->memslots[0].base_gfn + kvm->memslots->memslots[0].npages - 3; return base_gfn << PAGE_SHIFT; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6120e33..4dcd62c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2408,7 +2408,7 @@ gfn_t unalias_gfn_instantiation(struct kvm *kvm, gfn_t gfn) struct kvm_mem_alias *alias; struct kvm_mem_aliases *aliases; - aliases = rcu_dereference(kvm->arch.aliases); + aliases = kvm_aliases(kvm); for (i = 0; i < aliases->naliases; ++i) { alias = &aliases->aliases[i]; @@ -2427,7 +2427,7 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn) struct kvm_mem_alias *alias; struct kvm_mem_aliases *aliases; - aliases = rcu_dereference(kvm->arch.aliases); + aliases = kvm_aliases(kvm); for (i = 0; i < al
Re: Timedrift in KVM guests after livemigration.
On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote: > Den 18.04.2010 11:56, skrev Gleb Natapov: > > >>That's two different things here: > >>The issue that Espen is reporting is that the hosts have different > >>frequency and guests that relay on the tsc as a source clock will > >>notice that post migration. The is indeed a problem that -tdf does > >>not solve. -tdf only adds compensation for the RTC clock emulation. > >> > >It's -rtc-td-hack. -tdf does pit compensation, but since usually kernel > >pit is used it does nothing. > > So this "hack" will not solve our problem? > If your guest uses RTC for time keeping it may help. Otherwise it does nothing. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] correctly handle VM Entry Exit reasons and also show them in trace.
On 04/18/2010 09:35 AM, Manish Regmi wrote: Hi, When the vm exit reason is VM Entry failures it has leftmost bit set. This patch - clears the leftmost bit when copying to vmx->exit_reason. This will make the checks like if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) valid in vmx_complete_interrupts. - adds two more EXIT_REASONS 33 and 34 in vmx.h - also adds them to exit reason strings. Please let me know if there is anything missing or wrong. Thank you. The patch is wordwrapped. Please use git send-email or equivalent. diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 7e2f8d5..e93be6f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3641,7 +3641,7 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx) exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); - vmx->exit_reason = vmcs_read32(VM_EXIT_REASON); + vmx->exit_reason = vmcs_read32(VM_EXIT_REASON)& ~VMX_EXIT_REASONS_FAILED_VMENTRY; (here for example) /* Handle machine checks before interrupts are enabled */ if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) @@ -4057,6 +4057,8 @@ static const struct trace_print_flags vmx_exit_reasons_str[] = { _ER(IO_INSTRUCTION), _ER(MSR_READ), _ER(MSR_WRITE), + _ER(INVALID_GUEST_STATE), + _ER(MSR_LOADING), _ER(MWAIT_INSTRUCTION), _ER(MONITOR_INSTRUCTION), _ER(PAUSE_INSTRUCTION), Please split the new exit codes into a separate patch. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V5 1/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side
On Mon, 2010-04-19 at 11:59 +0300, Avi Kivity wrote: > On 04/19/2010 11:55 AM, Zhang, Yanmin wrote: > > On Mon, 2010-04-19 at 11:37 +0300, Avi Kivity wrote: > > > >> On 04/19/2010 08:32 AM, Zhang, Yanmin wrote: > >> > >>> Below patch introduces perf_guest_info_callbacks and related > >>> register/unregister > >>> functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and > >>> guest user > >>> space. > >>> > >>> > >> This doesn't apply against upstream. > >> > > What does upstream mean here? The vanilla 2.6.34-rc4? > > > > Yes, sorry for being unclear. > > >>What branch was this generated > >> against? > >> > >> > > It's against the latest tip/master. I checked out to 19b26586090 as the > > latest > > tip/master has some updates on perf. > > > > I don't want to merge tip/master... does tip/perf/core contain the > needed updates? I think so. A moment ago, I checked out to b5a80b7e9 of tip/perf/core. All 3 patches could be applied cleanly and compilation is ok. A quick testing shows tip/perf/core kernel does work. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Timedrift in KVM guests after livemigration.
Den 18.04.2010 11:56, skrev Gleb Natapov: That's two different things here: The issue that Espen is reporting is that the hosts have different frequency and guests that relay on the tsc as a source clock will notice that post migration. The is indeed a problem that -tdf does not solve. -tdf only adds compensation for the RTC clock emulation. It's -rtc-td-hack. -tdf does pit compensation, but since usually kernel pit is used it does nothing. So this "hack" will not solve our problem? Espen -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/1] trace all instructions whose emulation failed
On 04/18/2010 09:33 AM, Manish Regmi wrote: Hi, The following patch makes sure all code path of failed emulation runs trace_kvm_emulate_insn_failed(). Please let me know if there is anything missing or wrong. Thank you. Signed-off-by: Manish Regmi diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b6e7535..fd1e875 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3784,36 +3784,35 @@ int emulate_instruction(struct kvm_vcpu *vcpu, c =&vcpu->arch.emulate_ctxt.decode; if (emulation_type& EMULTYPE_TRAP_UD) { if (!c->twobyte) - return EMULATE_FAIL; + goto emulate_failed; switch (c->b) { case 0x01: /* VMMCALL */ if (c->modrm_mod != 3 || c->modrm_rm != 1) - return EMULATE_FAIL; + goto emulate_failed; break; case 0x34: /* sysenter */ case 0x35: /* sysexit */ if (c->modrm_mod != 0 || c->modrm_rm != 0) - return EMULATE_FAIL; + goto emulate_failed; break; case 0x05: /* syscall */ if (c->modrm_mod != 0 || c->modrm_rm != 0) - return EMULATE_FAIL; + goto emulate_failed;; break; default: - return EMULATE_FAIL; + goto emulate_failed; } if (!(c->modrm_reg == 0 || c->modrm_reg == 3)) - return EMULATE_FAIL; + goto emulate_failed; } ++vcpu->stat.insn_emulation; if (r) { ++vcpu->stat.insn_emulation_fail; - trace_kvm_emulate_insn_failed(vcpu); if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) return EMULATE_DONE; - return EMULATE_FAIL; + goto emulate_failed; } } It's better not to trace #UD triggered emulations, since we except these to fail, for example if the guest executes the UD2 instruction. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html