Re: mmu_notifers, pte_dirty questions
On 06/06/2010 09:36 PM, Andrea Arcangeli wrote: On Sun, Jun 06, 2010 at 03:07:27PM +0300, Avi Kivity wrote: Why no notifer when testing and clearing the dirty bit? (*clear_flush_dirty)(...). static int page_mkclean_one(struct page *page, struct vm_area_struct *vma, unsigned long address) { struct mm_struct *mm = vma->vm_mm; pte_t *pte; spinlock_t *ptl; int ret = 0; pte = page_check_address(page, mm, address,&ptl, 1); if (!pte) goto out; if (pte_dirty(*pte) || pte_write(*pte)) { pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); set_pte_at_notify()? without this (or clear_flush_dirty) Linux will assume all ptes are now clean; if the guest writes to a page nothing will catch it. -> with set_pte_at_notify(), we can drop the spte and mark the page as dirty, so the next write will re-instantiate the spte -> with ->clear_flush_dirty(), we can track the dirty state without dropping the spte. ret = 1; } pte_unmap_unlock(pte, ptl); out: return ret; I'm probably missing something big as I can't see how this works. Under the PT lock it's safe to keep the PTE zero, just the pte must be non zero again before pte_unmap_unlock. The sptes are all gone by the time ptep_clear_flush_notify returns (also gup-fast is stopped with the IPI of the flush) and no spte can be established again before pte_unmap_unlock runs, so it's all safe as far as I can tell. Somehow I missed the ptep_clear_flush_notify()... so all should be fine. set_pte_at_notify might prevent a minor fault though. I'm thinking of how to implement speculative write access for kvm: consider a read fault for a writeable page. We could install a writeable spte with the dirty bit clear, and examine the dirty bit at pte_clear_flush_notify() time and transfer it to the page flags. However I can't see where the mm code checks the pte dirty bit for anonymous pages? Does it assume anonymous pages are always dirty? (they could have a clean copy in swap, no?) -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c
> -Original Message- > From: kvm-ow...@vger.kernel.org > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Friday, June 04, 2010 10:08 PM > To: Kevin Hao > Cc: Marcelo Tosatti; Kumar Gala; Avi Kivity; > kvm@vger.kernel.org list; linuxppc-...@ozlabs.org; Liu Yu-B13201 > Subject: Re: [PATCH] kvm/powerpc: fix a build error in e500_tlb.c > > > On 03.06.2010, at 07:52, Kevin Hao wrote: > > > We use the wrong number arguments when invoking > trace_kvm_stlb_inval, > > and cause the following build error. > > arch/powerpc/kvm/e500_tlb.c: In function > 'kvmppc_e500_stlbe_invalidate': > > arch/powerpc/kvm/e500_tlb.c:230: error: too many arguments > to function 'trace_kvm_stlb_inval' > > Liu, I'd like to get an ack from you here. > Seems commit e43f2f741a49483034bf968841275cfa553a4cb3 has solved this. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Reboot Problem
Hi, Yes, Insipte of adding kernel.panic = 10 in sysctl.conf, the kernel does not reboot when kvm-i9ntel.ko is been inserted. Do you have any idea , because, when the kvm-intel.ko is not inserted, kernel reboots. -Original Message- From: Sterling Windmill [mailto:sterl...@ampx.net] Sent: Monday, June 07, 2010 8:49 AM To: Chaitra Gorantla Cc: kvm@vger.kernel.org Subject: Re: Reboot Problem The Linux kernel does not reboot on panic by default. Read here: http://www.cyberciti.biz/tips/reboot-linux-box-after-a-kernel-panic.html - Original Message - From: "Chaitra Gorantla" To: kvm@vger.kernel.org Sent: Sunday, June 6, 2010 10:59:11 PM Subject: Reboot Problem Hi all, We know that kernel reboots after any panic or crash. I am using 2.6.21 kernel (HOST machine) and KVM-88 package. After inserting kvm-intel.ko , if any kernel crash occurs, the Host halts= . The kernel never reboots. Please suggest regarding this. Thanks in advance, Chaitra This Email may contain confidential or privileged information for the intended recipient (s) If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. __ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html __ This Email may contain confidential or privileged information for the intended recipient (s) If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. __ N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf
Re: Reboot Problem
The Linux kernel does not reboot on panic by default. Read here: http://www.cyberciti.biz/tips/reboot-linux-box-after-a-kernel-panic.html - Original Message - From: "Chaitra Gorantla" To: kvm@vger.kernel.org Sent: Sunday, June 6, 2010 10:59:11 PM Subject: Reboot Problem Hi all, We know that kernel reboots after any panic or crash. I am using 2.6.21 kernel (HOST machine) and KVM-88 package. After inserting kvm-intel.ko , if any kernel crash occurs, the Host halts= . The kernel never reboots. Please suggest regarding this. Thanks in advance, Chaitra This Email may contain confidential or privileged information for the intended recipient (s) If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. __ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Reboot Problem
Hi all, We know that kernel reboots after any panic or crash. I am using 2.6.21 kernel (HOST machine) and KVM-88 package. After inserting kvm-intel.ko , if any kernel crash occurs, the Host halts= . The kernel never reboots. Please suggest regarding this. Thanks in advance, Chaitra This Email may contain confidential or privileged information for the intended recipient (s) If you are not the intended recipient, please do not use or disseminate the information, notify the sender and delete it from your system. __ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] KVM VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single()
The name "pid_sync_vcpu_all" isn't appropriate since it just affect a single vpid, so rename it to vpid_sync_vcpu_single(). Signed-off-by: Gui Jianfeng --- arch/x86/kvm/vmx.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aaf8a28..4d893ac 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -504,7 +504,7 @@ static void vcpu_clear(struct vcpu_vmx *vmx) smp_call_function_single(vmx->vcpu.cpu, __vcpu_clear, vmx, 1); } -static inline void vpid_sync_vcpu_all(struct vcpu_vmx *vmx) +static inline void vpid_sync_vcpu_single(struct vcpu_vmx *vmx) { if (vmx->vpid == 0) return; @@ -522,7 +522,7 @@ static inline void vpid_sync_vcpu_global(void) static inline void vpid_sync_context(struct vcpu_vmx *vmx) { if (cpu_has_vmx_invvpid_single()) - vpid_sync_vcpu_all(vmx); + vpid_sync_vcpu_single(vmx); else vpid_sync_vcpu_global(); } -- 1.6.5.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] KVM VMX: Add all-context INVVPID type support
Add all-context INVVPID type support. Signed-off-by: Gui Jianfeng --- arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/vmx.c | 23 +-- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index b4e2840..96a5886 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -377,6 +377,7 @@ enum vmcs_field { #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26) #define VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT (1ull << 9) /* (41 - 32) */ +#define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT (1ull << 10) /* (42 - 32) */ #define VMX_EPT_DEFAULT_GAW3 #define VMX_EPT_MAX_GAW0x4 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 240a407..aaf8a28 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -360,6 +360,11 @@ static inline bool cpu_has_vmx_invvpid_single(void) return vmx_capability.vpid & VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT; } +static inline bool cpu_has_vmx_invvpid_global(void) +{ + return vmx_capability.vpid & VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT; +} + static inline bool cpu_has_vmx_ept(void) { return vmcs_config.cpu_based_2nd_exec_ctrl & @@ -508,6 +513,20 @@ static inline void vpid_sync_vcpu_all(struct vcpu_vmx *vmx) __invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx->vpid, 0); } +static inline void vpid_sync_vcpu_global(void) +{ + if (cpu_has_vmx_invvpid_global()) + __invvpid(VMX_VPID_EXTENT_ALL_CONTEXT, 0, 0); +} + +static inline void vpid_sync_context(struct vcpu_vmx *vmx) +{ + if (cpu_has_vmx_invvpid_single()) + vpid_sync_vcpu_all(vmx); + else + vpid_sync_vcpu_global(); +} + static inline void ept_sync_global(void) { if (cpu_has_vmx_invept_global()) @@ -1797,7 +1816,7 @@ static void exit_lmode(struct kvm_vcpu *vcpu) static void vmx_flush_tlb(struct kvm_vcpu *vcpu) { - vpid_sync_vcpu_all(to_vmx(vcpu)); + vpid_sync_context(to_vmx(vcpu)); if (enable_ept) ept_sync_context(construct_eptp(vcpu->arch.mmu.root_hpa)); } @@ -2753,7 +2772,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmx_fpu_activate(&vmx->vcpu); update_exception_bitmap(&vmx->vcpu); - vpid_sync_vcpu_all(vmx); + vpid_sync_context(vmx); ret = 0; -- 1.6.5.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qemu-kvm] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc
On Sun, Jun 06, 2010 at 04:22:45PM +0300, Avi Kivity wrote: > On 06/06/2010 04:13 PM, Ken CC wrote: > >To make the directory architecture match the file config file > >kvm/test/config-powerpc.mk. > > > > Can you please resent with 'diff.renames' set in your git > configuration? That will make the patch reviewable, as it will > convert it to a pattern of moves instead of creates/deletes. > Hi, Avi, I have resent the patch, please check. > Also, isn't a Makefile change needed? > the content of config-powerpc.mak shares the same directory structure with that of config-x86-common.mak, so i think it is not needed to change Makefile while moving "powerpc" out of the "test" dir. Regards, Ken CC -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH QEMU-KVM v2 diff.renames] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc
To make the directory architecture match the make config file kvm/test/config-powerpc.mk. Signed-off-by: Ken CC --- kvm/test/{ => }/test/powerpc/44x/tlbsx.S |0 kvm/test/{ => }/test/powerpc/44x/tlbwe.S |0 kvm/test/{ => }/test/powerpc/44x/tlbwe_16KB.S |0 kvm/test/{ => }/test/powerpc/44x/tlbwe_hole.S |0 kvm/test/{ => }/test/powerpc/cstart.S |0 kvm/test/{ => }/test/powerpc/exit.c |0 kvm/test/{ => }/test/powerpc/helloworld.c |0 kvm/test/{ => }/test/powerpc/io.S |0 kvm/test/{ => }/test/powerpc/spin.S |0 kvm/test/{ => }/test/powerpc/sprg.S |0 10 files changed, 0 insertions(+), 0 deletions(-) rename kvm/test/{ => }/test/powerpc/44x/tlbsx.S (100%) rename kvm/test/{ => }/test/powerpc/44x/tlbwe.S (100%) rename kvm/test/{ => }/test/powerpc/44x/tlbwe_16KB.S (100%) rename kvm/test/{ => }/test/powerpc/44x/tlbwe_hole.S (100%) rename kvm/test/{ => }/test/powerpc/cstart.S (100%) rename kvm/test/{ => }/test/powerpc/exit.c (100%) rename kvm/test/{ => }/test/powerpc/helloworld.c (100%) rename kvm/test/{ => }/test/powerpc/io.S (100%) rename kvm/test/{ => }/test/powerpc/spin.S (100%) rename kvm/test/{ => }/test/powerpc/sprg.S (100%) diff --git a/kvm/test/test/powerpc/44x/tlbsx.S b/kvm/test/powerpc/44x/tlbsx.S similarity index 100% rename from kvm/test/test/powerpc/44x/tlbsx.S rename to kvm/test/powerpc/44x/tlbsx.S diff --git a/kvm/test/test/powerpc/44x/tlbwe.S b/kvm/test/powerpc/44x/tlbwe.S similarity index 100% rename from kvm/test/test/powerpc/44x/tlbwe.S rename to kvm/test/powerpc/44x/tlbwe.S diff --git a/kvm/test/test/powerpc/44x/tlbwe_16KB.S b/kvm/test/powerpc/44x/tlbwe_16KB.S similarity index 100% rename from kvm/test/test/powerpc/44x/tlbwe_16KB.S rename to kvm/test/powerpc/44x/tlbwe_16KB.S diff --git a/kvm/test/test/powerpc/44x/tlbwe_hole.S b/kvm/test/powerpc/44x/tlbwe_hole.S similarity index 100% rename from kvm/test/test/powerpc/44x/tlbwe_hole.S rename to kvm/test/powerpc/44x/tlbwe_hole.S diff --git a/kvm/test/test/powerpc/cstart.S b/kvm/test/powerpc/cstart.S similarity index 100% rename from kvm/test/test/powerpc/cstart.S rename to kvm/test/powerpc/cstart.S diff --git a/kvm/test/test/powerpc/exit.c b/kvm/test/powerpc/exit.c similarity index 100% rename from kvm/test/test/powerpc/exit.c rename to kvm/test/powerpc/exit.c diff --git a/kvm/test/test/powerpc/helloworld.c b/kvm/test/powerpc/helloworld.c similarity index 100% rename from kvm/test/test/powerpc/helloworld.c rename to kvm/test/powerpc/helloworld.c diff --git a/kvm/test/test/powerpc/io.S b/kvm/test/powerpc/io.S similarity index 100% rename from kvm/test/test/powerpc/io.S rename to kvm/test/powerpc/io.S diff --git a/kvm/test/test/powerpc/spin.S b/kvm/test/powerpc/spin.S similarity index 100% rename from kvm/test/test/powerpc/spin.S rename to kvm/test/powerpc/spin.S diff --git a/kvm/test/test/powerpc/sprg.S b/kvm/test/powerpc/sprg.S similarity index 100% rename from kvm/test/test/powerpc/sprg.S rename to kvm/test/powerpc/sprg.S -- 1.6.5.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
Still not sure this is a good idea for a couple of reasons: 1. We already have lots of special cases with skb's (frags and fraglist), and skb's travel through a lot of different parts of the kernel. So any new change like this creates lots of exposed points for new bugs. Look at cases like MD5 TCP and netfilter, and forwarding these SKB's to ipsec and ppp and ... 2. SKB's can have infinite lifetime in the kernel. If these buffers come from a fixed size pool in an external device, they can easily all get tied up if you have a slow listener. What happens then? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next
On 06.06.2010, at 18:13, Avi Kivity wrote: > On 06/06/2010 07:09 PM, Avi Kivity wrote: >> From: Avi Kivity >> >> * 'master' of >> ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (1443 >> commits) >> Linux 2.6.35-rc2 >> drm/i915: Move non-phys cursors into the GTT >> ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags >> module: fix bne2 "gave up waiting for init of module libcrc32c" >> module: verify_export_symbols under the lock >> module: move find_module check to end >> module: make locking more fine-grained. >> module: Make module sysfs functions private. >> module: move sysfs exposure to end of load_module >> module: fix kdb's illicit use of struct module_use. >> module: Make the 'usage' lists be two-way >> X25: remove duplicated #include >> tcp: use correct net ns in cookie_v4_check() >> rps: tcp: fix rps_sock_flow_table table updates >> ppp_generic: fix multilink fragment sizes >> syncookies: remove Kconfig text line about disabled-by-default >> ixgbe: only check pfc bits in hang logic if pfc is enabled >> net: check for refcount if pop a stacked dst_entry >> omap: remove BUG_ON for disabled interrupts >> vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim >> failure >> ... >> >> Conflicts: >> arch/powerpc/kernel/ppc_ksyms.c >> > > Alex, can you verify that I merged this correctly (dropping the export)? Yes, dropping was the right choice :). Thank you! Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mmu_notifers, pte_dirty questions
On Sun, Jun 06, 2010 at 03:07:27PM +0300, Avi Kivity wrote: > Why no notifer when testing and clearing the dirty bit? > > (*clear_flush_dirty)(...). > > > static int page_mkclean_one(struct page *page, struct vm_area_struct *vma, > > unsigned long address) > > { > > struct mm_struct *mm = vma->vm_mm; > > pte_t *pte; > > spinlock_t *ptl; > > int ret = 0; > > > > pte = page_check_address(page, mm, address, &ptl, 1); > > if (!pte) > > goto out; > > > > if (pte_dirty(*pte) || pte_write(*pte)) { > > pte_t entry; > > > > flush_cache_page(vma, address, pte_pfn(*pte)); > > entry = ptep_clear_flush_notify(vma, address, pte); > > entry = pte_wrprotect(entry); > > entry = pte_mkclean(entry); > > set_pte_at(mm, address, pte, entry); > > set_pte_at_notify()? without this (or clear_flush_dirty) Linux will > assume all ptes are now clean; if the guest writes to a page nothing > will catch it. > > -> with set_pte_at_notify(), we can drop the spte and mark the page as > dirty, so the next write will re-instantiate the spte > -> with ->clear_flush_dirty(), we can track the dirty state without > dropping the spte. > > > ret = 1; > > } > > > > pte_unmap_unlock(pte, ptl); > > out: > > return ret; > > I'm probably missing something big as I can't see how this works. Under the PT lock it's safe to keep the PTE zero, just the pte must be non zero again before pte_unmap_unlock. The sptes are all gone by the time ptep_clear_flush_notify returns (also gup-fast is stopped with the IPI of the flush) and no spte can be established again before pte_unmap_unlock runs, so it's all safe as far as I can tell. set_pte_at_notify might prevent a minor fault though. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v4 3/3] block: add sheepdog driver for distributed storage support
At Fri, 04 Jun 2010 13:04:00 +0200, Kevin Wolf wrote: > > Am 03.06.2010 18:23, schrieb MORITA Kazutaka: > >>> +static void sd_aio_cancel(BlockDriverAIOCB *blockacb) > >>> +{ > >>> + SheepdogAIOCB *acb = (SheepdogAIOCB *)blockacb; > >>> + > >>> + acb->canceled = 1; > >>> +} > >> > >> Does this provide the right semantics? You haven't really cancelled the > >> request, but you pretend to. So you actually complete the request in the > >> background and then throw the return code away. > >> > >> I seem to remember that posix-aio-compat.c waits at this point for > >> completion of the requests, calls the callbacks and only afterwards > >> returns from aio_cancel when no more requests are in flight. > >> > >> Or if you can really cancel requests, it would be the best option, of > >> course. > >> > > > > Sheepdog cannot cancel the requests which are already sent to the > > servers. So, as you say, we pretend to cancel the requests without > > waiting for completion of them. However, are there any situation > > where pretending to cancel causes problems in practice? > > I'm not sure how often it would happen in practice, but if the guest OS > thinks the old value is on disk when in fact the new one is, this could > lead to corruption. I think if it can happen, even without evidence that > it actually does, it's already relevant enough. > I agree. > > To wait for completion of the requests here, we may need to create > > another thread for processing I/O like posix-aio-compat.c. > > I don't think you need a thread to get the same behaviour, you just need > to call the fd handlers like in the main loop. It would probably be the > first driver doing this, though, and it's not an often used code path, > so it might be a bad idea. > > Maybe it's reasonable to just complete the request with -EIO? This way > the guest couldn't make any assumption about the data written. On the > other hand, it could be unhappy about failed requests, but that's > probably better than corruption. > Completing with -EIO looks good to me. Thanks for the advice. I'll send an updated patch tomorrow. Regards, Kazutaka -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: porting fixes regarding kvm-clock and lost irqs to stable qemu-kvm 0.12.4
Am 06.06.2010 um 16:53 schrieb Faidon Liambotis: > Peter Lieven wrote: i was looking for a generic way to disable it in the hypervisor without the need to touch every guests kernel commandline. this would be easy revertible once its all working as expected. >>> This is a huge hack but for the same reason I've made an LD_PRELOAD >>> wrapper that wraps ioctl masking the hsot kernel's kvmclock feature from >>> userspace kvm. I'll be happy to share it if you're interested. >> >> i would be grateful if you can sent it to me offlist. > http://people.debian.org/~paravoid/kvm-noclock-3.tar.gz > >> however, i came up with this hack, >> that seems to work also: > > > Yes, that's equivalent. We didn't use that since we wanted to use our > distribution's binaries and be able to upgrade without modifying the source. thank you. i want to share this patch for stable until -cpu xxx,-kvmclock works in the stable tree. diff -ur qemu-kvm-0.12.4.orig/qemu-kvm.c qemu-kvm-0.12.4/qemu-kvm.c --- qemu-kvm-0.12.4.orig/qemu-kvm.c 2010-05-09 13:05:19.0 +0200 +++ qemu-kvm-0.12.4/qemu-kvm.c 2010-06-06 16:58:12.0 +0200 @@ -54,7 +54,7 @@ int kvm_pit = 1; int kvm_pit_reinject = 1; int kvm_nested = 0; - +int kvm_clock = 1; KVMState *kvm_state; kvm_context_t kvm_context; diff -ur qemu-kvm-0.12.4.orig/qemu-kvm.h qemu-kvm-0.12.4/qemu-kvm.h --- qemu-kvm-0.12.4.orig/qemu-kvm.h 2010-05-09 13:05:19.0 +0200 +++ qemu-kvm-0.12.4/qemu-kvm.h 2010-06-06 17:04:02.0 +0200 @@ -1035,6 +1035,7 @@ extern int kvm_pit; extern int kvm_pit_reinject; extern int kvm_nested; +extern int kvm_clock; extern kvm_context_t kvm_context; struct ioperm_data { diff -ur qemu-kvm-0.12.4.orig/qemu-kvm-x86.c qemu-kvm-0.12.4/qemu-kvm-x86.c --- qemu-kvm-0.12.4.orig/qemu-kvm-x86.c 2010-05-09 13:05:19.0 +0200 +++ qemu-kvm-0.12.4/qemu-kvm-x86.c 2010-06-06 16:46:08.0 +0200 @@ -1259,6 +1259,7 @@ int i, features = 0; for (i = 0; i < ARRAY_SIZE(para_features)-1; i++) { +if (para_features[i].cap == KVM_CAP_CLOCKSOURCE && !kvm_clock) continue; if (kvm_check_extension(kvm_state, para_features[i].cap)) features |= (1 << para_features[i].feature); } diff -ur qemu-kvm-0.12.4.orig/qemu-options.hx qemu-kvm-0.12.4/qemu-options.hx --- qemu-kvm-0.12.4.orig/qemu-options.hx2010-05-09 13:05:19.0 +0200 +++ qemu-kvm-0.12.4/qemu-options.hx 2010-06-06 16:38:46.0 +0200 @@ -1959,6 +1959,9 @@ "-no-kvm-pit disable KVM kernel mode PIT\n") DEF("no-kvm-pit-reinjection", 0, QEMU_OPTION_no_kvm_pit_reinjection, "-no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection\n") +DEF("no-kvm-clock", 0, QEMU_OPTION_no_kvm_clock, +"-no-kvm-clock disable KVM pvclock\n") + #if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(TARGET_IA64) || defined(__linux__) DEF("pcidevice", HAS_ARG, QEMU_OPTION_pcidevice, "-pcidevice host=bus:dev.func[,dma=none][,name=string]\n" diff -ur qemu-kvm-0.12.4.orig/vl.c qemu-kvm-0.12.4/vl.c --- qemu-kvm-0.12.4.orig/vl.c 2010-05-09 13:05:19.0 +0200 +++ qemu-kvm-0.12.4/vl.c2010-06-06 17:05:53.0 +0200 @@ -5530,6 +5530,10 @@ kvm_pit_reinject = 0; break; } +case QEMU_OPTION_no_kvm_clock: { +kvm_clock = 0; +break; +} case QEMU_OPTION_enable_nesting: { kvm_nested = 1; break; > > Regards, > Faidon > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf trace event parse errors for KVM events
On 06/06/2010 06:02 PM, Steven Rostedt wrote: On Sun, 2010-06-06 at 11:08 +0300, Avi Kivity wrote: Also, I kicked this off in kernelshark, and it made no difference that I can see. This is because kernelshark only evaluates the viewable area of the screen. Neat. Can it also search? Where can I find it? It's in the same repo as trace-cmd: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git I'm still working on it. The latest is in the branch kernelshark-devel. And yes, it does searches. I'll be tracking it. In case you're interested in wishlists, my #1 would be to be able to initiate traces from the GUI (so it's easy to explore the available trace events and enable them interactively). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [COMMIT master] Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into next
On 06/06/2010 07:09 PM, Avi Kivity wrote: From: Avi Kivity * 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (1443 commits) Linux 2.6.35-rc2 drm/i915: Move non-phys cursors into the GTT ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags module: fix bne2 "gave up waiting for init of module libcrc32c" module: verify_export_symbols under the lock module: move find_module check to end module: make locking more fine-grained. module: Make module sysfs functions private. module: move sysfs exposure to end of load_module module: fix kdb's illicit use of struct module_use. module: Make the 'usage' lists be two-way X25: remove duplicated #include tcp: use correct net ns in cookie_v4_check() rps: tcp: fix rps_sock_flow_table table updates ppp_generic: fix multilink fragment sizes syncookies: remove Kconfig text line about disabled-by-default ixgbe: only check pfc bits in hang logic if pfc is enabled net: check for refcount if pop a stacked dst_entry omap: remove BUG_ON for disabled interrupts vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure ... Conflicts: arch/powerpc/kernel/ppc_ksyms.c Alex, can you verify that I merged this correctly (dropping the export)? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] make-release: make mtime, owner, group consistent
On Fri, Jun 04, 2010 at 01:46:25PM -0300, Marcelo Tosatti wrote: > On Wed, Jun 02, 2010 at 06:27:20PM +0300, Michael S. Tsirkin wrote: > > Files from git have modification time set to one > > of commit, and owner/group to root. > > Making it so for generated files as well makes > > it easier to generate an identical tarball from git. > > > > Setting owner/group to root is especially important because > > otherwise you must have a user/group with same name > > to generate an identical tarball. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > kvm/scripts/make-release |9 +++-- > > 1 files changed, 7 insertions(+), 2 deletions(-) > > Applied, thanks. Could you re-generate and upload the qemu-kvm-0.12.4 release tarball with this applied? -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: rework remove-write-access for a slot
On 06/04/2010 11:14 AM, Lai Jiangshan wrote: - I thought of a different approach to write protection: write protect the L4 sptes, on write fault add write permission to the L4 spte and write protect the L3 sptes that it points to, etc. This method can use the slot bitmap to reduce the number of write faults. However we can reintroduce the slot bitmap if/when we use the method, this shouldn't block the patch. It is very a good approach and it is blazing fast. I have no time to implement it currently, could you update it into the TODO list? Done. +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp) +{ +u64 *spte = rmap_next(kvm, rmapp, NULL); + +while (spte) { +/* avoid RMW */ +if (is_writable_pte(*spte)) +*spte&= ~PT_WRITABLE_MASK; Must use an atomic operation here to avoid losing dirty or accessed bit. Atomic operation is too expensive, I retained the comment "/* avoid RMW */" and wait someone take a good approach for it. You are right, it is an existing problem. I just posted a patchset which fixes the problem, when it's merged please rebase on top. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: porting fixes regarding kvm-clock and lost irqs to stable qemu-kvm 0.12.4
Peter Lieven wrote: >>> i was looking for a generic way to disable it in the hypervisor >>> without the need to touch every guests kernel commandline. >>> this would be easy revertible once its all working as expected. >> This is a huge hack but for the same reason I've made an LD_PRELOAD >> wrapper that wraps ioctl masking the hsot kernel's kvmclock feature from >> userspace kvm. I'll be happy to share it if you're interested. > > i would be grateful if you can sent it to me offlist. http://people.debian.org/~paravoid/kvm-noclock-3.tar.gz > however, i came up with this hack, > that seems to work also: Yes, that's equivalent. We didn't use that since we wanted to use our distribution's binaries and be able to upgrade without modifying the source. Regards, Faidon -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v6 5/6] Inter-VM shared memory PCI device
On 06/05/2010 12:44 PM, Blue Swirl wrote: On Fri, Jun 4, 2010 at 9:45 PM, Cam Macdonell wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=[,shm=] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=[,shm=] [,chardev=][,msi=on][,irqfd=on][,vectors=n][,role=peer|master] -chardev socket,path=,id= (shared memory server is qemu.git/contrib/ivshmem-server) Sample programs and init scripts are in a git repo here: Why is this KVM specific BTW, Posix SHM is available on many platforms? What would happen if kvm_set_foobar functions were not called when KVM is not being used? Is host eventfd support essential? It's not kvm specific, it's atomic-ops-on-shared-memory-are-visible-as-atomic-ops specific, which is currently only available with kvm. When tcg gains true smp support (and not just against other tcg threads) this can work with tcg as well. I guess that needs a host with at least 32/64 bit CAS for 32/64 bit targets respectively, and double that if the target has DCAS. Not sure how targets with ll/sc can be implemented, especially if there are limits as to what can go in between. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf trace event parse errors for KVM events
On Sun, 2010-06-06 at 11:08 +0300, Avi Kivity wrote: > > Also, I kicked this off in kernelshark, and it made no difference that I > > can see. This is because kernelshark only evaluates the viewable area of > > the screen. > > > > Neat. Can it also search? Where can I find it? > It's in the same repo as trace-cmd: git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git I'm still working on it. The latest is in the branch kernelshark-devel. And yes, it does searches. -- Steve -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2933400 ] virtio-blk io errors / data corruption on raw drives > 1 TB
Bugs item #2933400, was opened at 2010-01-16 17:35 Message generated for change (Comment added) made by nevenchannyy You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2933400&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: MaSc82 (masc82) Assigned to: Nobody/Anonymous (nobody) Summary: virtio-blk io errors / data corruption on raw drives > 1 TB Initial Comment: When attaching raw drives > 1 TB, buffer io errors will most likely occur, filesystems get corrupted. Processes (like mkfs.ext4) might freeze completely when filesystems are created on the guest. Here's a typical log excerpt when using mkfs.ext4 on a 1.5 TB drive on a Ubuntu 9.10 guest: (kern.log) Jan 15 20:40:44 q kernel: [ 677.076602] Buffer I/O error on device vde, logical block 366283764 Jan 15 20:40:44 q kernel: [ 677.076607] Buffer I/O error on device vde, logical block 366283765 Jan 15 20:40:44 q kernel: [ 677.076611] Buffer I/O error on device vde, logical block 366283766 Jan 15 20:40:44 q kernel: [ 677.076616] Buffer I/O error on device vde, logical block 366283767 Jan 15 20:40:44 q kernel: [ 677.076621] Buffer I/O error on device vde, logical block 366283768 Jan 15 20:40:44 q kernel: [ 677.076626] Buffer I/O error on device vde, logical block 366283769 (messages) Jan 15 20:40:44 q kernel: [ 677.076534] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076541] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076546] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076599] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076604] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076609] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076613] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076618] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076623] lost page write due to I/O error on vde Jan 15 20:40:44 q kernel: [ 677.076628] lost page write due to I/O error on vde Jan 15 20:45:55 q Backgrounding to notify hosts... (The following entries will repeat infinitely, mkfs.ext4 will not exit and cannot be killed) Jan 15 20:49:27 q kernel: [ 1200.520096] mkfs.ext4 D 0 1839 1709 0x Jan 15 20:49:27 q kernel: [ 1200.520101] 88004e157cb8 0082 88004e157c58 00015880 Jan 15 20:49:27 q kernel: [ 1200.520115] 88004ef6c7c0 00015880 00015880 00015880 Jan 15 20:49:27 q kernel: [ 1200.520118] 00015880 88004ef6c7c0 00015880 00015880 Jan 15 20:49:27 q kernel: [ 1200.520123] Call Trace: Jan 15 20:49:27 q kernel: [ 1200.520157] [] ? sync_page+0x0/0x50 Jan 15 20:49:27 q kernel: [ 1200.520178] [] io_schedule+0x28/0x40 Jan 15 20:49:27 q kernel: [ 1200.520182] [] sync_page+0x3d/0x50 Jan 15 20:49:27 q kernel: [ 1200.520185] [] __wait_on_bit+0x57/0x80 Jan 15 20:49:27 q kernel: [ 1200.520192] [] wait_on_page_bit+0x6e/0x80 Jan 15 20:49:27 q kernel: [ 1200.520205] [] ? wake_bit_function+0x0/0x40 Jan 15 20:49:27 q kernel: [ 1200.520210] [] ? pagevec_lookup_tag+0x20/0x30 Jan 15 20:49:27 q kernel: [ 1200.520213] [] wait_on_page_writeback_range+0xf5/0x190 Jan 15 20:49:27 q kernel: [ 1200.520217] [] filemap_fdatawait+0x27/0x30 Jan 15 20:49:27 q kernel: [ 1200.520220] [] filemap_write_and_wait+0x44/0x50 Jan 15 20:49:27 q kernel: [ 1200.520235] [] __sync_blockdev+0x1f/0x40 Jan 15 20:49:27 q kernel: [ 1200.520239] [] sync_blockdev+0xe/0x10 Jan 15 20:49:27 q kernel: [ 1200.520241] [] block_fsync+0x1a/0x20 Jan 15 20:49:27 q kernel: [ 1200.520249] [] vfs_fsync+0x86/0xf0 Jan 15 20:49:27 q kernel: [ 1200.520252] [] do_fsync+0x39/0x60 Jan 15 20:49:27 q kernel: [ 1200.520255] [] sys_fsync+0xb/0x10 Jan 15 20:49:27 q kernel: [ 1200.520271] [] system_call_fastpath+0x16/0x1b In my case I was switching to virtio at one point, but the behaviour didn't show until there was > 1 TB data on the filesystem. very dangerous. Tested using 2 different SATA controllers, 1.5 TB lvm/mdraid, single 1.5 TB drive and 2 TB lvm/mdraid. The behaviour does not occur with if=scsi or if=ide. #2914397 might be related: https://sourceforge.net/tracker/?func=detail&aid=2914397&group_id=180599&atid=893831 This blog post might also relate: http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/ CPU: Intel Xeon E5430 KVM: qemu-kvm-0.12.1.2 Kernel: 2.6.32.2, x86_64 Guest OS: Verified to occur on guests Ubuntu Linux 9.10 (64-bit) and Gentoo Linux (64-bit) Commandline (atm using ide instead of virtio for large drives as a workaround): qemu-system-x86_64 -S -M p
Re: [PATCH 1/4] KVM: MMU: Introduce drop_spte()
On 06/06/2010 04:06 PM, Avi Kivity wrote: When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. @@ -1498,13 +1502,14 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm, ent&= PT64_BASE_ADDR_MASK; mmu_page_remove_parent_pte(page_header(ent), &pt[i]); + pt[i] = shadow_trap_nonpresent_pte; } else { if (is_large_pte(ent)) --kvm->stat.lpages; - rmap_remove(kvm,&pt[i]); + drop_spte(kvm,&pt[i], + shadow_trap_nonpresent_pte); } } - pt[i] = shadow_trap_nonpresent_pte; } } Autotest points out that this transformation (and an identical one in zap_pte) does not preserve the semantics; if the outer if () fails, the new code does not update pt[i]. With the original line after the if () retained, autotest is happier. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On 06/02/2010 07:53 PM, Chris Wright wrote: * Avi Kivity (a...@redhat.com) wrote: The interface would only work for clients which support it: kvm, vhost, and iommu/devices with restartable dma. BTW, there is no such thing as restartable dma. There is a provision in new specs (read: no real hardware) that allows a device to request pages before using them. So it's akin to demand paging, but the demand is an explicit request rather than a page fault. Thanks for the correction. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH qemu-kvm] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc
On 06/06/2010 04:13 PM, Ken CC wrote: To make the directory architecture match the file config file kvm/test/config-powerpc.mk. Can you please resent with 'diff.renames' set in your git configuration? That will make the patch reviewable, as it will convert it to a pattern of moves instead of creates/deletes. Also, isn't a Makefile change needed? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: mmu: Remove unused local variable
On 06/03/2010 10:08 AM, Jan Kiszka wrote: From: Jan Kiszka Signed-off-by: Jan Kiszka --- No one else checks for new build warnings? My fault, sorry, I updated the patch in place to remove a dangling "r = something;" and introduced two new warnings. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3][STABLE] KVM: Various issues in virtio_net
On 06/03/2010 10:31 PM, Bruce Rogers wrote: These are patches which we have found useful for our 2.6.32 based SLES 11 SP1 release. The first patch is already upstream, but should be included in stable. The second patch is a subset of another upstream patch. Again, stable material. The third patch solves the last remaining issue we saw when testing kvm configurations with the SUSE certification test suite. Under heavy load, we observed rx stalls (first two patches applied), and this third patch was crafted to address the issue. Please apply to stable. I assume this last problem also exists in more recent kernels than 2.6.32, but I haven't validated that. With these 3 patches applied we no longer see any issues with virito networking using our certification test suite. Please send virtio patches, including patches for the stable branches, to the virtio maintainer (Rusty). Copying Michael (m...@redhat.com) is also a good idea. Also: - use git format-patch/git sent-email or equivalents; these ensure the correct formatting and group the patches together for threaded mail clients - indicate the upstream commit (often it's enough to ask the maintainers to backport a commit) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qemu-kvm] kvm: rename kvm/test/test/powerpc to kvm/test/powerpc
To make the directory architecture match the file config file kvm/test/config-powerpc.mk. Signed-off-by: Ken CC --- kvm/test/powerpc/44x/tlbsx.S | 33 kvm/test/powerpc/44x/tlbwe.S | 27 +++ kvm/test/powerpc/44x/tlbwe_16KB.S | 35 + kvm/test/powerpc/44x/tlbwe_hole.S | 27 +++ kvm/test/powerpc/cstart.S | 38 kvm/test/powerpc/exit.c| 23 +++ kvm/test/powerpc/helloworld.c | 27 +++ kvm/test/powerpc/io.S | 32 +++ kvm/test/powerpc/spin.S|4 +++ kvm/test/powerpc/sprg.S|7 ++ kvm/test/test/powerpc/44x/tlbsx.S | 33 kvm/test/test/powerpc/44x/tlbwe.S | 27 --- kvm/test/test/powerpc/44x/tlbwe_16KB.S | 35 - kvm/test/test/powerpc/44x/tlbwe_hole.S | 27 --- kvm/test/test/powerpc/cstart.S | 38 kvm/test/test/powerpc/exit.c | 23 --- kvm/test/test/powerpc/helloworld.c | 27 --- kvm/test/test/powerpc/io.S | 32 --- kvm/test/test/powerpc/spin.S |4 --- kvm/test/test/powerpc/sprg.S |7 -- 20 files changed, 253 insertions(+), 253 deletions(-) create mode 100644 kvm/test/powerpc/44x/tlbsx.S create mode 100644 kvm/test/powerpc/44x/tlbwe.S create mode 100644 kvm/test/powerpc/44x/tlbwe_16KB.S create mode 100644 kvm/test/powerpc/44x/tlbwe_hole.S create mode 100644 kvm/test/powerpc/cstart.S create mode 100644 kvm/test/powerpc/exit.c create mode 100644 kvm/test/powerpc/helloworld.c create mode 100644 kvm/test/powerpc/io.S create mode 100644 kvm/test/powerpc/spin.S create mode 100644 kvm/test/powerpc/sprg.S delete mode 100644 kvm/test/test/powerpc/44x/tlbsx.S delete mode 100644 kvm/test/test/powerpc/44x/tlbwe.S delete mode 100644 kvm/test/test/powerpc/44x/tlbwe_16KB.S delete mode 100644 kvm/test/test/powerpc/44x/tlbwe_hole.S delete mode 100644 kvm/test/test/powerpc/cstart.S delete mode 100644 kvm/test/test/powerpc/exit.c delete mode 100644 kvm/test/test/powerpc/helloworld.c delete mode 100644 kvm/test/test/powerpc/io.S delete mode 100644 kvm/test/test/powerpc/spin.S delete mode 100644 kvm/test/test/powerpc/sprg.S diff --git a/kvm/test/powerpc/44x/tlbsx.S b/kvm/test/powerpc/44x/tlbsx.S new file mode 100644 index 000..b15874b --- /dev/null +++ b/kvm/test/powerpc/44x/tlbsx.S @@ -0,0 +1,33 @@ +#define SPRN_MMUCR 0x3b2 + +#define TLBWORD0 0x1210 +#define TLBWORD1 0x1000 +#define TLBWORD2 0x0003 + +.global _start +_start: + li r4, 0 + mtspr SPRN_MMUCR, r4 + + li r3, 23 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 0 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 1 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 2 + + lis r4, 0x1000 + tlbsx r5, r4, r0 + cmpwi r5, 23 + beq good + trap + +good: + b . diff --git a/kvm/test/powerpc/44x/tlbwe.S b/kvm/test/powerpc/44x/tlbwe.S new file mode 100644 index 000..ec6ef5c --- /dev/null +++ b/kvm/test/powerpc/44x/tlbwe.S @@ -0,0 +1,27 @@ +#define SPRN_MMUCR 0x3b2 + +/* Create a mapping at 4MB */ +#define TLBWORD0 0x00400210 +#define TLBWORD1 0x0040 +#define TLBWORD2 0x0003 + +.global _start +_start: + li r4, 0 + mtspr SPRN_MMUCR, r4 + + li r3, 23 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 0 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 1 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 2 + + b . diff --git a/kvm/test/powerpc/44x/tlbwe_16KB.S b/kvm/test/powerpc/44x/tlbwe_16KB.S new file mode 100644 index 000..1bd10bf --- /dev/null +++ b/kvm/test/powerpc/44x/tlbwe_16KB.S @@ -0,0 +1,35 @@ +#define SPRN_MMUCR 0x3b2 + +/* 16KB mapping at 4MB */ +#define TLBWORD0 0x00400220 +#define TLBWORD1 0x0040 +#define TLBWORD2 0x0003 + +.global _start +_start: + li r4, 0 + mtspr SPRN_MMUCR, r4 + + li r3, 5 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 0 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 1 + + lis r4, tlbwo...@h + ori r4, r4, tlbwo...@l + tlbwe r4, r3, 2 + + /* load from 4MB */ + lis r3, 0x0040 + lwz r4, 0(r3) + + /* load from 4MB+8KB */ + ori r3, r3, 0x2000 +
[PATCH 1/4] KVM: MMU: Introduce drop_spte()
When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity --- arch/x86/kvm/mmu.c | 33 +++-- arch/x86/kvm/paging_tmpl.h | 13 ++--- 2 files changed, 25 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6b2c644..17331c2 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -666,6 +666,12 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) } } +static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte) +{ + rmap_remove(kvm, sptep); + __set_spte(sptep, new_spte); +} + static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) { struct kvm_rmap_desc *desc; @@ -731,9 +737,9 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn) BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)); pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn); if (is_writable_pte(*spte)) { - rmap_remove(kvm, spte); + drop_spte(kvm, spte, + shadow_trap_nonpresent_pte); --kvm->stat.lpages; - __set_spte(spte, shadow_trap_nonpresent_pte); spte = NULL; write_protected = 1; } @@ -753,8 +759,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, while ((spte = rmap_next(kvm, rmapp, NULL))) { BUG_ON(!(*spte & PT_PRESENT_MASK)); rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); - rmap_remove(kvm, spte); - __set_spte(spte, shadow_trap_nonpresent_pte); + drop_spte(kvm, spte, shadow_trap_nonpresent_pte); need_tlb_flush = 1; } return need_tlb_flush; @@ -776,8 +781,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte); need_flush = 1; if (pte_write(*ptep)) { - rmap_remove(kvm, spte); - __set_spte(spte, shadow_trap_nonpresent_pte); + drop_spte(kvm, spte, shadow_trap_nonpresent_pte); spte = rmap_next(kvm, rmapp, NULL); } else { new_spte = *spte &~ (PT64_BASE_ADDR_MASK); @@ -1498,13 +1502,14 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm, ent &= PT64_BASE_ADDR_MASK; mmu_page_remove_parent_pte(page_header(ent), &pt[i]); + pt[i] = shadow_trap_nonpresent_pte; } else { if (is_large_pte(ent)) --kvm->stat.lpages; - rmap_remove(kvm, &pt[i]); + drop_spte(kvm, &pt[i], + shadow_trap_nonpresent_pte); } } - pt[i] = shadow_trap_nonpresent_pte; } } @@ -1902,9 +1907,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (level > PT_PAGE_TABLE_LEVEL && has_wrprotected_page(vcpu->kvm, gfn, level)) { ret = 1; - rmap_remove(vcpu->kvm, sptep); - spte = shadow_trap_nonpresent_pte; - goto set_pte; + drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); + goto done; } spte |= PT_WRITABLE_MASK; @@ -1936,6 +1940,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, set_pte: __set_spte(sptep, spte); +done: return ret; } @@ -1972,7 +1977,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } else if (pfn != spte_to_pfn(*sptep)) { pgprintk("hfn old %lx new %lx\n", spte_to_pfn(*sptep), pfn); - rmap_remove(vcpu->kvm, sptep); + drop_spte(vcpu->kvm, sptep, shadow_trap_nonpresent_pte); } else was_rmapped = 1; } @@ -2623,13 +2628,13 @@ static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu, pte = *spte; if (is_shadow_present_
[PATCH 3/4] KVM: MMU: Atomically check for accessed bit when dropping an spte
Currently, in the window between the check for the accessed bit, and actually dropping the spte, a vcpu can access the page through the spte and set the bit, which will be ignored by the mmu. Fix by using an exchange operation to atmoically fetch the spte and drop it. Signed-off-by: Avi Kivity --- arch/x86/kvm/mmu.c | 26 +- 1 files changed, 21 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f93948d..b565a14 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -290,6 +290,21 @@ static void __set_spte(u64 *sptep, u64 spte) #endif } +static u64 __xchg_spte(u64 *sptep, u64 new_spte) +{ +#ifdef CONFIG_X86_64 + return xchg(sptep, new_spte); +#else + u64 old_spte; + + do { + old_spte = *sptep; + } while (cmpxchg64(sptep, old_spte, new_spte) != old_spte); + + return old; +#endif +} + static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, struct kmem_cache *base_cache, int min) { @@ -661,16 +676,17 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte) { pfn_t pfn; + u64 old_spte; - if (!is_rmap_spte(*sptep)) + old_spte = __xchg_spte(sptep, new_spte); + if (!is_rmap_spte(old_spte)) return; - pfn = spte_to_pfn(*sptep); - if (*sptep & shadow_accessed_mask) + pfn = spte_to_pfn(old_spte); + if (old_spte & shadow_accessed_mask) kvm_set_pfn_accessed(pfn); - if (is_writable_pte(*sptep)) + if (is_writable_pte(old_spte)) kvm_set_pfn_dirty(pfn); rmap_remove(kvm, sptep); - __set_spte(sptep, new_spte); } static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] KVM: MMU: Don't drop accessed bit while updating an spte
__set_spte() will happily replace an spte with the accessed bit set with one that has the accessed bit clear. Add a helper update_spte() which checks for this condition and updates the page flag if needed. Signed-off-by: Avi Kivity --- arch/x86/kvm/mmu.c | 25 + 1 files changed, 21 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index b565a14..4c98726 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -305,6 +305,19 @@ static u64 __xchg_spte(u64 *sptep, u64 new_spte) #endif } +static void update_spte(u64 *sptep, u64 new_spte) +{ + u64 old_spte; + + if (!shadow_accessed_mask || (new_spte & shadow_accessed_mask)) { + __set_spte(sptep, new_spte); + } else { + old_spte = __xchg_spte(sptep, new_spte); + if (old_spte & shadow_accessed_mask) + mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte))); + } +} + static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, struct kmem_cache *base_cache, int min) { @@ -730,7 +743,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn) BUG_ON(!(*spte & PT_PRESENT_MASK)); rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte); if (is_writable_pte(*spte)) { - __set_spte(spte, *spte & ~PT_WRITABLE_MASK); + update_spte(spte, *spte & ~PT_WRITABLE_MASK); write_protected = 1; } spte = rmap_next(kvm, rmapp, spte); @@ -786,7 +799,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, unsigned long data) { int need_flush = 0; - u64 *spte, new_spte; + u64 *spte, new_spte, old_spte; pte_t *ptep = (pte_t *)data; pfn_t new_pfn; @@ -806,9 +819,13 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, new_spte &= ~PT_WRITABLE_MASK; new_spte &= ~SPTE_HOST_WRITEABLE; + new_spte &= ~shadow_accessed_mask; if (is_writable_pte(*spte)) kvm_set_pfn_dirty(spte_to_pfn(*spte)); - __set_spte(spte, new_spte); + old_spte = __xchg_spte(spte, new_spte); + if (is_shadow_present_pte(old_spte) + && (old_spte & shadow_accessed_mask)) + mark_page_accessed(pfn_to_page(spte_to_pfn(old_spte))); spte = rmap_next(kvm, rmapp, spte); } } @@ -1956,7 +1973,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, mark_page_dirty(vcpu->kvm, gfn); set_pte: - __set_spte(sptep, spte); + update_spte(sptep, spte); done: return ret; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] Fix accessed bit tracking
The kvm mmu synchronizes shadow ptes using the mmu lock, however the cpu will happily ignore the lock when setting the accessed bit. This can cause the accessed bit to be lost. Luckily this only results in incorrect page selection for swap. This patchset fixes the problem by atomically updating the spte when needed while taking care of the accessed bit. Avi Kivity (4): KVM: MMU: Introduce drop_spte() KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte() KVM: MMU: Atomically check for accessed bit when dropping an spte KVM: MMU: Don't drop accessed bit while updating an spte arch/x86/kvm/mmu.c | 91 +++ arch/x86/kvm/paging_tmpl.h | 13 +++--- 2 files changed, 71 insertions(+), 33 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte()
Since we need to make the check atomic, move it to the place that will set the new spte. Signed-off-by: Avi Kivity --- arch/x86/kvm/mmu.c | 17 + 1 files changed, 9 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 17331c2..f93948d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -620,19 +620,11 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) struct kvm_rmap_desc *desc; struct kvm_rmap_desc *prev_desc; struct kvm_mmu_page *sp; - pfn_t pfn; gfn_t gfn; unsigned long *rmapp; int i; - if (!is_rmap_spte(*spte)) - return; sp = page_header(__pa(spte)); - pfn = spte_to_pfn(*spte); - if (*spte & shadow_accessed_mask) - kvm_set_pfn_accessed(pfn); - if (is_writable_pte(*spte)) - kvm_set_pfn_dirty(pfn); gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt); rmapp = gfn_to_rmap(kvm, gfn, sp->role.level); if (!*rmapp) { @@ -668,6 +660,15 @@ static void rmap_remove(struct kvm *kvm, u64 *spte) static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte) { + pfn_t pfn; + + if (!is_rmap_spte(*sptep)) + return; + pfn = spte_to_pfn(*sptep); + if (*sptep & shadow_accessed_mask) + kvm_set_pfn_accessed(pfn); + if (is_writable_pte(*sptep)) + kvm_set_pfn_dirty(pfn); rmap_remove(kvm, sptep); __set_spte(sptep, new_spte); } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mmu_notifers, pte_dirty questions
Why no notifer when testing and clearing the dirty bit? (*clear_flush_dirty)(...). static int page_mkclean_one(struct page *page, struct vm_area_struct *vma, unsigned long address) { struct mm_struct *mm = vma->vm_mm; pte_t *pte; spinlock_t *ptl; int ret = 0; pte = page_check_address(page, mm, address, &ptl, 1); if (!pte) goto out; if (pte_dirty(*pte) || pte_write(*pte)) { pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); set_pte_at_notify()? without this (or clear_flush_dirty) Linux will assume all ptes are now clean; if the guest writes to a page nothing will catch it. -> with set_pte_at_notify(), we can drop the spte and mark the page as dirty, so the next write will re-instantiate the spte -> with ->clear_flush_dirty(), we can track the dirty state without dropping the spte. ret = 1; } pte_unmap_unlock(pte, ptl); out: return ret; I'm probably missing something big as I can't see how this works. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Print a user-friendly message on failed vmentry
This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. Failed vmentries were handled with handle_unhandled() which prints a rather unfriendly message to the user. This patch separates handling vmentry failures from unknown exit reasons and prints a friendly message to the user. Signed-off-by: Mohammed Gamal --- qemu-kvm.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 9534b31..1a66f55 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -106,6 +106,24 @@ static int handle_unhandled(uint64_t reason) return -EINVAL; } +#define VMX_INVALID_GUEST_STATE 0x8021 + +static int handle_failed_vmentry(uint64_t reason) +{ +fprintf(stderr, "kvm: vm entry failed with error 0x%" PRIx64 "\n\n", reason); + +/* Perhaps we will need to check if this machine is intel since exit reason 0x21 + has a different interpretation on SVM */ +if (reason == VMX_INVALID_GUEST_STATE) { +fprintf(stderr, "If you're runnning a guest on an Intel machine without\n"); +fprintf(stderr, "unrestricted mode support, the failure can be most likely\n"); +fprintf(stderr, "due to the guest entering an invalid state for Intel VT.\n"); +fprintf(stderr, "For example, the guest maybe running in big real mode\n"); +fprintf(stderr, "which is not supported on less recent Intel processors.\n\n"); +} + +return -EINVAL; +} static inline void set_gsi(kvm_context_t kvm, unsigned int gsi) { @@ -586,7 +604,7 @@ int kvm_run(CPUState *env) r = handle_unhandled(run->hw.hardware_exit_reason); break; case KVM_EXIT_FAIL_ENTRY: -r = handle_unhandled(run->fail_entry.hardware_entry_failure_reason); +r = handle_failed_vmentry(run->fail_entry.hardware_entry_failure_reason); break; case KVM_EXIT_EXCEPTION: fprintf(stderr, "exception %d (%x)\n", run->ex.exception, -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/7] KVM: MMU: skip invalid sp when unprotect page
On 06/04/2010 04:52 PM, Xiao Guangrong wrote: In kvm_mmu_unprotect_page(), the invalid sp can be skipped Applied all, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers
On Thu, Jun 03, 2010 at 02:41:38PM -0700, Tom Lyon wrote: > OK, in the interest of making progress, I am about to embark on the following: > > 1. Create a user-iommu-domain driver - opening it will give a new empty > domain. > Ultimately this can also populate sysfs with the state of its world, > which would > also be a good addition to the base iommu stuff. > If someone closes the fd while in use, the domain stays valid anyway > until users > drop off. > > 2. Add DOMAIN_SET and DOMAIN_UNSET ioctls to the vfio driver. Require that >a domain be set before using the VFIO_DMA_MAP_IOVA ioctl Require domain to be set before you allow any access to the device: mmap, write, read. IMO this is the only safe way to make sure userspace does not corrupt memory, and this removes the need to special-case MSI memory, play with bus master enable and hope it can be cleared without reset, etc. > (this is the one >that KVM wants). Not sure I understand. I think that MAP should be done on the domain, not the device, this handles pinning pages correctly and this way you don't need any special checks. >However, the VFIO_DMA_MAP_ANYWHERE ioctl is the one >which uses the dma_sg interface which has no expicit control of domains. I >intend to keep it the way it is, but expect only non-hypervisor programs > would >want to use it. If we support MAP_IOVA, why is MAP_ANYWHERE useful? Can't non-hypervisors just pick an address? > 3. Clean up the docs and other nits that folks have found. > > Comments? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3 1/2] virtio: support layout with avail ring before idx
On Sat, Jun 05, 2010 at 01:40:26PM +0930, Rusty Russell wrote: > On Fri, 4 Jun 2010 09:12:05 pm Michael S. Tsirkin wrote: > > On Fri, Jun 04, 2010 at 08:46:49PM +0930, Rusty Russell wrote: > > > I'm uncomfortable with moving a field. > > > > > > We haven't done that before and I wonder what will break with old code. > > > > With e.g. my patch, We only do this conditionally when bit is negotitated. > > Of course, but see this change: > > commit ef688e151c00e5d529703be9a04fd506df8bc54e > Author: Rusty Russell > Date: Fri Jun 12 22:16:35 2009 -0600 > > virtio: meet virtio spec by finalizing features before using device > > Virtio devices are supposed to negotiate features before they start using > the device, but the current code doesn't do this. This is because the > driver's probe() function invariably has to add buffers to a virtqueue, > or probe the disk (virtio_blk). > > This currently doesn't matter since no existing backend is strict about > the feature negotiation. But it's possible to imagine a future feature > which completely changes how a device operates: in this case, we'd need > to acknowledge it before using the device. > > Signed-off-by: Rusty Russell > > Now, this isn't impossible to overcome: we know that if they use the ring > before completing feature negotiation then they don't understand the new > format. > > But we have to be aware of that on the qemu side. Are we? I think we are ok. virtqueue_init which sets the avail/ysed pointers is called when we write the base address. So we only need to be careful and not change this feature bit after creating the rings. > > > Should we instead just abandon the flags field and use last_used only? > > > Or, more radically, put flags == last_used when the feature is on? > > > > > > Thoughts? > > > Rusty. > > > > Hmm, e.g. with TX and virtio net, we almost never want interrupts, > > whatever the index value. > > Good point. OK, I give in, I'll take your patch which moves the fields > to the end. Is that your preference? Yes, I think so. You mean PATCHv3 unchanged with 254 byte padding? > Please be careful with the qemu side though... > > It's not inconceivable that I'll write that virtio cacheline simulator this > (coming) week, too... > > Thanks. > Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf trace event parse errors for KVM events
On 06/04/2010 12:57 AM, Steven Rostedt wrote: On Tue, 2010-06-01 at 15:39 +0300, Avi Kivity wrote: On 06/01/2010 02:59 PM, Steven Rostedt wrote: I meant that viewing would be slowed down. It's an important part of using ftrace! How long does the Python formatter take to process 100k or 1M events? I finally got around to testing this. I ran a trace on lock_acquire, and traced 1,253,296 events. I then created a python plugin to analyze the trace: def lock_acquire(trace_seq, event): t = '' r = '' if int(event['flags'])& 1: t = 'try' if int(event['flags'])& 2: r = 'read' trace_seq.puts('t %x %s%s%s' % ( event['lockdep_addr'], t, r, event['name'])) def register(pevent): pevent.register_event_handler("lock", "lock_acquire", lock_acquire) Disclaimer, I'm not a python expert, and I'm sure the above python code sucks. I think Python more or less guarantees your code will suck no matter what you do. [r...@ixf9 trace-cmd.git]# time ./trace-cmd report -N>/dev/null 2>&1 real0m4.653s user0m4.234s sys 0m0.419s * -N keeps trace-cmd from loading any plugins. [r...@ixf9 trace-cmd.git]# time PYTHONPATH=`pwd` ./trace-cmd report>/dev/null 2>&1 real0m53.916s user0m53.047s sys 0m0.859s Yes, running a python interpreter is a bit more expensive. It took 4 seconds to read the million events with plain C, but 53 seconds to read it in python. That said... This would only affect you if you were writing this to a file. I doubt that you would notice this if you were scanning the trace with less. I'm more worried about searching with less. But a minute for a million events isn't that bad. Also, I kicked this off in kernelshark, and it made no difference that I can see. This is because kernelshark only evaluates the viewable area of the screen. Neat. Can it also search? Where can I find it? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH]: kdump on KVM
On 06/04/2010 11:11 PM, Chris Lalancette wrote: Hello, I've gone back to look at the problem with kdump while running as a KVM guest. I'll preface this discussion by noting that all of the problems I describe below only happen when the HPET emulation is shutoff. However, since there still seem to be some lingering issues with the HPET emulation, running without the HPET is a reasonable mode of operation, and one in which it would be nice to have kexec/kdump working. The basic problem with getting kdump working is that currently KVM delivers the timer interrupts only to vcpu 0, so kexecing onto another processor will never see timer interrupts. In the KVM code, there are a number of places where we make incorrect assumptions regarding the timer and which vcpu to deliver it to: 1) In the i8254.c code when the hrtimer representing the PIT expires. In this case, when we get the callback, we kick only the BSP. 2) In the i8254.c code when a vcpu is migrated from one processor to another. In this case we only migrate the PIT timer if the vcpu to be migrated is the BSP. 3) In the lapic code when deciding whether to accept a PIC interrupt, we only accept interrupts on the BSP. 4) In the irq_comm.c code when calling kvm_irq_delivery_to_apic(). The problem here is that we don't take into account the fact that an LAPIC might be disabled when trying to deliver an interrupt in DM_LOWEST mode. Further, on a kexec, the processor that we are kexec'ing *to* gets it's APIC ID re-written to the BSP APIC ID. What it means in the end is that we are currently still matching against the BSP even though vcpu 1 (where the kexec is happening) would match if we let it. The attached patch takes care of 1), 3), and 4), and works in my testing (it needs to be cleaned up a bit to not be so inefficient, but it should work). However, problem 2) is pretty sticky. The reason we are currently migrating the PIT timer around with the BSP is pretty well explained in commit 2f5997140f22f68f6390c49941150d3fa8a95cb7. With my new patch, though, we are no longer guaranteeing that we are going to inject onto CPU 0. I think we can do something where when the hrtimer expires, we figure out which vcpu will get the timer interrupt and IPI to that physical processor to cause the VMEXIT. Unfortunately it's racy because the expiration of the hrtimer is de-coupled from the setting of the IRQ for the interrupt. It's only racy because of (perhaps not immature) optimization. Logically, the flow of events is as follows: 1. the httimer fires 2. irq0 is toggled (perhaps via a workqueue since the hrtimer is in interrupt context) 3. the PIC and IOAPIC state machines run and decide which local APIC(s), if any, need to be notified 4. the local APICs are IPIed. Perhaps we should start by deoptimizing the code to work like the above. In light of the below, I'm not sure the race actually exists. That means that the hrtimer could expire, we could choose vcpu 2 (say), IPI to cause a VMEXIT, but by the time it goes to VMENTER the guest has changed something in the (IO)APIC(s) so that the set_irq logic chooses vcpu 3 to do the injection. This would result in a delayed injection that 2f5997 is trying to avoid. Not sure that's the race here. Between the vmexit and vmenter we have the following trace: kvm_inject_pending_timer_irqs() kvm_inject_pit_timer_irqs() __inject_pit_timer_intr() kvm_set_irq() So, even if we end up on the wrong vcpu, we will IPI the right one later. Taking a step back, it seems to me that something along the lines of my previous patchset (where we do set_irq directly from the hrtimer callback) is the right way to go. We would still need to IPI to the appropriate physical cpu to cause a VMEXIT on the cpu we care about, but we would avoid the race I describe in the previous paragraph. I agree. But we also want the vcpu that accepts the interrupt to pull the hrtimer after it (like it does now) to minimize cross-cpu talk. Unfortunately that patchset is also much more risky. I'd like a minimal, backportable patchset first, then bigger changes later. We have at least two choices, irq-safe pit and ioapic, and using a workqueue for to match impedances. The latter is a lot safer since we already do it for assigned devices. diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 3c4f8f6..bc40143 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -231,7 +231,13 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu) { struct kvm_pit *pit = vcpu->kvm->arch.vpit; - if (pit&& kvm_vcpu_is_bsp(vcpu)&& pit->pit_state.irq_ack) + /* We only want to inject a single IRQ for each PIT timer that goes +* off. If 2 vcpus happen to come in here at the same time, we may +* erroneously tell one of them that it has a timer pending. That's +* OK, though; in kvm_inject_pit_timer_irqs we will make sure to only +* let one of