Re: [Qemu-devel] qemu crashed when starting vm(kvm) with vnc connect
On Tue, Apr 02, 2013 at 09:02:02AM +, Zhanghaoyu (A) wrote: > I start a kvm VM with vnc(using the zrle protocol) connect, sometimes qemu > program crashed during starting period, received signal SIGABRT. > Trying about 20 times, this crash may be reproduced. > I guess the cause memory corruption or double free. Which version of QEMU are you running? Please try qemu.git/master. Stefan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 6/6] kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
Am 05.04.2013 um 00:35 schrieb Scott Wood : > On 04/04/2013 05:30:05 PM, Alexander Graf wrote: >> Am 04.04.2013 um 20:41 schrieb Scott Wood : >> > On 04/04/2013 07:54:20 AM, Alexander Graf wrote: >> >> On 03.04.2013, at 03:57, Scott Wood wrote: >> >> > +if (opp->mpic_mode_mask == GCR_MODE_PROXY) >> >> Shouldn't this be an &? >> > >> > The way the mode field was originally documented was a two-bit field, >> > where 0b11 was external proxy, and 0b10 was reserved. If we use & it >> > would have to be: >> > >> >if ((opp->mpic_mode_mask & GCR_MODE_PROXY) == GCR_MODE_PROXY) >> >... >> > >> > Simply testing "opp->mpic_mode_mask & GCR_MODE_PROXY" would return true in >> > the case of GCR_MODE_MIXED. >> > >> > In MPIC 4.3 external proxy is defined as a separate bit (GCR[CI]) that is >> > ignored if the mixed-mode bit (GCR[M]) is not set, which makes it a bit >> > more legitimate to view it as a bitmap. Still, I doubt we'll see new mode >> > bits. >> Ok, please add a comment about this here then :). > > What sort of comment would you like? Or do you want me to use the "(x & y) > == y" version? /* This might need to be changed if GCR gets extended */ > >> >> > @@ -460,6 +464,13 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) >> >> >tasklet_kill(&vcpu->arch.tasklet); >> >> > >> >> >kvmppc_remove_vcpu_debugfs(vcpu); >> >> > + >> >> > +switch (vcpu->arch.irq_type) { >> >> > +case KVMPPC_IRQ_MPIC: >> >> > +kvmppc_mpic_put(vcpu->arch.mpic); >> >> This doesn't tell the MPIC that this exact CPU is getting killed. What if >> >> we hotplug remove just a single CPU? Don't we have to deregister the CPU >> >> with the MPIC? >> > >> > If we ever support hot vcpu removal, yes. We'd probably need some MPIC >> > code changes to accommodate that, and we wouldn't currently have a way to >> > test it, so I'd rather make it obviously not supported for now. >> Is there any way to break heavily if user space attempts this? > > Is there any way for userspace to request this currently? They can close the > vcpu fd, but the vcpu won't actually be destroyed until the vm goes down. Are you sure? X86 does CPU hotplug today, so there has to be something :) Alex > > -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 1/6] kvm: add device control API
On Thu, Apr 04, 2013 at 01:41:35PM +0300, Gleb Natapov wrote: > Since now each device has its own fd is it an advantage to enforce > common interface between different devices? If we do so though why > not handle file creation, ioctl and file descriptor lifetime in the > common code. Common code will have "struct kvm_device" with "struct > kvm_device_arch" and "struct kvm_device_ops" members. Instead of > kvm_mpic_ioctl there will be kvm_device_ioctl which will despatch ioctls > to a device using kvm_device->ops->(set|get|has)_attr pointers. I thought about making the same request, but when I looked at it, the amount of code that could be made common in this way is pretty tiny, and doing that involves a bit of extra complexity, so I thought that on the whole it wouldn't be worthwhile. Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] kvm/arm: Fix missing include build error
Hi, On Thu, 2013-04-04 at 17:15 -0700, Christoffer Dall wrote: > On Thu, Apr 4, 2013 at 5:04 PM, Geoff Levand wrote: > > > Sorry, this one is from Marc's kvm-for-next branch. I'll send it > > to him to include, unless you just want to take it in preparation. > > > Hmm, not sure what the status of his branch is exactly. Does it > contain the arm64 stuff? Yes, it is pretty much his most recent. > I don't really want to merge this before it's needed. OK, that's fine. I'll just add it to a few other 64 bit specific fixups I'll be sending to Marc. -Geoff -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] kvm/arm: Fix missing include build error
On Thu, Apr 4, 2013 at 5:04 PM, Geoff Levand wrote: > Hi Christoffer, > > On Thu, 2013-04-04 at 16:51 -0700, Christoffer Dall wrote: >> On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: >> > Include linux/cpu.h in kvm/arm.c. Fixes build errors like >> > these with ARCH=arm64: >> > >> > arch/arm/kvm/arm.c: error: ‘CPU_STARTING_FROZEN’ undeclared >> > >> > Signed-off-by: Geoff Levand >> > --- >> > arch/arm/kvm/arm.c |1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c >> > index 5a93698..49b4bc6 100644 >> > --- a/arch/arm/kvm/arm.c >> > +++ b/arch/arm/kvm/arm.c >> > @@ -24,6 +24,7 @@ >> > #include >> > #include >> > #include >> > +#include >> > #include >> > #include >> > >> > -- >> > >> >> I'm confused, I don't see this symbol in arm.c - against which tree do >> these patches apply exactly? If this is something introduced by arm64, >> then it's premature, and should be added to that series. > > Sorry, this one is from Marc's kvm-for-next branch. I'll send it > to him to include, unless you just want to take it in preparation. > Hmm, not sure what the status of his branch is exactly. Does it contain the arm64 stuff? I don't really want to merge this before it's needed. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] kvm/arm: Fix missing include build error
Hi Christoffer, On Thu, 2013-04-04 at 16:51 -0700, Christoffer Dall wrote: > On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: > > Include linux/cpu.h in kvm/arm.c. Fixes build errors like > > these with ARCH=arm64: > > > > arch/arm/kvm/arm.c: error: ‘CPU_STARTING_FROZEN’ undeclared > > > > Signed-off-by: Geoff Levand > > --- > > arch/arm/kvm/arm.c |1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > > index 5a93698..49b4bc6 100644 > > --- a/arch/arm/kvm/arm.c > > +++ b/arch/arm/kvm/arm.c > > @@ -24,6 +24,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > > > -- > > > > I'm confused, I don't see this symbol in arm.c - against which tree do > these patches apply exactly? If this is something introduced by arm64, > then it's premature, and should be added to that series. Sorry, this one is from Marc's kvm-for-next branch. I'll send it to him to include, unless you just want to take it in preparation. -Geoff -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] KVM minor fixups
On Thu, Apr 4, 2013 at 4:57 PM, Geoff Levand wrote: > Hi Christoffer, > > On Thu, 2013-04-04 at 16:53 -0700, Christoffer Dall wrote: >> On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: >> > Hi Marcelo, >> > >> > These are a few fixups I found when running sparse and building Marc's 64 >> > bit >> > ARM tree. Please consider for 3.10. >> >> "running sparse" - what does this mean? > > make C=1. It runs the checker, by default, sparse. > > http://en.wikipedia.org/wiki/Sparse > got it, thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] kvm/arm: Remove unused kvm_arch_set_memory_region
On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: > Remove the unused and empty routine kvm_arch_set_memory_region(). > > Signed-off-by: Geoff Levand > --- > arch/arm/kvm/arm.c |8 > 1 file changed, 8 deletions(-) > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index 86becdc..a67b798 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -221,14 +221,6 @@ long kvm_arch_dev_ioctl(struct file *filp, > return -EINVAL; > } > > -int kvm_arch_set_memory_region(struct kvm *kvm, > - struct kvm_userspace_memory_region *mem, > - struct kvm_memory_slot old, > - int user_alloc) > -{ > - return 0; > -} > - > int kvm_arch_prepare_memory_region(struct kvm *kvm, >struct kvm_memory_slot *memslot, >struct kvm_memory_slot old, > -- > 1.7.9.5 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html and I can take this one too. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] KVM minor fixups
Hi Christoffer, On Thu, 2013-04-04 at 16:53 -0700, Christoffer Dall wrote: > On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: > > Hi Marcelo, > > > > These are a few fixups I found when running sparse and building Marc's 64 > > bit > > ARM tree. Please consider for 3.10. > > "running sparse" - what does this mean? make C=1. It runs the checker, by default, sparse. http://en.wikipedia.org/wiki/Sparse -Geoff -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] kvm/arm: Make force_vm_exit static
On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: > The routine force_vm_exit() is not referenced outside kvm/arm.c, > so make it have static linkage. > > Signed-off-by: Geoff Levand > --- > arch/arm/include/asm/kvm_host.h |1 - > arch/arm/kvm/arm.c |2 +- > 2 files changed, 1 insertion(+), 2 deletions(-) > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index d1736a5..1e93cef 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -149,7 +149,6 @@ struct kvm_one_reg; > int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); > int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); > u64 kvm_call_hyp(void *hypfn, ...); > -void force_vm_exit(const cpumask_t *mask); > > #define KVM_ARCH_WANT_MMU_NOTIFIER > struct kvm; > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index 49b4bc6..86becdc 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -400,7 +400,7 @@ static void exit_vm_noop(void *info) > { > } > > -void force_vm_exit(const cpumask_t *mask) > +static void force_vm_exit(const cpumask_t *mask) > { > smp_call_function_many(mask, exit_vm_noop, NULL, true); > } > -- > 1.7.9.5 > > I can take this one. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] KVM minor fixups
On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: > Hi Marcelo, > > These are a few fixups I found when running sparse and building Marc's 64 bit > ARM tree. Please consider for 3.10. "running sparse" - what does this mean? -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7] kvm/arm: Fix missing include build error
On Thu, Apr 4, 2013 at 4:33 PM, Geoff Levand wrote: > Include linux/cpu.h in kvm/arm.c. Fixes build errors like > these with ARCH=arm64: > > arch/arm/kvm/arm.c: error: ‘CPU_STARTING_FROZEN’ undeclared > > Signed-off-by: Geoff Levand > --- > arch/arm/kvm/arm.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c > index 5a93698..49b4bc6 100644 > --- a/arch/arm/kvm/arm.c > +++ b/arch/arm/kvm/arm.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > #include > #include > > -- > I'm confused, I don't see this symbol in arm.c - against which tree do these patches apply exactly? If this is something introduced by arm64, then it's premature, and should be added to that series. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 1/6] kvm: add device control API
On 04/04/2013 05:41:35 AM, Gleb Natapov wrote: On Tue, Apr 02, 2013 at 08:57:48PM -0500, Scott Wood wrote: > +struct kvm_device_attr { > + __u32 flags; /* no flags currently defined */ > + __u32 group; /* device-defined */ > + __u64 attr; /* group-defined */ > + __u64 addr; /* userspace address of attr data */ > +}; > + Since now each device has its own fd is it an advantage to enforce common interface between different devices? I think so, even if only to avoid repeating the various pains surrounding adding ioctls. Not necessarily "enforce", just enable. If a device has some sort of command that does not fit neatly into the "set or get" model, it could still add a new ioctl. If we do so though why not handle file creation, ioctl and file descriptor lifetime in the common code. Common code will have "struct kvm_device" with "struct kvm_device_arch" and "struct kvm_device_ops" members. Instead of kvm_mpic_ioctl there will be kvm_device_ioctl which will despatch ioctls to a device using kvm_device->ops->(set|get|has)_attr pointers. So make it more like the pre-fd version, except for the actual fd usage? It would make destruction a bit simpler (assuming there's no need for vcpu destruction code to access a device). Hopefully nobody asks me to change it back again, though. :-) -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] kvm: Make local routines static
The routines get_user_page_nowait(), kvm_io_bus_sort_cmp(), kvm_io_bus_insert_dev() and kvm_io_bus_get_first_dev() are only referenced within kvm_main.c, so give them static linkage. Fixes sparse warnings like these: virt/kvm/kvm_main.c: warning: symbol 'get_user_page_nowait' was not declared. Should it be static? Signed-off-by: Geoff Levand --- virt/kvm/kvm_main.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index adc68fe..82ca8e2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1099,7 +1099,7 @@ static int kvm_read_hva_atomic(void *data, void __user *hva, int len) return __copy_from_user_inatomic(data, hva, len); } -int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm, +static int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int write, struct page **page) { int flags = FOLL_TOUCH | FOLL_NOWAIT | FOLL_HWPOISON | FOLL_GET; @@ -2631,7 +2631,7 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus) kfree(bus); } -int kvm_io_bus_sort_cmp(const void *p1, const void *p2) +static int kvm_io_bus_sort_cmp(const void *p1, const void *p2) { const struct kvm_io_range *r1 = p1; const struct kvm_io_range *r2 = p2; @@ -2643,7 +2643,7 @@ int kvm_io_bus_sort_cmp(const void *p1, const void *p2) return 0; } -int kvm_io_bus_insert_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev, +static int kvm_io_bus_insert_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev, gpa_t addr, int len) { bus->range[bus->dev_count++] = (struct kvm_io_range) { @@ -2658,7 +2658,7 @@ int kvm_io_bus_insert_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev, return 0; } -int kvm_io_bus_get_first_dev(struct kvm_io_bus *bus, +static int kvm_io_bus_get_first_dev(struct kvm_io_bus *bus, gpa_t addr, int len) { struct kvm_io_range *range, key; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/7] kvm: Move vm_list, kvm_lock dec's out of x86
The variables vm_list and kvm_lock are common to all architectures, so move the declarations from arch/x86/include/asm/kvm_host.h to include/linux/kvm_host.h. Fixes sparse warnings like these when building for arm64: virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static? virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static? Signed-off-by: Geoff Levand --- arch/x86/include/asm/kvm_host.h |3 --- include/linux/kvm_host.h|3 +++ 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4979778..40fc39f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -94,9 +94,6 @@ #define ASYNC_PF_PER_VCPU 64 -extern raw_spinlock_t kvm_lock; -extern struct list_head vm_list; - struct kvm_vcpu; struct kvm; struct kvm_async_pf; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index cad77fe..8ef2212 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -133,6 +133,9 @@ struct kvm; struct kvm_vcpu; extern struct kmem_cache *kvm_vcpu_cache; +extern raw_spinlock_t kvm_lock; +extern struct list_head vm_list; + struct kvm_io_range { gpa_t addr; int len; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] kvm/arm: Make force_vm_exit static
The routine force_vm_exit() is not referenced outside kvm/arm.c, so make it have static linkage. Signed-off-by: Geoff Levand --- arch/arm/include/asm/kvm_host.h |1 - arch/arm/kvm/arm.c |2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index d1736a5..1e93cef 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -149,7 +149,6 @@ struct kvm_one_reg; int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); u64 kvm_call_hyp(void *hypfn, ...); -void force_vm_exit(const cpumask_t *mask); #define KVM_ARCH_WANT_MMU_NOTIFIER struct kvm; diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 49b4bc6..86becdc 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -400,7 +400,7 @@ static void exit_vm_noop(void *info) { } -void force_vm_exit(const cpumask_t *mask) +static void force_vm_exit(const cpumask_t *mask) { smp_call_function_many(mask, exit_vm_noop, NULL, true); } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] kvm: Move kvm_spurious_fault dec out of x86
The routine kvm_spurious_fault() is a common kvm routine, so move its declaration from arch/x86/include/asm/kvm_host.h to arch/arm/include/asm/kvm_host.h. Fixes sparse warning when building on arm64: virt/kvm/kvm_main.c:warning: symbol 'kvm_spurious_fault' was not declared. Should it be static? Signed-off-by: Geoff Levand --- arch/x86/include/asm/kvm_host.h |6 -- include/linux/kvm_host.h|7 +++ 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 40fc39f..da7c126 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -964,12 +964,6 @@ enum { #define HF_IRET_MASK (1 << 4) #define HF_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */ -/* - * Hardware virtualization extension instructions may fault if a - * reboot turns off virtualization while processes are running. - * Trap the fault and ignore the instruction if that happens. - */ -asmlinkage void kvm_spurious_fault(void); extern bool kvm_rebooting; #define kvm_handle_fault_on_reboot(insn, cleanup_insn) \ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 8ef2212..4bfb062 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1031,6 +1031,13 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu) } } +/* + * Hardware virtualization extension instructions may fault if a + * reboot turns off virtualization while processes are running. + * Trap the fault and ignore the instruction if that happens. + */ +asmlinkage void kvm_spurious_fault(void); + #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] kvm: Move kvm_rebooting dec out of x86
The variable kvm_rebooting is a common kvm variable, so move its declaration from arch/x86/include/asm/kvm_host.h to arch/arm/include/asm/kvm_host.h. Fixes sparse warning when building on arm64: virt/kvm/kvm_main.c:warning: symbol 'kvm_rebooting' was not declared. Should it be static? Signed-off-by: Geoff Levand --- arch/x86/include/asm/kvm_host.h |2 -- include/linux/kvm_host.h|1 + 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index da7c126..225139a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -964,8 +964,6 @@ enum { #define HF_IRET_MASK (1 << 4) #define HF_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */ -extern bool kvm_rebooting; - #define kvm_handle_fault_on_reboot(insn, cleanup_insn) \ "666: " insn "\n\t" \ "668: \n\t" \ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4bfb062..7165bd8 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1037,6 +1037,7 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu) * Trap the fault and ignore the instruction if that happens. */ asmlinkage void kvm_spurious_fault(void); +extern bool kvm_rebooting; #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/7] KVM minor fixups
Hi Marcelo, These are a few fixups I found when running sparse and building Marc's 64 bit ARM tree. Please consider for 3.10. -Geoff The following changes since commit 07961ac7c0ee8b546658717034fe692fd12eefa9: Linux 3.9-rc5 (2013-03-31 15:12:43 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/geoff/kvm.git for-kvm for you to fetch changes up to 5d9e775fd2d7c9ab03d70175610a5d8cedcc3799: kvm/arm: Remove unused kvm_arch_set_memory_region (2013-04-04 16:08:28 -0700) Geoff Levand (7): kvm: Make local routines static kvm: Move vm_list, kvm_lock dec's out of x86 kvm: Move kvm_spurious_fault dec out of x86 kvm: Move kvm_rebooting dec out of x86 kvm/arm: Fix missing include build error kvm/arm: Make force_vm_exit static kvm/arm: Remove unused kvm_arch_set_memory_region arch/arm/include/asm/kvm_host.h |1 - arch/arm/kvm/arm.c | 11 ++- arch/x86/include/asm/kvm_host.h | 11 --- include/linux/kvm_host.h| 11 +++ virt/kvm/kvm_main.c |8 5 files changed, 17 insertions(+), 25 deletions(-) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/7] kvm/arm: Fix missing include build error
Include linux/cpu.h in kvm/arm.c. Fixes build errors like these with ARCH=arm64: arch/arm/kvm/arm.c: error: ‘CPU_STARTING_FROZEN’ undeclared Signed-off-by: Geoff Levand --- arch/arm/kvm/arm.c |1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 5a93698..49b4bc6 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 7/7] kvm/arm: Remove unused kvm_arch_set_memory_region
Remove the unused and empty routine kvm_arch_set_memory_region(). Signed-off-by: Geoff Levand --- arch/arm/kvm/arm.c |8 1 file changed, 8 deletions(-) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index 86becdc..a67b798 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -221,14 +221,6 @@ long kvm_arch_dev_ioctl(struct file *filp, return -EINVAL; } -int kvm_arch_set_memory_region(struct kvm *kvm, - struct kvm_userspace_memory_region *mem, - struct kvm_memory_slot old, - int user_alloc) -{ - return 0; -} - int kvm_arch_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, struct kvm_memory_slot old, -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 5/6] kvm/ppc/mpic: in-kernel MPIC emulation
On 04/04/2013 12:59:02 AM, Gleb Natapov wrote: On Wed, Apr 03, 2013 at 03:58:04PM -0500, Scott Wood wrote: > KVM_DEV_MPIC_* could go elsewhere if you want to avoid cluttering > the main kvm.h. The arch header would be OK, since the non-arch > header includes the arch header, and thus it wouldn't be visible to > userspace where it is -- if there later is a need for MPIC (or > whatever other device follows MPIC's example) on another > architecture, it could be moved without breaking anything. Or, we > could just have a header for each device type. > If device will be used by more then one arch it will move into virt/kvm and will have its own header, like ioapic. virt/kvm/ioapic.h is not uapi. The ioapic uapi component (e.g. struct kvm_ioapic_state) is duplicated between x86 and ia64, which is the sort of thing I'd like to avoid. I'm OK with putting it in the PPC header if, upon a later need for multi-architecture support, it could move into either the main uapi header or a separate uapi header that the main uapi header includes (i.e. no userspace-visible change in which header needs to be included). -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
[...] >> >> to give us some idea how much performance we would gain from each >> >> approach? Thoughput should be completely unaffected anyway, since virtio >> >> just coalesces kicks internally. >> > >> > Latency is dominated by the scheduling latency. >> > This means virtio-net is not the best benchmark. >> >> So what is a good benchmark? > > E.g. ping pong stress will do but need to look at CPU utilization, > that's what is affected, not latency. > >> Is there any difference in speed at all? I strongly doubt it. One of >> virtio's main points is to reduce the number of kicks. > > For this stage of the project I think microbenchmarks are more appropriate. > Doubling the price of exit is likely to be measureable. 30 cycles likely > not ... > I don't quite understand this point here. If we don't have anything real-world where we can measure a decent difference, then why are we doing this? I would agree with Alex that the three test scenarios proposed by him should be tried out before adding this complexity, measured in CPU utilization or latency as you wish. FWIW, ARM always uses MMIO and provides hardware decoding of all sane (not user register-writeback) instruction, but the hypercall vs. mmio looks like this: hvc: 4,917 mmio_kernel: 6,248 But I doubt that an hvc wrapper around mmio decoding would take care of all this difference, because the mmio operation needs to do other work not realated to emulating the instruction in software, which you'd have to do for an hvc anyway (populate kvm_mmio structure etc.) -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: RFC: vfio API changes needed for powerpc (v2)
> -Original Message- > From: Wood Scott-B07421 > Sent: Thursday, April 04, 2013 5:52 PM > To: Yoder Stuart-B08248 > Cc: Alex Williamson; Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; > Sethi Varun-B16395; > kvm@vger.kernel.org; qemu-de...@nongnu.org; io...@lists.linux-foundation.org > Subject: Re: RFC: vfio API changes needed for powerpc (v2) > > On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote: > > > > > > /* > > > > * VFIO_PAMU_MAP_MSI_BANK > > > > * > > > > * Maps the MSI bank at the specified index and iova. User space > > must > > > > * call this ioctl once for each MSI bank (count of banks is > > returned by > > > > * VFIO_IOMMU_GET_MSI_BANK_COUNT). > > > > * Caller provides struct vfio_pamu_msi_bank_map with all fields > > set. > > > > * Operates on VFIO file descriptor (/dev/vfio/vfio). > > > > * Return: 0 on success, -errno on failure > > > > */ > > > > > > > > struct vfio_pamu_msi_bank_map { > > > > __u32 argsz; > > > > __u32 msi_bank_index; /* the index of the MSI bank */ > > > > __u64 iova; /* the iova the bank is to be mapped > > to */ > > > > }; > > > > > > Again, flags. If we dynamically add or remove devices from a > > container > > > the bank count can change, right? If bank count goes from 2 to 3, > > does > > > userspace know assume the new bank is [2]? If bank count goes from > > 3 to > > > 2, which index was that? If removing a device removes an MSI bank > > then > > > vfio-pamu will automatically do the unmap, right? > > > > My assumption is that the bank count returned by > > VFIO_IOMMU_GET_MSI_BANK_COUNT > > is the max bank count for a platform. (number will mostly likely > > always be > > 3 or 4). So it won't change as devices are added or removed. > > It should be the actual number of banks used. This is required if > we're going to have userspace do the iteration and specify the exact > iovas to use -- and even if we aren't going to do that, it would be > more restrictive on available iova-space than is necessary. Usually > there will be only one bank in the group. > > Actually mapping all of the MSI banks, all the time, would completely > negate the point of using the separate alias pages. The geometry, windows, DMA mappings, etc is set on a 'container' which may have multiple groups in it. So user space needs to determine the total number of MSI windows needed when setting the geometry and window count. In the flow you proposed: count = VFIO_IOMMU_GET_MSI_BANK_COUNT VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // do other DMA maps now, or later, or not at all, doesn't matter for (i = 0; i < count; i++) VFIO_IOMMU_MAP_MSI_BANK(iova, i); ...that "count" has to be the total, not the count for 1 of N possible groups. So the get count ioctl is not done on a group. However, like you pointed out we don't want to negate isolation of the separate alias pages. All this API is doing is telling the kernel which windows to use for which MSI banks. It's up to the kernel to actually enable them as needed. Say 3 MSI banks exist. If there are no groups added to the vfio container and all 3 MAP_MSI_BANK calls occurred the picture may look like this (based on my earlier example): win gphys/ # enabled iovaphys size --- --- 5 N 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 N 0x1800 0xf_fe046000 4KB// msi bank 3 User space adds 2 groups that use bank 1: win gphys/ # enabled iovaphys size --- --- 5 Y 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 N 0x1800 0xf_fe046000 4KB// msi bank 3 User space adds another group that uses bank 3: win gphys/ # enabled iovaphys size --- --- 5 Y 0x1000 0xf_fe044000 4KB// msi bank 1 6 N 0x1400 0xf_fe045000 4KB// msi bank 2 7 Y 0x1800 0xf_fe046000 4KB// msi bank 3 User space doesn't need to care what is actually enabled, it just needs to the the kernel which windows to use and the kernel can take care of the rest. Stuart -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio API changes needed for powerpc (v2)
On 04/04/2013 04:38:44 PM, Yoder Stuart-B08248 wrote: > > /* > > * VFIO_PAMU_MAP_MSI_BANK > > * > > * Maps the MSI bank at the specified index and iova. User space must > > * call this ioctl once for each MSI bank (count of banks is returned by > > * VFIO_IOMMU_GET_MSI_BANK_COUNT). > > * Caller provides struct vfio_pamu_msi_bank_map with all fields set. > > * Operates on VFIO file descriptor (/dev/vfio/vfio). > > * Return: 0 on success, -errno on failure > > */ > > > > struct vfio_pamu_msi_bank_map { > > __u32 argsz; > > __u32 msi_bank_index; /* the index of the MSI bank */ > > __u64 iova; /* the iova the bank is to be mapped to */ > > }; > > Again, flags. If we dynamically add or remove devices from a container > the bank count can change, right? If bank count goes from 2 to 3, does > userspace know assume the new bank is [2]? If bank count goes from 3 to > 2, which index was that? If removing a device removes an MSI bank then > vfio-pamu will automatically do the unmap, right? My assumption is that the bank count returned by VFIO_IOMMU_GET_MSI_BANK_COUNT is the max bank count for a platform. (number will mostly likely always be 3 or 4). So it won't change as devices are added or removed. It should be the actual number of banks used. This is required if we're going to have userspace do the iteration and specify the exact iovas to use -- and even if we aren't going to do that, it would be more restrictive on available iova-space than is necessary. Usually there will be only one bank in the group. Actually mapping all of the MSI banks, all the time, would completely negate the point of using the separate alias pages. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls
On Thu, Apr 04, 2013 at 11:49:55AM +0200, Alexander Graf wrote: > > On 04.04.2013, at 07:37, Paul Mackerras wrote: > > > On Thu, Mar 21, 2013 at 09:52:16AM +0100, Alexander Graf wrote: > >>> +/* Platform specific hcalls, used by KVM */ > >>> +#define H_RTAS 0xf000 > >> > >> How about you define a different hcall ID for this? Then QEMU would > >> create its "rtas entry blob" such that KVM-routed RTAS handling goes > >> to KVM directly. > > > > QEMU can still do that, and I don't see that it would change the > > kernel side if it did. We would still have to have agreement between > > the kernel and userspace as to what the hcall number for invoking the > > in-kernel RTAS calls was, and the kernel would still have to keep a > > list of token numbers and how they correspond to the functions it > > provides. The only thing different would be that the in-kernel RTAS > > hcall could return to the guest if it didn't recognize the token > > number, rather than pushing the problem up to userspace. However, > > that wouldn't make the code any simpler, and it isn't a situation > > where performance is an issue. > > > > Do you see some kernel-side improvements or simplifications from your > > suggestion that I'm missing? Remember, the guest gets the token > > numbers from the device tree (properties under the /rtas node), so > > they are under the control of userspace/QEMU. > > The code flow with this patch: > > > > foreach (override in overrides) > ioctl(OVERRIDE_RTAS, ...); > > > > switch (hcall_id) { > case QEMU_RTAS_ID: > foreach (override in kvm_overrides) { > int rtas_id = ...; > if (override.rtas_id == rtas_id) { > handle_rtas(); Actually this is more like: override.handler(); > handled = true; > } > } > if (!handled) > pass_to_qemu(); > break; > default: > pass_to_qemu(); > break > } > > What I'm suggesting: > > > > nothing from KVM's point of view Actually, this can't be "nothing". The way the RTAS calls work is that there is a name and a "token" (32-bit integer value) for each RTAS call. The tokens have to be unique for each different name. Userspace puts the names and tokens in the device tree under the /rtas node (a set of properties where the property name is the RTAS function name and the property value is the token). The guest looks up the token for each RTAS function it wants to use, and passes the token in the argument buffer for the RTAS call. This means that userspace has to know the names and tokens for all supported RTAS functions, both the ones implemented in the kernel and the ones implemented in userspace. Also, the token numbers are pretty arbitrary, and the token numbers for the kernel-implemented RTAS functions could be chosen by userspace or by the kernel. If they're chosen by the kernel, then userspace needs a way to discover them (so it can put them in the device tree), and also has to avoid choosing any token numbers for its functions that collide with a kernel-chosen token. If userspace chooses the token numbers, it has to tell the kernel what token numbers it has chosen for the kernel-implemented RTAS functions. We chose the latter since it gives userspace more control. So this code has to be either (your suggestion): foreach RTAS function possibly implemented in kernel { query kernel token for function, by name if that gives an error, mark function as needing to be implemented in userspace } (userspace) allocate tokens for remaining functions, avoiding collisions with kernel-chosen tokens or else it is (my suggestion): (userspace) allocate tokens for all RTAS functions foreach RTAS function possibly implemented in kernel { tell kernel the (name, token) correspondence } > > > switch (hcall_id) { > case KVM_RTAS_ID: > handle_rtas(); Here, you've compressed details that you expanded in your pseudo-code above, making this a less than fair comparison. This handle_rtas() function has to fetch the token and branch out to the appropriate handler routine. Whether that's a switch statement or a loop over registered handlers doesn't make all that much difference. > break; > default: > pass_to_qemu(); > break; > } > > > Which one looks easier and less error prone to you? :) > > Speaking of which, how does user space know that the kernel actually > supports a specific RTAS token? It's really the names that are more important, the tokens are pretty arbitrary. In my scheme, userspace does a KVM_PPC_RTAS_DEFINE_TOKEN ioctl giving the name and the (userspace-chosen) token, which gets an error if the kernel doesn't recognize the name. In your scheme, there would have to be an equivalent ioctl to query the (kernel-chosen) token for a given name, which once again would return an error if the kernel doesn't recognize the name. Either way the kernel
Re: [RFC PATCH v3 6/6] kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
On 04/04/2013 05:30:05 PM, Alexander Graf wrote: Am 04.04.2013 um 20:41 schrieb Scott Wood : > On 04/04/2013 07:54:20 AM, Alexander Graf wrote: >> On 03.04.2013, at 03:57, Scott Wood wrote: >> > +if (opp->mpic_mode_mask == GCR_MODE_PROXY) >> Shouldn't this be an &? > > The way the mode field was originally documented was a two-bit field, where 0b11 was external proxy, and 0b10 was reserved. If we use & it would have to be: > >if ((opp->mpic_mode_mask & GCR_MODE_PROXY) == GCR_MODE_PROXY) >... > > Simply testing "opp->mpic_mode_mask & GCR_MODE_PROXY" would return true in the case of GCR_MODE_MIXED. > > In MPIC 4.3 external proxy is defined as a separate bit (GCR[CI]) that is ignored if the mixed-mode bit (GCR[M]) is not set, which makes it a bit more legitimate to view it as a bitmap. Still, I doubt we'll see new mode bits. Ok, please add a comment about this here then :). What sort of comment would you like? Or do you want me to use the "(x & y) == y" version? >> > @@ -460,6 +464,13 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) >> >tasklet_kill(&vcpu->arch.tasklet); >> > >> >kvmppc_remove_vcpu_debugfs(vcpu); >> > + >> > +switch (vcpu->arch.irq_type) { >> > +case KVMPPC_IRQ_MPIC: >> > +kvmppc_mpic_put(vcpu->arch.mpic); >> This doesn't tell the MPIC that this exact CPU is getting killed. What if we hotplug remove just a single CPU? Don't we have to deregister the CPU with the MPIC? > > If we ever support hot vcpu removal, yes. We'd probably need some MPIC code changes to accommodate that, and we wouldn't currently have a way to test it, so I'd rather make it obviously not supported for now. Is there any way to break heavily if user space attempts this? Is there any way for userspace to request this currently? They can close the vcpu fd, but the vcpu won't actually be destroyed until the vm goes down. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 6/6] kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
Am 04.04.2013 um 20:41 schrieb Scott Wood : > On 04/04/2013 07:54:20 AM, Alexander Graf wrote: >> On 03.04.2013, at 03:57, Scott Wood wrote: >> > +if (opp->mpic_mode_mask == GCR_MODE_PROXY) >> Shouldn't this be an &? > > The way the mode field was originally documented was a two-bit field, where > 0b11 was external proxy, and 0b10 was reserved. If we use & it would have to > be: > >if ((opp->mpic_mode_mask & GCR_MODE_PROXY) == GCR_MODE_PROXY) >... > > Simply testing "opp->mpic_mode_mask & GCR_MODE_PROXY" would return true in > the case of GCR_MODE_MIXED. > > In MPIC 4.3 external proxy is defined as a separate bit (GCR[CI]) that is > ignored if the mixed-mode bit (GCR[M]) is not set, which makes it a bit more > legitimate to view it as a bitmap. Still, I doubt we'll see new mode bits. Ok, please add a comment about this here then :). > >> > @@ -460,6 +464,13 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) >> >tasklet_kill(&vcpu->arch.tasklet); >> > >> >kvmppc_remove_vcpu_debugfs(vcpu); >> > + >> > +switch (vcpu->arch.irq_type) { >> > +case KVMPPC_IRQ_MPIC: >> > +kvmppc_mpic_put(vcpu->arch.mpic); >> This doesn't tell the MPIC that this exact CPU is getting killed. What if we >> hotplug remove just a single CPU? Don't we have to deregister the CPU with >> the MPIC? > > If we ever support hot vcpu removal, yes. We'd probably need some MPIC code > changes to accommodate that, and we wouldn't currently have a way to test it, > so I'd rather make it obviously not supported for now. Is there any way to break heavily if user space attempts this? Alex > > -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] ARM: KVM: switch to a dual-step HYP init code
Hi, On Tue, 2013-04-02 at 14:25 +0100, Marc Zyngier wrote: > + @ Jump to the trampoline page > + ldr r2, =#PAGE_MASK > + adr r3, target > + bic r3, r3, r2 > + ldr r2, =#TRAMPOLINE_VA > + add r3, r3, r2 > + mov pc, r3 I guess you need 'ldr r2, =PAGE_MASK'. arch/arm/kvm/init.S:114: Error: bad expression -- `ldr r2,=#(~((1<<12)-1))' arch/arm/kvm/init.S:117: Error: bad expression -- `ldr r2,=#0x' -Geoff -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC: vfio API changes needed for powerpc (v3)
-v3 updates -made vfio_pamu_attr a union, added flags -s/VFIO_PAMU_/VFIO_IOMMU_PAMU_/ for the ioctls to make it more clear which fd is being operated on -added flags to vfio_pamu_msi_bank_map/umap -VFIO_PAMU_GET_MSI_BANK_COUNT now just returns a __u32 not a struct -fixed some typos The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova->phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must power of 2 -physical address of the mapping must be naturally aligned with the mapping size These ioctls operate on the VFIO file descriptor (/dev/vfio/vfio). /* * VFIO_IOMMU_PAMU_GET_ATTR * * Gets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets argsz and attribute. The ioctl fills in * the provided struct vfio_pamu_attr based on the attribute * value that was set. * Return: 0 on success, -errno on failure */ struct vfio_pamu_attr { __u32 argsz; __u32 flags;/* no flags currently */ __u32 attribute; union { /* VFIO_ATTR_GEOMETRY */ struct { __u64 aperture_start; /* first addr that can be mapped */ __u64 aperture_end; /* last addr that can be mapped */ } attr; /* VFIO_ATTR_WINDOWS */ __u32 windows; /* number of windows in the aperture */ /* initially this will be the max number * of windows that can be set */ /* VFIO_ATTR_PAMU_STASH */ struct { __u32 cpu; /* CPU number for stashing */ __u32 cache; /* cache ID for stashing */ } stash; } }; #define VFIO_IOMMU_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_IOMMU_PAMU_SET_ATTR * * Sets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets struct vfio_pamu attr, including argsz and attribute and * setting any fields that are valid for the attribute. * Return: 0 on success, -errno on failure */ #define VFIO_IOMMU_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT * * Returns the number of MSI banks for this platform. This tells user space * how many aperture windows should be reserved for MSI banks when setting * the PAMU geometry and window count. * Return: __u32 bank count on success, -errno on failure */ #define VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, __u32) /* * VFIO_IOMMU_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_PAMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 flags; /* no flags currently */ __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; #define VFIO_IOMMU_PAMU_MAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_bank_map ) /* * VFIO_IOMMU_PAMU_UNMAP_MSI_BANK * * Unmaps the MSI bank at the specified iova. * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_unmap { __u32 argsz; __u32 flags; /* no flags currently */ __u64 iova; /* the iova to be unmapped to */ }; #define VFIO_IOMMU_PAMU_UNMAP_MSI_BANK _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_bank_unmap ) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: RFC: vfio API changes needed for powerpc (v2)
> > /* > > * VFIO_PAMU_MAP_MSI_BANK > > * > > * Maps the MSI bank at the specified index and iova. User space must > > * call this ioctl once for each MSI bank (count of banks is returned by > > * VFIO_IOMMU_GET_MSI_BANK_COUNT). > > * Caller provides struct vfio_pamu_msi_bank_map with all fields set. > > * Operates on VFIO file descriptor (/dev/vfio/vfio). > > * Return: 0 on success, -errno on failure > > */ > > > > struct vfio_pamu_msi_bank_map { > > __u32 argsz; > > __u32 msi_bank_index; /* the index of the MSI bank */ > > __u64 iova; /* the iova the bank is to be mapped to */ > > }; > > Again, flags. If we dynamically add or remove devices from a container > the bank count can change, right? If bank count goes from 2 to 3, does > userspace know assume the new bank is [2]? If bank count goes from 3 to > 2, which index was that? If removing a device removes an MSI bank then > vfio-pamu will automatically do the unmap, right? My assumption is that the bank count returned by VFIO_IOMMU_GET_MSI_BANK_COUNT is the max bank count for a platform. (number will mostly likely always be 3 or 4). So it won't change as devices are added or removed. If devices are added or removed, the kernel side can enable or disable the corresponding MSI windows. But that is hidden from user space. Stuart
Re: [PATCH v2] KVM: Call kvm_apic_match_dest() to check destination vcpu
On Wed, Apr 03, 2013 at 12:34:37PM +0300, Gleb Natapov wrote: > On Wed, Apr 03, 2013 at 08:47:58AM +0800, Yang Zhang wrote: > > From: Yang Zhang > > > > For a given vcpu, kvm_apic_match_dest() will tell you whether > > the vcpu in the destination list quickly. Drop kvm_calculate_eoi_exitmap() > > and use kvm_apic_match_dest() instead. > > > > Signed-off-by: Yang Zhang > Applied, thanks. > The patch does not compile, dropped. > > --- > > arch/x86/kvm/lapic.c | 47 --- > > arch/x86/kvm/lapic.h |4 > > virt/kvm/ioapic.c|9 +++-- > > 3 files changed, 3 insertions(+), 57 deletions(-) > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > > index a8e9369..e227474 100644 > > --- a/arch/x86/kvm/lapic.c > > +++ b/arch/x86/kvm/lapic.c > > @@ -145,53 +145,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic) > > return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff; > > } > > > > -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu, > > - struct kvm_lapic_irq *irq, > > - u64 *eoi_exit_bitmap) > > -{ > > - struct kvm_lapic **dst; > > - struct kvm_apic_map *map; > > - unsigned long bitmap = 1; > > - int i; > > - > > - rcu_read_lock(); > > - map = rcu_dereference(vcpu->kvm->arch.apic_map); > > - > > - if (unlikely(!map)) { > > - __set_bit(irq->vector, (unsigned long *)eoi_exit_bitmap); > > - goto out; > > - } > > - > > - if (irq->dest_mode == 0) { /* physical mode */ > > - if (irq->delivery_mode == APIC_DM_LOWEST || > > - irq->dest_id == 0xff) { > > - __set_bit(irq->vector, > > - (unsigned long *)eoi_exit_bitmap); > > - goto out; > > - } > > - dst = &map->phys_map[irq->dest_id & 0xff]; > > - } else { > > - u32 mda = irq->dest_id << (32 - map->ldr_bits); > > - > > - dst = map->logical_map[apic_cluster_id(map, mda)]; > > - > > - bitmap = apic_logical_id(map, mda); > > - } > > - > > - for_each_set_bit(i, &bitmap, 16) { > > - if (!dst[i]) > > - continue; > > - if (dst[i]->vcpu == vcpu) { > > - __set_bit(irq->vector, > > - (unsigned long *)eoi_exit_bitmap); > > - break; > > - } > > - } > > - > > -out: > > - rcu_read_unlock(); > > -} > > - > > static void recalculate_apic_map(struct kvm *kvm) > > { > > struct kvm_apic_map *new, *old = NULL; > > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > > index 2c721b9..baa20cf 100644 > > --- a/arch/x86/kvm/lapic.h > > +++ b/arch/x86/kvm/lapic.h > > @@ -160,10 +160,6 @@ static inline u16 apic_logical_id(struct kvm_apic_map > > *map, u32 ldr) > > return ldr & map->lid_mask; > > } > > > > -void kvm_calculate_eoi_exitmap(struct kvm_vcpu *vcpu, > > - struct kvm_lapic_irq *irq, > > - u64 *eoi_bitmap); > > - > > static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu) > > { > > return vcpu->arch.apic->pending_events; > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > > index 5ba005c..bb3906d 100644 > > --- a/virt/kvm/ioapic.c > > +++ b/virt/kvm/ioapic.c > > @@ -124,7 +124,6 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu > > *vcpu, > > { > > struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; > > union kvm_ioapic_redirect_entry *e; > > - struct kvm_lapic_irq irqe; > > int index; > > > > spin_lock(&ioapic->lock); > > @@ -135,11 +134,9 @@ void kvm_ioapic_calculate_eoi_exitmap(struct kvm_vcpu > > *vcpu, > > (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || > > kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, > > index))) { > > - irqe.dest_id = e->fields.dest_id; > > - irqe.vector = e->fields.vector; > > - irqe.dest_mode = e->fields.dest_mode; > > - irqe.delivery_mode = e->fields.delivery_mode << 8; > > - kvm_calculate_eoi_exitmap(vcpu, &irqe, eoi_exit_bitmap); > > + if (kvm_apic_match_dest(vcpu, NULL, 0, > > + e->fields.dest_id, e->fields.dest_mode)) > > + __set_bit(e->fileds.vector, (unsigned long > > *)eoi_exit_bitmap); > > } > > } > > spin_unlock(&ioapic->lock); > > -- > > 1.7.1 > > -- > Gleb. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More major
[PATCH] KVM: x86: Fix memory leak in vmx.c
If userspace creates and destroys multiple VMs within the same process we leak 20k of memory in the userspace process context per VM. This patch frees the memory in kvm_arch_destroy_vm. If the process exits without closing the VM file descriptor or the file descriptor has been shared with another process then we don't need to free the memory. Messing with user space memory from an fd is not ideal, but other changes would require user space changes and this is consistent with how the memory is currently allocated. Tested: Test ran several VMs and ran against test program meant to demonstrate the leak (www.spinics.net/lists/kvm/msg83734.html). Signed-off-by: Andrew Honig --- arch/x86/include/asm/kvm_host.h |3 +++ arch/x86/kvm/vmx.c |3 +++ arch/x86/kvm/x86.c | 11 +++ 3 files changed, 17 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4979778..975a74d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -553,6 +553,9 @@ struct kvm_arch { struct page *ept_identity_pagetable; bool ept_identity_pagetable_done; gpa_t ept_identity_map_addr; + unsigned long ept_ptr; + unsigned long apic_ptr; + unsigned long tss_ptr; unsigned long irq_sources_bitmap; s64 kvmclock_offset; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6667042..8aa5d81 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3703,6 +3703,7 @@ static int alloc_apic_access_page(struct kvm *kvm) } kvm->arch.apic_access_page = page; + kvm->arch.apic_ptr = kvm_userspace_mem.userspace_addr; out: mutex_unlock(&kvm->slots_lock); return r; @@ -3733,6 +3734,7 @@ static int alloc_identity_pagetable(struct kvm *kvm) } kvm->arch.ept_identity_pagetable = page; + kvm->arch.ept_ptr = kvm_userspace_mem.userspace_addr; out: mutex_unlock(&kvm->slots_lock); return r; @@ -4366,6 +4368,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr) if (ret) return ret; kvm->arch.tss_addr = addr; + kvm->arch.tss_ptr = tss_mem.userspace_addr; if (!init_rmode_tss(kvm)) return -ENOMEM; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f19ac0a..411ff2a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6812,6 +6812,16 @@ void kvm_arch_sync_events(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { + if (current->mm == kvm->mm) { + /* +* Free pages allocated on behalf of userspace, unless the +* the memory map has changed due to process exit or fd +* copying. +*/ + vm_munmap(kvm->arch.apic_ptr, PAGE_SIZE); + vm_munmap(kvm->arch.ept_ptr, PAGE_SIZE); + vm_munmap(kvm->arch.tss_ptr, PAGE_SIZE * 3); + } kvm_iommu_unmap_guest(kvm); kfree(kvm->arch.vpic); kfree(kvm->arch.vioapic); @@ -6929,6 +6939,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, return PTR_ERR((void *)userspace_addr); memslot->userspace_addr = userspace_addr; + mem->userspace_addr = userspace_addr; } return 0; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC: vfio API changes needed for powerpc (v2)
On Thu, 2013-04-04 at 17:32 +, Yoder Stuart-B08248 wrote: > Based on the email thread over the last couple of days, I have > below an more concrete proposal (v2) for new ioctls supporting vfio-pci > on SoCs with the Freescale PAMU. > > Example usage is as described by Scott: > > count = VFIO_IOMMU_GET_MSI_BANK_COUNT > VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) > VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) > // do other DMA maps now, or later, or not at all, doesn't matter > for (i = 0; i < count; i++) > VFIO_IOMMU_MAP_MSI_BANK(iova, i); > // The kernel now knows where each bank has been mapped, and can > // update PCI config space appropriately. > > Thanks, > Stuart > > > > The Freescale PAMU is an aperture-based IOMMU with the following > characteristics. Each device has an entry in a table in memory > describing the iova->phys mapping. The mapping has: >-an overall aperture that is power of 2 sized, and has a start iova that > is naturally aligned >-has 1 or more windows within the aperture > -number of windows must be power of 2, max is 256 > -size of each window is determined by aperture size / # of windows > -iova of each window is determined by aperture start iova / # of windows > -the mapped region in each window can be different than >the window size...mapping must power of 2 > -physical address of the mapping must be naturally aligned >with the mapping size > > /* > * VFIO_PAMU_GET_ATTR > * > * Gets the iommu attributes for the current vfio container. This > * ioctl is applicable to an iommu type of VFIO_PAMU only. > * Caller sets argsz and attribute. The ioctl fills in > * the provided struct vfio_pamu_attr based on the attribute > * value that was set. > * Operates on VFIO file descriptor (/dev/vfio/vfio). > * Return: 0 on success, -errno on failure > */ > struct vfio_pamu_attr { > __u32 argsz; > __u32 attribute; > #define VFIO_ATTR_GEOMETRY 1 > #define VFIO_ATTR_WINDOWS 2 > #define VFIO_ATTR_PAMU_STASH 3 > > /* fowlling fields are for VFIO_ATTR_GEOMETRY */ > __u64 aperture_start; /* first address that can be mapped*/ > __u64 aperture_end; /* last address that can be mapped */ > > /* follwing fields are for VFIO_ATTR_WINDOWS */ > __u32 windows;/* number of windows in the aperture */ > /* initially this will be the max number >* of windows that can be set >*/ > > /* following fields are for VFIO_ATTR_PAMU_STASH */ > __u32 cpu;/* CPU number for stashing */ > __u32 cache; /* cache ID for stashing */ Can multiple attribute bits be set at once? If not then the above could be a union and the attribute could be an enum. A flags field is probably still a good idea. > }; > #define VFIO_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, > struct vfio_pamu_attr) > > /* > * VFIO_PAMU_SET_ATTR > * > * Sets the iommu attributes for the current vfio container. This > * ioctl is applicable to an iommu type of VFIO_PAMU only. > * Caller sets struct vfio_pamu attr, including argsz and attribute and > * setting any fields that are valid for the attribute. > * Operates on VFIO file descriptor (/dev/vfio/vfio). > * Return: 0 on success, -errno on failure > */ > #define VFIO_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, > struct vfio_pamu_attr) > > /* > * VFIO_PAMU_GET_MSI_BANK_COUNT > * > * Returns the number of MSI banks for this platform. This tells user space > * how many aperture windows should be reserved for MSI banks when setting > * the PAMU geometry and window count. > * Fills in provided struct vfio_pamu_msi_banks. Caller sets argsz. > * Operates on VFIO file descriptor (/dev/vfio/vfio). > * Return: 0 on success, -errno on failure > */ > struct vfio_pamu_msi_banks { > __u32 argsz; > __u32 bank_count; /* the number of MSI > }; > #define VFIO_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, > struct vfio_pamu_msi_banks) argsz w/o flags doesn't really buy us much. > > /* > * VFIO_PAMU_MAP_MSI_BANK > * > * Maps the MSI bank at the specified index and iova. User space must > * call this ioctl once for each MSI bank (count of banks is returned by > * VFIO_IOMMU_GET_MSI_BANK_COUNT). > * Caller provides struct vfio_pamu_msi_bank_map with all fields set. > * Operates on VFIO file descriptor (/dev/vfio/vfio). > * Return: 0 on success, -errno on failure > */ > > struct vfio_pamu_msi_bank_map { > __u32 argsz; > __u32 msi_bank_index; /* the index of the MSI bank */ > __u64 iova; /* the iova the bank is to be mapped to */ > }; Again, flags. If we dynamically add or remove devices from a container the bank count can change
Re: [RFC PATCH v3 6/6] kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
On 04/04/2013 07:54:20 AM, Alexander Graf wrote: On 03.04.2013, at 03:57, Scott Wood wrote: > + if (opp->mpic_mode_mask == GCR_MODE_PROXY) Shouldn't this be an &? The way the mode field was originally documented was a two-bit field, where 0b11 was external proxy, and 0b10 was reserved. If we use & it would have to be: if ((opp->mpic_mode_mask & GCR_MODE_PROXY) == GCR_MODE_PROXY) ... Simply testing "opp->mpic_mode_mask & GCR_MODE_PROXY" would return true in the case of GCR_MODE_MIXED. In MPIC 4.3 external proxy is defined as a separate bit (GCR[CI]) that is ignored if the mixed-mode bit (GCR[M]) is not set, which makes it a bit more legitimate to view it as a bitmap. Still, I doubt we'll see new mode bits. > @@ -460,6 +464,13 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) >tasklet_kill(&vcpu->arch.tasklet); > >kvmppc_remove_vcpu_debugfs(vcpu); > + > + switch (vcpu->arch.irq_type) { > + case KVMPPC_IRQ_MPIC: > + kvmppc_mpic_put(vcpu->arch.mpic); This doesn't tell the MPIC that this exact CPU is getting killed. What if we hotplug remove just a single CPU? Don't we have to deregister the CPU with the MPIC? If we ever support hot vcpu removal, yes. We'd probably need some MPIC code changes to accommodate that, and we wouldn't currently have a way to test it, so I'd rather make it obviously not supported for now. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC: vfio API changes needed for powerpc (v2)
Based on the email thread over the last couple of days, I have below an more concrete proposal (v2) for new ioctls supporting vfio-pci on SoCs with the Freescale PAMU. Example usage is as described by Scott: count = VFIO_IOMMU_GET_MSI_BANK_COUNT VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // do other DMA maps now, or later, or not at all, doesn't matter for (i = 0; i < count; i++) VFIO_IOMMU_MAP_MSI_BANK(iova, i); // The kernel now knows where each bank has been mapped, and can // update PCI config space appropriately. Thanks, Stuart The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova->phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must power of 2 -physical address of the mapping must be naturally aligned with the mapping size /* * VFIO_PAMU_GET_ATTR * * Gets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets argsz and attribute. The ioctl fills in * the provided struct vfio_pamu_attr based on the attribute * value that was set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_attr { __u32 argsz; __u32 attribute; #define VFIO_ATTR_GEOMETRY 1 #define VFIO_ATTR_WINDOWS 2 #define VFIO_ATTR_PAMU_STASH 3 /* fowlling fields are for VFIO_ATTR_GEOMETRY */ __u64 aperture_start; /* first address that can be mapped*/ __u64 aperture_end; /* last address that can be mapped */ /* follwing fields are for VFIO_ATTR_WINDOWS */ __u32 windows;/* number of windows in the aperture */ /* initially this will be the max number * of windows that can be set */ /* following fields are for VFIO_ATTR_PAMU_STASH */ __u32 cpu;/* CPU number for stashing */ __u32 cache; /* cache ID for stashing */ }; #define VFIO_PAMU_GET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_PAMU_SET_ATTR * * Sets the iommu attributes for the current vfio container. This * ioctl is applicable to an iommu type of VFIO_PAMU only. * Caller sets struct vfio_pamu attr, including argsz and attribute and * setting any fields that are valid for the attribute. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ #define VFIO_PAMU_SET_ATTR _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_attr) /* * VFIO_PAMU_GET_MSI_BANK_COUNT * * Returns the number of MSI banks for this platform. This tells user space * how many aperture windows should be reserved for MSI banks when setting * the PAMU geometry and window count. * Fills in provided struct vfio_pamu_msi_banks. Caller sets argsz. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_banks { __u32 argsz; __u32 bank_count; /* the number of MSI }; #define VFIO_PAMU_GET_MSI_BANK_COUNT _IO(VFIO_TYPE, VFIO_BASE + x, struct vfio_pamu_msi_banks) /* * VFIO_PAMU_MAP_MSI_BANK * * Maps the MSI bank at the specified index and iova. User space must * call this ioctl once for each MSI bank (count of banks is returned by * VFIO_IOMMU_GET_MSI_BANK_COUNT). * Caller provides struct vfio_pamu_msi_bank_map with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_map { __u32 argsz; __u32 msi_bank_index; /* the index of the MSI bank */ __u64 iova; /* the iova the bank is to be mapped to */ }; /* * VFIO_PAMU_UNMAP_MSI_BANK * * Unmaps the MSI bank at the specified iova. * Caller provides struct vfio_pamu_msi_bank_unmap with all fields set. * Operates on VFIO file descriptor (/dev/vfio/vfio). * Return: 0 on success, -errno on failure */ struct vfio_pamu_msi_bank_unmap { __u32 argsz; __u64 iova; /* the iova to be unmapped to */ }; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 06:36:30PM +0300, Michael S. Tsirkin wrote: > > processor : 0 > > vendor_id : AuthenticAMD > > cpu family : 16 > > model : 8 > > model name : Six-Core AMD Opteron(tm) Processor 8435 > > stepping: 0 > > cpu MHz : 800.000 > > cache size : 512 KB > > physical id : 0 > > siblings: 6 > > core id : 0 > > cpu cores : 6 > > apicid : 8 > > initial apicid : 0 > > fpu : yes > > fpu_exception : yes > > cpuid level : 5 > > wp : yes > > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge > > mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt > > pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc > > extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic > > cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv > > svm_lock nrip_save pausefilter > > bogomips: 5199.87 > > TLB size: 1024 4K pages > > clflush size: 64 > > cache_alignment : 64 > > address sizes : 48 bits physical, 48 bits virtual > > power management: ts ttp tm stc 100mhzsteps hwpstate > > Hmm, svm code seems less optimized for MMIO, but PIO > is almost identical. Gleb says the unittest is broken > on AMD so I'll wait until it's fixed to test. > It's not unittest is broken, its my environment is broken :) > Did you do PIO reads by chance? > > > > > > > Or could be different software, this is on top of 3.9.0-rc5, what > > > did you try? > > > > 3.0 plus kvm-kmod of whatever was current back in autumn :). > > > > > > > >> MST, could you please do a real world latency benchmark with virtio-net > > >> and > > >> > > >> * normal ioeventfd > > >> * mmio-pv eventfd > > >> * hcall eventfd > > > > > > I can't do this right away, sorry. For MMIO we are discussing the new > > > layout on the virtio mailing list, guest and qemu need a patch for this > > > too. My hcall patches are stale and would have to be brought up to > > > date. > > > > > > > > >> to give us some idea how much performance we would gain from each > > >> approach? Thoughput should be completely unaffected anyway, since virtio > > >> just coalesces kicks internally. > > > > > > Latency is dominated by the scheduling latency. > > > This means virtio-net is not the best benchmark. > > > > So what is a good benchmark? > > E.g. ping pong stress will do but need to look at CPU utilization, > that's what is affected, not latency. > > > Is there any difference in speed at all? I strongly doubt it. One of > > virtio's main points is to reduce the number of kicks. > > For this stage of the project I think microbenchmarks are more appropriate. > Doubling the price of exit is likely to be measureable. 30 cycles likely > not ... > > > > > > >> I'm also slightly puzzled why the wildcard eventfd mechanism is so > > >> significantly slower, while it was only a few percent on my test system. > > >> What are the numbers you're listing above? Cycles? How many cycles do > > >> you execute in a second? > > >> > > >> > > >> Alex > > > > > > > > > It's the TSC divided by number of iterations. kvm unittest this value, > > > here's > > > what it does (removed some dead code): > > > > > > #define GOAL (1ull << 30) > > > > > >do { > > >iterations *= 2; > > >t1 = rdtsc(); > > > > > >for (i = 0; i < iterations; ++i) > > >func(); > > >t2 = rdtsc(); > > >} while ((t2 - t1) < GOAL); > > >printf("%s %d\n", test->name, (int)((t2 - t1) / iterations)); > > > > So it's the number of cycles per run. > > > > That means translated my numbers are: > > > > MMIO: 4307 > > PIO: 3658 > > HCALL: 1756 > > > > MMIO - PIO = 649 > > > > which aligns roughly with your PV MMIO callback. > > > > My MMIO benchmark was to poke the LAPIC version register. That does go > > through instruction emulation, no? > > > > > > Alex > > Why wouldn't it? > Intel decodes access to apic page, but we use it only for fast eoi. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 05:36:40PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 15:33, Michael S. Tsirkin wrote: > > > On Thu, Apr 04, 2013 at 03:06:42PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:56, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:45, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:38, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: > >>> we > >>> know the address from the VMCS so if the address is unique, we > >>> can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the > >>> length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 > >>> and > >>> silently ignored for NPT and MMU, so everything works correctly > >>> but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of > >> magnitudes slower than hypercalls. Why don't you just create a PV > >> MMIO hypercall that the guest can use to invoke MMIO accesses > >> towards the host based on physical addresses with explicit length > >> encodings? > >> > > It is slower, but not an order of magnitude slower. It become faster > > with newer HW. > > > >> That way you simplify and speed up all code paths, exceeding the > >> speed of PIO exits even. It should also be quite easily portable, > >> as all other platforms have hypercalls available as well. > >> > > We are trying to avoid PV as much as possible (well this is also PV, > > but not guest visible > > Also, how is this not guest visible? Who sets > KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition > indicates that the guest does so, so it is guest visible. > > >>> QEMU sets it. > >> > >> How does QEMU know? > >> > > Knows what? When to create such eventfd? virtio device knows. > > Where does it know from? > > >>> It does it always. > >>> > > > >>> > +/* > + * PV_MMIO - Guest can promise us that all accesses touching this > address > + * are writes of specified length, starting at the specified > address. > + * If not - it's a Guest bug. > + * Can not be used together with either PIO or DATAMATCH. > + */ > > >>> Virtio spec will state that access to a kick register needs to be of > >>> specific length. This is reasonable thing for HW to ask. > >> > >> This is a spec change. So the guest would have to indicate that it > >> adheres to a newer spec. Thus it's a guest visible change. > >> > > There is not virtio spec that has kick register in MMIO. The spec is in > > the works AFAIK. Actually PIO will not be deprecated and my suggestion > > So the guest would indicate that it supports a newer revision of the > spec (in your case, that it supports MMIO). How is that any different > from exposing that it supports a PV MMIO hcall? > > >>> Guest will indicate nothing. New driver will use MMIO if PIO is bar is > >>> not configured. All driver will not work for virtio devices with MMIO > >>> bar, but not PIO bar. > >> > >> I can't parse that, sorry :). > > > > It's simple. Driver does iowrite16 or whatever is appropriate for the OS. > > QEMU tells KVM which addr
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 05:36:40PM +0200, Alexander Graf wrote: > > > > #define GOAL (1ull << 30) > > > >do { > >iterations *= 2; > >t1 = rdtsc(); > > > >for (i = 0; i < iterations; ++i) > >func(); > >t2 = rdtsc(); > >} while ((t2 - t1) < GOAL); > >printf("%s %d\n", test->name, (int)((t2 - t1) / iterations)); > > So it's the number of cycles per run. > > That means translated my numbers are: > > MMIO: 4307 > PIO: 3658 > HCALL: 1756 > > MMIO - PIO = 649 > > which aligns roughly with your PV MMIO callback. > > My MMIO benchmark was to poke the LAPIC version register. That does go > through instruction emulation, no? > It should and PIO 'in' or string also goes through the emulator. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [virt-test][PATCH 4/7] virt: Adds named variants to Cartesian config.
- Original Message - > Sorry for not reading the commit message before my previous reply. Now I > see the origin of the ">" syntax. > > On Fri, Mar 29, 2013 at 06:14:07PM +0100, Jiří Župka wrote: > [...] > > > > For filtering of named variants is used character ">" because there was > > problem with conflict with = in expression key = value. The char ">" > > could be changed to something better but it should be different from "=" > > for optimization of speed. > > IMO we need really strong reasons to use anything different from "=" > because it is the most obvious choice we have. Using ">" doesn't make > any sense to me. There is not necessary solve conflict with = or : in code. Code parsing is straightforward with that . Chars "=" and ":" was one of my first selection too but it brings conflicts in parsing. But it could be changed because there were more voice against it. Users could prefer better syntax instead of little improve of speed. > > What kind of speed optimization are you talking about, exactly? We need > to keep algorithm time/space complexity under control, but making two or > three additional regexp matches per line won't make the code much > slower, will it? Sometime yes (https://github.com/autotest/virt-test/pull/229). But I don't think that it is this case. I'll will try think about more. > Also: whatever symbol we use, I would really like to make it > whitespace-insensitive. > > I mean: if "foo>x" or "foo=x" works, "foo > x" or "foo = x" should work, > too. I am absolutely sure people _will_ eventually try to put whitespace > around the operator symbol, and this shouldn't cause unpleasant > surprises. Thank a lot that you catch this bug. It is only bug not intention. I have forgot one strip(). But I will repair the bug after we will finish discussion about named variant. > > > > > > Additionally named variant adds keys to final dictionary in case of > > example is it (virt_system = linux). It should reduce size of config file. > > Keys defined in config and keys defined by named variants are in same > > name space. > > This is the part I like the most. Thanks! > > > > > > Signed-off-by: Jiří Župka > > --- > > virttest/cartesian_config.py | 138 > > ++- > > 1 file changed, 124 insertions(+), 14 deletions(-) > > > > diff --git a/virttest/cartesian_config.py b/virttest/cartesian_config.py > > index ef91051..04ed2b5 100755 > > --- a/virttest/cartesian_config.py > > +++ b/virttest/cartesian_config.py > > @@ -145,6 +145,74 @@ class MissingIncludeError: > > num_failed_cases = 5 > > > > > > +class Label(object): > > +__slots__ = ["name", "var_name", "long_name", "hash_val", "hash_var"] > > + > > +def __init__(self, name, next_name=None): > > +if next_name is None: > > +self.name = name > > +self.var_name = None > > +else: > > +self.name = next_name > > +self.var_name = name > > + > > +if self.var_name is None: > > +self.long_name = "%s" % (self.name) > > +else: > > +self.long_name = "%s>%s" % (self.var_name, self.name) > > + > > +self.hash_val = self.hash_name() > > +self.hash_var = None > > +if self.var_name: > > +self.hash_var = self.hash_variant() > > + > > + > > +def __str__(self): > > +return self.long_name > > + > > + > > +def __repr__(self): > > +return self.long_name > > + > > + > > +def __eq__(self, o): > > +""" > > +The comparison is asymmetric due to optimization. > > +""" > > +if o.var_name: > > +if self.long_name == o.long_name: > > +return True > > +else: > > +if self.name == o.name: > > +return True > > +return False > > + > > + > > +def __ne__(self, o): > > +""" > > +The comparison is asymmetric due to optimization. > > +""" > > +if o.var_name: > > +if self.long_name != o.long_name: > > +return True > > +else: > > +if self.name != o.name: > > +return True > > +return False > > + > > + > > +def __hash__(self): > > +return self.hash_val > > + > > + > > +def hash_name(self): > > +return sum([i + 1 * ord(x) for i, x in enumerate(self.name)]) > > + > > + > > +def hash_variant(self): > > +return sum([i + 1 * ord(x) for i, x in enumerate(str(self))]) > > + > > + > > class Node(object): > > __slots__ = ["name", "dep", "content", "children", "labels", > > "append_to_shortname", "failed_cases", "default"] > > @@ -212,18 +280,19 @@ class Filter(object): > > def __init__(self, s): > > self.filter = [] > > for char in s: > > -if not (char.isalnum() or char.isspace() or char in ".,_-"): > > +if not (char.isalnum() or cha
Re: [PATCH v2 5/6] kvm: add PV MMIO
On Thu, Apr 04, 2013 at 04:56:11PM +0200, Paolo Bonzini wrote: > Il 04/04/2013 15:10, Michael S. Tsirkin ha scritto: > > > I would like Avi to comment on this, because I think this is not the > > > "memory-API approved" way of doing things. You need KVM to define its > > > own AddressSpace, and make KVM's listener use > > > memory_region_to_address_space to figure out if it is for PV MMIO. > > > > > > To handle accesses from TCG, the PV AddressSpace can simply have just an > > > alias to the actual MemoryRegion where the doorbell is. > > > > This is not really different from other eventfd flags like datamatch. > > Separate address space is not appropriate here. > > This is regular memory space, KVM eventfd simply has a flag that says > > "guest is well behaved so please make eventfd go faster". > > Having a separate address space would match what you do in the kernel > though. Not really, in kernel the device is as usual on the MMIO bus. The pv_mmio_bus is an extra datastructure to enable lookups where length is not known in an efficient way. Userspace does not need to know about these details. > I just don't like functions with a dozen arguments... Can you just make > datamatch and pv a single flags argument, as is the case with the KVM API? > > Paolo Extra parameters are detected by compiler :) I don't really care, anyone else objects to changing datamatch to a flags argument? Please let me know to avoid changing it back and forth. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 15:33, Michael S. Tsirkin wrote: > On Thu, Apr 04, 2013 at 03:06:42PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:56, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: On 04.04.2013, at 14:45, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:38, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: On 04.04.2013, at 14:08, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: >> >>> With KVM, MMIO is much slower than PIO, due to the need to >>> do page walk and emulation. But with EPT, it does not have to be: we >>> know the address from the VMCS so if the address is unique, we can >>> look >>> up the eventfd directly, bypassing emulation. >>> >>> Add an interface for userspace to specify this per-address, we can >>> use this e.g. for virtio. >>> >>> The implementation adds a separate bus internally. This serves two >>> purposes: >>> - minimize overhead for old userspace that does not use PV MMIO >>> - minimize disruption in other code (since we don't know the length, >>> devices on the MMIO bus only get a valid address in write, this >>> way we don't need to touch all devices to teach them handle >>> an dinvalid length) >>> >>> At the moment, this optimization is only supported for EPT on x86 >>> and >>> silently ignored for NPT and MMU, so everything works correctly but >>> slowly. >>> >>> TODO: NPT, MMU and non x86 architectures. >>> >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for >>> pre-review and suggestions. >>> >>> Signed-off-by: Michael S. Tsirkin >> >> This still uses page fault intercepts which are orders of magnitudes >> slower than hypercalls. Why don't you just create a PV MMIO >> hypercall that the guest can use to invoke MMIO accesses towards the >> host based on physical addresses with explicit length encodings? >> > It is slower, but not an order of magnitude slower. It become faster > with newer HW. > >> That way you simplify and speed up all code paths, exceeding the >> speed of PIO exits even. It should also be quite easily portable, as >> all other platforms have hypercalls available as well. >> > We are trying to avoid PV as much as possible (well this is also PV, > but not guest visible Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates that the guest does so, so it is guest visible. >>> QEMU sets it. >> >> How does QEMU know? >> > Knows what? When to create such eventfd? virtio device knows. Where does it know from? >>> It does it always. >>> > >>> +/* + * PV_MMIO - Guest can promise us that all accesses touching this address + * are writes of specified length, starting at the specified address. + * If not - it's a Guest bug. + * Can not be used together with either PIO or DATAMATCH. + */ >>> Virtio spec will state that access to a kick register needs to be of >>> specific length. This is reasonable thing for HW to ask. >> >> This is a spec change. So the guest would have to indicate that it >> adheres to a newer spec. Thus it's a guest visible change. >> > There is not virtio spec that has kick register in MMIO. The spec is in > the works AFAIK. Actually PIO will not be deprecated and my suggestion So the guest would indicate that it supports a newer revision of the spec (in your case, that it supports MMIO). How is that any different from exposing that it supports a PV MMIO hcall? >>> Guest will indicate nothing. New driver will use MMIO if PIO is bar is >>> not configured. All driver will not work for virtio devices with MMIO >>> bar, but not PIO bar. >> >> I can't parse that, sorry :). > > It's simple. Driver does iowrite16 or whatever is appropriate for the OS. > QEMU tells KVM which address driver uses, to make exits faster. This is not > different from how eventfd works. For example if exits to QEMU suddenly > become > very cheap we can remove eventfd completely. > >>> > is to move to MMIO only when PIO address space is exhausted. For PCI it > will be never, for PCI-e it will be after ~16 devices. Ok, let's go back
Re: [PATCH 4/7] ARM: KVM: enforce page alignment for identity mapped code
On Thu, Apr 4, 2013 at 3:47 AM, Marc Zyngier wrote: > On 04/04/13 00:15, Christoffer Dall wrote: >> On Wed, Apr 03, 2013 at 11:00:39AM +0100, Marc Zyngier wrote: >>> On 03/04/13 10:50, Will Deacon wrote: On Tue, Apr 02, 2013 at 02:25:12PM +0100, Marc Zyngier wrote: > We're about to move to a init procedure where we rely on the > fact that the init code fits in a single page. Make sure we > align the idmap text on a page boundary, and that the code is > not bigger than a single page. > > Signed-off-by: Marc Zyngier > --- > arch/arm/kernel/vmlinux.lds.S | 2 +- > arch/arm/kvm/init.S | 7 +++ > 2 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S > index b571484..d9dd265 100644 > --- a/arch/arm/kernel/vmlinux.lds.S > +++ b/arch/arm/kernel/vmlinux.lds.S > @@ -20,7 +20,7 @@ >VMLINUX_SYMBOL(__idmap_text_start) = .; \ >*(.idmap.text) \ >VMLINUX_SYMBOL(__idmap_text_end) = .; \ > - ALIGN_FUNCTION(); \ > + . = ALIGN(PAGE_SIZE); \ >VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \ >*(.hyp.idmap.text) \ >VMLINUX_SYMBOL(__hyp_idmap_text_end) = .; > diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S > index 9f37a79..35a463f 100644 > --- a/arch/arm/kvm/init.S > +++ b/arch/arm/kvm/init.S > @@ -111,4 +111,11 @@ __do_hyp_init: >.globl __kvm_hyp_init_end > __kvm_hyp_init_end: > > + /* > + * The above code *must* fit in a single page for the trampoline > + * madness to work. Whoever decides to change it must make sure > + * we map the right amount of memory for the trampoline to work. > + * The line below ensures any breakage will get noticed. >> >> This comment is a little funny. What's this about changing it? And why >> madness? OK, it's a little mad. >> >> I think the nice thing to explain here is why .org would complain - I >> had to look in the AS reference to figure out that it will complain if >> the location pointer is forwarded outside the section - if that is the >> right reason? >> > + */ > + .org__kvm_hyp_init + PAGE_SIZE >.popsection What effect does this have on the size of the kernel image? I'd expect the idmap code to be pretty small, so aligning to a page might be overkill a lot of the time. >>> >>> So we're indeed wasting memory between the kernel and the HYP idmaps. >>> The reason for this is to ensure that the init code is not spread across >>> two pages, which would make the trampoline harder to allocate. >>> >>> If there's a way to ensure that the init.S code will fit a single page, >>> then we can remove this ALIGN(PAGE_SIZE). >> >> Just allocate a temporary page and copy the bit of code into there? It >> should be easy enough to make sure it's location independent. We can >> even give that page back later if we want and re-do the trick when a CPU >> gets hot-plugged if we want. No? > > I'd really like to avoid freeing stuff, because it gives us even more > chances to screw our TLBs (we'd have to invalidate on unmap, and deal > with potential races between CPUs coming up). > > We could allocate this page *if* the HYP init code happens to cross a > page boundary, and only then. I'll have a look at implementing this as > it should solve Will's issue (which is even worse on arm64 with 64k pages). > The freeing stuff was a minor point, that I don't care much about, but reducing the kernel size is a valid concern I guess. -Christoffer -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 04:14:57PM +0300, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 03:06:42PM +0200, Alexander Graf wrote: > > > > On 04.04.2013, at 14:56, Gleb Natapov wrote: > > > > > On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: > > >> > > >> On 04.04.2013, at 14:45, Gleb Natapov wrote: > > >> > > >>> On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > > > > On 04.04.2013, at 14:38, Gleb Natapov wrote: > > > > > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > > >> > > >> On 04.04.2013, at 14:08, Gleb Natapov wrote: > > >> > > >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > > > With KVM, MMIO is much slower than PIO, due to the need to > > > do page walk and emulation. But with EPT, it does not have to be: > > > we > > > know the address from the VMCS so if the address is unique, we > > > can look > > > up the eventfd directly, bypassing emulation. > > > > > > Add an interface for userspace to specify this per-address, we can > > > use this e.g. for virtio. > > > > > > The implementation adds a separate bus internally. This serves two > > > purposes: > > > - minimize overhead for old userspace that does not use PV MMIO > > > - minimize disruption in other code (since we don't know the > > > length, > > > devices on the MMIO bus only get a valid address in write, this > > > way we don't need to touch all devices to teach them handle > > > an dinvalid length) > > > > > > At the moment, this optimization is only supported for EPT on x86 > > > and > > > silently ignored for NPT and MMU, so everything works correctly > > > but > > > slowly. > > > > > > TODO: NPT, MMU and non x86 architectures. > > > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > > pre-review and suggestions. > > > > > > Signed-off-by: Michael S. Tsirkin > > > > This still uses page fault intercepts which are orders of > > magnitudes slower than hypercalls. Why don't you just create a PV > > MMIO hypercall that the guest can use to invoke MMIO accesses > > towards the host based on physical addresses with explicit length > > encodings? > > > > >>> It is slower, but not an order of magnitude slower. It become faster > > >>> with newer HW. > > >>> > > That way you simplify and speed up all code paths, exceeding the > > speed of PIO exits even. It should also be quite easily portable, > > as all other platforms have hypercalls available as well. > > > > >>> We are trying to avoid PV as much as possible (well this is also PV, > > >>> but not guest visible > > >> > > >> Also, how is this not guest visible? Who sets > > >> KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition > > >> indicates that the guest does so, so it is guest visible. > > >> > > > QEMU sets it. > > > > How does QEMU know? > > > > >>> Knows what? When to create such eventfd? virtio device knows. > > >> > > >> Where does it know from? > > >> > > > It does it always. > > > > > >>> > > > > > >> +/* > > >> + * PV_MMIO - Guest can promise us that all accesses touching this > > >> address > > >> + * are writes of specified length, starting at the specified > > >> address. > > >> + * If not - it's a Guest bug. > > >> + * Can not be used together with either PIO or DATAMATCH. > > >> + */ > > >> > > > Virtio spec will state that access to a kick register needs to be of > > > specific length. This is reasonable thing for HW to ask. > > > > This is a spec change. So the guest would have to indicate that it > > adheres to a newer spec. Thus it's a guest visible change. > > > > >>> There is not virtio spec that has kick register in MMIO. The spec is in > > >>> the works AFAIK. Actually PIO will not be deprecated and my suggestion > > >> > > >> So the guest would indicate that it supports a newer revision of the > > >> spec (in your case, that it supports MMIO). How is that any different > > >> from exposing that it supports a PV MMIO hcall? > > >> > > > Guest will indicate nothing. New driver will use MMIO if PIO is bar is > > > not configured. All driver will not work for virtio devices with MMIO > > > bar, but not PIO bar. > > > > I can't parse that, sorry :). > > > I am sure MST can explain it better, but I'll try one more time. > Device will have two BARs with kick register one is PIO another is MMIO. > Old driver works only with PIO
RE: [PATCH] bookehv: Handle debug exception on guest exit
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, April 04, 2013 6:55 PM > To: Bhushan Bharat-R65777 > Cc: linuxppc-...@lists.ozlabs.org; kvm@vger.kernel.org; > kvm-...@vger.kernel.org; > Wood Scott-B07421; Bhushan Bharat-R65777 > Subject: Re: [PATCH] bookehv: Handle debug exception on guest exit > > > On 20.03.2013, at 18:45, Bharat Bhushan wrote: > > > EPCR.DUVD controls whether the debug events can come in hypervisor > > mode or not. When KVM guest is using the debug resource then we do not > > want debug events to be captured in guest entry/exit path. So we set > > EPCR.DUVD when entering and clears EPCR.DUVD when exiting from guest. > > > > Debug instruction complete is a post-completion debug exception but > > debug event gets posted on the basis of MSR before the instruction is > > executed. Now if the instruction switches the context from guest mode > > (MSR.GS = 1) to hypervisor mode (MSR.GS = 0) then the xSRR0 points to > > first instruction of KVM handler and xSRR1 points that MSR.GS is clear > > (hypervisor context). Now as xSRR1.GS is used to decide whether KVM > > handler will be invoked to handle the exception or host host kernel > > debug handler will be invoked to handle the exception. > > This leads to host kernel debug handler handling the exception which > > should either be handled by KVM. > > > > This is tested on e500mc in 32 bit mode > > > > Signed-off-by: Bharat Bhushan > > --- > > v0: > > - Do not apply this change for debug_crit as we do not know those chips have > issue or not. > > - corrected 64bit case branching > > > > arch/powerpc/kernel/exceptions-64e.S | 29 - > > arch/powerpc/kernel/head_booke.h | 26 ++ > > 2 files changed, 54 insertions(+), 1 deletions(-) > > > > diff --git a/arch/powerpc/kernel/exceptions-64e.S > > b/arch/powerpc/kernel/exceptions-64e.S > > index 4684e33..8b26294 100644 > > --- a/arch/powerpc/kernel/exceptions-64e.S > > +++ b/arch/powerpc/kernel/exceptions-64e.S > > @@ -516,6 +516,33 @@ kernel_dbg_exc: > > andis. r15,r14,DBSR_IC@h > > beq+1f > > > > +#ifdef CONFIG_KVM_BOOKE_HV > > + /* > > +* EPCR.DUVD controls whether the debug events can come in > > +* hypervisor mode or not. When KVM guest is using the debug > > +* resource then we do not want debug events to be captured > > +* in guest entry/exit path. So we set EPCR.DUVD when entering > > +* and clears EPCR.DUVD when exiting from guest. > > +* Debug instruction complete is a post-completion debug > > +* exception but debug event gets posted on the basis of MSR > > +* before the instruction is executed. Now if the instruction > > +* switches the context from guest mode (MSR.GS = 1) to hypervisor > > +* mode (MSR.GS = 0) then the xSRR0 points to first instruction of > > Can't we just execute that code path with MSR.DE=0? Single stepping uses DBCR0.IC (instruction complete). Can you describe how MSR.DE = 0 will work? > > > Alex > > > +* KVM handler and xSRR1 points that MSR.GS is clear > > +* (hypervisor context). Now as xSRR1.GS is used to decide whether > > +* KVM handler will be invoked to handle the exception or host > > +* host kernel debug handler will be invoked to handle the exception. > > +* This leads to host kernel debug handler handling the exception > > +* which should either be handled by KVM. > > +*/ > > + mfspr r10, SPRN_EPCR > > + andis. r10,r10,SPRN_EPCR_DUVD@h > > + beq+2f > > + > > + andis. r10,r9,MSR_GS@h > > + beq+3f > > +2: > > +#endif > > LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) > > LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e) > > cmpld cr0,r10,r14 > > @@ -523,7 +550,7 @@ kernel_dbg_exc: > > blt+cr0,1f > > bge+cr1,1f > > > > - /* here it looks like we got an inappropriate debug exception. */ > > +3: /* here it looks like we got an inappropriate debug exception. */ > > lis r14,DBSR_IC@h /* clear the IC event */ > > rlwinm r11,r11,0,~MSR_DE /* clear DE in the DSRR1 value */ > > mtspr SPRN_DBSR,r14 > > diff --git a/arch/powerpc/kernel/head_booke.h > > b/arch/powerpc/kernel/head_booke.h > > index 5f051ee..edc6a3b 100644 > > --- a/arch/powerpc/kernel/head_booke.h > > +++ b/arch/powerpc/kernel/head_booke.h > > @@ -285,7 +285,33 @@ label: > > mfspr r10,SPRN_DBSR; /* check single-step/branch taken */ \ > > andis. r10,r10,(DBSR_IC|DBSR_BT)@h; \ > > beq+2f; \ > > +#ifdef CONFIG_KVM_BOOKE_HV \ > > + /*\ > > +* EPCR.DUVD controls whether the debug events can come in\ > > +* hypervisor mode or not. When KVM guest is using the debug \ > > +*
Re: [PATCH v2 5/6] kvm: add PV MMIO
Il 04/04/2013 15:10, Michael S. Tsirkin ha scritto: > > I would like Avi to comment on this, because I think this is not the > > "memory-API approved" way of doing things. You need KVM to define its > > own AddressSpace, and make KVM's listener use > > memory_region_to_address_space to figure out if it is for PV MMIO. > > > > To handle accesses from TCG, the PV AddressSpace can simply have just an > > alias to the actual MemoryRegion where the doorbell is. > > This is not really different from other eventfd flags like datamatch. > Separate address space is not appropriate here. > This is regular memory space, KVM eventfd simply has a flag that says > "guest is well behaved so please make eventfd go faster". Having a separate address space would match what you do in the kernel though. I just don't like functions with a dozen arguments... Can you just make datamatch and pv a single flags argument, as is the case with the KVM API? Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 04:02:57PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:58, Michael S. Tsirkin wrote: > > > On Thu, Apr 04, 2013 at 02:22:09PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:08, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes > slower than hypercalls. Why don't you just create a PV MMIO hypercall > that the guest can use to invoke MMIO accesses towards the host based on > physical addresses with explicit length encodings? > > >>> It is slower, but not an order of magnitude slower. It become faster > >>> with newer HW. > >>> > That way you simplify and speed up all code paths, exceeding the speed > of PIO exits even. It should also be quite easily portable, as all other > platforms have hypercalls available as well. > > >>> We are trying to avoid PV as much as possible (well this is also PV, > >>> but not guest visible). We haven't replaced PIO with hypercall for the > >>> same reason. My hope is that future HW will provide us with instruction > >>> decode for basic mov instruction at which point this optimisation can be > >>> dropped. > >> > >> The same applies to an MMIO hypercall. Once the PV interface becomes > >> obsolete, we can drop the capability we expose to the guest. > > > > Yes but unlike a hypercall this optimization does not need special code > > in the guest. You can use standard OS interfaces for memory access. > > Yes, but let's try to understand the room for optimization we're > talking about here. The "normal" MMIO case seems excessively slower in > MST's benchmarks than it did in mine. So maybe we're really just > looking at a bug here. Could be. I posted the code (kvm,qemu and test) so please review and try to spot a bug. > Also, if hcalls are again only 50% of a fast MMIO callback, it's > certainly worth checking out what room for improvement we're really > wasting. > > > Alex Take a look at 'kvm: pci PORT IO MMIO and PV MMIO speed tests'. Try running the test on your hardware and see what happens. Or post the test you used, I can try it on my box if you like. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Guest Lock up (100%) again!
One of my KVM guests locked up again at 100% CPU! Any thoughts on how I can diagnose it ? We would love to put into production but am very concerned about the current stability. I have tried to re-direct the console, through screen, to see whether there is a spin lock that is causing the problem; though all I got in the log file was a login prompt. What is the correct way of redirecting the console in KVM on a CentOS 6.4 system please ? Thanks. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 03:06:42PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:56, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:45, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:38, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:08, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can > > look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 > > and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes > slower than hypercalls. Why don't you just create a PV MMIO > hypercall that the guest can use to invoke MMIO accesses towards the > host based on physical addresses with explicit length encodings? > > >>> It is slower, but not an order of magnitude slower. It become faster > >>> with newer HW. > >>> > That way you simplify and speed up all code paths, exceeding the > speed of PIO exits even. It should also be quite easily portable, as > all other platforms have hypercalls available as well. > > >>> We are trying to avoid PV as much as possible (well this is also PV, > >>> but not guest visible > >> > >> Also, how is this not guest visible? Who sets > >> KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates > >> that the guest does so, so it is guest visible. > >> > > QEMU sets it. > > How does QEMU know? > > >>> Knows what? When to create such eventfd? virtio device knows. > >> > >> Where does it know from? > >> > > It does it always. > > > >>> > > > >> +/* > >> + * PV_MMIO - Guest can promise us that all accesses touching this > >> address > >> + * are writes of specified length, starting at the specified address. > >> + * If not - it's a Guest bug. > >> + * Can not be used together with either PIO or DATAMATCH. > >> + */ > >> > > Virtio spec will state that access to a kick register needs to be of > > specific length. This is reasonable thing for HW to ask. > > This is a spec change. So the guest would have to indicate that it > adheres to a newer spec. Thus it's a guest visible change. > > >>> There is not virtio spec that has kick register in MMIO. The spec is in > >>> the works AFAIK. Actually PIO will not be deprecated and my suggestion > >> > >> So the guest would indicate that it supports a newer revision of the spec > >> (in your case, that it supports MMIO). How is that any different from > >> exposing that it supports a PV MMIO hcall? > >> > > Guest will indicate nothing. New driver will use MMIO if PIO is bar is > > not configured. All driver will not work for virtio devices with MMIO > > bar, but not PIO bar. > > I can't parse that, sorry :). It's simple. Driver does iowrite16 or whatever is appropriate for the OS. QEMU tells KVM which address driver uses, to make exits faster. This is not different from how eventfd works. For example if exits to QEMU suddenly become very cheap we can remove eventfd completely. > > > >>> is to move to MMIO only when PIO address space is exhausted. For PCI it > >>> will be never, for PCI-e it will be after ~16 devices. > >> > >> Ok, let's go back a step here. Are you actually able to measure any speed > >> in performan
Re: [Qemu-devel] [PATCH v2 5/6] kvm: add PV MMIO
On Thu, Apr 04, 2013 at 03:23:43PM +0100, Peter Maydell wrote: > On 4 April 2013 14:11, Michael S. Tsirkin wrote: > > On Thu, Apr 04, 2013 at 01:50:36PM +0100, Peter Maydell wrote: > >> Can we keep linux-headers updates in separate patches, please > >> (with commit messages saying what kernel commit/version the > >> sync is with)? > > > OK, first it needs to get into linux though :). > > In that case you need an RFC tag on your patches :-) > > -- PMM The cover letter said patches 5 and 6 are RFC, but patches 1-4 are ready and should go in I think. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 5/6] kvm: add PV MMIO
On 4 April 2013 14:11, Michael S. Tsirkin wrote: > On Thu, Apr 04, 2013 at 01:50:36PM +0100, Peter Maydell wrote: >> Can we keep linux-headers updates in separate patches, please >> (with commit messages saying what kernel commit/version the >> sync is with)? > OK, first it needs to get into linux though :). In that case you need an RFC tag on your patches :-) -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 03:56:49PM +0300, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: > > > > On 04.04.2013, at 14:45, Gleb Natapov wrote: > > > > > On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > > >> > > >> On 04.04.2013, at 14:38, Gleb Natapov wrote: > > >> > > >>> On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > > > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > >> > > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > >> > > >>> With KVM, MMIO is much slower than PIO, due to the need to > > >>> do page walk and emulation. But with EPT, it does not have to be: we > > >>> know the address from the VMCS so if the address is unique, we can > > >>> look > > >>> up the eventfd directly, bypassing emulation. > > >>> > > >>> Add an interface for userspace to specify this per-address, we can > > >>> use this e.g. for virtio. > > >>> > > >>> The implementation adds a separate bus internally. This serves two > > >>> purposes: > > >>> - minimize overhead for old userspace that does not use PV MMIO > > >>> - minimize disruption in other code (since we don't know the length, > > >>> devices on the MMIO bus only get a valid address in write, this > > >>> way we don't need to touch all devices to teach them handle > > >>> an dinvalid length) > > >>> > > >>> At the moment, this optimization is only supported for EPT on x86 > > >>> and > > >>> silently ignored for NPT and MMU, so everything works correctly but > > >>> slowly. > > >>> > > >>> TODO: NPT, MMU and non x86 architectures. > > >>> > > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > >>> pre-review and suggestions. > > >>> > > >>> Signed-off-by: Michael S. Tsirkin > > >> > > >> This still uses page fault intercepts which are orders of magnitudes > > >> slower than hypercalls. Why don't you just create a PV MMIO > > >> hypercall that the guest can use to invoke MMIO accesses towards the > > >> host based on physical addresses with explicit length encodings? > > >> > > > It is slower, but not an order of magnitude slower. It become faster > > > with newer HW. > > > > > >> That way you simplify and speed up all code paths, exceeding the > > >> speed of PIO exits even. It should also be quite easily portable, as > > >> all other platforms have hypercalls available as well. > > >> > > > We are trying to avoid PV as much as possible (well this is also PV, > > > but not guest visible > > > > Also, how is this not guest visible? Who sets > > KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates > > that the guest does so, so it is guest visible. > > > > >>> QEMU sets it. > > >> > > >> How does QEMU know? > > >> > > > Knows what? When to create such eventfd? virtio device knows. > > > > Where does it know from? > > > It does it always. > > > > > > >>> > > +/* > > + * PV_MMIO - Guest can promise us that all accesses touching this > > address > > + * are writes of specified length, starting at the specified address. > > + * If not - it's a Guest bug. > > + * Can not be used together with either PIO or DATAMATCH. > > + */ > > > > >>> Virtio spec will state that access to a kick register needs to be of > > >>> specific length. This is reasonable thing for HW to ask. > > >> > > >> This is a spec change. So the guest would have to indicate that it > > >> adheres to a newer spec. Thus it's a guest visible change. > > >> > > > There is not virtio spec that has kick register in MMIO. The spec is in > > > the works AFAIK. Actually PIO will not be deprecated and my suggestion > > > > So the guest would indicate that it supports a newer revision of the spec > > (in your case, that it supports MMIO). How is that any different from > > exposing that it supports a PV MMIO hcall? > > > Guest will indicate nothing. New driver will use MMIO if PIO is bar is > not configured. All driver will not work for virtio devices with MMIO > bar, but not PIO bar. > > > > is to move to MMIO only when PIO address space is exhausted. For PCI it > > > will be never, for PCI-e it will be after ~16 devices. > > > > Ok, let's go back a step here. Are you actually able to measure any speed > > in performance with this patch applied and without when going through MMIO > > kicks? > > > > > That's the question for MST. I think he did only micro benchmarks till > now and he already posted his result here: > > mmio-wildcard-eventfd:pci-mem 3529 > mmio-pv-eventfd:pci-mem 1878 > portio-wildcard-eventfd:pci-io 1846 > > So the patch speedup mmio by almost 100% and it is almost the same as PIO. Exa
Re: [Qemu-devel] [PATCH v2 5/6] kvm: add PV MMIO
On Thu, Apr 04, 2013 at 01:50:36PM +0100, Peter Maydell wrote: > On 4 April 2013 11:40, Michael S. Tsirkin wrote: > > Add an option for users to specify PV eventfd > > listeners, and utilize KVM's PV MMIO underneath. > > Upodate all callers. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > hw/dataplane/hostmem.c| 1 + > > hw/ivshmem.c | 2 ++ > > hw/pci-testdev.c | 2 ++ > > hw/vhost.c| 4 ++-- > > hw/virtio-pci.c | 4 ++-- > > include/exec/memory.h | 10 ++ > > kvm-all.c | 15 --- > > linux-headers/linux/kvm.h | 8 > > memory.c | 9 +++-- > > Can we keep linux-headers updates in separate patches, please > (with commit messages saying what kernel commit/version the > sync is with)? > > thanks > -- PMM OK, first it needs to get into linux though :). -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/6] kvm: add PV MMIO
On Thu, Apr 04, 2013 at 02:46:11PM +0200, Paolo Bonzini wrote: > Il 04/04/2013 12:40, Michael S. Tsirkin ha scritto: > > Add an option for users to specify PV eventfd > > listeners, and utilize KVM's PV MMIO underneath. > > Upodate all callers. > > I would like Avi to comment on this, because I think this is not the > "memory-API approved" way of doing things. You need KVM to define its > own AddressSpace, and make KVM's listener use > memory_region_to_address_space to figure out if it is for PV MMIO. > > To handle accesses from TCG, the PV AddressSpace can simply have just an > alias to the actual MemoryRegion where the doorbell is. > > Paolo This is not really different from other eventfd flags like datamatch. Separate address space is not appropriate here. This is regular memory space, KVM eventfd simply has a flag that says "guest is well behaved so please make eventfd go faster". > > Signed-off-by: Michael S. Tsirkin > > --- > > hw/dataplane/hostmem.c| 1 + > > hw/ivshmem.c | 2 ++ > > hw/pci-testdev.c | 2 ++ > > hw/vhost.c| 4 ++-- > > hw/virtio-pci.c | 4 ++-- > > include/exec/memory.h | 10 ++ > > kvm-all.c | 15 --- > > linux-headers/linux/kvm.h | 8 > > memory.c | 9 +++-- > > 9 files changed, 46 insertions(+), 9 deletions(-) > > > > diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c > > index 380537e..41d43e7 100644 > > --- a/hw/dataplane/hostmem.c > > +++ b/hw/dataplane/hostmem.c > > @@ -126,6 +126,7 @@ static void > > hostmem_listener_section_dummy(MemoryListener *listener, > > > > static void hostmem_listener_eventfd_dummy(MemoryListener *listener, > > MemoryRegionSection *section, > > + bool pv, > > bool match_data, uint64_t data, > > EventNotifier *e) > > { > > diff --git a/hw/ivshmem.c b/hw/ivshmem.c > > index 68a2cf2..28e5978 100644 > > --- a/hw/ivshmem.c > > +++ b/hw/ivshmem.c > > @@ -352,6 +352,7 @@ static void ivshmem_add_eventfd(IVShmemState *s, int > > posn, int i) > > memory_region_add_eventfd(&s->ivshmem_mmio, > >DOORBELL, > >4, > > + false, > >true, > >(posn << 16) | i, > >&s->peers[posn].eventfds[i]); > > @@ -362,6 +363,7 @@ static void ivshmem_del_eventfd(IVShmemState *s, int > > posn, int i) > > memory_region_del_eventfd(&s->ivshmem_mmio, > >DOORBELL, > >4, > > + false, > >true, > >(posn << 16) | i, > >&s->peers[posn].eventfds[i]); > > diff --git a/hw/pci-testdev.c b/hw/pci-testdev.c > > index 9486624..f0ebf99 100644 > > --- a/hw/pci-testdev.c > > +++ b/hw/pci-testdev.c > > @@ -80,6 +80,7 @@ static int pci_testdev_start(IOTest *test) > > memory_region_add_eventfd(test->mr, > >le32_to_cpu(test->hdr->offset), > >test->size, > > + false, > >test->match_data, > >test->hdr->data, > >&test->notifier); > > @@ -94,6 +95,7 @@ static void pci_testdev_stop(IOTest *test) > > memory_region_del_eventfd(test->mr, > >le32_to_cpu(test->hdr->offset), > >test->size, > > + false, > >test->match_data, > >test->hdr->data, > >&test->notifier); > > diff --git a/hw/vhost.c b/hw/vhost.c > > index 4d6aee3..9ad7101 100644 > > --- a/hw/vhost.c > > +++ b/hw/vhost.c > > @@ -750,13 +750,13 @@ static void vhost_virtqueue_stop(struct vhost_dev > > *dev, > > } > > > > static void vhost_eventfd_add(MemoryListener *listener, > > - MemoryRegionSection *section, > > + MemoryRegionSection *section, bool pv, > >bool match_data, uint64_t data, > > EventNotifier *e) > > { > > } > > > > static void vhost_eventfd_del(MemoryListener *listener, > > - MemoryRegionSection *section, > > + MemoryRegionSection *section, bool pv, > >bool match_data, uint64_t data, > > EventNotifier *e) > > { > > } > > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c > > index 19965e5..f9ff994 100644 > > --- a/hw/virtio-pci.c > > +++ b/hw/virtio-pci.c > > @@ -189,10 +189,10
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:38, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:08, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes > slower than hypercalls. Why don't you just create a PV MMIO hypercall > that the guest can use to invoke MMIO accesses towards the host based on > physical addresses with explicit length encodings? > > >>> It is slower, but not an order of magnitude slower. It become faster > >>> with newer HW. > >>> > That way you simplify and speed up all code paths, exceeding the speed > of PIO exits even. It should also be quite easily portable, as all other > platforms have hypercalls available as well. > > >>> We are trying to avoid PV as much as possible (well this is also PV, > >>> but not guest visible > >> > >> Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? > >> The comment above its definition indicates that the guest does so, so it > >> is guest visible. > >> > > QEMU sets it. > > How does QEMU know? > > > > >> +/* > >> + * PV_MMIO - Guest can promise us that all accesses touching this address > >> + * are writes of specified length, starting at the specified address. > >> + * If not - it's a Guest bug. > >> + * Can not be used together with either PIO or DATAMATCH. > >> + */ > >> > > Virtio spec will state that access to a kick register needs to be of > > specific length. This is reasonable thing for HW to ask. > > This is a spec change. So the guest would have to indicate that it adheres to > a newer spec. Thus it's a guest visible change. > > > Alex To use MMIO is a spec change, yes. What Gleb refers to is that the optimization is not guest visible. If you have hardware that can quickly decode MMIO you can drop the PV ioctl from qemu without changing guests. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. Why don't you just create a PV MMIO hypercall that > >> the guest can use to invoke MMIO accesses towards the host based on > >> physical addresses with explicit length encodings? > >> > > It is slower, but not an order of magnitude slower. It become faster > > with newer HW. > > > >> That way you simplify and speed up all code paths, exceeding the speed of > >> PIO exits even. It should also be quite easily portable, as all other > >> platforms have hypercalls available as well. > >> > > We are trying to avoid PV as much as possible (well this is also PV, > > but not guest visible > > Also, how is this not guest visible? It's visible but it does not require special code in the guest. Only on qemu. Guest can use standard iowritel/w/b. > Who sets > KVM_IOEVENTFD_FLAG_PV_MMIO? It's an ioctl so qemu does this. > The comment above its definition indicates > that the guest does so, so it is guest visible. > > +/* > + * PV_MMIO - Guest can promise us that all accesses touching this address > + * are writes of specified length, starting at the specified address. > + * If not - it's a Guest bug. > + * Can not be used together with either PIO or DATAMATCH. > + */ > > > Alex The requirement is to only access a specific address by a single aligned instruction. For example, only aligned signle-word accesses are allowed. This is a standard practice with many devices, in fact, virtio spec already requires this for notifications. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:58, Michael S. Tsirkin wrote: > On Thu, Apr 04, 2013 at 02:22:09PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:08, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > With KVM, MMIO is much slower than PIO, due to the need to > do page walk and emulation. But with EPT, it does not have to be: we > know the address from the VMCS so if the address is unique, we can look > up the eventfd directly, bypassing emulation. > > Add an interface for userspace to specify this per-address, we can > use this e.g. for virtio. > > The implementation adds a separate bus internally. This serves two > purposes: > - minimize overhead for old userspace that does not use PV MMIO > - minimize disruption in other code (since we don't know the length, > devices on the MMIO bus only get a valid address in write, this > way we don't need to touch all devices to teach them handle > an dinvalid length) > > At the moment, this optimization is only supported for EPT on x86 and > silently ignored for NPT and MMU, so everything works correctly but > slowly. > > TODO: NPT, MMU and non x86 architectures. > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > pre-review and suggestions. > > Signed-off-by: Michael S. Tsirkin This still uses page fault intercepts which are orders of magnitudes slower than hypercalls. Why don't you just create a PV MMIO hypercall that the guest can use to invoke MMIO accesses towards the host based on physical addresses with explicit length encodings? >>> It is slower, but not an order of magnitude slower. It become faster >>> with newer HW. >>> That way you simplify and speed up all code paths, exceeding the speed of PIO exits even. It should also be quite easily portable, as all other platforms have hypercalls available as well. >>> We are trying to avoid PV as much as possible (well this is also PV, >>> but not guest visible). We haven't replaced PIO with hypercall for the >>> same reason. My hope is that future HW will provide us with instruction >>> decode for basic mov instruction at which point this optimisation can be >>> dropped. >> >> The same applies to an MMIO hypercall. Once the PV interface becomes >> obsolete, we can drop the capability we expose to the guest. > > Yes but unlike a hypercall this optimization does not need special code > in the guest. You can use standard OS interfaces for memory access. Yes, but let's try to understand the room for optimization we're talking about here. The "normal" MMIO case seems excessively slower in MST's benchmarks than it did in mine. So maybe we're really just looking at a bug here. Also, if hcalls are again only 50% of a fast MMIO callback, it's certainly worth checking out what room for improvement we're really wasting. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:22:09PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. Why don't you just create a PV MMIO hypercall that > >> the guest can use to invoke MMIO accesses towards the host based on > >> physical addresses with explicit length encodings? > >> > > It is slower, but not an order of magnitude slower. It become faster > > with newer HW. > > > >> That way you simplify and speed up all code paths, exceeding the speed of > >> PIO exits even. It should also be quite easily portable, as all other > >> platforms have hypercalls available as well. > >> > > We are trying to avoid PV as much as possible (well this is also PV, > > but not guest visible). We haven't replaced PIO with hypercall for the > > same reason. My hope is that future HW will provide us with instruction > > decode for basic mov instruction at which point this optimisation can be > > dropped. > > The same applies to an MMIO hypercall. Once the PV interface becomes > obsolete, we can drop the capability we expose to the guest. Yes but unlike a hypercall this optimization does not need special code in the guest. You can use standard OS interfaces for memory access. > > And hypercall has its own set of problems with Windows guests. > > When KVM runs in Hyper-V emulation mode it expects to get Hyper-V > > hypercalls. Mixing KVM hypercalls and Hyper-V requires some tricks. It > > Can't we simply register a hypercall ID range with Microsoft? > > > may also affect WHQLing Windows drivers since driver will talk to HW > > bypassing Windows interfaces. > > Then the WHQL'ed driver doesn't support the PV MMIO hcall? > > > Alex -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/7 v2] KVM: PPC: e500: Expose MMU registers via ONE_REG
> -Original Message- > From: Wood Scott-B07421 > Sent: Wednesday, March 27, 2013 12:43 AM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de > Subject: Re: [PATCH 1/7 v2] KVM: PPC: e500: Expose MMU registers via > ONE_REG > > On 03/26/2013 05:05:06 PM, Mihai Caraman wrote: > > +int kvmppc_set_one_reg_e500_tlb(struct kvm_vcpu *vcpu, u64 id, > > + union kvmppc_one_reg *val) > > +{ > > + int r = 0; > > + long int i; > > + > > + switch (id) { > > + case KVM_REG_PPC_MAS0: > > + vcpu->arch.shared->mas0 = set_reg_val(id, *val); > > + break; > > + case KVM_REG_PPC_MAS1: > > + vcpu->arch.shared->mas1 = set_reg_val(id, *val); > > + break; > > + case KVM_REG_PPC_MAS2: > > + vcpu->arch.shared->mas2 = set_reg_val(id, *val); > > + break; > > + case KVM_REG_PPC_MAS7_3: > > + vcpu->arch.shared->mas7_3 = set_reg_val(id, *val); > > + break; > > + case KVM_REG_PPC_MAS4: > > + vcpu->arch.shared->mas4 = set_reg_val(id, *val); > > + break; > > + case KVM_REG_PPC_MAS6: > > + vcpu->arch.shared->mas6 = set_reg_val(id, *val); > > + break; > > + /* Only allow MMU registers to be set to the config supported > > by KVM */ > > + case KVM_REG_PPC_MMUCFG: { > > + if (set_reg_val(id, *val) != vcpu->arch.mmucfg) > > + r = -EINVAL; > > + break; > > + } > > + case KVM_REG_PPC_TLB0CFG: > > + case KVM_REG_PPC_TLB1CFG: > > + case KVM_REG_PPC_TLB2CFG: > > + case KVM_REG_PPC_TLB3CFG: { > > + /* MMU geometry (N_ENTRY/ASSOC) can be set only using > > SW_TLB */ > > + i = id - KVM_REG_PPC_TLB0CFG; > > + if (set_reg_val(id, *val) != vcpu->arch.tlbcfg[i]) > > + r = -EINVAL; > > + break; > > + } > > Am I the only one that finds the set_reg_val/get_reg_val naming > confusing? At first glance it looks like it sets TLBnCFG and then > later tests whether it should have. :-P > > Functions should be named for what they do, not for what context you > use them in. I also found the existing code difficult to read. If we stick to reg and val varible names what about val_to_reg/reg_to_val. -Mike -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bookehv: Handle debug exception on guest exit
On 20.03.2013, at 18:45, Bharat Bhushan wrote: > EPCR.DUVD controls whether the debug events can come in > hypervisor mode or not. When KVM guest is using the debug > resource then we do not want debug events to be captured > in guest entry/exit path. So we set EPCR.DUVD when entering > and clears EPCR.DUVD when exiting from guest. > > Debug instruction complete is a post-completion debug > exception but debug event gets posted on the basis of MSR > before the instruction is executed. Now if the instruction > switches the context from guest mode (MSR.GS = 1) to hypervisor > mode (MSR.GS = 0) then the xSRR0 points to first instruction of > KVM handler and xSRR1 points that MSR.GS is clear > (hypervisor context). Now as xSRR1.GS is used to decide whether > KVM handler will be invoked to handle the exception or host > host kernel debug handler will be invoked to handle the exception. > This leads to host kernel debug handler handling the exception > which should either be handled by KVM. > > This is tested on e500mc in 32 bit mode > > Signed-off-by: Bharat Bhushan > --- > v0: > - Do not apply this change for debug_crit as we do not know those chips have > issue or not. > - corrected 64bit case branching > > arch/powerpc/kernel/exceptions-64e.S | 29 - > arch/powerpc/kernel/head_booke.h | 26 ++ > 2 files changed, 54 insertions(+), 1 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64e.S > b/arch/powerpc/kernel/exceptions-64e.S > index 4684e33..8b26294 100644 > --- a/arch/powerpc/kernel/exceptions-64e.S > +++ b/arch/powerpc/kernel/exceptions-64e.S > @@ -516,6 +516,33 @@ kernel_dbg_exc: > andis. r15,r14,DBSR_IC@h > beq+1f > > +#ifdef CONFIG_KVM_BOOKE_HV > + /* > + * EPCR.DUVD controls whether the debug events can come in > + * hypervisor mode or not. When KVM guest is using the debug > + * resource then we do not want debug events to be captured > + * in guest entry/exit path. So we set EPCR.DUVD when entering > + * and clears EPCR.DUVD when exiting from guest. > + * Debug instruction complete is a post-completion debug > + * exception but debug event gets posted on the basis of MSR > + * before the instruction is executed. Now if the instruction > + * switches the context from guest mode (MSR.GS = 1) to hypervisor > + * mode (MSR.GS = 0) then the xSRR0 points to first instruction of Can't we just execute that code path with MSR.DE=0? Alex > + * KVM handler and xSRR1 points that MSR.GS is clear > + * (hypervisor context). Now as xSRR1.GS is used to decide whether > + * KVM handler will be invoked to handle the exception or host > + * host kernel debug handler will be invoked to handle the exception. > + * This leads to host kernel debug handler handling the exception > + * which should either be handled by KVM. > + */ > + mfspr r10, SPRN_EPCR > + andis. r10,r10,SPRN_EPCR_DUVD@h > + beq+2f > + > + andis. r10,r9,MSR_GS@h > + beq+3f > +2: > +#endif > LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) > LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e) > cmpld cr0,r10,r14 > @@ -523,7 +550,7 @@ kernel_dbg_exc: > blt+cr0,1f > bge+cr1,1f > > - /* here it looks like we got an inappropriate debug exception. */ > +3: /* here it looks like we got an inappropriate debug exception. */ > lis r14,DBSR_IC@h /* clear the IC event */ > rlwinm r11,r11,0,~MSR_DE /* clear DE in the DSRR1 value */ > mtspr SPRN_DBSR,r14 > diff --git a/arch/powerpc/kernel/head_booke.h > b/arch/powerpc/kernel/head_booke.h > index 5f051ee..edc6a3b 100644 > --- a/arch/powerpc/kernel/head_booke.h > +++ b/arch/powerpc/kernel/head_booke.h > @@ -285,7 +285,33 @@ label: > mfspr r10,SPRN_DBSR; /* check single-step/branch taken */ \ > andis. r10,r10,(DBSR_IC|DBSR_BT)@h; \ > beq+2f; \ > +#ifdef CONFIG_KVM_BOOKE_HV \ > + /*\ > + * EPCR.DUVD controls whether the debug events can come in\ > + * hypervisor mode or not. When KVM guest is using the debug \ > + * resource then we do not want debug events to be captured \ > + * in guest entry/exit path. So we set EPCR.DUVD when entering\ > + * and clears EPCR.DUVD when exiting from guest. \ > + * Debug instruction complete is a post-completion debug \ > + * exception but debug event gets posted on the basis of MSR \ > + * before the instruction is executed. Now if the instruction \ > + * switches the context from guest mode (MSR.GS = 1) to hypervisor
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 03:06:42PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:56, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:45, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:38, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:08, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can > > look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 > > and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes > slower than hypercalls. Why don't you just create a PV MMIO > hypercall that the guest can use to invoke MMIO accesses towards the > host based on physical addresses with explicit length encodings? > > >>> It is slower, but not an order of magnitude slower. It become faster > >>> with newer HW. > >>> > That way you simplify and speed up all code paths, exceeding the > speed of PIO exits even. It should also be quite easily portable, as > all other platforms have hypercalls available as well. > > >>> We are trying to avoid PV as much as possible (well this is also PV, > >>> but not guest visible > >> > >> Also, how is this not guest visible? Who sets > >> KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates > >> that the guest does so, so it is guest visible. > >> > > QEMU sets it. > > How does QEMU know? > > >>> Knows what? When to create such eventfd? virtio device knows. > >> > >> Where does it know from? > >> > > It does it always. > > > >>> > > > >> +/* > >> + * PV_MMIO - Guest can promise us that all accesses touching this > >> address > >> + * are writes of specified length, starting at the specified address. > >> + * If not - it's a Guest bug. > >> + * Can not be used together with either PIO or DATAMATCH. > >> + */ > >> > > Virtio spec will state that access to a kick register needs to be of > > specific length. This is reasonable thing for HW to ask. > > This is a spec change. So the guest would have to indicate that it > adheres to a newer spec. Thus it's a guest visible change. > > >>> There is not virtio spec that has kick register in MMIO. The spec is in > >>> the works AFAIK. Actually PIO will not be deprecated and my suggestion > >> > >> So the guest would indicate that it supports a newer revision of the spec > >> (in your case, that it supports MMIO). How is that any different from > >> exposing that it supports a PV MMIO hcall? > >> > > Guest will indicate nothing. New driver will use MMIO if PIO is bar is > > not configured. All driver will not work for virtio devices with MMIO > > bar, but not PIO bar. > > I can't parse that, sorry :). > I am sure MST can explain it better, but I'll try one more time. Device will have two BARs with kick register one is PIO another is MMIO. Old driver works only with PIO new one support both. MMIO is used only when PIO space is exhausted. So old driver will not be able to drive new virtio device that have no PIO bar configured. > > > >>> is to move to MMIO only when PIO address space is exhausted. For PCI it > >>> will be never, for PCI-e it will be after ~16 devices. > >> > >> Ok, let's go back a step here. Are yo
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:56, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:45, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: On 04.04.2013, at 14:38, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:08, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > With KVM, MMIO is much slower than PIO, due to the need to > do page walk and emulation. But with EPT, it does not have to be: we > know the address from the VMCS so if the address is unique, we can > look > up the eventfd directly, bypassing emulation. > > Add an interface for userspace to specify this per-address, we can > use this e.g. for virtio. > > The implementation adds a separate bus internally. This serves two > purposes: > - minimize overhead for old userspace that does not use PV MMIO > - minimize disruption in other code (since we don't know the length, > devices on the MMIO bus only get a valid address in write, this > way we don't need to touch all devices to teach them handle > an dinvalid length) > > At the moment, this optimization is only supported for EPT on x86 and > silently ignored for NPT and MMU, so everything works correctly but > slowly. > > TODO: NPT, MMU and non x86 architectures. > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > pre-review and suggestions. > > Signed-off-by: Michael S. Tsirkin This still uses page fault intercepts which are orders of magnitudes slower than hypercalls. Why don't you just create a PV MMIO hypercall that the guest can use to invoke MMIO accesses towards the host based on physical addresses with explicit length encodings? >>> It is slower, but not an order of magnitude slower. It become faster >>> with newer HW. >>> That way you simplify and speed up all code paths, exceeding the speed of PIO exits even. It should also be quite easily portable, as all other platforms have hypercalls available as well. >>> We are trying to avoid PV as much as possible (well this is also PV, >>> but not guest visible >> >> Also, how is this not guest visible? Who sets >> KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates >> that the guest does so, so it is guest visible. >> > QEMU sets it. How does QEMU know? >>> Knows what? When to create such eventfd? virtio device knows. >> >> Where does it know from? >> > It does it always. > >>> > >> +/* >> + * PV_MMIO - Guest can promise us that all accesses touching this >> address >> + * are writes of specified length, starting at the specified address. >> + * If not - it's a Guest bug. >> + * Can not be used together with either PIO or DATAMATCH. >> + */ >> > Virtio spec will state that access to a kick register needs to be of > specific length. This is reasonable thing for HW to ask. This is a spec change. So the guest would have to indicate that it adheres to a newer spec. Thus it's a guest visible change. >>> There is not virtio spec that has kick register in MMIO. The spec is in >>> the works AFAIK. Actually PIO will not be deprecated and my suggestion >> >> So the guest would indicate that it supports a newer revision of the spec >> (in your case, that it supports MMIO). How is that any different from >> exposing that it supports a PV MMIO hcall? >> > Guest will indicate nothing. New driver will use MMIO if PIO is bar is > not configured. All driver will not work for virtio devices with MMIO > bar, but not PIO bar. I can't parse that, sorry :). > >>> is to move to MMIO only when PIO address space is exhausted. For PCI it >>> will be never, for PCI-e it will be after ~16 devices. >> >> Ok, let's go back a step here. Are you actually able to measure any speed in >> performance with this patch applied and without when going through MMIO >> kicks? >> >> > That's the question for MST. I think he did only micro benchmarks till > now and he already posted his result here: > > mmio-wildcard-eventfd:pci-mem 3529 > mmio-pv-eventfd:pci-mem 1878 > portio-wildcard-eventfd:pci-io 1846 > > So the patch speedup mmio by almost 100% and it is almost the same as PIO. Those numbers don't align at all with what I measured. MST, could you please do a real world latency benchmark with virtio-net and * normal ioeventfd * mmio-pv eventfd * hcall
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:49:39PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:45, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:38, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can > >>> look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. Why don't you just create a PV MMIO hypercall > >> that the guest can use to invoke MMIO accesses towards the host based > >> on physical addresses with explicit length encodings? > >> > > It is slower, but not an order of magnitude slower. It become faster > > with newer HW. > > > >> That way you simplify and speed up all code paths, exceeding the speed > >> of PIO exits even. It should also be quite easily portable, as all > >> other platforms have hypercalls available as well. > >> > > We are trying to avoid PV as much as possible (well this is also PV, > > but not guest visible > > Also, how is this not guest visible? Who sets > KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates > that the guest does so, so it is guest visible. > > >>> QEMU sets it. > >> > >> How does QEMU know? > >> > > Knows what? When to create such eventfd? virtio device knows. > > Where does it know from? > It does it always. > > > >>> > +/* > + * PV_MMIO - Guest can promise us that all accesses touching this > address > + * are writes of specified length, starting at the specified address. > + * If not - it's a Guest bug. > + * Can not be used together with either PIO or DATAMATCH. > + */ > > >>> Virtio spec will state that access to a kick register needs to be of > >>> specific length. This is reasonable thing for HW to ask. > >> > >> This is a spec change. So the guest would have to indicate that it adheres > >> to a newer spec. Thus it's a guest visible change. > >> > > There is not virtio spec that has kick register in MMIO. The spec is in > > the works AFAIK. Actually PIO will not be deprecated and my suggestion > > So the guest would indicate that it supports a newer revision of the spec (in > your case, that it supports MMIO). How is that any different from exposing > that it supports a PV MMIO hcall? > Guest will indicate nothing. New driver will use MMIO if PIO is bar is not configured. All driver will not work for virtio devices with MMIO bar, but not PIO bar. > > is to move to MMIO only when PIO address space is exhausted. For PCI it > > will be never, for PCI-e it will be after ~16 devices. > > Ok, let's go back a step here. Are you actually able to measure any speed in > performance with this patch applied and without when going through MMIO kicks? > > That's the question for MST. I think he did only micro benchmarks till now and he already posted his result here: mmio-wildcard-eventfd:pci-mem 3529 mmio-pv-eventfd:pci-mem 1878 portio-wildcard-eventfd:pci-io 1846 So the patch speedup mmio by almost 100% and it is almost the same as PIO. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 6/6] kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
On 03.04.2013, at 03:57, Scott Wood wrote: > Enabling this capability connects the vcpu to the designated in-kernel > MPIC. Using explicit connections between vcpus and irqchips allows > for flexibility, but the main benefit at the moment is that it > simplifies the code -- KVM doesn't need vm-global state to remember > which MPIC object is associated with this vm, and it doesn't need to > care about ordering between irqchip creation and vcpu creation. > > Signed-off-by: Scott Wood > --- > Documentation/virtual/kvm/api.txt |8 ++ > arch/powerpc/include/asm/kvm_host.h |8 ++ > arch/powerpc/include/asm/kvm_ppc.h |2 ++ > arch/powerpc/kvm/booke.c|4 ++- > arch/powerpc/kvm/mpic.c | 49 +++ > arch/powerpc/kvm/powerpc.c | 26 +++ > include/uapi/linux/kvm.h|1 + > 7 files changed, 92 insertions(+), 6 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index d52f3f9..4c326ae 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2728,3 +2728,11 @@ to receive the topmost interrupt vector. > When disabled (args[0] == 0), behavior is as if this facility is unsupported. > > When this capability is enabled, KVM_EXIT_EPR can occur. > + > +6.6 KVM_CAP_IRQ_MPIC > + > +Architectures: ppc > +Parameters: args[0] is the MPIC device fd > +args[1] is the MPIC CPU number for this vcpu > + > +This capability connects the vcpu to an in-kernel MPIC device. > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index 7e7aef9..2a2e235 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -375,6 +375,11 @@ struct kvmppc_booke_debug_reg { > u64 dac[KVMPPC_BOOKE_MAX_DAC]; > }; > > +#define KVMPPC_IRQ_DEFAULT 0 > +#define KVMPPC_IRQ_MPIC 1 > + > +struct openpic; > + > struct kvm_vcpu_arch { > ulong host_stack; > u32 host_pid; > @@ -554,6 +559,9 @@ struct kvm_vcpu_arch { > unsigned long magic_page_pa; /* phys addr to map the magic page to */ > unsigned long magic_page_ea; /* effect. addr to map the magic page to */ > > + int irq_type; /* one of KVM_IRQ_* */ > + struct openpic *mpic; /* KVM_IRQ_MPIC */ > + > #ifdef CONFIG_KVM_BOOK3S_64_HV > struct kvm_vcpu_arch_shared shregs; > > diff --git a/arch/powerpc/include/asm/kvm_ppc.h > b/arch/powerpc/include/asm/kvm_ppc.h > index 3b63b97..f54707f 100644 > --- a/arch/powerpc/include/asm/kvm_ppc.h > +++ b/arch/powerpc/include/asm/kvm_ppc.h > @@ -276,6 +276,8 @@ static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, > u32 epr) > } > > void kvmppc_mpic_set_epr(struct kvm_vcpu *vcpu); > +int kvmppc_mpic_connect_vcpu(struct file *mpic_filp, struct kvm_vcpu *vcpu, > + u32 cpu); > > int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu, > struct kvm_config_tlb *cfg); > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > index cddc6b3..7d00222 100644 > --- a/arch/powerpc/kvm/booke.c > +++ b/arch/powerpc/kvm/booke.c > @@ -430,8 +430,10 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu > *vcpu, > if (update_epr == true) { > if (vcpu->arch.epr_flags & KVMPPC_EPR_USER) > kvm_make_request(KVM_REQ_EPR_EXIT, vcpu); > - else if (vcpu->arch.epr_flags & KVMPPC_EPR_KERNEL) > + else if (vcpu->arch.epr_flags & KVMPPC_EPR_KERNEL) { > + BUG_ON(vcpu->arch.irq_type != KVMPPC_IRQ_MPIC); > kvmppc_mpic_set_epr(vcpu); > + } > } > > new_msr &= msr_mask; > diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c > index 8cda2fa..caffe3b 100644 > --- a/arch/powerpc/kvm/mpic.c > +++ b/arch/powerpc/kvm/mpic.c > @@ -1159,7 +1159,7 @@ static uint32_t openpic_iack(struct openpic *opp, > struct irq_dest *dst, > > void kvmppc_mpic_set_epr(struct kvm_vcpu *vcpu) > { > - struct openpic *opp = vcpu->arch.irqchip_priv; > + struct openpic *opp = vcpu->arch.mpic; > int cpu = vcpu->vcpu_id; > unsigned long flags; > > @@ -1442,10 +1442,10 @@ static void map_mmio(struct openpic *opp) > > static void unmap_mmio(struct openpic *opp) > { > - BUG_ON(opp->mmio_mapped); > - opp->mmio_mapped = false; > - > - kvm_io_bus_unregister_dev(opp->kvm, KVM_MMIO_BUS, &opp->mmio); > + if (opp->mmio_mapped) { > + opp->mmio_mapped = false; > + kvm_io_bus_unregister_dev(opp->kvm, KVM_MMIO_BUS, &opp->mmio); > + } > } > > static int set_base_addr(struct openpic *opp, struct kvm_device_attr *attr) > @@ -1681,6 +1681,45 @@ static const struct file_operations kvm_mpic_fops = { > .release = kvm_mpic_release, > }; >
Re: [PATCH 6/7] ARM: KVM: switch to a dual-step HYP init code
On 04/04/13 00:15, Christoffer Dall wrote: > On Tue, Apr 02, 2013 at 02:25:14PM +0100, Marc Zyngier wrote: >> Our HYP init code suffers from two major design issues: >> - it cannot support CPU hotplug, as we tear down the idmap very early >> - it cannot perform a TLB invalidation when switching from init to >> runtime mappings, as pages are manipulated from PL1 exclusively >> >> The hotplug problem mandates that we keep two sets of page tables >> (boot and runtime). The TLB problem mandates that we're able to >> transition from one PGD to another while in HYP, invalidating the TLBs >> in the process. >> >> To be able to do this, we need to share a page between the two page >> tables. A page that will have the same VA in both configurations. All we >> need is a VA that has the following properties: >> - This VA can't be used to represent a kernel mapping. >> - This VA will not conflict with the physical address of the kernel text >> >> The vectors page seems to satisfy this requirement: >> - The kernel never maps anything else there >> - The kernel text being copied at the beginning of the physical memory, >> it is unlikely to use the last 64kB (I doubt we'll ever support KVM >> on a system with something like 4MB of RAM, but patches are very >> welcome). >> >> Let's call this VA the trampoline VA. >> >> Now, we map our init page at 3 locations: >> - idmap in the boot pgd >> - trampoline VA in the boot pgd >> - trampoline VA in the runtime pgd >> >> The init scenario is now the following: >> - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd, >> runtime stack, runtime vectors >> - Enable the MMU with the boot pgd >> - Jump to a target into the trampoline page (remember, this is the same >> physical page!) >> - Now switch to the runtime pgd (same VA, and still the same physical >> page!) >> - Invalidate TLBs >> - Set stack and vectors >> - Profit! (or eret, if you only care about the code). > > So I'm going to do my usual commenting routine. Was it an idea to insert > this commit text (which I really liked by the way!) into init.S where > the current comment is a little lacking giving the massive complexity > this is turning into, madness? Will do. [...] >> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S >> index 35a463f..b2c6967 100644 >> --- a/arch/arm/kvm/init.S >> +++ b/arch/arm/kvm/init.S >> @@ -21,6 +21,7 @@ >> #include >> #include >> #include >> +#include >> >> / >> * Hypervisor initialization >> @@ -47,6 +48,9 @@ __kvm_hyp_init: >> W(b). >> >> __do_hyp_init: >> + cmp r2, #0 @ We have a SP? >> + bne phase2 @ Yes, second stage init >> + >> @ Set the HTTBR to point to the hypervisor PGD pointer passed >> mcrrp15, 4, r0, r1, c2 >> >> @@ -96,14 +100,35 @@ __do_hyp_init: >> orr r0, r0, r1 >> isb >> mcr p15, 4, r0, c1, c0, 0 @ HSCR >> - isb >> >> - @ Set stack pointer and return to the kernel >> + eret > > Could you add some comment here to indicate we're done with phase1, it > seems like this eret should not go unnoticed by casual readers (ok, they > shouldn't read this code casually, but anyway..., it will make me sleep > better) Sure, no problem. M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v2 5/6] kvm: add PV MMIO
On 4 April 2013 11:40, Michael S. Tsirkin wrote: > Add an option for users to specify PV eventfd > listeners, and utilize KVM's PV MMIO underneath. > Upodate all callers. > > Signed-off-by: Michael S. Tsirkin > --- > hw/dataplane/hostmem.c| 1 + > hw/ivshmem.c | 2 ++ > hw/pci-testdev.c | 2 ++ > hw/vhost.c| 4 ++-- > hw/virtio-pci.c | 4 ++-- > include/exec/memory.h | 10 ++ > kvm-all.c | 15 --- > linux-headers/linux/kvm.h | 8 > memory.c | 9 +++-- Can we keep linux-headers updates in separate patches, please (with commit messages saying what kernel commit/version the sync is with)? thanks -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:45, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:38, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: On 04.04.2013, at 14:08, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: >> >>> With KVM, MMIO is much slower than PIO, due to the need to >>> do page walk and emulation. But with EPT, it does not have to be: we >>> know the address from the VMCS so if the address is unique, we can look >>> up the eventfd directly, bypassing emulation. >>> >>> Add an interface for userspace to specify this per-address, we can >>> use this e.g. for virtio. >>> >>> The implementation adds a separate bus internally. This serves two >>> purposes: >>> - minimize overhead for old userspace that does not use PV MMIO >>> - minimize disruption in other code (since we don't know the length, >>> devices on the MMIO bus only get a valid address in write, this >>> way we don't need to touch all devices to teach them handle >>> an dinvalid length) >>> >>> At the moment, this optimization is only supported for EPT on x86 and >>> silently ignored for NPT and MMU, so everything works correctly but >>> slowly. >>> >>> TODO: NPT, MMU and non x86 architectures. >>> >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for >>> pre-review and suggestions. >>> >>> Signed-off-by: Michael S. Tsirkin >> >> This still uses page fault intercepts which are orders of magnitudes >> slower than hypercalls. Why don't you just create a PV MMIO hypercall >> that the guest can use to invoke MMIO accesses towards the host based on >> physical addresses with explicit length encodings? >> > It is slower, but not an order of magnitude slower. It become faster > with newer HW. > >> That way you simplify and speed up all code paths, exceeding the speed >> of PIO exits even. It should also be quite easily portable, as all other >> platforms have hypercalls available as well. >> > We are trying to avoid PV as much as possible (well this is also PV, > but not guest visible Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates that the guest does so, so it is guest visible. >>> QEMU sets it. >> >> How does QEMU know? >> > Knows what? When to create such eventfd? virtio device knows. Where does it know from? > >>> +/* + * PV_MMIO - Guest can promise us that all accesses touching this address + * are writes of specified length, starting at the specified address. + * If not - it's a Guest bug. + * Can not be used together with either PIO or DATAMATCH. + */ >>> Virtio spec will state that access to a kick register needs to be of >>> specific length. This is reasonable thing for HW to ask. >> >> This is a spec change. So the guest would have to indicate that it adheres >> to a newer spec. Thus it's a guest visible change. >> > There is not virtio spec that has kick register in MMIO. The spec is in > the works AFAIK. Actually PIO will not be deprecated and my suggestion So the guest would indicate that it supports a newer revision of the spec (in your case, that it supports MMIO). How is that any different from exposing that it supports a PV MMIO hcall? > is to move to MMIO only when PIO address space is exhausted. For PCI it > will be never, for PCI-e it will be after ~16 devices. Ok, let's go back a step here. Are you actually able to measure any speed in performance with this patch applied and without when going through MMIO kicks? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:39:51PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:38, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 14:08, Gleb Natapov wrote: > >> > >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes > slower than hypercalls. Why don't you just create a PV MMIO hypercall > that the guest can use to invoke MMIO accesses towards the host based on > physical addresses with explicit length encodings? > > >>> It is slower, but not an order of magnitude slower. It become faster > >>> with newer HW. > >>> > That way you simplify and speed up all code paths, exceeding the speed > of PIO exits even. It should also be quite easily portable, as all other > platforms have hypercalls available as well. > > >>> We are trying to avoid PV as much as possible (well this is also PV, > >>> but not guest visible > >> > >> Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? > >> The comment above its definition indicates that the guest does so, so it > >> is guest visible. > >> > > QEMU sets it. > > How does QEMU know? > Knows what? When to create such eventfd? virtio device knows. > > > >> +/* > >> + * PV_MMIO - Guest can promise us that all accesses touching this address > >> + * are writes of specified length, starting at the specified address. > >> + * If not - it's a Guest bug. > >> + * Can not be used together with either PIO or DATAMATCH. > >> + */ > >> > > Virtio spec will state that access to a kick register needs to be of > > specific length. This is reasonable thing for HW to ask. > > This is a spec change. So the guest would have to indicate that it adheres to > a newer spec. Thus it's a guest visible change. > There is not virtio spec that has kick register in MMIO. The spec is in the works AFAIK. Actually PIO will not be deprecated and my suggestion is to move to MMIO only when PIO address space is exhausted. For PCI it will be never, for PCI-e it will be after ~16 devices. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 5/6] kvm: add PV MMIO
Il 04/04/2013 12:40, Michael S. Tsirkin ha scritto: > Add an option for users to specify PV eventfd > listeners, and utilize KVM's PV MMIO underneath. > Upodate all callers. I would like Avi to comment on this, because I think this is not the "memory-API approved" way of doing things. You need KVM to define its own AddressSpace, and make KVM's listener use memory_region_to_address_space to figure out if it is for PV MMIO. To handle accesses from TCG, the PV AddressSpace can simply have just an alias to the actual MemoryRegion where the doorbell is. Paolo > Signed-off-by: Michael S. Tsirkin > --- > hw/dataplane/hostmem.c| 1 + > hw/ivshmem.c | 2 ++ > hw/pci-testdev.c | 2 ++ > hw/vhost.c| 4 ++-- > hw/virtio-pci.c | 4 ++-- > include/exec/memory.h | 10 ++ > kvm-all.c | 15 --- > linux-headers/linux/kvm.h | 8 > memory.c | 9 +++-- > 9 files changed, 46 insertions(+), 9 deletions(-) > > diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c > index 380537e..41d43e7 100644 > --- a/hw/dataplane/hostmem.c > +++ b/hw/dataplane/hostmem.c > @@ -126,6 +126,7 @@ static void hostmem_listener_section_dummy(MemoryListener > *listener, > > static void hostmem_listener_eventfd_dummy(MemoryListener *listener, > MemoryRegionSection *section, > + bool pv, > bool match_data, uint64_t data, > EventNotifier *e) > { > diff --git a/hw/ivshmem.c b/hw/ivshmem.c > index 68a2cf2..28e5978 100644 > --- a/hw/ivshmem.c > +++ b/hw/ivshmem.c > @@ -352,6 +352,7 @@ static void ivshmem_add_eventfd(IVShmemState *s, int > posn, int i) > memory_region_add_eventfd(&s->ivshmem_mmio, >DOORBELL, >4, > + false, >true, >(posn << 16) | i, >&s->peers[posn].eventfds[i]); > @@ -362,6 +363,7 @@ static void ivshmem_del_eventfd(IVShmemState *s, int > posn, int i) > memory_region_del_eventfd(&s->ivshmem_mmio, >DOORBELL, >4, > + false, >true, >(posn << 16) | i, >&s->peers[posn].eventfds[i]); > diff --git a/hw/pci-testdev.c b/hw/pci-testdev.c > index 9486624..f0ebf99 100644 > --- a/hw/pci-testdev.c > +++ b/hw/pci-testdev.c > @@ -80,6 +80,7 @@ static int pci_testdev_start(IOTest *test) > memory_region_add_eventfd(test->mr, >le32_to_cpu(test->hdr->offset), >test->size, > + false, >test->match_data, >test->hdr->data, >&test->notifier); > @@ -94,6 +95,7 @@ static void pci_testdev_stop(IOTest *test) > memory_region_del_eventfd(test->mr, >le32_to_cpu(test->hdr->offset), >test->size, > + false, >test->match_data, >test->hdr->data, >&test->notifier); > diff --git a/hw/vhost.c b/hw/vhost.c > index 4d6aee3..9ad7101 100644 > --- a/hw/vhost.c > +++ b/hw/vhost.c > @@ -750,13 +750,13 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, > } > > static void vhost_eventfd_add(MemoryListener *listener, > - MemoryRegionSection *section, > + MemoryRegionSection *section, bool pv, >bool match_data, uint64_t data, EventNotifier > *e) > { > } > > static void vhost_eventfd_del(MemoryListener *listener, > - MemoryRegionSection *section, > + MemoryRegionSection *section, bool pv, >bool match_data, uint64_t data, EventNotifier > *e) > { > } > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c > index 19965e5..f9ff994 100644 > --- a/hw/virtio-pci.c > +++ b/hw/virtio-pci.c > @@ -189,10 +189,10 @@ static int > virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, > } > virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); > memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > - true, n, notifier); > + false, true, n, notifier); > } else { > memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, > - true, n, notifier); > +
Re: [PATCH 2/7] ARM: KVM: fix HYP mapping limitations around zero
On 04/04/13 00:14, Christoffer Dall wrote: > On Tue, Apr 02, 2013 at 02:25:10PM +0100, Marc Zyngier wrote: >> The current code for creating HYP mapping doesn't like to wrap >> around zero, which prevents from mapping anything into the last >> page of the virtual address space. >> >> It doesn't take much effort to remove this limitation, making >> the code more consistent with the rest of the kernel in the process. > > This is quite funny, the code used to behave like this, but > reviewers said that this was old fashioned mm-style code that was > unwanted and wrapping around zero should be avoided for all costs :) My original attempt at this series was using the very last page of the memory, and it took me a while to notice that the code couldn't map anything in the last page. Just for this reason, the code needed to be fixed. > In any case, I welcome the familiarity. Amen. M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:38, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:08, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > With KVM, MMIO is much slower than PIO, due to the need to > do page walk and emulation. But with EPT, it does not have to be: we > know the address from the VMCS so if the address is unique, we can look > up the eventfd directly, bypassing emulation. > > Add an interface for userspace to specify this per-address, we can > use this e.g. for virtio. > > The implementation adds a separate bus internally. This serves two > purposes: > - minimize overhead for old userspace that does not use PV MMIO > - minimize disruption in other code (since we don't know the length, > devices on the MMIO bus only get a valid address in write, this > way we don't need to touch all devices to teach them handle > an dinvalid length) > > At the moment, this optimization is only supported for EPT on x86 and > silently ignored for NPT and MMU, so everything works correctly but > slowly. > > TODO: NPT, MMU and non x86 architectures. > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > pre-review and suggestions. > > Signed-off-by: Michael S. Tsirkin This still uses page fault intercepts which are orders of magnitudes slower than hypercalls. Why don't you just create a PV MMIO hypercall that the guest can use to invoke MMIO accesses towards the host based on physical addresses with explicit length encodings? >>> It is slower, but not an order of magnitude slower. It become faster >>> with newer HW. >>> That way you simplify and speed up all code paths, exceeding the speed of PIO exits even. It should also be quite easily portable, as all other platforms have hypercalls available as well. >>> We are trying to avoid PV as much as possible (well this is also PV, >>> but not guest visible >> >> Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? >> The comment above its definition indicates that the guest does so, so it is >> guest visible. >> > QEMU sets it. How does QEMU know? > >> +/* >> + * PV_MMIO - Guest can promise us that all accesses touching this address >> + * are writes of specified length, starting at the specified address. >> + * If not - it's a Guest bug. >> + * Can not be used together with either PIO or DATAMATCH. >> + */ >> > Virtio spec will state that access to a kick register needs to be of > specific length. This is reasonable thing for HW to ask. This is a spec change. So the guest would have to indicate that it adheres to a newer spec. Thus it's a guest visible change. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:32:08PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. Why don't you just create a PV MMIO hypercall that > >> the guest can use to invoke MMIO accesses towards the host based on > >> physical addresses with explicit length encodings? > >> > > It is slower, but not an order of magnitude slower. It become faster > > with newer HW. > > > >> That way you simplify and speed up all code paths, exceeding the speed of > >> PIO exits even. It should also be quite easily portable, as all other > >> platforms have hypercalls available as well. > >> > > We are trying to avoid PV as much as possible (well this is also PV, > > but not guest visible > > Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? The > comment above its definition indicates that the guest does so, so it is guest > visible. > QEMU sets it. > +/* > + * PV_MMIO - Guest can promise us that all accesses touching this address > + * are writes of specified length, starting at the specified address. > + * If not - it's a Guest bug. > + * Can not be used together with either PIO or DATAMATCH. > + */ > Virtio spec will state that access to a kick register needs to be of specific length. This is reasonable thing for HW to ask. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:34, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:22:09PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 14:08, Gleb Natapov wrote: >> >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > With KVM, MMIO is much slower than PIO, due to the need to > do page walk and emulation. But with EPT, it does not have to be: we > know the address from the VMCS so if the address is unique, we can look > up the eventfd directly, bypassing emulation. > > Add an interface for userspace to specify this per-address, we can > use this e.g. for virtio. > > The implementation adds a separate bus internally. This serves two > purposes: > - minimize overhead for old userspace that does not use PV MMIO > - minimize disruption in other code (since we don't know the length, > devices on the MMIO bus only get a valid address in write, this > way we don't need to touch all devices to teach them handle > an dinvalid length) > > At the moment, this optimization is only supported for EPT on x86 and > silently ignored for NPT and MMU, so everything works correctly but > slowly. > > TODO: NPT, MMU and non x86 architectures. > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > pre-review and suggestions. > > Signed-off-by: Michael S. Tsirkin This still uses page fault intercepts which are orders of magnitudes slower than hypercalls. Why don't you just create a PV MMIO hypercall that the guest can use to invoke MMIO accesses towards the host based on physical addresses with explicit length encodings? >>> It is slower, but not an order of magnitude slower. It become faster >>> with newer HW. >>> That way you simplify and speed up all code paths, exceeding the speed of PIO exits even. It should also be quite easily portable, as all other platforms have hypercalls available as well. >>> We are trying to avoid PV as much as possible (well this is also PV, >>> but not guest visible). We haven't replaced PIO with hypercall for the >>> same reason. My hope is that future HW will provide us with instruction >>> decode for basic mov instruction at which point this optimisation can be >>> dropped. >> >> The same applies to an MMIO hypercall. Once the PV interface becomes >> obsolete, we can drop the capability we expose to the guest. >> > Disable it on newer HW is easy, but it is not that simple to get rid of the > guest code. You need to have guest code that runs with non-PV hcalls anyways, since you want to be able to run on older KVM versions. So what's the problem? > >>> And hypercall has its own set of problems with Windows guests. >>> When KVM runs in Hyper-V emulation mode it expects to get Hyper-V >>> hypercalls. Mixing KVM hypercalls and Hyper-V requires some tricks. It >> >> Can't we simply register a hypercall ID range with Microsoft? > Doubt it. This is not only about the rang though. The calling convention > is completely different. There is no calling convention for KVM hypercalls anymore, or is there? Either way, we would of course just adhere to the MS calling convention. We're only talking about register values here... > >> >>> may also affect WHQLing Windows drivers since driver will talk to HW >>> bypassing Windows interfaces. >> >> Then the WHQL'ed driver doesn't support the PV MMIO hcall? >> > All Windows drivers have to be WHQL'ed, saying that is like saying that > we do not care about Windows guest virtio speed. I don't mind if a Linux guest is faster than a Windows guest, if that's what you're asking. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:22:09PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 14:08, Gleb Natapov wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. Why don't you just create a PV MMIO hypercall that > >> the guest can use to invoke MMIO accesses towards the host based on > >> physical addresses with explicit length encodings? > >> > > It is slower, but not an order of magnitude slower. It become faster > > with newer HW. > > > >> That way you simplify and speed up all code paths, exceeding the speed of > >> PIO exits even. It should also be quite easily portable, as all other > >> platforms have hypercalls available as well. > >> > > We are trying to avoid PV as much as possible (well this is also PV, > > but not guest visible). We haven't replaced PIO with hypercall for the > > same reason. My hope is that future HW will provide us with instruction > > decode for basic mov instruction at which point this optimisation can be > > dropped. > > The same applies to an MMIO hypercall. Once the PV interface becomes > obsolete, we can drop the capability we expose to the guest. > Disable it on newer HW is easy, but it is not that simple to get rid of the guest code. > > And hypercall has its own set of problems with Windows guests. > > When KVM runs in Hyper-V emulation mode it expects to get Hyper-V > > hypercalls. Mixing KVM hypercalls and Hyper-V requires some tricks. It > > Can't we simply register a hypercall ID range with Microsoft? Doubt it. This is not only about the rang though. The calling convention is completely different. > > > may also affect WHQLing Windows drivers since driver will talk to HW > > bypassing Windows interfaces. > > Then the WHQL'ed driver doesn't support the PV MMIO hcall? > All Windows drivers have to be WHQL'ed, saying that is like saying that we do not care about Windows guest virtio speed. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] ARM: KVM: simplify HYP mapping population
On 04/04/13 00:13, Christoffer Dall wrote: > On Tue, Apr 02, 2013 at 02:25:09PM +0100, Marc Zyngier wrote: >> The way we populate HYP mappings is a bit convoluted, to say the least. >> Passing a pointer around to keep track of the current PFN is quite >> odd, and we end-up having two different PTE accessors for no good >> reason. >> >> Simplify the whole thing by unifying the two PTE accessors, passing >> a pgprot_t around, and moving the various validity checks to the >> upper layers. >> >> Signed-off-by: Marc Zyngier >> --- >> arch/arm/kvm/mmu.c | 100 >> ++--- >> 1 file changed, 41 insertions(+), 59 deletions(-) >> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 2f12e40..24811d1 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -125,54 +125,34 @@ void free_hyp_pmds(void) >> } >> >> static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start, >> -unsigned long end) >> +unsigned long end, unsigned long pfn, >> +pgprot_t prot) >> { >> pte_t *pte; >> unsigned long addr; >> -struct page *page; >> >> -for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) { >> -unsigned long hyp_addr = KERN_TO_HYP(addr); >> - >> -pte = pte_offset_kernel(pmd, hyp_addr); >> -BUG_ON(!virt_addr_valid(addr)); >> -page = virt_to_page(addr); >> -kvm_set_pte(pte, mk_pte(page, PAGE_HYP)); >> -} >> -} >> - >> -static void create_hyp_io_pte_mappings(pmd_t *pmd, unsigned long start, >> - unsigned long end, >> - unsigned long *pfn_base) >> -{ >> -pte_t *pte; >> -unsigned long addr; >> - >> -for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE) { >> -unsigned long hyp_addr = KERN_TO_HYP(addr); >> - >> -pte = pte_offset_kernel(pmd, hyp_addr); >> -BUG_ON(pfn_valid(*pfn_base)); >> -kvm_set_pte(pte, pfn_pte(*pfn_base, PAGE_HYP_DEVICE)); >> -(*pfn_base)++; >> +for (addr = start; addr < end; addr += PAGE_SIZE) { >> +pte = pte_offset_kernel(pmd, addr); >> +kvm_set_pte(pte, pfn_pte(pfn, prot)); >> +pfn++; >> } >> } >> >> static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start, >> - unsigned long end, unsigned long *pfn_base) >> + unsigned long end, unsigned long pfn, >> + pgprot_t prot) >> { >> pmd_t *pmd; >> pte_t *pte; >> unsigned long addr, next; >> >> for (addr = start; addr < end; addr = next) { >> -unsigned long hyp_addr = KERN_TO_HYP(addr); >> -pmd = pmd_offset(pud, hyp_addr); >> +pmd = pmd_offset(pud, addr); >> >> BUG_ON(pmd_sect(*pmd)); >> >> if (pmd_none(*pmd)) { >> -pte = pte_alloc_one_kernel(NULL, hyp_addr); >> +pte = pte_alloc_one_kernel(NULL, addr); >> if (!pte) { >> kvm_err("Cannot allocate Hyp pte\n"); >> return -ENOMEM; >> @@ -182,25 +162,17 @@ static int create_hyp_pmd_mappings(pud_t *pud, >> unsigned long start, >> >> next = pmd_addr_end(addr, end); >> >> -/* >> - * If pfn_base is NULL, we map kernel pages into HYP with the >> - * virtual address. Otherwise, this is considered an I/O >> - * mapping and we map the physical region starting at >> - * *pfn_base to [start, end[. >> - */ >> -if (!pfn_base) >> -create_hyp_pte_mappings(pmd, addr, next); >> -else >> -create_hyp_io_pte_mappings(pmd, addr, next, pfn_base); >> +create_hyp_pte_mappings(pmd, addr, next, pfn, prot); >> +pfn += (next - addr) >> PAGE_SHIFT; > > so this scheme always assumes a physically contiguous memory reason, > and I wasn't sure if we ever wanted to support mapping vmalloc'ed > regions into Hyp mode, which is why I wrote the code the way it was > (which goes to your "for no good reason" comment above). So let's look at what we're actually mapping: - kernel code/kmalloc-ed data: this is always physically contiguous - MMIO: While this is mapped in the vmalloc region, this is actually physically contiguous. > I'm fine with assuming that this only works for contiguous regions, but > I think it deserves a line in the comments on the exported functions > (the non-IO one anyway). Sure, I'll add that. >> } >> >> return 0; >> } >> >> -static int __create_hyp_mappings(void *from, void *to, unsigned long >> *pfn_base) >> +static int __create_hyp_mappings(pgd_t *pgdp, >> +
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:08, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: >> >>> With KVM, MMIO is much slower than PIO, due to the need to >>> do page walk and emulation. But with EPT, it does not have to be: we >>> know the address from the VMCS so if the address is unique, we can look >>> up the eventfd directly, bypassing emulation. >>> >>> Add an interface for userspace to specify this per-address, we can >>> use this e.g. for virtio. >>> >>> The implementation adds a separate bus internally. This serves two >>> purposes: >>> - minimize overhead for old userspace that does not use PV MMIO >>> - minimize disruption in other code (since we don't know the length, >>> devices on the MMIO bus only get a valid address in write, this >>> way we don't need to touch all devices to teach them handle >>> an dinvalid length) >>> >>> At the moment, this optimization is only supported for EPT on x86 and >>> silently ignored for NPT and MMU, so everything works correctly but >>> slowly. >>> >>> TODO: NPT, MMU and non x86 architectures. >>> >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for >>> pre-review and suggestions. >>> >>> Signed-off-by: Michael S. Tsirkin >> >> This still uses page fault intercepts which are orders of magnitudes slower >> than hypercalls. Why don't you just create a PV MMIO hypercall that the >> guest can use to invoke MMIO accesses towards the host based on physical >> addresses with explicit length encodings? >> > It is slower, but not an order of magnitude slower. It become faster > with newer HW. > >> That way you simplify and speed up all code paths, exceeding the speed of >> PIO exits even. It should also be quite easily portable, as all other >> platforms have hypercalls available as well. >> > We are trying to avoid PV as much as possible (well this is also PV, > but not guest visible Also, how is this not guest visible? Who sets KVM_IOEVENTFD_FLAG_PV_MMIO? The comment above its definition indicates that the guest does so, so it is guest visible. +/* + * PV_MMIO - Guest can promise us that all accesses touching this address + * are writes of specified length, starting at the specified address. + * If not - it's a Guest bug. + * Can not be used together with either PIO or DATAMATCH. + */ Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:09:53PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 13:04, Michael S. Tsirkin wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. > > > > Not really. Here's a test: > > compare vmcall to portio: > > > > vmcall 1519 > > ... > > outl_to_kernel 1745 > > > > compare portio to mmio: > > > > mmio-wildcard-eventfd:pci-mem 3529 > > mmio-pv-eventfd:pci-mem 1878 > > portio-wildcard-eventfd:pci-io 1846 > > > > So not orders of magnitude. > > https://dl.dropbox.com/u/8976842/KVM%20Forum%202012/MMIO%20Tuning.pdf > > Check out page 41. Higher is better (number is number of loop cycles in a > second). My test system was an AMD Istanbul based box. Wow 2x difference. The difference seems to be much smaller now. Newer hardware? Newer software? > > > >> Why don't you just create a PV MMIO hypercall > >> that the guest can use to invoke MMIO accesses towards the host based > >> on physical addresses with explicit length encodings? > >> That way you simplify and speed up all code paths, exceeding the speed > >> of PIO exits even. It should also be quite easily portable, as all > >> other platforms have hypercalls available as well. > >> > >> > >> Alex > > > > I sent such a patch, but maintainers seem reluctant to add hypercalls. > > Gleb, could you comment please? > > > > A fast way to do MMIO is probably useful in any case ... > > Yes, but at least according to my numbers optimizing anything that is not > hcalls is a waste of time. > > > Alex This was the implementation: 'kvm_para: add mmio word store hypercall'. Again, I don't mind. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:19, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 02:09:53PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 13:04, Michael S. Tsirkin wrote: >> >>> On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > With KVM, MMIO is much slower than PIO, due to the need to > do page walk and emulation. But with EPT, it does not have to be: we > know the address from the VMCS so if the address is unique, we can look > up the eventfd directly, bypassing emulation. > > Add an interface for userspace to specify this per-address, we can > use this e.g. for virtio. > > The implementation adds a separate bus internally. This serves two > purposes: > - minimize overhead for old userspace that does not use PV MMIO > - minimize disruption in other code (since we don't know the length, > devices on the MMIO bus only get a valid address in write, this > way we don't need to touch all devices to teach them handle > an dinvalid length) > > At the moment, this optimization is only supported for EPT on x86 and > silently ignored for NPT and MMU, so everything works correctly but > slowly. > > TODO: NPT, MMU and non x86 architectures. > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > pre-review and suggestions. > > Signed-off-by: Michael S. Tsirkin This still uses page fault intercepts which are orders of magnitudes slower than hypercalls. >>> >>> Not really. Here's a test: >>> compare vmcall to portio: >>> >>> vmcall 1519 >>> ... >>> outl_to_kernel 1745 >>> >>> compare portio to mmio: >>> >>> mmio-wildcard-eventfd:pci-mem 3529 >>> mmio-pv-eventfd:pci-mem 1878 >>> portio-wildcard-eventfd:pci-io 1846 >>> >>> So not orders of magnitude. >> >> https://dl.dropbox.com/u/8976842/KVM%20Forum%202012/MMIO%20Tuning.pdf >> >> Check out page 41. Higher is better (number is number of loop cycles in a >> second). My test system was an AMD Istanbul based box. >> > Have you bypassed instruction emulation in your testing? PIO doesn't do instruction emulation. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 14:08, Gleb Natapov wrote: > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: >> >>> With KVM, MMIO is much slower than PIO, due to the need to >>> do page walk and emulation. But with EPT, it does not have to be: we >>> know the address from the VMCS so if the address is unique, we can look >>> up the eventfd directly, bypassing emulation. >>> >>> Add an interface for userspace to specify this per-address, we can >>> use this e.g. for virtio. >>> >>> The implementation adds a separate bus internally. This serves two >>> purposes: >>> - minimize overhead for old userspace that does not use PV MMIO >>> - minimize disruption in other code (since we don't know the length, >>> devices on the MMIO bus only get a valid address in write, this >>> way we don't need to touch all devices to teach them handle >>> an dinvalid length) >>> >>> At the moment, this optimization is only supported for EPT on x86 and >>> silently ignored for NPT and MMU, so everything works correctly but >>> slowly. >>> >>> TODO: NPT, MMU and non x86 architectures. >>> >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for >>> pre-review and suggestions. >>> >>> Signed-off-by: Michael S. Tsirkin >> >> This still uses page fault intercepts which are orders of magnitudes slower >> than hypercalls. Why don't you just create a PV MMIO hypercall that the >> guest can use to invoke MMIO accesses towards the host based on physical >> addresses with explicit length encodings? >> > It is slower, but not an order of magnitude slower. It become faster > with newer HW. > >> That way you simplify and speed up all code paths, exceeding the speed of >> PIO exits even. It should also be quite easily portable, as all other >> platforms have hypercalls available as well. >> > We are trying to avoid PV as much as possible (well this is also PV, > but not guest visible). We haven't replaced PIO with hypercall for the > same reason. My hope is that future HW will provide us with instruction > decode for basic mov instruction at which point this optimisation can be > dropped. The same applies to an MMIO hypercall. Once the PV interface becomes obsolete, we can drop the capability we expose to the guest. > And hypercall has its own set of problems with Windows guests. > When KVM runs in Hyper-V emulation mode it expects to get Hyper-V > hypercalls. Mixing KVM hypercalls and Hyper-V requires some tricks. It Can't we simply register a hypercall ID range with Microsoft? > may also affect WHQLing Windows drivers since driver will talk to HW > bypassing Windows interfaces. Then the WHQL'ed driver doesn't support the PV MMIO hcall? Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 02:09:53PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 13:04, Michael S. Tsirkin wrote: > > > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > >> > >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > >> > >>> With KVM, MMIO is much slower than PIO, due to the need to > >>> do page walk and emulation. But with EPT, it does not have to be: we > >>> know the address from the VMCS so if the address is unique, we can look > >>> up the eventfd directly, bypassing emulation. > >>> > >>> Add an interface for userspace to specify this per-address, we can > >>> use this e.g. for virtio. > >>> > >>> The implementation adds a separate bus internally. This serves two > >>> purposes: > >>> - minimize overhead for old userspace that does not use PV MMIO > >>> - minimize disruption in other code (since we don't know the length, > >>> devices on the MMIO bus only get a valid address in write, this > >>> way we don't need to touch all devices to teach them handle > >>> an dinvalid length) > >>> > >>> At the moment, this optimization is only supported for EPT on x86 and > >>> silently ignored for NPT and MMU, so everything works correctly but > >>> slowly. > >>> > >>> TODO: NPT, MMU and non x86 architectures. > >>> > >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > >>> pre-review and suggestions. > >>> > >>> Signed-off-by: Michael S. Tsirkin > >> > >> This still uses page fault intercepts which are orders of magnitudes > >> slower than hypercalls. > > > > Not really. Here's a test: > > compare vmcall to portio: > > > > vmcall 1519 > > ... > > outl_to_kernel 1745 > > > > compare portio to mmio: > > > > mmio-wildcard-eventfd:pci-mem 3529 > > mmio-pv-eventfd:pci-mem 1878 > > portio-wildcard-eventfd:pci-io 1846 > > > > So not orders of magnitude. > > https://dl.dropbox.com/u/8976842/KVM%20Forum%202012/MMIO%20Tuning.pdf > > Check out page 41. Higher is better (number is number of loop cycles in a > second). My test system was an AMD Istanbul based box. > Have you bypassed instruction emulation in your testing? > > > >> Why don't you just create a PV MMIO hypercall > >> that the guest can use to invoke MMIO accesses towards the host based > >> on physical addresses with explicit length encodings? > >> That way you simplify and speed up all code paths, exceeding the speed > >> of PIO exits even. It should also be quite easily portable, as all > >> other platforms have hypercalls available as well. > >> > >> > >> Alex > > > > I sent such a patch, but maintainers seem reluctant to add hypercalls. > > Gleb, could you comment please? > > > > A fast way to do MMIO is probably useful in any case ... > > Yes, but at least according to my numbers optimizing anything that is not > hcalls is a waste of time. > > > Alex -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 13:04, Michael S. Tsirkin wrote: > On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: >> >> On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: >> >>> With KVM, MMIO is much slower than PIO, due to the need to >>> do page walk and emulation. But with EPT, it does not have to be: we >>> know the address from the VMCS so if the address is unique, we can look >>> up the eventfd directly, bypassing emulation. >>> >>> Add an interface for userspace to specify this per-address, we can >>> use this e.g. for virtio. >>> >>> The implementation adds a separate bus internally. This serves two >>> purposes: >>> - minimize overhead for old userspace that does not use PV MMIO >>> - minimize disruption in other code (since we don't know the length, >>> devices on the MMIO bus only get a valid address in write, this >>> way we don't need to touch all devices to teach them handle >>> an dinvalid length) >>> >>> At the moment, this optimization is only supported for EPT on x86 and >>> silently ignored for NPT and MMU, so everything works correctly but >>> slowly. >>> >>> TODO: NPT, MMU and non x86 architectures. >>> >>> The idea was suggested by Peter Anvin. Lots of thanks to Gleb for >>> pre-review and suggestions. >>> >>> Signed-off-by: Michael S. Tsirkin >> >> This still uses page fault intercepts which are orders of magnitudes >> slower than hypercalls. > > Not really. Here's a test: > compare vmcall to portio: > > vmcall 1519 > ... > outl_to_kernel 1745 > > compare portio to mmio: > > mmio-wildcard-eventfd:pci-mem 3529 > mmio-pv-eventfd:pci-mem 1878 > portio-wildcard-eventfd:pci-io 1846 > > So not orders of magnitude. https://dl.dropbox.com/u/8976842/KVM%20Forum%202012/MMIO%20Tuning.pdf Check out page 41. Higher is better (number is number of loop cycles in a second). My test system was an AMD Istanbul based box. > >> Why don't you just create a PV MMIO hypercall >> that the guest can use to invoke MMIO accesses towards the host based >> on physical addresses with explicit length encodings? >> That way you simplify and speed up all code paths, exceeding the speed >> of PIO exits even. It should also be quite easily portable, as all >> other platforms have hypercalls available as well. >> >> >> Alex > > I sent such a patch, but maintainers seem reluctant to add hypercalls. > Gleb, could you comment please? > > A fast way to do MMIO is probably useful in any case ... Yes, but at least according to my numbers optimizing anything that is not hcalls is a waste of time. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes slower > than hypercalls. Why don't you just create a PV MMIO hypercall that the guest > can use to invoke MMIO accesses towards the host based on physical addresses > with explicit length encodings? > It is slower, but not an order of magnitude slower. It become faster with newer HW. > That way you simplify and speed up all code paths, exceeding the speed of PIO > exits even. It should also be quite easily portable, as all other platforms > have hypercalls available as well. > We are trying to avoid PV as much as possible (well this is also PV, but not guest visible). We haven't replaced PIO with hypercall for the same reason. My hope is that future HW will provide us with instruction decode for basic mov instruction at which point this optimisation can be dropped. And hypercall has its own set of problems with Windows guests. When KVM runs in Hyper-V emulation mode it expects to get Hyper-V hypercalls. Mixing KVM hypercalls and Hyper-V requires some tricks. It may also affect WHQLing Windows drivers since driver will talk to HW bypassing Windows interfaces. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On Thu, Apr 04, 2013 at 01:57:34PM +0200, Alexander Graf wrote: > > On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > > > With KVM, MMIO is much slower than PIO, due to the need to > > do page walk and emulation. But with EPT, it does not have to be: we > > know the address from the VMCS so if the address is unique, we can look > > up the eventfd directly, bypassing emulation. > > > > Add an interface for userspace to specify this per-address, we can > > use this e.g. for virtio. > > > > The implementation adds a separate bus internally. This serves two > > purposes: > > - minimize overhead for old userspace that does not use PV MMIO > > - minimize disruption in other code (since we don't know the length, > > devices on the MMIO bus only get a valid address in write, this > > way we don't need to touch all devices to teach them handle > > an dinvalid length) > > > > At the moment, this optimization is only supported for EPT on x86 and > > silently ignored for NPT and MMU, so everything works correctly but > > slowly. > > > > TODO: NPT, MMU and non x86 architectures. > > > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > > pre-review and suggestions. > > > > Signed-off-by: Michael S. Tsirkin > > This still uses page fault intercepts which are orders of magnitudes > slower than hypercalls. Not really. Here's a test: compare vmcall to portio: vmcall 1519 ... outl_to_kernel 1745 compare portio to mmio: mmio-wildcard-eventfd:pci-mem 3529 mmio-pv-eventfd:pci-mem 1878 portio-wildcard-eventfd:pci-io 1846 So not orders of magnitude. > Why don't you just create a PV MMIO hypercall > that the guest can use to invoke MMIO accesses towards the host based > on physical addresses with explicit length encodings? > That way you simplify and speed up all code paths, exceeding the speed > of PIO exits even. It should also be quite easily portable, as all > other platforms have hypercalls available as well. > > > Alex I sent such a patch, but maintainers seem reluctant to add hypercalls. Gleb, could you comment please? A fast way to do MMIO is probably useful in any case ... > Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 4/7] KVM: Add reset/restore rtc_status support
On Mon, Apr 01, 2013 at 08:40:13AM +0800, Yang Zhang wrote: > From: Yang Zhang > > Signed-off-by: Yang Zhang > --- > arch/x86/kvm/lapic.c |9 + > arch/x86/kvm/lapic.h |2 ++ > virt/kvm/ioapic.c| 43 +++ > virt/kvm/ioapic.h|1 + > 4 files changed, 55 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 96ab160..9c041fa 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -94,6 +94,14 @@ static inline int apic_test_vector(int vec, void *bitmap) > return test_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); > } > > +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector) > +{ > + struct kvm_lapic *apic = vcpu->arch.apic; > + > + return apic_test_vector(vector, apic->regs + APIC_ISR) || > + apic_test_vector(vector, apic->regs + APIC_IRR); > +} > + > static inline void apic_set_vector(int vec, void *bitmap) > { > set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec)); > @@ -1665,6 +1673,7 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, > apic->highest_isr_cache = -1; > kvm_x86_ops->hwapic_isr_update(vcpu->kvm, apic_find_highest_isr(apic)); > kvm_make_request(KVM_REQ_EVENT, vcpu); > + kvm_rtc_irq_restore(vcpu); > } > > void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu) > diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h > index 967519c..004d2ad 100644 > --- a/arch/x86/kvm/lapic.h > +++ b/arch/x86/kvm/lapic.h > @@ -170,4 +170,6 @@ static inline bool kvm_apic_has_events(struct kvm_vcpu > *vcpu) > return vcpu->arch.apic->pending_events; > } > > +bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector); > + > #endif > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > index 8664812..0b12b17 100644 > --- a/virt/kvm/ioapic.c > +++ b/virt/kvm/ioapic.c > @@ -90,6 +90,47 @@ static unsigned long ioapic_read_indirect(struct > kvm_ioapic *ioapic, > return result; > } > > +static void rtc_irq_reset(struct kvm_ioapic *ioapic) > +{ > + ioapic->rtc_status.pending_eoi = 0; > + bitmap_zero(ioapic->rtc_status.dest_map, KVM_MAX_VCPUS); > +} > + > +static void rtc_irq_restore(struct kvm_ioapic *ioapic) > +{ > + struct kvm_vcpu *vcpu; > + int vector, i, pending_eoi = 0; > + > + if (RTC_GSI >= IOAPIC_NUM_PINS) > + return; > + > + vector = ioapic->redirtbl[RTC_GSI].fields.vector; > + kvm_for_each_vcpu(i, vcpu, ioapic->kvm) { > + if (kvm_apic_pending_eoi(vcpu, vector)) { > + pending_eoi++; > + __set_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map); You should cleat dest_map at the beginning to get rid of stale bits. > + } > + } > + ioapic->rtc_status.pending_eoi = pending_eoi; > +} > + > +void kvm_rtc_irq_restore(struct kvm_vcpu *vcpu) > +{ > + struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; > + int vector; > + > + if (!ioapic) > + return; > + Can this be called if ioapic == NULL? Should check for if (RTC_GSI >= IOAPIC_NUM_PINS) here too. > + spin_lock(&ioapic->lock); > + vector = ioapic->redirtbl[RTC_GSI].fields.vector; > + if (kvm_apic_pending_eoi(vcpu, vector)) { > + __set_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map); > + ioapic->rtc_status.pending_eoi++; The bit may have been set already. The logic should be: new_val = kvm_apic_pending_eoi(vcpu, vector) old_val = set_bit(vcpu_id, dest_map) if (new_val == old_val) return; if (new_val) { __set_bit(vcpu_id, dest_map); pending_eoi++; } else { __clear_bit(vcpu_id, dest_map); pending_eoi--; } The naming of above two functions are not good either. Call them something like kvm_rtc_eoi_tracking_restore_all() and kvm_rtc_eoi_tracking_restore_one(). And _all should call _one() for each vcpu. Make __rtc_irq_eoi_tracking_restore_one() that does not take ioapic lock and call it from kvm_rtc_eoi_tracking_restore_one() surrounded by locks. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] kvm: add PV MMIO EVENTFD
On 04.04.2013, at 12:50, Michael S. Tsirkin wrote: > With KVM, MMIO is much slower than PIO, due to the need to > do page walk and emulation. But with EPT, it does not have to be: we > know the address from the VMCS so if the address is unique, we can look > up the eventfd directly, bypassing emulation. > > Add an interface for userspace to specify this per-address, we can > use this e.g. for virtio. > > The implementation adds a separate bus internally. This serves two > purposes: > - minimize overhead for old userspace that does not use PV MMIO > - minimize disruption in other code (since we don't know the length, > devices on the MMIO bus only get a valid address in write, this > way we don't need to touch all devices to teach them handle > an dinvalid length) > > At the moment, this optimization is only supported for EPT on x86 and > silently ignored for NPT and MMU, so everything works correctly but > slowly. > > TODO: NPT, MMU and non x86 architectures. > > The idea was suggested by Peter Anvin. Lots of thanks to Gleb for > pre-review and suggestions. > > Signed-off-by: Michael S. Tsirkin This still uses page fault intercepts which are orders of magnitudes slower than hypercalls. Why don't you just create a PV MMIO hypercall that the guest can use to invoke MMIO accesses towards the host based on physical addresses with explicit length encodings? That way you simplify and speed up all code paths, exceeding the speed of PIO exits even. It should also be quite easily portable, as all other platforms have hypercalls available as well. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC] kvm: add PV MMIO EVENTFD
With KVM, MMIO is much slower than PIO, due to the need to do page walk and emulation. But with EPT, it does not have to be: we know the address from the VMCS so if the address is unique, we can look up the eventfd directly, bypassing emulation. Add an interface for userspace to specify this per-address, we can use this e.g. for virtio. The implementation adds a separate bus internally. This serves two purposes: - minimize overhead for old userspace that does not use PV MMIO - minimize disruption in other code (since we don't know the length, devices on the MMIO bus only get a valid address in write, this way we don't need to touch all devices to teach them handle an dinvalid length) At the moment, this optimization is only supported for EPT on x86 and silently ignored for NPT and MMU, so everything works correctly but slowly. TODO: NPT, MMU and non x86 architectures. The idea was suggested by Peter Anvin. Lots of thanks to Gleb for pre-review and suggestions. Signed-off-by: Michael S. Tsirkin --- arch/x86/kvm/vmx.c | 4 arch/x86/kvm/x86.c | 1 + include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 9 + virt/kvm/eventfd.c | 47 ++- virt/kvm/kvm_main.c | 1 + 6 files changed, 58 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6667042..cdaac9b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5127,6 +5127,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu) gpa_t gpa; gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS); + if (!kvm_io_bus_write(vcpu->kvm, KVM_PV_MMIO_BUS, gpa, 0, NULL)) { + skip_emulated_instruction(vcpu); + return 1; + } ret = handle_mmio_page_fault_common(vcpu, gpa, true); if (likely(ret == 1)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f19ac0a..b9223d9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2483,6 +2483,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_IRQFD: case KVM_CAP_IOEVENTFD: + case KVM_CAP_IOEVENTFD_PV_MMIO: case KVM_CAP_PIT2: case KVM_CAP_PIT_STATE2: case KVM_CAP_SET_IDENTITY_MAP_ADDR: diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index cad77fe..35b74cd 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -149,6 +149,7 @@ struct kvm_io_bus { enum kvm_bus { KVM_MMIO_BUS, KVM_PIO_BUS, + KVM_PV_MMIO_BUS, KVM_NR_BUSES }; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 3c56ba3..61783ee 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -449,11 +449,19 @@ enum { kvm_ioeventfd_flag_nr_datamatch, kvm_ioeventfd_flag_nr_pio, kvm_ioeventfd_flag_nr_deassign, + kvm_ioeventfd_flag_nr_pv_mmio, kvm_ioeventfd_flag_nr_max, }; #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) +/* + * PV_MMIO - Guest can promise us that all accesses touching this address + * are writes of specified length, starting at the specified address. + * If not - it's a Guest bug. + * Can not be used together with either PIO or DATAMATCH. + */ +#define KVM_IOEVENTFD_FLAG_PV_MMIO (1 << kvm_ioeventfd_flag_nr_pv_mmio) #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) #define KVM_IOEVENTFD_VALID_FLAG_MASK ((1 << kvm_ioeventfd_flag_nr_max) - 1) @@ -665,6 +673,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_EPR 86 #define KVM_CAP_ARM_PSCI 87 #define KVM_CAP_ARM_SET_DEVICE_ADDR 88 +#define KVM_CAP_IOEVENTFD_PV_MMIO 89 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 93e5b05..1b7619e 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -579,6 +579,7 @@ struct _ioeventfd { struct kvm_io_device dev; u8 bus_idx; bool wildcard; + bool pvmmio; }; static inline struct _ioeventfd * @@ -600,7 +601,15 @@ ioeventfd_in_range(struct _ioeventfd *p, gpa_t addr, int len, const void *val) { u64 _val; - if (!(addr == p->addr && len == p->length)) + if (addr != p->addr) + /* address must be precise for a hit */ + return false; + + if (p->pvmmio) + /* pvmmio only looks at the address, so always a hit */ + return true; + + if (len != p->length) /* address-range must be precise for a hit */ return false; @@ -671,9 +680,11 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p) list_for_each_entry(_p, &kvm->ioeventfds, list) if (_p->bus_idx == p->bus_idx && - _p->addr == p->addr && _p->length == p->
[PATCH v2 6/6] pci-testdev: add pv mmio test
Add ability to test the speed of PV MMIO. Signed-off-by: Michael S. Tsirkin --- hw/pci-testdev.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/pci-testdev.c b/hw/pci-testdev.c index f0ebf99..9a50631 100644 --- a/hw/pci-testdev.c +++ b/hw/pci-testdev.c @@ -19,6 +19,7 @@ typedef struct IOTest { EventNotifier notifier; bool hasnotifier; unsigned size; +bool pv; bool match_data; PCITestDevHdr *hdr; unsigned bufsize; @@ -33,7 +34,8 @@ typedef struct IOTest { static const char *iotest_test[] = { "no-eventfd", "wildcard-eventfd", -"datamatch-eventfd" +"datamatch-eventfd", +"pv-eventfd" }; static const char *iotest_type[] = { @@ -80,7 +82,7 @@ static int pci_testdev_start(IOTest *test) memory_region_add_eventfd(test->mr, le32_to_cpu(test->hdr->offset), test->size, - false, + test->pv, test->match_data, test->hdr->data, &test->notifier); @@ -95,7 +97,7 @@ static void pci_testdev_stop(IOTest *test) memory_region_del_eventfd(test->mr, le32_to_cpu(test->hdr->offset), test->size, - false, + test->pv, test->match_data, test->hdr->data, &test->notifier); @@ -237,7 +239,8 @@ static int pci_testdev_init(PCIDevice *pci_dev) g_free(name); test->hdr->offset = cpu_to_le32(IOTEST_SIZE(i) + i * IOTEST_ACCESS_WIDTH); test->size = IOTEST_ACCESS_WIDTH; -test->match_data = strcmp(IOTEST_TEST(i), "wildcard-eventfd"); +test->pv = !strcmp(IOTEST_TEST(i), "pv-eventfd"); +test->match_data = !strcmp(IOTEST_TEST(i), "datamatch-eventfd"); test->hdr->test = i; test->hdr->data = test->match_data ? IOTEST_DATAMATCH : IOTEST_NOMATCH; test->hdr->width = IOTEST_ACCESS_WIDTH; -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 5/6] kvm: add PV MMIO
Add an option for users to specify PV eventfd listeners, and utilize KVM's PV MMIO underneath. Upodate all callers. Signed-off-by: Michael S. Tsirkin --- hw/dataplane/hostmem.c| 1 + hw/ivshmem.c | 2 ++ hw/pci-testdev.c | 2 ++ hw/vhost.c| 4 ++-- hw/virtio-pci.c | 4 ++-- include/exec/memory.h | 10 ++ kvm-all.c | 15 --- linux-headers/linux/kvm.h | 8 memory.c | 9 +++-- 9 files changed, 46 insertions(+), 9 deletions(-) diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c index 380537e..41d43e7 100644 --- a/hw/dataplane/hostmem.c +++ b/hw/dataplane/hostmem.c @@ -126,6 +126,7 @@ static void hostmem_listener_section_dummy(MemoryListener *listener, static void hostmem_listener_eventfd_dummy(MemoryListener *listener, MemoryRegionSection *section, + bool pv, bool match_data, uint64_t data, EventNotifier *e) { diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 68a2cf2..28e5978 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -352,6 +352,7 @@ static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i) memory_region_add_eventfd(&s->ivshmem_mmio, DOORBELL, 4, + false, true, (posn << 16) | i, &s->peers[posn].eventfds[i]); @@ -362,6 +363,7 @@ static void ivshmem_del_eventfd(IVShmemState *s, int posn, int i) memory_region_del_eventfd(&s->ivshmem_mmio, DOORBELL, 4, + false, true, (posn << 16) | i, &s->peers[posn].eventfds[i]); diff --git a/hw/pci-testdev.c b/hw/pci-testdev.c index 9486624..f0ebf99 100644 --- a/hw/pci-testdev.c +++ b/hw/pci-testdev.c @@ -80,6 +80,7 @@ static int pci_testdev_start(IOTest *test) memory_region_add_eventfd(test->mr, le32_to_cpu(test->hdr->offset), test->size, + false, test->match_data, test->hdr->data, &test->notifier); @@ -94,6 +95,7 @@ static void pci_testdev_stop(IOTest *test) memory_region_del_eventfd(test->mr, le32_to_cpu(test->hdr->offset), test->size, + false, test->match_data, test->hdr->data, &test->notifier); diff --git a/hw/vhost.c b/hw/vhost.c index 4d6aee3..9ad7101 100644 --- a/hw/vhost.c +++ b/hw/vhost.c @@ -750,13 +750,13 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev, } static void vhost_eventfd_add(MemoryListener *listener, - MemoryRegionSection *section, + MemoryRegionSection *section, bool pv, bool match_data, uint64_t data, EventNotifier *e) { } static void vhost_eventfd_del(MemoryListener *listener, - MemoryRegionSection *section, + MemoryRegionSection *section, bool pv, bool match_data, uint64_t data, EventNotifier *e) { } diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 19965e5..f9ff994 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -189,10 +189,10 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, } virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler); memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, - true, n, notifier); + false, true, n, notifier); } else { memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, - true, n, notifier); + false, true, n, notifier); virtio_queue_set_host_notifier_fd_handler(vq, false, false); event_notifier_cleanup(notifier); } diff --git a/include/exec/memory.h b/include/exec/memory.h index 2322732..3612004 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -215,8 +215,10 @@ struct MemoryListener { void (*log_global_start)(MemoryListener *listener); void (*log_global_stop)(MemoryListener *listener); void (*eventfd_add)(MemoryListener *listener, MemoryRegionSection *section, +bool pv, bool match_data, uint64_t data, EventNotifier *e)
[PATCH v2 3/6] kvm: support non datamatch ioeventfd
Adding restrictions just adds code. Signed-off-by: Michael S. Tsirkin --- kvm-all.c | 34 ++ 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 589e37c..ce823f9 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -501,21 +501,24 @@ int kvm_check_extension(KVMState *s, unsigned int extension) } static int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, - bool assign, uint32_t size) + bool assign, uint32_t size, bool datamatch) { int ret; struct kvm_ioeventfd iofd; -iofd.datamatch = val; +iofd.datamatch = datamatch ? val : 0; iofd.addr = addr; iofd.len = size; -iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH; +iofd.flags = 0; iofd.fd = fd; if (!kvm_enabled()) { return -ENOSYS; } +if (datamatch) { +iofd.flags |= KVM_IOEVENTFD_FLAG_DATAMATCH; +} if (!assign) { iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; } @@ -530,19 +533,22 @@ static int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, } static int kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint16_t val, - bool assign, uint32_t size) + bool assign, uint32_t size, bool datamatch) { struct kvm_ioeventfd kick = { -.datamatch = val, +.datamatch = datamatch ? val : 0, .addr = addr, +.flags = KVM_IOEVENTFD_FLAG_PIO, .len = size, -.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO, .fd = fd, }; int r; if (!kvm_enabled()) { return -ENOSYS; } +if (datamatch) { +kick.flags |= KVM_IOEVENTFD_FLAG_DATAMATCH; +} if (!assign) { kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; } @@ -571,7 +577,7 @@ static int kvm_check_many_ioeventfds(void) if (ioeventfds[i] < 0) { break; } -ret = kvm_set_ioeventfd_pio(ioeventfds[i], 0, i, true, 2); +ret = kvm_set_ioeventfd_pio(ioeventfds[i], 0, i, true, 2, true); if (ret < 0) { close(ioeventfds[i]); break; @@ -582,7 +588,7 @@ static int kvm_check_many_ioeventfds(void) ret = i == ARRAY_SIZE(ioeventfds); while (i-- > 0) { -kvm_set_ioeventfd_pio(ioeventfds[i], 0, i, false, 2); +kvm_set_ioeventfd_pio(ioeventfds[i], 0, i, false, 2, true); close(ioeventfds[i]); } return ret; @@ -802,10 +808,8 @@ static void kvm_mem_ioeventfd_add(MemoryListener *listener, int fd = event_notifier_get_fd(e); int r; -assert(match_data && section->size <= 8); - r = kvm_set_ioeventfd_mmio(fd, section->offset_within_address_space, - data, true, section->size); + data, true, section->size, match_data); if (r < 0) { abort(); } @@ -820,7 +824,7 @@ static void kvm_mem_ioeventfd_del(MemoryListener *listener, int r; r = kvm_set_ioeventfd_mmio(fd, section->offset_within_address_space, - data, false, section->size); + data, false, section->size, match_data); if (r < 0) { abort(); } @@ -834,10 +838,8 @@ static void kvm_io_ioeventfd_add(MemoryListener *listener, int fd = event_notifier_get_fd(e); int r; -assert(match_data && section->size <= 8); - r = kvm_set_ioeventfd_pio(fd, section->offset_within_address_space, - data, true, section->size); + data, true, section->size, match_data); if (r < 0) { abort(); } @@ -853,7 +855,7 @@ static void kvm_io_ioeventfd_del(MemoryListener *listener, int r; r = kvm_set_ioeventfd_pio(fd, section->offset_within_address_space, - data, false, section->size); + data, false, section->size, match_data); if (r < 0) { abort(); } -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/6] kvm: support any size for pio eventfd
Signed-off-by: Michael S. Tsirkin --- kvm-all.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index ca9775d..589e37c 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -500,8 +500,8 @@ int kvm_check_extension(KVMState *s, unsigned int extension) return ret; } -static int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, bool assign, - uint32_t size) +static int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, + bool assign, uint32_t size) { int ret; struct kvm_ioeventfd iofd; @@ -529,13 +529,13 @@ static int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, bool assi return 0; } -static int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, - bool assign) +static int kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint16_t val, + bool assign, uint32_t size) { struct kvm_ioeventfd kick = { .datamatch = val, .addr = addr, -.len = 2, +.len = size, .flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO, .fd = fd, }; @@ -571,7 +571,7 @@ static int kvm_check_many_ioeventfds(void) if (ioeventfds[i] < 0) { break; } -ret = kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, true); +ret = kvm_set_ioeventfd_pio(ioeventfds[i], 0, i, true, 2); if (ret < 0) { close(ioeventfds[i]); break; @@ -582,7 +582,7 @@ static int kvm_check_many_ioeventfds(void) ret = i == ARRAY_SIZE(ioeventfds); while (i-- > 0) { -kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, false); +kvm_set_ioeventfd_pio(ioeventfds[i], 0, i, false, 2); close(ioeventfds[i]); } return ret; @@ -834,10 +834,10 @@ static void kvm_io_ioeventfd_add(MemoryListener *listener, int fd = event_notifier_get_fd(e); int r; -assert(match_data && section->size == 2); +assert(match_data && section->size <= 8); -r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space, - data, true); +r = kvm_set_ioeventfd_pio(fd, section->offset_within_address_space, + data, true, section->size); if (r < 0) { abort(); } @@ -852,8 +852,8 @@ static void kvm_io_ioeventfd_del(MemoryListener *listener, int fd = event_notifier_get_fd(e); int r; -r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space, - data, false); +r = kvm_set_ioeventfd_pio(fd, section->offset_within_address_space, + data, false, section->size); if (r < 0) { abort(); } -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/6] pci: add pci test device
This device is used for kvm unit tests, currently it supports testing performance of ioeventfd. Using updated kvm unittest, here's an example output: mmio-no-eventfd:pci-mem 8796 mmio-wildcard-eventfd:pci-mem 3609 mmio-datamatch-eventfd:pci-mem 3685 portio-no-eventfd:pci-io 5287 portio-wildcard-eventfd:pci-io 1762 portio-datamatch-eventfd:pci-io 1777 Signed-off-by: Michael S. Tsirkin --- docs/specs/pci-testdev.txt | 26 hw/Makefile.objs | 1 + hw/pci-testdev.c | 306 + hw/pci/pci.h | 1 + 4 files changed, 334 insertions(+) create mode 100644 docs/specs/pci-testdev.txt create mode 100644 hw/pci-testdev.c diff --git a/docs/specs/pci-testdev.txt b/docs/specs/pci-testdev.txt new file mode 100644 index 000..128ae22 --- /dev/null +++ b/docs/specs/pci-testdev.txt @@ -0,0 +1,26 @@ +pci-test is a device used for testing low level IO + +device implements up to two BARs: BAR0 and BAR1. +Each BAR can be memory or IO. Guests must detect +BAR type and act accordingly. + +Each BAR size is up to 4K bytes. +Each BAR starts with the following header: + +typedef struct PCITestDevHdr { +uint8_t test; <- write-only, starts a given test number +uint8_t width_type; <- read-only, type and width of access for a given test. + 1,2,4 for byte,word or long write. + any other value if test not supported on this BAR +uint8_t pad0[2]; +uint32_t offset; <- read-only, offset in this BAR for a given test +uint32_t data;<- read-only, data to use for a given test +uint32_t count; <- for debugging. number of writes detected. +uint8_t name[]; <- for debugging. 0-terminated ASCII string. +} PCITestDevHdr; + +All registers are little endian. + +device is expected to always implement tests 0 to N on each BAR, and to add new +tests with higher numbers. In this way a guest can scan test numbers until it +detects an access type that it does not support on this BAR, then stop. diff --git a/hw/Makefile.objs b/hw/Makefile.objs index eb7eb31..b4e3b1d 100644 --- a/hw/Makefile.objs +++ b/hw/Makefile.objs @@ -12,6 +12,7 @@ common-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o common-obj-$(CONFIG_VIRTIO) += virtio-bus.o common-obj-y += fw_cfg.o common-obj-$(CONFIG_PCI) += pci_bridge_dev.o +common-obj-$(CONFIG_PCI) += pci-testdev.o common-obj-$(CONFIG_PCI) += ioh3420.o xio3130_upstream.o xio3130_downstream.o common-obj-$(CONFIG_PCI) += i82801b11.o common-obj-y += watchdog.o diff --git a/hw/pci-testdev.c b/hw/pci-testdev.c new file mode 100644 index 000..9486624 --- /dev/null +++ b/hw/pci-testdev.c @@ -0,0 +1,306 @@ +#include "hw/hw.h" +#include "hw/pci/pci.h" +#include "qemu/event_notifier.h" +#include "qemu/osdep.h" + +typedef struct PCITestDevHdr { +uint8_t test; +uint8_t width; +uint8_t pad0[2]; +uint32_t offset; +uint8_t data; +uint8_t pad1[3]; +uint32_t count; +uint8_t name[]; +} PCITestDevHdr; + +typedef struct IOTest { +MemoryRegion *mr; +EventNotifier notifier; +bool hasnotifier; +unsigned size; +bool match_data; +PCITestDevHdr *hdr; +unsigned bufsize; +} IOTest; + +#define IOTEST_DATAMATCH 0xFA +#define IOTEST_NOMATCH 0xCE + +#define IOTEST_IOSIZE 128 +#define IOTEST_MEMSIZE 2048 + +static const char *iotest_test[] = { +"no-eventfd", +"wildcard-eventfd", +"datamatch-eventfd" +}; + +static const char *iotest_type[] = { +"mmio", +"portio" +}; + +#define IOTEST_TEST(i) (iotest_test[((i) % ARRAY_SIZE(iotest_test))]) +#define IOTEST_TYPE(i) (iotest_type[((i) / ARRAY_SIZE(iotest_test))]) +#define IOTEST_MAX_TEST (ARRAY_SIZE(iotest_test)) +#define IOTEST_MAX_TYPE (ARRAY_SIZE(iotest_type)) +#define IOTEST_MAX (IOTEST_MAX_TEST * IOTEST_MAX_TYPE) + +enum { +IOTEST_ACCESS_NAME, +IOTEST_ACCESS_DATA, +IOTEST_ACCESS_MAX, +}; + +#define IOTEST_ACCESS_TYPE uint8_t +#define IOTEST_ACCESS_WIDTH (sizeof(uint8_t)) + +typedef struct PCITestDevState { +PCIDevice dev; +MemoryRegion mmio; +MemoryRegion portio; +IOTest *tests; +int current; +} PCITestDevState; + +#define IOTEST_IS_MEM(i) (strcmp(IOTEST_TYPE(i), "portio")) +#define IOTEST_REGION(d, i) (IOTEST_IS_MEM(i) ? &(d)->mmio : &(d)->portio) +#define IOTEST_SIZE(i) (IOTEST_IS_MEM(i) ? IOTEST_MEMSIZE : IOTEST_IOSIZE) +#define IOTEST_PCI_BAR(i) (IOTEST_IS_MEM(i) ? PCI_BASE_ADDRESS_SPACE_MEMORY : \ + PCI_BASE_ADDRESS_SPACE_IO) + +static int pci_testdev_start(IOTest *test) +{ +test->hdr->count = 0; +if (!test->hasnotifier) { +return 0; +} +event_notifier_test_and_clear(&test->notifier); +memory_region_add_eventfd(test->mr, + le32_to_cpu(test->hdr->offset), + test->size, + test->match_data, + test->hdr->data, +
[PATCH v2 1/6] kvm: remove unused APIs
There are only used internally now, move them out of header and out of stub. Signed-off-by: Michael S. Tsirkin --- include/sysemu/kvm.h | 4 -- kvm-all.c| 107 ++- kvm-stub.c | 10 - 3 files changed, 54 insertions(+), 67 deletions(-) diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index f2d97b5..4a65d9f 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -283,10 +283,6 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr, #endif #endif -int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t val, bool assign, - uint32_t size); - -int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign); int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg); int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg); diff --git a/kvm-all.c b/kvm-all.c index 9b433d3..ca9775d 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -500,6 +500,60 @@ int kvm_check_extension(KVMState *s, unsigned int extension) return ret; } +static int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, bool assign, + uint32_t size) +{ +int ret; +struct kvm_ioeventfd iofd; + +iofd.datamatch = val; +iofd.addr = addr; +iofd.len = size; +iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH; +iofd.fd = fd; + +if (!kvm_enabled()) { +return -ENOSYS; +} + +if (!assign) { +iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; +} + +ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd); + +if (ret < 0) { +return -errno; +} + +return 0; +} + +static int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, + bool assign) +{ +struct kvm_ioeventfd kick = { +.datamatch = val, +.addr = addr, +.len = 2, +.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO, +.fd = fd, +}; +int r; +if (!kvm_enabled()) { +return -ENOSYS; +} +if (!assign) { +kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; +} +r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick); +if (r < 0) { +return r; +} +return 0; +} + + static int kvm_check_many_ioeventfds(void) { /* Userspace can use ioeventfd for io notification. This requires a host @@ -1971,59 +2025,6 @@ int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset) return r; } - -int kvm_set_ioeventfd_mmio(int fd, uint32_t addr, uint32_t val, bool assign, - uint32_t size) -{ -int ret; -struct kvm_ioeventfd iofd; - -iofd.datamatch = val; -iofd.addr = addr; -iofd.len = size; -iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH; -iofd.fd = fd; - -if (!kvm_enabled()) { -return -ENOSYS; -} - -if (!assign) { -iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; -} - -ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd); - -if (ret < 0) { -return -errno; -} - -return 0; -} - -int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) -{ -struct kvm_ioeventfd kick = { -.datamatch = val, -.addr = addr, -.len = 2, -.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO, -.fd = fd, -}; -int r; -if (!kvm_enabled()) { -return -ENOSYS; -} -if (!assign) { -kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; -} -r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick); -if (r < 0) { -return r; -} -return 0; -} - int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr) { return kvm_arch_on_sigbus_vcpu(cpu, code, addr); diff --git a/kvm-stub.c b/kvm-stub.c index 760aadc..ef1f201 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -102,16 +102,6 @@ int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset) } #endif -int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) -{ -return -ENOSYS; -} - -int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t val, bool assign, uint32_t len) -{ -return -ENOSYS; -} - int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr) { return 1; -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/6] kvm: pci PORT IO MMIO and PV MMIO speed tests
These patches add a test device, useful to measure speed of MMIO versus PIO, in different configurations. As I didn't want to reserve a hardcoded range of memory, I added pci device for this instead. Used together with the kvm unittest patches I posted on kvm mailing list. To use, simply add the device on the pci bus. Example test output: vmcall 1519 outl_to_kernel 1745 mmio-no-eventfd:pci-mem 9075 mmio-wildcard-eventfd:pci-mem 3529 mmio-datamatch-eventfd:pci-mem 3509 mmio-pv-eventfd:pci-mem 1878 portio-no-eventfd:pci-io 5535 portio-wildcard-eventfd:pci-io 1846 portio-datamatch-eventfd:pci-io 1848 portio-pv-eventfd:pci-io 1842 First interesting conclusion is that the overhead of MMIO exit to QEMU as compared to PIO is double that of MMIO ioeventfd as compared to PIO eventfd. Is this a known fact? Second is that PV MMIO is almost as fast as portio, so it looks like a viable replacement when we have a lot of devices such that we run out of IO space. Third is that eventfd has small but measureable overhead, and that hypercalls are measureably faster than port io. The difference is probably in hardware. Whether this justifies considering hypercalls for e.g. virtio, is open for discussion. I also had to extend kvm in a minor way, making all ioeventfd options accessible through the API. This actually needs less code than checking that users DTRT. Patches 1-5 are useful in themselves. Patches 5 and 6 add support for PV MMIO eventfds and add the test for it, tested against a kernel patch I'll send shortly. They can be applied separately when we are sure the kernel interface is going to be there. Michael S. Tsirkin (6): kvm: remove unused APIs kvm: support any size for pio eventfd kvm: support non datamatch ioeventfd pci: add pci test device kvm: add PV MMIO pci-testdev: add pv mmio test docs/specs/pci-testdev.txt | 26 hw/Makefile.objs | 1 + hw/dataplane/hostmem.c | 1 + hw/ivshmem.c | 2 + hw/pci-testdev.c | 311 + hw/pci/pci.h | 1 + hw/vhost.c | 4 +- hw/virtio-pci.c| 4 +- include/exec/memory.h | 10 ++ include/sysemu/kvm.h | 4 - kvm-all.c | 142 +++-- kvm-stub.c | 10 -- linux-headers/linux/kvm.h | 8 ++ memory.c | 9 +- 14 files changed, 448 insertions(+), 85 deletions(-) create mode 100644 docs/specs/pci-testdev.txt create mode 100644 hw/pci-testdev.c -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: s390: virtio_ccw: reset errors for new I/O.
On 04.04.2013, at 12:20, Cornelia Huck wrote: > On Thu, 4 Apr 2013 11:28:35 +0200 > Alexander Graf wrote: > >> >> On 04.04.2013, at 10:25, Cornelia Huck wrote: >> >>> ccw_io_helper neglected to reset vcdev->err after a new channel >>> program had been successfully started, resulting in stale errors >>> delivered after one I/O failed. Reset the error after a new >>> channel program has been successfully started with no old I/O >>> pending. >>> >>> Signed-off-by: Cornelia Huck >> >> Shouldn't this be CC: stable? > > I don't think so, current code will not continue after errors anyway. Yeah, I got confused by the cover letter for a single patch. Usually, when you only send a single patch, you wouldn't also send a cover letter :). Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: fix MMIO/PIO collision misdetection
PIO and MMIO are separate address spaces, but ioeventfd registration code mistakenly detected two eventfds as duplicate if they use the same address, even if one is PIO and another one MMIO. Signed-off-by: Michael S. Tsirkin --- virt/kvm/eventfd.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index adb17f2..93e5b05 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -577,6 +577,7 @@ struct _ioeventfd { struct eventfd_ctx *eventfd; u64 datamatch; struct kvm_io_device dev; + u8 bus_idx; bool wildcard; }; @@ -669,7 +670,8 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p) struct _ioeventfd *_p; list_for_each_entry(_p, &kvm->ioeventfds, list) - if (_p->addr == p->addr && _p->length == p->length && + if (_p->bus_idx == p->bus_idx && + _p->addr == p->addr && _p->length == p->length && (_p->wildcard || p->wildcard || _p->datamatch == p->datamatch)) return true; @@ -717,6 +719,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) INIT_LIST_HEAD(&p->list); p->addr= args->addr; + p->bus_idx = bus_idx; p->length = args->len; p->eventfd = eventfd; @@ -775,7 +778,8 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) list_for_each_entry_safe(p, tmp, &kvm->ioeventfds, list) { bool wildcard = !(args->flags & KVM_IOEVENTFD_FLAG_DATAMATCH); - if (p->eventfd != eventfd || + if (p->bus_idx != bus_idx || + p->eventfd != eventfd || p->addr != args->addr || p->length != args->len || p->wildcard != wildcard) -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] ARM: KVM: switch to a dual-step HYP init code
On 04/04/13 00:15, Christoffer Dall wrote: > On Wed, Apr 03, 2013 at 11:38:30AM +0100, Marc Zyngier wrote: >> On 03/04/13 11:07, Will Deacon wrote: >>> On Tue, Apr 02, 2013 at 02:25:14PM +0100, Marc Zyngier wrote: Our HYP init code suffers from two major design issues: - it cannot support CPU hotplug, as we tear down the idmap very early - it cannot perform a TLB invalidation when switching from init to runtime mappings, as pages are manipulated from PL1 exclusively >>> >>> [...] >>> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S index 35a463f..b2c6967 100644 --- a/arch/arm/kvm/init.S +++ b/arch/arm/kvm/init.S @@ -21,6 +21,7 @@ #include #include #include +#include / * Hypervisor initialization @@ -47,6 +48,9 @@ __kvm_hyp_init: W(b). __do_hyp_init: + cmp r2, #0 @ We have a SP? + bne phase2 @ Yes, second stage init + @ Set the HTTBR to point to the hypervisor PGD pointer passed mcrrp15, 4, r0, r1, c2 @@ -96,14 +100,35 @@ __do_hyp_init: orr r0, r0, r1 isb mcr p15, 4, r0, c1, c0, 0 @ HSCR - isb - @ Set stack pointer and return to the kernel + eret + +phase2: + @ Set stack pointer mov sp, r2 @ Set HVBAR to point to the HYP vectors mcr p15, 4, r3, c12, c0, 0 @ HVBAR + @ Jump to the trampoline page + ldr r2, =#PAGE_MASK >>> >>> Shifting right by PAGE_SHIFT can avoid the load. >> >> Not really. We're masking out the top bits of "target" and adding them >> to the trampoline base address, so shifting doesn't help. >> >> But, as you suggested offline, BFI can come to the rescue and make that >> code totally fun and unreadable. How about (untested): >> >> ldr r2, =#TRAMPOLINE_VA >> adr r3, target >> bfi r2, r3, #0, #PAGE_SHIFT >> mov pc, r2 >> >> I really like it! :) >> > > What kind of drugs are you on? > > Ok, I actually like it too. Implemented, tested, works. >>> + adr r3, target + bic r3, r3, r2 + ldr r2, =#TRAMPOLINE_VA + add r3, r3, r2 + mov pc, r3 + + nop >>> >>> >> >> ... "You know I'll never sleep no more" ... >> > > Seriously, what kind of drugs are you guys on? Someone did comment last year about the quality of the water in Cambridge. He may have been right. But in this occurrence, it's only a mild case of Frank Zappatis (Zomby Woof variety). M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7] ARM: KVM: enforce page alignment for identity mapped code
On 04/04/13 00:15, Christoffer Dall wrote: > On Wed, Apr 03, 2013 at 11:00:39AM +0100, Marc Zyngier wrote: >> On 03/04/13 10:50, Will Deacon wrote: >>> On Tue, Apr 02, 2013 at 02:25:12PM +0100, Marc Zyngier wrote: We're about to move to a init procedure where we rely on the fact that the init code fits in a single page. Make sure we align the idmap text on a page boundary, and that the code is not bigger than a single page. Signed-off-by: Marc Zyngier --- arch/arm/kernel/vmlinux.lds.S | 2 +- arch/arm/kvm/init.S | 7 +++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index b571484..d9dd265 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -20,7 +20,7 @@ VMLINUX_SYMBOL(__idmap_text_start) = .; \ *(.idmap.text) \ VMLINUX_SYMBOL(__idmap_text_end) = .; \ - ALIGN_FUNCTION(); \ + . = ALIGN(PAGE_SIZE); \ VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \ *(.hyp.idmap.text) \ VMLINUX_SYMBOL(__hyp_idmap_text_end) = .; diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S index 9f37a79..35a463f 100644 --- a/arch/arm/kvm/init.S +++ b/arch/arm/kvm/init.S @@ -111,4 +111,11 @@ __do_hyp_init: .globl __kvm_hyp_init_end __kvm_hyp_init_end: + /* + * The above code *must* fit in a single page for the trampoline + * madness to work. Whoever decides to change it must make sure + * we map the right amount of memory for the trampoline to work. + * The line below ensures any breakage will get noticed. > > This comment is a little funny. What's this about changing it? And why > madness? OK, it's a little mad. > > I think the nice thing to explain here is why .org would complain - I > had to look in the AS reference to figure out that it will complain if > the location pointer is forwarded outside the section - if that is the > right reason? > + */ + .org__kvm_hyp_init + PAGE_SIZE .popsection >>> >>> What effect does this have on the size of the kernel image? I'd expect the >>> idmap code to be pretty small, so aligning to a page might be overkill a lot >>> of the time. >> >> So we're indeed wasting memory between the kernel and the HYP idmaps. >> The reason for this is to ensure that the init code is not spread across >> two pages, which would make the trampoline harder to allocate. >> >> If there's a way to ensure that the init.S code will fit a single page, >> then we can remove this ALIGN(PAGE_SIZE). > > Just allocate a temporary page and copy the bit of code into there? It > should be easy enough to make sure it's location independent. We can > even give that page back later if we want and re-do the trick when a CPU > gets hot-plugged if we want. No? I'd really like to avoid freeing stuff, because it gives us even more chances to screw our TLBs (we'd have to invalidate on unmap, and deal with potential races between CPUs coming up). We could allocate this page *if* the HYP init code happens to cross a page boundary, and only then. I'll have a look at implementing this as it should solve Will's issue (which is even worse on arm64 with 64k pages). M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v3 1/6] kvm: add device control API
On Tue, Apr 02, 2013 at 08:57:48PM -0500, Scott Wood wrote: > Currently, devices that are emulated inside KVM are configured in a > hardcoded manner based on an assumption that any given architecture > only has one way to do it. If there's any need to access device state, > it is done through inflexible one-purpose-only IOCTLs (e.g. > KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is > cumbersome and depletes a limited numberspace. > > This API provides a mechanism to instantiate a device of a certain > type, returning an ID that can be used to set/get attributes of the > device. Attributes may include configuration parameters (e.g. > register base address), device state, operational commands, etc. It > is similar to the ONE_REG API, except that it acts on devices rather > than vcpus. > > Both device types and individual attributes can be tested without having > to create the device or get/set the attribute, without the need for > separately managing enumerated capabilities. > > Signed-off-by: Scott Wood > --- > v3: remove some changes that were merged into this patch by accident, > and fix the error documentation for KVM_CREATE_DEVICE. > > NOTE: I had some difficulty figuring out what ioctl numbers I should > assign... it seems that at one point care was taken to keep vcpu and > vm ioctls separate, but some overlap exists now (despite not exhausing > the ioctl space). Some of that was my fault, but not all of it. :-) > I moved to a new ioctl range for device control -- please let me know > if there's something else you'd prefer I do. > --- > Documentation/virtual/kvm/api.txt| 70 > ++ > Documentation/virtual/kvm/devices/README |1 + > include/uapi/linux/kvm.h | 27 > virt/kvm/kvm_main.c | 31 + > 4 files changed, 129 insertions(+) > create mode 100644 Documentation/virtual/kvm/devices/README > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 976eb65..d52f3f9 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -2173,6 +2173,76 @@ header; first `n_valid' valid entries with contents > from the data > written, then `n_invalid' invalid entries, invalidating any previously > valid entries found. > > +4.79 KVM_CREATE_DEVICE > + > +Capability: KVM_CAP_DEVICE_CTRL > +Type: vm ioctl > +Parameters: struct kvm_create_device (in/out) > +Returns: 0 on success, -1 on error > +Errors: > + ENODEV: The device type is unknown or unsupported > + EEXIST: Device already created, and this type of device may not > + be instantiated multiple times > + > + Other error conditions may be defined by individual device types or > + have their standard meanings. > + > +Creates an emulated device in the kernel. The file descriptor returned > +in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR. > + > +If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the > +device type is supported (not necessarily whether it can be created > +in the current vm). > + > +Individual devices should not define flags. Attributes should be used > +for specifying any behavior that is not implied by the device type > +number. > + > +struct kvm_create_device { > + __u32 type; /* in: KVM_DEV_TYPE_xxx */ > + __u32 fd; /* out: device handle */ > + __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */ > +}; > + > +4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR > + > +Capability: KVM_CAP_DEVICE_CTRL > +Type: device ioctl > +Parameters: struct kvm_device_attr > +Returns: 0 on success, -1 on error > +Errors: > + ENXIO: The group or attribute is unknown/unsupported for this device > + EPERM: The attribute cannot (currently) be accessed this way > + (e.g. read-only attribute, or attribute that only makes > + sense when the device is in a different state) > + > + Other error conditions may be defined by individual device types. > + > +Gets/sets a specified piece of device configuration and/or state. The > +semantics are device-specific. See individual device documentation in > +the "devices" directory. As with ONE_REG, the size of the data > +transferred is defined by the particular attribute. > + > +struct kvm_device_attr { > + __u32 flags; /* no flags currently defined */ > + __u32 group; /* device-defined */ > + __u64 attr; /* group-defined */ > + __u64 addr; /* userspace address of attr data */ > +}; > + Since now each device has its own fd is it an advantage to enforce common interface between different devices? If we do so though why not handle file creation, ioctl and file descriptor lifetime in the common code. Common code will have "struct kvm_device" with "struct kvm_device_arch" and "struct kvm_device_ops" members. Instead of kvm_mpic_ioctl there will be kvm_device_ioctl which will despatch ioctls to a devi
Re: [PATCH] KVM: s390: virtio_ccw: reset errors for new I/O.
On Thu, 4 Apr 2013 11:28:35 +0200 Alexander Graf wrote: > > On 04.04.2013, at 10:25, Cornelia Huck wrote: > > > ccw_io_helper neglected to reset vcdev->err after a new channel > > program had been successfully started, resulting in stale errors > > delivered after one I/O failed. Reset the error after a new > > channel program has been successfully started with no old I/O > > pending. > > > > Signed-off-by: Cornelia Huck > > Shouldn't this be CC: stable? I don't think so, current code will not continue after errors anyway. > > > Alex > > > --- > > drivers/s390/kvm/virtio_ccw.c | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c > > index 42d507c..5948f2a 100644 > > --- a/drivers/s390/kvm/virtio_ccw.c > > +++ b/drivers/s390/kvm/virtio_ccw.c > > @@ -133,8 +133,11 @@ static int ccw_io_helper(struct virtio_ccw_device > > *vcdev, > > do { > > spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags); > > ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0); > > - if (!ret) > > + if (!ret) { > > + if (!vcdev->curr_io) > > + vcdev->err = 0; > > vcdev->curr_io |= flag; > > + } > > spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags); > > cpu_relax(); > > } while (ret == -EBUSY); > > -- > > 1.7.12.4 > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 1/6] kvm: add device control API
On Wed, Apr 03, 2013 at 07:39:52PM +0200, Alexander Graf wrote: > > On 03.04.2013, at 19:37, Scott Wood wrote: > > > On 04/03/2013 08:22:37 AM, Alexander Graf wrote: > >> On 03.04.2013, at 04:17, Paul Mackerras wrote: > >> > On Tue, Apr 02, 2013 at 08:19:56PM -0500, Scott Wood wrote: > >> >> On 04/02/2013 08:02:39 PM, Paul Mackerras wrote: > >> >>> On Mon, Apr 01, 2013 at 05:47:48PM -0500, Scott Wood wrote: > >> +4.79 KVM_CREATE_DEVICE > >> + > >> +Capability: KVM_CAP_DEVICE_CTRL > >> >>> > >> >>> I notice this patch doesn't add this capability; > >> >> > >> >> Yes, it does (see below). > >> >> > >> >>> you add it in a later patch. > >> >> > >> >> Maybe you're thinking of KVM_CAP_IRQ_MPIC? > >> > > >> > No, I was referring to the addition to kvm_dev_ioctl_check_extension() > >> > of a KVM_CAP_DEVICE_CTRL case. Since this patch adds the code to handle > >> > KVM_CREATE_DEVICE ioctl it should also add the code to return 1 if > >> > userspace queries the KVM_CAP_DEVICE_CTRL capability. > >> > > >> +/* ioctl for vm fd */ > >> +#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct > >> >>> kvm_create_device) > >> >>> > >> >>> This define should go with the other VM ioctls, otherwise the next > >> >>> person to add a VM ioctl will probably miss it and reuse the 0xe0 > >> >>> code. > >> >> > >> >> That's actually why I moved it to a new section, with device control > >> >> ioctls getting their own range, as the legacy "device model" and > >> >> some other things did. 0xe0 is not the next ioctl that would be > >> >> used for either vm or vcpu. The ioctl numbering is actually already > >> >> a mess, with sometimes care being taken to keep vcpu and vm ioctls > >> >> from overlapping, but on other places overlapping does happen. I'm > >> >> not sure what exactly I should do here. > >> > > >> > Well, even if you are using a new range, I still think that > >> > KVM_CREATE_DEVICE, being a VM ioctl, should go next to the other VM > >> > ioctls. I guess it's ultimately up to the maintainers. > >> I agree. Things get confusing for VM ioctls otherwise. > > > > Things are already confusing. :-) > > > > I can move KVM_CREATE_DEVICE back with the other VM ioctls, but what number > > should it get? The last VM ioctl is 0xab (which is also a VCPU ioctl). > > Should I use 0xac (which is also a VCPU ioctl)? Or should I try to avoid a > > conflict, as was sometimes done in the past -- in which case, which number > > should I use? > > Gleb, Marcelo? > > Yes, ioctls number assignments are a little bit of a mess :( There are 14 numbers that are used twice and some of them are used twice for the same type of fd, but with different direction bits and there is one, 0x9b, that is used twice and have the same IO direction and even, potentially, same parameter size (just checked it has the same parameter size)! Good that the second use is only on IA64 which is almost dead. Although ioctls numbers between different types of fds should not conflict lets try and make them unique. Scott, please put KVM_CREATE_DEVICE near device model ioctls and give it a unique number. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html