Re: [GIT PULL] KVM/ARM Updates for 3.12
On Fri, Aug 30, 2013 at 03:59:53PM -0700, Christoffer Dall wrote:
> Hi Gleb and Paolo,
>
> The following changes since commit cc2df20c7c4ce594c3e17e9cc260c330646012c8:
>
>   KVM: x86: Update symbolic exit codes (2013-08-13 16:58:42 +0200)
>
> are available in the git repository at:
>
>   git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-for-3.12
>
> for you to fetch changes up to 1fe40f6d39d23f39e643607a3e1883bfc74f1244:
>
>   ARM: KVM: Add newlines to panic strings (2013-08-30 15:48:02 -0700)
>
Pulled, thanks.

> KVM/ARM Updates for Linux 3.12
>
> Christoffer Dall (4):
>       ARM: KVM: Fix kvm_set_pte assignment
>       ARM: KVM: Simplify tracepoint text
>       ARM: KVM: Work around older compiler bug
>       ARM: KVM: Add newlines to panic strings
>
>  arch/arm/include/asm/kvm_mmu.h |2 +-
>  arch/arm/kvm/interrupts.S |8
>  arch/arm/kvm/reset.c |2 +-
>  arch/arm/kvm/trace.h |7 +++
>  4 files changed, 9 insertions(+), 10 deletions(-)

--
			Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: mmu: allow page tables to be in read-only slots
On Fri, Aug 30, 2013 at 02:41:37PM +0200, Paolo Bonzini wrote:
> Page tables in a read-only memory slot will currently cause a triple
> fault because the page walker uses gfn_to_hva and it fails on such a slot.
>
> OVMF uses such a page table; however, real hardware seems to be fine with
> that as long as the accessed/dirty bits are set. Save whether the slot
> is readonly, and later check it when updating the accessed and dirty bits.
>
The fix looks OK to me, but some comments below.

> Cc: sta...@vger.kernel.org
> Cc: g...@redhat.com
> Cc: Xiao Guangrong <xiaoguangr...@linux.vnet.ibm.com>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
> 	CCing to stable@ since the regression was introduced with
> 	support for readonly memory slots.
>
>  arch/x86/kvm/paging_tmpl.h | 7 ++-
>  include/linux/kvm_host.h | 1 +
>  virt/kvm/kvm_main.c| 14 +-
>  3 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 0433301..dadc5c0 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -99,6 +99,7 @@ struct guest_walker {
>  	pt_element_t prefetch_ptes[PTE_PREFETCH_NUM];
>  	gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
>  	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
> +	bool pte_writable[PT_MAX_FULL_LEVELS];
>  	unsigned pt_access;
>  	unsigned pte_access;
>  	gfn_t gfn;
> @@ -235,6 +236,9 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
>  		if (pte == orig_pte)
>  			continue;
>
> +		if (unlikely(!walker->pte_writable[level - 1]))
> +			return -EACCES;
> +
>  		ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte);
>  		if (ret)
>  			return ret;
> @@ -309,7 +313,8 @@ retry_walk:
>  			goto error;
>  		real_gfn = gpa_to_gfn(real_gfn);
>
> -		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
> +		host_addr = gfn_to_hva_read(vcpu->kvm, real_gfn,
> +					    &walker->pte_writable[walker->level - 1]);
The use of gfn_to_hva_read is misleading. The code can still write into
gfn. Let's rename gfn_to_hva_read to gfn_to_hva_prot() and gfn_to_hva()
to gfn_to_hva_write().

This makes me wonder whether there are other places where gfn_to_hva()
was used but gfn_to_hva_prot() should have been:
 - kvm_host_page_size() looks incorrect. We never use huge pages to map
   read only memory slots currently.
 - kvm_handle_bad_page() also looks incorrect and may cause an incorrect
   address to be reported to userspace.
 - kvm_setup_async_pf() is also incorrect. It makes all page faults on a
   read only slot synchronous.
 - kvm_vm_fault() looks OK since the function assumes write only slots,
   but it is obsolete and should be deleted anyway.

Others in generic and x86 code look OK; somebody needs to check the ppc
and arm code.

>  		if (unlikely(kvm_is_error_hva(host_addr)))
>  			goto error;
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ca645a0..22f9cdf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -533,6 +533,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
>
>  struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
>  unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
> +unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable);
>  unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
>  void kvm_release_page_clean(struct page *page);
>  void kvm_release_page_dirty(struct page *page);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f7e4334..418d037 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1078,11 +1078,15 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
>  EXPORT_SYMBOL_GPL(gfn_to_hva);
>
>  /*
> - * The hva returned by this function is only allowed to be read.
> - * It should pair with kvm_read_hva() or kvm_read_hva_atomic().
> + * If writable is set to false, the hva returned by this function is only
> + * allowed to be read.
>   */
> -static unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn)
> +unsigned long gfn_to_hva_read(struct kvm *kvm, gfn_t gfn, bool *writable)
>  {
> +	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> +	if (writable)
> +		*writable = !memslot_is_readonly(slot);
> +
>  	return __gfn_to_hva_many(gfn_to_memslot(kvm, gfn), gfn, NULL, false);
>  }
>
> @@ -1450,7 +1454,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
>  	int r;
>  	unsigned long addr;
>
> -	addr = gfn_to_hva_read(kvm, gfn);
> +	addr = gfn_to_hva_read(kvm, gfn, NULL);
>  	if (kvm_is_error_hva(addr))
>  		return -EFAULT;
>  	r = kvm_read_hva(data, (void __user *)addr + offset, len);
> @@ -1488,7 +1492,7 @@ int kvm_read_guest_atomic(struct kvm *kvm, gpa_t gpa,
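The idea under review can be sketched as a small stand-alone model (this is not kernel code): the translation helper reports whether the backing slot is writable, and the accessed-bit update refuses to write into a read-only slot. Every name here (toy_slot, toy_gfn_to_hva_prot, toy_update_ad_bits, TOY_PTE_ACCESSED) is hypothetical; only the name gfn_to_hva_prot comes from the review above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory slot; not the kernel's kvm_memory_slot. */
struct toy_slot {
    uint64_t base_gfn;   /* first guest frame number covered by the slot */
    uint64_t npages;     /* number of pages in the slot */
    uint64_t *backing;   /* one 64-bit "PTE" per page, for illustration only */
    bool readonly;
};

#define TOY_PTE_ACCESSED (1ULL << 5)

/*
 * Translate gfn to a host pointer and, if the caller asks for it,
 * report whether the slot may be written.
 * Returns NULL when gfn lies outside the slot.
 */
static uint64_t *toy_gfn_to_hva_prot(struct toy_slot *slot, uint64_t gfn,
                                     bool *writable)
{
    if (gfn < slot->base_gfn || gfn >= slot->base_gfn + slot->npages)
        return NULL;
    if (writable)
        *writable = !slot->readonly;
    return &slot->backing[gfn - slot->base_gfn];
}

/* Set the accessed bit; fail with -EACCES if that needs a write to a read-only slot. */
static int toy_update_ad_bits(struct toy_slot *slot, uint64_t gfn)
{
    bool writable = false;
    uint64_t *pte = toy_gfn_to_hva_prot(slot, gfn, &writable);

    if (!pte)
        return -EFAULT;
    if (*pte & TOY_PTE_ACCESSED)
        return 0;            /* bit already set: no write needed */
    if (!writable)
        return -EACCES;
    *pte |= TOY_PTE_ACCESSED;
    return 0;
}
```

This mirrors the observation in the commit message that a read-only page table is fine as long as the accessed/dirty bits are already set.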
Re: [PATCH v2] kvm: warn if num cpus is greater than num recommended
On Fri, Aug 23, 2013 at 03:24:37PM +0200, Andrew Jones wrote:
> The comment in kvm_max_vcpus() states that it's using the recommended
> procedure from the kernel API documentation to get the max number
> of vcpus that kvm supports. It is, but by always returning the
> maximum number supported. The maximum number should only be used
> for development purposes. qemu should check KVM_CAP_NR_VCPUS for
> the recommended number of vcpus. This patch adds a warning if a user
> specifies a number of cpus between the recommended and max.
>
> v2:
> Incorporate tests for max_cpus, which specifies the maximum number
> of hotpluggable cpus. An additional note is that the message for
> the fail case was slightly changed, 'exceeds max cpus' to
> 'exceeds the maximum cpus'. If this is unacceptable change for
> users like libvirt, then I'll need to spin a v3.
>
Looks good to me. Any ACKs, objections?

> Signed-off-by: Andrew Jones <drjo...@redhat.com>
> ---
>  kvm-all.c | 69 ---
>  1 file changed, 40 insertions(+), 29 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index a2d49786365e3..021f5f47e53da 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1322,24 +1322,20 @@ static int kvm_irqchip_create(KVMState *s)
>      return 0;
>  }
>
> -static int kvm_max_vcpus(KVMState *s)
> +/* Find number of supported CPUs using the recommended
> + * procedure from the kernel API documentation to cope with
> + * older kernels that may be missing capabilities.
> + */
> +static int kvm_recommended_vcpus(KVMState *s)
>  {
> -    int ret;
> -
> -    /* Find number of supported CPUs using the recommended
> -     * procedure from the kernel API documentation to cope with
> -     * older kernels that may be missing capabilities.
> -     */
> -    ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
> -    if (ret) {
> -        return ret;
> -    }
> -    ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
> -    if (ret) {
> -        return ret;
> -    }
> +    int ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
> +    return (ret) ? ret : 4;
> +}
>
> -    return 4;
> +static int kvm_max_vcpus(KVMState *s)
> +{
> +    int ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS);
> +    return (ret) ? ret : kvm_recommended_vcpus(s);
>  }
>
>  int kvm_init(void)
> @@ -1347,11 +1343,19 @@ int kvm_init(void)
>      static const char upgrade_note[] =
>          "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
>          "(see http://sourceforge.net/projects/kvm).\n";
> +    struct {
> +        const char *name;
> +        int num;
> +    } num_cpus[] = {
> +        { "SMP", smp_cpus },
> +        { "hotpluggable", max_cpus },
> +        { NULL, }
> +    }, *nc = num_cpus;
> +    int soft_vcpus_limit, hard_vcpus_limit;
>      KVMState *s;
>      const KVMCapabilityInfo *missing_cap;
>      int ret;
>      int i;
> -    int max_vcpus;
>
>      s = g_malloc0(sizeof(KVMState));
>
> @@ -1392,19 +1396,26 @@ int kvm_init(void)
>          goto err;
>      }
>
> -    max_vcpus = kvm_max_vcpus(s);
> -    if (smp_cpus > max_vcpus) {
> -        ret = -EINVAL;
> -        fprintf(stderr, "Number of SMP cpus requested (%d) exceeds max cpus "
> -                "supported by KVM (%d)\n", smp_cpus, max_vcpus);
> -        goto err;
> -    }
> +    /* check the vcpu limits */
> +    soft_vcpus_limit = kvm_recommended_vcpus(s);
> +    hard_vcpus_limit = kvm_max_vcpus(s);
>
> -    if (max_cpus > max_vcpus) {
> -        ret = -EINVAL;
> -        fprintf(stderr, "Number of hotpluggable cpus requested (%d) exceeds max cpus "
> -                "supported by KVM (%d)\n", max_cpus, max_vcpus);
> -        goto err;
> +    while (nc->name) {
> +        if (nc->num > soft_vcpus_limit) {
> +            fprintf(stderr,
> +                    "Warning: Number of %s cpus requested (%d) exceeds "
> +                    "the recommended cpus supported by KVM (%d)\n",
> +                    nc->name, nc->num, soft_vcpus_limit);
> +
> +            if (nc->num > hard_vcpus_limit) {
> +                ret = -EINVAL;
> +                fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
> +                        "the maximum cpus supported by KVM (%d)\n",
> +                        nc->name, nc->num, hard_vcpus_limit);
> +                goto err;
> +            }
> +        }
> +        nc++;
>      }
>
>      s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> --
> 1.8.1.4

--
			Gleb.
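The soft/hard limit check in the patch above can be modelled outside QEMU as a pure function. This is only a sketch: check_vcpu_limit and the vcpu_check enum are hypothetical names, and unlike the patch this variant tests the hard limit first, so it prints either the error or the warning but never both.

```c
#include <stdio.h>

/* Hypothetical result codes for the sketch. */
enum vcpu_check { VCPU_FAIL = -1, VCPU_OK = 0, VCPU_WARN = 1 };

/*
 * Compare a requested CPU count against the recommended (soft) and
 * maximum (hard) limits. Exceeding the hard limit is an error;
 * exceeding only the soft limit is a warning.
 */
static enum vcpu_check check_vcpu_limit(const char *name, int requested,
                                        int soft_limit, int hard_limit)
{
    if (requested > hard_limit) {
        fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
                "the maximum cpus supported by KVM (%d)\n",
                name, requested, hard_limit);
        return VCPU_FAIL;
    }
    if (requested > soft_limit) {
        fprintf(stderr, "Warning: Number of %s cpus requested (%d) exceeds "
                "the recommended cpus supported by KVM (%d)\n",
                name, requested, soft_limit);
        return VCPU_WARN;
    }
    return VCPU_OK;
}
```

A caller would run this once for the SMP count and once for the hotpluggable count, aborting initialization only on VCPU_FAIL.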
Re: [PATCH v2] cpu: Move cpu state syncs up into cpu_dump_state()
On Tue, Aug 27, 2013 at 12:19:10PM +0100, James Hogan wrote:
> The x86 and ppc targets call cpu_synchronize_state() from their
> *_cpu_dump_state() callbacks to ensure that up to date state is dumped
> when KVM is enabled (for example when a KVM internal error occurs).
>
> Move this call up into the generic cpu_dump_state() function so that
> other KVM targets (namely MIPS) can take advantage of it.
>
> This requires kvm_cpu_synchronize_state() and cpu_synchronize_state() to
> be moved out of the #ifdef NEED_CPU_H in sysemu/kvm.h so that they're
> accessible to qom/cpu.c.
>
Applied, thanks.

> Signed-off-by: James Hogan <james.ho...@imgtec.com>
> Cc: Andreas Färber <afaer...@suse.de>
> Cc: Alexander Graf <ag...@suse.de>
> Cc: Gleb Natapov <g...@redhat.com>
> Cc: qemu-...@nongnu.org
> Cc: kvm@vger.kernel.org
> ---
> Changes in v2 (was kvm: sync cpu state on internal error before dump)
>  - rewrite to fix in cpu_dump_state() (Gleb Natapov)
> ---
>  include/sysemu/kvm.h | 20 ++--
>  qom/cpu.c | 1 +
>  target-i386/helper.c | 2 --
>  target-ppc/translate.c | 2 --
>  4 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index de74411..71a0186 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -270,16 +270,6 @@ int kvm_check_extension(KVMState *s, unsigned int extension);
>
>  uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
>                                        uint32_t index, int reg);
> -void kvm_cpu_synchronize_state(CPUState *cpu);
> -
> -/* generic hooks - to be moved/refactored once there are more users */
> -
> -static inline void cpu_synchronize_state(CPUState *cpu)
> -{
> -    if (kvm_enabled()) {
> -        kvm_cpu_synchronize_state(cpu);
> -    }
> -}
>
>  #if !defined(CONFIG_USER_ONLY)
>  int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
> @@ -288,9 +278,19 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram_addr,
>
>  #endif /* NEED_CPU_H */
>
> +void kvm_cpu_synchronize_state(CPUState *cpu);
>  void kvm_cpu_synchronize_post_reset(CPUState *cpu);
>  void kvm_cpu_synchronize_post_init(CPUState *cpu);
>
> +/* generic hooks - to be moved/refactored once there are more users */
> +
> +static inline void cpu_synchronize_state(CPUState *cpu)
> +{
> +    if (kvm_enabled()) {
> +        kvm_cpu_synchronize_state(cpu);
> +    }
> +}
> +
>  static inline void cpu_synchronize_post_reset(CPUState *cpu)
>  {
>      if (kvm_enabled()) {
> diff --git a/qom/cpu.c b/qom/cpu.c
> index aa95108..cfe7e24 100644
> --- a/qom/cpu.c
> +++ b/qom/cpu.c
> @@ -174,6 +174,7 @@ void cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
>      CPUClass *cc = CPU_GET_CLASS(cpu);
>
>      if (cc->dump_state) {
> +        cpu_synchronize_state(cpu);
>          cc->dump_state(cpu, f, cpu_fprintf, flags);
>      }
>  }
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index bf3e2ac..2aecfd0 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -188,8 +188,6 @@ void x86_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>      char cc_op_name[32];
>      static const char *seg_name[6] = { "ES", "CS", "SS", "DS", "FS", "GS" };
>
> -    cpu_synchronize_state(cs);
> -
>      eflags = cpu_compute_eflags(env);
>  #ifdef TARGET_X86_64
>      if (env->hflags & HF_CS64_MASK) {
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index f07d70d..c6a6ff8 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -9536,8 +9536,6 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>      CPUPPCState *env = &cpu->env;
>      int i;
>
> -    cpu_synchronize_state(cs);
> -
>      cpu_fprintf(f, "NIP " TARGET_FMT_lx "   LR " TARGET_FMT_lx " CTR "
>                  TARGET_FMT_lx " XER " TARGET_FMT_lx "\n",
>                  env->nip, env->lr, env->ctr, cpu_read_xer(env));
> --
> 1.8.1.2

--
			Gleb.
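The refactoring in this patch, moving a step that every per-target callback needs into the generic dispatcher, can be illustrated with a toy model. Nothing below is QEMU code: toy_cpu, toy_cpu_class and the counters are hypothetical and exist only to make the call order observable.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_cpu;

/* Hypothetical per-target class with an optional dump callback. */
struct toy_cpu_class {
    void (*dump_state)(struct toy_cpu *cpu);
};

struct toy_cpu {
    const struct toy_cpu_class *cc;
    bool accel_enabled;   /* plays the role of kvm_enabled() */
    int sync_count;       /* how often state was synchronized */
    int dump_count;       /* how often the target callback ran */
    int synced_at_dump;   /* sync_count as seen by the last dump */
};

/* Synchronize state only when the accelerator is in use. */
static void toy_synchronize_state(struct toy_cpu *cpu)
{
    if (cpu->accel_enabled)
        cpu->sync_count++;
}

/* Generic entry point: synchronize once here so no target can forget it. */
static void toy_dump_state(struct toy_cpu *cpu)
{
    if (cpu->cc->dump_state) {
        toy_synchronize_state(cpu);
        cpu->cc->dump_state(cpu);
    }
}

/* A target callback that no longer has to synchronize by itself. */
static void toy_target_dump(struct toy_cpu *cpu)
{
    cpu->synced_at_dump = cpu->sync_count;
    cpu->dump_count++;
}
```

A new target then only supplies its dump callback and gets the synchronization for free, which is the benefit the commit message names for MIPS.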
Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO
On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
> This reserves a capability number for upcoming support
> of VFIO-IOMMU DMA operations in real mode.
>
> This reserves a number for a new SPAPR TCE IOMMU KVM device
> which is going to manage lifetime of SPAPR TCE IOMMU object.
>
> This defines an attribute of the SPAPR TCE IOMMU KVM device
> which is going to be used for initialization.
>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>
> ---
> Changes:
> v9:
> * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with
> KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
>
> 2013/08/15:
> * fixed mistype in comments
> * fixed commit message which says what uses ioctls 0xad and 0xae
>
> 2013/07/16:
> * changed the number
>
> 2013/07/11:
> * changed order in a file, added comment about a gap in ioctl number
> ---
>  arch/powerpc/include/uapi/asm/kvm.h | 8
>  include/uapi/linux/kvm.h| 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 0fb1a6e..c1ae1e5 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
>  #define KVM_XICS_MASKED		(1ULL << 41)
>  #define KVM_XICS_PENDING	(1ULL << 42)
>
> +/* SPAPR TCE IOMMU device specification */
> +struct kvm_create_spapr_tce_iommu_linkage {
> +	__u64 liobn;
> +	__u32 fd;
> +	__u32 flags;
> +};
> +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE	0
> +
>  #endif /* __LINUX_KVM_POWERPC_H */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 99c2533..9d20630 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_IRQ_XICS 92
>  #define KVM_CAP_ARM_EL1_32BIT 93
>  #define KVM_CAP_SPAPR_MULTITCE 94
> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
You do not need a capability to check for device support. The device API
supports checking for that with the KVM_CREATE_DEVICE_TEST flag to the
KVM_CREATE_DEVICE ioctl.

>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -843,6 +844,7 @@ struct kvm_device_attr {
>  #define KVM_DEV_TYPE_FSL_MPIC_20	1
>  #define KVM_DEV_TYPE_FSL_MPIC_42	2
>  #define KVM_DEV_TYPE_XICS		3
> +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU	4
>
>  /*
>   * ioctls for VM fds
> --
> 1.8.4.rc4

--
			Gleb.
Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO
On 09/01/2013 09:27 PM, Gleb Natapov wrote: On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote: This reserves a capability number for upcoming support of VFIO-IOMMU DMA operations in real mode. This reserves a number for a new SPAPR TCE IOMMU KVM device which is going to manage lifetime of SPAPR TCE IOMMU object. This defines an attribute of the SPAPR TCE IOMMU KVM device which is going to be used for initialization. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Changes: v9: * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute 2013/08/15: * fixed mistype in comments * fixed commit message which says what uses ioctls 0xad and 0xae 2013/07/16: * changed the number 2013/07/11: * changed order in a file, added comment about a gap in ioctl number --- arch/powerpc/include/uapi/asm/kvm.h | 8 include/uapi/linux/kvm.h| 2 ++ 2 files changed, 10 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..c1ae1e5 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -511,4 +511,12 @@ struct kvm_get_htab_header { #define KVM_XICS_MASKED(1ULL 41) #define KVM_XICS_PENDING (1ULL 42) +/* SPAPR TCE IOMMU device specification */ +struct kvm_create_spapr_tce_iommu_linkage { +__u64 liobn; +__u32 fd; +__u32 flags; +}; +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE0 + #endif /* __LINUX_KVM_POWERPC_H */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 99c2533..9d20630 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_ARM_EL1_32BIT 93 #define KVM_CAP_SPAPR_MULTITCE 94 +#define KVM_CAP_SPAPR_TCE_IOMMU 95 You do not need capability to check for a device support. Device API supports checking for that with KVM_CREATE_DEVICE_TEST flag to KVM_CREATE_DEVICE ioctl. Hm. 
I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
it - KVM_CAP_IRQ_XICS. Do we not need both capabilities? Or is XICS special
in some way but SPAPR TCE IOMMU is not? I am confused, sorry.

>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>
>> @@ -843,6 +844,7 @@ struct kvm_device_attr {
>>  #define KVM_DEV_TYPE_FSL_MPIC_20	1
>>  #define KVM_DEV_TYPE_FSL_MPIC_42	2
>>  #define KVM_DEV_TYPE_XICS		3
>> +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU	4
>>
>>  /*
>>   * ioctls for VM fds
>> --
>> 1.8.4.rc4
>
> --
> 			Gleb.

--
Alexey
Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO
On Sun, Sep 01, 2013 at 09:39:23PM +1000, Alexey Kardashevskiy wrote: On 09/01/2013 09:27 PM, Gleb Natapov wrote: On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote: This reserves a capability number for upcoming support of VFIO-IOMMU DMA operations in real mode. This reserves a number for a new SPAPR TCE IOMMU KVM device which is going to manage lifetime of SPAPR TCE IOMMU object. This defines an attribute of the SPAPR TCE IOMMU KVM device which is going to be used for initialization. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- Changes: v9: * KVM ioctl is replaced with SPAPR TCE IOMMU KVM device type with KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute 2013/08/15: * fixed mistype in comments * fixed commit message which says what uses ioctls 0xad and 0xae 2013/07/16: * changed the number 2013/07/11: * changed order in a file, added comment about a gap in ioctl number --- arch/powerpc/include/uapi/asm/kvm.h | 8 include/uapi/linux/kvm.h| 2 ++ 2 files changed, 10 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..c1ae1e5 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -511,4 +511,12 @@ struct kvm_get_htab_header { #define KVM_XICS_MASKED (1ULL 41) #define KVM_XICS_PENDING (1ULL 42) +/* SPAPR TCE IOMMU device specification */ +struct kvm_create_spapr_tce_iommu_linkage { + __u64 liobn; + __u32 fd; + __u32 flags; +}; +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE 0 + #endif /* __LINUX_KVM_POWERPC_H */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 99c2533..9d20630 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_IRQ_XICS 92 #define KVM_CAP_ARM_EL1_32BIT 93 #define KVM_CAP_SPAPR_MULTITCE 94 +#define KVM_CAP_SPAPR_TCE_IOMMU 95 You do not need capability to check for a device support. 
Device API supports checking for that with KVM_CREATE_DEVICE_TEST flag to KVM_CREATE_DEVICE ioctl.

> Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
> it - KVM_CAP_IRQ_XICS. Do We not need both capabilities? Or XICS is special
> in some way but SPAPR TCE IOMMU is not? I am confused, sorry.

Looking at it, KVM_CAP_IRQ_XICS/KVM_CAP_IRQ_MPIC are not used to detect
device existence, but to link a device to a vcpu. KVM_CAP_IRQ_MPIC was
introduced separately from the MPIC device code.

--
			Gleb.
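The probing approach described in this thread could look roughly like the following from user space. It is a sketch that assumes a Linux host with the kernel UAPI headers installed; kvm_device_type_supported is a hypothetical helper name, while struct kvm_create_device, KVM_CREATE_DEVICE and KVM_CREATE_DEVICE_TEST are the existing definitions from linux/kvm.h.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Return 1 if the VM behind vm_fd can create a device of the given
 * type, 0 otherwise. With KVM_CREATE_DEVICE_TEST set the kernel only
 * checks that the type is supported and does not create the device.
 */
static int kvm_device_type_supported(int vm_fd, uint32_t type)
{
    struct kvm_create_device cd;

    memset(&cd, 0, sizeof(cd));
    cd.type = type;
    cd.flags = KVM_CREATE_DEVICE_TEST;

    return ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) == 0;
}
```

Here vm_fd would be the descriptor returned by KVM_CREATE_VM; a new device type can be detected this way without reserving an extra capability number.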
Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
> them to user space, which saves time on switching to user space and back.
>
> Both real and virtual modes are supported. The kernel tries to
> handle a TCE request in real mode; if that fails, it passes the request
> to virtual mode to complete the operation. If the virtual mode
> handler fails, the request is passed to user space.
>
> The first user of this is VFIO on POWER. Trampolines to the VFIO external
> user API functions are required for this patch.
>
> This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
> of map/unmap requests. The device supports a single attribute which is
> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
> establishes the connection between KVM and VFIO.
>
> Tests show that this patch increases transmission speed from 220MB/s
> to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>
> ---
>
> Changes:
> v9:
> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with SPAPR TCE IOMMU
> KVM device
> * release_spapr_tce_table() is not shared between different TCE types
> * reduced the patch size by moving VFIO external API
> trampolines to a separate patch
> * moved documentation from Documentation/virtual/kvm/api.txt to
> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>
> v8:
> * fixed warnings from checkpatch.pl
>
> 2013/07/11:
> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
> for KVM_BOOK3S_64
> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
> for this here but the next patch for hugepages support will use it more.
>
> 2013/07/06:
> * added realmode arch_spin_lock to protect TCE table from races
> in real and virtual modes
> * POWERPC IOMMU API is changed to support real mode
> * iommu_take_ownership and iommu_release_ownership are protected by
> iommu_table's locks
> * VFIO external user API use rewritten
> * multiple small fixes
>
> 2013/06/27:
> * tce_list page is referenced now in order to protect it from accidental
> invalidation during H_PUT_TCE_INDIRECT execution
> * added use of the external user VFIO API
>
> 2013/06/05:
> * changed capability number
> * changed ioctl number
> * update the doc article number
>
> 2013/05/20:
> * removed get_user() from real mode handlers
> * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
> translated TCEs, tries realmode_get_page() on those and if it fails, it
> passes control over the virtual mode handler which tries to finish
> the request handling
> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
> on a page
> * The only reason to pass the request to user mode now is when the user mode
> did not register TCE table in the kernel, in all other cases the virtual mode
> handler is expected to do the job
> ---
>  .../virtual/kvm/devices/spapr_tce_iommu.txt| 37 +++
>  arch/powerpc/include/asm/kvm_host.h| 4 +
>  arch/powerpc/kvm/book3s_64_vio.c | 310 -
>  arch/powerpc/kvm/book3s_64_vio_hv.c| 122
>  arch/powerpc/kvm/powerpc.c | 1 +
>  include/linux/kvm_host.h | 1 +
>  virt/kvm/kvm_main.c| 5 +
>  7 files changed, 477 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>
> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> new file mode 100644
> index 000..4bc8fc3
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> @@ -0,0 +1,37 @@
> +SPAPR TCE IOMMU device
> +
> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> +Architectures: powerpc
> +
> +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
> +
> +Groups:
> +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
> +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
> +
> +This is completely made up device which provides API to link
> +logical bus number (LIOBN) and IOMMU group. The user space has
> +to create a new SPAPR TCE IOMMU device per a logical bus.
> +
Why not have one device that can handle multiple links?

> +LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
> +(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
> +IOMMU group is a minimal isolated device set which can be passed to
> +the user space via VFIO.
> +
> +Right after creation the device is in uninitlized state and requires
> +a KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
> +The attribute contains liobn, IOMMU fd and flags:
> +
> +struct kvm_create_spapr_tce_iommu_linkage {
> +	__u64 liobn;
> +	__u32 fd;
> +	__u32 flags;
> +};
> +
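The three-stage fallback described in the commit message (real mode first, then virtual mode, then user space) can be sketched as a plain dispatcher. All names below (tce_handler, dispatch_tce, the toy handlers and the enums) are hypothetical and only model the control flow; they are not the functions from the patch.

```c
#include <stddef.h>

/* Hypothetical handler result: done, or too hard for this stage. */
enum tce_result { TCE_DONE, TCE_TOO_HARD };

/* Which stage ended up responsible for the request. */
enum tce_stage { HANDLED_REAL_MODE, HANDLED_VIRTUAL_MODE, PASSED_TO_USER_SPACE };

typedef enum tce_result (*tce_handler)(unsigned long liobn,
                                       unsigned long ioba,
                                       unsigned long tce);

/*
 * Try the fast real mode handler first, fall back to the virtual mode
 * handler, and hand the request to user space only if both give up.
 */
static enum tce_stage dispatch_tce(tce_handler real_mode, tce_handler virt_mode,
                                   unsigned long liobn, unsigned long ioba,
                                   unsigned long tce)
{
    if (real_mode && real_mode(liobn, ioba, tce) == TCE_DONE)
        return HANDLED_REAL_MODE;
    if (virt_mode && virt_mode(liobn, ioba, tce) == TCE_DONE)
        return HANDLED_VIRTUAL_MODE;
    return PASSED_TO_USER_SPACE;
}

/* Toy handlers used to exercise the dispatcher. */
static enum tce_result toy_handler_ok(unsigned long liobn, unsigned long ioba,
                                      unsigned long tce)
{
    (void)liobn; (void)ioba; (void)tce;
    return TCE_DONE;
}

static enum tce_result toy_handler_too_hard(unsigned long liobn,
                                            unsigned long ioba,
                                            unsigned long tce)
{
    (void)liobn; (void)ioba; (void)tce;
    return TCE_TOO_HARD;
}
```

The ordering reflects the cost of each stage: every step further down the chain adds a more expensive context switch, which is where the reported throughput gain comes from.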
[PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
XICS failed to free the xics structure on the error path. The MPIC destroy
handler forgot to free the kvm_device structure.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
Be warned that this is not even compile tested.

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 94c1dd4..97adfe8 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1244,8 +1244,10 @@ static int kvmppc_xics_create(struct kvm_device *dev, u32 type)
 		kvm->arch.xics = xics;
 	mutex_unlock(&kvm->lock);

-	if (ret)
+	if (ret) {
+		kfree(xics);
 		return ret;
+	}

 	xics_debugfs_init(xics);

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 2861ae9..efbd996 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1635,6 +1635,7 @@ static void mpic_destroy(struct kvm_device *dev)

 	dev->kvm->arch.mpic = NULL;
 	kfree(opp);
+	kfree(dev);
 }

 static int mpic_set_default_irq_routing(struct openpic *opp)
--
			Gleb.
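The XICS half of the fix follows a common pattern: an object is allocated up front, registration can fail afterwards, and the failure path must release the allocation. A user-space sketch of that pattern, with hypothetical names (toy_vm, toy_irqchip_create) and calloc/free standing in for the kernel allocators:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical VM object that can hold at most one interrupt controller. */
struct toy_vm {
    void *irqchip;
};

/*
 * Allocate a controller and register it with the VM.
 * If one is already registered, free the new allocation before
 * returning the error, otherwise it would leak.
 */
static int toy_irqchip_create(struct toy_vm *vm)
{
    void *chip = calloc(1, 64);
    int ret = 0;

    if (!chip)
        return -ENOMEM;

    if (vm->irqchip)
        ret = -EEXIST;
    else
        vm->irqchip = chip;

    if (ret) {
        free(chip);   /* the step the original error path was missing */
        return ret;
    }
    return 0;
}
```

Calling toy_irqchip_create twice on the same toy_vm returns -EEXIST the second time without leaking the second allocation.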
Re: [PATCH 0/2] RFC: KVM: Simple optimization based on Xiao's patch
On Fri, Aug 30, 2013 at 12:50:11PM +0900, Takuya Yoshikawa wrote:
> I think this patch set answers Gleb's comment.

It does. Thanks.

--
			Gleb.
Re: [PATCH v2] kvm: warn if num cpus is greater than num recommended
On Fri, Aug 23, 2013 at 03:24:37PM +0200, Andrew Jones wrote: The comment in kvm_max_vcpus() states that it's using the recommended procedure from the kernel API documentation to get the max number of vcpus that kvm supports. It is, but by always returning the maximum number supported. The maximum number should only be used for development purposes. qemu should check KVM_CAP_NR_VCPUS for the recommended number of vcpus. This patch adds a warning if a user specifies a number of cpus between the recommended and max. v2: Incorporate tests for max_cpus, which specifies the maximum number of hotpluggable cpus. An additional note is that the message for the fail case was slightly changed, 'exceeds max cpus' to 'exceeds the maximum cpus'. If this is unacceptable change for users like libvirt, then I'll need to spin a v3. Signed-off-by: Andrew Jones drjo...@redhat.com --- kvm-all.c | 69 --- 1 file changed, 40 insertions(+), 29 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index a2d49786365e3..021f5f47e53da 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1322,24 +1322,20 @@ static int kvm_irqchip_create(KVMState *s) return 0; } -static int kvm_max_vcpus(KVMState *s) +/* Find number of supported CPUs using the recommended + * procedure from the kernel API documentation to cope with + * older kernels that may be missing capabilities. + */ +static int kvm_recommended_vcpus(KVMState *s) { -int ret; - -/* Find number of supported CPUs using the recommended - * procedure from the kernel API documentation to cope with - * older kernels that may be missing capabilities. - */ -ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS); -if (ret) { -return ret; -} -ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS); -if (ret) { -return ret; -} +int ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS); +return (ret) ? ret : 4; +} -return 4; +static int kvm_max_vcpus(KVMState *s) +{ +int ret = kvm_check_extension(s, KVM_CAP_MAX_VCPUS); +return (ret) ? 
ret : kvm_recommended_vcpus(s); } int kvm_init(void) @@ -1347,11 +1343,19 @@ int kvm_init(void) static const char upgrade_note[] = Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n (see http://sourceforge.net/projects/kvm).\n; +struct { +const char *name; +int num; +} num_cpus[] = { +{ SMP, smp_cpus }, +{ hotpluggable, max_cpus }, +{ NULL, } +}, *nc = num_cpus; +int soft_vcpus_limit, hard_vcpus_limit; KVMState *s; const KVMCapabilityInfo *missing_cap; int ret; int i; -int max_vcpus; s = g_malloc0(sizeof(KVMState)); @@ -1392,19 +1396,26 @@ int kvm_init(void) goto err; } -max_vcpus = kvm_max_vcpus(s); -if (smp_cpus max_vcpus) { -ret = -EINVAL; -fprintf(stderr, Number of SMP cpus requested (%d) exceeds max cpus -supported by KVM (%d)\n, smp_cpus, max_vcpus); -goto err; -} +/* check the vcpu limits */ +soft_vcpus_limit = kvm_recommended_vcpus(s); +hard_vcpus_limit = kvm_max_vcpus(s); -if (max_cpus max_vcpus) { -ret = -EINVAL; -fprintf(stderr, Number of hotpluggable cpus requested (%d) exceeds max cpus -supported by KVM (%d)\n, max_cpus, max_vcpus); -goto err; +while (nc-name) { +if (nc-num soft_vcpus_limit) { +fprintf(stderr, +Warning: Number of %s cpus requested (%d) exceeds +the recommended cpus supported by KVM (%d)\n, +nc-name, nc-num, soft_vcpus_limit); + +if (nc-num hard_vcpus_limit) { +ret = -EINVAL; +fprintf(stderr, Number of %s cpus requested (%d) exceeds +the maximum cpus supported by KVM (%d)\n, +nc-name, nc-num, hard_vcpus_limit); +goto err; +} +} +nc++; } s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0); -- 1.8.1.4 ACK -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
On 08/31/2013 12:44 AM, Ben Hutchings wrote:
> On Fri, 2013-08-30 at 12:29 +0800, Jason Wang wrote:
>> We used to poll the vhost queue before marking DMA as done. This is racy:
>> if the vhost thread is woken up before DMA is marked done, the signal
>> can be missed. Fix this by always polling the vhost queue after marking
>> DMA as done.
> Does this bug only exist in net-next or is it older? Should the fix go
> to net and stable branches?

This should go to the stable branches too (3.4 and above).

Thanks for checking.

> Ben.
>
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  drivers/vhost/net.c |9 +
>>  1 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index ff60c2a..d09c17c 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	struct vhost_virtqueue *vq = ubufs->vq;
>>  	int cnt = atomic_read(&ubufs->kref.refcount);
>>
>> +	/* set len to mark this desc buffers done DMA */
>> +	vq->heads[ubuf->desc].len = success ?
>> +		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
>> +	vhost_net_ubuf_put(ubufs);
>> +
>>  	/*
>>  	 * Trigger polling thread if guest stopped submitting new buffers:
>>  	 * in this case, the refcount after decrement will eventually reach 1
>> @@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	 */
>>  	if (cnt <= 2 || !(cnt % 16))
>>  		vhost_poll_queue(&vq->poll);
>> -	/* set len to mark this desc buffers done DMA */
>> -	vq->heads[ubuf->desc].len = success ?
>> -		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
>> -	vhost_net_ubuf_put(ubufs);
>>  }
>>
>>  /* Expects to be always run from workqueue - which acts as
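Why the order of the two steps matters can be shown with a deterministic, single-threaded simulation of the worst case, in which the worker runs the moment it is poked. This is not vhost code and involves no real concurrency; toy_vq, toy_poll and toy_zerocopy_callback are hypothetical names.

```c
#include <stdbool.h>

#define TOY_DMA_PENDING 0
#define TOY_DMA_DONE    1
#define TOY_DMA_REAPED  2
#define TOY_NDESC       4

struct toy_vq {
    int len[TOY_NDESC];   /* per-descriptor completion marker */
    int reaped;           /* completions the worker has noticed */
};

/* Stand-in for the worker: collect every descriptor marked done. */
static void toy_poll(struct toy_vq *vq)
{
    for (int i = 0; i < TOY_NDESC; i++) {
        if (vq->len[i] == TOY_DMA_DONE) {
            vq->len[i] = TOY_DMA_REAPED;
            vq->reaped++;
        }
    }
}

/*
 * Completion callback. With mark_first the descriptor is marked done
 * before the worker is poked (the fixed order); otherwise the worker
 * runs first and cannot see the completion (the racy order).
 */
static void toy_zerocopy_callback(struct toy_vq *vq, int desc, bool mark_first)
{
    if (mark_first) {
        vq->len[desc] = TOY_DMA_DONE;
        toy_poll(vq);
    } else {
        toy_poll(vq);
        vq->len[desc] = TOY_DMA_DONE;
    }
}
```

In the racy order the descriptor stays marked done but unnoticed until something else wakes the worker, which is the missed signal the commit message describes.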
Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling
On 09/01/2013 10:06 PM, Gleb Natapov wrote:
> On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
>> them to user space, which saves time on switching to user space and back.
>>
>> Both real and virtual modes are supported. The kernel tries to
>> handle a TCE request in real mode; if that fails, it passes the request
>> to virtual mode to complete the operation. If the virtual mode
>> handler fails, the request is passed to user space.
>>
>> The first user of this is VFIO on POWER. Trampolines to the VFIO external
>> user API functions are required for this patch.
>>
>> This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
>> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
>> of map/unmap requests. The device supports a single attribute which is
>> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
>> establishes the connection between KVM and VFIO.
>>
>> Tests show that this patch increases transmission speed from 220MB/s
>> to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>>
>> Signed-off-by: Paul Mackerras <pau...@samba.org>
>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>
>> ---
>>
>> Changes:
>> v9:
>> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
>> KVM device
>> * release_spapr_tce_table() is not shared between different TCE types
>> * reduced the patch size by moving VFIO external API
>> trampolines to a separate patch
>> * moved documentation from Documentation/virtual/kvm/api.txt to
>> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>
>> v8:
>> * fixed warnings from checkpatch.pl
>>
>> 2013/07/11:
>> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
>> for KVM_BOOK3S_64
>> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
>> for this here but the next patch for hugepages support will use it more.
>>
>> 2013/07/06:
>> * added realmode arch_spin_lock to protect TCE table from races
>> in real and virtual modes
>> * POWERPC IOMMU API is changed to support real mode
>> * iommu_take_ownership and iommu_release_ownership are protected by
>> iommu_table's locks
>> * VFIO external user API use rewritten
>> * multiple small fixes
>>
>> 2013/06/27:
>> * tce_list page is referenced now in order to protect it from accidental
>> invalidation during H_PUT_TCE_INDIRECT execution
>> * added use of the external user VFIO API
>>
>> 2013/06/05:
>> * changed capability number
>> * changed ioctl number
>> * update the doc article number
>>
>> 2013/05/20:
>> * removed get_user() from real mode handlers
>> * kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
>> translated TCEs there, tries realmode_get_page() on those and, if it fails,
>> passes control to the virtual mode handler, which tries to finish
>> the request handling
>> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
>> on a page
>> * The only reason to pass the request to user mode now is when the user mode
>> did not register a TCE table in the kernel; in all other cases the virtual
>> mode handler is expected to do the job
>> ---
>>  .../virtual/kvm/devices/spapr_tce_iommu.txt        |  37 +++
>>  arch/powerpc/include/asm/kvm_host.h                |   4 +
>>  arch/powerpc/kvm/book3s_64_vio.c                   | 310 -
>>  arch/powerpc/kvm/book3s_64_vio_hv.c                | 122
>>  arch/powerpc/kvm/powerpc.c                         |   1 +
>>  include/linux/kvm_host.h                           |   1 +
>>  virt/kvm/kvm_main.c                                |   5 +
>>  7 files changed, 477 insertions(+), 3 deletions(-)
>>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>>
>> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>> new file mode 100644
>> index 000..4bc8fc3
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>> @@ -0,0 +1,37 @@
>> +SPAPR TCE IOMMU device
>> +
>> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
>> +Architectures: powerpc
>> +
>> +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
>> +
>> +Groups:
>> +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
>> +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
>> +
>> +This is completely made up device which provides API to link
>> +logical bus number (LIOBN) and IOMMU group. The user space has
>> +to create a new SPAPR TCE IOMMU device per a logical bus.
>> +
> Why not have one device that can handle multiple links?

I can do that. If I do, it won't even look like a device at all, just
some weird interface to KVM, but ok. What bothers me is the question of
what I will have to do next. I can easily predict a suggestion to move the
kvmppc_spapr_tce_table list (the list of links) from
kvm->arch.spapr_tce_tables to that device, but I cannot do that for obvious
compatibility reasons: the list is already used for emulated devices (for
starters, they need mmap()).

Or
Re: [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time
On 08/31/2013 02:35 AM, Sergei Shtylyov wrote:
> Hello.
>
> On 08/30/2013 08:29 AM, Jason Wang wrote:
>
>> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
>> upend_idx != done_idx we still set zcopy_used to true and roll back this
>> choice later. This could be avoided by determining zerocopy once, checking
>> all conditions at one time beforehand.
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  drivers/vhost/net.c |   46 +++++++++++++++++++---------------------------
>>  1 files changed, 19 insertions(+), 27 deletions(-)
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 8a6dd0d..ff60c2a 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -404,43 +404,35 @@ static void handle_tx(struct vhost_net *net)
>>                         iov_length(nvq->hdr, s), hdr_size);
>>                  break;
>>          }
>> -        zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>> -                               nvq->upend_idx != nvq->done_idx);
>> +
>> +        zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>> +                && (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
>> +                && vhost_net_tx_select_zcopy(net);
>
> Could you leave && on a first of two lines, matching the previous style?

ok.

>>          /* use msg_control to pass vhost zerocopy ubuf info to skb */
>>          if (zcopy_used) {
>> +                struct ubuf_info *ubuf;
>> +                ubuf = nvq->ubuf_info + nvq->upend_idx;
>> +
>>                  vq->heads[nvq->upend_idx].id = head;
> [...]
>> +                vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
>> +                ubuf->callback = vhost_zerocopy_callback;
>> +                ubuf->ctx = nvq->ubufs;
>> +                ubuf->desc = nvq->upend_idx;
>> +                msg.msg_control = ubuf;
>> +                msg.msg_controllen = sizeof(ubuf);
>
> 'sizeof(ubuf)' where 'ubuf' is a pointer? Are you sure it shouldn't be
> 'sizeof(*ubuf)'?

Yes, a pointer is sufficient. Vhost allocates an array of ubufs, and
tun/macvtap just need a reference to it.

> WBR, Sergei
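The sizeof(ubuf) versus sizeof(*ubuf) question raised above comes down to whether the length describes the reference or the object it points to. A minimal sketch, assuming a made-up `struct demo_ubuf` that merely stands in for the kernel's `struct ubuf_info` (the fields and both helper names are illustrative only):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ubuf_info; the field layout is illustrative only. */
struct demo_ubuf {
    void (*callback)(struct demo_ubuf *, int);
    void *ctx;
    unsigned long desc;
};

/* Length of the reference itself: this is what sizeof(ubuf) yields for a pointer. */
static size_t control_len_by_reference(const struct demo_ubuf *ubuf)
{
    return sizeof(ubuf);
}

/* Length of the pointed-to object: this is what sizeof(*ubuf) yields. */
static size_t control_len_by_value(const struct demo_ubuf *ubuf)
{
    return sizeof(*ubuf);
}
```

If the consumer only stores the pointer, as the reply states for tun/macvtap, the reference length is the one that describes what is actually handed over.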
Re: Is fallback vhost_net to qemu for live migrate available?
On 08/31/2013 12:45 PM, Qin Chuanyu wrote:
> On 2013/8/30 0:08, Anthony Liguori wrote:
>> Hi Qin,
>>
>>> By changing the memory copy and notify mechanism, virtio-net with
>>> vhost_net can currently run on Xen with good performance.
>>
>> I think the key in doing this would be to implement a proper
>> ioeventfd and irqfd interface in the driver domain kernel.  Just
>> hacking vhost_net with Xen specific knowledge would be pretty nasty
>> IMHO.
>>
> Yes, I added a kernel module which persists the virtio-net pio_addr and
> MSI-X address, as the kvm module does. The guest wakes up the vhost thread
> through a hook function added in evtchn_interrupt.
>
>> Did you modify the front end driver to do grant table mapping or is
>> this all being done by mapping the domain's memory?
>>
> Nothing is changed in the front end driver. Currently I use
> alloc_vm_area to get address space, and map the domain's memory the same
> way qemu does.
>
>> KVM and Xen represent memory in a very different way.  KVM can only
>> track when guest mode code dirties memory.  It relies on QEMU to track
>> when guest memory is dirtied by QEMU.  Since vhost is running outside
>> of QEMU, vhost also needs to tell QEMU when it has dirtied memory.
>>
>> I don't think this is a problem with Xen though.  I believe (although
>> could be wrong) that Xen is able to track when either the domain or
>> dom0 dirties memory.
>>
>> So I think you can simply ignore the dirty logging with vhost and it
>> should Just Work.
>>
> Thanks for your advice. I have tried it: without ping, migration
> succeeds, but if an skb has been received, domU crashes.
> I guess that is because, although Xen tracks domU memory, it can only
> track memory changed in DomU; memory changed by Dom0 is out of its control.
>
>> No, we don't have a mechanism to fallback to QEMU for the datapath.
>> It would be possible but I think it's a bad idea to mix and match the
>> two.
>>
> Next I would try to fall back the datapath to qemu, for three reasons:
> 1: The memory translation mechanism has been changed for vhost_net on Xen,
> so some changes would be needed for vhost_log in the kernel.
>
> 2: I also mapped the IOREQ_PFN page (which is used for communication
> between qemu and Xen) in the kernel notify module, so it also needs to be
> marked as dirty when tx/rx happens during migration.
>
> 3: Most important of all, Michael S. Tsirkin said that he hadn't
> considered vhost_net migration on Xen, so some changes would be needed
> in vhost_log for qemu.
>
> Falling back to qemu seems much easier, doesn't it?

Maybe we can just stop vhost_net in pre_save() and enable it in
post_load()? Then there is no need to enable the dirty logging of
vhost_net.

> Regards
> Qin chuanyu
Re: [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
On Fri, Aug 30, 2013 at 12:29:18PM +0800, Jason Wang wrote:
> We tend to batch the used adding and signaling in vhost_zerocopy_callback(),
> which may result in more than 100 used buffers being updated in
> vhost_zerocopy_signal_used() in some cases. So switch to
> vhost_add_used_and_signal_n() to avoid multiple calls to
> vhost_add_used_and_signal(), which means far fewer used index updates and
> memory barriers.

pls put info on perf gain in commit log too

> Signed-off-by: Jason Wang <jasow...@redhat.com>
> ---
>  drivers/vhost/net.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 280ee66..8a6dd0d 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -281,7 +281,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
>  {
>          struct vhost_net_virtqueue *nvq =
>                  container_of(vq, struct vhost_net_virtqueue, vq);
> -        int i;
> +        int i, add;
>          int j = 0;
>
>          for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
> @@ -289,14 +289,17 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
>                          vhost_net_tx_err(net);
>                  if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
>                          vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
> -                        vhost_add_used_and_signal(vq->dev, vq,
> -                                                  vq->heads[i].id, 0);
>                          ++j;
>                  } else
>                          break;
>          }
> -        if (j)
> -                nvq->done_idx = i;
> +        while (j) {
> +                add = min(UIO_MAXIOV - nvq->done_idx, j);
> +                vhost_add_used_and_signal_n(vq->dev, vq,
> +                                            &vq->heads[nvq->done_idx], add);
> +                nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
> +                j -= add;
> +        }
>  }
>
>  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> --
> 1.7.1
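The while loop in the patch above splits a run of completed entries into pieces that never cross the end of the ring, so each piece can be handed over as one contiguous array slice. A standalone sketch of that splitting, assuming a ring of `RING_SIZE` entries in place of UIO_MAXIOV; `struct batch` and `split_batches` are names invented for this example:

```c
#include <stddef.h>

#define RING_SIZE 1024 /* stands in for UIO_MAXIOV; the value is an assumption of this sketch */

/* One contiguous slice of the ring: entries [start, start + count). */
struct batch {
    size_t start;
    size_t count;
};

/*
 * Split `pending` entries beginning at *done_idx into slices that do not
 * wrap past the end of the ring. Advances *done_idx modulo RING_SIZE and
 * returns the number of slices written (at most max_out).
 */
static size_t split_batches(size_t *done_idx, size_t pending,
                            struct batch *out, size_t max_out)
{
    size_t n = 0;

    while (pending && n < max_out) {
        size_t room = RING_SIZE - *done_idx;         /* entries left before the wrap */
        size_t add = pending < room ? pending : room; /* same role as min() in the patch */

        out[n].start = *done_idx;
        out[n].count = add;
        n++;

        *done_idx = (*done_idx + add) % RING_SIZE;
        pending -= add;
    }
    return n;
}
```

Starting at index 1020 with 10 pending entries, for instance, the loop produces one slice of 4 entries up to the end of the ring and a second slice of 6 entries from index 0, so at most two calls are needed per pass instead of one call per entry.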
Re: [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void
tweak subj s/returns/return/

On Fri, Aug 30, 2013 at 12:29:17PM +0800, Jason Wang wrote:
> None of its callers use its return value, so let it return void.
>
> Signed-off-by: Jason Wang <jasow...@redhat.com>
> ---
>  drivers/vhost/net.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 969a859..280ee66 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -276,8 +276,8 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
>   * of used idx. Once lower device DMA done contiguously, we will signal KVM
>   * guest used idx.
>   */
> -static int vhost_zerocopy_signal_used(struct vhost_net *net,
> -                                      struct vhost_virtqueue *vq)
> +static void vhost_zerocopy_signal_used(struct vhost_net *net,
> +                                       struct vhost_virtqueue *vq)
>  {
>          struct vhost_net_virtqueue *nvq =
>                  container_of(vq, struct vhost_net_virtqueue, vq);
> @@ -297,7 +297,6 @@ static int vhost_zerocopy_signal_used(struct vhost_net *net,
>          }
>          if (j)
>                  nvq->done_idx = i;
> -        return j;
>  }
>
>  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> --
> 1.7.1
Re: [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers
On Fri, Aug 30, 2013 at 12:29:22PM +0800, Jason Wang wrote:
> As Michael pointed out, we used to limit the max pending DMAs to get better
> cache utilization. But it was not done correctly, since the check was only
> done when there were no new buffers submitted from the guest. The guest can
> easily exceed the limit by continuing to send packets.
>
> So this patch moves the check into the main loop. Tests show about 5%-10%
> improvement in per-cpu throughput for guest tx, but a 5% drop in per-cpu
> transaction rate for a single-session TCP_RR.

Any explanation for the drop? single session TCP_RR is unlikely to
exceed VHOST_MAX_PEND, correct?

> Signed-off-by: Jason Wang <jasow...@redhat.com>
> ---
>  drivers/vhost/net.c |   15 ++++-----------
>  1 files changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index d09c17c..592e1f2 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -363,6 +363,10 @@ static void handle_tx(struct vhost_net *net)
>                  if (zcopy)
>                          vhost_zerocopy_signal_used(net, vq);
>
> +                if ((nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV ==
> +                    nvq->done_idx)
> +                        break;
> +
>                  head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>                                           ARRAY_SIZE(vq->iov),
>                                           &out, &in,
> @@ -372,17 +376,6 @@ static void handle_tx(struct vhost_net *net)
>                          break;
>                  /* Nothing new?  Wait for eventfd to tell us they refilled. */
>                  if (head == vq->num) {
> -                        int num_pends;
> -
> -                        /* If more outstanding DMAs, queue the work.
> -                         * Handle upend_idx wrap around
> -                         */
> -                        num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
> -                                    (nvq->upend_idx - nvq->done_idx) :
> -                                    (nvq->upend_idx + UIO_MAXIOV -
> -                                     nvq->done_idx);
> -                        if (unlikely(num_pends > VHOST_MAX_PEND))
> -                                break;
>                          if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>                                  vhost_disable_notify(&net->dev, vq);
>                                  continue;
> --
> 1.7.1
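The num_pends computation removed by the patch above counts outstanding entries between two ring indices while handling wrap-around. Pulled out as a pair of helpers it might look like the sketch below. This only restates the removed lines; it makes no claim about the modular comparison that replaces them, and `RING_SIZE`, `pending_count` and `over_limit` are names chosen for the example.

```c
#define RING_SIZE 1024u /* stands in for UIO_MAXIOV; the value is an assumption of this sketch */

/* Number of entries between done_idx and upend_idx, accounting for upend_idx wrapping. */
static unsigned int pending_count(unsigned int upend_idx, unsigned int done_idx)
{
    return upend_idx >= done_idx ?
           upend_idx - done_idx :
           upend_idx + RING_SIZE - done_idx;
}

/* Nonzero when more than max_pend entries are outstanding (the old "> VHOST_MAX_PEND" test). */
static int over_limit(unsigned int upend_idx, unsigned int done_idx,
                      unsigned int max_pend)
{
    return pending_count(upend_idx, done_idx) > max_pend;
}
```

With upend_idx at 3 and done_idx at 1020, for example, the second branch applies and 7 entries are counted as pending.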
Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO
On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
> This reserves a capability number for upcoming support
> of VFIO-IOMMU DMA operations in real mode.
>
> This reserves a number for a new "SPAPR TCE IOMMU" KVM device
> which is going to manage lifetime of SPAPR TCE IOMMU object.
>
> This defines an attribute of the "SPAPR TCE IOMMU" KVM device
> which is going to be used for initialization.
>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>
> ---
> Changes:
> v9:
> * KVM ioctl is replaced with "SPAPR TCE IOMMU" KVM device type with
> KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
>
> 2013/08/15:
> * fixed mistype in comments
> * fixed commit message which says what uses ioctls 0xad and 0xae
>
> 2013/07/16:
> * changed the number
>
> 2013/07/11:
> * changed order in a file, added comment about a gap in ioctl number
> ---
>  arch/powerpc/include/uapi/asm/kvm.h | 8 ++++++++
>  include/uapi/linux/kvm.h            | 2 ++
>  2 files changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 0fb1a6e..c1ae1e5 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
>  #define  KVM_XICS_MASKED       (1ULL << 41)
>  #define  KVM_XICS_PENDING      (1ULL << 42)
>
> +/* SPAPR TCE IOMMU device specification */
> +struct kvm_create_spapr_tce_iommu_linkage {
> +        __u64 liobn;
> +        __u32 fd;
> +        __u32 flags;
> +};
> +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE   0
> +
>  #endif /* __LINUX_KVM_POWERPC_H */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 99c2533..9d20630 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_IRQ_XICS 92
>  #define KVM_CAP_ARM_EL1_32BIT 93
>  #define KVM_CAP_SPAPR_MULTITCE 94
> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
You do not need a capability to check for device support. The device API
supports checking for that with the KVM_CREATE_DEVICE_TEST flag to the
KVM_CREATE_DEVICE ioctl.

>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -843,6 +844,7 @@ struct kvm_device_attr {
>  #define KVM_DEV_TYPE_FSL_MPIC_20       1
>  #define KVM_DEV_TYPE_FSL_MPIC_42       2
>  #define KVM_DEV_TYPE_XICS              3
> +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU   4
>
>  /*
>   * ioctls for VM fds
> --
> 1.8.4.rc4

--
                        Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO
On 09/01/2013 09:27 PM, Gleb Natapov wrote:
> On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
>> This reserves a capability number for upcoming support
>> of VFIO-IOMMU DMA operations in real mode.
>>
>> This reserves a number for a new "SPAPR TCE IOMMU" KVM device
>> which is going to manage lifetime of SPAPR TCE IOMMU object.
>>
>> This defines an attribute of the "SPAPR TCE IOMMU" KVM device
>> which is going to be used for initialization.
>>
>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>
>> ---
>> Changes:
>> v9:
>> * KVM ioctl is replaced with "SPAPR TCE IOMMU" KVM device type with
>> KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
>>
>> 2013/08/15:
>> * fixed mistype in comments
>> * fixed commit message which says what uses ioctls 0xad and 0xae
>>
>> 2013/07/16:
>> * changed the number
>>
>> 2013/07/11:
>> * changed order in a file, added comment about a gap in ioctl number
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h | 8 ++++++++
>>  include/uapi/linux/kvm.h            | 2 ++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>> index 0fb1a6e..c1ae1e5 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
>>  #define  KVM_XICS_MASKED       (1ULL << 41)
>>  #define  KVM_XICS_PENDING      (1ULL << 42)
>>
>> +/* SPAPR TCE IOMMU device specification */
>> +struct kvm_create_spapr_tce_iommu_linkage {
>> +        __u64 liobn;
>> +        __u32 fd;
>> +        __u32 flags;
>> +};
>> +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE   0
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 99c2533..9d20630 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>  #define KVM_CAP_IRQ_XICS 92
>>  #define KVM_CAP_ARM_EL1_32BIT 93
>>  #define KVM_CAP_SPAPR_MULTITCE 94
>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
> You do not need capability to check for a device support. Device API
> supports checking for that with KVM_CREATE_DEVICE_TEST flag to
> KVM_CREATE_DEVICE ioctl.

Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
it - KVM_CAP_IRQ_XICS. Do we not need both capabilities? Or is XICS special
in some way but SPAPR TCE IOMMU is not? I am confused, sorry.

>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>
>> @@ -843,6 +844,7 @@ struct kvm_device_attr {
>>  #define KVM_DEV_TYPE_FSL_MPIC_20       1
>>  #define KVM_DEV_TYPE_FSL_MPIC_42       2
>>  #define KVM_DEV_TYPE_XICS              3
>> +#define KVM_DEV_TYPE_SPAPR_TCE_IOMMU   4
>>
>>  /*
>>   * ioctls for VM fds
>> --
>> 1.8.4.rc4
>
> --
>                         Gleb.

--
Alexey
Re: [PATCH v9 04/13] KVM: PPC: reserve a capability and KVM device type for realmode VFIO
On Sun, Sep 01, 2013 at 09:39:23PM +1000, Alexey Kardashevskiy wrote:
> On 09/01/2013 09:27 PM, Gleb Natapov wrote:
>> On Wed, Aug 28, 2013 at 06:37:41PM +1000, Alexey Kardashevskiy wrote:
>>> This reserves a capability number for upcoming support
>>> of VFIO-IOMMU DMA operations in real mode.
>>>
>>> This reserves a number for a new "SPAPR TCE IOMMU" KVM device
>>> which is going to manage lifetime of SPAPR TCE IOMMU object.
>>>
>>> This defines an attribute of the "SPAPR TCE IOMMU" KVM device
>>> which is going to be used for initialization.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>
>>> ---
>>> Changes:
>>> v9:
>>> * KVM ioctl is replaced with "SPAPR TCE IOMMU" KVM device type with
>>> KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute
>>>
>>> 2013/08/15:
>>> * fixed mistype in comments
>>> * fixed commit message which says what uses ioctls 0xad and 0xae
>>>
>>> 2013/07/16:
>>> * changed the number
>>>
>>> 2013/07/11:
>>> * changed order in a file, added comment about a gap in ioctl number
>>> ---
>>>  arch/powerpc/include/uapi/asm/kvm.h | 8 ++++++++
>>>  include/uapi/linux/kvm.h            | 2 ++
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>>> index 0fb1a6e..c1ae1e5 100644
>>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>>> @@ -511,4 +511,12 @@ struct kvm_get_htab_header {
>>>  #define  KVM_XICS_MASKED       (1ULL << 41)
>>>  #define  KVM_XICS_PENDING      (1ULL << 42)
>>>
>>> +/* SPAPR TCE IOMMU device specification */
>>> +struct kvm_create_spapr_tce_iommu_linkage {
>>> +        __u64 liobn;
>>> +        __u32 fd;
>>> +        __u32 flags;
>>> +};
>>> +#define KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE   0
>>> +
>>>  #endif /* __LINUX_KVM_POWERPC_H */
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 99c2533..9d20630 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>>  #define KVM_CAP_IRQ_XICS 92
>>>  #define KVM_CAP_ARM_EL1_32BIT 93
>>>  #define KVM_CAP_SPAPR_MULTITCE 94
>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>> You do not need capability to check for a device support. Device API
>> supports checking for that with KVM_CREATE_DEVICE_TEST flag to
>> KVM_CREATE_DEVICE ioctl.
>
> Hm. I copied my device from KVM_DEV_TYPE_XICS and there is a capability for
> it - KVM_CAP_IRQ_XICS. Do We not need both capabilities? Or XICS is special
> in some way but SPAPR TCE IOMMU is not? I am confused, sorry.
>
Looking at it, KVM_CAP_IRQ_XICS/KVM_CAP_IRQ_MPIC are not used to detect
device existence, but to link a device to a vcpu. KVM_CAP_IRQ_MPIC was
introduced separately from the MPIC device code.

--
                        Gleb.
Re: [PATCH v9 12/13] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Kardashevskiy wrote:
> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
> them to user space, which saves time on switching to user space and back.
>
> Both real and virtual modes are supported. The kernel tries to
> handle a TCE request in real mode; if that fails, it passes the request
> to virtual mode to complete the operation. If the virtual mode
> handler fails, the request is passed to user space.
>
> The first user of this is VFIO on POWER. Trampolines to the VFIO external
> user API functions are required for this patch.
>
> This adds a SPAPR TCE IOMMU KVM device to associate a logical bus
> number (LIOBN) with a VFIO IOMMU group fd and enable in-kernel handling
> of map/unmap requests. The device supports a single attribute which is
> a struct with LIOBN and IOMMU fd. When the attribute is set, the device
> establishes the connection between KVM and VFIO.
>
> Tests show that this patch increases transmission speed from 220MB/s
> to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>
> ---
>
> Changes:
> v9:
> * KVM_CAP_SPAPR_TCE_IOMMU ioctl to KVM replaced with "SPAPR TCE IOMMU"
> KVM device
> * release_spapr_tce_table() is not shared between different TCE types
> * reduced the patch size by moving VFIO external API
> trampolines to a separate patch
> * moved documentation from Documentation/virtual/kvm/api.txt to
> Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>
> v8:
> * fixed warnings from checkpatch.pl
>
> 2013/07/11:
> * removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
> for KVM_BOOK3S_64
> * kvmppc_gpa_to_hva_and_get also returns host phys address. Not much sense
> for this here but the next patch for hugepages support will use it more.
>
> 2013/07/06:
> * added realmode arch_spin_lock to protect TCE table from races
> in real and virtual modes
> * POWERPC IOMMU API is changed to support real mode
> * iommu_take_ownership and iommu_release_ownership are protected by
> iommu_table's locks
> * VFIO external user API use rewritten
> * multiple small fixes
>
> 2013/06/27:
> * tce_list page is referenced now in order to protect it from accidental
> invalidation during H_PUT_TCE_INDIRECT execution
> * added use of the external user VFIO API
>
> 2013/06/05:
> * changed capability number
> * changed ioctl number
> * update the doc article number
>
> 2013/05/20:
> * removed get_user() from real mode handlers
> * kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
> translated TCEs there, tries realmode_get_page() on those and, if it fails,
> passes control to the virtual mode handler, which tries to finish
> the request handling
> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
> on a page
> * The only reason to pass the request to user mode now is when the user mode
> did not register a TCE table in the kernel; in all other cases the virtual
> mode handler is expected to do the job
> ---
>  .../virtual/kvm/devices/spapr_tce_iommu.txt        |  37 +++
>  arch/powerpc/include/asm/kvm_host.h                |   4 +
>  arch/powerpc/kvm/book3s_64_vio.c                   | 310 -
>  arch/powerpc/kvm/book3s_64_vio_hv.c                | 122
>  arch/powerpc/kvm/powerpc.c                         |   1 +
>  include/linux/kvm_host.h                           |   1 +
>  virt/kvm/kvm_main.c                                |   5 +
>  7 files changed, 477 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
>
> diff --git a/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> new file mode 100644
> index 000..4bc8fc3
> --- /dev/null
> +++ b/Documentation/virtual/kvm/devices/spapr_tce_iommu.txt
> @@ -0,0 +1,37 @@
> +SPAPR TCE IOMMU device
> +
> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> +Architectures: powerpc
> +
> +Device type supported: KVM_DEV_TYPE_SPAPR_TCE_IOMMU
> +
> +Groups:
> +  KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE
> +  Attributes: single attribute with pair { LIOBN, IOMMU fd}
> +
> +This is completely made up device which provides API to link
> +logical bus number (LIOBN) and IOMMU group. The user space has
> +to create a new SPAPR TCE IOMMU device per a logical bus.
> +
Why not have one device that can handle multiple links?

> +LIOBN is a PCI bus identifier from PPC64-server (sPAPR) DMA hypercalls
> +(H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE).
> +IOMMU group is a minimal isolated device set which can be passed to
> +the user space via VFIO.
> +
> +Right after creation the device is in uninitialized state and requires
> +a KVM_DEV_SPAPR_TCE_IOMMU_ATTR_LINKAGE attribute to be set.
> +The attribute contains liobn, IOMMU fd and flags:
> +
> +struct kvm_create_spapr_tce_iommu_linkage {
> +        __u64 liobn;
> +        __u32 fd;
> +        __u32 flags;
> +};
> +
[PATCH] KVM: PPC: fix couple of memory leaks in MPIC/XICS devices
XICS failed to free xics structure on error path. MPIC destroy handler
forgot to delete kvm_device structure.

Signed-off-by: Gleb Natapov <g...@redhat.com>
---
Be warned that this is not even compile tested.

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 94c1dd4..97adfe8 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1244,8 +1244,10 @@ static int kvmppc_xics_create(struct kvm_device *dev, u32 type)
                 kvm->arch.xics = xics;
         mutex_unlock(&kvm->lock);

-        if (ret)
+        if (ret) {
+                kfree(xics);
                 return ret;
+        }

         xics_debugfs_init(xics);

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 2861ae9..efbd996 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -1635,6 +1635,7 @@ static void mpic_destroy(struct kvm_device *dev)

         dev->kvm->arch.mpic = NULL;
         kfree(opp);
+        kfree(dev);
 }

 static int mpic_set_default_irq_routing(struct openpic *opp)
--
                        Gleb.