[PATCH] arch: powerpc: kvm: add signed type cast for comparison
'rmls' is 'unsigned long', but lpcr_rmls() returns a negative number on failure, so a type cast is needed for the comparison. 'lpid' is 'unsigned long', but kvmppc_alloc_lpid() returns a negative number on failure, so it also needs a type cast for the comparison.

Signed-off-by: Chen Gang gang.c...@asianux.com
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2efa9dd..7629cd3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1809,7 +1809,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 		rma_size <<= PAGE_SHIFT;
 		rmls = lpcr_rmls(rma_size);
 		err = -EINVAL;
-		if (rmls < 0) {
+		if ((long)rmls < 0) {
 			pr_err("KVM: Can't use RMA of 0x%lx bytes\n", rma_size);
 			goto out_srcu;
 		}
@@ -1874,7 +1874,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 	/* Allocate the guest's logical partition ID */
 	lpid = kvmppc_alloc_lpid();
-	if (lpid < 0)
+	if ((long)lpid < 0)
 		return -ENOMEM;
 	kvm->arch.lpid = lpid;
--
1.7.7.6
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/21/2013 11:39:45 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Thursday, July 18, 2013 11:09 PM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 12:32:18 PM, Alexander Graf wrote: On 18.07.2013, at 19:17, Scott Wood wrote: On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote: Likewise, we want to make sure this matches the host entry. Unfortunately, this is a bit of a mess already. 64-bit booke appears to always set MAS2_M for TLB0 mappings. The initial KERNELBASE mapping on boot uses M_IF_SMP, and the settlbcam() that (IIRC) replaces it uses _PAGE_COHERENT. 32-bit always uses _PAGE_COHERENT, except for that initial KERNELBASE mapping. _PAGE_COHERENT appears to be set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case). As for what we actually want to happen, there are cases when we want M to be set for non-SMP. One such case is AMP, where CPUs may be sharing memory even if the Linux instance only runs on one CPU (this is not hypothetical, BTW). It's also possible that we encounter a hardware bug that requires MAS2_M, similar to what some of our non-booke chips require. How about we always set M then for RAM? M is like I in that bad things happen if you mix them. I am trying to list the invalid mixings of WIMG: 1) I + M 2) W + I 3) W + M (Scott mentioned that he observed issues when mixing these two) 4) is there any other? That's not what I was talking about (and I don't think I mentioned W at all, though it is also potentially problematic). I'm talking about mixing I with not-I (on two different virtual addresses pointing to the same physical page), M with not-M, etc. So we really want to match exactly what the rest of the kernel is doing. What the rest of the kernel does is a bit complex.
IIUC, if we forget about the boot state, then this is how the kernel sets the WIMG bits: 1) For memory, always set M if CONFIG_SMP is set. - So KVM can do the same. M will not be mixed with W and I. G and E are guest controlled. I don't think this is accurate for 64-bit. And what about the AMP case? -Scott
Re: PPC: Don't sync timebase when inside VM
On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote: When running inside a virtual machine, we cannot modify the timebase, so let's just not call the functions for it then. This resolves hangs when booting e500 SMP guests on overcommitted hosts.

Reported-by: Stuart Yoder b08...@freescale.com
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/platforms/85xx/smp.c | 7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index ff42490..d4b6c1f 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
 		smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
 	}

+	/* When running under a hypervisor, we can not modify tb */
+	np = of_find_node_by_path("/hypervisor");
+	if (np) {
+		smp_85xx_ops.give_timebase = NULL;
+		smp_85xx_ops.take_timebase = NULL;
+	}

I'm marking this superseded, as we now only set give/take_timebase if a guts node is present that corresponds to an SMP SoC. QEMU currently advertises an mpc8544 guts (which is not SMP) and will eventually move to a paravirt device with no guts at all. -Scott
Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap
Ping, anyone, please? Ben needs an ack from any of the MM people before proceeding with this patch. Thanks!

On 07/16/2013 10:53 AM, Alexey Kardashevskiy wrote: The current VFIO-on-POWER implementation supports only user-mode-driven mapping, i.e. QEMU is sending requests to map/unmap pages. However this approach is really slow, so we want to move that to KVM. Since H_PUT_TCE can be extremely performance sensitive (especially with network adapters where each packet needs to be mapped/unmapped) we chose to implement it as a fast hypercall directly in real mode (processor still in the guest context but MMU off). To be able to do that, we need to provide some facilities to access the struct page count within that real-mode environment, as things like the sparsemem vmemmap mappings aren't accessible. This adds an API to increment/decrement the page counter, as the get_user_pages API used for user-mode mapping does not work in real mode. CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.

Cc: linux...@kvack.org
Reviewed-by: Paul Mackerras pau...@samba.org
Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
2013/07/10:
* adjusted comment (removed sentence about virtual mode)
* get_page_unless_zero replaced with atomic_inc_not_zero to minimize the effect of a possible get_page_unless_zero() rework (if it ever happens).
2013/06/27:
* realmode_get_page() fixed to use get_page_unless_zero(). If it fails, the call will be passed from real to virtual mode and safely handled.
* added comment to PageCompound() in include/linux/page-flags.h.
2013/05/20:
* PageTail() is replaced by PageCompound() in order to have the same checks for whether the page is huge in realmode_get_page() and realmode_put_page()

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/pgtable-ppc64.h |  4 ++
 arch/powerpc/mm/init_64.c                | 76 +++-
 include/linux/page-flags.h               |  4 +-
 3 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 46db094..aa7b169 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -394,6 +394,10 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
 	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
 }

+struct page *realmode_pfn_to_page(unsigned long pfn);
+int realmode_get_page(struct page *page);
+int realmode_put_page(struct page *page);
+
 static inline char *get_hpte_slot_array(pmd_t *pmdp)
 {
 	/*
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index d0cd9e4..dcbb806 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -300,5 +300,79 @@ void vmemmap_free(unsigned long start, unsigned long end)
 {
 }

-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+/*
+ * We do not have access to the sparsemem vmemmap, so we fall back to
+ * walking the list of sparsemem blocks which we already maintain for
+ * the sake of crashdump. In the long run, we might want to maintain
+ * a tree if performance of that linear walk becomes a problem.
+ *
+ * Any of the realmode_ functions can fail due to:
+ * 1) As real sparsemem blocks do not lie in RAM contiguously (they
+ * are in virtual address space which is not available in real mode),
+ * the requested page struct can be split between blocks so get_page/put_page
+ * may fail.
+ * 2) When huge pages are used, the get_page/put_page API will fail
+ * in real mode as the linked addresses in the page struct are virtual
+ * too.
+ */
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+	struct vmemmap_backing *vmem_back;
+	struct page *page;
+	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
+
+	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
+		if (pg_va < vmem_back->virt_addr)
+			continue;
+
+		/* Check that page struct is not split between real pages */
+		if ((pg_va + sizeof(struct page)) >
+				(vmem_back->virt_addr + page_size))
+			return NULL;
+
+		page = (struct page *) (vmem_back->phys + pg_va -
+				vmem_back->virt_addr);
+		return page;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#elif defined(CONFIG_FLATMEM)
+
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	return page;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
+
+#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
+int
Re: [PATCH 03/10] vfio: add external user support
On Tue, 2013-07-16 at 10:53 +1000, Alexey Kardashevskiy wrote: VFIO is designed to be used via ioctls on file descriptors returned by VFIO. However in some situations support for an external user is required. The first user is KVM on PPC64 (the SPAPR TCE protocol), which is going to use the existing VFIO groups for exclusive access in real/virtual mode on a host to avoid passing map/unmap requests to user space, which would make things pretty slow.

The protocol includes:
1. Do the normal VFIO init operation:
- opening a new container;
- attaching group(s) to it;
- setting an IOMMU driver for a container.
When an IOMMU is set for a container, all groups in it are considered ready to use by an external user.
2. User space passes a group fd to an external user. The external user calls vfio_group_get_external_user() to verify that:
- the group is initialized;
- IOMMU is set for it.
If both checks pass, vfio_group_get_external_user() increments the container user counter to prevent the VFIO group from disposal before KVM exits.
3. The external user calls vfio_external_user_iommu_id() to know the IOMMU ID. PPC64 KVM uses it to link a logical bus number (LIOBN) with the IOMMU ID.
4. When the external KVM finishes, it calls vfio_group_put_external_user() to release the VFIO group. This call decrements the container user counter. Everything gets released.

The "vfio: Limit group opens" patch is also required for consistency.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

This looks fine to me. Is the plan to add this through the ppc tree again?
Thanks, Alex --- Changes: 2013/07/11: * added vfio_group_get()/vfio_group_put() * protocol description changed Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- drivers/vfio/vfio.c | 62 include/linux/vfio.h | 7 ++ 2 files changed, 69 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index c488da5..58b034b 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1370,6 +1370,68 @@ static const struct file_operations vfio_device_fops = { }; /** + * External user API, exported by symbols to be linked dynamically. + * + * The protocol includes: + * 1. do normal VFIO init operation: + * - opening a new container; + * - attaching group(s) to it; + * - setting an IOMMU driver for a container. + * When IOMMU is set for a container, all groups in it are + * considered ready to use by an external user. + * + * 2. User space passes a group fd to an external user. + * The external user calls vfio_group_get_external_user() + * to verify that: + * - the group is initialized; + * - IOMMU is set for it. + * If both checks passed, vfio_group_get_external_user() + * increments the container user counter to prevent + * the VFIO group from disposal before KVM exits. + * + * 3. The external user calls vfio_external_user_iommu_id() + * to know an IOMMU ID. + * + * 4. When the external KVM finishes, it calls + * vfio_group_put_external_user() to release the VFIO group. + * This call decrements the container user counter. 
+ */
+struct vfio_group *vfio_group_get_external_user(struct file *filep)
+{
+	struct vfio_group *group = filep->private_data;
+
+	if (filep->f_op != &vfio_group_fops)
+		return ERR_PTR(-EINVAL);
+
+	if (!atomic_inc_not_zero(&group->container_users))
+		return ERR_PTR(-EINVAL);
+
+	if (!group->container->iommu_driver ||
+			!vfio_group_viable(group)) {
+		atomic_dec(&group->container_users);
+		return ERR_PTR(-EINVAL);
+	}
+
+	vfio_group_get(group);
+
+	return group;
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
+
+void vfio_group_put_external_user(struct vfio_group *group)
+{
+	vfio_group_put(group);
+	vfio_group_try_dissolve_container(group);
+}
+EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
+
+int vfio_external_user_iommu_id(struct vfio_group *group)
+{
+	return iommu_group_id(group->iommu_group);
+}
+EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
+
+/**
  * Module/class support
  */
 static char *vfio_devnode(struct device *dev, umode_t *mode)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ac8d488..24579a0 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
 	TYPE tmp;						\
 	offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); })		\
+/*
+ * External user API
+ */
+extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
+extern void vfio_group_put_external_user(struct vfio_group *group);
+extern int vfio_external_user_iommu_id(struct vfio_group *group);
+
 #endif /* VFIO_H */
RE: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 23, 2013 12:18 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Alexander Graf; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/21/2013 11:39:45 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Thursday, July 18, 2013 11:09 PM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 12:32:18 PM, Alexander Graf wrote: On 18.07.2013, at 19:17, Scott Wood wrote: On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote: Likewise, we want to make sure this matches the host entry. Unfortunately, this is a bit of a mess already. 64-bit booke appears to always set MAS2_M for TLB0 mappings. The initial KERNELBASE mapping on boot uses M_IF_SMP, and the settlbcam() that (IIRC) replaces it uses _PAGE_COHERENT. 32-bit always uses _PAGE_COHERENT, except for that initial KERNELBASE mapping. _PAGE_COHERENT appears to be set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case). As for what we actually want to happen, there are cases when we want M to be set for non-SMP. One such case is AMP, where CPUs may be sharing memory even if the Linux instance only runs on one CPU (this is not hypothetical, BTW). It's also possible that we encounter a hardware bug that requires MAS2_M, similar to what some of our non-booke chips require. How about we always set M then for RAM? M is like I in that bad things happen if you mix them. I am trying to list the invalid mixings of WIMG: 1) I + M 2) W + I 3) W + M (Scott mentioned that he observed issues when mixing these two) 4) is there any other?
That's not what I was talking about (and I don't think I mentioned W at all, though it is also potentially problematic). Here is a cut-and-paste of one of your responses: "The architecture makes it illegal to mix cacheable and cache-inhibited mappings to the same physical page. Mixing W or M bits is generally bad as well. I've seen it cause machine checks, error interrupts, etc. -- not just corrupting the page in question." So I added not mixing W and M. But at that time I failed to understand why mixing M and I for the same physical address can be an issue :). I'm talking about mixing I with not-I (on two different virtual addresses pointing to the same physical page), M with not-M, etc. When we say all RAM (page_is_ram() is true) will have the M bit, then a RAM physical address will not have M mixed with anything else, right? Similarly, for I/O (which is not RAM), we will set I+G, so I will not be mixed with M. Is that not so? -Bharat So we really want to match exactly what the rest of the kernel is doing. What the rest of the kernel does is a bit complex. IIUC, if we forget about the boot state, then this is how the kernel sets the WIMG bits: 1) For memory, always set M if CONFIG_SMP is set. - So KVM can do the same. M will not be mixed with W and I. G and E are guest controlled. I don't think this is accurate for 64-bit. And what about the AMP case? -Scott