[PATCH] arch: powerpc: kvm: add signed type cast for comparison

2013-07-22 Thread Chen Gang
'rmls' is 'unsigned long', but lpcr_rmls() returns a negative number on
failure, so a signed type cast is needed for the comparison.

'lpid' is 'unsigned long', but kvmppc_alloc_lpid() returns a negative
number on failure, so a signed type cast is needed for the comparison.
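
For illustration, a minimal user-space sketch of the pitfall; fake_rmls()
is a hypothetical stand-in for lpcr_rmls():

	#include <stdio.h>

	/* hypothetical stand-in: returns a negative value on failure */
	static long fake_rmls(unsigned long rma_size)
	{
		(void)rma_size;
		return -1;
	}

	int main(void)
	{
		unsigned long rmls = fake_rmls(0);

		if (rmls < 0)		/* always false: rmls is unsigned */
			printf("never printed\n");
		if ((long)rmls < 0)	/* compares as signed, detects failure */
			printf("failure detected\n");
		return 0;
	}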


Signed-off-by: Chen Gang gang.c...@asianux.com
---
 arch/powerpc/kvm/book3s_hv.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2efa9dd..7629cd3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1809,7 +1809,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
rma_size <<= PAGE_SHIFT;
rmls = lpcr_rmls(rma_size);
err = -EINVAL;
-   if (rmls < 0) {
+   if ((long)rmls < 0) {
pr_err("KVM: Can't use RMA of 0x%lx bytes\n", rma_size);
goto out_srcu;
}
@@ -1874,7 +1874,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
/* Allocate the guest's logical partition ID */
 
lpid = kvmppc_alloc_lpid();
-   if (lpid < 0)
+   if ((long)lpid < 0)
	return -ENOMEM;
kvm->arch.lpid = lpid;
 
-- 
1.7.7.6


Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-22 Thread Scott Wood

On 07/21/2013 11:39:45 PM, Bhushan Bharat-R65777 wrote:



 -Original Message-
 From: Wood Scott-B07421
 Sent: Thursday, July 18, 2013 11:09 PM
 To: Alexander Graf
 Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

 On 07/18/2013 12:32:18 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 19:17, Scott Wood wrote:
 
   On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
   Likewise, we want to make sure this matches the host entry.
  Unfortunately, this is a bit of a mess already.  64-bit booke appears
  to always set MAS2_M for TLB0 mappings.  The initial KERNELBASE
  mapping on boot uses M_IF_SMP, and the settlbcam() that (IIRC)
  replaces it uses _PAGE_COHERENT.  32-bit always uses _PAGE_COHERENT,
  except that initial KERNELBASE mapping.  _PAGE_COHERENT appears to be
  set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config
  clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).
  
   As for what we actually want to happen, there are cases when we
  want M to be set for non-SMP.  One such case is AMP, where CPUs may be
  sharing memory even if the Linux instance only runs on one CPU (this
  is not hypothetical, BTW).  It's also possible that we encounter a
  hardware bug that requires MAS2_M, similar to what some of our
  non-booke chips require.
 
  How about we always set M then for RAM?

 M is like I in that bad things happen if you mix them.

I am trying to list the invalid mixing of WIMG:

 1) I & M
 2) W & I
 3) W & M (Scott mentioned that he observed issues when mixing these two)
 4) is there any other?


That's not what I was talking about (and I don't think I mentioned W at  
all, though it is also potentially problematic).  I'm talking about  
mixing I with not-I (on two different virtual addresses pointing to the  
same physical), M with not-M, etc.



  So we really want to
 match exactly what the rest of the kernel is doing.

How the rest of the kernel does it is a bit complex. IIUC, if we forget
about the boot state, then this is how the kernel sets the WIMG bits:

 1) For memory, always set M if CONFIG_SMP is set.
	- So KVM can do the same. M will not be mixed with W and I. G
and E are guest controlled.


I don't think this is accurate for 64-bit.  And what about the AMP case?

-Scott


Re: PPC: Don't sync timebase when inside VM

2013-07-22 Thread Scott Wood
On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote:
 When running inside a virtual machine, we can not modify timebase, so
 let's just not call the functions for it then.
 
 This resolves hangs when booting e500 SMP guests on overcommitted hosts.
 
 Reported-by: Stuart Yoder b08...@freescale.com
 Signed-off-by: Alexander Graf ag...@suse.de
 
 ---
 arch/powerpc/platforms/85xx/smp.c |7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/platforms/85xx/smp.c 
 b/arch/powerpc/platforms/85xx/smp.c
 index ff42490..d4b6c1f 100644
 --- a/arch/powerpc/platforms/85xx/smp.c
 +++ b/arch/powerpc/platforms/85xx/smp.c
 @@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
   smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
   }
  
 + /* When running under a hypervisor, we can not modify tb */
 +	np = of_find_node_by_path("/hypervisor");
 + if (np) {
 + smp_85xx_ops.give_timebase = NULL;
 + smp_85xx_ops.take_timebase = NULL;
 + }

I'm marking this superseded as we now only set give/take_timebase if a
guts node is present that corresponds to an SMP SoC.  QEMU currently
advertises an mpc8544 guts (which is not SMP) and will eventually move to
a paravirt device with no guts at all.
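
As a rough sketch of that guts-based check (the identifiers
mpc85xx_smp_guts_ids, mpc85xx_give_timebase and mpc85xx_take_timebase
are assumed from the 85xx platform code, not quoted from this thread):

	np = of_find_matching_node(NULL, mpc85xx_smp_guts_ids);
	if (np) {
		guts = of_iomap(np, 0);
		smp_85xx_ops.give_timebase = mpc85xx_give_timebase;
		smp_85xx_ops.take_timebase = mpc85xx_take_timebase;
		of_node_put(np);
	}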

-Scott



Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-22 Thread Alexey Kardashevskiy
Ping, anyone, please?

Ben needs an ack from any of the MM people before proceeding with this patch. Thanks!


On 07/16/2013 10:53 AM, Alexey Kardashevskiy wrote:
 The current VFIO-on-POWER implementation supports only user mode
 driven mapping, i.e. QEMU is sending requests to map/unmap pages.
 However this approach is really slow, so we want to move that to KVM.
 Since H_PUT_TCE can be extremely performance sensitive (especially with
 network adapters where each packet needs to be mapped/unmapped) we chose
 to implement that as a fast hypercall directly in real
 mode (processor still in the guest context but MMU off).
 
 To be able to do that, we need to provide some facilities to
 access the struct page count within that real mode environment as things
 like the sparsemem vmemmap mappings aren't accessible.
 
 This adds an API to increment/decrement the page counter, as the
 get_user_pages API used for user-mode mapping does not work in
 real mode.
 
 CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
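 
 A rough usage sketch (the surrounding handler is hypothetical;
 H_TOO_HARD is the usual way to bounce a request from real mode back
 to virtual mode):
 
 	/* inside a real-mode H_PUT_TCE-style handler */
 	struct page *page = realmode_pfn_to_page(pfn);
 
 	if (!page || realmode_get_page(page))
 		return H_TOO_HARD;	/* retry the whole call in virtual mode */
 
 	/* ... use the page, e.g. update a TCE table entry ... */
 
 	realmode_put_page(page);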
 
 Cc: linux...@kvack.org
 Reviewed-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 
 Changes:
 2013/07/10:
 * adjusted comment (removed sentence about virtual mode)
 * get_page_unless_zero replaced with atomic_inc_not_zero to minimize
 effect of a possible get_page_unless_zero() rework (if it ever happens).
 
 2013/06/27:
 * realmode_get_page() fixed to use get_page_unless_zero(). If failed,
 the call will be passed from real to virtual mode and safely handled.
 * added comment to PageCompound() in include/linux/page-flags.h.
 
 2013/05/20:
 * PageTail() is replaced by PageCompound() in order to have the same checks
 for whether the page is huge in realmode_get_page() and realmode_put_page()
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/include/asm/pgtable-ppc64.h |  4 ++
 arch/powerpc/mm/init_64.c| 76 +++-
  include/linux/page-flags.h   |  4 +-
  3 files changed, 82 insertions(+), 2 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
 b/arch/powerpc/include/asm/pgtable-ppc64.h
 index 46db094..aa7b169 100644
 --- a/arch/powerpc/include/asm/pgtable-ppc64.h
 +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
 @@ -394,6 +394,10 @@ static inline void mark_hpte_slot_valid(unsigned char 
 *hpte_slot_array,
  hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
  }
  
 +struct page *realmode_pfn_to_page(unsigned long pfn);
 +int realmode_get_page(struct page *page);
 +int realmode_put_page(struct page *page);
 +
  static inline char *get_hpte_slot_array(pmd_t *pmdp)
  {
   /*
 diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
 index d0cd9e4..dcbb806 100644
 --- a/arch/powerpc/mm/init_64.c
 +++ b/arch/powerpc/mm/init_64.c
 @@ -300,5 +300,79 @@ void vmemmap_free(unsigned long start, unsigned long end)
  {
  }
  
 -#endif /* CONFIG_SPARSEMEM_VMEMMAP */
 +/*
 + * We do not have access to the sparsemem vmemmap, so we fallback to
 + * walking the list of sparsemem blocks which we already maintain for
 + * the sake of crashdump. In the long run, we might want to maintain
 + * a tree if performance of that linear walk becomes a problem.
 + *
 + * Any of realmode_ functions can fail due to:
 + * 1) As real sparsemem blocks do not lie in RAM contiguously (they
 + * are in virtual address space which is not available in real mode),
 + * the requested page struct can be split between blocks so get_page/put_page
 + * may fail.
 + * 2) When huge pages are used, the get_page/put_page API will fail
 + * in real mode as the linked addresses in the page struct are virtual
 + * too.
 + */
 +struct page *realmode_pfn_to_page(unsigned long pfn)
 +{
 + struct vmemmap_backing *vmem_back;
 + struct page *page;
 + unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
 + unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
  
 + for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
 + if (pg_va < vmem_back->virt_addr)
 + continue;
 +
 + /* Check that page struct is not split between real pages */
 + if ((pg_va + sizeof(struct page)) >
 + (vmem_back->virt_addr + page_size))
 + return NULL;
 +
 + page = (struct page *) (vmem_back->phys + pg_va -
 + vmem_back->virt_addr);
 + return page;
 + }
 +
 + return NULL;
 +}
 +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
 +
 +#elif defined(CONFIG_FLATMEM)
 +
 +struct page *realmode_pfn_to_page(unsigned long pfn)
 +{
 + struct page *page = pfn_to_page(pfn);
 + return page;
 +}
 +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
 +
 +#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
 +
 +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
 +int 

Re: [PATCH 03/10] vfio: add external user support

2013-07-22 Thread Alex Williamson
On Tue, 2013-07-16 at 10:53 +1000, Alexey Kardashevskiy wrote:
 VFIO is designed to be used via ioctls on file descriptors
 returned by VFIO.
 
 However in some situations support for an external user is required.
 The first user is KVM on PPC64 (SPAPR TCE protocol) which is going to
 use the existing VFIO groups for exclusive access in real/virtual mode
 on a host to avoid passing map/unmap requests to the user space which
 would make things pretty slow.
 
 The protocol includes:
 
 1. do normal VFIO init operation:
   - opening a new container;
   - attaching group(s) to it;
   - setting an IOMMU driver for a container.
 When IOMMU is set for a container, all groups in it are
 considered ready to use by an external user.
 
 2. User space passes a group fd to an external user.
 The external user calls vfio_group_get_external_user()
 to verify that:
   - the group is initialized;
   - IOMMU is set for it.
 If both checks passed, vfio_group_get_external_user()
 increments the container user counter to prevent
 the VFIO group from disposal before KVM exits.
 
 3. The external user calls vfio_external_user_iommu_id()
 to know an IOMMU ID. PPC64 KVM uses it to link logical bus
 number (LIOBN) with IOMMU ID.
 
 4. When the external KVM finishes, it calls
 vfio_group_put_external_user() to release the VFIO group.
 This call decrements the container user counter.
 Everything gets released.
 
 The "vfio: Limit group opens" patch is also required for consistency.
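 
 A rough consumer-side sketch (only the three exported calls above are
 from this patch; the fget()/fput() handling and error paths are
 illustrative):
 
 	#include <linux/err.h>
 	#include <linux/file.h>
 	#include <linux/vfio.h>
 
 	/* grpfd: VFIO group fd handed over from user space */
 	static int example_use_vfio_group(int grpfd)
 	{
 		struct file *filep = fget(grpfd);
 		struct vfio_group *group;
 		int iommu_id;
 
 		if (!filep)
 			return -EBADF;
 
 		group = vfio_group_get_external_user(filep);
 		if (IS_ERR(group)) {
 			fput(filep);
 			return PTR_ERR(group);
 		}
 
 		/* e.g. link a LIOBN to this IOMMU group */
 		iommu_id = vfio_external_user_iommu_id(group);
 
 		/* ... use the group while holding the reference ... */
 
 		vfio_group_put_external_user(group);
 		fput(filep);
 		return 0;
 	}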
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

This looks fine to me.  Is the plan to add this through the ppc tree
again?  Thanks,

Alex

 ---
 Changes:
 2013/07/11:
 * added vfio_group_get()/vfio_group_put()
 * protocol description changed
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  drivers/vfio/vfio.c  | 62 
 
  include/linux/vfio.h |  7 ++
  2 files changed, 69 insertions(+)
 
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index c488da5..58b034b 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -1370,6 +1370,68 @@ static const struct file_operations vfio_device_fops = {
  };
  
  /**
 + * External user API, exported by symbols to be linked dynamically.
 + *
 + * The protocol includes:
 + *  1. do normal VFIO init operation:
 + *   - opening a new container;
 + *   - attaching group(s) to it;
 + *   - setting an IOMMU driver for a container.
 + * When IOMMU is set for a container, all groups in it are
 + * considered ready to use by an external user.
 + *
 + * 2. User space passes a group fd to an external user.
 + * The external user calls vfio_group_get_external_user()
 + * to verify that:
 + *   - the group is initialized;
 + *   - IOMMU is set for it.
 + * If both checks passed, vfio_group_get_external_user()
 + * increments the container user counter to prevent
 + * the VFIO group from disposal before KVM exits.
 + *
 + * 3. The external user calls vfio_external_user_iommu_id()
 + * to know an IOMMU ID.
 + *
 + * 4. When the external KVM finishes, it calls
 + * vfio_group_put_external_user() to release the VFIO group.
 + * This call decrements the container user counter.
 + */
 +struct vfio_group *vfio_group_get_external_user(struct file *filep)
 +{
 + struct vfio_group *group = filep->private_data;
 +
 + if (filep->f_op != &vfio_group_fops)
 + return ERR_PTR(-EINVAL);
 +
 + if (!atomic_inc_not_zero(&group->container_users))
 + return ERR_PTR(-EINVAL);
 +
 + if (!group->container->iommu_driver ||
 + !vfio_group_viable(group)) {
 + atomic_dec(&group->container_users);
 + return ERR_PTR(-EINVAL);
 + }
 +
 + vfio_group_get(group);
 +
 + return group;
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 +
 +void vfio_group_put_external_user(struct vfio_group *group)
 +{
 + vfio_group_put(group);
 + vfio_group_try_dissolve_container(group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 +
 +int vfio_external_user_iommu_id(struct vfio_group *group)
 +{
 + return iommu_group_id(group->iommu_group);
 +}
 +EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
 +
 +/**
   * Module/class support
   */
  static char *vfio_devnode(struct device *dev, umode_t *mode)
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index ac8d488..24579a0 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -90,4 +90,11 @@ extern void vfio_unregister_iommu_driver(
   TYPE tmp;   \
   offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); }) \
  
 +/*
 + * External user API
 + */
 +extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 +extern void vfio_group_put_external_user(struct vfio_group *group);
 +extern int vfio_external_user_iommu_id(struct vfio_group *group);
 +
  #endif /* VFIO_H */




RE: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-22 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 23, 2013 12:18 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; Alexander Graf; kvm-ppc@vger.kernel.org;
 k...@vger.kernel.org
 Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel
 managed pages
 
 On 07/21/2013 11:39:45 PM, Bhushan Bharat-R65777 wrote:
 
 
   -Original Message-
   From: Wood Scott-B07421
   Sent: Thursday, July 18, 2013 11:09 PM
   To: Alexander Graf
   Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Bhushan Bharat-R65777
   Subject: Re: [PATCH 2/2 v2] kvm: powerpc: set cache coherency only for kernel managed pages
  
   On 07/18/2013 12:32:18 PM, Alexander Graf wrote:
   
On 18.07.2013, at 19:17, Scott Wood wrote:
   
 On 07/18/2013 08:19:03 AM, Bharat Bhushan wrote:
 Likewise, we want to make sure this matches the host entry.
 Unfortunately, this is a bit of a mess already.  64-bit booke appears
 to always set MAS2_M for TLB0 mappings.  The initial KERNELBASE
 mapping on boot uses M_IF_SMP, and the settlbcam() that (IIRC)
 replaces it uses _PAGE_COHERENT.  32-bit always uses _PAGE_COHERENT,
 except that initial KERNELBASE mapping.  _PAGE_COHERENT appears to be
 set based on CONFIG_SMP || CONFIG_PPC_STD_MMU (the latter config
 clears _PAGE_COHERENT in the non-CPU_FTR_NEED_COHERENT case).
 
  As for what we actually want to happen, there are cases when we
 want M to be set for non-SMP.  One such case is AMP, where CPUs may be
 sharing memory even if the Linux instance only runs on one CPU (this
 is not hypothetical, BTW).  It's also possible that we encounter a
 hardware bug that requires MAS2_M, similar to what some of our
 non-booke chips require.
   
How about we always set M then for RAM?
  
   M is like I in that bad things happen if you mix them.
 
  I am trying to list the invalid mixing of WIMG:
 
   1) I & M
   2) W & I
   3) W & M (Scott mentioned that he observed issues when mixing these two)
   4) is there any other?
 
 That's not what I was talking about (and I don't think I mentioned W at all,
 though it is also potentially problematic).

Here is a cut-paste of one of your responses:
The architecture makes it illegal to mix cacheable and cache-inhibited
mappings to the same physical page.  Mixing W or M bits is generally
bad as well.  I've seen it cause machine checks, error interrupts, etc.
-- not just corrupting the page in question.

So I added not mixing W & M. But at that time I failed to understand why
mixing M & I for the same physical address can be an issue :).

  I'm talking about mixing I with not-I (on two different virtual
 addresses pointing to the same physical), M with not-M, etc.

When we say all RAM (where page_is_ram() is true) will have the M bit
set, then a RAM physical address will not have M mixed with not-M, right?

Similarly, for I/O (which is not RAM), we will set I+G, so I will not be
mixed with M. Is that not so?

-Bharat

 
So we really want to
   match exactly what the rest of the kernel is doing.
 
   How the rest of the kernel does it is a bit complex. IIUC, if we forget
   about the boot state, then this is how the kernel sets the WIMG bits:
    1) For memory, always set M if CONFIG_SMP is set.
	- So KVM can do the same. M will not be mixed with W and I. G and E
   are guest controlled.
 
 I don't think this is accurate for 64-bit.  And what about the AMP case?
 
 -Scott
