Re: [PATCH kernel v11 32/34] powerpc/mmu: Add userspace-to-physical addresses translation cache

2015-06-01 Thread David Gibson
On Fri, May 29, 2015 at 06:44:56PM +1000, Alexey Kardashevskiy wrote:
> We are adding support for DMA memory pre-registration to be used in
> conjunction with VFIO. The idea is that the userspace which is going to
> run a guest may want to pre-register a user space memory region so
> it all gets pinned once and never goes away. Having this done,
> a hypervisor will not have to pin/unpin pages on every DMA map/unmap
> request. This is going to help with multiple pinning of the same memory.
> 
> Another use of it is in-kernel real mode (mmu off) acceleration of
> DMA requests where real time translation of guest physical to host
> physical addresses is non-trivial and may fail as linux ptes may be
> temporarily invalid. Also, having cached host physical addresses
> (compared to just pinning at the start and then walking the page table
> again on every H_PUT_TCE), we can be sure that the addresses which we put
> into TCE table are the ones we already pinned.
> 
> This adds a list of memory regions to mm_context_t. Each region consists
> of a header and a list of physical addresses. This adds API to:
> 1. register/unregister memory regions;
> 2. do final cleanup (which puts all pre-registered pages);
> 3. do userspace to physical address translation;
> 4. manage usage counters; multiple registration of the same memory
> is allowed (once per container).
> 
> This implements 2 counters per registered memory region:
> - @mapped: incremented on every DMA mapping; decremented on unmapping;
> initialized to 1 when a region is just registered; once it becomes zero,
> no more mappings allowe;
> - @used: incremented on every "register" ioctl; decremented on
> "unregister"; unregistration is allowed for DMA mapped regions unless
> it is the very last reference. For the very last reference this checks
> that the region is still mapped and returns -EBUSY so the userspace
> gets to know that memory is still pinned and unregistration needs to
> be retried; @used remains 1.
> 
> Host physical addresses are stored in vmalloc'ed array. In order to
> access these in the real mode (mmu off), there is a real_vmalloc_addr()
> helper. In-kernel acceleration patchset will move it from KVM to MMU code.
> 
> Signed-off-by: Alexey Kardashevskiy 

There's a bunch of ways this could be cleaned up, but those can be
done later.  So,

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpNVjpols48V.pgp
Description: PGP signature


[PATCH kernel v11 32/34] powerpc/mmu: Add userspace-to-physical addresses translation cache

2015-05-29 Thread Alexey Kardashevskiy
We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Having this done,
a hypervisor will not have to pin/unpin pages on every DMA map/unmap
request. This is going to help with multiple pinning of the same memory.

Another use of it is in-kernel real mode (mmu off) acceleration of
DMA requests where real time translation of guest physical to host
physical addresses is non-trivial and may fail as linux ptes may be
temporarily invalid. Also, having cached host physical addresses
(compared to just pinning at the start and then walking the page table
again on every H_PUT_TCE), we can be sure that the addresses which we put
into TCE table are the ones we already pinned.

This adds a list of memory regions to mm_context_t. Each region consists
of a header and a list of physical addresses. This adds API to:
1. register/unregister memory regions;
2. do final cleanup (which puts all pre-registered pages);
3. do userspace to physical address translation;
4. manage usage counters; multiple registration of the same memory
is allowed (once per container).

This implements 2 counters per registered memory region:
- @mapped: incremented on every DMA mapping; decremented on unmapping;
initialized to 1 when a region is just registered; once it becomes zero,
no more mappings allowe;
- @used: incremented on every "register" ioctl; decremented on
"unregister"; unregistration is allowed for DMA mapped regions unless
it is the very last reference. For the very last reference this checks
that the region is still mapped and returns -EBUSY so the userspace
gets to know that memory is still pinned and unregistration needs to
be retried; @used remains 1.

Host physical addresses are stored in vmalloc'ed array. In order to
access these in the real mode (mmu off), there is a real_vmalloc_addr()
helper. In-kernel acceleration patchset will move it from KVM to MMU code.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v11:
* added mutex to protect adding and removing
* added mm_iommu_init() helper
* kref is removed, now there are an atomic counter (@mapped) and a mutex
(for @used)
* merged mm_iommu_alloc into mm_iommu_get and do check-and-alloc under
one mutex lock; mm_iommu_get() returns old @used value so the caller can
know if it needs to elevate locked_vm counter
* do locked_vm counting in mmu_context_hash64_iommu.c

v10:
* split mm_iommu_mapped_update into mm_iommu_mapped_dec + mm_iommu_mapped_inc
* mapped counter now keep one reference for itself and mm_iommu_mapped_inc()
can tell if the region is being released
* updated commit log

v8:
* s/mm_iommu_table_group_mem_t/struct mm_iommu_table_group_mem_t/
* fixed error fallback look (s/[i]/[j]/)
---
 arch/powerpc/include/asm/mmu-hash64.h  |   3 +
 arch/powerpc/include/asm/mmu_context.h |  16 ++
 arch/powerpc/kernel/setup_64.c |   3 +
 arch/powerpc/mm/Makefile   |   1 +
 arch/powerpc/mm/mmu_context_hash64.c   |   6 +
 arch/powerpc/mm/mmu_context_hash64_iommu.c | 297 +
 6 files changed, 326 insertions(+)
 create mode 100644 arch/powerpc/mm/mmu_context_hash64_iommu.c

diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
b/arch/powerpc/include/asm/mmu-hash64.h
index 1da6a81..a82f534 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -536,6 +536,9 @@ typedef struct {
/* for 4K PTE fragment support */
void *pte_frag;
 #endif
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+   struct list_head iommu_group_mem_list;
+#endif
 } mm_context_t;
 
 
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 73382eb..70a4f2a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -16,6 +16,22 @@
  */
 extern int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
 extern void destroy_context(struct mm_struct *mm);
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+struct mm_iommu_table_group_mem_t;
+
+extern bool mm_iommu_preregistered(void);
+extern long mm_iommu_get(unsigned long ua, unsigned long entries,
+   struct mm_iommu_table_group_mem_t **pmem);
+extern long mm_iommu_put(struct mm_iommu_table_group_mem_t *mem);
+extern void mm_iommu_init(mm_context_t *ctx);
+extern void mm_iommu_cleanup(mm_context_t *ctx);
+extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(unsigned long ua,
+   unsigned long size);
+extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
+   unsigned long ua, unsigned long *hpa);
+extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
+extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
+#endif
 
 extern void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next);
 extern voi