Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-13 Thread Kirti Wankhede




On 5/6/2020 4:24 PM, Cornelia Huck wrote:

On Mon, 4 May 2020 21:28:56 +0530
Kirti Wankhede  wrote:


VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. It's the user space application's responsibility to
   copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled.

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populate bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Fixed below error by changing pgsize type from uint64_t to size_t.
Reported-by: kbuild test robot 

All errors:
drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3'

drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3'


Move that below the '---' delimiter so that it does not end up in the
commit? (Crediting the build bot is fine, but the details are not
really useful when you look at the code later.)



ok, removing errors.



Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  drivers/vfio/vfio_iommu_type1.c | 266 +++-
  1 file changed, 260 insertions(+), 6 deletions(-)



@@ -2278,6 +2435,93 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
  
  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
  			-EFAULT : 0;
+   } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
+   struct vfio_iommu_type1_dirty_bitmap dirty;
+   uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
+   VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
+   VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
+   int ret = 0;
+
+   if (!iommu->v2)
+   return -EACCES;
+
+   minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
+   flags);
+
+   if (copy_from_user(&dirty, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (dirty.argsz < minsz || dirty.flags & ~mask)
+   return -EINVAL;
+
+   /* only one flag should be set at a time */
+   if (__ffs(dirty.flags) != __fls(dirty.flags))
+   return -EINVAL;
+


Shouldn't you also check whether the flag that is set is actually
valid? (maybe dirty.flags & ~VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK and do a
switch/case over dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK)



The check just above this one, dirty.flags & ~mask, already makes sure
that only valid flags are set.


Thanks,
Kirti
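
For illustration only, here is a minimal user-space sketch of how the two checks discussed above combine: first reject any bit outside the known set, then require exactly one bit. The macro names and bit positions are assumptions made for this example and are not copied from the uapi header. The patch compares __ffs() with __fls(); the sketch uses a power-of-two test instead and rejects zero explicitly, since find-first-set helpers are generally not meaningful for a zero argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define DIRTY_FLAG_START      (1u << 0)
#define DIRTY_FLAG_STOP       (1u << 1)
#define DIRTY_FLAG_GET_BITMAP (1u << 2)
#define DIRTY_FLAG_MASK \
	(DIRTY_FLAG_START | DIRTY_FLAG_STOP | DIRTY_FLAG_GET_BITMAP)

/* Return true only when exactly one known flag is set. */
static bool dirty_flags_valid(uint32_t flags)
{
	/* Same idea as "dirty.flags & ~mask": no unknown bits allowed. */
	if (flags & ~DIRTY_FLAG_MASK)
		return false;

	/* No operation requested at all. */
	if (flags == 0)
		return false;

	/* Clearing the lowest set bit leaves zero iff one bit was set. */
	return (flags & (flags - 1)) == 0;
}
```

With these assumed values, a request carrying both START and STOP is rejected by the single-bit test, and a request carrying bit 3 is rejected by the mask test, so a later switch/case would only ever see one of the three known flags.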




Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-07 Thread Alex Williamson
On Thu, 7 May 2020 01:12:25 +0530
Kirti Wankhede  wrote:

> On 5/6/2020 1:45 PM, Yan Zhao wrote:
> > On Mon, May 04, 2020 at 11:58:56PM +0800, Kirti Wankhede wrote:  
> 
> 
> 
> >>   /*
> >>* Helper Functions for host iova-pfn list
> >>*/
> >> @@ -567,6 +654,18 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> >>vfio_unpin_page_external(dma, iova, do_accounting);
> >>goto pin_unwind;
> >>}
> >> +
> >> +  if (iommu->dirty_page_tracking) {
> >> +  unsigned long pgshift =
> >> +   __ffs(vfio_pgsize_bitmap(iommu));
> >> +  
> > hi Kirti,
> > may I know if there's any vfio_pin_pages() happening during NVidia's vGPU 
> > migration?
> > the code would enter into deadlock as I reported in last version.
> >   
> 
> Hm, you are right and same is the case in vfio_iommu_type1_dma_rw_chunk().
> 
> Instead of calling vfio_pgsize_bitmap() from lots of places, I'm 
> thinking of saving pgsize_bitmap in struct vfio_iommu, which should be 
> populated whenever domain_list is updated. Alex, will that be fine?

I've wondered why we don't already cache this, so yes, that's fine, but
the cached value will only be valid when evaluated under iommu->lock.
Thanks,

Alex
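
A rough user-space model of the caching idea, offered as a sketch rather than the eventual kernel change. It assumes the deadlock comes from the helper taking a lock that the pin path already holds (the thread does not spell this out), and every type and function name below is hypothetical. The cached value is recomputed whenever the domain list changes, so readers that already hold the container lock can use the field directly.

```c
#include <limits.h>
#include <stddef.h>

#define MAX_DOMAINS 8	/* arbitrary bound for this sketch */

struct domain_model {
	unsigned long pgsize_bitmap;	/* page sizes this domain supports */
};

struct iommu_model {
	struct domain_model domains[MAX_DOMAINS];
	size_t ndomains;
	unsigned long pgsize_bitmap;	/* cached; read only with the lock held */
};

/* Recompute the cache. Caller is assumed to hold the container lock. */
static void update_pgsize_bitmap(struct iommu_model *iommu)
{
	unsigned long bitmap = ULONG_MAX;
	size_t i;

	/* Only page sizes supported by every domain remain set. */
	for (i = 0; i < iommu->ndomains; i++)
		bitmap &= iommu->domains[i].pgsize_bitmap;

	iommu->pgsize_bitmap = bitmap;
}

/* Add a domain and refresh the cache in the same critical section. */
static int add_domain(struct iommu_model *iommu, unsigned long pgsize_bitmap)
{
	if (iommu->ndomains == MAX_DOMAINS)
		return -1;

	iommu->domains[iommu->ndomains++].pgsize_bitmap = pgsize_bitmap;
	update_pgsize_bitmap(iommu);
	return 0;
}

/* Smallest supported page shift, read from the cache; takes no lock itself. */
static unsigned int min_pgshift(const struct iommu_model *iommu)
{
	unsigned long bitmap = iommu->pgsize_bitmap;
	unsigned int shift = 0;

	while (bitmap && !(bitmap & 1UL)) {
		bitmap >>= 1;
		shift++;
	}
	return shift;
}
```

Because min_pgshift() only reads a field, a caller such as the pin path can use it while already holding the lock, which is the property the cached value is meant to provide.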




Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-06 Thread Kirti Wankhede




On 5/6/2020 1:45 PM, Yan Zhao wrote:

On Mon, May 04, 2020 at 11:58:56PM +0800, Kirti Wankhede wrote:





  /*
   * Helper Functions for host iova-pfn list
   */
@@ -567,6 +654,18 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
vfio_unpin_page_external(dma, iova, do_accounting);
goto pin_unwind;
}
+
+   if (iommu->dirty_page_tracking) {
+   unsigned long pgshift =
+__ffs(vfio_pgsize_bitmap(iommu));
+

hi Kirti,
may I know if there's any vfio_pin_pages() happening during NVidia's vGPU 
migration?
the code would enter into deadlock as I reported in last version.



Hm, you are right and same is the case in vfio_iommu_type1_dma_rw_chunk().

Instead of calling vfio_pgsize_bitmap() from lots of places, I'm 
thinking of saving pgsize_bitmap in struct vfio_iommu, which should be 
populated whenever domain_list is updated. Alex, will that be fine?


Thanks,
Kirti



Thanks
Yan





Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-06 Thread Cornelia Huck
On Mon, 4 May 2020 21:28:56 +0530
Kirti Wankhede  wrote:

> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. It's the user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled.
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populate bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Fixed below error by changing pgsize type from uint64_t to size_t.
> Reported-by: kbuild test robot 
> 
> All errors:
> drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3'
> 
> drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3'

Move that below the '---' delimiter so that it does not end up in the
commit? (Crediting the build bot is fine, but the details are not
really useful when you look at the code later.)

> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 266 +++-
>  1 file changed, 260 insertions(+), 6 deletions(-)

> @@ -2278,6 +2435,93 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) {
> + struct vfio_iommu_type1_dirty_bitmap dirty;
> + uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START |
> + VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP |
> + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
> + int ret = 0;
> +
> + if (!iommu->v2)
> + return -EACCES;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
> + flags);
> +
> + if (copy_from_user(&dirty, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (dirty.argsz < minsz || dirty.flags & ~mask)
> + return -EINVAL;
> +
> + /* only one flag should be set at a time */
> + if (__ffs(dirty.flags) != __fls(dirty.flags))
> + return -EINVAL;
> +

Shouldn't you also check whether the flag that is set is actually
valid? (maybe dirty.flags & ~VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK and do a
switch/case over dirty.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_MASK)




Re: [PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-06 Thread Yan Zhao
On Mon, May 04, 2020 at 11:58:56PM +0800, Kirti Wankhede wrote:
> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. It's the user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled.
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populate bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Fixed below error by changing pgsize type from uint64_t to size_t.
> Reported-by: kbuild test robot 
> 
> All errors:
> drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3'
> 
> drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3'
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 266 +++-
>  1 file changed, 260 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index fa735047b04d..01dcb417836f 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>  	unsigned int		dma_avail;
>  	bool			v2;
>  	bool			nesting;
> +	bool			dirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>  	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
>  	struct task_struct	*task;
>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> +	unsigned long		*bitmap;
>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,21 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
>  	(!list_empty(&iommu->domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */
> +#define DIRTY_BITMAP_PAGES_MAX	((u64)INT_MAX)
> +#define DIRTY_BITMAP_SIZE_MAX	DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
>  	rb_erase(&old->node, &iommu->dma_list);
>  }
>  
> +
> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, size_t pgsize)
> +{
> + uint64_t npages = dma->size / pgsize;
> +
> + if (npages > DIRTY_BITMAP_PAGES_MAX)
> + return -EINVAL;
> +
> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free(struct vfio_dma *dma)
> +{
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> +}
> +
> +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
> +{
> + struct rb_node *p;
> +
> + if (RB_EMPTY_ROOT(&dma->pfn_list))
> + return;
> +
> + for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
> +
> + bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
> + }
> +}
> +
> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + int ret;
> +
> + ret = vfio_dma_bitmap_alloc(dma, pgsize);
> + if (ret) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + vfio_dma_bitmap_free(dma);
> + }
> + return ret;
> + 

[PATCH Kernel v18 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-05-04 Thread Kirti Wankhede
VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. It's the user space application's responsibility to
  copy content of dirty pages from source to destination during migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled.

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populate bitmap. If it is equal, copy bitmap, but if not equal, return
error.

Fixed below error by changing pgsize type from uint64_t to size_t.
Reported-by: kbuild test robot 

All errors:
drivers/vfio/vfio_iommu_type1.c:197: undefined reference to `__udivdi3'

drivers/vfio/vfio_iommu_type1.c:225: undefined reference to `__udivdi3'

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 266 +++-
 1 file changed, 260 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index fa735047b04d..01dcb417836f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
 	unsigned int		dma_avail;
 	bool			v2;
 	bool			nesting;
+	bool			dirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -91,6 +92,7 @@ struct vfio_dma {
 	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
+	unsigned long		*bitmap;
 };
 
 struct vfio_group {
@@ -125,7 +127,21 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)	\
 	(!list_empty(&iommu->domain_list))
 
+#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
+
+/*
+ * Input argument of number of bits to bitmap_set() is unsigned integer, which
+ * further casts to signed integer for unaligned multi-bit operation,
+ * __bitmap_set().
+ * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
+ * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
+ * system.
+ */
+#define DIRTY_BITMAP_PAGES_MAX  ((u64)INT_MAX)
+#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
+
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 	rb_erase(&old->node, &iommu->dma_list);
 }
 
+
+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, size_t pgsize)
+{
+   uint64_t npages = dma->size / pgsize;
+
+   if (npages > DIRTY_BITMAP_PAGES_MAX)
+   return -EINVAL;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static void vfio_dma_bitmap_free(struct vfio_dma *dma)
+{
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+}
+
+static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
+{
+   struct rb_node *p;
+
+   if (RB_EMPTY_ROOT(&dma->pfn_list))
+   return;
+
+   for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
+
+   bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+}
+
+static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, size_t pgsize)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   int ret;
+
+   ret = vfio_dma_bitmap_alloc(dma, pgsize);
+   if (ret) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(n,
+   struct vfio_dma, node);
+
+   vfio_dma_bitmap_free(dma);
+   }
+   return ret;
+   }
+   vfio_dma_populate_bitmap(dma, pgsize);
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct